Request body
| Field | Type | Description |
|---|---|---|
| model | string (required) | provider/model (e.g. openai/gpt-4o), a bare name, or "auto" for semantic routing. |
| messages | array (required) | OpenAI-format messages. role ∈ `user \| assistant \| system \| tool`. |
| temperature | float | 0.0–2.0. Omitted uses the provider default. |
| max_tokens | int | Output cap. |
| top_p | float | Nucleus sampling. |
| stream | bool | true → Server-Sent Events. See Streaming. |
| stop | `string \| array` | Stop sequence(s). |
| tools | array | OpenAI-format tool definitions. Engine translates to provider-native shapes. |
| tool_choice | `"auto" \| "none" \| "required" \| object` | Force a specific tool or let the model pick. |
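
The fields above can be checked on the client before a request is sent. The following is a minimal sketch, not part of any SDK: the names `ChatRequest` and `validateChatRequest` are hypothetical, and this page does not say whether the server applies the same checks.

```typescript
// Illustrative types for the request body; names are assumptions of this sketch.
type Role = "user" | "assistant" | "system" | "tool";

interface ChatMessage {
  role: Role;
  content?: string | null;
}

interface ChatRequest {
  model: string; // "provider/model", a bare name, or "auto"
  messages: ChatMessage[];
  temperature?: number; // 0.0 to 2.0
  max_tokens?: number;
  top_p?: number;
  stream?: boolean;
  stop?: string | string[];
}

const ROLES: string[] = ["user", "assistant", "system", "tool"];

// Returns a list of problems; an empty list means the body passed these checks.
function validateChatRequest(body: Partial<ChatRequest>): string[] {
  const problems: string[] = [];
  if (typeof body.model !== "string" || body.model.length === 0) {
    problems.push("model is required");
  }
  if (!Array.isArray(body.messages)) {
    problems.push("messages is required");
  } else {
    body.messages.forEach((m, i) => {
      if (ROLES.indexOf(m.role) === -1) {
        problems.push(`messages[${i}].role is invalid`);
      }
    });
  }
  if (
    body.temperature !== undefined &&
    (body.temperature < 0 || body.temperature > 2)
  ) {
    problems.push("temperature must be between 0.0 and 2.0");
  }
  return problems;
}
```

A caller would typically run `validateChatRequest` and only serialize the body with `JSON.stringify` when the returned list is empty.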
Response body (non-streaming)
The `cost` object is an OpenTracy extension; every other field matches the OpenAI response format exactly.
Response headers
| Header | Example | Meaning |
|---|---|---|
| X-OpenTracy-Selected-Model | gpt-4o-mini | Which concrete model answered. |
| X-OpenTracy-Cluster-ID | 84 | Semantic cluster assigned to the prompt (0–99). |
| X-OpenTracy-Expected-Error | 0.08 | Predicted error rate for the selected model. |
| X-OpenTracy-Routing-Ms | 1.3 | Time spent in the routing decision, in milliseconds. |
| X-OpenTracy-Session-Id | sess_af91 | For multi-turn tool calls; echo it back on the next call. |
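
These headers can be collected into a typed object for logging or debugging. The sketch below is illustrative only: `HeaderSource`, `RoutingInfo`, and `readRoutingHeaders` are hypothetical names. `HeaderSource` is a structural stand-in for the `Headers` object that `fetch` returns, so the code has no runtime dependencies.

```typescript
// Anything with a case-insensitive get(), such as a fetch Response's `headers`.
interface HeaderSource {
  get(name: string): string | null;
}

interface RoutingInfo {
  selectedModel: string | null;
  clusterId: number | null;
  expectedError: number | null;
  routingMs: number | null;
  sessionId: string | null;
}

// Parses a numeric header value; missing or non-numeric values become null.
function toNumber(value: string | null): number | null {
  if (value === null || value.trim() === "") return null;
  const n = Number(value);
  return isNaN(n) ? null : n;
}

function readRoutingHeaders(headers: HeaderSource): RoutingInfo {
  return {
    selectedModel: headers.get("X-OpenTracy-Selected-Model"),
    clusterId: toNumber(headers.get("X-OpenTracy-Cluster-ID")),
    expectedError: toNumber(headers.get("X-OpenTracy-Expected-Error")),
    routingMs: toNumber(headers.get("X-OpenTracy-Routing-Ms")),
    sessionId: headers.get("X-OpenTracy-Session-Id"),
  };
}
```

With a `fetch` response this would be called as `readRoutingHeaders(response.headers)`. Headers that are absent come back as `null` rather than throwing.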
Curl
TypeScript / Node (openai SDK)
Go (net/http)
Semantic auto-routing
Pass `"model": "auto"` and the engine picks a model per prompt based on its learned
cluster/error profiles.
Use `/v1/route` if you want the routing decision
without generating a completion.
Streaming
Set `"stream": true`. Responses come back as Server-Sent Events.
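
As a rough illustration of consuming such a stream, the sketch below extracts the text deltas from an SSE payload. It assumes the OpenAI streaming convention: each event is a `data:` line carrying a JSON chunk with `choices[0].delta.content`, and the stream ends with `data: [DONE]`. This page does not show the event format, so treat that shape as an assumption. `collectStreamText` is a hypothetical helper name.

```typescript
// Assumed chunk shape (OpenAI streaming convention); not confirmed by this page.
interface StreamChunk {
  choices?: { delta?: { content?: string | null } }[];
}

// Concatenates the content deltas found in a complete SSE payload.
function collectStreamText(ssePayload: string): string {
  let text = "";
  const lines = ssePayload.split(/\r?\n/);
  for (let i = 0; i < lines.length; i++) {
    const line = lines[i];
    // Skip blank separators, comments, and any non-data fields.
    if (line.indexOf("data:") !== 0) continue;
    const data = line.slice(5).trim();
    if (data === "") continue;
    if (data === "[DONE]") break; // assumed end-of-stream sentinel
    const chunk = JSON.parse(data) as StreamChunk;
    const content = chunk.choices?.[0]?.delta?.content;
    if (typeof content === "string") text += content;
  }
  return text;
}
```

This version expects the whole payload as one string. A live client would instead buffer incoming bytes and only parse up to the last complete line, since a network chunk can end in the middle of an event.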
Tool calls
Pass OpenAI-format `tools`. The engine maps them to provider-native
shapes (Anthropic tools, Gemini function declarations, etc.).
Tool invocations come back as `tool_calls` in the assistant message.
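
One way a client might act on the returned `tool_calls` is sketched below. It assumes the OpenAI message format, where each call carries an `id` and a `function` with a `name` and JSON-encoded `arguments`, and each result goes back as a `role: "tool"` message with a matching `tool_call_id`. `runToolCalls` and the handler map are hypothetical names for illustration.

```typescript
interface ToolCall {
  id: string;
  type: "function";
  function: { name: string; arguments: string }; // arguments is a JSON string
}

interface AssistantMessage {
  role: "assistant";
  content: string | null;
  tool_calls?: ToolCall[];
}

interface ToolResultMessage {
  role: "tool";
  tool_call_id: string;
  content: string;
}

type ToolHandler = (args: Record<string, unknown>) => unknown;

// Runs each requested tool locally and builds the follow-up "tool" messages.
function runToolCalls(
  message: AssistantMessage,
  handlers: { [name: string]: ToolHandler }
): ToolResultMessage[] {
  const calls = message.tool_calls || [];
  return calls.map((call): ToolResultMessage => {
    const handler = handlers[call.function.name];
    if (!handler) {
      // Report unknown tools back to the model instead of throwing.
      const error = JSON.stringify({ error: "unknown tool" });
      return { role: "tool", tool_call_id: call.id, content: error };
    }
    const args = JSON.parse(call.function.arguments || "{}");
    const result = JSON.stringify(handler(args));
    return { role: "tool", tool_call_id: call.id, content: result };
  });
}
```

The returned messages would be appended to `messages`, after the assistant message that requested them, on the next request.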
Errors
| Status | error.code | Meaning |
|---|---|---|
| 400 | invalid_request | Malformed body or missing required field. |
| 401 | unauthorized | Bearer token missing or invalid (if auth is enabled). |
| 404 | model_not_found | Unknown model string. |
| 429 | rate_limit | Upstream provider rate-limited the request. |
| 500 | provider_error | Provider returned an error; body echoes their message. |
| 504 | timeout | Upstream took longer than the configured timeout. |
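
A client can use `error.code` to decide whether a retry is worthwhile. The sketch below is one possible policy, not documented behavior. Treating `rate_limit`, `provider_error`, and `timeout` as retryable is this example's assumption, as are the `error.message` field and the names `classifyError` and `backoffMs`.

```typescript
// Error envelope implied by the table's error.code column; message is assumed.
interface ApiErrorBody {
  error?: { code?: string; message?: string };
}

interface ErrorInfo {
  status: number;
  code: string;
  retryable: boolean;
}

// Assumption of this sketch: upstream-side failures are worth retrying,
// while client-side errors (400, 401, 404) are not.
const RETRYABLE_CODES: string[] = ["rate_limit", "provider_error", "timeout"];

function classifyError(status: number, body: ApiErrorBody): ErrorInfo {
  const code = (body.error && body.error.code) || "unknown";
  return { status, code, retryable: RETRYABLE_CODES.indexOf(code) !== -1 };
}

// Exponential backoff in milliseconds: baseMs * 2^attempt, capped at capMs.
function backoffMs(attempt: number, baseMs = 250, capMs = 8000): number {
  return Math.min(capMs, baseMs * Math.pow(2, attempt));
}
```

A retry loop would call `classifyError` on each failed response and, when `retryable` is true, wait `backoffMs(attempt)` before trying again, up to a fixed number of attempts.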

