The gateway’s main endpoint. Accepts OpenAI-format chat requests, routes to any of the 13 supported providers, streams responses back, and writes a trace to ClickHouse on the way out.
POST /v1/chat/completions HTTP/1.1
Host: localhost:8080
Content-Type: application/json

Request body

{
  "model": "openai/gpt-4o-mini",
  "messages": [
    { "role": "user", "content": "Hello" }
  ],
  "temperature": 0.7,
  "max_tokens": 200,
  "stream": false
}
| Field | Type | Description |
| --- | --- | --- |
| `model` | string (required) | `provider/model` (e.g. `openai/gpt-4o`), a bare name, or `"auto"` for semantic routing. |
| `messages` | array (required) | OpenAI-format messages. `role` ∈ `user` \| `assistant` \| `system` \| `tool`. |
| `temperature` | float | 0.0–2.0. If omitted, the provider default is used. |
| `max_tokens` | int | Output token cap. |
| `top_p` | float | Nucleus sampling. |
| `stream` | bool | `true` → Server-Sent Events. See Streaming. |
| `stop` | string \| array | Stop sequence(s). |
| `tools` | array | OpenAI-format tool definitions. The engine translates them to provider-native shapes. |
| `tool_choice` | `"auto"` \| `"none"` \| `"required"` \| object | Force a specific tool or let the model pick. |
Any OpenAI field not listed above is passed through to the provider untouched.

Response body (non-streaming)

{
  "id": "chatcmpl-xyz",
  "object": "chat.completion",
  "created": 1713465600,
  "model": "gpt-4o-mini",
  "choices": [
    {
      "index": 0,
      "message": { "role": "assistant", "content": "Hello!" },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 8,
    "completion_tokens": 2,
    "total_tokens": 10
  },
  "cost": {
    "input_cost_usd": 0.0000012,
    "output_cost_usd": 0.0000012,
    "total_cost_usd": 0.0000024
  }
}
The cost object is an OpenTracy extra; the rest matches OpenAI exactly.
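The three `cost` figures are additive, so a client can sanity-check them before logging or billing. A minimal TypeScript sketch — the `Cost` interface and helper are our own, derived from the example above, not part of any SDK:

```typescript
// Shape of the OpenTracy `cost` extension (field names from the example
// response above; the interface itself is ours).
interface Cost {
  input_cost_usd: number;
  output_cost_usd: number;
  total_cost_usd: number;
}

// Check that the reported total matches the sum of its parts, tolerating
// floating-point rounding.
function costIsConsistent(cost: Cost, epsilon = 1e-12): boolean {
  const sum = cost.input_cost_usd + cost.output_cost_usd;
  return Math.abs(sum - cost.total_cost_usd) < epsilon;
}
```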

Response headers

| Header | Example | Meaning |
| --- | --- | --- |
| `X-OpenTracy-Selected-Model` | `gpt-4o-mini` | Which concrete model answered. |
| `X-OpenTracy-Cluster-ID` | `84` | Semantic cluster assigned to the prompt (0–99). |
| `X-OpenTracy-Expected-Error` | `0.08` | Predicted error rate for the selected model. |
| `X-OpenTracy-Routing-Ms` | `1.3` | Time spent on the routing decision. |
| `X-OpenTracy-Session-Id` | `sess_af91` | For multi-turn tool calls; echo it back on the next call. |
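Clients calling the gateway with plain `fetch` can lift these headers into a typed object. A hedged sketch — the `RoutingInfo` shape and helper are ours, not part of any SDK; pass in any getter such as `(n) => resp.headers.get(n)`:

```typescript
// Routing metadata returned on every response (header names from the
// table above; this interface is our own).
interface RoutingInfo {
  selectedModel: string | null;
  clusterId: number | null;
  expectedError: number | null;
  routingMs: number | null;
  sessionId: string | null;
}

// Takes a header getter so it works with fetch's Headers, Node's
// IncomingMessage, or a plain object in tests.
function readRoutingInfo(get: (name: string) => string | null): RoutingInfo {
  const num = (name: string) => {
    const v = get(name);
    return v === null ? null : Number(v);
  };
  return {
    selectedModel: get("X-OpenTracy-Selected-Model"),
    clusterId: num("X-OpenTracy-Cluster-ID"),
    expectedError: num("X-OpenTracy-Expected-Error"),
    routingMs: num("X-OpenTracy-Routing-Ms"),
    sessionId: get("X-OpenTracy-Session-Id"),
  };
}
```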

Curl

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o-mini",
    "messages": [{"role": "user", "content": "Say hello in three words."}]
  }'

TypeScript / Node (openai SDK)

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "http://localhost:8080/v1",
  apiKey: "any", // engine holds provider keys
});

const resp = await client.chat.completions.create({
  model: "openai/gpt-4o-mini",
  messages: [{ role: "user", content: "Hello" }],
});

console.log(resp.choices[0].message.content);

Go (net/http)

body := []byte(`{
  "model": "openai/gpt-4o-mini",
  "messages": [{"role": "user", "content": "Hello"}]
}`)
req, err := http.NewRequest(http.MethodPost,
    "http://localhost:8080/v1/chat/completions", bytes.NewReader(body))
if err != nil {
    log.Fatal(err)
}
req.Header.Set("Content-Type", "application/json")
resp, err := http.DefaultClient.Do(req)
if err != nil {
    log.Fatal(err)
}
defer resp.Body.Close()

Semantic auto-routing

Pass "model": "auto" and the engine picks a model per prompt, based on its learned cluster/error profiles:
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "messages": [{"role": "user", "content": "Prove √2 is irrational."}]
  }'
The response headers show you which model was picked:
X-OpenTracy-Selected-Model: gpt-4o
X-OpenTracy-Cluster-ID: 47
X-OpenTracy-Expected-Error: 0.01
See /v1/route if you want the decision without generating a completion.

Streaming

Set "stream": true. Responses come back as Server-Sent Events:
curl -N http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o-mini",
    "messages": [{"role": "user", "content": "count to 5"}],
    "stream": true
  }'
data: {"id":"chatcmpl-xyz","choices":[{"delta":{"content":"1"},"index":0}]}
data: {"id":"chatcmpl-xyz","choices":[{"delta":{"content":", 2"},"index":0}]}
...
data: [DONE]
The engine translates Anthropic and Bedrock event streams into OpenAI’s SSE format, so clients don’t need per-provider logic.
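The stream above can be reassembled client-side by splitting on lines, JSON-parsing each `data:` payload, and stopping at the `[DONE]` sentinel. A minimal sketch assuming the chunk shape shown above (network wiring deliberately omitted):

```typescript
// Minimal shape of one SSE chunk, per the stream example above.
interface StreamChunk {
  id: string;
  choices: { delta: { content?: string }; index: number }[];
}

// Accumulate assistant text from already-split SSE lines.
function collectStreamText(lines: string[]): string {
  let text = "";
  for (const line of lines) {
    if (!line.startsWith("data: ")) continue; // skip blanks and comments
    const payload = line.slice("data: ".length);
    if (payload === "[DONE]") break; // end-of-stream sentinel
    const chunk: StreamChunk = JSON.parse(payload);
    for (const choice of chunk.choices) {
      text += choice.delta.content ?? "";
    }
  }
  return text;
}
```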

Tool calls

Pass OpenAI-format tools. The engine maps them to provider-native shapes (Anthropic tools, Gemini function declarations, etc.):
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-sonnet-4-6",
    "messages": [{"role": "user", "content": "Weather in Paris?"}],
    "tools": [{
      "type": "function",
      "function": {
        "name": "get_weather",
        "parameters": {"type":"object","properties":{"city":{"type":"string"}}}
      }
    }]
  }'
The response comes back with tool_calls in the assistant message.
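A typical client loop then parses `tool_calls`, runs each function locally, and sends the results back as `role: "tool"` messages. A sketch under that assumption — the handler map and its contents are illustrative, not part of the gateway; note that `arguments` arrives as a JSON string, not an object:

```typescript
// OpenAI-format tool call as it appears in the assistant message.
interface ToolCall {
  id: string;
  type: "function";
  function: { name: string; arguments: string };
}

// Run each call through a local handler and produce the `role: "tool"`
// messages to send on the next request.
function runToolCalls(
  calls: ToolCall[],
  handlers: Record<string, (args: any) => string>,
): { tool_call_id: string; role: "tool"; content: string }[] {
  return calls.map((call) => ({
    tool_call_id: call.id,
    role: "tool" as const,
    content: handlers[call.function.name](JSON.parse(call.function.arguments)),
  }));
}
```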

Errors

| Status | `error.code` | Meaning |
| --- | --- | --- |
| 400 | `invalid_request` | Malformed body or missing required field. |
| 401 | `unauthorized` | Bearer token missing or invalid (if auth is enabled). |
| 404 | `model_not_found` | Unknown model string. |
| 429 | `rate_limit` | The upstream provider rate-limited the request. |
| 500 | `provider_error` | The provider returned an error; the body echoes its message. |
| 504 | `timeout` | The upstream took longer than the configured timeout. |
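Of these, 429 and 504 are the transient ones; a client can retry them with backoff and treat the rest as terminal. A small illustrative policy — the attempt cap and delays are our own choices, not gateway defaults:

```typescript
// Decide whether (and how long) to wait before retrying, based on the
// status codes in the table above. Returns null when the request should
// not be retried.
function retryDelayMs(status: number, attempt: number): number | null {
  const retryable = status === 429 || status === 504; // transient only
  if (!retryable || attempt >= 3) return null; // terminal, or out of tries
  return 250 * 2 ** attempt; // 250ms, 500ms, 1000ms
}
```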