POST /v1/route - OpenTracy

Returns the model the auto-router would pick for a given prompt, along with the full per-candidate scores. No provider call is made and no trace is written — this is a pure routing decision. Useful for:

Pre-flight A/B experiments (see what would be picked before sending it)
Analytics dashboards (“what percentage of my traffic lands in cluster 47?”)
Custom client logic that wants to make the final routing call itself

POST /v1/route HTTP/1.1
Host: localhost:8080
Content-Type: application/json

Request body

{
  "prompt": "Prove the square root of 2 is irrational.",
  "cost_weight": 0.5,
  "available_models": ["gpt-4o-mini", "gpt-4o", "ministral-3b-latest"]
}

Field	Type	Description
`prompt`	`string`	The user text. Required unless `embedding` is supplied.
`embedding`	`array of float`	Optional pre-computed vector (384-d MiniLM). Skips embedder.
`available_models`	`array of string`	Restrict candidates to this subset. If omitted, all registered models.
`cost_weight`	`float`	Override λ for this call. `0.0` = quality-first; `1.0+` = cheap-first.

Response body

{
  "selected_model": "gpt-4o",
  "cluster_id": 47,
  "expected_error": 0.01,
  "cost_adjusted_score": 0.0031,
  "all_scores": {
    "gpt-4o": 0.0031,
    "gpt-4o-mini": 0.0412,
    "ministral-3b-latest": 0.1821
  },
  "cache_hit": false,
  "usage": {
    "routing_ms": 1.3,
    "embedding_ms": 0.8
  }
}

Field	Type	Meaning
`selected_model`	`string`	The model that minimizes `error + λ·cost` on this cluster.
`cluster_id`	`int`	Which of the 100 semantic clusters the prompt landed in.
`expected_error`	`float`	Predicted error rate for `selected_model` on `cluster_id`.
`cost_adjusted_score`	`float`	The minimized score — `expected_error + λ · cost_per_1k`.
`all_scores`	`object`	Score for every candidate considered.
`cache_hit`	`bool`	Whether the decision came from the LRU cache (same prompt seen recently).
`usage.routing_ms`	`float`	Wall time spent in routing logic.
`usage.embedding_ms`	`float`	Time spent embedding the prompt (0 if `embedding` was supplied).

Curl

curl http://localhost:8080/v1/route \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Classify this ticket: billing | technical | other",
    "cost_weight": 0.7
  }'

Using a pre-computed embedding

If you already have the embedding (e.g. from another pipeline stage), pass it to skip the embedder:

curl http://localhost:8080/v1/route \
  -H "Content-Type: application/json" \
  -d '{
    "embedding": [0.021, -0.114, ...384 floats...],
    "cost_weight": 0.5
  }'

usage.embedding_ms will be 0 when the embedding is supplied.

Errors

Status	`error.code`	Meaning
`400`	`invalid_request`	Neither `prompt` nor `embedding` provided, or malformed body.
`503`	`router_not_ready`	Router weights still loading. Check `/health`.

​Request body

​Response body

​Curl

​Using a pre-computed embedding

​Errors

Request body

Response body

Curl

Using a pre-computed embedding

Errors