- Pre-flight A/B experiments (see what would be picked before sending it)
- Analytics dashboards (“what percentage of my traffic lands in cluster 47?”)
- Custom client logic that wants to make the final routing call itself
Request body
| Field | Type | Description |
|---|---|---|
prompt | string | The user text. Required unless embedding is supplied. |
embedding | array of float | Optional pre-computed vector (384-d MiniLM). Skips embedder. |
available_models | array of string | Restrict candidates to this subset. If omitted, all registered models. |
cost_weight | float | Override λ for this call. 0.0 = quality-first; 1.0+ = cheap-first. |
Response body
| Field | Type | Meaning |
|---|---|---|
selected_model | string | The model that minimizes error + λ·cost on this cluster. |
cluster_id | int | Which of the 100 semantic clusters the prompt landed in. |
expected_error | float | Predicted error rate for selected_model on cluster_id. |
cost_adjusted_score | float | The minimized score — expected_error + λ · cost_per_1k. |
all_scores | object | Score for every candidate considered. |
cache_hit | bool | Whether the decision came from the LRU cache (same prompt seen recently). |
usage.routing_ms | float | Wall time spent in routing logic. |
usage.embedding_ms | float | Time spent embedding the prompt (0 if embedding was supplied). |
Curl
Using a pre-computed embedding
If you already have the embedding (e.g. from another pipeline stage), pass it to skip the embedder:usage.embedding_ms will be 0 when the embedding is supplied.
Errors
| Status | error.code | Meaning |
|---|---|---|
400 | invalid_request | Neither prompt nor embedding provided, or malformed body. |
503 | router_not_ready | Router weights still loading. Check /health. |

