Skip to main content
Returns the model the auto-router would pick for a given prompt, along with the full per-candidate scores. No provider call is made and no trace is written — this is a pure routing decision. Useful for:
  • Pre-flight A/B experiments (see what would be picked before sending it)
  • Analytics dashboards (“what percentage of my traffic lands in cluster 47?”)
  • Custom client logic that wants to make the final routing call itself
POST /v1/route HTTP/1.1
Host: localhost:8080
Content-Type: application/json

Request body

{
  "prompt": "Prove the square root of 2 is irrational.",
  "cost_weight": 0.5,
  "available_models": ["gpt-4o-mini", "gpt-4o", "ministral-3b-latest"]
}
FieldTypeDescription
promptstringThe user text. Required unless embedding is supplied.
embeddingarray of floatOptional pre-computed vector (384-d MiniLM). Skips embedder.
available_modelsarray of stringRestrict candidates to this subset. If omitted, all registered models.
cost_weightfloatOverride λ for this call. 0.0 = quality-first; 1.0+ = cheap-first.

Response body

{
  "selected_model": "gpt-4o",
  "cluster_id": 47,
  "expected_error": 0.01,
  "cost_adjusted_score": 0.0031,
  "all_scores": {
    "gpt-4o": 0.0031,
    "gpt-4o-mini": 0.0412,
    "ministral-3b-latest": 0.1821
  },
  "cache_hit": false,
  "usage": {
    "routing_ms": 1.3,
    "embedding_ms": 0.8
  }
}
FieldTypeMeaning
selected_modelstringThe model that minimizes error + λ·cost on this cluster.
cluster_idintWhich of the 100 semantic clusters the prompt landed in.
expected_errorfloatPredicted error rate for selected_model on cluster_id.
cost_adjusted_scorefloatThe minimized score — expected_error + λ · cost_per_1k.
all_scoresobjectScore for every candidate considered.
cache_hitboolWhether the decision came from the LRU cache (same prompt seen recently).
usage.routing_msfloatWall time spent in routing logic.
usage.embedding_msfloatTime spent embedding the prompt (0 if embedding was supplied).

Curl

curl http://localhost:8080/v1/route \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Classify this ticket: billing | technical | other",
    "cost_weight": 0.7
  }'

Using a pre-computed embedding

If you already have the embedding (e.g. from another pipeline stage), pass it to skip the embedder:
curl http://localhost:8080/v1/route \
  -H "Content-Type: application/json" \
  -d '{
    "embedding": [0.021, -0.114, ...384 floats...],
    "cost_weight": 0.5
  }'
usage.embedding_ms will be 0 when the embedding is supplied.

Errors

Statuserror.codeMeaning
400invalid_requestNeither prompt nor embedding provided, or malformed body.
503router_not_readyRouter weights still loading. Check /health.