class ot.Router(
    model_list: list[dict],
    fallbacks: Optional[list[dict]] = None,
    strategy: str = "round-robin",
    num_retries: int = 2,
    timeout: float = 120.0,
)
Use this when you want explicit, deterministic routing — "try GPT-4o first, then Claude, then DeepSeek if both fail". For semantic auto-routing (the router picks a model per prompt based on cluster and error profile), use load_router instead.

Constructor parameters

model_list (required)

A list of deployment configs. Each entry maps a logical alias (model_name) to a concrete provider/model:
model_list = [
    {"model_name": "smart", "model": "openai/gpt-4o"},
    {"model_name": "smart", "model": "anthropic/claude-sonnet-4-6"},  # redundancy
    {"model_name": "fast",  "model": "groq/llama-3.1-8b-instant"},
    {"model_name": "cheap", "model": "deepseek/deepseek-chat"},
]
Optional per-entry fields:
| Field | Type | Description |
| --- | --- | --- |
| `model_name` | str | The alias your app uses (e.g. `"smart"`). |
| `model` | str | The concrete `"provider/model"` to call. |
| `api_key` | str? | Override provider key for this deployment. |
| `api_base` | str? | Override provider base URL. |
| `weight` | float | For the weighted-random strategy. Default `1.0`. |
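As a sketch of how these fields combine, here is a hypothetical deployment list using the optional per-entry overrides. The field names are the ones from the table above; the key, URL, and weight values are placeholders, not real credentials or endpoints:

```python
# Hypothetical deployment list illustrating the optional per-entry fields.
model_list = [
    {
        "model_name": "smart",
        "model": "openai/gpt-4o",
        "api_key": "sk-project-override",  # placeholder: overrides the provider key for this entry only
        "weight": 3.0,                     # favored 3:1 under the weighted-random strategy
    },
    {
        "model_name": "smart",
        "model": "anthropic/claude-sonnet-4-6",
        "api_base": "https://proxy.example.internal/anthropic",  # placeholder: route via a proxy
        "weight": 1.0,
    },
]
```
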

fallbacks

A list of {alias: [fallback_models]} maps:
fallbacks = [
    {"smart": ["deepseek/deepseek-chat", "mistral/mistral-large-latest"]},
    {"fast":  ["anthropic/claude-3-haiku-20240307"]},
]
Fallbacks are tried only after every model_list deployment for the alias has failed. They are fully qualified "provider/model" strings (not aliases).

strategy

How to order the deployments within one alias on each call:
| Strategy | Behavior |
| --- | --- |
| `"round-robin"` (default) | Cycle through deployments in order. |
| `"least-cost"` | Pick the cheapest deployment first. |
| `"lowest-latency"` | Pick the one with the best recent latency. |
| `"weighted-random"` | Random pick weighted by the `weight` field. |
Strategy only changes which deployment is tried first; on failure the router falls through to the others.
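To make the "orders, doesn't filter" point concrete, here is a minimal sketch of how a weighted-random ordering could work — repeated weighted draws without replacement, so every deployment still appears in the fall-through order. This is illustrative code, not the library's implementation:

```python
import random

def weighted_random_order(deployments, rng=None):
    """Order deployments by repeated weighted draws without replacement.

    Sketch of a "weighted-random" strategy: the first pick is weighted by
    each entry's `weight` (default 1.0); the remaining entries follow in
    drawn order, so a failure still falls through to every deployment.
    """
    rng = rng or random.Random()
    pool = list(deployments)
    ordered = []
    while pool:
        weights = [d.get("weight", 1.0) for d in pool]
        pick = rng.choices(pool, weights=weights, k=1)[0]
        pool.remove(pick)
        ordered.append(pick)
    return ordered

deps = [
    {"model": "openai/gpt-4o", "weight": 3.0},
    {"model": "anthropic/claude-sonnet-4-6"},  # no weight field, defaults to 1.0
]
order = weighted_random_order(deps, rng=random.Random(0))
```
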

num_retries

Retries per deployment before moving to the next. Default 2.

timeout

Per-request timeout in seconds. Default 120.

Methods

.completion(model, messages, **kwargs) → ModelResponse

Sync completion, same shape as ot.completion. The model argument is the alias (e.g. "smart"), not a provider/model string. **kwargs is passed through to the underlying ot.completion call.
resp = router.completion(
    model="smart",                                 # alias
    messages=[{"role": "user", "content": "..."}],
    temperature=0,
    max_tokens=200,
)
print(resp.choices[0].message.content)

.acompletion(model, messages, **kwargs) → ModelResponse

Async version of .completion. Same API, returns a coroutine.
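Because .acompletion has the same call shape as .completion, it composes directly with asyncio.gather for fan-out. The sketch below uses a stub router so it runs standalone; with a real ot.Router the calls would be identical, only the object differs:

```python
import asyncio

# Stub standing in for ot.Router so the concurrency pattern is runnable on
# its own; the real .acompletion takes the same (model, messages, **kwargs).
class StubRouter:
    async def acompletion(self, model, messages, **kwargs):
        await asyncio.sleep(0)  # yield control, as a real network call would
        return {"model": model, "echo": messages[-1]["content"]}

async def main():
    router = StubRouter()
    prompts = ["First question", "Second question"]
    # Fan out: issue all requests concurrently, gather responses in order.
    return await asyncio.gather(*(
        router.acompletion(
            model="smart",  # the alias, exactly as with .completion
            messages=[{"role": "user", "content": p}],
        )
        for p in prompts
    ))

responses = asyncio.run(main())
```
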

Full example

import opentracy as ot

router = ot.Router(
    model_list=[
        {"model_name": "smart", "model": "openai/gpt-4o"},
        {"model_name": "smart", "model": "anthropic/claude-sonnet-4-6"},
        {"model_name": "fast",  "model": "groq/llama-3.1-8b-instant"},
    ],
    fallbacks=[{"smart": ["deepseek/deepseek-chat"]}],
    strategy="least-cost",
    num_retries=2,
    timeout=60,
)

resp = router.completion(
    model="smart",
    messages=[{"role": "user", "content": "Explain Bayes' theorem."}],
)
print(resp.choices[0].message.content)

How failure handling works

For a call to alias "smart" with num_retries=2:
  1. Order the "smart" deployments according to the strategy: [D1, D2].
  2. Try D1 up to 1 + num_retries = 3 times, with a 300 ms backoff between attempts.
  3. If all attempts on D1 fail, move to D2, try 3 times.
  4. If both deployments are exhausted, try each entry in fallbacks["smart"] exactly once.
  5. If everything fails, raise the last captured exception.
Stats per deployment are updated on every attempt (dep.requests, dep.errors, dep.total_latency_ms), which is how the lowest-latency and least-cost strategies get their data.
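The five steps above can be sketched as a plain failover loop. The function and parameter names here are hypothetical (this is not the library's code), and `call` stands in for whatever actually issues the request:

```python
import time

def route_with_failover(deployments, fallbacks, call, num_retries=2, backoff_s=0.3):
    """Sketch of the failover sequence described above.

    `deployments` is the strategy-ordered list for the alias; `fallbacks`
    is a list of "provider/model" strings; `call` takes a model string and
    either returns a response or raises.
    """
    last_exc = None
    for dep in deployments:                      # step 1: already strategy-ordered
        for _attempt in range(1 + num_retries):  # step 2/3: 1 + num_retries tries each
            try:
                return call(dep["model"])
            except Exception as exc:
                last_exc = exc
                time.sleep(backoff_s)            # fixed backoff between attempts
    for model in fallbacks:                      # step 4: each fallback exactly once
        try:
            return call(model)
        except Exception as exc:
            last_exc = exc
    raise last_exc                               # step 5: surface the last error
```
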

When to use Router vs load_router

| Use `ot.Router` | Use `ot.load_router` |
| --- | --- |
| You want explicit rules | You want the model picked per prompt |
| You care about availability (fallbacks) | You care about the cost-quality tradeoff |
| You're doing A/B across known models | You have traffic but no routing rules |
| Fast to configure, understood by ops | Pre-trained, no config needed |
They compose: Router aliases can point at models that in turn go through the semantic router via "auto", so you can layer rule-based policy on top of learned routing.