load_router instead.
## Constructor parameters
### model_list (required)

A list of deployment configs. Each entry maps a logical alias (`model_name`)
to a concrete provider/model:
| Field | Type | Description |
|---|---|---|
| `model_name` | str | The alias your app uses (e.g. `"smart"`). |
| `model` | str | The concrete `"provider/model"` to call. |
| `api_key` | str? | Override the provider key for this deployment. |
| `api_base` | str? | Override the provider base URL. |
| `weight` | float | Used by the weighted-random strategy. Default 1.0. |
### fallbacks

A list of `{alias: [fallback_models]}` maps. Fallbacks are tried only after all
`model_list` deployments for the alias fail. They are fully-qualified
`"provider/model"` strings (not aliases).
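A minimal sketch of the two config shapes described above (the provider/model names are illustrative placeholders, not recommendations):

```python
# Two deployments share the "smart" alias; one serves "cheap".
model_list = [
    {"model_name": "smart", "model": "openai/gpt-4o", "weight": 2.0},
    {"model_name": "smart", "model": "anthropic/claude-3-5-sonnet", "weight": 1.0},
    {"model_name": "cheap", "model": "openai/gpt-4o-mini"},
]

# Fallbacks are fully-qualified "provider/model" strings, not aliases.
fallbacks = [
    {"smart": ["mistral/mistral-large-latest"]},
]

aliases = sorted({d["model_name"] for d in model_list})
# aliases == ["cheap", "smart"]
```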
### strategy
How to order the deployments within one alias on each call:
| Strategy | Behavior |
|---|---|
"round-robin" (default) | Cycle through deployments in order. |
"least-cost" | Pick the cheapest deployment first. |
"lowest-latency" | Pick the one with the best recent latency. |
"weighted-random" | Random pick weighted by weight field. |
### num_retries
Retries per deployment before moving to the next. Default 2.
### timeout
Per-request timeout in seconds. Default 120.
## Methods
### `.completion(model, messages, **kwargs)` → `ModelResponse`

Sync completion, same shape as `ot.completion`. The `model` argument is
the alias (e.g. `"smart"`), not a provider/model string. `**kwargs` is
passed through to the underlying `ot.completion` call.
### `.acompletion(model, messages, **kwargs)` → `ModelResponse`

Async version of `.completion`: same API, but returns a coroutine.
## Full example
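A minimal sketch combining the constructor parameters and methods above (the `ot` import, model names, and exact keyword set are illustrative assumptions rather than confirmed API):

```python
import ot  # assumed package name

router = ot.Router(
    model_list=[
        {"model_name": "smart", "model": "openai/gpt-4o"},
        {"model_name": "smart", "model": "anthropic/claude-3-5-sonnet"},
        {"model_name": "cheap", "model": "openai/gpt-4o-mini"},
    ],
    fallbacks=[{"smart": ["mistral/mistral-large-latest"]}],
    strategy="round-robin",
    num_retries=2,
    timeout=60,
)

resp = router.completion(
    model="smart",  # the alias, not a provider/model string
    messages=[{"role": "user", "content": "Summarize this design doc."}],
)
```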
## How failure handling works
For a call to alias `"smart"` with `num_retries=2`:

- Order the `"smart"` deployments per `strategy` → `[D1, D2]`.
- Try `D1` up to `1 + num_retries = 3` times, with 300 ms of backoff between attempts.
- If all attempts on `D1` fail, move to `D2` and try 3 times.
- If both deployments are exhausted, try each entry in `fallbacks["smart"]` exactly once.
- If everything fails, raise the last captured exception.
The router keeps per-deployment counters (`dep.requests`, `dep.errors`,
`dep.total_latency_ms`), which is how the lowest-latency and least-cost
strategies get their data.
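One way such counters could be kept (a sketch; the field names follow the `dep.*` attributes above, everything else is an assumption):

```python
from dataclasses import dataclass

@dataclass
class DeploymentStats:
    """Per-deployment counters matching the dep.* fields."""
    requests: int = 0
    errors: int = 0
    total_latency_ms: float = 0.0

    def record(self, latency_ms: float, ok: bool = True) -> None:
        self.requests += 1
        self.total_latency_ms += latency_ms
        if not ok:
            self.errors += 1

    @property
    def avg_latency_ms(self) -> float:
        # "lowest-latency" can order deployments by this value.
        return self.total_latency_ms / self.requests if self.requests else 0.0
```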
## When to use Router vs load_router
| Use `ot.Router` | Use `ot.load_router` |
|---|---|
| You want explicit rules | You want the model picked per-prompt |
| You care about availability (fallbacks) | You care about cost-quality tradeoff |
| You’re doing A/B across known models | You have traffic but no routing rules |
| Fast to configure, understood by ops | Pre-trained — no config needed |
Router aliases can point at models that in turn go through
the semantic router via "auto", so you can layer rule-based policy on
top of learned routing.
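For instance (the model names and the exact `"auto"` wiring are illustrative assumptions), an alias could blend learned routing with a fixed cheap model:

```python
model_list = [
    # Delegate most traffic to the semantic router...
    {"model_name": "default", "model": "auto", "weight": 3.0},
    # ...but keep a fixed cheap model in the mix.
    {"model_name": "default", "model": "openai/gpt-4o-mini", "weight": 1.0},
]
```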
