load_router instead.
## Constructor parameters
### model_list (required)

A list of deployment configs. Each entry maps a logical alias (`model_name`)
to a concrete provider/model:
| Field | Type | Description |
|---|---|---|
| `model_name` | str | The alias your app uses (e.g. `"smart"`). |
| `model` | str | The concrete `"provider/model"` to call. |
| `api_key` | str? | Override the provider key for this deployment. |
| `api_base` | str? | Override the provider base URL. |
| `weight` | float | Used by the weighted-random strategy. Default 1.0. |
### fallbacks

A list of `{alias: [fallback_models]}` maps. Fallbacks are tried only after all
`model_list` deployments for the alias fail. They are fully-qualified
`"provider/model"` strings (not aliases).
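A minimal sketch of the two config shapes described above (the provider/model names are illustrative placeholders, not recommendations):

```python
# Two deployments share the "smart" alias; one serves "cheap".
model_list = [
    {"model_name": "smart", "model": "openai/gpt-4o", "weight": 2.0},
    {"model_name": "smart", "model": "anthropic/claude-3-5-sonnet", "weight": 1.0},
    {"model_name": "cheap", "model": "openai/gpt-4o-mini"},
]

# Fallbacks are fully-qualified "provider/model" strings, not aliases.
fallbacks = [
    {"smart": ["mistral/mistral-large-latest"]},
]

aliases = sorted({d["model_name"] for d in model_list})
# aliases == ["cheap", "smart"]
```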
### strategy
How to order the deployments within one alias on each call:
| Strategy | Behavior |
|---|---|
"round-robin" (default) | Cycle through deployments in order. |
"least-cost" | Pick the cheapest deployment first. |
"lowest-latency" | Pick the one with the best recent latency. |
"weighted-random" | Random pick weighted by weight field. |
### num_retries
Retries per deployment before moving to the next. Default 2.
### timeout
Per-request timeout in seconds. Default 120.
## Methods
### `.completion(model, messages, **kwargs)` → `ModelResponse`

Sync completion, same shape as `ot.completion`. The `model` argument is
the alias (e.g. `"smart"`), not a provider/model string. `**kwargs` is
passed through to the underlying `ot.completion` call.
### `.acompletion(model, messages, **kwargs)` → `ModelResponse`

Async version of `.completion`: same API, but returns a coroutine.
## Full example
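A minimal sketch combining the constructor parameters and methods above (the `ot` import, model names, and exact keyword set are illustrative assumptions rather than confirmed API):

```python
import ot  # assumed package name

router = ot.Router(
    model_list=[
        {"model_name": "smart", "model": "openai/gpt-4o"},
        {"model_name": "smart", "model": "anthropic/claude-3-5-sonnet"},
        {"model_name": "cheap", "model": "openai/gpt-4o-mini"},
    ],
    fallbacks=[{"smart": ["mistral/mistral-large-latest"]}],
    strategy="round-robin",
    num_retries=2,
    timeout=60,
)

resp = router.completion(
    model="smart",  # the alias, not a provider/model string
    messages=[{"role": "user", "content": "Summarize this design doc."}],
)
```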
## How failure handling works
For a call to alias `"smart"` with `num_retries=2`:

- Order the `"smart"` deployments per `strategy` → `[D1, D2]`.
- Try `D1` up to `1 + num_retries = 3` times, with 300 ms of backoff between attempts.
- If all attempts on `D1` fail, move to `D2` and try 3 times.
- If both deployments are exhausted, try each entry in `fallbacks["smart"]` exactly once.
- If everything fails, raise the last captured exception.
The router keeps per-deployment counters (`dep.requests`, `dep.errors`,
`dep.total_latency_ms`), which is how the lowest-latency and least-cost
strategies get their data.
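One way such counters could be kept (a sketch; the field names follow the `dep.*` attributes above, everything else is an assumption):

```python
from dataclasses import dataclass

@dataclass
class DeploymentStats:
    """Per-deployment counters matching the dep.* fields."""
    requests: int = 0
    errors: int = 0
    total_latency_ms: float = 0.0

    def record(self, latency_ms: float, ok: bool = True) -> None:
        self.requests += 1
        self.total_latency_ms += latency_ms
        if not ok:
            self.errors += 1

    @property
    def avg_latency_ms(self) -> float:
        # "lowest-latency" can order deployments by this value.
        return self.total_latency_ms / self.requests if self.requests else 0.0
```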
## When to use Router vs load_router
| Use `ot.Router` | Use `ot.load_router` |
|---|---|
| You want explicit rules | You want the model picked per-prompt |
| You care about availability (fallbacks) | You care about cost-quality tradeoff |
| You’re doing A/B across known models | You have traffic but no routing rules |
| Fast to configure, understood by ops | Pre-trained — no config needed |
Router aliases can point at models that in turn go through
the semantic router via "auto", so you can layer rule-based policy on
top of learned routing.
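For instance (the model names and the exact `"auto"` wiring are illustrative assumptions), an alias could blend learned routing with a fixed cheap model:

```python
model_list = [
    # Delegate most traffic to the semantic router...
    {"model_name": "default", "model": "auto", "weight": 3.0},
    # ...but keep a fixed cheap model in the mix.
    {"model_name": "default", "model": "openai/gpt-4o-mini", "weight": 1.0},
]
```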
