ot.load_router(
    weights_path: Optional[Path] = None,
    weights_name: str = "default",
    embedding_model: str = "all-MiniLM-L6-v2",
    cost_weight: float = 0.0,
    use_soft_assignment: bool = True,
    allowed_models: Optional[list[str]] = None,
    download_if_missing: bool = True,
    verbose: bool = True,
    engine: str = "go",
) -> UniRouteRouter
Returns a router that picks a model per prompt based on learned per-cluster error profiles and a cost weight. See Auto-routing for the conceptual model.

Parameters

weights_path (Path?): Directory containing clusters/, profiles/, and manifest.json. If None, downloads from the Hub.
weights_name (str): Named weights package to fetch. Default "default" (alias for weights-mmlu-v1). See the Hub for others.
embedding_model (str): SentenceTransformer model name. Only used by engine="python"; the Go backend uses its bundled ONNX MiniLM.
cost_weight (float): λ in the decision rule score = expected_error + λ · cost. Range [0, ∞). 0.0 = quality-first; 0.5 = balanced; 1.0+ = cheap-first.
use_soft_assignment (bool): If True, compute a probability distribution over all 100 clusters instead of hard-assigning to one. Slightly slower but more robust at cluster boundaries.
allowed_models (list[str]?): Restrict the candidate set, e.g. ["gpt-4o-mini", "gpt-4o"].
download_if_missing (bool): Download the weights package on first run. Default True.
verbose (bool): Print progress and health info on load.
engine (str): "go" (default, the production path; bundled binary), "python" (pure Python, no subprocess), or "auto" (prefer Go, silent fallback).
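The cost_weight decision rule can be sketched in plain Python. The model names, error rates, and costs below are illustrative only, not values from any weights package:

```python
def pick_model(expected_error, cost_per_1k, lam):
    """Choose the model minimizing expected_error + lam * cost.

    expected_error: dict model -> Psi[model, cluster] for the prompt's cluster
    cost_per_1k:    dict model -> relative cost
    lam:            the cost_weight passed to load_router
    """
    scores = {m: expected_error[m] + lam * cost_per_1k[m] for m in expected_error}
    return min(scores, key=scores.get), scores

# Illustrative numbers: a strong-but-pricey model vs a cheap one.
err  = {"gpt-4o": 0.05, "gpt-4o-mini": 0.15}
cost = {"gpt-4o": 1.0,  "gpt-4o-mini": 0.1}

pick_model(err, cost, lam=0.0)[0]   # -> "gpt-4o"      (quality-first)
pick_model(err, cost, lam=0.5)[0]   # -> "gpt-4o-mini" (cost dominates)
```

With λ = 0.5 the cheap model wins (0.15 + 0.05 = 0.20 beats 0.05 + 0.50 = 0.55), which is exactly the "balanced" behavior the table describes.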

Returns — UniRouteRouter

Attributes

router.registry                # LLMRegistry — get_model_ids(), get(id) → LLMProfile
router.cluster_assigner        # KMeansClusterAssigner — .num_clusters, .assign(vec)
router.embedder                # PromptEmbedder — .embed(text), .dimension (=384)
router.cost_weight             # the λ you passed
router.allowed_models          # list[str] or None
router.stats                   # dict of per-model routing counts / latencies

Methods

.route(prompt, available_models=None, cost_weight_override=None) → RoutingDecision

Routes a single prompt.
d = router.route("Write a Python function that reverses a linked list.")
d.selected_model          # str   — the chosen model id
d.cluster_id              # int   — 0..99
d.expected_error          # float — Ψ[model, cluster]
d.cost_adjusted_score     # float — error + λ·cost
d.all_scores              # dict[str, float] — scores for every candidate
d.cluster_probabilities   # np.ndarray(100,) — soft distribution if enabled
d.reasoning               # str   — human-readable explanation
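When use_soft_assignment is on, cluster_probabilities is a distribution over all 100 clusters. The library's exact formula isn't documented here; a common construction, assumed in this sketch, is a softmax over negative squared distances to the centroids:

```python
import numpy as np

def soft_assign(vec, centroids, temperature=1.0):
    """Soft cluster assignment: a probability over all centroids.

    Assumed construction (not necessarily opentracy's exact formula):
    softmax over negative squared Euclidean distances.
    """
    d2 = ((centroids - vec) ** 2).sum(axis=1)   # squared distance to each centroid
    logits = -d2 / temperature
    logits -= logits.max()                      # numerical stability
    p = np.exp(logits)
    return p / p.sum()

rng = np.random.default_rng(0)
centroids = rng.normal(size=(100, 384))         # 100 clusters, 384-dim (MiniLM)
vec = centroids[42] + 0.01 * rng.normal(size=384)  # a point near cluster 42
probs = soft_assign(vec, centroids)
probs.argmax()                                  # -> 42
```

Near a centroid the mass concentrates on one cluster; at a boundary it spreads across neighbors, which is what makes soft assignment more robust there.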

.route_batch(prompts: list[str]) → list[RoutingDecision]

Batched routing. Embeds everything in one call for throughput.
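The throughput win comes from embedding every prompt in one call and then scoring them with a single vectorized distance computation. A minimal sketch of the batched assignment step, with toy dimensions (the real embedder is 384-dim):

```python
import numpy as np

def assign_batch(embeddings, centroids):
    """Hard-assign each embedded prompt to its nearest centroid.

    embeddings: (n_prompts, dim), centroids: (n_clusters, dim).
    One (n_prompts, n_clusters) distance matrix replaces n_prompts
    separate nearest-centroid searches.
    """
    d2 = ((embeddings[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)

centroids = np.eye(3)                      # 3 toy clusters in 3-D
batch = np.array([[0.9, 0.1, 0.0],
                  [0.0, 1.1, 0.0],
                  [0.1, 0.0, 0.8]])
assign_batch(batch, centroids)             # -> array([0, 1, 2])
```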

.route_and_execute(prompt, messages, **kwargs) → ModelResponse

A convenience wrapper that routes, then immediately calls ot.completion on the chosen model, so you don't have to write the pair yourself.

.get_best_model_for_cluster(cluster_id) → str

Given a cluster id, returns the model that minimizes expected_error + λ·cost. Useful for analytics (“what would I route to on cluster 42?”).
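The lookup is an argmin over one column of Ψ plus the cost term. A sketch with a 2-model, 2-cluster toy Ψ (all numbers illustrative; the real matrix has 100 columns):

```python
import numpy as np

# Illustrative Psi matrix: rows = models, columns = clusters.
# Entries are expected error rates.
models = ["gpt-4o-mini", "gpt-4o"]
psi = np.array([
    [0.30, 0.10],   # gpt-4o-mini: weak on cluster 0, strong on cluster 1
    [0.10, 0.08],   # gpt-4o: strong on both, but pricier
])
cost = np.array([0.1, 1.0])    # relative costs (illustrative)
lam = 0.1                      # the router's cost_weight

def best_for_cluster(c):
    """Model minimizing psi[:, c] + lam * cost for cluster c."""
    scores = psi[:, c] + lam * cost
    return models[int(scores.argmin())]

best_for_cluster(0)   # -> "gpt-4o"       (cheap model too error-prone here)
best_for_cluster(1)   # -> "gpt-4o-mini"  (both accurate, so cost decides)
```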

.analyze_routing_distribution(prompts) → dict

Returns a histogram: {model_id: count} over a batch of prompts.
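The returned dict is equivalent to counting selected_model over a batch of decisions. A minimal sketch (routing_histogram is a hypothetical stand-in, not a library function):

```python
from collections import Counter

def routing_histogram(selected):
    """{model_id: count} over a batch, matching the method's return shape.

    'selected' would be [d.selected_model for d in router.route_batch(prompts)].
    """
    return dict(Counter(selected))

routing_histogram(["gpt-4o-mini", "gpt-4o", "gpt-4o-mini"])
# -> {'gpt-4o-mini': 2, 'gpt-4o': 1}
```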

.reset_stats()

Zero out the routing counters.

Examples

Simple load and route

import opentracy as ot

router = ot.load_router(cost_weight=0.5)

for p in ["What is 2+2?", "Prove √2 is irrational.", "Write a haiku."]:
    d = router.route(p)
    print(d.selected_model, "→", p)

Restrict candidates (cost ceiling)

router = ot.load_router(
    cost_weight=0.5,
    allowed_models=["ministral-3b-latest", "gpt-4o-mini", "gpt-4o"],
)

Override λ per-call

# Normal mode: balanced
d = router.route(prompt)

# Cost-sensitive burst: force cheap
d_cheap = router.route(prompt, cost_weight_override=2.0)

Python backend (introspection)

# Useful when you want to inspect profiles, centroids, etc.
router = ot.load_router(engine="python")
for mid in router.registry.get_model_ids():
    p = router.registry.get(mid)
    print(mid, p.cost_per_1k_tokens, p.psi_vector[:5])

Failure modes

FileNotFoundError: opentracy-engine binary not bundled
    Cause: running on a platform without a published wheel (e.g. macOS Intel).
    Fix: fall back to engine="python" or file an issue for the missing platform.
ImportError: sentence-transformers package required
    Cause: engine="python" without the [research] extra.
    Fix: pip install opentracy[research].
ValueError: Unknown package 'weights-default'
    Cause: hub/index.json missing (old wheel).
    Fix: upgrade with pip install -U opentracy.
Network error on first run
    Cause: the weights download couldn't reach HuggingFace.
    Fix: pre-download with ot.download("weights-default").

Engine backends — why the default matters

engine="go" is the default because the Go backend:
  • Is 5–10× faster per routing decision (sub-millisecond vs a few ms).
  • Runs in a subprocess so Python GIL contention doesn’t slow routing.
  • Uses the same ONNX runtime across all platforms — deterministic behavior.
The Python backend exists for:
  • Research / inspection (you can swap the cluster assigner, monkey-patch profiles, etc.).
  • Environments that forbid process-spawn (some sandboxes, Lambda, etc.).
Avoid engine="auto" in production. It silently falls back to Python if the Go binary is missing, which usually means a misconfigured install rather than an intentional choice. Explicit "go" fails loudly with a clear message.