ot.load_router(
    weights_path: Optional[Path] = None,
    weights_name: str = "default",
    embedding_model: str = "all-MiniLM-L6-v2",
    cost_weight: float = 0.0,
    use_soft_assignment: bool = True,
    allowed_models: Optional[list[str]] = None,
    download_if_missing: bool = True,
    verbose: bool = True,
    engine: str = "go",
) -> UniRouteRouter
Returns a router that picks a model per prompt based on learned per-cluster error profiles and a cost weight. See Auto-routing for the conceptual model.

Parameters

weights_path (Path?): Directory containing clusters/, profiles/, and manifest.json. If None, downloads from the Hub.
weights_name (str): Named weights package to fetch. Default "default" (alias for weights-mmlu-v1). See the Hub for others.
embedding_model (str): SentenceTransformer model name. Only used by engine="python"; the Go backend uses its bundled ONNX MiniLM.
cost_weight (float): λ in the decision rule score = expected_error + λ · cost. Range [0, ∞). 0.0 = quality-first; 0.5 = balanced; 1.0+ = cheap-first.
use_soft_assignment (bool): If True, compute a probability distribution over all 100 clusters instead of hard-assigning to one. Slightly slower but more robust at cluster boundaries.
allowed_models (list[str]?): Restrict the candidate set, e.g. ["gpt-4o-mini", "gpt-4o"].
download_if_missing (bool): Download the weights package on first run. Default True.
verbose (bool): Print progress and health info on load.
engine (str): "go" (default, the production path; bundled binary), "python" (pure Python, no subprocess), or "auto" (prefer Go, silent fallback).
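The cost_weight decision rule can be sketched in plain Python. The model names, error rates, and costs below are illustrative only, not values from any weights package:

```python
def pick_model(expected_error, cost_per_1k, lam):
    """Choose the model minimizing expected_error + lam * cost.

    expected_error: dict model -> Psi[model, cluster] for the prompt's cluster
    cost_per_1k:    dict model -> relative cost
    lam:            the cost_weight passed to load_router
    """
    scores = {m: expected_error[m] + lam * cost_per_1k[m] for m in expected_error}
    return min(scores, key=scores.get), scores

# Illustrative numbers: a strong-but-pricey model vs a cheap one.
err  = {"gpt-4o": 0.05, "gpt-4o-mini": 0.15}
cost = {"gpt-4o": 1.0,  "gpt-4o-mini": 0.1}

pick_model(err, cost, lam=0.0)[0]   # -> "gpt-4o"      (quality-first)
pick_model(err, cost, lam=0.5)[0]   # -> "gpt-4o-mini" (cost dominates)
```

With λ = 0.5 the cheap model wins (0.15 + 0.05 = 0.20 beats 0.05 + 0.50 = 0.55), which is exactly the "balanced" behavior the table describes.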

Returns — UniRouteRouter

Attributes

router.registry                # LLMRegistry — get_model_ids(), get(id) → LLMProfile
router.cluster_assigner        # KMeansClusterAssigner — .num_clusters, .assign(vec)
router.embedder                # PromptEmbedder — .embed(text), .dimension (=384)
router.cost_weight             # the λ you passed
router.allowed_models          # list[str] or None
router.stats                   # dict of per-model routing counts / latencies

Methods

.route(prompt, available_models=None, cost_weight_override=None) → RoutingDecision

Routes a single prompt.
d = router.route("Write a Python function that reverses a linked list.")
d.selected_model          # str   — the chosen model id
d.cluster_id              # int   — 0..99
d.expected_error          # float — Ψ[model, cluster]
d.cost_adjusted_score     # float — error + λ·cost
d.all_scores              # dict[str, float] — scores for every candidate
d.cluster_probabilities   # np.ndarray(100,) — soft distribution if enabled
d.reasoning               # str   — human-readable explanation
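When use_soft_assignment is on, cluster_probabilities is a distribution over all 100 clusters. The library's exact formula isn't documented here; a common construction, assumed in this sketch, is a softmax over negative squared distances to the centroids:

```python
import numpy as np

def soft_assign(vec, centroids, temperature=1.0):
    """Soft cluster assignment: a probability over all centroids.

    Assumed construction (not necessarily opentracy's exact formula):
    softmax over negative squared Euclidean distances.
    """
    d2 = ((centroids - vec) ** 2).sum(axis=1)   # squared distance to each centroid
    logits = -d2 / temperature
    logits -= logits.max()                      # numerical stability
    p = np.exp(logits)
    return p / p.sum()

rng = np.random.default_rng(0)
centroids = rng.normal(size=(100, 384))         # 100 clusters, 384-dim (MiniLM)
vec = centroids[42] + 0.01 * rng.normal(size=384)  # a point near cluster 42
probs = soft_assign(vec, centroids)
probs.argmax()                                  # -> 42
```

Near a centroid the mass concentrates on one cluster; at a boundary it spreads across neighbors, which is what makes soft assignment more robust there.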

.route_batch(prompts: list[str]) → list[RoutingDecision]

Batched routing. Embeds everything in one call for throughput.
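The throughput win comes from embedding every prompt in one call and then scoring them with a single vectorized distance computation. A minimal sketch of the batched assignment step, with toy dimensions (the real embedder is 384-dim):

```python
import numpy as np

def assign_batch(embeddings, centroids):
    """Hard-assign each embedded prompt to its nearest centroid.

    embeddings: (n_prompts, dim), centroids: (n_clusters, dim).
    One (n_prompts, n_clusters) distance matrix replaces n_prompts
    separate nearest-centroid searches.
    """
    d2 = ((embeddings[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)

centroids = np.eye(3)                      # 3 toy clusters in 3-D
batch = np.array([[0.9, 0.1, 0.0],
                  [0.0, 1.1, 0.0],
                  [0.1, 0.0, 0.8]])
assign_batch(batch, centroids)             # -> array([0, 1, 2])
```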

.route_and_execute(prompt, messages, **kwargs) → ModelResponse

A convenience wrapper that routes, then immediately calls ot.completion on the chosen model, so you don't have to write the pair yourself.

.get_best_model_for_cluster(cluster_id) → str

Given a cluster id, returns the model that minimizes expected_error + λ·cost. Useful for analytics (“what would I route to on cluster 42?”).
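The lookup is an argmin over one column of Ψ plus the cost term. A sketch with a 2-model, 2-cluster toy Ψ (all numbers illustrative; the real matrix has 100 columns):

```python
import numpy as np

# Illustrative Psi matrix: rows = models, columns = clusters.
# Entries are expected error rates.
models = ["gpt-4o-mini", "gpt-4o"]
psi = np.array([
    [0.30, 0.10],   # gpt-4o-mini: weak on cluster 0, strong on cluster 1
    [0.10, 0.08],   # gpt-4o: strong on both, but pricier
])
cost = np.array([0.1, 1.0])    # relative costs (illustrative)
lam = 0.1                      # the router's cost_weight

def best_for_cluster(c):
    """Model minimizing psi[:, c] + lam * cost for cluster c."""
    scores = psi[:, c] + lam * cost
    return models[int(scores.argmin())]

best_for_cluster(0)   # -> "gpt-4o"       (cheap model too error-prone here)
best_for_cluster(1)   # -> "gpt-4o-mini"  (both accurate, so cost decides)
```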

.analyze_routing_distribution(prompts) → dict

Returns a histogram: {model_id: count} over a batch of prompts.
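The returned dict is equivalent to counting selected_model over a batch of decisions. A minimal sketch (routing_histogram is a hypothetical stand-in, not a library function):

```python
from collections import Counter

def routing_histogram(selected):
    """{model_id: count} over a batch, matching the method's return shape.

    'selected' would be [d.selected_model for d in router.route_batch(prompts)].
    """
    return dict(Counter(selected))

routing_histogram(["gpt-4o-mini", "gpt-4o", "gpt-4o-mini"])
# -> {'gpt-4o-mini': 2, 'gpt-4o': 1}
```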

.reset_stats()

Zero out the routing counters.

Examples

Simple load and route

import opentracy as ot

router = ot.load_router(cost_weight=0.5)

for p in ["What is 2+2?", "Prove √2 is irrational.", "Write a haiku."]:
    d = router.route(p)
    print(d.selected_model, "→", p)

Restrict candidates (cost ceiling)

router = ot.load_router(
    cost_weight=0.5,
    allowed_models=["ministral-3b-latest", "gpt-4o-mini", "gpt-4o"],
)

Override λ per-call

# Normal mode: balanced
d = router.route(prompt)

# Cost-sensitive burst: force cheap
d_cheap = router.route(prompt, cost_weight_override=2.0)

Python backend (introspection)

# Useful when you want to inspect profiles, centroids, etc.
router = ot.load_router(engine="python")
for mid in router.registry.get_model_ids():
    p = router.registry.get(mid)
    print(mid, p.cost_per_1k_tokens, p.psi_vector[:5])

Failure modes

FileNotFoundError: opentracy-engine binary not bundled
    Cause: running on a platform without a published wheel (e.g. macOS Intel).
    Fix: fall back to engine="python" or file an issue for the missing platform.
ImportError: sentence-transformers package required
    Cause: engine="python" without the [research] extra.
    Fix: pip install opentracy[research].
ValueError: Unknown package 'weights-default'
    Cause: hub/index.json missing (old wheel).
    Fix: upgrade with pip install -U opentracy.
Network error on first run
    Cause: the weights download couldn't reach HuggingFace.
    Fix: pre-download with ot.download("weights-default").

Engine backends — why the default matters

engine="go" is the default because the Go backend:
  • Is 5–10× faster per routing decision (sub-millisecond vs a few ms).
  • Runs in a subprocess so Python GIL contention doesn’t slow routing.
  • Uses the same ONNX runtime across all platforms — deterministic behavior.
The Python backend exists for:
  • Research / inspection (you can swap the cluster assigner, monkey-patch profiles, etc.).
  • Environments that forbid process-spawn (some sandboxes, Lambda, etc.).
Avoid engine="auto" in production. It silently falls back to Python if the Go binary is missing, which usually means a misconfigured install rather than an intentional choice. Explicit "go" fails loudly with a clear message.