Parameters
| Name | Type | Description |
|---|---|---|
| weights_path | Path? | Directory containing clusters/, profiles/, and manifest.json. If None, downloads from the Hub. |
| weights_name | str | Named weights package to fetch. Default "default" (alias for weights-mmlu-v1). See the Hub for others. |
| embedding_model | str | SentenceTransformer model name. Only used by engine="python"; the Go backend uses its bundled ONNX MiniLM. |
| cost_weight | float | λ in the decision rule score = expected_error + λ · cost. Range [0, ∞). 0.0 = quality-first; 0.5 = balanced; 1.0+ = cheap-first. |
| use_soft_assignment | bool | If True, compute a probability distribution over all 100 clusters instead of hard-assigning to one. Slightly slower but more robust at cluster boundaries. |
| allowed_models | list[str]? | Restrict candidates. E.g. ["gpt-4o-mini", "gpt-4o"]. |
| download_if_missing | bool | Download the weights package on first run. Default True. |
| verbose | bool | Print progress / health info on load. |
| engine | str | "go" (default, production path, bundled binary), "python" (pure Python, no subprocess), or "auto" (prefer Go, silent fallback). |
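The cost_weight rule above can be sketched in plain Python. The model names and per-model numbers below are made up for illustration; the real expected-error and cost values come from the weights package.

```python
# Decision rule sketch: score = expected_error + cost_weight * cost.
# Candidate numbers are illustrative, not the shipped profiles.
candidates = {
    # model_id: (expected_error on this cluster, relative cost)
    "gpt-4o":      (0.08, 1.00),
    "gpt-4o-mini": (0.15, 0.10),
}

def pick(cost_weight: float) -> str:
    """Return the model minimizing expected_error + cost_weight * cost."""
    return min(
        candidates,
        key=lambda m: candidates[m][0] + cost_weight * candidates[m][1],
    )

print(pick(0.0))  # quality-first: lowest expected_error wins -> "gpt-4o"
print(pick(1.0))  # cheap-first: cost dominates -> "gpt-4o-mini"
```

With these numbers the break-even is around λ ≈ 0.08, so even the "balanced" 0.5 already prefers the cheaper model on this hypothetical cluster.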
Returns — UniRouteRouter
Attributes
Methods
.route(prompt, available_models=None, cost_weight_override=None) → RoutingDecision
Routes a single prompt.
.route_batch(prompts: list[str]) → list[RoutingDecision]
Batched routing. Embeds everything in one call for throughput.
.route_and_execute(prompt, messages, **kwargs) → ModelResponse
Convenience that routes, then immediately calls ot.completion on the
chosen model. Shorter than writing the pair yourself.
.get_best_model_for_cluster(cluster_id) → str
Given a cluster id, returns the model that minimizes expected_error + λ·cost. Useful for analytics (“what would I route to on cluster 42?”).
.analyze_routing_distribution(prompts) → dict
Returns a histogram: {model_id: count} over a batch of prompts.
.reset_stats()
Zero out the routing counters.
Examples
Simple load and route
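A minimal sketch. The import alias ot matches the ot.completion reference above, but the UniRouteRouter constructor being exposed at the top level, and the model_id attribute on RoutingDecision, are assumptions:

```python
import opentracy as ot  # alias per ot.completion above

# Loads the default weights package, downloading on first run.
router = ot.UniRouteRouter(cost_weight=0.5, verbose=True)

decision = router.route("Summarize this contract in three bullet points.")
print(decision.model_id)  # assumed attribute name on RoutingDecision
```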
Restrict candidates (cost ceiling)
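Two ways to cap cost, sketched under the same assumptions as above: pin the candidate set at load time with allowed_models, or narrow a single call with route()'s available_models parameter.

```python
import opentracy as ot

# Load-time ceiling: only these models are ever considered.
router = ot.UniRouteRouter(allowed_models=["gpt-4o-mini", "gpt-4o"])

# Per-call narrowing: this request may only use the mini model.
decision = router.route(
    "Translate to French: good morning",
    available_models=["gpt-4o-mini"],
)
```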
Override λ per-call
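cost_weight_override changes λ for a single call without touching the router's default, per the .route() signature above (constructor name assumed as in the earlier examples):

```python
import opentracy as ot

router = ot.UniRouteRouter(cost_weight=0.5)  # default lambda

prompt = "Write a haiku about routing."
cheap = router.route(prompt, cost_weight_override=2.0)  # cheap-first
best = router.route(prompt, cost_weight_override=0.0)   # quality-first
```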
Python backend (introspection)
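The pure-Python path runs in-process, which makes the documented analytics helpers convenient for poking at the routing tables. A sketch, assuming the same top-level constructor; cluster id 42 is just an example:

```python
import opentracy as ot

# No subprocess; requires the [research] extra (sentence-transformers).
router = ot.UniRouteRouter(engine="python", use_soft_assignment=True)

# Best model for one cluster under the current lambda.
print(router.get_best_model_for_cluster(42))

# Histogram of {model_id: count} over a batch of prompts.
print(router.analyze_routing_distribution(["hi", "prove Fermat's little theorem"]))
```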
Failure modes
| Error | Cause | Fix |
|---|---|---|
| FileNotFoundError: opentracy-engine binary not bundled | Running on a platform without a published wheel (e.g. macOS Intel). | Fall back to engine="python" or file an issue for the missing platform. |
| ImportError: sentence-transformers package required | engine="python" without the [research] extra. | pip install opentracy[research]. |
| ValueError: Unknown package 'weights-default' | hub/index.json missing (old wheel). | Upgrade: pip install -U opentracy. |
| Network error on first run | Weights download couldn’t reach HuggingFace. | Pre-download: ot.download("weights-default"). |
Engine backends — why the default matters
engine="go" is the default because the Go backend:
- Is 5–10× faster per routing decision (sub-millisecond vs a few ms).
- Runs in a subprocess so Python GIL contention doesn’t slow routing.
- Uses the same ONNX runtime across all platforms — deterministic behavior.
Use engine="python" instead for:
- Research / inspection (you can swap the cluster assigner, monkey-patch profiles, etc.).
- Environments that forbid process-spawn (some sandboxes, Lambda, etc.).
Avoid engine="auto" in production. It silently falls back to Python
if the Go binary is missing, which usually means a misconfigured install
rather than an intentional choice. Explicit "go" fails loudly with a
clear message.
