The Python SDK (opentracy) is the native entry point. Use it if you’re starting a new project or if you want features (auto-routing, distillation, trace ingestion) that aren’t part of the OpenAI API shape.

Install

pip install opentracy
One install pulls a platform-specific wheel with the Go engine binary, the ONNX embedder, and pre-trained routing weights bundled in. No extras needed for the core path.
pip install "opentracy[distill]"    # adds training deps (torch, unsloth, peft, trl)
pip install "opentracy[research]"   # adds sentence-transformers for the Python router backend
pip install "opentracy[server]"     # adds FastAPI + ClickHouse for self-hosting
pip install "opentracy[anthropic]"  # native Anthropic SDK path
pip install "opentracy[all]"        # everything

The four things you’ll do

1. One-off completion

Just a chat completion, no routing, no trace.
import opentracy as ot

resp = ot.completion(
    model="openai/gpt-4o-mini",
    messages=[{"role": "user", "content": "What's 2+2?"}],
    temperature=0,
)
print(resp.choices[0].message.content)
Full API: completion reference.

2. Explicit router with fallbacks

When you want deterministic rules (“try GPT-4o first, then Claude, then DeepSeek”), use the Router class:
router = ot.Router(
    model_list=[
        {"model_name": "smart", "model": "openai/gpt-4o"},
        {"model_name": "smart", "model": "anthropic/claude-sonnet-4-6"},
    ],
    fallbacks=[{"smart": ["deepseek/deepseek-chat"]}],
    strategy="round-robin",   # or "least-cost", "lowest-latency", "weighted-random"
    num_retries=2,
    timeout=60,
)

resp = router.completion(
    model="smart",   # logical alias, resolved to one of the deployments
    messages=[{"role": "user", "content": "..."}],
)
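The alias resolution above can be pictured as round-robin over same-alias deployments, with fallbacks appended in order. A minimal standalone sketch (not the SDK's actual implementation — names and structure here are illustrative):

```python
import itertools

# Deployments sharing the "smart" alias, plus an ordered fallback list,
# mirroring the Router config above.
deployments = {"smart": ["openai/gpt-4o", "anthropic/claude-sonnet-4-6"]}
fallbacks = {"smart": ["deepseek/deepseek-chat"]}

# One round-robin cursor per alias.
_cursors = {alias: itertools.cycle(models) for alias, models in deployments.items()}

def candidates(alias):
    """Primary pick by round-robin, then fallbacks in declared order."""
    return [next(_cursors[alias])] + fallbacks.get(alias, [])

print(candidates("smart"))  # ['openai/gpt-4o', 'deepseek/deepseek-chat']
print(candidates("smart"))  # ['anthropic/claude-sonnet-4-6', 'deepseek/deepseek-chat']
```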
Full API: Router reference.

3. Semantic auto-router

Load the pre-trained router once; it picks the right model per prompt:
auto = ot.load_router(cost_weight=0.5)

decision = auto.route("Write a haiku about autumn")
print(decision.selected_model)      # e.g. "ministral-3b-latest"
print(decision.cluster_id)          # e.g. 87
print(decision.expected_error)      # e.g. 0.212
print(decision.all_scores)          # full score dict
Combined with ot.completion, this becomes a cost-optimizing client:
def smart_call(prompt: str, api_key: str) -> str:
    d = auto.route(prompt)
    resp = ot.completion(
        model=d.selected_model,
        messages=[{"role": "user", "content": prompt}],
        api_key=api_key,
    )
    return resp.choices[0].message.content
Full API: load_router reference.

4. Distillation

Distill a teacher model's behavior into a smaller student from your collected traces:
from opentracy import Distiller

d = Distiller(base_url="http://localhost:8000")

# Requires the engine + REST API to be running. See "Self-host" guide.
job = d.create(
    name="support-ticket-triage",
    dataset_id="ds_abc123",
    teacher_model="openai/gpt-4o",
    student_model="llama-3.2-1b",
    num_prompts=500,
    n_samples=4,
    training_steps=100,
)

job = d.wait(job["id"])
artifacts = d.artifacts(job["id"])
Full API: Distiller reference.

Async

Every sync entry point has an async counterpart:
import asyncio
import opentracy as ot

async def main():
    resp = await ot.acompletion(
        model="openai/gpt-4o-mini",
        messages=[{"role": "user", "content": "hello"}],
    )
    print(resp.choices[0].message.content)

asyncio.run(main())
acompletion shares its request-preparation path with the sync version, so force_engine, force_direct, fallbacks, and engine-prefix handling all behave identically.
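The async path makes fan-out cheap. A hedged sketch of calling many prompts concurrently with a concurrency cap — `fake_acompletion` stands in for ot.acompletion so the snippet runs standalone; swap in the real call in your own code:

```python
import asyncio

async def fake_acompletion(model, messages):
    # Stand-in for ot.acompletion: simulate a network round-trip.
    await asyncio.sleep(0.01)
    return f"{model} answered: {messages[-1]['content']}"

async def bounded(sem, prompt):
    async with sem:  # cap the number of in-flight requests
        return await fake_acompletion(
            "openai/gpt-4o-mini",
            [{"role": "user", "content": prompt}],
        )

async def run_all(prompts, limit=4):
    sem = asyncio.Semaphore(limit)
    return await asyncio.gather(*(bounded(sem, p) for p in prompts))

results = asyncio.run(run_all([f"question {i}" for i in range(10)]))
print(len(results))  # 10
```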

Trace ingestion

If you have existing logs from another LLM provider and want to use them for dataset building or distillation in OpenTracy, you can import them directly:
from opentracy import add_trace, add_traces, import_traces

# Single trace
add_trace({
    "prompt": "Classify: ...",
    "response": "billing",
    "model": "openai/gpt-4o",
    "total_cost_usd": 0.00025,
    "latency_ms": 340,
    "metadata": {"source": "legacy-log-export"},
})

# Batch
add_traces([{...}, {...}, {...}])

# From a JSONL file
import_traces("path/to/exported-traces.jsonl")
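The JSONL file is one trace object per line, in the same shape as the add_trace dict above (field names here mirror that example and are illustrative, not a schema reference). A sketch of producing such a file:

```python
import json
import os
import tempfile

# Traces in the shape shown in the add_trace example above.
traces = [
    {"prompt": "Classify: refund request", "response": "billing",
     "model": "openai/gpt-4o", "total_cost_usd": 0.00025, "latency_ms": 340},
    {"prompt": "Classify: login fails", "response": "auth",
     "model": "openai/gpt-4o", "total_cost_usd": 0.00021, "latency_ms": 298},
]

# One JSON object per line -- the JSONL convention import_traces expects.
path = os.path.join(tempfile.mkdtemp(), "traces.jsonl")
with open(path, "w") as f:
    for t in traces:
        f.write(json.dumps(t) + "\n")

# Round-trip check: read the file back line by line.
with open(path) as f:
    loaded = [json.loads(line) for line in f]
print(len(loaded), loaded[0]["response"])  # 2 billing
```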

Engine routing opt-in

By default the SDK calls providers directly. To route through an OpenTracy engine (for observability, aliases, etc.), set the env var once:
export OPENTRACY_ENGINE_URL="http://localhost:8080"
From that point on, ot.completion(...) routes through the engine. Per-call overrides:
# Always engine (even if OPENTRACY_ENGINE_URL is unset):
ot.completion(..., force_engine=True)

# Always direct (even if OPENTRACY_ENGINE_URL is set):
ot.completion(..., force_direct=True)
Why isn’t this automatic? Because silently routing through whatever happens to be listening on localhost:8080 is a footgun. Opt-in is explicit.
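The precedence described above — per-call flags beat the environment variable — can be sketched as a small decision function (illustrative only; the SDK's internals may differ):

```python
import os

def use_engine(force_engine=False, force_direct=False, env=None):
    """Decide engine vs. direct: explicit flags win over OPENTRACY_ENGINE_URL."""
    env = os.environ if env is None else env
    if force_direct:
        return False
    if force_engine:
        return True
    return bool(env.get("OPENTRACY_ENGINE_URL"))

engine_env = {"OPENTRACY_ENGINE_URL": "http://localhost:8080"}
print(use_engine(force_direct=True, env=engine_env))  # False: flag beats env
print(use_engine(env=engine_env))                     # True: env var opts in
print(use_engine(env={}))                             # False: direct by default
```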

13 providers via create_client

If you want a first-class LLMClient object (for profiling, or to fit into custom routing code), create_client covers every provider:
c = ot.create_client("openai",   "gpt-4o-mini")       # dedicated class
c = ot.create_client("deepseek", "deepseek-chat")     # UnifiedClient wrapper
c = ot.create_client("together", "meta-llama/Llama-3") # UnifiedClient wrapper

out = c.generate("Hello", max_tokens=64, temperature=0.0)
print(out.text, out.latency_ms, out.tokens_used)
Five providers have dedicated classes (OpenAI, Anthropic, Google, Groq, Mistral); the remaining seven (DeepSeek, Perplexity, Cerebras, Sambanova, Together, Fireworks, Cohere) route through a UnifiedClient that speaks the OpenAI-chat protocol. Bedrock is registered but raises a clear error on construction — AWS SigV4 is not handled by UnifiedClient yet; use ot.completion(force_engine=True) instead.
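The dispatch pattern — dedicated classes for a few providers, a unified fallback for the rest, and a loud error for the unsupported one — looks roughly like this (class and dict names here are hypothetical, not the SDK's):

```python
class UnifiedClient:
    """Fallback client speaking the OpenAI-chat protocol."""
    def __init__(self, provider, model):
        self.provider, self.model = provider, model

class OpenAIClient(UnifiedClient):
    """Dedicated class; the real SDK has these for five providers."""

# Providers with dedicated classes; everything else falls through.
DEDICATED = {"openai": OpenAIClient}

def create_client(provider, model):
    if provider == "bedrock":
        # Registered but unsupported: fail at construction, not mid-request.
        raise NotImplementedError(
            "AWS SigV4 not handled; use ot.completion(force_engine=True)")
    cls = DEDICATED.get(provider, UnifiedClient)
    return cls(provider, model)

print(type(create_client("openai", "gpt-4o-mini")).__name__)     # OpenAIClient
print(type(create_client("deepseek", "deepseek-chat")).__name__) # UnifiedClient
```

Failing fast on Bedrock at construction time, rather than on the first request, keeps the error close to the misconfiguration.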

Public API — the 18 names

Everything import opentracy as ot exposes publicly:
# Core
ot.completion, ot.acompletion, ot.Router, ot.ModelResponse, ot.StreamChunk, ot.parse_model
# Multi-provider
ot.create_client, ot.LLMResponse
# Pricing
ot.model_cost, ot.get_model_info, ot.supported_models
# Trace ingestion
ot.add_trace, ot.add_traces, ot.import_traces
# Distillation
ot.Distiller, ot.TrainingClient, ot.DistillerError
# Version
ot.__version__
Legacy research APIs (load_router, UniRouteRouter, RouterEvaluator, LLMJudge, …) resolve lazily via __getattr__ — they import the first time you touch them, so they don’t slow down the initial import opentracy.
Legacy code using import lunar_router as lr keeps working via a backwards-compat shim that redirects to opentracy and emits a DeprecationWarning. New code should use import opentracy as ot.
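A shim like that can be built with the same module __getattr__ trick: the old name forwards attribute access to the new package and warns. A hedged sketch with illustrative module names (not the actual shim's code):

```python
import sys
import types
import warnings

# The "new" package the shim redirects to.
newpkg = types.ModuleType("newpkg")
newpkg.answer = 42
sys.modules["newpkg"] = newpkg

# The "old" name: every attribute access warns, then forwards.
oldpkg = types.ModuleType("oldpkg")

def _forward(name):
    warnings.warn("oldpkg is deprecated; import newpkg instead",
                  DeprecationWarning, stacklevel=2)
    return getattr(sys.modules["newpkg"], name)

oldpkg.__getattr__ = _forward
sys.modules["oldpkg"] = oldpkg

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    import oldpkg
    value = oldpkg.answer  # triggers the DeprecationWarning
print(value, caught[0].category.__name__)  # 42 DeprecationWarning
```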

Next

Self-host

Run engine + ClickHouse + UI locally or in your cloud.

API Reference

Every parameter and return value.