By the end of this page — in under three minutes — you’ll have made a real LLM call, seen the cost and latency on the response, swapped providers with one string change, and added automatic fallbacks. No server, no Docker, no config files.
What you need right now: an OpenAI API key (or Anthropic, Groq, etc. — any of the 13 providers). Nothing else.

1. Install — 30 seconds

```bash
pip install opentracy
export OPENAI_API_KEY=sk-...
```

2. Your first call — 30 seconds

```python
import opentracy as ot

resp = ot.completion(
    model="openai/gpt-4o-mini",
    messages=[{"role": "user", "content": "Say hello in three words."}],
)

print(resp.choices[0].message.content)
print(f"cost: ${resp._cost:.6f}  latency: {resp._latency_ms:.0f}ms")
```

```
Hi there, friend!
cost: $0.000008  latency: 612ms
```
This is the hook. Every response already carries _cost and _latency_ms. You didn’t wire up any observability — it’s on by default. ot.completion is OpenAI-compatible, so resp.choices[0].message.content, resp.usage, and streaming all work like you’d expect.
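Because the response shape is OpenAI-compatible, streamed chunks are consumed the usual way: iterate the stream and read each chunk's delta. Here is a runnable sketch of that consumption pattern using simulated chunks, so it needs no API key — `fake_stream` and the exact chunk attributes are assumptions based on the OpenAI chunk shape, standing in for `ot.completion(..., stream=True)`:

```python
from types import SimpleNamespace

def fake_stream(text):
    # Simulated OpenAI-style stream: each chunk carries
    # choices[0].delta.content, like a real streamed completion.
    for token in text.split(" "):
        delta = SimpleNamespace(content=token + " ")
        yield SimpleNamespace(choices=[SimpleNamespace(delta=delta)])

parts = []
for chunk in fake_stream("Hi there, friend!"):
    delta = chunk.choices[0].delta.content
    if delta:  # some chunks (role headers, stop chunks) carry no text
        parts.append(delta)

print("".join(parts).strip())  # Hi there, friend!
```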

3. Switch providers with one string — 1 minute

Same function, same message shape, different provider. No new SDK, no new auth code:
```python
# Anthropic
resp = ot.completion(
    model="anthropic/claude-haiku-4-5-20251001",
    messages=[{"role": "user", "content": "Say hello in three words."}],
)

# Groq (Llama, sub-second)
resp = ot.completion(
    model="groq/llama-3.3-70b-versatile",
    messages=[{"role": "user", "content": "Say hello in three words."}],
)

# DeepSeek (cheap reasoning)
resp = ot.completion(
    model="deepseek/deepseek-chat",
    messages=[{"role": "user", "content": "Say hello in three words."}],
)
```
Each provider reads its own env var (ANTHROPIC_API_KEY, GROQ_API_KEY, DEEPSEEK_API_KEY, …). The 13-provider matrix is in the completion reference.
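The env vars listed above all follow one mechanical convention: the provider prefix of the model string, upper-cased, plus `_API_KEY`. A tiny illustration — the helper name here is hypothetical, the real lookup lives inside the library:

```python
def provider_env_var(model: str) -> str:
    # "provider/model-name" -> "PROVIDER_API_KEY"
    # (illustrative only; opentracy resolves keys internally)
    provider = model.split("/", 1)[0]
    return f"{provider.upper()}_API_KEY"

print(provider_env_var("anthropic/claude-haiku-4-5-20251001"))  # ANTHROPIC_API_KEY
print(provider_env_var("groq/llama-3.3-70b-versatile"))         # GROQ_API_KEY
```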

4. Add fallbacks — 1 minute

Production calls that survive one provider being down:
```python
resp = ot.completion(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Draft a pithy tagline."}],
    fallbacks=[
        "anthropic/claude-sonnet-4-6",
        "deepseek/deepseek-chat",
    ],
    num_retries=1,
)

print(resp._provider)   # which one actually answered
```
If OpenAI rate-limits you, Anthropic picks up. If Anthropic is degraded, DeepSeek does. You don’t get paged.
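A fallback chain is, at heart, an ordered try-each loop with retries. Here is a self-contained sketch of that pattern — the function names are hypothetical, not opentracy internals:

```python
def call_with_fallbacks(call, models, num_retries=1):
    # Try each model in order; retry transient failures num_retries
    # extra times before moving on; return (model, result) on success.
    last_err = None
    for model in models:
        for _attempt in range(num_retries + 1):
            try:
                return model, call(model)
            except RuntimeError as err:  # real code would catch provider errors
                last_err = err
    raise last_err

def flaky(model):
    # Pretend the primary provider is rate-limiting us.
    if model == "openai/gpt-4o":
        raise RuntimeError("429 rate limited")
    return "tagline"

winner, result = call_with_fallbacks(
    flaky, ["openai/gpt-4o", "anthropic/claude-sonnet-4-6"]
)
print(winner)  # anthropic/claude-sonnet-4-6
```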

5. Done. What you now have.

- **OpenAI-compatible**: Same message format, same response shape. Any existing code moves over.
- **13 providers**: Switch at any time. One string change, no auth rewrite.
- **Cost + latency by default**: `_cost` and `_latency_ms` on every response. No setup.
- **Production fallbacks**: Survive provider outages without writing retry logic yourself.

Where to go next

- **Drop in over the OpenAI SDK**: Point existing OpenAI code at OpenTracy — zero library changes. ~2 minutes.
- **Semantic auto-routing**: Let the router pick the cheapest model that’s good enough per prompt. ~5 minutes (downloads ~100 MB of weights once).
- **Full observability**: Self-host to capture every trace in ClickHouse, plus a UI for analytics. ~30 minutes (needs Docker).
- **Distill your own model**: Fine-tune a tiny student from your traffic. The cost-reduction wedge. ~2 hours (needs self-host + a GPU).

Optional: try the semantic auto-router

If you want to see the full pipeline in action — including the model picking itself per prompt based on learned error profiles — load the pre-trained router. This downloads ~100 MB of weights on first run and caches them in ~/.local/share/opentracy/.
```python
import opentracy as ot

router = ot.load_router(cost_weight=0.5)

for prompt in [
    "What is the capital of France?",
    "Prove the square root of 2 is irrational.",
    "Write a haiku about autumn.",
]:
    d = router.route(prompt)
    print(f"[{d.selected_model:<24}] cluster={d.cluster_id:>3}  {prompt}")
```

```
[ministral-3b-latest     ] cluster= 84  What is the capital of France?
[gpt-4o                  ] cluster= 47  Prove the square root of 2 is irrational.
[ministral-3b-latest     ] cluster= 29  Write a haiku about autumn.
```
Easy trivia → a cheap small model. Math proof → a strong model. No rules from you. See Auto-routing for the full picture.
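The `cost_weight` knob is a trade-off dial: weighting cost heavily favors cheap models, weighting it lightly favors expected quality. A toy sketch of that kind of scoring — the numbers and field names are made up, and the real router scores per-cluster error profiles, not a flat table:

```python
def pick_model(candidates, cost_weight=0.5):
    # Blend normalized cost and expected error into one score; lower wins.
    def score(c):
        return cost_weight * c["cost"] + (1 - cost_weight) * c["expected_error"]
    return min(candidates, key=score)["model"]

candidates = [
    {"model": "ministral-3b-latest", "cost": 0.05, "expected_error": 0.30},
    {"model": "gpt-4o",              "cost": 1.00, "expected_error": 0.05},
]

# A high cost_weight favors the cheap model...
print(pick_model(candidates, cost_weight=0.9))  # ministral-3b-latest
# ...a low one favors the stronger model.
print(pick_model(candidates, cost_weight=0.1))  # gpt-4o
```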