The Complete LLM Operations Platform
Everything OpenTracy delivers -- from unified gateway to model distillation. Not marketing. Real capabilities, real architecture.
Unified Gateway
One OpenAI-compatible API that routes to 13 providers and 70+ models. Change one line of code to start.
- OpenAI-compatible API -- drop-in replacement, same SDK, same format
- 13 providers: OpenAI, Anthropic, Google Gemini, Mistral, Groq, DeepSeek, Perplexity, Cerebras, SambaNova, Together, Fireworks, Cohere, AWS Bedrock
- 70+ models with automatic per-token pricing baked in
- Full streaming support for all providers including Anthropic SSE translation
- Vision and multimodal support (base64 or URL images)
- Tool calling with cross-provider format translation
```python
import openai

# Just change the base URL -- everything else stays the same
client = openai.OpenAI(
    base_url="https://api.opentracy.com/v1",
    api_key="your-opentracy-key",
)

response = client.chat.completions.create(
    model="openai/gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```

Smart Routing
Route requests to the right model based on cost, latency, complexity, or custom rules. Automatic fallbacks when providers go down.
- Router class with strategies: round-robin, least-cost, lowest-latency, weighted-random
- Semantic routing -- classifies prompt complexity, sends simple prompts to cheap models, complex ones to powerful models
- Automatic fallbacks with configurable retry chains (e.g. GPT-4o -> Claude -> Gemini)
- Load balancing across model pools for high-throughput workloads
- Go engine for high-performance routing with <2ms overhead
```python
import opentracy as ot

# Semantic routing: simple -> cheap, complex -> powerful
router = ot.Router(
    strategy="semantic",
    models={
        "simple": "openai/gpt-4o-mini",
        "complex": "anthropic/claude-sonnet-4-20250514",
    },
    fallbacks=["google/gemini-2.0-flash"],
)

response = router.completion(
    messages=[{"role": "user", "content": prompt}]
)
print(f"Routed to: {response.model}")
print(f"Cost: ${response._cost:.6f}")
```

Real-Time Traces
Every request logged with full input, output, cost, latency, model, and token counts. Query millions of traces instantly.
- Full trace logging: input messages, output, cost, latency, model, tokens in/out
- ClickHouse analytics backend -- query millions of traces in milliseconds
- Real-time dashboard UI with filters, search, and trace detail view
- Model-level performance stats: latency P50/P95/P99, error rates, cost per request
- Export traces for offline analysis or integration with your data pipeline
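The P50/P95/P99 stats above reduce to a percentile computation over logged latencies. A minimal pure-Python sketch of that aggregation; the `Trace` record and `latency_stats` helper are illustrative, not OpenTracy's actual schema:

```python
from dataclasses import dataclass

@dataclass
class Trace:
    model: str
    latency_ms: float
    cost: float

def percentile(values, p):
    """Nearest-rank percentile over a sorted copy of values."""
    ordered = sorted(values)
    k = max(0, min(len(ordered) - 1, round(p / 100 * (len(ordered) - 1))))
    return ordered[k]

def latency_stats(traces):
    """Per-model P50/P95/P99 latency, as shown on the dashboard."""
    by_model = {}
    for t in traces:
        by_model.setdefault(t.model, []).append(t.latency_ms)
    return {
        model: {p: percentile(lats, p) for p in (50, 95, 99)}
        for model, lats in by_model.items()
    }

traces = [Trace("openai/gpt-4o-mini", ms, 0.0001) for ms in range(100, 200)]
print(latency_stats(traces)["openai/gpt-4o-mini"])
```

In production these percentiles come from ClickHouse aggregate queries rather than Python, but the math is the same.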
Cost Intelligence
Automatic per-token pricing for every model. See exactly where your money goes and how much smart routing saves you.
- Automatic per-token pricing for 70+ models (continuously updated pricing database)
- Cost attached to every response -- no more guessing or manual calculation
- Baseline vs actual cost comparison: see what you'd pay with the most expensive model vs smart routing
- Net savings calculation with monthly projections
- Cost breakdown by model, by provider, by time period
- Budget alerts and anomaly detection for unexpected cost spikes
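The baseline-vs-actual comparison above is simple arithmetic over per-token prices. A sketch with made-up rates -- the `PRICES` table here is illustrative only; real rates come from OpenTracy's pricing database:

```python
# Illustrative prices in USD per million tokens (input, output) -- not live rates
PRICES = {
    "openai/gpt-4o-mini": (0.15, 0.60),
    "openai/gpt-4o": (2.50, 10.00),
}

def request_cost(model, tokens_in, tokens_out):
    """Per-token pricing: tokens * price-per-million-tokens."""
    price_in, price_out = PRICES[model]
    return (tokens_in * price_in + tokens_out * price_out) / 1_000_000

# Baseline vs actual: what the most expensive model would have cost
actual = request_cost("openai/gpt-4o-mini", 1_000, 500)
baseline = request_cost("openai/gpt-4o", 1_000, 500)
savings = baseline - actual

print(f"actual=${actual:.6f} baseline=${baseline:.6f} saved=${savings:.6f}")
```

Summing `savings` over a month of traffic gives the net-savings projection the dashboard reports.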
Quality Monitoring
7 autonomous AI agents continuously scan your production traffic for issues. Catch problems before your users do.
- Cluster Labeler -- groups prompts by domain automatically
- Trace Scanner -- detects hallucinations, refusals, PII leaks, and format issues
- Outlier Detector -- flags anomalous traces that deviate from normal patterns
- Coherence Scorer -- rates cluster quality to ensure consistent behavior
- Heuristic detection: incomplete responses, refusal phrases, latency spikes, cost anomalies
- LLM-based hallucination detection with confidence scoring (0-1)
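The heuristic layer is the cheap first pass before any LLM-based check runs. A minimal sketch of what such checks can look like -- the patterns and thresholds here are illustrative, not the Trace Scanner's actual rules:

```python
import re

# Illustrative refusal phrases -- the real scanner's heuristics are broader
REFUSAL_PATTERNS = [
    r"\bI can(?:'|no)t help with\b",
    r"\bAs an AI\b",
    r"\bI'm sorry, but\b",
]

def scan_trace(output: str, latency_ms: float, latency_p99: float) -> list[str]:
    """Flag cheap-to-detect issues before an LLM judge ever runs."""
    flags = []
    if any(re.search(p, output, re.IGNORECASE) for p in REFUSAL_PATTERNS):
        flags.append("refusal")
    if output.rstrip() and output.rstrip()[-1] not in ".!?\"')":
        flags.append("incomplete")  # likely truncated mid-sentence
    if latency_ms > latency_p99:
        flags.append("latency_spike")
    return flags

print(scan_trace("I'm sorry, but I can't help with that.", 120.0, 800.0))
```

Traces that pick up flags here get escalated to the LLM-based hallucination check with its 0-1 confidence score.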
Evaluations
LLM-as-Judge for pairwise comparison and pointwise scoring. Track quality across model updates with real metrics.
- Pairwise comparison: model A vs B, pick the winner on your production data
- Pointwise scoring: rate responses 1-5 with customizable rubrics
- RouterEvaluator: benchmark routing decisions against cached responses
- AUROC metrics, Pareto curves, and win rate calculations
- Domain-specific evaluation with AI-suggested quality metrics
- Track quality over time across model updates and routing changes
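The pairwise win rate boils down to counting judge verdicts over (prompt, A, B) pairs. A sketch of that loop; the `judge` function here is a toy length heuristic standing in for the actual LLM-as-Judge call:

```python
def judge(prompt: str, answer_a: str, answer_b: str) -> str:
    """Placeholder for the LLM-as-Judge call, which returns
    'A', 'B', or 'tie' for a pair of responses."""
    return "A" if len(answer_a) >= len(answer_b) else "B"  # toy heuristic

def win_rate(pairs):
    """Fraction of non-tie comparisons won by model A."""
    wins = ties = 0
    for prompt, a, b in pairs:
        verdict = judge(prompt, a, b)
        if verdict == "A":
            wins += 1
        elif verdict == "tie":
            ties += 1
    decided = len(pairs) - ties
    return wins / decided if decided else 0.0

pairs = [
    ("What is 2+2?", "4. Because 2+2=4.", "4"),
    ("Capital of France?", "Paris", "Paris, the capital and largest city of France."),
]
print(win_rate(pairs))
```

Running this over sampled production prompts is what turns "model A vs B" into a single comparable number.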
Model Distillation (BOND Pipeline)
Train smaller, faster, cheaper models from your production data. Full pipeline from teacher model to deployed LoRA.
- Pipeline: Teacher model -> LLM-as-Judge curation -> LoRA training (Unsloth) -> GGUF export
- Automatic training data extraction from production traces
- Preference pair generation for DPO/RLHF alignment
- Golden dataset augmentation for evaluation benchmarks
- Own your models -- no vendor lock-in, deploy anywhere
- Eval Generator creates evaluation datasets from real production data
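The preference-pair step above can be sketched in a few lines. This assumes judged traces map each prompt to scored candidate responses; the `preference_pairs` helper and trace shape are illustrative, not the BOND pipeline's actual interface:

```python
def preference_pairs(traces):
    """Turn judged traces into (prompt, chosen, rejected) triples for DPO.

    The highest-scored response becomes `chosen`, the lowest `rejected`.
    """
    pairs = []
    for prompt, candidates in traces.items():
        if len(candidates) < 2:
            continue  # need at least two responses to form a preference
        ranked = sorted(candidates, key=lambda c: c["score"], reverse=True)
        pairs.append({
            "prompt": prompt,
            "chosen": ranked[0]["text"],
            "rejected": ranked[-1]["text"],
        })
    return pairs

traces = {
    "Summarize HTTP/2": [
        {"text": "HTTP/2 multiplexes streams over one TCP connection.", "score": 0.9},
        {"text": "It's a protocol.", "score": 0.3},
    ],
}
print(preference_pairs(traces)[0]["chosen"])
```

The resulting triples feed directly into DPO-style alignment before the LoRA training stage.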
Prompt Clustering
Automatic domain discovery from your production traffic. Understand what your users actually ask and how each domain performs.
- Automatic domain discovery from production traffic patterns
- KMeans + learned map clustering for grouping similar prompts
- Embedding-based similarity using sentence transformers
- Per-cluster quality metrics and cost analysis
- Drift detection when traffic patterns change unexpectedly
- Merge Checker suggests cluster consolidation to reduce noise
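The grouping step is standard k-means over prompt embeddings. A deterministic pure-Python sketch, with toy 2-D points standing in for sentence-transformer vectors (real embeddings have hundreds of dimensions):

```python
import math

def kmeans(points, k, iters=20):
    """Plain k-means: the same grouping step, minus real embeddings.
    Deterministic init for the sketch: spread starting centroids."""
    centroids = [points[i * (len(points) - 1) // (k - 1)] for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        centroids = [
            tuple(sum(d) / len(c) for d in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return clusters

# Toy 2-D "embeddings": two clearly separated prompt domains
billing = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.15)]
coding = [(5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
clusters = kmeans(billing + coding, k=2)
print(sorted(len(c) for c in clusters))
```

Each resulting cluster then gets a label, quality metrics, and cost analysis of its own.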
Deployment
Full stack with Docker. Self-host with MIT license or use the managed cloud. Production-ready from day one.
- Full stack Docker deployment: ClickHouse + Go engine + Python API + React UI
- Self-host option with MIT license -- your data stays on your infrastructure
- Go engine for high-performance routing (<2ms overhead per request)
- Python SDK: pip install opentracy
- OpenAI SDK drop-in: just change base_url to your OpenTracy instance
```shell
# Install the SDK
pip install opentracy

# Or self-host the full stack
git clone https://github.com/lunar-org-ai/lunar-router.git
cd lunar-router && docker compose up -d
```

Ready to take control of your LLM stack?
Open source, self-hostable, MIT licensed. Start in 5 minutes.