The Complete LLM Operations Platform

Everything OpenTracy ships, from a unified model gateway to routing, tracing, evaluations, and distillation. Built for production teams that need measurable quality and cost control.

Unified Gateway | Smart Routing | Quality Monitoring | Self-hostable

Core capabilities

Everything in one place.

Unified Gateway

One OpenAI-compatible API that routes to 13 providers and 300+ models. Change one line of code to start.

import openai

# Just change the base URL — everything else stays the same
client = openai.OpenAI(
    base_url="https://api.opentracy.com/v1",
    api_key="your-opentracy-key"
)

response = client.chat.completions.create(
    model="openai/gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello!"}]
)

print(response.choices[0].message.content)
  • OpenAI-compatible API -- drop-in replacement, same SDK, same format
  • 13 providers: OpenAI, Anthropic, Google Gemini, Mistral, Groq, DeepSeek, Perplexity, Cerebras, SambaNova, Together, Fireworks, Cohere, AWS Bedrock
  • 300+ models with automatic per-token pricing baked in
  • Full streaming support for all providers including Anthropic SSE translation
  • Vision and multimodal support (base64 or URL images)
  • Tool calling with cross-provider format translation
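To make "cross-provider format translation" concrete, here is a minimal sketch of what translating tool calls between OpenAI and Anthropic shapes involves. It is not OpenTracy's code: `openai_tool_to_anthropic` and `anthropic_tool_use_to_openai` are hypothetical helpers, and the tool itself is an example. It relies only on the documented difference that OpenAI calls the tool's JSON Schema `parameters` and passes call arguments as a JSON string, while Anthropic uses `input_schema` and passes `input` as an object.

```python
import json

# Hypothetical helpers for illustration; not OpenTracy's actual translation layer.

def openai_tool_to_anthropic(tool: dict) -> dict:
    """Convert an OpenAI-style tool definition into Anthropic's tool shape."""
    if tool.get("type") != "function":
        raise ValueError(f"unsupported tool type: {tool.get('type')!r}")
    fn = tool["function"]
    return {
        "name": fn["name"],
        "description": fn.get("description", ""),
        # Anthropic names the JSON Schema "input_schema"; OpenAI calls it "parameters"
        "input_schema": fn.get("parameters", {"type": "object", "properties": {}}),
    }

def anthropic_tool_use_to_openai(block: dict) -> dict:
    """Convert an Anthropic tool_use content block into an OpenAI tool_calls entry."""
    return {
        "id": block["id"],
        "type": "function",
        "function": {
            "name": block["name"],
            # OpenAI carries arguments as a JSON string, Anthropic as a JSON object
            "arguments": json.dumps(block["input"]),
        },
    }

weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

anthropic_tool = openai_tool_to_anthropic(weather_tool)
tool_call = anthropic_tool_use_to_openai(
    {"type": "tool_use", "id": "toolu_123", "name": "get_weather", "input": {"city": "Paris"}}
)
print(json.dumps(anthropic_tool, indent=2))
print(tool_call["function"]["arguments"])
```

A gateway applies the first conversion on the way to the provider and the second on the way back, so client code only ever sees the OpenAI format.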

Smart Routing

Route requests to the right model based on cost, latency, complexity, or custom rules. Automatic fallbacks when providers go down.

import opentracy as ot

prompt = "Compare the trade-offs of B-trees and LSM-trees for a write-heavy workload."

# Semantic routing: simple -> cheap, complex -> powerful
router = ot.Router(
    strategy="semantic",
    models={
        "simple": "openai/gpt-4o-mini",
        "complex": "anthropic/claude-sonnet-4-20250514",
    },
    fallbacks=["google/gemini-2.0-flash"]
)

response = router.completion(
    messages=[{"role": "user", "content": prompt}]
)
print(f"Routed to: {response.model}")
print(f"Cost: ${response._cost:.6f}")
  • Router class with strategies: round-robin, least-cost, lowest-latency, weighted-random
  • Semantic routing -- classifies prompt complexity, sends simple prompts to cheap models, complex ones to powerful models
  • Automatic fallbacks with configurable retry chains (e.g. GPT-4o -> Claude -> Gemini)
  • Load balancing across model pools for high-throughput workloads
  • Go engine for high-performance routing with <2ms overhead
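Two of the behaviors listed above, least-cost ordering and fallback chains, can be sketched in plain Python. This is not the `ot.Router` implementation, which the page says runs in a Go engine. `complete_with_fallbacks`, `least_cost_order`, `ProviderError`, and `fake_call` are hypothetical names, and the prices in the usage line are made-up numbers, not real provider pricing.

```python
from typing import Callable

class ProviderError(Exception):
    """Stand-in for a provider outage, rate limit, or timeout."""

def complete_with_fallbacks(chain: list[str], call: Callable[[str], str]) -> tuple[str, str]:
    """Try each model in order; return (model, output) from the first that succeeds."""
    errors: list[str] = []
    for model in chain:
        try:
            return model, call(model)
        except ProviderError as exc:
            errors.append(f"{model}: {exc}")  # remember why this hop failed
    raise RuntimeError("all models in the chain failed: " + "; ".join(errors))

def least_cost_order(prices: dict[str, float]) -> list[str]:
    """Order models from cheapest to most expensive, using caller-supplied prices."""
    return sorted(prices, key=prices.__getitem__)

# Fake provider call: pretend every OpenAI model is down so the chain falls through.
def fake_call(model: str) -> str:
    if model.startswith("openai/"):
        raise ProviderError("503 Service Unavailable")
    return f"hello from {model}"

# Retry chain mirroring the GPT-4o -> Claude -> Gemini example above
chain = ["openai/gpt-4o", "anthropic/claude-sonnet-4-20250514", "google/gemini-2.0-flash"]
model, output = complete_with_fallbacks(chain, fake_call)
print(model, "->", output)

# Made-up per-request prices for three placeholder models
print(least_cost_order({"model-a": 2.5, "model-b": 0.15, "model-c": 0.1}))
```

Catching only `ProviderError` is a deliberate choice in this sketch: a bug in the caller's own code should surface immediately instead of silently burning through the fallback chain.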

Real-Time Traces

Every request is logged with its full input, output, cost, latency, model, and token counts. Query millions of traces instantly.

  • Full trace logging: input messages, output, cost, latency, model, tokens in/out
  • ClickHouse analytics backend -- query millions of traces in milliseconds
  • Real-time dashboard UI with filters, search, and trace detail view
  • Model-level performance stats: latency P50/P95/P99, error rates, cost per request
  • Export traces for offline analysis or integration with your data pipeline
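The per-model stats above come down to aggregating trace records. The sketch below assumes a hypothetical `Trace` record holding the fields listed above and computes P50/P95/P99 with the nearest-rank percentile method, plus error rate and cost per request. It is an in-memory illustration only; on the real backend this aggregation would presumably run inside ClickHouse rather than in Python.

```python
import math
from dataclasses import dataclass

@dataclass
class Trace:
    """Hypothetical trace record with the fields the dashboard exposes."""
    model: str
    latency_ms: float
    cost_usd: float
    tokens_in: int
    tokens_out: int
    error: bool = False

def percentile(values: list[float], p: float) -> float:
    """Nearest-rank percentile, p in (0, 100]."""
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]

def model_stats(traces: list[Trace], model: str) -> dict[str, float]:
    rows = [t for t in traces if t.model == model]
    latencies = [t.latency_ms for t in rows]
    return {
        "p50_ms": percentile(latencies, 50),
        "p95_ms": percentile(latencies, 95),
        "p99_ms": percentile(latencies, 99),
        "error_rate": sum(t.error for t in rows) / len(rows),
        "cost_per_request": sum(t.cost_usd for t in rows) / len(rows),
    }

# Ten synthetic requests; the 2000 ms one is marked as a failed request.
latencies = [120, 95, 300, 110, 105, 130, 2000, 115, 98, 125]
traces = [Trace("openai/gpt-4o-mini", ms, 0.0002, 40, 60, error=(ms == 2000)) for ms in latencies]
print(model_stats(traces, "openai/gpt-4o-mini"))
```

With only ten samples, P95 and P99 both land on the single slowest request, which is why tail percentiles are only meaningful over large trace volumes.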

Cost Intelligence

Automatic per-token pricing for every model. See exactly where your money goes and how much smart routing saves you.

  • Automatic per-token pricing for 300+ models (continuously updated pricing database)
  • Cost attached to every response -- no more guessing or manual calculation
  • Baseline vs actual cost comparison: see what you'd pay with the most expensive model vs smart routing
  • Net savings calculation with monthly projections
  • Cost breakdown by model, by provider, by time period
  • Budget alerts and anomaly detection for unexpected cost spikes
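The baseline-vs-actual comparison is simple arithmetic once every request carries its token counts. The sketch below assumes per-token pricing quoted in USD per 1M input and output tokens; the model names and prices in `PRICES` are illustrative placeholders, not entries from OpenTracy's pricing database, and `request_cost` and `savings_report` are hypothetical helpers.

```python
# Illustrative prices in USD per 1M tokens: (input, output). Placeholders only.
PRICES = {
    "cheap-model": (0.15, 0.60),
    "premium-model": (3.00, 15.00),
}

def request_cost(model: str, tokens_in: int, tokens_out: int) -> float:
    """Per-token cost of one request."""
    price_in, price_out = PRICES[model]
    return (tokens_in * price_in + tokens_out * price_out) / 1_000_000

def savings_report(requests, baseline_model: str, days_observed: float = 1) -> dict:
    """Compare actual spend with sending every request to `baseline_model`."""
    actual = sum(request_cost(m, t_in, t_out) for m, t_in, t_out in requests)
    baseline = sum(request_cost(baseline_model, t_in, t_out) for _, t_in, t_out in requests)
    saved = baseline - actual
    return {
        "actual_usd": actual,
        "baseline_usd": baseline,
        "saved_usd": saved,
        "saved_pct": 100 * saved / baseline if baseline else 0.0,
        # Naive linear projection from the observed window to a 30-day month
        "monthly_projection_usd": saved / days_observed * 30,
    }

# (model actually used, input tokens, output tokens) for one day of traffic
requests = [
    ("cheap-model", 1000, 500),
    ("cheap-model", 2000, 1000),
    ("premium-model", 1000, 1000),
]
print(savings_report(requests, baseline_model="premium-model"))
```

Here two of the three requests were routed to the cheap model, so actual spend is $0.01935 against a $0.0495 all-premium baseline, a saving of about 61%.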

Quality Monitoring

Seven autonomous AI agents continuously scan your production traffic for issues, so you can catch problems before your users do.

  • Cluster Labeler -- groups prompts by domain automatically
  • Trace Scanner -- detects hallucinations, refusals, PII leaks, and format issues
  • Outlier Detector -- flags anomalous traces that deviate from normal patterns
  • Coherence Scorer -- rates cluster quality to ensure consistent behavior
  • Heuristic detection: incomplete responses, refusal phrases, latency spikes, cost anomalies
  • LLM-based hallucination detection with confidence scoring (0-1)
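The heuristic checks listed above need no LLM at all. Here is a minimal sketch, assuming a hypothetical `scan_trace` function, a small hand-picked set of refusal phrases, and fixed latency and cost thresholds; a production scanner would more likely derive thresholds from each model's normal behavior. It treats OpenAI's `finish_reason == "length"` (output cut off at the token limit) or an empty output as an incomplete response.

```python
import re

# Hand-picked refusal phrases for illustration; a real list would be much longer.
REFUSAL_PATTERNS = [
    r"\bI can(?:'|no)t help with\b",
    r"\bI(?:'m| am) (?:not able|unable) to\b",
    r"\bas an AI (?:language )?model\b",
]
REFUSAL_RE = re.compile("|".join(REFUSAL_PATTERNS), re.IGNORECASE)

def scan_trace(
    output: str,
    finish_reason: str,
    latency_ms: float,
    cost_usd: float,
    *,
    latency_limit_ms: float = 5000,  # assumed fixed threshold
    cost_limit_usd: float = 0.05,    # assumed fixed threshold
) -> list[str]:
    """Return heuristic flags for one trace, in a fixed order."""
    flags = []
    if REFUSAL_RE.search(output):
        flags.append("refusal")
    if finish_reason == "length" or not output.strip():
        flags.append("incomplete")
    if latency_ms > latency_limit_ms:
        flags.append("latency_spike")
    if cost_usd > cost_limit_usd:
        flags.append("cost_anomaly")
    return flags

print(scan_trace("I'm sorry, but I can't help with that.", "stop", 800, 0.001))
print(scan_trace("The answer is", "length", 9000, 0.2))
```

Cheap checks like these can run on every trace, leaving the slower LLM-based hallucination check for the traces that are worth a closer look.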

Evaluations

LLM-as-Judge for pairwise comparison and pointwise scoring. Track quality across model updates with real metrics.

  • Pairwise comparison: model A vs B, pick the winner on your production data
  • Pointwise scoring: rate responses 1-5 with customizable rubrics
  • RouterEvaluator: benchmark routing decisions against cached responses
  • AUROC metrics, Pareto curves, and win rate calculations
  • Domain-specific evaluation with AI-suggested quality metrics
  • Track quality over time across model updates and routing changes
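Two of the metrics named above have short, standard definitions. The sketch below computes a pairwise win rate from judge verdicts, counting ties as half a win, and AUROC as the probability that a randomly chosen positive example outscores a randomly chosen negative one (the Mann-Whitney formulation). `win_rate` and `auroc` are hypothetical helpers, not the `RouterEvaluator` API, and the verdicts and scores are toy data.

```python
from itertools import product

def win_rate(verdicts: list[str], model: str = "A") -> float:
    """Share of pairwise judgments won by `model`, counting ties as half a win."""
    wins = sum(v == model for v in verdicts)
    ties = sum(v == "tie" for v in verdicts)
    return (wins + 0.5 * ties) / len(verdicts)

def auroc(scores: list[float], labels: list[int]) -> float:
    """Probability that a random positive outscores a random negative (ties = 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    total = 0.0
    for p, n in product(pos, neg):
        total += 1.0 if p > n else 0.5 if p == n else 0.0
    return total / (len(pos) * len(neg))

# Judge verdicts for model A vs model B on five production prompts
print(win_rate(["A", "B", "A", "tie", "A"]))

# A score that should separate good (1) from bad (0) responses
print(auroc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0]))
```

The pairwise loop is O(positives × negatives), which is fine for a sketch; large evaluation sets would use the equivalent rank-based formula instead.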

Model Distillation (BOND Pipeline)

Train smaller, faster, cheaper models from your production data. Full pipeline from teacher model to deployed LoRA.

  • Pipeline: Teacher model -> LLM-as-Judge curation -> LoRA training (Unsloth) -> GGUF export
  • Automatic training data extraction from production traces
  • Preference pair generation for DPO/RLHF alignment
  • Golden dataset augmentation for evaluation benchmarks
  • Own your models -- no vendor lock-in, deploy anywhere
  • Eval Generator creates evaluation datasets from real production data

Prompt Clustering

Automatic domain discovery from your production traffic. Understand what your users actually ask and how each domain performs.

  • Automatic domain discovery from production traffic patterns
  • KMeans + learned map clustering for grouping similar prompts
  • Embedding-based similarity using sentence transformers
  • Per-cluster quality metrics and cost analysis
  • Drift detection when traffic patterns change unexpectedly
  • Merge Checker suggests cluster consolidation to reduce noise
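The core of the KMeans step can be shown in a few lines. The sketch below is plain k-means (Lloyd's algorithm) with a deterministic "first k points" initialization. It runs on toy 2-D vectors that stand in for sentence-transformer embeddings, and it does not cover the "learned map" part. In practice a library implementation such as scikit-learn's `KMeans` on real embeddings would be the usual choice; `kmeans` here is a hypothetical helper.

```python
import math

def kmeans(points: list[tuple[float, ...]], k: int, iterations: int = 20):
    """Lloyd's algorithm with the first k points as initial centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid (Euclidean distance)
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(xs) / len(members) for xs in zip(*members)]
    return labels, centroids

# Toy "embeddings": first axis ~ account support, second axis ~ coding help
prompts = [
    ("Reset my password", (0.9, 0.1)),
    ("Write a SQL join", (0.1, 0.9)),
    ("My account is locked", (0.8, 0.2)),
    ("Fix this Python error", (0.2, 0.8)),
    ("Change my email address", (0.85, 0.15)),
    ("Optimize this query", (0.15, 0.85)),
]
labels, centroids = kmeans([vec for _, vec in prompts], k=2)
for (text, _), label in zip(prompts, labels):
    print(label, text)
```

Once prompts carry cluster labels, per-cluster quality and cost metrics reduce to a group-by over traces.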

Deployment

Run the full stack with Docker. Self-host under the MIT license or use the managed cloud. Production-ready from day one.

# Install the SDK
pip install opentracy

# Or self-host the full stack
git clone https://github.com/lunar-org-ai/lunar-router.git
cd lunar-router && docker compose up -d
  • Full stack Docker deployment: ClickHouse + Go engine + Python API + React UI
  • Self-host under the MIT license -- your data stays on your infrastructure
  • Go engine for high-performance routing (<2ms overhead per request)
  • Python SDK: pip install opentracy
  • OpenAI SDK drop-in: just change base_url to your OpenTracy instance

Ready to take control of your LLM stack?

Open source, self-hostable, MIT licensed. Start in 5 minutes.