Introducing OpenTracy: Automated Distillation for Production LLMs

OpenTracy Team
announcement · product

Today we're launching OpenTracy, a platform that automatically creates Small Language Models from your production traces, cutting inference costs by 57% on average.


The Problem

Running LLMs in production is expensive. Most teams start with GPT-4 or Claude for quality, then struggle to optimize costs as they scale. The options are limited:

  • Prompt engineering: Limited gains, lots of trial and error
  • Caching: Only helps with exact matches
  • Cheaper models: Quality drops significantly

Our Solution

OpenTracy takes a different approach. We analyze your production traces (the actual inputs and outputs from your LLM calls) and use them to train a smaller, specialized model that handles your specific use case.
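In concrete terms, this means treating logged prompt–response pairs from the large model as supervised training data for a smaller one. A minimal sketch of that trace-to-training-data step, with hypothetical field names rather than OpenTracy's actual schema, might look like:

```python
# Hedged sketch: turn logged LLM traces into supervised fine-tuning pairs.
# The field names ("prompt", "completion", "rating") are illustrative,
# not OpenTracy's real trace format.

def traces_to_training_pairs(traces, min_rating=4):
    """Keep well-rated, complete traces and shape them for fine-tuning."""
    pairs = []
    for trace in traces:
        # Drop incomplete or low-quality traces before training.
        if not trace.get("completion") or trace.get("rating", 0) < min_rating:
            continue
        pairs.append({
            "input": trace["prompt"],
            "target": trace["completion"],
        })
    return pairs

traces = [
    {"prompt": "Summarize: ...", "completion": "A short summary.", "rating": 5},
    {"prompt": "Classify: ...", "completion": "", "rating": 5},       # incomplete
    {"prompt": "Translate: ...", "completion": "Bonjour.", "rating": 2},  # low quality
]
print(traces_to_training_pairs(traces))
# → [{'input': 'Summarize: ...', 'target': 'A short summary.'}]
```

The key idea is that the large model has already done the hard work in production; the small model only has to imitate it on your distribution of inputs.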

How It Works

  • Connect your traces: Point OpenTracy at your production logs
  • Automated curation: We filter and prepare high-quality training data
  • Distillation: Train a small model on your specific domain
  • Evaluation: Comprehensive testing against your success criteria
  • Deployment: One-click deploy to your infrastructure
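As a rough illustration, the five steps above compose into a linear pipeline. This is a hedged sketch with stub functions standing in for each stage; none of these names are OpenTracy's actual API:

```python
# Illustrative pipeline skeleton for the five steps above.
# Every function here is a stand-in, not a real OpenTracy interface.

def connect_traces(source):
    """Step 1: pull raw traces from production logs (stubbed data)."""
    return [{"prompt": "p1", "completion": "c1"},
            {"prompt": "p2", "completion": ""}]

def curate(traces):
    """Step 2: filter out incomplete traces."""
    return [t for t in traces if t["completion"]]

def distill(dataset):
    """Step 3: train a small model on the curated data (stubbed)."""
    return {"model": "small-model", "examples": len(dataset)}

def evaluate(model):
    """Step 4: test the model against success criteria (stubbed)."""
    return {"passed": True}

def deploy(model):
    """Step 5: ship the model only if evaluation passed."""
    return f"deployed:{model['model']}"

dataset = curate(connect_traces("production-logs"))
model = distill(dataset)
report = evaluate(model)
status = deploy(model) if report["passed"] else "rejected"
print(status)  # → deployed:small-model
```

The important property is the gate between steps 4 and 5: a model that fails evaluation never reaches deployment.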

Results

In our beta, customers saw:

  • 57% average cost reduction
  • Sub-100ms latency (down from 2-3 seconds)
  • 95%+ quality retention on domain-specific tasks

Get Started

OpenTracy is available today. Sign up for free at opentracy.dev and start cutting your LLM costs.