Articles

Reflections on LLM infrastructure, cost optimization, and AI-assisted development.

The Real Cost of LLM Routing: What Nobody Is Measuring

Most teams track token costs, but almost none track the full cost of routing decisions. We analyzed 2.3 million requests across 47 production deployments and found that routing overhead, not token pricing, is the dominant cost driver at scale.

research · cost-analysis

Introducing OpenTracy: Automated Distillation for Production LLMs

Today we're launching OpenTracy, a platform that automatically creates Small Language Models from your production traces, cutting inference costs by up to 57%.

announcement · product

Knowledge Distillation: How to Train Small Models from Large Ones

A technical deep-dive into knowledge distillation and how we use it to create production-ready Small Language Models.

technical · research

5 Strategies for Reducing LLM Inference Costs

Practical strategies for reducing your LLM inference costs, from simple optimizations to advanced techniques like distillation.

guide · optimization

How to Evaluate Small Language Models for Production

A comprehensive guide to evaluating SLMs, including metrics, test sets, and common pitfalls to avoid.

guide · evaluation

Self-Hosting Small Language Models: A Complete Guide

Everything you need to know about deploying SLMs to your own infrastructure, from hardware requirements to serving frameworks.

guide · deployment

OpenTracy SDK v2: Fallbacks, Streaming, and Cost Tracking

Announcing OpenTracy SDK v2 with automatic fallbacks, streaming support, built-in cost tracking, and async clients for Python and TypeScript.

announcement · sdk