ot.completion(
    model: str,
    messages: list[dict],
    *,
    api_key: Optional[str] = None,
    api_base: Optional[str] = None,
    temperature: Optional[float] = None,
    max_tokens: Optional[int] = None,
    top_p: Optional[float] = None,
    stream: bool = False,
    stop: Optional[str | list[str]] = None,
    tools: Optional[list[dict]] = None,
    tool_choice: Optional[str | dict] = None,
    timeout: float = 120.0,
    num_retries: int = 0,
    fallbacks: Optional[list[str]] = None,
    force_engine: bool = False,
    force_direct: bool = False,
    **kwargs,
) -> ModelResponse | Iterator[StreamChunk]

Parameters

| Name | Type | Description |
| --- | --- | --- |
| `model` | `str` | `"provider/model"` (e.g. `"openai/gpt-4o-mini"`) or a bare name that auto-detects (`"gpt-4o-mini"`, `"claude-3-haiku-20240307"`). `"auto"` means semantic routing — requires the engine. |
| `messages` | `list[dict]` | OpenAI-format messages: `[{"role": "user" \| "assistant" \| "system" \| "tool", "content": "..."}]`. |
| `api_key` | `str?` | Override the provider key from env. Defaults to the provider’s env var (`OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, …). |
| `api_base` | `str?` | Override the provider base URL. Useful for proxies, vLLM, local models. |
| `temperature` | `float?` | 0.0–2.0. Omitted if `None` (uses provider default). |
| `max_tokens` | `int?` | Output cap. |
| `top_p` | `float?` | Nucleus sampling. |
| `stream` | `bool` | `True` → returns an iterator of `StreamChunk`. |
| `stop` | `str \| list[str]?` | Stop sequence(s). |
| `tools` | `list[dict]?` | Function/tool definitions — OpenAI format. The engine translates to provider-native shapes. |
| `tool_choice` | `str \| dict?` | `"auto"`, `"required"`, `"none"`, or `{"type": "function", "function": {"name": "..."}}`. |
| `timeout` | `float` | Seconds. Default 120. |
| `num_retries` | `int` | Retries on transient errors before falling through to the next fallback model. |
| `fallbacks` | `list[str]?` | Other model strings to try in order if `model` fails. |
| `force_engine` | `bool` | Always route through the OpenTracy engine even if `OPENTRACY_ENGINE_URL` is unset. |
| `force_direct` | `bool` | Always call the provider directly, skipping any engine routing. |
| `**kwargs` | — | Passed through to the request body (e.g. `user`, `logprobs`, `response_format`). |

Returns

Non-streaming — ModelResponse

An OpenAI-compatible chat-completion dict with attribute access. Standard fields plus OpenTracy extras:
resp.id                                  # str
resp.choices[0].message.content          # the answer
resp.choices[0].message.tool_calls       # if tools were used
resp.usage.prompt_tokens                 # int
resp.usage.completion_tokens             # int
resp.usage.total_tokens                  # int

# Extras
resp._provider                           # "openai" | "anthropic" | ...
resp._cost                               # USD for this call (float)
resp._latency_ms                         # float
resp._routing                            # dict — alias, selected model, scores if engine route
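The `_cost` and `_latency_ms` extras make per-call spend easy to aggregate. A minimal sketch — the responses here are stand-in objects carrying the same attributes, not live calls:

```python
from types import SimpleNamespace

def total_cost(responses):
    """Sum the per-call USD cost reported in the `_cost` extra."""
    return sum(r._cost for r in responses)

# Stand-ins with the extras described above (values are illustrative).
responses = [
    SimpleNamespace(_provider="openai", _cost=0.00042, _latency_ms=310.0),
    SimpleNamespace(_provider="anthropic", _cost=0.00118, _latency_ms=540.0),
]
print(f"total: ${total_cost(responses):.5f}")  # total: $0.00160
```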

Streaming — Iterator[StreamChunk]

for chunk in ot.completion(..., stream=True):
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
StreamChunk mirrors OpenAI’s SSE delta format across all providers. The engine translates Anthropic / Bedrock event-streams into OpenAI SSE.
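Because every provider’s stream is normalized to the same delta shape, collecting the full reply is provider-agnostic. A sketch using stand-in chunks in place of a live stream:

```python
from types import SimpleNamespace

def collect_stream(chunks):
    """Concatenate the content deltas of a chunk iterator into the full reply."""
    parts = []
    for chunk in chunks:
        delta = chunk.choices[0].delta.content
        if delta:  # role-only or tool-call deltas carry no content
            parts.append(delta)
    return "".join(parts)

def fake_chunk(text):
    # Mimics the StreamChunk access path: chunk.choices[0].delta.content
    return SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=text))])

chunks = [fake_chunk("Hel"), fake_chunk("lo"), fake_chunk(None), fake_chunk("!")]
print(collect_stream(chunks))  # Hello!
```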

Examples

Basic call:
resp = ot.completion(
    model="openai/gpt-4o-mini",
    messages=[{"role": "user", "content": "hello"}],
    temperature=0,
    max_tokens=20,
)
Cross-provider with fallbacks:
resp = ot.completion(
    model="anthropic/claude-sonnet-4-6",
    messages=[...],
    fallbacks=["openai/gpt-4o", "deepseek/deepseek-chat"],
    num_retries=1,
)
Tool calling (provider-agnostic):
resp = ot.completion(
    model="anthropic/claude-sonnet-4-6",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "get_weather",
            "parameters": {"type": "object", "properties": {"city": {"type": "string"}}},
        },
    }],
    tool_choice="auto",
)
print(resp.choices[0].message.tool_calls)
Engine routing (semantic auto):
# Set OPENTRACY_ENGINE_URL="http://localhost:8080", or pass force_engine=True
import opentracy as ot

resp = ot.completion(
    model="auto",          # engine picks per-prompt based on learned clusters
    messages=[...],
    force_engine=True,
)
print(resp._routing)       # {"selected_model": "gpt-4o-mini", "cluster_id": 84, ...}

Async

import asyncio
import opentracy as ot

async def main():
    resp = await ot.acompletion(
        model="openai/gpt-4o-mini",
        messages=[{"role": "user", "content": "hi"}],
    )
    return resp.choices[0].message.content

asyncio.run(main())
acompletion takes the same parameters and returns the same shape.
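Since `acompletion` is a plain coroutine, it composes with `asyncio.gather` for concurrent fan-out. A sketch with the call stubbed out so the pattern runs offline — swap `fake_acompletion` for `ot.acompletion` in real use:

```python
import asyncio

async def fake_acompletion(model, messages):
    # Stand-in for ot.acompletion so the fan-out pattern runs without network access.
    await asyncio.sleep(0)
    return f"answer to: {messages[-1]['content']}"

async def fan_out(prompts):
    """Issue one completion per prompt, all concurrently."""
    tasks = [
        fake_acompletion("openai/gpt-4o-mini", [{"role": "user", "content": p}])
        for p in prompts
    ]
    return await asyncio.gather(*tasks)  # results come back in prompt order

results = asyncio.run(fan_out(["hi", "bye"]))
print(results)  # ['answer to: hi', 'answer to: bye']
```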

Errors

| Error | Meaning |
| --- | --- |
| `ValueError("Cannot resolve provider for model '...'")` | Bare model name that didn’t match any known prefix, and `OPENTRACY_ENGINE_URL` isn’t set. Add a `provider/` prefix or set the env var. |
| `ValueError("No API key for <provider>")` | Provider’s env var isn’t set and `api_key=` wasn’t passed. |
| `ValueError("Unknown provider: <name>")` | Provider string doesn’t match any of the 13 known providers. |
| `ImportError("openai package required for async...")` | `acompletion` needs the `openai` Python package (it’s in the default install — reinstall if you see this). |
| Provider-specific HTTP errors | Surfaced as `openai.APIError` / `urllib.error.HTTPError` with the provider’s status + message. |
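The `ValueError` cases above are configuration problems and aren’t retried, so it can be useful to separate them from transient failures at the call site. A sketch of that split, with the completion call stubbed out so the control flow is visible offline:

```python
def safe_completion(completion, **kwargs):
    """Call `completion`, mapping config-time ValueErrors to an (ok, payload) pair."""
    try:
        return True, completion(**kwargs)
    except ValueError as exc:
        # Unresolvable model / missing key / unknown provider: fix config, don't retry.
        return False, f"config error: {exc}"

def stub_completion(**kwargs):
    # Stand-in for ot.completion called with a bare model name and no engine URL set.
    raise ValueError("Cannot resolve provider for model 'mystery-model'")

ok, payload = safe_completion(stub_completion, model="mystery-model", messages=[])
print(ok, payload)  # False config error: Cannot resolve provider for model 'mystery-model'
```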