ot.completion(
    model: str,
    messages: list[dict],
    *,
    api_key: Optional[str] = None,
    api_base: Optional[str] = None,
    temperature: Optional[float] = None,
    max_tokens: Optional[int] = None,
    top_p: Optional[float] = None,
    stream: bool = False,
    stop: Optional[str | list[str]] = None,
    tools: Optional[list[dict]] = None,
    tool_choice: Optional[str | dict] = None,
    timeout: float = 120.0,
    num_retries: int = 0,
    fallbacks: Optional[list[str]] = None,
    force_engine: bool = False,
    force_direct: bool = False,
    **kwargs,
) -> ModelResponse | Iterator[StreamChunk]

Parameters

| Name | Type | Description |
| --- | --- | --- |
| `model` | `str` | `"provider/model"` (e.g. `"openai/gpt-4o-mini"`) or a bare name that auto-detects (`"gpt-4o-mini"`, `"claude-3-haiku-20240307"`). `"auto"` means semantic routing — requires the engine. |
| `messages` | `list[dict]` | OpenAI-format messages: `[{"role": "user" \| "assistant" \| "system" \| "tool", "content": "..."}]`. |
| `api_key` | `str?` | Override the provider key from env. Defaults to the provider’s env var (`OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, …). |
| `api_base` | `str?` | Override the provider base URL. Useful for proxies, vLLM, local models. |
| `temperature` | `float?` | 0.0–2.0. Omitted if `None` (uses provider default). |
| `max_tokens` | `int?` | Output cap. |
| `top_p` | `float?` | Nucleus sampling. |
| `stream` | `bool` | `True` → returns an iterator of `StreamChunk`. |
| `stop` | `str \| list[str]?` | Stop sequence(s). |
| `tools` | `list[dict]?` | Function/tool definitions — OpenAI format. The engine translates to provider-native shapes. |
| `tool_choice` | `str \| dict?` | `"auto"`, `"required"`, `"none"`, or `{"type": "function", "function": {"name": "..."}}`. |
| `timeout` | `float` | Seconds. Default 120. |
| `num_retries` | `int` | Retries on transient errors before falling through to the next fallback model. |
| `fallbacks` | `list[str]?` | Other model strings to try in order if `model` fails. |
| `force_engine` | `bool` | Always route through the OpenTracy engine even if `OPENTRACY_ENGINE_URL` is unset. |
| `force_direct` | `bool` | Always call the provider directly, skipping any engine routing. |
| `**kwargs` | — | Passed through to the request body (e.g. `user`, `logprobs`, `response_format`). |

Returns

Non-streaming — ModelResponse

An OpenAI-compatible chat-completion dict with attribute access. Standard fields plus OpenTracy extras:
resp.id                                  # str
resp.choices[0].message.content          # the answer
resp.choices[0].message.tool_calls       # if tools were used
resp.usage.prompt_tokens                 # int
resp.usage.completion_tokens             # int
resp.usage.total_tokens                  # int

# Extras
resp._provider                           # "openai" | "anthropic" | ...
resp._cost                               # USD for this call (float)
resp._latency_ms                         # float
resp._routing                            # dict — alias, selected model, scores if engine route
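The `_cost` and `_latency_ms` extras make per-call spend easy to aggregate. A minimal sketch — the responses here are stand-in objects carrying the same attributes, not live calls:

```python
from types import SimpleNamespace

def total_cost(responses):
    """Sum the per-call USD cost reported in the `_cost` extra."""
    return sum(r._cost for r in responses)

# Stand-ins with the extras described above (values are illustrative).
responses = [
    SimpleNamespace(_provider="openai", _cost=0.00042, _latency_ms=310.0),
    SimpleNamespace(_provider="anthropic", _cost=0.00118, _latency_ms=540.0),
]
print(f"total: ${total_cost(responses):.5f}")  # total: $0.00160
```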

Streaming — Iterator[StreamChunk]

for chunk in ot.completion(..., stream=True):
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
StreamChunk mirrors OpenAI’s SSE delta format across all providers. The engine translates Anthropic / Bedrock event-streams into OpenAI SSE.
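Because every provider’s stream is normalized to the same delta shape, collecting the full reply is provider-agnostic. A sketch using stand-in chunks in place of a live stream:

```python
from types import SimpleNamespace

def collect_stream(chunks):
    """Concatenate the content deltas of a chunk iterator into the full reply."""
    parts = []
    for chunk in chunks:
        delta = chunk.choices[0].delta.content
        if delta:  # role-only or tool-call deltas carry no content
            parts.append(delta)
    return "".join(parts)

def fake_chunk(text):
    # Mimics the StreamChunk access path: chunk.choices[0].delta.content
    return SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=text))])

chunks = [fake_chunk("Hel"), fake_chunk("lo"), fake_chunk(None), fake_chunk("!")]
print(collect_stream(chunks))  # Hello!
```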

Examples

Basic call:
resp = ot.completion(
    model="openai/gpt-4o-mini",
    messages=[{"role": "user", "content": "hello"}],
    temperature=0,
    max_tokens=20,
)
Cross-provider with fallbacks:
resp = ot.completion(
    model="anthropic/claude-sonnet-4-6",
    messages=[...],
    fallbacks=["openai/gpt-4o", "deepseek/deepseek-chat"],
    num_retries=1,
)
Tool calling (provider-agnostic):
resp = ot.completion(
    model="anthropic/claude-sonnet-4-6",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "get_weather",
            "parameters": {"type": "object", "properties": {"city": {"type": "string"}}},
        },
    }],
    tool_choice="auto",
)
print(resp.choices[0].message.tool_calls)
Engine routing (semantic auto):
# Set OPENTRACY_ENGINE_URL="http://localhost:8080", or pass force_engine=True
import opentracy as ot

resp = ot.completion(
    model="auto",          # engine picks per-prompt based on learned clusters
    messages=[...],
    force_engine=True,
)
print(resp._routing)       # {"selected_model": "gpt-4o-mini", "cluster_id": 84, ...}

Async

import asyncio
import opentracy as ot

async def main():
    resp = await ot.acompletion(
        model="openai/gpt-4o-mini",
        messages=[{"role": "user", "content": "hi"}],
    )
    return resp.choices[0].message.content

asyncio.run(main())
acompletion takes the same parameters and returns the same shape.
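Since `acompletion` is a plain coroutine, it composes with `asyncio.gather` for concurrent fan-out. A sketch with the call stubbed out so the pattern runs offline — swap `fake_acompletion` for `ot.acompletion` in real use:

```python
import asyncio

async def fake_acompletion(model, messages):
    # Stand-in for ot.acompletion so the fan-out pattern runs without network access.
    await asyncio.sleep(0)
    return f"answer to: {messages[-1]['content']}"

async def fan_out(prompts):
    """Issue one completion per prompt, all concurrently."""
    tasks = [
        fake_acompletion("openai/gpt-4o-mini", [{"role": "user", "content": p}])
        for p in prompts
    ]
    return await asyncio.gather(*tasks)  # results come back in prompt order

results = asyncio.run(fan_out(["hi", "bye"]))
print(results)  # ['answer to: hi', 'answer to: bye']
```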

Errors

| Error | Meaning |
| --- | --- |
| `ValueError("Cannot resolve provider for model '...'")` | Bare model name that didn’t match any known prefix, and `OPENTRACY_ENGINE_URL` isn’t set. Add a `provider/` prefix or set the env var. |
| `ValueError("No API key for <provider>")` | Provider’s env var isn’t set and `api_key=` wasn’t passed. |
| `ValueError("Unknown provider: <name>")` | Provider string doesn’t match any of the 13 known providers. |
| `ImportError("openai package required for async...")` | `acompletion` needs the `openai` Python package (it’s in the default install — reinstall if you see this). |
| Provider-specific HTTP errors | Surfaced as `openai.APIError` / `urllib.error.HTTPError` with the provider’s status + message. |
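The `ValueError` cases above are configuration problems and aren’t retried, so it can be useful to separate them from transient failures at the call site. A sketch of that split, with the completion call stubbed out so the control flow is visible offline:

```python
def safe_completion(completion, **kwargs):
    """Call `completion`, mapping config-time ValueErrors to an (ok, payload) pair."""
    try:
        return True, completion(**kwargs)
    except ValueError as exc:
        # Unresolvable model / missing key / unknown provider: fix config, don't retry.
        return False, f"config error: {exc}"

def stub_completion(**kwargs):
    # Stand-in for ot.completion called with a bare model name and no engine URL set.
    raise ValueError("Cannot resolve provider for model 'mystery-model'")

ok, payload = safe_completion(stub_completion, model="mystery-model", messages=[])
print(ok, payload)  # False config error: Cannot resolve provider for model 'mystery-model'
```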