GET /v1/models & /health

Two lightweight endpoints you’ll use when scripting against the engine without a client library.

GET /v1/models

Returns every model the engine knows about, with pricing and per-model routing stats.

GET /v1/models HTTP/1.1
Host: localhost:8080

Response

{
  "models": [
    {
      "model_id": "gpt-4o",
      "cost_per_1k_tokens": 0.015,
      "num_clusters": 100,
      "overall_accuracy": 0.92
    },
    {
      "model_id": "gpt-4o-mini",
      "cost_per_1k_tokens": 0.00015,
      "num_clusters": 100,
      "overall_accuracy": 0.81
    }
  ],
  "default_model": "gpt-4o-mini"
}

Field	Type	Meaning
`models[].model_id`	`string`	Canonical ID. Pair with a provider prefix for completions.
`models[].cost_per_1k_tokens`	`float`	USD per 1,000 tokens (blended input+output; see pricing tables).
`models[].num_clusters`	`int`	Clusters this model has an error profile for.
`models[].overall_accuracy`	`float`	Average accuracy across all profiled clusters.
`default_model`	`string`	Model used when `"model"` is `"auto"` and the router is unset.
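
One way to use these fields when scripting is to pick the cheapest model that clears an accuracy floor and estimate spend from the blended rate. The sketch below works on the sample response shown above; `cheapest_model` and `estimate_cost_usd` are hypothetical helper names, not part of the engine.

```python
from typing import Optional

# Sample payload copied from the example response above.
SAMPLE = {
    "models": [
        {"model_id": "gpt-4o", "cost_per_1k_tokens": 0.015,
         "num_clusters": 100, "overall_accuracy": 0.92},
        {"model_id": "gpt-4o-mini", "cost_per_1k_tokens": 0.00015,
         "num_clusters": 100, "overall_accuracy": 0.81},
    ],
    "default_model": "gpt-4o-mini",
}


def cheapest_model(payload: dict, min_accuracy: float) -> Optional[str]:
    """Return the ID of the cheapest model meeting min_accuracy, or None."""
    candidates = [
        m for m in payload["models"] if m["overall_accuracy"] >= min_accuracy
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda m: m["cost_per_1k_tokens"])["model_id"]


def estimate_cost_usd(payload: dict, model_id: str, tokens: int) -> float:
    """Estimate spend from the blended per-1k rate (KeyError if ID is unknown)."""
    rates = {m["model_id"]: m["cost_per_1k_tokens"] for m in payload["models"]}
    return rates[model_id] * tokens / 1000


if __name__ == "__main__":
    print(cheapest_model(SAMPLE, 0.9))    # gpt-4o
    print(cheapest_model(SAMPLE, 0.8))    # gpt-4o-mini
    print(estimate_cost_usd(SAMPLE, "gpt-4o", 2000))
```

Because `cost_per_1k_tokens` is a blended figure, the estimate is only approximate for workloads that are heavily skewed toward input or output tokens.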

Curl

curl -s http://localhost:8080/v1/models | jq .

GET /health

Used by load balancers and health checks. Both the gateway (`:8080`) and the management API (`:8000`) expose it.

curl -s http://localhost:8080/health
curl -s http://localhost:8000/health

Response (`:8080` — gateway)

{
  "status": "healthy",
  "router_initialized": true,
  "num_models": 12,
  "num_clusters": 100,
  "embedder_ready": true
}

Response (`:8000` — management API)

{
  "status": "healthy",
  "router_initialized": true,
  "num_models": 12,
  "num_clusters": 100
}

`status` is one of `healthy`, `degraded` (for example, router loaded but embedder down), or `unhealthy`. Treat anything other than `healthy` as “don’t route new traffic”.
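
The routing rule can be sketched as a small gate in a health-check script. This is a minimal example using only the Python standard library. `fetch_health` and `is_routable` are hypothetical names, and treating connection errors, non-2xx responses, and unparseable bodies as "not routable" is an assumption rather than documented engine behavior.

```python
import json
import urllib.request


def fetch_health(base_url: str, timeout: float = 2.0) -> dict:
    """GET {base_url}/health and return the parsed JSON body."""
    with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
        return json.load(resp)


def is_routable(health: dict) -> bool:
    """True only when status is exactly "healthy"; a missing status fails."""
    return health.get("status") == "healthy"


if __name__ == "__main__":
    targets = [
        ("gateway", "http://localhost:8080"),
        ("management API", "http://localhost:8000"),
    ]
    for name, url in targets:
        try:
            ok = is_routable(fetch_health(url))
        except (OSError, ValueError):
            # Assumption: network errors and bad JSON count as "hold".
            ok = False
        print(f"{name}: {'route' if ok else 'hold'}")
```

Checking for equality with `healthy`, rather than excluding `unhealthy`, keeps `degraded` and any unrecognized status on the safe side of the gate.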
