Distillation endpoints

The management API on port 8000 exposes the distillation pipeline over REST. Use this if you’re driving training from a language that doesn’t have a Python client — CI jobs, a TypeScript backend, or a Rust CLI, for example.

The REST endpoints backing the Python Distiller client. Any call you make through the SDK can also be made over HTTP.

POST /v1/distillation

Create a new distillation job. Returns immediately with status: "pending"; training happens asynchronously on the engine host.

POST /v1/distillation HTTP/1.1
Host: localhost:8000
Content-Type: application/json

Request body

{
  "tenant_id": "default",
  "name": "ticket-triage v1",
  "description": "distill GPT-4o onto a 1B llama for support tickets",
  "config": {
    "teacher_model": "openai/gpt-4o",
    "student_model": "llama-3.2-1b",
    "num_prompts": 500,
    "n_samples": 4,
    "training_steps": 100,
    "bond_beta": 0.5,
    "bond_gamma": 0.1,
    "temperature": 0.8,
    "export_gguf": true,
    "quantization_types": ["q4_k_m", "q8_0"]
  }
}

Field	Type	Notes
`tenant_id`	`string`	Workspace key. Defaults to `"default"`.
`name`	`string`	Human label.
`description`	`string`	Optional.
`config.teacher_model`	`string`	Provider-prefixed, e.g. `openai/gpt-4o`.
`config.student_model`	`string`	HF-style ID, e.g. `llama-3.2-1b`.
`config.num_prompts`	`int`	Cap on dataset rows to use.
`config.n_samples`	`int`	Best-of-N candidates per prompt (default 4).
`config.training_steps`	`int`	Fine-tune steps.
`config.bond_beta`	`float`	BOND preference weight (default 0.5).
`config.bond_gamma`	`float`	KL regularization strength (default 0.1).
`config.export_gguf`	`bool`	Convert trained adapter to GGUF after training.
`config.quantization_types`	`array of string`	Quantization flavors, e.g. `["q4_k_m", "q8_0"]`.

Response

{
  "id": "job_abc123",
  "name": "ticket-triage v1",
  "tenant_id": "default",
  "status": "pending",
  "phase": "initializing",
  "progress": {},
  "results": {},
  "cost_accrued": 0.0,
  "created_at": "2026-04-19T12:00:00Z",
  "updated_at": "2026-04-19T12:00:00Z"
}

Curl

curl -X POST http://localhost:8000/v1/distillation \
  -H "Content-Type: application/json" \
  -d '{
    "name": "demo",
    "config": {
      "teacher_model": "openai/gpt-4o-mini",
      "student_model": "llama-3.2-1b",
      "num_prompts": 50,
      "training_steps": 30
    }
  }'

GET /v1/distillation/

Fetch the current state of a job.

GET /v1/distillation/job_abc123?tenant_id=default HTTP/1.1
Host: localhost:8000

Response

{
  "id": "job_abc123",
  "status": "training",
  "phase": "data_generation",
  "progress": {
    "prompts_done": 120,
    "prompts_total": 500,
    "training_step": 45
  },
  "results": {},
  "cost_accrued": 0.82,
  "created_at": "2026-04-19T12:00:00Z",
  "updated_at": "2026-04-19T12:03:41Z"
}

Status values progress: pending → running → completed | failed | cancelled. phase is more granular: initializing → data_generation → curation → training → export → (done).

Polling idiom

while true; do
  state=$(curl -s "http://localhost:8000/v1/distillation/$JOB?tenant_id=default")
  status=$(echo "$state" | jq -r .status)
  echo "status=$status phase=$(echo "$state" | jq -r .phase)"
  [ "$status" = "completed" ] || [ "$status" = "failed" ] && break
  sleep 10
done

GET /v1/distillation — list jobs

curl -s "http://localhost:8000/v1/distillation?tenant_id=default&limit=20"

Response

{
  "jobs": [ /* same shape as GET /{id} */ ],
  "total": 42,
  "has_more": true
}

Supported query params: tenant_id, status, limit (max 100), offset.

POST /v1/distillation//cancel

Cancel a running job. Safe at any phase — partial artifacts are kept.

curl -X POST http://localhost:8000/v1/distillation/job_abc123/cancel

GET /v1/distillation//artifacts

Fetch file paths on the engine host for the trained adapter + GGUF exports. Paths are relative to the engine’s OPENTRACY_DATA_DIR.

{
  "adapter_path": "/app/data/distillation/job_abc123/adapter/",
  "gguf_paths": {
    "q4_k_m": "/app/data/distillation/job_abc123/gguf/model-q4_k_m.gguf",
    "q8_0":   "/app/data/distillation/job_abc123/gguf/model-q8_0.gguf"
  },
  "tokenizer_path": "/app/data/distillation/job_abc123/adapter/tokenizer.model",
  "config_path":    "/app/data/distillation/job_abc123/train_config.json"
}

Errors

Status	`error.code`	Meaning
`400`	`invalid_config`	Unknown model, missing required field, or bad range.
`402`	`insufficient_credits`	Cost estimate exceeds tenant’s budget.
`404`	`job_not_found`	`job_id` doesn’t exist (or belongs to another tenant).
`409`	`job_already_running`	Attempted to mutate a terminal job.
`500`	`training_error`	Subprocess crashed — see `logs` endpoint for details.

​POST /v1/distillation

​Request body

​Response

​Curl

​GET /v1/distillation/

​Response

​Polling idiom

​GET /v1/distillation — list jobs

​Response

​POST /v1/distillation//cancel

​GET /v1/distillation//artifacts

​Errors

POST /v1/distillation

Request body

Response

Curl

GET /v1/distillation/

Response

Polling idiom

GET /v1/distillation — list jobs

Response

POST /v1/distillation//cancel

GET /v1/distillation//artifacts

Errors