The management API on port 8000 exposes the distillation pipeline over REST. Use it when you drive training from an environment where the Python client is not available: CI jobs, a TypeScript backend, or a Rust CLI, for example.
These are the REST endpoints that back the Python Distiller client, so any call you make through the SDK can also be made over HTTP.

POST /v1/distillation

Create a new distillation job. Returns immediately with status: "pending"; training happens asynchronously on the engine host.
POST /v1/distillation HTTP/1.1
Host: localhost:8000
Content-Type: application/json

Request body

{
  "tenant_id": "default",
  "name": "ticket-triage v1",
  "description": "distill GPT-4o onto a 1B llama for support tickets",
  "config": {
    "teacher_model": "openai/gpt-4o",
    "student_model": "llama-3.2-1b",
    "num_prompts": 500,
    "n_samples": 4,
    "training_steps": 100,
    "bond_beta": 0.5,
    "bond_gamma": 0.1,
    "temperature": 0.8,
    "export_gguf": true,
    "quantization_types": ["q4_k_m", "q8_0"]
  }
}
| Field | Type | Notes |
| --- | --- | --- |
| tenant_id | string | Workspace key. Defaults to "default". |
| name | string | Human label. |
| description | string | Optional. |
| config.teacher_model | string | Provider-prefixed, e.g. openai/gpt-4o. |
| config.student_model | string | HF-style ID, e.g. llama-3.2-1b. |
| config.num_prompts | int | Cap on dataset rows to use. |
| config.n_samples | int | Best-of-N candidates per prompt (default 4). |
| config.training_steps | int | Fine-tune steps. |
| config.bond_beta | float | BOND preference weight (default 0.5). |
| config.bond_gamma | float | KL regularization strength (default 0.1). |
| config.export_gguf | bool | Convert trained adapter to GGUF after training. |
| config.quantization_types | array of string | Quantization flavors, e.g. ["q4_k_m", "q8_0"]. |
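
The documented defaults can be mirrored client-side so a caller only spells out what differs. The sketch below is a minimal illustration, not an SDK class: `DistillationConfig` and `to_payload` are hypothetical names, the choice of which fields are mandatory follows the minimal curl example on this page rather than a documented rule, and the `export_gguf` default of `False` is an assumption because the server default is not stated.

```python
from __future__ import annotations

from dataclasses import asdict, dataclass, field


@dataclass
class DistillationConfig:
    """Hypothetical client-side mirror of the `config` object."""

    teacher_model: str          # provider-prefixed, e.g. "openai/gpt-4o"
    student_model: str          # e.g. "llama-3.2-1b"
    num_prompts: int            # cap on dataset rows to use
    training_steps: int         # fine-tune steps
    n_samples: int = 4          # documented default
    bond_beta: float = 0.5      # documented default
    bond_gamma: float = 0.1     # documented default
    export_gguf: bool = False   # assumption: server default is not documented
    quantization_types: list[str] = field(default_factory=list)

    def to_payload(self) -> dict:
        """Return the dict to place under "config" in the request body."""
        if min(self.num_prompts, self.training_steps, self.n_samples) <= 0:
            raise ValueError("num_prompts, training_steps and n_samples must be positive")
        payload = asdict(self)
        if not self.export_gguf:
            # Omit the GGUF fields so the server applies its own defaults.
            payload.pop("export_gguf")
            payload.pop("quantization_types")
        return payload
```

Catching an obviously bad range locally avoids a round trip that would end in a 400 invalid_config response.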

Response

{
  "id": "job_abc123",
  "name": "ticket-triage v1",
  "tenant_id": "default",
  "status": "pending",
  "phase": "initializing",
  "progress": {},
  "results": {},
  "cost_accrued": 0.0,
  "created_at": "2026-04-19T12:00:00Z",
  "updated_at": "2026-04-19T12:00:00Z"
}

Curl

curl -X POST http://localhost:8000/v1/distillation \
  -H "Content-Type: application/json" \
  -d '{
    "name": "demo",
    "config": {
      "teacher_model": "openai/gpt-4o-mini",
      "student_model": "llama-3.2-1b",
      "num_prompts": 50,
      "training_steps": 30
    }
  }'
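
For callers that prefer a script over curl, the same request can be assembled with Python's standard library. This is a sketch under the assumptions shown on this page (host `localhost:8000`, no authentication header); `build_create_request` and `create_job` are illustrative helper names, not part of the SDK.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # management API host and port from this page


def build_create_request(name, config, tenant_id="default", base_url=BASE_URL):
    """Build (but do not send) the POST /v1/distillation request."""
    body = {"tenant_id": tenant_id, "name": name, "config": config}
    return urllib.request.Request(
        url=f"{base_url}/v1/distillation",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def create_job(name, config, tenant_id="default", timeout=30):
    """Send the request and return the decoded job object."""
    request = build_create_request(name, config, tenant_id)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Splitting the builder from the sender keeps the payload inspectable without a running engine; `create_job("demo", {...})["id"]` then yields the job id to poll.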

GET /v1/distillation/{job_id}

Fetch the current state of a job.
GET /v1/distillation/job_abc123?tenant_id=default HTTP/1.1
Host: localhost:8000

Response

{
  "id": "job_abc123",
  "status": "running",
  "phase": "data_generation",
  "progress": {
    "prompts_done": 120,
    "prompts_total": 500,
    "training_step": 45
  },
  "results": {},
  "cost_accrued": 0.82,
  "created_at": "2026-04-19T12:00:00Z",
  "updated_at": "2026-04-19T12:03:41Z"
}
Status values progress: pending → running → completed | failed | cancelled. phase is more granular: initializing → data_generation → curation → training → export → (done).
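
The two sequences above can be encoded as small constants so client code does not scatter string comparisons. The following is a minimal sketch; the helper names are hypothetical, and how the API reports the final "(done)" step is not specified here, so only the five named phases are modelled.

```python
# Statuses after which a job no longer changes.
TERMINAL_STATUSES = frozenset({"completed", "failed", "cancelled"})

# Phases in the order the pipeline walks through them.
PHASES = ("initializing", "data_generation", "curation", "training", "export")


def is_terminal(status):
    """True once the job has completed, failed, or been cancelled."""
    return status in TERMINAL_STATUSES


def phase_position(phase):
    """Return (1-based position, total); raises ValueError for an unknown phase."""
    return PHASES.index(phase) + 1, len(PHASES)


def describe(job):
    """One-line summary of a job object returned by the API."""
    position, total = phase_position(job["phase"])
    state = "done" if is_terminal(job["status"]) else "in progress"
    return f"{job['id']}: {job['status']} ({state}), phase {position}/{total} {job['phase']}"
```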

Polling idiom

# JOB holds the id returned by POST /v1/distillation, e.g. JOB=job_abc123
while true; do
  state=$(curl -s "http://localhost:8000/v1/distillation/$JOB?tenant_id=default")
  status=$(echo "$state" | jq -r .status)
  echo "status=$status phase=$(echo "$state" | jq -r .phase)"
  # Stop on every terminal status, including cancelled
  case "$status" in completed|failed|cancelled) break ;; esac
  sleep 10
done
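
The same polling loop can be written in Python with a deadline, so a stuck job does not block a CI run forever. This is a sketch only: `fetch_job` and `wait_for_job` are hypothetical helpers, and the one-hour default timeout is an arbitrary assumption, not a documented limit. The fetch, sleep, and clock functions are injectable so the loop can be exercised without a running engine.

```python
import json
import time
import urllib.parse
import urllib.request

TERMINAL_STATUSES = {"completed", "failed", "cancelled"}


def fetch_job(job_id, tenant_id="default", base_url="http://localhost:8000"):
    """GET the job and return the decoded JSON body."""
    query = urllib.parse.urlencode({"tenant_id": tenant_id})
    url = f"{base_url}/v1/distillation/{urllib.parse.quote(job_id)}?{query}"
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)


def wait_for_job(job_id, fetch=fetch_job, interval=10.0, timeout=3600.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll until the job reaches a terminal status or `timeout` seconds pass."""
    deadline = clock() + timeout
    while True:
        job = fetch(job_id)
        if job["status"] in TERMINAL_STATUSES:
            return job
        if clock() >= deadline:
            raise TimeoutError(f"job {job_id} still {job['status']} after {timeout}s")
        sleep(interval)
```

The caller still has to inspect the returned status, since a failed or cancelled job ends the wait just as a completed one does.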

GET /v1/distillation — list jobs

curl -s "http://localhost:8000/v1/distillation?tenant_id=default&limit=20"

Response

{
  "jobs": [ /* same shape as GET /{id} */ ],
  "total": 42,
  "has_more": true
}
Supported query params: tenant_id, status, limit (max 100), offset.
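
With limit capped at 100, listing every job means following has_more across pages. The generator below is a minimal sketch of that loop; `fetch_page` and `iter_jobs` are hypothetical names, and it assumes offset counts jobs rather than pages, which this page does not state explicitly.

```python
import json
import urllib.parse
import urllib.request


def fetch_page(params, base_url="http://localhost:8000"):
    """GET /v1/distillation with the given query parameters."""
    url = f"{base_url}/v1/distillation?{urllib.parse.urlencode(params)}"
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)


def iter_jobs(tenant_id="default", status=None, limit=100, fetch=fetch_page):
    """Yield every job for a tenant, one page at a time."""
    if not 1 <= limit <= 100:
        raise ValueError("limit must be between 1 and 100")
    offset = 0
    while True:
        params = {"tenant_id": tenant_id, "limit": limit, "offset": offset}
        if status is not None:
            params["status"] = status
        page = fetch(params)
        jobs = page["jobs"]
        yield from jobs
        # Stop when the server reports no more pages, or on an empty page
        # (which guards against looping forever on inconsistent has_more).
        if not page.get("has_more") or not jobs:
            return
        offset += len(jobs)
```

For example, `[job["id"] for job in iter_jobs(status="failed")]` collects the ids of all failed jobs for the default tenant.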

POST /v1/distillation/{job_id}/cancel

Cancel a running job. Safe at any phase — partial artifacts are kept.
curl -X POST http://localhost:8000/v1/distillation/job_abc123/cancel

GET /v1/distillation/{job_id}/artifacts

Fetch the file paths on the engine host for the trained adapter and its GGUF exports. Paths are located under the engine’s OPENTRACY_DATA_DIR and, as in the example below, are returned in absolute form.
{
  "adapter_path": "/app/data/distillation/job_abc123/adapter/",
  "gguf_paths": {
    "q4_k_m": "/app/data/distillation/job_abc123/gguf/model-q4_k_m.gguf",
    "q8_0":   "/app/data/distillation/job_abc123/gguf/model-q8_0.gguf"
  },
  "tokenizer_path": "/app/data/distillation/job_abc123/adapter/tokenizer.model",
  "config_path":    "/app/data/distillation/job_abc123/train_config.json"
}
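
A consumer usually wants one GGUF file out of this response. The sketch below picks the first available quantization from a preference list; `pick_gguf` is a hypothetical helper and the preference order is only an example. Because the paths refer to the engine host, the caller needs access to that filesystem (for instance through a shared volume) before opening the file.

```python
from pathlib import PurePosixPath


def pick_gguf(artifacts, preferred=("q4_k_m", "q8_0")):
    """Return (quantization, path) for the first preferred type that was exported."""
    gguf_paths = artifacts.get("gguf_paths") or {}
    for quant in preferred:
        if quant in gguf_paths:
            # PurePosixPath parses the string without touching the local disk.
            return quant, PurePosixPath(gguf_paths[quant])
    raise LookupError(f"none of {list(preferred)} found in {sorted(gguf_paths)}")
```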

Errors

| Status | error.code | Meaning |
| --- | --- | --- |
| 400 | invalid_config | Unknown model, missing required field, or bad range. |
| 402 | insufficient_credits | Cost estimate exceeds tenant’s budget. |
| 404 | job_not_found | job_id doesn’t exist (or belongs to another tenant). |
| 409 | job_already_running | Attempted to mutate a terminal job. |
| 500 | training_error | Subprocess crashed — see logs endpoint for details. |
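
Client code can turn these responses into a typed exception so callers branch on the code instead of parsing strings. The sketch below assumes the error body is JSON of the form `{"error": {"code": "...", "message": "..."}}`; the nested `code` key is inferred from the `error.code` column, while the `message` key and the `DistillationAPIError` class are assumptions made for illustration.

```python
import json

# HTTP status documented for each error.code.
DOCUMENTED_CODES = {
    "invalid_config": 400,
    "insufficient_credits": 402,
    "job_not_found": 404,
    "job_already_running": 409,
    "training_error": 500,
}


class DistillationAPIError(Exception):
    """Hypothetical exception type; not part of any SDK."""

    def __init__(self, status, code, message=""):
        detail = f"{status} {code}"
        if message:
            detail += f": {message}"
        super().__init__(detail)
        self.status = status
        self.code = code


def parse_error(status, raw_body):
    """Build a DistillationAPIError from an HTTP status and response body."""
    try:
        error = json.loads(raw_body).get("error", {})
    except (ValueError, AttributeError):
        error = {}  # body was not JSON, or not a JSON object
    if not isinstance(error, dict):
        error = {}
    return DistillationAPIError(status, error.get("code", "unknown"), error.get("message", ""))
```

With `urllib`, this fits inside an `except urllib.error.HTTPError as exc:` handler as `raise parse_error(exc.code, exc.read()) from exc`.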