API reference

Base URL: https://porten.ai/v1. Authenticate with Authorization: Bearer sk-porten-…. The surface is OpenAI-compatible, so any OpenAI SDK works by overriding base_url.
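As a minimal sketch of the authentication scheme above, the following prepares (but does not send) a chat request using only the Python standard library. The key `sk-porten-EXAMPLE` is a placeholder, `build_chat_request` is a hypothetical helper name, and the `Content-Type` header is an assumption based on the JSON request body.

```python
import json
import urllib.request

BASE_URL = "https://porten.ai/v1"  # base URL stated on this page


def build_chat_request(api_key, payload):
    """Prepare a POST to /chat/completions without performing any network I/O."""
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",  # assumed, since the body is JSON
        },
        method="POST",
    )


# Placeholder key and a minimal payload, for illustration only.
req = build_chat_request(
    "sk-porten-EXAMPLE",
    {
        "model": "qwen2.5-coder-32b",
        "messages": [{"role": "user", "content": "Hello"}],
    },
)
# Sending is left to the caller, e.g. urllib.request.urlopen(req).
```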

POST /v1/chat/completions

The core endpoint. Supports both streaming and non-streaming responses.

Request:

{
  "model": "qwen2.5-coder-32b",
  "messages": [
    { "role": "system", "content": "You are a helpful assistant." },
    { "role": "user", "content": "Write a haiku about the aurora." }
  ],
  "temperature": 0.7,
  "max_tokens": 256,
  "stream": false
}

Response (non-streaming):

{
  "id": "chatcmpl-porten-7f3a2b",
  "object": "chat.completion",
  "model": "qwen2.5-coder-32b",
  "choices": [
    { "index": 0,
      "message": { "role": "assistant", "content": "Green fire dances…" },
      "finish_reason": "stop" }
  ],
  "usage": { "prompt_tokens": 32, "completion_tokens": 41, "total_tokens": 73 }
}
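A short sketch of reading that response: it parses the sample body above, pulls out the first choice, and checks that the usage counts add up. `first_answer` is a hypothetical helper name, not part of any SDK.

```python
import json

# The non-streaming sample response from this page, as raw body text.
raw_body = """
{
  "id": "chatcmpl-porten-7f3a2b",
  "object": "chat.completion",
  "model": "qwen2.5-coder-32b",
  "choices": [
    { "index": 0,
      "message": { "role": "assistant", "content": "Green fire dances…" },
      "finish_reason": "stop" }
  ],
  "usage": { "prompt_tokens": 32, "completion_tokens": 41, "total_tokens": 73 }
}
"""


def first_answer(response):
    """Return (text, finish_reason) for the first choice."""
    choice = response["choices"][0]
    return choice["message"]["content"], choice["finish_reason"]


response = json.loads(raw_body)
text, finish_reason = first_answer(response)

# prompt_tokens + completion_tokens should equal total_tokens.
usage = response["usage"]
counted = usage["prompt_tokens"] + usage["completion_tokens"]
```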

Response (streaming, stream: true) — OpenAI-style SSE:

data: {"id":"…","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Green"},"finish_reason":null}]}

data: {"id":"…","object":"chat.completion.chunk","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]
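The stream above can be consumed roughly as follows. This is a sketch that works on an iterable of already-decoded text lines; `iter_sse_payloads` and `collect_text` are hypothetical helper names. Skipping chunks with an empty `choices` list is an assumption borrowed from OpenAI's usage-only final chunk, which may or may not apply here.

```python
import json


def iter_sse_payloads(lines):
    """Yield the JSON object from each 'data:' line until the [DONE] sentinel."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # blank separator lines and anything that is not a data line
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            return
        yield json.loads(data)


def collect_text(lines):
    """Concatenate delta.content across chunks; return (text, finish_reason)."""
    parts = []
    finish_reason = None
    for chunk in iter_sse_payloads(lines):
        if not chunk.get("choices"):
            continue  # assumed: a usage-only chunk carries no choices
        choice = chunk["choices"][0]
        parts.append(choice.get("delta", {}).get("content") or "")
        finish_reason = choice.get("finish_reason") or finish_reason
    return "".join(parts), finish_reason


# The sample stream from this page, one list entry per line.
sample_stream = [
    'data: {"id":"…","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Green"},"finish_reason":null}]}',
    "",
    'data: {"id":"…","object":"chat.completion.chunk","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}',
    "",
    "data: [DONE]",
]

streamed_text, streamed_finish = collect_text(sample_stream)
```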

Parameters

| Param | Status | Behaviour |
| --- | --- | --- |
| model | | Canonical id; unknown → 404 model_not_found |
| messages | | system / user / assistant / tool roles |
| stream | | SSE chunks |
| stream_options.include_usage | | Hub fills usage from its own count |
| max_tokens / max_completion_tokens | | Both accepted |
| temperature, top_p, stop, seed | ⚠️ | Passed to the engine; seed honoured only if the engine supports it |
| presence_penalty, frequency_penalty | ⚠️ | Passed through; ignored if the engine lacks them |
| response_format | | JSON mode / JSON schema, forwarded to the engine (best-effort per engine) |
| tools, tool_choice | | Routed to a model whose catalog entry declares tools (the same capability /v1/models shows); the response carries tool_calls and finish_reason: "tool_calls". Requesting tools from a model without tool support returns 400 with a clear message, not a capacity error |
| content with image_url | | Inline data: images are forwarded to vision models. Remote http(s) image URLs are not fetched (SSRF protection); inline them as data URLs instead |
| n | ⚠️ | Only n=1 is supported; n>1 → 400 unsupported_parameter |
| user, metadata | | Logged for usage/abuse |

Principle: unknown convenience params are ignored silently (forward-compatible); params that would change semantics but can't be honoured (n>1) are rejected with 400 rather than silently producing the wrong result.
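For the image_url entry in the parameter list above, a sketch of inlining an image as a data URL. The content-part shape (`type: "text"` / `type: "image_url"`) is assumed from the OpenAI chat format this API says it is compatible with; `image_part` and `vision_message` are hypothetical helper names, and the bytes are a stand-in rather than a real image.

```python
import base64


def image_part(image_bytes, mime_type):
    """Encode raw image bytes as an inline data: URL content part."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
    }


def vision_message(prompt, image_bytes, mime_type="image/png"):
    """Build a user message carrying text plus one inlined image."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            image_part(image_bytes, mime_type),
        ],
    }


# Stand-in bytes (just the PNG signature), not a decodable image.
fake_png = b"\x89PNG\r\n\x1a\n"
message = vision_message("What is in this picture?", fake_png)
data_url = message["content"][1]["image_url"]["url"]
```

In practice the bytes would come from reading a local file, or from downloading the remote image yourself before building the message.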

Reasoning models

Models that "think" (e.g. DeepSeek-R1 family) return their reasoning separately as reasoning_content (a delta field in streaming), kept distinct from the answer text — so you can show or hide the chain of thought.
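A sketch of keeping the two channels apart while streaming. It assumes each chunk is already parsed into a dict and that a delta carries `reasoning_content`, `content`, or neither; the sample chunks are made up for illustration and `split_reasoning` is a hypothetical helper name.

```python
def split_reasoning(chunks):
    """Accumulate reasoning_content and content deltas into separate strings."""
    reasoning, answer = [], []
    for chunk in chunks:
        for choice in chunk.get("choices", []):
            delta = choice.get("delta", {})
            reasoning.append(delta.get("reasoning_content") or "")
            answer.append(delta.get("content") or "")
    return "".join(reasoning), "".join(answer)


# Made-up chunks: reasoning arrives first, then the answer text.
sample_chunks = [
    {"choices": [{"index": 0, "delta": {"reasoning_content": "Count the syllables. "}, "finish_reason": None}]},
    {"choices": [{"index": 0, "delta": {"content": "Green fire"}, "finish_reason": None}]},
    {"choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}]},
]

reasoning_text, answer_text = split_reasoning(sample_chunks)
```

A UI can then render `answer_text` directly and put `reasoning_text` behind a toggle.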

Budget enough tokens. Reasoning is generated before the answer and counts against max_tokens. With a small cap the model can spend the whole budget thinking and you get finish_reason: "length" with empty content. For reasoning models, set max_tokens to at least 4096 (more for hard prompts).
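One way to detect that failure mode and retry with more room, as a sketch. The 4096 floor comes from the paragraph above; doubling the budget on each retry is an assumption, and the helper names and the model id are hypothetical.

```python
MIN_REASONING_BUDGET = 4096  # floor suggested on this page for reasoning models


def ran_out_while_thinking(response):
    """True if generation stopped at the token cap before any answer text appeared."""
    choice = response["choices"][0]
    content = choice["message"].get("content") or ""
    return choice["finish_reason"] == "length" and content.strip() == ""


def with_larger_budget(payload, factor=2):
    """Return a copy of the payload with max_tokens raised (assumed policy: double it)."""
    current = payload.get("max_tokens", 0)
    return {**payload, "max_tokens": max(MIN_REASONING_BUDGET, current * factor)}


# Made-up response: the whole budget went to reasoning, so content is empty.
truncated = {
    "choices": [
        {"index": 0,
         "message": {"role": "assistant", "content": ""},
         "finish_reason": "length"}
    ]
}

payload = {"model": "example-reasoning-model", "max_tokens": 256}  # hypothetical id
if ran_out_while_thinking(truncated):
    payload = with_larger_budget(payload)
```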

POST /v1/embeddings

Request:

{ "model": "nomic-embed-text", "input": ["text to embed", "and another"] }

Response:

{
  "object": "list",
  "data": [
    { "object": "embedding", "index": 0, "embedding": [0.0123, -0.045] },
    { "object": "embedding", "index": 1, "embedding": [0.0210, -0.011] }
  ],
  "model": "nomic-embed-text",
  "usage": { "prompt_tokens": 12, "total_tokens": 12 }
}
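A sketch of using that response: put the vectors back in input order via `index`, then compare them with cosine similarity. The two-element vectors are the shortened samples shown above, not real embedding lengths, and both helper names are hypothetical.

```python
import math


def vectors_in_order(response):
    """Return the embeddings sorted by their index field (i.e. input order)."""
    items = sorted(response["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# The sample response from this page, listed out of order on purpose.
embedding_response = {
    "object": "list",
    "data": [
        {"object": "embedding", "index": 1, "embedding": [0.0210, -0.011]},
        {"object": "embedding", "index": 0, "embedding": [0.0123, -0.045]},
    ],
    "model": "nomic-embed-text",
    "usage": {"prompt_tokens": 12, "total_tokens": 12},
}

first, second = vectors_in_order(embedding_response)
similarity = cosine_similarity(first, second)
```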

GET /v1/models

Every offered model, aggregated and deduplicated across the fleet.

{
  "object": "list",
  "data": [
    { "id": "qwen2.5-coder-32b", "object": "model", "owned_by": "porten",
      "x_porten": { "ready": true, "type": "chat", "ctx": 32768 } },
    { "id": "qwen3-coder-next", "object": "model", "owned_by": "porten",
      "x_porten": { "ready": false, "type": "chat", "ctx": 262144 } }
  ]
}

ready: false means the model is offered but not loaded this instant — your first request will trigger an on-demand load. See Models & on-demand loading.
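A sketch of filtering that listing on the `x_porten` fields, for example to prefer models that are already loaded. `pick_chat_models` is a hypothetical helper name; the data is the sample response above.

```python
def pick_chat_models(listing, ready=True, min_ctx=0):
    """Return ids of chat models matching the ready flag and a minimum context size."""
    ids = []
    for model in listing["data"]:
        info = model.get("x_porten", {})
        if (
            info.get("type") == "chat"
            and bool(info.get("ready")) == ready
            and info.get("ctx", 0) >= min_ctx
        ):
            ids.append(model["id"])
    return ids


# The sample /v1/models response from this page.
listing = {
    "object": "list",
    "data": [
        {"id": "qwen2.5-coder-32b", "object": "model", "owned_by": "porten",
         "x_porten": {"ready": True, "type": "chat", "ctx": 32768}},
        {"id": "qwen3-coder-next", "object": "model", "owned_by": "porten",
         "x_porten": {"ready": False, "type": "chat", "ctx": 262144}},
    ],
}

warm = pick_chat_models(listing, ready=True)
cold = pick_chat_models(listing, ready=False)
```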

Headers

| Header | Direction | Note |
| --- | --- | --- |
| Authorization: Bearer sk-porten-… | in | Required |
| X-Request-Id | out | Correlation id, echoed in logs |
| X-Porten-Node | out | Which node served the request |
| Retry-After | out | On 429 / 503 |
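A sketch of turning a Retry-After value into a wait in seconds. This page does not say which form the header takes, so the code assumes the two forms HTTP allows (a number of seconds, or an HTTP date); `retry_after_seconds` is a hypothetical helper name.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_after_seconds(value, now=None):
    """Parse a Retry-After header value; return seconds to wait, or None if unusable."""
    if value is None:
        return None
    value = value.strip()
    if value.isdigit():
        return float(value)  # delta-seconds form
    try:
        when = parsedate_to_datetime(value)  # HTTP-date form
    except (TypeError, ValueError):
        return None
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)  # treat naive dates as UTC
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())


# Fixed "now" so the date form gives a reproducible result.
fixed_now = datetime(2015, 10, 21, 7, 27, 30, tzinfo=timezone.utc)
from_seconds = retry_after_seconds("5")
from_date = retry_after_seconds("Wed, 21 Oct 2015 07:28:00 GMT", now=fixed_now)
```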

Errors

All errors follow OpenAI's format: {"error":{"message","type","code","param"}}.

| HTTP | code | Meaning |
| --- | --- | --- |
| 401 | invalid_api_key | Invalid or revoked key |
| 403 | model_not_allowed | The key may not use this model (region/policy) |
| 404 | model_not_found | No node advertises this model and it isn't offered |
| 429 | rate_limit_exceeded | Quota/rate exhausted (Retry-After) |
| 502 | node_error | All candidate nodes failed |
| 503 | no_available_node / model_warming | Model exists but no free/healthy node, or it's still loading (Retry-After) |
| 504 | gateway_timeout | Total timeout exceeded |
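A sketch of mapping an error body onto an exception. `PortenError` is a hypothetical class name, the sample body is made up, and treating only the codes listed with Retry-After as retryable is a judgment call rather than documented policy.

```python
import json

# Codes this page lists alongside Retry-After; assumed to be the safe ones to retry.
RETRYABLE_CODES = {"rate_limit_exceeded", "no_available_node", "model_warming"}


class PortenError(Exception):
    """Wraps the {"error": {...}} envelope together with the HTTP status."""

    def __init__(self, status, body):
        error = json.loads(body).get("error", {})
        self.status = status
        self.code = error.get("code")
        self.type = error.get("type")
        self.param = error.get("param")
        super().__init__(error.get("message") or f"HTTP {status}")

    @property
    def retryable(self):
        return self.code in RETRYABLE_CODES


# Made-up bodies in the documented shape, for illustration only.
warming = PortenError(
    503,
    '{"error": {"message": "Model is loading", "type": "example_type", '
    '"code": "model_warming", "param": null}}',
)
bad_key = PortenError(
    401,
    '{"error": {"message": "Invalid API key", "type": "example_type", '
    '"code": "invalid_api_key", "param": null}}',
)
```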

A 503 model_warming is expected when you first hit a cold model and its load takes longer than the request's warm-up budget. Retry after the interval given in Retry-After; the model will be ready shortly. Most clients never see this error, because the request normally blocks until the model is ready.
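A sketch of a retry loop for that case. `send` is a hypothetical callable supplied by the caller that performs one request and returns `(status, headers, body)`; the attempt limit and default wait are arbitrary, and Retry-After is assumed to be a whole number of seconds.

```python
import time


def call_with_retry(send, max_attempts=4, default_wait=2.0, sleep=time.sleep):
    """Call send() until it returns something other than 429/503, or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        status, headers, body = send()
        if status not in (429, 503) or attempt == max_attempts:
            return status, body
        wait = headers.get("Retry-After")
        # Honour the server's hint when it is a plain number of seconds.
        sleep(float(wait) if wait and wait.isdigit() else default_wait)


# Fake transport: one 503 while the model warms up, then success.
scripted = iter([
    (503, {"Retry-After": "3"}, '{"error": {"code": "model_warming"}}'),
    (200, {}, '{"object": "chat.completion"}'),
])
waits = []
final_status, final_body = call_with_retry(
    lambda: next(scripted),
    sleep=waits.append,  # record waits instead of sleeping
)
```

Injecting `sleep` keeps the loop testable without real delays; in production the default `time.sleep` applies.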

📄 Reading as a machine? This page is available as raw Markdown at https://porten.ai/docs/api-reference.md — or grab the whole site via llms.txt / llms-full.txt.