Quickstart

You'll get an API key, list the models on offer, and stream your first chat completion.

1. Get an API key

Create a key in the portal at /build/keys.
Keys can be restricted to a region (e.g. EU-only); see Regions & data sovereignty.

Set it in your shell so the examples below work as-is:

export PORTEN_API_KEY="sk-porten-…"
export PORTEN_BASE_URL="https://porten.ai/v1"

2. List the models on offer

curl "$PORTEN_BASE_URL/models" \
  -H "Authorization: Bearer $PORTEN_API_KEY"

Every offered model is listed, whether or not it's loaded this instant. Each carries an x_porten block telling you whether a node is serving it right now (ready) or whether it will load on first use.
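A minimal sketch of how you might split that list into ready and cold models from Python. The response shape is an assumption, since this page does not show it: the sketch assumes an OpenAI-style `{"data": [...]}` list where each entry has an `id` and an `x_porten` object with a boolean `ready` field. The helper names are hypothetical; check the API reference for the real field names before relying on this.

```python
import json
import os
import urllib.request


def fetch_models() -> dict:
    """GET $PORTEN_BASE_URL/models and return the parsed JSON body."""
    request = urllib.request.Request(
        os.environ["PORTEN_BASE_URL"] + "/models",
        headers={"Authorization": "Bearer " + os.environ["PORTEN_API_KEY"]},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def split_by_readiness(models_response: dict) -> tuple[list[str], list[str]]:
    """Return (ready_ids, cold_ids).

    ASSUMPTION: entries look like {"id": ..., "x_porten": {"ready": bool}}.
    A missing block or flag is treated as "not ready".
    """
    ready, cold = [], []
    for model in models_response.get("data", []):
        x_porten = model.get("x_porten") or {}
        is_ready = bool(x_porten.get("ready", False))
        (ready if is_ready else cold).append(model["id"])
    return ready, cold
```

Calling `split_by_readiness(fetch_models())` would then give you one list of models that should answer immediately and one list of models that will load on first use.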

3. Stream a chat completion

curl -N "$PORTEN_BASE_URL/chat/completions" \
  -H "Authorization: Bearer $PORTEN_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen2.5-coder-32b",
    "messages": [
      {"role": "system", "content": "You are a concise coding assistant."},
      {"role": "user", "content": "Write a Go function that reverses a string rune-safely."}
    ],
    "stream": true
  }'

The response is OpenAI-style Server-Sent Events (data: {…} chunks ending in data: [DONE]).
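If you consume the stream without an SDK, the parsing might look like the sketch below. It assumes each `data:` line carries a JSON chunk with the same `choices[].delta.content` shape that the SDK examples on this page read; the function name is hypothetical and error handling is left out.

```python
import json
from typing import Iterable, Iterator


def iter_delta_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield the text fragments carried by OpenAI-style SSE lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:  # role-only or empty deltas carry no text
                yield text
```

Feed it any iterable of decoded lines, for example the lines of an HTTP response body, and join or print the fragments as they arrive.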

First call to a cold model takes longer. If you pick a model that isn't loaded yet, the fleet loads it on demand and your request blocks until the model is ready; a big model's first load can take a few minutes while weights download. Subsequent calls are fast. There's nothing special to handle beyond keeping your client's timeout long enough: the request just takes longer. See Models & on-demand loading.
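Because a cold-start request can block for minutes, a client with a short read timeout could give up before the model is ready. The sketch below uses only the Python standard library and sets an explicit, generous timeout. The 600-second value and the helper names are illustrative assumptions, not documented limits, and it assumes a non-streaming request (no `"stream": true`) returns a single JSON body.

```python
import json
import os
import urllib.request

COLD_START_TIMEOUT_S = 600  # ASSUMPTION: illustrative value, not a documented limit


def build_chat_request(base_url: str, api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming POST to {base_url}/chat/completions."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": "Bearer " + api_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


def chat(prompt: str, model: str = "qwen2.5-coder-32b") -> dict:
    """Send the request and wait up to COLD_START_TIMEOUT_S for the response."""
    request = build_chat_request(
        os.environ["PORTEN_BASE_URL"], os.environ["PORTEN_API_KEY"], model, prompt
    )
    with urllib.request.urlopen(request, timeout=COLD_START_TIMEOUT_S) as response:
        return json.load(response)
```

The OpenAI Python SDK accepts a `timeout` argument on the client for the same purpose, so the equivalent adjustment there is a single constructor parameter.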

4. Use it from an SDK

Any OpenAI SDK works — just override the base URL. Python:

import os

from openai import OpenAI

# Read the key exported in step 1 instead of hard-coding it.
client = OpenAI(
    api_key=os.environ["PORTEN_API_KEY"],
    base_url="https://porten.ai/v1",
)

stream = client.chat.completions.create(
    model="qwen2.5-coder-32b",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
)
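for chunk in stream:
    # Skip chunks that arrive without choices; print text as it streams in.
    if chunk.choices:
        print(chunk.choices[0].delta.content or "", end="", flush=True)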

JavaScript / TypeScript:

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.PORTEN_API_KEY,
  baseURL: "https://porten.ai/v1",
});

const stream = await client.chat.completions.create({
  model: "qwen2.5-coder-32b",
  messages: [{ role: "user", content: "Hello!" }],
  stream: true,
});
for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}

5. Generate an image

Same key, same base URL — text-to-image on EU-sovereign GPUs. The image comes back base64-encoded in data[].b64_json:

curl -sS "$PORTEN_BASE_URL/images/generations" \
  -H "Authorization: Bearer $PORTEN_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "flux.1-schnell",
    "prompt": "a stylized editorial illustration of a lakeside town hall, flat vector, muted colors",
    "size": "1024x1024"
  }' | jq -r '.data[0].b64_json' | base64 --decode > out.png

One image per request (n=1); see the API reference for the full field list. (Image generation is rolling out — availability depends on a node serving an image model.)
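The same decoding step in Python, as a minimal sketch. It assumes you already hold the parsed JSON response as a dict with the `data[].b64_json` field described above; the function name is hypothetical.

```python
import base64
from pathlib import Path


def save_first_image(response: dict, path: str) -> int:
    """Decode data[0].b64_json, write it to path, and return the byte count."""
    encoded = response["data"][0]["b64_json"]
    image_bytes = base64.b64decode(encoded)
    return Path(path).write_bytes(image_bytes)
```

Since each request returns one image, reading index 0 is enough.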

6. Try it without writing code

The playground lets you chat with any model — or generate an image from a prompt — in the browser, watch a cold model's load progress as a real progress bar, and copy the request back out as a curl command.

API reference — every endpoint, parameter, and error code.
Use it from your tools — wire it into OpenCode, Cursor, LangChain.

📄 Reading as a machine? This page is available as raw Markdown at https://porten.ai/docs/quickstart.md — or grab the whole site via llms.txt / llms-full.txt.