
Quickstart

You'll get an API key, list the models on offer, and stream your first chat completion.

1. Get an API key

Sign in to the portal, open API keys, and create a key. Keys start with sk-porten-; treat yours like a password.

Set it in your shell so the examples below work as-is:

export PORTEN_API_KEY="sk-porten-…"
export PORTEN_BASE_URL="https://porten.ai/v1"
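If your own scripts should pick up the same two variables, here is a minimal Python sketch. It relies only on the variable names set above and the documented base URL. `load_porten_config` and `mask_key` are hypothetical helpers, not part of any Porten SDK, and masking is just one way to keep the key out of logs.

```python
import os


def load_porten_config(env=None):
    """Read the key and base URL exported above (hypothetical helper)."""
    env = os.environ if env is None else env
    api_key = env.get("PORTEN_API_KEY")
    if not api_key:
        # Fail loudly instead of sending an unauthenticated request.
        raise RuntimeError("PORTEN_API_KEY is not set")
    # Fall back to the documented base URL and drop any trailing slash.
    base_url = env.get("PORTEN_BASE_URL", "https://porten.ai/v1").rstrip("/")
    return api_key, base_url


def mask_key(api_key, visible=4):
    """Log-safe form of a key: the sk-porten- prefix plus its last few characters."""
    prefix = "sk-porten-"
    if api_key.startswith(prefix) and len(api_key) > len(prefix) + visible:
        return prefix + "..." + api_key[-visible:]
    # Anything unexpected is hidden entirely.
    return "***"
```

Print `mask_key(api_key)` in logs or error messages, never the raw key.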

2. List the models on offer

curl "$PORTEN_BASE_URL/models" \
  -H "Authorization: Bearer $PORTEN_API_KEY"

Every offered model is listed, whether or not it's loaded this instant. Each carries an x_porten block telling you whether a node is serving it right now (ready) or whether it will load on first use.
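To separate warm models from cold ones in code, a Python sketch along these lines could work. It assumes the response follows the OpenAI list shape (`{"data": [{"id": ...}, ...]}`). The field inside `x_porten` is a guess: a hypothetical `status` that reads `"ready"`. Check a real response and adjust `is_ready` to match.

```python
import json
import urllib.request


def fetch_models(api_key, base_url="https://porten.ai/v1"):
    """GET /models with the same header as the curl example."""
    req = urllib.request.Request(
        f"{base_url}/models",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def is_ready(model):
    # ASSUMPTION: x_porten has a "status" field equal to "ready" when a node
    # is serving the model right now; the real field name may differ.
    return (model.get("x_porten") or {}).get("status") == "ready"


def split_by_readiness(listing):
    """Return (ready_ids, cold_ids) from a parsed /models response."""
    ready, cold = [], []
    for model in listing.get("data", []):
        (ready if is_ready(model) else cold).append(model["id"])
    return ready, cold
```

For example, `split_by_readiness(fetch_models(api_key))` tells you which models will answer immediately and which will load on first use.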

3. Stream a chat completion

curl "$PORTEN_BASE_URL/chat/completions" \
  -H "Authorization: Bearer $PORTEN_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen2.5-coder-32b",
    "messages": [
      {"role": "system", "content": "You are a concise coding assistant."},
      {"role": "user", "content": "Write a Go function that reverses a string rune-safely."}
    ],
    "stream": true
  }'

The response is OpenAI-style Server-Sent Events (data: {…} chunks ending in data: [DONE]).
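If you consume the stream without an SDK, the framing is simple enough to parse by hand. This Python sketch assumes the standard OpenAI chunk shape (`choices[0].delta.content`). It skips blank lines and SSE comment lines, and it stops at `data: [DONE]`. `iter_deltas` is an illustrative name. The sketch does not handle events whose data spans several `data:` lines.

```python
import json


def iter_deltas(lines):
    """Yield content deltas from OpenAI-style SSE lines until data: [DONE]."""
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line.startswith("data:"):
            continue  # blank event separators and ": comment" lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return
        chunk = json.loads(payload)
        choices = chunk.get("choices") or []
        if choices:
            # The first chunk often carries only the role, with no content.
            content = (choices[0].get("delta") or {}).get("content")
            if content:
                yield content
```

Joining the yielded pieces gives the full assistant reply.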

First call to a cold model takes longer. If you pick a model that isn't loaded yet, the fleet loads it on demand and your request blocks until it's ready (a big model's first load can take a few minutes while weights download). Subsequent calls are fast. There's nothing special to handle — the request just takes longer. See Models & on-demand loading.
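A cold load only shows up as a slow response. For a hand-rolled HTTP client, the main requirement is a read timeout long enough to cover that wait. This standard-library Python sketch assumes 600 seconds is enough for "a few minutes", and it assumes nothing arrives on the connection while the model loads. `build_chat_request` and `stream_chat` are hypothetical helper names.

```python
import json
import urllib.request


def build_chat_request(api_key, model, messages, base_url="https://porten.ai/v1"):
    """Build the same streaming POST the curl example sends."""
    body = json.dumps({"model": model, "messages": messages, "stream": True})
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=body.encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def stream_chat(req, read_timeout=600):
    """Yield raw SSE lines from the response.

    urlopen's timeout applies to each blocking socket operation, not to the
    whole request, so it only has to outlast the longest silent gap, which
    is the cold load itself (ASSUMED to be under read_timeout seconds).
    """
    with urllib.request.urlopen(req, timeout=read_timeout) as resp:
        for line in resp:
            yield line.decode("utf-8").rstrip("\r\n")
```

A too-short timeout here would surface as a socket timeout error during a cold load, even though the server was still working on the request.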

4. Use it from an SDK

Any OpenAI SDK works — just override the base URL. Python:

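import os

from openai import OpenAI

# Read the key exported in step 1 instead of pasting it into source code.
client = OpenAI(
    api_key=os.environ["PORTEN_API_KEY"],
    base_url="https://porten.ai/v1",
)

stream = client.chat.completions.create(
    model="qwen2.5-coder-32b",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
)
for chunk in stream:
    # Skip chunks that carry no choices; print each content delta as it arrives.
    if chunk.choices:
        print(chunk.choices[0].delta.content or "", end="")
print()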

JavaScript / TypeScript:

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.PORTEN_API_KEY,
  baseURL: "https://porten.ai/v1",
});

const stream = await client.chat.completions.create({
  model: "qwen2.5-coder-32b",
  messages: [{ role: "user", content: "Hello!" }],
  stream: true,
});
for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}

5. Try it without writing code

The playground lets you chat with any model in the browser, watch a cold model's load progress as a real progress bar, and copy the request back out as a curl command.


📄 Reading as a machine? This page is available as raw Markdown at https://porten.ai/docs/quickstart.md — or grab the whole site via llms.txt / llms-full.txt.