Quickstart
You'll get an API key, list the models on offer, and stream your first chat completion.
1. Get an API key
Sign in to the portal, open API keys, and create one. It looks like sk-porten-…. Treat it like a password.
- Portal: /build/keys
- Keys can be restricted to a region (e.g. EU-only); see Regions & data sovereignty.
Set it in your shell so the examples below work as-is:
export PORTEN_API_KEY="sk-porten-…"
export PORTEN_BASE_URL="https://porten.ai/v1"
2. List the models on offer
curl "$PORTEN_BASE_URL/models" \
  -H "Authorization: Bearer $PORTEN_API_KEY"
Every offered model is listed, whether or not it's loaded this instant. Each carries an x_porten block telling you whether a node is serving it right now (ready) or whether it will load on first use.
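As a rough sketch of how a client might use that block, the Python below splits a /models response into models that are ready now and models that would load on first use. The response shape is an assumption: it presumes an OpenAI-style {"data": [...]} list where each entry has an id and an x_porten object with a boolean ready field. The exact field names are not specified on this page, so check them against the API reference. The helper name split_by_readiness and the sample payload are hypothetical.

```python
import json


def split_by_readiness(models_response: dict) -> tuple[list[str], list[str]]:
    """Return (ready_ids, cold_ids) from a /models response.

    Assumes (not confirmed by this page) an OpenAI-style {"data": [...]} list
    whose entries carry x_porten.ready as a boolean.
    """
    ready, cold = [], []
    for model in models_response.get("data", []):
        status = model.get("x_porten") or {}
        # Treat a missing or false "ready" flag as "will load on first use".
        (ready if status.get("ready") else cold).append(model["id"])
    return ready, cold


# Hypothetical sample payload, for illustration only.
sample = json.loads("""
{"data": [
  {"id": "model-a", "x_porten": {"ready": true}},
  {"id": "model-b", "x_porten": {"ready": false}}
]}
""")

ready_ids, cold_ids = split_by_readiness(sample)
print("ready:", ready_ids)
print("cold:", cold_ids)
```

Picking a model from the ready list avoids the load wait; picking one from the cold list still works, it just takes longer on the first call.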
3. Stream a chat completion
curl "$PORTEN_BASE_URL/chat/completions" \
  -H "Authorization: Bearer $PORTEN_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen2.5-coder-32b",
    "messages": [
      {"role": "system", "content": "You are a concise coding assistant."},
      {"role": "user", "content": "Write a Go function that reverses a string rune-safely."}
    ],
    "stream": true
  }'
The response is OpenAI-style Server-Sent Events (data: {…} chunks ending in data: [DONE]).
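If you read the stream without an SDK, the parsing can be sketched as follows. This is a minimal illustration, assuming the chunks follow the usual OpenAI layout where the text lives at choices[].delta.content. The function name collect_stream_text and the sample lines are hypothetical, and a production client would also need to handle errors and multi-line SSE events.

```python
import json


def collect_stream_text(sse_lines):
    """Concatenate delta content from OpenAI-style SSE lines until [DONE]."""
    parts = []
    for raw in sse_lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            # Some chunks (e.g. the first, role-only one) carry no content.
            text = (choice.get("delta") or {}).get("content")
            if text:
                parts.append(text)
    return "".join(parts)


# Hypothetical stream, for illustration only.
example = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "",
    "data: [DONE]",
]
print(collect_stream_text(example))  # prints "Hello"
```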
First call to a cold model takes longer. If you pick a model that isn't loaded yet, the fleet loads it on demand and your request blocks until it's ready (a big model's first load can take a few minutes while weights download). Subsequent calls are fast. There's nothing special to handle — the request just takes longer. See Models & on-demand loading.
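One practical consequence: a client-side timeout shorter than the load time would abort an otherwise healthy request. The sketch below builds a plain chat request with Python's standard library and defines a generous timeout to pass when sending it. The 600-second value is an assumption chosen to cover "a few minutes", not a documented limit, and build_chat_request is a hypothetical helper.

```python
import json
import urllib.request

# Assumption: 10 minutes comfortably covers a cold model's first load.
COLD_START_TIMEOUT_S = 600


def build_chat_request(base_url: str, api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming POST to /chat/completions (hypothetical helper)."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder key, for illustration only; nothing is sent over the network here.
req = build_chat_request("https://porten.ai/v1", "sk-porten-…", "qwen2.5-coder-32b", "Hello!")
print(req.get_method(), req.full_url)
```

To send it, pass the request to urllib.request.urlopen(req, timeout=COLD_START_TIMEOUT_S). The OpenAI SDKs also accept a timeout option on the client if you prefer to set it there.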
4. Use it from an SDK
Any OpenAI SDK works — just override the base URL. Python:
import os

from openai import OpenAI

# Read the key exported in step 1 instead of hardcoding it.
client = OpenAI(
    api_key=os.environ["PORTEN_API_KEY"],
    base_url="https://porten.ai/v1",
)
stream = client.chat.completions.create(
    model="qwen2.5-coder-32b",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
)
# Print each text delta as it arrives.
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="", flush=True)
JavaScript / TypeScript:
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.PORTEN_API_KEY,
  baseURL: "https://porten.ai/v1",
});
const stream = await client.chat.completions.create({
  model: "qwen2.5-coder-32b",
  messages: [{ role: "user", content: "Hello!" }],
  stream: true,
});
for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}
5. Try it without writing code
The playground lets you chat with any model in the browser, watch a cold model's load progress as a real progress bar, and copy the request back out as a curl command.
Next
- API reference — every endpoint, parameter, and error code.
- Use it from your tools — wire it into OpenCode, Cursor, LangChain.
This page as Markdown: https://porten.ai/docs/quickstart.md, or grab the whole site via llms.txt / llms-full.txt.