# Quickstart

You'll get an API key, list the models on offer, and stream your first chat completion.

## 1. Get an API key

Sign in to the portal, open **API keys**, and create one. It looks like `sk-porten-…`. Treat it like a password.

- Portal: [`/build/keys`](/build/keys)
- Keys can be restricted to a region (e.g. EU-only) — see [Regions & data sovereignty](/docs/regions).

Set the key and the base URL in your shell so the examples below work as-is:

```bash
export PORTEN_API_KEY="sk-porten-…"
export PORTEN_BASE_URL="https://porten.ai/v1"
```

## 2. List the models on offer

```bash
curl "$PORTEN_BASE_URL/models" \
  -H "Authorization: Bearer $PORTEN_API_KEY"
```

Every **offered** model is listed, whether or not it's loaded this instant. Each carries an `x_porten` block telling you whether a node is serving it right now (`ready`) or whether it will load on first use.
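To act on that listing in code, something like the following could separate models that are ready from those that will load on first use. It is only a sketch: it assumes the response follows the OpenAI list shape (a top-level `data` array of objects with an `id`) and that the `x_porten` block carries a boolean field named `ready`. The sample payload and the ID `example-cold-model` are made up for illustration; check the API reference for the real field names.

```python
import json

# Hypothetical sample of a /models response. The "data"/"id" layout follows the
# OpenAI list format; a boolean "ready" field inside "x_porten" is an assumption.
SAMPLE_RESPONSE = """
{
  "object": "list",
  "data": [
    {"id": "qwen2.5-coder-32b", "x_porten": {"ready": true}},
    {"id": "example-cold-model", "x_porten": {"ready": false}}
  ]
}
"""


def split_by_readiness(payload: dict) -> tuple[list[str], list[str]]:
    """Return (ready_ids, cold_ids) from a parsed /models response."""
    ready, cold = [], []
    for model in payload.get("data", []):
        # Treat a missing x_porten block or a missing flag as "not ready".
        is_ready = bool((model.get("x_porten") or {}).get("ready", False))
        (ready if is_ready else cold).append(model["id"])
    return ready, cold


ready_ids, cold_ids = split_by_readiness(json.loads(SAMPLE_RESPONSE))
print("ready now:", ready_ids)
print("loads on first use:", cold_ids)
```

Picking from the ready list avoids the cold-load wait on a first request; any offered model still works, it just starts slower.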

## 3. Stream a chat completion

```bash
curl "$PORTEN_BASE_URL/chat/completions" \
  -H "Authorization: Bearer $PORTEN_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen2.5-coder-32b",
    "messages": [
      {"role": "system", "content": "You are a concise coding assistant."},
      {"role": "user", "content": "Write a Go function that reverses a string rune-safely."}
    ],
    "stream": true
  }'
```

The response is OpenAI-style Server-Sent Events (`data: {…}` chunks ending in `data: [DONE]`).
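If you read the stream without an SDK, the framing can be handled roughly as below. This is a minimal sketch that assumes each `data:` payload is an OpenAI-style chunk with text under `choices[].delta.content`; the helper name `iter_delta_text` and the sample lines are illustrative, not captured output.

```python
import json
from typing import Iterable, Iterator


def iter_delta_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield the text fragments carried by OpenAI-style SSE lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text


# Hand-written sample lines in the chunk format (assumed shape).
SAMPLE_LINES = [
    'data: {"choices": [{"index": 0, "delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": "Hel"}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": "lo"}}]}',
    "",
    "data: [DONE]",
]

print("".join(iter_delta_text(SAMPLE_LINES)))
```

The function accepts any iterable of lines, so the same loop can consume a live HTTP response body once it is decoded line by line.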

> **First call to a cold model takes longer.** If you pick a model that isn't loaded yet, the fleet loads it on demand and your request blocks until it's ready (a big model's first load can take a few minutes while weights download). Subsequent calls are fast. There's nothing special to handle — the request just takes longer. See [Models & on-demand loading](/docs/models).
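One client-side detail may still matter: if your HTTP client's timeout is shorter than a cold load, the call can fail locally before the model is ready. The sketch below uses only Python's standard library and passes a generous socket timeout. The 600-second value and the helper names `build_chat_request` and `stream_raw_lines` are assumptions for illustration, not documented limits or APIs.

```python
import json
import urllib.request

# Assumed value: comfortably longer than "a few minutes" of cold-load time.
COLD_LOAD_TIMEOUT_S = 600


def build_chat_request(base_url: str, api_key: str, model: str,
                       prompt: str) -> urllib.request.Request:
    """Build a streaming chat completion request without sending it."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,
    }).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def stream_raw_lines(request: urllib.request.Request,
                     timeout: float = COLD_LOAD_TIMEOUT_S):
    """Send the request and yield decoded response lines as they arrive.

    The timeout applies to each blocking socket operation, so it bounds how
    long the client waits in silence while the model loads.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        for raw in response:
            yield raw.decode("utf-8").rstrip("\r\n")
```

To use it, build the request from the `PORTEN_BASE_URL` and `PORTEN_API_KEY` environment variables set in step 1, then iterate over `stream_raw_lines(request)`. SDKs expose their own timeout settings; consult the SDK's documentation for the equivalent option.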

## 4. Use it from an SDK

Any OpenAI SDK works — just override the base URL. Python:

```python
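import os

from openai import OpenAI

# Read the values exported in step 1 instead of hardcoding the key.
client = OpenAI(
    api_key=os.environ["PORTEN_API_KEY"],
    base_url=os.environ.get("PORTEN_BASE_URL", "https://porten.ai/v1"),
)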

stream = client.chat.completions.create(
    model="qwen2.5-coder-32b",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
)
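for chunk in stream:
    # A chunk may arrive with no choices; skip it rather than index blindly.
    if chunk.choices:
        print(chunk.choices[0].delta.content or "", end="", flush=True)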
```

JavaScript / TypeScript:

```ts
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.PORTEN_API_KEY,
  baseURL: "https://porten.ai/v1",
});

const stream = await client.chat.completions.create({
  model: "qwen2.5-coder-32b",
  messages: [{ role: "user", content: "Hello!" }],
  stream: true,
});
for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}
```

## 5. Try it without writing code

The [playground](/build/playground) lets you chat with any model in the browser, watch a cold model's load progress as a real progress bar, and copy the request back out as a `curl` command.

## Next

- **[API reference](/docs/api-reference)** — every endpoint, parameter, and error code.
- **[Use it from your tools](/docs/integrations)** — wire it into OpenCode, Cursor, LangChain.
