Overview

Porten is OpenAI-compatible LLM inference, routed across a fleet of GPU nodes. You point any OpenAI SDK at one base URL, pick a model, and your request is routed to a node that can serve it. If no node has that model loaded yet, the fleet loads it on demand and your request completes once it's ready.

It is built to be EU-sovereign: the Hub and its nodes run on European-owned infrastructure, and API keys can be pinned to a region so traffic never leaves it.

The shape of it

your app ──OpenAI SDK──▶  Porten Hub  ──▶  a node that serves the model
   (base_url + API key)      (routing, billing,        (your Mac, a 3090 box,
                              on-demand loading)         a Thunderbolt cluster…)

There are two sides to the marketplace:

Build — you're a developer. Get an API key, call the API, ship. See the Quickstart.
Earn — you have a GPU. Enroll a node, serve models, get paid for the tokens it produces. See Run a node.

What makes it different

One API, many models. A single endpoint exposes every model the fleet offers — small instruct models, coding models, reasoning models, and frontier MoEs running on multi-machine clusters.
Models load on demand. You can select any offered model even if nothing is serving it this second. The first request triggers the fleet to load it (the playground shows a real download/progress bar; API clients just see a slightly longer first request). Idle models are unloaded automatically to free VRAM. See Models & on-demand loading.
Demand-driven capacity. The Hub measures what's being requested and places models on nodes that fit them — evicting lower-demand models to make room when something is urgently needed.
Self-hostable end to end. The Hub is a single Go binary (Postgres + Redis). Nodes are a thin agent. You can run the whole thing yourself.

Compatibility at a glance

Endpoint	Status
`POST /v1/chat/completions` (streaming + non-streaming)	✅
`POST /v1/embeddings`	✅
`GET /v1/models`	✅

Tool/function calling, JSON mode (response_format), vision (inline data: images), and separated reasoning output (reasoning_content) are supported where the underlying model can do them. Errors follow OpenAI's {"error":{...}} shape. Full details in the API reference.

Next steps

Quickstart — your first request in a few minutes.
Use it from your tools — OpenCode, Cursor, the OpenAI SDKs, LangChain.
Hardware guide — what to build to run good models locally.
Build a combined machine — pool Macs over Thunderbolt 5 for frontier-size models.

For machines: every page here is available as raw Markdown — append .md to any docs URL (e.g. /docs/overview.md). There's also an /llms.txt index and a single-file /llms-full.txt.

📄 Reading as a machine? This page is available as raw Markdown at https://porten.ai/docs/overview.md — or grab the whole site via llms.txt / llms-full.txt.