Overview
Set up in four steps
- Mint a token for a seat: a principal (who), a tenant (their org), and a spend cap.
- Hand over the secret: it is shown once, on creation. The user pastes it into the desktop app (Settings → Proxy key) or sends it in an Authorization: Bearer header.
- Set an org pool: a shared ceiling across every seat in the org, on top of each seat's own cap.
- Watch usage: durable per-seat and per-org spend for the current window, across all replicas.
Data-plane endpoints
Every call authenticates with a minted token as Authorization: Bearer zmc-….
POST /v1/messages - Anthropic-Messages-compatible chat, streaming included.
POST /v1/embeddings - OpenAI-compatible embeddings (when configured).
GET /v1/models - Models this deployment advertises.
GET /v1/budget - The caller's own seat and org budget status; apps draw budget bars from it.
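As a rough illustration of the bearer authentication described above, the sketch below builds a GET /v1/budget request with Python's standard library. The base URL and the token value are placeholders, and `build_budget_request` is a hypothetical helper, not part of any shipped client. The response format is not documented here, so the sketch stops short of parsing it.

```python
import urllib.request


def build_budget_request(base_url: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a GET /v1/budget request with a bearer token."""
    url = base_url.rstrip("/") + "/v1/budget"
    return urllib.request.Request(
        url,
        method="GET",
        headers={"Authorization": f"Bearer {token}"},
    )


# Placeholder values: substitute your deployment's URL and a real minted token.
req = build_budget_request("https://proxy.example.com", "zmc-EXAMPLE")
print(req.get_method(), req.full_url)
```

Passing the request to `urllib.request.urlopen(req)` would send it. The same header applies to the other three endpoints.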
Tokens
| Principal | Tenant | Budget | Spent | Remaining | Expires | Prefix | Status |
|---|---|---|---|---|---|---|---|
No tokens yet.
Org pool budgets
A shared ceiling across every seat in an org, on top of each token's own cap. A request is blocked if either the seat cap or the org pool is exhausted.
| Tenant | Pool | Spent | Remaining | Window |
|---|---|---|---|---|
No org pool budgets set.
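The two-level rule for org pools (a request is blocked when either the seat cap or the org pool is exhausted) can be sketched as follows. This is a minimal in-memory model, not the proxy's actual implementation: the `Budget` class, its field names, and `is_blocked` are all hypothetical, and spend units are whatever the deployment uses.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Budget:
    cap: float    # ceiling for the current window (hypothetical field)
    spent: float  # spend recorded so far in the window (hypothetical field)

    @property
    def remaining(self) -> float:
        # Never report a negative remainder, even if spend overshot the cap.
        return max(self.cap - self.spent, 0.0)


def is_blocked(seat: Budget, org_pool: Optional[Budget]) -> bool:
    """Blocked if the seat cap is exhausted, or the org pool (when set) is."""
    if seat.remaining <= 0:
        return True
    return org_pool is not None and org_pool.remaining <= 0


# A seat with budget left is still blocked once its org pool is used up.
print(is_blocked(Budget(cap=10, spent=4), Budget(cap=100, spent=100)))
print(is_blocked(Budget(cap=10, spent=4), Budget(cap=100, spent=50)))
```

The first call prints `True` because the pool is exhausted even though the seat has budget left; the second prints `False`. An org with no pool set is limited only by its seat caps in this model.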
Usage
Durable rolling-window spend, per org and per seat, consistent across replicas.
No usage yet — mint a token and make a call.
Models
Limits & defaults
Platform
This view is read-only: values are set with PROXY_*
environment variables on the deployment and change via a rollout,
not from the console.
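To see which PROXY_* values a running deployment would pick up, one option is to list the matching environment variables from inside the container. The sketch below assumes nothing about the individual variable names, which are not listed here; `proxy_settings` is a hypothetical helper, and the names in the example are made up.

```python
import os
from typing import Dict, Mapping, Optional


def proxy_settings(environ: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return every PROXY_-prefixed variable, sorted by name."""
    env = os.environ if environ is None else environ
    return {k: v for k, v in sorted(env.items()) if k.startswith("PROXY_")}


# Made-up variable names, used only to show the filtering.
sample = {"PROXY_EXAMPLE_B": "2", "PATH": "/usr/bin", "PROXY_EXAMPLE_A": "1"}
for name, value in proxy_settings(sample).items():
    print(f"{name}={value}")
```

Calling `proxy_settings()` with no argument reads the real process environment. Changing a value still requires a rollout, as noted above.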