User guide

Go from zero to a streamed completion against the Basis inference API. Basis is OpenAI-compatible, so each step is the same call you already know — just pointed at the Basis base URL.

Choose a model

Pick a model id for the model field. Start with basis-default — it is the baseline. Use basis-small for cheaper, lighter calls or basis-large for a larger context window.

Model	Context (tokens)	Multiplier	Status
basis-small	16,384	0.50x	soon
basis-default	32,768	1.00x	soon
basis-large	131,072	3.00x	soon

The multiplier scales deterministic per-token accounting (1.00x baseline). Full model details are in the API reference.
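One way to apply the table is to pick the lowest-multiplier model whose context window fits the request. The sketch below is illustrative only: `pick_model` and `required_tokens` are hypothetical names, the figures are copied from the table, and it assumes the window must cover prompt plus expected output, which the API reference should confirm.

```python
# Context windows and multipliers copied from the model table.
# Multipliers are stored as integer hundredths (0.50x -> 50) to avoid floats.
MODELS = {
    "basis-small": {"context": 16_384, "multiplier_hundredths": 50},
    "basis-default": {"context": 32_768, "multiplier_hundredths": 100},
    "basis-large": {"context": 131_072, "multiplier_hundredths": 300},
}


def pick_model(required_tokens: int) -> str:
    """Return the cheapest model (by multiplier) whose window fits required_tokens."""
    candidates = [
        (spec["multiplier_hundredths"], name)
        for name, spec in MODELS.items()
        if spec["context"] >= required_tokens
    ]
    if not candidates:
        raise ValueError(f"no model has a context window of {required_tokens} tokens")
    return min(candidates)[1]
```

If you have no reason to optimize, the guide's advice still applies: start with basis-default.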

Sign in & dashboard

To manage keys, credits, receipts, and a worker, sign in on basis.watch with Privy, using either a connected wallet or an email address. Reading the docs, the data, and the public pages needs no account; you sign in only to manage your own resources.

01 Sign in with a wallet or email (Privy).
02 Open /dashboard to see your DID, linked wallet, credits, keys, receipts, and worker.
03 Link a wallet so you can fund credits and (optionally) register a worker against it.
04 Check your credit balance under billing.

Full sign-in, dashboard, and linked-wallet details are on the authentication page.

Get an API key

Once signed in, mint an API key in the dashboard. The raw key is shown once — copy it immediately, because Basis stores only a peppered hash and cannot show it again. Authenticate with it as a bearer token prefixed sk-basis- in the Authorization header, just like OpenAI.

http

Authorization: Bearer sk-basis-...

API key issuance is pending. Until launch the API is open, so a key is accepted but not required: you can send a placeholder sk-basis-... today and follow the steps below.
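Because the raw key is shown only once, it usually makes sense to keep it out of source code. A minimal sketch of building the header from the environment follows; the variable name `BASIS_API_KEY` and the helper `auth_headers` are assumptions for illustration, not part of the Basis API.

```python
import os

KEY_PREFIX = "sk-basis-"  # prefix stated in this guide


def auth_headers(environ=None) -> dict:
    """Build the Authorization header from BASIS_API_KEY (a hypothetical variable name)."""
    environ = os.environ if environ is None else environ
    key = environ.get("BASIS_API_KEY", "")
    if not key.startswith(KEY_PREFIX):
        raise ValueError("BASIS_API_KEY is missing or does not start with sk-basis-")
    return {"Authorization": f"Bearer {key}"}
```

Pass the returned dict as the headers of any HTTP client, or hand the same key to the OpenAI SDK as `api_key`.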

Check launch & runtime status

Before sending traffic, confirm what is configured. The network reports its actual state at GET /api/launch-status. A fresh deployment with nothing configured reads as pending rather than as a promise of availability.

bash

curl -s https://basis.watch/api/launch-status

The same response reports the inference runtime mode under inferenceRuntime, which mirrors the server's inferenceMode(): "proxy" when an inference backend is configured (models report available: true), or "pending" when none is. Right now the runtime is "pending" and models report available: false.
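The same check can be done from Python. This sketch assumes `inferenceRuntime` is a top-level string field in the JSON body; the exact response shape is not shown in this guide, so adjust the lookup if it is nested. The function names are hypothetical.

```python
import json
import urllib.request

LAUNCH_STATUS_URL = "https://basis.watch/api/launch-status"


def fetch_launch_status(url: str = LAUNCH_STATUS_URL, timeout: float = 10.0) -> dict:
    """GET the launch-status document and decode it as JSON."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def runtime_is_ready(status: dict) -> bool:
    """True only when the runtime mode is "proxy" (a backend is configured).

    Assumes inferenceRuntime is a top-level string; anything else counts as not ready.
    """
    return status.get("inferenceRuntime") == "proxy"
```

A typical call is `runtime_is_ready(fetch_launch_status())` before you send any completion traffic.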

Send your first request

Post a model and a messages array to /api/v1/chat/completions. This is the standard OpenAI chat-completions call.

bash

curl https://basis.watch/api/v1/chat/completions \
  -H "Authorization: Bearer sk-basis-..." \
  -H "Content-Type: application/json" \
  -d '{"model":"basis-default","messages":[{"role":"user","content":"Summarize this for an agent."}]}'

If a backend is configured, you get an OpenAI-shaped completion. If not, you get a structured runtime_pending error (HTTP 503); see Handle a pending-runtime response.
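If you are not using an SDK, the request body and the reply can be handled with two small helpers. This is a sketch under the assumption that a successful reply follows the standard OpenAI chat-completion shape (`choices[0].message.content`); the helper names are hypothetical.

```python
import json


def build_chat_request(model: str, user_text: str) -> bytes:
    """Encode a single-turn chat-completions body as UTF-8 JSON."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
    }
    return json.dumps(payload).encode("utf-8")


def extract_text(completion: dict) -> str:
    """Pull the assistant text out of an OpenAI-shaped completion.

    Returns an empty string when the message has no text content.
    """
    return completion["choices"][0]["message"]["content"] or ""
```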

Stream the result

Set stream=True to receive tokens as Server-Sent Events. The OpenAI SDK handles the wire format and the terminating [DONE] sentinel for you — iterate the chunks and print the deltas.

python

from openai import OpenAI

client = OpenAI(base_url="https://basis.watch/api/v1", api_key="sk-basis-...")

stream = client.chat.completions.create(
    model="basis-default",
    messages=[{"role": "user", "content": "Write a one paragraph summary."}],
    stream=True,
)
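for chunk in stream:
    # Some chunks can arrive with an empty choices list, so guard the index.
    if chunk.choices:
        print(chunk.choices[0].delta.content or "", end="", flush=True)
print()  # finish the line once the stream ends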

The streaming wire format is documented in the API reference.
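When you need the full text rather than live output, the deltas can be accumulated instead of printed. The sketch below works on any iterable of SDK-shaped chunks; `collect_text` and `fake_chunk` are hypothetical helpers, and `fake_chunk` exists only so the function can be tried offline.

```python
from types import SimpleNamespace


def collect_text(stream) -> str:
    """Join the content deltas of a chat-completions stream into one string."""
    parts = []
    for chunk in stream:
        if not chunk.choices:  # skip chunks that carry no choices
            continue
        piece = chunk.choices[0].delta.content
        if piece:  # the delta content may be None or empty
            parts.append(piece)
    return "".join(parts)


def fake_chunk(text):
    """Build an object shaped like an SDK chunk, for offline testing only."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])
```

With a real client, pass the object returned by `client.chat.completions.create(..., stream=True)` straight to `collect_text`.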

Look up a receipt

Every settled job produces a verifiable inference receipt — a canonical-JSON, SHA-256-hashed accounting record. Fetch one by its hash; the response includes the receipt and a verified boolean from a server-side re-derivation of the hash.

bash

curl -s "https://basis.watch/api/inference/receipts/<receipt_hash>"

An unknown hash returns 404 not_found. The full receipt schema and the hashing scheme are in Inference receipts.
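The sketch below shows how a client might interpret the lookup and what "canonical JSON plus SHA-256" means in general. Both parts rest on assumptions: the body keys `receipt` and `verified` are inferred from the description above, and the canonical form used here (sorted keys, compact separators, UTF-8) is a common convention that may differ from the scheme defined in Inference receipts. Do not treat `canonical_sha256` as the Basis hashing rule.

```python
import hashlib
import json


def parse_receipt_lookup(status_code: int, body: dict):
    """Return (receipt, verified) for a found receipt, or None for 404 not_found.

    The keys "receipt" and "verified" are assumed names for the response fields.
    """
    if status_code == 404:
        return None
    if status_code != 200:
        raise RuntimeError(f"unexpected status {status_code}")
    return body["receipt"], bool(body["verified"])


def canonical_sha256(record: dict) -> str:
    """Hash a record in an assumed canonical form; illustrative only."""
    canonical = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

The point of canonicalization is that key order no longer affects the digest, so two parties serializing the same record arrive at the same hash.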

Understand credits

Usage is metered with deterministic per-token accounting, denominated in planned $BASIS credits and computed with integer math — no floating point in the money path. A model's multiplier scales its per-token cost. Basis does not publish a USD price.

See Credits for the per-token formula and the reserve → debit → release → refund lifecycle. $BASIS is planned, not live.
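To make "integer math with a multiplier" concrete, here is a sketch of how such a cost could be computed without floating point. It is not the published formula: `base_units_per_token` is a hypothetical rate in smallest credit units, multipliers are stored as integer hundredths, and rounding up is an assumption. The Credits page is the authority on the real calculation.

```python
# Multipliers from the model table, as integer hundredths (1.00x -> 100).
MULTIPLIER_HUNDREDTHS = {
    "basis-small": 50,
    "basis-default": 100,
    "basis-large": 300,
}


def token_cost(model: str, tokens: int, base_units_per_token: int) -> int:
    """Cost in smallest credit units, using integers only.

    base_units_per_token is a hypothetical rate; rounding up is an assumption.
    """
    if tokens < 0 or base_units_per_token < 0:
        raise ValueError("tokens and rate must be non-negative")
    scaled = tokens * base_units_per_token * MULTIPLIER_HUNDREDTHS[model]
    return -(-scaled // 100)  # ceiling division without floats
```

Keeping every intermediate value an integer means the same inputs give the same result on any machine, which is what makes the accounting reproducible.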

Handle a pending-runtime response

When no inference backend is configured, chat completions return 503 with a structured runtime_pending error in the OpenAI error envelope. This is expected: the API contract is live, the worker backend is pending. Detect it by status code plus error.code — "runtime_pending" — and surface a clear message instead of retrying blindly.

python

import httpx

resp = httpx.post(
    "https://basis.watch/api/v1/chat/completions",
    headers={"Authorization": "Bearer sk-basis-..."},
    json={"model": "basis-default",
          "messages": [{"role": "user", "content": "ping"}]},
)
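# Treat only a 503 whose error.code is "runtime_pending" as the pending state.
pending = False
if resp.status_code == 503:
    try:
        pending = resp.json()["error"]["code"] == "runtime_pending"
    except (ValueError, KeyError, TypeError):
        pending = False  # not the structured envelope, e.g. a gateway error page
if pending:
    print("Runtime backend is pending; try again once a backend is configured.")
else:
    resp.raise_for_status()  # any other error status, including other 503s, raises here
    print(resp.json())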

Why this happens

Basis ships the OpenAI-compatible contract independently of any hosted backend. Until an upstream backend is configured (inferenceMode() === "proxy"), the route reports that it cannot serve tokens rather than returning fabricated output. Poll GET /api/launch-status to see when inferenceRuntime changes from "pending" to "proxy".
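Polling is best done with a bounded number of attempts and a growing delay, so a pending runtime is not hammered. The following is a sketch, not a prescribed client: `wait_for_runtime` is a hypothetical helper, the delays are arbitrary defaults, and it assumes the status document carries a top-level `inferenceRuntime` string. The `fetch_status` argument is any callable that returns the decoded launch-status JSON.

```python
import time


def wait_for_runtime(fetch_status, attempts=10, base_delay=5.0,
                     max_delay=300.0, sleep=time.sleep) -> bool:
    """Poll until inferenceRuntime is "proxy"; return False if attempts run out.

    The delay doubles after each pending result, capped at max_delay.
    """
    delay = base_delay
    for attempt in range(attempts):
        if fetch_status().get("inferenceRuntime") == "proxy":
            return True
        if attempt < attempts - 1:  # no need to sleep after the final check
            sleep(delay)
            delay = min(delay * 2, max_delay)
    return False
```

Injecting `sleep` keeps the function easy to test and lets you swap in your own scheduler.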
