How Basis works

A single inference call travels from an authenticated app or agent through the API gateway, which prices the job and reserves $BASIS credits, to either a hosted upstream backend or a contributor GPU, and back. It leaves behind a receipt, a credit debit, and, when a contributor served it, a worker reward that a keeper later settles on Base. Here is that path, step by step.

End-to-end flow

The numbered path from a request to a settled reward batch.


1.  USER / AGENT APP
      Authenticates (a Privy token or an sk-basis API key) and builds an
      OpenAI-style chat-completions request (model + messages, stream: true),
      then POSTs it to the Basis API.

2.  BASIS API  (/api/v1/chat/completions, on Vercel)
      Validates the body, resolves the model, prices the job at the active
      pricing epoch, and reserves credits. The caller funds those credits in
      $BASIS — paying directly or routing ETH/WETH/USDC into $BASIS at payment
      time, out-of-band. If no inference backend is configured, the API
      returns a structured runtime_pending error (503).

3.  INFERENCE BACKEND
      The request is served by EITHER a hosted OpenAI-compatible upstream the
      gateway proxies to (the canonical serverless backend) OR — self-hosted —
      a long-lived orchestrator that matches the job to a contributor GPU
      worker. Exactly one backend serves a request.

4.  STREAMED TOKENS
      Output tokens relay to the caller as Server-Sent Events (SSE) chunks,
      terminated by data: [DONE].

5.  RECEIPT
      On completion the server writes a canonical-JSON, SHA-256-hashed
      receipt — using SERVER-counted output tokens, carrying the locked pricing
      snapshot — into the receipt ledger. Duplicate jobId / receiptHash is
      rejected here.

6.  CREDIT DEBIT
      The reserved credits are settled against the deterministic (bigint)
      charge; a failed job is not charged.

7.  REWARD LEDGER
      When a contributor worker served the job, its reward is computed and
      accrued to the reward ledger (no on-chain write in this hot path). A
      gateway-proxied job has no worker wallet, so it is metered, not paid.

8.  SETTLEMENT BATCH ON BASE
      A keeper batches accrued rewards and settles them idempotently on
      Base — a batchHash settles once; failed batches stay recoverable.

      Bankr is NOT in this path. It is only the platform used to LAUNCH the
      $BASIS token on Base; after launch it offers read-only fee observation
      and operator-signed fee claims — it never serves inference, prices a job,
      reserves credits, pays a worker, or settles a receipt.

Request intake

A request arrives at /api/v1/chat/completions, a Vercel-hosted route. The gateway authenticates the caller (a Privy token or an sk-basis API key), validates the OpenAI-shaped body (model and messages, with stream optional), prices the job at the active pricing epoch, and reserves the caller's $BASIS credits before serving it. If a hosted upstream backend is configured, the gateway proxies the request and streams the response. Otherwise it returns a structured runtime_pending error (503) in the OpenAI error shape: the API contract is fully implemented, and the response states plainly that no backend is available.
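The validation and runtime_pending branches of intake can be sketched as follows. This is a minimal illustration, not the gateway's code: the function names, the 400 status, and the "type" and "message" values are assumptions, and authentication, pricing, and credit reservation are left out.

```python
from __future__ import annotations

from typing import Any, Optional


def validate_body(body: dict[str, Any]) -> Optional[str]:
    """Return a description of the first problem found, or None if the body is acceptable."""
    messages = body.get("messages")
    if not isinstance(messages, list) or not messages:
        return "messages must be a non-empty list"
    for message in messages:
        if not isinstance(message, dict) or "role" not in message or "content" not in message:
            return "each message needs a role and content"
    if not isinstance(body.get("model", ""), str):
        return "model must be a string"
    if not isinstance(body.get("stream", False), bool):
        return "stream must be a boolean"
    return None


def error_envelope(message: str, error_type: str, code: str) -> dict[str, Any]:
    """Build an OpenAI-style error body; only the code runtime_pending comes from the docs."""
    return {"error": {"message": message, "type": error_type, "code": code}}


def intake(body: dict[str, Any], backend_configured: bool) -> tuple[int, dict[str, Any]]:
    """Return (HTTP status, JSON body) for the validation and backend checks only."""
    problem = validate_body(body)
    if problem is not None:
        # Status 400 and the code "invalid_body" are assumed values.
        return 400, error_envelope(problem, "invalid_request_error", "invalid_body")
    if not backend_configured:
        return 503, error_envelope(
            "No inference backend is configured.", "service_unavailable", "runtime_pending"
        )
    # Placeholder: a real gateway would proxy to the backend and stream the reply here.
    return 200, {"status": "accepted"}
```

A caller can branch on error.code rather than on the message text, which keeps client handling stable if the wording changes.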

Credits are funded in $BASIS out-of-band — the payer either holds $BASIS or routes ETH, WETH, or USDC into $BASIS at payment time through the payment router (user-signed, no custody). Funding a balance is a distinct flow from running a completion; a caller waiting for tokens never waits on a swap.

Model selection

The requested model is resolved against the network model registry. An empty model falls back to basis-default; an unknown model is rejected. Each model carries a pricing/reward multiplier and a context window, and is marked planned until the runtime confirms a worker actually serving it — so a model is never advertised as live merely because it appears in the registry.
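The resolution rules above fit in a few lines. In this sketch the registry contents, the field names, the basis-point unit for the multiplier, and the model example-large are all hypothetical; only the basis-default fallback, the rejection of unknown models, and the planned status come from the text.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelEntry:
    model_id: str
    multiplier_bps: int       # pricing/reward multiplier in basis points (assumed unit)
    context_window: int       # maximum context length in tokens
    status: str = "planned"   # stays "planned" until a worker is confirmed serving it


class UnknownModelError(ValueError):
    """Raised when the requested model is not in the registry."""


# Hypothetical registry contents, for illustration only.
REGISTRY = {
    "basis-default": ModelEntry("basis-default", 10_000, 8_192, status="live"),
    "example-large": ModelEntry("example-large", 25_000, 32_768),
}


def resolve_model(requested: Optional[str], registry: dict[str, ModelEntry] = REGISTRY) -> ModelEntry:
    """Empty or missing model falls back to basis-default; unknown models are rejected."""
    model_id = requested or "basis-default"
    entry = registry.get(model_id)
    if entry is None:
        raise UnknownModelError(f"unknown model: {model_id}")
    return entry
```

Because status defaults to "planned", adding a registry entry never advertises a model as live by accident; something else has to flip it.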

Worker matching

On the contributor-mesh path (the alternative to a hosted upstream), the orchestrator holds the live worker registry. To place a job it considers only workers that are idle and currently serving the requested model, then picks one by weighted-random selection — weighted by each candidate's measured tokens-per-second. Faster workers are proportionally more likely to be chosen, while randomness spreads load and avoids starving slower contributors. A worker proves it is reachable by heartbeating; one that stops beating drops out of the idle set.
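A minimal sketch of that selection, assuming a hypothetical Worker record and a 30-second heartbeat timeout (the real timeout is not stated here). It filters to idle, live workers serving the model, then draws one with probability proportional to tokens-per-second using the standard library's random.choices.

```python
from __future__ import annotations

import random
from dataclasses import dataclass
from typing import Optional

HEARTBEAT_TIMEOUT_S = 30.0  # assumed value; the actual timeout is not documented here


@dataclass
class Worker:
    worker_id: str
    models: frozenset          # model ids this worker is currently serving
    tokens_per_second: float   # measured throughput, used as the selection weight
    idle: bool
    last_heartbeat: float      # timestamp of the most recent heartbeat, in seconds


def eligible(workers: list[Worker], model: str, now: float) -> list[Worker]:
    """Keep workers that are idle, serve the model, and have a recent heartbeat."""
    return [
        w for w in workers
        if w.idle
        and model in w.models
        and now - w.last_heartbeat <= HEARTBEAT_TIMEOUT_S
        and w.tokens_per_second > 0
    ]


def pick_worker(
    workers: list[Worker], model: str, now: float, rng: Optional[random.Random] = None
) -> Optional[Worker]:
    """Weighted-random choice: selection probability is proportional to tokens_per_second."""
    rng = rng if rng is not None else random.Random()
    candidates = eligible(workers, model, now)
    if not candidates:
        return None
    weights = [w.tokens_per_second for w in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]
```

With two eligible workers measured at 90 and 10 tokens per second, the faster one would be expected to take about 90% of jobs, while the slower one still receives work.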

Streaming

Output tokens relay back to the caller as Server-Sent Events. Each chunk is an OpenAI-shaped chat.completion.chunk carrying a content delta, and the stream terminates with a final stop chunk followed by data: [DONE]. Non-streaming requests instead await the full completion and return a single OpenAI-shaped body. Either way the caller never has to learn a new wire format.
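On the client side, consuming the stream amounts to reading data: lines until the [DONE] sentinel. The sketch below assumes the standard OpenAI chunk layout (choices[].delta.content) and works on lines that have already been read from the response; the sample payloads are invented for illustration.

```python
import json
from typing import Iterable, Iterator


def iter_content_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield content deltas from SSE lines, stopping at the data: [DONE] sentinel."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                yield content


# Invented sample stream: two content chunks, a final stop chunk, then the sentinel.
SAMPLE_STREAM = [
    'data: {"object": "chat.completion.chunk", "choices": [{"index": 0, "delta": {"content": "Hello"}}]}',
    "",
    'data: {"object": "chat.completion.chunk", "choices": [{"index": 0, "delta": {"content": " world"}}]}',
    "",
    'data: {"object": "chat.completion.chunk", "choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}]}',
    "",
    "data: [DONE]",
]

text = "".join(iter_content_deltas(SAMPLE_STREAM))
```

The final stop chunk carries an empty delta, so it contributes no text; the loop ends only when the sentinel arrives.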

Authoritative token accounting

The server counts output tokens as they stream, and that server count is authoritative. A worker-reported count is advisory only and never becomes the billing or reward source of truth — it cannot inflate what a user is charged or what a worker earns. All downstream accounting uses the server count.

Invariant

Server-counted output tokens are authoritative; worker-reported counts are advisory. Charges and rewards are computed in $BASIS base units with integer (bigint) math — no floating point in the money path.
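To make the invariant concrete, here is a sketch of an integer-only charge. The formula itself (tokens times a per-token price times a basis-point multiplier) and the parameter names are assumptions, since the pricing formula is not given here. Python's int is arbitrary-precision, so it plays the role that bigint plays in JavaScript.

```python
BPS = 10_000  # basis points per whole unit (assumed representation of the multiplier)


def billable_tokens(server_count: int, worker_reported: int) -> int:
    """The server count is authoritative; the worker-reported count is ignored for billing."""
    return server_count


def charge_base_units(output_tokens: int, price_per_token: int, multiplier_bps: int) -> int:
    """Hypothetical charge formula in $BASIS base units, using integer arithmetic only."""
    for value in (output_tokens, price_per_token, multiplier_bps):
        # bool is a subclass of int in Python, so it is rejected explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or value < 0:
            raise TypeError("amounts must be non-negative integers")
    # Multiply first, divide last, so truncation happens exactly once.
    return output_tokens * price_per_token * multiplier_bps // BPS
```

Dividing last keeps the rounding deterministic: every node that recomputes the charge from the same inputs gets the same integer, which a floating-point path would not guarantee.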

Receipt creation

On completion the server builds a receipt: a canonical-JSON record of the job — model, server-counted prompt and output tokens, the charge, and the worker reward — hashed with SHA-256. Writing the receipt is the idempotency gate: a duplicate jobId or receiptHash is rejected before any credit or reward mutation, so a job can never be settled — or paid — twice. Anyone can re-derive the hash from the receipt body.
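The hashing and the idempotency gate can be illustrated as below. The canonicalization used here (sorted keys, compact separators, UTF-8) is an assumption standing in for whatever canonical-JSON rules Basis actually applies, and every field name other than jobId is hypothetical. Amounts are carried as strings because large integers do not survive every JSON parser.

```python
import hashlib
import json


def canonical_json(record: dict) -> bytes:
    """Assumed canonical form: sorted keys, no insignificant whitespace, UTF-8."""
    return json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def receipt_hash(record: dict) -> str:
    """SHA-256 over the canonical body; anyone holding the body can recompute it."""
    return hashlib.sha256(canonical_json(record)).hexdigest()


class DuplicateReceiptError(Exception):
    """Raised when a jobId or receipt hash has already been written."""


class ReceiptLedger:
    """In-memory stand-in for the receipt ledger."""

    def __init__(self) -> None:
        self._by_job = {}
        self._hashes = set()

    def write(self, record: dict) -> str:
        digest = receipt_hash(record)
        # The gate: reject before any credit or reward mutation takes place.
        if record["jobId"] in self._by_job or digest in self._hashes:
            raise DuplicateReceiptError(record["jobId"])
        self._by_job[record["jobId"]] = record
        self._hashes.add(digest)
        return digest


# Hypothetical receipt body; field names other than jobId are illustrative.
example_receipt = {
    "jobId": "job-001",
    "model": "basis-default",
    "promptTokens": 12,
    "outputTokens": 340,
    "charge": "680000000000000",
    "workerReward": "476000000000000",
}
```

Because the hash depends only on the canonical bytes, two parties that serialize the same fields in a different key order still arrive at the same digest.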

See inference receipts for the full field list.

Reward calculation

The worker reward is computed from the server-counted tokens, the model multiplier, and the deterministic reward formula, then accrued to the reward ledger against the worker's EVM address. Jobs served by the gateway itself carry no contributor wallet and therefore accrue no reward — they are metered, not paid. A failed job is not charged to the user and produces no worker reward.
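The accrual rules can be sketched as follows. The reward formula, the per-token rate, and the lowercase address normalization are assumptions made for illustration; the rules taken from the text are that failed jobs and gateway-served jobs accrue nothing and that the arithmetic stays in integers.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Optional

BPS = 10_000  # basis points per whole unit (assumed representation of the multiplier)


def worker_reward(output_tokens: int, reward_per_token: int, multiplier_bps: int) -> int:
    """Hypothetical reward formula in $BASIS base units, integer arithmetic only."""
    return output_tokens * reward_per_token * multiplier_bps // BPS


class RewardLedger:
    """In-memory stand-in for the reward ledger, keyed by worker EVM address."""

    def __init__(self) -> None:
        self.accrued: dict[str, int] = defaultdict(int)

    def accrue(
        self,
        worker_address: Optional[str],
        succeeded: bool,
        output_tokens: int,
        reward_per_token: int,
        multiplier_bps: int,
    ) -> int:
        # Failed jobs earn nothing; gateway-served jobs have no worker wallet.
        if not succeeded or worker_address is None:
            return 0
        reward = worker_reward(output_tokens, reward_per_token, multiplier_bps)
        # Lowercasing is an assumed normalization so one wallet maps to one ledger row.
        self.accrued[worker_address.lower()] += reward
        return reward
```

Nothing here touches the chain: accrual only updates a balance that a later settlement batch reads.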

Settlement batching

Settlement is deliberately separate from the inference hot path: no on-chain write happens while a user is waiting for tokens. Instead a keeper periodically batches accrued rewards from the reward ledger, computes a batch hash, and settles the batch on Base. Settlement is idempotent — a given batch hash settles exactly once, and a batch that fails mid-settlement stays recoverable rather than double-paying.
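The idempotency property can be illustrated with a small keeper stand-in. Everything specific here is an assumption: the batch encoding, the use of SHA-256 for the batch hash, the status strings, and the submit callable that stands in for the on-chain write. A real keeper would also need the settled-hash check to be durable rather than held in memory.

```python
from __future__ import annotations

import hashlib
import json
from typing import Callable


def batch_hash(rewards: dict[str, int]) -> str:
    """Deterministic hash of a batch: entries sorted by address, amounts as strings."""
    entries = sorted((address.lower(), str(amount)) for address, amount in rewards.items())
    payload = json.dumps(entries, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


class Keeper:
    """Settles each batch hash at most once; a failed batch can be retried safely."""

    def __init__(self, submit: Callable[[str, dict[str, int]], None]) -> None:
        self._submit = submit          # hypothetical hook that performs the on-chain write
        self.settled: set[str] = set()

    def settle(self, rewards: dict[str, int]) -> str:
        digest = batch_hash(rewards)
        if digest in self.settled:
            return "already_settled"   # a repeat submission pays nothing
        try:
            self._submit(digest, rewards)
        except Exception:
            return "failed"            # not recorded as settled, so a retry is allowed
        self.settled.add(digest)
        return "settled"
```

The batch is marked settled only after the submit call returns, so a failure leaves the accrued rewards in place for the next attempt instead of losing or duplicating them.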

See settlement for batch formation and keeper operation.

Pending states

The flow above is what runs when everything is configured. Until configuration is supplied, several pieces report their actual state instead of simulating a working one.

Inference backend / worker path (pending)
With no backend configured, intake returns runtime_pending; the orchestrator + worker path runs locally or self-hosted, with a hosted deployment pending.

Persistence (process-local)
Ledgers and receipts live in a process-local, non-durable store that resets on a cold start until a durable store is configured.

On-chain settlement (pending)
The keeper can run as a dry-run and compute batch hashes, but it does not write on-chain until the reward distributor address is configured.