Platform Concepts

This page is the mental model for Bolti. Read it once and the rest of the docs make sense in half the time.

We'll walk through the four big ideas in order — the voice pipeline, the agent, the resource model (organizations, workspaces, etc.), and the runtime that ties it all together.

1. The voice pipeline

Every Bolti call runs the same loop, many times per second:

Caller's voice  ──▶  STT (Speech-to-Text)
                       │
                       ▼
                Transcript text
                       │
                       ▼
            LLM (with system prompt + tools)
                       │
                       ▼
                Reply text  ──▶  Tool calls (optional)
                       │            │
                       │            ▼
                       │      Tool results fed back to LLM
                       ▼
              TTS (Text-to-Speech)
                       │
                       ▼
              Synthesized audio  ──▶  Caller's ear

Three components do the heavy lifting:

Component	Role	Common providers
STT	Transcribes the caller's audio in real time	Deepgram, AssemblyAI, ElevenLabs, Cartesia, Azure
LLM	Picks the next thing to say (and which tools to call)	OpenAI, Gemini, Groq, Baseten, DeepSeek
TTS	Synthesizes the agent's reply back into audio	Cartesia, ElevenLabs, SarvamAI, SmallestAI, Inworld

Around those three, Bolti adds the things that make a voice call feel natural:

Voice activity detection (VAD) — figures out when the caller has stopped speaking
Turn detection — decides when it's the agent's turn to talk
Interruption handling — lets the caller cut the agent off mid-sentence
Telephony noise cancellation — strips line noise so STT works on real phone lines
Streaming — STT, LLM, and TTS all stream so the agent can start replying before the caller has fully finished

Every one of these is configurable per agent. See Agent Setup for the per-tab breakdown.

2. The agent

An agent is a configured instance of the voice pipeline plus the prompt and tools that define its behavior. In the Bolti data model, an agent is a single row with:

Identity: name, agent ID, active flag
Behavior: system prompt, first message (greeting), persona, language, timezone, goal, guardrails
Pipeline: chosen LLM provider/model, TTS provider/voice, STT provider/model/language, plus speed/pitch/volume tuning
Capabilities: tools attached to it (built-in + workspace HTTP tools), knowledge bases, dynamic context
Telephony: phone numbers assigned for inbound calls

You configure all of this in the dashboard's Settings tabs, or via the API (POST /workspaces/{ws}/agents).

Agents are the unit of deployment

When you change an agent, the change applies to the next call — in-flight calls finish on their old configuration. There's no "deploy" step. No rebuild. No restart. The runtime fetches the current agent config when each call starts.

This is deliberate: it means you can iterate live without affecting current callers, and that the dashboard, API, and MCP server are all just CRUD on the same agents resource.

A single agent runs many concurrent calls

One agent can handle hundreds of simultaneous calls. The agent is a configuration, not a process. The runtime spins up a fresh session per call, each isolated from the others, each with its own conversation history.

Concurrency limits apply at the workspace level (your tier), not per-agent.

3. The resource model

Bolti is built for teams, agencies, and platforms — not just solo developers. The resource hierarchy reflects that:

Organization
  └── Workspace 1
  │     ├── Agents
  │     ├── Phone numbers
  │     ├── Workspace tools (HTTP)
  │     ├── Knowledge bases
  │     ├── Call logs / recordings / transcripts
  │     └── SIP trunks
  └── Workspace 2
        └── ... (same shape)

Organization

Your top-level account. Owns billing, members, roles, and (on Enterprise) sub-accounts. Everyone who joins your team joins your organization.

Workspace

A scoped sandbox inside the organization. Almost every resource in Bolti — agents, phone numbers, tools, call logs — belongs to a single workspace. Workspaces are the natural way to:

Separate environments (production vs staging)
Separate customers (one workspace per client, for agencies)
Separate teams (sales vs support)

Members get permissions per workspace. Resources don't cross workspaces — you can move agents between workspaces explicitly, but they can't be shared.

Members and roles

Members live at the organization level. Each member has a role per workspace (e.g. owner, admin, member, viewer). Roles control what you can edit, who you can invite, and whether you can manage billing.

Sub-accounts (Enterprise)

Sub-accounts are a layer above organizations, designed for platforms reselling Bolti to their own customers. Each sub-account gets its own org, full data isolation, separate concurrency limits, and rolls up into the parent org for unified billing. See Enterprise → Sub-Accounts.

4. The runtime

This is the part most teams never need to think about — but if you're evaluating Bolti for production use, here's how it works.

Every call is a realtime room

Every call — phone call or browser preview — runs as a realtime call room. The caller (a SIP participant for phone calls, a WebRTC participant for browser preview) joins the room. A Python agent worker also joins the room and runs the voice pipeline.

The agent worker:

Receives the caller's audio frame-by-frame from the room
Streams it to the configured STT provider
Sends transcripts to the configured LLM with the agent's system prompt
Streams the LLM's reply to the configured TTS provider
Publishes the synthesized audio back into the room

This room-per-call design buys us a few things:

WebRTC and SIP look identical to the agent. The same code handles a browser preview and a real PSTN call.
Recording is a separate process. A recording worker subscribes to the room and writes audio to object storage in parallel — it doesn't slow down the call.
Horizontal scaling is just more workers. Incoming calls are dispatched across the worker pool; we run as many replicas as concurrent calls demand.
Calls survive backend restarts. The backend is for control-plane CRUD; the actual audio flows through the realtime stack, which is its own service.

How a call lifecycle looks

For a phone call:

Inbound SIP INVITE arrives at the trunk → the realtime stack accepts it
Dispatch rule routes the SIP participant into a fresh room
Backend issues a token for the agent (with agent_id in metadata) and dispatches the agent
Agent worker picks up the dispatch, fetches the agent's current config, builds the pipeline, joins the room
Conversation runs — STT/LLM/TTS streaming, tool calls firing, transcripts appended to the call log
Recording worker writes the audio to private object storage in parallel
On hangup, webhooks update the call log status; the recording URL is finalized; the conversation is queryable from the API

For an outbound call:

The backend asks the realtime stack to dial out (creating a SIP participant) instead of receiving an INVITE.
Everything else is identical.

For a browser preview:

The frontend is the participant (WebRTC, not SIP).
Everything else is identical.

Recording, transcripts, and call logs

Recordings are written to private object storage during the call by the egress worker. Playback in the dashboard happens via short-lived signed URLs (default 15 minutes).
Transcripts are streamed live during the call — every speech turn is appended to the conversation row as it commits.
Call logs combine status, duration, transcript, recording URL, and any extracted data into a single record. Available in the dashboard, via the API, and pushable via webhooks.

See Conversation Intelligence for the full surface.

5. The control surfaces

Three different ways to drive the same underlying platform — pick whichever fits your workflow:

Surface	What it's for	Audience
Dashboard (app.bolti.co.in)	Visual CRUD for agents, numbers, tools, logs. Browser-based testing.	Everyone on the team
REST API (api.bolti.co.in/reference)	Programmatic access from any language. Automation, integrations, custom UIs.	Developers, integrations
MCP server (Bolti MCP)	Drive Bolti from Cursor, Claude Desktop, any MCP client. Agent operations from inside your editor.	Developers using AI tools

All three talk to the same backend, see the same data, and respect the same workspace permissions. There's no "primary" surface — pick what fits the task.

6. The mental shortcut

If you remember three things:

An agent is a configuration, not a process. Edit it, save it, and the change applies to the next call. No deploy.
Workspaces are the boundary for almost everything — agents, numbers, tools, logs, members.
Phone calls and browser previews are the same thing to the runtime. If it works in Preview, it works on a phone (modulo audio codec quality).

That's the whole platform. Everything else in these docs is just the details of which knob does what.

Where to next

Build something: Quick Start →
Configure an agent end-to-end: Agent Setup overview →
Understand the data model: Workspaces & Organizations →
Explore the API: API Reference →

1. The voice pipeline​

2. The agent​

Agents are the unit of deployment​

A single agent runs many concurrent calls​

3. The resource model​

Organization​

Workspace​

Members and roles​

Sub-accounts (Enterprise)​

4. The runtime​

Every call is a realtime room​

How a call lifecycle looks​

Recording, transcripts, and call logs​

5. The control surfaces​

6. The mental shortcut​

Where to next​