Understanding Providers

Every voice call on Bolti is powered by four categories of providers working together. You don't pick a provider once and live with it — you choose them per agent, and you can mix them however you like.

Caller's audio
   │
   ▼
[STT]  Speech-to-Text         ── transcribes speech to text
   │
   ▼
[LLM]  Large Language Model   ── decides what to say (and which tools to call)
   │
   ▼
[TTS]  Text-to-Speech         ── synthesizes the agent's voice
   │
   ▼
[Telephony]                   ── carries the call over PSTN/SIP

This page is an opinionated tour of what each category does, who Bolti supports, and how to choose. It's not a full reference — for that, jump to Agent Setup or the relevant deep-dive page.

Why provider choice matters

For a real-time voice agent, you're optimizing for three competing things:

Latency — how fast the round trip from caller speech → agent reply happens. Anything over ~800ms feels sluggish.
Quality — how natural the voice sounds, how accurate the transcription is, how good the model's reasoning is.
Cost — every minute on the phone is paying STT + LLM tokens + TTS characters + telephony minutes.

Providers trade these off differently. Bolti's defaults are tuned to be a reasonable middle ground; you'll change them when you've validated your use case and know which axis you actually need to push on.

STT — Speech-to-Text

Turns the caller's audio into text the LLM can read. The biggest impact on perceived latency in the call (because the LLM can't start thinking until the STT decides the caller is done speaking).

Supported providers

Provider	Notes
Deepgram	Strong default for English and most major languages. Low latency, accurate.
AssemblyAI	Great for conversational English with diarization-friendly behavior.
Cartesia	Very low latency, English-focused.
ElevenLabs	Modern multilingual STT, pairs naturally with their TTS.
Azure	Wide language coverage, strong enterprise compliance story.
Fennec	Optimized for Indian languages and accents (Hindi, Tamil, Telugu, etc.).

The exact list of supported models per provider — and which languages each model supports — surfaces in the Speech tab of an agent's settings. Bolti keeps that list in sync with what's actually deployed.

How to choose

Sticking with English? Deepgram is the safe default.
Indian-language calls? Fennec or Sarvam-backed STT will outperform global vendors.
Need ultra-low latency? Cartesia tends to win head-to-head.
Compliance-driven (e.g. healthcare, finance)? Azure has the broadest certifications.

Configure all of this in the agent's Speech tab. Details: Agent Setup → Speech.

LLM — Large Language Model

The brain. Reads the conversation transcript so far and decides what the agent should say next, and which tools to invoke. This is the biggest knob for the agent's behavior.

Supported providers

Provider	Models we support
OpenAI	GPT-class models (4o-family, o-mini variants, etc.)
Google Gemini	Gemini 2 Flash and Pro variants
Groq	Llama-family models served on Groq's accelerators (very low latency)
DeepSeek	DeepSeek chat and reasoning models
Baseten	Custom-deployed open models

The full live list (with current model IDs) is in the agent's LLM tab.

How to choose

This is mostly a latency vs. capability tradeoff:

Fastest replies, lowest cost → Groq Llama models. Surprisingly capable for the price; great for scripted-feeling agents.
Best general reasoning → OpenAI GPT-4o family or Gemini 2 Pro. Use when the agent has to handle ambiguous customer requests or long, branching conversations.
Mid-tier balanced → Gemini 2 Flash, GPT-4o-mini, DeepSeek chat. Good defaults if you don't know yet.
Custom fine-tunes → Baseten lets you ship your own model.

The LLM also chooses which tool to call mid-conversation, so its capability directly affects tool-calling reliability. If you're seeing the model fail to invoke tools correctly, upgrading to a stronger LLM is usually the first fix.

Details: Agent Setup → LLM.

Bring your own LLM

Bolti also supports pointing an agent at a custom OpenAI-compatible endpoint — useful if you self-host a model or have negotiated direct vendor pricing. See Customizations → Custom LLMs.

TTS — Text-to-Speech

Speaks the agent's reply. This is the part of the stack callers literally hear, so it has the most direct impact on how "human" your agent feels.

Supported providers

Provider	Strengths
ElevenLabs	Best-in-class realism for English. Huge library of voices.
Cartesia	Very low latency, expressive prosody. Strong default for real-time.
SarvamAI	Indian-language voices (Hindi, Tamil, Telugu, Bengali, etc.) with native prosody.
SmallestAI	Fast, cost-efficient English voices.
Inworld	Character-style voices, good for branded experiences.

How to choose

Caller-facing English brand voice → ElevenLabs. Often the right choice even if it costs more — it's the difference between sounding like an "AI" and sounding like a person.
Indian-language outbound → SarvamAI for native-sounding regional voices.
High-volume cost-sensitive → SmallestAI or Cartesia.
Real-time / very latency-sensitive → Cartesia.

Bolti ships a curated list of preset voices per provider, with previewable samples in the Voice tab. You can also enter a custom voice ID if you've cloned or trained one with the provider directly.

Details: Agent Setup → Voice.

Telephony — the phone network

Everything above happens over a phone line. The telephony provider is what gives you a number callers can dial, and what carries audio into and out of the realtime room where the agent runs.

Supported options

Option	What it is
Bolti-managed numbers	One-click purchase of a DID inside the dashboard. Bolti handles provisioning, billing, and carrier setup. The fastest way to get on the phone.
Twilio / Plivo / Exotel / Vobiz	Bring your own provider account. Connect credentials in Integrations, and your existing numbers become available to assign to agents.
BYO SIP trunk (BYOT)	Register a SIP trunk with credentials from any SIP provider. Bring your existing carrier relationship and just use Bolti as the orchestration layer.

How to choose

Just want to make a call now? Buy a Bolti-managed number. You'll be live in about 60 seconds.
Already have Twilio/Plivo/Exotel/Vobiz? Connect it as an integration so you keep your existing numbers, billing, and compliance setup.
Have your own carrier or SIP provider? Register a SIP trunk (BYOT) and assign DIDs to it.

Details: Telephony → Connect a Provider and SIP Trunking.

How providers are billed

Two models, depending on the provider category:

Bolti-managed (default)

For STT, LLM, TTS, and Bolti-purchased phone numbers, Bolti pays the underlying provider and bills you a transparent per-minute rate that bundles all four. You don't manage API keys for any of these — the platform handles credentials, rotation, and quotas. This is what every account gets out of the box.

See exact rates on the pricing page.

Bring-your-own (BYO) where supported

Some categories let you supply your own credentials so the provider bills you directly:

LLM — point an agent at a custom OpenAI-compatible endpoint (Custom LLMs).
Telephony — connect Twilio / Plivo / Exotel / Vobiz, or register a SIP trunk (Telephony → Connect a Provider).

In BYO mode, Bolti charges only the platform fee for orchestration; the provider bills you directly for usage.

STT and TTS are platform-managed today.

Picking your first stack

If you're just getting started and don't know what to pick, this is a sensible default:

Category	Pick
STT	Deepgram (English) or Fennec (Indian languages)
LLM	Gemini 2 Flash or GPT-4o-mini — fast, cheap, capable
TTS	Cartesia for low latency, ElevenLabs if voice quality is the headline feature
Telephony	Bolti-managed number for the fastest path to a real call

Run a few real test calls with this stack first. Once you have a baseline, swap one provider at a time and listen to the difference.

Next: get your team in

Once you understand the provider stack, the last onboarding step is making sure the right people can collaborate on it. Continue with Invite Your Team →.

Why provider choice matters​

STT — Speech-to-Text​

Supported providers​

How to choose​

LLM — Large Language Model​

Supported providers​

How to choose​

Bring your own LLM​

TTS — Text-to-Speech​

Supported providers​

How to choose​

Telephony — the phone network​

Supported options​

How to choose​

How providers are billed​

Bolti-managed (default)​

Bring-your-own (BYO) where supported​

Picking your first stack​

Next: get your team in​

Why provider choice matters

STT — Speech-to-Text

Supported providers

How to choose

LLM — Large Language Model

Supported providers

How to choose

Bring your own LLM

TTS — Text-to-Speech

Supported providers

How to choose

Telephony — the phone network

Supported options

How to choose

How providers are billed

Bolti-managed (default)

Bring-your-own (BYO) where supported

Picking your first stack

Next: get your team in