Understanding Providers
Every voice call on Bolti is powered by four categories of providers working together. You don't pick a provider once and live with it — you choose them per agent, and you can mix them however you like.
Caller's audio
│
▼
[STT] Speech-to-Text ── transcribes speech to text
│
▼
[LLM] Large Language Model ── decides what to say (and which tools to call)
│
▼
[TTS] Text-to-Speech ── synthesizes the agent's voice
│
▼
[Telephony] ── carries the call over PSTN/SIP
This page is an opinionated tour of what each category does, who Bolti supports, and how to choose. It's not a full reference — for that, jump to Agent Setup or the relevant deep-dive page.
Why provider choice matters
For a real-time voice agent, you're optimizing for three competing things:
- Latency — how fast the round trip from caller speech → agent reply happens. Anything over ~800ms feels sluggish.
- Quality — how natural the voice sounds, how accurate the transcription is, how good the model's reasoning is.
- Cost — every minute on the phone is paying STT + LLM tokens + TTS characters + telephony minutes.
Providers trade these off differently. Bolti's defaults are tuned to be a reasonable middle ground; you'll change them when you've validated your use case and know which axis you actually need to push on.
STT — Speech-to-Text
Turns the caller's audio into text the LLM can read. The biggest impact on perceived latency in the call (because the LLM can't start thinking until the STT decides the caller is done speaking).
Supported providers
| Provider | Notes |
|---|---|
| Deepgram | Strong default for English and most major languages. Low latency, accurate. |
| AssemblyAI | Great for conversational English with diarization-friendly behavior. |
| Cartesia | Very low latency, English-focused. |
| ElevenLabs | Modern multilingual STT, pairs naturally with their TTS. |
| Azure | Wide language coverage, strong enterprise compliance story. |
| Fennec | Optimized for Indian languages and accents (Hindi, Tamil, Telugu, etc.). |
The exact list of supported models per provider — and which languages each model supports — surfaces in the Speech tab of an agent's settings. Bolti keeps that list in sync with what's actually deployed.
How to choose
- Sticking with English? Deepgram is the safe default.
- Indian-language calls? Fennec or Sarvam-backed STT will outperform global vendors.
- Need ultra-low latency? Cartesia tends to win head-to-head.
- Compliance-driven (e.g. healthcare, finance)? Azure has the broadest certifications.
Configure all of this in the agent's Speech tab. Details: Agent Setup → Speech.
LLM — Large Language Model
The brain. Reads the conversation transcript so far and decides what the agent should say next, and which tools to invoke. This is the biggest knob for the agent's behavior.
Supported providers
| Provider | Models we support |
|---|---|
| OpenAI | GPT-class models (4o-family, o-mini variants, etc.) |
| Google Gemini | Gemini 2 Flash and Pro variants |
| Groq | Llama-family models served on Groq's accelerators (very low latency) |
| DeepSeek | DeepSeek chat and reasoning models |
| Baseten | Custom-deployed open models |
The full live list (with current model IDs) is in the agent's LLM tab.
How to choose
This is mostly a latency vs. capability tradeoff:
- Fastest replies, lowest cost → Groq Llama models. Surprisingly capable for the price; great for scripted-feeling agents.
- Best general reasoning → OpenAI GPT-4o family or Gemini 2 Pro. Use when the agent has to handle ambiguous customer requests or long, branching conversations.
- Mid-tier balanced → Gemini 2 Flash, GPT-4o-mini, DeepSeek chat. Good defaults if you don't know yet.
- Custom fine-tunes → Baseten lets you ship your own model.
The LLM also chooses which tool to call mid-conversation, so its capability directly affects tool-calling reliability. If you're seeing the model fail to invoke tools correctly, upgrading to a stronger LLM is usually the first fix.
Details: Agent Setup → LLM.
Bring your own LLM
Bolti also supports pointing an agent at a custom OpenAI-compatible endpoint — useful if you self-host a model or have negotiated direct vendor pricing. See Customizations → Custom LLMs.
TTS — Text-to-Speech
Speaks the agent's reply. This is the part of the stack callers literally hear, so it has the most direct impact on how "human" your agent feels.
Supported providers
| Provider | Strengths |
|---|---|
| ElevenLabs | Best-in-class realism for English. Huge library of voices. |
| Cartesia | Very low latency, expressive prosody. Strong default for real-time. |
| SarvamAI | Indian-language voices (Hindi, Tamil, Telugu, Bengali, etc.) with native prosody. |
| SmallestAI | Fast, cost-efficient English voices. |
| Inworld | Character-style voices, good for branded experiences. |
How to choose
- Caller-facing English brand voice → ElevenLabs. Often the right choice even if it costs more — it's the difference between sounding like an "AI" and sounding like a person.
- Indian-language outbound → SarvamAI for native-sounding regional voices.
- High-volume cost-sensitive → SmallestAI or Cartesia.
- Real-time / very latency-sensitive → Cartesia.
Bolti ships a curated list of preset voices per provider, with previewable samples in the Voice tab. You can also enter a custom voice ID if you've cloned or trained one with the provider directly.
Details: Agent Setup → Voice.
Telephony — the phone network
Everything above happens over a phone line. The telephony provider is what gives you a number callers can dial, and what carries audio into and out of the realtime room where the agent runs.
Supported options
| Option | What it is |
|---|---|
| Bolti-managed numbers | One-click purchase of a DID inside the dashboard. Bolti handles provisioning, billing, and carrier setup. The fastest way to get on the phone. |
| Twilio / Plivo / Exotel / Vobiz | Bring your own provider account. Connect credentials in Integrations, and your existing numbers become available to assign to agents. |
| BYO SIP trunk (BYOT) | Register a SIP trunk with credentials from any SIP provider. Bring your existing carrier relationship and just use Bolti as the orchestration layer. |
How to choose
- Just want to make a call now? Buy a Bolti-managed number. You'll be live in about 60 seconds.
- Already have Twilio/Plivo/Exotel/Vobiz? Connect it as an integration so you keep your existing numbers, billing, and compliance setup.
- Have your own carrier or SIP provider? Register a SIP trunk (BYOT) and assign DIDs to it.
Details: Telephony → Connect a Provider and SIP Trunking.
How providers are billed
Two models, depending on the provider category:
Bolti-managed (default)
For STT, LLM, TTS, and Bolti-purchased phone numbers, Bolti pays the underlying provider and bills you a transparent per-minute rate that bundles all four. You don't manage API keys for any of these — the platform handles credentials, rotation, and quotas. This is what every account gets out of the box.
See exact rates on the pricing page.
Bring-your-own (BYO) where supported
Some categories let you supply your own credentials so the provider bills you directly:
- LLM — point an agent at a custom OpenAI-compatible endpoint (Custom LLMs).
- Telephony — connect Twilio / Plivo / Exotel / Vobiz, or register a SIP trunk (Telephony → Connect a Provider).
In BYO mode, Bolti charges only the platform fee for orchestration; the provider bills you directly for usage.
STT and TTS are platform-managed today.
Picking your first stack
If you're just getting started and don't know what to pick, this is a sensible default:
| Category | Pick |
|---|---|
| STT | Deepgram (English) or Fennec (Indian languages) |
| LLM | Gemini 2 Flash or GPT-4o-mini — fast, cheap, capable |
| TTS | Cartesia for low latency, ElevenLabs if voice quality is the headline feature |
| Telephony | Bolti-managed number for the fastest path to a real call |
Run a few real test calls with this stack first. Once you have a baseline, swap one provider at a time and listen to the difference.
Next: get your team in
Once you understand the provider stack, the last onboarding step is making sure the right people can collaborate on it. Continue with Invite Your Team →.