Speech Tab
The Speech tab controls how your agent hears — which speech-to-text (STT) engine transcribes the caller's audio in real time.
There are three settings:
| Field | What it controls |
|---|---|
| STT Provider | The transcription service (Deepgram, AssemblyAI, ElevenLabs, etc.) |
| STT Model | The specific model within that provider (e.g. nova-3, universal-streaming) |
| STT Language | The expected language of the caller's audio |
Switching the provider auto-selects that provider's first model — change the model right after if needed.
Supported providers
| Provider | Models | Notes |
|---|---|---|
| Deepgram | nova-3, nova-2 | Default. Excellent latency, broad language support including Hindi and Indian English |
| AssemblyAI | universal-streaming, universal-streaming-multilingual | High accuracy, especially for English and Spanish |
| ElevenLabs | scribe_v2_realtime | Real-time scribe with very low latency |
| Cartesia | ink-whisper | Whisper-based, supports 90+ languages |
| Azure | default | Microsoft Azure Speech, enterprise-grade |
| Fennec | fennec-asr | Specialty model |
The default for new agents is Deepgram / nova-3, which is the right choice for the vast majority of use cases — fast, accurate, multilingual, and battle-tested.
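To make the "first model is auto-selected" behavior concrete, here is a minimal Python sketch of the provider table as data. It is an illustration only, not Bolti's actual API: the names STT_MODELS, SpeechSettings, and switch_provider are hypothetical, and keeping the language unchanged on a provider switch is an assumption the documentation does not confirm.

```python
from dataclasses import dataclass, replace

# Provider -> models, copied from the table above. Order matters here:
# the first entry is the model that gets auto-selected on a provider switch.
STT_MODELS = {
    "Deepgram": ["nova-3", "nova-2"],
    "AssemblyAI": ["universal-streaming", "universal-streaming-multilingual"],
    "ElevenLabs": ["scribe_v2_realtime"],
    "Cartesia": ["ink-whisper"],
    "Azure": ["default"],
    "Fennec": ["fennec-asr"],
}


@dataclass(frozen=True)
class SpeechSettings:
    """Hypothetical container for the three Speech tab fields."""
    provider: str
    model: str
    language: str


def switch_provider(settings: SpeechSettings, provider: str) -> SpeechSettings:
    """Return new settings with the provider changed and its first model selected.

    Assumption: the language code is carried over unchanged.
    """
    if provider not in STT_MODELS:
        raise ValueError(f"unknown STT provider: {provider}")
    return replace(settings, provider=provider, model=STT_MODELS[provider][0])


current = SpeechSettings(provider="Deepgram", model="nova-3", language="en-IN")
switched = switch_provider(current, "AssemblyAI")
print(switched.model)  # universal-streaming
```

If you wanted universal-streaming-multilingual instead, you would change the model right after the switch, as in the UI.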
STT Language
Sets the expected language code for the model. Examples:
- en: generic English
- en-IN: Indian English (better for Indian accents)
- en-US, en-GB, en-AU: region-specific English
- hi: Hindi
- es, es-MX, es-ES: Spanish variants
- multi: multilingual mode (Deepgram nova-3 only; auto-detects the language)
The dropdown shows every language code supported by the currently-selected provider/model combination.
When you change Primary Language on the Basic tab, Bolti automatically resets the Speech tab to Deepgram / nova-3 with the matching language code. Only override here if you've measured a real transcription problem.
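The reset described above can be sketched as a small function. This is an illustration under stated assumptions, not Bolti's implementation: the function name and dictionary keys are hypothetical, and the sketch assumes the Primary Language is already expressed as an STT language code such as en-IN.

```python
DEFAULT_PROVIDER = "Deepgram"
DEFAULT_MODEL = "nova-3"


def reset_speech_for_primary_language(speech: dict, primary_language: str) -> dict:
    """Return the Speech settings after a Primary Language change.

    Whatever was in `speech` (including a manual override) is discarded and
    replaced with the default provider/model plus the new language code.
    """
    return {
        "provider": DEFAULT_PROVIDER,
        "model": DEFAULT_MODEL,
        "language": primary_language,  # assumed to already be a valid code
    }


# A manual override that was set earlier on the Speech tab.
override = {"provider": "Cartesia", "model": "ink-whisper", "language": "hi"}

after = reset_speech_for_primary_language(override, "en-IN")
print(after)  # {'provider': 'Deepgram', 'model': 'nova-3', 'language': 'en-IN'}
```

In practice this means that if you rely on an override, you should re-check the Speech tab after changing Primary Language.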
Picking the right combination
Use this decision tree:
- Single language, English or major European? → Deepgram nova-3 with the regional code (en-IN, en-GB, etc.).
- Hindi or other Indic language? → Deepgram nova-3 with hi. If accuracy is poor, try Cartesia ink-whisper.
- Caller might switch between two languages mid-call? → Deepgram nova-3 with multi, or AssemblyAI universal-streaming-multilingual.
- Long-form, recorded-style audio (rare for voice agents)? → AssemblyAI universal-streaming.
- Enterprise compliance / Azure tenancy required? → Azure default.
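The decision tree above can also be written down as code, which is handy if you provision many agents from a script. The sketch below is hypothetical throughout: recommend_stt and Recommendation are made-up names, the order in which the branches are checked is an assumption (the list does not define a precedence), and the Indic branch covers only Hindi because hi is the only Indic code the list names.

```python
from typing import NamedTuple, Optional, Tuple


class Recommendation(NamedTuple):
    provider: str
    model: str
    language: str
    # Optional (provider, model) pair to try if accuracy is poor.
    alternative: Optional[Tuple[str, str]] = None


def recommend_stt(
    language: str,
    *,
    code_switching: bool = False,
    long_form: bool = False,
    azure_required: bool = False,
) -> Recommendation:
    """Map the decision tree to a provider/model/language suggestion."""
    # Assumed precedence: hard requirements first, then call characteristics.
    if azure_required:
        return Recommendation("Azure", "default", language)
    if code_switching:
        return Recommendation(
            "Deepgram", "nova-3", "multi",
            alternative=("AssemblyAI", "universal-streaming-multilingual"),
        )
    if long_form:
        return Recommendation("AssemblyAI", "universal-streaming", language)
    if language == "hi":
        return Recommendation(
            "Deepgram", "nova-3", "hi", alternative=("Cartesia", "ink-whisper")
        )
    # Single language, English or major European: use the regional code as given.
    return Recommendation("Deepgram", "nova-3", language)


print(recommend_stt("en-IN"))
print(recommend_stt("hi").alternative)
print(recommend_stt("en", code_switching=True).language)
```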
Why STT matters more than you'd think
Bad transcription is the #1 cause of voice agents misbehaving. The LLM only sees the transcript — if the STT mishears "refund" as "reform", the model has no way to recover. Symptoms include:
- Agent answers a question the caller didn't ask
- Agent loops on the same question because it can't parse the response
- Agent fails to detect numbers (account IDs, phone numbers) accurately
When you see these, the fix is usually:
- Switch to a region-specific language code (en-IN instead of en)
- Try multi mode if calls cross languages
- Try a different provider; Deepgram and AssemblyAI have noticeably different strengths per language
You can verify transcription quality by reviewing past calls on the Logs tab — every utterance is shown alongside the audio.
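If you want a number rather than an impression, one common metric is word error rate (WER): the word-level edit distance between what the caller actually said and what the STT produced, divided by the number of words actually said. The sketch below assumes you have typed up a reference transcript by hand while listening to the audio; nothing here is stated to be built into the Logs tab, and word_error_rate is an illustrative helper, not a Bolti function.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")

    # prev[j] holds the edit distance between the reference words seen so
    # far (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# The "refund" vs "reform" mishearing from above: one wrong word out of four.
print(word_error_rate("i want a refund", "i want a reform"))  # 0.25
```

Comparing WER for the same handful of calls across two providers or language codes gives you a measured basis for overriding the defaults.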
STT for non-English deployments
For India and South Asia, the recommended defaults are:
- English (with Indian accents): Deepgram nova-3 + en-IN
- Hindi: Deepgram nova-3 + hi
- Hindi/English code-switching: Deepgram nova-3 + multi
For a deeper guide on multilingual setup, see Customizations → Multilingual Support.