Speech Tab

The Speech tab controls how your agent hears: it selects the speech-to-text (STT) engine that transcribes the caller's audio in real time.

There are three settings:

Field	What it controls
STT Provider	The transcription service (Deepgram, AssemblyAI, ElevenLabs, etc.)
STT Model	The specific model within that provider (e.g. `nova-3`, `universal-streaming`)
STT Language	The expected language of the caller's audio

Switching the provider auto-selects that provider's first model, so check the STT Model field right after switching and change it if needed.

Supported providers

Provider	Models	Notes
Deepgram	`nova-3`, `nova-2`	Default. Excellent latency, broad language support including Hindi and Indian English
AssemblyAI	`universal-streaming`, `universal-streaming-multilingual`	High accuracy, especially for English and Spanish
ElevenLabs	`scribe_v2_realtime`	Real-time scribe with very low latency
Cartesia	`ink-whisper`	Whisper-based, supports 90+ languages
Azure	`default`	Microsoft Azure Speech, enterprise-grade
Fennec	`fennec-asr`	Specialty model

The default for new agents is Deepgram / nova-3. It suits most use cases: low latency, good accuracy, and broad language support, including a multilingual mode.
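The provider table and the auto-select rule can be pictured as a small lookup. The sketch below is illustrative only: `STT_MODELS` and `switch_provider` are hypothetical names, not part of Bolti, and it assumes "first model" means the first one listed in the table above.

```python
# Hypothetical catalog that mirrors the provider table above.
# Illustration only; this is not Bolti's internal data model or API.
STT_MODELS = {
    "Deepgram": ["nova-3", "nova-2"],
    "AssemblyAI": ["universal-streaming", "universal-streaming-multilingual"],
    "ElevenLabs": ["scribe_v2_realtime"],
    "Cartesia": ["ink-whisper"],
    "Azure": ["default"],
    "Fennec": ["fennec-asr"],
}


def switch_provider(provider: str) -> dict:
    """Return the provider/model pair shown right after a provider switch."""
    if provider not in STT_MODELS:
        raise ValueError(f"unknown STT provider: {provider}")
    # Assumption: the auto-selected model is the first one listed for the provider.
    return {"provider": provider, "model": STT_MODELS[provider][0]}


print(switch_provider("AssemblyAI"))
# {'provider': 'AssemblyAI', 'model': 'universal-streaming'}
```

In this picture, moving to AssemblyAI for code-switching calls still requires picking `universal-streaming-multilingual` by hand, because the switch lands on `universal-streaming`.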

STT Language

Sets the expected language code for the model. Examples:

`en`: generic English
`en-IN`: Indian English (better for Indian accents)
`en-US`, `en-GB`, `en-AU`: region-specific English
`hi`: Hindi
`es`, `es-MX`, `es-ES`: Spanish variants
`multi`: multilingual mode (Deepgram nova-3 only; auto-detects the spoken language)

The dropdown shows every language code supported by the currently-selected provider/model combination.
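As a rough sketch of how these codes are structured, the helpers below split a code into its base language and optional region, and encode the one availability rule stated above (`multi` is Deepgram nova-3 only). Both function names are hypothetical, and the full per-provider language lists are deliberately not reproduced here because this page does not state them.

```python
from typing import Optional, Tuple


def split_language_code(code: str) -> Tuple[str, Optional[str]]:
    """Split 'en-IN' into ('en', 'IN'); a bare code such as 'hi' has no region."""
    base, _, region = code.partition("-")
    return base, (region or None)


def multi_is_available(provider: str, model: str) -> bool:
    """Only rule taken from this page: 'multi' is offered for Deepgram nova-3 only."""
    return (provider, model) == ("Deepgram", "nova-3")


print(split_language_code("en-IN"))                 # ('en', 'IN')
print(split_language_code("hi"))                    # ('hi', None)
print(multi_is_available("Deepgram", "nova-2"))     # False
```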

Auto-set by the Basic tab

When you change Primary Language on the Basic tab, Bolti automatically resets the Speech tab to Deepgram / nova-3 with the matching language code. Only override here if you've measured a real transcription problem.
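The practical consequence is that a Primary Language change discards any override made on this tab. A minimal sketch of that behavior, assuming the Primary Language value maps directly to an STT language code (the function name and the dictionary shape are hypothetical, not Bolti's API):

```python
def on_primary_language_change(current_speech_settings: dict, language_code: str) -> dict:
    """Model the reset described above: previous Speech tab choices are not kept."""
    # current_speech_settings is intentionally ignored; the reset replaces it entirely.
    return {"provider": "Deepgram", "model": "nova-3", "language": language_code}


custom = {"provider": "Cartesia", "model": "ink-whisper", "language": "hi"}
print(on_primary_language_change(custom, "en-IN"))
# {'provider': 'Deepgram', 'model': 'nova-3', 'language': 'en-IN'}
```

If you rely on a non-default provider, re-check the Speech tab after any change to Primary Language.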

Picking the right combination

Use this decision tree:

Single language, English or major European? → Deepgram nova-3 with the regional code (en-IN, en-GB, etc.).
Hindi or other Indic language? → Deepgram nova-3 with hi for Hindi, or the matching code from the STT Language dropdown for another Indic language. If accuracy is poor, try Cartesia ink-whisper.
Caller might switch between two languages mid-call? → Deepgram nova-3 with multi, or AssemblyAI universal-streaming-multilingual.
Long-form, recorded-style audio (rare for voice agents)? → AssemblyAI universal-streaming.
Enterprise compliance / Azure tenancy required? → Azure default.
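The decision tree above can be written down as a small function. This is a sketch under stated assumptions: `CallProfile` and `recommend_stt` are hypothetical names, and the order in which the questions are checked (hard requirements first) is a choice made for this example, not something the product prescribes.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class CallProfile:
    language: str                 # expected language code, e.g. "en-IN" or "hi"
    code_switching: bool = False  # caller may switch languages mid-call
    long_form: bool = False       # long-form, recorded-style audio
    azure_required: bool = False  # enterprise compliance / Azure tenancy


def recommend_stt(profile: CallProfile) -> Tuple[str, str, str]:
    """Return (provider, model, language) following the decision tree above."""
    # Assumption: a compliance requirement outranks every other consideration.
    if profile.azure_required:
        return ("Azure", "default", profile.language)
    if profile.code_switching:
        # Alternative from the list: AssemblyAI universal-streaming-multilingual.
        return ("Deepgram", "nova-3", "multi")
    if profile.long_form:
        return ("AssemblyAI", "universal-streaming", profile.language)
    # Single-language calls, including Hindi; Cartesia ink-whisper is the
    # suggested fallback if Hindi accuracy turns out to be poor.
    return ("Deepgram", "nova-3", profile.language)


print(recommend_stt(CallProfile(language="en-IN")))
# ('Deepgram', 'nova-3', 'en-IN')
print(recommend_stt(CallProfile(language="hi", code_switching=True)))
# ('Deepgram', 'nova-3', 'multi')
```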

Why STT matters more than you'd think

Bad transcription is the #1 cause of voice agents misbehaving. The LLM only sees the transcript — if the STT mishears "refund" as "reform", the model has no way to recover. Symptoms include:

Agent answers a question the caller didn't ask
Agent loops on the same question because it can't parse the response
Agent captures numbers (account IDs, phone numbers) incorrectly

When you see these, the fix is usually:

Switch to a region-specific language code (en-IN instead of en)
Try multi mode if calls cross languages
Try a different provider — Deepgram and AssemblyAI have noticeably different strengths per language

You can verify transcription quality by reviewing past calls on the Logs tab — every utterance is shown alongside the audio.
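If you want a number rather than an impression, one common approach is word error rate (WER): type out what the caller actually said for a handful of calls and compare it with the transcript. The sketch below is a generic WER calculation using word-level edit distance. It is not a Bolti feature, and it assumes you copy the transcript text out of the logs by hand.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")

    # prev[j] holds the edit distance between the processed part of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # word missing from the transcript
                curr[j - 1] + 1,     # extra word in the transcript
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# One wrong word out of four reference words.
print(word_error_rate("i want a refund", "i want a reform"))  # 0.25
```

Comparing WER for the same calls under two configurations (for example `en` versus `en-IN`) gives a concrete basis for overriding the default.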

STT for non-English deployments

For India and South Asia, the recommended defaults are:

English (with Indian accents): Deepgram nova-3 + en-IN
Hindi: Deepgram nova-3 + hi
Hindi/English code-switching: Deepgram nova-3 + multi

For a deeper guide on multilingual setup, see Customizations → Multilingual Support.
