Speech Tab

The Speech tab controls how your agent hears — which speech-to-text (STT) engine transcribes the caller's audio in real time.

There are three settings:

Field | What it controls
STT Provider | The transcription service (Deepgram, AssemblyAI, ElevenLabs, etc.)
STT Model | The specific model within that provider (e.g. nova-3, universal-streaming)
STT Language | The expected language of the caller's audio

Switching the provider auto-selects that provider's first model — change the model right after if needed.

Supported providers

Provider | Models | Notes
Deepgram | nova-3, nova-2 | Default. Excellent latency, broad language support including Hindi and Indian English
AssemblyAI | universal-streaming, universal-streaming-multilingual | High accuracy, especially for English and Spanish
ElevenLabs | scribe_v2_realtime | Real-time scribe with very low latency
Cartesia | ink-whisper | Whisper-based, supports 90+ languages
Azure | default | Microsoft Azure Speech, enterprise-grade
Fennec | fennec-asr | Specialty model
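
The provider-switch behavior can be sketched in a few lines of Python. This is an illustrative model, not Bolti's implementation: `PROVIDER_MODELS` simply mirrors the table above, and `switch_provider` and the settings dictionary keys are hypothetical names. The sketch leaves the language untouched, since the page does not say whether the UI also resets it.

```python
# Illustrative only: mirrors the provider table above; not Bolti's actual code.
PROVIDER_MODELS = {
    "Deepgram": ["nova-3", "nova-2"],
    "AssemblyAI": ["universal-streaming", "universal-streaming-multilingual"],
    "ElevenLabs": ["scribe_v2_realtime"],
    "Cartesia": ["ink-whisper"],
    "Azure": ["default"],
    "Fennec": ["fennec-asr"],
}


def switch_provider(settings: dict, provider: str) -> dict:
    """Return new settings with the provider changed and its first model selected."""
    if provider not in PROVIDER_MODELS:
        raise ValueError(f"Unknown STT provider: {provider}")
    # Switching providers resets the model to the first one listed for that provider.
    # Assumption: the language is carried over unchanged.
    return {**settings, "provider": provider, "model": PROVIDER_MODELS[provider][0]}


current = {"provider": "Deepgram", "model": "nova-2", "language": "en-IN"}
updated = switch_provider(current, "AssemblyAI")
print(updated["model"])  # universal-streaming
```

If you wanted universal-streaming-multilingual instead, you would still need to pick it by hand after the switch.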

The default for new agents is Deepgram / nova-3. It suits most use cases: it combines low latency with broad language coverage, including a multilingual mode.

STT Language

Sets the expected language code for the model. Examples:

  • en — generic English
  • en-IN — Indian English (better for Indian accents)
  • en-US, en-GB, en-AU — region-specific English
  • hi — Hindi
  • es, es-MX, es-ES — Spanish variants
  • multi — multilingual mode (Deepgram nova-3 only; auto-detects)

The dropdown shows every language code supported by the currently-selected provider/model combination.

Auto-set by the Basic tab

When you change Primary Language on the Basic tab, Bolti automatically resets the Speech tab to Deepgram / nova-3 with the matching language code. Only override here if you've measured a real transcription problem.
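
A minimal Python sketch of that reset, to make the consequence concrete: any override you set on the Speech tab is lost when Primary Language changes. The `SpeechSettings` class and `on_primary_language_change` function are hypothetical names, and the sketch assumes the Primary Language value maps directly to an STT language code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpeechSettings:
    provider: str
    model: str
    language: str


def on_primary_language_change(previous: SpeechSettings, language_code: str) -> SpeechSettings:
    """Model the reset: whatever was in `previous` is discarded."""
    # Assumption: the Primary Language maps one-to-one to an STT language code.
    return SpeechSettings(provider="Deepgram", model="nova-3", language=language_code)


override = SpeechSettings(provider="Cartesia", model="ink-whisper", language="hi")
after_change = on_primary_language_change(override, "en-IN")
print(after_change.provider, after_change.model, after_change.language)
# Deepgram nova-3 en-IN
```

In practice this means a deliberate override should be re-applied after any change to Primary Language.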

Picking the right combination

Find the case below that matches your deployment:

  1. Single language, English or major European? → Deepgram nova-3 with the regional code (en-IN, en-GB, etc.).
  2. Hindi or other Indic language? → Deepgram nova-3 with hi (or the matching code for another Indic language, if the dropdown lists it). If accuracy is poor, try Cartesia ink-whisper.
  3. Caller might switch between two languages mid-call? → Deepgram nova-3 with multi, or AssemblyAI universal-streaming-multilingual.
  4. Long-form, recorded-style audio (rare for voice agents)? → AssemblyAI universal-streaming.
  5. Enterprise compliance / Azure tenancy required? → Azure default.
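
The cases above can be expressed as a small Python function. This is a sketch under stated assumptions, not a Bolti API: `CallProfile`, `SttChoice`, and `recommend_stt` are hypothetical names, and the order in which overlapping cases are checked (Azure requirement first, then code-switching, then long-form audio) is a choice made for this example, since the list does not say which case wins when several apply.

```python
from dataclasses import dataclass
from typing import NamedTuple


class SttChoice(NamedTuple):
    provider: str
    model: str
    language: str


@dataclass(frozen=True)
class CallProfile:
    language: str                     # expected language code, e.g. "en-IN" or "hi"
    switches_languages: bool = False  # caller may change language mid-call
    long_form_audio: bool = False     # long, recorded-style audio
    requires_azure: bool = False      # compliance or tenancy constraint


def recommend_stt(profile: CallProfile) -> SttChoice:
    """Return a starting point; precedence between cases is an assumption."""
    if profile.requires_azure:
        # Case 5. The language is passed through; check the dropdown for support.
        return SttChoice("Azure", "default", profile.language)
    if profile.switches_languages:
        # Case 3. AssemblyAI universal-streaming-multilingual is the listed alternative.
        return SttChoice("Deepgram", "nova-3", "multi")
    if profile.long_form_audio:
        # Case 4.
        return SttChoice("AssemblyAI", "universal-streaming", profile.language)
    # Cases 1 and 2: a single language uses the default engine with its own code.
    # For Hindi with poor accuracy, Cartesia ink-whisper is the listed fallback.
    return SttChoice("Deepgram", "nova-3", profile.language)


print(recommend_stt(CallProfile(language="hi", switches_languages=True)))
# SttChoice(provider='Deepgram', model='nova-3', language='multi')
```

Treat the result as a first configuration to test against real calls rather than a final answer.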

Why STT matters more than you'd think

Bad transcription is the #1 cause of voice agents misbehaving. The LLM only sees the transcript — if the STT mishears "refund" as "reform", the model has no way to recover. Symptoms include:

  • Agent answers a question the caller didn't ask
  • Agent loops on the same question because it can't parse the response
  • Agent fails to detect numbers (account IDs, phone numbers) accurately

When you see these, the fix is usually:

  • Switch to a region-specific language code (en-IN instead of en)
  • Try multi mode if calls cross languages
  • Try a different provider — Deepgram and AssemblyAI have noticeably different strengths per language

You can verify transcription quality by reviewing past calls on the Logs tab — every utterance is shown alongside the audio.

STT for non-English deployments

For India and South Asia, the recommended defaults are:

  • English (with Indian accents): Deepgram nova-3 + en-IN
  • Hindi: Deepgram nova-3 + hi
  • Hindi/English code-switching: Deepgram nova-3 + multi

For a deeper guide on multilingual setup, see Customizations → Multilingual Support.