LLM Tab
The LLM tab controls which language model generates your agent's replies on every turn of every conversation.
Provider and model
Pick a provider, then pick a model from that provider's catalog. Switching the provider auto-selects that provider's first model — change the model right after if needed.
Currently supported providers and models:
| Provider | Models |
|---|---|
| OpenAI | gpt-4o, gpt-4o-mini, gpt-5.1, gpt-5-mini, gpt-5-nano, gpt-4.1, gpt-4.1-mini, gpt-4.1-nano |
| Gemini | gemini-2.0-flash, gemini-2.0-flash-lite, gemini-1.5-flash, gemini-1.5-pro, gemini-3.1-pro-preview, gemini-3-pro-preview |
| Groq | llama-3.3-70b-versatile, llama-3.1-70b-versatile, llama-3.1-8b-instant, mixtral-8x7b-32768 |
| Baseten | Qwen3-235B-A22B, Llama-4-Maverick-17B-128E-Instruct, DeepSeek-V3.1 |
| DeepSeek | deepseek-chat, deepseek-reasoner |
The default for new agents is OpenAI / gpt-5-mini — a good balance of quality, latency, and cost for voice.
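As an illustration of the provider-switch behavior described above, here is a minimal sketch that stores the catalog as a dictionary and auto-selects the first model when the provider changes. The names CATALOG, DEFAULT_SELECTION, and switch_provider are hypothetical, and the sketch assumes the order in the table matches the order of each provider's catalog in the dashboard.

```python
# Catalog copied from the table above. The dict layout is illustrative only,
# and the ordering is assumed to match the dashboard's catalog order.
CATALOG = {
    "OpenAI": ["gpt-4o", "gpt-4o-mini", "gpt-5.1", "gpt-5-mini", "gpt-5-nano",
               "gpt-4.1", "gpt-4.1-mini", "gpt-4.1-nano"],
    "Gemini": ["gemini-2.0-flash", "gemini-2.0-flash-lite", "gemini-1.5-flash",
               "gemini-1.5-pro", "gemini-3.1-pro-preview", "gemini-3-pro-preview"],
    "Groq": ["llama-3.3-70b-versatile", "llama-3.1-70b-versatile",
             "llama-3.1-8b-instant", "mixtral-8x7b-32768"],
    "Baseten": ["Qwen3-235B-A22B", "Llama-4-Maverick-17B-128E-Instruct", "DeepSeek-V3.1"],
    "DeepSeek": ["deepseek-chat", "deepseek-reasoner"],
}

# Default selection for new agents, as stated in the text.
DEFAULT_SELECTION = ("OpenAI", "gpt-5-mini")


def switch_provider(provider: str) -> tuple[str, str]:
    """Return (provider, model) after a switch: the provider's first model is selected."""
    if provider not in CATALOG:
        raise ValueError(f"Unsupported provider: {provider}")
    return provider, CATALOG[provider][0]
```

Under these assumptions, switching to Groq would land on llama-3.3-70b-versatile, which is why it is worth checking the model right after changing the provider.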
How to choose
For voice, three properties matter, in roughly this order:
- First-token latency. Voice agents feel slow when the first word of the reply takes longer than ~400ms to arrive. Smaller and faster models (gpt-5-mini, gpt-5-nano, gemini-2.0-flash-lite, Groq Llama 3.1 8B) win on this axis.
- Instruction following. The model must respect the system prompt: staying in character, refusing forbidden topics, and calling tools correctly. Bigger models (gpt-5.1, gpt-4.1, gemini-1.5-pro, Qwen3-235B-A22B) are stronger here.
- Cost per turn. Voice calls have many short turns. A 10-minute call can easily run 60+ LLM calls, so model price matters more than for chat.
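To make the latency and cost reasoning concrete, the sketch below works through the arithmetic for a single call. All function names are hypothetical, and the token counts and per-million-token prices in the usage note are placeholder values chosen for illustration, not actual pricing for any model listed here.

```python
FIRST_TOKEN_BUDGET_MS = 400  # approximate threshold mentioned above


def feels_slow(first_token_latency_ms: float) -> bool:
    """True when the first token arrives later than the ~400ms budget."""
    return first_token_latency_ms > FIRST_TOKEN_BUDGET_MS


def seconds_between_llm_calls(call_minutes: float, llm_calls: int) -> float:
    """Average spacing of LLM calls over a voice call."""
    return call_minutes * 60 / llm_calls


def estimate_llm_cost(
    llm_calls: int,
    input_tokens_per_call: int,
    output_tokens_per_call: int,
    input_price_per_mtok: float,   # placeholder: price per 1M input tokens
    output_price_per_mtok: float,  # placeholder: price per 1M output tokens
) -> float:
    """Total LLM cost for one voice call, in the same currency as the prices."""
    per_call = (
        input_tokens_per_call * input_price_per_mtok
        + output_tokens_per_call * output_price_per_mtok
    ) / 1_000_000
    return llm_calls * per_call
```

For example, seconds_between_llm_calls(10, 60) gives one LLM call every 10 seconds. With placeholder inputs of 1,500 input tokens and 60 output tokens per call, at hypothetical prices of 0.25 and 2.00 per million tokens, estimate_llm_cost(60, 1500, 60, 0.25, 2.00) comes to roughly 0.03 per call. Because the system prompt is resent on every turn, input tokens tend to dominate the total.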
Common starting points:
- Default for most agents: gpt-5-mini (fast, follows instructions well, sensible price).
- Lowest latency: Groq llama-3.1-8b-instant or gpt-5-nano.
- Highest quality (e.g. healthcare, complex sales): gpt-5.1 or gemini-1.5-pro.
- Multilingual / non-English: gemini-1.5-pro or gpt-4o tend to handle code-switching best.
If your agent is misbehaving (going off-topic, ignoring the goal, missing tool calls), the fix is almost always on the Basic tab — a sharper system prompt, better goal, more guardrails. Only swap the LLM after you've tightened the prompt and confirmed it's still wrong.
Temperature, max tokens, and other knobs
These settings are currently exposed only in the Create Agent wizard and the API; they are not yet editable from the LLM tab in the dashboard. Defaults:
- llm_temperature: 0.7
- llm_max_tokens: 1024
These cover the vast majority of conversational use cases. If you need to override them on an existing agent, use the API PATCH /workspaces/{workspace_id}/agents/{agent_id} endpoint with llm_temperature / llm_max_tokens in the body.
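A minimal sketch of such an override, using only the Python standard library. The path and the llm_temperature / llm_max_tokens field names come from the text above; the base URL, the Bearer-token Authorization header, and the function name are assumptions made for illustration, so check the API reference for the real values.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "https://api.example.com"  # hypothetical; replace with the real API host


def build_llm_settings_patch(
    workspace_id: str,
    agent_id: str,
    api_key: str,
    llm_temperature: Optional[float] = None,
    llm_max_tokens: Optional[int] = None,
) -> urllib.request.Request:
    """Build (but do not send) a PATCH request that overrides LLM settings."""
    body = {}
    if llm_temperature is not None:
        body["llm_temperature"] = llm_temperature
    if llm_max_tokens is not None:
        body["llm_max_tokens"] = llm_max_tokens
    if not body:
        raise ValueError("Provide llm_temperature and/or llm_max_tokens")

    return urllib.request.Request(
        url=f"{BASE_URL}/workspaces/{workspace_id}/agents/{agent_id}",
        data=json.dumps(body).encode("utf-8"),
        method="PATCH",
        headers={
            "Content-Type": "application/json",
            # Assumed auth scheme; the actual header may differ.
            "Authorization": f"Bearer {api_key}",
        },
    )
```

To send the request, pass the returned object to urllib.request.urlopen. Only the fields you supply are included in the body, so the other setting keeps its current value if the endpoint applies partial updates, as PATCH endpoints usually do.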
Using your own API keys (BYO)
By default Bolti uses platform-managed credentials — you pay per minute of usage, no API keys needed. To bring your own keys (cheaper at scale, or required for compliance), see Customizations → Custom LLMs.