LLM Tab

The LLM tab controls which language model generates your agent's replies on every turn of every conversation.

Provider and model

Pick a provider, then pick a model from that provider's catalog. Switching providers auto-selects the new provider's first model, so check the model after each switch and change it if needed.

Currently supported providers and models:

Provider	Models
OpenAI	`gpt-4o`, `gpt-4o-mini`, `gpt-5.1`, `gpt-5-mini`, `gpt-5-nano`, `gpt-4.1`, `gpt-4.1-mini`, `gpt-4.1-nano`
Gemini	`gemini-2.0-flash`, `gemini-2.0-flash-lite`, `gemini-1.5-flash`, `gemini-1.5-pro`, `gemini-3.1-pro-preview`, `gemini-3-pro-preview`
Groq	`llama-3.3-70b-versatile`, `llama-3.1-70b-versatile`, `llama-3.1-8b-instant`, `mixtral-8x7b-32768`
Baseten	`Qwen3-235B-A22B`, `Llama-4-Maverick-17B-128E-Instruct`, `DeepSeek-V3.1`
DeepSeek	`deepseek-chat`, `deepseek-reasoner`

The default for new agents is OpenAI / gpt-5-mini — a good balance of quality, latency, and cost for voice.
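The provider-switch behavior described above can be sketched in a few lines. This is an illustrative model only, not Bolti's implementation: `CATALOG`, `switch_provider`, and `is_valid_choice` are hypothetical names, only two providers from the table are included, and "first model" is assumed to mean the first entry in the table's order.

```python
# Hypothetical excerpt of the catalog, copied from the table above.
# The ordering is assumed to match what the dashboard treats as "first".
CATALOG: dict[str, list[str]] = {
    "Groq": [
        "llama-3.3-70b-versatile",
        "llama-3.1-70b-versatile",
        "llama-3.1-8b-instant",
        "mixtral-8x7b-32768",
    ],
    "DeepSeek": ["deepseek-chat", "deepseek-reasoner"],
}


def switch_provider(provider: str) -> tuple[str, str]:
    """Return the (provider, model) pair after a switch.

    Mirrors the documented behavior: the provider's first model is
    auto-selected, and the user may change it afterwards.
    """
    if provider not in CATALOG:
        raise ValueError(f"unknown provider: {provider!r}")
    return provider, CATALOG[provider][0]


def is_valid_choice(provider: str, model: str) -> bool:
    """Check that a model belongs to the given provider's catalog."""
    return model in CATALOG.get(provider, [])


print(switch_provider("DeepSeek"))  # ('DeepSeek', 'deepseek-chat')
```

A check like `is_valid_choice` is useful when configuration is set programmatically, because a model name from one provider is not valid under another.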

How to choose

For voice, three properties matter, in roughly this order:

First-token latency. Voice agents feel slow when the first word of the reply takes longer than ~400 ms to arrive. Smaller, faster models (gpt-5-mini, gpt-5-nano, gemini-2.0-flash-lite, Groq llama-3.1-8b-instant) win on this axis.
Instruction following. The model must respect the system prompt: staying in character, refusing forbidden topics, and calling tools correctly. Bigger models (gpt-5.1, gpt-4.1, gemini-1.5-pro, Qwen3-235B-A22B) are stronger here.
Cost per turn. Voice calls consist of many short turns. A 10-minute call can easily make 60+ LLM calls, so model price matters more than it does for chat.
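To compare models on the first property, it helps to measure time to first token directly. The sketch below is a minimal, provider-agnostic helper, assuming the model's reply is available as an iterable of text chunks. The names `first_token_latency_ms`, `fake_stream`, and `FIRST_TOKEN_BUDGET_MS` are illustrative, and the fake stream stands in for a real streaming response.

```python
import time
from typing import Callable, Iterable, Iterator, Optional

# Approximate threshold taken from the text above (~400 ms).
FIRST_TOKEN_BUDGET_MS = 400


def first_token_latency_ms(
    stream: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> Optional[float]:
    """Milliseconds from the start of iteration to the first non-empty chunk.

    Returns None if the stream ends without yielding any text.
    """
    start = clock()
    for chunk in stream:
        if chunk:
            return (clock() - start) * 1000.0
    return None


def fake_stream(delay_s: float) -> Iterator[str]:
    """Stand-in for a streaming LLM reply: waits, then yields chunks."""
    time.sleep(delay_s)
    yield "Hello"
    yield " there"


latency = first_token_latency_ms(fake_stream(0.05))
within_budget = latency is not None and latency <= FIRST_TOKEN_BUDGET_MS
print(f"first token after {latency:.0f} ms; within budget: {within_budget}")
```

With a real provider, pass the provider's streaming iterator in place of `fake_stream` and start the timer as close to the request as possible, so that network time is included.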

Common starting points:

Default for most agents: gpt-5-mini — fast, follows instructions well, sensible price.
Lowest latency: Groq llama-3.1-8b-instant or gpt-5-nano.
Highest quality (e.g. healthcare, complex sales): gpt-5.1 or gemini-1.5-pro.
Multilingual / non-English: gemini-1.5-pro or gpt-4o tend to handle code-switching best.

Don't change this first

If your agent is misbehaving (going off-topic, ignoring the goal, missing tool calls), the fix is almost always on the Basic tab: a sharper system prompt, a better goal, more guardrails. Only swap the LLM after you've tightened the prompt and confirmed the problem persists.

Temperature, max tokens, and other knobs

These settings are currently exposed only in the Create Agent wizard and the API; they are not yet editable from the LLM tab in the dashboard. Defaults:

llm_temperature: 0.7
llm_max_tokens: 1024

These cover the vast majority of conversational use cases. To override them on an existing agent, send a PATCH request to /workspaces/{workspace_id}/agents/{agent_id} with llm_temperature and/or llm_max_tokens in the body.
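As a rough illustration of the PATCH call mentioned above, the following sketch builds the request with Python's standard library without sending it. Only the path and the two field names come from this page. The base URL `https://api.example.com`, the bearer-token `Authorization` header, the JSON body encoding, and the helper name `build_llm_settings_patch` are assumptions; check the API reference for the real host and authentication scheme.

```python
import json
import urllib.request
from typing import Optional


def build_llm_settings_patch(
    base_url: str,
    workspace_id: str,
    agent_id: str,
    api_key: str,
    llm_temperature: Optional[float] = None,
    llm_max_tokens: Optional[int] = None,
) -> urllib.request.Request:
    """Build (but do not send) a PATCH request that overrides LLM settings."""
    body: dict = {}
    if llm_temperature is not None:
        body["llm_temperature"] = llm_temperature
    if llm_max_tokens is not None:
        body["llm_max_tokens"] = llm_max_tokens
    if not body:
        raise ValueError("set llm_temperature and/or llm_max_tokens")

    url = f"{base_url.rstrip('/')}/workspaces/{workspace_id}/agents/{agent_id}"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumed auth scheme; the real API may differ.
            "Authorization": f"Bearer {api_key}",
        },
        method="PATCH",
    )


# Placeholder host, IDs, and key: replace with real values before sending.
req = build_llm_settings_patch(
    "https://api.example.com", "ws_123", "agent_456", "YOUR_API_KEY",
    llm_temperature=0.4,
)
print(req.get_method(), req.full_url)
# To send the request: urllib.request.urlopen(req)
```

Omitting a field leaves it out of the body entirely, on the assumption that the endpoint treats PATCH as a partial update and keeps the existing value.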

Using your own API keys (BYO)

By default Bolti uses platform-managed credentials — you pay per minute of usage, no API keys needed. To bring your own keys (cheaper at scale, or required for compliance), see Customizations → Custom LLMs.
