PII Data Protection
Voice agents handle some of the most sensitive data your business sees — names, phone numbers, account IDs, payment details, medical context, addresses. Every minute of a call passes that data through speech-to-text, a large language model, and text-to-speech, often hosted by different vendors in different regions. This page is about what Bolti does to protect that data, what you control, and what's available on enterprise plans.
The threat model
It's worth being explicit about what we're actually protecting against. The risks fall into three buckets:
- Third-party LLM exposure. When the LLM that powers your agent is hosted by OpenAI, Google, Anthropic, etc., every prompt — including transcripts of what your customers just said — leaves your environment for theirs. Most providers retain prompts for some period (sometimes for abuse review, sometimes for training). Sending un-redacted PII into that pipeline is a meaningful risk.
- Call recording / transcript storage. Recordings and transcripts sit in object storage and a database after the call ends. The risk here is unauthorized read — by other workspace members, by support staff, by an attacker with stolen credentials.
- Operational visibility. Logs, metrics, error traces. PII can leak into these accidentally if the platform isn't careful — which then puts it in front of on-call engineers, observability tooling, and log archives.
Bolti's controls are organized around these three buckets.
What's protected today, by default
These behaviors are on for every Bolti workspace, no configuration required.
Recordings and transcripts
- Recordings live in private object storage. The bucket is not publicly readable. The only way to get audio out is via a time-limited signed URL that the API generates after a permission check.
- Transcripts and call records are workspace-scoped. Even other workspaces in the same organization can't read them. Workspace Viewer is the minimum role required to see a transcript at all. (Workspaces & Organizations)
- Encryption at rest on the database (Postgres) and object storage layers.
- TLS in transit on every API call, dashboard request, and signed playback URL.
In-flight calls
- The realtime audio path is encrypted end-to-end between the realtime audio service, the SIP carrier, and the agent runtime.
- Call audio is never written to disk by the realtime stack outside of the dedicated recording pipeline (which writes to your private bucket).
- Active call sessions are isolated per call room — there's no cross-call audio routing path.
Operational data
- Application logs are scrubbed of common credential patterns (Authorization headers, API keys) before they're shipped to the central log store.
- Database access from Bolti's operations team is logged and audited, and only used for direct support requests you've raised.
These controls give you a reasonable default posture for most consumer-facing voice products. For higher-sensitivity workloads — healthcare, finance, government — the next sections cover what you can configure.
Why mask PII before it reaches the LLM
Even with all the storage controls above, the LLM call itself is the highest-risk step in the pipeline. Two reasons:
- The prompt contains the conversation transcript so far, including everything the caller has said. If the caller spoke their card number, it's in the next prompt.
- The LLM provider's data handling policy applies to that prompt. Most enterprise tiers offer "no training on customer data" agreements; few offer "no logging at all." Logs that retain raw prompts are a compliance liability.
The fix is PII masking: detect sensitive content in the transcript, replace it with placeholder tokens before the prompt goes out, and (optionally) restore the original values when post-processing the response.
Caller transcript:
"My order number is 4500-2398 and my card on file ends in 4242."
After PII masking → goes to LLM:
"My order number is [ORDER_ID_1] and my card on file ends in [CARD_LAST4_1]."
LLM reply:
"Thanks, looking up order [ORDER_ID_1] now."
After unmasking → goes to TTS:
"Thanks, looking up order 4500-2398 now."
The LLM never sees the real values. It still has full conversational context — the placeholder tokens are stable within the call so the model can reason about "the same order" — but the sensitive bytes never leave your environment.
Bolti's PII redaction (enterprise)
PII redaction is available as part of Bolti's enterprise plan. It's a configurable layer that sits in the agent runtime, between the transcript and the LLM call.
What it covers
A hybrid pipeline that combines two complementary approaches:
| Approach | What it catches | Examples |
|---|---|---|
| Regex / pattern detectors | Structured PII with predictable shapes. Fast, deterministic, no false negatives within the pattern. | Email, phone (E.164 + national formats), credit-card (Luhn-checked), CVV, SSN, Aadhaar, PAN, IBAN, IP address, MAC, IMEI. |
| Named-entity recognition (NER) | Context-dependent PII that doesn't fit a fixed pattern. | Person names, organization names, addresses, locations, occupations. |
The two run together because neither alone is enough — regex misses names, NER misses card numbers in unusual formats. The hybrid is the standard recommendation for production PII redaction.
Categories you can enable
You pick which categories to redact per workspace, per agent, or globally. Common configurations:
| Profile | Categories | Use case |
|---|---|---|
| Conservative | Card numbers, CVV, SSN, government IDs | Default safe choice — protects only the obvious financial / identity fields. |
| Healthcare (HIPAA-aligned) | All conservative + names, addresses, phone numbers, dates of birth, medical record numbers | US healthcare and similar regulated health workflows. |
| EU consumer (GDPR-aligned) | All conservative + names, email, phone, IP address, location | EU consumer data where most identifiable fields are in scope. |
| Maximum | Everything detectable | Highest-sensitivity workflows where the LLM should reason about structure but never see real values. |
You can also define custom patterns — internal account-ID formats, customer reference codes, anything specific to your domain.
Reversible vs. one-way redaction
Two modes, picked per category:
- Reversible (mask + restore) — the original value is held in the agent's session memory, the placeholder goes to the LLM, and the placeholder is replaced with the real value before the reply is spoken. The caller hears their own data spoken back. Good for: order IDs, account numbers, anything the agent needs to confirm out loud.
- One-way (mask, no restore) — the original is dropped after redaction. The LLM sees a placeholder, and the agent speaks the placeholder (or a sanitized phrase). Good for: card numbers, CVVs, anything the agent should never repeat.
What the LLM still sees
It's worth being concrete: even with maximum redaction enabled, the LLM still has the conversational structure — speaker turns, intents, the flow of the conversation, your system prompt, the tool calls available. It just doesn't see the literal sensitive values. In our testing this preserves 95%+ of agent quality on conversational tasks while completely removing identifiable values from the LLM's view.
Other lever you have, today
PII redaction is the headline feature, but there are several things you can do right now on the standard plan to reduce PII exposure:
1. Don't store recordings you don't need
Recordings are the longest-lived copy of call audio. If you don't need them for QA or compliance, you can:
- Disable recording per agent (talk to your account team — this is currently a workspace-level setting)
- Set a short retention window so recordings are auto-deleted after N days
- Periodically export and delete from object storage on your own schedule
2. Use a self-hosted or no-retention LLM endpoint
The cleanest way to limit LLM exposure is to never send your data to a third-party endpoint at all. Bolti supports custom OpenAI-compatible LLM endpoints (Custom LLMs) — point your agent at a self-hosted Llama / Mistral / DeepSeek deployment, or at a vendor's "zero-retention" endpoint, and the data never reaches the major hyperscalers' general APIs.
3. Lock down workspace access
PII protection is also access control. The most common breach pattern isn't a sophisticated attack; it's a former employee whose account was never disabled clicking through old call transcripts. The standard hygiene:
- Use workspace roles — most people should be Viewer or Editor, not Admin
- Remove members the moment they leave (Invite Your Team → Removing a member)
- Scope clients/contractors to a single workspace, never the org as a whole
4. Configure greetings to disclose recording
Many jurisdictions (two-party consent states in the US, India under the DPDP Act, EU under GDPR for some processing bases) require callers to know they're being recorded. Configure the greeting in Agent Setup → Basic so the disclosure happens in the first second.
What about outbound PII?
A subtle risk: tools the agent calls during the conversation. If your lookup_customer tool is configured to send the full transcript along with the customer ID, you're shipping PII to your own backend (which may or may not be ready for it).
Two practices:
- Send only what the tool needs. The tool's input schema controls what the LLM passes in. Restrict it to the specific fields the tool actually requires (just the order ID, not the conversation context).
- Audit your tool endpoints' logging. If your internal API logs every request body, your application logs are now a PII store too. Make sure you're scrubbing on the receive side.
Compliance posture
Bolti's controls map onto the major regulatory frameworks as follows. None of this is a substitute for your own compliance review — these are the controls that exist; your auditor decides if they're sufficient for your specific use case.
| Framework | What Bolti provides today |
|---|---|
| GDPR | DPA available; managed cloud regions including EU; PII redaction; deletion-on-request via API; right-to-export via call/transcript download. |
| HIPAA | BAA available on US deployments + on-prem; encryption at rest and in transit; audit logging; PII redaction with healthcare profile. |
| DPDP Act (India) | India-resident managed cloud (default); India-region providers available; addenda available; deletion + portability via API. |
| SOC 2 Type II | In progress — current report and roadmap available under NDA. |
| PCI DSS | One-way card-number redaction available; for full PCI scope reduction, recommend keeping card capture in a separate PCI-scoped IVR and using Bolti only for non-payment portions of the call. |
Honest scope
A few things this page doesn't claim, because we don't think they're true:
- No system catches 100% of PII. Hybrid regex+NER is the industry's best practical answer, but novel formats and obfuscated mentions ("my number ends in two-five-five-five") will sometimes slip through. Defense in depth — redaction + encryption + access control + minimal retention — is the right posture.
- PII redaction is not anonymization. Even with names and IDs removed, conversation patterns can sometimes be re-identified. If you need true anonymization (e.g. for a research dataset), that's a separate process from PII redaction.
- The LLM provider's policies still apply. Even fully-masked prompts go through the provider's pipeline. Vendor selection (Understanding Providers) is part of your compliance story, not just Bolti's.
Related
- Data Residency — geographical control over where data lives
- On-Prem Deployment — running the entire stack inside your environment
- Custom LLMs — using a self-hosted or zero-retention LLM endpoint
- Workspaces & Organizations — access control fundamentals
- Tool Calling — controlling what gets sent to your tool endpoints
To enable PII redaction or discuss a compliance-driven deployment, reach out to your account team or hello@bolti.co.in.