How to Redact PII Before an LLM Call

01 Redact at the input, before the model sees it

To redact PII before an LLM call, insert a redaction step after retrieval and before the model call: resolve the caller's clearance, inspect the candidate context, apply a per-field pass or redact verdict, and build the prompt from only the approved result. Because the sensitive data is withheld before the prompt is assembled, the model never receives it, and the model cannot leak what it never received.

That last sentence is the whole idea. Most teams reach first for a filter on the model's answer, but by then the exposure has already happened. The durable control is at the input: govern what enters the context, so the question of leakage in the output becomes moot for the data you withheld. The rest of this guide works through why output filtering is too late, exactly where the step belongs, and how to keep the redaction role-aware so the context stays useful. It is the redaction half of AI data governance for LLM applications.

02 Why output filtering is too late

An output filter inspects the model's response after generation and tries to strip anything sensitive. It is a reasonable backstop, but it cannot be the primary control, for reasons that are structural rather than fixable.

The model already received the data. If PII was in the prompt, it was in the model's input, in transit to the provider, and potentially in provider-side logs, all before the output filter ran. Filtering the answer does nothing about the copy that already left your boundary.
It is detection, not prevention. An output filter has to recognize sensitive content in free text to remove it, and recognition is imperfect. Anything it does not match passes through. That is failing open: the default on uncertainty is to release.
It cannot see the authorization context. By the time the answer exists, the filter no longer knows which caller asked, what they were cleared to see, or which field a string came from. It can only guess from surface patterns, so it cannot make a per-caller decision.

Input redaction and output filtering are not two options for the same job. Input redaction prevents the exposure; output filtering, at best, catches a fraction of what slipped through. Treat the input as the load-bearing control and the output as a secondary net, never the reverse.
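The contrast can be made concrete with a small sketch. Everything here is illustrative and assumed, not taken from any product: the SSN pattern, the `output_filter` and `build_context` helpers, and the sample record are hypothetical. It shows a pattern-based output filter releasing an identifier that the model has reformatted, while input-side withholding never adds the field in the first place.

```python
import re

# Hypothetical output filter: masks only strings shaped like a hyphenated US SSN.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def output_filter(answer: str) -> str:
    """Detection after the fact: anything the pattern misses is released."""
    return SSN_PATTERN.sub("[REDACTED]", answer)

def build_context(record: dict, allowed_fields: set) -> dict:
    """Prevention at the input: unapproved fields are never added."""
    return {name: value for name, value in record.items() if name in allowed_fields}

record = {"issue": "billing dispute", "ssn": "123-45-6789"}

# Suppose the model restates the identifier with spaces instead of hyphens.
leaky_answer = "The customer's SSN is 123 45 6789 and the issue is a billing dispute."
filtered = output_filter(leaky_answer)  # the pattern does not match, so it passes through

# Input-side control: the field is simply absent from what the model receives.
safe_context = build_context(record, allowed_fields={"issue"})
```

The filter fails open on a format it was not written for, whereas the input-side context contains nothing to leak.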

For the full set of paths a leak can take through a model, including the ones output filtering misses, see LLM data leakage.

03 Where the redaction step belongs

The redaction step belongs at the prompt boundary: after retrieval has gathered the candidate context, and before that context is assembled into the prompt and sent to the model. This is the one point where you have everything you need to make the decision and still have time to act on it.

At that boundary you know the caller and their clearance, you have the actual records and text in hand, and the prompt has not yet been built, so withholding a field simply means it is never added. Place the step earlier, at retrieval, and you can filter whole records but not yet see every field in context. Place it later, at the output, and the data has already reached the model and left your boundary. For the retrieval-side filtering that complements this step, see permission-aware RAG; for where the step sits in a full pipeline, see adding a data control plane to your AI stack.

04 Blanket redaction versus role-aware, field-level redaction

The naive approach is blanket redaction: find every entity that looks like PII and mask it, for everyone, always. It is safe in the narrow sense, but it often destroys the context. A support summary with every name, date, and identifier blacked out may no longer contain enough for the model to answer the question at all.

The better approach is role-aware, field-level redaction. Each field is evaluated independently against the caller's clearance, and only the fields that exceed that clearance are withheld; the useful remainder passes through. The same record then yields a different, correct view for each role. A clinician sees the diagnosis but not the patient identifier; an analyst sees the aggregate but not the underlying account number. The model still receives enough to be useful, but never receives what the specific caller is not entitled to see.

Withhold the field, keep the context.

This is the difference between an allow-or-deny on the whole record and a record that is reshaped per caller. Custosa expresses clearance as a five-level lattice, so policy can map each role to exactly the tier of data it may read, rather than forcing a single global mask.
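A minimal sketch of per-caller reshaping follows. It is not Custosa's API. The five tier names, the field-to-tier mapping, and the role-to-clearance mapping are all assumptions made for illustration (the source states only that the lattice has five levels), and the lattice is modelled here as a simple total order.

```python
from enum import IntEnum

class Clearance(IntEnum):
    # Hypothetical tier names; only the count of five comes from the text above.
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3
    SECRET = 4

# Assumed classification of each field and clearance of each role.
FIELD_TIER = {
    "visit_count": Clearance.INTERNAL,
    "diagnosis": Clearance.RESTRICTED,
    "patient_id": Clearance.SECRET,
}
ROLE_CLEARANCE = {
    "analyst": Clearance.INTERNAL,
    "clinician": Clearance.RESTRICTED,
}

def view_for(record: dict, role: str) -> dict:
    """Return only the fields whose tier does not exceed the role's clearance."""
    clearance = ROLE_CLEARANCE[role]
    return {
        name: value
        for name, value in record.items()
        # Unclassified fields are treated as the top tier, so they are withheld
        # from every role below it.
        if FIELD_TIER.get(name, Clearance.SECRET) <= clearance
    }

record = {"visit_count": 3, "diagnosis": "asthma", "patient_id": "P-1001"}
clinician_view = view_for(record, "clinician")
analyst_view = view_for(record, "analyst")
```

The same record produces two different views: the clinician's keeps the diagnosis and drops the identifier, and the analyst's keeps only the count.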

05 Deterministic verdicts, entity-level and field-level

Two design choices make the redaction trustworthy: how the decision is made, and what it operates on.

On how the decision is made, the pass-or-redact verdict should come from a deterministic policy engine, not from a model. Custosa uses a formal policy engine built on Cedar, so the same inputs always produce the same verdict. Determinism is what makes the result explainable, reproducible, and testable: each decision traces to a written rule, and the same request always re-evaluates to the same outcome. A redactor that sometimes guesses is not a control; deciding what to withhold with another model only moves the trust problem rather than solving it. The engine also fails closed, so when it cannot reach a confident verdict it withholds rather than releases.
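The two properties described here, determinism and failing closed, can be illustrated in plain Python. This is a sketch of the behavior only: it is not Cedar and not Custosa's engine, and the rule table, role names, and field names are hypothetical.

```python
from enum import Enum

class Verdict(Enum):
    PASS = "pass"
    REDACT = "redact"

# Hypothetical written rules: the (role, field) pairs that are explicitly permitted.
ALLOW_RULES = frozenset({
    ("clinician", "diagnosis"),
    ("clinician", "visit_count"),
    ("analyst", "visit_count"),
})

def decide(role: str, field: str) -> Verdict:
    """Pure function of its inputs: no randomness, no model call, no clock."""
    if (role, field) in ALLOW_RULES:
        return Verdict.PASS
    # Fail closed: unknown roles, unknown fields, and missing rules all redact.
    return Verdict.REDACT

first = decide("analyst", "diagnosis")
second = decide("analyst", "diagnosis")  # same inputs, same verdict
```

Because the verdict depends only on the inputs and the rule table, each decision can be replayed in a test and traced to the rule that allowed it, or to the absence of one.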

On what it operates on, redaction has to cover both shapes of data. For structured records, each field gets a verdict directly against policy. For unstructured text, entity detection that combines regular expressions (regex) with named-entity recognition (NER) locates the spans that carry PII, so a name or identifier buried inside a free-text note is caught and masked rather than passed through whole. Covering only structured fields would leave the most common leak path, prose, wide open.
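For the unstructured side, a rough sketch of span-level masking is shown below. Only the regex half is implemented; an NER model would contribute further `(start, end, label)` spans to the same list and is left out so the example has no dependencies. The two patterns, the labels, and the helper names are assumptions, and the sketch does not handle overlapping spans.

```python
import re

# Hypothetical patterns; a real deployment would need many more, plus NER.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def find_spans(text: str) -> list:
    """Locate PII spans as (start, end, label) tuples, in document order."""
    spans = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            spans.append((match.start(), match.end(), label))
    return sorted(spans)

def mask(text: str, spans: list) -> str:
    """Replace each span with its label, keeping the surrounding prose."""
    # Work from right to left so earlier offsets stay valid after each replacement.
    for start, end, label in sorted(spans, reverse=True):
        text = text[:start] + f"[{label}]" + text[end:]
    return text

note = "Patient emailed jane.doe@example.com; SSN on file is 123-45-6789."
masked = mask(note, find_spans(note))
# masked == "Patient emailed [EMAIL]; SSN on file is [SSN]."
```

Only the matched spans are replaced, so the rest of the note remains available as context.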

06 The same step works across OpenAI, Anthropic, and Gemini

Because the redaction runs before the prompt is assembled, it is independent of which model receives the prompt. The exact same step applies whether the approved context then goes to OpenAI, Anthropic's Claude, Google Gemini, or a model you host yourself. The provider sits downstream of the redaction and never participates in it.

The practical consequence is that you do not build PII redaction per provider. You can switch models, run two side by side, or route different workloads to different providers without redoing the control, because it was never tied to a provider's API in the first place. For the full treatment of why a data-layer control is identical across providers, see model-agnostic AI data protection.
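One way to picture this independence is to treat the provider as nothing more than a function that receives the finished prompt. In the sketch below the two providers are local stand-ins, not real SDK calls, and `redact`, `build_prompt`, and `answer` are hypothetical helpers written for this example.

```python
from typing import Callable, Dict, Set

def redact(context: Dict[str, str], allowed: Set[str]) -> Dict[str, str]:
    """Keep only the approved fields; runs before any provider is involved."""
    return {name: value for name, value in context.items() if name in allowed}

def build_prompt(context: Dict[str, str], question: str) -> str:
    facts = "\n".join(f"{name}: {value}" for name, value in sorted(context.items()))
    return f"Context:\n{facts}\n\nQuestion: {question}"

# Stand-ins for provider clients. They only record the prompt they were given.
seen = []

def provider_a(prompt: str) -> str:
    seen.append(("a", prompt))
    return "answer from a"

def provider_b(prompt: str) -> str:
    seen.append(("b", prompt))
    return "answer from b"

def answer(context: Dict[str, str], allowed: Set[str], question: str,
           send: Callable[[str], str]) -> str:
    safe = redact(context, allowed)             # identical for every provider
    return send(build_prompt(safe, question))   # provider is downstream only

context = {"issue": "refund request", "card_number": "4111 1111 1111 1111"}
answer(context, {"issue"}, "What does the customer want?", provider_a)
answer(context, {"issue"}, "What does the customer want?", provider_b)
```

Both stand-ins receive the same prompt, and neither sees the withheld field; swapping the `send` argument is the only provider-specific part.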

07 How Custosa does it

Custosa runs the redaction step as a data control plane between your data and the model, either as a gateway (reverse proxy) or called in process via an SDK. It inspects every record and field at runtime, applies the deterministic Cedar policy, and returns only the approved context, with the data plane running inside your environment so records never leave your boundary for the decision. As a numbered flow, the step looks like this.

1. Resolve the caller and clearance. Establish who the request is on behalf of and what they may see, so redaction is role-aware rather than blanket.
2. Retrieve the candidate context. Gather the records and text your application would normally send to the model for this request.
3. Inspect every field and span. Evaluate structured fields against policy and scan unstructured text with regex and NER, so PII is detected wherever it appears.
4. Redact above clearance, per field. Apply a deterministic pass or redact verdict to each field and entity, withholding what exceeds clearance and keeping the rest.
5. Send only the approved context. Build the prompt from the redacted result and send it to the model, whether OpenAI, Anthropic, Gemini, or self-hosted.
6. Seal a content-free proof. Record a signed, hash-chained, content-free entry that the redaction occurred, so it can be verified offline without storing the PII.

# Conceptual flow, not a live API. Provider call shown generically.
context = retriever.search(query)             # your retrieval, unchanged
safe = redact(context, caller)                # Custosa inspects + redacts here (gateway or SDK)
answer = model.generate(prompt(safe, query))  # OpenAI / Anthropic / Gemini / self-hosted

The middle step is marked as a placeholder on purpose. The exact API, client libraries, and parameters for the redaction call are part of the developer documentation available to design partners; what is public and stable is the position and behavior of the step.

08 Proving redaction happened

Removing the data is necessary; proving you removed it is what an auditor will ask for. A plain application log will not do, because a log you control is mutable and is therefore only a claim. And a proof that copied the PII to show what was redacted would just create a new copy of the very data you withheld.

The answer is content-free evidence. Each pass or redact decision is signed with HMAC-SHA256, so it is attributable; the entries are hash-chained into an append-only ledger, so altering or removing one breaks the chain and is detectable; the record is content-free, so it proves that a given field was redacted for a given caller without storing the field itself; and it is offline-verifiable, so a third party can confirm it without contacting Custosa. That turns "we redacted the PII" into something independently checkable, without the proof becoming a liability of its own. For the full treatment, see content-free, tamper-evident evidence.
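The mechanics of such a ledger can be sketched with Python's standard `hmac` and `hashlib` modules. This is an illustration of the general technique, not Custosa's record format: the entry fields, the genesis value, and the function names are assumptions. Because HMAC is symmetric, the verifier in this sketch must hold the same key; how a third party obtains verification material is outside its scope.

```python
import hashlib
import hmac
import json

GENESIS = "0" * 64  # hypothetical starting value for the chain

def _mac(key: bytes, body: dict) -> str:
    payload = json.dumps(body, sort_keys=True).encode()
    return hmac.new(key, payload, hashlib.sha256).hexdigest()

def append_entry(ledger: list, key: bytes, caller: str, field: str, verdict: str) -> None:
    """Record that a decision happened. Only the field NAME is stored, never its value."""
    prev = ledger[-1]["mac"] if ledger else GENESIS
    body = {"caller": caller, "field": field, "verdict": verdict, "prev": prev}
    ledger.append({**body, "mac": _mac(key, body)})

def verify(ledger: list, key: bytes) -> bool:
    """Recompute every MAC and check that each entry links to the one before it."""
    prev = GENESIS
    for entry in ledger:
        body = {name: entry[name] for name in ("caller", "field", "verdict", "prev")}
        if entry["prev"] != prev or not hmac.compare_digest(entry["mac"], _mac(key, body)):
            return False
        prev = entry["mac"]
    return True

key = b"demo-key-not-for-production"
ledger = []
append_entry(ledger, key, "analyst-7", "account_number", "redact")
append_entry(ledger, key, "analyst-7", "balance_bucket", "pass")
append_entry(ledger, key, "clinician-2", "patient_id", "redact")
```

Changing a verdict invalidates that entry's MAC, and dropping an entry from the middle breaks the `prev` link of the one after it, so either change makes `verify` return `False`.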

Redact-before-the-call checklist

Put the step after retrieval, before the model call, where you can still withhold a field before the prompt is built.
Do not rely on output filtering as the primary control; the model already received the data by then.
Make it role-aware and field-level, not blanket, so the context stays useful for the caller who is allowed to see it.
Decide with a deterministic policy engine and fail closed, so verdicts are explainable and uncertainty reduces exposure.
Cover unstructured text with regex and NER, not just structured fields, so PII in prose is caught too.
Keep the step provider-agnostic, so OpenAI, Anthropic, and Gemini all use the same redaction.
Record content-free, signed, hash-chained evidence, so you can prove redaction happened without storing the PII.

Strip PII before the prompt, not after the answer

See Custosa inspect every record and field at runtime and redact above clearance before the context reaches the model, the same way whether you call OpenAI, Anthropic, or Gemini.


Frequently asked questions

How do you redact PII before sending it to an LLM?

You insert a redaction step after retrieval and before the model call. Resolve the caller's clearance, inspect the candidate context (evaluating structured fields against policy and scanning unstructured text with regex and NER), apply a deterministic pass or redact verdict to each field and entity, then build the prompt from only the approved result. Because the personally identifiable information is withheld before the prompt is assembled, the model never receives it, and therefore cannot leak it in an answer.

Why redact before the call instead of filtering output?

Output filtering runs after the model has already received the data, so the exposure has already happened: the PII was in the prompt, in transit, and in any provider logs before the filter ever ran. Output filters are also detection, not prevention, and they fail open, letting through anything they do not recognize. Redacting before the call prevents the exposure by construction, because unauthorized data never enters the context. Output checks can serve as a backstop, but they cannot be the load-bearing control.

Does redaction make the model less useful?

Not if it is role-aware and field-level rather than blanket. Blanket redaction strips every entity and can leave the context too sparse to be useful. Field-level, clearance-based redaction withholds only the fields a given caller is not entitled to see and passes the useful remainder through, so the same record yields a different, correct view for each role. A clinician sees the diagnosis but not the identifier; the model still has enough context to answer well, without receiving what the caller may not see.

Does this work with OpenAI, Anthropic, and Gemini?

Yes. The redaction step runs before the prompt is assembled, so it is independent of which model receives the prompt. The exact same step applies whether the approved context then goes to OpenAI, Anthropic's Claude, Google Gemini, or a self-hosted model. The provider is downstream of the redaction, so switching or mixing providers does not change how PII is removed and does not require redoing the controls.

How do you prove redaction happened?

With content-free evidence. Each pass or redact decision is signed with HMAC-SHA256 so it is attributable, and the entries are hash-chained into an append-only ledger so altering or removing one breaks the chain and is detectable. The record is content-free: it proves that a given field was redacted for a given caller without copying the field itself. Because it can be verified offline without contacting the vendor, an auditor can confirm independently that the redaction occurred, without the proof itself becoming a new copy of the PII.
