01 What LLM data leakage is
LLM data leakage is the unintended exposure of sensitive data through a large language model. It occurs when data that a user is not authorized to see, or that should never have entered the system at all, reaches the model and then surfaces in an output, a log, a trace, or a stored record. The model is the conduit; the leak is the data going somewhere it should not.
The framing that matters is causal. A model can only expose what it was given. Leakage is therefore less a property of the model and more a property of the pipeline feeding it: what data enters the prompt, who is allowed to retrieve what, and where prompts and completions are stored afterward. Get the inputs right and most leakage disappears, because there is nothing for the model to reveal.
02 The leakage vectors
Sensitive data reaches a model through a handful of recurring paths. Naming them precisely is the first step to closing them.
Training and fine-tuning data. If secrets, credentials, or personal data are present in a training or fine-tuning set, a model can memorize them and later reproduce them verbatim in response to an unrelated prompt. This is the hardest vector to reverse, because once a value is baked into the weights it is difficult to remove.
Prompt and context injection of sensitive fields. The most common enterprise vector is also the most mundane: an application places sensitive fields into the prompt as context. A diagnosis, a salary, a full account number gets concatenated into the request because it was in the record. The model now holds data the user may have no right to see.
RAG retrieval of unauthorized records. Retrieval ranks by relevance, not by permission. A retrieval step that is blind to access control can surface exactly the record the requesting user is not entitled to, and hand it straight to the model.
Logs and traces. Observability stacks routinely capture full prompts and completions. Any sensitive value in them is now copied into a logging system that often has broader access and longer retention than the source data. The leak is not in the answer; it is in the trace.
Model output and memorization. Finally, the model can simply state a sensitive value in its response, whether it was retrieved this turn or memorized in training, exposing it to whoever can read the output.
03 Why most fixes are detection after the fact
The instinct, when teams confront leakage, is to inspect the model's output: run a classifier over completions, scan for things that look like PII, redact the response. This is detection after the fact, and it inherits two structural weaknesses.
First, by the time output detection runs, the model has already received and processed the sensitive data. It may already sit in the context window, in conversation memory, and in any trace captured along the way. Scrubbing the visible answer does nothing about those copies. Second, a detector has to recognize every form a leak can take, across paraphrase, partial disclosure, and inference, and it will miss some. A control whose best case is catching a leak that already happened inside your system is not where the primary defense belongs.
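The second weakness can be illustrated with a minimal sketch, assuming a naive regex-based output scanner. The names `SSN_PATTERN` and `scan_output` and the sample completions are hypothetical and exist only for this illustration; real detectors are more sophisticated, but they face the same coverage problem.

```python
import re

# Hypothetical pattern: a US Social Security number written as 123-45-6789.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def scan_output(completion: str) -> bool:
    """Return True if the completion appears to contain an SSN."""
    return SSN_PATTERN.search(completion) is not None


# Three ways the same value might surface in a completion.
verbatim = "The customer's SSN is 123-45-6789."
paraphrased = "Her social starts with one two three and ends in six seven eight nine."
partial = "The last four digits are 6789."

detected = [scan_output(text) for text in (verbatim, paraphrased, partial)]
print(detected)
```

Only the verbatim form is flagged. The paraphrased and partial disclosures pass the scanner, and in all three cases the value had already reached the model before the scan ran.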
04 The prevention principle: it cannot leak what it never saw
A single principle turns leakage from a detection problem into a design problem: a model cannot leak what it never saw.
If a sensitive field is removed before the prompt is assembled, the model never holds it, so it cannot repeat it in an answer, cannot expose it in a trace, and cannot carry it into later turns. Prevention is upstream of the model by construction. This reframes the goal: instead of asking how to catch sensitive data on the way out, ask how to keep unauthorized data from ever going in. That is authorization before augmentation, and it is the difference between prevention and cleanup. For the broader pattern, see AI data governance.
05 Controls that prevent leakage
The principle becomes a system through a few concrete controls, each acting before the model rather than after it.
Authorization before augmentation. Resolve the requesting actor's identity and clearance, then decide what that actor may see before any data is added to the prompt. Retrieval that respects permissions closes the unauthorized-record vector at its source.
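One way to picture permission-aware retrieval is to filter candidates by the actor's roles before ranking by relevance. The sketch below assumes each record carries an `allowed_roles` set and a precomputed relevance `score`; the `Document` type, `authorized_top_k`, and the sample data are hypothetical, not any particular retriever's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    doc_id: str
    text: str
    allowed_roles: frozenset  # roles permitted to read this record (assumed metadata)
    score: float              # relevance score, assumed to come from the retriever


def authorized_top_k(candidates, actor_roles, k=3):
    """Drop records the actor may not see, then rank the rest by relevance."""
    permitted = [d for d in candidates if d.allowed_roles & actor_roles]
    return sorted(permitted, key=lambda d: d.score, reverse=True)[:k]


candidates = [
    Document("hr-17", "Salary review for J. Doe", frozenset({"hr"}), 0.97),
    Document("kb-02", "How to request a salary review", frozenset({"hr", "employee"}), 0.81),
    Document("kb-09", "Office opening hours", frozenset({"hr", "employee"}), 0.12),
]

context = authorized_top_k(candidates, actor_roles=frozenset({"employee"}), k=2)
context_ids = [d.doc_id for d in context]
print(context_ids)
```

The most relevant record, `hr-17`, never enters the context for an actor with only the `employee` role, even though it ranks highest. Filtering before ranking also means an unauthorized record cannot displace an authorized one from the top-k.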
Field-level redaction. Decide per field, not per record. Custosa issues a verdict of PASS or REDACT for each field, based on the actor's role and evaluated against a five-level clearance lattice. Authorized fields pass into the prompt; sensitive fields are withheld before they enter it, so the model cannot leak what it never received.
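A minimal sketch of per-field verdicts follows. It is not Custosa's implementation: the five level names, the `FIELD_POLICY` mapping, and both helper functions are assumptions made for illustration, and the lattice is simplified to a linear order so that a plain comparison is enough.

```python
from enum import IntEnum


class Clearance(IntEnum):
    # Hypothetical level names; the text only says there are five levels.
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3
    SECRET = 4


# Hypothetical policy: minimum clearance required to see each field.
FIELD_POLICY = {
    "name": Clearance.INTERNAL,
    "department": Clearance.INTERNAL,
    "salary": Clearance.RESTRICTED,
    "diagnosis": Clearance.SECRET,
}


def verdicts(record, actor_clearance):
    """Return a PASS or REDACT verdict for every field in the record."""
    return {
        field: "PASS"
        if field in FIELD_POLICY and actor_clearance >= FIELD_POLICY[field]
        else "REDACT"  # fields without a policy entry are withheld
        for field in record
    }


def build_context(record, actor_clearance):
    """Keep only PASS fields; redacted values never reach the prompt."""
    decided = verdicts(record, actor_clearance)
    return {f: value for f, value in record.items() if decided[f] == "PASS"}


record = {"name": "J. Doe", "department": "Finance", "salary": 98000, "diagnosis": "asthma"}
context = build_context(record, Clearance.CONFIDENTIAL)
print(context)
```

For an actor at the hypothetical `CONFIDENTIAL` level, only `name` and `department` reach the prompt. The salary and diagnosis are absent from the context rather than masked in the output.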
Fail-closed behavior. When the system cannot reach a verdict, it blocks rather than allowing data through. A fail-closed default means uncertainty resolves toward protection, not exposure.
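Fail-closed behavior can be sketched as a wrapper that treats anything other than an explicit allow as a refusal. The `evaluate_field` function and the `flaky_policy` backend below are hypothetical stand-ins for a real policy engine, used only to show the three cases.

```python
def evaluate_field(field, actor_role, policy_lookup):
    """Return "PASS" only on an explicit allow; everything else is "REDACT"."""
    try:
        decision = policy_lookup(field, actor_role)
    except Exception:
        # Policy backend unreachable, timed out, or raised: fail closed.
        return "REDACT"
    return "PASS" if decision is True else "REDACT"


def flaky_policy(field, actor_role):
    # Hypothetical backend used only for this illustration.
    if field == "salary":
        raise TimeoutError("policy engine did not respond")
    if field == "name":
        return True
    return None  # no rule matched this field


results = {
    field: evaluate_field(field, "analyst", flaky_policy)
    for field in ("name", "salary", "notes")
}
print(results)
```

The timeout on `salary` and the missing rule for `notes` both resolve to `REDACT`. The design choice is that the permissive branch needs a positive decision, so an error can only reduce what the model sees.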
Content-free logging. Keep the audit trail without copying the data into it. Custosa records content-free evidence: which actor, which fields passed or were redacted, under which policy, signed and hash-chained, but never the field values. Decisions are made with a deterministic policy engine (Cedar), so the same inputs always produce the same verdict and every redaction is explainable.
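The idea of signed, hash-chained, content-free evidence can be sketched with the Python standard library. This is an illustration under stated assumptions, not Custosa's log format: the entry layout, the HMAC-based signature, the hard-coded `SIGNING_KEY`, and the function names are all hypothetical.

```python
import hashlib
import hmac
import json

SIGNING_KEY = b"demo-key-do-not-use"  # assumption: a real system would use a managed key
GENESIS = "0" * 64
BODY_KEYS = ("actor", "verdicts", "policy_id", "prev_hash")


def _payload(body):
    """Serialize an entry body deterministically for hashing and signing."""
    return json.dumps(body, sort_keys=True).encode()


def append_entry(chain, actor, verdicts, policy_id):
    """Append an entry holding field names and verdicts, never field values."""
    prev_hash = chain[-1]["entry_hash"] if chain else GENESIS
    body = {"actor": actor, "verdicts": verdicts, "policy_id": policy_id, "prev_hash": prev_hash}
    entry = dict(body)
    entry["entry_hash"] = hashlib.sha256(_payload(body)).hexdigest()
    entry["signature"] = hmac.new(SIGNING_KEY, _payload(body), hashlib.sha256).hexdigest()
    chain.append(entry)
    return entry


def verify_chain(chain):
    """Check that every entry links to its predecessor and is unmodified."""
    prev_hash = GENESIS
    for entry in chain:
        payload = _payload({k: entry[k] for k in BODY_KEYS})
        expected_sig = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["entry_hash"] != hashlib.sha256(payload).hexdigest():
            return False
        if not hmac.compare_digest(entry["signature"], expected_sig):
            return False
        prev_hash = entry["entry_hash"]
    return True


log = []
append_entry(log, "analyst-7", {"name": "PASS", "salary": "REDACT"}, "policy-v1")
append_entry(log, "hr-2", {"name": "PASS", "salary": "PASS"}, "policy-v1")
intact = verify_chain(log)

log[0]["verdicts"]["salary"] = "PASS"  # simulate tampering with a recorded verdict
still_valid = verify_chain(log)
print(intact, still_valid)
```

Each entry records who asked, which fields passed or were redacted, and under which policy, so the log can answer audit questions without holding a single sensitive value. Altering a recorded verdict changes the entry's hash and invalidates the chain from that point on.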
06 Detection vs prevention
Both approaches have a place, but they are not interchangeable, and treating detection as sufficient is how leaks persist. The table makes the contrast explicit.
| Dimension | Prevention (redact before inference) | Detection (scan the output) |
|---|---|---|
| When it acts | Before the model sees data | After the model answers |
| Does the model receive sensitive data? | No | Yes |
| Covers context, traces, logs? | Yes, removed at source | No, only the visible output |
| What a miss produces | A blocked or redacted field | A silent, completed leak |
| Determinism | Same inputs, same verdict | Classifier-dependent |
| Role | Primary control | Optional second layer |
Detection is a reasonable backstop. It is a poor foundation. The foundation is removing the data before the model can ever touch it.
07 A prevention checklist
Translating the principle into practice comes down to a short set of commitments about how data flows into the model.
- Authorize before you augment. Decide what the actor may see before adding anything to the prompt.
- Redact at the field level, per actor, before inference, not on the output.
- Make retrieval permission-aware so relevance never surfaces unauthorized records.
- Fail closed when a verdict cannot be reached; block rather than allow through.
- Log content-free evidence, not raw prompts and completions.
- Keep decisions deterministic so every redaction is explainable and reproducible.
- Treat output detection as a second layer, never as the primary defense.
Stop leaks before the model sees the data
Custosa inspects every record and field at runtime and withholds sensitive fields before they enter the prompt, with content-free evidence of every decision.
Frequently asked questions
What is LLM data leakage?
LLM data leakage is the unintended exposure of sensitive data through a large language model. It happens when data the user is not authorized to see, or that should never have entered the system, reaches the model through training, context, retrieval, or logs and then surfaces in an output, a trace, or a stored record.
How does sensitive data leak into an LLM?
The main vectors are training data that memorizes secrets, sensitive fields injected into the prompt or context, retrieval of records the user is not authorized to see, and logs or traces that capture prompts and completions. In each case the model is given data it should not have, and it can then repeat or store it.
Can prompt logs leak PII?
Yes. If prompts and completions are logged in full for debugging or analytics, any PII in them is now in your logging system, often with broader access and longer retention than the source data. Logging verdicts and metadata instead of content keeps the audit trail without copying sensitive data into logs.
How do you prevent data leakage in LLMs?
Prevent leakage by authorizing data before it augments the prompt, redacting sensitive fields per actor before inference, failing closed when a verdict cannot be reached, and logging content-free evidence instead of raw prompts. The principle is that a model cannot leak what it never received.
Is redaction or detection better?
Redaction before inference is prevention; output detection is after the fact. Detection can only catch a leak the model already produced, and it must recognize every form a leak takes. Redaction removes the data before the model sees it, so there is nothing to leak. Detection is a useful second layer, not the primary control.