LLM data leakage: how it happens and how to prevent it

Updated June 2026 · 8 min read

Sensitive data leaks through a large language model when the model is given something it should never have received. The durable fix is not to scrub the output, but to withhold the data before the model ever sees it.

01 What LLM data leakage is

LLM data leakage is the unintended exposure of sensitive data through a large language model. It occurs when data that a user is not authorized to see, or that should never have entered the system at all, reaches the model and then surfaces in an output, a log, a trace, or a stored record. The model is the conduit; the leak is the data going somewhere it should not.

The framing that matters is causal. A model can only expose what it was given. Leakage is therefore less a property of the model and more a property of the pipeline feeding it: what data enters the prompt, who is allowed to retrieve what, and where prompts and completions are stored afterward. Get the inputs right and most leakage disappears, because there is nothing for the model to reveal.

02 The leakage vectors

Sensitive data reaches a model through a handful of recurring paths. Naming them precisely is the first step to closing them.

Training and fine-tuning data. If secrets, credentials, or personal data are present in a training or fine-tuning set, a model can memorize them and later reproduce them verbatim in response to an unrelated prompt. This is the hardest vector to reverse, because once a value is baked into the weights it is difficult to remove.

Prompt and context injection of sensitive fields. The most common enterprise vector is also the most mundane: an application places sensitive fields into the prompt as context. A diagnosis, a salary, or a full account number gets concatenated into the request simply because it was in the record. The model now holds data the user may have no right to see.

RAG retrieval of unauthorized records. Retrieval ranks by relevance, not by permission. A retrieval step that is blind to access control can surface exactly the record the requesting user is not entitled to, and hand it straight to the model.

Logs and traces. Observability stacks routinely capture full prompts and completions. Any sensitive value in them is now copied into a logging system that often has broader access and longer retention than the source data. The leak is not in the answer; it is in the trace.

Model output and memorization. Finally, the model can simply state a sensitive value in its response, whether it was retrieved this turn or memorized in training, exposing it to whoever can read the output.

03 Why most fixes are detection after the fact

The instinct, when teams confront leakage, is to inspect the model's output: run a classifier over completions, scan for things that look like PII, redact the response. This is detection after the fact, and it inherits two structural weaknesses.

First, by the time output detection runs, the model has already received and processed the sensitive data. It may already sit in the context window, in conversation memory, and in any trace captured along the way. Scrubbing the visible answer does nothing about those copies. Second, a detector has to recognize every form a leak can take, across paraphrase, partial disclosure, and inference, and it will miss some. A control whose best case is catching a leak that already happened inside your system is not where the primary defense belongs.
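The second weakness can be illustrated with a minimal sketch. The detector below is hypothetical and deliberately simple (a single regular expression for SSN-shaped strings); real classifiers are more capable, but the same gap between verbatim and reworded disclosure applies in principle.

```python
import re

# Hypothetical output detector: flags anything shaped like a US SSN.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def output_contains_ssn(completion: str) -> bool:
    """Return True if the completion contains a literal SSN-shaped string."""
    return SSN_PATTERN.search(completion) is not None


# Three completions that disclose the same (made-up) value in different forms.
verbatim = "The SSN on file is 123-45-6789."
spelled_out = "The SSN on file is one two three, four five, six seven eight nine."
partial = "It ends in 6789 and starts with 123."

for text in (verbatim, spelled_out, partial):
    print(output_contains_ssn(text), "-", text)
# Only the verbatim form is flagged; the paraphrase and the partial
# disclosure pass through the detector unnoticed.
```

In all three cases the model had already received the value, so the copies in context and traces exist whether or not the detector fires.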

Detecting a leak in the output is reacting to a breach that has already occurred inside the system. Prevention has to act earlier, before the data reaches the model.

04 The prevention principle: it cannot leak what it never saw

There is a single principle that turns leakage from a detection problem into a design problem.

A model cannot leak what it never received. Withhold the data before inference and the leak has no source.

If a sensitive field is removed before the prompt is assembled, the model never holds it, so it cannot repeat it in an answer, cannot expose it in a trace, and cannot carry it into later turns. Prevention is upstream of the model by construction. This reframes the goal: instead of asking how to catch sensitive data on the way out, ask how to keep unauthorized data from ever going in. That is authorization before augmentation, and it is the difference between prevention and cleanup. For the broader pattern, see AI data governance.
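As a rough illustration of withholding before prompt assembly, the sketch below builds the prompt only from fields that have already been judged allowed. The function name `build_prompt`, the record, and the field names are all hypothetical, and how the allowed set is decided is out of scope here.

```python
def build_prompt(question: str, record: dict[str, str], allowed_fields: set[str]) -> str:
    """Assemble a prompt from allowed fields only; other fields are never copied in."""
    lines = [
        f"{name}: {value}"
        for name, value in record.items()
        if name in allowed_fields
    ]
    return "Context:\n" + "\n".join(lines) + f"\n\nQuestion: {question}"


# Made-up record: 'diagnosis' is sensitive for this requester.
record = {"name": "A. Patel", "plan": "Gold", "diagnosis": "Type 2 diabetes"}

prompt = build_prompt(
    "Which plan is this member on?",
    record,
    allowed_fields={"name", "plan"},
)
print(prompt)

# The withheld value is absent from the prompt, so it cannot appear in the
# completion, in a trace of the prompt, or in later conversation turns.
assert "diabetes" not in prompt
```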

05 Controls that prevent leakage

The principle becomes a system through a few concrete controls, each acting before the model rather than after it.

Authorization before augmentation. Resolve the requesting actor's identity and clearance, then decide what that actor may see before any data is added to the prompt. Retrieval that respects permissions closes the unauthorized-record vector at its source.
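One way to picture permission-aware retrieval is to filter candidates by the actor's role before ranking by relevance, so an unauthorized record can never win on similarity. This is a sketch under assumptions: `Document`, `allowed_roles`, and the precomputed `relevance` score are hypothetical stand-ins for whatever a real retrieval stack provides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    doc_id: str
    text: str
    allowed_roles: frozenset[str]
    relevance: float  # stand-in for a similarity score from a vector search


def retrieve_authorized(candidates: list[Document], actor_role: str, top_k: int = 2) -> list[Document]:
    """Drop documents the actor may not see, then rank the remainder by relevance."""
    permitted = [d for d in candidates if actor_role in d.allowed_roles]
    return sorted(permitted, key=lambda d: d.relevance, reverse=True)[:top_k]


corpus = [
    Document("hr-17", "Salary bands for engineering", frozenset({"hr"}), 0.97),
    Document("kb-02", "How to request time off", frozenset({"hr", "employee"}), 0.81),
    Document("kb-09", "Expense policy overview", frozenset({"hr", "employee"}), 0.64),
]

context = retrieve_authorized(corpus, actor_role="employee")
print([d.doc_id for d in context])  # ['kb-02', 'kb-09']
```

The most relevant document (`hr-17`) is excluded for the `employee` role even though it scores highest, which is the behavior a relevance-only ranker would not provide.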

Field-level redaction. Decide per field, not per record. Custosa issues a PASS or REDACT verdict for each field, based on the actor's role and evaluated against a five-level clearance lattice. Authorized fields pass into the prompt; sensitive fields are withheld before they enter it, so the model cannot leak what it never received.
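The following sketch shows what per-field verdicts against a five-level scale could look like. It is not Custosa's implementation or API: the level names, the role and field mappings, and the function `field_verdicts` are assumptions made for illustration, and the lattice is simplified to a total order.

```python
from enum import IntEnum


class Clearance(IntEnum):
    """Hypothetical five-level scale; the level names are illustrative only."""
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3
    SECRET = 4


# Assumed mappings: what each role is cleared for, and what each field requires.
ROLE_CLEARANCE = {
    "support_agent": Clearance.INTERNAL,
    "clinician": Clearance.RESTRICTED,
}
FIELD_LEVEL = {
    "name": Clearance.INTERNAL,
    "plan": Clearance.INTERNAL,
    "diagnosis": Clearance.RESTRICTED,
    "account_number": Clearance.SECRET,
}


def field_verdicts(role: str, field_names: list[str]) -> dict[str, str]:
    """Return PASS or REDACT per field; unknown roles or fields raise KeyError."""
    clearance = ROLE_CLEARANCE[role]
    return {
        name: "PASS" if clearance >= FIELD_LEVEL[name] else "REDACT"
        for name in field_names
    }


fields = ["name", "plan", "diagnosis", "account_number"]
print(field_verdicts("support_agent", fields))
# {'name': 'PASS', 'plan': 'PASS', 'diagnosis': 'REDACT', 'account_number': 'REDACT'}
print(field_verdicts("clinician", fields))
# {'name': 'PASS', 'plan': 'PASS', 'diagnosis': 'PASS', 'account_number': 'REDACT'}
```

The same record yields different prompts for different actors, which is the point of deciding per field rather than per record.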

Fail-closed behavior. When the system cannot reach a verdict, it blocks rather than allowing data through. A fail-closed default means uncertainty resolves toward protection, not exposure.
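A fail-closed wrapper might look like the sketch below: a field is allowed only when the policy lookup returns an explicit PASS, and every other outcome (an exception, a timeout, a missing or unrecognized verdict) is treated as a block. The `PolicyLookup` type and the three sample lookups are hypothetical.

```python
from typing import Callable, Optional

# Hypothetical signature: (role, field_name) -> verdict string, or None if undecided.
PolicyLookup = Callable[[str, str], Optional[str]]


def is_allowed(lookup: PolicyLookup, role: str, field_name: str) -> bool:
    """Allow a field only on an explicit PASS; any failure or ambiguity blocks it."""
    try:
        verdict = lookup(role, field_name)
    except Exception:
        # The engine errored or was unreachable: resolve toward protection.
        return False
    return verdict == "PASS"


def healthy(role: str, field_name: str) -> Optional[str]:
    return "PASS" if field_name == "plan" else "REDACT"


def unreachable(role: str, field_name: str) -> Optional[str]:
    raise TimeoutError("policy engine unreachable")


def undecided(role: str, field_name: str) -> Optional[str]:
    return None


print(is_allowed(healthy, "support_agent", "plan"))      # True
print(is_allowed(unreachable, "support_agent", "plan"))  # False
print(is_allowed(undecided, "support_agent", "plan"))    # False
```

The design choice is that the allow path is the narrow one: there is exactly one way to get `True` and every other path returns `False`.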

Content-free logging. Keep the audit trail without copying the data into it. Custosa records content-free evidence: which actor, which fields passed or were redacted, under which policy, signed and hash-chained, but never the field values. Decisions are made with a deterministic policy engine (Cedar), so the same inputs always produce the same verdict and every redaction is explainable.
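To make the idea of content-free, hash-chained evidence concrete, here is a minimal sketch using only the Python standard library. It is not Custosa's log format: the entry layout, the function names, and the use of HMAC as a symmetric stand-in for signing are all assumptions, and the key shown is a placeholder.

```python
import hashlib
import hmac
import json

SIGNING_KEY = b"demo-key-not-for-production"  # placeholder; assume a managed key in practice
GENESIS = "0" * 64  # prev_hash of the first entry
BODY_KEYS = ("actor", "policy_id", "verdicts", "prev_hash")


def _payload(body: dict) -> bytes:
    """Serialize an entry body deterministically for hashing and signing."""
    return json.dumps(body, sort_keys=True).encode()


def append_entry(log: list, actor: str, policy_id: str, verdicts: dict) -> dict:
    """Record who, which policy, and per-field verdicts. Field values are never stored."""
    prev_hash = log[-1]["entry_hash"] if log else GENESIS
    body = {"actor": actor, "policy_id": policy_id, "verdicts": verdicts, "prev_hash": prev_hash}
    payload = _payload(body)
    entry = dict(
        body,
        entry_hash=hashlib.sha256(payload).hexdigest(),
        signature=hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest(),
    )
    log.append(entry)
    return entry


def verify_chain(log: list) -> bool:
    """Recompute every hash and signature and check each link to its predecessor."""
    prev_hash = GENESIS
    for entry in log:
        payload = _payload({k: entry[k] for k in BODY_KEYS})
        expected_sig = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
        if (
            entry["prev_hash"] != prev_hash
            or entry["entry_hash"] != hashlib.sha256(payload).hexdigest()
            or not hmac.compare_digest(entry["signature"], expected_sig)
        ):
            return False
        prev_hash = entry["entry_hash"]
    return True


log = []
append_entry(log, "agent-42", "policy-v3", {"name": "PASS", "diagnosis": "REDACT"})
append_entry(log, "agent-42", "policy-v3", {"plan": "PASS"})
print(verify_chain(log))  # True

log[0]["verdicts"]["diagnosis"] = "PASS"  # tamper with a recorded verdict
print(verify_chain(log))  # False
```

The log answers "who saw which fields under which policy" without holding any of the values, so broad access to it does not widen access to the data.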

06 Detection vs prevention

Both approaches have a place, but they are not interchangeable, and treating detection as sufficient is how leaks persist. The table makes the contrast explicit.

Dimension | Prevention (redact before inference) | Detection (scan the output)
When it acts | Before the model sees data | After the model answers
Does the model receive sensitive data? | No | Yes
Covers context, traces, logs? | Yes, removed at source | No, only the visible output
Misses produce | A blocked or redacted field | A silent, completed leak
Determinism | Same inputs, same verdict | Classifier-dependent
Role | Primary control | Optional second layer

Detection is a reasonable backstop. It is a poor foundation. The foundation is removing the data before the model can ever touch it.

07 A prevention checklist

Translating the principle into practice comes down to a short set of commitments about how data flows into the model.

Leakage-prevention checklist
  • Authorize before you augment. Decide what the actor may see before adding anything to the prompt.
  • Redact at the field level, per actor, before inference, not on the output.
  • Make retrieval permission-aware so relevance never surfaces unauthorized records.
  • Fail closed when a verdict cannot be reached; block rather than allow through.
  • Log content-free evidence, not raw prompts and completions.
  • Keep decisions deterministic so every redaction is explainable and reproducible.
  • Treat output detection as a second layer, never as the primary defense.

Stop leaks before the model sees the data

Custosa inspects every record and field at runtime and withholds sensitive fields before they enter the prompt, with content-free evidence of every decision.

Frequently asked questions

What is LLM data leakage?

LLM data leakage is the unintended exposure of sensitive data through a large language model. It happens when data the user is not authorized to see, or that should never have entered the system, reaches the model through training, context, retrieval, or logs and then surfaces in an output, a trace, or a stored record.

How does sensitive data leak into an LLM?

The main vectors are training data containing secrets that the model memorizes, sensitive fields injected into the prompt or context, retrieval of records the user is not authorized to see, and logs or traces that capture prompts and completions. In each case the model is given data it should not have, and it can then repeat or store it.

Can prompt logs leak PII?

Yes. If prompts and completions are logged in full for debugging or analytics, any PII in them is now in your logging system, often with broader access and longer retention than the source data. Logging verdicts and metadata instead of content keeps the audit trail without copying sensitive data into logs.

How do you prevent data leakage in LLMs?

Prevent leakage by authorizing data before it augments the prompt, redacting sensitive fields per actor before inference, failing closed when a verdict cannot be reached, and logging content-free evidence instead of raw prompts. The principle is that a model cannot leak what it never received.

Is redaction or detection better?

Redaction before inference is prevention; output detection is after the fact. Detection can only catch a leak the model already produced, and it must recognize every form a leak takes. Redaction removes the data before the model sees it, so there is nothing to leak. Detection is a useful second layer, not the primary control.