HIPAA-Compliant AI: Protecting PHI in LLMs

AI is HIPAA compliant when the system around the model enforces what HIPAA requires: a business associate agreement with every party that touches protected health information, the minimum necessary standard applied to what each user can see, role-based access controls, encryption in transit and at rest, and a reliable audit trail. No model is compliant by itself. The most durable control is structural. Withhold the PHI fields a given asker is not cleared for before the prompt is assembled, so the model cannot disclose what it never received.

That last point is the difference between policy on paper and policy in practice. A well-written access matrix does nothing if every clinician's prompt still carries the full chart into the context window. HIPAA compliant AI moves the control to the moment of inference, where the data actually enters the model, and proves what happened with evidence a regulator can verify.

01 What HIPAA requires of AI systems

HIPAA does not name large language models, but its rules apply to any system that creates, receives, maintains, or transmits protected health information. The requirements that shape an AI deployment are these.

Protected health information. PHI is individually identifiable health information held or transmitted by a covered entity or its business associate: a diagnosis, a medication, a lab value, or a note, tied to identifiers such as a name, a date, a medical record number, or any of the eighteen identifiers the Privacy Rule enumerates. For those organizations, the moment a prompt or a retrieved document carries identified health data, HIPAA is in scope.

The business associate agreement. If a vendor creates, receives, maintains, or transmits PHI on your behalf, a BAA is required before any PHI is shared. For AI this reaches the model provider, the retrieval infrastructure, and any tool that processes prompts or responses. The agreement defines the permitted uses of PHI and should prohibit training on your data unless you explicitly authorize it.

The minimum necessary standard. Uses and disclosures of PHI must be limited to what is reasonably needed for the task. Applied to inference, this means the model should receive only the fields the requesting role is authorized to use for the question asked.

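Access controls. The Security Rule requires technical controls that limit access to electronic PHI to the people and software that have been granted access, and role-based access is the standard way to implement that. For AI, the relevant access decision is per field and per request, not just per application login.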

Audit logging. Covered entities must be able to account for access to and disclosure of PHI. An AI system needs a record of who asked what, which fields were released, and which were withheld, durable enough to survive an investigation.

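Encryption. PHI must be protected in transit and at rest. The Security Rule treats encryption as an addressable implementation specification rather than a fixed mandate, but in practice TLS in transit and strong encryption such as AES-256-GCM at rest are the baseline expectation.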

None of these is satisfied by the model. They are satisfied by the data-control layer in front of it. This is where AI data governance for healthcare lives.

02 The minimum necessary standard at inference and retrieval

The minimum necessary standard is the rule most often broken by AI, because the default behavior of an LLM pipeline is to over-collect. Builders dump the whole record into context to be safe, then rely on the prompt to keep the model from saying the wrong thing. That inverts the control. The model now holds PHI it should never have received, and the only thing standing between that data and a leak is the model's own behavior.

Applied correctly, minimum necessary operates in two places.

At retrieval. When a RAG pipeline searches an index, it should return only documents and passages the requester is authorized to access. A retriever that ignores authorization will surface PHI from any indexed chart, regardless of who is asking.

At the field level. Within an authorized record, the model should receive only the fields the role needs for the question. A nurse confirming a medication schedule does not need the substance-use history. A scheduling assistant does not need lab results. The principle is the same one HIPAA has always required, applied to the prompt instead of the printout.

The model should receive only the PHI the asker is cleared to see. It cannot leak what it never received.

This is the core idea behind PHI minimization: enforce minimum necessary per field, by role, at the moment the prompt is built, and the leakage surface collapses because the sensitive data is simply not present.
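A minimal Python sketch of that idea, assuming the record has already been fetched as a flat dictionary of named fields. The `ALLOWED_FIELDS` table, the role names, and `build_prompt` are hypothetical illustrations, not any product's API. The point is that withholding happens while the prompt string is assembled, and a role the policy does not know receives nothing.

```python
# Hypothetical role-to-field allowlist. In practice this comes from the
# policy your compliance team owns, not from constants in application code.
ALLOWED_FIELDS: dict[str, frozenset[str]] = {
    "nurse": frozenset({"demographics", "active_medications", "lab_results"}),
    "scheduling_assistant": frozenset({"demographics"}),
}


def build_prompt(role: str, record: dict[str, str], question: str) -> tuple[str, list[str]]:
    """Build a prompt that contains only the fields `role` may use.

    Returns the prompt and the sorted names of withheld fields, which an
    audit record can capture without copying the withheld values.
    Unknown roles get an empty allowlist, so nothing is released by default.
    """
    allowed = ALLOWED_FIELDS.get(role, frozenset())
    released = {name: value for name, value in record.items() if name in allowed}
    withheld = sorted(name for name in record if name not in allowed)
    context = "\n".join(f"{name}: {value}" for name, value in sorted(released.items()))
    return f"Patient context:\n{context}\n\nQuestion: {question}", withheld
```

The second return value is what an audit entry would record: the names of the withheld fields, never their values.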

03 De-identification versus runtime redaction

Two techniques reduce PHI exposure, and they are not interchangeable. Choosing the wrong one is a common and expensive mistake.

De-identification removes enough identifiers that data falls outside HIPAA entirely. The Privacy Rule recognizes two methods. Safe Harbor requires removing all eighteen specified identifiers, including names, geographic subdivisions smaller than a state (the first three ZIP digits may be kept only where that area holds more than 20,000 people), all elements of dates except the year, ages over 89 (which must be aggregated into a single 90-or-older category), contact details, record and device numbers, biometric identifiers, and full-face images, and then having no actual knowledge that the remaining information could identify someone. Expert Determination relies on a qualified expert, applying generally accepted statistical and scientific methods, to determine that the risk of re-identification is very small and to document the methods and results. De-identified data is no longer PHI, so it can be used for analytics, research, and model training without a BAA.
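To make a few of the Safe Harbor rules concrete, here is a deliberately partial Python sketch covering only ZIP code, birth date, and age, with the name dropped by simply not copying it. The set of restricted three-digit ZIP prefixes is a parameter the caller must fill from current HHS guidance and census data; the function names are hypothetical, and a real de-identification pipeline must handle every other identifier as well.

```python
from datetime import date


def generalize_zip(zip_code: str, restricted_prefixes: frozenset[str]) -> str:
    """Keep the first three ZIP digits, or '000' where that area is too small."""
    prefix = zip_code[:3]
    return "000" if prefix in restricted_prefixes else prefix


def generalize_age(age: int) -> str:
    """Ages over 89 are aggregated into a single 90-or-older category."""
    return "90+" if age > 89 else str(age)


def safe_harbor_subset(row: dict, restricted_prefixes: frozenset[str]) -> dict:
    """Apply Safe Harbor rules for ZIP, birth date, and age to one row.

    Partial by design: the name is dropped simply by not being copied, and
    every other identifier (contact details, record numbers, images, and so
    on) still needs its own handling before the output is de-identified.
    """
    age = row["age"]
    birth: date = row["birth_date"]
    return {
        "zip3": generalize_zip(row["zip"], restricted_prefixes),
        "age": generalize_age(age),
        # Dates keep only the year, and for ages over 89 even the year is
        # removed because it would reveal the age.
        "birth_year": None if age > 89 else birth.year,
    }
```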

Runtime redaction keeps data identified but withholds the specific fields a given role is not cleared for, at the moment of the request. The patient stays identified because the clinician legitimately needs to know which patient. What changes is which fields enter the prompt.

Dimension	De-identification	Runtime redaction
Goal	Remove data from HIPAA scope	Limit PHI to the minimum necessary per request
Patient identity	Removed	Preserved
Decision basis	Fixed identifier list or statistical model	Requesting role and the field
Best fit	Analytics, research, model training	Clinical and operational AI, copilots, RAG
BAA needed	No, once de-identified	Yes, data stays PHI

The two are complementary. De-identify for the data lake and the training set. Redact at runtime for the copilot that serves identified patient care. A clinical AI program usually needs both.

04 Role-based PHI access in practice

Minimum necessary becomes concrete when expressed by role. The same patient record yields different views depending on who is asking and why. A clinician treating the patient sees the most. A nurse sees what the immediate task requires. A researcher sees de-identified data or an authorized cohort. Billing sees codes and dates, not clinical narrative.

The table below shows one way a five-level clearance lattice can map four common roles to field-level access. It is illustrative; the actual policy is yours to define and your compliance team's to own.

Field	Clinician	Nurse	Researcher	Billing
Demographics	Pass	Pass	Redact	Pass
Active medications	Pass	Pass	Redact	Redact
Substance-use history	Pass	Redact	Redact	Redact
Lab results	Pass	Pass	Redact	Redact
Diagnosis codes	Pass	Pass	Redact	Pass
Mental-health notes	Pass	Redact	Redact	Redact

The decision that produces each Pass or Redact should be deterministic. Given the same role, field, and policy, the verdict must be identical every time, because an access decision that varies run to run is neither explainable nor defensible. A formal policy engine, evaluated before the prompt is built, gives you that property. A model asked to judge its own access does not.
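As a sketch, the illustrative table above can be encoded as data and evaluated by a pure function. This is a plain Python lookup standing in for a formal policy engine, and the role and field identifiers are assumptions chosen to mirror the table. Because `decide` reads nothing but its arguments and a fixed policy, the same role and field always yield the same verdict, and anything not explicitly allowed is redacted.

```python
from enum import Enum


class Verdict(Enum):
    PASS = "Pass"
    REDACT = "Redact"


FIELDS = (
    "demographics",
    "active_medications",
    "substance_use_history",
    "lab_results",
    "diagnosis_codes",
    "mental_health_notes",
)

# The illustrative table, encoded as the set of fields each role may receive.
POLICY: dict[str, frozenset[str]] = {
    "clinician": frozenset(FIELDS),
    "nurse": frozenset({"demographics", "active_medications", "lab_results", "diagnosis_codes"}),
    "researcher": frozenset(),  # researchers work from de-identified data instead
    "billing": frozenset({"demographics", "diagnosis_codes"}),
}


def decide(role: str, field: str) -> Verdict:
    """Pure function of (role, field): no clock, randomness, or model call.

    Default deny: unknown roles and unknown fields are redacted.
    """
    return Verdict.PASS if field in POLICY.get(role, frozenset()) else Verdict.REDACT
```

Keeping the policy as data rather than branching logic also makes it reviewable: the compliance team can read the table and the code as the same artifact.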

05 How RAG creates PHI exposure, and how to close it

Retrieval-augmented generation is the dominant pattern for healthcare AI, and it is where PHI most often escapes its boundary. The mechanism is straightforward. RAG indexes documents, retrieves the passages most relevant to a query, and injects them into the prompt. If the index holds PHI and retrieval ignores who is asking, the pipeline will happily place another patient's chart, or a field the asker is not cleared for, into the context window.

Three failure modes recur:

Retrieval bypasses record-level access. The vector store returns the best semantic match, not the records the user is authorized to read. Authorization that lives in the source system is lost the moment content is embedded.
Whole documents enter the prompt. Even an authorized record carries fields the role does not need, and all of them reach the model.
The prompt leaves the boundary. If retrieval and assembly happen outside the covered entity's environment, PHI has already crossed a line before any model runs.

Closing these means enforcing authorization before augmentation and redaction at the field level. Resolve the requester's clearance, filter retrieval to authorized records, strip the fields the role is not entitled to, then build the prompt. Run the whole pipeline inside your boundary. Done in that order, a passage the asker may not see never reaches the model, which is the only guarantee that holds when the model is probed. See RAG security for the full pattern.
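The ordering described above can be sketched in Python as follows. The keyword-overlap `retrieve` function is a toy stand-in for a vector search, and `Document`, `Requester`, and `FIELD_POLICY` are hypothetical types and data for illustration. What matters is the sequence: record-level filtering happens before ranking, and field-level stripping happens before the prompt string exists.

```python
from dataclasses import dataclass


@dataclass
class Document:
    patient_id: str
    fields: dict[str, str]


@dataclass
class Requester:
    role: str
    authorized_patients: frozenset[str]  # record-level access from the source system


# Hypothetical per-role field allowlist standing in for the policy engine.
FIELD_POLICY: dict[str, frozenset[str]] = {
    "nurse": frozenset({"active_medications", "lab_results"}),
}


def retrieve(candidates: list[Document], query: str, k: int) -> list[Document]:
    """Toy ranking by keyword overlap; a real system would use a vector index."""
    terms = set(query.lower().split())

    def score(doc: Document) -> int:
        text = " ".join(doc.fields.values()).lower()
        return sum(term in text for term in terms)

    return sorted(candidates, key=score, reverse=True)[:k]


def build_rag_prompt(requester: Requester, index: list[Document], query: str, k: int = 3) -> str:
    # 1. Resolve clearance: which fields this role may see (default: none).
    allowed = FIELD_POLICY.get(requester.role, frozenset())
    # 2. Record-level filter BEFORE ranking, so unauthorized charts are never candidates.
    candidates = [d for d in index if d.patient_id in requester.authorized_patients]
    # 3. Retrieve only from authorized records.
    hits = retrieve(candidates, query, k)
    # 4. Field-level redaction: drop fields the role is not cleared for.
    passages = []
    for doc in hits:
        kept = {name: value for name, value in doc.fields.items() if name in allowed}
        if kept:
            body = "; ".join(f"{name}: {value}" for name, value in sorted(kept.items()))
            passages.append(f"[{doc.patient_id}] {body}")
    # 5. Only now is the prompt assembled.
    return "Context:\n" + "\n".join(passages) + f"\n\nQuestion: {query}"
```

Filtering candidates before ranking, rather than ranking everything and dropping unauthorized hits afterward, keeps unauthorized records out of the ranking step entirely and stops them from consuming top-k slots.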

06 Audit evidence for investigations and breach response

When the Office for Civil Rights opens an investigation, or when you must determine whether an incident was a breach, the question is concrete: who accessed what PHI, when, and under what authorization. An AI system that cannot answer this is a liability, because the absence of a record is treated as the absence of control.

The evidence has to satisfy a few properties at once. It must be content-free, recording the decision, the fields, the role, and a hash rather than the PHI itself, so the audit trail is not a second copy of the data to protect. It must be signed, so each entry's integrity can be verified. It must be hash-chained and append-only, so that altering or deleting any entry breaks the chain and the tampering is visible. And it should be verifiable offline, so an auditor can confirm the record without trusting the vendor that produced it.
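A minimal sketch of such a ledger, using only the Python standard library. Each entry stores the decision metadata and a keyed digest of the value instead of the value, and its HMAC-SHA256 covers both its own body and the previous entry's MAC, which links the entries into a chain. `EvidenceLedger` and `verify` are hypothetical names. With HMAC the verifier needs the same secret key, so "offline" here means recomputing the chain locally without contacting whoever produced it.

```python
import hashlib
import hmac
import json

GENESIS = "0" * 64  # placeholder "previous MAC" for the first entry
BODY_KEYS = ("seq", "role", "field", "verdict", "value_digest")


def _mac(key: bytes, prev: str, body: dict) -> str:
    # Canonical JSON so the same body always serializes to the same bytes.
    payload = prev.encode() + json.dumps(body, sort_keys=True, separators=(",", ":")).encode()
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


class EvidenceLedger:
    """Append-only, content-free record of access decisions."""

    def __init__(self, key: bytes) -> None:
        self._key = key
        self.entries: list[dict] = []

    def append(self, role: str, field: str, verdict: str, value: str) -> dict:
        body = {
            "seq": len(self.entries),
            "role": role,
            "field": field,
            "verdict": verdict,
            # Keyed digest of the value: the PHI itself is never stored, and
            # low-entropy values cannot be confirmed by hashing guesses without the key.
            "value_digest": hmac.new(self._key, value.encode(), hashlib.sha256).hexdigest(),
        }
        prev = self.entries[-1]["mac"] if self.entries else GENESIS
        entry = {**body, "prev": prev, "mac": _mac(self._key, prev, body)}
        self.entries.append(entry)
        return entry


def verify(entries: list[dict], key: bytes) -> bool:
    """Recompute the chain. Edits, reordering, and deletions in the middle of
    the log all make it fail; truncation of the newest entries needs an
    external anchor to detect."""
    prev = GENESIS
    for i, entry in enumerate(entries):
        body = {k: entry[k] for k in BODY_KEYS}
        if entry["seq"] != i or entry["prev"] != prev:
            return False
        if not hmac.compare_digest(entry["mac"], _mac(key, prev, body)):
            return False
        prev = entry["mac"]
    return True
```

One caveat on the design: a chain like this exposes edits, deletions, and reordering inside the log, but silently dropping the newest entries leaves a valid shorter chain. Anchoring the latest MAC somewhere the writer cannot alter, such as a periodic checkpoint held by a separate party, closes that gap.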

A breach analysis turns on what the model received, not just what it returned. If you can show a field was withheld before the prompt was built, you can show the model never held it, so that field could not have been disclosed through the model.

Logged this way, the audit trail does double duty: it demonstrates minimum necessary was enforced field by field, and it gives breach response a precise account of exposure instead of a guess.

07 FHIR R4 classification

Healthcare data does not arrive as neat labeled fields. It arrives as FHIR resources, X12 claims, and free text. To apply minimum necessary, the control layer first has to know what each field is. A FHIR R4 Observation, Condition, MedicationRequest, or Patient resource carries dozens of elements, and a policy can only act on them once they are classified by sensitivity and type.

Automatic FHIR R4 classification maps incoming resources to the categories your policy understands, so that a substance-use Condition, a psychotherapy note, or an HIV lab Observation is recognized and governed without someone hand-labeling every feed. The same applies to X12 EDI claims data on the billing side. Classification is the step that makes field-level redaction possible at scale rather than as a one-off integration.
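A deliberately small Python sketch of the classification step, assuming FHIR R4 resources arrive as parsed JSON dictionaries. It checks HL7 v3 sensitivity labels in `meta.security` first, then falls back to resource type and, for Observations, the standard `laboratory` category. The mapping to this article's field categories is hypothetical, and real classification would also inspect clinical codes and free text, which this sketch does not attempt. Anything it cannot place comes back as `unclassified`, so a default-deny policy redacts it.

```python
from typing import Any

OBS_CATEGORY = "http://terminology.hl7.org/CodeSystem/observation-category"
V3_ACT_CODE = "http://terminology.hl7.org/CodeSystem/v3-ActCode"

# Hypothetical mapping from HL7 v3 sensitivity labels to policy categories.
SENSITIVITY_TO_CATEGORY = {
    "ETH": "substance_use_history",  # substance abuse information sensitivity
    "PSY": "mental_health_notes",  # psychiatry information sensitivity
    "HIV": "hiv_results",  # not one of the example table's fields, so default deny redacts it
}


def classify(resource: dict[str, Any]) -> str:
    """Map one parsed FHIR R4 resource to a policy field category."""
    # 1. Explicit sensitivity labels take precedence over resource type.
    for coding in resource.get("meta", {}).get("security", []):
        if coding.get("system") == V3_ACT_CODE and coding.get("code") in SENSITIVITY_TO_CATEGORY:
            return SENSITIVITY_TO_CATEGORY[coding["code"]]
    # 2. Structural classification by resource type.
    rtype = resource.get("resourceType")
    if rtype == "Patient":
        return "demographics"
    if rtype == "MedicationRequest" and resource.get("status") == "active":
        return "active_medications"
    if rtype == "Condition":
        return "diagnosis_codes"
    if rtype == "Observation":
        for concept in resource.get("category", []):
            for coding in concept.get("coding", []):
                if coding.get("system") == OBS_CATEGORY and coding.get("code") == "laboratory":
                    return "lab_results"
    # 3. Anything unrecognized stays unclassified so policy can fail closed.
    return "unclassified"
```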

08 What to require from an AI vendor

If you are evaluating an AI vendor that will touch PHI, the diligence is specific. Ask for these and read the answers closely.

Vendor requirements

A signed BAA before any PHI is shared, with use restricted to permitted purposes and training on your data prohibited unless explicitly authorized.
Field-level access control tied to role, not just an application-level login, so the minimum necessary standard is enforceable per request.
Deterministic policy decisions that produce the same verdict for the same inputs, so access is explainable and reproducible.
Data residency that keeps PHI inside your boundary, with self-managed, on-premises, or air-gapped options where required.
Signed, content-free, tamper-evident evidence of every access decision, verifiable offline.
Encryption in transit and at rest, with bring-your-own-key available where your policy demands it.
Fail-closed behavior, so that an error or timeout blocks the request rather than releasing PHI by default.
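Fail-closed behavior is easy to state and easy to get wrong, so here is a minimal Python sketch of a wrapper around any policy function. The `decide_fail_closed` name, the worker pool size, and the 100 ms default budget are assumptions for illustration. An exception from the policy function, a timeout, or any verdict other than an explicit `"Pass"` all resolve to `"Redact"`.

```python
import concurrent.futures
from typing import Callable

# Shared worker pool for policy evaluations (size is an arbitrary choice).
_POOL = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def decide_fail_closed(
    decide: Callable[[str, str], str],
    role: str,
    field: str,
    timeout_s: float = 0.1,
) -> str:
    """Evaluate `decide(role, field)`; release the field only on an explicit "Pass".

    Any exception from the policy function, a timeout, or an unexpected verdict
    value returns "Redact", so failure never releases PHI by default.
    """
    future = _POOL.submit(decide, role, field)
    try:
        verdict = future.result(timeout=timeout_s)
    except Exception:  # covers concurrent.futures.TimeoutError and policy errors
        return "Redact"
    return "Pass" if verdict == "Pass" else "Redact"
```

Python cannot forcibly stop a running thread, so a timed-out decision keeps running in the background; the wrapper only guarantees that its late result is ignored.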

Note what is not on this list: a claim of being HIPAA certified. No such certification exists. A vendor provides controls and evidence; the covered entity or business associate remains responsible for compliance. A vendor that markets itself as HIPAA certified is telling you something about its rigor, and not in its favor.

09 How Custosa applies minimum-necessary PHI control at runtime

Custosa is the runtime data-control plane for enterprise AI. It sits between your data and the LLM and inspects every record and field at runtime, before the model sees it. For a healthcare deployment, that inspection is where the minimum necessary standard is enforced.

The data plane runs inside your environment, so records never leave your boundary; self-managed, on-premises, and air-gapped options are available, along with a FIPS build. The managed control plane receives only content-free verdict evidence, never the records themselves. Decisions are made by a deterministic formal policy engine, Cedar, not a model, so the same inputs always produce the same verdict and the system fails closed. Verdicts are per-field Pass or Redact, by role, through a five-level clearance lattice, and sensitive fields are withheld before they enter the prompt, so the model cannot leak what it never received.

Every decision is signed with HMAC-SHA256 and hash-chained into an append-only, tamper-evident, content-free evidence ledger that can be verified offline. For healthcare specifically, Custosa ships a HIPAA Pack, FHIR R4 auto-classification, and X12 EDI claims handling, alongside SOC 2 and SOC 1 packs and mappings used in regulated environments. Data is protected with TLS in transit and AES-256-GCM at rest, with BYOK on request, and the p99 added-latency target is ≤50ms. Custosa is early-stage and in production with design partners.

To be precise about scope: Custosa provides the controls and the evidence. It does not make you HIPAA certified, because nothing can. A BAA is still required between a covered entity and any vendor handling PHI, and the covered entity remains responsible for its own compliance program.

Enforce minimum necessary before the model sees PHI

See how Custosa applies field-level, role-based PHI control at runtime and records signed evidence of every decision.


Frequently asked questions

Is ChatGPT HIPAA compliant?

A consumer chatbot used without a business associate agreement is not a place for PHI, and no software is HIPAA compliant on its own. To use a model with protected health information, you need a business associate agreement with the provider, controls that enforce access and the minimum necessary standard, and an audit trail. Enterprise and API tiers can sometimes be covered under a BAA, but the covered entity remains responsible for how PHI reaches the model and what the model is allowed to see.

What makes an LLM HIPAA compliant?

Compliance is a property of the system around the model, not the model itself. A HIPAA compliant LLM deployment requires a BAA with anyone handling PHI, the minimum necessary standard applied to what each user can retrieve, role-based access, encryption in transit and at rest, and a reliable audit trail of who accessed what. The most durable control withholds PHI fields the asker is not cleared for before the prompt is built, so the model cannot disclose what it never received.

How does the minimum necessary standard apply to AI?

The minimum necessary standard requires limiting PHI to what is reasonably needed for a task. For AI, this applies at retrieval and at inference: a model should receive only the fields the requesting role is authorized to use for the question asked. A nurse asking about medication timing does not need the full psychiatric history. Enforcing minimum necessary per field, by role, before the model sees the data is how the standard maps onto LLMs and RAG.

Do I need a BAA for an AI vendor?

Yes, if the vendor creates, receives, maintains, or transmits PHI on your behalf, a business associate agreement is required before any PHI is shared. This includes model providers, RAG infrastructure, and tools that process prompts or responses containing PHI. The BAA must restrict use of PHI to permitted purposes and prohibit training on your data unless explicitly allowed. The covered entity remains responsible for compliance even with a BAA in place.

Is de-identification or redaction better for healthcare AI?

They solve different problems. De-identification under Safe Harbor or Expert Determination removes enough identifiers that data falls outside HIPAA, which suits analytics, research, and model training on populations. Runtime redaction keeps data identified but withholds the specific fields a given role is not cleared for at the moment of the request, which suits clinical and operational AI where the user legitimately needs some PHI but not all of it. Many programs use both.

Can RAG be HIPAA compliant?

Yes, but RAG adds exposure that has to be closed. Retrieval can surface PHI from any indexed document into a prompt, bypassing record-level access controls if authorization is not enforced at retrieval. A compliant RAG pipeline checks the requester's clearance before augmentation, redacts PHI fields at the field level by role, runs inside the covered entity's boundary, and records signed evidence of every decision.
