Model-agnostic AI data protection for OpenAI, Anthropic, and Gemini

Updated June 2026 · 7 min read

Teams rarely commit to one model forever; they adopt new ones, run several at once, and switch as the field moves. This guide explains why governing data before the prompt makes your data protection identical across providers, so changing models never means redoing the controls.

01 Governing data before the prompt is model-agnostic

Model-agnostic AI data protection is data governance applied before the prompt is built, so it does not depend on which model receives the prompt. Because the control inspects and redacts records at the data layer rather than inside any provider's API, the same policy and the same evidence apply whether the approved context then goes to OpenAI, Anthropic's Claude, Google Gemini, or a self-hosted model.

The logic is simple once stated. The exposure you are trying to prevent happens when sensitive data enters a prompt. If you decide what may enter the prompt before the prompt is assembled, that decision is finished before any model is involved. The model is downstream of the control and never participates in it, so the protection is the same no matter which provider you send the approved context to. This is the cross-provider property of the broader practice of AI data governance.

02 The provider landscape and the shared risk

The model market is plural and moving. Most teams building on AI use more than one provider, and the leaders change often enough that committing to a single one is a liability rather than a simplification. The realistic set looks like this.

  • OpenAI. A common default for general-purpose generation and a frequent first integration.
  • Anthropic (Claude). Often chosen for long-context work and for use cases where teams weigh its safety posture.
  • Google Gemini. Adopted both standalone and as part of a wider cloud footprint.
  • Self-hosted and open-source models. Run inside the team's own perimeter for control, cost, or data-residency reasons.

What unites them, from a data-protection standpoint, is the risk. Every one of these models will use whatever is placed in its context, and none of them has a concept of who is asking or what that caller is cleared to see. The exposure is identical regardless of provider: sensitive data in the prompt is sensitive data at risk. Because the risk lives at the prompt, not at the provider, the control belongs at the prompt too. For the disclosure paths this risk can take, see LLM data leakage.

03 Why a data-layer control is identical across providers

There are two places you could put data protection, and only one of them is provider-independent.

Provider                  | Same data-layer control applies? | What changes
--------------------------|----------------------------------|-------------
OpenAI                    | Yes                              | Nothing. The same policy decides what enters the prompt before the OpenAI call.
Anthropic (Claude)        | Yes                              | Nothing. The approved context is identical; only the downstream endpoint differs.
Google Gemini             | Yes                              | Nothing. Governance is finished before the Gemini call is made.
Self-hosted / open-source | Yes                              | Nothing. The control does not care whether the model is hosted or local.

The first option is to build protection into each provider's API path: a wrapper around the OpenAI call, another around the Anthropic call, another around Gemini. That approach makes the control a property of the integration, so it has to be rebuilt for every provider and kept in sync across all of them, and any provider you add without the wrapper is unprotected by default.

The second option is to put the control at the data layer, before the prompt. Written once, it applies to every provider equally, because the providers are all downstream of it. There is no per-provider rework and no risk of the controls drifting apart, because there is only one control. The same inspection, the same redaction, and the same verdicts run regardless of where the prompt is ultimately sent.
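As a rough illustration of the second option, the sketch below places one governing function ahead of prompt assembly and treats every provider as an interchangeable downstream callable. Everything named here is hypothetical: the `ALLOWED_FIELDS` policy, the `govern` and `build_prompt` helpers, and the provider stubs, which only echo what they would receive rather than calling any real SDK.

```python
from typing import Callable, Dict

Record = Dict[str, str]

# Hypothetical policy: the only fields allowed to enter a prompt.
ALLOWED_FIELDS = {"ticket_id", "product", "issue_summary"}
REDACTED = "[REDACTED]"


def govern(record: Record) -> Record:
    """Decide what may enter the prompt. No provider is known at this point."""
    return {k: (v if k in ALLOWED_FIELDS else REDACTED) for k, v in record.items()}


def build_prompt(record: Record) -> str:
    """Assemble the prompt from the approved context only."""
    approved = govern(record)
    lines = [f"{k}: {v}" for k, v in sorted(approved.items())]
    return "Summarize this support ticket.\n" + "\n".join(lines)


def make_stub(name: str) -> Callable[[str], str]:
    """Stand-in for a provider client; it returns the prompt it was given."""
    def send(prompt: str) -> str:
        return f"{name}<-{prompt}"
    return send


# Assumed provider registry. Adding an entry requires no change to govern().
PROVIDERS: Dict[str, Callable[[str], str]] = {
    name: make_stub(name) for name in ("openai", "anthropic", "gemini", "self_hosted")
}


def ask(provider: str, record: Record) -> str:
    """Route an already-governed prompt to the chosen provider."""
    return PROVIDERS[provider](build_prompt(record))
```

Calling `ask("openai", record)` and `ask("gemini", record)` with the same record hands both stubs the same prompt, because the redaction finished before the provider name was consulted. A per-provider wrapper design would instead need one copy of `govern` per entry in the registry.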

A control bound to a provider's API is rework waiting to happen. A control placed before the prompt is written once and inherited by every provider, present and future. The difference is not a feature; it is which side of the prompt boundary the control lives on.

04 Switching or mixing models without redoing governance

Because the control sits before the prompt, the things teams actually do with models cost nothing from a governance standpoint.

You can switch providers when a better model appears, and the data controls do not move. You can run two providers in parallel, routing a fraction of traffic to each for an evaluation, and both inherit the same policy. You can mix models within one application, sending different workloads to different providers, and every path is governed the same way. None of these require re-implementing redaction, re-deriving policy, or re-validating that the controls still hold, because the control was never tied to any one model in the first place. The model becomes a choice you make on its merits, not a commitment that drags your data governance along with it. For where this control sits in a full pipeline, see adding a data control plane to your AI stack; for the redaction step specifically, see how to redact PII before an LLM call.
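The parallel-evaluation case can be sketched as follows, assuming a hypothetical `SENSITIVE_FIELDS` policy and two placeholder provider labels, `incumbent` and `candidate`. The routing rule (hashing a request ID into buckets) is one common way to split traffic deterministically and is an illustrative choice, not something the approach prescribes.

```python
import hashlib
from typing import Dict, Tuple

Record = Dict[str, str]

# Hypothetical policy: fields that must never reach any model.
SENSITIVE_FIELDS = {"ssn", "email"}
BUCKETS = 10_000


def govern(record: Record) -> Record:
    """Redact sensitive fields. Runs before a provider is selected."""
    return {k: ("[REDACTED]" if k in SENSITIVE_FIELDS else v) for k, v in record.items()}


def pick_provider(request_id: str, candidate_share: float) -> str:
    """Send roughly candidate_share of requests to the candidate provider.

    The same request_id always lands in the same bucket, so an evaluation
    can be replayed with identical routing.
    """
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % BUCKETS
    return "candidate" if bucket < int(candidate_share * BUCKETS) else "incumbent"


def handle(request_id: str, record: Record, candidate_share: float = 0.1) -> Tuple[str, Record]:
    """Govern first, route second: both providers inherit the same decision."""
    approved = govern(record)
    provider = pick_provider(request_id, candidate_share)
    return provider, approved
```

Changing `candidate_share`, or replacing the candidate entirely, touches only `pick_provider`. The approved context returned by `handle` is the same whichever branch is taken.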

05 Consistent signed evidence regardless of provider

Model-agnostic protection has to mean model-agnostic proof as well. If the evidence of what was withheld looked different depending on which provider you called, you would be back to per-provider work, just moved from enforcement to audit.

Because the decision is made before the prompt, the evidence is produced before the prompt too, and it describes the data decision rather than the model call. Every pass or redact verdict is signed with HMAC-SHA256, hash-chained into an append-only ledger, kept content-free, and verifiable offline, and that record is the same whether the approved context then went to OpenAI, Anthropic, or Gemini. An auditor sees one consistent evidence trail for the whole application, not one per provider. The proof of governance is as model-agnostic as the governance itself. For the full treatment, see content-free, tamper-evident evidence.
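To make the evidence properties concrete, here is a minimal sketch of an HMAC-SHA256 signed, hash-chained ledger using only the Python standard library. The entry layout (`seq`, `role`, `field`, `verdict`, `prev`, `sig`), the function names, and the key handling are assumptions made for illustration and do not describe Custosa's actual record format. Entries hold the field name and the verdict but never the field value, which is what keeps them content-free.

```python
import hashlib
import hmac
import json
from typing import Dict, List

GENESIS = "0" * 64  # assumed placeholder "previous signature" for the first entry


def _sign(key: bytes, body: Dict) -> str:
    """HMAC-SHA256 over a canonical JSON encoding of the entry body."""
    payload = json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def append_verdict(ledger: List[Dict], key: bytes, role: str, field: str, verdict: str) -> Dict:
    """Append a content-free entry chained to the previous entry's signature."""
    body = {
        "seq": len(ledger),
        "role": role,
        "field": field,      # the field name only, never its value
        "verdict": verdict,  # "pass" or "redact"
        "prev": ledger[-1]["sig"] if ledger else GENESIS,
    }
    entry = dict(body, sig=_sign(key, body))
    ledger.append(entry)
    return entry


def verify(ledger: List[Dict], key: bytes) -> bool:
    """Recompute every signature and chain link; needs only the ledger and key."""
    prev = GENESIS
    for seq, entry in enumerate(ledger):
        body = {k: v for k, v in entry.items() if k != "sig"}
        if body.get("seq") != seq or body.get("prev") != prev:
            return False
        if not hmac.compare_digest(entry["sig"], _sign(key, body)):
            return False
        prev = entry["sig"]
    return True
```

Nothing in an entry names a provider, so the ledger looks the same wherever the approved context was sent. Editing, reordering, or removing an interior entry breaks either a signature or a chain link, and `verify` runs without network access. This sketch does not detect removal of entries from the tail; a real ledger would need an additional anchor for that.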

06 How Custosa provides this one layer

Custosa is the runtime data-control plane for enterprise AI, and it is model-agnostic by design. It sits between your data and the model and inspects every record and field at runtime, before the model sees it. The verdict comes from a deterministic policy engine built on Cedar, applying a per-field pass or redact decision by role across a five-level clearance lattice, and the engine fails closed when it cannot reach a confident verdict. Only the approved context is sent onward, and the data plane runs inside your environment so records never leave your boundary for the decision.
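The decision model described above can be approximated in a few lines. The sketch below is plain Python, not Cedar and not Custosa's API: the five level names, the role and field mappings, and the function names are all invented for illustration. It shows a per-field pass or redact decision by role over five ordered clearance levels, with any unknown role or unclassified field resolved to redact.

```python
from enum import IntEnum
from typing import Dict


class Clearance(IntEnum):
    """Five ordered levels; the names are hypothetical."""
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3
    SECRET = 4


# Assumed mappings: what each role is cleared for, and how each field is classified.
ROLE_CLEARANCE: Dict[str, Clearance] = {
    "support_agent": Clearance.INTERNAL,
    "compliance_officer": Clearance.RESTRICTED,
}
FIELD_LEVEL: Dict[str, Clearance] = {
    "product": Clearance.PUBLIC,
    "name": Clearance.INTERNAL,
    "email": Clearance.CONFIDENTIAL,
    "ssn": Clearance.SECRET,
}


def decide(role: str, field: str) -> str:
    """Return "pass" or "redact" for one field. Deterministic, no model involved."""
    role_level = ROLE_CLEARANCE.get(role)
    field_level = FIELD_LEVEL.get(field)
    if role_level is None or field_level is None:
        return "redact"  # fail closed: no confident verdict means withhold
    return "pass" if role_level >= field_level else "redact"


def approved_context(role: str, record: Dict[str, str]) -> Dict[str, str]:
    """Apply the per-field verdict to a whole record."""
    return {
        field: (value if decide(role, field) == "pass" else "[REDACTED]")
        for field, value in record.items()
    }
```

With these assumed mappings, a `support_agent` sees `product` and `name` but not `email` or `ssn`, and a field that has not been classified is withheld from every role until someone classifies it.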

Because all of that happens before the prompt, the provider you send the approved context to is just a downstream choice. OpenAI, Anthropic's Claude, Google Gemini, and self-hosted models all receive a prompt that has already been governed the same way, and every decision is sealed as signed, content-free evidence regardless of provider. The result is one layer of data protection that you write once and keep across every model you use, with no model lock-in.

Model-agnostic data protection in short
  • The control runs before the prompt, so it is independent of which model receives the prompt.
  • The same policy and redaction apply to OpenAI, Anthropic, Gemini, and self-hosted models alike.
  • There is no per-provider rework, because there is one control, not one per integration.
  • Switching or mixing models is free from a governance standpoint; the controls do not move.
  • The evidence is consistent across providers, signed and content-free, so audit is provider-independent too.
  • The data plane stays in your environment, and self-hosted, on-premises, and air-gapped deployments are available.

Frequently asked questions

What is model-agnostic AI data protection?

Model-agnostic AI data protection is data governance that is applied before the prompt is built, so it does not depend on which model receives the prompt. Because the control inspects and redacts records at the data layer rather than inside any provider's API, the same policy and the same evidence apply whether the approved context then goes to OpenAI, Anthropic's Claude, Google Gemini, or a self-hosted model. The model is downstream of the control, so the protection is identical across providers.

Does Custosa work with OpenAI, Anthropic, and Gemini?

Yes. Custosa governs data before the prompt, so it is model-agnostic by design and works the same with OpenAI, Anthropic's Claude, and Google Gemini, as well as with self-hosted models. It inspects every record and field at runtime, applies a deterministic policy to pass or redact each field by role, and sends only the approved context onward. Which provider receives that context does not change how the data was governed, so there is no model lock-in.

Do I need different data controls for each provider?

No. That is the point of placing the control at the data layer. If you built data protection into each provider's API path, you would rebuild it every time you added or switched a provider, and you would risk the controls drifting apart. A single control before the prompt is written once and applies to every provider equally, which removes the per-provider rework and keeps the policy and the evidence consistent no matter where the prompt is sent.

What happens if I switch models?

Nothing changes about how data is governed. Because the redaction and policy run before the prompt is assembled, switching from one provider to another, or running two in parallel, does not touch the control. The same fields are withheld for the same callers, and the same signed evidence is produced, regardless of which model you route to. You can adopt a new model on its merits without redoing governance or re-certifying your data controls.

Does it work with self-hosted or open-source models?

Yes. The control governs the data before the prompt, so it is indifferent to whether the model is a hosted API or one you run yourself. The same inspection, redaction, and evidence apply to a self-hosted or open-source model exactly as they do to OpenAI, Anthropic, or Gemini. For teams that run models inside their own perimeter, self-managed, on-premises, and air-gapped deployments are available, along with a FIPS build, so both the data and the model can stay inside your boundary.