LIMINA VS. OPENAI PRIVACY FILTER

OpenAI Covers 8 HIPAA Identifiers. Limina Covers All 18.

On real privacy data, OpenAI Privacy Filter recall is 0.400. Limina's is 0.936. A newer entrant built around a bare model, the Privacy Filter has no PCI coverage, covers only 8 of the 18 identifiers required for HIPAA Safe Harbor, and produces non-deterministic output.

IMPACT

93.55%

Recall on custom privacy-focused dataset

+33.72%

Recall improvement over OpenAI Privacy Filter in English

18 of 18

HIPAA Safe Harbor identifiers covered

0

Bytes shared with third parties

HEAD TO HEAD

Where Limina Wins

A direct comparison across the dimensions that matter most to engineering and compliance teams.

RECALL (ENGLISH)

Limina: +33.72% recall advantage on ai4privacy 500k. Adjusted recall advantage of +31.89%.

OpenAI Privacy Filter: 0.400 recall on the custom privacy-focused dataset. More than half of sensitive entities go undetected in production.

HIPAA COVERAGE

Limina: All 18 identifiers required for HIPAA Safe Harbor compliance.

OpenAI Privacy Filter: 8 of 18 required identifiers. The missing 10 make Safe Harbor compliance structurally difficult in regulated healthcare environments.

PCI COVERAGE

Limina: Full PCI coverage, including disfluent and misformatted card numbers.

OpenAI Privacy Filter: No PCI data coverage. Disfluent or misformatted card numbers are not caught. For financial services, that gap is not a footnote.

PRECISION

Limina: 0.9611 on the custom privacy-focused dataset. +4.77% precision advantage in English.

OpenAI Privacy Filter: 0.832 on the same custom dataset. Recall, not precision, is the more significant problem: it misses far more than it misclassifies.

F1 SCORE

Limina: 0.9477 on the custom dataset. +21.23% F1 advantage in English.

OpenAI Privacy Filter: 0.5031 on the same custom dataset.

DETERMINISTIC OUTPUT

Limina: Fully deterministic. The same input produces the same output every time. Auditors can verify it.

OpenAI Privacy Filter: Deterministic output is listed as supported, but recall varied from run to run on the initial release version. Results may differ on repeated passes over the same data.

DEPLOYMENT

Limina: Full containerized API with on-prem or VPC deployment. Data never leaves your environment.

OpenAI Privacy Filter: On-prem deployment available as a bare model only. No containerized API.

FULL COMPARISON

Feature by Feature

Limina vs. OpenAI Privacy Filter across the full capability set.

Capability | Limina | OpenAI Privacy Filter
On-prem / VPC deployment | Containerized with API | Bare model only
All 18 HIPAA identifiers | Yes | 8 of 18
Full PCI coverage | Yes | No
Catches misformatted card numbers | Yes | No
Coreference resolution | |
Code-switching | Yes |
Deterministic output | Yes | Listed as supported; recall varies run to run
Languages supported | 50+ | English-focused
Recall on ai4privacy (English) | +33.72% | Baseline
Recall on custom dataset | 0.9355 | 0.3998
F1 on custom dataset | 0.9477 | 0.5031
Adjusted recall on custom dataset | 0.9568 | 0.4230
Production-ready audit logging | | Build it yourself

WHERE IT MATTERS

Built for Teams with Real Exposure

The organizations that choose Limina over the OpenAI Privacy Filter are the ones where HIPAA Safe Harbor, PCI compliance, and production-grade recall are non-negotiable.

Healthcare & Life Sciences

The OpenAI Privacy Filter covers 8 of the 18 identifiers required for HIPAA Safe Harbor. The missing identifiers aren't edge cases; they're the identifiers that appear in clinical notes, patient recordings, and EMR free-text fields. Limina covers all 18, on-premises, with Expert Determination-ready output and a benchmark recall of 0.936.

Financial Services

The OpenAI Privacy Filter provides no PCI data coverage. Disfluent card numbers (spoken aloud in a call, misformatted in a transaction record, or split across a chat message) pass through undetected. Limina catches them. For contact center, banking, and insurance data at scale, the difference between no PCI coverage and full PCI coverage is the entire compliance posture.
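
To make the "disfluent card number" case concrete, here is a minimal Python sketch of one common approach: normalize spoken digit words and separators into a bare digit string, then apply the Luhn checksum. It is an illustration only, not a description of how Limina or the OpenAI Privacy Filter works. The names `WORD_TO_DIGIT`, `normalize_digits`, and `luhn_valid` are hypothetical, chosen for this example.

```python
import re

# Hypothetical mapping for spoken digits; real transcripts vary far more than this.
WORD_TO_DIGIT = {
    "zero": "0", "oh": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
}


def normalize_digits(text: str) -> str:
    """Collapse spoken digit words and separators into a bare digit string."""
    digits = []
    for token in re.findall(r"[a-z]+|[0-9]", text.lower()):
        if token.isdigit():
            digits.append(token)
        elif token in WORD_TO_DIGIT:
            digits.append(WORD_TO_DIGIT[token])
        # Any other word (filler such as "um") is dropped.
    return "".join(digits)


def luhn_valid(number: str) -> bool:
    """Return True if a 13-19 digit string passes the Luhn checksum."""
    if not re.fullmatch(r"[0-9]{13,19}", number):
        return False
    total = 0
    for index, char in enumerate(reversed(number)):
        digit = int(char)
        if index % 2 == 1:  # double every second digit, counting from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


candidate = normalize_digits("four one one one, um, 1111-1111 1111")
print(candidate, luhn_valid(candidate))
```

A checksum pass only says a digit string is card-shaped. Deciding whether a span in a transcript is actually a card number still needs context, which is where a detection model comes in.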

Global Enterprises

An English-focused tool isn't a global compliance strategy. Limina supports 50+ languages, held to the same accuracy benchmark, with native code-switching, so EMEA, APAC, and LATAM teams aren't operating at a lower standard or routing data through a US-based API.

AI & LLM Initiatives

A recall rate of 0.400 means more than half of sensitive entities reach downstream model training pipelines. For AI teams that need to train on de-identified data, the Privacy Filter's coverage gaps translate directly into compliance exposure in the training set. Limina strips PII at ingestion (inside your infrastructure) with 0.936 recall and full HIPAA and PCI coverage.

Regulated & High-security Environments

Safe Harbor requires all 18 identifiers to be de-identified. The OpenAI Privacy Filter covers 8. That's not a partial solution—it's a structural gap that prevents Safe Harbor compliance for any dataset containing the missing identifiers. Limina covers all 18 and supports the expert determination process with sample reports available on request.
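
The all-or-nothing nature of the identifier list can be expressed as a set difference. The Python sketch below lists the 18 Safe Harbor identifier categories from 45 CFR 164.514(b)(2) in shortened form and reports which ones a tool leaves uncovered. The `example_covered` set is a made-up placeholder for illustration; it is not the Privacy Filter's actual coverage list, which this page does not enumerate. The function names are hypothetical.

```python
# The 18 Safe Harbor identifier categories, paraphrased from 45 CFR 164.514(b)(2).
SAFE_HARBOR_IDENTIFIERS = frozenset({
    "names",
    "geographic subdivisions smaller than a state",
    "dates (except year) related to an individual",
    "telephone numbers",
    "fax numbers",
    "email addresses",
    "social security numbers",
    "medical record numbers",
    "health plan beneficiary numbers",
    "account numbers",
    "certificate/license numbers",
    "vehicle identifiers and serial numbers",
    "device identifiers and serial numbers",
    "web URLs",
    "IP addresses",
    "biometric identifiers",
    "full-face photographs and comparable images",
    "any other unique identifying number, characteristic, or code",
})


def safe_harbor_gap(covered):
    """Return the identifier categories that `covered` leaves out."""
    covered = set(covered)
    unknown = covered - SAFE_HARBOR_IDENTIFIERS
    if unknown:
        raise ValueError(f"not Safe Harbor categories: {sorted(unknown)}")
    return SAFE_HARBOR_IDENTIFIERS - covered


def covers_all_identifiers(covered) -> bool:
    """True only when every one of the 18 categories is covered."""
    return not safe_harbor_gap(covered)


# Placeholder coverage set for illustration only (hypothetical, 8 categories).
example_covered = {
    "names", "telephone numbers", "email addresses", "social security numbers",
    "account numbers", "web URLs", "IP addresses",
    "dates (except year) related to an individual",
}
print(len(safe_harbor_gap(example_covered)), covers_all_identifiers(example_covered))
```

Covering every category is a precondition for Safe Harbor rather than the whole standard, so a check like this is a screening step, not a compliance determination.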

LET’S BE DIRECT

The Honest Comparison

A missed entity isn't a classification error. It's data exposure. Here's how the two products actually compare.

"The OpenAI Privacy Filter is a newer entrant. It will improve."

Noted, and these benchmarks were run against the initial release version. But for organizations making compliance decisions today, the gaps are structural: 8 of 18 HIPAA identifiers, no PCI coverage, and a recall rate of 0.400 on real privacy data. These aren't calibration issues. They're coverage decisions. Limina has been in production for years across healthcare, financial services, and regulated enterprise environments.

"The OpenAI Privacy Filter runs on-premises."

As a bare model only. Limina ships as a full containerized API — deployable on-prem or in your VPC, with deterministic output, no integration overhead, and no data leaving your environment.
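
A determinism claim is straightforward to check on your own data: run the same input several times and compare hashes of the output. In the Python sketch below, `redact` stands in for whatever function calls the tool under test. It is a hypothetical placeholder, not an API of either product, and `is_deterministic` is a name invented for this example.

```python
import hashlib
from typing import Callable


def is_deterministic(redact: Callable[[str], str], text: str, runs: int = 5) -> bool:
    """Call `redact` on the same text `runs` times and compare output digests."""
    digests = {
        hashlib.sha256(redact(text).encode("utf-8")).hexdigest()
        for _ in range(runs)
    }
    # One distinct digest means every run produced byte-identical output.
    return len(digests) == 1


def toy_redact(text: str) -> str:
    """Stand-in redactor used only to exercise the check."""
    return text.replace("Alice", "[NAME]")


print(is_deterministic(toy_redact, "Alice called about her claim."))
```

Storing the digests alongside the input identifier gives auditors something concrete to re-verify later.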

"The OpenAI Privacy Filter precision is competitive."

0.832 on the custom privacy-focused dataset versus Limina's 0.961. A 13-point precision gap compounds a 54-point recall gap: the Privacy Filter misses most of what's there and is less accurate on what it does flag. Recall is the metric with consequences. At 0.400, more than half of sensitive entities go undetected.
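
The arithmetic behind these figures, as a short Python sketch using the custom-dataset numbers quoted on this page. A "point" gap is the difference between two scores expressed in percentage points, and the miss rate is one minus recall. The function and variable names are illustrative.

```python
# Custom-dataset scores quoted on this page.
limina = {"precision": 0.9611, "recall": 0.9355}
privacy_filter = {"precision": 0.832, "recall": 0.3998}


def point_gap(a: float, b: float) -> int:
    """Difference between two scores in whole percentage points."""
    return round((a - b) * 100)


def missed_entities(total: int, recall: float) -> int:
    """Expected number of sensitive entities left undetected out of `total`."""
    return round(total * (1.0 - recall))


precision_gap = point_gap(limina["precision"], privacy_filter["precision"])
recall_gap = point_gap(limina["recall"], privacy_filter["recall"])

print(f"precision gap: {precision_gap} points, recall gap: {recall_gap} points")
print("missed per 10,000 entities:",
      missed_entities(10_000, limina["recall"]),
      "vs",
      missed_entities(10_000, privacy_filter["recall"]))
```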

"The OpenAI Privacy Filter supports HIPAA compliance."

Partially. It covers 8 of the 18 identifiers required for HIPAA Safe Harbor. The missing 10 identifiers appear in healthcare datasets regularly. HIPAA Safe Harbor requires all 18 to be de-identified—partial coverage does not satisfy the standard. Limina covers all 18 and supports the expert determination process with documentation available on request.

"The OpenAI Privacy Filter is benchmarked on standard datasets."

Limina's benchmarks run on the ai4privacy 500k dataset—publicly available, multi-domain, spanning finance, healthcare, and legal text. Limina has not trained on any split of it. Labels from both solutions were mapped to a common schema for a direct comparison. The evaluation code and datasets are available on request. We'd welcome a head-to-head on your data.
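
As a rough illustration of what "mapped to a common schema" involves, the Python sketch below translates each tool's labels into shared labels and then scores exact-match spans. The label names, the two mapping tables, and the exact-span matching rule are all assumptions made for this example. The actual methodology is in the evaluation code available on request.

```python
from typing import Dict, Iterable, Set, Tuple

Span = Tuple[int, int, str]  # (start offset, end offset, label)

# Hypothetical label mappings; real tools use their own label sets.
TOOL_A_TO_COMMON = {"PERSON_NAME": "NAME", "PHONE": "PHONE_NUMBER"}
TOOL_B_TO_COMMON = {"person": "NAME", "phone_number": "PHONE_NUMBER"}


def to_common(spans: Iterable[Span], mapping: Dict[str, str]) -> Set[Span]:
    """Rewrite tool-specific labels to the common schema, dropping unmapped ones."""
    return {(start, end, mapping[label])
            for start, end, label in spans if label in mapping}


def score(gold: Set[Span], predicted: Set[Span]) -> Dict[str, float]:
    """Exact-match precision, recall, and F1 over labeled spans."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


gold = {(0, 5, "NAME"), (10, 22, "PHONE_NUMBER")}
pred_a = to_common([(0, 5, "PERSON_NAME"), (10, 22, "PHONE")], TOOL_A_TO_COMMON)
pred_b = to_common([(0, 5, "person"), (30, 35, "phone_number")], TOOL_B_TO_COMMON)

print(score(gold, pred_a))
print(score(gold, pred_b))
```

Mapping both tools to the same labels before scoring keeps a naming difference from being counted as a miss.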

One API. Every format. Nothing leaves your environment.

See Limina on Your Data

Most teams know within a single proof of concept whether Limina fits. We'll run it against your formats, your languages, your edge cases—so the comparison is real, not theoretical.