LIMINA VS. OPENAI PRIVACY FILTER

OpenAI Covers 8 HIPAA Identifiers. Limina Covers All 18.

On real privacy data, OpenAI Privacy Filter recall is 0.400. Limina's is 0.936. A newer entrant built around a bare model, the Privacy Filter has no PCI coverage, covers only 8 of the 18 identifiers required for HIPAA Safe Harbor, and produces non-deterministic output.

IMPACT

93.55%

Recall on ai4privacy dataset

+33.72%

Recall improvement over OpenAI Privacy Filter in English

18 of 18

HIPAA Safe Harbor identifiers covered

0

Bytes shared with third parties

HEAD TO HEAD

Where Limina Wins

A direct comparison across the dimensions that matter most to engineering and compliance teams.

RECALL (ENGLISH)
Limina: +33.72% recall advantage on ai4privacy 500k. Adjusted recall advantage of +31.89%.
OpenAI Privacy Filter: 0.400 recall on custom privacy-focused dataset. More than half of sensitive entities go undetected in production.

HIPAA COVERAGE
Limina: All 18 identifiers required for HIPAA Safe Harbor compliance.
OpenAI Privacy Filter: 8 of 18 required identifiers. The missing 10 make Safe Harbor compliance structurally difficult in regulated healthcare environments.

PCI COVERAGE
Limina: Full PCI coverage including disfluent and misformatted card numbers.
OpenAI Privacy Filter: No PCI data coverage. Disfluent or misformatted card numbers are not caught. For financial services, that gap is not a footnote.

PRECISION
Limina: 0.9611 on custom privacy-focused dataset. +4.77% precision advantage in English.
OpenAI Privacy Filter: 0.832 on the same custom dataset. Recall, not precision, is the more significant problem: the filter misses far more than it misclassifies.

F1 SCORE
Limina: 0.9477 on custom dataset. +21.23% F1 advantage in English.
OpenAI Privacy Filter: 0.5031 on the same custom dataset.

DETERMINISTIC OUTPUT
Limina: Fully deterministic. The same input produces the same output every time. Auditors can verify it.
OpenAI Privacy Filter: Deterministic output is listed as supported, but at the initial release version recall varies from run to run. Results may differ on repeated passes of the same data.

DEPLOYMENT
Limina: Full containerized API with on-prem or VPC deployment. Data never leaves your environment.
OpenAI Privacy Filter: On-prem deployment available as bare model only. No containerized API.

FULL COMPARISON

Feature by Feature

Limina vs. OpenAI Privacy Filter across the full capability set.

Capability | Limina | OpenAI Privacy Filter
On-prem / VPC deployment | Containerized with API | Bare model only
All 18 HIPAA identifiers | Yes | 8 of 18
Full PCI coverage | Yes | -
Catches misformatted card numbers | Yes | -
Coreference resolution | Yes | -
Code-switching | Yes | -
Deterministic output | Yes | Listed as supported
Languages supported | 50+ | English-focused
Recall on ai4privacy (English) | +33.72% | Baseline
Recall on custom dataset | 0.9355 | 0.3998
F1 on custom dataset | 0.9477 | 0.5031
Adjusted recall on custom dataset | 0.9568 | 0.4230
Production-ready audit logging | Yes | Build it yourself

WHERE IT MATTERS

Built for Teams with Real Exposure

The organizations that choose Limina over the OpenAI Privacy Filter are the ones where HIPAA Safe Harbor, PCI compliance, and production-grade recall are non-negotiable.

Healthcare & Life Sciences

The OpenAI Privacy Filter covers 8 of the 18 identifiers required for HIPAA Safe Harbor. The missing identifiers aren't edge cases. They're the identifiers that appear in clinical notes, patient recordings, and EMR free-text fields. Limina covers all 18, on-premises, with expert determination-ready output and a production recall of 0.936.

Financial Services

The OpenAI Privacy Filter provides no PCI data coverage. Disfluent card numbers (spoken aloud in a call, misformatted in a transaction record, or split across a chat message) pass through undetected. Limina catches them. For contact center, banking, and insurance data at scale, the difference between no PCI coverage and full PCI coverage is the entire compliance posture.
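To make "misformatted" concrete, here is a minimal sketch of one common baseline: strip stray separators from a digit run, then apply the Luhn checksum used by payment card numbers. This is an illustration only, not Limina's detection method, and it does not handle numbers spoken as words or split across messages. The function names `luhn_valid` and `find_card_candidates` are hypothetical.

```python
import re

def luhn_valid(digits: str) -> bool:
    """Return True if a string of digits passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:      # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def find_card_candidates(text: str) -> list[str]:
    """Find 13-19 digit runs broken up by spaces, dots, or dashes that pass Luhn."""
    pattern = re.compile(r"\d(?:[ .\-]?\d){12,18}")
    hits = []
    for match in pattern.finditer(text):
        digits = re.sub(r"\D", "", match.group())  # drop the separators
        if luhn_valid(digits):
            hits.append(digits)
    return hits

# A well-known test card number with inconsistent separators.
found = find_card_candidates("card on file: 4111 1111-1111.1111, exp 09/27")
```

A regex plus checksum catches separator noise but nothing contextual, which is why it is a baseline to test against rather than a full PCI control.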

Global Enterprises

An English-focused tool isn't a global compliance strategy. Limina supports 50+ languages with the same accuracy benchmark and native code-switching, so EMEA, APAC, and LATAM teams aren't operating at a lower standard — or routing data through a US-based API.

AI & LLM Initiatives

A recall rate of 0.400 means more than half of sensitive entities reach downstream model training pipelines. For AI teams that need to train on de-identified data, the Privacy Filter's coverage gaps translate directly into compliance exposure in the training set. Limina strips PII at ingestion (inside your infrastructure) with 0.936 recall and full HIPAA and PCI coverage.

Regulated & High-security Environments

Safe Harbor requires all 18 identifier types to be removed. The OpenAI Privacy Filter covers 8. That's not a partial solution. It's a structural gap that prevents Safe Harbor compliance for any dataset containing the missing identifiers. Limina covers all 18 and supports the expert determination process with sample reports available on request.

LET’S BE DIRECT

The Honest Comparison

A missed entity isn't a classification error. It's data exposure. Here's how the two products actually compare.

"The OpenAI Privacy Filter is a newer entrant. It will improve."

Noted, and these benchmarks were run against the initial release version. But for organizations making compliance decisions today, the gaps are structural: 8 of 18 HIPAA identifiers, no PCI coverage, and a recall rate of 0.400 on real privacy data. These aren't calibration issues. They're coverage decisions. Limina has been in production for years across healthcare, financial services, and regulated enterprise environments.

"The OpenAI Privacy Filter runs on-premises."

As a bare model only. Limina ships as a full containerized API — deployable on-prem or in your VPC, with deterministic output, no integration overhead, and no data leaving your environment.
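Determinism is easy to check on your own data. The sketch below runs the same input through a redaction function several times and compares output hashes. The `redact` argument is a hypothetical callable wrapping whichever tool is under test; it is not either vendor's actual API.

```python
import hashlib

def is_deterministic(redact, text: str, runs: int = 5) -> bool:
    """Return True if `redact(text)` yields byte-identical output on every run."""
    digests = {
        hashlib.sha256(redact(text).encode("utf-8")).hexdigest()
        for _ in range(runs)
    }
    return len(digests) == 1

# Stand-in redactor for illustration only.
def toy_redact(text: str) -> str:
    return text.replace("Alice", "[NAME]")

stable = is_deterministic(toy_redact, "Alice called about her claim.")
```

Storing the digest alongside each processed record gives auditors something concrete to re-verify later.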

"The OpenAI Privacy Filter precision is competitive."

0.832 on the custom privacy-focused dataset versus Limina's 0.961. A 13-point precision gap compounds a 54-point recall gap. The Privacy Filter neither finds nor correctly flags what's there at production scale. Recall is the metric with consequences. At 0.400, more than half of sensitive entities go undetected.
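The point gaps quoted above follow directly from the custom-dataset scores reported on this page. A short sketch of the arithmetic, with hypothetical variable names:

```python
# Scores on the custom privacy-focused dataset, as reported on this page.
limina = {"precision": 0.9611, "recall": 0.9355}
privacy_filter = {"precision": 0.832, "recall": 0.3998}

def point_gap(a: float, b: float) -> float:
    """Difference between two scores, in percentage points."""
    return round((a - b) * 100, 1)

precision_gap = point_gap(limina["precision"], privacy_filter["precision"])
recall_gap = point_gap(limina["recall"], privacy_filter["recall"])

# Share of sensitive entities missed at the reported recall.
missed_share = round(1 - privacy_filter["recall"], 4)
```

Rounded to whole points, these are the 13-point precision gap and 54-point recall gap, with roughly 60% of entities missed at 0.400 recall.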

"The OpenAI Privacy Filter supports HIPAA compliance."

Partially. It covers 8 of the 18 identifiers required for HIPAA Safe Harbor. The missing 10 identifiers appear in healthcare datasets regularly. HIPAA Safe Harbor requires all 18 identifier types to be removed; partial coverage does not satisfy the standard. Limina covers all 18 and supports the expert determination process with documentation available on request.
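A coverage check against the Safe Harbor list is simple to script for any tool you are evaluating. The sketch below abbreviates the 18 identifier categories from 45 CFR 164.514(b)(2) into short keys chosen for this example. The sample input is a hypothetical tool, not a statement about which categories either product supports.

```python
# The 18 Safe Harbor identifier categories, abbreviated to illustrative keys.
SAFE_HARBOR_IDENTIFIERS = [
    "names",
    "geographic_subdivisions_smaller_than_state",
    "dates_except_year",
    "telephone_numbers",
    "fax_numbers",
    "email_addresses",
    "social_security_numbers",
    "medical_record_numbers",
    "health_plan_beneficiary_numbers",
    "account_numbers",
    "certificate_license_numbers",
    "vehicle_identifiers",
    "device_identifiers",
    "web_urls",
    "ip_addresses",
    "biometric_identifiers",
    "full_face_photographs",
    "other_unique_identifiers",
]

def coverage_report(supported) -> dict:
    """Compare a tool's supported categories against the Safe Harbor list."""
    supported = set(supported)
    missing = [i for i in SAFE_HARBOR_IDENTIFIERS if i not in supported]
    return {
        "covered": len(SAFE_HARBOR_IDENTIFIERS) - len(missing),
        "total": len(SAFE_HARBOR_IDENTIFIERS),
        "missing": missing,
    }

# Hypothetical tool that detects only three categories.
example = coverage_report({"names", "email_addresses", "telephone_numbers"})
```

Any non-empty `missing` list marks a category that must be handled some other way before a dataset can rely on Safe Harbor.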

"The OpenAI Privacy Filter is benchmarked on standard datasets."

Limina's benchmarks run on the ai4privacy 500k dataset—publicly available, multi-domain, spanning finance, healthcare, and legal text. Limina has not trained on any split of it. Labels from both solutions were mapped to a common schema for a direct comparison. The evaluation code and datasets are available on request. We'd welcome a head-to-head on your data.
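For teams that want to reproduce a comparison like this, the core step is mapping each tool's labels to a common schema before scoring. The sketch below assumes exact-match scoring on (start, end, label) spans and uses made-up label names. It is not the evaluation code referenced above, and the actual matching rules may differ.

```python
def normalize(spans, mapping):
    """Map tool-specific labels to the common schema; drop unmapped labels."""
    return {(start, end, mapping[label])
            for start, end, label in spans if label in mapping}

def score(gold, pred):
    """Exact-match precision, recall, and F1 over sets of labeled spans."""
    true_positives = len(gold & pred)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

# Hypothetical gold annotations and tool output for one document.
gold = {(0, 5, "NAME"), (10, 20, "EMAIL"), (25, 36, "PHONE")}
tool_mapping = {"person": "NAME", "email_address": "EMAIL"}  # illustrative labels
tool_output = [(0, 5, "person"), (10, 20, "email_address"), (40, 44, "person")]

precision, recall, f1 = score(gold, normalize(tool_output, tool_mapping))
```

In this toy case the tool finds two of three gold entities and raises one false positive, so precision, recall, and F1 all come out at two thirds.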

One API. Every format. Nothing leaves your environment.

See Limina on Your Data

Most teams know within a single proof of concept whether Limina fits. We'll run it against your formats, your languages, your edge cases—so the comparison is real, not theoretical.