LIMINA VS. AWS COMPREHEND

AWS Comprehend Misses 1 in 4 Sensitive Entities. Limina Misses 1 in 25.

On real privacy data, AWS Comprehend recall is 0.73. Limina's is 0.96. Built for broad NLP, Comprehend wasn't designed to catch what matters most. Limina was. And unlike Comprehend, it runs entirely on-premises, so your data never leaves your environment.
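The headline ratios follow from the recall figures: the miss rate is one minus recall. The sketch below shows only that arithmetic. The function names and the 100,000-entity volume are illustrative assumptions, not part of the benchmark.

```python
def miss_rate(recall: float) -> float:
    """Fraction of true sensitive entities that go undetected."""
    if not 0.0 <= recall <= 1.0:
        raise ValueError("recall must be between 0 and 1")
    return 1.0 - recall


def expected_misses(recall: float, total_entities: int) -> int:
    """Expected number of undetected entities at a given volume."""
    return round(miss_rate(recall) * total_entities)


# Recall figures quoted above; VOLUME is a hypothetical workload size.
VOLUME = 100_000
for name, recall in [("AWS Comprehend", 0.73), ("Limina", 0.96)]:
    rate = miss_rate(recall)
    print(
        f"{name}: misses about 1 in {1 / rate:.0f} "
        f"({expected_misses(recall, VOLUME):,} of {VOLUME:,} entities)"
    )
```

At a recall of 0.73 roughly 27 of every 100 entities are missed (about 1 in 4). At 0.96 it is 4 of every 100 (1 in 25).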

IMPACT

93.86%

Recall on ai4privacy dataset

+10.41%

Recall improvement over AWS

50+

Languages supported

0

Bytes shared with third parties

HEAD TO HEAD

Where Limina Wins

A direct comparison across the dimensions that matter most to engineering and compliance teams.

Limina

AWS Comprehend

DEPLOYMENT

Runs on-premises or in your VPC. Data never leaves your environment.

Cloud-only. Data must leave your environment to be processed by AWS.

RECALL (ENGLISH)

93.86% recall on ai4privacy 500k. Catches the entities that matter — adjusted recall of 96.12%.

~83% recall. A 10.41% gap means thousands of missed entities at scale.

RECALL (SPANISH)

Same accuracy benchmark holds. +10.87% recall advantage, +11.01% adjusted recall.

Performance drops outside English. Spanish is one of only two supported languages.

LANGUAGE COVERAGE

50+ languages supported async. Code-switching handled natively.

2 languages for PII detection. No code-switching support.

PRECISION

93.99% on custom privacy-focused dataset.

90.40% — marginally close, but recall is where the consequence lives.

F1 SCORE

0.9381 on custom dataset. +5.48% F1 advantage on ai4privacy.

0.7768 on the same custom dataset.

FULL COMPARISON

Feature by Feature

Limina vs. AWS Comprehend across the full capability set.

Capability | Limina | AWS Comprehend
On-prem / VPC deployment | Yes | No
Data leaves environment | Never | Yes
Languages (async) | 50+ | 2 (PII detection)
Coreference resolution | Yes | No
Code-switching | Yes | No
Deterministic output | Yes |
Air-gapped deployment | Yes | No
Recall on ai4privacy (EN) | 93.86% | ~83%
Adjusted recall | 96.12% | ~85%
F1 (custom dataset) | 0.9381 | 0.7768
Entity types | extensive | limited
Healthcare/finance/legal domain coverage | | partial

WHERE IT MATTERS

Built for Teams with Real Exposure

The organizations that choose Limina over AWS Comprehend are the ones where a missed identifier has real consequences.

Healthcare & Life Sciences

Clinical data lives in EHRs, dictated notes, radiology images, and patient call recordings. Limina de-identifies all of it on-premises—with 93.86% recall and full HIPAA/HITECH compliance—without sending PHI to a vendor's cloud.

Financial Services

Customer records span structured databases, free-text notes, and recorded calls. Limina catches what cloud-based tools miss—with a 10%+ recall advantage over AWS Comprehend—and nothing crosses a boundary you don't control.

Global Enterprises

An English-and-Spanish-only tool isn't a global compliance strategy. Limina supports 50+ languages async with the same accuracy benchmark, so EMEA, APAC, and LATAM teams aren't operating at a lower standard—or routing traffic through a US-based cloud.

AI & LLM Initiatives

You can't train on data you can't touch. Limina strips PII at ingestion, inside your infrastructure, so AI teams work on the full dataset, not a degraded scrubbed version. Deterministic output means de-identified data is as useful for model training as it is for audit.

Regulated & High-security Environments

If your data can't leave your walls, not even to AWS, you need true on-premises deployment. Limina runs entirely inside your infrastructure, including air-gapped environments. Nothing is stored, transmitted, or logged outside your environment.

LET’S BE DIRECT

The Honest Comparison

The organizations that choose Limina over AWS Comprehend understand that a missed entity has real consequences.

"AWS Comprehend's precision is competitive with Limina's."

Marginally true. Comprehend edges Limina by 1.51% on precision in English. But precision tells you whether a flagged entity is correct. Recall tells you what slipped through undetected. On real privacy data, Limina's recall outperforms Comprehend by 10.41% in English and 10.87% in Spanish. On a custom privacy-focused dataset, Comprehend recall is 0.698. Limina's is 0.939. That gap is names, diagnoses, and account numbers reaching your models, your logs, and your auditors.
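To make the distinction concrete, here is a minimal sketch of how precision and recall are computed from annotated spans. The spans, labels, and function name are hypothetical toy data, and exact-match scoring is an assumption, not necessarily the matching rule used in the benchmark.

```python
Span = tuple[int, int, str]  # (start offset, end offset, label)


def precision_recall(gold: set[Span], predicted: set[Span]) -> tuple[float, float]:
    """Exact-match precision and recall over entity spans."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical document with four sensitive entities.
gold = {
    (0, 8, "NAME"),
    (22, 33, "ACCOUNT_NUMBER"),
    (40, 52, "DIAGNOSIS"),
    (60, 70, "DATE"),
}
# A detector that flags only two of them, both correctly.
predicted = {(0, 8, "NAME"), (60, 70, "DATE")}

precision, recall = precision_recall(gold, predicted)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy case every flagged entity is correct, so precision is perfect, yet half of the sensitive entities pass through undetected. That is the failure mode a precision figure alone does not show.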

"AWS Comprehend offers ecosystem upside."

Fair. If your entire stack lives in AWS and data residency isn't a constraint, that integration has real value. Limina is built for the case where data can't leave your environment. In healthcare, financial services, and any organization subject to GDPR, HIPAA, or CPRA, that constraint isn't optional.

"AWS Comprehend requires no infrastructure to manage."

True. The tradeoff is that your data travels to AWS to be processed. For teams under HIPAA, GDPR, or internal data-residency policy, "no infrastructure" stops being an advantage the moment sensitive data leaves your environment. Limina runs on-prem or in your VPC. Your data never leaves.

"AWS Comprehend supports two languages for PII detection."

Two. English and Spanish. Limina supports 50+ languages for async processing, with coreference resolution and code-switching support that Comprehend doesn't offer. If your data isn't English or Spanish, Comprehend has no PII model for it.

"AWS Comprehend is benchmarked on standard datasets."

Limina's benchmarks run on the ai4privacy 500k dataset—publicly available, multi-domain, spanning finance, healthcare, and legal text. Limina has not trained on any split of it. Labels from both solutions were mapped to a common schema for a direct comparison. The evaluation code and datasets are available on request. We'd welcome a head-to-head on your data.
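Mapping labels to a common schema can be sketched as below. This is an illustration only: the tool names, label names, and mapping are hypothetical and are not the schema used in the evaluation.

```python
Span = tuple[int, int, str]  # (start offset, end offset, label)

# Hypothetical per-tool label maps onto a shared label set.
COMMON_SCHEMA: dict[str, dict[str, str]] = {
    "tool_a": {
        "NAME": "PERSON",
        "PHONE": "PHONE_NUMBER",
        "BANK_ACCOUNT": "ACCOUNT_NUMBER",
    },
    "tool_b": {
        "NAME_GIVEN": "PERSON",
        "NAME_FAMILY": "PERSON",
        "PHONE_NUMBER": "PHONE_NUMBER",
        "ACCOUNT_NUMBER": "ACCOUNT_NUMBER",
    },
}


def to_common(tool: str, spans: list[Span]) -> set[Span]:
    """Relabel a tool's spans into the shared schema.

    Labels with no counterpart in the shared schema are dropped, so
    neither tool is scored on types the other cannot express.
    """
    mapping = COMMON_SCHEMA[tool]
    return {
        (start, end, mapping[label])
        for start, end, label in spans
        if label in mapping
    }


a = to_common("tool_a", [(0, 5, "NAME"), (10, 22, "PHONE"), (30, 34, "ORG")])
b = to_common("tool_b", [(0, 5, "NAME_GIVEN"), (10, 22, "PHONE_NUMBER")])
print(a == b)
```

Once both outputs use the same labels, they can be scored against the same gold annotations with a single metric implementation.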

One API. Every format. Nothing leaves your environment.

See Limina on Your Data

Most teams know within a single proof of concept whether Limina fits. We'll run it against your formats, your languages, your edge cases—so the comparison is real, not theoretical.