In 2024, the Department of Health and Human Services (HHS) Office for Civil Rights (OCR) confirmed it had received more than 371,000 Health Insurance Portability and Accountability Act (HIPAA) complaints since the Privacy Rule took effect, with civil penalties and settlements reaching nearly $144 million. A surprising share of those cases trace back to the same root cause—data that was treated as anonymous when it wasn't. The 18 HIPAA identifiers are the bright line between Protected Health Information (PHI) you can't share without authorization and de-identified data you can use freely for research, analytics and AI training. Miss one, and your "de-identified" dataset is still PHI.
This guide walks through every identifier, the rules around the trickier ones (dates, ZIP codes, the catch-all 18th category), the two methods HIPAA recognizes for de-identification, and what happens when you get it wrong.
What are the 18 HIPAA identifiers?
The 18 HIPAA identifiers come from one specific provision of the HIPAA Privacy Rule: 45 CFR 164.514(b)(2)(i). HHS created this list to give covered entities and business associates a deterministic checklist—remove these data elements from a record set, satisfy one additional condition, and the data is no longer considered PHI.
The list applies to identifiers of the individual and to identifiers of the individual's relatives, employers and household members. That last point is often missed. A clinical note that mentions a patient's spouse by name still contains PHI even after the patient's own name is removed.
There's also a second condition that runs alongside the list: the covered entity must have no actual knowledge that the remaining information could be used alone or in combination with other reasonably available information to identify the individual. We'll come back to this requirement, because it trips up more organizations than the list itself.
The complete list of 18 HIPAA identifiers
The table below is the full list as enumerated in the Privacy Rule, with the practical removal rule for each. The numbering follows the order of the regulatory text.
| # | Identifier | Removal rule |
| --- | --- | --- |
| 1 | Names | Remove all names of the individual, relatives, employers and household members. |
| 2 | Geographic subdivisions smaller than a state | Remove street address, city, county, precinct, ZIP code and equivalent geocodes. The first three digits of a ZIP code may remain only if the combined area covered by all ZIP codes with those three digits has more than 20,000 people per current Census data. Otherwise, change the first three digits to 000. |
| 3 | Dates directly related to an individual | Remove all elements of dates except year. This includes birth date, admission date, discharge date and date of death. Ages over 89 must be aggregated into a single "age 90 or older" category. |
| 4 | Telephone numbers | Remove all phone numbers, including mobile, work and emergency contacts. |
| 5 | Fax numbers | Remove all fax numbers. |
| 6 | Email addresses | Remove all email addresses. |
| 7 | Social Security numbers | Remove the full Social Security number. Partial representations, such as the last four digits, are also identifiers and must be removed. |
| 8 | Medical record numbers | Remove all medical record numbers used by the covered entity or any other organization. |
| 9 | Health plan beneficiary numbers | Remove all member IDs, subscriber IDs and policy numbers. |
| 10 | Account numbers | Remove all financial and patient account numbers. |
| 11 | Certificate or license numbers | Remove driver's license numbers, professional license numbers and similar credentials. |
| 12 | Vehicle identifiers and serial numbers | Remove license plate numbers and Vehicle Identification Numbers (VINs). |
| 13 | Device identifiers and serial numbers | Remove medical device serial numbers, implant identifiers and equipment IDs that link to the patient. |
| 14 | Web Universal Resource Locators (URLs) | Remove personal URLs, including patient portal links and personal websites. |
| 15 | Internet Protocol (IP) address numbers | Remove all IP addresses captured in the record set. |
| 16 | Biometric identifiers | Remove fingerprints, voice prints, retinal scans and other biometric data. |
| 17 | Full-face photographic images and comparable images | Remove face photos and any image that could identify the individual. |
| 18 | Any other unique identifying number, characteristic or code | Remove any remaining unique identifier not listed above, with one exception. A re-identification code created under 45 CFR 164.514(c) is excepted, provided the code is not derived from any individual data and the mechanism for re-identification is not disclosed. |
The HHS guidance is explicit on a point that often surprises teams: removing these 18 identifiers is necessary, but not sufficient. The "actual knowledge" condition must be met as well.
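The re-identification code exception in row 18 can be illustrated with a short sketch. It assumes the code is drawn from a random source, so it is not derived from any individual data, and that the crosswalk is stored separately and never disclosed with the dataset. The function name `assign_reid_codes` and the sample record IDs are hypothetical.

```python
import secrets


def assign_reid_codes(record_ids):
    """Map each source record ID to a random code that carries no patient data."""
    crosswalk = {}
    used = set()
    for record_id in record_ids:
        code = secrets.token_hex(8)  # 16 hex characters from the OS random source
        while code in used:  # collisions are very unlikely, but cheap to rule out
            code = secrets.token_hex(8)
        used.add(code)
        crosswalk[record_id] = code
    return crosswalk


# Hypothetical record IDs; the crosswalk stays with the covered entity.
crosswalk = assign_reid_codes(["MRN-0001", "MRN-0002", "MRN-0003"])
```

A hash of the medical record number would not fit this sketch, because the resulting code would be derived from the patient's own data.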
The two HIPAA de-identification methods
HIPAA recognizes exactly two methods for de-identifying PHI. The 18 identifiers belong to the first one. Knowing both is essential because the right method depends on what you're trying to do with the data.
Safe Harbor (the 18-identifier method)
Safe Harbor is the deterministic, checklist-based path. You remove the 18 identifiers listed in 45 CFR 164.514(b)(2)(i) from the record set—including identifiers of relatives, employers and household members—and confirm you have no actual knowledge that the remaining data could re-identify someone. If both conditions are met, the data is no longer PHI.
Safe Harbor is fast, auditable and inexpensive. It's the right choice when you can tolerate the precision loss that comes from stripping date elements other than the year, geography below the state level (apart from qualifying three-digit ZIP prefixes) and exact ages above 89.
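As a rough illustration of the ZIP code, date and age rules from the identifier table, the sketch below generalizes each value the way Safe Harbor describes. The function names are hypothetical, and the set of low-population ZIP prefixes is a placeholder that would have to be built from current Census data.

```python
from datetime import date


def generalize_zip(zip_code, low_population_zip3):
    """Keep the first three ZIP digits, or return 000 for low-population prefixes."""
    zip3 = zip_code.strip()[:3]
    return "000" if zip3 in low_population_zip3 else zip3


def year_only(d):
    """Keep only the year of a date tied to an individual.

    Not handled here: for someone aged 90 or older, a year that reveals
    that age (such as a birth year) must also be withheld.
    """
    return d.year


def generalize_age(age):
    """Collapse ages of 90 and above into a single category."""
    return "90+" if age >= 90 else str(age)


# Placeholder prefix set for illustration only; derive the real one from Census data.
low_population = {"036"}

print(generalize_zip("03601", low_population))  # 000
print(generalize_zip("94110", low_population))  # 941
print(year_only(date(2023, 5, 17)))             # 2023
print(generalize_age(93))                       # 90+
```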
Expert Determination
Expert Determination is described in 45 CFR 164.514(b)(1). A qualified expert—someone with appropriate knowledge of statistical and scientific principles for rendering data not individually identifiable—analyzes the dataset and determines that the risk of re-identification is "very small." The expert documents the methodology and results.
This method is more flexible. You can keep date elements, sub-state geography or other quasi-identifiers if the expert concludes the residual re-identification risk is acceptably low for the specific data, recipients and use context. It's the standard path for AI training datasets and longitudinal research where Safe Harbor's stripping is too lossy.
Pharma and life sciences research teams running longitudinal clinical studies often rely on Expert Determination precisely because it preserves date-level granularity and geographic precision that make time-series analysis meaningful.
Side-by-side comparison
| Factor | Safe Harbor | Expert Determination |
| --- | --- | --- |
| Regulatory citation | 45 CFR 164.514(b)(2) | 45 CFR 164.514(b)(1) |
| Approach | Remove 18 specific identifiers | Statistical risk assessment by qualified expert |
| Data utility | Lower: dates, granular geography and exact ages above 89 are stripped | Higher: more detail can be retained based on risk |
| Cost and effort | Low: deterministic checklist | Higher: expert engagement, documentation, ongoing review |
| Best for | Routine data sharing, basic analytics and low-risk releases | AI training, longitudinal research and complex analytics |
| Documentation burden | Process and verification logs | Full methodology, results and justification report |
For a deeper look at each method, see our HIPAA Safe Harbor method guide and our Expert Determination guide.
How to remove the 18 HIPAA identifiers from your data
The order matters. A clean Safe Harbor process generally follows these six steps:
- Inventory the data. Map every field, free-text section, attachment and embedded metadata in the record set. Unstructured data—clinical notes, call transcripts, emails, scanned PDFs—is where most teams underestimate exposure.
- Map fields to the 18 identifier categories. Some fields map cleanly (a "DOB" column to identifier 3). Others don't. Free-text notes can contain any of the 18 in any combination.
- Apply the removal rules. Use deterministic rules where possible (regex for Social Security numbers, date parsers for dates) and natural-language processing for free text. ZIP codes need the 20,000-population check; ages need the 90-and-older aggregation.
- Address the catch-all (identifier 18). Look for unique combinations and codes that aren't on the explicit list but could identify someone. Rare diagnoses, unusual occupations and small-population demographics fall here.
- Verify the "actual knowledge" condition. Document that you've considered whether the remaining data could re-identify an individual alone or in combination with reasonably available information. The HHS guidance gives the example of an "occupation" field reading "former president of the State University"—even after the 18 identifiers are removed, that record fails Safe Harbor because the covered entity has actual knowledge it could identify the patient.
- Document everything. Record the method used, who performed the work, the verification steps and the date completed. This documentation is your defense if OCR ever asks.
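The deterministic part of step 3 can be sketched with a few regular expressions. This is a minimal illustration, not a complete scrubber: the patterns only cover common US formats, the placeholder labels are arbitrary, and names or other free-text identifiers would still need NLP and review. `PATTERNS` and `scrub` are hypothetical names.

```python
import re

# Illustrative patterns only; real data will contain formats these miss.
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\b\d{3}[-. ]\d{3}[-. ]\d{4}\b"),
    "IP": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


def scrub(text):
    """Replace each pattern match with a bracketed label and count the replacements."""
    counts = {}
    for label, pattern in PATTERNS.items():
        text, n = pattern.subn(f"[{label}]", text)
        counts[label] = n
    return text, counts


note = ("Pt called from 555-123-4567, SSN 123-45-6789, "
        "email jane.doe@example.org, portal login from 10.0.0.12.")
clean, counts = scrub(note)
print(clean)
# Pt called from [PHONE], SSN [SSN], email [EMAIL], portal login from [IP].
```

Returning the per-category counts alongside the cleaned text gives the documentation step something concrete to log.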
For unstructured data—clinical notes, call recordings, Automatic Speech Recognition (ASR) transcripts, support tickets—manual review doesn't scale. Common cloud Natural Language Processing (NLP) tools miss a meaningful share of entities in real-world clinical text. Limina's de-identification platform achieves 99.5 percent or higher accuracy on physician conversations, compared with 60 to 70 percent for general-purpose cloud Application Programming Interfaces (APIs) on the same data—the difference between a defensible de-identification process and a leaky one.
Why the 18 identifiers are not the same as PHI
This is the most common misconception in HIPAA de-identification, and it matters.
PHI is defined in 45 CFR 160.103 as individually identifiable health information that relates to an individual's past, present or future health condition, treatment or payment for treatment, transmitted or maintained by a covered entity or business associate. The 18 identifiers are not the definition of PHI. They are the list of elements you must remove to qualify for one specific de-identification method.
The practical implication: data can contain none of the 18 identifiers and still be PHI if the covered entity has actual knowledge it could identify someone. Conversely, the same 18 elements appearing in a record set that has no health context aren't PHI at all—they're just personal data subject to other privacy laws.
This distinction matters for AI builders, healthcare organizations, contact centers and analytics teams who often hear "remove the 18 and you're fine." Removing the 18 is a step. The actual standard is the de-identification standard at 45 CFR 164.514(a): the data must not identify an individual, and you must have no reasonable basis to believe it can identify an individual.
The "actual knowledge" requirement that trips most teams
The actual knowledge condition is the part of Safe Harbor that catches organizations off guard during audits. The HHS guidance illustrates it with this example: a record that lists a patient's occupation as "former president of the State University" doesn't satisfy Safe Harbor even after the 18 identifiers are removed, because the covered entity knows that the occupation, combined with almost any other detail such as age or state of residence, would identify the patient.
In practice, you should pay attention to:
- Rare medical conditions or unusual treatment patterns
- Unique combinations of demographic variables in small populations
- High-profile occupations or public roles
- Geographic data combined with rare attributes
- Unique events (large multiple births, public-record incidents)
You don't need to certify zero re-identification risk. You need to confirm you have no actual knowledge of an identification path, and document that you considered it.
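One way to surface candidates for that review is to count how often each combination of quasi-identifiers appears and flag the rare ones. The sketch below is a screening aid under stated assumptions: the threshold `k` is an arbitrary choice rather than a HIPAA-defined value, the field names are hypothetical, and a flag only means a person should look at the record.

```python
from collections import Counter


def flag_small_cells(records, quasi_identifiers, k=5):
    """Return indexes of records whose quasi-identifier combination occurs fewer than k times."""
    keys = [tuple(r.get(q) for q in quasi_identifiers) for r in records]
    sizes = Counter(keys)
    return [i for i, key in enumerate(keys) if sizes[key] < k]


# Hypothetical records that already passed the 18-identifier removal.
records = [
    {"year": 2019, "state": "CA", "diagnosis": "asthma"},
    {"year": 2019, "state": "CA", "diagnosis": "asthma"},
    {"year": 2019, "state": "CA", "diagnosis": "asthma"},
    {"year": 2019, "state": "WY", "diagnosis": "rare metabolic disorder"},
]

print(flag_small_cells(records, ["year", "state", "diagnosis"], k=2))  # [3]
```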
What it costs to get de-identification wrong
HIPAA penalties are tiered by culpability and adjusted annually for inflation. The current schedule, published in the Federal Register on January 28, 2026, runs as follows:
| Tier | Culpability | Per-violation range | Annual cap (per identical provision) |
| --- | --- | --- | --- |
| 1 | Did not know | $145 to $73,011 | $2,190,294 |
| 2 | Reasonable cause, not willful neglect | $1,461 to $73,011 | $2,190,294 |
| 3 | Willful neglect, corrected within 30 days | $14,602 to $73,011 | $2,190,294 |
| 4 | Willful neglect, not corrected | $73,011 to $2,190,294 | $2,190,294 |
OCR's 2019 Notice of Enforcement Discretion currently applies lower annual caps for Tiers 1 through 3, but that discretion can be rescinded at any time. Beyond fines, an improperly de-identified dataset that gets shared can trigger breach notification obligations, state attorney general action and reputational damage that lasts longer than any settlement.
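To see how the per-violation range and the annual cap interact, here is a small arithmetic sketch using the figures from the table above. It assumes every violation falls under a single identical provision in one calendar year and ignores the lower caps applied under enforcement discretion; actual penalties are set by OCR case by case. `TIERS` and `exposure_range` are hypothetical names.

```python
# (minimum per violation, maximum per violation, annual cap), copied from the table above.
TIERS = {
    1: (145, 73_011, 2_190_294),
    2: (1_461, 73_011, 2_190_294),
    3: (14_602, 73_011, 2_190_294),
    4: (73_011, 2_190_294, 2_190_294),
}


def exposure_range(tier, violations):
    """Return the (low, high) penalty exposure in dollars, each capped at the annual cap."""
    low, high, cap = TIERS[tier]
    return min(low * violations, cap), min(high * violations, cap)


# 100 Tier 2 violations: the low end is 100 x $1,461, the high end hits the cap.
print(exposure_range(2, 100))  # (146100, 2190294)
```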
Build a de-identification process that actually holds up
Removing the 18 HIPAA identifiers sounds straightforward on paper. In practice—across electronic health records, clinical notes, call transcripts, emails and scanned forms—the failure modes are everywhere: a patient name in a free-text comment, a date in an image caption, a rare diagnosis in a populated ZIP code. Limina was built to de-identify the unstructured data that other tools miss, with deployment inside your Virtual Private Cloud (VPC) or on-premises so PHI never leaves your environment.
Get a demo to see how Limina handles all 18 identifiers across structured and unstructured PHI. You may also want our HIPAA Safe Harbor step-by-step guide and our deep dive on HIPAA Expert Determination.