May 29, 2026

The 18 HIPAA Identifiers: Full List and De-identification Guide

The 18 HIPAA identifiers are the specific data elements listed in 45 CFR 164.514(b)(2) that you must remove from a record set to qualify for the HIPAA Privacy Rule’s Safe Harbor de-identification method. Once all 18 are removed and you have no actual knowledge that the remaining data could re-identify an individual, the information is no longer PHI and falls outside HIPAA’s use and disclosure restrictions.

Limina

In 2024, the Department of Health and Human Services (HHS) Office for Civil Rights (OCR) confirmed it had received more than 371,000 Health Insurance Portability and Accountability Act (HIPAA) complaints since the Privacy Rule took effect, with civil penalties and settlements reaching nearly $144 million. A surprising share of those cases trace back to the same root cause—data that was treated as anonymous when it wasn't. The 18 HIPAA identifiers are the bright line between Protected Health Information (PHI) you can't share without authorization and de-identified data you can use freely for research, analytics and AI training. Miss one, and your "de-identified" dataset is still PHI.

This guide walks through every identifier, the rules around the trickier ones (dates, ZIP codes, the catch-all 18th category), the two methods HIPAA recognizes for de-identification, and what happens when you get it wrong.

What are the 18 HIPAA identifiers?

The 18 HIPAA identifiers come from one specific provision of the HIPAA Privacy Rule: 45 CFR 164.514(b)(2)(i). HHS created this list to give covered entities and business associates a deterministic checklist—remove these data elements from a record set, satisfy one additional condition, and the data is no longer considered PHI.

The list applies to identifiers of the individual and to identifiers of the individual's relatives, employers and household members. That last point is often missed. A clinical note that mentions a patient's spouse by name still contains PHI even after the patient's own name is removed.

There's also a second condition that runs alongside the list: the covered entity must have no actual knowledge that the remaining information could be used alone or in combination with other reasonably available information to identify the individual. We'll come back to this requirement, because it trips up more organizations than the list itself.

The complete list of 18 HIPAA identifiers

The table below is the full list as enumerated in the Privacy Rule, with the practical removal rule for each. The numbering follows the order of the regulatory text.

| # | Identifier | Removal rule |
| --- | --- | --- |
| 1 | Names | Remove all names of the individual, relatives, employers and household members. |
| 2 | Geographic subdivisions smaller than a state | Remove street address, city, county, precinct, ZIP code and equivalent geocodes. The first three digits of a ZIP code may remain only if the combined area covered by all ZIP codes with those three digits has more than 20,000 people per current Census data. Otherwise, change the first three digits to 000. |
| 3 | Dates directly related to an individual | Remove all elements of dates except year. This includes birth date, admission date, discharge date and date of death. Ages over 89 must be aggregated into a single "age 90 or older" category. |
| 4 | Telephone numbers | Remove all phone numbers, including mobile, work and emergency contacts. |
| 5 | Fax numbers | Remove all fax numbers. |
| 6 | Email addresses | Remove all email addresses. |
| 7 | Social Security numbers | Remove the full Social Security number. Partial representations, such as the last four digits, are also identifiers and must be removed. |
| 8 | Medical record numbers | Remove all medical record numbers used by the covered entity or any other organization. |
| 9 | Health plan beneficiary numbers | Remove all member IDs, subscriber IDs and policy numbers. |
| 10 | Account numbers | Remove all financial and patient account numbers. |
| 11 | Certificate or license numbers | Remove driver's license numbers, professional license numbers and similar credentials. |
| 12 | Vehicle identifiers and serial numbers | Remove license plate numbers and Vehicle Identification Numbers (VINs). |
| 13 | Device identifiers and serial numbers | Remove medical device serial numbers, implant identifiers and equipment IDs that link to the patient. |
| 14 | Web Universal Resource Locators (URLs) | Remove personal URLs, including patient portal links and personal websites. |
| 15 | Internet Protocol (IP) address numbers | Remove all IP addresses captured in the record set. |
| 16 | Biometric identifiers | Remove fingerprints, voice prints, retinal scans and other biometric data. |
| 17 | Full-face photographic images and comparable images | Remove face photos and any image that could identify the individual. |
| 18 | Any other unique identifying number, characteristic or code | Remove any remaining unique identifier not listed above, with one exception. A re-identification code created under 45 CFR 164.514(c) is excepted, provided the code is not derived from any individual data and the mechanism for re-identification is not disclosed. |
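
The re-identification code exception under identifier 18 can be illustrated with a minimal sketch. It assumes the code is drawn at random, so it carries no information about the patient, and that the lookup table stays with the covered entity. The names `assign_reid_codes` and `crosswalk`, and the sample record IDs, are hypothetical and used only for illustration.

```python
import secrets

def assign_reid_codes(record_ids):
    """Assign each record a random code that is not derived from patient data.

    A hash of an SSN or MRN would be derived from individual data, so this
    sketch draws codes from a random source instead (an assumed design).
    """
    crosswalk = {}
    used = set()
    for record_id in record_ids:
        code = secrets.token_hex(8)  # 16 random hex characters
        while code in used:          # regenerate on the unlikely collision
            code = secrets.token_hex(8)
        used.add(code)
        crosswalk[record_id] = code
    return crosswalk

# Hypothetical internal IDs; the crosswalk itself must not be disclosed.
crosswalk = assign_reid_codes(["MRN-001", "MRN-002", "MRN-003"])
```

Only the random codes would travel with the de-identified records. The crosswalk is the re-identification mechanism, so in this sketch it would be stored separately under the covered entity's control.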

The HHS guidance is explicit on a point that often surprises teams: removing these 18 identifiers is necessary, but not sufficient. The "actual knowledge" condition must be met as well.

The two HIPAA de-identification methods

HIPAA recognizes exactly two methods for de-identifying PHI. The 18 identifiers belong to the first one. Knowing both is essential because the right method depends on what you're trying to do with the data.

Safe Harbor (the 18-identifier method)

Safe Harbor is the deterministic, checklist-based path. You remove the 18 identifiers listed in 45 CFR 164.514(b)(2)(i) from the record set—including identifiers of relatives, employers and household members—and confirm you have no actual knowledge that the remaining data could re-identify someone. If both conditions are met, the data is no longer PHI.

Safe Harbor is fast, auditable and inexpensive. It's the right choice when you can tolerate the precision loss that comes from stripping every date element except the year, all geography below the state level (apart from qualifying three-digit ZIP codes) and exact ages above 89.

Expert Determination

Expert Determination is described in 45 CFR 164.514(b)(1). A qualified expert—someone with appropriate knowledge of statistical and scientific principles for rendering data not individually identifiable—analyzes the dataset and determines that the risk of re-identification is "very small." The expert documents the methodology and results.

This method is more flexible. You can keep date elements, sub-state geography or other quasi-identifiers if the expert concludes the residual re-identification risk is acceptably low for the specific data, recipients and use context. It's the standard path for AI training datasets and longitudinal research where Safe Harbor's stripping is too lossy.

Pharma and life sciences research teams running longitudinal clinical studies often rely on Expert Determination precisely because it preserves date-level granularity and geographic precision that make time-series analysis meaningful.

Side-by-side comparison

| Factor | Safe Harbor | Expert Determination |
| --- | --- | --- |
| Regulatory citation | 45 CFR 164.514(b)(2) | 45 CFR 164.514(b)(1) |
| Approach | Remove 18 specific identifiers | Statistical risk assessment by qualified expert |
| Data utility | Lower: dates, granular geography and exact ages above 89 are stripped | Higher: more detail can be retained based on risk |
| Cost and effort | Low: deterministic checklist | Higher: expert engagement, documentation, ongoing review |
| Best for | Routine data sharing, basic analytics and low-risk releases | AI training, longitudinal research and complex analytics |
| Documentation burden | Process and verification logs | Full methodology, results and justification report |

For a deeper look at each method, see our HIPAA Safe Harbor method guide and our Expert Determination guide.

How to remove the 18 HIPAA identifiers from your data

The order matters. A clean Safe Harbor process generally follows these six steps:

  • Inventory the data. Map every field, free-text section, attachment and embedded metadata in the record set. Unstructured data—clinical notes, call transcripts, emails, scanned PDFs—is where most teams underestimate exposure.
  • Map fields to the 18 identifier categories. Some fields map cleanly (a "DOB" column to identifier 3). Others don't. Free-text notes can contain any of the 18 in any combination.
  • Apply the removal rules. Use deterministic rules where possible (regex for Social Security numbers, date parsers for dates) and natural-language processing for free text. ZIP codes need the 20,000-population check; ages need the 90-and-older aggregation.
  • Address the catch-all (identifier 18). Look for unique combinations and codes that aren't on the explicit list but could identify someone. Rare diagnoses, unusual occupations and small-population demographics fall here.
  • Verify the "actual knowledge" condition. Document that you've considered whether the remaining data could re-identify an individual alone or in combination with reasonably available information. The HHS guidance gives the example of an "occupation" field reading "former president of the State University"—even after the 18 identifiers are removed, that record fails Safe Harbor because the covered entity has actual knowledge it could identify the patient.
  • Document everything. Record the method used, who performed the work, the verification steps and the date completed. This documentation is your defense if OCR ever asks.
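
The deterministic part of step three can be sketched as follows. This is a minimal illustration, assuming US-style formats for Social Security numbers and phone numbers. The pattern set and the `scrub` helper are hypothetical, and regexes like these only catch well-formatted values, so names and other free-text identifiers would still need NLP and review.

```python
import re

# Illustrative patterns only; real data will contain formats these miss.
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "PHONE": re.compile(
        r"(?<!\d)(?:\+?1[-.\s]?)?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}(?!\d)"
    ),
}

def scrub(text):
    """Replace each matched identifier with a bracketed category label."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

note = "Call 555-123-4567 or (555) 123-4567, SSN 123-45-6789, jane.doe@example.com."
print(scrub(note))
# Call [PHONE] or [PHONE], SSN [SSN], [EMAIL].
```

Replacing values with category labels rather than deleting them keeps the note readable and leaves a trail that the verification step can audit.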

For unstructured data—clinical notes, call recordings, Automatic Speech Recognition (ASR) transcripts, support tickets—manual review doesn't scale. Common cloud Natural Language Processing (NLP) tools miss a meaningful share of entities in real-world clinical text. Limina's de-identification platform achieves 99.5 percent or higher accuracy on physician conversations, compared with 60 to 70 percent for general-purpose cloud Application Programming Interfaces (APIs) on the same data—the difference between a defensible de-identification process and a leaky one.

Why the 18 identifiers are not the same as PHI

This is the most common misconception in HIPAA de-identification, and it matters.

PHI is defined in 45 CFR 160.103 as individually identifiable health information that relates to an individual's past, present or future health condition, treatment or payment for treatment, transmitted or maintained by a covered entity or business associate. The 18 identifiers are not the definition of PHI. They are the list of elements you must remove to qualify for one specific de-identification method.

The practical implication: data can contain none of the 18 identifiers and still be PHI if the covered entity has actual knowledge it could identify someone. Conversely, the same 18 elements appearing in a record set that has no health context aren't PHI at all—they're just personal data subject to other privacy laws.

This distinction matters for AI builders, healthcare organizations, contact centers and analytics teams who often hear "remove the 18 and you're fine." Removing the 18 is a step. The actual standard is the de-identification standard at 45 CFR 164.514(a): the data must not identify an individual, and you must have no reasonable basis to believe it can identify an individual.

The "actual knowledge" requirement that trips most teams

The actual knowledge condition is the part of Safe Harbor that catches organizations off guard during audits. The HHS guidance illustrates it with this example: a record that lists a patient's occupation as "former president of the State University" doesn't satisfy Safe Harbor even after the 18 identifiers are removed, because the covered entity would know that the occupation, combined with almost any other detail such as age or state of residence, could identify the patient.

In practice, you should pay attention to:

  • Rare medical conditions or unusual treatment patterns
  • Unique combinations of demographic variables in small populations
  • High-profile occupations or public roles
  • Geographic data combined with rare attributes
  • Unique events (large multiple births, public-record incidents)

You don't need to certify zero re-identification risk. You need to confirm you have no actual knowledge of an identification path, and document that you considered it.
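
One way to surface the unique combinations listed above is a small-cell screen. The sketch below is a screening aid, not a HIPAA requirement: the `rare_combinations` helper, the field names and the `min_count` threshold are all assumptions, and the rule itself sets no numeric cutoff.

```python
from collections import Counter

def rare_combinations(records, fields, min_count=5):
    """Return field-value combinations shared by fewer than min_count records."""
    counts = Counter(tuple(r.get(f) for f in fields) for r in records)
    return [combo for combo, n in counts.items() if n < min_count]

# Hypothetical records after the 18 identifiers have been removed.
records = [
    {"occupation": "teacher", "state": "TX"},
    {"occupation": "teacher", "state": "TX"},
    {"occupation": "teacher", "state": "TX"},
    {"occupation": "university president", "state": "TX"},
]
flagged = rare_combinations(records, ["occupation", "state"], min_count=2)
print(flagged)
# [('university president', 'TX')]
```

Flagged combinations are candidates for human review. Whether any of them amounts to actual knowledge of an identification path remains a judgment call that should be documented.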

What it costs to get de-identification wrong

HIPAA penalties are tiered by culpability and adjusted annually for inflation. The current schedule, published in the Federal Register on January 28, 2026, runs as follows:

| Tier | Culpability | Per-violation range | Annual cap (per identical provision) |
| --- | --- | --- | --- |
| 1 | Did not know | $145 to $73,011 | $2,190,294 |
| 2 | Reasonable cause, not willful neglect | $1,461 to $73,011 | $2,190,294 |
| 3 | Willful neglect, corrected within 30 days | $14,602 to $73,011 | $2,190,294 |
| 4 | Willful neglect, not corrected | $73,011 to $2,190,294 | $2,190,294 |

OCR's 2019 Notice of Enforcement Discretion currently applies lower annual caps for Tiers 1 through 3, but that discretion can be rescinded at any time. Beyond fines, an improperly de-identified dataset that gets shared can trigger breach notification obligations, state attorney general action and reputational damage that lasts longer than any settlement.

Build a de-identification process that actually holds up

Removing the 18 HIPAA identifiers sounds straightforward on paper. In practice, across electronic health records, clinical notes, call transcripts, emails and scanned forms, the failure modes are everywhere: a patient name in a free-text comment, a date in an image caption, a rare diagnosis in a sparsely populated ZIP code. Limina was built to de-identify the unstructured data that other tools miss, with deployment inside your Virtual Private Cloud (VPC) or on-premises so PHI never leaves your environment.

Get a demo to see how Limina handles all 18 identifiers across structured and unstructured PHI. You may also want our HIPAA Safe Harbor step-by-step guide and our deep dive on HIPAA Expert Determination.

Frequently Asked Questions

Are the 18 HIPAA identifiers the same as PHI?

No. The 18 identifiers are the data elements you must remove to qualify for the Safe Harbor de-identification method under 45 CFR 164.514(b)(2). PHI is defined separately at 45 CFR 160.103 as individually identifiable health information held by a covered entity or business associate. Data can contain none of the 18 identifiers and still be PHI if other information in the record could identify the patient.

Can I share data freely once the 18 identifiers are removed?

Only if you’ve also satisfied the “actual knowledge” condition. Safe Harbor requires both removing the 18 identifiers and confirming you have no actual knowledge that the remaining data could be used alone or with other reasonably available information to identify an individual. If both conditions are met, the data is no longer PHI and the HIPAA Privacy Rule no longer restricts its use or disclosure.

What does the “actual knowledge” rule mean in practice?

It means you must consider, and document that you considered, whether anything in the remaining data could identify someone. Rare diagnoses, unusual occupations, small-population demographic combinations and high-profile patients all create actual knowledge concerns. The HHS example of an “occupation” field reading “former president of the State University” shows that obvious identifiers can survive removal of the 18 categories.

Can ZIP codes appear in de-identified data?

The first three digits of a ZIP code can remain only if the geographic area formed by all ZIP codes sharing those three digits has a population greater than 20,000 according to current Census data. If the population is 20,000 or fewer, the first three digits must be changed to 000. Full five-digit ZIP codes always count as identifiers and must be removed.
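
The three-digit rule can be expressed as a short function. This is a sketch under stated assumptions: `generalize_zip` is a hypothetical helper, and the population figures are made-up placeholders rather than Census data, which you would need to load from a current source.

```python
def generalize_zip(zip_code, zip3_population, threshold=20_000):
    """Keep the first three ZIP digits only if that area exceeds the threshold."""
    zip3 = zip_code.strip()[:3]
    if len(zip3) != 3 or not zip3.isdigit():
        return "000"  # malformed input: fall back to the masked value
    # Unknown prefixes are treated as small areas, the conservative choice.
    population = zip3_population.get(zip3, 0)
    return zip3 if population > threshold else "000"

# Placeholder populations for illustration only, not real Census figures.
zip3_population = {"100": 1_500_000, "036": 12_000, "555": 20_000}

print(generalize_zip("10027", zip3_population))  # 100
print(generalize_zip("03601", zip3_population))  # 000
print(generalize_zip("55501", zip3_population))  # 000 (exactly 20,000 is not enough)
```

Defaulting unknown or malformed values to 000 errs on the side of removing geography rather than leaking it.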

How are dates handled under Safe Harbor?

You must remove all elements of dates other than the year for any date directly related to the individual, including birth, admission, discharge, death and similar dates. Ages above 89, and any date element (including the year) that would reveal such an age, have to be aggregated into a single “age 90 or older” category, because so few people reach those ages that the age alone can single someone out. If you need granular dates for analytics, you’ll need Expert Determination, not Safe Harbor.
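
The date and age rules can be sketched in a few lines. The helper names below are hypothetical, and the sketch assumes ages are computed against a reference date you choose. It also withholds the birth year for anyone aged 90 or older, since the regulation treats date elements that indicate such an age, including the year, as identifying.

```python
from datetime import date

def age_in_years(birth, as_of):
    """Whole years between birth and the reference date."""
    before_birthday = (as_of.month, as_of.day) < (birth.month, birth.day)
    return as_of.year - birth.year - before_birthday

def generalize_age(age):
    """Collapse every age above 89 into a single '90+' category."""
    return "90+" if age >= 90 else str(age)

def date_to_year(d):
    """Keep only the year of a date directly related to the individual."""
    return d.year

def birth_year_or_none(birth, as_of):
    """Withhold the birth year when it would reveal an age of 90 or older."""
    return None if age_in_years(birth, as_of) >= 90 else birth.year

as_of = date(2026, 1, 1)  # assumed reference date for the example
print(generalize_age(age_in_years(date(1930, 5, 1), as_of)))  # 90+
print(birth_year_or_none(date(1930, 5, 1), as_of))            # None
print(date_to_year(date(2024, 11, 3)))                        # 2024
```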

Is Expert Determination better than Safe Harbor for AI training data?

Often, yes. Safe Harbor strips dates, sub-state geography and exact ages above 89, which can damage the temporal patterns and geographic signals that machine-learning models rely on. Expert Determination preserves more detail in exchange for documented statistical analysis of re-identification risk. For longitudinal models, geographic models or any AI use case where temporal precision matters, Expert Determination is usually the right path.

What happens if I miss one of the 18 identifiers?

The data has not been de-identified. It remains PHI, and any use or disclosure beyond what the Privacy Rule permits is a violation. Depending on the circumstances, this can trigger HIPAA penalties (capped at $2,190,294 per year for violations of an identical provision under the 2026 schedule), breach notification obligations under the Breach Notification Rule and potential state-law liability. The risk is highest with unstructured data, where automated tools with sub-95 percent accuracy can leave thousands of identifiers in a single dataset.