Personally Identifiable Information (PII) is any data that can identify a specific person—a name, email address, or Social Security number, for example. Protected Health Information (PHI) is a subcategory of PII that specifically covers individually identifiable health data regulated under HIPAA. Payment Card Industry (PCI) data is another subcategory covering cardholder and payment authentication information governed by the PCI Data Security Standard.
Customer data is now collected at every touchpoint, so understanding what PII is, and how it differs from PHI and PCI, is one of the most foundational questions in data privacy. Three acronyms come up repeatedly across compliance frameworks, legal discussions, and enterprise security conversations: Personally Identifiable Information (PII), Payment Card Industry data (PCI), and Protected Health Information (PHI).
Each refers to a distinct category of sensitive personal information, each has its own regulatory origin, and each carries specific obligations for organizations that handle it. Depending on your industry, you may be subject to rules governing one, two, or all three simultaneously.
This article provides a clear, authoritative explanation of each type, how they relate to one another, and what the regulatory landscape looks like across jurisdictions. It is intended for privacy and compliance professionals who need a reliable reference, business leaders evaluating their compliance obligations, and data engineers building systems that touch sensitive information.
What is PII (personally identifiable information)?
PII is the broadest of the three categories. It refers to any information that can be used to distinguish or trace an individual's identity, either on its own or in combination with other data. While the term is widely used across industries and regulatory frameworks, it does not originate from a single federal statute in the United States. Its most commonly cited definition comes from the Office of Management and Budget (OMB) Memorandum M-07-16, which defines PII as "information that can be used to distinguish or trace an individual's identity, alone or when combined with other personal or identifying information that is linked or linkable to a specific individual."
What counts as PII?
In practice, PII includes a wide range of data types:
- Names, dates of birth, and mailing addresses
- Telephone numbers and email addresses
- Social Security numbers, account numbers, and license numbers
- Vehicle identifiers, including license plates
- Static IP addresses and uniform resource locators (URLs)
- Biometric identifiers, such as fingerprints
- Photographic facial images
- Any other unique identifying number or characteristic
- Any information where it is reasonably foreseeable that it will be linked with other data to identify the individual
That last point is important. PII is an expansive concept by design. It captures indirect identifiers—data that may not identify someone on its own but could do so when combined with other information. Organizations that handle even seemingly innocuous data must consider whether it falls within PII's scope when combined with other data they hold.
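To make the linkability point concrete, the sketch below counts how often each combination of indirect identifiers occurs in a small dataset. The field names and records are hypothetical. A combination that occurs only once singles out one person, even though no individual field is a direct identifier.

```python
from collections import Counter

# Hypothetical records: no names or ID numbers, only indirect identifiers.
records = [
    {"zip": "02139", "birth_year": 1984, "gender": "F"},
    {"zip": "02139", "birth_year": 1984, "gender": "F"},
    {"zip": "02139", "birth_year": 1991, "gender": "M"},
    {"zip": "94105", "birth_year": 1975, "gender": "F"},
]

def unique_combinations(rows, fields):
    """Return the field-value combinations that appear in exactly one row."""
    counts = Counter(tuple(row[f] for f in fields) for row in rows)
    return [combo for combo, n in counts.items() if n == 1]

# Two of the four records are unique on (zip, birth_year, gender).
singled_out = unique_combinations(records, ["zip", "birth_year", "gender"])
```

Running the same check on a single field, such as `["gender"]`, returns nothing here; the identifying power comes from the combination.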
It's also worth noting that hidden PII in unstructured data is a common and serious risk. PII doesn't only live in structured databases—it appears in emails, call transcripts, PDFs, free-text fields, and chat logs, where it's far harder to detect and control.
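As a rough illustration of scanning free text, the sketch below searches a note for a few well-formed direct identifiers. The patterns and the `find_pii` helper are illustrative assumptions, not a production detector: pattern matching of this kind only catches identifiers that follow a predictable format and says nothing about names, context, or indirect identifiers.

```python
import re

# Illustrative patterns only; real-world formats vary far more than this.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}

def find_pii(text):
    """Return (label, matched_text) pairs for every pattern hit in text."""
    hits = []
    for label, pattern in PATTERNS.items():
        hits.extend((label, m.group()) for m in pattern.finditer(text))
    return hits

note = "Caller gave SSN 123-45-6789 and asked us to email jane.doe@example.com."
hits = find_pii(note)
```

Note that the caller's name, had it appeared in the note, would pass through this scan untouched.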
PII in other jurisdictions
While PII is a U.S.-origin term, equivalent concepts exist elsewhere. The California Consumer Privacy Act (CCPA) and Canada's PIPEDA use the term 'personal information.' The EU's General Data Protection Regulation (GDPR) and the proposed New York Privacy Act use 'personal data.' The underlying principle is the same: information that relates to an identifiable natural person deserves protection.
Because PII is defined so broadly, it functions as an umbrella category. Both PHI and PCI fall within its scope. However, the sensitivity of health and payment data—and the potential harm from their misuse—is significant enough to warrant dedicated regulatory frameworks.
What is PHI (protected health information)?
PHI is a subcategory of PII that refers specifically to individually identifiable health information. It is defined and protected under the U.S. Health Insurance Portability and Accountability Act (HIPAA), specifically under the HIPAA Privacy Rule.
The formal HIPAA definition of PHI
The HIPAA definition of PHI contains five elements:
- The information is created or received by a covered entity (a healthcare provider, health plan, or healthcare clearinghouse) or a business associate.
- The content relates to an individual's past, present, or future physical or mental health, the provision of healthcare, or payment for healthcare.
- The information identifies—or is reasonably likely to identify—the individual.
- The information is transmitted or maintained in any form: electronic, paper, or oral.
- The information does not fall within one of the defined exclusions, such as certain employment records or education records.
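The five elements work as a conjunction: the first four must all hold and no exclusion may apply. A minimal sketch of that logic follows. The `RecordFacts` fields and the two sample records are hypothetical, and answering each question for a real record is a legal judgment that this code does not make.

```python
from dataclasses import dataclass

@dataclass
class RecordFacts:
    """Hypothetical answers to the five questions above for one record."""
    held_by_covered_entity_or_associate: bool
    relates_to_health_care_or_payment: bool
    identifies_individual: bool
    transmitted_or_maintained: bool
    falls_under_exclusion: bool

def is_phi(facts: RecordFacts) -> bool:
    """All four inclusion elements must hold and no exclusion may apply."""
    return (
        facts.held_by_covered_entity_or_associate
        and facts.relates_to_health_care_or_payment
        and facts.identifies_individual
        and facts.transmitted_or_maintained
        and not facts.falls_under_exclusion
    )

# A lab result held by a hospital meets every element.
lab_result = RecordFacts(True, True, True, True, False)
# The same kind of data held by a party that is neither a covered entity
# nor a business associate fails the first element.
consumer_app_log = RecordFacts(False, True, True, True, False)
```

Here `is_phi(lab_result)` is true and `is_phi(consumer_app_log)` is false, although the second record may still be PII.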
In plain terms, PHI encompasses the kind of information found in a medical record: diagnoses, treatment plans, lab results, prescriptions, and the billing records tied to those services. The combination of health data with identity is what makes PHI particularly sensitive. A disclosed diagnosis can affect a person's employment, insurance eligibility, and personal relationships in ways that go far beyond an ordinary data breach.
For organizations operating in the healthcare industry, managing PHI is one of the most consequential compliance responsibilities they face. HIPAA permits two de-identification methods, Safe Harbor and Expert Determination; organizations that rely on Expert Determination must ensure their statistical expert's qualifications meet HIPAA's standards.
PHI in other jurisdictions
Outside the U.S., comparable frameworks exist. Ontario's Personal Health Information Protection Act (PHIPA) uses the term 'personal health information.' The GDPR classifies health data as a 'special category of personal data' under Article 9, requiring a higher standard of protection and permitting processing only under specific legal bases.
What is PCI data (payment card industry data)?
PCI data is governed by the PCI Data Security Standard (PCI DSS), currently at version 4.0.1. Unlike HIPAA, PCI DSS was developed not by a government body but by the PCI Security Standards Council, an independent organization established by Visa, Mastercard, American Express, Discover, and JCB.
What data does PCI DSS protect?
PCI DSS protects 'account data,' split into two categories:
- Cardholder data: the Primary Account Number (PAN) that identifies the card issuer and account holder, the cardholder's name, the card expiration date, and the service code.
- Sensitive Authentication Data (SAD): card validation codes (CVV/CVC), full magnetic stripe or chip data, PINs, and PIN blocks.
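The two categories can be sketched as a simple field classification. The field names below are hypothetical and not part of PCI DSS. The `strip_sad` helper reflects the standard's rule that sensitive authentication data must not be stored after authorization, while cardholder data may be stored if it is protected.

```python
# Field names are hypothetical; the two groups mirror the list above.
CARDHOLDER_DATA = {"pan", "cardholder_name", "expiration_date", "service_code"}
SENSITIVE_AUTH_DATA = {"cvv", "track_data", "pin", "pin_block"}

def classify_field(name: str) -> str:
    """Label a field as cardholder data, SAD, or out of scope."""
    if name in CARDHOLDER_DATA:
        return "cardholder_data"
    if name in SENSITIVE_AUTH_DATA:
        return "sensitive_authentication_data"
    return "other"

def strip_sad(record: dict) -> dict:
    """Return a copy of the record with every SAD field removed."""
    return {k: v for k, v in record.items() if k not in SENSITIVE_AUTH_DATA}

# 4111111111111111 is a widely used test card number, not a real account.
payment = {
    "pan": "4111111111111111",
    "expiration_date": "12/30",
    "cvv": "123",
    "order_id": "A-1",
}
storable = strip_sad(payment)
```

The copy in `storable` keeps the PAN, expiration date, and order ID but no longer contains the `cvv` field.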
Compliance obligations under PCI DSS apply to any organization that stores, processes, or transmits cardholder data—merchants, payment processors, and third-party service providers in the payment chain. Compliance is enforced contractually: organizations that accept card payments agree to comply as a condition of that privilege. Failure can result in fines, higher transaction fees, and loss of card acceptance rights.
For contact center teams, the challenge is acute. PCI data surfaces in real time during voice transactions and persists in call recordings, chat transcripts, and other unstructured formats that are difficult to secure without dedicated tooling.
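As a minimal sketch of what such tooling has to do for text transcripts, the example below finds digit runs that look like card numbers, confirms them with the Luhn checksum used by payment cards, and replaces them with a placeholder. The `[PAN]` label and the sample transcript are assumptions for illustration. The approach does not handle digits that a transcript spells out as words, which is common in voice data.

```python
import re

# 13 to 19 digits, optionally separated by single spaces or hyphens.
CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")

def luhn_valid(digits: str) -> bool:
    """Check a digit string against the Luhn checksum."""
    total = 0
    # Walk right to left, doubling every second digit.
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def redact_pans(text: str) -> str:
    """Replace Luhn-valid card number candidates with a [PAN] placeholder."""
    def replace(match):
        digits = re.sub(r"[ -]", "", match.group())
        return "[PAN]" if luhn_valid(digits) else match.group()
    return CANDIDATE.sub(replace, text)

# 4111 1111 1111 1111 is a widely used test card number.
transcript = "Sure, the card is 4111 1111 1111 1111 and my order number is 1234567890123."
redacted = redact_pans(transcript)
```

The checksum step keeps the 13-digit order number intact, because it fails the Luhn test, while the card number is replaced.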
The EU equivalent is found in the Payment Services Directive 2 (PSD2), which uses the terms 'personalized security credentials' and 'sensitive payment data' to describe information that must be protected in electronic payment transactions.
How are PII, PCI, and PHI related?
When you look at the formal definitions together, a clear hierarchy emerges. PII is the broadest category—any information that can identify an individual. PHI and PCI are both subcategories of PII: health data identifies individuals, and payment card data does too when combined with account information. All three can be used, alone or in combination, to distinguish or trace a person's identity.
The reason these subcategories exist as distinct concepts in U.S. law is largely historical. The U.S. has never enacted a single comprehensive federal data protection law; regulation has developed sector by sector, responding to the most acute perceived harms. HIPAA addressed health data because the political will to protect medical privacy was strong. Payment card data was addressed through private-sector standards because the major card networks had a direct financial incentive to reduce fraud and could impose requirements contractually without legislation.
The situation in Europe reflects the inverse logic. The GDPR succeeded in establishing a general data protection framework precisely because the EU operates as a supranational body where member states could reach a consensus, even if that required years of negotiation and built-in flexibility for national variation. Health data is treated as a special category under GDPR rather than receiving its own separate law. On the payment side, Europe has struggled to establish a harmonized card payment regime, partly because of greater reliance on proprietary national schemes, though the European Central Bank has noted ongoing efforts toward a unified European card payment system. For global organizations, understanding how HIPAA and GDPR differ on health data is essential for building compliant cross-border data workflows.
The practical takeaway: the regulatory framework you operate under depends on where your organization is located, where your data subjects are, and what type of data you handle. Most large organizations are subject to multiple overlapping frameworks simultaneously. De-identification is a central technical mechanism for reducing regulatory risk across all three frameworks: it allows organizations to process and analyze sensitive data while removing the identifiers that trigger compliance obligations.
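In its simplest form, de-identification replaces each detected identifier with a typed placeholder so the surrounding text stays usable for analysis. The sketch below assumes the identifier positions are already known; the `span_of` and `deidentify` helpers, the labels, and the sample note are all hypothetical. Whether the output meets a given legal standard for de-identification is a separate question that this code does not answer.

```python
def span_of(text, value, label):
    """Locate the first occurrence of value and return (start, end, label)."""
    start = text.index(value)
    return (start, start + len(value), label)

def deidentify(text, spans):
    """Replace each non-overlapping (start, end, label) span with [LABEL]."""
    # Work from the end of the text so earlier offsets stay valid.
    for start, end, label in sorted(spans, reverse=True):
        text = text[:start] + f"[{label}]" + text[end:]
    return text

note = "Jane Doe was diagnosed with asthma on 2024-03-01."
spans = [
    span_of(note, "Jane Doe", "NAME"),
    span_of(note, "2024-03-01", "DATE"),
]
clean = deidentify(note, spans)
```

The clinical content of the note survives while the name and date are replaced by `[NAME]` and `[DATE]`.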
PII, PCI, and PHI at a glance
The following table summarizes the key distinctions between the three data types.
| Data type | Meaning | Regulatory origin | Examples | Terms in other jurisdictions |
| --- | --- | --- | --- | --- |
| PII | Personally Identifiable Information | U.S. (federal); not defined in any single act; most cited definition from OMB Memorandum M-07-16 | Name, date of birth, address, phone, SSN, email, zip code, account numbers, license numbers, vehicle IDs, URLs, static IP addresses, biometric identifiers (fingerprints), facial images, any data linkable to a specific individual | Personal information (CCPA, PIPEDA); Personal data (GDPR, proposed New York Privacy Act) |
| PCI | Payment Card Industry (data) | PCI DSS v4.0.1, developed by the PCI Security Standards Council (private sector) | Cardholder data: PAN, cardholder name, expiration date, service code. Sensitive Authentication Data (SAD): CVV/CVC, full magnetic stripe or chip data, PINs, PIN blocks | 'Personalized security credentials' and 'sensitive payment data' (EU's PSD2) |
| PHI | Protected Health Information | U.S. HIPAA Privacy Rule | Individually identifiable information relating to a person's health in medical records: diagnoses, treatment plans, lab results, prescriptions, billing information | Personal health information (PHIPA); Special categories of personal data (GDPR) |
Why identifying these data types is a compliance prerequisite
Understanding what PII, PCI, and PHI mean in theory is only the first step. The harder, more operationally demanding challenge is knowing where these data types live within your organization—and ensuring they're handled appropriately.
PII, PHI, and PCI don't always appear in structured databases where they're easy to query and audit. They frequently appear in unstructured formats: clinical notes, support call transcripts, chat logs, email threads, PDF documents, and free-text fields. These formats are harder to scan, classify, and protect—and they're often overlooked in compliance programs that focus exclusively on structured data.
The numbers are stark. Limina's benchmark of 45,000 words across real-world domains found that general-purpose cloud PII detection tools miss between 13 and 46 percent of entities in real-world unstructured data. Gaining visibility into unstructured data isn't optional: if you can't identify where sensitive data exists across your organization, you can't make informed decisions about what technical and organizational measures are required—whether under HIPAA, PCI DSS, the GDPR, or any applicable framework.
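A miss rate of this kind is the share of labelled entities that a detector fails to return, which is one minus recall. The sketch below shows the arithmetic on a hypothetical labelled sample. It is not the methodology of the benchmark cited above, which this article does not describe.

```python
def miss_rate(expected: set, detected: set) -> float:
    """Fraction of expected entities that the detector failed to find."""
    if not expected:
        return 0.0
    missed = expected - detected
    return len(missed) / len(expected)

# Hypothetical labelled sample: (entity text, entity type) pairs.
expected = {
    ("Jane Doe", "NAME"),
    ("123-45-6789", "SSN"),
    ("asthma", "CONDITION"),
    ("02139", "ZIP"),
}
# Hypothetical detector output: two correct hits and one false positive.
detected = {
    ("Jane Doe", "NAME"),
    ("123-45-6789", "SSN"),
    ("Boston", "LOCATION"),
}
rate = miss_rate(expected, detected)  # 2 of 4 expected entities missed
```

False positives such as the `LOCATION` hit do not affect the miss rate; they would show up in a precision measurement instead.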
Limina's data de-identification platform is purpose-built for exactly this challenge. Built by linguists and powered by advances in machine learning, Limina identifies 50+ entity types of PII, PHI, and PCI in unstructured data across 52+ languages. Because it's context-aware and understands the nuances of language and entity relationships within documents, it doesn't rely on simple pattern matching that misses indirect identifiers or novel data formats. The result: 99.5%+ accuracy on real healthcare data, compared to 60–70% for general-purpose cloud tools.
For a deeper look at how automation compares to manual processes, see our guide to manual vs automated PII redaction.
What does this mean for your industry?
The stakes around PII, PCI, and PHI differ by sector—but no organization that handles personal data is exempt.
Healthcare organizations and those in pharma and life sciences face the most demanding PHI obligations. HIPAA's breach notification requirements, minimum necessary standards, and business associate obligations create a compliance infrastructure that must span every system and workflow that touches patient data—including, increasingly, AI systems trained on or operating with clinical data.
Financial services firms and insurers navigate a layered landscape of PCI, PII, and AI governance requirements. Insurance organizations handle large volumes of sensitive personal and health data in claim files, medical records, and underwriting documents—much of it in unstructured form.
Contact centers sit at an unusual intersection: they handle PCI data in real time during payment transactions, may handle PHI if they serve healthcare clients, and generate massive volumes of unstructured data in call recordings and transcripts. For contact center teams, the ability to automatically detect and redact sensitive data across voice and text is a compliance requirement, not a nice-to-have.
Regardless of industry, the underlying need is the same: clear visibility into what sensitive data exists in your systems, where it lives, and how it's being protected.
Ready to close the gap in your sensitive data coverage?
Knowing what PII, PHI, and PCI are is only half the challenge—the harder part is finding them wherever they hide in your organization's data. Limina's de-identification platform identifies 50+ entity types across 52+ languages with 99.5%+ accuracy on real healthcare data, including in the unstructured formats that most compliance programs miss.
Get a demo: getlimina.ai/en/contact-us
See how Limina de-identifies PII, PHI, and PCI at scale