March 28, 2023

What Are PII, PCI, and PHI? A Plain-English Compliance Guide

Personally Identifiable Information (PII) is any data that can identify a specific person—a name, email address, or Social Security number, for example. Protected Health Information (PHI) is a subcategory of PII that specifically covers individually identifiable health data regulated under HIPAA. Payment Card Industry (PCI) data is another subcategory covering cardholder and payment authentication information governed by the PCI Data Security Standard.

Patricia Thaine

Founder, Chairwoman, Thought Leader

In an increasingly digital world where customer data is collected at every touchpoint, understanding what PII is, and how it differs from PHI and PCI, is one of the most foundational tasks in data privacy. Three acronyms come up repeatedly across compliance frameworks, legal discussions, and enterprise security conversations: Personally Identifiable Information (PII), Payment Card Industry data (PCI), and Protected Health Information (PHI). This guide explains what each term means, how they relate, and what regulated organizations are required to do about them.

Each refers to a distinct category of sensitive personal information, each has its own regulatory origin, and each carries specific obligations for organizations that handle it. Depending on your industry, you may be subject to rules governing one, two, or all three simultaneously.

This article provides a clear, authoritative explanation of each type, how they relate to one another, and what the regulatory landscape looks like across jurisdictions. It is intended for privacy and compliance professionals who need a reliable reference, business leaders evaluating their compliance obligations, and data engineers building systems that touch sensitive information.

What is PII (personally identifiable information)?

PII is the broadest of the three categories. It refers to any information that can be used to distinguish or trace an individual's identity, either on its own or in combination with other data. While the term is widely used across industries and regulatory frameworks, it does not originate from a single federal statute in the United States. Its most commonly cited definition comes from the Office of Management and Budget (OMB) Memorandum M-07-16, which defines PII as "information that can be used to distinguish or trace an individual's identity, alone or when combined with other personal or identifying information that is linked or linkable to a specific individual."

What counts as PII?

In practice, PII includes a wide range of data types:

Names, dates of birth, and mailing addresses
Telephone numbers and email addresses
Social Security numbers, account numbers, and license numbers
Vehicle identifiers, including license plates
Static IP addresses and uniform resource locators (URLs)
Biometric identifiers, such as fingerprints
Photographic facial images
Any other unique identifying number or characteristic
Any information where it is reasonably foreseeable that it will be linked with other data to identify the individual

That last point is important. PII is an expansive concept by design. It captures indirect identifiers—data that may not identify someone on its own but could do so when combined with other information. Organizations that handle even seemingly innocuous data must consider whether it falls within PII's scope when combined with other data they hold.

It's also worth noting that hidden PII in unstructured data is a common and serious risk. PII doesn't only live in structured databases—it appears in emails, call transcripts, PDFs, free-text fields, and chat logs, where it's far harder to detect and control.
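To make the unstructured-data problem concrete, here is a minimal sketch in Python that scans free text for two rigidly formatted identifier types using the standard re module. The patterns, the function name find_pii, and the sample sentence are assumptions made for this example. Note what it does not find: the caller's name passes straight through, which is the kind of gap simple pattern matching leaves.

```python
import re

# Illustrative patterns only (assumptions for this sketch, not a complete detector).
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def find_pii(text):
    """Return (label, matched_text, start, end) tuples sorted by position."""
    hits = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((label, match.group(), match.start(), match.end()))
    return sorted(hits, key=lambda hit: hit[2])


sample = (
    "Caller Jane Roe gave her SSN 123-45-6789 and asked us to "
    "reply to jane.roe@example.com."
)
for hit in find_pii(sample):
    print(hit)
```

The name "Jane Roe" is PII but matches neither pattern, and an SSN read aloud as words in a call transcript would be missed as well.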

PII in other jurisdictions

While PII is a U.S.-origin term, equivalent concepts exist elsewhere. The California Consumer Privacy Act (CCPA) and Canada's PIPEDA use the term 'personal information.' The EU's General Data Protection Regulation (GDPR) and the proposed New York Privacy Act use 'personal data.' The underlying principle is the same: information that relates to an identifiable natural person deserves protection.

Because PII is defined so broadly, it functions as an umbrella category. Both PHI and PCI fall within its scope. However, the sensitivity of health and payment data—and the potential harm from their misuse—is significant enough to warrant dedicated regulatory frameworks.

What is PHI (protected health information)?

PHI is a subcategory of PII that refers specifically to individually identifiable health information. It is defined and protected under the U.S. Health Insurance Portability and Accountability Act (HIPAA), specifically under the HIPAA Privacy Rule.

The formal HIPAA definition of PHI

The HIPAA definition of PHI contains five elements:

The information is created or received by a covered entity (a healthcare provider, health plan, or healthcare clearinghouse) or a business associate.
The content relates to an individual's past, present, or future physical or mental health, the provision of healthcare, or payment for healthcare.
The information identifies—or is reasonably likely to identify—the individual.
The information is transmitted or maintained in any form: electronic, paper, or oral.
The information does not fall within one of the defined exclusions, such as certain employment records or education records.
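Read together, the five elements work as a single test: the first four must all hold, and no exclusion may apply. A simplified sketch of that logic in Python follows. The class name, field names, and example records are hypothetical, and real determinations turn on the regulatory text rather than five yes/no answers.

```python
from dataclasses import dataclass


@dataclass
class RecordFacts:
    """Yes/no answers to the five elements above (field names are hypothetical)."""
    held_by_covered_entity_or_associate: bool
    relates_to_health_care_or_payment: bool
    identifies_individual: bool
    transmitted_or_maintained: bool  # electronic, paper, or oral
    falls_under_exclusion: bool      # e.g. certain employment or education records


def is_phi(facts):
    """True only when the four positive elements hold and no exclusion applies."""
    return (
        facts.held_by_covered_entity_or_associate
        and facts.relates_to_health_care_or_payment
        and facts.identifies_individual
        and facts.transmitted_or_maintained
        and not facts.falls_under_exclusion
    )


# A named lab result held by a hospital.
lab_result = RecordFacts(True, True, True, True, False)
# The same kind of health detail sitting in an excluded employment record.
hr_file_note = RecordFacts(True, True, True, True, True)

print(is_phi(lab_result))    # True
print(is_phi(hr_file_note))  # False
```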

In plain terms, PHI encompasses the kind of information found in a medical record: diagnoses, treatment plans, lab results, prescriptions, and the billing records tied to those services. The combination of health data with identity is what makes PHI particularly sensitive. A disclosed diagnosis can affect a person's employment, insurance eligibility, and personal relationships in ways that go far beyond an ordinary data breach.

For organizations operating in the healthcare industry, managing PHI is one of the most consequential compliance responsibilities they face. HIPAA recognizes two methods for de-identifying PHI: Safe Harbor and Expert Determination. Organizations that rely on Expert Determination must ensure their statistical expert's qualifications meet HIPAA's standards.

PHI in other jurisdictions

Outside the U.S., comparable frameworks exist. Ontario's Personal Health Information Protection Act (PHIPA) uses the term 'personal health information.' The GDPR classifies health data as a 'special category of personal data' under Article 9, requiring a higher standard of protection and permitting processing only under specific legal bases.

What is PCI data (payment card industry data)?

PCI data is governed by the PCI Data Security Standard (PCI DSS), currently at version 4.0.1. Unlike HIPAA, PCI DSS was developed not by a government body but by the PCI Security Standards Council, an independent organization established by Visa, Mastercard, American Express, Discover, and JCB.

What data does PCI DSS protect?

PCI DSS protects 'account data,' split into two categories:

Cardholder data: the Primary Account Number (PAN) that identifies the card issuer and account holder, the cardholder's name, the card expiration date, and the service code.
Sensitive Authentication Data (SAD): card validation codes (CVV/CVC), full magnetic stripe or chip data, PINs, and PIN blocks.
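Because the PAN anchors cardholder data, systems that scan logs or transcripts often need to decide whether a digit string is plausibly a card number. A minimal sketch in Python: the Luhn checksum used by payment card numbers filters out most random digit strings, and a masking helper keeps only the last four digits. The function names, the 13 to 19 digit length window, and the last-four masking choice are assumptions made for illustration, not text from PCI DSS.

```python
def luhn_valid(number):
    """Check a candidate PAN with the Luhn checksum, ignoring spaces and dashes."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if not 13 <= len(digits) <= 19:  # assumed plausible PAN lengths
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second digit.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def mask_pan(number):
    """Replace all but the last four digits with asterisks."""
    digits = "".join(ch for ch in number if ch.isdigit())
    return "*" * (len(digits) - 4) + digits[-4:]


test_pan = "4111 1111 1111 1111"  # widely used test number, not a real account
print(luhn_valid(test_pan))            # True
print(luhn_valid("4111111111111112"))  # False
print(mask_pan(test_pan))              # ************1111
```

A passing checksum only means the string is shaped like a card number, so a check like this works best as one signal among several.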

Compliance obligations under PCI DSS apply to any organization that stores, processes, or transmits cardholder data—merchants, payment processors, and third-party service providers in the payment chain. Compliance is enforced contractually: organizations that accept card payments agree to comply as a condition of that privilege. Failure can result in fines, higher transaction fees, and loss of card acceptance rights.

For contact center teams, the challenge is acute. PCI data surfaces in real-time during voice transactions and persists in call recordings, chat transcripts, and other unstructured formats that are difficult to secure without dedicated tooling.

The EU equivalent is found in the Payment Services Directive 2 (PSD2), which uses the terms 'personalized security credentials' and 'sensitive payment data' to describe information that must be protected in electronic payment transactions.

How are PII, PCI, and PHI related?

When you look at the formal definitions together, a clear hierarchy emerges. PII is the broadest category—any information that can identify an individual. PHI and PCI are both subcategories of PII: health data identifies individuals, and payment card data does too when combined with account information. All three can be used, alone or in combination, to distinguish or trace a person's identity.

The reason these subcategories exist as distinct concepts in U.S. law is largely historical. The U.S. has never enacted a single comprehensive federal data protection law—regulation has developed sector by sector, responding to the most acute perceived harms. HIPAA addressed health data because the political will to protect medical privacy was strong. PCI was addressed through private-sector standards because the major card networks had a direct financial incentive to reduce fraud and could impose requirements contractually without legislation.

The situation in Europe reflects the inverse logic. The GDPR succeeded in establishing a general data protection framework precisely because the EU operates as a supranational body where member states could reach a consensus, even if that required years of negotiation and built-in flexibility for national variation. Health data is treated as a special category under GDPR rather than receiving its own separate law. On the payment side, Europe has struggled to establish a harmonized card payment regime, partly because of greater reliance on proprietary national schemes, though the European Central Bank has noted ongoing efforts toward a unified European card payment system. For global organizations, understanding how HIPAA and GDPR differ on health data is essential for building compliant cross-border data workflows.

The practical takeaway: the regulatory framework you operate under depends on where your organization is located, where your data subjects are, and what type of data you handle. Most large organizations are subject to multiple overlapping frameworks simultaneously. De-identification is the primary technical mechanism for reducing regulatory risk across all three frameworks—it allows organizations to process and analyze sensitive data while removing the identifiers that trigger compliance obligations.
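As a rough illustration of what de-identification does to a record, the Python sketch below replaces three rigidly formatted identifier types with typed placeholders so the surrounding text stays readable for analysis. The patterns, placeholder labels, and the deidentify function are assumptions for this example. Regular expressions alone would not meet any of the three frameworks' standards, since they miss names and indirect identifiers.

```python
import re

# (pattern, placeholder) pairs; illustrative assumptions, applied in order.
REPLACEMENTS = [
    (re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"), "[EMAIL]"),
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),
    # 13 to 19 digits, optionally separated by single spaces or dashes.
    (re.compile(r"\b(?:\d[ -]?){12,18}\d\b"), "[CARD_NUMBER]"),
]


def deidentify(text):
    """Replace each matched identifier with its typed placeholder."""
    for pattern, placeholder in REPLACEMENTS:
        text = pattern.sub(placeholder, text)
    return text


note = (
    "Patient reachable at pat@example.org, SSN 123-45-6789, "
    "paid with card 4111 1111 1111 1111."
)
print(deidentify(note))
# Patient reachable at [EMAIL], SSN [SSN], paid with card [CARD_NUMBER].
```

Typed placeholders, rather than blank deletions, preserve the fact that an email or card number was present, which is often what downstream analysis needs.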

PII, PCI, and PHI at a glance

The following table summarizes the key distinctions between the three data types.

Acronym	Meaning	Regulatory origin	Examples	Terms in other jurisdictions
PII	Personally Identifiable Information	U.S. (federal); not defined in any single act; most cited definition from OMB Memorandum M-07-16	Name, date of birth, address, phone, SSN, email, zip code, account numbers, license numbers, vehicle IDs, URLs, static IP addresses, biometric identifiers (fingerprints), facial images, any data linkable to a specific individual	Personal information (CCPA, PIPEDA); Personal data (GDPR, proposed New York Privacy Act)
PCI	Payment Card Industry (data)	PCI DSS v4.0.1, developed by the PCI Security Standards Council (private sector)	Cardholder data: PAN, cardholder name, expiration date, service code. Sensitive Authentication Data (SAD): CVV/CVC, full magnetic stripe or chip data, PINs, PIN blocks	'Personalized security credentials' and 'sensitive payment data' (EU's PSD2)
PHI	Protected Health Information	U.S. HIPAA Privacy Rule	Individually identifiable information relating to a person's health in medical records: diagnoses, treatment plans, lab results, prescriptions, billing information	Personal health information (PHIPA); Special categories of personal data (GDPR)

Why identifying these data types is a compliance prerequisite

Understanding what PII, PCI, and PHI mean in theory is only the first step. The harder, more operationally demanding challenge is knowing where these data types live within your organization—and ensuring they're handled appropriately.

PII, PHI, and PCI don't always appear in structured databases where they're easy to query and audit. They frequently appear in unstructured formats: clinical notes, support call transcripts, chat logs, email threads, PDF documents, and free-text fields. These formats are harder to scan, classify, and protect—and they're often overlooked in compliance programs that focus exclusively on structured data.

The numbers are stark. Limina's benchmark of 45,000 words across real-world domains found that general-purpose cloud PII detection tools miss between 13 and 46 percent of entities in real-world unstructured data. Gaining visibility into unstructured data isn't optional: if you can't identify where sensitive data exists across your organization, you can't make informed decisions about what technical and organizational measures are required—whether under HIPAA, PCI DSS, the GDPR, or any applicable framework.

Limina's data de-identification platform is purpose-built for exactly this challenge. Built by linguists and powered by advances in machine learning, Limina identifies 50+ entity types of PII, PHI, and PCI in unstructured data across 52+ languages. Because it's context-aware and understands the nuances of language and entity relationships within documents, it doesn't rely on simple pattern matching that misses indirect identifiers or novel data formats. The result: 99.5%+ accuracy on real healthcare data, compared to 60–70% for general-purpose cloud tools.

For a deeper look at how automation compares to manual processes, see our guide to manual vs automated PII redaction.

What does this mean for your industry?

The stakes around PII, PCI, and PHI differ by sector—but no organization that handles personal data is exempt.

Healthcare organizations and those in pharma and life sciences face the most demanding PHI obligations. HIPAA's breach notification requirements, minimum necessary standards, and business associate obligations create a compliance infrastructure that must span every system and workflow that touches patient data—including, increasingly, AI systems trained on or operating with clinical data.

Financial services firms and insurers navigate a layered landscape of PCI, PII, and AI governance requirements. Insurance organizations handle large volumes of sensitive personal and health data in claim files, medical records, and underwriting documents—much of it in unstructured form.

Contact centers sit at an unusual intersection: they handle PCI data in real time during payment transactions, may handle PHI if they serve healthcare clients, and generate massive volumes of unstructured data in call recordings and transcripts. For contact center teams, the ability to automatically detect and redact sensitive data across voice and text is a compliance requirement, not a nice-to-have.

Regardless of industry, the underlying need is the same: clear visibility into what sensitive data exists in your systems, where it lives, and how it's being protected.

Ready to close the gap in your sensitive data coverage?

Knowing what PII, PHI, and PCI are is only half the challenge—the harder part is finding them wherever they hide in your organization's data. Limina's de-identification platform identifies 50+ entity types across 52+ languages with 99.5%+ accuracy on real healthcare data, including in the unstructured formats that most compliance programs miss.

Get a demo: getlimina.ai/en/contact-us

Frequently Asked Questions

What is the difference between PII, PHI, and PCI?

PII is the broadest category—any information that can identify a specific person, such as a name, email address, or Social Security number. PHI is a subcategory of PII that covers individually identifiable health information regulated under HIPAA, including medical records, diagnoses, and billing data. PCI data is another subcategory, covering payment card information—such as account numbers and CVV codes—governed by the PCI Data Security Standard. All three carry distinct compliance obligations and may overlap in a single record.

Is PHI a type of PII?

Yes. PHI is a subcategory of PII. All PHI is PII—it identifies or is reasonably linked to a specific individual—but not all PII is PHI. PHI specifically refers to health-related information created or received by HIPAA-covered entities, such as hospitals, health plans, and their business associates. An individual's name and email address are PII but not PHI unless they appear in a healthcare context alongside health-related data.

What are examples of PII?

PII includes obvious identifiers like full names, Social Security numbers, passport numbers, and driver's license numbers. It also includes less obvious data: email addresses, IP addresses, device identifiers, biometric data such as fingerprints, photographs, and any combination of data points that could reasonably identify a person. Under most frameworks, even a ZIP code or date of birth may qualify as PII when combined with other available information. PII appears in both structured databases and unstructured formats like emails and call transcripts.

Who does HIPAA's PHI definition apply to?

HIPAA's PHI rules apply to covered entities—healthcare providers that transmit health information electronically, health plans, and healthcare clearinghouses—and to their business associates, which are third-party vendors that create, receive, maintain, or transmit PHI on a covered entity's behalf. If you provide software, analytics, cloud storage, or other services to a healthcare organization and your work involves PHI, you are almost certainly a business associate and subject to HIPAA's requirements.

Does GDPR use the term PII?

No. The GDPR uses the term 'personal data,' which it defines as any information relating to an identified or identifiable natural person. In practical scope, GDPR's personal data is broadly equivalent to PII, but there are differences in how each framework approaches de-identification, consent, and data subject rights. The GDPR treats health data as a 'special category of personal data' under Article 9, imposing stricter requirements than those applied to ordinary personal data. Organizations operating under both HIPAA and GDPR must satisfy both frameworks independently.

What happens if an organization mishandles PII or PHI?

The consequences depend on which framework governs the data. HIPAA violations can result in civil penalties whose statutory base amounts range from $100 to $50,000 per violation depending on culpability; these figures are adjusted annually for inflation, and the adjusted annual cap stood at about $1.9 million per violation category at the time of writing. The GDPR can impose fines of up to 4 percent of global annual turnover or €20 million, whichever is higher. PCI DSS non-compliance can result in fines from card networks, increased transaction fees, and loss of card processing privileges. Across all frameworks, breach notification obligations, reputational damage, and litigation exposure add to the cost.
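The "whichever is higher" wording in the GDPR ceiling is easy to misread, so here is a short worked example in Python. The function name and the two revenue figures are hypothetical, and the result is the upper bound quoted above, not a prediction of any actual fine.

```python
def gdpr_fine_ceiling_eur(global_annual_turnover_eur):
    """Upper bound: the greater of 4 percent of turnover or EUR 20 million."""
    four_percent = global_annual_turnover_eur * 4 // 100  # integer euros
    return max(four_percent, 20_000_000)


# Hypothetical company with EUR 1 billion turnover: 4 percent is the larger figure.
print(gdpr_fine_ceiling_eur(1_000_000_000))  # 40000000
# Hypothetical company with EUR 100 million turnover: the EUR 20 million floor applies.
print(gdpr_fine_ceiling_eur(100_000_000))    # 20000000
```

For smaller organizations the fixed €20 million figure sets the ceiling, because 4 percent of turnover only exceeds it above €500 million.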

How can organizations protect PII, PCI, and PHI in unstructured data?

The most effective technical control is automated de-identification—using purpose-built software to detect and redact or replace sensitive entities across all data formats, including free text, PDFs, audio transcripts, and emails. Manual review is impractical at scale and error-prone: human reviewers typically miss a meaningful proportion of sensitive entities in complex documents. Organizations that rely on general-purpose cloud tools face similar gaps. Purpose-built platforms like Limina, which achieves 99.5%+ accuracy on real healthcare data, provide the detection depth and auditability that compliance programs require.