February 2026 · 11 min read

Difference Between PHI vs PII: Definition, Examples & AI Governance

Alefiyah Bhatia
Growth Marketing Specialist


PHI and PII are closely related and often used interchangeably in compliance conversations.

That overlap leads to a common question: is PHI simply a subset of PII, or does it need to be treated differently?

Imagine a healthcare team reviewing a shared spreadsheet. It includes patient names, email addresses, appointment dates, and brief clinical notes. One person labels it personal data. Another flags it as protected health information. Both are acting in good faith, yet they reach different conclusions. The uncertainty comes from not knowing exactly where PII ends and PHI begins.

Understanding the difference between PHI and PII has always mattered for privacy and regulatory compliance. The distinction determines which rules apply, how data should be handled, and what safeguards are required, especially in healthcare and other regulated environments.

That distinction matters even more today as organizations increasingly rely on generative AI to work with real business data. Employees routinely use AI tools to summarize documents, analyze records, or draft communications. Without clear definitions and proper controls, sensitive information can be exposed in ways traditional security measures were never designed to catch.

This article explains what PII and PHI mean, how they differ, and why understanding that difference is critical in the context of HIPAA and AI governance. 

At a Glance

  • PII (Personally Identifiable Information) refers to personal data that can identify an individual and appears across industries.
  • PHI (Protected Health Information) is healthcare-related personal data regulated under HIPAA.
  • When used in AI systems, both PII and PHI can be exposed through prompts, file uploads, or model interactions, making governance, classification, and enforcement controls essential for organizations operating in regulated environments.

Deep Dive: What Is PII (Personally Identifiable Information)?

Personally Identifiable Information (PII) refers to data that can be used to identify a specific person, either directly or indirectly. Unlike PHI, PII is not tied to a single industry. It appears wherever organizations interact with individuals, from customer accounts to employee records.

The key test is not whether the data looks sensitive on its own, but whether it can reasonably point to a real person. That distinction is where confusion often begins.

A name clearly identifies someone. An email address usually does too. But many forms of PII only become identifying in context, especially when combined with other data. This is why PII classification is less about checklists and more about how information is actually used.

Examples of PII (in context)

PII commonly includes:

  • Names and contact details
  • Government-issued identifiers
  • Account, employee, or customer IDs
  • Online identifiers such as IP addresses or device IDs

On their own, some of these may appear harmless. Together, they can make an individual easy to identify.
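As a rough sketch of how identifier detection works in practice, the snippet below flags a few common PII patterns in free text. The patterns, field names, and sample text are illustrative assumptions rather than a production ruleset, and as noted above, real classification also depends on context, not just matches.

```python
import re

# Illustrative patterns only; real PII detection needs far broader coverage
# and, as noted above, an understanding of context, not just pattern matches.
PII_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "phone": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}

def find_pii(text: str) -> dict[str, list[str]]:
    """Return pattern-based PII matches keyed by identifier type."""
    hits = {}
    for label, pattern in PII_PATTERNS.items():
        matches = pattern.findall(text)
        if matches:
            hits[label] = matches
    return hits

print(find_pii("Contact Jane Doe at jane.doe@example.com or 555-123-4567."))
# {'email': ['jane.doe@example.com'], 'phone': ['555-123-4567']}
```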

Where PII Shows Up in Real Organizations

PII is embedded in everyday workflows:

  • Customer support tickets
  • HR and payroll systems
  • CRM platforms
  • Internal reporting and analytics

Because PII is so widespread, it often blends into normal business data. That familiarity can create blind spots, where information is shared or reused without fully considering privacy implications.

Why PII Is Treated Differently from PHI

PII is broadly regulated across industries, while PHI carries additional protections specific to healthcare. For PII, this usually means balancing accessibility with protection as information moves across teams, tools, and systems. That balance changes significantly once health-related data enters the picture.

What Is PHI (Protected Health Information)?

Protected Health Information (PHI) refers to personal data related to an individual’s health, healthcare services, or payment for healthcare that can be linked to a specific person. PHI is defined and regulated under HIPAA and applies only within the healthcare ecosystem and its extended partners.

PHI is not defined solely by the type of data, but by context. The same identifier can be ordinary PII in one setting and PHI in another. What matters is whether the information relates to a person’s health and is created, received, stored, or transmitted by a covered healthcare entity or its business associates.

Common Examples of PHI

PHI may include:

  • Patient names linked to medical records
  • Diagnoses, test results, or treatment notes
  • Appointment histories and admission dates
  • Medical record numbers
  • Insurance details tied to healthcare services

A name alone is PII. A diagnosis alone may not identify anyone. Together, in a healthcare context, they become PHI.
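Here is a minimal sketch of that contextual rule, using a hypothetical record structure: an identifier on its own is treated as PII, while the same identifier combined with health information handled by a covered entity or business associate is treated as PHI.

```python
from dataclasses import dataclass

@dataclass
class Record:
    # Hypothetical fields for illustration only.
    has_identifier: bool   # e.g. name, MRN, or email linked to the record
    has_health_info: bool  # e.g. diagnosis, treatment note, or claim detail
    covered_entity: bool   # created/held by a provider, insurer, or business associate

def classify(record: Record) -> str:
    """Rough classification mirroring the contextual rule described above."""
    if record.has_identifier and record.has_health_info and record.covered_entity:
        return "PHI"  # identifiable health data in a healthcare context
    if record.has_identifier:
        return "PII"  # identifiable, but not health data held by a covered entity
    return "Not directly identifying on its own"

print(classify(Record(True, False, False)))  # PII (a name in a CRM)
print(classify(Record(True, True, True)))    # PHI (a name plus a diagnosis in an EHR)
print(classify(Record(False, True, True)))   # Not directly identifying on its own
```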

Where PHI Typically Exists

PHI is most commonly found in:

  • Electronic health record (EHR) systems
  • Billing and insurance platforms
  • Care coordination tools
  • Internal reports used by clinical and administrative teams

Because PHI is tightly regulated, access to it is usually restricted by role, purpose, and necessity. However, PHI still moves across systems and teams, especially in modern, digitally connected healthcare environments.

How PHI Differs From PII in Practice

While PHI often includes elements of PII, it operates under stricter rules and narrower usage boundaries. HIPAA imposes explicit requirements governing how PHI can be accessed, used, shared, and audited. These rules apply not only to healthcare providers, but also to the insurers, vendors, and service providers that handle PHI on their behalf.

HIPAA also introduces requirements that do not typically apply to general PII, including:

  • The minimum necessary standard for access and disclosure
  • Role-based access controls
  • Detailed audit and logging obligations
  • Breach notification timelines specific to healthcare data

Other healthcare-focused regulations, such as the HITECH Act, further strengthen enforcement by expanding breach notification requirements and increasing penalties for non-compliance.

In contrast, PII is governed by a broader set of privacy laws across industries, which generally focus on transparency, consent, and reasonable safeguards, rather than prescriptive controls tied to healthcare workflows.

This is why PHI is not simply “more sensitive PII.” It is a category of data with its own legal definition, compliance expectations, and enforcement model.

| Field | PII | PHI |
| --- | --- | --- |
| What it is | Personal data that can identify an individual | Health, treatment, or payment data tied to an identifiable individual |
| Primary laws (U.S.) | CCPA, state privacy laws | HIPAA, HITECH Act |
| Who is regulated | Most organizations handling personal data | Providers, insurers, and their vendors (business associates) |
| Enforcement exposure | Regulatory fines, lawsuits, settlements | Civil penalties, audits, corrective action plans |
| Financial penalties | Up to $7,500 per intentional violation (CCPA); up to 4% of global annual revenue under GDPR | Up to ~$1.9M per year per violation category under HIPAA, plus remediation and reporting costs |
| AI governance impact | Requires controls to prevent unauthorized sharing in AI tools | Requires strict limits, auditability, and often restricted AI usage due to healthcare obligations |
| Treatment in traditional DLP | Detected using pattern matching (IDs, emails); often over- or under-blocked | Poorly handled; clinical context is usually missed, leading to false negatives or excessive blocking |
| Treatment in Wald endpoint DLP | Classified in context at the endpoint, allowing policy-based controls without disrupting workflows | Context-aware detection and enforcement, with auditability aiding healthcare compliance requirements |

From Data Type to Regulation: Why the Distinction Matters

Knowing whether data is PII or PHI is only the first step. The classification determines which laws apply, what obligations follow, and how organizations are expected to protect and govern that data.

PII and PHI are not governed by a single global standard. Instead, different regulations apply based on:

  • What the data contains
  • How it is used
  • Who is handling it
  • Where the individual is located
  • Whether the data relates to healthcare

This is why the same dataset can trigger different compliance requirements depending on context. Once data is classified as PII or PHI, it immediately maps to specific regulatory frameworks, most commonly GDPR, CCPA, or HIPAA.
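As a simplified illustration of that mapping, the sketch below returns the frameworks discussed in this article based on classification and context. The inputs are hypothetical, and real applicability analysis involves overlaps, exemptions, and legal review that a few lines of code cannot capture.

```python
def applicable_frameworks(classification: str, eu_data_subject: bool,
                          california_resident: bool) -> list[str]:
    """Illustrative mapping from data classification and context to the
    regulations covered in this article. Not legal advice."""
    frameworks = []
    if classification == "PHI":
        frameworks.append("HIPAA")  # HITECH strengthens breach notification and enforcement
    if eu_data_subject:
        frameworks.append("GDPR")   # applies to personal data of individuals in the EU
    if california_resident:
        frameworks.append("CCPA")   # subject to exemptions this sketch ignores
    return frameworks

print(applicable_frameworks("PII", eu_data_subject=True, california_resident=False))
# ['GDPR']
print(applicable_frameworks("PHI", eu_data_subject=False, california_resident=False))
# ['HIPAA']
```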

The sections below summarize how each of these regulations differs, so you can identify both the data type and the applicable regulation at the same time.

GDPR, CCPA, and HIPAA: Key Differences

Before looking at real-world scenarios, it helps to understand how the most common privacy and healthcare regulations differ at a high level.

GDPR

  • Applies to personal data of individuals located in the EU, regardless of where the organization is based.
  • Governs personal data (PII) broadly across industries, with additional protections for sensitive data.
  • Emphasizes lawful processing, transparency, and individual rights.

CCPA

  • Applies to certain businesses handling personal data of California residents.
  • Governs personal information (PII) across industries, focusing on disclosure and consumer control.
  • Requires reasonable security practices and breach accountability.

HIPAA

  • Applies to healthcare providers, insurers, and the business associates that handle PHI on their behalf.
  • Governs PHI, not general personal data.
  • Enforces strict rules on access, use, auditing, and disclosure.

Can You Identify the Data and the Applicable Regulation?

Use the table below to test how PII and PHI map to GDPR, CCPA, and HIPAA in common situations. As you read each scenario, try to identify the classification and regulation before looking at the answer.

| Scenario | Classification | Primary regulation |
| --- | --- | --- |
| Employee names and work emails | PII | GDPR, CCPA |
| Patient names with appointment dates | PHI | HIPAA |
| Insurance policy numbers used for claims | PHI | HIPAA |
| Diagnosis codes without direct identifiers | Potentially PHI | HIPAA |
| CRM records with names and emails | PII | GDPR, CCPA |
| Combined datasets revealing treatment history | PHI | HIPAA |

Note: Some healthcare data may be subject to multiple regulations. The table above lists the primary regulation that governs handling and enforcement in each scenario.

Why PHI vs PII Matters for AI Governance

The risk around PHI and PII in AI systems is not theoretical. It is already being documented.

Industry reporting and compliance analyses show that healthcare staff routinely upload patient information, including PHI, into consumer AI tools and cloud services to summarize notes, draft communications, or analyze data. These tools often operate outside healthcare compliance requirements and do not provide Business Associate Agreements under HIPAA.

According to reporting from HIPAA Journal and healthcare security vendors, this behavior has led to documented HIPAA compliance failures tied specifically to AI usage. In these cases, violations occurred not because systems were breached, but because PHI was processed in environments without appropriate safeguards.

This distinction matters. Uploading PHI into a non-compliant AI system can constitute a HIPAA violation even if the data is never accessed by an external attacker. Controls that are acceptable for PII do not meet the requirements imposed on PHI.

AI governance exists to address this gap. When AI tools treat all input as interchangeable text, but regulations do not, organizations need governance controls that reflect the difference between PHI and PII in everyday AI workflows.

Why Traditional DLP Breaks in AI Workflows

Traditional Data Loss Prevention (DLP) tools were designed around three core assumptions:

  • Sensitive data lives in known locations

  • Data moves through predictable channels

  • Risk can be detected primarily through static patterns

Modern AI workflows violate all three assumptions.

Sensitive information now appears transiently in prompts, is rewritten into natural language, and is transmitted through browser-based and unsanctioned AI tools, often outside centralized visibility. As a result, controls built for data at rest or in transit struggle to govern data at the point of interaction, where AI risk actually emerges.

Why this matters
AI turns sensitive data into contextual text that moves dynamically across tools and endpoints. Traditional DLP was not designed to detect or control risk at this moment, creating a gap between policy and reality.
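To make that gap concrete, here is a small, illustrative example: a static pattern of the kind a traditional DLP rule might use fires on a structured export, but misses the same information once it has been rewritten into a natural-language prompt. The pattern and the sample text are assumptions for illustration only.

```python
import re

# The kind of static pattern a traditional DLP rule might rely on (illustrative).
MRN_PATTERN = re.compile(r"\bMRN[:\s]*\d{6,10}\b")

structured_export = "Patient: J. Rivera, MRN: 00482913, Dx: E11.9"
rewritten_prompt = (
    "Summarize the care plan for the patient we admitted Tuesday, "
    "Mr. Rivera, who was just diagnosed with type 2 diabetes."
)

print(bool(MRN_PATTERN.search(structured_export)))  # True  -> rule fires
print(bool(MRN_PATTERN.search(rewritten_prompt)))   # False -> same PHI, no match
```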

What Effective AI Governance Requires at the Endpoint

AI risk emerges at the moment users interact with AI tools. Governance has to exist there too.

For organizations handling PHI and other regulated data, effective AI governance at the endpoint comes down to three requirements.

1. Context-Aware Classification

Governance controls must understand whether data is PII or PHI, not just match patterns. Context determines risk.

2. Enforcement Before Data Leaves the Device

Controls must operate at the point of interaction, before prompts, uploads, or generated outputs reach external AI systems.

3. Visibility Into Unsanctioned AI Use

Governance must account for shadow AI, where users interact with browser-based or personal AI tools outside approved platforms.

What This Enables

When these requirements are met, organizations can allow AI usage while maintaining control over how sensitive data is used, shared, and audited.
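As a conceptual sketch only, and not a description of any vendor's implementation, the snippet below shows what a pre-send policy check at the endpoint could look like: the prompt is classified, unsanctioned tools are surfaced, and enforcement happens before anything leaves the device. The function names and policy choices are hypothetical.

```python
# Hypothetical policy evaluated before a prompt is sent to an external AI tool.
BLOCKED_CLASSIFICATIONS = {"PHI"}    # assume PHI may never leave the device
REDACTED_CLASSIFICATIONS = {"PII"}   # assume PII is redacted, then allowed

def evaluate_prompt(prompt: str, classification: str, tool_sanctioned: bool) -> str:
    """Decide what happens to a prompt at the point of interaction."""
    if not tool_sanctioned:
        return "log_and_block"       # shadow AI: surface it, don't send it
    if classification in BLOCKED_CLASSIFICATIONS:
        return "block"               # enforcement before data leaves the device
    if classification in REDACTED_CLASSIFICATIONS:
        return "redact_then_allow"
    return "allow"

print(evaluate_prompt("Summarize Mr. Rivera's discharge notes", "PHI", tool_sanctioned=True))
# block
print(evaluate_prompt("Draft a reply to this customer email", "PII", tool_sanctioned=True))
# redact_then_allow
```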

How Wald.ai Supports These Requirements

Wald.ai supports AI governance at the endpoint by applying context-aware classification to distinguish between PII and PHI, enforcing controls before data is shared with external AI tools, and providing visibility into unsanctioned or browser-based AI usage. This allows organizations to govern real AI interactions as they happen, rather than relying on static policies or post-hoc detection.

Conclusion

Using high-accuracy AI redaction tools is another way for organizations to ensure their sensitive business data does not find its way into public LLMs.
After all, the difference between PHI and PII is not just a matter of definitions. It determines which regulations apply, what controls are required, and how organizations can safely use AI in regulated environments. As AI becomes part of everyday workflows, misclassifying data or relying on legacy controls creates real compliance risk. Effective AI governance starts with understanding these distinctions and enforcing them at the point where AI is actually used.

PHI vs PII: Frequently Asked Questions

Is PHI a subset of PII?

PHI often contains PII, but it is regulated separately. PHI is healthcare-related data governed by HIPAA, while PII is governed by broader privacy laws.

Can PII become PHI?

Yes. PII becomes PHI when it is linked to healthcare treatment, diagnosis, or payment and can identify an individual.

Does HIPAA apply to AI tools like ChatGPT?

HIPAA applies to how organizations handle PHI, not to AI tools themselves. Uploading PHI into AI systems without a Business Associate Agreement can still violate HIPAA.

Does GDPR apply to healthcare data?

Yes. Under GDPR, health data is classified as sensitive personal data and is subject to stricter protections.

Why does traditional DLP struggle with PHI in AI workflows?

Traditional DLP relies on static patterns and predictable data flows, while AI workflows transform sensitive data into contextual text at the endpoint.

Secure Your Employee Conversations with AI Assistants
Book A Demo