June 12, 2026

Expert Determination for Research Data: FDA, IRB, and HIPAA Requirements

While HIPAA Safe Harbor works for routine analytics, complex clinical research often demands HIPAA Expert Determination to satisfy overlapping FDA, IRB, and multi-site data requirements. Under this method, a qualified independent statistician quantitatively evaluates re-identification risk while preserving vital clinical data.

Limina

Your research team has spent months building a dataset from patient records across five hospitals. The IRB application is drafted, the FDA submission is three weeks out, and your legal team has just flagged a problem: Safe Harbor de-identification alone may not satisfy the evidentiary standard your reviewers will expect.

This is not a hypothetical. FDA submissions, IRB protocols, and multi-site research data sharing agreements are each governed by distinct—and overlapping—requirements for how Protected Health Information (PHI) must be handled before it can be used in research. In many of these scenarios, HIPAA expert determination is not optional. It is the expected standard.

What is expert determination in a research context? Under 45 CFR §164.514(b)(1), HIPAA expert determination is a de-identification method in which a qualified statistician applies generally accepted statistical principles to assess the risk that a dataset could be used to identify an individual. The statistician must determine that this risk is "very small" and document their methods and results in a written report. For research data, this standard is often required—not merely preferred—because Safe Harbor's blunt removal of 18 identifiers frequently destroys the clinical specificity that makes research data valuable.

This article walks through the specific requirements that FDA, IRBs, and HIPAA impose on research data de-identification, and explains when expert determination is the right—or only—path forward.

When FDA requires statistical de-identification validation

The FDA does not have a single, uniform rule that requires expert determination by name. However, two regulatory frameworks create strong practical pressure toward it.

21 CFR Part 11 and data integrity requirements

21 CFR Part 11 governs electronic records and electronic signatures in FDA-regulated research. It requires that systems and processes producing research data demonstrate auditability, accuracy, and integrity. When de-identification is part of a data preparation pipeline for a clinical study or drug approval submission, Part 11's documentation requirements effectively require a demonstrable, auditable methodology—the kind that a well-structured expert determination report provides. Safe Harbor, which is process-based rather than evidence-based, cannot produce the statistical documentation that Part 11's audit trail expectations imply.

Real-World Evidence guidance and de-identification standards

FDA's Real-World Evidence (RWE) framework, outlined in guidance documents issued under the 21st Century Cures Act, introduces specific expectations for how real-world data (RWD)—including EHR data, insurance claims, and patient registries—is prepared for regulatory review. The guidance emphasizes the need for "fit-for-purpose" data quality and calls for documented methodology when sensitive patient data is included in RWE studies.

In practice, FDA reviewers evaluating RWE submissions increasingly expect applicants to demonstrate that de-identification was performed rigorously and that re-identification risk has been quantitatively assessed. Safe Harbor's categorical removal of 18 identifiers satisfies HIPAA technically, but it does not produce the statistical artifact that demonstrates risk quantification. Expert determination does.

Key requirements for FDA-facing research data de-identification:

  • Documented, reproducible methodology
  • Quantitative assessment of re-identification risk
  • Independent expert certification with stated credentials
  • Audit-ready output tied to the specific dataset and time period
  • Written report retained as part of the regulatory submission package

IRB requirements for de-identified research data

Institutional Review Boards (IRBs) are governed by the Common Rule (45 CFR Part 46) and, for HIPAA-covered entities, by the Privacy Rule. When a researcher seeks an IRB waiver of authorization (the standard mechanism for using PHI in research without individual consent), the quality of de-identification is a central consideration.

When an IRB waiver depends on de-identification quality

Under 45 CFR §164.512(i), a covered entity may use or disclose PHI for research without individual authorization if an IRB or Privacy Board has granted a waiver. One pathway to that waiver is demonstrating that the research involves no more than minimal risk to subjects' privacy. The strength of your de-identification methodology directly influences whether an IRB will agree that privacy risk is minimal.

IRBs vary significantly in how rigorous their review is. Smaller academic IRBs may accept a Safe Harbor attestation. Large research hospitals, NIH-funded programs, and multi-site studies increasingly expect documented statistical validation. A well-constructed expert determination report—covering methodology, risk quantification, and expert credentials—strengthens any IRB application and reduces the likelihood of a request for additional information, a delay that can cost months.

What IRBs look for in de-identification documentation

| IRB review criterion | Safe Harbor satisfies? | Expert determination satisfies? |
|---|---|---|
| De-identification method documented | Yes: method is defined by the HIPAA Privacy Rule | Yes: plus quantitative risk analysis |
| Re-identification risk quantified | No: categorical removal only | Yes: core deliverable of the report |
| Independent expert certification | Not required | Yes: the expert's certification is what defines the method |
| Data utility preserved for research | Often not: removal is blunt | Yes: statistical approach preserves usable fields |
| Suitable for longitudinal or rare disease data | Rarely | Yes: designed for complex datasets |
| Audit-ready report available for IRB file | No formal report | Yes: written report is the deliverable |

HIPAA expert determination for multi-site research

Multi-site research—studies that aggregate patient data from multiple covered entities—creates additional de-identification complexity that Safe Harbor is not designed to handle.

The aggregation problem in multi-site datasets

Safe Harbor removes 18 specific identifiers. But when data from multiple institutions is combined, seemingly innocuous fields—rare diagnoses, unusual procedure codes, geographic data generalized only to state level—can become identifying in combination. The statistical re-identification risk in a merged multi-site dataset is higher than in any single institution's data, even after Safe Harbor removal.

Expert determination addresses this directly. A qualified statistician assesses re-identification risk in the combined dataset, accounting for population rarity, data richness, and the specific combination of fields, and documents whether risk is very small. This is why data sharing agreements in multi-site research frequently specify expert determination for sensitive datasets.
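To make the combination effect concrete, here is a minimal Python sketch that counts how many records share the same values for a set of fields, which is the idea behind k-anonymity-style checks. It is a toy illustration only: the records, field names, and helper function are hypothetical, and an expert's actual analysis would use population data, richer risk models, and their own judgment about what counts as "very small".

```python
from collections import Counter

def smallest_group(records, fields):
    """Size of the smallest group of records sharing the same values for `fields`."""
    counts = Counter(tuple(r[f] for f in fields) for r in records)
    return min(counts.values())

# Hypothetical merged extract after direct identifiers were removed.
records = [
    {"state": "OH", "birth_year": 1980, "dx": "E11"},
    {"state": "OH", "birth_year": 1980, "dx": "I10"},
    {"state": "OH", "birth_year": 1975, "dx": "E11"},
    {"state": "OH", "birth_year": 1975, "dx": "I10"},
    {"state": "PA", "birth_year": 1980, "dx": "E11"},
    {"state": "PA", "birth_year": 1980, "dx": "I10"},
    {"state": "PA", "birth_year": 1975, "dx": "E11"},
    {"state": "PA", "birth_year": 1975, "dx": "I10"},
]

# Each field alone hides a record among 4 people...
for field in ("state", "birth_year", "dx"):
    print(field, smallest_group(records, [field]))  # 4 each time

# ...but the three fields together single out every record.
combined = smallest_group(records, ["state", "birth_year", "dx"])
print("combined", combined)  # 1
```

A smallest group of 1 means at least one record is unique on those fields. In this toy data every record is unique once the fields are combined, even though no single field looks risky on its own.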

Data use agreements and covered entity obligations

When a covered entity shares data under a Data Use Agreement (DUA) for research, it retains responsibility for ensuring the data was de-identified in compliance with HIPAA before sharing. If the receiving party later uses the data in a way that causes a breach or re-identification event, the originating covered entity may face OCR scrutiny of its de-identification methodology.

Expert determination—with a documented, signed report from a qualified independent statistician—provides a defensible record that the covered entity met its HIPAA obligations. A Safe Harbor checklist does not offer the same evidentiary weight in an OCR investigation or litigation context.

Safe Harbor vs expert determination for research data

Both methods are HIPAA-compliant. The question is which is appropriate for your specific research use case. The answer depends on what you're doing with the data and who's reviewing it.

| Factor | Safe Harbor | Expert determination |
|---|---|---|
| How it works | Remove all 18 HIPAA identifiers | Statistician assesses re-identification risk quantitatively |
| Who performs it | Your team, following the HIPAA Privacy Rule | Qualified independent statistician |
| Output | Compliant dataset + process documentation | Written report with methodology, risk assessment, certification |
| Data utility | Often significantly reduced | Preserves more data, structured around what's safe to keep |
| Suitable for FDA submissions | May not meet evidentiary standard | Yes: provides audit-ready statistical validation |
| Suitable for IRB waiver applications | Depends on IRB rigor | Strongest possible documentation for IRB review |
| Multi-site aggregated data | Insufficient for combined datasets | Designed for complex, aggregated datasets |
| Rare disease or longitudinal studies | Often destroys clinical value | Preserves longitudinal and rare-population data |
| Audit defensibility | Limited | High: signed expert report is a legal artifact |

The general principle: Safe Harbor is appropriate for routine operational uses of de-identified data—analytics, training AI models on common disease populations, reporting. Expert determination is appropriate—and often required—when the data will be reviewed by a regulatory body, shared under a DUA, or used in longitudinal or rare-population research where data specificity matters.
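A small Python sketch shows why longitudinal research in particular loses value under Safe Harbor. It covers only two of the 18 identifier categories (dates reduced to the year, ages over 89 aggregated), and the helper names and patient timeline are hypothetical illustrations, not a complete or authoritative Safe Harbor implementation.

```python
from datetime import date

def safe_harbor_year(d: date) -> int:
    """Safe Harbor keeps no date element more specific than the year."""
    return d.year

def safe_harbor_age(age: int) -> str:
    """Ages over 89 are aggregated into a single '90 or older' category."""
    return "90+" if age >= 90 else str(age)

# Hypothetical patient timeline (illustrative values only).
diagnosis = date(2023, 1, 10)
relapse = date(2023, 11, 28)

raw_interval_days = (relapse - diagnosis).days
generalized = (safe_harbor_year(diagnosis), safe_harbor_year(relapse))

print(raw_interval_days)    # 322
print(generalized)          # (2023, 2023): time-to-relapse is no longer recoverable
print(safe_harbor_age(93))  # 90+
```

The raw data supports a time-to-relapse analysis. The generalized data shows only that both events happened in the same year. Under expert determination, finer-grained timing may be retained where the statistician documents that re-identification risk stays very small.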

What researchers need from a de-identification platform

Expert determination begins with clean, well-documented de-identification. The quality of the statistician's analysis depends entirely on the quality of the inputs. A de-identification platform used in research contexts must meet four requirements:

  • High accuracy on clinical data. General-purpose cloud tools detect 60–70 percent of PHI in real clinical datasets. That miss rate is not acceptable in research or regulatory contexts. Limina's purpose-built models achieve 99.5 percent or higher accuracy on real healthcare data—meaning the de-identified output the expert receives is genuinely clean, not nominally clean.
  • Complete audit trails. Every de-identification run must produce a documented record of what was detected, what was redacted, and what methodology governed the process. This documentation becomes part of the expert's supporting evidence and may be required by the IRB or FDA.
  • Expert-ready output structure. The de-identified dataset must be formatted in a way that allows the statistician to perform their re-identification risk analysis efficiently. This includes preserved metadata, field-level documentation, and output formats the expert's statistical tools can consume.
  • Deployment that protects data. Research data—especially multi-site PHI—cannot leave institutional infrastructure. The de-identification platform must deploy within your environment (in-VPC or on-premises), not via a cloud API that routes data to a third-party server.
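As an illustration of the audit-trail requirement above, the Python sketch below builds one record per de-identification run. The schema, field names, and `build_audit_record` helper are assumptions made for this example and do not describe Limina's actual output format. The record ties the run to specific input and output content through hashes, and stores detection types and offsets rather than the detected PHI itself.

```python
import hashlib
import json
from collections import Counter
from datetime import datetime, timezone

def sha256_hex(text: str) -> str:
    """Fingerprint that ties the record to one exact version of the content."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def build_audit_record(source_text, redacted_text, detections, method_version):
    """Assemble a hypothetical audit record for one de-identification run."""
    return {
        "run_at": datetime.now(timezone.utc).isoformat(),
        "method_version": method_version,
        "input_sha256": sha256_hex(source_text),
        "output_sha256": sha256_hex(redacted_text),
        "detections_by_type": dict(Counter(d["type"] for d in detections)),
        # Offsets and types only: the audit record must not contain PHI.
        "detections": [
            {"type": d["type"], "start": d["start"], "end": d["end"]}
            for d in detections
        ],
    }

# Hypothetical note and detections (illustrative values only).
source = "Jane Roe was seen on 2023-01-10."
redacted = "[NAME] was seen on [DATE]."
found = [
    {"type": "NAME", "start": 0, "end": 8},
    {"type": "DATE", "start": 21, "end": 31},
]

record = build_audit_record(source, redacted, found, method_version="example-1.0")
print(json.dumps(record, indent=2))
```

Serializing the record as JSON keeps it easy to archive alongside the dataset and to hand to the expert, the IRB, or a regulator as supporting evidence.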

How Limina supports research data de-identification

For pharma and life sciences organizations and academic medical centers preparing data for FDA submissions or IRB review, Limina's data de-identification platform handles the de-identification layer that precedes the expert's statistical analysis—producing the clean, structured, audit-ready output the report depends on.

Limina deploys in-VPC or on-premises, ensuring that research data never leaves your controlled infrastructure during the de-identification process. It detects and redacts PHI across unstructured data formats—clinical notes, EHR exports, research transcripts, PDF reports—with accuracy that meets the evidentiary standard expert determination requires.

The platform produces structured, audit-ready outputs—including field-level detection logs, redaction reports, and dataset summaries—in the format independent statisticians need to perform re-identification risk analysis efficiently. Limina also works with a partner network of qualified independent experts who produce expert determination reports specifically structured for FDA, IRB, and HIPAA audit review.

Ready to prepare your research data for FDA, IRB, and HIPAA review?

Limina's data de-identification platform produces audit-ready outputs structured for expert determination—deployed within your infrastructure, built for the accuracy standards that research and regulatory contexts demand.

Whether you're preparing a Real-World Evidence submission, an IRB application, or a multi-site data sharing package, Limina provides the de-identified input your independent expert needs to certify re-identification risk with confidence.

Get a demo—talk to us about your specific dataset and research use case.

Frequently Asked Questions

Does FDA require expert determination for all research submissions?

FDA does not mandate expert determination by name in a single universal rule. However, 21 CFR Part 11’s data integrity requirements and the FDA’s Real-World Evidence guidance framework create strong practical pressure toward statistical validation of de-identification for submissions that include patient-level data. Reviewers increasingly expect quantitative re-identification risk documentation—which Safe Harbor alone cannot provide.

Can an IRB accept Safe Harbor instead of expert determination?

Yes, some IRBs—particularly at smaller institutions or for lower-risk studies—will accept a Safe Harbor attestation for an authorization waiver. However, larger research programs, NIH-funded multi-site studies, and studies involving rare diseases or longitudinal data increasingly expect expert determination because it provides quantitative risk documentation that Safe Harbor cannot. The safest approach is to confirm your specific IRB’s requirements before committing to a method.

Why is Safe Harbor often insufficient for rare disease research?

Safe Harbor removes 18 identifiers categorically. In rare disease populations, the combination of a rare diagnosis code with any geographic or demographic detail may be sufficient to identify a patient—even after all 18 identifiers are stripped. Expert determination is designed for this problem: a statistician assesses re-identification risk in the specific dataset and population context, and can certify that risk remains very small while preserving the clinical specificity the research requires. Safe Harbor cannot make that determination.

Who qualifies as an expert for HIPAA expert determination?

Under 45 CFR §164.514(b)(1), the expert must be a person with appropriate knowledge of and experience with generally accepted statistical and scientific principles and methods for rendering information not individually identifiable. In practice, this means a statistician, epidemiologist, or data scientist with demonstrated experience in re-identification risk analysis, privacy-preserving methods, and health data. The regulation does not name a specific degree or certification, and it does not itself spell out an independence requirement. Research and regulatory reviewers nonetheless generally expect the expert to be independent of the covered entity, so a report produced by your organization’s own data team is unlikely to carry the same weight.

Does multi-site data sharing require expert determination?

Expert determination is not universally required by statute or regulation, but it is the expected standard for multi-site research data sharing in most regulated contexts. When patient data from multiple institutions is merged, re-identification risk increases due to data richness and population rarity effects. Data Use Agreements governing multi-site sharing frequently specify statistical de-identification validation. Covered entities that share data under a DUA remain responsible for the quality of de-identification, making expert determination the defensible choice.

How does de-identification quality affect the expert’s report?

Expert determination is only as strong as the de-identification that precedes it. If the underlying de-identification missed PHI (which general-purpose cloud tools do at rates of 30–40 percent), the statistician’s risk analysis will rest on a dataset that still contains identifiers, and the resulting report may not withstand regulatory scrutiny. A de-identification platform that achieves 99.5 percent or higher accuracy on real healthcare data produces the clean, structured input the expert needs to certify re-identification risk with confidence.

Can expert determination be used retroactively on existing datasets?

Yes. Expert determination can be applied to datasets that were previously de-identified using Safe Harbor or other methods. However, the expert will need access to the original de-identification methodology documentation and the resulting dataset to perform their analysis. If the original de-identification was performed poorly—with high miss rates or inadequate documentation—retroactive expert determination may require re-running the de-identification process with a more accurate platform before the statistician can certify that re-identification risk is very small.