NIH Genomic Data Sharing Policy: NIH Request for Public Comments

The Secretary’s Advisory Committee on Human Research Protections (SACHRP) submits these public comments in response to the National Institutes of Health (NIH) Request for Information (RFI) on Proposed Updates and Long-Term Considerations for the NIH Genomic Data Sharing Policy (GDS Policy), NOT-OD-22-029.

I. Maximizing Data Sharing while Preserving Participant Privacy and Preferences (#1-4)

1. De-identification. The risks and benefits of expanding de-identification options, including adding the expert determination described at HIPAA 45 CFR 164.514 (b)(1) (the HIPAA Privacy Rule), as an acceptable method for de-identification under the GDS Policy, and whether other de-identification strategies exist that may be acceptable in lieu of HIPAA standards.

We have five recommendations, primarily for harmonization purposes.

  1. We recommend NIH harmonize de-identification options by adding the Expert Determination method described in the HIPAA Privacy Rule (45 CFR 164.514(b)(1)).  Entities covered in whole or in part by HIPAA (HIPAA Covered Entities and their Business Associates) use this permissible method of de-identification, which requires the following safeguards:
    1. A person with appropriate knowledge of and experience with generally accepted statistical and scientific principles and methods for rendering information not individually identifiable:
      1. Applying such principles and methods, determines that the risk is very small that the information could be used, alone or in combination with other reasonably available information, by an anticipated recipient to identify an individual who is a subject of the information; and
      2. Documents the methods and results of the analysis that justify such determination. Id.1
  2. As NIH is updating its GDS Policy, we recommend a small edit in the reference to the HIPAA Safe Harbor method of de-identification to achieve NIH’s intended harmonization.  The NIH GDS Policy currently states that “data should be de-identified to meet the definition for de-identified data in the HHS Regulations for Protection of Human Subjects and be stripped of the 18 identifiers listed in the HIPAA Privacy Rule” [citing 45 CFR 164.514(b)(2)].  We recommend revising the sentence as follows (edits shown in italics):

    “The final GDS Policy has been clarified to state that for the purpose of the Policy, data should be de-identified to meet the definition for de-identified data in the HHS Regulations for Protection of Human Subjects and be stripped (for the individual, their relatives or household members, or employers) of the 18 identifiers listed in the HIPAA Privacy Rule, and the HIPAA covered entity does not have actual knowledge that the information could be used alone or in combination with other information to identify an individual who is a subject of the information.”

    We recommend these edits for (i) consistency with the HIPAA citation and (ii) clarity to the regulated community, particularly since genomic datasets may involve individuals’ family members.  In addition, NIH is considering expanding dataset linkages in the future, and Safe Harbor de-identification requires that the Covered Entity does not have actual knowledge that the combined information could be used to identify the individual.  NIH and Office for Civil Rights (OCR) guidance is recommended to assure Covered Entities that they may meet the Safe Harbor de-identification standard even if NIH GDS Policy expands data linkages in the future.

  3. For harmonization and clarity to the regulated community, we recommend that NIH and OCR provide guidance on how NIH’s definition of “human genomic data” compares to HIPAA’s definition of “genetic information” in the 2013 Omnibus Rule.  45 CFR 160.103.  Such guidance will help institutions comply with both sets of requirements and will facilitate research.
  4. In the updated policy, we recommend that NIH clarify and update the principle in the RFI that states: “NIH relies on robust protections beyond de-identification, such as Institutional Review Board (IRB) consideration of risks associated with data submission….”  This sentence assumes an IRB will review data submissions to NIH under the GDS Policy.  However, the Revised Common Rule’s (2018 CR) new definitions and exemptions no longer require IRB review for some activities.  Particularly if NIH permits data linkage to other datasets (including, for example, data gathered in public health surveillance activities, or clinical records), an IRB may not be required to conduct a review, either because the activity has been deemed not to be research or because the research is exempt from the regulatory requirements, including IRB review. 45 CFR 46.102(l)(2); 46.104(d)(4)(ii)-(iv).  The updated GDS Policy should take into account that an IRB may no longer have jurisdiction over the use of some datasets prior to submission under GDS Policy.  If NIH believes it is important for there to be some level of IRB review for all data submitted under GDS Policy, we recommend it work together with OHRP to harmonize the 2018 CR with NIH GDS Policy (including through guidance to the regulated community).
  5. We recommend that NIH prioritize, and continue to prioritize, principles to foster trust across broad communities, including many communities traditionally underrepresented in research and in research data repositories.  As one example, this principle would include consideration of additional protections that may apply to de-identified data, such as Tribal laws, regulations, and policies.  NIH has articulated many of these concerns in its recent Request for Public Comments on DRAFT Supplemental Information to the NIH Policy for Data Management and Sharing: Responsible Management and Sharing of American Indian/Alaska Native Participant Data (NOT-OD-22-064) (“Notice”).  We recommend NIH extend engagement strategies and actions to additional communities that traditionally have been underrepresented in research.  As examples of engagement, the Notice described a “multi-year, stepwise process for seeking feedback from the community,” both during policy development and over the course of policy implementation.  The engagement included, in part, NIH proactively initiating consultations that led to NIH public reports, which shared in a transparent manner the input received on needs of the community and NIH’s response and planned actions.  It is important that this is an ongoing process, as also recognized in the Notice.  Another resource that we would point to for meaningful stakeholder engagement is found at  We also recommend NIH consider how to ensure the interests and needs of diverse communities are addressed during the decision-making process of the Data Access Committee, which reviews and approves or disapproves requests from extramural and intramural researchers.  We believe these types of actions are important to develop public trust in future changes to the GDS Policy, particularly if NIH permits linkage of datasets or the use of potentially identifiable information, in addition to de-identified genomic data.

2. Use of Potentially Identifiable Information. The circumstances under which submission of data elements considered potentially identifiable to repositories under the GDS Policy would be acceptable, any additional protections (including for security) that would be warranted, and whether there is certain potentially identifiable information that would not be acceptable to submit.

We have four recommendations. 

  1. Based on the RFI’s recognition of the value of dates and certain geographic information for research, we recommend that if NIH permits such information in its updated GDS Policy, it should harmonize with HIPAA’s definition and safeguards for Limited Data Sets.  Limited Data Sets are identifiable information under HIPAA but are restricted to dates associated with an individual (e.g., dates of birth and death, treatment dates) and certain geographic information (e.g., zip codes), subject to many safeguards including that the recipient will not identify or contact the individual.  45 CFR 164.514(e)(4)(ii).  In addition, since NIH would be among the recipients of the Limited Data Sets from institutions, it should confirm in its policy that NIH (as a recipient) would meet the requirement to “not identify” the individuals in the Limited Data Sets.  Clarity on this point will facilitate institutions’ compliance and willingness to provide data, in the event NIH permits disclosure of Limited Data Sets in its updated policy.2
  2. The broader principle for NIH to consider is that if it expands its policy to permit potentially identifiable information (such as Limited Data Sets), whether through data linkages or otherwise, then consent from individuals may not be obtained.  The RFI states: “NIH remains committed to the principles espoused by the GDS Policy of maximizing scientific advances and public benefit by sharing genomic data and associated phenotypic data in a manner consistent with participants’ informed consent.”  HIPAA does not require individuals to give permission to use or disclose a Limited Data Set for research purposes.  NIH should therefore determine whether it remains committed to informed consent in all circumstances or whether it believes that other safeguards (such as privacy protections for a Limited Data Set) are sufficient.
  3. If NIH expands its policy to permit potentially identifiable information, then an additional protection that may be warranted in appropriate circumstances is an opt-in by individuals to permit secondary research use of such information.  This safeguard would respect individual autonomy and would harmonize GDS Policy with HIPAA, enabling medical centers that hold much of this information to comply more readily.  For more information, please see SACHRP’s prior recommendations on compound (dual-purpose) authorizations and secondary-research authorizations for use of identifiable information (“Future/Secondary Research”).
  4. In addition, we recommend that NIH consider the Precision Medicine Initiative Privacy and Trust Principles if it expands its policy to include potentially identifiable data.  An additional set of resources that NIH may wish to consult is the All of Us consent form and HIPAA authorization templates, which explain in plain language the types of identifiable data that a participant is agreeing to contribute (e.g., new data gathered from an individual, data gathered from the longitudinal medical record, and data collected from other sources) and the range of possible secondary research uses.

3. Data Linkage. Whether the GDS Policy should permit data linkage between datasets that meet GDS Policy expectations (e.g., data obtained with consent for research use and de-identification), and whether the GDS Policy should support such linkages to datasets that do not meet all GDS Policy expectations (e.g., data may have come from a clinical setting, may not have been collected with consent, may retain certain potentially identifiable information). Feedback is also requested on risks and benefits to any such approaches.

We have six recommendations.

  1. NIH should clarify who is expected to conduct data linkages and how, given that existing data submissions under GDS Policy must be de-identified and future submissions may be de-identified.  The RFI is unclear about this, and linkage requires someone to retain an identifier.  If NIH expects each research institution to engage a third party to perform data linkages, we recommend NIH work with the research community to develop a model contract template between the institution and such third parties, including limitations on the third parties’ downstream uses (commercial or otherwise) of de-identified or potentially identifiable data intended for submission to NIH under this policy.  Such protections against downstream and/or commercial uses by third parties performing these services are essential to maintain public trust in the NIH’s GDS Policy and its implementation.  In parallel, NIH should provide guidelines to the research community about the standards that the third parties must adhere to, and about transparency in consent forms so that participants understand the third parties’ role in data linkage under GDS Policy.  Such model templates and guidelines may help institutions to be more willing to conduct data linkages and contribute linked data.
  2. Most institutions contributing data to NIH will be medical centers that are also HIPAA Covered Entities.  NIH policy should recognize that a Business Associate Agreement normally will be needed to engage a third party to conduct data linkages (e.g., if data are included from medical records, or other data are considered to be Protected Health Information by the Covered Entity).
  3. NIH should clarify if it will review possible data linkages for consent, privacy, and security protections, or whether each research institution is expected to do so for the data it submits to NIH.   Until NIH policy on data linkage is clarified, the institution may not know what linkages are planned, and some research uses of datasets no longer require IRB review under the 2018 CR, as discussed above.  In addition, in the event NIH plans to perform any data linkages itself, institutions may be more willing to share data if they are not punished for supplying data that NIH then links to other datasets.
  4. The broader principle for NIH to consider is that if it expands its policy to permit data linkage, then consent from individuals may not be obtained, and the RFI emphasizes that NIH remains committed to consent (as explained above in Sec. 2, recommendation 2).  SACHRP is concerned that data linkages may raise issues of trust, given NIH’s existing policy of using only de-identified data (which cannot be used to reidentify individuals).  At the same time, NIH should carefully evaluate the usefulness of linking past data collections (which may not have had consent for GDS Policy purposes) with current data; NIH should avoid a blanket prohibition on the use of such past data when it is highly valuable for scientific advances and human health, including for both common and rare diseases.  The questions of when the use of such data is highly valuable for scientific and health purposes, and who decides, through what process, merit further attention.  We understand that NIH has experience implementing somewhat similar determinations under the original GDS Policy, which has an exception permitting use of data without consent for “compelling scientific reasons,” as determined by the NIH Funding Institute or Center.3  In light of the volume, breadth, and depth of data about an individual that is currently collected and that may not have consent for research uses, we recommend NIH share its experience with these determinations under the original policy and solicit further public input on a proposed draft policy.
  5. If NIH decides to permit data linkage, we recommend that it continue to prioritize principles to foster trust across broad, diverse communities, including many communities traditionally underrepresented in research and in research data repositories. (Please see Sec. 1, recommendation 5.)
  6. If NIH decides to permit data linkage, we recommend NIH permit institutions to recover direct and indirect costs associated with such activities.  Data linkage across systems is complicated and requires resources to assure ethically appropriate, privacy-protective, and legally compliant submissions.  Direct costs include, for example, personnel time, institutional approvals, updates to policies and procedures and related costs for external counsel or in-house counsel review, and engagement of third parties to perform data linkages.  Infrastructure costs, such as for enhancements to information systems and tools, should also be recoverable to support institutions in submitting data to NIH under the updated GDS Policy.

4. Consent for Data Linkage. Whether data linkage should be addressed when obtaining consent for sharing and future use of data under the GDS Policy, as well as in IRB consideration of risks associated with submission of data to NIH genomic data repositories. And if so, how to ensure such consent is meaningful.

We have two recommendations.

  1. For meaningful consent/permission to use data, please see Sec. 2, recommendations 3 and 4, above.
  2. For IRB consideration of risks associated with submission of data to NIH genomic data repositories and data linkage, please see Sec. 1, recommendation 4, above.  IRBs may not be required to review all datasets that may be valuable for linkage purposes to support research.

II. International Privacy Considerations

A great deal of research that is subject to the GDS Policy is conducted at sites located outside the U.S. that receive funding directly from the NIH or through a subaward from a U.S. institution that serves as a prime awardee of NIH funding. We recommend that the GDS Policy acknowledge, and take into account, the need for compliance with an increasing array of privacy laws and regulations in countries outside of the U.S.

As a key example, many of these sites are located in the European Economic Area (EEA) and the United Kingdom (UK), given the leading role that institutions in the EEA/UK play in genomic research. Under the EEA/UK privacy law known as the General Data Protection Regulation (GDPR), transfers of personal data from the EEA/UK to the U.S. are generally prohibited unless a legal mechanism is put in place to legitimize such transfer. Genomic data that must be deposited in a repository under the GDS are generally considered “personal data” for purposes of the GDPR despite the fact that they may be “de-identified” from the standpoint of HIPAA. Accordingly, a legal basis under the GDPR or the analogous UK law is required to legitimize the transfer of such genomic data from the institution in the EEA/UK at which they were collected to the prime awardee institution in the U.S. or an NIH-designated repository located in the U.S. The mechanism most frequently used to legitimize such transfers is the European Commission-approved Standard Contractual Clauses (SCCs). Indeed, the SCCs are often the only mechanism for data transfers available for retrospective research in which there is no effective opportunity to obtain the explicit consent of the data subjects to the international transfer.

SCCs are often the only mechanism available to legitimize the transfer of genomic data for a number of reasons. First, the other mechanisms for transfer available under GDPR Article 46 (i.e., “Transfers subject to appropriate safeguards”) generally are not available in the research context. For example, “binding corporate rules” work only when the transfer is between entities in the same corporate group, and thus this mechanism would not work to legitimize transfers from an EEA/UK research institution to NIH. Another option available under Article 46 is a “code of conduct” that is approved by the supervisory authorities of the relevant EEA member states and the European Data Protection Board (EDPB). This mechanism is not, however, of practical utility presently as the EDPB has to date issued only draft guidance on codes of conduct as a mechanism to legitimize cross-border data transfers and no such codes will be approved until such guidance is issued in final form. Moreover, even once the EDPB’s guidance on codes of conduct is finalized, approval of a code of conduct is a multi-step process that will likely take months, if not years, thus making this a long-term endeavor that is unlikely to provide immediate relief.

When an Article 46 data transfer mechanism is not available to legitimize a cross-border data transfer, the parties may in limited circumstances rely on the “derogations” found in GDPR Article 49 as a means to legitimize cross-border data transfers. EDPB guidance, however, indicates that these mechanisms should be used for occasional rather than repeated transfers of data. Thus, these derogations often cannot be used in the context of a longitudinal study in which data are transferred at regular intervals over a long period of time. The most frequently used “derogation” in the research context is obtaining the “explicit consent” of the data subject. In many retrospective studies, however, there is no opportunity to obtain the explicit consent of the data subject. GDPR Article 49 also contains a derogation that permits transfers “necessary for important reasons of public interest.” As with the other Article 49 derogations, however, the EDPB has interpreted the availability of this exception to be extremely limited. In the context of the COVID-19 pandemic, for example, the EDPB stated in guidance that while this derogation could be used for the initial cross-border transfer of personal data in connection with COVID-19 research, it could be used only as a “temporary measure” and repetitive transfers as part of a long-lasting research project would need to be made under an alternate mechanism to safeguard the data transferred. Thus, even in the context of a worldwide pandemic, the ability to rely on GDPR derogations to legitimize cross-border transfers of personal data has remained limited.

One issue that has frustrated the ability of genomic data from the EEA/UK to be deposited in NIH-designated repositories is the inability of NIH to sign the SCCs, due to concerns of the U.S. government about subjecting an arm of the U.S. government to European Member State law. This inability of NIH to sign SCCs frustrates the research community in two ways. First, EEA/UK institutions are unable to transfer genomic data directly to NIH-designated repositories by using the SCCs. Second, passing genomic data through a prime awardee institution in the U.S. that will sign the SCCs, with the goal of that U.S. institution in turn passing the data on to the NIH-designated repository, does not eliminate this problem. This is because by signing the SCCs, the U.S. institution is agreeing to require any entities that receive the data via an “onward transfer” to themselves become signatories to the SCCs. Thus, to pass data received via the SCCs to an NIH-designated repository, the U.S. institution must bind the repository to follow the SCCs. We are aware of situations in which U.S. universities have been found in breach of their prime awards with NIH due to their inability to pass genomic data collected in the EEA/UK to an NIH-designated repository, which is itself a consequence of the inability of NIH to sign the SCCs.

There are a few possible solutions to remedy this issue. First, the NIH might begin signing SCCs, or create a quasi-governmental entity that might be able to sign the SCCs and organize and operate a data repository that meets the necessary standards; but this may well not be allowed by the U.S. government, for the obvious reason that the government would prefer not to have its own agency subjected to the direct jurisdiction of another nation-state. Significantly, the GDPR has led to the adoption of similar laws across the globe that restrict cross-border data transfers and often require mechanisms similar to the SCCs to legitimize cross-border transfers of personal data. Thus, the NIH’s inability to sign SCCs, if unremedied, will only continue to expand the number of countries from which NIH-designated repositories are unable to obtain data under the GDS. A second possibility would be for NIH to allow in the GDS a waiver of the data sharing requirements when data are collected in the EEA/UK or another jurisdiction that restricts cross-border data transfers and the data cannot be transferred due to the inability of NIH to sign the SCCs or similar mechanisms put in place by non-EEA/UK countries, although this too is suboptimal, because important data would be largely unavailable for secondary research. If this problem is not solved, then one might expect that NIH could become reluctant to fund this research at institutions in the EEA, because the research results cannot be entered into the NIH genomics databases. A third possibility, for which NIH has been responsibly advocating, is that the EU data privacy authorities, at both the EU and EU member state level, should amend their interpretations of GDPR to allow the trans-national transfer of these data under specified protective conditions short of full SCCs being signed.

While the above discussion focuses on cross-border transfers subject to GDPR, the GDPR is becoming a model for data protection laws worldwide, and many other jurisdictions are imposing restrictions on cross-border data transfers that are similar to those found in GDPR. For example, Brazil’s recently enacted data protection law, the LGPD, requires that certain mechanisms be put in place to legitimize cross-border data transfers, one of which is standard contractual clauses approved by the national data protection authority. The People’s Republic of China’s data privacy law, the PIPL, which took effect on November 1, 2021, similarly requires a legal basis for the transfer of personal information outside of China, with one such basis being the entry of a standard contract prescribed by the Cyberspace Administration of China. Thus, the issues that NIH has faced under the GDS with respect to the GDPR are likely to grow in coming years as additional laws containing restrictions similar to those found in GDPR come into effect globally. Putting in place one of the solutions described above to permit data transfers under GDPR could thus provide a model that could be replicated with respect to other foreign jurisdictions.

III. Closing Comments

We appreciate the opportunity to submit these recommendations, particularly to emphasize the principles of harmonization, meaningful consent, privacy and security protections, respect for diverse communities, and policies that advance scientific discovery and human health.4

  • 1. As NIH seeks to harmonize its GDS Policy, including possibly with the HIPAA Expert Determination method, we also recommend NIH take the opportunity to harmonize its Certificate of Confidentiality (COC) policy guidance (which is based on 42 USC 241(d)(4)) with the Expert Determination method. Currently for COC purposes, research “in which identifiable, sensitive information” is collected or used is defined to include “[a]ny other research that involves information about an individual for which there is at least a very small risk, as determined by current scientific practices or statistical methods, that some combination of the information, a request for the information, and other available data sources could be used to deduce the identity of an individual….” In contrast, HIPAA’s Expert Determination method states that data are de-identified if the risk of re-identification is “very small.” NIH’s COC policy guidance should be amended to harmonize with HIPAA so that “greater than a very small” risk replaces “at least a very small risk.” In other words, HIPAA would consider a dataset with a “very small” risk to be de-identified (if other parts of the Expert Determination method are met), but NIH’s COC guidance considers the same “very small risk” to be “identifiable” information covered by a COC. The lack of harmony has been confusing to the regulated community, particularly since the legislation amending the definition of “identifiable” for COC purposes expressly required the Secretary to “(1) ensure the collaboration of the National Institutes of Health,… and the Office for Civil Rights of the Department of Health and Human Services; [and] (2) comply with existing laws and regulations for the protection of human subjects involved in research, including the protection of participant privacy.” (21st Century Cures Act, Sections 2011-2012: Precision Medicine Initiative.) If the conflict continues, it will impede beneficial research with datasets that are privacy-protective under HIPAA.
  • 2. If NIH permits Limited Data Sets in its updated policy, then we also recommend it work with OCR to clarify for the regulated community that disclosure of Limited Data Sets under GDS Policy does not constitute a “Sale of PHI” under HIPAA. This issue has not arisen before because current GDS Policy is limited to de-identified data. HHS considers “Sale of PHI” to include data access and data license agreements in exchange for remuneration (including financial or non-financial benefit, directly or indirectly received by the disclosing party). 78 Fed. Reg. 5566, at 5606 (Jan. 2013). HHS’s interpretation since 2013 has been that a grant supporting research (such as from NIH) does not itself trigger a “Sale” of PHI; and, a reasonable cost-based fee is permitted to cover the cost to prepare and transmit a dataset. However, if a medical center (or other Covered Entity) receives more than a cost-based fee or receives other non-financial benefits (such as access) in exchange for providing a Limited Data Set, then the Sale of PHI issue needs clarification so institutions are willing to contribute their data.
  • 3. NIH Genomic Data Sharing Policy, NOT-OD-14-124 (Aug. 27, 2014) (“The draft GDS Policy included an exception for “compelling scientific reasons” to allow the research use of data from de-identified, clinical specimens or cell lines collected or created after the effective date of the Policy and for which research consent was not obtained. Commenters did not object to the need for such an exception, but they asked for clarification on what constitutes a “compelling scientific reason,” and the process through which investigators’ justifications would be determined to be appropriate. The funding IC will determine whether the investigators’ justifications for the use of clinical specimens or cell lines for which no consent for research was obtained are acceptable, as provided in their funding application and Institutional Certification. Further guidance on what constitutes compelling scientific reasons will be made available on the GDS website and will likely evolve over time as NIH ICs, the NIH GDS governance system, and program and project staff acquire greater experience with requests for research with such specimens.”)
  • 4. Technical note: Any underlines or italics in policy and regulatory language were added.
Content created by Office for Human Research Protections (OHRP)