Attachment A - NIH Data Sharing Policy

Implications of the NIH Draft Policy for Data Management and Sharing on Data Derived from Human Participants

The U.S. Department of Health and Human Services (“HHS”) Secretary’s Advisory Committee on Human Research Protections (“SACHRP”) has taken note of the National Institutes of Health (“NIH”) Request for Public Comments on a Draft NIH Policy for Data Management and Sharing and Supplemental Draft Guidance (the “Policy”), particularly as it relates to sharing data that are derived from human participants. In this letter, SACHRP seeks to identify the challenges of applying the Policy to such data and proposes guidance to assist the research community in navigating the complexities of data sharing in the context of research involving human participants.

I. Data-Sharing Provisions in the Draft Policy as Applied to Human Participants

The Draft NIH Policy for Data Management and Sharing and Supplemental Draft Guidance was released in November 2019 as a means to share data from research funded or conducted by NIH broadly and to encourage good data management practices. The Policy seeks to encourage the broad sharing of scientific data with the research community and the public.^[1] Central to the Policy is a requirement that investigators of all research that generates scientific data and is funded or conducted by NIH prospectively submit a Data Management and Sharing Plan (“Plan”) prior to initiating the study.^[2] The Plan must describe, in two or fewer pages, how scientific data will be managed, including strategies on how to ensure data security and compliance with privacy protections. Plans must be submitted to the funding NIH Institutes, Centers, and Offices (“ICOs”) as part of the funding application process (e.g., as part of Just-in-Time for extramural awards or technical evaluation for contracts).^[3] Plans are to be reviewed by appropriate NIH staff and ultimately become a term and condition of the relevant NIH award with which awardees must comply.^[4] Failure to comply with a submitted Plan may result in an enforcement action, which can include the imposition of additional special terms and conditions or termination of the award. In addition, failure to comply with the Plan may affect future funding decisions.^[5]

NIH states that the Policy was intentionally kept at a high level to allow for flexibility across various scientific domains.^[6] In recognition of the generality of the Policy, NIH has also released supplemental draft guidance on (1) allowable costs for data management and sharing and (2) elements of an NIH data management and sharing plan.

With respect to human participants, the Policy recognizes that the sharing of data derived from such individuals should be afforded additional protections, and the sharing of human participants’ data is governed by applicable federal, tribal, state, and local laws, regulations, statutes, guidance, and institutional policies, whose restrictions all investigators must accommodate in any Plans.^[7] Plans must include consideration of these requirements – which SACHRP recognizes may often be unknown to researchers who are not experts in data privacy law – as the Plan describes proposed approaches to managing and sharing data derived from human participants. Further, the Policy requires that Plans describe the ways in which participants’ privacy, rights, and confidentiality will be protected, via de-identification^[8] or other means.^[9]

The Federal Register publication of the Policy is followed by supplemental draft guidance on elements of an NIH data management and sharing plan that outlines additional elements that investigators should consider as they develop their Plans. Under the supplemental draft guidance, investigators are advised to consider, among other things, the rationale for decisions about which data to share, plans for providing appropriate protections of privacy and confidentiality for scientific data derived from human participants, and whether data generated from humans will be available through restricted or through unrestricted access.^[10]

Though many experts have provided thoughtful input on the Policy, there has been little discussion of the Policy’s stance on data derived from human participants. Our intent in this letter is to examine the ways in which the Policy may better protect such data and the precautions that SACHRP believes should be implemented to protect the interests of human participants.

II. Potential Issues Created by the Policy; Recommendations

A. Consent to Data Sharing and Control of Data by Human Participants

As drafted, the Policy includes little to no discussion of informed consent, despite the fact that under the federal Common Rule, the Food and Drug Administration regulations on human subjects research, as well as general common law principles of consent to treatment, the data of human participants are typically collected pursuant to some form of informed consent. Investigators generally should comply with any limitations on future uses and sharing of data derived from human participants that are set forth in the relevant consent document. Though human participants may consent to use and disclosure of their data as part of a clinical trial or other research study, such consents often do not contemplate broad sharing of the collected data with the research community and the public. Moreover, even when the consent does discuss the possibility of future research, the consent document may not define future uses to include sharing the consenting patients’ data as broadly as is contemplated by the Policy. SACHRP recommends that the Policy require that investigators carefully craft consents so that they accommodate subsequent uses of data derived from human participants; that investigators using previously-collected data sets pay close attention to any specific data use limitations in the terms and conditions (including consents) under which those data were collected; and that investigators, as part of their Plans, must ensure compliance with any such data use limitations by subsequent data requestors. SACHRP further recommends that NIH consider implementing a means to track such limitations or require downstream users to enter into data use agreements so that subsequent data requesters are made aware of sharing limitations and held accountable for unauthorized future uses of data derived from human participants.

Further, upon adoption of the Policy, investigators and institutions will need to undertake the burden of harmonizing existing consent documents and privacy notices to accommodate the type of data sharing contemplated by the Policy. For example, if there is an original consent associated with medical treatment, the consent often may not state or indicate that the data collected during standard-of-care procedures may be used subsequently in a research study. Also, the Health Insurance Portability and Accountability Act of 1996 (“HIPAA”) Privacy Rule requires “covered entities,” which include health plans and most U.S. health care providers, to provide individuals a “notice of privacy practices” explaining how such entities may use and disclose individuals’ protected health information that is gathered in the course of medical care and health insurance payment. Even though data disclosed pursuant to the Policy may be de-identified and thus no longer subject to HIPAA standards, consideration of medical and research ethics should lead covered entities to inform patients and beneficiaries in these notices that their clinical data may be de-identified or anonymized and re-used for future research, and the privacy notices should describe subsequent data sharing. These disclosures should be made not only in the context of the data sharing contemplated by the Policy, but also for data sharing pursuant to other contractual obligations, regulatory requirements (e.g., European Medicines Agency Clinical Trial Regulation), publication requirements (e.g., BMJ transparency policy), and institutional commitments.
SACHRP recommends that the Policy promote transparency in notices of privacy practices and in similar statements of institutional practices regarding subsequent research uses of clinical data, so that patients may be aware of data sharing for future research purposes, even in advance of any participation in a primary research study, as many data used in primary studies are themselves previously gained during delivery of standard of care or previous research. SACHRP further recommends that NIH, in collaboration with the HHS Office for Civil Rights, provide additional guidance in the form of standard text or templates that would assist investigators and institutions in developing updated consent and notice documents that conform to the Policy’s expectation for data sharing, and that these templates be crafted to respect health literacy limitations that often characterize the public at large.

The above discussion of informed consent raises the issue of the level of control that human participants have over their data and how to ensure that participants’ rights and preferences are protected as their data are being shared downstream. In general, under HIPAA standards as well as under accepted research practices, data subjects do not exercise control over the downstream research uses of any of their de-identified or anonymized data. Further, under HIPAA and under the Common Rule, identifiable data – if they are used under established regulatory pathways such as waiver of authorization and waiver of consent – also are not subject to control of the human sources of those data. Yet the Policy makes no reference to the degree to which human participants may exercise control over their data once such data are made available to the research community or the public under the Policy. Even under existing research practices, there are some circumstances in which such information may be relevant. For example, under the HIPAA Privacy Rule, individuals have the right to revoke authorizations for use or disclosure of their protected health information,^[11] and if identifiable research data have been provided to a researcher under an authorization (as opposed to under a waiver of authorization or a limited data set), then the individual may exercise this revocation right. In contrast, the Policy contains no reference to how the revocation would be fully effectuated in regard to those researchers who already have received the data under the Policy. Relatedly, the Policy does not make clear whether and how human participants may access and share their own data with other researchers. 
SACHRP recommends that the Policy or its supplemental guidance provide investigators with clearer guidelines on how to explain to human participants that their rights to maintain any control of their data – whether de-identified or not – are extremely limited once such information has been shared with downstream users. SACHRP further recommends that participants be informed of mechanisms through which they may themselves seek to make their data available to third-party researchers.

B. Efficacy of De-Identification and the Heightened Risk of Re-Identification for Smaller Populations

The Policy relies on de-identification as one of the ways to protect human participants’ privacy rights and confidentiality. However, the Policy does not specify a particular standard that should be used to ensure de-identification. Various standards for de-identification exist, and some are more rigorous than others. SACHRP recommends that the Policy provide minimum standards that investigators should meet or surpass when de-identifying data derived from human participants. Notwithstanding the above, though de-identification is commonly perceived to be an effective means to protect human participants, certain studies have shown convincingly that other data can be used in conjunction with de-identified data from research studies to re-identify individuals. Increasingly, the protections afforded by removing the eighteen identifying data elements cited in HIPAA^[12] have become out of date, as technological advances and the combining of data sets increase the risk of re-identification. For example, commercial interests have increasingly been trying to combine large, de-identified data sets with real-world data collected during the course of ordinary daily activities (e.g., credit card charges, driving habits), which increases the risk of re-identification and misuse of previously de-identified data. SACHRP recommends that the Policy make note of the potential for re-identification of previously de-identified data and that the Policy direct investigators to consider the risk of re-identification for their particular data sets as they formulate their Plans. SACHRP further recommends that, to the extent an investigator’s Plan includes putting in place a data use agreement, NIH require such agreements to include a provision by which the data recipient agrees not to attempt to identify individuals who are subjects of the data.^[13]

Given that technological advances may allow for re-identification of data derived from human participants that have been de-identified, one way to mitigate the risk of re-identification is to exclude certain information from a data set that is made available for broad sharing under the Policy, particularly if such information would not materially contribute to enabling the replication and/or validation of scientific results. SACHRP recommends that NIH articulate clearer standards on how to manage re-identification risks, which would create a baseline of human participant protections. SACHRP recommends that, at a minimum, the Policy identify common data types that investigators should consider excluding from a data set (unless sharing of such information is otherwise explicitly consented to), in consideration of re-identification risks.

Re-identification of data may result in harm to human participants (e.g., discrimination, identity theft, illegal/non-consensual surveillance). However, individuals belonging to smaller populations and minority groups, or “discrete and insular minorities,” such as many American Indian and Alaska Native communities, may be more likely to be re-identified and potentially experience greater harm in the event that their data are re-identified. To address the increased risks associated with re-identification of data derived from such closely defined populations, the Policy generally states that investigators should consider including exceptions to sharing certain data in their Plans, especially when working with “small or underserved populations.” Recognizing that the data of certain populations warrant extra scrutiny is an important first step, but investigators and the research community more broadly would benefit from additional guidance on when and how to apply such exceptions, to ensure that the sharing of these especially sensitive data is being handled appropriately. For example, individuals belonging to certain closely defined populations, such as a specific racial, ethnic, religious, national, indigent, or disease-defined community, might consent to the sharing of their data as an exercise of their autonomy; however, the Policy does not discuss how investigators should reconcile such consents with the Policy’s suggestion that Plans might consider excluding certain information from their data sets to protect these populations. 
While sharing of data for additional research uses may often be beneficial, there are circumstances in which the sharing of certain data may increase the risk of harm to human participants, particularly for populations with unique sensitivities, such as those who have historically been the subject of discrimination or lack effective representation in the political process, or who may suffer dignitary harms as a result of the use of their data for certain types of research projects. SACHRP recommends that the Policy provide additional guidance on how investigators might navigate exceptions to sharing certain data derived from defined, vulnerable populations. SACHRP recommends that the Policy give examples of acceptable and prudent limits on data sharing, to signal to investigators the types of data sets and populations that would experience higher risk of harm if re-identification were to occur. SACHRP further recommends that the Policy more clearly articulate that Plans should be carefully crafted to mitigate such risks while preserving the ability for members of these populations to consent to sharing their data.

Genomic data are also particularly susceptible to re-identification – this fact has been recognized in the NIH Genomic Data Sharing Policy, which provides that such data are sufficiently sensitive to warrant obtaining from human participants informed consent explicitly discussing future research use and broad data sharing, even when the data are de-identified to the standard set forth in the HIPAA Privacy Rule. There are some researchers who assert that genomic data cannot be fully de-identified, and the U.S. federal departments and agencies that have adopted the Common Rule have announced an intention to revisit the identifiability of such data as part of the revisions to the Common Rule that took effect in 2019.^[14] SACHRP recommends that the Policy or supplemental guidance explicitly state that in the context of NIH-funded research, and given the purpose of the Policy, the unique nature of genomic data be regarded as precluding de-identification or anonymization as sufficient justification for use or sharing of human participants’ genomic data without consent, and that safeguards in addition to de-identification or anonymization should be considered when sharing genomic data under the Policy. Such positions would be consistent with those taken under NIH’s own Genomic Data Sharing Policy for NIH-funded research that falls under that policy. Further, SACHRP recommends that consent documents for participants providing genomic data that are to be shared under the Policy include a statement on the unique ability for such data to be re-identified and on any safeguards that might be added to protect the data from re-identification.

C. Controlled Access to Data Derived from Human Participants

Though the Policy provides researchers the flexibility to specify in their respective Plans when and where the scientific data are to be shared, more specific guidance should be made available to mitigate the possibility of such data being used for malicious purposes. Making scientific data broadly available may help achieve legitimate goals, such as enabling the verification of scientific results or facilitating reuse of hard-to-generate data. However, broad data sharing, particularly if such sharing is agnostic as to the recipient of the data, may lead to misuse. For example, broad sharing without any access restrictions may lead to inappropriate targeting of certain populations for political, marketing, or discriminatory purposes. Unrestricted access to data may also affect investigators: researchers in academic settings may want to avoid secondary use of the data they gathered, for fear of others using the data without providing appropriate attribution, and research sponsors in industry may worry that competitors may use their data as an unfair “leg up” to initiate competing clinical trials at lower cost.
While some have argued that the automatic issuance of Certificates of Confidentiality for NIH-funded research involving human participants serves sufficiently to safeguard human participant data from inappropriate use,^[15] a Certificate of Confidentiality still permits the further disclosure of information for purposes of scientific research conducted in compliance with applicable federal regulations governing the protection of human subjects in research.^[16] Thus, for example, genomic information that has been shed of all eighteen HIPAA identifiers could be further shared for research purposes despite the existence of a Certificate of Confidentiality because the sharing of such de-identified genomic information would generally fall outside of the federal Common Rule, which governs the research use only of identifiable private information of human subjects.^[17] Accordingly, the availability of Certificates of Confidentiality alone is not sufficient to safeguard sensitive data derived from human participants. SACHRP recommends that sensitive data, particularly those derived from human participants, be more safely shared by encouraging or requiring the implementation of controlled access measures. SACHRP recommends, for example, that NIH consider requiring data requesters to agree to terms and conditions under which the requester must protect data privacy, refrain from attempting to identify individual participants, and not share the data with individuals outside of those who are listed in the data access request. For particularly sensitive data, access could also be controlled by creating a “sandbox” environment in which legitimate requesters may access and manipulate data without obtaining the ability to receive portable copies of the data or to share those data with a third party.

D. Downstream Sharing and Efficacy of Sanctions

As discussed in various sections above, making data broadly available to the research community and the public may result in downstream uses of the data. Though the Policy recognizes that certain data sets may require limitations on sharing, it is unclear as to whether and how downstream requesters would be held accountable for unauthorized uses of data derived from human participants. Further, the Policy is silent on forms of recourse that human participants can take if downstream requesters use their data in breach of sharing limitations or for malicious purposes. As written, the Policy primarily ensures compliance with Plans by terminating awards or taking non-compliance into consideration when making future funding decisions. Though the loss or diminution of funding may serve as a mild deterrent for data users that depend on NIH awards, the Policy’s enforcement provisions have little to no effect on investigators and institutions that operate independently from NIH funding. SACHRP believes that the sensitive nature of data derived from human participants necessitates sanctions of sufficient consequence, regardless of whether NIH funding is involved, and SACHRP therefore recommends that NIH articulate stronger measures to deter more effectively any misuse of such data. Suggested enforcement mechanisms may include fines or civil monetary penalties for each instance of non-compliant use or sharing of data derived from human participants. SACHRP recommends that ideally, such measures be applicable to investigators and institutions more universally, without being limited to research being funded by the NIH.

E. Resolution of Discrepancies Created by Ex-U.S. Standards

The Policy rightly contemplates the applicability of data privacy requirements under federal, tribal, state, and local laws, regulations, statutes, guidance, and institutional policies. However, the Policy remains silent on international requirements that may apply to the use of certain data sets. For example, the General Data Protection Regulation (“GDPR”) regulates U.S.-based use and processing of personal data that have been collected in the European Economic Area (“EEA”) for research purposes. Under the GDPR, consents for future uses of personal data may be required to have a level of specificity that differs from the broader authorizations that are permitted under HIPAA and its implementing rules and regulations. Even if the Policy does not explicitly acknowledge that Plans should consider privacy requirements that may exist under international law, the GDPR and other international regulations may nonetheless apply to certain U.S.-based research. The GDPR and other national privacy laws often even define such basic privacy terms as “de-identified” or “anonymized” in ways that differ from the U.S. standards with which U.S.-based investigators tend to be most familiar. SACHRP recommends that the NIH bring to researchers’ attention the potential applicability of international data privacy standards and provide guidance to clarify discrepancies between U.S. and ex-U.S. requirements on the sharing and maintenance of data derived from human participants.

III. Conclusion

As described above, the lack of clarity regarding the Policy’s treatment of data derived from human participants provides NIH an opportunity to release additional, acutely needed guidance. Though the intent to create a flexible set of guidelines that applies across scientific disciplines is understandable and laudable, SACHRP recommends that the NIH articulate certain minimum standards and factors designed to protect human participants, in order to direct investigators to tailor data-sharing approaches more carefully to the risks and limitations inherent in their data sets. Additional guidance would provide greater certainty to the research community and give human participants greater confidence in the security and privacy of their personal data.

[1] 84 Fed. Reg. 60,400 (Nov. 8, 2019).

[2] Id.

[3] Id.

[4] Id. at 60,401.

[5] Id. at 60,401.

[6] Id. at 60,399.

[7] 84 Fed. Reg. at 60,401.

[8] For purposes of this document, “de-identification” refers to personal data that have been de-identified under HIPAA standards, and “anonymization” refers to the rendering of personal data, under Common Rule standards, so that the “identity of the human subjects cannot readily be ascertained, directly or through identifiers linked to the subjects.” “Sensitive data” refers to personal data whose disclosure can reasonably be expected to bring some identifiable injury or harm to the data subject, with degree of data sensitivity increasing as the harm increases in severity.

[9] Id.

[10] Id. at 60,402.

[11] 45 C.F.R. § 164.508(b)(5).

[12] See 45 C.F.R. § 164.514(b)(2) (describing requirements for de-identification of protected health information).

[13] Note that, as discussed in Section II.C, Certificate of Confidentiality requirements attach to, and follow, identifiable human data collected in the course of NIH-funded research.

[14] See 45 C.F.R. § 46.102(e)(7) (requiring the reexamination of “identifiable private information” and “identifiable biospecimen” within one year of the 2019 revisions to the Common Rule and at least every four years thereafter).

[15] See 42 U.S.C. § 241(d) (describing the issuance of a certificate of confidentiality to protect individuals who are the subjects of biomedical, behavioral, clinical, or other research funded by the federal government); NIH Notice Number NOT-OD-17-109 (Sept. 7, 2017, eff. Oct. 1, 2017) (announcing updates to the NIH policy for issuing certificates of confidentiality for NIH-funded and -conducted research).

[16] 42 U.S.C. § 241(d)(1)(C)(iv).

[17] See 45 C.F.R. § 46.102(e)(5) (defining “identifiable private information” as information for which the identity of the human source is or may readily be ascertained or associated with the information).