I am Dr. Francis Collins, Director of the National Human Genome Research Institute
(NHGRI) of the National Institutes of Health. I appreciate the opportunity to appear before the
Subcommittee today to discuss the Human Genome Project and the implications of the recent
announcement by a private company of their intentions to carry out large-scale sequencing of the
The NHGRI is one of the 22 Institutes and Centers that comprise the federation of federal
research entities known as the National Institutes of Health (NIH). The vast majority of research
dollars appropriated to the NIH flow out to the scientific community across the Nation, primarily
in the form of peer-reviewed research grants. Today, that community numbers more than 50,000
investigators affiliated with nearly 2,000 universities, hospitals, and other research facilities
located in all 50 states, the District of Columbia, Puerto Rico, Guam, the Virgin Islands, and
certain points abroad.
The NHGRI is the lead Institute at the NIH with responsibility for the Human Genome
Project (HGP). The HGP officially began in October of 1990 as a 15-year program to
characterize in detail the complete set of human genetic instructions (the "genome"). The central
aim of the project, which the federal government funds through programs at the NIH's
National Human Genome Research Institute and the Department of Energy, is to arm health
researchers with powerful gene-finding and DNA analysis tools to unravel and understand the
myriad human diseases that have their roots in DNA. With the project now at its halfway mark,
its tools have underpinned virtually all gene discoveries of this decade.
The Human Genome Project's success stems largely from a unique and rigorous
planning process that sets ambitious research goals, timelines, and budgets. The first joint
NIH/DOE plan, which covered years 1991-1995, included goals for:
physical and genetic maps; experimental DNA sequencing of the fruit fly, a roundworm, yeast,
and the bacterium E. coli; computer management of research data; and studies of the ethical,
legal, and social implications (ELSI) of these new abilities to read genetic information.
Because of the rapid pace of genome research and technology development, scientists
met many of those initial goals ahead of schedule and under budget. So the research plan was
updated again in 1993 to establish new NIH-DOE goals through 1998. All of these goals have
now been met or exceeded. Original expectations were that the NIH cost of these activities
from FY 1991-97 would exceed $1 billion in 1991 dollars. I am pleased to report that the cost
has been about 25 percent less than that projection.
Today, with Human Genome Project tools, it is possible to track down a disease-related
gene even when nothing is known about the biochemical problems of the disease or how the gene
works. This technique, based on identifying the position of a gene in the chromosome and then
isolating it, is commonly referred to as positional cloning and was successfully used for the first
time in 1986. Now, the increasing detail and quality of genome maps have reduced the time it
takes to find a disease gene from years, to months, to weeks, to sometimes just days, and
scientists are using the tools to discover dozens of disease genes each year.
An Example - Parkinson's Disease
The isolation of a gene for Parkinson's disease (PD) last year demonstrated the power of
this new discovery method and showed conclusively that changes in DNA can cause PD in some
families. Only two years ago, the National Institute of Neurological Disorders and Stroke held a
workshop to explore using genetic approaches to understand PD. A team led by scientists in
NHGRI's Division of Intramural Research (DIR) began large-scale genetic analysis of DNA
from members of a large Italian family containing almost 600 people, more than 60 of whom
have been diagnosed with Parkinson's. In nine days, NHGRI gene hunters mapped the gene to a
region of chromosome 4, which contained approximately 100 genes. One of the several genes in
that interval had already been identified on the gene map and was known to encode a protein
In just a few months, the researchers showed conclusively that an altered alpha-synuclein
gene caused Parkinson's disease in the study families. Many have hailed this as the most
significant advance in Parkinson's disease research in 30 years. Just last month, a Japanese
research team used genome mapping tools to isolate another gene, this time on chromosome 6,
that, when altered, appears to predispose the individual to a rare juvenile
form of Parkinson's disease.
Ethical, Legal, and Social Implications
NHGRI has established productive partnerships among consumers, scientists, and policy
makers to help reduce the possibility that genetic information will be used to harm an individual
or family members and ensure that it will be of benefit to both patients and providers. As an
integral part of the Human Genome Project, the NHGRI and the DOE have each set aside a
portion of their funding to anticipate, analyze, and address the ethical, legal, and social
implications (ELSI) of the Project's new advances in human genetics. The current goals of the
ELSI program are to improve the understanding of these issues through research and education,
to stimulate informed public discussion, and to develop policy options intended to ensure that
genetic information is used for the benefit of individuals and society. Because genetic
information is personal, powerful, and potentially predictive, it can be used to stigmatize and
discriminate against people. Genetic information must be private.
DNA Sequencing
If the letters representing the 3 billion bases in the human genome were printed out in
books, and the books were stacked one on top of the other, they would reach as high as the
Washington Monument. The current major goal of the Human Genome Project is to read the
order, letter by letter, of those 3 billion bases.
Sequencing was once done by hand as a series of chemical reactions - a slow and costly
method. In 1990, when the HGP began, the sequencing cost was $10/base. Now, because of
public investment and collaboration with the private sector, machines read the sequence
fragments quickly and efficiently. As a result, the sequencing cost has been dramatically
reduced to roughly $.50/base for high-quality "finished" sequence.
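To put those per-base figures in perspective, here is a back-of-the-envelope sketch in Python. It is only arithmetic, not a project budget: it multiplies the two per-base costs quoted above by the roughly 3 billion bases of the genome. The function name and the use of whole cents are illustrative assumptions, and real costs depend on much more than a single per-base rate.

```python
GENOME_BASES = 3_000_000_000   # approximate size of the human genome
COST_1990_CENTS = 1000         # $10 per base when the HGP began
COST_NOW_CENTS = 50            # roughly $0.50 per finished base today

def naive_genome_cost_dollars(bases, cents_per_base):
    """Per-base cost times genome size, in whole dollars (illustrative only)."""
    return bases * cents_per_base // 100

cost_1990 = naive_genome_cost_dollars(GENOME_BASES, COST_1990_CENTS)
cost_now = naive_genome_cost_dollars(GENOME_BASES, COST_NOW_CENTS)
fold_reduction = COST_1990_CENTS // COST_NOW_CENTS

print(f"At $10/base:   ${cost_1990:,}")
print(f"At $0.50/base: ${cost_now:,}")
print(f"Reduction:     {fold_reduction}-fold")
```

Under these simplifying assumptions the per-base cost has fallen 20-fold.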
Using a strategy referred to as "shotgun" sequencing, an investigator takes each page of
those books stacked as tall as the Washington Monument, and randomly cuts the text into small
fragments. These fragments are small enough for sequencing machines to read. To get long
stretches of contiguous DNA, investigators must then reassemble these sequenced fragments
back into sentences, paragraphs, chapters, and books. The reassembly of this puzzle is carried
out largely by sophisticated computer programs.
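The reassembly step can be illustrated with a toy example. The Python sketch below is a minimal greedy overlap merger written for illustration; it is not the software used by any sequencing center. The function names, the sample text, and the minimum overlap of three letters are all assumptions made for the example.

```python
def overlap(a, b, min_len=3):
    """Return the length of the longest suffix of `a` that is a prefix of `b`."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def drop_contained(frags):
    """Remove duplicates and fragments that lie wholly inside another fragment."""
    unique = list(dict.fromkeys(frags))
    return [f for f in unique if not any(f != g and f in g for g in unique)]


def greedy_assemble(fragments, min_len=3):
    """Repeatedly merge the pair of fragments with the largest overlap."""
    frags = drop_contained(fragments)
    while len(frags) > 1:
        best = (0, None, None)  # (overlap length, left index, right index)
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best[0]:
                        best = (k, i, j)
        k, i, j = best
        if k == 0:
            break  # no overlaps left: the remaining pieces are separated by gaps
        merged = frags[i] + frags[j][k:]
        rest = [f for idx, f in enumerate(frags) if idx not in (i, j)]
        frags = drop_contained(rest + [merged])
    return frags


# Toy "genome": a short text cut into overlapping pieces, listed out of order.
text = "sequencing_the_human_genome"
fragments = [text[12:22], text[0:10], text[18:27], text[6:16]]
contigs = greedy_assemble(fragments)
print(contigs)

# If the middle fragments are missing, the ends cannot be joined.
gapped = greedy_assemble([text[0:10], text[18:27]])
print(gapped)
```

With full coverage the four pieces merge back into the original text; with the middle pieces missing, two separate stretches remain. This toy text was chosen to contain no repeated three-letter runs. Text with repeats can mislead a simple merger of this kind.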
The sequencing strategy the public genome project uses employs shotgun sequencing of
DNA fragments that already have been carefully mapped and catalogued. This process makes
reassembling the sequenced fragments into contiguous sequence easier because you know where
the fragment came from. In addition, scientists periodically encounter DNA fragments that are
particularly difficult to sequence. To return to the analogy, it is much easier, takes less time,
and is less costly to assemble the text in "finished" form if all the fragments are known to have
come from the same chapter.
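One way to see why knowing the "chapter" helps is to count how many fragment pairs a naive reassembly would have to compare. The Python sketch below is only a rough illustration: the fragment and region counts are hypothetical, fragments are assumed to divide evenly among mapped regions, and real assembly programs do not compare every pair.

```python
def all_pairs(n):
    """Number of unordered pairs among n fragments."""
    return n * (n - 1) // 2


def comparisons_unmapped(n_fragments):
    """Every fragment may belong next to any other fragment."""
    return all_pairs(n_fragments)


def comparisons_mapped(n_fragments, n_regions):
    """Fragments are only compared within their own mapped region.

    Assumes, for simplicity, an even split of fragments across regions.
    """
    per_region = n_fragments // n_regions
    return n_regions * all_pairs(per_region)


N_FRAGMENTS = 1_000_000  # hypothetical number of sequenced fragments
N_REGIONS = 1_000        # hypothetical number of mapped regions ("chapters")

unmapped = comparisons_unmapped(N_FRAGMENTS)
mapped = comparisons_mapped(N_FRAGMENTS, N_REGIONS)
print(f"Unmapped: {unmapped:,} pairs")
print(f"Mapped:   {mapped:,} pairs")
```

With these made-up numbers, sorting fragments by region first cuts the number of candidate pairs by roughly a factor of one thousand.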
In 1996, NHGRI began pilot projects to test strategies and technologies for full-scale
sequencing of the human genome. We now have undertaken human sequencing in earnest. As a
result, investigators have deposited almost 150 million bases of "finished" high-quality human
DNA sequence in GenBank, the publicly funded database supported by the National Library of
Medicine. In accordance with the agreed-upon standards of the international genomic
community, all NIH-DOE funded sequencers have agreed to a rapid data release policy, under
which new sequence data is submitted to publicly accessible data banks within 24 hours. If one
includes "finished" and "close-to-finished" sequence, over 300 million bases, or 10 percent, of
the human DNA sequence has been deposited in GenBank.
In order to meet the standards adopted by the international genomic community, the
sequence produced must have four characteristics, the "4 A's" of the Human Genome Project:
- The sequence must be accurate; that is, the DNA spellings must be correct. The publicly
funded genome effort will ensure accuracy of 99.99 percent or better.
- The sequence must be assembled. Large-scale sequencing relies on the accurate
assembly of smaller lengths of sequenced DNA into longer, genomic-scale pieces, so DNA
will be assembled into long pieces that reflect the original genomic DNA.
- The sequence must be affordable. A portion of our research
funds focuses on technology development to reduce the cost as much as possible.
- Finally, high-quality, finished human DNA sequence must be accessible. In order to be
useful, sequence data needs to be rapidly available to the entire research community.
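The accuracy standard above can be turned into a simple count. The Python sketch below is illustrative arithmetic only: it treats 99.99 percent accuracy as at most one error per 10,000 bases and applies that rate to the genome sizes mentioned in this statement. The function name is an assumption made for the example.

```python
GENOME_BASES = 3_000_000_000     # approximate size of the human genome
FINISHED_BASES = 150_000_000     # finished sequence deposited so far
BASES_PER_ERROR = 10_000         # 99.99 percent accuracy: 1 error per 10,000 bases

def max_errors(n_bases, bases_per_error=BASES_PER_ERROR):
    """Upper bound on miscalled bases at the stated accuracy."""
    return n_bases // bases_per_error

genome_errors = max_errors(GENOME_BASES)
finished_errors = max_errors(FINISHED_BASES)
print(f"Whole genome:      at most {genome_errors:,} errors")
print(f"Finished sequence: at most {finished_errors:,} errors")
```

Even at 99.99 percent accuracy, a 3-billion-base sequence could contain up to 300,000 miscalled bases, which is why the standard is stated as "99.99 percent or better."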
Informed by a series of workshops over the past year that reviewed research progress
and identified genome research opportunities, Human Genome Project leaders recently met with
more than 100 representatives from a range of scientific disciplines to develop the next 5-year
plan, scheduled to begin in the fall of 1998. With both the physical and genetic maps complete,
and human DNA sequencing pilot projects underway, goals of the 1998-2003 draft plan
considered at that meeting focused on:
- completing a full, highly accurate and contiguous human genome DNA sequence;
- further development of technologies for steadily increasing sequencing capacity and reducing
- studies of variations in human DNA;
- studies of how large sets of genes function;
- studies of the similarities and differences between the human genome and those of important
- improved computer methods for data management; and
- studies regarding the ethical, legal and social implications of the HGP.
Private Sector Developments
Just prior to the HGP planning meeting, industry researchers from The Institute for
Genomic Research (TIGR) and Perkin-Elmer, Inc. announced a plan to apply a DNA
sequencing strategy they had used on micro-organisms to produce a "rough draft" of the human
genome sequence. The sequencing strategy recently proposed by Perkin-Elmer, Inc. and TIGR
differs from the public effort in two significant ways: quality and access.
First, that strategy, called "whole-genome shotgun sequencing", employs fragments that
have not been mapped or catalogued prior to sequencing. Because scientists will not
know where in the long chain of 3 billion base pairs the fragment might belong, the task of
reassembling the fragments becomes far more difficult. This difficulty in reassembly inevitably
will lead to gaps and misassemblies in the sequence. Some of these may occur in DNA regions
with great biological significance. The private sector approach does not propose to fill in all the
gaps left by these unsequenced fragments, thereby creating a product that will be incomplete for
many research uses.
Second, release of sequence data from the Perkin-Elmer-TIGR effort will occur
quarterly, rather than daily. The policy of daily release of DNA sequence data by
publicly-funded efforts was arrived at because of the great interest in the scientific community in
gaining access to this highly valuable information. Any delay can result in wasted effort in
Deliberations on Five-Year Research Plan
Because the industry plan seemed to parallel some aspects of the federal Human Genome
Project, planners and advisors to the NIH-DOE program have been debating extensively how
the two proposals could be matched up. The scientists at the recent planning meeting on the
draft HGP 5-Year Plan concluded that, while the two projects should complement one another,
the federal project should continue its plans to provide high-quality human DNA sequence as
soon as possible and that all data should be freely accessible.
Those conclusions rested on a few key factors:
- The industry effort may not deliver the product in the time and manner proposed. The
industry approach to sequencing has not been tried on large and complex genomes, such
as the human, and depends on newly developed and unproven machines. Data to
evaluate the "whole genome" shotgun approach will initially come from a trial project on
the fruit fly, Drosophila, but is not expected on the human for at least 12 to 18 months;
- The industry plan will produce a large amount of highly useful sequence data, but this
plan will yield a qualitatively different product that will likely contain tens of thousands
- The industry plan calls for release of sequence data on a quarterly basis, and patenting of
100-300 "gene systems." While quarterly data release is commendable, the plan is not as
strong as the standards established by the international sequencing community, which
require release of data within 24 hours and discourage patenting. Further, some
concerns were expressed that the private effort's commitment to data release might
diminish over time, if business pressures came to the forefront.
In view of those concerns, advisors at the planning meeting enthusiastically made several
- The publicly funded genome project should continue with plans to provide a complete,
high-quality human DNA sequence by the year 2005, and sooner if at all possible;
- All possible steps must be taken to ensure that all sequence data remain in the public
- The publicly funded effort should take advantage of technology advances to increase
sequencing capacity as much as possible as soon as possible to meet research needs, both
for sequencing of the human and model organisms; and
- The sequencing of DNA regions of high utility and research interest should be
Now, Human Genome Project leaders at the NIH and DOE are considering that advice
as they put the final touches on the new research plan, which will be published in the fall of
1998. The complete plan will contain details for all of the Human Genome Project's goals,
including sequencing, gene function, human variation, technology development, and ethical,
legal, and social implications.
The private and public genome sequencing efforts should not be seen as engaged in a
race. In fact, scientists at TIGR and Perkin-Elmer have expressed their enthusiasm for a
continued vigorous public effort on the HGP, and have conveyed their willingness to collaborate
with NIH and DOE on the production of the complete human sequence. The NIH and DOE
welcome this collaborative approach, as the whole should be greater than the sum of the parts.
Mr. Chairman, I commend you, and the Members of this Subcommittee, for convening
this hearing today. The impact on the future of biology of knowing the order of all 3 billion
human DNA bases has been compared to Mendeleev's establishment of the Periodic Table of
the Elements in the 19th century and the advances in chemistry that followed. The complete set
of human genes--the biologic periodic table--will make it possible to begin to understand how
they function and interact. Rapidly evolving technologies, comparable to those used in the
semiconductor industry, will allow scientists to build detectors that analyze tens of thousands of
genes in a single experiment. Scientists will use the powerful new tools to reveal the secrets of
disease susceptibility. This knowledge will in turn allow researchers to create broad new
opportunities for preventive medicine, lay the foundation needed to develop and better target
effective therapeutics, and provide unprecedented information about the origin and migration of
The investment of substantial funds by the private sector in human sequencing reaffirms
the enormous value of Human Genome Project products and is a testament to the success and
value of the tools already developed by the publicly supported project. For the reasons outlined
above, it is not yet known what role this new endeavor will play over the long term in providing
the publicly available, detailed "A-to-Z" instruction book ultimately promised by the Human
Genome Project. Project leaders at the National Institutes of Health and the Department of
Energy look forward to close cooperation with Perkin-Elmer and TIGR as the new initiative
unfolds over the next few years.
This concludes my remarks. I would be pleased to answer any questions.