Shortly after their press conferences, the two groups that had been striving for several years to map the human genome published their findings:
- the International Human Genome Sequencing Consortium (IHGSC) in the 15 February 2001 issue of Nature;
- Celera Genomics, a company in Rockville, Maryland, in the 16 February issue of Science.
These achievements were monumental, but before we examine them, let us be clear as to what they were not.
- Neither group had determined the complete sequence of the human genome.
Each of our chromosomes is a single molecule of DNA. Some day the sequence of base pairs in each will be known from one end to the other. But in 2001, thousands of gaps remained to be filled.
What they had done was present a series of draft sequences that represented about 90% (probably the most interesting 90%) of the genome.
- Even taken together, the results did not provide an accurate count of the number of protein-encoding genes in our genome (in contrast to such genomes as those of
One reason: the large number and large size of the introns that split these genes make it difficult to recognize the open reading frames (ORFs) that encode proteins.
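To see why introns get in the way, it helps to look at how a naive ORF scan works. The sketch below is only an illustration: the function name `find_orfs`, the `min_codons` parameter, and the toy sequences are assumptions made for this example, and it is not the gene-finding software either group used. It walks the three forward reading frames and reports every stretch that starts with ATG and runs to the first in-frame stop codon.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(dna, min_codons=2):
    """Return (start, end, frame) for each ATG...stop stretch
    found in the three forward reading frames of `dna`."""
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None  # index of the ATG that opened the current ORF
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                n_codons = (i - start) // 3  # codons before the stop
                if n_codons >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None
    return orfs

# Toy, uninterrupted coding sequence: ATG AAA TTT GGG TAA
uninterrupted = "CCATGAAATTTGGGTAACC"

# Same sequence with a made-up 11-bp "intron" (GT...AG) inserted after AAA
with_intron = "CCATGAAA" + "GTAAGTTTTAG" + "TTTGGGTAACC"

print(find_orfs(uninterrupted))
print(find_orfs(with_intron))
```

On the uninterrupted toy sequence the scan reports one ORF. With the hypothetical intron inserted, the reading frame shifts, the original stop codon is no longer in frame, and the scan reports nothing. Real human introns are thousands of base pairs long, so the problem is far worse than in this miniature case.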
The two groups came up with slightly different estimates of the number of protein-encoding genes, but both in the range of 30 to 38 thousand:
- barely two times larger than the genomes of
- and representing only 1–2% of the total DNA in the cell;
- and a third of the 100,000 genes that many had predicted would be found.
- (By 2023, the number had been reduced to ~19,370.)
Are the tiny roundworm and fruit fly almost as complex as we are?
Probably not, although we share many homologous genes (called "orthologs") with both these animals.
But,
Follow this link to a discussion of the role of changes in gene regulatory regions in the evolution of animal form.
Although there are some giants such as
- dystrophin with its 79 exons spread over 2.4 million base pairs of DNA;
- titin whose 363 exons can encode a single protein with as many as ~38,000 amino acids,
the average human gene contains 4 exons totaling 1,350 base pairs and thus encodes an average protein of 450 amino acids.
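The arithmetic behind these figures follows from the triplet genetic code: three base pairs of coding sequence specify one amino acid. A minimal sketch, assuming the exon totals are pure coding sequence and ignoring the stop codon and untranslated regions (the function names are illustrative):

```python
BASES_PER_CODON = 3  # the genetic code is read in triplets

def protein_length(coding_bp):
    """Amino acids encoded by `coding_bp` base pairs of coding sequence."""
    return coding_bp // BASES_PER_CODON

def coding_bp_needed(amino_acids):
    """Base pairs of coding sequence needed for a protein of this length."""
    return amino_acids * BASES_PER_CODON

# Average human gene: 4 exons totaling 1,350 base pairs
print(protein_length(1350))     # amino acids in the average protein
print(1350 / 4)                 # average exon size in base pairs

# Titin: roughly 38,000 amino acids
print(coding_bp_needed(38000))  # coding base pairs required
```

By the same rule, the largest titin protein needs on the order of 114,000 base pairs of coding sequence, more than 80 times the average.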
The density of genes on the different chromosomes varies from
- 23 genes per million base pairs on chromosome 19 (for a total of 1,400 genes) to
- only 5 genes per million base pairs on chromosome 13.
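The chromosome 19 figures can be cross-checked against each other: dividing the total gene count by the density gives the chromosome length that the two numbers imply. This is a sketch of that arithmetic only, with an illustrative function name; the result is what the quoted figures imply, not a measured length.

```python
def implied_length_mbp(total_genes, genes_per_mbp):
    """Chromosome length (in millions of base pairs) implied by a
    gene count and a gene density."""
    return total_genes / genes_per_mbp

# Chromosome 19: about 1,400 genes at 23 genes per million base pairs
length = implied_length_mbp(1400, 23)
print(f"{length:.1f} million base pairs")

# Ratio of gene densities, chromosome 19 versus chromosome 13
print(23 / 5)
```

The two chromosome 19 numbers imply a length of roughly 61 million base pairs, and its genes are packed about 4.6 times more densely than those on chromosome 13.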
Humans, and presumably most vertebrates, have genes not found in invertebrate animals like Drosophila and C. elegans.
These include genes encoding
- antibodies and T cell receptors for antigen (TCRs)
- the transplantation antigens of the major histocompatibility complex (MHC) (HLA, the MHC of humans)
- cell-signaling molecules including the many types of cytokines
- the molecules that participate in blood clotting.
- mediators of apoptosis. Although these proteins occur in Drosophila and C. elegans, we have a much richer assortment of them.
Both groups added to the list of human genes that have arisen by repeated duplication (e.g., by unequal crossing over) from a single precursor gene; for examples,
Both groups verified the presence of large amounts of repetitive DNA. In fact, this DNA, with similar sequences occurring over and over, is one of the main obstacles to assembling the DNA sequences in proper order.
All told, repetitive DNA probably accounts for over 50% of our total genome.
- Keep looking for genes and determining their function.
As of 2021, 19,969 protein-encoding genes have been positively identified, but the function of many is still unknown.
- Determine the human proteome; that is, the total complement of proteins we synthesize.
- Analyze how clusters of genes are coordinately expressed
- in various types of cells
- at different times in the life of a cell.
Such analysis will benefit greatly from the availability of gene chip technology and will also help us to understand how such a modest increase in gene number from Drosophila to humans could produce such a different outcome!
- Determine the genomes of other vertebrates.
This will not only help us recognize more human genes but will give us insight into what makes us unique.
Already we know that large sections of our genome have closely related homologs in the mouse.
Examples:
- The collection of genes — and even their order — on human chromosome 17 matches closely those of mouse chromosome 11. The same is true of human chromosome 20 and mouse chromosome 2.
- Humans and mice (also rats) share several hundred absolutely identical stretches of DNA extending for 200–800 base pairs.
- Some are present in the exons of genes, especially genes involved in RNA processing.
- Some are found in or near the introns of genes, especially genes encoding proteins involved in DNA transcription.
- Some are found between genes — especially those, like Pax6, essential to embryonic development — and may serve as enhancers.
To have avoided any mutations for 60 million years since humans and rodents went their separate evolutionary ways suggests that these regions perform functions absolutely essential to mammalian life.
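Finding such stretches amounts to looking for long runs of exactly matching bases in two sequences. The sketch below shows the idea on short, made-up strings using Python's standard `difflib` module. The function name and both sequences are hypothetical, and genome-scale comparisons rely on specialized alignment tools rather than this approach.

```python
from difflib import SequenceMatcher

def longest_identical_stretch(seq_a, seq_b):
    """Return the longest run of bases that is exactly identical
    in both sequences."""
    # autojunk=False stops difflib from ignoring very frequent
    # characters, which matters for a four-letter alphabet
    matcher = SequenceMatcher(None, seq_a, seq_b, autojunk=False)
    match = matcher.find_longest_match(0, len(seq_a), 0, len(seq_b))
    return seq_a[match.a:match.a + match.size]

# Made-up sequences, for illustration only
human_like = "TTGACCGTAGGCTAACGT"
mouse_like = "AACCGTAGGCTAGG"

shared = longest_identical_stretch(human_like, mouse_like)
print(shared, len(shared))
```

In the real genomes the identical runs extend for 200–800 base pairs, far longer than chance would allow after 60 million years of independent mutation.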
As for the chimpanzee, a comparison of its genome with that of humans is discussed at this link.
22 October 2023