You would think that geneticists would have a good definition of "gene". After all, genes are what we study. In introductory biology courses, you may have been introduced to the concept of the gene as the unit of heredity. That's all well and good, but when you begin to study genes at a molecular level (i.e., looking at DNA sequences), that definition ceases to be practical. The advent of DNA sequencing led to the concept of the gene as an open reading frame, and the post-genomic era has challenged the very idea of the gene.
I've previously discussed the definition of gene (What is a gene?, What is a gene? -- yes, two different posts with the same title), but I didn't get into very many details. Alas, I don't feel like spending much time laying out my opinion; suffice it to say that I think "gene" is an obsolete, overly generic term that should be replaced by a more specific term whenever possible. Luckily, the New York Times has published an article by Carl Zimmer sketching out some of the possible interpretations (Now: The Rest of the Genome). This lets me pick and choose my favorite meaning from a variety of opinions represented in Carl's piece.
A lot of Zimmer's article deals with the results from the pilot ENCODE project. One part of the project was a careful examination of which DNA sequences are transcribed into RNA. This led to some remarkable findings, including the discovery that a lot of transcripts consist of sequences encoded in different parts of the genome:
Encode's results reveal the genome to be full of genes that are deeply weird, at least by the traditional standard of what a gene is supposed to be. "These are not oddities -- these are the rule," said Thomas R. Gingeras of Cold Spring Harbor Laboratory and one of the leaders of Encode.
A single so-called gene, for example, can make more than one protein. In a process known as alternative splicing, a cell can select different combinations of exons to make different transcripts. Scientists identified the first cases of alternative splicing almost 30 years ago, but they were not sure how common it was. Several studies now show that almost all genes are being spliced. The Encode team estimates that the average protein-coding region produces 5.7 different transcripts. Different kinds of cells appear to produce different transcripts from the same gene.
Even weirder, cells often toss exons into transcripts from other genes. Those exons may come from distant locations, even from different chromosomes.
So, Dr. Gingeras argues, we can no longer think of genes as being single stretches of DNA at one physical location.
"I think it's a paradigm shift in how we think the genome is organized," Dr. Gingeras said.
Another highly touted finding from ENCODE was that the majority of the genome is transcribed. This led some people to conclude that much of the genome consists of undescribed functional elements.
These discoveries left scientists wondering just how much noncoding RNA our cells make. The early results of Encode suggest the answer is a lot. Although only 1.2 percent of the human genome encodes proteins, the Encode scientists estimate that a staggering 93 percent of the genome produces RNA transcripts.
John Mattick, an Encode team member at the University of Queensland in Australia, is confident that a lot of those transcripts do important things that scientists have yet to understand. "My bet is the vast majority of it -- I don't know whether that's 80 or 90 percent," he said.
That would mean the human genome is chock full of genes. However, just because something is transcribed does not necessarily mean that it is functional. Many sequences may be aberrantly transcribed, representing merely background noise. That is, a lot of the potential "genes" aren't really genes at all. Of all the people quoted in the article, I find myself agreeing with Ewan Birney and David Haussler the most:
Despite the importance of noncoding RNA, Dr. Birney suspects that most of the transcripts discovered by the Encode project do not actually do much of anything. "I think it's a hypothesis that has to be on the table," he said.
David Haussler, another Encode team member at the University of California, Santa Cruz, agrees with Dr. Birney. "The cell will make RNA and simply throw it away," he said.
Dr. Haussler bases his argument on evolution. If a segment of DNA encodes some essential molecule, mutations will tend to produce catastrophic damage. Natural selection will weed out most mutants. If a segment of DNA does not do much, however, it can mutate without causing any harm. Over millions of years, an essential piece of DNA will gather few mutations compared with less important ones.
Only about 4 percent of the noncoding DNA in the human genome shows signs of having experienced strong natural selection. Some of those segments may encode RNA molecules that have an important job in the cell. Some of them may contain stretches of DNA that control neighboring genes. Dr. Haussler suspects that most of the rest serve no function.
We're still left without a concrete definition of a gene, which leads me back to my original conclusion: we should simply abandon the term when dealing with anything beyond simple classical genetics. The gene is far too general, and more specific terminology is warranted in most cases. And I haven't even touched on the importance of epigenetics (i.e., heritable chromatin modifications, DNA methylation, etc.) and how that affects our definitions.
I'd be interested in your comments on this kind of work, especially the 2004 paper by Stotz, Griffiths, and Knight. These are philosophers trying to flesh out the gene concept by doing, among other things, surveys of biologists: .
For some reason, the link didn't show. It's www dot representinggenes dot org slash publications dot html.
Back when I was teaching introductory biology courses, I told the students that our working definition of genes would change as we learned more and more about them during the course. Is it that the definition has become so fuzzy that the term gene is no longer useful?
All genetic material available at any stage of the life span participates in forming the epigenetic control system of the organism - a hierarchical organization including the whole organism, organs, tissues and cells, and carrying four fundamental programs of life: development, maintenance, reproduction and death. The problem is that the physical carrier of those programs is as yet unknown and must be found by physicists, not biochemists.
That is why current genetics and epigenetics are at a dead end. See www.misaha.com
Classical geneticists now think of the gene as the single nucleotide polymorphism, the multiple nucleotide polymorphism, the copy number variant, etc. Originally, the gene was recognized by its effect on the phenotype. In my view, and in the view of many of my colleagues, it is incidental where this polymorphism resides, whether in protein-coding or non-coding DNA. We are interested in how these polymorphisms affect the phenotype, and that relates to how they affect the function of proteins, gene regulation, and other control systems. I think questions about what a gene is in the scheme of the central dogma, of one DNA, one RNA, one protein, may be postmodern angst.
I've been using the following definition for about twenty-five years.
"A gene is a DNA sequence that is transcribed to produce a functional product." [What Is a Gene?]
It seems to work pretty well.
I'd go along with Bill (comment #5), though doing this makes you focus more on genetic variants (that is, genes that have an effect on phenotype) and not those that are silent. But I don't really care if there's something that could actually be a gene and has no effect whatsoever: this is less interesting with regard to the evolutionary process (I didn't say it is not interesting at all, mind the Evolution Concept Police), except as a potentiality for further evolution (when it turns into a variant).