What is a Gene?

By evolgen on October 12, 2006.

One of the greatest developments of the post-genomic era has been the refinement of the concept of the 'gene'. The central dogma states that genes encode RNA transcripts which are translated into the amino acid sequence that makes up a protein. But protein-coding genes make up a small fraction of many genomes, so what does the rest of the genome do? Some say it's junk. Others say that it's involved in regulating the transcription of other regions. Still others say that it's transcribed, but not translated. (Note: most think it's some combination of the three.)

We're now discovering that lots of those non-protein-coding regions are actually transcribed into RNAs. Those RNAs may be transcriptional mistakes or they can be functional non-protein-coding RNA. A recent study in Drosophila melanogaster revealed that nearly 30% of the transcribed sequence has not yet been annotated as a functional transcript. Only 29% of those unannotated sequences can be explained as alternative exons of known genes. The other 71% are either unknown protein coding genes or a bunch of untranslated RNAs that need to be characterized (I'd put my money on the latter).
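To put those percentages on a common footing, here is a rough back-of-the-envelope calculation. It treats "nearly 30%" as exactly 0.30, which is an assumption, and the variable names are purely illustrative rather than anything taken from the study.

```python
# Shares quoted in the post; "nearly 30%" is rounded to 0.30 (assumption).
unannotated_share = 0.30   # fraction of transcribed sequence not yet annotated
alt_exon_share = 0.29      # fraction of the unannotated part explained as alternative exons

# The remainder of the unannotated part is still unexplained.
unexplained_share = 1 - alt_exon_share

# Express both pieces as fractions of ALL transcribed sequence.
alt_exon_of_total = unannotated_share * alt_exon_share
unexplained_of_total = unannotated_share * unexplained_share

print(f"alternative exons: {alt_exon_of_total:.1%} of transcribed sequence")
print(f"still unexplained: {unexplained_of_total:.1%} of transcribed sequence")
```

Under that rounding, roughly a fifth of all transcribed sequence would be neither annotated nor explained by alternative exons of known genes.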

In classical genetics, a gene is a region of the genome that, when mutated, produces an observable phenotype. What would you say is the post-genomic definition of a gene?

It's a bit like "what is a planet", isn't it? Interesting that astronomy has never had an adequate definition of a planet and biology lacks an adequate definition for a gene.

Problem is that you end up either with a broad definition that covers all cases but is informative (e.g. 'a unit of hereditary information') or something specific for which there's bound to be an exception. I'm sure we can devise something that works though. You'd have to consider the common processes - a template + transcription, for instance.

"Is informative" should be "is not informative" in that last comment, obviously.

For years I have been teaching my students that a gene is a segment of DNA that codes for a single RNA molecule with a complementary sequence, regardless of whether that RNA molecule is translated or not. This definition takes into account the genes for the various rRNAs and tRNAs, which are not translated, and also other forms of non-translated RNA that have recently been discovered. By this definition, genes that code for mRNAs that are actually translated are distinguished as "structural genes," using terminology that was first developed to describe the Jacob-Monod model of the lactose operon. Using this same terminology, the gene that codes for the lactose repressor protein is a "regulatory gene," insofar as the repressor does not function in an "extrinsic" biochemical pathway, but rather participates in the regulation of other structural genes.

However, the distinction between "structural" and "regulatory" genes outlined above is insufficient to describe the various kinds of genetically significant DNA sequences now known. For example, it does not include regions of the DNA to which protein regulators bind, but which are not themselves transcribed. It also does not distinguish between RNAs that are translated into proteins (either enzymes or repressor/regulator proteins) and those that are transcribed into RNA but never translated (such as rRNA, tRNA, and the newer non-translated RNAs).

Given the foregoing, it appears to me that there are four (possibly five) functionally different kinds of DNA coding sequences:

(1) translatable sequences: those DNA sequences that are both transcribed into mRNA and later translated into proteins, regardless of function (these can be further subdivided into proteins that participate in non-DNA related biochemical pathways and those that directly regulate DNA, but those seem to me to be classifications of the proteins, not the DNA sequences that code for them);

(2) transcribable sequences: those DNA sequences that are transcribed into RNA (i.e. rRNA, tRNA, etc.), but are not later translated into proteins/polypeptide chains. Again, what the RNAs do after being transcribed is not a function of the DNA, but rather of the RNAs, and therefore should not really be used to classify DNA coding sequences;

(3) binding sequences: those DNA sequences that are not transcribed into RNA nor translated into protein, but which function as binding sites for regulatory molecules such as repressor proteins, homeotic gene products, etc. While such sequences do not code for the production of a transcribed or translated gene product, they still participate in the regulation of other genes by serving as regulatory binding sites; and

(4) non-binding sequences: those DNA sequences that are not transcribed into RNA, not translated into protein, and do not function as binding sites for regulatory molecules. Such sequences would include highly repetitive sequences, tandem repeats, "spacer DNA", pseudogenes, retroviral and transposon inserts (both "dead" and potentially "alive"), etc. This latter category could be further subdivided into "functional" non-coding/non-binding DNA sequences versus "non-functional/parasitic" non-coding/non-binding DNA sequences, depending on whether they arise as part of the functional architecture of the DNA (primarily of eukaryotes), or whether they arise as side-effects of the action of parasitic genetic elements, such as retroviruses or transposons.
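The four categories above amount to a small decision procedure, which can be sketched in code. This is only an illustration: the names `SequenceClass` and `classify` and the three boolean inputs are hypothetical, and the sketch assumes the categories are checked in the order listed, so a sequence that is transcribed is never classed as a binding sequence.

```python
from enum import Enum


class SequenceClass(Enum):
    """The four kinds of DNA sequence in the scheme above (names are illustrative)."""
    TRANSLATABLE = 1   # transcribed into mRNA and translated into protein
    TRANSCRIBABLE = 2  # transcribed into RNA (rRNA, tRNA, ...) but not translated
    BINDING = 3        # not transcribed, but bound by regulatory molecules
    NON_BINDING = 4    # neither transcribed nor a regulatory binding site


def classify(transcribed: bool, translated: bool, binds_regulator: bool) -> SequenceClass:
    """Assign a sequence to one category, checking them in the order listed."""
    if translated and not transcribed:
        # Translation requires an RNA transcript, so this combination is rejected.
        raise ValueError("a sequence cannot be translated without being transcribed")
    if translated:
        return SequenceClass.TRANSLATABLE
    if transcribed:
        return SequenceClass.TRANSCRIBABLE
    if binds_regulator:
        return SequenceClass.BINDING
    return SequenceClass.NON_BINDING


# Hypothetical inputs describing a tRNA gene and an operator site.
trna_gene = classify(transcribed=True, translated=False, binds_regulator=False)
operator_site = classify(transcribed=False, translated=False, binds_regulator=True)
print(trna_gene.name, operator_site.name)
```

Note that the function classifies only on what happens to the DNA sequence itself, not on what the resulting RNA or protein goes on to do, which matches the point made in (1) and (2).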

There may be other categories of DNA sequences that have other functions, but right now I can't think of any. Therefore, this is how I intend to teach the concept of a "gene" to my students at Cornell from now on.

So much for the Beadle/Tatum "one gene, one enzyme" model, eh? And the classical Mendelian definition of "one gene, one phenotypic trait" is no longer viable either...

Check this out:

http://philsci-archive.pitt.edu/archive/00002494/

http://philsci-archive.pitt.edu/archive/00002641/

http://philsci-archive.pitt.edu/archive/00002127/

I am glad to see someone is worrying about the definition of the gene. I feel the gene is such a weak concept that it should not be treated as the basis of evolutionary theory. My approach, which I call bioepistemic evolution, is to regard data as fundamental to evolution and to define the gene in terms of data.

Thus, I have offered the following definition of the gene:

Genes are subsets of the data set defined by the nucleotide sequence of DNA. To qualify as a gene, the data subset must be so formatted that it can be interpreted by an organism into a distinct biochemical activity. An important implication of this definition is that, because biochemical activities are distinct and chemically separable from other such activities, genes may become manifest as distinct and distinguishable, biological phenotypes.

I would like to refine this definition of the gene to maximise its generality and would like to hear any critiques.

Sincerely

John Hewitt

Didn't we just hear that at least some non-gene sequences code for RNA that inhibits some gene expression? The idea is that this can fine-tune how the gene is used. It seems odd that the cell would create a protein, then use RNA to destroy it. But the idea is that this is much more responsive than other techniques.

I'm not an expert. I thought genes were protein-coding regions.

We have always had a working definition of 'planet'. Basically, anything that moves in the sky is one. There were seven, and the days of the week are named after them: the Sun, Moon, Mars, Venus, Jupiter, Mercury, and Saturn.

When the nature of these objects became apparent, the Sun got kicked out for fusion. The Earth got added when it was discovered that it also orbits the Sun. The Moon got kicked out when it was discovered that things orbit other planets.

What we need now is a lower bound and upper bound for the size of planets. The lower bound is arbitrary. So, they picked up on this idea that planets are round. The upper bound is also arbitrary. So, they picked up on this fusion thing. Any bigger, and you're a star. The rest is politics.

Moons also need a lower limit on size. Some say that Jupiter has four moons, and other junk orbiting it. Others say that Saturn has billions of moons (ring particles).
