100 years of genetic research and science journalists are still confused

If you missed it, today's NY Times Science section has been dedicated to "The Gene", a concept invented 99 years ago by Wilhelm Johannsen.

Overall, the articles were very good. However, as a scientist who wants to explain basic concepts of molecular biology to the masses, I have a few problems.

First, there is a misplacement of emphasis on how information flows from DNA to phenotype. The idea that the articles try to convey is that the old model went along these lines: DNA contains genes, each is copied into RNAs that are then translated into a certain type of protein ... and then presto, the end result is a fully formed organism. Now apparently the new model is that the DNA encodes more than genes, that it has all sorts of weird stuff, mostly noncoding RNAs, and that there is mass confusion in the biomedical sciences. There is also this thing called epigenetics (as in DNA methylation and histone modification), so our simple ideas have to be thrown out the window.

To this I say, WOT?

First of all there hasn't been a clear paradigm shift in the biomedical sciences. In fact our view has essentially remained unchanged since the 1970s. DNA encodes three different types of information.

1 - Protein-coding genes. These are the "classic" genes that get transcribed into RNA that is subsequently spliced, processed and then exported into the cytoplasm, where it is translated into proteins. These genes are highly conserved and contain ALL the information needed to make proteins. Proteins act as the tools, machines and scaffolds that are found inside and outside of the cell. They are highly versatile and have extremely complicated functions. They are modified, transported, and eventually destroyed. In any biological process, such as cell migration or cholesterol biosynthesis, proteins are the main players that determine how these activities will proceed. Most biologists out there study what proteins actually do.

2 - Genes that specify non-coding RNAs (ncRNAs). Here is where much of the hoopla has centered. These genes produce two types of RNAs: catalytic RNAs that act like molecular machines and are known as ribozymes, and non-catalytic small RNAs that modify the expression of classic genes.

This first class of RNAs, which includes ribosomal RNAs, tRNAs and snRNAs, represents the most ancient genes in all of biology. They have been known to the biological community for the past 40 years and have some of the most important activities inside the cell, such as protein synthesis and RNA splicing. Are there other catalytic RNAs? There must be, but few have been found. Protein enzymes are just better - they are smaller and much more versatile than their RNA counterparts. The RNA enzymes that are still with us are probably too central to biological function for them to be replaced. There is by far more ribosomal RNA in every cell than there is DNA or protein. We are basically huge ribosome creatures. Our cells spend most of their efforts making ribosomes and regulating ribosomal function.

So now we have the second class of genes that specify ncRNAs: the newly discovered small regulatory RNAs. Although proteins are generally more versatile, RNA does have one advantage: it is a molecule that can pair up quite easily with complementary sequences found on other RNA or DNA molecules. These 20ish-nucleotide-long RNA creatures employ this advantage to help recognize target RNA and/or DNA sequences that are then acted on by the real enzymes, proteins. So when a miRNA binds to a region of an mRNA, the RISC protein complex can then direct this mRNA to P-bodies, where the RNA is silenced. Fundamentally, miRNAs regulate how mRNAs are translated.
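To make the base-pairing idea concrete, here is a minimal Python sketch of how a small RNA could "find" a matching stretch on an mRNA. It is a toy under loud assumptions: the sequences are made up, the function names `reverse_complement` and `find_target_sites` are mine, and it demands perfect complementarity along the whole small RNA, which real miRNAs generally do not need.

```python
# Toy model of small-RNA target recognition by Watson-Crick pairing.
# All sequences here are invented for illustration only.

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Return the sequence that would pair with `rna`, read 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def find_target_sites(mirna, mrna):
    """Return start positions in `mrna` that pair perfectly with `mirna`."""
    site = reverse_complement(mirna)
    positions = []
    start = mrna.find(site)
    while start != -1:
        positions.append(start)
        start = mrna.find(site, start + 1)
    return positions


# Hypothetical 12-nucleotide small RNA and an mRNA carrying one matching site.
mirna = "UAGCAGCACGUA"
mrna = "GGGAAA" + reverse_complement(mirna) + "CCC"

sites = find_target_sites(mirna, mrna)
print(sites)
```

The small RNA only does the looking up here; in the cell, what happens to the mRNA once a site is found is the job of proteins such as the RISC complex.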

In addition to these two types of RNAs, people have noticed that there is all this non-specific RNA being transcribed off of non-conserved sequence. Most scientists believe that the content of this RNA is probably not important, although some believe that the act of transcription itself may play a role in regulating how the genome is organized.

That leads us to the last important bit of genomic code ... the one that is constantly being ignored in popular science.

3 - DNA elements that modulate how genes are transcribed into RNA and how the genome is organized. There are promoters, enhancers, silencers and many other functional DNA bits. And this is the part that irritates me the most. These elements have been known forever!!!! But journalists and certain bioinformatics specialists either ignore, downplay or are simply ignorant of their existence. Take a look at this passage from Carl Zimmer's article:

As part of the Encode project, scientists identified the location of variations in DNA that have been linked to common diseases like cancer. A third of those variations were far from any protein-coding gene. Understanding how noncoding RNA works may help scientists figure out how to use drugs to counteract genetic risks for diseases.

NO!!! The vast majority of these mutations that are away from protein-coding genes probably map to these DNA elements.

In some ways these elements are the most interesting bit of the genome, but also some of the most complicated. They are not only ill-defined; in addition, we have no good way to predict how they will influence the transcription of nearby genes into RNA. This is the true black box of the genome. What the ENCODE results suggested was that these DNA elements are highly conserved and make up a significant chunk of the genome (at the very least, the same % as the coding bits). The difference between a neuron and a liver cell is mostly due to the protein content, and this in turn is dictated mostly by how these DNA elements activate or repress the transcription of nearby genes. Sure, small ncRNAs and epigenetic mechanisms modulate gene expression, but a HUGE part of the picture lies in how these ill-defined DNA regulatory elements affect the transcription of protein-coding and non-coding genes.

Sure, it is unclear how transcription, epigenetic marks and DNA elements talk to each other to generate expression patterns, but DNA elements are a fundamental part of the puzzle, one that does not register in our public understanding of biological systems.

This is a great post, Alex. You can consider this your civic duty of the day for clearing up this confusion. I hope the author gives it a read.

I didn't get the impression from the NYT article that the old ideas of gene expression have to be thrown out, and your post seems to argue more with other science articles than this one. However, your summary was helpful.
One thing the NYT article mentions that I didn't see explained was how a sequence can encode more than one protein...


I can answer your question about how a sequence encodes for more than one protein.

There isn't a direct, 1-to-1 mapping between a DNA sequence being transcribed and the sequence of an mRNA, at least in eukaryotes. After DNA is transcribed, the pre-mRNA gets spliced; that is, sequences inside of this transcript, which are called the introns, get dropped out. The remaining RNA segments (called the exons) get combined, and then modified to make them more stable, so that they do not get chewed up by exonucleases (proteins that break nucleic acid bonds) before translation. Actually, the exons combine at the same time that the introns get dropped.

Here's a simple example; assume that the asterisks are the two intronic sequences:

5'-AAAA****GGGG****UUUU-3' -> 5'-AAAAGGGG-3' or 5'-AAAAUUUU-3'

Some coding regions have upwards of 20 introns, so you can imagine that the number of ways of combining the exons to create an mRNA for a given coding region is enormous, leading to a wide array of proteins when they get translated.
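To put a rough number on "enormous", here is a small Python sketch that enumerates splice products for the toy example above. It assumes the simplest possible model, which is my own simplification: any non-empty subset of exons may be kept, always in their original order. The function names `splice_isoforms` and `count_isoforms` are made up for illustration, and real cells produce only a small fraction of the combinations this model allows.

```python
from itertools import combinations


def splice_isoforms(exons):
    """List every mRNA made by keeping a non-empty subset of exons, in order."""
    isoforms = []
    for n_kept in range(1, len(exons) + 1):
        # combinations() preserves the original left-to-right exon order.
        for kept in combinations(exons, n_kept):
            isoforms.append("".join(kept))
    return isoforms


def count_isoforms(n_exons):
    """Upper bound under this toy model: each exon is in or out, minus 'none'."""
    return 2 ** n_exons - 1


# The three exons from the example above (introns already removed).
exons = ["AAAA", "GGGG", "UUUU"]
isoforms = splice_isoforms(exons)
print(isoforms)

# A coding region with 20 introns has 21 exons.
print(count_isoforms(21))
```

Both products from the example, AAAAGGGG and AAAAUUUU, show up in the list, and the count for 21 exons runs into the millions even though this model ignores alternative start and end sites.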

By A Rusty Butter Knife (not verified) on 11 Nov 2008 #permalink

Aren't most promoters, enhancers, and silencers quite close to the genes they regulate?


Just like Rusty wrote, alternative splicing (along with alternative transcriptional start sites and alternative transcriptional end sites) can alter the code that specifies what protein is being made, where it is going to be made and how much of it is going to be made. And yes, my gripe is a general one, and yes, in Carl's article he does mention these DNA regulatory sequences, but as I pointed out, what was overlooked was that these DNA regulatory elements probably account for 2% of your genome (to place that in perspective, that's about as much as the protein-coding part) and are a MAJOR piece of the puzzle. And here's something else for you to think about: every protein-coding gene in the human genome has a counterpart in almost every other vertebrate. That's why, on the cellular level, your cells are not that much different from a mouse's cells. We all have the same tools (i.e. proteins). Then why is it that we look so different? It probably has more to do with how many cells each organ has, how cells talk to each other, and when/where these genetic programs are activated. When and where these "tools" are made - yes, most differences are in the DNA regulatory elements that in large part determine when and where each tool is made. These elements are probably the most important difference between your genome and that of a mouse, cow or bat. Do you ever hear this concept in the articles produced by science journalism?


That is what most researchers once thought; however, it appears that in most instances DNA regulatory regions may influence the transcription of genes that are far away. Of course, "near" and "far" are relative terms.

The newest studies seem to suggest that there is some weird interaction between 1) DNA regulatory elements, 2) transcription of junk RNA, and I mean the actual process of transcription and not the RNA transcript, which is usually degraded almost immediately, and 3) epigenetic marks, mostly modifications of the proteins that hold the DNA together. It appears that all three processes deeply affect each other.

All this talk about a "crisis" in our understanding of the "gene" is overblown.


Great post which I am grateful for.

Can I, however, ask for some clarification? The first and third types of products are both proteins. The distinction you are making is that the first go out and do their work in the cytoplasm, while the third type do their work in the nucleus.

So isn't the DNA element coding gene essentially a protein coding gene?


Glad you liked it. But here's the clarification: the first class of "DNA elements" codes for proteins, hence the name "protein-coding gene". The encoded proteins can end up anywhere: the nucleus, the cytoplasm, the extracellular environment, etc. The third class of "DNA elements" does not code for protein; rather, these elements contain specific sequences that affect how genes within their "vicinity" are activated or repressed. These elements (often called promoters, enhancers, repressors or silencers) can, by virtue of their DNA sequence, directly bind a certain class of proteins (generated from protein-coding genes).

For example, in front of the galactosidase gene (class 1, from my list) there is a galactose promoter (class 3). This promoter recruits proteins called transcription factors (TFs) to the DNA. The TFs then recruit RNA polymerase to the start of the nearby galactosidase gene, which can then be transcribed to form mRNA, which is eventually translated into a protein. So the amount of galactosidase protein depends on the galactose promoter's ability to recruit the right TFs. To reiterate, the gene's activation (a class 1 DNA element or "gene") depends on a nearby DNA sequence (a class 3 DNA element).

Cheers AP,

Of course! Thanks for taking the time to answer what must have been a pretty daft question.

Appreciate it.

It's simple: journalists don't have a clue. Look at the Nov. 11 story entitled "Scientists and Philosophers Find That "Gene" Has a Multitude of Meanings".

It's no coincidence that the pic's legend says "Evelyn Fox Keller, left, a science historian, calls the language of molecular biology "historical baggage," while Eric S. Lander of the Broad Institute says he is not worried about any confusion that may arise in references to "genes.""

Well, Lander obviously has a clue. He says it explicitly and accurately: "We're trying to parse an incredibly complex system ... You shouldn't be worried about the fact that you have to layer on other things as you go along .. You can never capture something like an economy, a genome or an ecosystem with one model or one taxonomy - it all depends on the questions you want to ask".

He is right. "Gene" is an abstract concept, and abstract concepts don't exist in pure form in real life, so a 100% accurate and 100% applicable definition of it is impossible - just as for any other abstract concept like "color green" or "mountain" or "tree".

On the other hand, we have Fox Keller, a science historian and professor emeritus from Harvard. She obviously is not a scientist and it's not surprising that she has no clue: "The language is historical baggage ... it comes from the expectation that if we could find the fundamental units that make stuff happen, if we could find the atoms of biology, then we would understand the process ... but the notion of the gene as the atom of biology is very mistaken ... we have to get away from the underlying assumption of the particulate units of inheritance that we seem so attached to".

I am thinking that she is simply not aware of the fact that no serious biologist has thought of genes as particulate units of inheritance for a very, very long time... (Not to mention that she probably never pondered the question of whether plasma consists of atoms and, if not, whether that means that matter is not built of atoms either).

Further illustrating the point is NYT's graphic "A Bestiary of RNA". Note that it depicts a left-handed double-helix DNA (a favorite peeve of mine). It means that not a single person involved in the production has a clue, or enough attention span, to realize that a helix can turn two ways and that DNA normally turns only one of them. (Even funnier is that the graphic now comes with a correction that fixes the labels of the sense and antisense strands, which are totally arbitrary in a strict sense, but it still shows left-handed DNA.)

Surely someone ought to point out that Evelyn Fox Keller has PhDs in both physics and molecular biology, and she's currently a professor (history and philosophy of science) at MIT. While her professional career has been oriented towards the history/philosophy side, I doubt that she is uninformed about the science, and especially the history of it.

Wikipedia article