Two Big Genetics Studies

Two big studies on genetics came out in the past couple weeks, and I want to talk about both. One of them -- the ENCODE study -- was well covered by the media. The other seems to have slipped through.

Paper #1:

In the ENCODE study, the authors compiled data using a variety of experimental techniques focusing on a small portion (about 1%) of the human genome. Their purpose here was to go deep; they wanted to thoroughly catalog, within their target regions, all the transcriptional elements, all the resulting RNA sequences, all the histone and chromatin modifications, and all of the intronic and intergenic sequences.

The conventional view of the genome is that of genes sitting as islands surrounded by what had appeared to be unused genetic material. Basically, they wanted to answer the following questions: What is the purpose of this unused genetic material? Is it really unused? Further, how is the transcription of the known genes regulated?

When they were truly thorough, they found some astonishing things, some of which we knew but did not have much evidence to support, and some of which we did not know.

Here are their key findings:

  • The human genome is pervasively transcribed, such that the majority of its bases are associated with at least one primary transcript and many transcripts link distal regions to established protein-coding loci.
  • Many novel non-protein-coding transcripts have been identified, with many of these overlapping protein-coding loci and others located in regions of the genome previously thought to be transcriptionally silent.
  • Numerous previously unrecognized transcription start sites have been identified, many of which show chromatin structure and sequence-specific protein-binding properties similar to well-understood promoters.
  • Regulatory sequences that surround transcription start sites are symmetrically distributed, with no bias towards upstream regions.
  • Chromatin accessibility and histone modification patterns are highly predictive of both the presence and activity of transcription start sites.
  • Distal DNaseI hypersensitive sites have characteristic histone modification patterns that reliably distinguish them from promoters; some of these distal sites show marks consistent with insulator function.
  • DNA replication timing is correlated with chromatin structure.
  • A total of 5% of the bases in the genome can be confidently identified as being under evolutionary constraint in mammals; for approximately 60% of these constrained bases, there is evidence of function on the basis of the results of the experimental assays performed to date.
  • Although there is general overlap between genomic regions identified as functional by experimental assays and those under evolutionary constraint, not all bases within these experimentally defined regions show evidence of constraint.
  • Different functional elements vary greatly in their sequence variability across the human population and in their likelihood of residing within a structurally variable region of the genome.
  • Surprisingly, many functional elements are seemingly unconstrained across mammalian evolution. This suggests the possibility of a large pool of neutral elements that are biochemically active but provide no specific benefit to the organism. This pool may serve as a 'warehouse' for natural selection, potentially acting as the source of lineage-specific elements and functionally conserved but non-orthologous elements between species.
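
To put the conservation numbers in the list above side by side, here is a small back-of-the-envelope sketch. It uses only the two percentages quoted in the findings (5% of bases under constraint, and roughly 60% of those with experimental evidence of function); the variable names are mine, and the paper's "approximately 60%" is treated as exact purely for illustration.

```python
from fractions import Fraction

# Figures quoted in the key findings; "approximately 60%" is treated as exact here.
constrained = Fraction(5, 100)      # share of bases under evolutionary constraint
with_evidence = Fraction(60, 100)   # share of constrained bases with assay evidence

# Constrained bases that the experimental assays can already account for.
constrained_and_supported = constrained * with_evidence

# Constrained bases with no experimental evidence of function so far.
constrained_unexplained = constrained * (1 - with_evidence)

print(f"constrained, with evidence of function: {float(constrained_and_supported):.0%}")
print(f"constrained, no evidence of function yet: {float(constrained_unexplained):.0%}")
```

In other words, about 3% of bases are both constrained and experimentally supported, and about 2% are constrained with no assay evidence so far.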

Where to even begin?

To start, this bit about "pervasive transcription" -- for many years they taught us in genetics classes that the intervening sequences between genes were junk. Where they were not regulatory regions, they served no particular purpose. Over the last ten years, we have learned that this statement is astonishingly untrue.

First, as this study shows, a lot of those regions are actually made into RNA. What that RNA does in all cases we don't know, but a reasonable argument could be made that it is turned into siRNA and used for the regulation of other genes. Second, we thought that the regulatory regions associated with genes were relatively sparse and sat before the genes they regulated. It turns out from this study that they are all over the place, and they are even found after the gene whose transcription they regulate.

Now this bit about chromatin regulation we sort of already knew, but it does really reiterate the point. The DNA in our cells is associated with proteins called histones. Together, the DNA and histones form a substance called chromatin. We know that there is a veritable army of proteins that modify chromatin with various tags such as methylation, phosphorylation and acetylation. We know from this study and others that it is these modifications that partially regulate transcription by regulating the binding of transcription factors at promoters and other transcriptional elements.

Finally, the real surprise from this study was about sequence conservation. By comparing these sequences with the corresponding sequences in related species, these researchers show that relatively little of the intergenic sequence is evolutionarily conserved. This is particularly relevant when we consider the transcriptional binding elements, because many of these do not appear to be constrained either -- implying that they are largely neutral for selection. As the article mentions, these could become constrained and non-neutral for selection randomly, thus acting as a reservoir for selection. (For a more studied look at the evolutionary implications of this article, read this post on Gene Expression.)

All interesting stuff.

Basically, my take-home from this paper is that genes are not the only interesting things happening in DNA. Because there are so many levels of regulation in the non-transcribed elements, they are really where the action is. Regulation can change very rapidly in evolution too, because a single base change could alter whether a gene is expressed or not in a particular tissue. By changing whether a gene is expressed and when, you can in some cases radically alter the form or physiology of the animal involved.

(There is a pretty good article about the paper and the implications of RNA research in the Economist.)

Paper #2:

The Wellcome Trust Case Control Consortium just came out with a huge paper looking for the genetic markers of common diseases such as bipolar disorder and Crohn's disease. They performed genome-wide scans of single nucleotide polymorphisms (SNPs) in 14,000 patients and 3,000 controls in a British population and associated them with whether or not the people had a particular disease. The diseases they looked at were bipolar disorder, coronary artery disease, Crohn's disease, hypertension, rheumatoid arthritis, and type 1 and type 2 diabetes.

The study identified 58 separate alleles that confer a relative risk for having the diseases listed. Now, the exact alleles and techniques are not that relevant, but what is relevant is that for each of the identified alleles the relative risk is not that large. This suggests that while single alleles do not make a huge difference, having a bunch of risk alleles is a problem. Furthermore, combinations of alleles may confer even greater risk. (We know that this is true from schizophrenia, where a risk haplotype has been identified that is the combination of five alleles.)
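
To make the "many small risks add up" point concrete, here is a minimal sketch with made-up numbers. Nothing in it comes from the paper: the counts, the function names, and above all the assumption that risk alleles combine multiplicatively and independently are mine, for illustration only. Note also that a case-control design like this one estimates an odds ratio, which only approximates relative risk when the disease is uncommon.

```python
from math import prod


def odds_ratio(case_carriers, case_noncarriers, control_carriers, control_noncarriers):
    """Odds ratio for carrying an allele, from a 2x2 case-control table."""
    odds_in_cases = case_carriers / case_noncarriers
    odds_in_controls = control_carriers / control_noncarriers
    return odds_in_cases / odds_in_controls


def combined_risk(per_allele_ratios):
    """Combined ratio under an ASSUMED independent, multiplicative model."""
    return prod(per_allele_ratios)


# Hypothetical counts for one SNP: 2,000 cases and 3,000 controls.
single = odds_ratio(case_carriers=660, case_noncarriers=1340,
                    control_carriers=900, control_noncarriers=2100)

# Five hypothetical risk alleles, each with a modest ratio of 1.2.
combined = combined_risk([1.2] * 5)

print(f"one allele: {single:.2f}")
print(f"five modest alleles together: {combined:.2f}")
```

Under these assumptions, five alleles that each raise risk by only 20% would together raise it roughly two and a half times. Real alleles need not combine this neatly, which is exactly why the haplotype-style interactions mentioned above matter.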

Look for more of these studies -- massive genome-wide scans of the individual variations between people and their relationship to disease. Understanding these variations is now critical for using the human genome to develop individual treatments.


Thank you for this interesting post. I hadn't seen the Economist article before today and had completely missed the points it makes about RNA in any previous online discussions I'd read about the Encode paper. Are we really about to see "Lamarckism" in some sense being taken seriously again or am I reading too much into the final paragraphs? I'm an interested layperson, not a scientist so I'd welcome a steer on how major a development this all is.

Both studies are very big deals for the same reason: understanding DNA is a bigger challenge than conventional wisdom would lead you to believe. The ENCODE study then suggests that RNA plays a bigger role than even us RNA guys understood. Thus, all this work regarding understanding an individual's DNA and how that leads to disease, personalized medicine, and all that is less interesting.

By jim novakoff (not verified) on 21 Jun 2007 #permalink

Clart, I would not jump right into "Lamarckism" (instead of "Darwinism") just yet. Having been brought up in Hungary, I have a general distaste for dogmatic "-isms"...

However, IMHO the present paradigm shift from the discarded "gene/junk" dogma into what we call the PostModern era, "Genomics beyond Genes" (PostGenetics), is absolutely undeniable; entire glossaries and even some "sacrosanct axioms" ("Dogmatic Darwinism" included) are seriously in question.

On the "JunkDNA portal" (I am keeping the moniker, while putting Junk DNA *as a scientific term* to rest), there is already a major study, featuring experimental results that smoking affects DNA (in the sperm) - which is more "Lamarkian" than "Darwinian". (Smoking is more environmental than random - even "second hand smoking" is not random).

I agree that both studies "are very big deals"; one discards the gene/junk dogma, and the other puts questionmarks of "single cell deviations" (SNPs) in themselves.

The Economist is probably right that Isidore Rigoutsos' 2006 finding of "short repetitive sequences" (he calls them pyknons, I found among them fractal structures, see ) may be "the way to go".

Keep an eye on exciting developments,

Ni Hao! Nice sum of papers, poor bandwagon interpretation. Studies from largely empirical, non-hypothesis driven data generation industry that science has become should also be noted here. Lot of disease-related alleles are in non-coding, random places in the genome. In respect to RNA fascination, it's mostly junk. Except for what makes protein. Effect with which there is such fascination is random perturbation of plastic, metastable biological system. Paradigm shift is better described as rediscovery of wheel. As Mr. Twain said: There is something fascinating about science. One gets such wholesale returns of conjecture out of such a trifling investment of fact. MOTYR

By Mouth of the Y… (not verified) on 23 Jun 2007 #permalink