On plausible alternative hypotheses

Nic Wade says something very strange in his most recent article on whole genome sequencing in reference to the outcomes of genome-wide association studies:

"The results of this costly international exercise have been disappointing. About 2,000 sites on the human genome have been statistically linked with various diseases, but in many cases the sites are not inside working genes, suggesting there may be some conceptual flaw in the statistics."
Erm... or maybe many common variants affecting the risk of complex diseases simply aren't found in protein-coding regions? That's the (biologically entirely plausible) hypothesis that most complex disease geneticists are working under right now.
I'm guessing the statement is an oblique reference to the recent synthetic association paper from David Goldstein's group? If so, it's worth noting that the claims made in that paper are seriously contentious among others in the field.
I'm certainly not alone in my puzzlement here; in a comment on a previous post, p-ter also gives Wade's statement a hearty wtf.

Do you really think he's "usually" a reliable writer on genetics? I can't think of a single article he's written in the last year or two that hasn't included gross generalizations/misinterpretations/inaccuracies such as the one you quoted above... for example, his breathless, uncritical writeup of Goldstein's recent synthetic association presented the paper as a brand new novel concept that shows how "mainstream" human geneticists studying GWAS are wasting money and time when essentially the only message of the paper was that... LD exists.

@J--I agree, wholeheartedly.

Does anyone else find it strange that the maximum sample size that Goldstein considered was 6000? One would think that if he truly wanted to scientifically test his hypothesis that the "missing heritability" is due to rare variants (and I'm sure that at least some of it is), he would test competing hypotheses effectively...like the role of epistasis. Admittedly, studying epistasis in human populations is prohibitively expensive. But it seems premature to exclude epistasis as a potentially strong contributor to common human disease, particularly given strong evidence for its role in animal models of common complex diseases such as obesity.

Piling on, but: If one were to construct a metric estimating the ratio of quality to prestige (quality meaning scientific accuracy, proper framing of the story, etc., and prestige meaning the profile of the publication for which one writes), I suspect Wade's q/p value would be near the bottom in a ranking of science writers. I am not alone among geneticists in this sentiment.

this was the line I coughed on:

"The finding implies that common diseases, surprisingly, are caused by rare, not common, mutations."

but that's not true, either - most common diseases aren't genetic in origin, no matter how sophisticated your sequencing technology. The origin of common diseases like heart disease and cancer most often lies in environmental causes - read, behaviors & exposures - not genetic ones.

DNA matters, but the idea that somehow science will be able to spelunk its way to determine genetic causes of everything that ails us is not only incorrect, but perpetuates the idea that our health is preordained by our DNA. And that's simply not the case.

Does this Nic Wade dude have any genetics background? I can think of at least three reasons why common variants are found outside of a gene:

1) The variant is located on non-coding regulatory elements.
2) There exists strong linkage disequilibrium between that variant and the yet unknown functional variant within the gene.
3) Genome evolution specific to a population that causes high correlation between that variant and the yet unknown functional variant.

By Geneticist fro… (not verified) on 11 Mar 2010

Thomas is correct - this stuff about "genetic causes" of complex diseases is distracting and misleading. (Wade also wrote: "More common diseases, like cancer, are thought to be caused by mutations in several genes".) Not a few people predicted that GWAS results would be weak if they looked just at genetics plus an (often poorly defined) disease end point and ignored the cause. Pale skin is NOT the cause of melanoma. And which are the genes that cause lung cancer? Of course skin colour provides amazingly useful information about susceptibility and prevention, but the genes don't cause the disease.

I don't think, though, that GWAS have been useless. I think they have been useful, and that they are/were a necessary "small" step on the way to uncovering really useful information about gene-environment interactions; that's for the present decade to sort out, please. I also think that the many-rare-variants hypothesis is possible and will perhaps account for some cases, but to go head first into massive sequencing studies is to use the blunt side of Occam's razor.

As for the article which is the subject of the post - let's just go with wtf

Can't tell you how delighted I was to see your opinion here; that exact same comment hit me in exactly the same way - WTF??

Are we then moving back to the "junk DNA" insanity??

Glad to see so many are not.

The people piling on here show the worst in crowd behavior. Nick Wade makes one slightly strange (not wrong, strange) statement and suddenly he's the worst rather than the best science journalist at the Times?

What about Natalie "race doesn't exist" Angier? Anyone who's worked with AIMs knows she was completely full of it.

What about the editors who decided to print an uncritical review of Richard Nisbett's pabulum on intelligence testing (contradicted by his very own publications)?

If you're invested in common variant, common disease, of course you don't want Wade to show you up. His statement was AT LEAST half right -- there WAS a serious error in the statistics, which no one can deny -- so the opprobrium here is way out of line.

As for the other half, you can easily interpret his statement as saying

"Exome sequencing of Mendelians is finding strong hits with samples of just a few. Mapping chips with lots of variants in noncoding regions found basically no hits on large samples."

In other words, a serious flaw in the statistics led to us spending our base pair budget on coding rather than noncoding regions.

By realistic (not verified) on 12 Mar 2010

Oh yeah and for those who may contend that "well, exome sequencing and mapping chips are based on different technologies, so the concept of a base pair budget is only there in the abstract"

Even that is not true. It was well within our capabilities to build resequencing arrays for 1MB of exonic sequence rather than for 1 million common SNPs.

In other words, the common variant/common disease guys have much to answer for and of course they are pissed that Wade is calling them to account. Goldstein may not have the right answers but he sure does have the right questions. The CV/CD guys didn't even have that, and about 5 years of medical research was spent suboptimally (not entirely down the drain, but suboptimally).

By realistic (not verified) on 12 Mar 2010

Over the past 2 years, Wade/NYT have made a series of confident assertions about complex trait genetics. Most such articles are liberally sprinkled with quotes from Wade's go-to guy, Goldstein/Duke. Does it strike you that they've done violence to this rapidly moving area? How can you take a highly complex area and reduce it to the sound-bite level without losing all the caveats and details? And, why aren't we hearing from people with contrary views?

Have you read the Dickson..Goldstein PLoS Biology paper? It is mostly an in silico simulation study that used a genetic model that has a number of important limitations and even some potentially damning flaws. The authors are a bit liberal with their interpretation and should have referred to the ample literature on the topic, but it is a solid contribution. However, the main conclusion is pretty obvious, "LD exists" (#2 above).

The "wtf" aspects of Dickson..Goldstein are: they put out press releases for a stat gen in silico study, and two journalists from high-end outlets (NYT and Nature News) were somehow convinced that these were real and trustworthy results.

ps - thumbs up. The damage from semi-informed cheerleading is, in my opinion, responsible for a substantial portion of the current general move away from science.

Regular people are taking note - "we were promised jetpacks" is a standard refrain, besides being a popular rock group. We were also promised a cure for cancer, and power too cheap to meter - all 100% safe (so why are there 5 kids in the neighborhood with leukemia, and why are they recalling all the hydrolyzed vegetable protein?).

The number of disappointments to the public- based on "whoo hoo it's a breakthrough!" reporting is huge- far overwhelming the number of real successes.

They've noticed; and they're ticked off. The consequences may be- severe.

"Exome sequencing of Mendelians is finding strong hits with samples of just a few. Mapping chips with lots of variants in noncoding regions found basically no hits on large samples."

In other words, a serious flaw in the statistics led to us spending our base pair budget on coding rather than noncoding regions.[sic. i think you're arguing the reverse?]

except this is not true. there are *lots* of hits for lots of diseases, they just don't explain all of the variance in the trait (I think for crohn's disease, common, often non-coding variants explain about 20% of the variance, or ~40% of the heritability, in disease risk. this is not perfect but is non-negligible; see my link in the first comment).
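For anyone who wants the arithmetic behind those two percentages spelled out, here is a minimal Python sketch. It takes the rough figures quoted above at face value (they are recollections, not values checked against a paper), and the function name is made up purely for illustration.

```python
def implied_heritability(variance_explained, fraction_of_heritability):
    """Total heritability implied when a set of variants explains
    `variance_explained` of the trait variance and that amount is
    `fraction_of_heritability` of the heritable part."""
    return variance_explained / fraction_of_heritability


# Rough figures from the comment above (assumed, not verified).
variance_explained = 0.20        # share of variance in disease risk
fraction_of_heritability = 0.40  # share of the heritability

h2 = implied_heritability(variance_explained, fraction_of_heritability)
still_unexplained = h2 - variance_explained

print(f"implied heritability: {h2:.2f}")
print(f"heritable variance not yet explained: {still_unexplained:.2f}")
```

The gap between the two printed numbers is what the "missing heritability" argument is about; it says nothing on its own about whether rare variants, epistasis or something else fills it.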

OK - I'm not a scientist, but I find the "debate" (if that's the right word) about the relative importance of CNVs vs. SNPs in common complex diseases fascinating. Can anyone suggest an article or other source that gives an objective assessment of the relative merits of this "debate." (Yes, there's a lot of work that needs to be done to provide actual data supporting CNVs, etc. - but I'm curious about how the scientific community currently assesses these issues!)

Hi dbbl,

I'd recommend this recent article in Nature, the most definitive characterisation of CNVs to date, which suggests that common CNVs contribute very little to the risk of common, complex diseases. (Disclaimer: I'm an author.)


"Exome sequencing of Mendelians is finding strong hits with samples of just a few. Mapping chips with lots of variants in noncoding regions found basically no hits on large samples."

In other words, a serious flaw in the statistics led to us spending our base pair budget on coding rather than noncoding regions.

[And I agree with p-ter - I think you're trying to argue the reverse]

And the relative frequency of Mendelian diseases vs common complex diseases is what?

Don't get me wrong - it's good to clear up the sources of (rare) Mendelian diseases, as it can lead to life-changing diagnostics and treatment.

However, with regard to common disease, I don't see any statistical flaw - researchers did what they could with the budget and technology available, in the diseases for which funding was to be had.

Common disease/common variant was the only game in town, and that has only really come of age since GWAS.

Even with next-gen sequencing, there isn't a great deal of point in looking at the exomes of people with common disease - all you'll do is find they are different in noisy ways.

Instead, the big consortia are doing targeted [1] resequencing of pooled [2] samples in the hope of finding loss- or gain-of-function [3] variants in the 1-3% allele frequency range. So still not rare [4] variants.


[1] "targeted" means exomes, or associated regions from GWAS
[2] "pooled" means mixing 10-25 case or control samples together
[3] "functional" variants mean regulatory elements, as well as nsSNPs
[4] "rare" means less than 1%

Detecting genuinely rare variants against the noise is quite hard. In sequencing a few people with Mendelian diseases, of course, you're not detecting rare SNPs, but common SNPs that happen not to be present in other people.
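To give a feel for why pooling makes this hard, here is a rough Python sketch. It assumes exactly one heterozygous carrier per pool, equal representation of every sample in the pool, and simple binomial read sampling. The read depth, error rate and calling threshold are hypothetical numbers picked for illustration, not values from any consortium protocol.

```python
from math import comb


def binom_tail(n, p, k):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))


def carrier_read_fraction(pool_size):
    """Expected fraction of reads showing the variant when exactly one
    diploid individual in the pool is heterozygous for it."""
    return 1 / (2 * pool_size)


depth = 200        # hypothetical read depth at the site
error_rate = 0.01  # hypothetical rate at which errors mimic the variant allele
min_reads = 5      # hypothetical number of variant reads needed to make a call

for pool_size in (10, 25):
    frac = carrier_read_fraction(pool_size)
    p_detect = binom_tail(depth, frac, min_reads)       # a real carrier is called
    p_false = binom_tail(depth, error_rate, min_reads)  # errors alone trigger a call
    print(f"pool of {pool_size}: variant fraction {frac:.3f}, "
          f"P(call | carrier) = {p_detect:.3f}, P(call | no carrier) = {p_false:.3f}")
```

With a pool of 25 a single carrier contributes only 1 read in 50, which is not far from the assumed error rate, so raising the threshold to suppress false calls also throws away real variants.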