Common disease-common variant hypothesis taken down a notch (again)

By razib on January 25, 2010.

David Goldstein, a geneticist at Duke, has critiqued the current focus on large-scale genome-wide association studies before. Now he is taking it to the next step: his group has a paper out which suggests that association studies have been relatively unfruitful in terms of bang-for-buck because they're picking up "synthetic associations." Rare Variants Create Synthetic Genome-Wide Associations:

It has long been assumed that common genetic variants of modest effect make an important contribution to common human diseases, such as most forms of cardiovascular disease, asthma, and neuropsychiatric disease. Genome-wide scans evaluating the role of common variation have now been completed for all common disease using technology that claims to capture greater than 90% of common variants in major human populations. Surprisingly, the proportion of variation explained by common variation appears to be very modest, and moreover, there are very few examples of the actual variant being identified. At the same time, rare variants have been found with very large effects. Now it is demonstrated in a simulation study that even those signals that have been detected for common variants could, in principle, come from the effect of rare ones. This has important implications for our understanding of the genetic architecture of human disease and in the design of future studies to detect causal genetic variants.

The conclusion in the discussion elaborates on the relevance:

... Under our model, the causal sites are both rare and relatively high-penetrant contributors to disease, and will therefore be unlikely to be detected in a small number of control samples. Finally, the focus of attention on genes that are near GWAS signals may be incomplete or misleading in that the actual causal sites may occur in many different genes surrounding the implicated common variant. It is also worth emphasizing that as few as one or two rare variants, at much lower frequency than the associated common SNP, can create a significant synthetic association. In such a case, sequencing a small number of cases that carry the "at risk" common variant might miss entirely the causal rare variants even if the correct genome region is resequenced. These considerations argue for caution in efforts to resequence around genome-wide associations and argue instead that genome-wide sequencing in carefully phenotyped cohorts might be a better use of resources.
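To make the idea of a synthetic association concrete, here is a minimal sketch of my own, not the simulation from the paper. It assumes a single rare causal variant that happens to sit only on haplotypes carrying a common marker allele, and every number in it (allele frequencies, risks, sample size) is a made-up illustration. The marker itself does nothing to disease risk in this toy model; any case-control signal at the marker comes entirely from the rare variant hitchhiking on it.

```python
import random

def simulate(n_people=20000, common_freq=0.3, rare_freq=0.01,
             base_risk=0.05, carrier_risk=0.5, seed=1):
    """Count common-marker alleles in cases and controls.

    All parameter values are hypothetical. The rare causal variant is
    assumed to occur only on haplotypes that carry the common marker allele.
    """
    rng = random.Random(seed)
    # Chance that a marker-bearing haplotype also carries the rare variant
    rare_given_common = rare_freq / common_freq
    counts = {"case": [0, 0], "control": [0, 0]}  # [marker allele, other allele]
    for _ in range(n_people):
        marker = 0
        carrier = False
        for _hap in range(2):  # two haplotypes per person
            if rng.random() < common_freq:
                marker += 1
                if rng.random() < rare_given_common:
                    carrier = True
        # Disease risk depends only on the rare variant, never on the marker
        risk = carrier_risk if carrier else base_risk
        status = "case" if rng.random() < risk else "control"
        counts[status][0] += marker
        counts[status][1] += 2 - marker
    return counts

def allelic_chi_square(counts):
    """Pearson chi-square (1 df) on the 2x2 table of allele counts."""
    a, b = counts["case"]
    c, d = counts["control"]
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# Rare variant is strongly penetrant: the common marker picks up a signal
synthetic = allelic_chi_square(simulate())
# Same genotypes, but the rare variant has no effect: no real signal expected
null = allelic_chi_square(simulate(carrier_risk=0.05))
print("chi-square at marker, causal rare variant:", round(synthetic, 1))
print("chi-square at marker, no causal variant:  ", round(null, 1))
```

For reference, 3.84 is the usual 5% cutoff for a chi-square statistic with one degree of freedom. The allelic test here treats the two alleles of each person as independent, which is a simplification, and a real GWAS would use a far stricter genome-wide threshold.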

PLoS thought that this paper was important enough to commission an accompanying article, Common Disease, Multiple Rare (and Distant) Variants:

The consequence, the authors suggest, is that sequencing near the SNP to find "the" causative gene will often be fruitless, and many causative genes will be missed if that is the only approach taken.
The alternative, whole-genome sequencing, is becoming increasingly practical, and offers the possibility of finding any variant, no matter how far away. But how will it be possible to pick out the needle of a causative variant in the haystack of genomic variability, if it is no longer right next to the signpost? Under the assumption that the variant exerts only a weak effect, it probably wouldn't be. Weak effects are thought to be due to subtle changes that still retain functionality of the encoded protein, like a dimmer switch on a light bulb. The genome is loaded with these kinds of variants, and most of them won't be involved in the disease.

But the weak-effect assumption may be wrong as well, since it rests on the assumption that the variant is common. If instead the variant is rare, its effect could be strong--not just contributing to the disease, but causing it--more like an on-off switch, but one that only a few people have. In that case, the sought-after variant is likely to be a classic kind of mutation--a nonsense sequence, for example--that is easy to find.

If this model is correct, it suggests that a SNP association in a GWA study may be pointing not at one gene, but lots of them; that these genes are likely to have stronger and perhaps easier-to-understand effects than presumed; and that finding these genes is likely to be simpler than has been the case so far. If the authors are right that some of the signals are synthetic, GWA results may be of particular value in interpreting the results of whole-genome sequencing studies. Focussing attention on regions of the genome that show GWA signals may help to identify likely the causal variants amongst the millions of variants identified in any sequencing study.

You probably know that the genetics of height and intelligence have not been revolutionized by genomic techniques. One assumption is that the effect sizes of height & IQ QTLs are too small to be detected by GWAS. But on the other hand, to my knowledge these quantitative traits haven't been elucidated very well by family-based linkage studies either, which should pick up rare QTLs of large effect. In the Goldstein paper he points out that GWAS results haven't been easily replicated across populations, and argues that the reason is that very rare alleles of large effect size are unlikely to be shared across geographic locales.

It seems that what is being argued here is that the genetic architecture of many traits of interest is going to be in a blind spot, where the QTLs are more common and of lower effect size and penetrance than can be detected by linkage studies, but are rarer than can be picked up usefully by GWAS of a few hundred individuals. The idea that quantitative traits like height and intelligence, or diseases such as schizophrenia, might be controlled by fewer large effect QTLs is appealing in some ways because it can explain more easily the variance across siblings in very heritable traits. The smaller the number of relevant QTLs, the greater the expected sample variance. On the other hand, it is notable that one of Goldstein's test cases within the paper is sickle-cell disease, which is one of the few undisputed cases of heterozygote advantage in human genetics. These associations don't exist unperturbed by other evolutionary dynamics.

In any case, here's a quote from Goldstein:

"This tells us that we will surely need to turn to more comprehensive whole genome sequencing studies of more carefully selected subjects if we want to discover more meaningful relationships between genetic variation and disease," says Goldstein. "While such studies are undoubtedly more complex, expensive and time-consuming, we really have no choice if we want to deepen our knowledge about the genetic underpinnings of human disease."

Citation:

Samuel P. Dickson, Kai Wang, Ian Krantz, Hakon Hakonarson, David B. Goldstein. Rare Variants Create Synthetic Genome-Wide Associations. PLoS Biology, 2010; 8 (1): e1000294 DOI: 10.1371/journal.pbio.1000294

Richard Robinson. Common Disease, Multiple Rare (and Distant) Variants. PLoS Biology, 2010; 8 (1): e1000293 DOI: 10.1371/journal.pbio.1000293


Excellent post!
Thank you for linking to those articles, it's exactly what I needed right now :)

The idea that rare variants with large effect may be responsible for some sites of weak association with diseases.

On the other hand, height and IQ are not diseases, and the observation of the phenotypes makes them appear to vary smoothly across populations. These may be strong cases of many genes/alleles with moderate contributions to the phenotype.

mike, sure. but i've read it's hard to tell beyond 10 QTLs how many QTLs there are from the smoothness of a trait's distribution.
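A rough way to see why smoothness stops being informative is to compute the exact distribution of a purely additive genetic score. The sketch below is an idealization with assumptions that are mine, not from the paper or the comment above: unlinked biallelic loci of equal effect, allele frequency 0.5, and no environmental noise. Under those assumptions the score is binomial, and its excess kurtosis (zero for a normal curve) works out to -1/n for n loci, so the shape is already close to normal at around ten loci.

```python
import math

def genotype_score_pmf(n_loci, p=0.5):
    """Exact distribution of the count of 'plus' alleles in a diploid.

    Assumes n_loci unlinked biallelic loci with equal additive effects and
    the same allele frequency p (an idealized, hypothetical architecture).
    """
    m = 2 * n_loci  # two allele copies per locus
    return [math.comb(m, k) * p**k * (1 - p) ** (m - k) for k in range(m + 1)]

def excess_kurtosis(pmf):
    """Fourth standardized moment minus 3; a normal distribution gives 0."""
    mean = sum(k * w for k, w in enumerate(pmf))
    var = sum((k - mean) ** 2 * w for k, w in enumerate(pmf))
    m4 = sum((k - mean) ** 4 * w for k, w in enumerate(pmf))
    return m4 / var**2 - 3.0

def max_gap_from_normal(pmf):
    """Largest gap between the pmf and a normal density with the same mean and variance."""
    mean = sum(k * w for k, w in enumerate(pmf))
    var = sum((k - mean) ** 2 * w for k, w in enumerate(pmf))
    def normal_density(x):
        return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)
    return max(abs(w - normal_density(k)) for k, w in enumerate(pmf))

for n_loci in (1, 2, 5, 10, 50, 100):
    pmf = genotype_score_pmf(n_loci)
    print(n_loci, round(excess_kurtosis(pmf), 4), round(max_gap_from_normal(pmf), 5))
```

Any environmental noise layered on top would blur the distribution further, so in real data the difference between 10 and 100 loci should be even harder to see than this toy calculation suggests.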

Many diseases are likely the effects of smoothly continuous traits. As Falconer pointed out, for most cases the trait is not the disease, but the likelihood of developing it (perhaps interacting with other genetic or environmental factors). Also, just because we have defined a cluster of symptoms as a (singular) disease does not mean that there are not multiple etiologies, another confound for GWAS. Think autism, which is likely a result of many possible perturbations of development that result in the same effect on circuit architecture. I always feel like GWAS rest on naively simplistic assumptions about the relationships between genotype and phenotype.

The best justification for why we should actually be surprised by "common disease common variant" was in a talk Andy Clark gave not long ago about super deep sampling (1,000s of individuals) of human variation. Anyway, one upshot was that with the relatively recent human population expansion from the ancestral size to the current ~6.5 billion (or whatever), there is a fairly large tree that is very sparse for most of its history and crazily bushy only very recently. If you're thinking from a coalescent perspective and randomly throwing down mutations, most of those mutations will land in the bushy area and only a few will land in the sparse older branches.

So, most variants will be low frequency and will have a very recent coalescence time. In order to have a lot of "common disease/common variant" examples, you'd have to imagine a very orderly disease (disease A is caused by amino acid i being substituted for amino acid j at position k in peptide X, and in no other way, for one cartoon example). And even so, it would only be identical by state, not ibd. Perhaps this could be caused by hypermutable sites that cause a lot of convergence or whatever.

Anyway, in reality, we are stuck with tons of diseases like thalassemia, which has tons of different mutations that can cause similar symptoms, but some versions are worse than others, depending largely on the nature of the actual indel or substitution that causes it. And many mutations are unique to individuals, as you'd predict if you were thinking about how demography influences when and where segregating sites were dropped down on the coalescent tree.
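The coalescent argument above can be sketched in a few lines. This is a toy simulation under assumptions of my own choosing: a neutral coalescent for a small sample, exponential growth with an arbitrary rate, and mutations falling uniformly along branches, so that the share of variants seen in exactly one sampled copy equals the share of tree length on single-descendant branches. The sample size, growth rate, and function names are all hypothetical; none of it comes from the talk mentioned above.

```python
import math
import random

def branch_length_by_class(n, growth, rng):
    """Total branch length sitting above exactly i sampled copies, for one tree.

    Time runs backward, scaled so a pair of lineages coalesces at rate 1 at
    the present. Population size shrinks into the past as exp(-growth * t);
    growth = 0 means constant size.
    """
    lineages = [1] * n           # sampled descendants under each lineage
    lengths = [0.0] * (n + 1)    # lengths[i]: branch length above i copies
    t = 0.0
    while len(lineages) > 1:
        k = len(lineages)
        pairs = k * (k - 1) / 2
        e = rng.expovariate(1.0)
        if growth == 0:
            wait = e / pairs
        else:
            # Waiting time when the coalescence rate rises as exp(growth * t)
            wait = math.log1p(growth * e * math.exp(-growth * t) / pairs) / growth
        for size in lineages:
            lengths[size] += wait
        t += wait
        i, j = rng.sample(range(k), 2)  # pick two lineages to merge
        merged = lineages[i] + lineages[j]
        lineages = [s for idx, s in enumerate(lineages) if idx not in (i, j)]
        lineages.append(merged)
    return lengths

def singleton_fraction(n, growth, reps=300, seed=0):
    """Share of total tree length on branches leading to a single sample."""
    rng = random.Random(seed)
    singles = total = 0.0
    for _ in range(reps):
        lengths = branch_length_by_class(n, growth, rng)
        singles += lengths[1]
        total += sum(lengths[1:n])
    return singles / total

constant = singleton_fraction(30, 0.0)
growing = singleton_fraction(30, 50.0)
print("singleton share, constant size:", round(constant, 3))
print("singleton share, rapid growth: ", round(growing, 3))
```

With constant population size the expected singleton share is 1 divided by the harmonic number H(n-1), and rapid growth pushes it higher because the tree becomes long-tipped and star-like. That is the "bushy only very recently" picture in numbers.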
