Quite a few people have mentioned the McClellan et al. paper to me, along with the related Internet posts about it (including those on Genetic Future). The paper's discussion of at least three diseases (hearing loss, SCA and autism) cited some of my published papers, so I decided to post my comments on the Internet to set the record straight.
For impatient readers, these are the major points:
- GWAS interrogate disease loci through linkage disequilibrium, so the lack of known biological function for GWAS SNPs does not justify McClellan et al.'s attack on GWAS;
- Methods for adjusting for population stratification are well established in the GWAS community; stratification is not a valid explanation for most GWAS signals (with odds ratios less than 2), especially when a family-based study design is used (as in the autism GWAS);
- McClellan et al. used rs4307059 (from the autism GWAS) as a "particularly dramatic" example of stratification because its frequency varies across Europe and it is monoallelic in Africa; this is neither scientifically nor statistically justified. In fact, it is the nature of SNPs to have differing allele frequencies across populations, and almost half of the SNPs on the Illumina array have higher Fst population divergence values than rs4307059 (that is, almost half of the SNPs are more variable than rs4307059 across human populations).
Below I elaborate on these points for interested readers.
1. Lack of known biological function doesn't invalidate GWAS
McClellan et al. use the fact that most SNPs detected in GWAS lie in intergenic regions to question the utility and reliability of GWAS, and raise a serious question: "How did genome-wide association studies come to be populated by risk variants with no known function?".
McClellan and King erroneously attributed many published GWAS hits to population stratification, as if GWAS used the same strategies as candidate gene association studies. Without any scientific support, they even claimed that "an odds ratio of 3.0, or even of 2.0 depending on population allele frequencies" would be robust enough to be interrogated in GWAS.
McClellan and King mistakenly treat GWAS hits as "false positive" if their allele frequencies vary across European populations or HapMap populations. The allele frequency variation for ANY (I mean it, ANY!) SNP across populations is not something that should be surprising to researchers with substantial GWAS knowledge. Of course, it is the very nature of ANY SNP to have variable allele frequencies across human populations, so that Asians, Caucasians and Africans differ from each other.
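To make the Fst comparison concrete, here is a minimal sketch of how population divergence can be computed for a single SNP and how a focal SNP can be ranked against a panel. It assumes the simplest textbook form of Wright's Fst with equally weighted populations, which is not necessarily the estimator used for the Illumina array figures quoted above. The function names and all allele frequencies are hypothetical and are not real data for rs4307059 or any other SNP.

```python
from statistics import mean


def fst(freqs):
    """Wright's Fst for one biallelic SNP.

    freqs: frequency of the same allele in each population, with every
    population weighted equally (a simplifying assumption).
    """
    p_bar = mean(freqs)
    h_total = 2 * p_bar * (1 - p_bar)  # expected heterozygosity, pooled
    if h_total == 0:  # allele fixed or absent in every population
        return 0.0
    h_within = mean(2 * p * (1 - p) for p in freqs)  # mean within-population
    return (h_total - h_within) / h_total


def fraction_more_divergent(focal_fst, panel_fsts):
    """Share of SNPs in a panel whose Fst exceeds that of a focal SNP."""
    return sum(f > focal_fst for f in panel_fsts) / len(panel_fsts)


# Hypothetical frequencies in three populations, NOT real data.
focal = fst([0.60, 0.55, 1.00])
panel = [
    fst(f)
    for f in ([0.1, 0.5, 0.9], [0.3, 0.3, 0.3], [0.2, 0.6, 0.7], [0.5, 0.4, 0.45])
]
print(round(focal, 3), fraction_more_divergent(focal, panel))
```

A SNP that is fixed in one population can still sit in the middle of the array-wide Fst distribution, which is why monoallelism in one population is not, by itself, evidence of stratification.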
McClellan and King's interpretation of the autism locus is wrong. McClellan and King used this locus as an example of a "false positive" without any valid scientific evidence (differences in allele frequencies between Tuscans and Africans do NOT suggest a false positive in European Americans!). Another study (Weiss et al.) cited by McClellan and King was not able to garner evidence for this SNP, but that study had a very small non-overlapping sample and therefore little power to "replicate" loci with moderate effect sizes. Furthermore, Weiss et al. used a family-based association test (the transmission disequilibrium test, TDT), so there is no comparison of case/control allele frequencies as mentioned by McClellan and King.
McClellan and King misinterpreted the hearing loss GWAS and sickle-cell anemia GWAS that we published in PLoS Biology. Interestingly, their interpretation of the primary research data presented in our paper is somewhat opposite to ours: our original purpose was to demonstrate how rare variants may contribute to human diseases (and may show up in GWAS through LD with common SNPs on Illumina arrays), so our paper should really be interpreted as supporting the arguments for studying rare variants in their paper.
A piece such as this should have been put through peer review since it is positioned as a scientific critique and since King has standing as a geneticist (she did contribute, after all, to the original BRCA mappings). Whether it was or not is probably impossible to determine from simple examination. Even if journals do not adhere to an open review process (which I am ambivalent about), they should clearly label which articles have been through review and which have not.
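For readers unfamiliar with the TDT, the sketch below shows why it involves no case/control frequency comparison: the statistic only counts how often heterozygous parents transmit an allele to an affected child versus how often they do not. This is the standard McNemar-type form of the test with a 1-degree-of-freedom chi-square p-value. The function name and the counts are hypothetical and are not taken from Weiss et al. or from the autism GWAS.

```python
import math


def tdt(transmitted, untransmitted):
    """Transmission disequilibrium test for one allele.

    transmitted / untransmitted: number of times heterozygous parents did /
    did not pass the allele to an affected child. Returns the chi-square
    statistic (1 degree of freedom) and its p-value.
    """
    b, c = transmitted, untransmitted
    if b + c == 0:
        raise ValueError("no informative transmissions")
    stat = (b - c) ** 2 / (b + c)
    # Upper tail of a chi-square with 1 df: P(X > stat) = erfc(sqrt(stat / 2))
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical counts, for illustration only.
stat, p = tdt(transmitted=60, untransmitted=40)
print(stat, p)
```

Because each transmission is compared with the non-transmission from the same parent, the comparison is made within families, so allele-frequency differences between populations do not enter the statistic.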
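The idea that a rare variant can show up through LD with a common array SNP can be illustrated with the usual two-locus LD measures. The sketch below computes D, D' and r^2 from allele and haplotype frequencies. The function name and the frequencies are made up for illustration (a 1% variant that always sits on the background of a 30% allele) and are not values from our PLoS Biology paper.

```python
def ld_stats(p_a, p_b, p_ab):
    """Pairwise LD between two biallelic loci.

    p_a, p_b: frequencies of allele A (locus 1) and allele B (locus 2).
    p_ab: frequency of the haplotype carrying both A and B.
    Returns (D, D', r^2).
    """
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


# Hypothetical: rare variant (1%) found only on haplotypes that carry
# the common SNP allele (30%).
d, d_prime, r2 = ld_stats(p_a=0.01, p_b=0.30, p_ab=0.01)
print(d_prime, r2)
```

In this made-up case D' is at its maximum while r^2 is small, which is the typical pattern when a rare variant rides on a common haplotype: the common SNP tags the rare variant only weakly, so a strong rare-variant effect appears as a modest signal at the common SNP.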
Couldn't population stratification be completely controlled for by limiting studies to within-family comparisons?
Could not agree more with Keith Robinson - MC King x Cell = incredibly influential. The fact that it is pure personal opinion and not based on any particular research effort should be very clear - Nature & Science do a much better job of differentiating between opinion and hard research.
I agree entirely that a response is required to Cell. Some top-quality peer-reviewed work has been dismissed needlessly.
However, the focus on McClellan and King's misrepresentation of GWAS has downplayed any criticism of the rest of the opinion piece.
Basically, even if we decided, yeah, we've done enough GWAS now, and "it is time to sequence" - who do we sequence?
So, the McClellan and King opinion piece ends (with my emphasis):
A Time to Sequence - With an Appreciation to Maynard Olson
Genome-wide screening for mutations remains the most effective and unbiased way to discover genes involved in complex illnesses. Heretofore, the identification of rare severe disease-causing variants was limited by the resolution of mutation detection strategies. The widespread availability of next-generation sequencing technology renders this limitation essentially moot. Designs based on genome-wide identification of all exonic variants, all variants in a defined genomic region, or even all variants in a whole genome are replacing genome-wide association approaches. However, although the power of sequencing is enormous, genetic heterogeneity remains a daunting challenge. With next-generation sequencing technology, the issue is not finding potentially deleterious mutations but rather determining which of many potential deleterious mutations in an individual play a role in disease.
Two powerful strategies for identifying critical mutations are (1) tracing coinheritance of potential disease alleles with the illness in severely affected families, and (2) identifying different rare functional mutations in the same gene in unrelated affected individuals.
Do you think it is time to say that BRCA mutations are not a very helpful paradigm in complex disease genetics?
My guess would be that while both strategies might give early results - by separating rare highly-penetrant sub-diseases out of a complex disease - neither will have much to say about the genetic component of most sporadic cases.
To give a concrete example: in type 1 (childhood) diabetes, a VNTR near the insulin gene, INS, is associated with the disease. A small number of cases (< 0.1% - some of those with very early onset) have mutations in the INS gene itself.
* Do these mutations explain the association? No: there aren't enough people with the mutations.
* Does knowing there are mutations advance the understanding of the disease? No: the region was first identified in 1984 and has been extensively worked.
While I can well imagine that some families would want to know whether they are carriers - although, unlike other known rare mutations, this would not affect treatment - sequencing these people is a clinical genetic testing service, and not a research proposal.
So - who and what do we sequence? It is not hypothesis-free ...
Underlying assumptions about the nature of genomes are the major issue of this discussion. 'Science' recently published a wide-ranging study of a plant, Arabidopsis thaliana, which study revealed no single genome across the global spread of that species. A very plastic genome was the term used, throwing doubt on the idea of undifferentiated, concrete species genomes, and posing instead an image of undifferentiated and highly mobile genomes, that produce wide percentages of variability in genes within a single species.
GWAS depends for its logical basis upon species genomes being undifferentiated and more or less static across global dispersions, something which is an assumption.
Studies of creatures with minimal genomes cast doubt upon this assumption.
Science article ref: One species, many genomes, 20 July 2007, www.eurekalert.org/pub_releases/2007-07/m-osm072007.php
woops, missed a mistake above, should read: A very plastic genome was the term used, throwing doubt on the idea of DIFFERENTIATED, concrete species genomes.
The paper by McClellan and King argues that many findings from GWAS may be false positives based on cryptic population stratification, of a kind that has not been corrected for by current GWAS protocols. Whether this is true or not, it is only one part of their argument. More fundamentally, they argue that it is expected that the variants contributing the most to phenotypic variance in individuals will be rare and of large effect size.
This is based on very sound evolutionary genetic arguments and modeling (e.g., see paper by Adam Eyre-Walker, below). It also has strong empirical support from two angles: first, even accepting the GWAS positives as real, they have been so few and with such small effect size that one can draw the general conclusion that common variants do not contribute substantially to phenotypic variance (which is why they are common).
Second, a growing number of rare, highly-penetrant mutations are being identified for all kinds of "complex" disorders. Such disorders appear complex when viewed across the population but this may simply reflect the fact that many clinical diagnoses (like autism or schizophrenia) are umbrella terms for very heterogeneous groups of disorders.
(This is not to underestimate the added complexity of phenotypic expression due to genetic background effects and non-genetic effects on the phenotype)
See Mitchell and Porteous for a discussion of these issues in relation to schizophrenia and the Wiring the Brain blog for more:
Mitchell, K., & Porteous, D. (2010). Rethinking the genetic architecture of schizophrenia. Psychological Medicine. DOI: 10.1017/S003329171000070X
Eyre-Walker A. Evolution in health and medicine Sackler colloquium: Genetic architecture of a complex trait and its implications for fitness and genome-wide association studies. Proc Natl Acad Sci U S A. 2010 Jan 26;107 Suppl 1:1752-6. http://www.pnas.org/content/107/suppl.1/1752.long