Late last week I stumbled across a press release with an attention-grabbing headline ("The Causes of Common Diseases are Not Genetic Concludes a New Analysis") linking to a lengthy blog post at the Bioscience Resource Project, a website devoted to food and agriculture. The post, written by two plant geneticists, plays a tune that will be familiar to anyone who has encountered the rhetoric of GeneWatch UK: basically, modern genomics is pure hype perpetuated by scientists seeking grant money and corporations seeking to absolve themselves of responsibility for environmental disasters.
The post is long, but its core argument can be summarised as follows:
- Genome-wide association studies (GWAS) have failed to find variants explaining much of the risk of common diseases like type 2 diabetes;
- The potential hiding places postulated for the remaining "missing heritability" are implausible;
- Many epidemiological studies have shown a major role for environmental factors in determining disease risk;
- Studies estimating the proportion of disease risk determined by genetics using twin pairs are flawed;
- Both corporations and medical researchers have incentives to prop up the notion that common diseases have genetic causes;
- Therefore, the notion of major genetic causation for common diseases is a fallacy, and we should stop looking for disease genes in favour of investing in beneficial environmental changes.
These claims would be fascinating, if true. However, while the article makes some (scattered) valid points, its central claim (that the results of GWAS suggest that genetics plays little or no role in the causation of common diseases) is entirely false, and the authors rely on a combination of distortions and statistical misunderstandings to make their case.
So, let's take a closer look at how well some of the claims in the article stand up.
Why was the post written?
The article itself is written in a reasonably neutral tone, which could easily fool the casual reader without a solid background in genetics (like, perhaps, Michael Pollan) into seeing it as a dispassionate critique of the field. However, it's important to read the post in the appropriate context.
In a comment over at the Huffington Post found by Keith Grimaldi, one of the authors explains the key messages and motivations of his analysis:
We have just reported that genetics now demonstrates that genes cannot be the cause of common diseases:
That means environment must be the entire cause of ill health, i.e. junk food, pollution, lack of exercise, etc. The reason we wrote an article about human genetics (when we are a food and agriculture website) is that we believe that if people live right, agriculture and therefore the planet will more or less fix itself. [my emphasis]
This quote is illuminating in a number of ways. Firstly, it shows that there is no nuance in this argument: the authors aren't attempting to argue that genes play a smaller role in common disease than geneticists expected, but rather that genetics plays no role whatsoever.
Secondly, it reveals the motivations behind the post: the authors have assembled this critique, despite their acknowledged lack of expertise in the field, because they want to encourage a greater focus on behavioural and economic changes to bring large-scale environmental benefits. A noble cause, to be sure, but not one that necessarily encourages them to take a balanced approach to the discussion.
I don't mean to discount the post itself on the basis of its authors' motivations, but I do think it is important to read the piece in this context.
OK - on to some of the specific claims made in the piece.
Possible explanations for the missing heritability are post hoc and implausible
The authors claim:
A problem for all these hypotheses, however, is that anyone wishing to take them seriously needs to consider one important question. How likely is it that a quantity of genetic variation that could only be called enormous (i.e. more than 90-95% of that for 80 human diseases) is all hiding in what until now had been considered genetically unlikely places? In other words, they all require the science of genetics to be turned on its head. [italics in original]
This is complete nonsense. Indeed, the authors' question should be turned on its head: How likely is it that a technology that we know is only well-powered to find risk-associated variants that are common and have reasonable effect sizes will have found all - or even most - of the variants underlying common disease risk? If the answer to that question is "not very likely" - as it clearly is - then the authors' argument falls apart. Genome-wide association studies (GWAS) were not conducted because scientists expected them to find every disease-associated variant, but because they were a place to start with the technology that was available; the fact that a large fraction of the heritable risk remains undiscovered is not a sound reason to doubt that risk was heritable in the first place.
Some fraction of the missing heritability for complex diseases may turn out to lie in exotic candidates such as epigenetic inheritance or heritable variation in microflora, but these aren't yet required explanations. There are also perfectly mundane locations that haven't yet been explored by modern genomics, and would require absolutely zero changes to "the science of genetics" to investigate. For instance, the GWAS conducted to date have been seriously under-powered to detect risk variants at low frequency (less than 5%) in the population, as well as common variants with individually very small effects on disease risk - yet there's no reason not to expect an appreciable fraction of the population variance in disease risk to fall into these categories. Or, again, are we expected to believe that the distribution of allele frequencies and effect sizes for disease risk variants falls entirely within the range for which GWAS conducted to date have been 100% powered to detect them?
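To make the power argument concrete, here is a rough sketch of how detection power falls away for rare variants and for small effects. It is not taken from any of the studies discussed: it assumes a quantitative trait standardised to unit variance and a simple additive test, whereas real disease GWAS use case-control designs, and the function name, sample size, frequencies and effect sizes are all hypothetical. The default threshold of 5e-8 is the conventional genome-wide significance level.

```python
from statistics import NormalDist


def gwas_power(n, maf, beta, alpha=5e-8):
    """Approximate power of a 1-df additive association test.

    Assumption (illustrative only): a quantitative trait with unit
    variance, so a variant with minor allele frequency `maf` and
    per-allele effect `beta` explains roughly
    2 * maf * (1 - maf) * beta**2 of the trait variance.
    """
    std_normal = NormalDist()
    variance_explained = 2 * maf * (1 - maf) * beta ** 2
    # Expected z-statistic: square root of the non-centrality parameter.
    expected_z = (n * variance_explained) ** 0.5
    # Two-sided critical value for the chosen significance threshold.
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Probability that the test statistic lands beyond either critical value.
    return std_normal.cdf(expected_z - z_crit) + std_normal.cdf(-expected_z - z_crit)


# Hypothetical scenarios: (label, minor allele frequency, per-allele effect).
scenarios = [
    ("common, moderate effect", 0.30, 0.10),
    ("common, tiny effect", 0.30, 0.02),
    ("rare, moderate effect", 0.01, 0.10),
]
for label, maf, beta in scenarios:
    print(f"{label}: power = {gwas_power(5000, maf, beta):.4f}")
```

Under these assumptions, a variant can have exactly the same per-allele effect and still be close to invisible simply because it is rare, which is the sense in which studies to date are "under-powered" rather than wrong.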
We haven't even begun to make the most of risk variants we have already uncovered. GWAS are capable of flagging up a region of the genome linked to a disease, but typically don't immediately identify the precise genetic change responsible for that association. More detailed analyses of risk-associated regions (known as fine-mapping) allow researchers to zoom in on variants that are more tightly linked with the underlying causal change - and this alone can substantially increase the fraction of variance explained.
Variants discovered by GWAS are useless
The authors argue:
For each disease, even if a person was born with every known 'bad' (or 'good') genetic variant, which is statistically highly unlikely, their probability of contracting the disease would still only be minimally altered from the average.
Erm, no. Luke Jostins has a very handy post showing the distribution of risk prediction scores for individuals with different combinations of genetic variants associated with three common diseases: type 1 diabetes, type 2 diabetes, and Crohn's disease. Given he'd gone to all the work of collating these distributions, I asked him to do precisely the analysis the post authors describe here, and compare the predicted risk of individuals with all possible risk variants to the population average.
Here are the results for people with the average risk vs those with the highest number of risk variants:
Type 2 diabetes: 19.6% vs 41.3%
Type 1 diabetes: 1% vs 65%
Crohn's disease: 0.4% vs 99.6%
This analysis includes only variants identified by GWAS, but it's also based on a somewhat out-of-date catalogue of variants - so updating the results would increase this spread slightly further. [Explanation above edited to correct minor error in original version, which stated numbers were for lowest vs highest risk rather than average vs highest risk.]
Do the authors genuinely believe that the difference between 0.4% and 99.6% risk represents "minimal alteration", or have they just not bothered to actually look into these numbers themselves?
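For readers wondering how variants with individually modest effects can add up to such a large spread, here is a minimal sketch of the arithmetic. It is not Luke's method or his data: it assumes the variants act independently and multiplicatively on the odds scale, and the baseline risk and odds ratios are hypothetical values chosen purely for illustration.

```python
def combined_risk(baseline_risk, odds_ratios):
    """Combine per-variant odds ratios into an absolute risk.

    Assumptions (illustrative only): variants are independent and act
    multiplicatively on the odds scale; `baseline_risk` is the risk for
    a reference individual carrying none of the listed risk variants.
    """
    # Convert the baseline probability to odds.
    odds = baseline_risk / (1 - baseline_risk)
    # Each risk variant multiplies the odds by its odds ratio.
    for odds_ratio in odds_ratios:
        odds *= odds_ratio
    # Convert the combined odds back to a probability.
    return odds / (1 + odds)


# Hypothetical example: 30 risk variants, each with an odds ratio of 1.2,
# on top of a baseline risk of 0.4%.
risk = combined_risk(0.004, [1.2] * 30)
print(f"Risk with all 30 hypothetical risk variants: {risk:.1%}")
```

The point of the sketch is only that small multiplicative effects compound: no single variant here does much, but someone carrying all of them ends up a very long way from the baseline.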
Strong environmental effects on disease risk argue against strong genetic effects
This argument pops up in a number of places in the article. For instance, the authors point out the apparent contradiction between twin studies suggesting that the risk of myopia is 80% heritable and the observation that the prevalence of myopia among individuals moving from non-Western to Western countries can go from 0% to 80%. How can these two figures be reconciled?
The answer is that heritability is a number that applies to a specific population within a specific environment. Within white Europeans living in Western countries, who face a reasonably uniform set of environmental risk factors, around 80% of the variation in myopia risk is attributable to genetic differences. That number will obviously not apply to a population in which some individuals are moving from a low-risk to a high-risk environment, in whom the majority of the risk is determined by that massive environmental difference. However, importantly, that doesn't mean the heritability estimate isn't correct for white Europeans: it just means that it shouldn't be extrapolated to other populations subject to different combinations of genetic and environmental risk factors.
There is no contradiction here, just a misunderstanding of the concept of heritability. The authors' misunderstanding should remind us of the caution that needs to be applied when thinking about heritability, and also that the existence of strong genetic predispositions to common diseases doesn't mean that environmental interventions can't be extremely effective. However, it's not a valid critique of the heritability estimates generated for common diseases.
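A toy calculation may help show why the two figures can coexist. The sketch below treats heritability as the share of total variance that is genetic, and assumes genetic and environmental contributions are independent and simply add up (no gene-environment interaction). All the variances are hypothetical, chosen only so that the uniform-environment case comes out at 0.8.

```python
def heritability(genetic_var, env_var):
    """Share of total trait variance attributable to genetic differences."""
    return genetic_var / (genetic_var + env_var)


def mixed_env_variance(fraction_high_risk, mean_shift):
    """Extra variance when a fraction of the population lives in an
    environment that shifts the trait mean by `mean_shift`."""
    return fraction_high_risk * (1 - fraction_high_risk) * mean_shift ** 2


# Hypothetical values: the same genetic variance in both scenarios.
genetic_var = 4.0
within_env_var = 1.0

# Scenario 1: everyone shares one reasonably uniform environment.
uniform = heritability(genetic_var, within_env_var)

# Scenario 2: half the population is in a high-risk environment that
# shifts the trait mean by 6 units (again, a made-up number).
extra_var = mixed_env_variance(0.5, 6.0)
mixed = heritability(genetic_var, within_env_var + extra_var)

print(f"Uniform environment: {uniform:.2f}")
print(f"Mixed environments:  {mixed:.2f}")
```

Nothing about the genetics differs between the two scenarios; only the amount of environmental variation does, and the heritability estimate moves accordingly.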
The evidence for disease heritability from twin studies is flawed
The authors claim:
Studies of human twins estimate heritability (h2) by calculating disease incidence in monozygotic (genetically identical) twins versus dizygotic (fraternal) twins (who share 50% of their DNA). If monozygotic twin pairs share disorders more frequently than do dizygotic twins, it is presumed that a genetic factor must be involved. A problem arises, however, when the number resulting from this calculation is considered to be an estimate of the relative contribution of genes and environment over the whole population (and environment) from which the twins were selected. This is because the measurements are done in a series of pairwise comparisons, meaning that only the variation within each twin pair is actually being measured. Consequently, the method implicitly defines as environment only the difference within each twin pair. Since each twin pair normally shares location, parenting styles, food, schooling, etc., much of the environmental variability that exists between individuals in the wider population is de facto excluded from the analysis. In other words, heritability (h2), when calculated this way, fails to adequately incorporate environmental variation and inflates the relative importance of genes. [my emphasis]
As Luke Jostins has already explained at length over at Genomes Unzipped, this criticism is based entirely on a statistical misunderstanding of the methodology behind heritability studies. In fact, the sentence highlighted in bold above is completely wrong: twin-based heritability estimates use between-family variability, not within-family variability, to estimate the proportion of variation that is due to the environment. This misunderstanding completely undermines their argument against heritability estimates.
As Luke notes, there are valid reasons to be cautious about heritability estimates from twin studies - but this isn't one of them.
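One way to see that the shared family environment is estimated rather than silently excluded is the classical Falconer decomposition, sketched below. This is a textbook simplification, not necessarily the method used in the studies under discussion: it assumes purely additive genetic effects and equally shared environments for both twin types, and the twin correlations in the example are hypothetical.

```python
def falconer_ace(r_mz, r_dz):
    """Split trait variance into three shares from twin correlations.

    Assumptions (classical simplification): additive genetic effects,
    and identical and fraternal pairs share their environment equally.
    `r_mz` and `r_dz` are trait correlations across monozygotic and
    dizygotic pairs respectively.
    """
    additive_genetic = 2 * (r_mz - r_dz)  # heritability estimate
    shared_env = 2 * r_dz - r_mz          # environment shared within families
    unique_env = 1 - r_mz                 # individual environment plus error
    return additive_genetic, shared_env, unique_env


# Hypothetical correlations, for illustration only.
a2, c2, e2 = falconer_ace(r_mz=0.8, r_dz=0.5)
print(f"genetic: {a2:.2f}, shared environment: {c2:.2f}, unique environment: {e2:.2f}")
```

The correlations are computed across many families, so differences between families in location, diet, schooling and so on show up in the shared-environment share instead of being credited to genes.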
What this piece could have been
Mike the Mad Biologist has a post about this article, in which he describes it as having "good and bad points". I should also be charitable: although the central argument of the post (that results from GWAS suggest that genetic factors have little or no role in common disease) is completely wrong, there are valid criticisms of the excessive value that is sometimes placed on genetic versus environmental explanations of morbidity.
Stripping away the conspiracy-mongering and accusations of genetic determinism among geneticists (seriously, how can anyone working on complex diseases be a genetic determinist?), there are some nuggets of truth in the article's discussion:
The last fifteen years, coinciding with the rise of medical genetics, have seen unprecedented sums of money directed at medical research. At the same time, research on pollution, nutrition and epidemiology has not benefited in any comparable way.
This same mindset is accurately reflected in the media where even strong environmental links to disease often receive little attention, while speculative genetic associations can be front page news.
Even as a direct beneficiary of money thrown at medical genetics over the last five years, and someone who blogs entirely about news in the genetic domain, I freely acknowledge that these criticisms have merit. Genetic dissection of common disease is valuable, and will be (and indeed already has been) fruitful in generating new therapies, but it is nonetheless true that research into environmental risk factors and interventions to minimise morbidity is woefully under-funded and under-reported relative to its potential benefit.
This article could thus have been a considered, balanced and valuable critique of the imbalance in funding between research into the genetic and environmental contributors to common disease. Instead, the authors have undermined their argument by wandering into territory they don't understand, and taking an extreme position that is inconsistent with the available evidence. Perhaps they felt that polarising the debate was the only way to get attention - and indeed that approach seems to have worked - but that has come at the cost of destroying the credibility of their message. This was a missed opportunity.