Bioscience Resource Project critique of modern genomics: a missed opportunity

Late last week I stumbled across a press release with an attention-grabbing headline ("The Causes of Common Diseases are Not Genetic Concludes a New Analysis") linking to a lengthy blog post at the Bioscience Resource Project, a website devoted to food and agriculture. The post, written by two plant geneticists, plays a tune that will be familiar to anyone who has encountered the rhetoric of GeneWatch UK: basically, modern genomics is pure hype perpetuated by scientists seeking grant money and corporations seeking to absolve themselves of responsibility for environmental disasters.

The post is long, but its core argument can be summarised as follows:
  • Genome-wide association studies (GWAS) have failed to find variants explaining much of the risk of common diseases like type 2 diabetes;
  • The potential hiding places postulated for the remaining "missing heritability" are implausible;
  • Many epidemiological studies have shown a major role for environmental factors in determining disease risk;
  • Studies estimating the proportion of disease risk determined by genetics using twin pairs are flawed;
  • Both corporations and medical researchers have incentives to prop up the notion that common diseases have genetic causes;
  • Therefore, the notion of major genetic causation for common diseases is a fallacy, and we should stop looking for disease genes in favour of investing in beneficial environmental changes.

These claims would be fascinating, if true. However, while the article makes some (scattered) valid points, its central claim (that the results of GWAS suggest that genetics plays little or no role in the causation of common diseases) is entirely false, and the authors rely on a combination of distortions and statistical misunderstandings to make their case.

Unfortunately the article has not simply lapsed back into the internet obscurity it deserved: over the weekend a link to the article was posted on Twitter by popular author Michael Pollan, bringing it to the attention of his ~40,000 followers. Pollan's tweet and the cheer-leading responses from his followers were subsequently picked up and blasted over at OpenHelix, leading to an exchange with one of the authors in the comments. The article was also criticised for a schoolboy statistical error by Luke Jostins, but received a qualified positive review from Mike the Mad Biologist.
So, let's take a closer look at how well some of the claims in the article stand up.
Why was the post written?
The article itself is written in a reasonably neutral tone, which could easily fool the casual reader without a solid background in genetics (like, perhaps, Michael Pollan) into seeing it as a dispassionate critique of the field. However, it's important to read the post in the appropriate context.

In a comment over at the Huffington Post found by Keith Grimaldi, one of the authors explains the key messages and motivations of his analysis:

We have just reported that genetics now demonstrates that genes cannot be the cause of common diseases:

http://www.bioscienceresource.org/commentaries/article.php?id=46

That means environment must be the entire cause of ill health, i.e. junk food, pollution, lack of exercise, etc. The reason we wrote an article about human genetics (when we are a food and agriculture website) is that we believe that if people live right, agriculture and therefore the planet will more or less fix itself. [my emphasis]

This quote is illuminating in a number of ways. Firstly, it shows that there is no nuance in this argument: the authors aren't attempting to argue that genes play a smaller role in common disease than geneticists expected, but rather that genetics plays no role whatsoever.

Secondly, it reveals the motivations behind the post: the authors have assembled this critique, despite their acknowledged lack of expertise in the field, because they want to encourage a greater focus on behavioural and economic changes to bring large-scale environmental benefits. A noble cause, to be sure, but not one that necessarily encourages them to take a balanced approach to the discussion.
I don't mean to discount the post itself on the basis of its authors' motivations, but I do think it is important to read the piece in this context.
OK - on to some of the specific claims made in the piece.
Possible explanations for the missing heritability are post hoc and implausible
The authors claim:
A problem for all these hypotheses, however, is that anyone wishing to take them seriously needs to consider one important question. How likely is it that a quantity of genetic variation that could only be called enormous (i.e. more than 90-95% of that for 80 human diseases) is all hiding in what until now had been considered genetically unlikely places? In other words, they all require the science of genetics to be turned on its head. [italics in original]

This is complete nonsense. Indeed, the authors' question should be turned on its head: How likely is it that a technology that we know is only well-powered to find risk-associated variants that are common and have reasonable effect sizes will have found all - or even most - of the variants underlying common disease risk? If the answer to that question is "not very likely" - as it clearly is - then the authors' argument falls apart. Genome-wide association studies (GWAS) were not conducted because scientists expected them to find every disease-associated variant, but because they were a place to start with the technology that was available; the fact that a large fraction of the heritable risk remains undiscovered is not a sound reason to doubt that risk was heritable in the first place.

Some fraction of the missing heritability for complex diseases may turn out to lie in exotic candidates such as epigenetic inheritance or heritable variation in microflora, but these aren't yet required explanations. There are also perfectly mundane locations that haven't yet been explored by modern genomics, and would require absolutely zero changes to "the science of genetics" to investigate. For instance, genome-wide association studies (GWAS) conducted to date have been seriously under-powered to detect risk variants at low frequency (less than 5%) in the population, as well as common variants with individually very small effects on disease risk - yet there's no reason not to expect an appreciable fraction of the population variance in disease risk to fall into these categories. Or, again, are we expected to believe that the distribution of allele frequencies and effect sizes for disease risk variants falls entirely within the range for which GWAS conducted to date have been 100% powered to detect them? 
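The power argument above can be made concrete with a rough calculation. The sketch below is an illustration only, not the method of any study discussed here: it assumes a quantitative trait scaled to unit variance, a single additive variant, a simple normal approximation to the association test, and the conventional genome-wide significance threshold of 5e-8. The function name `gwas_power` and all of the numbers (sample size, effect size, allele frequencies) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def gwas_power(n, maf, beta, alpha=5e-8):
    """Approximate power to detect one additive variant in a GWAS.

    Assumes a quantitative trait with unit variance, so the variant explains
    roughly 2 * maf * (1 - maf) * beta**2 of the trait variance.
    """
    std_normal = NormalDist()
    variance_explained = 2 * maf * (1 - maf) * beta ** 2
    # Expected size of the test statistic (square root of the non-centrality).
    expected_z = sqrt(n * variance_explained)
    # Two-sided critical value at the genome-wide threshold.
    z_crit = -std_normal.inv_cdf(alpha / 2)
    return (std_normal.cdf(expected_z - z_crit)
            + std_normal.cdf(-expected_z - z_crit))


# Hypothetical study: 10,000 people, per-allele effect of 0.1 standard deviations.
common = gwas_power(n=10_000, maf=0.30, beta=0.1)
rare = gwas_power(n=10_000, maf=0.01, beta=0.1)
print(f"common variant (30% frequency): power = {common:.2f}")
print(f"low-frequency variant (1%):     power = {rare:.5f}")
```

Under these assumptions the same per-allele effect is detected most of the time when the allele is common and almost never when it is rare, which is the sense in which studies of this size are under-powered for low-frequency variants.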
We haven't even begun to make the most of risk variants we have already uncovered. GWAS are capable of flagging up a region of the genome linked to a disease, but typically don't immediately identify the precise genetic change responsible for that association. More detailed analyses of risk-associated regions (known as fine-mapping) allow researchers to zoom in on variants that are more tightly linked with the underlying causal change - and this alone can substantially increase the fraction of variance explained.

Variants discovered by GWAS are useless
The authors argue:

For each disease, even if a person was born with every known 'bad' (or 'good') genetic variant, which is statistically highly unlikely, their probability of contracting the disease would still only be minimally altered from the average.

Erm, no. Luke Jostins has a very handy post showing the distribution of risk prediction scores for individuals with different combinations of genetic variants associated with three common diseases: type 1 diabetes, type 2 diabetes, and Crohn's disease. Given he'd gone to all the work of collating these distributions, I asked him to do precisely the analysis the post authors describe here, and compare the predicted risk of individuals with all possible risk variants to the population average.
Here are the results for people with the average risk vs those with the highest number of risk variants:
Type 2 diabetes: 19.6% vs 41.3%

Type 1 diabetes: 1% vs 65%

Crohn's disease: 0.4% vs 99.6%
This analysis includes only variants identified by GWAS, but it's also based on a somewhat out-of-date catalogue of variants - so updating the results would increase this spread slightly further. [Explanation above edited to correct minor error in original version, which stated numbers were for lowest vs highest risk rather than average vs highest risk.]
Do the authors genuinely believe that the difference between 0.4% and 99.6% risk represents "minimal alteration", or have they just not bothered to actually look into these numbers themselves?
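To see why many individually modest variants can add up to a large change in risk, here is a minimal sketch. It is not Luke's analysis and does not use his data: it assumes that effects multiply on the odds scale (a log-additive model), treats the population-average risk as the baseline (a simplification), and uses a made-up catalogue of 30 variants that each carry an odds ratio of 1.2. Only the 0.4% starting risk is taken from the Crohn's disease figure above.

```python
def risk_with_extra_alleles(average_risk, per_allele_odds_ratios):
    """Risk for someone carrying one extra risk allele at each listed variant.

    Assumes effects multiply on the odds scale and that the population-average
    risk is a reasonable baseline; both are simplifications.
    """
    odds = average_risk / (1 - average_risk)
    for odds_ratio in per_allele_odds_ratios:
        odds *= odds_ratio
    return odds / (1 + odds)


# Hypothetical catalogue: 30 variants, each with a modest odds ratio of 1.2.
hypothetical_variants = [1.2] * 30
risk = risk_with_extra_alleles(0.004, hypothetical_variants)
print(f"average risk 0.4% -> risk with every extra allele: {risk:.1%}")
```

Each odds ratio of 1.2 looks unimpressive on its own, but thirty of them multiply the odds more than two-hundred-fold, so a person at the extreme of the distribution ends up a long way from the average.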
Strong environmental effects on disease risk argue against strong genetic effects
This argument pops up in a number of places in the article. For instance, the authors point out the apparent contradiction between twin studies suggesting that the risk of myopia is 80% heritable and the observation that individuals moving from non-Western to Western countries can go from a prevalence of myopia of 0% to 80%. How can these two figures be reconciled?
The answer is that heritability is a number that applies to a specific population within a specific environment. Within white Europeans living in Western countries, who face a reasonably uniform set of environmental risk factors, around 80% of the risk of myopia is genetic. That number will obviously not apply to a population in which some individuals are moving from a low-risk to a high-risk environment, in whom most of the risk is determined by that massive environmental difference. However, importantly, that doesn't mean the heritability estimate isn't correct for white Europeans: it just means that it shouldn't be extrapolated to other populations subject to different combinations of genetic and environmental risk factors.
There is no contradiction here, just a misunderstanding of the concept of heritability. The authors' misunderstanding should remind us of the caution that needs to be applied when thinking about heritability, and also that the existence of strong genetic predispositions to common diseases doesn't mean that environmental interventions can't be extremely effective. However, it's not a valid critique of the heritability estimates generated for common diseases.
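A small worked example shows how the same genes can give very different heritability figures in different settings. This is a sketch under stated assumptions, not an analysis of real myopia data: it treats the trait as continuous, assumes genetic and environmental effects simply add with no interaction, and uses invented variances. The function names are hypothetical.

```python
def heritability(genetic_variance, environmental_variance):
    """Share of total trait variance that is genetic."""
    return genetic_variance / (genetic_variance + environmental_variance)


def between_environment_variance(fraction_exposed, shift):
    """Variance added when a fraction of people live in an environment
    that shifts the trait by `shift` units."""
    return fraction_exposed * (1 - fraction_exposed) * shift ** 2


# Invented variances for a population in one fairly uniform environment.
V_G, V_E = 0.8, 0.2
uniform = heritability(V_G, V_E)

# Same genes, but half the population now lives in a high-risk environment
# that shifts the trait by 4 units.
mixed = heritability(V_G, V_E + between_environment_variance(0.5, 4.0))

print(f"uniform environment: h^2 = {uniform:.2f}")
print(f"mixed environments:  h^2 = {mixed:.2f}")
```

The genetic variance is identical in both cases. Only the amount of environmental variation changes, and the heritability estimate moves with it.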
The evidence for disease heritability from twin studies is flawed
The authors claim:

Studies of human twins estimate heritability (h^2) by calculating disease incidence in monozygotic (genetically identical) twins versus dizygotic (fraternal) twins (who share 50% of their DNA). If monozygotic twin pairs share disorders more frequently than do dizygotic twins, it is presumed that a genetic factor must be involved. A problem arises, however, when the number resulting from this calculation is considered to be an estimate of the relative contribution of genes and environment over the whole population (and environment) from which the twins were selected. This is because the measurements are done in a series of pairwise comparisons, meaning that only the variation within each twin pair is actually being measured. Consequently, the method implicitly defines as environment only the difference within each twin pair. Since each twin pair normally shares location, parenting styles, food, schooling, etc., much of the environmental variability that exists between individuals in the wider population is de facto excluded from the analysis. In other words, heritability (h^2), when calculated this way, fails to adequately incorporate environmental variation and inflates the relative importance of genes. [my emphasis]

As Luke Jostins has already explained at length over at Genomes Unzipped, this criticism is based entirely on a statistical misunderstanding of the methodology behind heritability studies. In fact, the sentence highlighted in bold above is completely wrong: twin-based heritability estimates use between-family variability, not within-family variability, to estimate the proportion of variation that is due to the environment. This misunderstanding completely undermines their argument against heritability estimates.
As Luke notes, there are valid reasons to be cautious about heritability estimates from twin studies - but this isn't one of them.
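One way to see that environment shared within a twin pair is not simply thrown away is the textbook Falconer approximation, sketched below. This is an assumption made for illustration, and is not necessarily the model used in any study discussed here. It works from trait correlations calculated across many twin pairs drawn from different families, and it reports the shared environment as its own component. The correlations used are invented.

```python
def falconer_estimates(r_mz, r_dz):
    """Split trait variance using Falconer's approximation.

    r_mz, r_dz: trait correlations across monozygotic and dizygotic pairs.
    Returns (heritability, shared environment, unique environment).
    """
    h2 = 2 * (r_mz - r_dz)   # additive genetic share
    c2 = 2 * r_dz - r_mz     # environment shared by both twins in a pair
    e2 = 1 - r_mz            # environment unique to each twin
    return h2, c2, e2


# Invented correlations for illustration.
h2, c2, e2 = falconer_estimates(r_mz=0.8, r_dz=0.5)
print(f"h^2 = {h2:.2f}, shared env = {c2:.2f}, unique env = {e2:.2f}")
```

If fraternal twins resemble each other more than half as much as identical twins do, the excess is attributed to shared environment, not to genes.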
What this piece could have been
Mike the Mad Biologist has a post about this article, in which he describes it as having "good and bad points". I should also be charitable: although the central argument of the post (that results from GWAS suggest that genetic factors have little or no role in common disease) is completely wrong, there are valid criticisms of the excessive value that is sometimes placed on genetic versus environmental explanations of morbidity.
Stripping away the conspiracy-mongering and accusations of genetic determinism among geneticists (seriously, how can anyone working on complex diseases be a genetic determinist?), there are some nuggets of truth in the article's discussion:

The last fifteen years, coinciding with the rise of medical genetics, have seen unprecedented sums of money directed at medical research. At the same time, research on pollution, nutrition and epidemiology has not benefited in any comparable way.

[...]

This same mindset is accurately reflected in the media where even strong environmental links to disease often receive little attention, while speculative genetic associations can be front page news.

Even as a direct beneficiary of money thrown at medical genetics over the last five years, and someone who blogs entirely about news in the genetic domain, I freely acknowledge that these criticisms have merit. Genetic dissection of common disease is valuable, and will be (and indeed already has been) fruitful in generating new therapies, but it is nonetheless true that research into environmental risk factors and interventions to minimise morbidity is woefully under-funded and under-reported relative to its potential benefit.

This article could thus have been a considered, balanced and valuable critique of the imbalance in funding between research into the genetic and environmental contributors to common disease. Instead, the authors have undermined their argument by wandering into territory they don't understand, and taking an extreme position that is inconsistent with the available evidence. Perhaps they felt that polarising the debate was the only way to get attention - and indeed that approach seems to have worked - but that has come at the cost of destroying the credibility of their message. This was a missed opportunity.


Quick comment - I didn't know that you wanted the difference between most-at-risk and least-at-risk. The figures I gave you are the difference in risk between an average person and a person with all the risk variants. E.g. an average person has a 19.8% chance of developing T2D. Someone with all known genetic variants has a 41.3% chance.

Yay! I knew this would be great.

But one thing: I found that HuffPo comment and it was in my original post. Not that it matters much who found it, but it was part of a larger assessment of Latham's work around the web that I did, but didn't use. Because I follow the anti-GMO arguments (which are so much like the anti-vaxxer arguments) I could see what he was up to and from whence it emanated. That's when it became additionally clear to me what this agenda was.

He's doing the classic "god of the gaps" strategy: when science hasn't found the answer I want, I get to make up the gap-filler! It's precisely what the anti-vax/autism argument does: science refuses to find my pet cause, so I'm going to assert what I want anyway!

Hi Daniel
Thank you for discussing our article in a mostly polite fashion. The one aspect that does you no credit is to imply that our analysis is wrong as a consequence of our ignorance. I have noticed this to be a common aspect of blog discussions of our article and scientists slip into this mode all too easily when challenged. And then they can't understand why the world thinks them arrogant and hubristic.
I bring this up here because with this exception you do appear to be attempting to be fair. You should also know that we discussed this paper extensively with 'professional' geneticists to ensure that the points you raise were either raised and confronted in the article or were not crucial to the discussion. Certainly, we were "bothered" to get it right. We are also not ignorant ourselves and certainly not interested in making unfounded assertions that would make us look silly in five years' time.

You make three major points:

1. The best data you bring to this discussion are the estimates of risk for diabetes and Crohn's disease. I'd like to make two points. One is that the high-risk people will be rare. How rare exactly depends on the total number of predisposing loci and their allele frequencies, but though the numbers you show look impressive, most people, as Francis Collins discovered, will be very close to the average. In the paper cited below, of 43,000 study participants only 164 were homozygous for the "important" gene variant and not one of them was diagnosed with Crohn's disease. Secondly, I don't know where you got the Crohn's disease numbers from, but it's not at all clear that they are reliable, and it goes back to the point you make about GWAS finding more genetic variation in the future. Maybe also some of them will be refuted. Maybe you have not seen this paper: Penetrance of NOD2/CARD15 genetic variants in the general population. Yazdanyar et al. (2010) CMAJ Vol. 182?

2. Like everyone else you seem to be struggling to figure out plausible hiding places for "the missing heritability" and 'heritable variation in microflora' is a new one to me. Like the suggestion of mitochondrial genes, I'd like to see how you think that would play out in a twin study. Likewise, your suggestion of common variants with "individually very small effects on disease risk" is deeply problematic because to explain the missing heritability there would need to be thousands of them. If they existed they would not show in twin studies (such large numbers would effectively cancel each other out) and they wouldn't be useful for disease prediction either.

3. Many authors have fallen back on the idea that the fact they were studying different populations explains why heritability studies contradict environmental ones. This doesn't go very far, in my view, in explaining why each group seems to get the answer that suits them.

Thinking about this some more (and figuring out a hopefully clearer way to say things), I guess my problem regarding 'missing heritability' is that h^2 estimates assume that the genetic x environment covariance is negligible (this covariance doesn't have to be mechanistically based either). I don't think that's often the case, even in twin studies. If only humans lived in vials with cornmeal...

If one underestimates the GxE covariance, you'll overestimate the h^2. What this means to me, anyway, is that human genetics has been successful, not that we should stop.

"If they existed they would not show in twin studies (such large numbers would effectively cancel each other out)"

Entirely false. In fact, this situation, where you have an arbitrary number of arbitrarily low penetrance variants, is EXACTLY the limit in which twin studies function perfectly, due to the central limit assumption. It is LARGE variants that cause divergence from model assumptions. This is exactly what I've been talking about: we need to be discussing ACTUAL TWIN STUDY THEORY here, and for that we need to be ACCURATE when we talk about HOW IT WORKS.

@JRLatham

1. Ignorance is a reasonable and not arrogant charge when that ignorance is wilful (as in "genetic predispositions = causes").

2. Over at the openhelix blog (bit.ly/hSQAO9), in a comment reply to me you say "gene variants vary in their penetrance...continuum of penetrance". So you agree that genes are actually very important in almost all disease and not that "environment must be the entire cause of ill health". After all, 100% penetrance for Huntington's moving abruptly to 0% for Crohn's, diabetes etc. is not much of a continuum.

3. I agree with Daniel that it's a pity you argue in the way you do: you have some reasonable points and miss the opportunity to have them discussed. There are loads of things that the geneticists claim that I hate to hear, and there are dogmatic extremists everywhere. Or maybe your inclusion of the reasonable points is a tactic to arrive at your chosen conclusion. You're happy to post your comments everywhere, so it's a pity that you don't permit comments on your own blog.

While we're at it - NOD2 is not a GWAS variant, it is a low-frequency, multi-effect risk locus that was discovered well before the GWAS era. Its role varies a lot from population to population (there have been many studies, beyond the one you cite) - for instance, the NOD2 mutations play no role in Crohn's in East Asia, but a larger-than-normal role in Ashkenazim. The Danish study is interesting, and is suggestive of a difference in odds ratio in that population, but the low prevalence of Crohn's in the sample (5-10 times lower than usual estimates) and the fact that they did not subdivide by genotype make drawing firm conclusions difficult. Either way, deciding "one out of a dozen studies found a slightly lower effect size, therefore everything is broken" is hardly

Either way, as I said NOD2 is not a GWAS variant, as opposed to the 69 Crohn's variants that are, and its low frequency and complex allelic structure give it more opportunity to vary by population. GWAS variants themselves tend to be far better behaved, for example see this multi-ethnic study for T2D:

http://www.plosgenetics.org/article/info%3Adoi%2F10.1371%2Fjournal.pgen…

Jrlatham,

I noticed that you didn't list as one of Daniel's "main points" that your criticism of twin studies was based on a misunderstanding of the twin methodology.

Additionally, you don't seem willing to embrace the blatant contradiction between acknowledging strong genetic effects and saying that disease has an "entirely" environmental cause.

As for the idea that (environmental) differences between populations are somehow a strike against twin studies, it requires only a simple example to dismiss this claim.

Take the example of obesity. In first world countries, people generally have enough resources that they can become fat if they choose to eat enough. This is not the case in poorer, third world countries.

In the first world countries, how fat you get is therefore very much a function of your genetic endowment-- your degree of craving for food, and your body's metabolic response to large amounts of food. In poor countries, how fat you get is mainly a function of whether you're in the upper-classes that have first world level access to resources.

Given the vast differences between the conditions facing different populations, we would expect such differences. It's in no way a strike against the heritability estimates.