First ever association study using whole genome sequences

New-technology DNA sequencing provider Complete Genomics will provide near-complete genome sequences of 100 individuals to the Institute for Systems Biology, driving the first ever association study for a complex trait using whole-genome sequencing. Here's the press release, and GenomeWeb has some additional information.

This is pretty exciting stuff:

The Institute for Systems Biology (ISB) and Complete Genomics Inc. announced today that they are embarking on a large-scale human genome sequencing study of Huntington's disease (HD). ISB has engaged Complete Genomics to sequence 100 genomes, the majority of which will be used to investigate this disease, with samples from affected individuals, family members, and matched controls to study modifiers of disease presentation and progression.

The goal of this project is not to identify the mutations that cause Huntington's (the genetic basis of this disease is already extremely well-characterised), but rather to look for novel variants that alter the progression of the disease - usually called "disease modifiers". In other words, the goal here is to uncover genetic variants that explain variation between Huntington's patients in things such as age of onset or the speed with which the disease progresses.

The major novelty of this study is that the target trait is complex (i.e. it is likely determined by multiple genes), whereas the small number of whole-genome sequencing (WGS) disease studies reported to date have focused on much more tractable Mendelian diseases (those in which disease status is conferred by the presence of a single, disastrous mutation).

You can expect to see plenty of similar announcements over the next twelve months as the cost of sequencing drops to the point that WGS on moderately large cohorts becomes feasible (Complete Genomics is currently offering the service for around $20,000 per genome).

This project is somewhat unusual in its focus on disease-modifying variants rather than disease-causing variants; it's likely that most of the early WGS studies will actually aim to identify new, rare large-effect risk factors for complex diseases such as type 1 diabetes.

At the American Society of Human Genetics meeting we started to get a sense of how early WGS projects in complex diseases will look:

  • Individuals selected from the extremes of the distribution (e.g. particularly early-onset or severe manifestations of disease);
  • A focus on individuals with a strong family history of disease;
  • Sequencing of both patients and unaffected family members;
  • In some cases, experimental designs employing low-coverage sequencing of many individuals rather than high-quality sequencing of a smaller cohort.

The first two features will enrich the target population for the types of rare, large-effect variants that WGS is uniquely capable of detecting, while the addition of unaffected family members will make it easier to differentiate between disease risk variants and the benign polymorphisms that litter all of our genomes. The final feature - low-coverage rather than high-quality sequence - is still controversial, but was strongly advocated by Richard Durbin and Gonçalo Abecasis at the meeting; this is the approach currently being taken by the 1000 Genomes Project. I plan to write more about this strategy soon.
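To make the enrichment argument concrete, here is a minimal sketch under a deliberately simple toy model: a standardised quantitative trait, normally distributed noise, and a single rare variant that shifts carriers by a fixed number of standard deviations. The function name and every number below are illustrative assumptions, not figures from the ISB study or the ASHG talks.

```python
from statistics import NormalDist


def carrier_fraction_in_tail(freq, effect_sd, tail_fraction):
    """Fraction of carriers among individuals in the upper tail of a trait.

    Toy model (an assumption for illustration only):
      trait = effect_sd * carrier + noise, with noise ~ N(0, 1)
    freq          -- carrier frequency in the general population
    effect_sd     -- shift in the trait for carriers, in standard deviations
    tail_fraction -- share of non-carriers lying above the sampling cutoff
    """
    std = NormalDist()
    # Cutoff chosen so that `tail_fraction` of non-carriers exceed it.
    cutoff = std.inv_cdf(1.0 - tail_fraction)
    # Probability that a carrier exceeds the same cutoff.
    p_tail_carrier = 1.0 - std.cdf(cutoff - effect_sd)
    p_tail_noncarrier = tail_fraction
    # Bayes' rule: P(carrier | in tail).
    carriers_in_tail = freq * p_tail_carrier
    noncarriers_in_tail = (1.0 - freq) * p_tail_noncarrier
    return carriers_in_tail / (carriers_in_tail + noncarriers_in_tail)


if __name__ == "__main__":
    # Hypothetical variant: 1% carrier frequency, 2 SD effect, top-5% sampling.
    enriched = carrier_fraction_in_tail(0.01, 2.0, 0.05)
    print(f"carrier fraction in the extreme tail: {enriched:.3f}")
```

With these made-up inputs a variant carried by 1% of the population ends up in roughly 11% of the sampled extremes, which is the sense in which extreme-phenotype sampling concentrates rare, large-effect variants. A variant with no effect is not enriched at all under the same model.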

Anyway, here we are: the technology that makes WGS-based studies feasible for complex traits has finally arrived. Now the real challenge - coming up with ways of handling the massive volumes of data generated by these technologies, and of finding true causal variants amongst the noise of sequencing artefacts and benign polymorphisms - starts to bite.


Any sense of what the price point is for these 100 genomes? Earlier in the year it seemed that the CG proof-of-concept genomes were going for around $20,000. Presumably the cost has come down from there, if only due to a volume discount, but the question is, how far?

Hey Dan,

I don't know what ISB paid, but I understand the going price is still hovering around or just marginally below the $20K/genome level - when I spoke to Clifford Reid a while back he suggested that volume discounts might drop this down towards $5K/genome soon, but only for customers looking to purchase around 1000 sequences.

Thank you for this fascinating post, Daniel. Five years ago I wrote an article for Genome News Network about the search for modifier genes in this disease. They had a few candidate genes at the time. As you report, technology has come a long way. I hope the investigators have great success.

"Delaying Huntington's"

By Edward Winstead (not verified) on 03 Nov 2009

I agree that this should be a really interesting study, but it seems like it is likely to be pretty underpowered, no? Assuming that they are focusing on individuals from extremes of the age of onset distribution or affected family members with similar CAG repeat length but widely disparate age of onset, perhaps they may be a bit more likely to find a variant with large effect size, but this should probably be considered a pilot that will most likely require much larger sample sizes to be adequately powered, don't you think?
-Matt Mealiffe

Hi Matt,

It depends what your prior is regarding the effect sizes they're likely to observe - but yes, assuming that Huntington's progression-related traits have similar genetic architecture to other well-studied complex traits the study is woefully under-powered.

For the study design I mentioned in the post (in which the sample is enriched for rare large-effect variants by only including individuals with extreme phenotypes and strong family history) the power with even small sample sizes should be better. Even so, you're right that a sample size of 100 should definitely be regarded as a pilot project to establish feasibility rather than a full-scale gene discovery operation.
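To put rough numbers on the power question, here is a back-of-the-envelope sketch using a normal approximation for a single-variant test on a quantitative trait. The significance threshold of 5e-8 is the conventional genome-wide level from SNP-array studies and is an assumption here (a sequencing study may need a different one), as are the effect sizes; none of these values come from the ISB study design.

```python
from statistics import NormalDist


def approx_power(n, variance_explained, alpha=5e-8):
    """Approximate power of a two-sided z-test for one variant.

    n                  -- number of individuals
    variance_explained -- fraction of trait variance due to the variant
    alpha              -- significance threshold (5e-8 is an assumed default)
    """
    std = NormalDist()
    # Two-sided critical value for the chosen threshold.
    z_crit = -std.inv_cdf(alpha / 2.0)
    # Expected size of the test statistic under the alternative.
    shift = (n * variance_explained / (1.0 - variance_explained)) ** 0.5
    # Probability that the statistic lands beyond either critical value.
    return std.cdf(shift - z_crit) + std.cdf(-shift - z_crit)


if __name__ == "__main__":
    # Hypothetical effect sizes, for illustration only.
    for r2 in (0.01, 0.10, 0.30, 0.50):
        print(f"n=100, variance explained={r2:.2f}: power ~ {approx_power(100, r2):.4f}")
```

Under these assumptions, a variant explaining 1% of trait variance is essentially undetectable with 100 individuals, and only variants explaining a very large share of the variance reach useful power, which is why enriching the sample for large-effect variants matters so much at this scale.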