Not all next-gen sequencing technologies are created equal

The Next Generation Sequencing blog has a post on low coverage of A/T-rich regions with Solexa sequencing. The post discusses a Nature Methods paper on resequencing the C. elegans genome (doi:10.1038/nmeth.1179). Here's how the NextGen Sequencing blog summarizes it:

However, it points to a general lack of coverage in A/T rich regions (see figure 2 of the supplementary material) which leaves a number of zero-size gaps in the assembly - places where reads sit shoulder to shoulder but simply do not overlap. Having found these problematic A/T rich regions, the authors went back and took a look across the genome, where they found a general correlation between A/T content and read coverage. This correlation was stronger when examining a 200 bp window than when examining a 32 bp window. 200 bp corresponds to the size of the amplicons that are amplified during the cluster generation step prior to sequencing and 32 bp corresponds to the number of cycles in the actual sequencing by synthesis procedure. This finding made Hillier et al. conclude that failure to amplify A/T rich regions during cluster generation is the cause of the low coverage (other reasons for the bias such as hairpin formation were also explored but discarded).
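The window-size comparison comes down to a simple calculation: chop the genome into windows, compute each window's A/T fraction and mean read depth, and correlate the two. Here's a minimal Python sketch of that idea. It assumes you already have per-base coverage as a list of numbers; the function names, the non-overlapping windows and the use of a Pearson correlation are my own choices, not necessarily what Hillier et al. did, and the toy genome in the demo is made up.

```python
import random
from typing import List, Sequence, Tuple


def at_fraction(seq: str) -> float:
    """Fraction of bases in seq that are A or T (case-insensitive)."""
    seq = seq.upper()
    return sum(1 for base in seq if base in "AT") / len(seq)


def windowed_at(genome: str, coverage: Sequence[float],
                window: int) -> List[Tuple[float, float]]:
    """Split the genome into non-overlapping windows (a trailing partial
    window is dropped) and return (A/T fraction, mean coverage) per window."""
    pairs = []
    for start in range(0, len(genome) - window + 1, window):
        at = at_fraction(genome[start:start + window])
        mean_cov = sum(coverage[start:start + window]) / window
        pairs.append((at, mean_cov))
    return pairs


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical toy data, NOT real Solexa reads: a 20 kb genome made of
    # 1 kb blocks with varying A/T content, where per-base coverage drops
    # as the block gets more A/T rich.
    parts, coverage = [], []
    for _ in range(20):
        p_at = rng.uniform(0.3, 0.8)
        parts.append("".join(rng.choice("AT") if rng.random() < p_at
                             else rng.choice("GC") for _ in range(1000)))
        coverage.extend(max(0.0, rng.gauss(40 - 40 * p_at, 5))
                        for _ in range(1000))
    genome = "".join(parts)
    # Compare the two window sizes discussed in the paper.
    for w in (32, 200):
        pairs = windowed_at(genome, coverage, w)
        r = pearson([a for a, _ in pairs], [c for _, c in pairs])
        print(f"window={w:>3} bp  r={r:+.2f}")
```

The numbers the demo prints only show the mechanics; to say anything about Solexa you would feed in real per-base coverage from aligned reads.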

This is a problem if Solexa is to become a dominant way to resequence genomes. However, there are other applications of the Solexa technology that will probably not be affected. These include using Solexa to quantify gene expression and to genotype known variants segregating in a population (both of these jobs are currently dominated by microarrays). The low read coverage in A/T rich regions shouldn't affect the genotyping of known variants. Problems will arise when the goal of a resequencing project is to identify novel variants. However, 454 sequencing should work well for that goal.

As a related aside, if you don't know much about next generation sequencing, but would like to learn, check out these two reviews:

Mardis ER. 2008. The impact of next-generation sequencing technology on genetics. Trends Genet 24: 133-141. doi:10.1016/j.tig.2007.12.007

von Bubnoff A. 2008. Next-Generation Sequencing: The Race Is On. Cell 132: 721-723. doi:10.1016/j.cell.2008.02.028

Thanks for this. As someone far from the cutting edge of sequencing technology (our main mode of data collection is still PCR followed by direct sequencing on an ABI 377), this is really helpful.

Funny that I happened to see this new paper on the significance of A-tracts and AT-tracts.

Genome-wide Analysis of Fis Binding in Escherichia coli Indicates a Causative Role for A-/AT-tracts. PMID: 18340041 [PubMed - as supplied by publisher]

To quote:
"Analysis indicates that A-tracts and AT-tracts are an important signal for preferred Fis binding sites, and that A(6)-tracts in particular constitute a high-affinity signal which dictates Fis phasing in stretches of DNA containing multiple and variably-spaced A-tracts and AT-tracts"