Sanger sequencing is not dead?

Daniel G. Hert, Christopher P. Fredlake, Annelise E. Barron (2008). Advantages and limitations of next-generation sequencing technologies: A comparison of electrophoresis and non-electrophoresis methods. Electrophoresis, 29(23), 4618-4626. DOI: 10.1002/elps.200800456

The dideoxy termination method of DNA sequencing (often called Sanger sequencing after the technique's inventor, Fred Sanger) has been the workhorse of pretty much every molecular biology lab for the last 30 years. However, over the last few years the method has been increasingly supplanted by so-called next-generation sequencing technologies, which allow incredibly rapid generation of large amounts of sequence data. Sanger sequencing is still widely used for small-scale experiments and for "finishing" regions that can't be easily sequenced by next-gen platforms (e.g. highly repetitive DNA), but most people see next-gen as the future of genomics.

However, perhaps rumours of the death of Sanger sequencing have been somewhat exaggerated. In a recent review article in Electrophoresis and an interview for In Sequence (subscription required), Stanford's Annelise Barron argues that Sanger sequencing will persist, albeit in a revamped and scaled-up format.

The issue is read length.

All sequencing platforms generate sequence data in the form of many independent reads, which must then be assembled together to form a complete sequence. For Sanger sequencing these reads are routinely 800-1000 base pairs long; next-gen methods produce much larger quantities of sequence, but in the form of much shorter reads (the two best-performing platforms generate 35-75 base pair reads, while a third, lower-throughput platform can manage 400 base pairs).

Read length is absolutely crucial when it comes to assembling accurate sequence, especially for genomes as complex and repetitive as the human genome. If a repetitive region is much longer than a platform's read length, it can't really be accurately assembled - so human genomes sequenced with current next-gen platforms actually consist of hundreds of thousands of accurately sequenced fragments interspersed with gaps. That's good enough for most purposes, but it's by no means a complete genome sequence.
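To make that concrete, here's a minimal Python sketch - not a real assembler, just a toy genome with two copies of a made-up 60 base pair repeat, and read lengths loosely matching the platforms mentioned above - showing that reads which sit entirely inside a repeat copy can't be placed uniquely, while reads longer than the repeat always reach unique flanking sequence:

```python
import random

# Toy illustration only: the sequences and 60 bp repeat length are invented.
random.seed(0)

def random_seq(n):
    return "".join(random.choice("ACGT") for _ in range(n))

repeat = random_seq(60)                                   # one 60 bp repeat unit
genome = random_seq(100) + repeat + random_seq(100) + repeat + random_seq(100)

def ambiguous_reads(genome, read_len):
    """Count reads whose exact sequence occurs at more than one genomic position."""
    ambiguous = 0
    for i in range(len(genome) - read_len + 1):
        read = genome[i:i + read_len]
        if genome.count(read) > 1:    # identical read could have come from another locus
            ambiguous += 1
    return ambiguous

for read_len in (35, 75, 200):
    print(f"read length {read_len:>3}: "
          f"{ambiguous_reads(genome, read_len)} reads can't be placed uniquely")

# Reads that fall entirely inside one copy of the repeat are indistinguishable
# from reads off the other copy, so the region can't be bridged; once reads are
# long enough to reach unique flanking sequence, the ambiguity disappears.
```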

Barron argues that scaled-up platforms employing Sanger-based sequencing - allowing up to 50,000 reads to be generated at once, rather than the 96 reads permitted by current systems - could actually be cost-competitive with next-gen sequencing for some applications, and also provide the benefit of longer reads. The two applications Barron describes in detail in her article are sequencing the human leukocyte antigen (HLA) region, and large-scale genotyping of microsatellite markers (highly repetitive and variable regions of the genome).

An up-scaled Sanger-based approach would certainly be useful for sequencing projects targeting a small region in a large number of individuals (rather than sequencing whole genomes in a smaller number of individuals). In the In Sequence interview Barron explains:

You don't necessarily always want to sequence an entire genome. You sort of have to spend, at this time, $7,000 if you are working with 454, and you get the whole genome [at very low coverage]. What if you want 10 exons, and you want to spend 4 cents each? That's the kind of thing a doctor might want. I think that the advantage of the electrophoresis technologies is [that] they are scalable in that way; you can do it on a per-channel basis. And that is much more suited to looking at limited gene regions for individual patients.

That makes sense to me (especially as someone currently trying to use next-gen platforms to do the same thing, which turns out to be fairly painful). There's also a lot to be said for supplementing short-read platforms with Sanger sequencing for de novo genome assembly, to paper over some of the gaps in repetitive regions.
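For what it's worth, the arithmetic in Barron's example is stark. A quick back-of-the-envelope calculation - all figures taken straight from the quote, not from any price list - looks like this:

```python
# Back-of-the-envelope version of the cost argument in Barron's quote above.
exons_per_patient = 10           # "What if you want 10 exons..."
sanger_cost_per_exon = 0.04      # "...and you want to spend 4 cents each?"
whole_genome_454 = 7000.00       # "$7,000 if you are working with 454"

targeted_sanger = exons_per_patient * sanger_cost_per_exon
print(f"Targeted Sanger, {exons_per_patient} exons: ${targeted_sanger:.2f} per patient")
print(f"Whole genome on 454: ${whole_genome_454:,.0f} per patient")
print(f"Ratio: roughly {whole_genome_454 / targeted_sanger:,.0f}x")

# The exact numbers matter less than the scaling: an electrophoresis platform
# charges per channel, so cost tracks the number of targets you actually need,
# whereas a whole-genome run is a fixed (and much larger) outlay.
```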

However, I'm not convinced that scaled-up Sanger sequencing will necessarily be competitive with emerging next-next-gen platforms. Companies like Pacific Biosciences, Oxford Nanopore and Visigen are currently developing technologies that promise long reads generated from single DNA molecules; any one of these platforms may be able to generate the high-throughput, long-read results required to make Sanger sequencing completely obsolete for large-scale projects. Whether these companies will actually be able to fulfil their promises, of course, remains to be seen...

Nice description, Daniel.

The other issue besides read length is the number of samples that can be processed. Right now, each Solexa or SOLiD run generates lots of data but only for a small number of samples. This won't work well (yet) for doing large numbers of clinical assays.

Read length is a problem that Roche is working on, trying to get longer reads out of the 454.

Hey Sandra,

Good point about sample number, and one I didn't emphasise enough in my post (I said that a Sanger-based approach "would certainly be useful for sequencing projects targeting a small region in a large number of individuals", but didn't point out that this is one major area where current platforms fall down). I'm currently working on a Solexa-based project using bar-coding to analyse 96 samples per lane (for a ~25 kb region) and it's been a real pain - I'm sure the glitches will be worked out soon, but in the meantime Sanger sequencing is probably the optimal approach there.

All three of the platforms are definitely bumping up their read lengths - e.g. we're currently routinely getting 50 bp on Solexa and moving towards 75. Obviously Roche has the edge in that respect, but I get the feeling that they're really starting to lose their place in the market - their longer read length just doesn't make up for their cost per base being so much higher than Solexa/SOLiD. It will be interesting to see how they go in 2009.

Hey Daniel, happy holidays and welcome back from your posting hiatus.

Can we start a convention where next-next-gen is referred to as "3rd generation"? So Sanger would be 1st generation, Solexa/454 2nd generation.

By Andro Hsu on 07 Jan 2009

At least for now, Sanger isn't going anywhere and will continue to survive in the kind of niches you describe. Another is large-scale assembly - we need Sanger backbone reads that are long enough to get past repetitive elements like Alus.

Can we start a convention where next-next-gen is referred to as "3rd generation"? So Sanger would be 1st generation, Solexa/454 2nd generation.

Yeah, most journals are rejecting the "next-gen" terminology and asking that you describe the current platforms as "massively parallel sequencing". As you point out, next-gen now means PacBio et al. doing single-molecule stuff.

RKirk,

The prediction post is on its way. As for the conference - I'd love to go, if I can find someone to pay for me. :-)

Andro,

Good to be back (although unfortunately the hiatus had as much to do with work commitments as holidays).

I kind of like the futuristic sound of "next-next-gen", but I can see that it sets a trend that will rapidly become ridiculous (I don't want to be laboriously typing out "next-next-next-next-next-gen" in 2023!) - so 3rd gen it is.