Regular readers will know that I'm at the Advances in Genome Biology and Technology (AGBT) meeting this week, one of the most eagerly awaited meetings on the genomics calendar.
There's a huge amount of fascinating data being presented (anyone interested in a blow-by-blow account should follow Anthony Fejes' live-blogging), but there's definitely an overarching theme: the evolving battle between the new-technology sequencing companies. This is a competition that most researchers in genomics are watching with great interest, because it promises to bring about very rapid advances in the speed, quality and affordability of large-scale sequencing above and beyond the mind-boggling progress of the last two years.
The week started with bold claims from Illumina, who provide the most widely-used of the three "second-generation" sequencing platforms (the Genome Analyzer), about the improvements that will be made to their platform in 2009. Dan Koboldt has a good overview of the details, but the main message is this: by the end of the year, Illumina claims that it will be able to routinely generate 95 Gb (that's 95 billion bases, the equivalent of 30 human genomes) of DNA sequence per run. This increased yield will come with a boost to read length, which will aid genome assembly and the detection of large-scale insertions and deletions.
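For anyone who wants to check the "30 human genomes" figure, here is a minimal back-of-envelope sketch. The genome size of roughly 3.1 billion bases is my own assumption (an approximate haploid figure), not a number from Illumina's talk, and the function name is purely illustrative.

```python
# Assumed approximate size of a haploid human genome, in bases.
# This constant is an assumption for illustration, not a figure from the talk.
HUMAN_GENOME_BASES = 3.1e9


def genome_equivalents(yield_bases, genome_size=HUMAN_GENOME_BASES):
    """Return how many genome-lengths of sequence a given yield represents."""
    return yield_bases / genome_size


# Illumina's claimed end-of-2009 yield per run: 95 Gb.
illumina_run_bases = 95e9
print(f"~{genome_equivalents(illumina_run_bases):.1f} genome equivalents per run")
```

Note that this is raw yield divided by genome length. In practice a single genome is sequenced many times over to call variants reliably, so "30 genomes' worth of bases" is not the same as 30 finished genomes.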
Most people I spoke to seemed to feel that Illumina's claims were quite realistic - and they'd better be, because competition is coming from relative newcomers to the field. The first of these to present was Pacific Biosciences, who have been making big promises for quite a while now, but are still (by their own admission) at least a year away from a commercial release. Their presentation included some impressive new data, suggesting high accuracy and very long, continuous reads (up to 3,200 bases, which is massively longer than any other platform on the market). However, there was some uncertainty about the level of throughput that their platform will be able to achieve when it (finally) reaches the market.
In any case, however impressive their data, PacBio's presentation was blown out of the water in terms of sheer drama by the talk by Clifford Reid, the CEO of the newest entrant to the new-technology sequencing market - Complete Genomics. Complete has been creating a buzz in the genomics community ever since it emerged from stealth mode in October last year promising to deliver complete human genome sequences at a cost of just $5,000 by mid-2009, and to sequence one million human genomes within the next five years.
Reid's presentation was self-assured and quite persuasive. He presented data from the sequencing of the "complete" (see below) genome from a European sample from the HapMap project: although the data had a high error rate - only 40% of the reads could actually be mapped to the genome! - the sheer amount of data generated by the Complete platform (currently ~70 Gb over an 8-day run) allowed them to generate a consensus sequence and call single-base variants (SNPs) with high accuracy.
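To get a feel for why sheer volume can compensate for a low mapping rate, here is a rough sketch using the numbers quoted above. It assumes a genome of about 3.1 billion bases and treats "40% of reads" as "40% of bases"; both are my simplifications, and the result is not a depth figure reported by Complete.

```python
# Assumed approximate haploid human genome size, in bases (not from the talk).
HUMAN_GENOME_BASES = 3.1e9


def mapped_depth(raw_bases, mapped_fraction, genome_size=HUMAN_GENOME_BASES):
    """Average depth from the mapped portion of a run, assuming mapped
    reads are spread evenly across the genome (a simplification)."""
    return raw_bases * mapped_fraction / genome_size


raw_yield_bases = 70e9   # ~70 Gb per run, as quoted in the presentation
mapped_fraction = 0.40   # only 40% of reads mapped; treated here as 40% of bases
run_days = 8             # run length quoted in the presentation

print(f"mapped depth: ~{mapped_depth(raw_yield_bases, mapped_fraction):.1f}x")
print(f"raw throughput: ~{raw_yield_bases / run_days / 1e9:.2f} Gb per day")
```

Even after discarding the unmapped majority, each position is still covered several times over on average, which is what makes a consensus call possible at all. Real coverage is far from even, of course, which is part of why the repetitive regions remain a problem.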
I was convinced by the SNP data, but I will be very interested to see how the system performs in terms of calling large-scale structural variants. Certainly the system has problems dealing with repetitive regions (as expected with short reads) - Reid noted that around 8% of the genome couldn't be assembled due to these elements. These are major problems for very short read technologies that can't be solved by simply increasing coverage; Reid's presentation included a brief mention of a technology called "long fragment reads" that might help to address such problems, but the details weren't clear. Large-scale structural variants play an important role in human variation and disease, so Complete will need to deal with these areas effectively if it is to generate genome sequences that can realistically be called "complete".
Update 06/02/09: Here's a relevant statement from an article in Bio-IT World:
Complete identified some 400,000 short indels [insertions/deletions] using its own proprietary software, but Reid admits there is room for improvement. "The assembly software does not today call large structural variations," he acknowledged. "That's one of our next high priority projects -- to tease out of the datasets major structural rearrangements, inversions, translocations etc." Reid calls it "a strategic commitment to write the assembly software that spans the spectrum of variance detection from SNPs to assembling a cancer genome."
Anyone interested in the details of Complete's data is in luck, as the company has released its raw sequence data for public consumption - it will apparently be available through NCBI shortly. Various summary statistics are also available on the company's website.
The other interesting aspect of Complete is its unique business strategy - the company plans to offer its platform only within its own self-contained service centres, rather than selling it to genome facilities. I'm still not totally clear on why Complete has adopted this model, but it's likely a combination of the complexity of their data (their sequence is generated as a series of 10 base pair reads which then have to be stitched back together) and the economy of scale; Reid noted that computing, labour and overhead costs per base all drop as the size of a facility increases.
One final point of interest is that Complete's services will be completely restricted to the sequencing of human genomes - it will not accept projects involving non-human samples (a point that Reid made emphatically clear during question time). Reid presented this as meaning that Complete is not in competition with genome research facilities; there was an implicit suggestion that Complete would now take care of all whole human genome sequencing research, while genomics facilities could look after algae and such! I'm surprised by how dogmatic Reid was in declaring this, as it seems like this seriously constrains the market for the Complete service - but there are also considerable advantages to specialisation, and the human genome sequencing market is likely to grow very rapidly over the next few years.
Overall it was hard not to be impressed by the sheer audacity of Complete's goals, and by the speed with which they appear to be moving towards those goals. There are still some non-trivial questions in my mind about both the technical and financial facets of the company's strategy - and I will be putting these to company representatives over the next two days - but I think there was little doubt in the audience's mind that this is a serious new contender in the DNA sequencing field.
Does Complete have anything to say on the potential for further improvement for their technology? I want to see 23andme offering $399 genomes.
Exciting news, especially regarding Complete Genomics! I was looking forward to reading your report on the AGBT all day today.
Will Illumina also reduce the cycle time (have they already done so?), or is a run of 250bp reads going to take 3 weeks?
Smart move by Complete doing the sequencing themselves. I would literally rather kill myself than try to assemble a full human genome out of 10 bp reads. Wonder why they don't just make the markers longer and use more of them (ex 20 bp long and 80 different markers).
CS - Complete is being coy at the moment, but there will certainly be room for improvement in their system. I expect to see a sub-$1000 genome offered some time in 2010 (early 2011 at the latest), either by Complete or one of its competitors.
CJP - run time will be around 10 days for a 95 Gb run on Illumina.
Jim Bob - well, the 10 bp reads are arranged in a way that makes them much easier to map than if they were independent, but it's still an informatic nightmare. The company has plans in development to use 15 bp probes, but it's unclear whether those will be ready by launch.
I'll have more details on Complete this afternoon, following an interesting discussion with their CEO and CSO.
Any word from Knome or ZS Genetics at the AGBT?
IANAL, but it seems to me that doing the sequencing in-house also possesses an important advantage: avoiding regulatory barriers. None of the major genotyping platforms (sequencing or SNP genotyping) has yet been approved by the FDA for specific medical purposes. They can sell chips/sequencers but they are designated "research use only" so they are not supposed to be used for medical purposes.
But most if not all clinical genetic tests are "laboratory-developed tests" performed in-house without the benefit of a kit or other device normally subject to regulation. The FDA has chosen to exercise discretion in their regulation of LDTs or "home brews": i.e. chosen not to regulate them except that they must use reagents that meet a particular standard. However, if companies ship sequencers to hospitals or other independent labs, the labs would now be using an unapproved device. If the sequencers were used to make medical decisions, the labs might find themselves in the unfortunate position of LabCorp, which tried to market a test originally developed by Yale:
By keeping sequencing in-house, Complete Genomics' sequencing will be a laboratory-developed test, meaning that their data can be used for medical decisions without regulation.
I'm enjoying your posts on Marco Island. I have my own commentary on Complete's business model spurred by this post (though I wish I had thought of the regulatory angle -- that isn't a trivial issue and would seem to be another plus)