Larry Moran points to a couple of posts critical of microarrays (The Problem with Microarrays):
Microarrays are small chips covered with short stretches of single-stranded DNA. People hybridize labeled DNA from some source to the microarray, which lights up wherever the DNA hybridizes to the probes on the array.
Most biologists are familiar with microarrays being used to measure gene expression. In this case, labeled complementary DNA (cDNA), reverse-transcribed from mRNA, is hybridized to the array, and the intensity of the signal is used as a proxy for the transcriptional level of a large number of genes. Other uses include identifying copy number polymorphisms, genotyping single nucleotide polymorphisms (SNPs), and capturing sequences of interest for downstream analysis.
However, many of these uses are much better implemented with next generation sequencing. For example:
- Gene expression can be measured using Solexa sequencing (doi:10.1101/gr.079558.108). This digital quantification is far more precise than microarray analysis, which relies on hybridization intensities.
- Copy number polymorphism can be identified by 454 sequencing using paired-end reads (doi:10.1126/science.1149504).
- SNP genotyping can be performed with next-gen sequencing (doi:10.1016/j.gde.2006.10.009).
- Additionally, Solexa sequencing is replacing microarrays for the high-throughput identification of DNA sequences recovered by chromatin immunoprecipitation (ChIP-seq).
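To make "digital quantification" concrete: with sequencing, expression is estimated by counting the reads that map to each transcript and then normalizing for transcript length and sequencing depth. The sketch below uses RPKM (reads per kilobase of transcript per million mapped reads), one widely used normalization; it is not necessarily what the cited study did, and the gene names, counts and lengths are made up for illustration.

```python
def rpkm(read_counts, transcript_lengths_bp):
    """Reads per kilobase of transcript per million mapped reads.

    RPKM = 10^9 * C / (N * L), where C is the reads mapped to a transcript,
    N is the total number of mapped reads and L is the transcript length in bp.
    """
    total_mapped = sum(read_counts.values())
    return {
        name: 1e9 * count / (total_mapped * transcript_lengths_bp[name])
        for name, count in read_counts.items()
    }


# Hypothetical counts (reads mapped per transcript) and lengths in bp.
counts = {"geneA": 500, "geneB": 500, "geneC": 0}
lengths = {"geneA": 1_000, "geneB": 2_000, "geneC": 1_500}
print(rpkm(counts, lengths))
```

Equal read counts on a transcript twice as long give half the RPKM, which is the length correction at work; the division by total mapped reads makes runs of different depth comparable.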
Now, all of these techniques require a completely sequenced genome (or transcriptome). However, so do microarrays, so the up-front requirements aren't very different. Also, using microarrays to capture sequences can't be replaced by another technology, although it does rely on next-generation sequencing for downstream analysis.
Okay, so the question "Do people still use microarrays?" is a bit of hyperbole. But will microarrays be obsolete any time soon? Not if people are still using Sanger sequencing.
Will these other methods get round the problems with microarrays? They'll still be producing tonnes of data, which can still be bollocksed up (that's a technical term) by the person doing the work, and which will still be thrown into a crappy analysis. The only difference will be that the toys are shinier and go 'ping' more effectively.
The real problem, in my humble opinion, is the attitude that producing piles of data is an alternative to thinking. They'll get over that in a couple of years, once they realize that they don't have a clue what a lot of this data is telling them.
Two quick thoughts: one for, one against.
You state that digital quantification is more precise; this is not clear. With microarrays, you get an "analog" readout with error, true, but with Solexa you are limited to integer counts, so you start to run into problems with multinomial sampling. Low abundance transcripts will not be well measured without extremely deep sequencing. It will come, but we are still a long way off, especially with costs included. Microarrays are downright cheap by comparison.
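The sampling point can be made concrete. Under the simplifying assumption that reads are drawn independently (ignoring library-preparation and mapping biases), a transcript making up a fraction p of the library gets a Binomial(N, p) count from N reads, so its coefficient of variation is sqrt((1 - p) / (N p)), roughly 1 / sqrt(expected count). The depths and abundances below are illustrative, not measured.

```python
import math


def count_cv(fraction, total_reads):
    """Coefficient of variation (sd / mean) of one transcript's read count.

    Assumes each read independently comes from the transcript with
    probability `fraction`, i.e. the binomial marginal of multinomial
    sampling: count ~ Binomial(total_reads, fraction).
    """
    mean = total_reads * fraction
    sd = math.sqrt(total_reads * fraction * (1 - fraction))
    return sd / mean


# Hypothetical abundances (fraction of all reads) and sequencing depths.
for total_reads in (10**6, 10**7, 10**8):
    for fraction in (1e-3, 1e-6):
        mean = total_reads * fraction
        cv = count_cv(fraction, total_reads)
        print(f"{total_reads:>11,} reads, fraction {fraction:g}: "
              f"mean count {mean:g}, CV {cv:.3f}")
```

A transcript expected to get about one read is essentially unmeasured (CV near 1), and each 100-fold increase in depth only cuts the CV 10-fold, which is why low-abundance transcripts need very deep sequencing.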
On the other hand, I am not sure you need a complete genome for many things with Solexa and the like (at least for expression/polymorphism). Sure, it is helpful, but for expression applications you are essentially generating an EST library, which can get you well on your way...
It is important to remember that microarrays are simply a technology. I use them, but very differently. Things other than DNA can be printed - I use protein arrays, whole-cell arrays, and other types.
Asking if microarrays are obsolete is sort of like asking if cell culture is obsolete. The way it was originally developed? Yes, hardly anyone uses those methods any more. But you cannot get a group of biologists together and walk a few feet without encountering someone who cultures cells.
Microarrays are a technique. They will, I think, be around for some time. What they eventually look like may not be at all what they look like now, but they will persist.
Josh's point is also well put.
Bob O'H is spot on: switching to next-gen won't solve the majority of the problems currently confounding microarray studies, which are related to broader issues of study design and interpretation rather than technical issues. (In the same way, creating more accurate genotyping platforms didn't solve the problems of rampant false positives in candidate gene association studies; people just found other sources of error.)
Some other quick points:
1. As I understand it, the main advantage of RNA-seq over gene expression chips is not so much accuracy as dynamic range. Chips get saturated by high-abundance transcripts and struggle to pick up low-abundance transcripts over background noise; RNA-seq provides good data over a far broader range of transcript levels.
2. With Solexa read lengths now approaching 100 bp, the power of this platform for copy number analysis (with paired-end reads) is pretty close to 454's, and the massively higher throughput of Solexa gives it a substantial advantage over 454 for most applications. ABI's SOLiD platform is roughly equivalent to Solexa in most respects. Unless Roche does something astonishing to the throughput of the 454 system, it seems doomed to extinction in the not-too-distant future.
3. SNP genotyping still can't be done cost-effectively with next-gen: whole-genome sequencing still costs far too much, the technical challenges of enriching target SNPs for sequencing are not yet solved, and pooling multiple samples in a single lane (for sequencing small regions) is quite difficult. SNP chips will likely remain the gold standard for genome-wide association studies for at least another year or two.
4. I think it's important to note that most small labs are completely unequipped to deal with the sheer scale of the data from next-gen platforms. People could deal with microarray data using a few commercial packages and a bit of Excel; next-gen data is a different beast entirely. I suspect a lot of geneticists will need to do some fairly hasty informatics training over the next few years to make the most of these technologies.
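Point 1 (dynamic range) can be illustrated with a toy model. The sketch assumes a probe's signal follows a Langmuir-style saturation curve on top of a fixed background, while expected read counts scale linearly with abundance. The saturation ceiling of 65,535 (a 16-bit scanner maximum), the half-saturation constant and the background level are all made-up parameters, and real arrays and sequencers are noisier than this.

```python
def array_signal(abundance, s_max=65_535.0, k_half=1_000.0, background=200.0):
    """Hypothetical probe intensity: fixed background plus a saturating binding curve."""
    return background + s_max * abundance / (abundance + k_half)


def expected_reads(abundance, reads_per_unit=1.0):
    """Hypothetical expected read count: proportional to abundance (no saturation)."""
    return abundance * reads_per_unit


# Apply a true 10-fold change at low, middle and high abundance.
for abundance in (1.0, 100.0, 10_000.0):
    array_fold = array_signal(10 * abundance) / array_signal(abundance)
    seq_fold = expected_reads(10 * abundance) / expected_reads(abundance)
    print(f"abundance {abundance:>8g}: array reports {array_fold:.2f}x, "
          f"read counts report {seq_fold:.2f}x")
```

In this toy model the array compresses the 10-fold change at the low end (background dominates) and almost erases it at the high end (probes saturate), while expected read counts keep the full ratio. Sampling noise on low counts, the issue raised earlier in the thread, is deliberately left out.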
I agree and disagree with some of your statements. First, I must say, THANK YOU! for providing a scientific forum where actual science is discussed. :)
Despite what Illumina (Solexa) and ABI claim about 100 bp reads on their next-gen platforms, they're rather mistaken. Reads are still only really reliable up to about 40 bp, and that's pushing it. Beyond that point the error rate is too high to glean much information from.
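Quality drop-off along the read is usually checked by averaging base qualities at each cycle (read position). A minimal sketch, assuming 4-line FASTQ records without line wrapping and Phred+33 quality encoding; older Solexa/Illumina pipelines used an offset of 64 (and the earliest used a slightly different Solexa score scale), so `offset` has to match the data. The two reads in the example are invented.

```python
def mean_quality_per_cycle(fastq_lines, offset=33):
    """Mean Phred quality at each read position across all records."""
    totals, counts = [], []
    # In 4-line FASTQ, every fourth line (index 3, 7, ...) holds the qualities.
    for qual_line in fastq_lines[3::4]:
        for pos, ch in enumerate(qual_line):
            if pos == len(totals):
                totals.append(0)
                counts.append(0)
            totals[pos] += ord(ch) - offset
            counts[pos] += 1
    return [t / n for t, n in zip(totals, counts)]


def error_probability(q):
    """Phred definition: Q = -10 * log10(P_error), so P_error = 10^(-Q/10)."""
    return 10 ** (-q / 10)


# Two invented reads: 'I' encodes Q40 and '#' encodes Q2 under Phred+33.
records = ["@read1", "ACGT", "+", "IIII",
           "@read2", "ACGT", "+", "II##"]
per_cycle = mean_quality_per_cycle(records)
print(per_cycle)
print([round(error_probability(q), 4) for q in per_cycle])
```

For a real file, reading it line by line with trailing newlines stripped feeds the function the same way. Q20 corresponds to a 1% error rate, a common rough point at which to start trimming reads.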
SNP detection using next-gen is completely and totally possible! I'm doing it right now, actually. The sheer volume of data is incredible: we're talking 180+ exons per patient, with maybe 5 of those having low coverage that require standard amplification by hand. That's a number I'm willing to handle.
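Spotting the handful of exons that need redoing by hand is a simple coverage check. The sketch below assumes per-base read depths have already been computed for each targeted exon (for example from the alignments); the exon names, the 20x depth threshold and the 95% cutoff are hypothetical choices, not the commenter's pipeline.

```python
def low_coverage_exons(exon_depths, min_depth=20, min_fraction=0.95):
    """Return (exon, fraction covered) for exons below the coverage cutoff.

    An exon is flagged when fewer than `min_fraction` of its bases are
    covered by at least `min_depth` reads.
    """
    flagged = []
    for exon, depths in exon_depths.items():
        covered = sum(1 for d in depths if d >= min_depth)
        fraction = covered / len(depths) if depths else 0.0
        if fraction < min_fraction:
            flagged.append((exon, fraction))
    return flagged


# Hypothetical per-base depths for two 100 bp exons.
depths = {
    "exon_01": [35] * 100,
    "exon_02": [35] * 50 + [4] * 50,  # second half poorly covered
}
print(low_coverage_exons(depths))
```

Requiring a fraction of bases above a depth threshold, rather than just a mean depth, keeps a single deep pile-up from hiding a gap elsewhere in the exon.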
The paired-end read technology (allowing for multiple patients within the same lane) is still very much in development. Coverage is very inconsistent, but give it some time and both time and cost will decrease a thousand-fold.
To handle the data output from the GA (Genome Analyzer), all you really need is one person who's decent at writing scripts, and they can be successful even if self-taught. Granted, the software my company uses is still in beta, and I'm not sure what other groups are using (if anything).
I've only really understood the point of chips when dealing with a known target, or when trying to isolate a known variant or mutation. Until all the kinks get worked out for the GA, though, I imagine chips will continue to be used frequently. I just wanted to let you all know that we are making some serious headway with the GA and clinical applications. :D
But will microarrays be obsolete any time soon?
Not as long as they're fantastically cheap compared to sequencing.
What Bob said.
Re: Solexa read length, I agree that the error rates are a major issue with longer reads, but that's at least partially compensated for by increased coverage. We use 50 bp reads routinely and I've spoken to one person who's tried 75 bp fairly successfully (the major issue apparently being keeping the reagents topped up). The error rate is high, but with enough depth we can still align the reads and call SNPs without too much hassle. I get the impression we'll be doing 100 bp routinely in less than a year.
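Why depth compensates for a high per-base error rate can be sketched with a binomial model. It assumes errors are independent across reads (real sequencing errors are partly systematic, so this is optimistic) and counts any position where at least half the reads are wrong as a failure. Within that model it is an upper bound on majority-vote miscalls, because wrong reads split across three other bases and rarely agree with each other. The 5% error rate is illustrative.

```python
from math import comb


def majority_miscall_bound(per_base_error, depth):
    """P(at least half of `depth` reads are wrong at a position).

    Assumes independent errors: errors ~ Binomial(depth, per_base_error).
    Ties (exactly half wrong, even depth) are counted as failures.
    """
    min_errors = (depth + 1) // 2
    return sum(
        comb(depth, k) * per_base_error**k * (1 - per_base_error) ** (depth - k)
        for k in range(min_errors, depth + 1)
    )


for depth in (1, 5, 11, 21):
    print(depth, f"{majority_miscall_bound(0.05, depth):.2e}")
```

This only covers calling the majority base at haploid or homozygous positions; heterozygous SNPs, where both alleles are expected near 50%, need a different test.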
Re: SNP genotyping - sorry, I was really unclear there. I meant that it's still not feasible to use next-gen for the type of SNP genotyping required for GWAS (i.e. genotyping SNPs widely distributed across the genome in very large numbers of individuals). For SNP discovery in targeted regions it's obviously an incredibly powerful tool.
I think we're in agreement on pooling multiple samples per lane - it's still pretty challenging, but ultimately doable (at least I hope so, because I need it for my work!)
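For pooling several samples in one lane, one common approach is to add a short, sample-specific barcode to each library and sort the reads afterwards. A minimal sketch, assuming the barcode occupies the first few bases of each read and all barcodes have the same length; the barcodes, sample names and one-mismatch tolerance are hypothetical.

```python
def demultiplex(reads, barcodes, max_mismatches=1):
    """Sort reads into samples by an inline barcode at the start of each read.

    `barcodes` maps barcode sequence -> sample name. A read is assigned only
    if exactly one barcode is within `max_mismatches` (Hamming distance);
    the barcode is trimmed off assigned reads.
    """
    bc_len = len(next(iter(barcodes)))
    bins = {sample: [] for sample in barcodes.values()}
    unassigned = []
    for read in reads:
        prefix = read[:bc_len]
        hits = []
        if len(prefix) == bc_len:
            hits = [sample for bc, sample in barcodes.items()
                    if sum(a != b for a, b in zip(prefix, bc)) <= max_mismatches]
        if len(hits) == 1:
            bins[hits[0]].append(read[bc_len:])
        else:
            unassigned.append(read)
    return bins, unassigned


barcodes = {"ACGT": "patient_1", "TGCA": "patient_2"}
reads = ["ACGTTTTT", "TGCAGGGG", "ACGATTTT", "GGGGCCCC"]
bins, unassigned = demultiplex(reads, barcodes)
print(bins)
print(unassigned)
```

Barcodes that differ at several positions (these two differ at all four) let a single sequencing error in the barcode be tolerated without ambiguity; reads matching no barcode, or more than one, are set aside.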
Out of interest, what approach are you guys using to enrich your 180+ exons for sequencing? I'm about to start a pull-down project, so if you guys are using chip-based enrichment I'd love to have a chat. :-)
Re: bioinformatics - I completely agree, but bear in mind that the vast majority of biologists have absolutely no scripting experience. I guess a lot of groups will rely on commercial packages or outsource their informatics, but the benefits of being able to rig up your own tailored analysis pipeline are huge (especially for unusual applications). Any biologist willing to build up even basic scripting skills will be in increasing demand over the next few years...
Sequencing a transcriptome from scratch (454)... Doing some experiments... Measuring gene expression (Solexa) in the experimental groups. All of that can be done in a single PhD project.
It works! If I don't lose too much time blogging ;-)