Why is sequencing a human genome so expensive?

One of my readers asked: Why does genome sequencing cost so much?

My short answer is because it's big.

But I thought it would be fun to give a better answer to this question, especially since I'm sure many of you are wondering the same thing.

Okay, so let's do some math.

Don't worry, this math isn't very complicated and I'll explain where most of the numbers come from.

Estimating costs from salaries
First, we'll take the easy route. My experience with grant budgets has taught me that the greatest cost for any project comes from salaries. If we look at the PLoS paper with Craig Venter's genome sequence, we can see that there are 31 authors.

That's a lot of people! And, they probably all got paid.

I think it probably took at least a year to do the sequencing and analyze the data. So, let's say that we paid all those 31 people for one year.

If we said that their average salary was $50,000 per year (of course, JCV and Robert Strausberg probably made much more and any graduate students made much less, but on average, I think this is close), and their benefits were 25% of their salary, the cost in human labor would be: 31 x [$50,000 + (0.25 x $50,000)] = $1,937,500.

Aren't you forgetting overhead?
Oh, yeah. Well, I tried, but I'm never able to get away with that. In grants, universities and non-profit institutions charge additional costs for overhead. Some universities and non-profit research institutions charge as much as 90% of salaries; some places charge less.

If we're conservative and say that overhead costs were 50% of salaries, this would be another $775,000 [this comes from 0.5 x $50,000 x 31]. Now, the cost for doing Craig's genome is up to $2,712,500 and we haven't even bought supplies.
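For anyone who wants to check the arithmetic or plug in different guesses, here is a minimal Python sketch of the salary estimate. The function and parameter names are my own labels, and every number is one of the assumptions above, not a figure from the paper.

```python
def labor_cost(people, avg_salary, benefits_rate, overhead_rate):
    """Return (salaries plus benefits, overhead, total) in dollars."""
    # Salary plus benefits for everyone on the author list
    salaries_plus_benefits = people * (avg_salary + benefits_rate * avg_salary)
    # Overhead is charged as a fraction of base salaries only
    overhead = overhead_rate * avg_salary * people
    return salaries_plus_benefits, overhead, salaries_plus_benefits + overhead


# Assumed inputs: 31 authors, $50,000 average salary, 25% benefits, 50% overhead
labor, overhead, total = labor_cost(31, 50_000, 0.25, 0.50)
print(f"${labor:,.0f} + ${overhead:,.0f} = ${total:,.0f}")
```

Raising the overhead rate to 0.90 shows how quickly the total grows at institutions that charge the higher rate.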

We would still need to factor in the cost of the facilities, sequencing instruments, software, computers, reagents, laboratory instruments, autoclaves, robots, gel boxes, and consumables like plastic pipette tips, microfuge tubes, and 96 well plates. [Washington University made a great movie that shows the inner workings of a DNA sequencing operation and all the stuff that they use.]

But do you really think all those people worked full-time on the project for a year? Why would it take so many?

No.

I think many of the 31 authors were probably working on other projects in addition to doing the genome sequencing, putting the sequence together, analyzing the sequence, and writing the paper.

Estimating the costs from reads

So, let's try calculating costs another way. Lots of scientists outsource their DNA sequencing activities to core facilities. Core labs come up with pricing models that reflect their costs for personnel, reagents, equipment maintenance, robots, etc.

What do the core labs charge for DNA sequencing?

I looked at the web pages for a few university core labs to find out. The University of Michigan DNA Sequencing Core seems pretty typical. They charge $4 per sample, and presumably each sample would be good enough to produce a chromatogram and give us a read. [A read, by the way, is a sequence of bases that has been derived from a chromatogram.] This cost is based on the current sequencing technologies, and these were the methods used for JCV's genome. I have no idea what it costs for next generation sequencing methods.

Alright, so at $4 a read, what's the total cost? First, we need to know how many reads it took to sequence JCV's genome.

I was all set to estimate the number of reads, based on the Lander-Waterman tables, when I realized that Amit had posted this very handy link to the Venter Institute's info on JCV's genome. From there, I found a PDF fact sheet that listed the number of reads that were generated as part of this project.

The Fact sheet states that they used 32 million reads. It would be really, really unusual if all their reads were usable. I would estimate that at least 10% probably weren't. But, we'll use the 32 million value for now.

So, now we have: 32 million reads x $4 per read = $128 million.
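Here's the same multiplication as a small Python sketch, so the price per read is easy to swap out. The function name is my own label, and the 10% unusable fraction is only my rough guess from above, not a number from the fact sheet.

```python
def sequencing_cost(reads, price_per_read):
    """Total cost in dollars for a given number of reads."""
    return reads * price_per_read


reads = 32_000_000                  # read count from the fact sheet
total = sequencing_cost(reads, 4)   # $4 per read, the core lab price
print(f"${total:,}")

# Assumption: if 10% of the reads were unusable, each usable read costs more
usable_reads = reads * 0.90
print(f"${total / usable_reads:.2f} per usable read")
```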

And that's just the sequencing. That wouldn't include the cost of assembling the sequence, computers, software, or analysis.

If it really only took $2 million to sequence JCV's genome, as Chris wrote, I'd say this sequence was quite a bargain. And, now I wonder how they got it so cheap.


If it really only took $2 million to sequence JCV's genome, as Chris wrote, I'd say this sequence was quite a bargain. And, now I wonder how they got it so cheap.

It fell off the back of a truck?

Actually, I bet much of the equipment and salaries were covered by other sources. Also, the raw sequencing cost of an 8x bacterial genome is around $3,000, so $2 million doesn't shock me. Capillary sequencing (which is probably what your sequencing center uses for 'custom' reads) is far more expensive (~x40).

Yeah, it's always cheaper when someone else pays for part of the costs. I would guess that $2 million might underestimate the real cost.

I forgot, too, that different core facilities might charge different prices for reads. RPM mentioned in an earlier post that his school charges only $2 a read, so right there, it would cut the price down to $64 million.

I'm wondering too, if it cost $3,000 to do the raw sequencing for your bacterial genome, how much did it cost to do the assembly, finishing, and analysis? I think those costs would be a pretty large chunk of any genome project.

I was just about to comment that $4 per read seems high. That may be for single samples, but 96 well plates cost $1-2 per read (and at higher volumes the price probably drops below $1/read). Now, that's the cost of sequencing out of house. If you have your own robots, thermal cyclers, and automated sequencers, the actual cost of generating the data is much less. I doubt that the $2 million estimate includes the cost of the actual equipment.

Oh yeah,

I definitely think the price varies from place to place. I got my info by doing a Google search for DNA sequencing and looking at core lab web pages. I found some labs were charging $9 a plate, some labs, only $4. I didn't see any labs that charged less than $4, but I didn't look very hard.

If you're doing sequencing with low-paid graduate students and undergrads, and using more automation and data management software (like ours!), the cost of sequencing probably drops a lot.

Maybe they saved money by only sequencing the A's, C's and G's

I don't think 2 million dollars is that unreasonable. Some quick math:

Watson's genome was sequenced at 6x coverage, so we've got: 2.8 billion bases * 6x coverage = 16.8 billion bases total.

454 claims on their website to get 200k reads per run of the 454 machine. Correcting for advertising spin, we'll call it 100k per run. Each read is around 150 bp long.

So, 16.8 billion / (150bp * 100,000) = 1120 runs of the 454 machine.

2 million dollars / 1120 runs = about $1,786 per run.

Last I heard, the cost of 454 reagents and such was well below $1,000 per run. So I don't find the 2 million figure high at all.

Granted, they didn't figure in any salaries, the initial cost of the machines, or assembly time. But for the sequencing itself, 2 million is about right.
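The run arithmetic in this comment can be checked with a minimal Python sketch. The function name is hypothetical, and the 100k reads per run and 150 bp read length are the guesses made above, not measured values.

```python
def runs_needed(genome_bases, coverage, reads_per_run, read_length_bp):
    """Number of instrument runs needed to reach the target coverage."""
    total_bases = genome_bases * coverage            # bases to sequence overall
    bases_per_run = reads_per_run * read_length_bp   # bases produced per run
    return total_bases / bases_per_run


# Assumed inputs: 2.8 billion bases, 6x coverage, 100k reads per run, 150 bp reads
runs = runs_needed(2_800_000_000, 6, 100_000, 150)
budget_per_run = 2_000_000 / runs
print(f"{runs:,.0f} runs, ${budget_per_run:,.2f} per run")
```

Using the advertised 200k reads per run instead halves the number of runs and doubles the budget available for each one.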

It's also noteworthy that 454 was initially shooting for the first sub-million dollar genome, but couldn't pull it off.

And just to clarify, I was talking about Watson's genome in my previous comment, not Venter's.

That's an important distinction to make, because different sequencing technologies were used. Watson's assembly wouldn't have been possible without the reference sequence to align the short 454 reads to. Venter's, if I understand correctly, was assembled from scratch.

I agree, Chris.

I'm sure it would have cost less to sequence Watson's genome. His genome was sequenced with a lower coverage (6X for 3 billion bases vs. 8X for 6 billion) and using a less expensive method (454 vs Sanger sequencing). The analysis was probably cheaper, too, since it wasn't possible to compare his two sets of chromosomes.

As you point out, we can't use 454 data to price out Venter's genome because the two methods are not directly comparable in terms of run costs. My price estimates are all based on the Sanger method, since most of the core labs are using it.

So, my cost estimate only applies to Sanger sequencing, because that's all I have data for.

Thank you very much for this information!

Hi all,

Our group provided a large proportion of those 31 co-authors. Some things to note re: costs...

1. The comment about doing bacterial genomes for $3,000 seems to be based on next-generation sequencing. The Venter genome was done with conventional Sanger CE (sorry if someone said this already).

2. The good folks at the Wellcome Trust Sanger Institute tell me that they can do Sanger reads for "pennies" (they have purpose built robotics to cut down on labour, and dilute the ABI BigDye reagents dramatically). For reference, we charge CDN $5 per CE read (on ABI instruments), not so different from the $4 U of Michigan cost listed above.

3. I've heard $30 million kicked around (in the pages of Genome Technology and other places) for re-sequencing a human genome with CE (but note that this Venter genome was not a re-sequencing assembly to an existing scaffold; it was a de novo assembly, which is a bit different). With current next-generation technologies, I'd estimate a few hundred thousand dollars to get good coverage, but I doubt that the final assembly could be done without some CE gap-filling (the Venter Institute published a good paper about this a short while ago - Goldberg et al. I think). We'll have to see what the Watson/454 genome looks like, I guess.

Interesting times, folks, interesting times.

Thanks Richard,

Overall, I think most of the costs in sequencing come from personnel. Since the next-generation technologies can get so much more data, more quickly, it does seem like the costs will be lowered. It will be interesting to see how this works out.

Another small note - Chris points out above that 454 is <$1,000 per run. I'm not sure this is right - retail costs I'm hearing are more like ten times that for a run on either 454 instrument (GS20 or GS-FLX). But we're not running a 454 instrument so I may be wrong. Any users out there?