Jonathan Eisen Asks About Finishing Microbial Genomes...and the Mad Biologist Answers

By mikethemadbiologist on January 28, 2010.

Over at The Tree of Life, Jonathan Eisen asks:

What do people think are the potential benefits that could come from finishing?

For those who don't know what genome finishing is, I'll let Eisen give the short summary:

Finishing: Using any combination of laboratory, computational, and other analyses, one can both fill in gaps in the assembly and improve the quality of the assembly. This can generally be called "finishing."

In the context of microbial genomes, here are some of my thoughts about finishing (italics original; boldface mine):

Whole genomes don't come flying out of the sequencing machines: we have to take hundreds of thousands or millions of reads and stitch them together--what is known in genomics as assembly. It's pretty easy and fast to get a pretty good genome. By pretty good, I mean that most of the genome (~99%) is assembled into pieces 50,000-1,500,000 bases long.... Where assemblers get hung up with bacteria is on repeated elements--regions of the genome that are virtually identical (they don't have to be completely identical, just close enough that the assembler thinks they're identical reads with sequencing errors). Because the assembler can't figure out where to put these reads (they all look the same), it discards them--that's where the breaks occur...
This is a problem because some of the most interesting genes, such as antibiotic resistance genes, are found sandwiched between repeated elements known as insertion sequence elements ('IS elements'). IS elements are one of the major reasons resistance genes move from plasmid to plasmid (plasmids are mini-chromosomes that can themselves move from bacterium to bacterium) and from plasmid to chromosome. What this means is that we can assemble an antibiotic resistance gene (or genes) but might not know whether it's on a plasmid or on the chromosome--a pretty critical biological question. To further complicate things, different plasmids, as well as the bacterial chromosome, can carry the same IS elements. Not only will these introduce breaks into the assembly, but they can also lead to plasmids being accidentally assembled together or incorrectly incorporated into the genome.
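To make the repeat problem concrete, here is a toy sketch, not the algorithm of any particular assembler, built on a de Bruijn graph (one common assembly approach). It assumes error-free reads, approximated by taking every k-mer of a short made-up sequence, and reports contigs as maximal unbranched paths through the graph. The function names and example sequences are hypothetical, chosen only for illustration. A repeat longer than k creates branch points, so extension has to stop at both ends of it.

```python
from collections import defaultdict


def kmers(seq, k):
    """All overlapping substrings of length k (a stand-in for error-free reads)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_graph(seq, k):
    """De Bruijn graph: nodes are (k-1)-mers; each distinct k-mer adds one edge."""
    out_edges = defaultdict(set)
    in_edges = defaultdict(set)
    for kmer in set(kmers(seq, k)):
        prefix, suffix = kmer[:-1], kmer[1:]
        out_edges[prefix].add(suffix)
        in_edges[suffix].add(prefix)
    return out_edges, in_edges


def contigs(seq, k):
    """Spell out maximal unbranched paths; extension stops wherever the graph branches.

    Assumes a linear toy sequence: a repeat-free circular sequence has no
    branch point to start from and would not be reported.
    """
    out_edges, in_edges = build_graph(seq, k)
    nodes = set(out_edges) | set(in_edges)

    def is_simple(node):
        # Exactly one way in and one way out: safe to keep extending.
        return len(out_edges[node]) == 1 and len(in_edges[node]) == 1

    result = []
    for start in nodes:
        if is_simple(start):
            continue  # contigs start only at branch points or sequence ends
        for nxt in out_edges[start]:
            path, cur = start + nxt[-1], nxt
            while is_simple(cur):
                cur = next(iter(out_edges[cur]))
                path += cur[-1]
            result.append(path)
    return result


unique = "ATGCGTACCTGAAGTC"
repeat = "ATGCG" + "TTGACCA" + "CTGAA" + "TTGACCA" + "GAGTC"
print(contigs(unique, 5))          # ['ATGCGTACCTGAAGTC']: one contig
print(sorted(contigs(repeat, 5)))  # several pieces, split at the repeat
```

In the second toy sequence, the two copies of `TTGACCA` collapse into a single contig and the rest breaks into pieces ending at the repeat, so the toy graph alone cannot say which flanking piece belongs next to which copy.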

Now, we do have methods to close up these gaps--this process is called finishing, and it involves either targeted sequencing or manually parsing through the existing data. But these are open-ended, slow processes (particularly the targeted sequencing). Worse, they involve thinking, and, relative to computer algorithms, thinking is very slow. They are also really expensive. So we can get a pretty good assembly, but many people who remember the Sanger sequencing days, when most bacterial genomes were closed, are going to have to accept that if you want a lot of genomes, they will be 'pretty good' assemblies, not closed, finished ones.

To return to Eisen's question, I think finishing microbial genomes is important if you really have to localize genes to plasmids (or circularizing prophage). In infectious disease, that's pretty important. However, finishing for this purpose might become a moot point if the new technologies (454 pyrosequencing and Illumina) improve to the point where genes of interest can be reliably localized to plasmids*. Separately, if you're interested in the biology of repetitive elements, you'll need finished genomes.

So I think that in about a year we'll have very little need for complete finishing, unless the biological question requires it (e.g., studying repetitive elements).

*To get technical, as long as I can link a gene to a plasmid scaffold--a set of smaller sequences that I know are tied together, even though I lack some of the intervening regions--I'm happy.
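One way to picture a scaffold in code, as a minimal sketch: treat each read pair whose two ends land on different contigs as a link, keep only links seen at least `min_support` times, and group contigs with a union-find (disjoint-set) structure. The contig names, the link list, and the `min_support=2` threshold are all hypothetical; real scaffolding tools also use insert size and read orientation, which this ignores. Under these assumptions, a gene counts as plasmid-linked if its contig lands in the same scaffold as a contig already known to be plasmid.

```python
from collections import Counter


class DisjointSet:
    """Union-find over contig IDs, with path halving."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def build_scaffolds(links, min_support=2):
    """Join contigs linked by at least `min_support` read pairs (hypothetical threshold)."""
    counts = Counter(tuple(sorted(pair)) for pair in links)
    scaffolds = DisjointSet()
    for (a, b), n in counts.items():
        if n >= min_support:
            scaffolds.union(a, b)
    return scaffolds


def on_same_scaffold(scaffolds, contig_a, contig_b):
    return scaffolds.find(contig_a) == scaffolds.find(contig_b)


# Hypothetical read-pair links: each tuple is one pair whose ends hit two contigs.
links = [("ctg1", "ctg2"), ("ctg2", "ctg1"),
         ("ctg2", "ctg3"), ("ctg3", "ctg2"),
         ("ctg4", "ctg5")]
scaffolds = build_scaffolds(links)
# Suppose a known plasmid gene sits on ctg1 and the resistance gene on ctg3.
print(on_same_scaffold(scaffolds, "ctg1", "ctg3"))  # True
print(on_same_scaffold(scaffolds, "ctg4", "ctg5"))  # False: only one supporting pair
```

The gap between `ctg1` and `ctg3` is never filled, yet the gene is still tied to the plasmid scaffold. The support threshold matters because shared IS elements can produce misleading links between a plasmid and the chromosome.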


Good additional information here; I will post a link on my blog and in the FriendFeed discussion. A few comments:

1. Finishing does not have to require thinking, except for the up-front design of an automated system. I think that with a little up-front work we could, in essence, design a completely automated first-pass finishing system.

2. It has yet to be determined in publications what the raw quality of shotgun assemblies is with the various sequencing methods. We have some indications, but we do not know enough. For example, how "random" is the output from 454 and Illumina? What are the biases?

3. There seems to be strong support for finishing from an almost esthetic point of view: it feels better to be done. This is hard to use as a reason to spend $$$ on finishing, but I do find it interesting.

4. One challenge we have not really dealt with is that sequencing technology is changing much faster than it used to, and the assembly software has a hard time keeping up. So in some cases the software makes mistakes that can be fixed, but by then the sequencing methods have changed enough that the fix is no longer that useful.

As an experimentalist, I may not get much use out of an unfinished genome if a region I'd like to know something about is unfinished. Mike's point about assigning genes to a plasmid or the chromosome is one example, but there could be lots of others. If genome sequencing is being done in part to support functional work, then we just might need to know what goes in the missing part of the genome.

It seems like there should be cheaper experimental ways than finishing to determine whether a gene is on a plasmid or the chromosome [the same argument applies to other concerns/questions]. I think what we need to consider is how to answer the question directly, rather than how to force approaches that no longer really apply to that question, and then how to feed experimental data from other methodologies back into the annotations and assemblies.
