A Final Observation on the Human Microbiome Research Conference: An Underappreciated Breakthrough

By mikethemadbiologist on September 15, 2010.

A couple of weeks ago I attended the Human Microbiome Research Conference. At that meeting, one talk, by Bruce Birren (and covered by Jonathan Eisen), mentioned something that was completely overlooked by the attendees. Now, I don't blame them, since what Birren mentioned was about bacterial genomics, not the human microbiome. But here's what I tweeted about Birren's talk (TWEET!):

B. Birren-E. coli K-12 can be assembled into 1 scaffold for hundreds of $s with Illumina seq & new jumps

Let's unpack this below the fold.

When we sequence a genome, we actually sequence small pieces and then assemble them, like a jigsaw puzzle, into larger sections of contiguous sequence called contigs (as in contiguous...). With the Illumina technology, each read is about 180 bp* (one bp is one nucleotide pair), while an E. coli genome is about 5,000,000 bp. We can link these contigs using what are known as 'jumps.' Here, we sequence the two ends of a large piece of DNA (Birren's colleagues were using 5 kb pieces). Because we know roughly how far apart those two ends are, this allows us to scaffold contigs together into larger pieces (which have gaps of known size) called... scaffolds (clever term, no?). It also allows us to deal with repeated elements--DNA sequences that are identical (or nearly so) and are larger than a single read:

Where the assemblers get hung up with bacteria is repeated elements--regions of the genome that are virtually identical (they don't have to be completely identical, just close enough that the assembler thinks they're identical reads with sequencing errors). Because the assembler can't figure out where to put these reads (they all look identical), it discards them--that's where the breaks occur.
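To make the scaffolding idea concrete, here is a minimal sketch. It assumes each jump pair has already been mapped to a contig and oriented, so a link `(left, right)` means "one end landed in `left`, the other in `right`, and `left` comes first." The function name, the `min_links` threshold, and the contig names are all hypothetical; a real scaffolder also has to handle orientation, gap-size estimates, and circular chromosomes.

```python
from collections import Counter

def build_scaffolds(contigs, jump_links, min_links=2):
    """Chain contigs into scaffolds using jump-pair links (toy version).

    contigs    -- list of contig names
    jump_links -- list of (left_contig, right_contig) tuples, one per jump
                  pair whose two ends landed in different contigs
    min_links  -- ignore contig pairs supported by fewer jump pairs than this
    """
    support = Counter(jump_links)

    # For each contig, keep only its best-supported successor.
    successor = {}
    for (left, right), n in support.items():
        if n < min_links:
            continue
        best = successor.get(left)
        if best is None or n > support[(left, best)]:
            successor[left] = right

    # A scaffold starts at any contig that nothing points to.
    has_predecessor = set(successor.values())
    scaffolds = []
    for start in contigs:
        if start in has_predecessor:
            continue
        chain = [start]
        # Follow successors, stopping if we would revisit a contig.
        while chain[-1] in successor and successor[chain[-1]] not in chain:
            chain.append(successor[chain[-1]])
        scaffolds.append(chain)
    return scaffolds

contigs = ["contig_A", "contig_B", "contig_C"]
jump_links = (
    [("contig_A", "contig_B")] * 3
    + [("contig_B", "contig_C")] * 2
    + [("contig_A", "contig_C")]  # a single stray link, below min_links
)
print(build_scaffolds(contigs, jump_links))
# → [['contig_A', 'contig_B', 'contig_C']]
```

The point is that the two ends of a jump can sit in unique sequence on either side of a repeat, so the link survives even though the repeat itself broke the assembly.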

This is a problem because some of the most interesting genes, such as antibiotic resistance genes, are found sandwiched between repeated elements known as insertion sequence elements ('IS elements'). IS elements are one of the major reasons resistance genes move from plasmid to plasmid, and from plasmid to chromosome (plasmids are mini-chromosomes that themselves can move from bacterium to bacterium). What this means is that we can assemble an antibiotic resistance gene (or genes) but we might not know if it's found on a plasmid or on the chromosome--that's a pretty critical biological question. To further complicate things, different plasmids, as well as the bacterial chromosome, can carry the same IS elements. Not only will these introduce breaks into the assembly, but they can also lead to accidentally assembling plasmids together or incorrectly incorporating them into the genome.

Basically, for ~$600-$800, we can generate a really good bacterial genome. While it's not finished (that is, not every gap has been sequenced), it is closed, and, as noted above, this is a critical advance. I think we'll know very shortly how well this new technology works with more difficult genomes, although I would add that, in my experience, some clinically relevant pathogens, such as S. aureus (including MRSA), have pretty straightforward genomes (they don't have a lot of repeated elements). Keep in mind, we're in an era where the actual sequencing is cheaper than everything else we need to do to sequence a genome.

But what is amazing is that we can generate clinically and epidemiologically relevant data** very rapidly. On a shiny new Illumina HiSeq, we're talking about hundreds of genomes in a week (assuming the other steps aren't rate-limiting...). And these are genomes that could give us really good 'positional' information (i.e., whether the antibiotic resistance gene is found on a plasmid, and thus able to move easily between bacteria).

Very exciting. This will completely blow open the field of microbial population genomics and make molecular epidemiology very, very powerful.

*The reads are actually 100 bp, but with some tricks we have them overlap to form a 'single' read of 180bp.
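The footnote doesn't say what the tricks are. One common way to get there, and this is an assumption rather than a description of the actual protocol, is to sequence both ends of a fragment that is shorter than two read lengths, so the two 100 bp reads overlap by 100 + 100 - 180 = 20 bp and can be merged. A rough sketch with hypothetical function names, run on a simulated fragment:

```python
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def merge_pair(read1, read2, min_overlap=10):
    """Merge two overlapping paired-end reads into one longer read.

    read2 comes off the opposite strand, so it is reverse-complemented
    first. Tries the longest possible overlap first and requires an
    exact match (real data would need to tolerate sequencing errors).
    Returns None if no overlap of at least min_overlap is found.
    """
    read2_fwd = reverse_complement(read2)
    longest = min(len(read1), len(read2_fwd))
    for overlap in range(longest, min_overlap - 1, -1):
        if read1[-overlap:] == read2_fwd[:overlap]:
            return read1 + read2_fwd[overlap:]
    return None

# Simulate a 180 bp fragment read 100 bp inward from each end.
random.seed(0)
fragment = "".join(random.choice("ACGT") for _ in range(180))
read1 = fragment[:100]
read2 = reverse_complement(fragment[-100:])

merged = merge_pair(read1, read2)
print(len(merged), merged == fragment)
```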

**If you're interested in the sequence of repeated elements, such as IS elements, you're outta luck. They're interesting, but, in my opinion, not critical for understanding the spread of resistance genes and resistant organisms.


Another option is physical mapping. I'm working with technologies that are currently too expensive (~$450) but map an entire chromosome at 1-3 kb resolution. I'm convinced that this very month the price can be reduced to $100 and that in 2 years it will be $50 or less and include large plasmids, or still be $100 and include even moderate-sized plasmids (25 kb or more, perhaps).

'Jump' sequencing will still not entirely solve the problem with 5-10 kb plasmids, and neither will this technology. However, plasmid profile gels can tell you when you have something like that in the way.

Cheers, interesting stuff. This is off topic maybe, but I'm fascinated by en masse sequencing of multiple microbial species at a time, from the gut etc. How do people go about stitching a genetic mixture together?

To clarify a small point: a typical short-read assembler will not "discard them" [reads from repeats] but will recognize them based on higher coverage and place them accordingly without trying to guess gene order.
For example:
Gene1-Repeat1-Gene2-Repeat1-Gene3

you should get three contigs:
G1-R1
R1-G2-R1
R1-G3

kmers/reads from the repeats do not belong exclusively to any one repeat but are shared equally
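The example in this comment can be sketched at the level of whole blocks (genes and repeats) rather than actual reads. This is a toy model, not how any particular assembler is implemented: it assumes every repeated block is longer than a read, so the assembler cannot tell which unique neighbor follows which copy and has to break there. The function name is hypothetical.

```python
from collections import Counter

def contigs_from_blocks(blocks):
    """Split an ordered list of genome blocks into contigs at repeats.

    A block that appears more than once is treated as a repeat. Each
    contig runs through unique blocks and stops at the next repeat,
    which it keeps as its last block; the following contig starts
    with that same repeat.
    """
    counts = Counter(blocks)
    contigs, current = [], []
    for block in blocks:
        current.append(block)
        if counts[block] > 1 and len(current) > 1:
            contigs.append(current)
            current = [block]  # the repeat is shared with the next contig
    # Keep a trailing contig unless it is just a lone repeat already reported.
    if len(current) > 1 or not contigs:
        contigs.append(current)
    return contigs

genome = ["G1", "R1", "G2", "R1", "G3"]
for contig in contigs_from_blocks(genome):
    print("-".join(contig))
# → G1-R1
# → R1-G2-R1
# → R1-G3
```

The assembly alone cannot say whether the order is G1-G2-G3 or something else, which is exactly the information a jump pair spanning R1 would add.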
