Digital Biology Friday: A helpful hint

I began this series last week with a question about a DNA sequence that was published and reported to be one of the first beta-lactamases to be found in Streptococcus pneumoniae. Mike has a great post about one of the problems with this paper.

I think the data themselves are awfully suspicious.

So, last week I suggested that you, dear readers, go and find out why. I gave you a link to the abstract and a place to get started.

Perhaps that was too hard.

Sigh.

Okay, here's a little more help and another clue.

[Image: the abstract of the paper]

I highlighted the accession numbers. Post your guesses in the comments.


I'm going to go with the fact that there is a single nucleotide difference between this gene and the bla from pBR322, the most common cloning vector in the known universe (and the source of many more). Reminds me of a very embarrassing part of my past that I will not divulge (suffice it to say, I made a similar mistake at the protein level).

By Paul Orwin on 24 Aug 2007

As I read it, they sequenced all twenty-one isolates. Twenty of them came up as identical to E. coli vector bla, and one had a single substitution.
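For anyone who wants to repeat that comparison on the deposited sequences, here is a minimal Python sketch of the check. It assumes the sequences are already aligned and the same length; the function name `find_mismatches` and the short sequences are made up for illustration and are not the paper's data.

```python
def find_mismatches(reference, isolate):
    """Return (position, reference_base, isolate_base) for every difference.

    Positions are 1-based. Both sequences must already be aligned
    and of equal length; no alignment is attempted here.
    """
    if len(reference) != len(isolate):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (pos, ref_base, iso_base)
        for pos, (ref_base, iso_base) in enumerate(zip(reference, isolate), start=1)
        if ref_base != iso_base
    ]


# Short toy sequences for illustration only; not the paper's data.
vector_bla = "ATGAGTATTCAACATTTCCGT"
isolates = {
    "isolate_01": "ATGAGTATTCAACATTTCCGT",
    "isolate_17": "ATGAGTATTCAGCATTTCCGT",
}

for name, seq in isolates.items():
    diffs = find_mismatches(vector_bla, seq)
    print(name, "identical to vector" if not diffs else diffs)
```

With real data you would fetch the records by accession number and align them first; this sketch only covers the final base-by-base comparison.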

Identical sequences. Now that just ain't right. Even if some lucky S. pneumoniae acquired the bla gene from an engineered E. coli - and then rapidly seeded the area with its descendants - the sequences would diverge. There's genetic drift, and then there's selective pressure, and if you're going to take over 90%+ of the population you can't hide from either.

In particular, if that bla gene were actually kicking around in strep, you'd expect the codon biases to shift. Rare codons stall protein expression, so bla mutants with strep-friendly codons would be favored - the more enzyme the bug can produce, the more penicillin it can take.
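One rough way to look at the codon-bias argument is to tabulate codon usage in the suspect gene and compare it with usage in known S. pneumoniae genes. The sketch below only does the tabulation step, assuming an in-frame coding sequence; `codon_frequencies` is a hypothetical helper and the input is a toy string, not real sequence data.

```python
from collections import Counter


def codon_frequencies(cds):
    """Return the relative frequency of each codon in an in-frame coding sequence.

    Trailing bases that do not fill a complete codon are ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(codons)
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()}


# Toy input for illustration only: AAA and AAG both encode lysine.
print(codon_frequencies("ATGAAAAAGAAA"))
```

Running the same function over a set of native strep genes and over the reported bla sequence would show whether the gene uses synonymous codons the way its supposed host does.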

The abstract doesn't go into how they got the sequence data, either. But given their mystery substitution on #17, I would bet they went the PCR -> clone -> sequence route. I would also bet that they sequenced only one colony from each group of clones. Taq isn't the highest-fidelity polymerase in the drawer; one bad clone with a single base substitution, out of ~21 kb sequenced, is about what I'd expect.
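The back-of-the-envelope arithmetic behind that guess can be sketched as follows. It assumes PCR errors land independently, so the number of substitutions in the sequenced clones is roughly Poisson distributed. The error frequency used here is a placeholder chosen purely for illustration, not a measured value for Taq, and it stands for the cumulative per-base frequency in the final cloned product.

```python
import math


def expected_errors(bases_sequenced, error_frequency_per_base):
    """Expected number of substitutions across all sequenced bases."""
    return bases_sequenced * error_frequency_per_base


def poisson_pmf(k, mean):
    """Probability of seeing exactly k events when the expected count is `mean`."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


bases = 21_000          # ~21 kb, the commenter's estimate of total sequence
error_frequency = 5e-5  # ASSUMPTION: placeholder value, not a measured Taq rate

mean = expected_errors(bases, error_frequency)
print(f"expected substitutions: {mean:.2f}")
print(f"P(exactly one substitution): {poisson_pmf(1, mean):.2f}")
```

Plugging in a published error rate for the polymerase and cycle number actually used would turn this from a hunch into a testable expectation.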

Wow! Great answers!

Okay - now, pretend you're the reviewer. We've established that the data are suspect.

What can these researchers do to check and confirm their hypothesis?

I took a biology course at school and liked it, but now I realize that I have forgotten too much. When I read this post, I realized how poor my biology is now. I started thinking about what I remember: some names, some characteristics, but no deep knowledge. Maybe I should just teach my courses and not think about biology. But it was interesting anyway. Sorry for being a grouch.