Part III. Serendipity strikes when we Blink
In which we find an unexpected result when we Blink while looking at the mumps polymerase.
This is the third in a five-part series on an unexpected discovery of a paramyxovirus in mosquitoes. And yes, this is where the discovery happens.
I. The back story from the genome record
II. What do the mumps proteins do? And how do we find out?
III. Serendipity strikes when we Blink
IV. Assembling the details of the case for a mosquito paramyxovirus
V. A general method for finding interesting things in GenBank
To paraphrase Louis Pasteur, discovery favors the prepared mind, and yesterday's work was good preparation for today's discovery.
Some of our take home lessons from yesterday were these:
1. Viral proteins usually match viral proteins.
2. If we see matches to non-viral proteins, there may be something interesting going on.
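Those two lessons amount to a simple filter you could run over any list of search hits. Here's a minimal sketch, assuming each hit has already been labeled with a coarse taxonomic group (the `Hit` class, its fields, and the placeholder identifiers are all made up for illustration; this isn't how Blink itself stores its results).

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One row of a similarity-search result (hypothetical fields)."""
    accession: str   # placeholder identifier, not a real GenBank accession
    group: str       # coarse taxonomic group, e.g. "Viruses" or "Metazoa"
    evalue: float


def tally_groups(hits):
    """Lesson 1: count hits per group; for a viral query, expect mostly "Viruses"."""
    return Counter(hit.group for hit in hits)


def non_viral_hits(hits):
    """Lesson 2: a viral query that matches non-viral proteins deserves a closer look."""
    return [hit for hit in hits if hit.group != "Viruses"]


if __name__ == "__main__":
    # Hypothetical toy data standing in for a real Blink result page.
    hits = [
        Hit("viral_hit_1", "Viruses", 1e-150),
        Hit("viral_hit_2", "Viruses", 1e-90),
        Hit("metazoan_hit_1", "Metazoa", 1e-40),
    ]
    print(tally_groups(hits))
    for hit in non_viral_hits(hits):
        print("worth a closer look:", hit.accession, hit.group)
```

The group labels do all the work here; in practice, getting reliable taxonomy for every hit is the hard part, and the filter is only as good as those labels.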
What does the L protein do? Does L stand for Last?
Yesterday, we skipped the very last one of the mumps proteins when we worked our way through the mumps proteome. That protein is the L protein and its job is to copy the mumps genome. Many viruses reproduce by borrowing proteins from their host cell. They use their host's DNA or RNA polymerases to copy their genomes and they use their host's ribosomes to make their proteins.
Mumps can borrow ribosomes, but it can't just ask the cell if it can borrow a polymerase. Eukaryotic cells keep their RNA polymerases locked up inside the nucleus, where mumps can't get at them. And even if mumps could get access to the host's RNA polymerases, they wouldn't work: they only use DNA as a template, not RNA!
As an occasional adjunct faculty member, I'm reminded of what it's like when I want to use the department copy machine to copy assignments. I can't always get to the copy machine very easily, since it may be locked up in a special copy room. Then, when I get a key and find my way to the copy machine, it won't work because it needs some kind of special code (analogous to the DNA). I suppose I shouldn't be comparing adjunct faculty to viruses, but what the heck, in this case it kind of works.
The mumps virus has a similar problem. So, rather than trying to get into the nucleus and change the host's RNA polymerase, mumps makes its own RNA-dependent RNA polymerase.
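What does "copying RNA" actually mean? Here's a toy sketch of the base-pairing step an RNA-dependent RNA polymerase carries out: it reads an RNA template and builds the complementary RNA strand. This is only an illustration under simplifying assumptions (the sequence is a made-up fragment, not real mumps sequence, and a real polymerase also makes mRNAs, caps them, and so on). Copying the negative-sense genome gives a positive-sense copy, and copying that copy gives back the genome.

```python
# Watson-Crick pairing for RNA: A pairs with U, G pairs with C.
RNA_PAIRS = {"A": "U", "U": "A", "G": "C", "C": "G"}


def rdrp_copy(template: str) -> str:
    """Return the RNA strand complementary to an RNA template.

    Both strands are written 5'->3'. The polymerase reads the template
    3'->5' and builds the new strand 5'->3', so the copy is the reverse
    complement of the template.
    """
    if "T" in template:
        # Like the host polymerase in reverse: this toy enzyme only reads RNA.
        raise ValueError("template contains T: this toy polymerase only reads RNA")
    return "".join(RNA_PAIRS[base] for base in reversed(template))


if __name__ == "__main__":
    genome = "ACCAAGGGGA"                    # made-up negative-sense fragment
    antigenome = rdrp_copy(genome)           # positive-sense copy: UCCCCUUGGU
    print(antigenome)
    print(rdrp_copy(antigenome) == genome)   # True: copying twice restores the genome
```

The round trip is the point: a negative-strand virus needs an enzyme that can go RNA to RNA in both directions, and that's exactly the job the host cell won't do for it.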
Well, what happens if we Blink the mumps L protein?
When I select Blink, I see that there are 717 viral proteins that match the mumps L protein and 2 proteins from Metazoans. By now, you may have guessed which ones I'll find interesting.
When I ask Blink to show me the Metazoans, I find that both links are to the same hypothetical protein from Aedes aegypti (a mosquito that can carry yellow fever virus and dengue virus).
The matching region is also pretty long: 853 amino acids. And, lo and behold, there's a link to the Conserved Domain Database. If I follow it, I can see that I'm hitting a conserved domain that's found in the Paramyxovirus RNA-dependent RNA polymerase, with an E-value of 2 × 10⁻⁵⁷. In other words, we can be pretty confident that this hypothetical mosquito protein is very similar to an RNA-dependent RNA polymerase.
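How small is an E-value of 2 x 10^-57? Strictly speaking, an E-value is the number of hits at least this good that we'd expect by chance in a search of this size, not a probability, but for tiny values the two are practically the same. Here's a minimal sketch using the textbook Karlin-Altschul relations, E = m · n · 2^(-S') for a bit score S' and search-space lengths m and n, and P = 1 - e^(-E). It's an assumption-laden simplification: NCBI's tools use effective lengths and other corrections that this sketch ignores.

```python
import math


def evalue_from_bits(bit_score: float, query_len: int, db_len: int) -> float:
    """Expected number of chance hits scoring at least bit_score.

    Textbook Karlin-Altschul relation E = m * n * 2**(-S'), with m and n
    the query and database lengths (real tools use effective lengths).
    """
    return query_len * db_len * 2.0 ** (-bit_score)


def pvalue_from_evalue(evalue: float) -> float:
    """Probability of at least one chance hit: P = 1 - exp(-E).

    math.expm1 keeps precision when E is tiny, where P is essentially E.
    """
    return -math.expm1(-evalue)


if __name__ == "__main__":
    for e in (2e-57, 1e-24, 1.0, 10.0):
        print(f"E = {e:g}  ->  P = {pvalue_from_evalue(e):.3g}")
```

For E-values like the ones in this post, P and E are indistinguishable. They only part ways as E approaches 1: an E-value of 1 corresponds to a probability of about 0.63, which is why nobody gets excited about hits in that range.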
But this is weird. Why should a mosquito have an RNA-dependent RNA polymerase?
Mosquitoes don't have any need to copy antisense RNA the way a negative-strand virus like mumps does.
How do we know that mosquitoes and other insects don't normally contain a protein like this, one that we've simply missed?
We can Blink with our mosquito sequence. Blinking will show us whether any other mosquito proteins in GenBank match this replicase (RNA-dependent RNA polymerase) sequence.
To do this, I click the little blue diamond to the left of the row and I get a whole new set of Blink results.
This time, Blink showed matches to only 559 viral proteins and one metazoan protein (the other record for the same mosquito sequence; you can think of this one as a positive control).
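The logic of that positive control can be written down as a tiny check. Here's a sketch under stated assumptions: the record identifiers, the `Hit` class, and the group labels are hypothetical, and the idea is simply that other records of the query itself should show up (so the search is working) but don't count as independent evidence of a mosquito gene.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    record_id: str   # placeholder identifier for a database record
    group: str       # coarse taxonomic group, e.g. "Viruses" or "Metazoa"


def new_non_viral_hits(hits, query_record_ids):
    """Split a reciprocal search into (control found?, independent non-viral hits).

    Records in query_record_ids are copies of the query: finding them is the
    positive control, but they are not evidence of a second, separate protein.
    """
    control_found = any(hit.record_id in query_record_ids for hit in hits)
    others = [
        hit for hit in hits
        if hit.group != "Viruses" and hit.record_id not in query_record_ids
    ]
    return control_found, others


if __name__ == "__main__":
    # Hypothetical toy results: two viral hits plus the other copy of the query.
    hits = [Hit("viral_1", "Viruses"), Hit("query_copy", "Metazoa"), Hit("viral_2", "Viruses")]
    found, others = new_non_viral_hits(hits, {"query_record", "query_copy"})
    print("positive control found:", found)
    print("other non-viral hits:", [h.record_id for h in others])
```

An empty second list, with the control present, is the situation described above: nothing else in the database looks like this protein.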
The best matches were to the replicases from a bunch of plant viruses (orchid, maize, rice, strawberry, lettuce, and others). For Orchid fleck virus, the E-value was 1 × 10⁻²⁴. In other words, if we searched a database of random sequences of the same size, the number of matches this good we'd expect to find by chance would be 1 divided by a 1 followed by 24 zeros. Very small.
It was interesting to see that the best matches were to plant viruses. In fact, when I selected the multiple alignment tab from the Blink results and used the NCBI Blink options to build a tree from the 100 best-matching viral polymerases, sure enough, the mosquito sequence was still closest to the viruses from plants.
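Building a real tree takes more machinery than fits here, and I'm not describing the method Blink actually uses. But the core idea behind "closest to the plant viruses" can be sketched with a much cruder stand-in: keep the N hits with the smallest E-values, then find which aligned sequence differs from the query at the fewest sites (the p-distance). The hit names and the short aligned sequences below are invented for illustration.

```python
def top_hits(hits, n=100):
    """Keep the n best (name, evalue) hits, i.e. those with the smallest E-values."""
    return sorted(hits, key=lambda hit: hit[1])[:n]


def p_distance(seq_a: str, seq_b: str) -> float:
    """Fraction of differing sites between two aligned sequences.

    Columns where either sequence has a gap ('-') are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable columns")
    return sum(a != b for a, b in pairs) / len(pairs)


def nearest_neighbor(query: str, aligned: dict) -> str:
    """Name of the aligned sequence with the smallest p-distance to the query."""
    return min(aligned, key=lambda name: p_distance(query, aligned[name]))


if __name__ == "__main__":
    # Hypothetical hits and a made-up alignment fragment.
    hits = [("plant_virus_A", 1e-24), ("animal_virus_B", 1e-10), ("plant_virus_C", 1e-20)]
    print(top_hits(hits, n=2))
    query = "MKLV-DRYT"
    aligned = {"plant_virus_A": "MKLVEDRYS", "animal_virus_B": "MRIVEERFT"}
    print(nearest_neighbor(query, aligned))
```

A nearest neighbor by raw p-distance is only a rough proxy; proper tree methods correct for multiple substitutions at the same site and consider all the sequences jointly, which is why the tree is the more convincing evidence.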
This is interesting because mosquitoes pollinate certain kinds of orchids. I don't know if mosquitoes pollinate strawberries, but they definitely pollinate blueberries. So, maybe finding a viral RNA polymerase in a mosquito that's most similar to the Strawberry crinkle virus or Orchid fleck virus makes sense.
The curious question now is: how did a viral sequence end up assembled into the Aedes aegypti genome? Does it belong there?
That's our subject for post IV.