databases

Part II. What do mumps proteins do? And how do we find out? This is the second in a five-part series on an unexpected discovery of a paramyxovirus in mosquitoes, and a general method for finding interesting things. I. The back story from the genome record II. What do the mumps proteins do? And how do we find out? III. Serendipity strikes when we Blink. IV. Assembling the details of the case for a mosquito paramyxovirus V. A general method for finding interesting things in GenBank In Part I, we looked at the NCBI SeqViewer, and found a new way to check out a genome map, and learn more…
Part I. The back story from the genome record Together, these five posts describe the discovery of a novel paramyxovirus in the Aedes aegypti genome and a new method for finding interesting anomalies in GenBank. I. The back story from the genome record II. What do the mumps proteins do? And how do we find out? III. Serendipity strikes when we Blink. IV. Assembling the details of the case for a mosquito paramyxovirus V. A general method for finding interesting things in GenBank I began this series on mumps intending to write about immunology and how vaccines work to stimulate the immune…
PubMed is an on-line database at the National Center for Biotechnology Information (NCBI) that contains information from scientific literature. Most of the information is related to medical research. To search PubMed, you use a program called Entrez. You go to the NCBI, select PubMed from the menu, type words into the text box, and start the search. Sometimes that's all you need to do. Sometimes you get several million results and need to use more specific words to limit the results to the ones that you really want. Many scientists use PubMed on a daily basis. But the NCBI has noticed that…
One of the things that drives me crazy on occasion is nomenclature. Well, maybe not just nomenclature, it's really the continual changes in the nomenclature, and the time it takes for those changes to ripple through various databases and get reconciled with other kinds of information. And the realization that sometimes this reconciliation may never happen. One of the projects that I've been working on during the past couple of years has involved developing educational materials that use bioinformatics tools to look at the isozymes that metabolize alcohol. As part of this project, I've been…
Instead of enjoying a sunny summer day today, or partying with SciBlings in New York, I'm staring out my window watching the rain. Inspiration hit! What about searching for August? Folks, meet the HFQ protein from E. coli. I found this lovely molecule by doing a multi-database search at the NCBI with the term 'August'. HFQ is a lovely protein with six identical subunits, that's involved in processing small RNA molecules and is homologous to some eucaryotic proteins that work in RNA splicing (1). Do you see the blue loopy regions in the center of the structure? Those are positively…
A few weeks ago, I wrote about a paper in Science (1) that I read on a connection between a mutation in the dopamine D2 receptor and the genetics of learning. Only, it turned out that when I looked at the gene map... the mutation mapped in a completely different gene. I presented the data here and wrote a bit about my surprise at finding this mistake and even greater surprise at seeing this same mistake perpetuated by others. Now, I have some updates to the story. The folks at the NCBI responded quickly and added annotations to both the DRD2 and the ANKK1 citations in the Gene database. Now…
or is it just an idea that's ahead of the curve? Last week, I was stunned to discover at least 31 papers in an NCBI Gene database entry that were in the entry for the wrong gene. I wrote about this here, here, here, and here. Now, an oversight like this is a little understandable. The titles of the entries do include the name of the wrong gene (DRD2 - the dopamine D2 receptor). And it was only four years ago that people figured out that the marker in the title of the articles mapped somewhere else. If computers were responsible for the annotation, well, this would be understandable.…
It's pretty common these days to pick up an issue of Science or Nature and see people ranting about GenBank (1). Many of the rants are triggered, at least in part, by a widespread misunderstanding of what GenBank is and how it works. Perhaps this can be solved through education, but I don't think that's likely. People from the NCBI can explain over and over again that some of the sequence databases in GenBank are meant to be an archival resource (2), and define the term "archive," but that's not going to help. Confusion about database content and oversight is widespread in this…
In a recent post, I wrote about an article that I read in Science magazine on the genetics of learning. One of the things about the article that surprised me quite a bit was a mistake the authors made in placing the polymorphism in the wrong gene. I wrote about that yesterday. The other thing that surprised me was something that I found at the NCBI. The article that I wrote about definitely made a mistake and I don't understand why it wasn't caught by the reviewers. I found it pretty quickly by searching OMIM and I was only trying to find information about dopamine, not verify results.…