The Genomics Revolution Will Be Standardized...

...or it won't be much of a revolution. Yesterday, I discussed the difference between a DNA sequencing revolution and a genomics revolution, and how we have a long way to go before there's a genome sequencer in every pot (or something). But let's say, for argument's sake, these problems are overcome--and I think they will be.

Then the real trouble begins.

The big issue is standardization--without it we will have a genomic Tower of Babel:

"There is a growing gap between the generation of massively parallel sequencing output and the ability to process and analyze the resulting data," says Canadian cancer research John McPherson, feeling the pain of NGS [next generation sequencing] neophytes left to negotiate "a bewildering maze of base calling, alignment, assembly, and analysis tools with often incomplete documentation and no idea how to compare and validate their outputs. Bridging this gap is essential, or the coveted $1,000 genome will come with a $20,000 analysis price tag."

Without some sort of standardization of genome assembly and annotation (gene identification) methods, we're going to have real problems. In human genomes, will an apparent SNP (a single-nucleotide change in the DNA) be a real variant, or an artifact of how the genome was assembled and analyzed?
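To make that concrete, here's a minimal sketch (the file names and the simplified call format are hypothetical) of the kind of comparison analysts are stuck doing today: take the SNPs called from the same reads by two different pipelines and count how many they actually agree on. Every site reported by only one pipeline is exactly the ambiguity I'm describing.

```python
# Compare SNP calls made from the same reads by two different pipelines.
# File names and the tab-delimited format are illustrative assumptions.

def load_snps(path):
    """Return a set of (contig, position, alt_base) from a simple tab-delimited call file."""
    snps = set()
    with open(path) as handle:
        for line in handle:
            if line.startswith("#") or not line.strip():
                continue
            contig, pos, ref, alt = line.split("\t")[:4]
            snps.add((contig, int(pos), alt.strip()))
    return snps

calls_a = load_snps("pipeline_A_snps.tsv")   # hypothetical output of pipeline A
calls_b = load_snps("pipeline_B_snps.tsv")   # hypothetical output of pipeline B

concordant = calls_a & calls_b
only_a = calls_a - calls_b
only_b = calls_b - calls_a

print(f"Concordant SNPs: {len(concordant)}")
print(f"Called only by pipeline A: {len(only_a)}")
print(f"Called only by pipeline B: {len(only_b)}")
```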

It's even worse for microbial genomes. Within a bacterial species, there is a 'core' genome--a set of genes that all strains share. But a lot of the interesting biology happens in the accessory genome--genes that are found only in some strains. If a gene appears to be absent, is it really absent? Or is that just an artifact of 'bad' assembly or gene calling?
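Here's a toy example of why that matters (the strains and gene names are made up): the core genome is just the intersection of every strain's gene set, so a single gene missed by one strain's annotation silently drops out of the core, whether or not it is truly absent from that strain.

```python
# Toy sketch of the core/accessory split. Each strain's annotation is
# represented as a set of gene names (invented data for illustration).

strain_genes = {
    "strain_1": {"dnaA", "gyrB", "recA", "stx2"},
    "strain_2": {"dnaA", "gyrB", "recA"},
    "strain_3": {"dnaA", "gyrB", "recA", "stx2", "blaCTX-M"},
}

core = set.intersection(*strain_genes.values())   # genes shared by every strain
pan = set.union(*strain_genes.values())           # every gene seen in any strain
accessory = pan - core                            # genes present in only some strains

print("Core genome:", sorted(core))
print("Accessory genome:", sorted(accessory))
for strain, genes in strain_genes.items():
    # If stx2 was merely missed by strain_2's gene caller, this "absence" is an artifact.
    print(strain, "is missing:", sorted(pan - genes))
```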

We could get around this by going back to the raw (or semi-processed) data. Currently, all NIH-funded projects are required to upload raw data to NCBI. However, there will soon be far too much raw data for NCBI to store. Then what?

While this might not appear to be a problem if genomics is simply used as a diagnostic, I would argue it is. In the field of antibiotic resistance, when a hospital lab determines what drugs can kill an infecting bacterium, that information typically is not shared--it's just diagnostic. However, states are increasingly requiring hospitals to report and share this information for surveillance purposes (which is an excellent thing to be doing*). If we generate a lot of genomic information (human or microbial) and it just sits in a file somewhere, it's not exactly fomenting revolution, is it? The data have to be standardized to be broadly used.

A related issue is metadata--the clinical and other non-genomic data attached to a sequence. Just telling me that a genome came from a human isn't very useful: I want to know something about that human. Was she sick or healthy? And so on. These metadata, too, will have to be standardized: I can't say one of my genomes came from someone who was "sick" while you provide another genome from someone who had "inflammatory bowel disease." Worse, I can't say my patient had IBD while yours had Crohn's disease. The data fields have to be standardized, so we're not comparing apples and oranges.
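As a hedged illustration (the terms and mappings below are invented for the example, not any real standard), this is the sort of controlled vocabulary that metadata fields need: free-text diagnoses from different submitters get mapped to one agreed-upon term, and anything too vague to map gets bounced back to the submitter.

```python
# Illustrative controlled vocabulary for a "diagnosis" metadata field.
# The mappings are assumptions for the sake of the example, not a real ontology.

CONTROLLED_VOCAB = {
    "ibd": "inflammatory bowel disease",
    "inflammatory bowel disease": "inflammatory bowel disease",
    "crohn's disease": "Crohn disease",
    "crohns disease": "Crohn disease",
    "ulcerative colitis": "ulcerative colitis",
}

def standardize_diagnosis(free_text):
    """Map a submitter's free-text diagnosis to a controlled term, or reject it."""
    term = CONTROLLED_VOCAB.get(free_text.strip().lower())
    if term is None:
        # "sick" or any other unmapped value lands here and goes back to the submitter.
        raise ValueError(f"'{free_text}' is not specific enough; use a controlled term")
    return term

print(standardize_diagnosis("Crohn's disease"))   # -> "Crohn disease"
print(standardize_diagnosis("IBD"))               # -> "inflammatory bowel disease"
```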

Without these two types of standardization, we won't have a genomic revolution, but genomic anarchy.

*Of course, states aren't willing to pay for it....

Update: Keith Robison has a very good post about the Ion Torrent technology.


"Standards are wonderful! There are so many to choose from!"

Although I guess this is a bit of a tangent, standardisation might be viewed as one solution to the reproducible research issue (excuse my pimping my article). The reference to "often incomplete documentation and no idea how to compare and validate their outputs" rings a bell!

Standardisation and reproducibility are two different issues, but are certainly related, and I think it may help to think of the two together. Just my idle 2c before getting a coffee on :-)

Good post - thanks for the thoughts.

Perhaps you might be interested in the work of the Genomic Standards Consortium (http://www.gensc.org) and their open access publication Standards in Genomic Sciences (soon to appear in PubMed Central).

I think you will find that there is already a large and growing community with similar concerns and interests. The GSC has been working on this topic since 2005 and has published minimal standards for genome sequences, metagenome sequences and environmental sequences. There are also well established standards for describing draft and finished genome sequences.

There is a natural standard for genomic data: the evolutionary history of the biosphere. The MasterCatalog, which we released a decade ago as a commercial product and which was purchased by a number of companies, adopted it:

Benner, S. A., Chamberlin, S. G., Liberles, D. A., Govindarajan, S., Knecht, L. (2000) Functional inferences from reconstructed evolutionary biology involving rectified databases. An evolutionarily-grounded approach to functional genomics. Res. Microbiol. 151, 97-106.

LOL, I know from my own life and observations of others that 'standardization' and 'reproducibility' are just a nice way of saying that most of the time we really haven't a clue as to what we're doing. Instead we are relying on mountains of previous 'standardizations' and 'reproducibilities' with the expectation that these will allow our own assumptions to validate what we've been told to expect.

And when these assumptions are not validated...

Quick, get some Content Managers on this to help you define the relevant parameters and standardized terms. They are used to dealing with large volumes of information from multiple sources.