The CephSeq Consortium has a strategy

I approve this plan. A number of researchers have gotten together and worked out a grand strategy for sequencing the genomes of a collection of cephalopods. This involves surveying the phylogeny of cephalopods and trying to pick species to sample that adequately cover the diversity of the group, while also selecting model species that have found utility in a number of research areas. Those two criteria are often in conflict with one another. Fortunately, the authors seem to have found a set that satisfies both (although it would have been nice to see the Spirulida and Vampyromorpha make the cut; next round!). Here's the initial group, table taken directly from the text with the addition of a few pretty pictures for those of you unfamiliar with the Latin names.

Table 1: Cephalopod species proposed for initial sequencing efforts.

| Species | Estimated genome size | Current sequencing coverage | Geographic distribution | Lifestyle (juvenile/adult) | Research importance |
|---|---|---|---|---|---|
| O. vulgaris | 2.5-5 Gb | 46× | world-wide | planktonic/benthic | classic model for brain and behavior, fisheries science |
| O. bimaculoides | 3.2 Gb | 50× | California, Mexico | benthic | emerging model for development and behavior, fisheries science |
| H. maculosa | 4.5 Gb | 10× | Indo-Pacific | benthic | toxicity |
| S. officinalis | 4.5 Gb | - | East Atlantic-Mediterranean | nectobenthic | classic model for behavior and development, fisheries science |
| L. pealeii | 2.7 Gb | - | Northwest Atlantic | nectonic | cellular neurobiology, fisheries science |
| E. scolopes | 3.7 Gb | - | Hawaii | nectobenthic | animal-bacterial symbiosis, model for development |
| I. paradoxus | 2.1 Gb | 80× | Japan | nectobenthic | model for development, small genome size |
| I. notoides | - | 50× | Australia | nectobenthic | model for development, small genome size |
| A. dux | 4.5 Gb | 60× | world-wide | nectonic | largest body size |
| N. pompilius | 2.8-4.2 Gb | 10× | Indo-Pacific | nectonic | “living fossil”, outgroup to coleoid cephalopods |

It's a nice balance. There's a pair of related octopuses (Octopus vulgaris and Octopus bimaculoides) and a pair of related squid (Idiosepius paradoxus and Idiosepius notoides), so features common to each group can be recognized; a couple of model organisms used in neuroscience (Loligo pealeii) and developmental biology (Euprymna scolopes); and a couple of just plain cool animals, the blue-ringed octopus (Hapalochlaena maculosa) and the giant squid (Architeuthis dux). And of course you have to include a cuttlefish (Sepia officinalis, also an important research model), and a nautilus for the outgroup.

It's going to be challenging — cephalopods are like us in having large, sloppy genomes with lots of repeats and accumulated junk.
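To put rough numbers on "large": fold coverage is the total number of bases sequenced divided by the genome size, so multiplying the two columns of Table 1 gives a ballpark for how much raw sequence already exists for each species. The sketch below is my own back-of-the-envelope arithmetic, not anything from the paper. The function name `sequenced_gb` and the `TABLE_1` dictionary are made up for illustration, the genome sizes are only estimates, and species with a missing value or a size range are left out.

```python
# Rough sequencing volume implied by Table 1: bases sequenced = genome size x coverage.
# Values are copied from the table; rows with a missing value or a genome-size
# range (O. vulgaris, N. pompilius) are skipped to keep the arithmetic simple.
TABLE_1 = {
    "O. bimaculoides": (3.2, 50),  # (estimated genome size in Gb, fold coverage)
    "H. maculosa": (4.5, 10),
    "I. paradoxus": (2.1, 80),
    "A. dux": (4.5, 60),
}


def sequenced_gb(genome_size_gb, coverage):
    """Approximate total bases sequenced, in Gb, for a given fold coverage."""
    return genome_size_gb * coverage


for species, (size_gb, coverage) in TABLE_1.items():
    print(f"{species}: ~{sequenced_gb(size_gb, coverage):.0f} Gb of sequence")
```

By this estimate the giant squid alone already accounts for a few hundred gigabases of reads, and that is before anyone tries to assemble them through all those repeats.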

Like all good science, too, this is going to be open and accessible.

We therefore propose to adopt a liberal opt-in data sharing policy, modeled in part on the JGI data usage policy, which will support the rapid sharing of sequence data, subject to significant restrictions on certain types of usage. Community members will be encouraged to submit their data, but not required to do so. We plan to provide incentives for this private data sharing by (1) developing a community data and analysis site with a simple set of automated analyses such as contig assembly and RNAseq transcript assembly; (2) offering pre-computed analyses such as homology search across the entire database; and (3) supporting simple investigative analyses such as BLAST and HMMER. We also plan to provide bulk download services in support of analysis and re-analysis of the entire dataset upon mutual agreement between the requesting scientist and the CephSeq Consortium Steering Committee (see below), who will represent the depositing scientists. Collectively, these policies would provide for community engagement and participation with the CephSeq Consortium while protecting the interests of individual contributors, both scientifically and with respect to the Convention on Biological Diversity. Policy details will need to be specified and implementation is subject to funding. Our intent is to build an international community by putting the fewest barriers between the data and potential researchers, while still protecting the data generators.

I also like that there's an appreciation of the importance of wider communication of this information beyond the sphere of nerdy genomics researchers and obsessed cephalofreaks. The authors recognize that cephalopods are important barometers of climate change and the ocean environment, and that people are just plain fascinated with them.

People are fascinated by cephalopods, from Nautilus to the octopus to the giant squid. The coupling of genomics to cephalopod biology represents a fusion of two areas of great interest and excitement for the public. This fusion presents a tremendous educational platform, particularly for K-12 students, who can be engaged in the classroom and through the public media. Public outreach about cephalopod genomics will help build support for basic scientific research, including study of marine fauna and ecology, and will add to the public’s understanding of global changes in the biosphere.

Unfortunately, this short paper is a little thin on details of particular interest to me: "Education and outreach will be emphasized for broad dissemination of progress in cephalopod genomics at multiple levels, including K-12, undergraduate and graduate students, and the public at large." I'd be curious to see more about the how of doing that, but I'm glad it's on their list of priorities. Part of their plan is building a website, but unfortunately when I just checked it wasn't yet available.

Albertin CB, Bonnaud L, Brown CT, et al. (2012) Cephalopod genomics: A plan of strategies and organization. Stand Genomic Sci 7:1.



Science is open and accessible?

Did you really, really write that?

Wow! I have looked for the comparison between human and chimp genomes for about a decade now. In human-readable form, not the gibberish put out there.

There is little OPEN OR ACCESSIBLE about science.


@ Wayne

What exactly about it is gibberish? The interpretation and explanation? Or the actual data?
Because I wouldn't be surprised for people to consider endless strings of ATTAGCGCCATC etc. gibberish, but that's hardly a fault with science.
I'm not sure what author to turn to for pop-sci literature on that specific issue, to be honest. Several people talking about evolution brought it up, I'm sure, but to the level of detail you appear to be looking for?


I do data. Where is the RAW data? So what if it is ATTA-GCGC-CATC.

That is just a new number base system for me to learn.

Where is the DATA? Everyone keeps talking around the data, but they seldom show the data, and when they do show the data, the DATA DOES NOT SUPPORT WHAT THEY SAID.


The fact that you decided to split the random letters I gave as an example up into groups of four rather than groups of three already tells me something important.

I'm sorry, but rather generally asking for "the data" doesn't really help unless you're more specific.

I recommend looking into the human genome project and its chimpanzee equivalent for the info you're looking for.