Tidbits, 20 November 2009

Have some Friday tidbits!

  • An important biology dataset is losing NSF funding and may fold. Nor (as the article explains) is it the only one. It is impossible to overstate the desperate gravity of the data-sustainability question. Academic libraries, if we are not the white knights here—and we certainly have been in the past; witness arXiv—who is?
  • On a similar theme, Yahoo pulls the plug on GeoCities. O ye researchers relying on consumer-grade web services, or new startups, have an exit strategy! Consumer-grade services die when they lose money. Jason Scott may not come charging to your rescue.
  • H1N1 science depends on a public database of flu immunity data. "As the researchers acknowledge in their paper, the work couldn't have taken place if it weren't for extensive data sharing within the community of flu virus researchers." Data sharing makes possible better, faster science.
  • Data and the journal article. First: if you are saving your data as PDF, stop it. Second: as I suggested to Chris on FriendFeed, there's a serious structural issue with expecting journal publishers to cope with appropriate data archiving: by the time a researcher chooses a journal to publish in, all the decisions about data gathering and representation have already been made—and they may well have been made badly. The poor journal publisher can't go back in time and fix bad decisions! In our not-yet-standardized data age, early data interventions have to happen close to the researcher, which to me means they need to happen at the institution where the research happens.
  • The need for clear data licenses. I haven't talked about data licensing here, partly because the current state of intellectual-property law makes me sick at heart, but there's no question that it's an important piece of the data puzzle.
  • Peer-to-peer technology used for the forces of good: BioTorrents. Datasets vary in size; for the large ones, network bandwidth becomes a sharing problem. Torrenting won't precisely solve the problem, but it certainly increases the size range within which datasets are portable.
  • Fascinating data project of the week: National Center for Ecological Analysis and Synthesis. What caught my attention is that, as I read the project description, it takes public data sharing for granted. NCEAS researchers are not generating data; they are mining existing data. I'm inordinately curious about the disciplinary culture that makes this a feasible thing: what price scooping?

Whew. I have a lot more, but it's Friday.

