Science Online 2010: Lessons for IRs and data curators

One way and another, I heard quite a lot of talk at Science Online 2010 relevant to the interests of institutional-repository managers and (both would-be and actual) data curators. Some of the lessons learned weren't exactly pleasant, but there's just no substitute for listening to your non-users to find out why they're not taking advantage of what you offer.

In no particular order, here is what I took away:

  • The take-a-file-give-a-file content model for IRs is much too limited and limiting. Real live scientists are mashing up all sorts of things as they do their work; one wiki-based lab notebook demoed during the Open Notebook Science session contained LaTeX markup, embedded video, embedded images, and an embedded Google Docs spreadsheet. DSpace is utterly helpless faced with this situation. So is Fedora. I suspect EPrints doesn't fare much better, though I could be wrong. Yet science is done this way (among other ways, of course). We have to learn how to collect the results if we are serious about the preservation of the scholarly record.
  • The butterfly-pinned-to-wall content model for IRs is much too limited and limiting. Science isn't a bas-relief figure on a frieze. It's active. It's changing. It's alive! Rooted in the idea of the fixed, unchanging scholarly article, the IR does not respect the way science is done. This vitiates the IR's usefulness to science.
  • Content will not come to IRs; therefore, IRs must be prepared to go and get it. Cameron Neylon and I chatted a bit about potential interactions between IRs and (for example) SlideShare. How can Cameron upload a slidedeck to SlideShare and have SlideShare pass it invisibly to his IR? Multiply that by a great many services, on-campus and off-, and you begin to see the scope of the challenge. Honestly, it's not a new problem; it's an offshoot of the same old "scientists won't come to libraries; libraries must go to scientists" problem. It needs solving, though, and the sooner the better.
  • Restrictive content policies for IRs are counterproductive. The more restrictive, the more counterproductive. A scientist in my session on libraries told me that his IR had rejected diverse materials of his because they weren't peer-reviewed journal articles. How many peer-reviewed journal articles has he submitted to that recalcitrant IR? None, of course. How many does he plan to submit? None, of course. If you smack down someone who approaches you, they'll stop approaching you. Obvious, not so? We must therefore ignore the hardheaded individuals who refuse to participate in IRs because they aren't sure of the quality of the content. Honestly, they probably wouldn't participate anyway. On balance, IRs will do better—certainly win more adherents and supporters—the more open they are to many sorts of content.
  • As research moves OA, particularly toward gold OA, the IR future is in locally-produced gray literature. To some extent this is a corollary of the previous lesson, but I feel it strongly because it squares with my nearly five years of experience running IRs. The California experiment merging IRs with publishing services is all well and good, but there are plenty of ways to do publishing, and none of them really need IRs. Theses and dissertations need IRs. Working papers and technical reports need IRs. Conference proceedings need IRs. Posters and slidedecks need IRs. Student research needs IRs. Campus history needs IRs. (Grateful hat tip to Bonnie Swoger of SUNY Geneseo and Molly Keener of Wake Forest, who helped me think and talk this through.)
  • Pay attention to access statistics, and to combinations of statistics from various sources. I particularly enjoyed Peter Binfield's demo of PLoS's new article-level metrics. I believe that this level of granularity of measurement is the future. It will inevitably displace the (feared and loathed) journal impact factor, for the simple reason that scientists are (necessarily) egotists: why would they settle for journal-level metrics when they can evaluate the fate of their very own published corpus? For institutional repositories, the lessons are that statistics are not optional (which we've known for a while, I think) and that an API to statistics at the per-article and per-author level will soon become a requirement as well.
  • Dividing user-facing interfaces from back-end datastores is a necessity. The IR-as-silo phenomenon is perfectly useless from a scientist's point of view. Contrariwise, librarians (self included) are appalled at the cavalier way scientists entrust their digital work to random online services with no guarantees of persistence or clear exit strategies. The way to meet in the middle is for libraries and IT to provide the rock-solid, well-curated back-end storage solutions atop which individual labs (among others) can then build or graft the researcher-facing services they need. Islandora is a fine example of this approach.
  • Enterprise-level storage provided by campus IT is vastly too expensive. I heard this many times, and it squares with my own experience as well. If we want scientists to move away from the poorly-managed, un-backed-up, under-the-desk Linux server (and oh, boy, do we!), we have to offer alternatives within their means.
  • The maverick IR manager does not scale. A successful IR needs the entire library, particularly its collection developers, to own it. Rather to my surprise, it seems quite possible that IRs, which have had to expend enormous effort to attract a mere trickle of content, will shortly be expected to deal with tremendous floods of it. Particularly if IRs are expected to go out, find, massage into shape, and deposit these materials on behalf of faculty, assigning one person to do this for the entire campus (or worse, a consortium of campuses) makes the desired result a manifest impossibility. Collection developers, bibliographers, liaisons, this is addressed to you! What are your departments producing locally that the IR ought to have? How are you going to make sure it gets there?
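
To make the statistics point above a little more concrete, here is a minimal sketch in Python of what "combining statistics from various sources" at the per-article and per-author level might look like. Everything in it is an assumption for illustration: the source names, the item identifiers, the counts, and the function names are all hypothetical, and no real IR platform or outside service exposes exactly this.

```python
from collections import defaultdict

# Hypothetical sample data: each source reports a count per item identifier.
ir_downloads = {"item-1": 120, "item-2": 45}
slide_service_views = {"item-2": 300, "item-3": 15}

# Hypothetical mapping from item identifiers to their authors.
item_authors = {
    "item-1": ["A. Researcher"],
    "item-2": ["A. Researcher", "B. Colleague"],
    "item-3": ["B. Colleague"],
}


def per_item_stats(sources):
    """Combine counts from several named sources into one record per item."""
    combined = defaultdict(dict)
    for source_name, counts in sources.items():
        for item_id, count in counts.items():
            combined[item_id][source_name] = count
    return dict(combined)


def per_author_totals(item_stats, authors_by_item):
    """Roll item-level counts up to a single total per author."""
    totals = defaultdict(int)
    for item_id, by_source in item_stats.items():
        item_total = sum(by_source.values())
        for author in authors_by_item.get(item_id, []):
            totals[author] += item_total
    return dict(totals)


items = per_item_stats({"ir": ir_downloads, "slides": slide_service_views})
authors = per_author_totals(items, item_authors)
```

Keeping the per-source breakdown at the item level, and summing only at the author level, means a scientist could still see where the attention came from. A real service would also have to settle questions this sketch ignores, such as how to match the same item across services and whether a view and a download should count alike.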

There's more to say about several of these points, but isn't this enough of an infodump for the time being?

most of these points corroborate my feeling that "preservation in situ" (as it was called in a report i can't find anymore) may be the easiest way to go in many cases.

By robert forkel on 17 Jan 2010