Science Online 2010: Lessons for IRs and data curators

By dsalo on January 17, 2010.

One way and another, I heard quite a lot of talk at Science Online 2010 relevant to the interests of institutional-repository managers and (both would-be and actual) data curators. Some of the lessons learned weren't exactly pleasant, but there's just no substitute for listening to your non-users to find out why they're not taking advantage of what you offer.

In no particular order, here is what I took away:

The take-a-file-give-a-file content model for IRs is much too limited and limiting. Real live scientists are mashing up all sorts of things as they do their work; one wiki-based lab notebook demoed during the Open Notebook Science session contained LaTeX markup, embedded video, embedded images, and an embedded Google Docs spreadsheet. DSpace is utterly helpless faced with this situation. So is Fedora. I suspect EPrints doesn't fare much better, though I could be wrong. Yet science is done this way (among other ways, of course). We have to learn how to collect the results if we are serious about the preservation of the scholarly record.
The butterfly-pinned-to-wall content model for IRs is much too limited and limiting. Science isn't a bas-relief figure on a frieze. It's active. It's changing. It's alive! Rooted in the idea of the fixed, unchanging scholarly article, the IR does not respect the way science is done. This vitiates the IR's usefulness to science.
Content will not come to IRs; therefore, IRs must be prepared to go and get it. Cameron Neylon and I chatted a bit about potential interactions between IRs and (for example) SlideShare. How can Cameron upload a slidedeck to SlideShare and have SlideShare pass it invisibly to his IR? Multiply that by a great many services, on-campus and off-, and you begin to see the scope of the challenge. Honestly, it's not a new problem; it's an offshoot of the same old "scientists won't come to libraries; libraries must go to scientists" problem. It needs solving, though, and the sooner the better.
Restrictive content policies for IRs are counterproductive. The more restrictive, the more counterproductive. A scientist in my session on libraries told me that his IR had rejected diverse materials of his because they weren't peer-reviewed journal articles. How many peer-reviewed journal articles has he submitted to that recalcitrant IR? None, of course. How many does he plan to submit? None, of course. If you smack down someone who approaches you, they'll stop approaching you. Obvious, not so? We must therefore ignore the hardheaded individuals who refuse to participate in IRs because they aren't sure of the quality of the content. Honestly, they probably wouldn't participate anyway. On balance, IRs will do better—certainly win more adherents and supporters—the more open they are to many sorts of content.
As research moves OA, particularly toward gold OA, the IR future is in locally-produced gray literature. To some extent this is a corollary of the previous lesson, but I feel it strongly because it squares with my nearly five years of experience running IRs. The California experiment merging IRs with publishing services is all well and good, but there are plenty of ways to do publishing, and none of them really need IRs. Theses and dissertations need IRs. Working papers and technical reports need IRs. Conference proceedings need IRs. Posters and slidedecks need IRs. Student research needs IRs. Campus history needs IRs. (Grateful hat tip to Bonnie Swoger of SUNY Geneseo and Molly Keener of Wake Forest, who helped me think and talk this through.)
Pay attention to access statistics, and the combinations of statistics from various sources. I particularly enjoyed Peter Binfield's demo of PLoS's new article-level metrics. I believe that this level of granularity of measurement is the future; it will inevitably displace the (feared and loathed) journal impact factor for the simple reason that scientists are (necessarily) egotists, so why would they settle for journal-level metrics when they can evaluate the fate of their very own published corpus? For institutional repositories, the lessons are that statistics are not optional (which we've known for a while, I think) and that an API to statistics on a per-article and per-author level will soon become a requirement as well.
Dividing user-facing interfaces from back-end datastores is a necessity. The IR-as-silo phenomenon is perfectly useless from a scientist's point of view. Contrariwise, librarians (self included) are appalled at the cavalier way scientists entrust their digital work to random online services with no guarantees of persistence or clear exit strategies. The way to meet in the middle is for libraries and IT to provide the rock-solid, well-curated back-end storage solutions atop which individual labs (among others) can then build or graft the researcher-facing services they need. Islandora is a fine example of this approach.
Enterprise-level storage provided by campus IT is vastly too expensive. I heard this many times, and it squares with my own experience as well. If we want scientists to move away from the poorly-managed, un-backed-up, under-the-desk Linux server (and oh, boy, do we!), we have to offer alternatives within their means.
The maverick IR manager does not scale. A successful IR needs the entire library, particularly its collection developers, to own it. Rather to my surprise, it seems quite possible that IRs will shortly find that instead of the enormous effort they've had to expend to attract a mere trickle of content, they will be expected to deal with tremendous floods of it. Particularly if they are expected to go out, find, massage into shape, and deposit these materials on behalf of faculty, assigning one person to do this for the entire campus (or worse, consortium of campuses) makes the desired result a manifest impossibility. Collection developers, bibliographers, liaisons, this is to your address! What are your departments producing locally that the IR ought to have? How are you going to make sure it gets there?
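
To make the statistics lesson above a little more concrete, here is a minimal sketch of the kind of per-article and per-author counts such an API might expose. Everything in it is hypothetical: the `AccessEvent` record, its field names, and the sample identifiers are assumptions for illustration only, not the interface of DSpace, Fedora, EPrints, or PLoS.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessEvent:
    """One view or download of a repository item (hypothetical record)."""
    item_id: str
    authors: tuple
    kind: str  # "view" or "download"


def per_article_stats(events):
    """Count events per item, broken down by kind of access."""
    stats = {}
    for event in events:
        stats.setdefault(event.item_id, Counter())[event.kind] += 1
    return stats


def per_author_stats(events):
    """Count events per author; a co-authored item counts once for each author."""
    stats = {}
    for event in events:
        for author in event.authors:
            stats.setdefault(author, Counter())[event.kind] += 1
    return stats


# Made-up sample data, purely for illustration.
EVENTS = [
    AccessEvent("item-1", ("Author A", "Author B"), "view"),
    AccessEvent("item-1", ("Author A", "Author B"), "download"),
    AccessEvent("item-2", ("Author A",), "view"),
]
```

Calling `per_author_stats(EVENTS)` rolls the same events up by author rather than by item, which is the view a scientist tracking the fate of their own corpus would presumably want; a real service would also have to filter robots and combine counts from multiple sources.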

There's more to say about several of these points, but isn't this enough of an infodump for the time being?

most of these points corroborate my feeling that "preservation in situ" (as it was called in a report i can't find anymore) may be the easiest way to go in many cases.
