I pointed out Mike Lesk's slideshow in my last tidbits post, finding it a good critical précis of the data problem. It's pleasantly aware of human problems, human problems many treatments of cyberinfrastructure (including, unfortunately, this otherwise useful call to action from Educause) wholly ignore.
So wince and flinch at the design (black Arial on white? really? in 2009?), but read the slideshow anyway.
I do want to pick apart the slide from which I took the title of this post. I reproduce that slide's text in full:
Can we just give the problem to the libraries?
As a professor in a library school, I wish I could say that libraries were the obvious organization to take care of data. They understand keeping things for a long time and arranging to find them later. It would be a sensible new activity to balance a decrease in foot traffic into book collections. But...
- They have not been ambitious in this area; libraries feel under budget pressure and don't want new tasks.
- They lack the subject area knowledge to deal with complex data sets in scientific areas.
- They often lack the technical skills for advanced data handling.
I have no quarrel whatever with Lesk's first point. Libraries have absolutely been timid about this, and they still are—not without reason, either! This, to me, is the buck-stopper, the Berlin Wall, the concrete bollards. If library administrators shy away from this, or give it lip-service only, Lesk is right and there's nothing to be done. It won't matter how many librarians are ready and willing to do this work, if they're not allowed to or not given sufficient resources and authority to.
How likely is this outcome? In my estimation, more likely than not. My estimation is admittedly colored by this being very early days yet, but as I've remarked before, the longer any interested group dithers, the more likely it is that the action will be elsewhere. The more the action moves away from libraries, the more likely library administrators are to breathe a quiet sigh of relief and turn away from the problem altogether.
So what is a librarian who wants this work to do? Well, one answer is to keep an eye on discipline-specific projects, those that are larger than any single institution, the up-and-coming ICPSRs and Sloan Digital Sky Surveys. For those interested in data curation inside an institution, I think the answer may well be to learn enough to insinuate oneself onto research teams directly through their in-house IT arms. I may revisit this answer later; in-house IT is starting to become just cost-ineffective enough that some recentralization may happen. In that case, the would-be data curator has more options. Either way, though—the wise data curator does not attach himself limpetlike to the library. The action may well be elsewhere.
What is a researcher or funding agency or think-tank that wants libraries to take on this work to do? Researchers need to ask. Nothing gets library priority so fast as a well-articulated request from faculty; that goes double in disciplines where physical library spaces are waning in importance. Agencies and think-tanks: I'd recommend being an awful lot clearer about what the services provided look like and how they need to be staffed. Laundry-lists of skills are useless without an estimate of FTE and budget; such an estimate is noticeably lacking in every single discussion of this problem I've ever read.
I half-agree, half-disagree with Lesk's second point. There's a lot of disciplinary knowledge in academic librarianship. We don't select books blindly! We do it by taking heed of what our local researchers are doing. Many selectors and liaisons assigned to particular disciplines have degrees, sometimes advanced degrees, in that discipline. In the social sciences, by the way, data librarians with appropriate disciplinary knowledge already exist.
The problem isn't the non-existence of disciplinary knowledge; it's the uneven spread of it. For any given discipline at a research university, I'd guess it's a better-than-even bet that the library has a librarian somewhere with appropriate disciplinary expertise—but it's not a certainty.
Of course, there's also a question of how much disciplinary expertise is actually necessary for this work. Diane Hillmann remarked to me at ALA this summer that "[researchers] all think they're special snowflakes," but in her experience the basic sustainability questions don't differ all that much from dataset to dataset. That's what I think, too, with the added wrinkle that disciplinary specialists may actually be too close to their data to have a good read on how others will want to use and query it. An outsider perspective may well be useful!
(The real problem is one of first impressions and secret handshakes, as my SciBling Christina adroitly points out in the context of reference interviews.)
I could very nearly recycle the answers I just gave for Dr. Lesk's second point for his third. In aggregate, research libraries have quite a lot of technology expertise. How much any given library has isn't predictable, and may well not be sufficient.
If we cross the answer to the second question with the answer to the third, we approach the real conundrum: sufficient disciplinary expertise and sufficient technical expertise tend not to coexist within the same librarian. Take me, for example: if it's textual or linguistic data, I'm your librarian—that's my educational background! I can apply common sense and well-honed data-management expertise to numeric or instrument data, but I can't apply disciplinary knowledge because I don't have it. Selectors and liaisons, conversely, likely understand quite a lot about local research in the disciplines they serve, but they mostly don't sling Python and XSLT, nor do they tend to have the digital-preservation knowhow that I do.
John Saylor of Cornell gave what I believe to be the appropriate answer to this problem in his talk at ALA Annual: a technical team dedicated to data needs to work with librarians who have disciplinary expertise in order to solve problems. The disciplinary coverage achievable with this staffing model won't reach 100%, but it'll get as close as seems feasible. Nota bene: without broad participation by disciplinary specialists across the library, a data-curation service suffers and may well fail!
Lesk's objections are serious, pertinent, and pointed. They are not, I believe, unanswerable, but answering them will take considerable vision and will on the part of research-library administrators. Time will tell.
I was at Mike's talk; it was prefaced by a statement something like "I'm going to say increasingly outrageous things until somebody argues": I guess that's context that can't come across in the slides. BTW no-one argued... but he is persuasively and entertainingly outrageous!
That said: I agree with all 3 points, with the caveat: right now. Many libraries have been distressingly unambitious, and attempts to goad them into action have often failed. Domain knowledge sufficient for book selection is totally inadequate for data curation, and missing that understanding is a game-over situation. And the technical skills may be adequate in some circumstances, but often are totally little league.
Libraries have some huge advantages that absolutely no other player has, if they can mobilise them. These include mission centrality, budget, and what I call momentum (for want of a better term). Some of these are perishable and non-renewable, so don't waste them!
Mission centrality: I still see two-way commitment, from faculty to their library and from library to their faculty. Lots of factors are hard at work eroding this, but to varying degrees it still seems to exist. A credible plan can be believed!
No other organisation on campus that I can see has the sort of budget that libraries can bring to bear on the information management issue. That budget can be deployed many ways. It's hard to get more, so those who believe that data represent the way forward will (and can, with difficulty) make their leadership choices and make a serious play in ways that no-one else can.
Momentum: there's history and track record here. You've been doing information management for n years, where n is roughly the same as the age of the institution (sometimes longer, as in Edinburgh's case). No-one else has anything like the same track record. This doesn't mean an automatic right for the data business, but it does mean that a credible library can make a more credible case to manage data. I've heard several non-credible cases from librarians (and others). But it seems to me that credible cases are more feasible from the library than any other organisation.
So, bookers of trogool, don't lose heart, grab the iron while it's hot but strike with well-thought blows (or some slightly less-mixed metaphor!). Your data hour may be near!
I still remain to be convinced about the disciplinary knowledge question, I admit. I think there's both underestimation of librarian skill and overestimation of the knowledge that must be supplied by the data curator going on. However, I agree to disagree on that one; the proof of the pudding...
You're right about the technical skills, I hate to say. I hazard that I am just good enough -- barely, and I have distressing weak spots -- but people of even my modest technical ability are not exactly common in libraryland.
I see some joint library/campus-IT data-curation plays happening. I am watching them with interest. There might be a happy medium there somewhere.
I like your idea of "credible" and may well steal it (with credit). What does it take to CREDIBLY make the claim to be a data-curation organization? Conversely, what should researchers do about an organization that is making incredible (un-credible? no, that's not right) claims?