It's been a while since I did anything on my series about library ways of knowing. If you'd like to refresh your memory: The classical librarian; The humble index; Classification. Today I'll finish my discussion of classification, and distinguish it from subject analysis, since that distinction often seems to confuse, especially in our digital age. So if we'll recall, the goal we set for ourselves was to collocate physical books on shelves in such fashion that their arrangement would be useful to information-seekers. With most non-fiction, that means collocation by subject, by what the books are…
My del.icio.us tag overfloweth… A challenge to libraries from an information science professor: "I wish I could say that libraries were the obvious organization to take care of data… But… they have not been ambitious, they lack the subject area knowledge, they often lack the technical skills." What say ye, librarian Trogoolies? Cross-disciplinary use of data shines in this account of the decline of the Maya. "Space technology is revolutionizing archeology." Who would have guessed it? On the tools front, take a look at the Tranche Project, aimed at securely sharing datasets among researchers.…
I have intentionally steered Book of Trogool away from open access. I still believe in it; I still work for it. Toward the waning days of Caveat Lector, however, it became clear that I was shedding more heat than light on the subject, so I made a conscious decision not to repeat that mistake here. This is, however, Open Access Week. I would feel rather churlish about ignoring that, especially since I was speaking yesterday for the occasion. What I'll do, then, is try to set a radical example I wish others in the open-access movement would follow: I'm going to celebrate a librarian. Her name…
I've lived all my short career in academic libraries thus far on the new-service frontier. In so doing, I've looked around and learned a bit about how academic libraries, research libraries in particular, tend to manage new services. With apologies to all the botanists I am about to offend by massacring their specialty, here is my metaphor for the two main courses of action I see: grafting the new service on like an apple branch to a crab-tree, or hybridizing the new service with existing services, thus changing the library from the ground up. Each approach works in some situations, it seems…
I'm still buried in translating a presentation into Spanish for Monday and finishing another in English for Wednesday, but here's a small thought to tide folks over, a thought that came to me shortly before my presentation at Access. At the data-curation workshops I've been to, it has been axiomatic that "we can't afford to keep it all." Some fairly sophisticated judgment rubrics have been worked up, often based on the same kinds of judgment calls that special-collections librarians and archivists make when presented with collection opportunities. Is this dataset unique, or could it be…
If you've been having trouble commenting, you're not alone—the comment form quit working for me a couple days ago. I wrote in to Erin, and from where I'm sitting, the problem has been fixed. If you're not getting comment-form love, email me at dorothea.salo at gmail and I'll see what I can do. Speaking of comments: I am despotic about them, I'm afraid. If I suspect you're a spammer, or if I'm sure you're a timewaster, your comment will silently disappear. I don't expect to have to pull the trigger often (even spam levels around here have been muted), but a warning is only fair.
I will be speaking for UKSG's conference next April. They haven't given me a topic… but they want a talk title by the end of this month. I have to write a paper alongside the talk, and I hate writing papers with every last fiber of my being, so if I have to do it, I want to make it count for something. Anybody got any suggestions? What should I write and talk about, that the UKSG audience needs to hear? If you think you're among the intended audience, what would you want to hear or read from me?
Roy Tennant sent me an email about my Access presentation in which he asked what libraries should do about the laundry-list of data-curation challenges I presented. (If you're curious, you can go view the presentation yourself, courtesy of the wonderful A/V folk at Access. The less-than-an-hour-long way to assimilate the same information is to look over slides plus talk notes on SlideShare.) That's an eminently fair criticism. I've been thinking about it since receiving the email. I think the answer for libraries is to set their own digital houses in order first thing. After all, how can we…
I am back from Access and feeling wonderful… and wonderfully exhausted. Data and its care and feeding were the dominant themes at the conference. I strongly recommend reading the session summaries at Pete Zimmerman's blog. It's hard to pick star sessions out of so very many good ones, but the Leggott, Hartman/Phillips, Turkel, and Sadler presentations assuredly will repay attention. I hate to say it, but blogging may continue to suffer somewhat. I have a Web 2.0 talk this Friday, a Wisconsin Library Association talk on the 21st, and I'll be doing a remote presentation for Open Access Week…
There's quite a bit going on at Access 2009 that's data-related in one way or another. Delegate Pete Zimmerman is taking excellent notes at his blog. The Twitter hashtag is #access2009pei. I'm up right after lunch. There may or may not be a live video stream; if not, there will be canned video later. In the meantime, check out the communal Lego table in real time! I know I'm behind on answering comments. I'll try to catch up sometime today… but not until after my talk.
A comment Chris Rusbridge left on a previous post leads me to clarify the extent to which the subject matter of this blog draws on my own position in the institution where I work, and that institution's take on matters data-curational. In brief: It doesn't. I don't talk about my place of work here, and I have no plans to start doing so. I have no data-curation or other cyberinfrastructure responsibilities at my workplace save those that happen to touch on my position as institutional-repository manager. The day I acquire such responsibilities, which is not wholly impossible but by no means a…
One of the problems practically every nascent data-curation effort will have to deal with is what serials librarians call the backfile, though the rest of us use the blunter word backlog. There's a lot of digital data (let's not even think about the analog for now) from old projects hanging around institutions. My institution. Your institution. Any institution. There may be wonderful data in there, but chances are they're in terrible condition: disorganized, poorly described if described at all, on perishable (and very possibly perished) physical media. This pile of mostly-undifferentiated…
Sometimes it's worthwhile to let my "toblog" folder on del.icio.us marinate a bit. Posts I recently ran across on two different blogs illuminate the same point so well that they deserve their own post here! Off the Map offers Huffman's Three Principles for Data Sharing, which are really principles for data-collection and -display applications: (1) Create immediate value for anyone contributing data. (2) Make contributors' data available back to them with improvements. (emphasis mine) (3) [Urge users to] share derivative works back with the data-sharing community. Absolutely. These three principles boil…
I know I said I'd be neglecting the place for a bit… but I still feel bad about that! Here's what I've been working on. I'm afraid this is sort of the Cliff's Notes version, but at least it looks pretty? Grab a bucket! It's raining data! If you're coming to Access 2009 next week, you'll see the full version, which should make a bit more sense.
In many of the data-curation talks and discussions I've attended, a distinction has been drawn between Big Science and small science, the latter sometimes being lumped with humanities research. I'm not sure this distinction completely holds up in practice—are the quantitative social sciences Big or small? what about medicine?—but there's definitely food for thought there. Big Science produces big, basically homogeneous data from single research projects, on the order of terabytes in short timeframes. For Big Data, building enough reliable storage is a big deal; it's hard to even look at the…
I commented here earlier, not without frustration, about a pair of researchers who built and abandoned a disciplinary repository. I was particularly annoyed that they seemed to have done this purely for self-aggrandizement, apparently feeling no particular attachment to the resulting repository. Such as they should not open repositories. Neither they nor any service they offer is trustworthy. I hope that's uncontroversial. Unfortunately, even vastly better intentions than that don't guarantee the sustainability of the result, even in the short term. The Mana'o anthropology repository, started…
The Book of Trogool turns another page... Social scientists and medical researchers, pay attention to this: "Anonymized" data really isn't—and here's why not. If informaticists aren't starting to run similar analyses on their own "anonymized" data, they should be. This is a serious concern. One for the humanists: the rather vaguely-named Scholarly Communication Institute Report from Virginia. The theme was using spatial data in the humanities. From my SciBling Christina: Anybody can code… but should you? Peer review is for more than published papers. Holding your code close to your chest…
When I was but grasshopper-knee tall, my father the anthropologist took me to his university's library to help him locate and photocopy articles in his area of study for his files. He had two or three file cabinets full of such copies. (He may still.) I have similar file cabinets, two of them: my del.icio.us account and my Zotero library. The del.icio.us account consists merely of links. The Zotero library, on the other hand, includes the actual digital object(s) as often as I can manage it (even at a major research university like MPOW, I cannot always lay eyes on everything I want to read…
Many doctoral institutions now accept and archive (or are planning to accept and archive) theses and dissertations electronically. Virginia Tech pioneered this quite some time ago, and it has caught on slowly but steadily for reasons of cost, convenience, access, and necessity. Necessity? Afraid so. Some theses and dissertations are honest digital artifacts, unable to be faithfully represented in ink on paper or in other analog fashion. Others might be flattened into analog, but that wouldn't be their (or their author's) preference. Still others contain digital artifacts of various sorts.…
I wanted to call attention to this event at Harvard, which will be webcast live next Friday at 12:15 Central. The difficulties in combining data and information from distributed sources, the multi-disciplinary nature of research and collaboration, and the need to move to present researchers with tooling that enable them to express what they want to do rather than how to do it highlight the need for an ecosystem of Semantic Computing technologies. Such technologies will further facilitate information sharing and discovery, will enable reasoning over information, and will allow us to start…