U.S. Library of Congress to archive Twitter

From Twitter, here's the announcement:

Have you ever sent out a "tweet" on the popular Twitter social media service? Congratulations: Your 140 characters or less will now be housed in the Library of Congress.

That's right. Every public tweet, ever, since Twitter's inception in March 2006, will be archived digitally at the Library of Congress. That's a LOT of tweets, by the way: Twitter processes more than 50 million tweets every day, with the total numbering in the billions.

We thought it fitting to give the initial heads-up to the Twitter community itself via our own feed @librarycongress. (By the way, out of sheer coincidence, the announcement comes on the same day our own number of feed-followers has surpassed 50,000. I love serendipity!)

We will also be putting out a press release later with even more details and quotes. Expect to see an emphasis on the scholarly and research implications of the acquisition. I'm no Ph.D., but it boggles my mind to think what we might be able to learn about ourselves and the world around us from this wealth of data. And I'm certain we'll learn things that none of us now can possibly conceive.

Just a few examples of important tweets in the past few years include the first-ever tweet from Twitter co-founder Jack Dorsey (http://twitter.com/jack/status/20), President Obama's tweet about winning the 2008 election (http://twitter.com/barackobama/status/992176676), and a set of two tweets from a photojournalist who was arrested in Egypt and then freed because of a series of events set into motion by his use of Twitter (http://twitter.com/jamesbuck/status/786571964) and (http://twitter.com/jamesbuck/status/787167620).

Twitter plans to make its own announcement today on its blog from "Chirp," the Official Twitter Developer Conference, in San Francisco.

So if you think the Library of Congress is "just books," think of this: The Library has been collecting materials from the web since it began harvesting congressional and presidential campaign websites in 2000. Today we hold more than 167 terabytes of web-based information, including legal blogs, websites of candidates for national office, and websites of Members of Congress.

We also operate the National Digital Information Infrastructure and Preservation Program (www.digitalpreservation.gov), which is pursuing a national strategy to collect, preserve and make available significant digital content, especially information that is created in digital form only, for current and future generations.

In other words, if you want a place where important historical information in digital form should be preserved for the long haul, we're it!

Needless to say, this is a pretty incredible announcement. It's great that a major public institution can step forward and take on the kind of digital preservation job that only an institution of its size could handle.

It would be really great if their next step could be a similar archiving project for, say, Blogger or WordPress blogs. Or perhaps other big national libraries around the world could each pick a site and dedicate themselves to preserving its content for future generations.

Well, NoAstronomer, that's debatable. Twitter may be largely trivial nonsense, but it also carries a lot of very valuable information, comment and debate. You can see what people are talking about and time it to the second, so it's great for tracking, for example, reactions to breaking news.

What I had for breakfast? Not so much, but I presume people will just ignore the trivia.

NoAstronomer: except maybe twittering itself?

Actually, I've already seen comments that suggest that there is a certain U.S. imperialism to the idea that Library of Congress should think they are the appropriate body to archive what is essentially a corpus of global communications. I hope that the LoC will put out an informative announcement soon; the blog entry was light on factual information and strategic aims while heavy on enthusiasm and "ain't we cool".

By Jill O'Neill on 15 Apr 2010

I really hate this idea. I realize that when you post something on the net, it isn't private, but I don't really want the Library of Congress archiving my activity on the internet.

I think this is a great injustice and a slap in the face to people's privacy rights.

Ethan,

You give up your right to privacy when you make a public announcement via Twitter. I don't see how you can call this an infringement on privacy rights.

If you want privacy, don't make public announcements. You can't have it both ways.

By Andy Latham on 15 Apr 2010

Jill O'Neill: "there is a certain U.S. imperialism to the idea that Library of Congress should think they are the appropriate body to archive what is essentially a corpus of global communications" I would be one of the first to accuse the US of inappropriately "taking over", but I'd have to say in reply that I'm glad some institution is doing it. There is no global organization that would or could do it, so it would have to be some specific country. And what other country would have the resources or the motivation to do it?

Ethan Sigel: "I don't really want the Library of Congress archiving my activity on the internet" And that's not what they're doing. They're recording the results of some small portion of your activities. But that's what archiving cultural artifacts is all about. You are part of human society and affect it in some small way, and therefore your life is being recorded indirectly. Yes, this is at a higher resolution and gathers more individual contributions, but that's because we're all contributing more individually. If you don't want to be "recorded" in this way, don't contribute.