Beliefs, Knowledge, Articles, Databases.

By jwilbanks on January 5, 2009.

I've been working on some text for a series of papers lately. I'm writing the core of a book proposal and working through the ideas around the knowledge web and the knowledge economy, and thought I'd post some interim thoughts here.

Knowledge is a funny thing. Philosophers have spent eons debating it. I'm not going to figure it out here - in fact, the conclusion that I wasn't going to figure it out played a big role in my choosing not to go to graduate school. But on the web, we have these things that are kind-of-knowledge. Databases. Journal articles. Web pages. Ontologies.

Taken together, these things are somewhere in the epistemological chain. But the act of digitizing them does some strange things...they start to form an observable, computable network, a knowledge web of sorts. And in a knowledge web, we have to understand an important conceptual transformation: knowledge itself needs to be treated as something similar to software, something upon which computing happens and depends. We also have to understand the implications of that transformation.

The great revolutions of the internet, the web, and free software were all predicated on access to sources and standards - a mix of technical and legal access. The internet didn't really have to deal with the law, as TCP/IP didn't really affect copyright. The web ignored copyright from a legal perspective, but actively encouraged viewing and copying from a technical perspective. Free software embedded legal freedoms inside the technical access concept.

But knowledge is different, as the vast majority of the canon is already embedded in creative works protected by copyrights. Thus, we have to unlock some content if we're going to reformat it into something that can in turn be treated as an interim step along the way to knowledge, and then used as cyberinfrastructure. This is why Open Access is so crucial. Whatever knowledge is, a lot of it is locked behind paywalls, copyright licenses, or trapped in lousy formats from a machine perspective.

But - if we have access - if we can take the individual facts described in papers and turn them into modelable knowledge, or at least precursors to knowledge, we convert those facts into infrastructure for construction into something bigger, for composition into structures that software can use.

This transformation is already under way in the life sciences. Most of the valuable cyberinfrastructure (CI) data in the life sciences has been hand-curated out of journal articles into more structured sources like the Kyoto Encyclopedia of Genes and Genomes, or the Human Protein Reference Database, or the Information Hyperlinked Over Proteins, and on and on.

This needs to be accelerated and industrialized, as the human-readable paper is the least valuable format of knowledge from a cyberinfrastructure/CI perspective. But this requires an understanding of access to the knowledge canon as a fundamental lever of CI construction in a knowledge web. Unfortunately, most of these databases tend to have copyright or contractual restrictions that make it impossible to build on them as infrastructure (particularly non-commercial restrictions or restrictions on redistribution in federated or integrated knowledgebases). That's why open access to databases is essential as well.

We are lucky to have vast amounts of public domain databases that are, from a CI perspective, un-networked. The scientist needs to open a dozen or more tabs in a browser and use her own mind to integrate the results. That's lousy. But it's a natural outcome of the web not integrating databases the way it integrates documents, and at least the legal terms let us start to integrate.

There are competing philosophies about how to deal with the integration - I follow the one that believes, again, that software is the right metaphor for dealing with knowledge integration, and that data integration is a plausible first place to start working on knowledge integration. It's certainly better than getting stuck in an infinite loop arguing about what knowledge is. It's a funny place to come back into ontological realism after nearly 20 years away from the academy, but this approach indeed demands a certain amount of it...because you're dealing with database records that need to be reconciled, not ideas like "gene" and so on, and if you're going to write code about them, realism helps. But I digress. Back to software integration.

The way we integrate software in free software is via the *distribution* - a community using a standard set of kernel interfaces to knit together multiple software packages. This is a model for data integration, and the SC Neurocommons project is the first one that I know of - released in October 2008 - and we're already seeing some encouraging early returns (I love the version that a user installed on the Amazon cloud). The idea is to let users who like our modeling and ontological work simply expose a version of their database using our standards, and then any user or community that wants to add that database to the distribution can do so with minimal effort, just like adding a new software package to a Linux distribution.

Note that we are assuming from the beginning that everyone has a different idea of knowledge - people will disagree with our models, and we've pre-emptively guaranteed the right to "fork" knowledge like software so that each community can craft its own solution based on our kernels.
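To make the distribution-and-fork idea a little more concrete, here is a toy sketch in Python. It is not how the Neurocommons is actually built; the class names, the `gene:0001` identifier, and the records are all made up for illustration. The only assumption it encodes is the one described above: databases that share identifier conventions can be added to a distribution like packages, and a community can fork that distribution and add its own.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class DataPackage:
    """A database exposed under shared conventions (hypothetical structure)."""
    name: str
    # Records keyed by a shared identifier scheme; values are attribute dicts.
    records: Dict[str, Dict[str, str]]


@dataclass
class Distribution:
    """A named collection of data packages, loosely like a Linux distribution."""
    name: str
    packages: List[DataPackage] = field(default_factory=list)

    def add(self, package: DataPackage) -> None:
        # Adding a database is just registering another package.
        self.packages.append(package)

    def fork(self, new_name: str) -> "Distribution":
        # A fork starts from the same packages but can diverge afterwards.
        return Distribution(new_name, list(self.packages))

    def lookup(self, identifier: str) -> Dict[str, Dict[str, str]]:
        # Collect what each package says about one identifier,
        # keyed by package name so the source of each record stays visible.
        return {
            p.name: p.records[identifier]
            for p in self.packages
            if identifier in p.records
        }


# Made-up records; "gene:0001" is a placeholder identifier, not a real one.
pathways = DataPackage("pathways", {"gene:0001": {"pathway": "example pathway"}})
proteins = DataPackage("proteins", {"gene:0001": {"protein": "example protein"}})
literature = DataPackage("literature", {"gene:0001": {"paper": "example paper"}})

core = Distribution("core")
core.add(pathways)
core.add(proteins)

# A community that disagrees with, or wants more than, the core can fork it.
community = core.fork("community")
community.add(literature)

print(sorted(core.lookup("gene:0001")))
print(sorted(community.lookup("gene:0001")))
```

The fork copies the package list rather than sharing it, so the community's additions never leak back into the core distribution. That is the property the software metaphor is after.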

This is all a way of trying to leverage techniques we've seen work in the service of complex systems creation by distributed inputs. It might work - I hope it does. It might also be an evolutionary step along the path. But clearly we need some evolution away from the human-readable paper and the standalone database as containers for the things we know, or the things we believe. The information space is simply too big for any one brain to process any more, and Google simply isn't as efficient for science as it is for culture...


John: Thx. for thinking deep thoughts and trying to figure this stuff out. I'm so far behind the curve I only understood about five per cent of what you said and are trying to accomplish, but thought you deserved an appreciative comment nevertheless. I'm guessing by your remarks that you don't believe the "Google" model will work for science, but wouldn't Google be a good place from which to start or to seek help for what you are trying to do? They've already made their billions and they seem like they would do anything to help advance science. Try to get them on board before they become conservatives and focus all their energies on reducing their taxes. Thanks again for all your hard work on this.

Two very quick comments:

I presume you are familiar with the UCSC Genome Browser and how it links in others' data. (There are earlier predecessors of the same general thing, e.g. SRS.) These seem to be in the general direction that you are talking about.

One thing I tried (weakly!) to push years ago was the idea of a database collating a collection of experts on gene or protein families. (There is a more recent effort along these lines out there, but I've darn well forgotten what it's called, and I don't have time to look it up. Where's your memory when you need it...) The reason I bring this up is that the people who study families, as opposed to single members of the family, are doing comparative work and so are generally better placed to create a reference framework for the different members of the family, etc. I think empowering people with this interest would help.
