A Truly "Open" Concept Web Definition

I was invited to join a meeting last week in New York to kick off something called the "Concept Web Alliance." It's a new nonprofit hoping to stimulate the emergence of lots and lots of marked-up content from the life sciences, and it's claiming the mantle of open access. The concept web is essentially the same idea as the semantic web, but with a little more savvy about branding, and the potential value is the same too - if the information has more structure, we can have a computable web of data linked into the literature, and a better way of asking very precise questions of a massively complex data space.

The CWA is being driven by the folks behind Knewco, a company that has been selling a technology that generates concepts out of content. I've known their CEO Jan Velterop for a while and consider him a friend, but I stayed away from the Knewco technology for a long time because of what I felt was a closed model that leveraged patents and asserted copyrights over the concepts generated. Neither of those is consonant with my own beliefs about concepts and webs....

So now we have the CWA, which is the Knewco folks taking their ideas and trying to build a more community-based approach to the concept web. I like the idea a lot, and it was a fascinating meeting. And I decided at the end of the day to sign the declaration behind the CWA. It's a good declaration - it's kind of hard to go wrong with the core ideas of cooperation and coordination of methods and infrastructure. I believe in supporting efforts to get everyone onto common names. And we definitely should be supporting browser development (though after 10+ years in the space, I've come around to thinking that queries are the key, not browsing, but you'll have to write your queries somewhere).

There are some critical points to consider, however, as this effort moves forward.

The Web wasn't built by a company. Nor was the internet. Open systems get built by lots of entities, for-profit and nonprofit, and the public-ness of them is the very source of their success. They get more valuable as a function of being open. And making them half-open doesn't work - or scale. Copyrights and patents - even in "non-commercial" formats - would have choked the web and the internet in their cribs.

The CWA says it wants to build a web. Well, we've got one already (teh Web!) and we know how it evolved. It was unpatented. It was technically open from the very start. And it aggressively enabled widespread copying via the view source technology. The CWA needs to follow the road that Tim Berners-Lee set for the WWW.

So, I have a four-point definition of "open" I want to propose for the CWA. I'm working from open access and open source software definitions as a guide here: they ban non-commercial restrictions. The W3C process gives us another guide: none of the parties driving the standards can use their patents against people practicing the standards. And we need to use standards in terms of technology and persistent web names.

Thus, the definition I propose is an amalgam of what has worked before, legally and technically.

1. The foundation of the CWA is going to be a database of semantic concepts - "triples". If the CWA is going to be calling itself open, that database has to be in the public domain. Period. No non-commercial licensing allowed on that database - we have to be able to endlessly remix and recombine. This can be done through either CC0 or through the Public Domain Dedication and License.

Unlike the Web and the internet, we don't have the option of ignoring copyright law here. We have to decide, right now, if we want to build a concept web on a very stretched interpretation of copyright. The idea that a semantically encoded triple derived from someone else's content (data, literature, you name it) is a "creative work" deserving of life + 70 years of lockdown monopoly rights doesn't seem to fit the reality of the triples. "A causes B" isn't Shakespeare. Calling it creative work in order to get a copyright is a siren song that appeals to publishers, but it's a dead end if we want to make a web out of trillions of triples. Build the Web, and add value through trusted services and trademarks and brand loyalty. That only scales through the public domain approach.

2. No one involved can use a patent against the Web or the Web's users - to do so is to be a fox in the hen-house. This means that anyone involved will need to issue patent licenses to the public to make the public domain infrastructure accessible to go along with the public domain content. This can be accomplished by having anyone involved in the org sign onto the W3C patent process. This is absolutely essential to protect the ability of the web to scale.

3. Use RDF/OWL. There's an emerging public domain of content around RDF/OWL in the life sciences and we need to interoperate with it.

4. Use common persistent web names. The Shared Names Project should be the guideline for the CWA. Private web addresses will be a tempting way to monetize the concept web, but again, the paradox is that you have to give everything away if you want to make the web scale enough that you can make money on services. At every point, open has to be embraced as a strategy in the early days of web-building.
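
To make points 3 and 4 a little more concrete, here is a minimal sketch of why shared names matter. It is not the CWA's or the Shared Names Project's actual data model: the example.org addresses, the concept names, and the facts_about helper are all hypothetical, and plain Python tuples stand in for RDF triples. Two sources that use the same public name for concept A can be merged with a simple union, while a third source that states the same kind of fact under a private address stays invisible to the query.

```python
# A triple is (subject, predicate, object), each identified by a web name.
# All addresses below are placeholders invented for this illustration.
Triple = tuple[str, str, str]

SHARED = "http://example.org/shared/"            # hypothetical common namespace
PRIVATE = "http://private.example.com/id/"       # hypothetical private namespace

# Two independent sources that both use the shared name for concept A.
source_one: set[Triple] = {
    (SHARED + "concept/A", SHARED + "rel/causes", SHARED + "concept/B"),
}
source_two: set[Triple] = {
    (SHARED + "concept/A", SHARED + "rel/inhibits", SHARED + "concept/C"),
}

# A third source that talks about the same concept under a private name.
source_three: set[Triple] = {
    (PRIVATE + "9912", SHARED + "rel/causes", SHARED + "concept/D"),
}


def facts_about(subject: str, *sources: set[Triple]) -> set[Triple]:
    """Merge the sources by set union and return triples with this subject."""
    merged: set[Triple] = set().union(*sources)
    return {triple for triple in merged if triple[0] == subject}


found = facts_about(SHARED + "concept/A", source_one, source_two, source_three)
for triple in sorted(found):
    print(triple)
```

The query finds the two triples from the first two sources and misses the third, even though a human would know it is about the same concept. Nothing in the merge step needed permission or a mapping table, which is the whole point of public domain content under common names.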

I think the CWA has the opportunity to be a tremendous public good. It could be great. But the decisions about how to run the CWA and what to distribute are going to be essential. The devil is always in the details, and the temptation to be half-open is always going to be strong. We have to resist it.

Remember Moglen's metaphorical corollaries to Faraday's and Ohm's Laws (see the end of part I for the metaphor). We need to ask of any network, what is the resistance in the wire when it comes to sharing and collaboration, and we can measure that resistance by the field strength of the intellectual property regime applied to the network. My first two points address this. My last two points recognize that divergent use of technology and names is another part of the resistance in the wire when we're talking about concepts.

I invite comments from all parties on this. I moderate comments here for spam purposes, but I'll publish all non-spam comments unless they get abusive...
