Promoting a comment: "Open and shared format"

By dsalo on July 8, 2010.

Richard Wallis has taken my ribbing in good part, which I appreciate; his response is here and will reward your perusal.

He also left a comment here, part of which I will make bold to reproduce:

As to RDF underpinning the Linked Data Web - it is only as necessary as HTML was to the growth of the Web itself. Documents were being posted on the Internet in all sorts of formats well before Tim Berners-Lee introduced us to the open and shared HTML format which facilitated the exponential growth of the Web. Some of the above comments are very reminiscent of the "why do I need to use HTML" discussions from the mid 1990's.

It is an open and shared format, such as RDF, that will power the exponential growth of the Linked Data web, but the conversations around it are still at the equivalent of 1995 stage.

If I read this right, Richard is not actually saying that the web is all HTML and therefore HTML is Good and All Web Things Must Be HTML. That's good, because that would be a silly thing to say. The web I use has plenty of CSS and JavaScript and XML and JSON and JPEGs and PNGs and Flash (gah) and PDF (double gah) and other stuff on it.

What Richard is saying (again, as I read it) is more subtle: widespread growth of the data web requires an open standard to cut through the Babel of competing and closed formats the same way that HTML cut through the Babel of document formats, because without such a standard, interoperability is too much effort and so no one realizes the benefits.

Richard is welcome to check my understanding; I may have this completely wrong. Nonetheless, I don't believe a word of it, and I especially don't believe it if RDF is the HTML analogue (which, let's be clear, Richard very carefully did not say). Here's why I don't.

First, HTML was hardly the only part of the web stack necessary to its explosion. TCP/IP, anyone? Moreover, HTML by itself is obviously insufficient as the driver of that explosion, or we'd all still be on Gopher (remember Gopher?). Formatted strings of words are not all we monkeys interact with. Neither are assertions, about documents or anything else. (The whole thing about "not all data are assertions" seems to escape some of the die-hardiest RDF devotees. I keep telling them to express Hamlet in RDF and then we can talk.)

Second, I don't know that we need to rely on a single data format for interoperability. It's not impossible, but remains to be proven. The data web that I personally think is more likely closely resembles today's mashup and microformats cultures: lots of formats with suitable documentation (one hopes) and APIs, available for use by whoever's willing to suss out how the various datasets work and write code to glue them together. It's a rough-and-ready sort of interoperability, arguably an inefficient one, but eppur si muove, as Galileo did not say of the web.
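For the code-minded, here is roughly what that glue tends to look like: a minimal sketch in Python, assuming two made-up datasets (one JSON, one CSV) that happen to share an identifier. The dataset contents, the field names, and the `glue` function are all hypothetical, invented for illustration; no real API is being described.

```python
import csv
import io
import json

# Hypothetical dataset A: a JSON response listing how many copies are held.
HOLDINGS_JSON = '[{"id": "b1", "copies": 3}, {"id": "b2", "copies": 1}]'

# Hypothetical dataset B: a CSV export mapping the same ids to titles.
TITLES_CSV = "id,title\nb1,The Odyssey\nb2,Hamlet\n"


def glue(holdings_json, titles_csv):
    """Join two differently formatted datasets on their shared 'id' field."""
    # Build a lookup table from the CSV side.
    titles = {
        row["id"]: row["title"]
        for row in csv.DictReader(io.StringIO(titles_csv))
    }
    merged = []
    for item in json.loads(holdings_json):
        merged.append(
            {
                "id": item["id"],
                # .get() leaves a gap (None) instead of failing when the
                # other dataset has no matching record.
                "title": titles.get(item["id"]),
                "copies": item["copies"],
            }
        )
    return merged


merged = glue(HOLDINGS_JSON, TITLES_CSV)
```

Nothing here requires the two sources to agree on a format, only on an identifier and on enough documentation to know which field is which.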

Third, I'm not entirely convinced we need to rely on interoperability and its network effects as our incentive toward data-sharing. Tim BL certainly did; there wasn't much technical precedent for what he was up to. But we have the web already, a cogent argument if ever there was one. We also have governments, grant agencies, and businesses wanting to multiply return on investment in data. RDF seems downright small-potatoes by comparison, as incentives go.

Finally, the HTML:RDF analogy falls down in one area that I think is utterly crucial: ease of adoption. I can teach enough HTML (and CSS) to be going on with in a couple of hours; I've done it. I still touch RDF only with great fear and loathing and a constant sensation that I must be doing it wrong, and I'll teach it only when I absolutely must and with a great many "I don't pretend to understand this" disclaimers. You can't frighten me with XML namespaces, XPath, XSLT, or regexes, but RDF scares me stiff. This is not an open standard that's going to rule the world. Not today, not tomorrow, and in my opinion not ever.

There's another danger lurking in the one-format-to-rule-them-all argument, a danger I hinted at above: what happens to data that for whatever reason aren't expressible in the format of choice? Second-class citizens? Invisible? I hope not.

Anyway, I say again: if the data web depends on RDF, the data web is a pipe dream and we should look for something else to do. I'd much rather believe the "if" clause counterfactual.

Strong stuff. And I agree completely. I especially like: "The data web that I personally think is more likely closely resembles today's mashup and microformats cultures: lots of formats with suitable documentation (one hopes) and APIs, available for use by whoever's willing to suss out how the various datasets work and write code to glue them together."

I *do* think RDF has a place at the table, but not as the one-format-to-rule-them-all. It may also be the case that bits of RDF will sneak in through the back door (look at Facebook's drop-dead easy Open Graph Protocol work -- it's just HTML!). Reuse-friendly vs. not reuse-friendly is a continuum, not an all (RDF) or none (not-RDF) proposition.

My real concern is that we are operating on two separate tracks -- the Linked Data side forges ahead w/ a specific vision, and the rest of the world blissfully ignores it. I would love to see the Linked Data movement take a more realistic approach and accept the fact that the viewpoint you express here is *widely* held (if not in specifics, in basic conclusions). Of all the work happening right now, efforts like the Open Graph Protocol are the most exciting. I also happen to think that JSON offers an absolutely superb format for data that is drop-dead simple to create, share and reuse (e.g., check out a Picasa Web album in Google's JSONC format in a nice JSON viewer: http://bit.ly/brZdSU -- as-simple-as-it-gets "linked data").

"It's a rough-and-ready sort of interoperability, arguably an inefficient one, but eppur si muove, as Galileo did not say of the web."

Though I like to think he would have if he'd been around. It's worth recalling that the web itself was not infrequently seen in its early days as taking a too rough-and-ready approach to interoperability. There were other networked hypertext systems out there, after all, ones that in many ways were quite graceful from a purely formal standpoint.

The web? No way to follow links *back* as well as forward! Why, there's no guarantee the links will even *work*! And don't get started on HTML-- it may *claim* to be SGML-compliant, but most web pages just string tags together, and don't pass formal validation at all!

Yet somehow the Web managed to take off stratospherically where the other hypertext systems didn't. It was enough to have something that basically worked for the common use cases, and had a modicum of structure on which additional services could be built. (You can use Google, referrer logs, or Technorati to find out who's linking to you, for instance; that capability didn't have to be baked into the Web architecture itself.)

Similarly, I think that if linked data's going to really take off, people will have to accept, and find better ways to cope with, the inevitable messiness that occurs when people put data online. Yes, that means that sometimes people will incorrectly refer to objects instead of documents, or vice versa, or (to take a library example) works instead of expressions, or any of the many pet peeves one sees recurring in mailing list discussions. You either deal with that, or you resign yourself to engaging with a niche instead of the world.

Hmmm. Not sure if I agree with the analysis. The web was 3 things: HTTP for transfer, URLs for links, and HTML for rich documents that could contain links and be transferred easily over HTTP. The 3 parts worked together. Gopher was pretty much just a simple protocol; gopher documents were pretty much endpoints (heck, many of them were .doc files, pre Word-for-Windows, about as flat as you can get) whereas HTML documents are fundamentally rich and linked.

I write this as someone who was pretty confident, back in 1993 or thereabouts, that the web would fail compared to gopher. The reason was, the web needed all those existing documents to be re-coded into HTML, whereas gopher just let you serve them up. So I clearly got that wrong; people (slowly at first, but with ever-gathering pace) saw enough of the advantages to do that recoding, and by late 1994 I was giving courses to librarians on HTML.

But what does that say wrt RDF and Semantic Web or Linked Data? Almost nothing, except that RDF (and a superstructure of vocabularies) does seem to be a simple, reductionist way of expressing enough kinds of data constructs and connections, that enough people see value in, that it is gathering pace. Who would have believed two years ago the quantities of Linked Data we have available now?

I don't think most researchers should have to think in RDF terms, any more than most researchers have to think in HTML terms. More of the former than the latter, perhaps, as there is ten years' less maturity in Linked Data, so the tools are... well, crap. But if your data are in structured form right now (eg in a database), then making them available as RDF is something your favourite geek can probably do in much less than a weekend.
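To make the mechanical part of that claim concrete, here is a minimal sketch in Python of a naive row-to-triple dump, assuming an SQLite table and a placeholder namespace. The `http://example.org/` base URI, the `book` table, and the function names are hypothetical, and every value is emitted as a plain string literal in N-Triples syntax; it says nothing about which vocabulary the predicates ought to come from.

```python
import sqlite3

# Hypothetical namespace used to mint URIs; not a real vocabulary.
BASE = "http://example.org/"


def escape_literal(value):
    """Escape a value for use inside an N-Triples string literal."""
    return (
        str(value)
        .replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def rows_to_ntriples(conn, table, key):
    """Emit one triple per non-key, non-NULL column of every row.

    `table` is interpolated into the SQL, so it must be a trusted name.
    """
    cur = conn.execute(f"SELECT * FROM {table}")
    columns = [d[0] for d in cur.description]
    lines = []
    for row in cur:
        record = dict(zip(columns, row))
        # The row's key column becomes the subject URI.
        subject = f"<{BASE}{table}/{record[key]}>"
        for col, val in record.items():
            if col == key or val is None:
                continue
            # Each remaining column name becomes a predicate URI.
            predicate = f"<{BASE}{table}#{col}>"
            lines.append(f'{subject} {predicate} "{escape_literal(val)}" .')
    return lines


# Toy in-memory database standing in for "your data in a database".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT, author TEXT)")
conn.execute("INSERT INTO book VALUES (1, 'Hamlet', 'William Shakespeare')")
triples = rows_to_ntriples(conn, "book", "id")
```

This covers only the serialization step. Deciding which existing vocabularies the columns should map to, and what each row is actually a description of, is separate work.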

It does seem to have something of the momentum of the late 1990s web!

Not sure about that last bit, Chris. An awful lot of the get-it-into-RDF efforts I've seen end up spiraling down the ontology rathole. It can be and usually is unbelievably non-obvious how best to represent something in RDF.

I think the web, even the early web, was more than those three things. ;)

"First, HTML was hardly the only part of the web stack necessary to its explosion. TCP/IP, anyone? "

A nitpick, but HTTP would be a better example. TCP/IP is to "the internet" as HTTP is to "the web". The twin standards of HTTP (transport) and HTML (content) are what created the web, and were indeed successful at doing it.

Richard's analogy of HTML->web as RDF->linked-web is a good one for his argument; it is somewhat thought-provokingly persuasive. But I still tend to fall on your side of things.

I guess I lack faith that RDF _will_ catch on in the ways semantic web enthusiasts hope (predict?). The analogy with HTML can be returned too -- what factors led to the actual success of HTML? Its technical superiority for solving certain problems is probably NOT it. (If it even has such superiority; it's really a pretty inelegant, hacky standard from some perspectives.) The ultimate success of HTML/HTTP probably had more to do with the incredible simplicity of creating simple web pages that _worked_ (even if they were not actually 'legal', as many of them were not!), without having to know what you were doing.

What does "working" mean in that 'the web' context? Providing content that can be easily accessed by other clueless users, and easily linked to by other web authors. Creating, as this caught on, the, well, "web" of content that we know and love/hate.

What does "working" mean for "linked data"? (This question does not necessarily have a simple or universally agreed upon answer, and it's difficult to all be talking about the same thing until we know what each other means by this, which we don't really.)

How likely is RDF to catch on in order to achieve that? How possible is it for non-RDF to achieve that kind of "working"? How difficult is it for the individual to use RDF to achieve that "working"? If that "working" _requires_ some fairly challenging work to achieve... what does that say for the likelihood of the "working" goal occurring? Is there a way to approach "working" with less challenging means (than RDF?), means where simple things are incredibly simple and complexity of implementation rises proportionally to complexity of goals?

I do wonder if there is any "there" there, when it comes to RDF. I see it shoe-horned into some very ungrounded situations and I worry that it is one of those geek-fashioned abstractions that fills a much-needed gap.

Actually, most of my experience with RDF is confirmation of how little we understand what language is for us and our willingness to project meaning where it doesn't live.

I would not be surprised if RDF has power as a data structure in very well-circumscribed domains where the using community can maintain a coherent conception of the application. But as a hammer looking for nails, I find RDF worrisome. And, aside from the misappropriation of "ontology," I wonder if we will ever figure out what the "semantic" bit is (although it is, I suppose, an ontological commitment of sorts to grant being to whatever semantics is with regard to RDF).

Hey, I've been too long without a Dorothea fix, and I am overjoyed (whatever that means).
