The PDF Plague

There have been a half-dozen stories in the past few weeks that looked interesting, but didn't even make it into the Links Dump for the day. Why not? Because the stories or studies were only available as PDF files.

I have no idea if this is actually getting worse, but I'm finding this more irritating than ever. It's particularly annoying as there's usually no good reason for presenting the information in question in PDF form-- you could perfectly well present it as an easily linkable and quotable HTML page. Take, for example, this NEA report on the arts-- the one-paragraph note on Inside Higher Ed is more useful than the official page for the survey itself. Or there's the charter school study that Kevin Drum talked about-- again, there's no way to get even a short summary of the results without downloading a PDF file.

Just to piss Mike Kozlowski off, I'm going to blame this on Microsoft, for making Internet Explorer render HTML differently than every other browser on the planet. As a result, the only way to know that the formatting of a given document will show up correctly is to make it a PDF, because Microsoft hasn't yet found a way to screw those up.

Whoever's really responsible, though, this needs to stop. If you want your work to be talked about on the Internet, stop putting barriers in the way of people who would like to talk about it on the Internet. HTML, not PDF. And for the love of God, stop it with the Flash presentations, already.

...the only way to know that the formatting of a given document will show up correctly is to make it a PDF, because Microsoft hasn't yet found a way to screw those up.

Maybe they have, but if they ever implemented it in a released product, Adobe would clean their clock in court. PDF, like the various Microsoft Office formats, is a proprietary format, but Adobe, not Microsoft, owns it. Adobe has published the standard, so anybody can write a PDF reader, but it has to conform to Adobe's standards or else.

That does not mean PDF should be used for everything. If you want people to download the document and read it offline (because it's a full-fledged paper or a lengthy report), then by all means use PDF. But the executive summary (or paper abstract) works just as well in HTML, and that's what people who work on Internet time are going to quote.

And for the love of God, stop it with the Flash presentations, already.

Preach it, brother Chad! In all my years of surfing I've only ever seen one Flash presentation that was appropriate, and that was because it was specifically intended to be a parody of Flash presentations. It appeared, appropriately enough, on a now-defunct site called Skip Intro.

By Eric Lund (not verified) on 17 Jun 2009

You are so right!

There should be a kind of Occam's Razor approach to presenting material on the web: stick with the simplest, fastest format possible unless there is a compelling reason to do otherwise.

People who link to PDFs and Word docs without telling you that's what you are going to get are worst of all. If you must inflict the damn things on your users, at least warn them beforehand.

By Michael Finn (not verified) on 17 Jun 2009

OK, so are you going to make all your scientific papers available in HTML then? It's as simple as downloading tex4ht and typing "htlatex mypaper.tex" then uploading to a webserver.

As a result, the only way to know that the formatting of a given document will show up correctly is to make it a PDF, because Microsoft hasn't yet found a way to screw those up.

Or, you know, not be incredibly lazy and check for browser type and make a few piddling changes that you test in your preferred browser.

for making Internet Explorer render HTML differently than every other browser on the planet.

And this is simply not true. Thanks to mindsets like this, Larry Ellison can continue building houses out of money even though his products are crap.

Lynx:

A: If people followed the spec, this kind of bullshit would be unnecessary, and Chad's a physics professor, not an IT guy.

2) Put the crack pipe down. WTF does Oracle have to do with IE's inability to follow a free and open standard?

Jamie: Thanks for using my favorite gag numbering system.

A: For the most part, a page coded for IE will work in the other major browsers without needing the check. If you're doing something unusual, you need the check. IE follows the W3C standard 99% of the time.

2) The "Microsoft Sucks" mindset is what I'm talking about. The only reason Oracle is in business is because there are people that hate Microsoft. It is also part of the reason Apple gets away with charging what it does for its machines. It is why other inferior products sell without having to improve.

"As a result, the only way to know that the formatting of a given document will show up correctly is to make it a PDF, because Microsoft hasn't yet found a way to screw those up."

Well... not quite. If you ask MS Word to save a document as a PDF, it may not save correctly; my labmate lost a draft of her dissertation that way.

I'm going to blame this on Microsoft, for making Internet Explorer render HTML differently than every other browser on the planet.

Add to that the fact that MS Word makes horrible HTML, so a document composed in it is trapped.

Actually ... PDF is an open standard, no longer proprietary to Adobe. They released it to the public in the last year.

So now it's easier than ever for anyone who's on the ball to use ... so get with the program, people.

As for MS and bad HTML rendering ... a superficial problem at most. I have not seen a website that was functionally affected by this in years.

This article is a pointless ignorant rant. Nothing to see here people ... carry on to the next blog.

In what meaningful sense is PDF a "barrier" unless you're stuck in the land of monochrome, 80 character wide dumb terminals?

I see a PDF as a link, I click on the link and-- like magic-- I see a properly formatted PDF open up on my screen.

By John Novak (not verified) on 17 Jun 2009

Man, the internetz are cranky today.

PDFs abound because they're a low-effort means to repurpose print content for web consumption. If your workflow isn't web-focused, it takes a lot of effort to get that document looking (and working) right as HTML. That work only starts with design and layout; pagination and link structure are going to require your attention too.

HTML would be the better way to go, but I'd rather have the PDF than nothing, which is usually the more realistic alternative. And PDF is arguably a more useful format for long-form works, especially with a high proportion of graphics. That said, a group like NEA can and should do better.

And Ken, wtf? IE is a real issue, though more for web apps than for layout. But that's hardly the main point of the post. Even if PDF sprang fully GPL'd from Richard Stallman's forehead, it's still a pain to download a multi-meg file and deal with a plugin reader. That's even more of a non sequitur than the Ellison reference.

As someone that actually has to make web pages for part of his living, let me tell you that IE6 is still a rank bastard to account for, and still very common. IE7 is a lot better, but still has some funny ideas on a few small things. IE8 still doesn't support SVG. Just because you don't see a lot of websites looking funny in IE, don't assume it's because somebody fixed IE, or that it was terribly easy for the people that made the site.

Anyway. Even for offline reading, something bookmarkable is a whole lot better, IMHO, unless your document is only a couple of pages. I don't know of any good formats like that currently, so I'd just go with a zipped folder of HTML, but that's just me.

I'd also like to point out that MS Office, OpenOffice, and Google Docs all allow you to publish your content in HTML format. I have no idea, offhand, how well any of them do it, but it is an option.

I see a PDF as a link, I click on the link and-- like magic-- I see a properly formatted PDF open up on my screen.

You're doing better than I am. If I accidentally click on a PDF link, it's about a 50/50 chance that Adobe will hang the whole system up. When it doesn't immediately take everything down, I usually have to either fend off a bunch of update requests, or cave in, and waste a bunch of time watching it update, and then re-start the browser.

If I catch it in time, I can right-click and save the file, and then read it with Foxit, which doesn't put me through any of the bullshit that Adobe does. But that's annoying.

For a full report, I consider a PDF fine, on Win2K, WinXP, or MacOSX. However, I'd agree that the abstract ought to be available as an HTML page, so I can get an idea whether to waste my bandwidth on a bloated format.

I used to feel much the same about Flash as you do, and still feel that way about websites that are done completely in Flash, but Flash is useful for lots of things...

One thing that it is useful for is displaying PDF files, if you put them on Scribd. More and more websites I've seen are giving links to Scribd or similar sites when they want to link to PDF files; this displays them in a Flash viewer that fits a little better into most web browsers than Adobe Reader does.

As someone who works for a government agency that makes all its work available as PDFs, I can explain some of it, although I'm not sure the reasons are good ones.

1) The people in charge like paper. They do not like to read things off screens. They want something that can be printed, stapled and held, with page breaks that make sense for reading on paper. If I want them to review something, I need to print it out and stick it in a folder.

2) They want to control the product. The people in charge view documents issued as locked PDFs as less tamper-prone than other formats. They have a real fear of our brand name being damaged by a fake look-alike report. I can't explain it; it just is.

3) Run of the mill government agencies are slow adopters of technology. I just got a laptop with wireless capabilities this year. We are two or three versions of Office behind the rest of the world.

And then there are the poor slobs who enjoy the country life and suffer through on dial-up. PDF and Flash send me off the site pretty quick.

By Canadian Curmudgeon (not verified) on 17 Jun 2009

You're doing better than I am. If I accidentally click on a PDF link, it's about a 50/50 chance that Adobe will hang the whole system up. When it doesn't immediately take everything down, I usually have to either fend off a bunch of update requests, or cave in, and waste a bunch of time watching it update, and then re-start the browser.

Well then, something is horribly wrong with your machine or your setup, not the entire PDF regime. I've never had these issues. Hell, I was ecstatic when I learned that Adobe now even correctly handles PostScript, enabling me to bypass the horrible Ghostscript thing I'd been using.

By John Novak (not verified) on 17 Jun 2009

When [Adobe] doesn't immediately take everything down, I usually have to either fend off a bunch of update requests, or cave in, and waste a bunch of time watching it update, and then re-start the browser.

Yes, the Adobe Acrobat reader is unbearably slow, and Adobe has the bad habit of checking for updates when I start one of their programs for the first time in a while. (Earth to Adobe: When I start one of your programs, it's because I want to use it, and I want to use it NOW, not 20 minutes from now after you're done updating. Also, there is absolutely no excuse for forcing me to type the administrator password more than once during the update process.) I can't offer solutions for Windows, but I find the Preview plug-in that ships with Safari (Macintosh) far superior in performance to Adobe's PDF reader. Unfortunately, whenever Adobe finds updates to install, it tries to force its PDF reader on me--I have to remember, after the install is complete, to find the Plug-Ins folder for Safari and manually get rid of Adobe's PDF reader so I can use Preview again. It sounds like you go through a similar process on Windows with whatever browser you use.

@Kate: Try to postpone the day you have to upgrade to Office 2007 (2008, if you are of the Mac persuasion) as long as possible. I have heard many horror stories about Office 2007/8, and if your product includes scientific papers remember that most journals do not accept the DOCX format that was introduced with Word 2007.

By Eric Lund (not verified) on 17 Jun 2009

In what meaningful sense is PDF a "barrier" unless you're stuck in the land of monochrome, 80 character wide dumb terminals?

Here's the thing, the person offering the information is restricting the means that the receiver can take the information. It is limiting the audience, and self-defeating. A better solution is to provide the information that allows the receiver to choose how to take it.

If I want something printable, PDF good. If I just want to glance at something to determine whether I want to read it in detail, a 275 MB PDF file is not good.

What in the world is giving PDFs that big? My 250 page dissertation is less than 8 MB :p

Anyhow, you can set Firefox to use Foxit by default (which is what I do).

Go to Tools -> Options -> Applications.
Scroll down to the "PDF document" entry. To the right, where it says Adobe Acrobat, you should be able to click and get a drop-down with Foxit listed.