04 April 2006
Hansard: HTML versus PDF
HTML is much the better choice when the file is destined to be read from a monitor screen. With HTML, the viewer can choose a font size and a window width to match his/her preference, so the text is easy to read.
When the file is destined for printing on paper, and the precise printed page layout is important, pdf is better — pdf is designed for printing, not browsing.
With pdf the user has no choice of line length for on-screen reading. At a nice font size the page often doesn't fit on the screen. Reading two-column output requires either a tiny font or lots of sideways scrolling. Page breaks get in the way, often forcing the viewer to scroll up and down several times to get the last few words on the previous page and the first few words on the following page; HTML documents never deliver that kind of needless hassle (always experienced at the receiving end and never at the sending end).
In pdf, hyperlinks work badly, and often not at all.
Searching a document for a word or phrase is much easier in HTML than in pdf.
The arguments in favour of pdf are advanced by people at the sending end, website designers, and the arguments against pdf are advanced by people at the receiving end, viewers.
The arguments in favour of HTML are advanced by people at the receiving end, viewers, and the arguments against HTML are advanced by people at the sending end, website designers.
Hansard documents are written reports of spoken words, thus are purely text with no graphics, no maps, no tables — straight text only.
Good, standards-compliant HTML is always better than pdf for use on the web, for delivering straight-text documents.
Hansard reports of Legislature debates should always be presented online in HTML. A second-choice version in pdf may be made available online, if necessary to placate sending-end control freaks.
Where Hansard is available in both forms, indexed side-by-side for equal accessibility, it would be very interesting to see the server logs to compare how often one format or the other is chosen by the citizens.
It is noteworthy that these server logs are controlled, not by citizens (who favour HTML) but by webmasters (who favour pdf). I've never seen Hansard download statistics made public in any situation where both formats are equally available; this drought of statistical information is compatible with the theory that those who oppose HTML are the same people who control access to the download statistics and would be reluctant to release statistics showing that citizens strongly prefer HTML.
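As a rough illustration of how such a comparison could be drawn from server logs, here is a minimal sketch. It assumes the server writes Common Log Format and that the HTML and pdf versions differ only by file extension. The log lines, paths, and function name are invented for the example and are not taken from any real Hansard site.

```python
import re
from collections import Counter

# Invented sample lines in Common Log Format; addresses and paths are hypothetical.
SAMPLE_LOG = """\
192.0.2.1 - - [04/Apr/2006:10:00:00 -0300] "GET /hansard/2006-04-03.html HTTP/1.1" 200 51234
192.0.2.2 - - [04/Apr/2006:10:01:00 -0300] "GET /hansard/2006-04-03.pdf HTTP/1.1" 200 204800
192.0.2.3 - - [04/Apr/2006:10:02:00 -0300] "GET /hansard/2006-04-03.html HTTP/1.1" 200 51234
192.0.2.4 - - [04/Apr/2006:10:03:00 -0300] "GET /images/logo.gif HTTP/1.1" 200 1200
192.0.2.5 - - [04/Apr/2006:10:04:00 -0300] "GET /hansard/2006-04-03.pdf HTTP/1.1" 404 0
"""

# Pull the requested path and the response status out of each log line.
REQUEST = re.compile(r'"GET (?P<path>\S+) HTTP/[\d.]+" (?P<status>\d{3})')


def count_formats(log_text):
    """Count successful (status 200) requests for HTML and pdf documents."""
    counts = Counter()
    for line in log_text.splitlines():
        match = REQUEST.search(line)
        if not match or match.group("status") != "200":
            continue  # skip unparseable lines and failed requests
        path = match.group("path").split("?", 1)[0].lower()
        if path.endswith((".html", ".htm")):
            counts["html"] += 1
        elif path.endswith(".pdf"):
            counts["pdf"] += 1
    return counts


format_counts = count_formats(SAMPLE_LOG)
print(format_counts)
```

A real comparison would also need to filter out search-engine crawlers and repeated partial downloads of the same pdf, both of which this sketch ignores.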
A thought: why are blogs never in pdf?
This blog, and all the blogs I've ever seen, is presented in HTML, giving the viewer complete control of the font size. The viewer also can control the line length by adjusting his/her window width. Bloggers are hungry for an audience — no sane (or otherwise) blogger will drive his/her audience away by forcing them to endure the hassles of pdf. Blogs are never even offered with an alternative choice of pdf. Blogs are destined from the beginning to be read on-screen, and for a good reason are never delivered in pdf. The same reason applies with equal force to online Hansards, but Hansard decision-makers are usually people who have spent most of their working days in a printing environment, and who are uncomfortable with the freedom that HTML gives to the consumer.
Hansard available in HTML?
Yes Ottawa House of Commons
Yes Ottawa Senate
Yes Yukon Legislature
Yes British Columbia Legislature
Yes Alberta Legislature
Yes Manitoba Legislature
Yes Ontario Legislature
Yes Quebec Legislature
Yes Nova Scotia Legislature
Yes Newfoundland and Labrador Legislature
No Saskatchewan Legislature
No Prince Edward Island Legislature
No Northwest Territories Legislature
No Nunavut Legislature
For any particular session, New Brunswick's Hansard is partly HTML but mostly pdf only. The two formats are mixed together in no discernable system. I'm unable to find an explanation of which New Brunswick Hansard records are presented in HTML and which in pdf.
It's much better to encode the data as XML and then decide on the output dynamically with an XSL transformation. The data can then be displayed, saved, or streamed based on document type.
It's always better to separate data from meta-data, and talking about strict output format based on data is wrong.
XSL - http://www.w3.org/Style/XSL/
HTML - http://www.w3.org/Style/XSL/WhatIsXSL.html
PDF modules: http://www.xmlpdf.com/
I agree that "talking about strict output format based on data is wrong". I'd prefer to talk about what the important characteristics are. The characteristics associated with HTML and not with PDF, such as giving control of text size and line length to the viewer, are the important points. Whatever means are available that might do the job even better are fine with me. Thank you for your comment.
In fact, HTML is pretty much dead; most modern webpages (including yours) use XHTML, which is an XML document type based on W3C (World Wide Web Consortium) standards.
An XSLT stylesheet (XSLT is XSL Transformations; XSL is the Extensible Stylesheet Language) is just another text file that defines how those tags are output (screen, file, printer).
So from a simple XML document that contains data and minimal markup you can create HTML, PDF, or PostScript.
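A minimal sketch of that single-source idea, assuming an invented XML layout for one sitting day. The element and attribute names, the sample speeches, and the function names are all hypothetical. It uses Python's standard-library XML parser and plain string formatting rather than an actual XSLT stylesheet, so it illustrates the principle and not the XSLT mechanism itself.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical markup: the element and attribute names are invented for illustration.
HANSARD_XML = """\
<sitting date="2006-04-04">
  <speech speaker="The Speaker">Order, please.</speech>
  <speech speaker="Hon. Member">I rise on a point of order &amp; privilege.</speech>
</sitting>
"""


def to_html(root):
    """Render the sitting as HTML paragraphs that the browser is free to reflow."""
    parts = [f"<h1>Hansard, {escape(root.get('date'))}</h1>"]
    for speech in root.iter("speech"):
        parts.append(
            f"<p><b>{escape(speech.get('speaker'))}:</b> {escape(speech.text)}</p>"
        )
    return "\n".join(parts)


def to_text(root):
    """Render the same data as plain text, one speech per line."""
    lines = [f"Hansard, {root.get('date')}"]
    for speech in root.iter("speech"):
        lines.append(f"{speech.get('speaker')}: {speech.text}")
    return "\n".join(lines)


root = ET.fromstring(HANSARD_XML)
html_output = to_html(root)
text_output = to_text(root)
print(html_output)
print(text_output)
```

Both outputs come from the same parsed document, so adding another format means adding another rendering function and leaves the stored data untouched.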
Your worry about line-length and the like are not valid regarding PDF. PDF is a static data type, it generally isn't changed by the user. The _screen display_ of the PDF is different, that's client side.
I guess my point is that Internet supported technology is so far ahead of the vast majority of gov't institutions it invalidates your post about which is better.
The answer is neither; the answer is a non-specific format, one that has had the server-side capability to present the data in a ubiquitous way for three years or more.
IMO, that's what you get when IT staff are unionized and restricted.
Document (meaning data) management is a well-known issue in IT, but less so in fields that deal specifically with information without necessarily being IT.
You wrote: "I guess my point is that Internet supported technology is so far ahead of the vast majority of gov't institutions it invalidates your post about which is better. The answer is neither..."
Here, our signals seem to be getting crossed, or maybe I need to state my position more clearly. I know that governments are a long way behind in presenting information on the Internet. The problem is that the governments don't know it, or, if they do, they're keeping it very secret. No doubt there are people deep inside government who know all about this, but they have not yet made it into the ranks of the decision-makers. The decision-makers are still spending their time thinking about how to improve the design of the buggy whips.
What I'm trying to do, with this blog, is to open a conversation about the way governments are now presenting information on the Internet, and what should be improved. This is a job that has to be taken in small steps, and it has to begin with where we are now.
My chosen first small step is to talk about the online Hansards. That's a tiny, relatively simple but important part of government's information stash. Straight text, no images, no graphics, no tables, no forms, and everything cut naturally into reasonably small chunks of data, one day (a few hundred kilobytes) at a time — you can't get much simpler than that.
We have fifteen Hansards to discuss, two federal, ten provincial and three territorial. These Hansard texts are now presented online in one of two formats: HTML and/or PDF. That's why I wrote "Hansard: HTML versus PDF." That's where we are now.
Your position is: "...which is better. The answer is neither..."
My position is: Of the two formats now in use, which is better?
By all means, let's keep our eye on the horizon, but before we get there we have to step forward from our current position. Of the two formats currently in use, one is clearly superior from the point of view of the citizen. Let's try to deep-six the inferior format, or at least to get everyone on board with the superior choice now widely but not universally used.
Is there a case for presenting Hansard online in pdf? If so, can someone describe it?
You wrote: "Your worry about line-length and the like are not valid regarding PDF. PDF is a static data type, it generally isn't changed by the user. The _screen display_ of the PDF is different, that's client side."
I think my concern about line-length and text size is valid, given the way Hansard is now presented online. Those in charge, some of them that is, have not yet grasped this point. From the point of view of the citizen, trying to find and use information in Hansard, text size and line length are often a problem now, thus are valid concerns. How do we, the citizens, get this across to those in charge?
Thank you for your comments.