soeren says

XHTML remains a mess

September 4th, 2007

I’ve been continuing to work on ‘soeren says 2’, the redesign of this site. While most of the initial testing was done in static HTML hand-crafted with TextMate’s help, I decided last night to set up a test WordPress installation and build my own theme to work off, and got surprisingly far. It’s nowhere near finished, but I’ve enjoyed tinkering with it.

Unfortunately, not everything is going quite the way I’d like it to. One of the greater stumbling blocks is the choice of HTML version, specifically with regard to XHTML. You might wonder at this point how larger websites solve this, but the answer is typically: they don’t. They pretend the problem isn’t there:

Wikipedia, for instance, uses XHTML 1.0 Transitional, but serves it as text/html, thus effectively making it identical to HTML 4.01, only harder to parse. Flickr, who you might be inclined to think is all modern and hip and such, uses HTML 4.01 Transitional, as does Apple’s website, despite the recent redesign. Google, well, let’s just say their markup is a mess of minefield-like proportions. Facebook, to their credit, use XHTML 1.0 Strict, but mostly render that moot because they, too, serve it as text/html.

The XHTML 1.0 standard makes an exception: using text/html is okay, but only sort of. After all, using that content type effectively turns the markup into SGML-HTML, not XML-based HTML. So the end result isn’t XHTML, nor does it gain any of XHTML’s advantages over HTML 4, nor are browsers allowed to pretend it’s XML; they all use the legacy HTML parser to handle something that looks like XHTML, but is not. Yes, you still get to use XML parsers on the server side, but the advantages are typically slim.
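
To make the server-side point a little more concrete, here is a minimal sketch of what "using an XML parser" buys you: a strict well-formedness check before the page ever goes out. It is written in Python purely for illustration (nothing on this site runs on it), and `is_well_formed` is a hypothetical helper name, not part of any tool mentioned here.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if the markup parses as well-formed XML.

    Assumption: the markup is a single root element with no DOCTYPE,
    so named entities such as &nbsp; are not available.
    """
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# XHTML-style markup is accepted by the XML parser ...
xhtml_ok = is_well_formed("<p>one<br />two</p>")
# ... while the HTML 4 spelling of the same content is rejected,
# because <br> is never closed.
html4_ok = is_well_formed("<p>one<br>two</p>")
```

A browser's legacy HTML parser would accept both strings without complaint, which is exactly why serving XHTML as text/html hides this kind of error from visitors.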

The main reason using XHTML’s actual MIME type, application/xhtml+xml, is out of the question is Internet Explorer: even in version 7, it does not support that type. For good reason, too: as the post explains, actual XHTML support wouldn’t be particularly good. Now, there is the workaround of sniffing the UA string and trying to guess, thusly, which MIME type to serve the content as. But that’s error-prone, and an ugly hack.
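
For illustration, here is roughly what such guessing looks like. Note that this sketch inspects the Accept request header rather than the UA string, which is a closely related variant of the same workaround and not quite what is described above; it is still guesswork about the client. The function name `choose_content_type` and the sample header values are made up for the example.

```python
def choose_content_type(accept_header: str) -> str:
    """Pick a MIME type for an XHTML document based on the Accept header.

    Returns application/xhtml+xml only when the client lists that type
    explicitly with a quality value above zero; otherwise falls back to
    text/html. Wildcards such as */* are deliberately ignored, since
    they say nothing about real XHTML support.
    """
    for item in accept_header.split(","):
        parts = [part.strip() for part in item.split(";")]
        if parts[0].lower() != "application/xhtml+xml":
            continue
        quality = 1.0  # default when no q parameter is given
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # treat a malformed q value as "not acceptable"
        if quality > 0:
            return "application/xhtml+xml"
    return "text/html"


# Hypothetical header values, for illustration only.
explicit = choose_content_type("application/xhtml+xml,text/html;q=0.9")
wildcard = choose_content_type("*/*")
```

A server doing this would also need to send `Vary: Accept` so that caches don't hand the XML version to a browser that can't display it, which is one more way the hack can go wrong.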

However you roll the XHTML problem, the current situation just isn’t any good, and pretending it doesn’t exist isn’t a good idea. (Go read this, too, particularly the section with “You have a couple of choices:”.) Rather, the answer (to me, anyway) has always been to favor traditional HTML.

So, all hunky-dory, right? Just use text/html, which is a tad easier to write anyway, works across a ton of browsers, and has virtually no effective disadvantages. As an extra bonus, WHAT-WG has been busily working on some modernizations in their Web Applications / HTML5 effort!

But, no. That approach, too, can bite you in the back. You see, the rather nonsensical marketing surrounding XHTML hasn’t just gone so far that Microsoft once accused Opera of not supporting XHTML (when it was in fact IE that didn’t; oops), but also that a lot of software has started implementing and sometimes even hardcoding a number of its specifics. Including WordPress, the CMS that powers this blog: it has numerous deeply-rooted hardcoded uses of <br />, <img something-something />, <link blah />, and so on.

My options include jumping to another platform (nanoc anyone?), fixing WordPress’s code (after every single upgrade, no less), writing or using someone’s extension that filters WordPress’s code and fixes it on the fly, or doing what just about everyone else does: using XHTML-that-isn’t-quite and pretending it isn’t a problem.
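
To give an idea of what the filtering option might involve, here is a deliberately naive sketch that rewrites self-closing empty elements into their HTML 4 spelling. It is Python rather than a real WordPress extension, the names `VOID_ELEMENTS` and `to_html4` are invented for the example, and the regular expression assumes attribute values never contain `<` or `>`; it also makes no attempt to skip comments or script blocks.

```python
import re

# HTML 4 elements that have no content and no end tag (partial list,
# covering the ones a blog template is likely to emit).
VOID_ELEMENTS = ("area", "base", "br", "col", "hr", "img",
                 "input", "link", "meta", "param")

# Matches e.g. <br />, <img src="a.png" alt="" />, <link href="s.css"/>
_SELF_CLOSING = re.compile(
    r"<(" + "|".join(VOID_ELEMENTS) + r")\b([^<>]*?)\s*/>",
    re.IGNORECASE,
)


def to_html4(markup: str) -> str:
    """Drop the trailing slash from self-closing empty elements."""
    return _SELF_CLOSING.sub(r"<\1\2>", markup)


sample = to_html4('<p>Hi<br /><img src="a.png" alt="" /></p>')
```

Anything that is not in the list, such as a stray `<div />`, is left untouched, so the filter cannot repair markup that was never valid HTML to begin with.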

Which, effectively, it kind of isn’t. But that doesn’t mean I have to like the status quo.

Posted in Software, This Blog, Web


Others' Thoughts

# zcorpan

The /> syntax is allowed in HTML5 for elements like link, br, img, etc. So you can use HTML without having to change the hardcoded /> in the templates.

# chucker

I can’t find a mention of that in the spec?

# chucker

Never mind; found it. Thanks!
