It’s just data

XHTMLating WordPress

Shelley Powers: Input from readers enters Wordpress in several different places in the code, most of which do not have hooks allowing us to override the code to provide our own. The only way Wordpress will be able to effectively do XHTML is through a commitment to make this a change in the underlying base code.  Since the Wordpress developers have not shown any interest in supporting XHTML, and since I haven’t seen a lot of interest in XHTML support in Wordpress from my own explorations and published posts, this is just not a challenge I’ve been eager to take on.

If you submit a ticket with a patch which adds a hook that you need, I can help champion it.  Furthermore, if you describe some observable behavior, I’ll automate a test case that I will continuously run to ensure that the feature continues to behave as desired.  I’ve effectively been developing what amounts to veto power over changes which break things.


To me, XHTML’s future really relies on Internet Explorer’s support of it, for the moment.  We should know something in about a month or so (when is MIX08?)

Posted by Jeff Schiller at

Jeff: my site degrades gracefully in Lynx, as does yours.

Posted by Sam Ruby at

“Jeff: my site degrades gracefully in Lynx, as does yours.”

Do we need decoder rings, or is this communication for everyone?

As for making Wordpress truly XHTML via patch and hooks, that would not be a wise course. It might be a hacker course, but not a wise course.

Posted by Shelley at

Y’all are embarrassing me that I’ve been too busy to devote myself to porting all my changes from MovableType 3.3x to MovableType OpenSource 4.1.

Jason Blevins made me realize that I don’t need to serve the Admin interface as application/xhtml+xml, which cuts the size of the patch file from 2200 lines to more like 600.

Architecturally, MovableType is far easier to make XHTML-ready than WordPress. All I need is a solid week or two...

Posted by Jacques Distler at

Do we need decoder rings, or is this communication for everyone?

My site degrades gracefully on browsers that are, shall we say, less than capable.  I don’t need to wait for Lynx to support images or IE to support XHTML for me to use both.

As for making Wordpress truly XHTML via patch and hooks, that would not be a wise course. It might be a hacker course, but not a wise course.

Now it is my turn to say that I’m not following.  Both you and Jeff have managed to produce consistently well formed XHTML starting with WordPress as a base.  Whether it was wise to start with WordPress for this purpose or not, you have managed to do it.

Since it has been done, and presuming that you two wish to continue on this course, and further assuming that you two wish to occasionally step up to the latest WordPress releases, then the question becomes how to best minimize the efforts required when you each step up to a new release.

Posted by Sam Ruby at

“My site degrades gracefully on browsers that are, shall we say, less than capable.  I don’t need to wait for Lynx to support images or IE to support XHTML for me to use both.”

Ah, OK. However, if MS does not support XHTML with this release, MS will never support XHTML. We will, at this point, have a problem. Especially those of us who want to see more widespread use of SVG. No offense, but Lynx won’t have an impact on standards, while IE could have a strong, and negative influence on standards.

“Since it has been done, and presuming that you two wish to continue on this course, and further assuming that you two wish to occasionally step up to the latest WordPress releases, then the question becomes how to best minimize the efforts required when you each step up to a new release.”

Producing XHTML isn’t necessarily the real challenge. Cleaning the incoming data is a challenge, as is communicating the necessity to plug-in developers to ensure their stuff runs when the site is served up as XHTML. We can put in patches, but we need a community commitment that XHTML is a viable option, and the only way we’re going to get that is if the developers also support XHTML. Plus, I don’t want to have to ‘fight’ for every change we’d need to put in. I’d rather just fork the project.

Unfortunately, I’m just not seeing any interest in my posts on this topic. If I saw more interest, perhaps I would feel more motivated to make the push, do the fight.

Maybe Microsoft is releasing IE8 at the right time. Maybe people really don’t care that much about standards anymore. How else can you explain such a lack of interest in a marvelous technology like SVG?

Posted by Shelley at

“Jason Blevins made me realize that I don’t need to serve the Admin interface as application/xhtml+xml, which cuts the size of the patch file from 2200 lines to more like 600.”

I will say that Wordpress admin served up relatively decently when served as XHTML. The Dashboard had some problems, but that was more a JS problem.

It’s the plug-ins that are a nightmare. Very few validate as XHTML. Or generate valid XHTML. I would think that a stipulation for Wordpress plug-in developers was that their work would be put through validation before publication.

Posted by Shelley at

Now I am even more confused.

As to the problem of plugins, I don’t see how forking WordPress solves that.

As to commitment, I have an automated test that I run every hour that there is a change in WP’s SVN.  When there is a change to WP that concerns me, I identify which of the changes committed within the past hour caused the problem, and reopen the associated defect.  That approach has proven to be very effective.
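A check of this kind can be quite small. The following is only a sketch, not the test described above: it assumes the pages have already been fetched into strings (the `pages` mapping and its paths are made up for illustration) and uses Python's standard XML parser to report which ones are not well-formed. It checks well-formedness only, not validity against the XHTML DTD.

```python
import xml.etree.ElementTree as ET


def find_ill_formed(pages):
    """Return {name: error message} for every page that fails to parse as XML."""
    failures = {}
    for name, markup in pages.items():
        try:
            ET.fromstring(markup)
        except ET.ParseError as err:
            failures[name] = str(err)
    return failures


# Hypothetical sample data standing in for pages fetched from a test install.
pages = {
    "/index": '<html xmlns="http://www.w3.org/1999/xhtml"><body><p>ok</p></body></html>',
    "/comments": '<html xmlns="http://www.w3.org/1999/xhtml"><body><li>unclosed</body></html>',
}

failures = find_ill_formed(pages)
for name, message in sorted(failures.items()):
    print(name, "is not well-formed:", message)
```

Run after each upstream change, a non-empty result points at the page, and therefore the commit range, that needs a closer look.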

Posted by Sam Ruby at

My comment about “XHTML’s future” really had to do with breaking out of the crowd of people willing to hack on WordPress and pushing XHTML into the mainstream.  In order for that to happen, WordPress has to handle XHTML out of the box.  In order for that to happen, we would likely need commitment to XHTML from the WP developers.  In order for that to happen, all major browsers must support XHTML (and especially the leading browser).

To switch over to content negotiation of browsable content can be done with a theme, but moving the feeds and the admin panel over to XHTML requires poking into the database as well as adding some plugin functionality.

Posted by Jeff Schiller at

“My comment about “XHTML’s future” really had to do with breaking out of the crowd of people willing to hack on WordPress and pushing XHTML into the mainstream.  In order for that to happen, WordPress has to handle XHTML out of the box.  In order for that to happen, we would likely need commitment to XHTML from the WP developers.  In order for that to happen, all major browsers must support XHTML (and especially the leading browser).”

What Jeff said.

Posted by Shelley at

I am under the firm belief that draconian error handling will never become mainstream.  I believe this to be true even if IE 8 were to support XHTML, and even if WordPress were somehow able to exercise veto authority over all plugins.

That being said, I do believe that people can and do produce valid XHTML with WordPress today, and that changes which make it either easier to do so or easier to keep up with maintenance releases are both possible and desirable.

Posted by Sam Ruby at

“I am under the firm belief that draconian error handling will never become mainstream.  I believe this to be true even if IE 8 were to support XHTML, and even if WordPress were somehow able to exercise veto authority over all plugins.”

In the days when all content was created manually, this would have been true. Today, though, most content is generated by tools and requiring accuracy of the tools is more than acceptable.

We’ve pointed out how well Opera handles XHTML content. The important component of the Opera page is the link that states, “Display this page using HTML processing”. A site does not have to be inaccessible if there’s an HTML switchover when people run into sites that won’t parse.

Asking that markup be as accurate as code isn’t putting a huge burden on people. Either people use WYSIWYG editors and tools, or we can assume they know enough to close an IMG element, or ensure that there’s a closing end tag for that LI.

“Draconian error handling” — the person who invented that term should be forced to visit every Geocities site. We are not the same web we were ten years ago. Content creators aren’t using the same tools. I bet that 99% of the people using Wordpress and Movable Type use default formatting and never once muck with the markup, except on their templates. Even with template editing, they probably start with a page and then modify it. If they make a mistake, and a tag isn’t closed, the browsers should give this information. If the browsers do, I really think that a person who is willing to edit their templates is also willing to learn just the few things they need to do it right. Otherwise, they wouldn’t touch the templates.

If the toolmakers got behind XHTML, I really see little hardship for the average person using a CMS or weblogging tool. We really are requiring toolmakers and application developers to produce accurate XHTML, and they should be more than capable of dealing with problems.

As for Wordpress producing valid XHTML, yeah, that’s a piece of cake. The same can not be said about handling comments, ping backs, content generated by plug-ins, coming in via searches, not to mention that whole admin section. One or two coders cannot make this transition alone--it requires a willingness and a commitment on the part of the Wordpress community.

Not every problem can be solved with a simple patch.

Posted by Shelley at

I’m going to reserve my opinion on XHTML’s applicability for the masses at the moment.  After all, I’ve only been using it for a week.

I’ve always suspected that the reason XHTML never took off is because not all browsers were behind it.  If we get that bone now, I think it remains to be seen...

Posted by Jeff Schiller at

I’ve always suspected that the reason XHTML never took off is because not all browsers were behind it.

I disagree.

I think the problem was, and is, mostly on the content-creation side.

Over the years, I’ve seen lots of people, bright-eyed and bushy-tailed, announce “Ooooh, I’m gonna start producing XHTML.” only to give up in frustration a few weeks or months later.

Because the tools aren’t there.

If, by some miracle, IE8 was released, with support for the application/xhtml+xml MIME type, this would not, by itself, remove a single barrier to the uptake of XHTML.

As for Wordpress producing valid XHTML, yeah, that’s a piece of cake. The same can not be said about handling comments, ping backs, content generated by plug-ins, coming in via searches, not to mention that whole admin section.

Not to move the goalposts or anything, but ensuring that comments, pingback, etc, produce well-formed XHTML is part of the job.

You can’t just say "All my blog entries are well-formed, except for the comments."

One or two coders cannot make this transition alone--it requires a willingness and a commitment on the part of the Wordpress community.

Good luck with that.

Posted by Jacques Distler at

If, by some miracle, IE8 was released, with support for the application/xhtml+xml MIME type, this would not, by itself, remove a single barrier to the uptake of XHTML.

I guess what I mean is, if all modern browsers now supported XHTML, this might encourage people (INCLUDING content creation tool-makers) to use it.  Without majority support of XHTML, this is a lot harder to justify.  Thus, I stand by my original opinion - without majority browser support of XHTML, people (authors AND tool-makers) see no point in using it.

Posted by Jeff Schiller at

You can’t just say "All my blog entries are well-formed, except for the comments."

Sorry, I happened to get to this and correct it before it could be seen by anyone as an example.  In communication with Jacques, he stated that the character he injected at the end of the comment was U+FFFE.

Still ramping up on this stuff, obviously.

I guess that would make me “bright-eyed and bushy-tailed” in Jacques’ eyes...

Sam - FYI I got a red warning about spam comments here until I changed my URI (didn’t have a “Submit” button either).

Posted by Jeff Schiller at

“If, by some miracle, IE8 was released, with support for the application/xhtml+xml MIME type, this would not, by itself, remove a single barrier to the uptake of XHTML.”

I disagree. I think that IE not supporting XHTML has had a significant negative impact. I think other impacts have been not enough promotion of the benefits of XHTML, such as inline SVG. But this black space where IE tries to load XHTML as XML has had an impact on adoption.

“Not to move the goalposts or anything, but ensuring that comments, pingback, etc, produce well-formed XHTML is part of the job.

You can’t just say 'All my blog entries are well-formed, except for the comments.'”

I agree with this. I was responding more to Sam’s comment about producing valid XHTML. To me, the whole application has to be focused on XHTML, not just the part that produces the XHTML for the posts. Frankly, modifying the application to handle XHTML doesn’t have to be an onerous task. However, without buy-in from the developers, it’s just a bandage. They have to also commit to the specification. And they have to promote it beyond the app, requiring valid XHTML from plug-in developers.

You know, listening to you, Jacques, I feel I shouldn’t even try. You’re assuming that even if IE8 releases with XHTML support, Wordpress wouldn’t be interested in supporting this spec.

Why don’t we give up on XHTML, then? Why don’t we give up on SVG, and just pick our side: AIR or Silverlight? I have to think there are people out there interested in supporting standards, but they have to have the tools, and the tool makers have to have the motivation. I don’t think it’s impossible if...big if... IE8 supports XHTML.

Posted by Shelley at

You know, listening to you, Jacques, I feel I shouldn’t even try. You’re assuming that even if IE8 releases with XHTML support, Wordpress wouldn’t be interested in supporting this spec.

I’m saying: Forget about IE. Even if it did support application/xhtml+xml, it’s not going to support your use-case (inline SVG).

Without a compelling use-case, nobody (neither authors nor tool vendors) is going to bother with XHTML, regardless of whether IE supports that MIME type.

On the other hand, if you have compelling content, which requires an alternate browser (Mozilla, Safari, Opera), my sense is that people will be willing to use whatever browser is required to see it.

If there are enough people desirous of creating such content, tool vendors will take an interest. (Or, perhaps, some enterprising souls will take matters in their own hands.)

I just don’t believe the assertion, which I’ve seen multiple people make, “I’ll start producing XHTML, just as soon as IE supports it.”

How? And, more importantly, why?

Posted by Jacques Distler at

Stimulating WordPress

I think XHTMLate should be pronounced “stimulate”. Anyway, here’s a list of WordPress bugs that I think are important for XHTML: 3833 - Extra inside blockquote; 3914, 4746 - Two feeds on Dashboard don’t work with ...

Excerpt from Something Witty Goes Here at

“I just don’t believe the assertion, which I’ve seen multiple people make, “I’ll start producing XHTML, just as soon as IE supports it.”

How? And, more importantly, why?”

I think two things are needed. Yes, we need compelling content, and I think the increasing number of sites starting to use SVG is a start. However, I still think that IE not supporting XHTML is detrimental to overall support for XHTML.

True, IE probably won’t process SVG, but the SVG doesn’t cause an error. Well, shouldn’t cause an error. Which means that people using the other browsers will see the SVG. At a minimum, though, people using IE will at least be able to access the page.

This is the way things worked at Burningbird, where I used content negotiation. What I’d like to do, though, is drop the content negotiation part, because that just makes using SVG inline that much more complicated.

So I think two things are needed: IE8 must support XHTML, and then we have to generate interest for (and lessen fear of) XHTML, both by encouraging tools to support XHTML, and by providing that compelling content.

Will I give up if IE8 doesn’t support XHTML? No, I have a new game plan if that occurs.

Posted by Shelley at

This is the way things worked at Burningbird, where I used content negotiation.

Right. Content negotiation.

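# Serve the site root from index.html
RewriteRule ^$ index.html
# If the client says it accepts XHTML, or is the W3C Validator or MathPlayer...
RewriteCond %{HTTP_ACCEPT} application\/xhtml\+xml [OR]
RewriteCond %{HTTP_USER_AGENT} W3C.*Validator|MathPlayer
# ...label .html files as application/xhtml+xml
RewriteRule \.html$  - [T=application/xhtml+xml]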

works just fine for me right now, and will continue to work fine, should IEn, for n≫7, start supporting XHTML.
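For anyone negotiating in application code rather than in the web server, the same decision can be sketched in a few lines. This is an illustration only, with made-up function and constant names; like the rewrite rules above, it does a plain substring test on the Accept header and ignores q-values.

```python
import re

XHTML = "application/xhtml+xml"
HTML = "text/html"

# Same User-Agent pattern as the rewrite rules above.
UA_PATTERN = re.compile(r"W3C.*Validator|MathPlayer")


def choose_content_type(accept="", user_agent=""):
    """Pick the MIME type to send, given the request's Accept and User-Agent headers."""
    if XHTML in accept or UA_PATTERN.search(user_agent):
        return XHTML
    return HTML


print(choose_content_type("text/html,application/xhtml+xml,*/*;q=0.8", "Mozilla/5.0"))
print(choose_content_type("*/*", "Mozilla/4.0 (compatible; MSIE 7.0)"))
```

A response chosen this way should also carry a `Vary: Accept, User-Agent` header so that caches do not hand the XHTML version to a client that cannot parse it.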

What interests me (far) more is

1) getting to the point where Mozilla can render this page correctly
2) getting to the point where there are multiple browsers that can do the same.

Posted by Jacques Distler at

I served my blog (and wordpress.org) as application/xhtml+xml to supporting UAs from around early 2003. It’s very doable. I stopped sometime in 2006 not because of any fundamental change, I was just tired and no one seemed to notice or care what we had been working hard several years to maintain.

To Shelley’s original point, “Input from readers enters Word[P]ress in several different places in the code, most of which do not have hooks allowing us to override the code to provide our own.” If there are places where input comes in without hooks, please file a bug, and a patch if you’re so inclined.

Jacques sarcastically says “Good luck with that” but I think there are numerous examples of 1-2 people passionate about an issue making a dramatic change to the project. In the past year alone I would point to our changes around Atom, WYSIWYG, XML-RPC, media handling, canonical URLs, and internationalization as places largely impacted by a small number of non-core contributors.

Posted by Matt at

I was just tired and no one seemed to notice or care what we had been working hard several years to maintain.

Well-formedness in itself is not a killer feature for readers to marvel at. When it works, of course no one notices. I did notice when there was an ill-formed trackback (or was it comment? can’t remember for sure).

Since people didn’t notice what was going on when it worked what was the point in going XHTML without having some MathML or SVG in the mix? Going to XHTML only seemed to only have a downside (getting noticed when it didn’t work).

Posted by Henri Sivonen at

Going to XHTML only seemed to only have a downside (getting noticed when it didn’t work).

Same thing can be said about unit tests.

Going to XHTML tends to benefit the producer more than it benefits the consumer.  That being said, the cost to do so is higher than you might expect, and the benefits (while very real) can be significantly lower than some promise.  In the final analysis, the cost/benefit analysis needs to be determined by those that maintain the project.

In the case of WordPress, apparently nobody was willing to step forward and maintain the code necessary to do that.  I also seem to be unable to find any evidence of continuous integration tests or even a simple regression test suite.  Perhaps these two are somehow related.

Like Matt and Jacques, I do believe that this is an area where a single person could make a difference.  And would be able to show visible results within the first few weeks of effort.  It might not show up in an official release for a few months, but would likely show up on wordpress.org before that.

Posted by Sam Ruby at

Actually, no, I don’t think this is something that can be hacked out by a single person. Not everything is dependent on coding superstars, though that seems the only thing that gets respect in this environment.

We’re talking architectural changes. More than that, we’re talking about enforcing XHTML validation for the plug-ins, as well as the Wordpress application, itself. It does no good to update Wordpress to effectively deal with XHTML if the existing core developers break it with new updates, or the plug-ins break, both in admin interface pages and generated content.

That’s what I meant when I said Wordpress needs a commitment to XHTML. Not every problem has a code solution. Honest to goodness.

When I turned on XHTML for my admin pages, other than the Dashboard problems, which were reported (and subsequently ignored), both plug-ins I used that had an admin interface page generated XML errors. One was really bad, though I worked with the plug-in developer to fix it. However, the generated code still breaks XHTML.

Wordpress is popular because of the ease of use with plug-ins as much as anything else. But if there is nothing about valid XHTML established by Wordpress.org, we’ll be having nothing but battles every time a new plug-in is added. Either the admin pages will be bad XHTML, or they’ll generate bad XHTML.

Heck, shall we talk about raquo and named entities right about now? Even something as basic as what Wordpress uses to separate blog name from individual article name fails in an XHTML environment.
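The entity problem is easy to reproduce. An XML parser with no DTD to consult knows only the five predefined entities (amp, lt, gt, quot, apos), so `&raquo;` is a fatal error while the numeric reference `&#187;` is fine. The sketch below uses Python's standard library to show the failure and one possible fix; the `numeric_entities` helper and the sample title are invented for illustration and are not WordPress code.

```python
import re
import xml.etree.ElementTree as ET
from html.entities import name2codepoint

XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}
ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def numeric_entities(markup):
    """Rewrite HTML named entities (e.g. &raquo;) as numeric references (&#187;)."""
    def replace(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)  # leave XML's own entities and unknown names alone
        return "&#%d;" % name2codepoint[name]
    return ENTITY.sub(replace, markup)


title = "<title>My Blog &raquo; A Post &amp; More</title>"

try:
    ET.fromstring(title)
    parsed_as_is = True
except ET.ParseError:
    parsed_as_is = False  # the XML parser does not know &raquo;

fixed = numeric_entities(title)
print(parsed_as_is, fixed)
```

Emitting the literal character in a UTF-8 page works just as well as the numeric reference; either way the output no longer depends on the browser loading a DTD.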

There is more to this than adding a few bug reports and hacking out some code. Same as there is more to regression testing than putting the tools in place.

I’m going to see how far I can get to XHTMLate Wordpress with plug-ins and then make these available to people, as I’ve already done in the past. It’s a bandage, though. And a disappointing one for an organization that touts support for standards.

As for not getting pats on the head and ‘attaboys’ because you incorporate standards and people don’t seem to notice, people don’t exclaim with joy when they start an application and it works rather than breaks; they don’t jump up and down when the site uses DIV elements instead of tables. People notice when things break, but rarely when they work.

However, there should be an inner satisfaction from being a developer or designer and doing things right. Maybe we’ll be the only ones who notice, but if we respect our work that should be enough.

Yeah, I’m lecturing. I’m being pedantic. I’m dull and boring. Sue me.

Posted by Shelley at

We’re talking architectural changes. More than that, we’re talking about enforcing XHTML validation for the plug-ins, as well as the Wordpress application, itself.

If you’re convinced that the only way to make WordPress XHTML-safe will require major architectural changes, then I suspect that your only recourse, in the end, is to fork the project.

At that point, you might ask yourself, “Is this really the best codebase upon which to build an XHTML-safe blogging application?”

Not everything is dependent on coding superstars, ...

Seems to me that fixing individual bits of brokenness is more easily achieved, by non-superstars, than a major re-architecting. Having a test-suite ensures that those bits stay fixed.

Whether this is sufficient is a different story, but you’ve spent more time wrestling with the WordPress codebase, so you’re probably in a better position to judge than I.

Posted by Jacques Distler at

we’re talking about enforcing XHTML validation for the plug-ins

That is but one approach.  Mombo (my blogging software), and Venus take different approaches.

Mombo escapes literally everything, filters on characters, and selectively unescapes sequences that are recognized as commonly in use and valid.  In the same pass, it also does some wiki-like transformations, and smart quotes.
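To make the escape-then-unescape idea concrete, here is a rough sketch in Python. It is not Mombo's code: the tag whitelist, the function name, and the rule of restoring only properly paired tags are assumptions made for illustration, and the wiki-like transformations and smart quotes are left out.

```python
import html
import re

ALLOWED_TAGS = ("em", "strong", "code")  # hypothetical whitelist

# Anything outside the XML 1.0 Char production (this includes U+FFFE).
INVALID_XML_CHARS = re.compile(
    "[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)

# An escaped open tag, content with no other escaped angle brackets,
# and the matching escaped close tag.
ESCAPED_PAIR = re.compile(
    r"&lt;(%s)&gt;((?:(?!&lt;|&gt;).)*?)&lt;/\1&gt;" % "|".join(ALLOWED_TAGS),
    re.DOTALL,
)


def sanitize(text):
    """Escape everything, then restore whitelisted tags that are properly paired."""
    out = html.escape(INVALID_XML_CHARS.sub("", text), quote=True)
    while True:
        restored = ESCAPED_PAIR.sub(r"<\1>\2</\1>", out)
        if restored == out:  # nothing left to restore
            return out
        out = restored  # repeat so outer pairs around restored inner pairs match


print(sanitize("<em>hi</em> <script>x</script>"))
print(sanitize("<em><strong>x</em></strong>"))
```

Because a pair is restored only when everything between its tags is already balanced, an unclosed or mis-nested tag simply stays escaped and shows up as literal text instead of breaking the page.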

Venus uses html5lib to produce a DOM which is then serialized as XML.  Perhaps a function such as DOMDocument->loadHTML could be used.

Posted by Sam Ruby at

I disagree that, without XHTML-valid plugins, XHTMLating WP is a non-starter.  I think if everything else was handled from an XHTML point of view - then it would be up to the plugin developers to ensure validity, otherwise they risk pissing off authors.  Start with the core product and work outwards.

Posted by Jeff Schiller at

Draconian Error Handling Never Meant for Mainstream Use

Sam Ruby I am under the firm belief that draconian error handling will never become mainstream. I believe this to be true even if IE 8 were to support XHTML...

Excerpt from Planet Case at

“  we’re talking about enforcing XHTML validation for the plug-ins

That is but one approach.  Mombo (my blogging software), and Venus take different approaches.”

I was talking philosophically, Sam. Even something like, “Please test your plug-in to ensure it produces valid XHTML before checking it in” added to the plug-in developer page would be a start.

For my comments, I’m using a library from a PHP framework that was released as a separate product and does a lovely job of validating the XHTML and returning meaningful error messaging. The only place this fails is cleaning up ping backs (which I’m about to turn off completely, anyway), and search.

I haven’t heard of Mombo before. I should check it out. However, I’ve got as much time in Wordpress as Jacques has with MT. It’s not appealing to consider moving to another tool.

“Start with the core product and work outwards.”

I’ll be looking forward to the changes in upcoming versions of WP. In the meantime, I’m going to see how far I can get with plug-ins, which people can have for their own use if they want. All two of them.

Posted by Shelley at

Geez, I wish you had comment edits.

Posted by Shelley at

Collection of Februari '08 news

Your browser doesn’t support SVG or this feed was mangled to remove all ‘object’ elements. You may see the post in its original form at my.opera.com/macdev_ed . You may download a browser that supports SVG here . Here’s a collection of links I...

Excerpt from ed.blog at

Venus uses html5lib to produce a DOM which is then serialized as XML.  Perhaps a function such as DOMDocument->loadHTML could be used.

In order for Wordpress, or any other CMS/blogging tool, to produce valid, unbroken XHTML, the DOM needs to be serialized as XML. The fact that most blogging software allows foreign input (posts, comments, etc.) means that, if the blog is delivering application/xhtml+xml content, the page could render with a big, ugly error message if someone failed to close a tag properly, for example. So, the need for tidying up the mark-up and serializing the DOM is important, I think.
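As a rough illustration of that parse-then-serialize step, the sketch below feeds tag soup through Python's standard-library `html.parser` and writes the result back out with an XML serializer. Note the substitution: this is not html5lib and does not follow HTML's real parsing rules; the class name, the short list of void elements, and the auto-close rule for `li` and `p` are simplifying assumptions.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

VOID = {"br", "hr", "img", "input", "meta", "link"}
AUTO_CLOSE = {"li", "p"}  # a new one implicitly closes an open one


class SoupToTree(HTMLParser):
    """Build an element tree from possibly ill-formed markup."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.builder = ET.TreeBuilder()
        self.open_tags = []
        self.builder.start("div", {})  # wrapper so there is a single root

    def handle_starttag(self, tag, attrs):
        if tag in AUTO_CLOSE and self.open_tags and self.open_tags[-1] == tag:
            self.builder.end(self.open_tags.pop())
        # Valueless attributes (e.g. "checked") get their own name as the value.
        self.builder.start(tag, {k: (k if v is None else v) for k, v in attrs})
        if tag in VOID:
            self.builder.end(tag)
        else:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.open_tags:
            return  # stray close tag: drop it
        while self.open_tags:
            open_tag = self.open_tags.pop()
            self.builder.end(open_tag)
            if open_tag == tag:
                break

    def handle_data(self, data):
        self.builder.data(data)


def repair(fragment):
    """Return the fragment as a well-formed XML string wrapped in a div."""
    parser = SoupToTree()
    parser.feed(fragment)
    parser.close()
    while parser.open_tags:  # close whatever the author left open
        parser.builder.end(parser.open_tags.pop())
    parser.builder.end("div")
    return ET.tostring(parser.builder.close(), encoding="unicode")


print(repair("<ul><li>one<li>two</ul><img src=a.png>"))
```

A production version would also need to whitelist elements and attributes and reject names that are not legal in XML, which the serializer here does not check.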

OTOH, if you deliver your content as plain ol' text/html, then you don’t have to worry about the client complaining of non-well-formedness, but then are you really delivering XHTML?

Posted by Ben Ramsey at

I think the issue of getting plugin developers to author well-formed plugins is solved by getting the core of WordPress to support and enforce XHTML. Getting this right is, as I see it, a two-part enforcement and encouragement battle. First, WordPress should be outputting everything as application/xhtml+xml (including the admin pages) to supporting browsers.

Plugin developers working with everything but Internet Explorer will then have their own plugins break their WordPress when they do something wrong. I have no idea, but I expect most plugin developers to test in more than just Internet Explorer. The second part would be to actually validate the plugins before they’re accepted and hosted on the WordPress Plugin web site. I don’t know how the plugin acceptance and hosting procedure works, but if it involves any type of automation already, it should be rather easy to insert a new step that goes through the code and checks it for obvious problems. If this step is made open for everyone to contribute to and improve, I’m pretty certain it will become a solid acid test of all plugins checked in and accepted in the future.

On top of this, the core WordPress developers should of course encourage everyone to ensure well-formedness (by providing the necessary tools and guidance) and give good information on how to achieve it. I’d do my part to contribute.

Posted by Asbjørn Ulsberg at

I don’t know how the plugin acceptance and hosting procedure works

signup, get approved, get access to SVN, create a readme.  Of course this means that some plugins are utter crap, and others are absolutely astounding.  The latter tend to be downloaded more than the former.

I’d do my part to contribute.

Patches work best.

Posted by Sam Ruby at

WordPress should be outputting everything as application/xhtml+xml (including the admin pages)

It’s amusing, though admittedly somewhat tiring, to run across people who don’t realize that this is the worst idea ever.

Posted by Mark at

It’s amusing, though admittedly somewhat tiring, to run across people who don’t realize that this is the worst idea ever.

“Worst” is a little bit of an exaggeration.

It was a pain in the ass to convert, but I’ve had the MovableType Admin 3.x interface served as application/xhtml+xml for years. 4.x won’t, because it’s too much of a pain in the ass to convert, and all I care is that the previews are served as application/xhtml+xml. And that can be done with <iframe>s.

My branch of Instiki is top-to-bottom application/xhtml+xml.

My sense is that, if the Admin Interface is prone to blow up in your face, and produce a Yellow-Screen-of-Death, then other facets of the application are problematic, too.

I will grant you, though, that the Admin Interface is the very last part of the application you should “XHTMLate.”

Posted by Jacques Distler at

Jacques: be sure to follow Jeff’s links, particularly the one that reads Mark mentioned.  Yes, Mark is prone to hyperbole at times, but it is equally true that he is generally right.  And specifically, he does have a valid point here.  One that went from hypothetical four years ago to actually happening this week.  Ironically, by your doing.  :-)

Posted by Sam Ruby at

As I said:

[T]he Admin Interface is the very last part of the application you should “XHTMLate.”



Posted by Jacques Distler at

Jumping out of the system

In the midst of a discussion between the only four people in the world who care about such things, Asbjørn Ulsberg writes: I think the issue of getting plugin developers to author well-formed plugins is solved by getting the core of WordPress to...

Excerpt from dive into mark at

You know, maybe as someone who hasn’t implemented comments yet, I’m one of those bright-eyed and bushy-tailed folks, but I’ve found it astonishing that it wouldn’t be dead obvious how to deal with this task, even years ago. I find it even more bewildering it’s still such a big topic. If the comment form accepts Markdown and doesn’t scrub literal tags there are real problems to overcome (and then only because of the constraints I place on an acceptable approach), but otherwise?

TagSoup, say, has been around forever and the algorithm is simple enough to reimplement in other languages. Or maybe your language of choice already has something else that’s similar. (The only thing that’s important is that whatever you use needs to try to preserve the exact formatting of the source.) Comments then get ground through that; if they come out the other side of this process looking different from how they went in, you force another preview to show the result to the user, asking them whether the sanitisation mangled the comment in a significant way. If it did, they can fix it and try again. That way, dirty input never even makes it into the database.
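The control flow described here fits in a few lines. In the sketch below the function names are invented, and `toy_sanitize` is a deliberately trivial stand-in for a real cleaner such as TagSoup; the point is only the comparison between what went in and what came out.

```python
def toy_sanitize(text):
    """Stand-in sanitizer: append </em> for every <em> left open."""
    missing = text.count("<em>") - text.count("</em>")
    return text + "</em>" * max(missing, 0)


def review_comment(raw, sanitize):
    """Decide whether a comment can be stored or must go back for another preview."""
    cleaned = sanitize(raw)
    if cleaned != raw:
        # Sanitising changed something: show the result and ask the author to confirm.
        return ("preview", cleaned)
    return ("store", cleaned)


print(review_comment("<em>fine</em>", toy_sanitize))
print(review_comment("<em>oops", toy_sanitize))
```

This only works if the sanitizer leaves already-clean input byte-for-byte unchanged, which is why preserving the source formatting matters.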

The same approach is not possible for trackbacks/pingbacks, assuming you even care about them. But all browsers where application/xhtml+xml is of interest support data: URIs, so it’s trivial to tunnel malformed fragments within well-formed pages for display in an iframe. (You might want to do this in the admin interface only, for editing purposes; public pages should probably display a sanitised version.)
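A minimal sketch of that tunnelling trick, assuming the fragment is UTF-8 text and should be treated as text/html inside the frame (the function name is made up):

```python
import base64
from html import escape


def iframe_for_fragment(fragment):
    """Wrap possibly malformed markup in a data: URI so the host page stays well-formed."""
    payload = base64.b64encode(fragment.encode("utf-8")).decode("ascii")
    uri = "data:text/html;charset=utf-8;base64," + payload
    return '<iframe src="%s"></iframe>' % escape(uri, quote=True)


print(iframe_for_fragment("<li>an unclosed item & a bare ampersand"))
```

The base64 payload contains nothing that needs escaping in an attribute value, so the surrounding page parses as XML no matter how broken the fragment is. Whatever ends up inside the frame is still untrusted markup and wants the same care as any other third-party content.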

Maybe I’m naïve; I just don’t see what the big fuss is about. Do I need to actually try to find out? (Of course, if the experience doesn’t humble me, it will do the exact opposite…)

Posted by Aristotle Pagaltzis at

I wrote:

What interests me (far) more is 1) getting to the point where Mozilla can render this page correctly 2) getting to the point where there are multiple browsers that can do the same.

Woot! The latest Mozilla nightlies do. (I’ve adjusted my browser instructions accordingly.)

Now, if only ...

Posted by Jacques Distler at

Phun with Rails

An XSS vulnerability in Instiki....

Excerpt from Musings at

If IE8 supports XHTML, are there any risks from hackers, or is it safe?

Posted by dad at

WordPress: XHTMLation Stalled?

I spent some time a few weeks ago quietly trying to shore up my XHTML defenses on my WordPress install - not everyone is planning to move to Drupal just yet. I have a bunch of patches that are aging. I think three of them are ‘good to go’ but I need...

Excerpt from Something Witty Goes Here at
