Jon Udell: To ante up for this game, you have to produce well-formed content. The mainstream blog-writing tools aren't helping at all. Most well-formed writing is done in emacs, still. Can we please change that soon?
Having worked for a legal publisher and attempted to automate the markup of law citations, I can say that regular expressions leave much to be desired :)
After filtering some new Act of Congress through literally hundreds of patterns, we still needed a gaggle of human proofreaders (complete with law degrees) to mark up citations properly. Then came another pass with real (i.e. not HTML) SGML tools to make sure cites matched targets.
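To give a flavour of why patterns alone fall short, here is a minimal sketch of one such pattern in Python. Everything in it is an assumption made for illustration: the `USC_CITE` pattern, the `mark_up_citations` helper, and the `<cite>` element with its `title` and `section` attributes are hypothetical, not what the publisher actually used.

```python
import re
from xml.sax.saxutils import escape

# Hypothetical pattern: matches only simple U.S. Code cites such as
# "42 U.S.C. § 1983" (\u00a7 is the section sign).
USC_CITE = re.compile(r"\b(\d+)\s+U\.S\.C\.\s+\u00a7\s*(\d+[a-z]?)")

def mark_up_citations(text):
    """Wrap each simple U.S.C. citation in a hypothetical <cite> element."""
    def wrap(match):
        title, section = match.group(1), match.group(2)
        return '<cite title="%s" section="%s">%s</cite>' % (
            title, section, match.group(0))
    # Escape &, < and > first so the output stays well-formed.
    return USC_CITE.sub(wrap, escape(text))

print(mark_up_citations("See 42 U.S.C. \u00a7 1983."))
# The same law cited in words slips straight through the pattern:
print(mark_up_citations("See section 1983 of title 42."))
```

The second call comes back unchanged, which is exactly the kind of case that needs another pattern, and then another, and eventually a proofreader.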
Oh... we did "well-formed authoring" by having authors and editors submit either Word/WordPerfect documents or even handwritten notes. Production staff turned them into valid SGML :) Hey, think how that would stimulate the economy if every blogger hired a markup person!
Too right. The excuses for not producing well-formed content are rapidly disappearing, even for those of us without a markup person on hand.
Coincidentally (not!) I was looking at how to do WYSIWYG in-browser editing yesterday. I haven't looked at making the material from IE well-formed, but I'm pretty sure a dollop of client-side JavaScript would be enough.
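For what it's worth, checking whether editor output is well-formed is the easy half. A rough sketch in Python rather than client-side script, using only the standard library's `xml.etree.ElementTree`; the `is_well_formed` helper and the wrapping `<div>` are assumptions made for illustration, and HTML entities such as `&nbsp;` would be rejected because a plain XML parser does not know them.

```python
import xml.etree.ElementTree as ET

def is_well_formed(fragment):
    """Return True if the fragment parses as XML once given a single root."""
    try:
        # Wrap in a hypothetical <div> so multi-element fragments have one root.
        ET.fromstring("<div>%s</div>" % fragment)
        return True
    except ET.ParseError:
        return False

print(is_well_formed("<p>Hello<br/>world</p>"))  # closed and nested properly
print(is_well_formed("<P>Hello<BR>world"))       # unclosed tags, tag-soup style
```

Repairing the second kind of markup, rather than merely detecting it, is the part that still needs real work.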
'Spontaneous' integration of single-domain data (i.e. XHTML content) is pretty trivial. For this to work cross-domain we need something more, but that's another story... ;-)