Sjoerd Visscher: We have a simple client-side clean-up
script that extracts well-formed XHTML from the WYSIWYG editor. It
even handles pasted HTML from Word rather well. I discussed it with
my colleagues today, and we are willing to make that script
available as open source if people are interested.
Excellent!
I've been running IE's WYSIWYG output through HTML Tidy for about a month now, and it's been working quite well. I'd be curious to know what advantages Sjoerd's script has over Tidy.
That was a quick refresh; let's hope RSS2 will get an even quicker refresh soon.
By the way, I have been revitalising the IE4 Channel Definition Format (CDF) under Windows XP this week (based on Yasser Shohoud's ASPX files at learnxmlws.com).
I will give some more info later, but here is an MSXML demo conversion page with RSS/RDF conversion to CDF and conversion of RSS to RDF (using Sjoerd Visscher's XSLT).
"I wonder if the Movable Type people have figured out how to create an [wysiwyg] editor that real people will use that produces XHTML. The most popular editor among Radio users on Windows produces perfectly horrible HTML, which we encode and put in the RSS feeds that all aggregators handle perfectly well. We can't change the editor because it's baked into the browser. Do you think users would understand if we told them they had to use a much worse editor and enter the tags themselves because that made more sense to Ben Trott?"
- ScriptingNews
I guess this (potentially) takes care of that, not that it was much of an obstacle to begin with.
This reminds me of an issue that would make a good use case to think through when considering the api.
A lot of blog servers do this thing where they accept tags and text that mean something special to the specific blog engine. One example is livejournal's "lj" tags. These have special meaning and are converted in the server on their way to consumption.
Another example is the comments for this blog. Like Word, it converts asterisks to "b" or "em" tags before consumption.
Then there's what wikis do with ThingsLikeThis. In each of these cases, what the editor submits is different than what is eventually disseminated.
This pattern is prevalent enough that it's something that should probably be supported; otherwise the API will have limited usefulness.
To support it, there should be a method in the API intended for editing that gets the source of the entry (not the to-be-consumed version).
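To make the use case concrete, here is a rough sketch of what I mean. None of this comes from any proposed API: `BlogStore`, `get_entry`, `get_entry_source`, and the asterisk rule in `render` are all names and behaviour I made up for illustration. The point is only that the server keeps what the author typed and offers it back through a separate call.

```python
import re
from dataclasses import dataclass


@dataclass
class Entry:
    # Exactly what the author submitted, engine-specific markup included.
    source: str


def render(source: str) -> str:
    """Hypothetical engine rule: turn *text* into <em>text</em>."""
    return re.sub(r"\*([^*\n]+)\*", r"<em>\1</em>", source)


class BlogStore:
    """Toy in-memory store; a real server would persist entries."""

    def __init__(self):
        self._entries = {}

    def put_entry(self, entry_id: str, source: str) -> None:
        # Store the source untouched; conversion happens on the way out.
        self._entries[entry_id] = Entry(source)

    def get_entry(self, entry_id: str) -> str:
        """What readers and aggregators see (converted)."""
        return render(self._entries[entry_id].source)

    def get_entry_source(self, entry_id: str) -> str:
        """What an editing tool should load (unconverted)."""
        return self._entries[entry_id].source


store = BlogStore()
store.put_entry("1", "This is *important*.")
```

With only `get_entry`, an editor would load the converted markup and the author's asterisks would be gone on the next save. `get_entry_source` lets the round trip preserve them.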
At this point, I do not use any application to clean up my XHTML because of obvious concerns. Most of my sites are in table-less <div> layouts, and the applications are simply not smart enough to handle the cleaning.
I look forward to the release of this script, including the export of code written in Word to an editor.
Right now, whenever I have an article to publish, I must change the typeface to Courier New (fewer whitespace problems with this font for some reason), add in the HTML for publishing, and then paste it into Notepad to nix any of MS-Word's <sarcasm> wonderful auto-formatting </sarcasm off>.
Then, <sigh> I paste it into a WYSIWYG editor (usually DMX). What a hassle this is.
I even do a find and replace for all the auto-formatted quotes that end up with invalid XHTML markup.
So, in a nutshell, I'll be a buyer of your script. :)
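For what it's worth, the find-and-replace step for Word's auto-formatted quotes can be scripted. This is only a sketch, not Sjoerd's script: the function name `plain_quotes` is made up, and it assumes the text has already been decoded to Unicode, so the curly quotes show up as the four characters listed below.

```python
# Map Word's curly quotes to their plain ASCII equivalents.
SMART_QUOTES = {
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark / apostrophe
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
}

# str.maketrans accepts a dict of single characters to replacement strings.
_TABLE = str.maketrans(SMART_QUOTES)


def plain_quotes(text: str) -> str:
    """Return text with curly quotes replaced by straight ones."""
    return text.translate(_TABLE)


cleaned = plain_quotes("\u201cIt\u2019s done,\u201d she said.")
```

This only touches the quote characters. It does nothing about the rest of Word's markup, and it should be run on body text rather than blindly on attribute values, where a straight double quote could end the attribute early.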
As much as I like Sam [Ruby], I hate the name SAM, S(yndication) A(ggregation) M(arkup). I would bet Sam [Ruby] doesn't care much for it either. It clouds the issue.