Joe Friend: For example we are encoding smart quotes incorrectly so I had to turn off that feature in Word, but the goal is to output just what is needed to make your blog post clean and readable (code and rendered HTML).
Cool! There’s hope yet. ;-)
On a somewhat related note, I’m investigating whether there is a simple set of checks that would let style attributes pass safely through feeds. Previous recommendations were to strip all style attributes.
My first pass at this came up with the following regular expression:
Pros: it is simple to implement; it doesn’t even require regular expressions. And despite its simplicity, it seems like it would keep out the worst of the vermin (which seem to require parens).
Cons: one can still do some mischief with things like position:absolute. Addressing that does require somewhat deeper parsing, but nothing too bad. Looking at what exists in style attribute values on the web today, the majority are very simple; I don’t even see quotes in use. Anything more difficult to parse should be stripped.
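To make the idea concrete, here is a minimal sketch in Python of what such a two-stage check might look like. It is not the check referred to above, and it is not what the Feed Validator or the Universal Feed Parser does. The function name sanitize_style, the character allowlist, and the blocked property list are all assumptions chosen for illustration. The first stage needs no regular expressions: any character outside a small allowlist (which excludes parens and quotes) causes the whole attribute to be stripped. The second stage does the slightly deeper parsing needed to drop declarations like position:absolute.

```python
# Hypothetical profile: every name and list here is an illustrative
# assumption, not the rule set of any existing tool.
SAFE_CHARS = set(
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789"
    " -:;#%.,"
)
BLOCKED_PROPERTIES = {"position"}


def sanitize_style(value):
    """Return a cleaned style value, or None if the attribute should be stripped."""
    # Stage 1: any character outside the allowlist (parens, quotes,
    # backslashes, slashes, ...) means the whole attribute is dropped.
    if any(ch not in SAFE_CHARS for ch in value):
        return None

    kept = []
    for declaration in value.split(";"):
        if not declaration.strip():
            continue  # tolerate empty segments such as a trailing semicolon
        name, sep, val = declaration.partition(":")
        name, val = name.strip().lower(), val.strip()
        # Stage 2: anything that is not a plain "property: value" pair is
        # treated as too difficult to parse, so the attribute is dropped.
        if not sep or not name or not val or ":" in val or " " in name:
            return None
        if name in BLOCKED_PROPERTIES:
            continue  # drop only the mischievous declaration
        kept.append(f"{name}: {val}")
    return "; ".join(kept) or None
```

With these assumptions, a value such as "color: red; font-weight: bold" passes through unchanged, anything containing parens or quotes is stripped outright, and "position:absolute; top:0" is reduced to "top: 0". Whether a blocked property should remove one declaration or the whole attribute is a design choice this sketch makes arbitrarily.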
The goal is to support people who want to use rich text editors like the ones found in recent versions of IE and Firefox. And, perhaps someday, one could even use a suitably housebroken version of Word. ;-)
If we can come up with a profile for safe style attribute usage that is relatively easy to parse, I can work to get this into the Feed Validator and the Universal Feed Parser.