Tests for proper handling of XML:
The following are common errors in XML that SHOULD cause an aggregator to flag the feed as not well-formed:
-
Microsoft "smart quotes", which often find their way into content without the proper indication that the windows-1252 encoding is in use. Possible solutions:
-
Replace these characters with their ISO 10646 (Unicode) equivalents from the General Punctuation table on this page.
-
Declare the encoding used at the top of the feed with
<?xml version="1.0" encoding="windows-1252"?>
-
Even those who clean content correctly often miss escaping ampersands in URLs outside of the content (for example, trackback and comment URLs). E.g. the following URL:
http://www.example.com/script.cgi?id=123&sect=443&action=update
should be escaped into:
http://www.example.com/script.cgi?id=123&amp;sect=443&amp;action=update
-
Unicode characters of the form U+xx3C often cause problems with XML parsers that do not understand Unicode. The reason: 0x3C is the ASCII representation of '<', so in a 16-bit encoding such as UTF-16 the low byte of such a character looks like the start of a tag. An example of such a character: ∼, which is &sim; or U+223C.
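The three cases above can be sketched as a small test script. This is only an illustration, not a reference test suite: it assumes Python's standard xml.etree.ElementTree parser stands in for the aggregator's parser, that the smart-quote bytes really are windows-1252, and that the URL is placed in a hypothetical <link> element. The helper name is_well_formed is made up for this sketch.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def is_well_formed(data: bytes) -> bool:
    """Return True if the bytes parse as well-formed XML (hypothetical helper)."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False


# Case 1: smart quotes stored as windows-1252 bytes 0x93 and 0x94,
# with no encoding declaration, so the parser assumes UTF-8.
raw = b"<title>\x93Hello\x94</title>"
# Fix (a): transcode so the quotes become U+201C and U+201D in UTF-8.
transcoded = raw.decode("windows-1252").encode("utf-8")
# Fix (b): keep the bytes as they are and declare the encoding instead.
declared = b'<?xml version="1.0" encoding="windows-1252"?>' + raw

# Case 2: ampersands in a URL placed in element content.
url = "http://www.example.com/script.cgi?id=123&sect=443&action=update"
unescaped_link = ("<link>" + url + "</link>").encode("utf-8")
escaped_link = ("<link>" + escape(url) + "</link>").encode("utf-8")

# Case 3: U+223C contains the byte 0x3C in UTF-16 but not in UTF-8.
sim_utf16 = "\u223c".encode("utf-16-be")
sim_utf8 = "\u223c".encode("utf-8")

if __name__ == "__main__":
    print("raw smart quotes:", is_well_formed(raw))
    print("transcoded to UTF-8:", is_well_formed(transcoded))
    print("encoding declared:", is_well_formed(declared))
    print("unescaped URL:", is_well_formed(unescaped_link))
    print("escaped URL:", is_well_formed(escaped_link))
    print("U+223C as UTF-16 bytes:", sim_utf16)
    print("U+223C as UTF-8 bytes:", sim_utf8)
```

An aggregator test along these lines would expect the raw smart-quote feed and the unescaped URL to be flagged, and the repaired versions to be accepted. The third case only shows where the 0x3C byte comes from; a parser that decodes the document properly never sees it as a '<'.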
Discussion
[AsbjornUlsberg] [AnswerMe:] Could you please try to give some examples and solutions (either by writing it here, or by linking to some website that does it) for each of the cases mentioned above? It would be easier to understand the problems if they were exemplified.