Unicode Enabled Trackbacks
I've changed my weblogging software to send trackbacks in utf-8, and to try to respect the charset, if specified, on trackbacks received.
This involved four changes.
Outbound, I changed autoping.py to encode the title and excerpt parameters:
arg['title'] = title.encode('utf-8')
arg['excerpt'] = body.encode('utf-8')
And added the content-type header, thus:
request.add_header("Content-type", "application/x-www-form-urlencoded; charset=utf-8")
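For anyone wanting to try the same thing outside of autoping.py, here is a minimal Python 3 sketch of the outbound half. It is not the code from autoping.py: the function name `build_trackback` and its arguments are made up for illustration, and the field names (`title`, `excerpt`, `url`, `blog_name`) are the ones I understand the TrackBack specification to define. It only builds the request; nothing is sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_trackback(ping_url, title, excerpt, url, blog_name):
    """Return an unsent POST request whose form fields are UTF-8 encoded.

    Hypothetical helper for illustration only.
    """
    fields = {
        "title": title,
        "excerpt": excerpt,
        "url": url,
        "blog_name": blog_name,
    }
    # urlencode percent-escapes the UTF-8 bytes of every value.
    body = urlencode(fields, encoding="utf-8").encode("ascii")
    request = Request(ping_url, data=body, method="POST")
    # Declare the charset so the receiver does not have to guess.
    request.add_header(
        "Content-type", "application/x-www-form-urlencoded; charset=utf-8"
    )
    return request
```

Passing the result to `urllib.request.urlopen` would actually deliver the ping.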
Inbound, I changed post.py to determine the charset:
charset=cgi.parse_header(fs.headers['content-type'])[1].get('charset','utf-8')
And then made use of this charset when parsing the data:
try:
    return unicode(value, charset)
except:
    return value
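The inbound logic can be sketched in Python 3 as well. Again, this is an illustration rather than the contents of post.py: `charset_of` and `parse_form` are hypothetical names, the header is parsed with `email.message.Message` instead of the `cgi` module, and I am assuming the body arrives as raw form-encoded bytes. As in the snippet above, the charset defaults to utf-8 and a value that cannot be decoded is passed through untouched.

```python
from email.message import Message
from urllib.parse import unquote_to_bytes


def charset_of(content_type, default="utf-8"):
    """Extract the charset parameter from a Content-Type header value."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset(default)


def parse_form(body, content_type):
    """Decode a form-encoded body (bytes) using the declared charset."""
    charset = charset_of(content_type)
    fields = {}
    for pair in body.split(b"&"):
        if not pair:
            continue
        name, _, value = pair.partition(b"=")
        # '+' means space in form encoding; then undo the %XX escapes.
        raw = unquote_to_bytes(value.replace(b"+", b" "))
        try:
            fields[name.decode("ascii")] = raw.decode(charset)
        except (UnicodeDecodeError, LookupError):
            # Unknown charset or undecodable bytes: keep the raw value.
            fields[name.decode("ascii")] = raw
    return fields
```

Keeping the raw bytes on failure means a mislabeled trackback is still accepted, at the cost of the caller having to cope with two possible value types.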
I've also written a small test driver that can be used to verify that a server handles the character set correctly.
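The core of such a driver is small. The sketch below is not the driver itself, just a Python 3 illustration of the idea under my own assumptions: `charset_payloads` is a hypothetical name, and the default list of charsets is arbitrary. It encodes the same title in several character sets and pairs each body with the matching content-type header, leaving the actual POST to the caller.

```python
from urllib.parse import quote_from_bytes


def charset_payloads(title, charsets=("utf-8", "iso-8859-1", "koi8-r")):
    """Return (content_type, body) pairs, one per usable charset."""
    payloads = []
    for charset in charsets:
        try:
            raw = title.encode(charset)
        except UnicodeEncodeError:
            # This charset cannot represent the title; skip it.
            continue
        content_type = (
            "application/x-www-form-urlencoded; charset=" + charset
        )
        body = ("title=" + quote_from_bytes(raw)).encode("ascii")
        payloads.append((content_type, body))
    return payloads
```

Posting each pair to a trackback URL and comparing the title the server displays against the original shows whether the charset parameter was honored.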
It would certainly be understandable if servers today did not respect the charset parameter, but I am curious to hear whether any fail outright to process a trackback when the charset parameter is present.
I would also welcome trackbacks from any server that uses a less common character set that happens to be listed in this table.