Jeff Hodges: I was recruited after Bob Aman of FeedTools fame saw me hyping my translation of Mark Pilgrim’s FeedParser from Python to Ruby, and thought it was pretty good. The translation, of course, is called rFeedParser and it really is pretty good. I’ll have a post on that soon. First, I want to fix the silly options bugs that I was turned on to a little while ago.
Seeing if there is some synergy with the html5lib Ruby port, or at least whether the two projects can standardize on some common dependencies. Beyond that, if there are things of common utility in rFeedParser, putting such code in html5lib may allow more people to benefit from it.
Making the FeedParser tests less Python-specific. Jeff cites one specific example already: dates stored as 9-tuples. It would be trivial to change the tests to use a function like feeddate(...), which in Python is implemented as an identity function but lets other languages put in their own transformation.
Heh, don’t worry about not having seen it. It was just dumb luck Bob happened to be in the chat at the same time I was dropping information about it in #ruby on freenode.net. I rarely mentioned it anywhere while I got it somewhere near stable. Bob was only really paying attention because he and I had traded emails back and forth when he was still developing FeedTools.
You might be interested to know that I just wrote a way-too-big post detailing some aspects of rfeedparser. So, there’s that.
The feeddate() idea is great and I’ll have py2rtime renamed to that in a later release (while keeping py2rtime as an alias for a while).
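For anyone wondering what the Ruby side of feeddate() could look like, here is a minimal sketch. It assumes the tests pass the date as a Python-style 9-tuple (year, month, day, hour, minute, second, weekday, yearday, isdst) and that the time is UTC. The FeedDates module name is made up for illustration, and this is not rFeedParser's actual py2rtime code.

```ruby
# Hypothetical sketch, not the real rFeedParser implementation.
module FeedDates
  module_function

  # Convert a Python-style 9-tuple into a Ruby Time (assumed UTC).
  # Only the first six fields are used; weekday, yearday, and isdst
  # can be derived from the Time object itself.
  def feeddate(tuple)
    year, month, day, hour, minute, second = tuple.first(6)
    Time.utc(year, month, day, hour, minute, second)
  end

  # Old name kept as a thin alias for a while.
  def py2rtime(tuple)
    feeddate(tuple)
  end
end

t = FeedDates.feeddate([2004, 1, 5, 13, 30, 0, 0, 5, 0])
puts t.strftime('%Y-%m-%dT%H:%M:%SZ')
```

In Python the same name would simply return its argument unchanged, so a single test file could serve both implementations.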
The “\unn” and “\unnnn” formats for non-ASCII characters in Python (along with u'' and u"") pose special challenges to anyone trying to work with the tests, and careful thought would have to go into “fixing” them. While the character-encodings gem provides a u'' sort-of-work-alike, it certainly doesn’t interpret “\unn” or “\unnnn” as hex codes.
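To make the escape problem concrete, here is a rough sketch of expanding the four-digit form by hand once a test string has been scraped out of the Python source. The helper name expand_unicode_escapes is hypothetical. It deliberately handles only “\unnnn”, and it ignores surrogate pairs and escaped backslashes, so treat it as an illustration of the gap rather than a fix.

```ruby
# Hypothetical helper: turn literal "\uNNNN" sequences (a backslash,
# a "u", and four hex digits) into the UTF-8 characters they name.
def expand_unicode_escapes(text)
  text.gsub(/\\u([0-9a-fA-F]{4})/) { [$1.hex].pack('U') }
end

# Single quotes keep the backslash literal, as it would be in scraped text.
puts expand_unicode_escapes('caf\u00e9')
```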
There are other Python-specific bits in the tests I didn’t mention in that large post, such as 1 and 0 being true and false, len(), None instead of nil, the use of tuples instead of lists, triple-quoted strings, and the differences in syntax between dicts and Hashes. Pretty much every Regexp in scrape_assertion_string in rfeedparsertest.rb is a possible “issue”.
The None/nil and len() problems can be solved by simply writing up a spec saying “we expect a reference called None that is acted on like ‘nil’ in Ruby and ‘None’ in Python, yadda yadda”. The rest, though, will require more changes to the actual unit tests, similar to the feeddate() idea. All this assumes Mark is up for it, of course.
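As a sketch of what such a spec might ask of a Ruby test harness, something like the following would cover None and len(). The PyCompat module name is invented for this example, and the code assumes the assertions are evaluated somewhere these names are visible. It does not appear in rFeedParser.

```ruby
# Hypothetical shim for evaluating Python-flavoured assertions in Ruby.
module PyCompat
  # The spec's "None": behaves exactly like Ruby's nil.
  None = nil

  module_function

  # The spec's len(): defer to the object's own length.
  def len(obj)
    obj.length
  end
end

entries = %w[first second third]
puts PyCompat.len(entries)
puts PyCompat::None.nil?
```

Note that 1 and 0 standing in for true and false would still need separate handling, since 0 is truthy in Ruby.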
I tried to get hold of him a couple of months ago to figure out exactly which license feedparser was under (or even if it had a name), but it looks like my email fell into a black hole. As a result, I don’t have a good way of reaching him. (Considering how often I see him here, it might be this very comment thread.)
Oh, and if you happen to know anyone who knows anything at all about iconv and how it expects encodings to be written, feel free to point them in my direction. I’m seriously considering writing up a “standard” iconv-encodings package so that rfp can actually work consistently across OS X, Linux, etc. but this is deep dark magic to me. I got as far as trying to follow the Ubuntu/Debian build of glibc to see where everything came together, with no luck. It looks like I might be tilting at windmills.
My bad. I meant to send you a link, Sam, since I knew you’d be
interested, but I was busy getting stuff sorted for my trip to Africa,
and it slipped my mind somehow. Samahani!
Incidentally, it’s not so much that I stopped working on FeedTools as
it is that I started working on a different parser in C instead.
(Which is why I keep cheering anytime someone considers writing a tag
soup or html5lib port for C.) But basically everything code-wise is
on hold until I’m back State-side.
URI::Template 0.08_02 should be hitting CPAN shortly. This release conforms to the latest uri-template spec released this month. I’ve always been interested in portable/generalized test suites. Sam Ruby mentions that he’d like to see...
Nice post, and it works nicely for me on Ubuntu. But my VPS runs CentOS. Have you installed it on a CentOS box? I don’t know the CentOS equivalent of sudo apt-get install libxml-parser-ruby1.8; the package is not found.