It’s just data

Validator.Nu on GCJ

This turned out to be much easier than I thought.

sudo apt-get install subversion gcj
svn co build
python build/ checkout dldeps

The tool I chose to focus on isn’t itself very important; it simply converts HTML5 into XHTML5.  But for it to work at all, essentially the whole of the htmlparser library has to be functional, as do its dependencies (namely chardet and icu4j).
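The shape of such a converter is simple: a parser emits SAX events, and a handler writes them back out as well-formed XML.  What follows is a minimal sketch of that pipeline, not the actual HTML2XML source.  To stay runnable with nothing but a JDK, it uses the JDK’s own XML parser as the event source, which only accepts well-formed input; swapping in htmlparser’s SAX front end (assumed here to implement org.xml.sax.XMLReader) is what would let real HTML5 through.  The class and method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class Html2XmlSketch {

    /** Hypothetical handler: turns SAX events back into XML text. */
    static class XmlWriterHandler extends DefaultHandler {
        private final StringBuilder out = new StringBuilder();

        // Escape markup characters; quotes only matter inside attribute values.
        private void escape(String s, boolean inAttribute) {
            for (char c : s.toCharArray()) {
                switch (c) {
                    case '&': out.append("&amp;"); break;
                    case '<': out.append("&lt;"); break;
                    case '>': out.append("&gt;"); break;
                    case '"': out.append(inAttribute ? "&quot;" : "\""); break;
                    default: out.append(c);
                }
            }
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            out.append('<').append(qName);
            for (int i = 0; i < atts.getLength(); i++) {
                out.append(' ').append(atts.getQName(i)).append("=\"");
                escape(atts.getValue(i), true);
                out.append('"');
            }
            out.append('>');
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            out.append("</").append(qName).append('>');
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // Text may arrive in several chunks; escaping each chunk is safe.
            escape(new String(ch, start, length), false);
        }

        String result() { return out.toString(); }
    }

    /** Parses the input and re-serializes it through the handler above. */
    public static String reserialize(String input) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(false); // keeps qName populated for every element
        // Stand-in event source; an HTML5 parser exposing XMLReader would go here.
        XMLReader reader = factory.newSAXParser().getXMLReader();
        XmlWriterHandler handler = new XmlWriterHandler();
        reader.setContentHandler(handler);
        reader.parse(new InputSource(new StringReader(input)));
        return handler.result();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(reserialize("<p class=\"x\">a &amp; b</p>"));
    }
}
```

Keeping the serializer behind the standard ContentHandler interface means the event source can be swapped without touching the output side, which is also what makes org.xml.sax the natural seam for the C++ work.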

The next task is to repackage htmlparser as a library and to reimplement the HTML2XML tool itself in C++.  Here’s a modest start.  After that, I need to find header files for org.xml.sax and to generate headers for selected classes in htmlparser using gcjh.

SWIG can then use those same headers to generate bindings for a host of other languages.

Meanwhile, headers can be generated for the full DOM, and individual language bindings can be tweaked to support language features such as iterators and to make use of available CSS parsers.  Some of these features should be migrated back to Java, where they can be made available everywhere.
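As one illustration of a feature that could live in Java rather than in each binding, here is a minimal sketch, assuming the DOM in question exposes the standard org.w3c.dom.NodeList: a small adapter that makes a NodeList usable in a for-each loop.  Each language binding would then presumably only need to map this one interface onto its host language’s iteration protocol.  IterableNodes is a hypothetical name; the code targets a current JDK, and I haven’t checked it against GCJ’s older language level.

```java
import java.io.StringReader;
import java.util.Iterator;
import java.util.NoSuchElementException;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical adapter: a read-only Iterable view over a DOM NodeList. */
public class IterableNodes implements Iterable<Node> {
    private final NodeList list;

    public IterableNodes(NodeList list) { this.list = list; }

    @Override
    public Iterator<Node> iterator() {
        return new Iterator<Node>() {
            private int index = 0;

            // getLength() is re-read on each call, since DOM NodeLists may be live.
            @Override
            public boolean hasNext() { return index < list.getLength(); }

            @Override
            public Node next() {
                if (!hasNext()) throw new NoSuchElementException();
                return list.item(index++);
            }

            @Override
            public void remove() { throw new UnsupportedOperationException("read-only view"); }
        };
    }

    /** Helper for the demo: builds a DOM from an XML string. */
    public static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<ul><li>one</li><li>two</li><li>three</li></ul>");
        for (Node li : new IterableNodes(doc.getElementsByTagName("li"))) {
            System.out.println(li.getTextContent());
        }
    }
}
```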