It’s just data

Porting REXML to Ruby 1.9

Unicode changes:

Other language changes:

REXML changes:

Outputs of running bin/suite.rb:

["", "1.9.0", "2007-12-31"]
REXML version =
Loaded suite REXML
Finished in 12.893064488 seconds.

348 tests, 1252 assertions, 0 failures, 0 errors
["", "1.8.6", "2007-06-07"]
REXML version =
Loaded suite REXML
Finished in 34.733291 seconds.

348 tests, 1252 assertions, 0 failures, 0 errors

Ticket, patch, Update: revision

Ruby 1.9 Strings — Updated

My confusion from yesterday was due to a bug, which was promptly fixed — test case, fix.

Now that I understand what is intended, the situation is a lot clearer.  The net result is that any sequence of operations that produce a runtime exception in Ruby 1.9 would also produce a runtime exception in Python 3.0.  Some use cases that are entirely safe will not produce an exception in Ruby 1.9 when they would in Python 3.0.  Such an approach is entirely consistent with a dynamic language.
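To make the Python 3.0 half of that comparison concrete, here is a minimal sketch.  Python 3 refuses to mix text and bytes even when every byte is plain ASCII, which is the kind of entirely safe case mentioned above.  The helper name `concat` is made up for this example.

```python
def concat(text, raw):
    """Try to join a str and a bytes value, reporting what Python 3 does."""
    try:
        return text + raw
    except TypeError as exc:
        return "TypeError: %s" % exc


# Every byte here is 7-bit ASCII, so the operation is "entirely safe",
# yet Python 3 still raises rather than guess at an encoding.
mixed = concat("abc", b"def")

# Stating the encoding explicitly is what Python 3 asks for instead.
explicit = "abc" + b"def".decode("ascii")
```

The exact wording of the TypeError differs between Python releases, so the sketch only relies on the exception type.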


3 + 1 = 2

I’ve got portions of HTML5lib working on Ruby 1.9, enough to pass Mars's unit tests.  My initial reaction to Ruby 1.9’s support isn’t favorable.  I definitely like Python 3K's Unicode support better.  This feels closer to Python 2.5.  In fact, I think I prefer Ruby 1.8’s non-support for Unicode over Ruby 1.9’s “support”.

The problem is one that is all too familiar to Python programmers.  You can have a fully unit tested library and have somebody pass you a bad string, and you will fall over.


Two Steps Forward...

Another version of Ruby, a different set of REXML bugs.

Test case.



No Tweets

Russell Beattie: The stampede of unsubscribers has begun! The unbearable pain of Twittering is too much for them.

I have my own approach.



sudo apt-get install libidn11-dev
sudo gem install idn addressable




Rafe Colburn: Rogers Cadenhead looks at Dave Winer’s long bet with New York Times executive Martin Nisenholtz on whether blogs or the Times would reign supreme by 2007. The winner: none of the above. Wikipedia outranks them both.

Perhaps the requirement to cite sources trumps the requirement to provide credentials.

Yet Another Planet Refactoring

A little over a month ago, I outlined how I would like to see the feed parser reorganized.  I’ve now put a little meat on the bones, in the form of running code.  Not just for the feed parser, but also for Planet.  I also did it all in Ruby, so I named this little experiment Mars.  Warning: this version is 0.0.1.  It just barely runs end-to-end.  Feed it real data, and it will choke on some of it.  But it can now produce partial results.

All in all, I’m pleased with how compact this code is.  If anybody wants to join in on the fun, it is a bzr repository and there are plenty of test cases ready to be ported.


Standards that Matter are Standards that Ship

Fundamentally, Microsoft’s strategy is sound.  Ignore standards that you find inconvenient, and focus on producing and enabling the production of content people want.  While my humble site can’t compete with the likes of Jackass 2.5, I do have a few people who follow my site.  I’ve switched my front page to HTML5 despite the fact that this means that MSIE7 will therefore ignore virtually all the CSS styling rules that apply to the page.  The page validates modulo an acknowledged bug in the validator.


Eventual Consistency

Amazon SimpleDB [via Simon Willison].  Erlang.  Schemaless.  Cool.

Amazon seems to really get the Getting Started should be free aspect of the web, and is clearly targeting the “She has the idea on Wednesday and gets the script working next Monday, and one quarter later, either gives up on the idea or is incredibly rich. Both are good outcomes” developer market.

Meanwhile, Mark Nottingham of Yahoo! is proposing standards for caches which prefer to serve slightly stale content fast in lieu of providing late or broken results.

Update: Keith Gaughan: That ain’t REST. They screwed up the “REST” interface for it exactly the same way as they screwed up the one for SQS and FPS.

REXML and Mangled Text

Rick Blommers: ReXML seems to escape items very nicely when setting values.  But it doesn’t unescape the values with … )

A bare minimum amount of functionality that one would expect from an XML parsing library is the ability to round-trip data.

The two things I have yet to find are where I can SVN checkout the latest code, and how to run the existing set of tests.  I would like to submit new tests which expose the problems I have found so far, and patches to correct these issues.  Ideally in time for 3.1.8.


HTML5 As Viewed By IE8

Dean Hachamovitch: You will hear a lot more from us soon on this blog and in other places. In the meantime, please don’t mistake silence for inaction.

I’d love to see a screenshot of this page using MSIE8.  While supporting HTML5 would be nice, the fact that MSIE7 won’t allow CSS rules to apply to any of the new HTML5 elements will significantly inhibit adoption of this standard.

phpMyId 0.7

Since I last looked at phpMyId, it has progressed from version 0.3 to version 0.7.  A number of changes occurred.  Here’s how I dealt with them.



Steven Lees: Today we published the final v1 spec for Simple Sharing Extensions, under a new name, FeedSync. The new name is a little simpler than the old one (kind of ironic!) and it captures the intent pretty well.

James Snell has some good comments.

HTML5 Deployment Considerations

Lachlan Hunt: HTML 5 introduces and enhances a wide range of features including form controls, APIs, multimedia, structure, and semantics

In the interest of getting practical deployment experience with these specifications, I plan to explore exploiting these new tags on both my weblog and my planet.  Two issues immediately come to mind, and I’m sure I’ll encounter more.


Resource Oriented Registry

Paul Fremantle: fundamentally the approach we have taken is to build a registry/repository based on REST concepts. And as we looked at the REST space, we kept noticing how close the Atom Publishing Protocol (APP) is to our needs, so we’ve made that the public remote API to access the repository. Of course, if you are just browsing the registry, you only need a browser - APP is mainly there to support updating resources.  Of course, using Atom and APP gives some really nice benefits too - like being able to subscribe a feed of new resources that meet your search criteria.

Little Details

Anil Dash: We announced Beacon support on both LiveJournal and TypePad as initial launch partners. But we worked really hard with the Facebook team on one really important detail — making sure our implementations are completely opt-in.  Not to put too fine a point on it, but this was kind of a no-brainer.

Kudos to 6A on getting the more important “unfixable” detail correct.


DIS29500 Comments

Alan Bell: To get the data into the site the documents were opened in Writer, then copied to Calc, tidied up manually (merged cells are evil) then imported to Lotus Notes 8 via the built in Symphony spreadsheet (I had been working on some code to import Calc into Notes so this was easy) exported to XML then imported to WordPress. The import file was just over 2Mb in size. ... The main difficulty in importing the data was smartquotes and em dashes that Word had autocorrected.

Ad Hoc, Situated Software at its finest.  There even are feeds for comments on the comments.

Rob Weir has an analysis of the resolutions proposed to date.

Business and Open-Source

David Shields: having worked for almost five years now as a member of the team that manages IBM’s open-source strategy and its execution, I can claim some expertise in this area.  There are no vast secrets here, no grand plan. Here is the strategy as I understand it, and as I have worked to implement it.

The Right Ones in the Right Order

Leonard Richardson: If you liked RESTful Web Services but thought the words were in the wrong order, you’ll like Services Web RESTful.

Blogger in Draft Support for OpenID

David Recordon: Awesome, see their post! OpenID commenting as a beta feature on Blogger, way to go guys! I just tried it out and as you’d expect, it works great. Also really nice to see another site first accepting OpenID instead of providing it.


1984 + 4K

Damien Katz: Yay! CouchDB has an official IANA port number … This must be how Steve Martin felt in The Jerk when he got his name in the phone book

HTML5 needs a CarterPhone

Brendan Eich: Standards often are made by insiders, established players, vendors with something to sell and so something to lose. Web standards bodies organized as pay-to-play consortia thus leave out developers and users, although vendors of course claim to represent everyone fully and fairly.  I’ve worked within such bodies and continue to try to make progress in them, but I’ve come to the conclusion that open standards need radically open standardization processes.

The W3C HTML Working Group needs a CarterPhone.  Clearly, Brendan is talking about ES4, but the issues he brings up are general.


Bash Here

I’m posting this in case I’m not the last person to realize this.  While I’ve used the Unix sh longer than DHH has been alive, I either never realized or have long since forgotten that it supports here documents.  Example:

ruby <<EOD | sort | uniq -c | sort -n
  Dir['*'].each do |name|
    puts name.split('.',0)[1] || '<null>'
  end
EOD

Full Text Search — SQLite

SQLite is part of Android and Gears.  Despite being under development for over a year, and part of an actively developed code base, and included in Gears, full text search isn’t integrated into the build system just yet.  Because of its non-standard “virtual” tables, it can’t be used directly with encapsulation layers like Python DB-API, but can be used directly using minimal wrappers like APSW.  Such a build also requires some manual steps, but the end result is a single shared library that contains SQLite, FTS, and APSW.

patch | build steps | demo

Meme Tracker in IronPython

Dare Obasanjo: My weekend project was to read Dive Into Python and learn enough Python to be able to port Sam Ruby’s meme tracker (source code) from CPython to Iron Python. Sam’s meme tracker, shows the most popular links from the past week from the blogs in his RSS subscriptions.

More recent code can be found here.  Fetches titles from HTML, handles etags, matches both www. and non-www. versions of a URI.  Handles people who point to things multiple times.  Allows you to group people who tend to all “vote” in bulk.  Note: I consider the alternate link to be a vote too, which gives a small bump to people who post original content vs links.
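Two of those points, folding www. and non-www. hosts together and counting a source only once per link, can be sketched roughly as follows.  This is not the meme tracker's actual code; the function names `canonical` and `tally` and the shape of the input dictionary are assumptions made for illustration.

```python
from collections import defaultdict
from urllib.parse import urlsplit, urlunsplit


def canonical(uri):
    """Fold the www. and non-www. forms of a URI together."""
    parts = urlsplit(uri)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    # Drop the fragment and treat an empty path as "/".
    return urlunsplit((parts.scheme.lower(), host, parts.path or "/", parts.query, ""))


def tally(links_by_source):
    """Count each source at most once per link, however often it points there."""
    voters = defaultdict(set)
    for source, links in links_by_source.items():
        for link in links:
            voters[canonical(link)].add(source)
    # Most popular first; ties are broken by URI.
    return sorted(((len(v), uri) for uri, v in voters.items()), reverse=True)
```

Keeping a set of voters per link, rather than a counter, is what makes repeated links from the same source harmless.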

I’d also recommend that you invest some time into converting from a simple regular expression to a real HTML parser.  You’ll need it anyway for titles.
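As a rough illustration of the parser route, the sketch below pulls a page title out with the standard library's `html.parser` instead of a regular expression.  The class and function names are hypothetical, and a production tracker would more likely use a full HTML5 parser.

```python
from html.parser import HTMLParser


class TitleExtractor(HTMLParser):
    """Collect the text of the first <title> element in a document."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.in_title = False
        self.done = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased, so <TITLE> and <title> look the same.
        if tag == "title" and not self.done:
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self.in_title:
            self.in_title = False
            self.done = True

    def handle_data(self, data):
        if self.in_title:
            self.parts.append(data)


def extract_title(html_text):
    """Return the whitespace-normalized title, or None if there is none."""
    parser = TitleExtractor()
    parser.feed(html_text)
    parser.close()
    title = " ".join("".join(parser.parts).split())
    return title or None
```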

Deconstructing Facebook Beacon

Jay Goldman: On November 6th, 2007, Facebook launched a series of new tools to help advertisers target the 54 million people now regularly using their site. They’re still throwing around a 3% weekly growth rate and have a target of 60 million active users by the end of the year, so it’s not hard to picture the day in the not-so-distant future when hospitals Facebook babies before handing them over and the little bundle of joy comes with a neural implant that pokes their parental units when the diaper is full. [via Simon Willison]

DEX File Format

Michael Pavone: I’ve started another little reverse engineering project. Google hasn’t released any documentation on their new VM so I decided to get some the hard way. Well, hard is relative here. A decompiled Java class is a bit easier to read than a disassembled 68K binary. Anyway, I’ve managed to write some documentation on the dex file format used by the VM. I hope to have some documentation on the actual instruction set used by the VM in a few days

RFC: FeedBurner Namespace Documentation

Does anybody know where I can find any documentation on the feedburner/ext/1.0 namespace?

While the URI appears to be owned by a domain squatter, luckily the domain is owned by the company that became feedburner.

I’d like to make feedburner a namespace known to the feed validator without marking any of the elements as undefined.


From time to time, the subject of whether to use whitelists or blacklists comes up.  As an example, originally when Mark Pilgrim wrote How To Consume RSS Safely (way back in 2003!), he described a list of elements that needed to be blacklisted, and mentioned, almost in passing, that whitelisting may be a reasonable alternative.  Over time, Mark came to realize that there really isn’t any contest: A Whitelist is the best way to validate input.  It basically comes down to a sense of what kind of errors you are willing to tolerate.
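The difference shows up even in a toy sanitizer.  The sketch below is not Mark's code or the Universal Feed Parser's; the element list, the attribute rules, and the names are made up for illustration.  Anything it does not recognize is dropped, so a new and unforeseen element fails closed, whereas a blacklist would let it through.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist, chosen only for this example.
ALLOWED_ELEMENTS = {"a", "b", "blockquote", "code", "em", "i", "p", "pre", "strong"}


class WhitelistSanitizer(HTMLParser):
    """Re-emit only the elements and attributes that are explicitly allowed."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag not in ALLOWED_ELEMENTS:
            return  # unknown element: drop the tag, keep its text
        kept = []
        for name, value in attrs:
            if value is None:
                continue
            # href survives only with an explicitly allowed scheme.
            if name == "href" and value.lower().startswith(("http://", "https://")):
                kept.append((name, value))
            elif name == "title":
                kept.append((name, value))
        attr_text = "".join(' %s="%s"' % (k, escape(v, quote=True)) for k, v in kept)
        self.out.append("<%s%s>" % (tag, attr_text))

    def handle_endtag(self, tag):
        if tag in ALLOWED_ELEMENTS:
            self.out.append("</%s>" % tag)

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))


def sanitize(markup):
    parser = WhitelistSanitizer()
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)
```

This sketch makes no attempt to balance tags or handle inline styles; it only shows which side of the list an unknown input lands on.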


Astral-Plane Characters in Json

In Characters vs. Bytes, Tim Bray mentions the Gothic letter faihu.  Whether such a character will display properly in your browser depends on what operating system you use and what fonts you have installed.  Whether or not you can handle such characters programmatically, however, depends on what programming language you use.
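For one data point, here is how Python 3 and its standard `json` module treat that character (U+10346).  This is only a sketch of one language's behavior; the variable names are arbitrary.

```python
import json

faihu = "\U00010346"  # GOTHIC LETTER FAIHU, outside the Basic Multilingual Plane

# One code point in Python 3, but two UTF-16 code units and four UTF-8 bytes.
code_points = len(faihu)
utf16_units = len(faihu.encode("utf-16-be")) // 2
utf8_bytes = faihu.encode("utf-8")

# With the default ensure_ascii=True, json.dumps writes a surrogate pair.
escaped = json.dumps(faihu)

# With ensure_ascii=False the character is written as-is.
literal = json.dumps(faihu, ensure_ascii=False)

# Either form decodes back to the same single character.
round_trip = json.loads(escaped) == json.loads(literal) == faihu
```

The escaped form is the twelve-character sequence `\ud800\udf46` inside quotes, which is why a language that counts UTF-16 code units will report a length of two for this one letter.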



Kevin Lawver: This was a surreal experience... Dan led a sing-along with a bunch of W3C folks, including Tim Berners-Lee, the inventor of the web, and lots and lots of folks who invented important pieces of it (like CSS, HTML, XHTML, etc). Fun, fun, fun.

Not quite as surreal as finding out that you were one of the subjects.

Dark Side of Postel’s “law”

Simon Fell’s weblog contains the following line:

<link rel="alternate" type="application/atom+xml" title="Simon Fell > Its just code" href="">,, and version 4.1 will fail to pick it up.



com.​google.​android.​xmppService.​IXmppService.​createXmppSession: Creates a XMPP session to the server, using username and password for the login. createXmppSession starts a new XMPP session if there isn’t one for the username, connects to and logs into the GTalk server. If there is already a running XMPP session for the username, then createsXmppSession just returns the running session.

Why can’t username contain an @?

Out of the Frying Pan

Don Box: I have to say that the authentication story blows chunks.  Having to hand-roll yet another “negotiate session key/sign URL” library for J. Random Facebook/Flickr/GData clone doesn’t scale.  Personally, my dream stack would be ubiquitous WS-Security/WS-Trust over HTTP GET and POST and tossing out WSDL

I’d suggest that the root problem here has nothing to do with HTTP or SOAP, but rather that the owners and operators of properties such as Facebook, Flickr, and GData have vested interests that need to be considered.


Making Rights Declarations Easier To Find

Planet CreativeCommons is based on Venus.  Unsurprisingly, given their mission, they visibly highlight the license under which each of the entries are published.  The Universal Feed Parser and Venus take great care to ensure that license and rights information is present in the Atom feeds that are produced, but this is the first time that I’m aware of this data being exposed in the HTML page itself.



Mark Pilgrim: What follows are instructions for building and installing MySQL 5 on Ubuntu. These instructions should work perfectly on both Feisty (7.04) and Gutsy (7.10).


From what I hear, people have had trouble with Leopard and Vista.  By contrast, and like others, I found that the default font for Firefox wasn’t to my liking on one of the three machines I installed Gutsy on.

Caja: Capability Javascript

Ben Laurie: I’ve been running a team at Google for a while now, implementing capabilities in Javascript. Fans of this blog will remember that long ago I did a thing called CaPerl. The idea in CaPerl was to compile a slightly modified version of Perl into Perl, enforcing capability security in the process.

Hopefully like the work of Douglas Crockford [via Patrick Logan], the parser itself is (or will be) written in Simplified JavaScript.

This could be useful, as an option, for CouchDB.  I don’t yet see the value for allowing even a sanitized subset of scripts through the UFP to Venus.


Steven Lees: We will remove the “unpublished” element from the spec, i.e. we will remove sections 1.2.9, 2.6 and all of section 4. We decided that the concept of unpublished belongs at the application level, rather than the base SSE specification. We will include information in the SSE implementer’s guide that describes how applications can implement “unpublished” behavior on top of SSE.

It seems to me that SSE + RFC 5005 complement each other.  RFC 5005 can help you identify which entries have changed, and SSE can help you identify what changes were made to those entries.

If It Hurts When You Do That...

Sanjiva Weerawarana: Are you smart enough to build a RESTful application? … Programming XML in Java still sucks

Patrick Mueller: Moved my content over with a simple matter of programming.

Pluggable Feed Format

Sometime yesterday Jay Young's default feed switched back to RSS 2.0.  The world didn’t end, and not everybody cares about such minutiae, but Jay clearly does.  Jay may be in a minority, but this enhancement would enable Jay and others like him to simply drop in a plugin such as this one, activate it, and be on their way.

The patch does not change the default feed format from RSS 2.0.  Perhaps that could be considered for a release like WordPress 4.0, and a plugin could be provided at that time to enable users to select the venerable RSS 2.0 feed format, but in any case such a change would require a separate ticket, as this patch does not do that.

Matryoshka Dolls

Tim Berners-Lee: HTML is a big community, but there are others communities. Smaller communities are more in need of uri-extensibility than bigger ones.

bzr-feed updated to support bzr 0.90.0

My branch is here.  If all goes as it should, this change should be reflected shortly in the global bzr-feed feed.


Poisoned Cache

For the past month, eight feeds hosted by were not updated on, a victim of a poisoned httplib2 cache.  A victim of a permanent redirect.  The evidence can be found here.  Eventually, such feeds would have been viewed as inactive for 90 days, but luckily in this case I caught the problem earlier.


ECMAScript round-up

Round-up of ES4 discussions for the past few days: fragmenting, supersetting, civility, and secrecy.

Did I miss anything?


MonkeyPatch for Ruby 1.8.6

There is a bug in Ruby 1.8.6 that affects documents with a default namespace (even a vestigial one, like those sported by WordPress weblogs) which prevents non-namespace qualified attribute names from working in XPath expressions.  The following monkey-patch fixes this:


Apache2, https, and Gutsy Gibbon

Ideally, reconfiguring your Apache installation under Ubuntu to support TLS/SSL (a.k.a. https) would be as easy as:

sudo a2enmod ssl
sudo apache2ctl restart

Unfortunately, there are additional steps involved.


Nebulous Recalcitrance

Brendan Eich: The small-is-beautiful generalization alternates with don’t-break-the-web, again without specifics in reply to specific demonstrations of compatibility.

It is interesting how the don’t-break-the-web meme means different things to different organizations: Mozilla, Microsoft.

WordPress, SSL/TLS, and AtomPub

For all the reasons that Joseph Scott described, you really want to access WordPress AtomPub service documents using SSL/TLS.  Unfortunately, if you look closely at the current APE report, you will see both https and authentication warnings.

Ticket 5298 and this patch address this problem.



Dalibor Topic: I think you simply landed in a corner their business plan didn’t foresee

This is an important topic.

RFC: Ideal CouchDB DB Dump Format

There’s a discussion going on in CouchDB as to what an ideal dump format for a CouchDB database would look like.  A CouchDB database is a collection of URIs, and while the content associated with any given URI is often JSON, CouchDB supports the notion of an attachment that could be pretty much anything.

So... how do you dump a database?


CouchDB Round-up

If CouchDB could efficiently compute an ETag† for startkey/endkey type queries using only the index, this could be a big win.  Most shared-nothing applications would simply become a dispatcher, a few views, and a few templates.  The most complicated thing your application need worry about would be when you need to assemble a page using input from multiple views.



Anne van Kesteren: One of my side projects is XML5. Earlier this year I suggested the idea as XML 2.0, but in line with recent “jokes” about HTTP5, SVG5, and CSS5, XML5 makes perfect sense. The idea of XML5 is to provide a revision of XML 1.0, XML 1.1, Namespaces in XML 1.0, Namespaces in XML 1.1, and RFC 3023, that is backwards compatible and introduces HTML-like, although much more sane, error recovery.

Question: should XHTML5 be based on XML5 or XML1?

Warning: brainstorming ahead.  Don’t groan.


Logo Usage Guidelines

Ian B. Jacobs: The cube and cube+'Semantic Web' can be distributed freely. They can be used for derivative works (including used with other imagery and modifications to the cube colors) without permission as long as: The cube shape is not changed; There is attribution of W3C (following some guidelines that we still need to draft).


Happy Birthday, Feed Validator

The Feed Validator has been giving advice for five years as of today.  From a modest beginning of 300 test cases, there now are over two thousand.

My favorite post on this topic during these past five years is Common Feed Errors.  Time to revisit.



Time to buy a new mattress
Read Tim’s sage advice
It arrives on Hallow’s Eve

Liferea - Check it out!

Based on a tip from James Snell, one of the first things I did on Gutsy was

apt-get install liferea

I then subscribed to my own feed, and clicked on my SVG in HTML Momentum Building post.  The SVG image on that post displayed correctly, and the comments were all fetched in the background.


Gutsy on Dual Boot

In round numbers, it took me an hour to download Ubuntu 7.10 via BitTorrent.  About 15 minutes to burn a CD.  Another 15 minutes to install.  I did this by putting a second 40 gig hard drive into a four year old Win XP machine and installing Gutsy there.

I know some people like Virtual Machines for these kinds of things.  With machines as cheap as they are these days, I kinda prefer the real thing.


SVG in HTML Momentum Building

Lots of interesting discussion about SVG in browsers.  Momentum is building towards supporting SVG in HTML5, and that makes me happy.  It is clear that whatever form it takes won’t satisfy everybody.  I’d still prefer that HTML5 support distributed extensibility.


It Pays To Advertise

Joe Cheng: Configuring an AtomPub blog needs to be equally easy. For some reason, people in the AtomPub community don’t seem to like RSD (only Six Apart puts Atom endpoints in RSD). We need another autodiscovery mechanism.

Hmmm.  When I looked at RSD nearly five years ago, it didn’t seem so bad.  In any case, here’s a ticket and a patch to get WordPress to support autodiscovery of AtomPub endpoints.

HTML5-style "Google Suggest"

Anne van Kesteren: Dev Opera just published an article I wrote a few weeks back on request from our new editor, Chris Mills: An HTML5-style "Google Suggest". Thanks to Maciej, Simon, and Johannes for contributing to and implementing the idea of using datalist to emulate Google Suggest.

May I suggest the following patch?


Wordpress Vigilance and Plans

First, behold the benefits of automated testing: TRAC 5180.  :-)

Goals I’d like to set for myself for the next release of WordPress are twofold: get the APE messages to 0 errors and 0 warnings; and clean up the code so that Atom entries are produced in exactly one place and consumed in exactly one place.  (Pete Lacey has indicated that he shares the latter goal and has some additional goals).


Mime Haters Anonymous

James Clark: Overall I think we can do much better than S/MIME by designing something specifically for HTTP.

James' third reason looks like a killer one to me.  One minor caution: There are scenarios where it is convenient for servers to be able to stream responses, and most signing algorithms are designed to accommodate such requirements.  This would tend to indicate that a design that tacks signatures on the end would be preferred.  YMMV.
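The streaming argument can be sketched as follows.  HMAC from the Python standard library stands in here for whatever signing algorithm a real design would pick (it is a shared-secret MAC, not a public-key signature), and both function names are invented for the example.  The point is only that the sender never has to buffer the body: the digest is updated chunk by chunk and emitted last.

```python
import hashlib
import hmac


def stream_with_trailing_mac(chunks, key):
    """Yield each chunk as soon as it is available, then yield the MAC last."""
    mac = hmac.new(key, digestmod=hashlib.sha256)
    for chunk in chunks:
        mac.update(chunk)
        yield chunk
    yield mac.hexdigest().encode("ascii")


def verify(received, key):
    """Check a fully received stream whose final item is the MAC."""
    *body, tag = received
    expected = hmac.new(key, b"".join(body), hashlib.sha256).hexdigest().encode("ascii")
    return hmac.compare_digest(tag, expected)
```

A design that puts the signature in a header ahead of the body would force the server to produce the whole response before sending the first byte.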

Use The Force, Luke!

Yaron Goland: The Emperor standards quietly completely hidden in his black robes while Darth Sudsy, covered in his black carapace, leads in Luke Restafarian in a battle torn uniform, long dreadlocks dragging and heavy fatigue evident in his face and stance. Guards stand by the door at stiff attention.

The hero in this dystopian tale is the spec (especially §3.2.1, §3.3, §) and its trusty sidekick, the Feed Validator.  Note the messages the latter produces on this feed, and think about how much more useful the feed would be to RSS Bandit if these warnings were heeded.


html5lib 0.10

James Graham: html5lib 0.10 is now available for your HTML-parsing pleasure.  html5lib is an implementation of the HTML 5 parsing algorithm, available in both Python and Ruby flavours. The HTML 5 algorithm is based on reverse engineering the behaviour of popular web browsers and so is compatible with the myriad of broken HTML encountered on the web.


Joe Cheng: The Windows Live Writer team is still on track to deliver AtomPub support in the next version, which I am looking forward to immensely. It’s definitely an exciting time to be in the blogging tools space!

The title that the mememe logic in Planet Venus extracts from this text/plain representation is amusing.

RSS Profile Up For Vote

Rogers Cadenhead: We propose that the board endorse and publish the RSS Profile, making it available under a Creative Commons Attribution-ShareAlike 2.0 license so that others can build upon and extend it with their own recommendations.

I’ve taken a first pass at what changes would be required for this profile to be supported by the Feed Validator.


Competent Language Designers

Rick Jelliffe: if you make up or maintain a public text format, and you don’t provide a mechanism for clearly stating the encoding, then, on the face of it, you are incompetent. If you make up or maintain a public text format, it is not someone else’s job to figure out the messy encoding details, it is your job.

I guess it would follow that Python and Perl are competent programming languages.

Secure Business Data Interchange Using HTTP

James Clark: My conclusion is that there’s a real need for a cache-friendly way to sign HTTP responses. (Being able to sign HTTP requests would also be useful, but that solves a different problem.)

Perhaps RFC 4130 would be a good starting point.


Pete Lacey: A better name for SOA, then, might be network-oriented computing (NOC).  This encompasses both WS-* and REST (and most everything else from the socket level up).  We can, if we want, make SOA and resource-oriented architecture (ROA) a subset of NOC.

Kinda.  NOC encompasses REST?  That I’ll buy.  But to say that NOC encompasses protocols which, by design, attempt to abstract away the network?  Well, ... not so much.

Overlapping Circles

Robert Scoble: I read 800 feeds and TechMeme doesn’t miss much

When was the last time TechMeme included a post by Werner Vogels, Steve Vinoski, or myself?  Those are the top three topics of discussion amongst my circle of friends. Clearly the third in this list is a consequence of the fact that this is my circle of friends, but I would argue that so too are the first two.

Perhaps TechMeme doesn’t miss much to Robert Scoble because TechMeme closely tracks to Robert Scoble’s circle of friends — not that there is anything wrong with that.  I seem to have a surprising (though clearly smaller) number of people who track my list too.  Apparently, including Robert Scoble.

Mining Content For Value

James Snell: I need to get a really solid answer to a really simple question: do I parse out the (X)HTML into a hash or leave it as a String. Both are useful in different contexts although the String form is obviously more generic and results in a less complicated JSON serialization. Answer that question and I think this serialization will fall into the “Not terrible” category.

I’m a strong believer in Darwin in these matters.  I believe that the most interesting content is in the, well, content element.  If you guys want to store content as a blob, why not go all the way and store the whole Atom element as a string?  Meanwhile, I’ll continue to pursue data structures that make access to this data drop dead easy.


Stripping Styles

Nick Bradbury: Most RSS aggregator developers (myself included) tackled this problem by completely removing all styles from feed content. Since then, I’ve experimented with stripping only “unsafe” CSS from feeds, and despite Adrian’s claim that doing so requires a lot of work, it’s actually quite easy to do

First, here’s a use case.  Looks much better with style, doesn’t it?

Second, it would be helpful if aggregator authors could share their ideas (or at least point to them) from one place.  I suggest here.

Key + Data

Anant Jhingran: The freebase folks do not reveal much about their scaling.  The scaleout models for google and wikipedia (where partitioning/replication strategies work quite well) do not quite work in such a networked graph (after all, a query on person="anant" with one or two pointer chases would end up pinging a few nodes under any partition model), so the question is, if we have billions of pieces of information in a dense graph, how does the query load on the system scale?

I, too, have found precious little about the internals of freebase, and likewise I’m interested in the question at the end of the above paragraph.  But this post is about the stuff in the middle.


Preventing Duplicate Comments

I’ve had a persistent problem with duplicate posts from people who use Gecko based browsers.  The problem grew worse when I finally relented and had the comment form redirect to the comment you just posted instead of to the front page.
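One common way to catch this kind of double submission, not necessarily the approach taken here, is to fingerprint each comment and refuse any repeat.  The class below is a hypothetical sketch: it keeps fingerprints in memory, where a real weblog would persist them.

```python
import hashlib


class DuplicateFilter:
    """Remember a fingerprint of each accepted comment and reject repeats."""

    def __init__(self):
        self.seen = set()

    def fingerprint(self, entry_id, author, text):
        # Collapse whitespace so a resubmitted form hashes identically.
        normalized = " ".join(text.split())
        raw = "\0".join((entry_id, author.strip().lower(), normalized))
        return hashlib.sha1(raw.encode("utf-8")).hexdigest()

    def accept(self, entry_id, author, text):
        """Return True the first time a comment is seen, False afterwards."""
        key = self.fingerprint(entry_id, author, text)
        if key in self.seen:
            return False
        self.seen.add(key)
        return True
```

Including the entry identifier in the fingerprint lets the same person leave the same short remark on two different posts.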


Cost of Vista

Early reports led me to believe that Dell was offering a minor discount for people who chose Ubuntu over Vista.  I took a look today, and came to the conclusion that the discount is now three times what was earlier reported.  I got the data from here: Ubuntu, Vista.  Here’s a summary:


Techmeme Leaderboard River

Gabe Rivera: Since the Techmeme Leaderboard reflects the reality that both blog-driven sites and traditional sites define today’s news, use it to discover new sources, recommend sites to others, or illustrate where tech news breaks. I hope you find it useful, and if you have a stake in tech reporting, not too infuriating.

This looks more current and better maintained than my previous test bed, so I’m now using it for my Venus test site.  I gather that the master Techmeme list of sources is hand picked by Gabe to represent a given slice of the tech world, and the leaderboard produced represents a snapshot of what those sources tend to be following.  And a mememe list produced across those feeds as sources would therefore be a list of sources to those sources.  It appears to be a rather tangled hierarchy of sources, to say the least.

Registration Update

To all of you who have registered for comments, and gave an XMPP id, you should be receiving new ‘buddy’ requests from an ID I just obtained from  For those who have already sold your souls to the big G, you will be unaffected, but for everybody else the messages will route around the Google complex.


Etag vs Encoding

Learn something new every day.

Apache httpd currently sends the same ETag whether or not a response is compressed, and that is how I would read the HTTP spec.  Apparently, though, it is the behavior of IIS 7, and the apparent consensus of the Apache httpd developers (including one of the authors of the spec), that a compressed response should carry a different ETag value than the uncompressed one.
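A minimal sketch of the "different ETag per encoding" behavior follows.  The `respond` function, the SHA-1 based tag, and the `-gzip` suffix are all assumptions made for illustration, and the Accept-Encoding check is deliberately naive (it ignores q-values).

```python
import gzip
import hashlib


def respond(body, accept_encoding=""):
    """Return (headers, payload) with an ETag that varies with the content coding."""
    base = hashlib.sha1(body).hexdigest()
    headers = {"Vary": "Accept-Encoding"}
    if "gzip" in accept_encoding.lower():
        payload = gzip.compress(body)
        headers["Content-Encoding"] = "gzip"
        # The compressed representation gets its own validator.
        headers["ETag"] = '"%s-gzip"' % base
    else:
        payload = body
        headers["ETag"] = '"%s"' % base
    return headers, payload
```

With distinct validators, a cache holding the gzip variant cannot satisfy a conditional request from a client that only understands the uncompressed one.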

Up vs Out

Werner Vogels: Only focusing on 50X just gives you faster Elephants, not the revolutionary new breeds of animals that can serve us better.

Email addresses your OpenID via DNS

Byrne Reese: this is exactly was OpenID needs: Open iDNS, or “Open Id Domain Name System.” This service would work just like DNS, and would map email addresses to an OpenID provider designated by the owner.

Looking at Jabber recently caused me to see this prior discussion in a new light.  With Google Talk, one’s email-style address is one’s identity.  I just created a second address, one without a account behind it.  And Google Talk and GMail seem to be doing just fine.


Comment Notification via XMPP

Elias Torres: Sam could you add Jabber notifications to the email in the comment form when a new comment has been posted to entry? pleeeeeeeeeeaase.

OK, I’ve added a Register entry to the Nav Bar on the right.  It will allow you to specify your preferences, but I haven’t yet connected it to XMPP.


Rails Application Deployed by... IBM

CNN: The NMAAHC is the first museum website to partake fully of the Web 2.0 social computing revolution. The site is based on cutting-edge, open source programming frameworks such as Ruby on Rails for collaborative website development. It employs concepts such as tags, or keywords, created by the users to help organize the content. As a result, the Museum on the Web is an example of the bottoms-up web, meaning it’s both a product of a site visitor’s participation, and an enabler of creating a community for them. The site runs on IBM System X web and database servers.

Emphasis added.

Wordpress Atom Futures

Matthew Mullenweg: I’m thrilled to announce that Version 2.3 “Dexter” of WordPress is now ready for the world ... if you’re a developer you’ll be interested in: 1. Full and complete Atom 1.0 support, including the publishing protocol.

It certainly is a dramatic improvement.  And it was fun to be a part of the process.  But full and complete?


JSON for Map/Reduce

James Snell: Abdera has always included the ability to serialize Atom entries to JSON. The mapping, however, was not all that ideal. So I rewrote it. The new serialization is VERY verbose but covers extensions, provides better handling of XHTML content, etc. I ran my initial try by Sam Ruby who offered some suggested refinements and I made some changes. The new output is demonstrated here (a json serialization of Sam Ruby’s blog feed). The formatting is very rough, which I’ll be working to fix up, but you should be able to get the basic idea.

Based on the comments, Patrick and Elias do not seem amused.  Guys, I’ve got a use case in mind, and I wonder if you wouldn’t mind helping me?


Tests I’d Like CouchDB to Pass

Basura, other than being a piece of trash, is starting to get functional.  While it doesn’t yet pass all the CouchDB tests, it does pass some tests that I’d like to see CouchDB pass.  These tests are the subject of this post.


JSON Interop

Python’s simplejson, in an apparent attempt to avoid Unicode issues, defaults to encoding all non-ASCII characters using JSON’s \uXXXX syntax.
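The same default can be seen with the `json` module in Python's standard library, which descends from simplejson.  This is only a small illustration; the sample string is an arbitrary choice.

```python
import json

text = "Iñtërnâtiônàlizætiøn"
escaped = json.dumps(text)                   # default: ensure_ascii=True
raw = json.dumps(text, ensure_ascii=False)   # keep non-ASCII characters as-is

print(escaped)
print(raw)

# Both forms decode to the same string, so any conforming parser can read either.
assert json.loads(escaped) == json.loads(raw) == text
```

The escaped form is pure ASCII and survives any transport; the raw form is shorter but depends on the document being served in a Unicode encoding.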


UUID to Last-Modified

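import uuid
from email.utils import formatdate  # replaces rfc822, which Python 3 removed
# uuid1().time counts 100ns ticks since 1582-10-15; convert to Unix seconds
print(formatdate(uuid.uuid1().time // 10000000 - 12219292800))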

Python and DB2

Antonio Cangiano: We now have a working Python driver for DB2 which is currently undergoing internal testing. The driver is similar to the Ruby and PHP ones, which means that you get an advanced and very easy to use API. It also means that if you are confident with the Ruby driver, you will be able to use the Python one in no time.

Oh, and Mac looks like it is coming too; but I don’t do Mac.

Securing WordPress

Joseph Scott: Hopefully everyone takes away two things from this. One, you can’t depend on HTTP basic authentication working. Two, if you aren’t using SSL/TLS then your traffic isn’t secure.

Joel's Strategy

Joel Spolsky: What’s going to happen? The winners are going to do what worked at Bell Labs in 1978: build a programming language, like C, that’s portable and efficient. It should compile down to “native” code (native code being JavaScript and DOMs) with different backends for different target platforms, where the compiler writers obsess about performance so you don’t have to. It’ll have all the same performance as native JavaScript with full access to the DOM in a consistent fashion, and it’ll compile down to IE native and Firefox native portably and automatically. And, yes, it’ll go into your CSS and muck around with it in some frightening but provably-correct way so you never have to think about CSS incompatibilities ever again. Ever. Oh joyous day that will be.


Introducing Basura

If Joe Gregorio can name his framework Robaccia, I certainly can name my database Basura.

Whereas Robaccia builds upon KidGenshi, SQLAlchemy, Selector, and WSGI; Basura builds upon BSDDB, JSON, and WSGI.
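For a feel of how those three pieces can fit together, here is a rough sketch in Python.  It is not Basura's code or API: the `app` and `call` names are made up for this example, and a plain dict stands in for a BSDDB table.

```python
import io
import json

store = {}  # stand-in for a BSDDB table; any mapping of key -> bytes would do

def app(environ, start_response):
    """Hypothetical document store: PUT saves a JSON body, GET returns it."""
    key = environ["PATH_INFO"]
    if environ["REQUEST_METHOD"] == "PUT":
        length = int(environ.get("CONTENT_LENGTH") or 0)
        doc = json.loads(environ["wsgi.input"].read(length).decode("utf-8"))
        store[key] = json.dumps(doc).encode("utf-8")
        start_response("201 Created", [("Content-Type", "text/plain")])
        return [b"stored\n"]
    if key in store:
        start_response("200 OK", [("Content-Type", "application/json")])
        return [store[key]]
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"not found\n"]

def call(method, path, body=b""):
    """Invoke the app in-process with a hand-built WSGI environ."""
    status = []
    environ = {"REQUEST_METHOD": method, "PATH_INFO": path,
               "CONTENT_LENGTH": str(len(body)), "wsgi.input": io.BytesIO(body)}
    result = app(environ, lambda s, h: status.append(s))
    return status[0], b"".join(result)

print(call("PUT", "/doc1", b'{"title": "Basura"}'))
print(call("GET", "/doc1"))
```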



David Ascher: As Mitchell Baker just blogged, and as a press release from Mozilla will announce shortly, I have taken on an exciting new role within the Mozilla world, leading a new organization focused on email and internet communications.  Wow indeed.

Media Types for CouchDB views?

Johan Sørensen: now you can query your CouchDb views in Ruby instead of Javascript

Jan Lehnardt: As I mentioned in the Post Scriptum of an earlier post, JavaScript is not the only language that you can create CouchDB views with. You can now use PHP, too.

Question: instead of Couch.ini specifying the one and only language that views can be written in for this server, could views instead have a media type?


Sebastian Gostchall: [dd-wrt] RC3 out now — The most interesting news might be that the WRT54G v8 and the WRT54GS v7 is now fully supported

This just days after I bought a v7 WRT54GS.  Some may remember dd-wrt as the software that turns your $60 router into a $600 router.

Apparently, this GPL software has Field Of Use restrictions.  Sigh.

One More Step Forward?

Tim Bray: I’m going to have to go back and patch up the code so it doesn’t emit any of those nasty colons and relative URI references that apparently hurt implementors’ fragile feelings.

As Tim continues to update his post with more and more aggregators that already do support these features, I’m gaining hope that some day I can retire the following Feed Validator message: Avoid Namespace Prefix.


ASCII, ISO-8859-1, UCS, and Erlang

Tony Garnock-Jones: Erlang represents strings as lists of (ASCII, or possibly iso8859-1) codepoints. In this regard, it’s weakly typed - there’s no hard distinction between a string, “ABC”, and a list of small integers, [65,66,67]

It is important to realize that Erlang was invented (in 1987) before utf-8 was (in 1992).


Planet Pruning

Phil Wilson: Since I am too lazy to manage my own subscriptions, I was subscribed to Planet Intertwingly. At 269 feeds though, the signal/noise ratio has taken a bad hit (what do you mean Sam doesn’t tailor his blogroll for me personally?) and I’m going to have to actually import the OPML and weed out stuff I’m not interested in. How annoying.

The issue isn’t the number of feeds, but the number of entries.  And some of the people I subscribe to talk a lot, so it is time to prune.  To help with this task, I wrote a little script.
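The idea of counting entries rather than feeds can be sketched in a few lines of Python.  This is not the script mentioned above; the sample data and names are invented for illustration.

```python
from collections import Counter

# Hypothetical sample: (feed name, entry title) pairs as an aggregator might cache them.
entries = [
    ("alice", "post 1"), ("bob", "post 1"), ("alice", "post 2"),
    ("alice", "post 3"), ("carol", "post 1"), ("bob", "post 2"),
]

# Count entries per feed, not feeds: the chattiest sources float to the top.
per_feed = Counter(feed for feed, _title in entries)

for feed, count in per_feed.most_common():
    print("%-8s %d" % (feed, count))
```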


Implementing an Internet Curfew

Purchase and install a Linksys WRT54GS.

Compare your results to the table below.


Augmenting ODF with OOXML elements

Rick Jelliffe: One possibility for the co-existence that hadn’t grabbed my attention until today has probably been obvious to everyone else: when converting from OOXML to ODF just embed OOXML-namespaced elements inside the ODF where there is no direct equivalent.  This allows good round-tripping, doesn’t require ODF to be extended with legacy Office-isms, allows developers who want to support more than the ODF base to do so, gives better fidelity for Office users ... ODF already allows foreign namespace elements. I guess what ODF would need to support this well would be a mechanism to say “This kind of foreign element should be stripped out when its context changes, but round-tripped otherwise.”

You know, that last idea could be handy for AtomPub too...

Dare Takes a Look at CouchDB

Dare Obasanjo: Recently I took a look at CouchDB because I saw it favorably mentioned by Sam Ruby and when Sam says some technology is interesting, he’s always right

Dare’s review of CouchDB is worth a read.  (Update: so are Assaf Arkin's and Damien Katz's responses)  He gets more things right than wrong.  And he doesn’t get things wrong so much as he has a tendency to make unqualified statements that need to be qualified.


Thwarting Entropy

Tim Bray: Check out Sanjiva Weerawarana’s PHP Web services: “a PHP script can integrate with any system over Web services at full fidelity, including security and reliability”. I will restrain my urge to editorialize, but, hey Rails Envy guys: low-hanging-fruit alert.

In general, the Rails guys are heading in the other direction.

SVG on IE via Silverlight Revisited

Toine de Greef: This allowed some interesting techniques like: - SVG on Internet Explorer, without the ASV (Adobe SVG Viewer) plugin required

Cool.  SVG to Silverlight via XSLT.  But, embedding in HTML using comments?  I think I can improve upon that.


Validating Feed History

Mark Nottingham: Feed Paging and Archiving (nee Feed History) has finally made it to a standards-track RFC.

First thoughts on test cases for adding proper support for this to the Feed Validator.


Ascetic Database Architectures

Anant Jhingran: new architectures are successful when they attack a different problem.  Providing an equivalent bit replacement (arguably at a lower cost, as Mike would say) is hardly the dominating argument that would cause people to switch.  Quite simply, as my colleague Curt Cotner points out, database engine technology is a drop in the bucket in terms of the investment it takes to be a full player in database these days. You need APIs/drivers for all the application environments (JDBC,ODBC, OLE, ADO, .NET, Ruby, PHP, Perl, etc.).  You need integration with the application servers (J2EE, persistence layers, XA protocols).  You need system integration (workload management, etc.).  You need tooling to save people cost on the admin side.  And the list goes on.  Ultimately, these AD/Admin issues consume 70% of the IT budget...

While I agree with Curt and you that adding and evangelizing a new API is generally harder than the implementation itself, the solution may very well be one that doesn’t do that.


Open Source Lessons

Matt Asay: I put together a list of ten principles that I’ve gleaned from my open source experience, which I believe can be applied to just about any business. [via Simon Phipps]

I’ve highlighted my favorite three


Atom to JSON with Erlang

atom2json.erl converts a directory of Atom files to a directory of JSON files.  As with most real-life problems, this one has multiple layers.  Add in requirements like coalescing consecutive XML text nodes, and the desire to spawn a separate thread per conversion, and the task seems like a fairly daunting one.  Yet the resulting Erlang program is remarkably compact, clean, and simple.
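As a rough illustration of the per-entry step, here is a sketch in Python rather than Erlang.  The flat mapping and the `entry_to_dict` name are assumptions made for this example; they are not the mapping atom2json.erl uses.

```python
import json
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

def entry_to_dict(entry):
    """Map the simple text children of an atom:entry to a flat dict."""
    result = {}
    for child in entry:
        name = child.tag.replace(ATOM, "")
        # itertext() walks every text node below the child; joining them
        # coalesces consecutive text nodes into a single string.
        result[name] = "".join(child.itertext()).strip()
    return result

doc = """<entry xmlns="http://www.w3.org/2005/Atom">
  <title>Atom to JSON</title>
  <id>tag:example.org,2007:1</id>
  <summary>one &amp; two</summary>
</entry>"""

converted = entry_to_dict(ET.fromstring(doc))
print(json.dumps(converted, indent=2))
```

A real converter would also need to handle attributes, links, and XHTML content, which this sketch ignores.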


Building CouchDB

I really want to play with CouchDB.

Does anybody know a mailing list or IRC channel where the CouchDB developers hang out?


Next WP 2.3 Alpha and Beyond

Pete Lacey: Currently, we expose posts and uploads (media entries).  Once 2.3 is released I hope to add support in 2.4 for WordPress pages and comments among other things.

With this fix, the code now works on PHP4 (evidence).  In 2.4, I’d like to see WordPress support foreign markup, presumably as a sort of attachment.  I also think the code deserves a good bit of refactoring, as there are two places where Atom is parsed (import/blogger.php and wp-app.php) and two places where Atom entries are created (wp-app.php and feed-atom.php).



Mike Champion: the recent book “RESTful Web Services” by Leonard Richardson and Sam Ruby is chock full of pragmatic REST goodness. Just wanted to set the record straight.

You *know* I couldn’t resist linking to that.  :-)

Dealing With Dates

Simon Willison: Django vs feedparser on dates. Some useful tips in the comments. I find Python’s timezone stuff endlessly frustrating: I know it can do what I want, but it always takes me a ridiculously long time to figure out the necessary incantations.

My recommendation is to convert to UTC as early as possible, stay in UTC as long as possible, and convert to local time as late as possible — preferably in JavaScript on the client.

>>> import datetime,calendar
>>> feedparser_timestamp=(2004, 11, 19, 5, 13, 31, 4, 324, 0)
>>> datetime.datetime.utcfromtimestamp(calendar.timegm(feedparser_timestamp))
datetime.datetime(2004, 11, 19, 5, 13, 31)
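A variation on the session above, assuming a Python 3 interpreter: attaching an explicit UTC timezone makes the value unambiguous when it is serialized for a client to localize.

```python
import calendar
from datetime import datetime, timezone

feedparser_timestamp = (2004, 11, 19, 5, 13, 31, 4, 324, 0)

# timegm() interprets the struct_time-style tuple as UTC, unlike time.mktime().
seconds = calendar.timegm(feedparser_timestamp)
aware = datetime.fromtimestamp(seconds, tz=timezone.utc)

# The ISO 8601 form carries its own offset, so the client can convert it.
print(aware.isoformat())
```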


As a learning exercise, I tried converting the Universal Feed Parser to Python 3.0.  I picked it because it is a relatively self-contained code base that I am familiar with, one that is actively in use, and one that has seen the wear and tear of dealing with compatibility (and the need to monkeypatch the occasional bug) of a number of Python releases.


Python 3.0a1

Guido van Rossum: The first Python 3000 release is out — Python 3.0a1. Be the first one on your block to download it!

$ python3.0
Python 3.0a1 (py3k, Aug 31 2007, 21:24:31) 
[GCC 4.1.2 (Ubuntu 4.1.2-0ubuntu4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> print(len('Iñtërnâtiônàlizætiøn'))
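The point of that one-liner is that Python 3 strings count characters rather than bytes.  A small sketch of the distinction, using the same test string:

```python
text = "Iñtërnâtiônàlizætiøn"
encoded = text.encode("utf-8")

print(len(text))      # number of characters
print(len(encoded))   # number of UTF-8 bytes; each accented letter takes more than one

# Decoding the bytes gives back the original string.
assert encoded.decode("utf-8") == text
```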



Chris Wanstrath: Erlang’s Mnesia database is something like what I want: you write your queries in plain Erlang and they are translated into Mnesia-queries by walking the parse tree.  Nice trick, but listen up: Ruby has a parse tree, too, and we can get at it pretty easily thanks to ParseTree. So, we do.  Introducing Ambition. [via Stefan Tilkov]

That does look sweet!

WordPress 2.3 AtomPub

Thanks to the hard work of Elias Torres and Pete Lacey WordPress 2.3 beta 1 has significantly upgraded support for the Atom Publishing Protocol.

The apptestclient and APE may also be used to verify your installation.


IPC with Erlang

Previous status: I’ve shown Erlang code that sends a Jabber message.  I’ve shown Erlang code that parses a planet Venus generated Mememes feed.  Now, let’s consider the problem of sending notifications for new items.

Typically, solutions for problems like these require some level of persistence: a database, serialized results, or the like.  But each of these options requires additional maintenance commitments, and I’m looking for something with zero footprint and “just works”.


Parsing Atom with Erlang

A simple program for parsing memes.atom.  Clearly dumping XHTML fragments to stdout isn’t ideal (perhaps XHTML-IM?), and you wouldn’t want to dump every meme on every run, but those are problems for another day.


Greg Stein Mugged

Kevin Burton: Last Friday night Greg Stein was mugged and seriously injured outside of his home in Mountain View, CA.

Greg certainly has had his share of bad luck lately.  Luckily, however, he has some amazing friends who are rallying together to set things right.  If you feel so inclined, please donate to the cause.  Any overflow will go to Greg’s favorite foundation.

XHTML-IM over TLS via Erlang

jabberdemo.erl constructs and sends an XHTML-IM message over TLS using xmerl and Erlang.  Tested with gaim and ejabberd.

Longer term, this needs to be split up into a module which spawns threads and exports separate interfaces for things like authenticating, sending messages to a server, and registering callbacks for presence changes.


Lean Languages and Libraries

Russell Beattie: I’d even speculate that the recent interest in Erlang is driven by these same forces. The same people who swore they’d never use Java again if they could help it are looking for a good replacement for it on the server. The one spot where the JVM has no real peers is in long running server-side processes that need to execute hard-coded business logic as scalably as possible. Despite the fact that the JVM is a resource hungry beast that’s ponderously slow to start up and happy to eat as much memory as you can throw at it, it’s still the only real game in town for this type of application... for now. The reason people are looking at Erlang is not because its beautiful syntax, great documentation, or up-to-date libraries. Trust me. It’s because the Erlang VM can run for long periods of time, scaling linearly across cores or processors filling the same niche that Java does right now on the server.

First, great article.  I encourage everybody to read the full thing.  I also like Brian McCallister's follow up.


Tomboy Blogposter

Robin Sonefors: I  now call it Tomboy Blogposter, since it really isn’t just for Wordpress, even though that continues to be my primary testing platform.

Another client to test with Wordpress 2.3.



Jonathan Schwartz: JAVA is a technology whose value is near infinite to the internet, and a brand that’s inseparably a part of Sun (and our profitability). And so next week, we’re going to embrace that reality by changing our trading symbol, from SUNW to JAVA.

Intermittent IE failures

Ziv Caspi: None of the pages on your site can be viewed by IE (except for the Atom feed).

This is the second report I’ve gotten on this.


Interesting Start

David Orchard: The TAG has reviewed the proposal in HTML5 and Distributed Extensibility.  In short, we believe it is a very interesting start of a proposal for stronger support for distributed extensibility on the web in the HTML language.  We hope that the Working Group will give it and it’s natural subsequent refinements or similar alternatives very serious consideration.

For the moment, this issue hasn’t even been added to the list of issues that “Ian has marked as needing to be dealt with”.  I hope that this can be corrected.

Forbes’ Seventh Law

Reg Braithwaite attributes this law to Dennis Forbes:

The constructiveness of your criticism is in inverse exponential proportion to the size of the audience.

[via Stefan Tilkov]

Update: the exception that proves the rule.

Seeking to Broaden the Conversation?

Tim O’Reilly: In the first era of the computer industry, lock-in was provided by hardware; in the second era, it was provided by software; today, it is provided by centralized databases driven by winner-takes-all network effects.

If only there were a Foundation which concerned itself with Freedom in all venues Electronic...


Bill de hÓra: Social graph aggregation and fluidity allows for better cross-selling. All those recommendation algorithms of the form “you like/bought x and y likes/bought x, you and y might have something in common” work better with larger data sets. Especially if you can jump verticals - such as connecting data to facebook. So it’s gonna happen one way or another.

Dare seems to think that the root problem is oppression by the “man”.  In this case, a 23 year old.  Brad seems to view this as a technical problem.


Agents for Change

Azubuko Obele: Because everybody ends up managing their own messaging solution. Now every application isn’t complete until it can send/receive IMs.

25 years ago, pretty much everybody was talking about a new thing (or at least, it was new to them).  It went by a name...


Pattern Matching

Simon Johnston: The difference I see is that stackless is an enabling feature of the VM, that we will still need the language-level primitives for message send/receieve I see in Erlang and then library support for managing local and remote distribution.

The “secret sauce” is pattern matching.  Things like “if these conditions are met, this code fires, with the following variables set”



Bill de hÓra: I’m wondering how would one produce a URL space for a blog style archive, using Servlets+JSP, and do so in a way that isn’t a CGI/RPC explicit call?

Perhaps via URL Rewrite Filters?

Long Bets Apologia

My Long Bets post attracted some interesting reactions.  A number of supporters of various items in my list, as well as a few pushbacks.  There’s not much to say about the former, except thanks for the validation.  The remainder of this post will deal with the latter.


Long Bets

Jerry Cuomo: I see several common threads:

Twenty eight months ago, this bordered on heresy.  And if that didn’t, this certainly did.


Comparing Purifiers

Mark Pilgrim: <> is the one to beat.  It passes these tests: <>

I’m not convinced.  Perhaps by comparing this to html5lib's sanitizer, both can be improved.


Erlang: First Impressions

A brief introduction to Erlang syntax, enough to make the concurrency section of the Getting Started manual comprehensible.  Once that is mastered, one can move on to Yaws, ErlyWeb and Mnesia (overview).


Sending XHTML over Jabber

At times, sending something more than plain text is desirable.  XEP-0071 XHTML-IM provides for that with Jabber.  And sending such XHTML enriched messages with xmpppy turns out to be fairly straightforward.  In fact, I’ve now set up my weblog so that I get notified whenever I’m online and a comment is made.  Here’s how it works.
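To show what such a message looks like on the wire, here is a sketch that builds the stanza with Python's standard ElementTree.  It does not use xmpppy and does not send anything; the `xhtml_message` helper and the address are made up for this example.  The two namespace URIs are the ones XEP-0071 defines.

```python
import xml.etree.ElementTree as ET

XHTML_IM = "http://jabber.org/protocol/xhtml-im"
XHTML = "http://www.w3.org/1999/xhtml"

def xhtml_message(to, plain, strong_text):
    """Build a <message/> with a plain-text body plus an XHTML-IM alternative."""
    message = ET.Element("message", {"to": to, "type": "chat"})
    # Plain-text fallback for clients that do not understand XHTML-IM.
    ET.SubElement(message, "body").text = plain
    # XEP-0071 wrapper element, holding an XHTML <body/>.
    html = ET.SubElement(message, "{%s}html" % XHTML_IM)
    body = ET.SubElement(html, "{%s}body" % XHTML)
    p = ET.SubElement(body, "{%s}p" % XHTML)
    ET.SubElement(p, "{%s}strong" % XHTML).text = strong_text
    return message

stanza = xhtml_message("someone@example.org", "New comment", "New comment")
print(ET.tostring(stanza, encoding="unicode"))
```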


American Inflight Broadband

American Airlines ... will provide passengers with a high-speed Internet connection, VPN access and e-mail capabilities through Wi-Fi-enabled laptops and PDA devices. The system has the ability to adapt as technology evolves. The technology will be available in all classes of the B767-200 aircraft for a fee. If the connectivity solution is successful, it could be extended to the rest of American’s domestic fleet [via Simon Phipps]

Personal Jabber Server

The following started out as an exploration of Erlang, but the side trip has proven interesting enough to merit its own entry.  Accordingly, here are notes regarding the installation of a personal (or “workgroup”, if you prefer) Jabber server on a home LAN running Ubuntu Linux.  Beware, specific user and host names are filled in, as well as a dummy password; adjust as required.


Interoperability and XSS Mitigation

Rob Sayre: Come to think of it, we might want to standardize similar policies for restricted HTML parsing. There’s even a W3C mailing list working on this stuff. Turns out mail clients have the same issues that feed readers do. And Google Reader is just one example of a website that has this problem. Why can’t browsers borrow this policy from email clients and feed readers, and allow site authors to activate it? That way, sites wouldn’t get burned by faulty markup sanitization.

I’ve created Sanitization Rules.  As it is a wiki page, free form additions and refactorings are welcome.

Joe Takes the Road Less Traveled

Joe Gregorio: I start at Google at the end of this month

In selecting a title for this post, I agonized briefly over changing Less to More.  After all, Google seems to be hoovering in all of my friends, one by one.

Then I reread both Ch-ch-changes and Robert Frost’s original, and I believe I made the right choice.

WordPress, AtomPub, and PHP5

Matt: I totally trust Sam, and if he has time to put the stamp on it then I could fast track it.

I only am interested in investing my time on PHP5.



Tim O’Reilly: One of the important truths of Web 2.0 is that it ain’t the personal computer era any more, Eben Moglen’s arguments to the contrary notwithstanding. A lot of really important software can’t even be exercised properly without very large networks of machines, very large data sets, and heavy performance demands. Yahoo! provides all of these. This means that Hadoop will work for the big boys, and not just for toy projects.

This notion deserves a separate name.  Perhaps even a really awful one.



Pete Lacey: appfs is a utility that can mount remote resources exposed via the Atom Publishing Protocol (Atompub) as a local filesystem. [via Pete Lacey]


HTML5 and Distributed Extensibility

Since the workgroup demands use cases for any proposed new feature, I will provide one up front: this feature’s use case is to enable features without use cases.  But before I proceed, it would be helpful to review a bit of background.


CSV Subscription Lists

Alf Eaton: would it be possible to have CSV as an input format for the list of feed subscriptions in Venus? That way you could use Google Spreadsheets to collaboratively manage the list of feeds
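A sketch of what that conversion might look like, in Python.  The CSV column names (`url`, `name`) are assumptions, and the output follows the Planet-style convention of one ini section per feed URL; this is not Venus's own CSV reader.

```python
import configparser
import csv
import io

# Hypothetical export from a shared spreadsheet; the column names are assumptions.
csv_text = """url,name
http://example.org/atom.xml,Example Blog
http://example.com/feed,Another Blog
"""

config = configparser.RawConfigParser()
for row in csv.DictReader(io.StringIO(csv_text)):
    # One section per feed, keyed by its URL, with the display name inside.
    config.add_section(row["url"])
    config.set(row["url"], "name", row["name"])

out = io.StringIO()
config.write(out)
print(out.getvalue())
```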

That neatly solves a number of issues.


Burning Threads

Stephen O’Grady: If I want to output to Atom from WordPress then, I have two problems: first, outputting the comment counts to the feed, and second getting FeedBurner to recognize the thread elements. Problem one is easily solved, from what James Snell tells me. Problem two is more complicated. My choices seem to be a.) sacrifice my comment count, b.) write a FeedFlare element that will handle RFC 4685, or c.) persuade someone at FeedBurner to support it

I must say that the out-of-the-box experience here is a bit suboptimal.


Agile Financial Publishing

Tim Bray: Why Digital Signature? · This idea was first proposed by James Snell, and it’s a good one.  Mind you, the benefits are a little bit theoretical, since no feed-reading clients that I’ve seen actually check a digital signature.  The argument for this is similar to that for TLS; a bad guy who could somehow insert a fake press release into the feed could make zillions by gaming the share price.  A verifiable digital signature would let someone reading the feed know that the news in it really truly did come from Sun.

From busted to valid to best practices, all in a little over ninety days.  Kudos.


Where the Sun Don’t Shine

Alan Zeichick: Can you confirm or deny the accuracy of Intel’s comment, that the spec lead told Intel that Sun will not include field of use restrictions in the Java EE 6 licenses? Can you comment on whether Sun stands behind what the spec lead allegedly told Intel?

Update: Intel's comments on JSR 317 and JSR 318, with emphasis added:


Venus Administration

Steve Dibb: It has long been a royal pain in the butt to manage planet’s files because it is essentially one large .ini file where you have dozens of entries. I’m first going to write a frontend that I can use to automatically generate the .ini files for each user, instead of all in one global file, and store them in a small database. That way, making minor changes will be a simple feat.  For Planet Larry, I’m going to take it a step further and let users manage their feeds themselves. I’ll have an entire user authentication system where they can login, set their feed URLs, choose their language, set their location, etc. so that Alex or myself don’t have to do it all.

I have a few suggestions


Persai Feedcorpus Status

Kyle Shank: I present to you the Persai feed corpus: 118,254 feeds of pure greatness.

Let’s check on the status of these URIs.



James Tauber: This afternoon, I started django-atompub at Google Code.

Maps 'n Data

Shelley Powers: Frankly, this is the type of stuff that puts a foolish grin on my face.

Mine too.

Information Access Patterns

Tim Bray: Web App Performance · Everything anybody knows on the subject, on one screen. Study and grow wise.

As luck would have it, I was just collecting a set of related links.  Here’s a Tab Sweep of my own:


Monkey Business

Brendan Eich: My Ajax Experience West keynote covers a lot of ground, with slant-wise truth telling the over-arching theme. Mozilla believes in fairly radical open source action, including open strategy. In that spirit, three new projects

That’s a compelling story.  Combined, they describe a strategy to team (either directly or indirectly) with Microsoft, Mono, and Adobe to enable every browser and every device that runs Flash to also be able to run applications written in Python, Ruby, and the latest version of JavaScript.


First Impressions

Jonathan Schwartz: to get the latest updates directly from Sun, be sure to subscribe to our RSS feeds.

You know, people use the term RSS like Kleenex.  As long as Sun isn’t using RSS 2.0 to disclose financial results, it will probably be OK.  Let’s take a look


JanRain python-openid-1.2.0 and 404 on openid.server

I got a report from Bob Aman that my “blog broke”, along with the following traceback


The End of the Beginning

Tim Bray: Atom is done.  Now the editorial processes grind away and eventually the official specification of the Atom Publishing Protocol will be an RFC substantially identical to draft-ietf-atompub-protocol-17; it’ll join RFC4287 as the official products of the IETF Atompub Working Group

Next up: a F2F interop event in Tokyo.  Followed by an online interop event in August.  There has even been some talk of a September interop event in the bay area.

Atom 1.0 via FeedBurner

Antonio Cangiano: If you prefer Atom 1.0 over RSS 2.0 (you should), this brief post will tell you how to migrate to FeedBurner and Atom.

That brings to 32 the number of Atom 1.0 feeds that I am subscribed to which are served from  I haven’t counted, but a number of others take advantage of the free MyBrand service.

Feed Folies, Summer of 2007 Edition

Robert Scoble: what really is cooking here is that RSS has been moved to big companies to control.

Apparently a previous draft said “stolen”.

Whether Scoble meant actively stolen, or simply moved, either implies that the spec isn’t where it used to be.  A quick Google, Yahoo!, or Windows Live search reveals that it still is where it has been for over 4 years.

Oh, and as to the spec “clarification” that was recently made to the alternate specification that also happens to call itself RSS 2.0?  FeedBurner’s CTO voted against it.

Odorless, Colorless and Toxic

Eric Siu: which file would i edit to remove the blank line? ... seems that a lot of them are having similar problems, but there is no solution yet

Unfortunately, the people who are subscribed to feedvalidator-users don’t seem to know the answers to questions such as these.  I know I don’t.  But perhaps one of the readers of this weblog does.



Jeff Hodges: I was recruited after Bob Aman of FeedTools fame saw me hyping my translation of Mark Pilgrim’s FeedParser from Python to Ruby, and thought it was pretty good.  The translation, of course, is called rFeedParser and it really is pretty good.  I’ll have a post on that soon.  First, I want to fix the silly options bugs that I was turned on to a little while ago.

I’m not sure how I missed this before


Information Access Patterns

Anant Jhingran: what are the patterns that will emerge for information needs of Web 2.0 class of applications?

Two words: Pull and MegaData.

Much as we watched with amusement in the late 80’s and early 90’s while the PC’s reinvented mainframe operating systems, and thought to ourselves that any day now they will discover Virtual Machines; we seem to be in a period where the web is rediscovering Data Management, and thinking to ourselves that any day now they will discover Data Warehousing, though this time it will be without a fetish for Data Integrity as in the Web 2.0 world one size does not fit all.

The Fremantle Correction

Paul Fremantle: Another software problem that can never be solved by adding another layer of indirection is providing a simple, transparent and easy-to-use code [via Stefan Tilkov]

Gating Issue

The Apache Software Foundation has received no official response from Sun regarding the open letter mentioned above, other than a polite acknowledgment of receipt.


Popular Sovereignty

Matt Mullenweg: PHP core has never shown any particular regard for its biggest apps, as evidenced by the above bug and others, so I’m not sure why we should go out of our way to promote their upgrade. [emphasis added]

Approximately eight years ago I submitted an outline of an approach to integrate PHP4 with Java.  The response was (in essence, I can’t find it at the moment) “here is your CVS account”.  This fundamentally changed my notion of us and them.

Wordpress + Flash + RSS

influxproject: Please, my feed don´t appears as validated can somebody help me? This is the url: [link]

I’m at a loss as to what to recommend.  Can any reader handle this feed?  Is there something this user can configure or install to fix this problem?

A quick scan indicates that this defect may be related.

Courage: Use XML as a testing tool

Olivier Théreaux: Use it for Quality Control! Unlike HTML engines, XML processors are supposed to be very strict with the syntax they accept.

Let’s just see how strict XML processors really are.  The feed is well formed.  It is even valid Atom 1.0.  Unfortunately, the summary is escaped too much, and parts of the content are not escaped enough.
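To make the two failure modes concrete, here is a minimal sketch in Python.  The sample strings are made up for illustration and are not taken from the feed in question, and it assumes the text constructs involved are of type="html"; the feed's actual defects may differ in detail.

```python
from xml.sax.saxutils import escape, unescape

# Hypothetical summary markup; not taken from the feed discussed above.
markup = "<b>AT&T</b>"

# Escaping once is what an Atom text construct with type="html" expects.
once = escape(markup)

# Escaping a second time is the "escaped too much" case.
twice = escape(once)

print(once)
print(twice)
print(unescape(once) == markup)  # the consumer recovers the markup
print(unescape(twice) == once)   # the consumer is left with visible entity text

# The opposite problem: plain text that should be displayed literally.
text = "1 < 2"
as_html = escape(text)      # HTML source for that text
in_feed = escape(as_html)   # what belongs inside type="html" content

# Placing as_html in the feed directly is still well formed XML, but after
# XML parsing the HTML layer receives a raw less-than sign.
print(unescape(as_html))
print(unescape(in_feed))
```

Both mistakes survive a well-formedness check and a schema check, which is the point: strictness at the XML layer says nothing about the layer carried inside it.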

Oh, and the RSS feed?  Multiple choices.

Just Say No

JSR 190: Denied.  JSR 280: Denied.

Neither appears to be directly related to the ASF Open Letter, but it does appear that the general consciousness as to the rampant proclivity of spec leads to create egregious JSPA violations has been raised.

It is time for a new version of the JCP.  One in which license terms are declared up front, so as to not waste anybody’s time.  One in which the specification process itself is “default open” as opposed to “default closed” as it originally was, and not up to the unchallenged whim of the spec lead as it currently is.  One in which access to TCKs does not require an NDA; after all, the harness is now open source, so that particular excuse is gone.

All You Need Is Love

Clay Shirky: you will make more accurate predictions about software and — in this web drive world — about services if you ask yourself not “what’s the business model?”, but rather “do the people who like it take care of each other?”.  That turns out to be the better predictor of longevity.

Open NonDisclosures?

Simon Phipps: I’m with Dalibor, and asserting that regularly FWIW. On the subject of NDAs, note that it’s not neccessarily Sun that requires for them (it can easily be a requirement for participation by one of the other EG members) and I think it’s a mistake to tie this issue to the Harmony JCK issue.

I can’t conceive of any way in which a package can be “open source” and require an NDA.

Which Way Is Up?

WRAL: American Airlines said Thursday that it would upgrade its flight from Raleigh-Durham International Airport to London by moving the flight from Gatwick to Heathrow Airport.

This can’t be good.

Bad Spot

Andrew C. Oliver: This leaves Apache in a bad spot. Continue to participate in a process that restricts a collaborative, consensus based development process by forking its communities into NDA-haves and have-nots and potentially prevents Apache from licensing its software as something meeting the Open Source Definition or disavow itself of this process and leave its projects in an unblessed state (they can continue to implement the specs as released to the public but not participate in their design) or potentially a third fragmentary response where restrictions are accepted for some projects (particularly those that do not restrict “Field Of Use”).

Should the ASF vote “NO” on a JSR Review ballot that establishes Sun as the spec lead on yet another JSR?  Oh no, says Henning Schmiedehausen, as that would be going nuclear.

Should the ASF cease the practice of requesting TCKs under condition of NDAs?  Oh no, says Bill Barker, as that would be taking the nuclear option.

I encourage everybody to read the comments that Andy’s post has attracted, and think about what going nuclear really means.

GPL Compatible

Richard Stallman: GPLv3 is now compatible with the Apache 2.0 license

With apologies to Inigo: You keep using that word — I do not think it means what you think it means.

The GPL V3 license is compatible with the ASF V2 license in precisely the same way that blood type AB is compatible with blood type O.

Note: I’m not saying that’s a bad thing.  In fact, this change will positively benefit many.  I just think that expressing this complex concept with a word that has multiple (dare I say it) incompatible meanings will only promote confusion.

Publishing a Blog From a mod_atom Store

Seth Gordon: Planet was designed to crawl all the feeds on the blogroll and produce some appropriately formatted HTML page with all their contents; you could just set it up so it only read your own blog’s mod_atom feed, make some appropriate template, and voila!

That would certainly cover the front page, but that’s about it.

Fortunately, there are bits and pieces that cover the rest.


a2enmod atom

Tim Bray: if the ASF ever got interested I have the go-ahead to sign over whatever to whomever

Hey, I just might be “whomever”.  :-)

This post should show up shortly on Planet Apache; hopefully, that will flush out a few others with interest.

Making the Web Safe for application/xhtml+xml

Rami Kayyali: Funny how Google (a leader in the Web space) doesn’t recognize Intertwingly’s (a leader in Web standards) Content-Type.

If you check again today, you will see that this situation is changing.  As Google re-crawls my site, it is starting to recognize the content type.

JavaFX SVG Translator Preview

Chris Oliver: It’ll take a few more days before we post the code to OpenJFX, but in the meantime here’s a preview of the latest version of our SVG to FX translator. The translator converts an SVG document into a single JavaFX class.

Here are a few tests that produce unexpected results:


Miguel de Icaza: The past 21 days have been some of the most intense hacking days that I have ever had and the same goes for my team that worked 12 to 16 hours per day every single day --including weekends-- to implement Silverlight for Linux in record time. We call this effort Moonlight.

It looks like my little demo was used to help debug.  Sweet!


M. David Peterson: WebW3S is Microsoft’s answer to a RESTful web publishing protocol. In many ways it attempts to tackle the same problems solved by the Atom Publishing Protocol.

I took a look at “Web Structured, Schema’d & Searchable”, and found Structure, but was unable to find the Web, Schema, or Search.

But let me first back up.


Worth Watching

Doug Purdy: I have pulled down my old content (both Radio Userland and DasBlog), written a “bare-bones” APP implementation (which you are viewing now), and will attempt to focus on more serious topics, rather than my usually strain on banal postings.  We’ll see how it goes.

Resources: APP Test Suite, APP Test Client, APE, Feed Validator

Safari - now with SVG

Dave Walker: Safari 3.0 Beta (522.11) likes your inline SVG, Sam. :)

Verified... on Windows, no less.


Simple Sharing Extensions, version 0.93

Steven Lees: I posted an new version of the spec to MSDN yesterday. Some of the significant changes since the original version include:

I’ve begun updating the Feed Validator test suite, and asked a few questions.

Python Pain Points

Bill Venners: What Are Your Python Pain Points, Really? [via Bill de hÓra]

I second Bill’s first two items, but I have to add Python’s “new” module to the list.  Compare Python vs Ruby.

To be fair, this is very much related to Bill’s fifth point.

Update: The Python code in question has been dramatically improved based on suggestions by Bjørn Stabell.

Update: The updated Python code has a very subtle bug, one that could very easily qualify as a pain point. Details.
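The listings being compared are not reproduced here.  As a rough illustration of the sort of job the “new” module was used for, here is a small sketch that attaches a method to a single instance at run time.  The Feed class and shout function are hypothetical, and since the “new” module was removed in Python 3, the sketch uses types.MethodType in its place; it is not the code under discussion.

```python
import types


class Feed:
    """Hypothetical stand-in class, used only for this illustration."""

    def __init__(self, title):
        self.title = title


def shout(self):
    # An ordinary function that expects to be called as a method.
    return self.title.upper()


feed = Feed("intertwingly")

# Bind the function to this one instance; other Feed objects are unaffected.
feed.shout = types.MethodType(shout, feed)

print(feed.shout())
print(hasattr(Feed("other"), "shout"))
```

In Ruby the equivalent is a singleton method defined directly on the object, with no helper module involved, which is roughly the contrast being drawn.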

Life on Linux

Mark Pilgrim: One year ago, I switched to Linux for a variety of reasons revolving around software freedom, choice, and data preservation.

While my Windows desktop spends most of its time in hibernate mode, it does get used a few times a week


Planet Mozilla moves to Venus

Justin Fitzhugh: upgrade. A new version of Planet (Venus) will be deployed. No downtime is expected.


Anne Thomas Manes: Notice that the URL contains a method name (getInfo) and query string containing the method parameters. This is NOT REST!

It is statements like the one I quoted above that temper my enthusiasm when I hear that Burton sees the future of SOA and it is REST.  Until we can agree on what the term REST means, we’re just replacing one meaningless buzzword with another.


Mosquito Netting

This is not about the gorgeous new pond, at least not directly.  Though you are welcome to marvel at it.  I personally find the setting to be very serene, relaxing, and... dare I say it... restful.

No, this is about the porch behind the pond.  At the moment, in addition to some temporarily displaced yard art, there are two new hammock chairs.  Both have been slept in.

Now I am considering adding mosquito netting.  This place sounds promising, though I welcome other recommendations.


Atom + LDAP

Trey Drake: Why is an OpenDS based Atom server interesting?

wow.  [via Dave Johnson, Bill de hÓra, and James Snell]

Near Miss

Scott Adams: The Cheesecake Factory is a great business model, but if you take your wife there for your 25th wedding anniversary, you might not reach your 26th.

While we did go to the Cheesecake Factory for Valentine’s Day, luckily we went to the Angus Barn for our 25th anniversary.

Whew.  That was a close one.  Perhaps we will make it 26 after all.

Link Bait

You can’t write a review like this one, and not expect the authors to link to it.




HTML5 Sanitizer

A while back, I commented that I would likely backport Jacques’s sanitizer to Python.  I still haven’t gotten around to that, but I have ported it to html5lib (source, tests).



Rogers Cadenhead: If Randy wants to change elements to “elements and attributes” as a spec clarification, I’m comfortable solving the problem in that manner.

Positively breathtaking.


Ruby HTML5 Parser Tests Pass

Thanks go out to Tim Fletcher for fixing the remaining html5lib unit test bugs.

What does this mean?  Essentially it means that the Ruby implementation is approaching functional parity with the Python implementation, where the accuracy of the preceding statement is a function of the unit test code coverage.


Shallow Rest

Bill de hÓra: let’s take the ws guys seriously now they’re recanting. Sigh.

Any day now, they might even discover ETags.

That question wasn’t chosen because it identifies a random HTTP header that happens to have a very practical benefit.  That question was chosen because it identifies an HTTP header that happens to have a practical benefit AND requires one to pierce layers upon layers of “value-add” infrastructure to implement correctly.
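For anyone who has not run into them, the practical benefit looks roughly like this.  What follows is a minimal sketch of the server side of a conditional GET; the make_etag and respond helpers are hypothetical names, hashing the body is only one possible way to produce a validator, and the comparison ignores weak validators and comma-separated lists in If-None-Match.

```python
import hashlib
from typing import Dict, Optional, Tuple


def make_etag(body: bytes) -> str:
    """Derive a quoted validator from the response body (one possible scheme)."""
    return '"%s"' % hashlib.sha1(body).hexdigest()


def respond(
    body: bytes, if_none_match: Optional[str] = None
) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, payload) for a GET, honoring If-None-Match."""
    etag = make_etag(body)
    if if_none_match == etag:
        # The client already holds this representation: send no body.
        return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag}, body


# First request: full response, and the client remembers the ETag.
status, headers, payload = respond(b"<feed/>")
print(status, len(payload))

# Second request echoes the ETag back and gets an empty 304.
status2, headers2, payload2 = respond(b"<feed/>", headers["ETag"])
print(status2, len(payload2))
```

The logic is trivial once the request headers and the status code are within reach.  The hard part is a stack that hides both from the application.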

Related: see Patrick Mueller’s exploration of twitter.

WordPress 2.2

Matthew Mullenweg: Full Atom support, including updating our Atom feeds to use the 1.0 standard spec and including an implementation of the Atom Publishing API


Ruby HTML5 Parser

I got enough of this running to demonstrate proof of concept.  I’m looking for help.  Interested?  Join the group.


Rank Gamesmanship

As I pointed out in 2006, the reality is that just as there are two distinct RSS 0.91 specs (UserLand and Netscape (Advisory Board archive)), there are now two distinct RSS 2.0 specs (Harvard and Advisory Board).


Ruby HTML5 Tokenizer

Henri Sivonen: I expected that it would make sense to use RELAX NG for expressing virtually all HTML5 conformance requirements that could theoretically be expressed in RELAX NG. This expectation turned out to be incorrect.

Perhaps a DSL would be appropriate?

So, the first step is to port the HTML5lib tokenizer to Ruby.


Control You

Mike Shaver: If someone tells you that their platform is the web, only better, there is a very easy test that you can use

JCP Member of the Year

Danny Angus: The JCP have voted the Apache Software Foundation as "Member of the year 2007".  In the light of the recent waves Apache has been creating around IP restrictions on test kits this is either very ironic, or something of a show of support from the other members. In either case well done to the Apache folks who participate.

Hot Off Of The Presses

Leonard Richardson: O’Reilly sent me a copy of RESTful Web Services before it was even published!

Me too :-)