It’s just data


Ross Mayfield: Cue up not what is popular, or what the people I subscribed to produced.  Cue up what my social network has found interesting.

Herewith, a simple demonstration of what aggressive canonicalization can produce.  Venus may be in Python, but suppose I’m in a Ruby mood.  The cache is simply files in Atom 1.0 format, with all textual content normalized to XHTML.

Let’s make a few simplifying assumptions: all posts are created equal; each post can only vote once for any given link (this also takes care of things like summaries which partially repeat content); posts implicitly vote (once!) for themselves; and the weight of a vote falls off as the square of the time elapsed between when the post was made and now.
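To make those assumptions concrete, here is a minimal sketch of the scoring in Ruby. It is not the code linked below, just one way the rules could be read. The `Post` struct, `vote_weight`, and `rank_links` are hypothetical names, and two details are my own assumptions: age is measured in days, and one is added to the age so that a post made right now gets a weight of 1.0 instead of dividing by zero.

```ruby
require 'set'

# Hypothetical post record: the post's own link, the links found in its
# content, and the time it was published.
Post = Struct.new(:link, :outbound, :published)

# Weight of a single vote: the inverse square of the post's age.
# Assumption: age is in days, offset by one to avoid dividing by zero.
def vote_weight(published, now)
  age_in_days = (now - published) / 86_400.0
  1.0 / (1.0 + age_in_days)**2
end

# Returns [link, score] pairs, highest score first.
def rank_links(posts, now = Time.now)
  scores = Hash.new(0.0)
  posts.each do |post|
    weight = vote_weight(post.published, now)
    # A Set collapses repeated links, so each post votes at most once per
    # link; adding post.link is the implicit self-vote.
    votes = Set.new(post.outbound) << post.link
    votes.each { |link| scores[link] += weight }
  end
  scores.sort_by { |_link, score| -score }
end
```

Calling `rank_links(posts)` with an array of `Post` values gives the links ordered by accumulated weight. Because every post carries the same base weight, the only things that move a link up are how many distinct posts mention it and how recent those posts are.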

Here’s the code, and here’s a snapshot of the output.  The output took 6.239 elapsed seconds to produce on my laptop.  I still have more work to do to eliminate some of the self-referential links (in fact, I removed Bob Sutor’s blog from the analysis a priori, as he would otherwise dominate the results).  But I am confident that this is solvable; in fact, I am working on expanding what filters can do.  I’ll post more on that shortly.

In any case, I can attest that the remaining links are current topics of conversation within my circle of friends.

As you can see, having ready access to this data, outside of any data silo, in a readily consumable format makes tasks like this fairly easy.

So the question must be asked: why can’t every public (and private) planet produce a meme feed?