XPath function translate converted to support characters
Comparing indexed strings against numbers was changed to compare slices against strings.
Encoding specific tests were changed as necessary to specify the encoding expected.
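The indexing change is easy to trip over, so here is a minimal sketch of the pattern. The helper names `first_char_is?` and `first_byte` are made up for illustration and are not taken from the REXML sources.

```ruby
# Ruby 1.8 returned an integer from String#[] with a single index;
# Ruby 1.9 and later return a one-character String instead.

# Hypothetical helper: a one-character slice compares against a string
# the same way on both 1.8 and 1.9.
def first_char_is?(string, char)
  string[0, 1] == char
end

# Hypothetical helper: when the numeric value really is wanted,
# ask for the byte explicitly (String#getbyte exists on 1.9 and later).
def first_byte(string)
  string.getbyte(0)
end
```

Comparing `string[0, 1]` against `"<"` rather than `string[0]` against `?<` keeps the intent readable and avoids depending on what a single index returns.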
Other language changes:
{"a","b"} no longer supported, converted to {"a"=>"b"}
Array.to_s now contains square brackets, quotes, and commas. Array.join used instead when join semantics is required.
Colon is no longer accepted as an alternative to then in when statements.
Renamed variables when necessary to prevent warning message when a block variable shadowed a local variable. The idiom var = array.find {|var| ...} now requires the block variable to be named differently than the local variable. REXML also has a few places where block variables were set to nil immediately outside of the block “for performance reasons”. In a few cases, for node in nodeset was replaced with nodeset.each do |node|
An instance variable which previously wasn’t initialized is now explicitly initialized to false
Added a sleep in a threaded sax/socket test to prevent a race condition.
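A few of the language changes above, sketched in Ruby 1.9 or later. The method names are invented for this illustration; only the language behavior comes from the list.

```ruby
# {"a", "b"} is a syntax error on 1.9; the pair has to be spelled out.
def pair
  { "a" => "b" }
end

# Array#to_s now looks like Array#inspect; use join when the old
# concatenating behavior is what is wanted.
def render(list)
  [list.to_s, list.join(", ")]
end

# `when 0: "zero"` was accepted by 1.8 only; `then` (or a newline) is required.
def describe(n)
  case n
  when 0 then "zero"
  else "nonzero"
  end
end

# Naming the block variable differently from the local it feeds
# avoids the shadowing warning.
def first_even(numbers)
  found = numbers.find { |candidate| candidate.even? }
  found
end
```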
REXML changes:
Substitution of entities was done in a single pass; this meant that entities whose values referenced other entities would only get fully expanded if the order of the hash matched the order of evaluation.
get_attribute_ns('','a') previously matched both an attribute named a, and xmlns:a. Which one returned depended on the order of a hash. Changed the method to only select a.
One regular expression which had explicitly specified /u for unicode changed to not be unicode: this was the regular expression matching the xml prolog declaration itself which must be matched against bytes.
Added a unit test for 110 and tests and fixes for 123, 127.
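To illustrate the entity substitution problem, here is a minimal sketch of expanding until nothing changes, so the result no longer depends on hash order. `expand_entities` and its `max_passes` guard are hypothetical and are not REXML's actual code.

```ruby
# Hypothetical helper, not REXML's implementation.
# Repeats the substitution until the text stops changing, so an entity
# whose value mentions another entity is expanded regardless of order.
def expand_entities(text, entities, max_passes = 10)
  max_passes.times do
    # "&name;" is looked up by name; unknown references are left alone.
    expanded = text.gsub(/&(\w+);/) { |ref| entities.fetch(ref[1..-2], ref) }
    return expanded if expanded == text
    text = expanded
  end
  text # gave up: probably a self-referencing entity
end
```

The pass limit is there only so that a definition which refers to itself cannot loop forever.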
My confusion from yesterday was due to a bug, which was promptly fixed — test case, fix.
Now that I understand what is intended, the situation is a lot clearer. The net result is that any sequence of operations that produce a runtime exception in Ruby 1.9 would also produce a runtime exception in Python 3.0. Some use cases that are entirely safe will not produce an exception in Ruby 1.9 when they would in Python 3.0. Such an approach is entirely consistent with a dynamic language.
I’ve got portions of HTML5lib working on Ruby 1.9, enough to pass Mars's unit tests. My initial reaction to Ruby 1.9’s support isn’t favorable. I definitely like Python 3K's Unicode support better. This feels closer to Python 2.5. In fact, I think I prefer Ruby 1.8’s non-support for Unicode over Ruby 1.9’s “support”.
The problem is one that is all too familiar to Python programmers. You can have a fully unit tested library and have somebody pass you a bad string, and you will fall over.
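A minimal sketch of how this plays out on Ruby 1.9 or later. The helper `try_concat` is invented for the example; the behavior shown is that of `String#+` with mixed encodings.

```ruby
# Hypothetical helper: returns the concatenation, or the exception class
# if Ruby refuses to combine the two strings.
def try_concat(a, b)
  a + b
rescue Encoding::CompatibilityError => e
  e.class
end

utf8        = "caf\u00e9"                      # UTF-8 with a non-ASCII character
ascii_only  = "cafe".encode("ISO-8859-1")      # Latin-1 label, but only ASCII bytes
real_latin1 = "caf\u00e9".encode("ISO-8859-1") # Latin-1 with a non-ASCII byte

try_concat(utf8, ascii_only)   # works: ASCII-only data is compatible
try_concat(utf8, real_latin1)  # Encoding::CompatibilityError
```

A test suite that only ever feeds ASCII data through the second argument passes; the first caller with an accented character gets the exception.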
Rafe Colburn: Rogers Cadenhead looks at Dave Winer’s long bet with New York Times executive Martin Nisenholtz on whether blogs or the Times would reign supreme by 2007. The winner: none of the above. Wikipedia outranks them both.
Perhaps the requirement to cite sources trumps the requirement to provide credentials.
A little over a month ago, I outlined how I would like to see the feed parser reorganized. I’ve now put a little meat on the bones, in the form of running code. Not just for the feed parser, but also for Planet. I also did it all in Ruby, so I named this little experiment Mars. Warning: this version is 0.0.1. It just barely runs end-to-end. Feed it real data, and it will choke on some of it. But it can now produce partial results.
All in all, I’m pleased with how compact this code is. If anybody wants to join in on the fun, it is a bzr repository and there are plenty of test cases ready to be ported.
Fundamentally, Microsoft’s strategy is sound. Ignore standards that you find inconvenient, and focus on producing and enabling the production of content people want. While my humble site can’t compete with the likes of Jackass 2.5, I do have a few people who follow my site. I’ve switched my front page to HTML5 despite the fact that this means that MSIE7 will therefore ignore virtually all the CSS styling rules that apply to the page. The page validates modulo an acknowledged bug in the validator.
Amazon seems to really get the Getting Started should be free aspect of the web, and is clearly targeting the “She has the idea on Wednesday and gets the script working next Monday, and one quarter later, either gives up on the idea or is incredibly rich. Both are good outcomes” developer market.
Meanwhile, Mark Nottingham of Yahoo! is proposing standards for caches which prefer to serve slightly stale content fast in lieu of providing late or broken results.
Update: Keith Gaughan: That ain’t REST. They screwed up the “REST” interface for it exactly the same way as they screwed up the one for SQS and FPS.
Rick Blommers: ReXML seems to escape items very nicely when setting values. But it doesn’t unescape the values with REXML::Document.new( … )
A bare minimum amount of functionality that one would expect from an XML parsing library is the ability to round-trip data.
Two things I have yet to find are where I can SVN checkout the latest code, and how to run the existing set of tests. I would like to submit new tests which expose the problems I have found so far, and patches to correct these issues. Ideally in time for 3.1.8.
Dean Hachamovitch: You will hear a lot more from us soon on this blog and in other places. In the meantime, please don’t mistake silence for inaction.
I’d love to see a screenshot of this page using MSIE8. While supporting HTML5 would be nice, the fact that MSIE7 won’t allow CSS rules to apply to any of the new HTML5 elements will significantly inhibit adoption of this standard.
Steven Lees: Today we published the final v1 spec for Simple Sharing Extensions, under a new name, FeedSync. The new name is a little simpler than the old one (kind of ironic!) and it captures the intent pretty well.
Lachlan Hunt: HTML 5 introduces and enhances a wide range of features including form controls, APIs, multimedia, structure, and semantics
In the interest of getting practical deployment experience with these specifications, I plan to explore exploiting these new tags on both my weblog and my planet. Two issues immediately come to mind, and I’m sure I’ll encounter more.
Paul Fremantle: fundamentally the approach we have taken is to build a registry/repository based on REST concepts. And as we looked at the REST space, we kept noticing how close the Atom Publishing Protocol (APP) is to our needs, so we’ve made that the public remote API to access the repository. Of course, if you are just browsing the registry, you only need a browser - APP is mainly there to support updating resources. Of course, using Atom and APP gives some really nice benefits too - like being able to subscribe a feed of new resources that meet your search criteria.
Anil Dash: We announced Beacon support on both LiveJournal and TypePad as initial launch partners. But we worked really hard with the Facebook team on one really important detail — making sure our implementations are completely opt-in. Not to put too fine a point on it, but this was kind of a no-brainer.
Kudos to 6A on getting the more important “unfixable” detail correct.
Alan Bell: To get the data into the site the documents were opened in OpenOffice.org Writer, then copied to Calc, tidied up manually (merged cells are evil) then imported to Lotus Notes 8 via the built in Symphony spreadsheet (I had been working on some code to import Calc into Notes so this was easy) exported to XML then imported to WordPress. The import file was just over 2Mb in size. ... The main difficulty in importing the data was smartquotes and em dashes that Word had autocorrected.
Ad Hoc, Situated Software at its finest. There are even feeds for comments on the comments.
David Shields: having worked for almost five years now as a member of the team that manages IBM’s open-source strategy and its execution, I can claim some expertise in this area. There are no vast secrets here, no grand plan. Here is the strategy as I understand it, and as I have worked to implement it.
David Recordon: Awesome, see their post! OpenID commenting as a beta feature on Blogger, way to go guys! I just tried it out and as you’d expect, it works great. Also really nice to see another site first accepting OpenID instead of providing it.
Brendan Eich: Standards often are made by insiders, established players, vendors with something to sell and so something to lose. Web standards bodies organized as pay-to-play consortia thus leave out developers and users, although vendors of course claim to represent everyone fully and fairly. I’ve worked within such bodies and continue to try to make progress in them, but I’ve come to the conclusion that open standards need radically open standardization processes.
The W3C HTML Working Group needs a CarterPhone. Clearly, Brendan is talking about ES4, but the issues he brings up are general.
I’m posting this in case I’m not the last person to realize this. While I’ve used the Unix sh longer than DHH has been alive, I either never realized or have long since forgotten that it supports here documents. Example:
ruby <<EOD | sort | uniq -c | sort -n
Dir['*'].each do |name|
  puts name.split('.',0)[1] || '<null>'
end
EOD
Dare Obasanjo: My weekend project was to read Dive Into Python and learn enough Python to be able to port Sam Ruby’s meme tracker (source code) from CPython to Iron Python. Sam’s meme tracker, shows the most popular links from the past week from the blogs in his RSS subscriptions.
More recent code can be found here. Fetches titles from HTML, handles etags, matches both www. and non-www. versions of a URI. Handles people who point to things multiple times. Allows you to group people who tend to all “vote” in bulk. Note: I consider the alternate link to be a vote too, which gives a small bump to people who post original content vs links.
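The counting rules described above can be sketched roughly as follows. This is not the meme tracker's code: `normalize_link`, `tally_votes`, and the shape of the `posts` data are assumptions made for the example, and grouping of bulk voters is left out.

```ruby
require "uri"
require "set"

# Hypothetical helper: treat www. and non-www. hosts as the same link.
# Query strings and fragments are ignored in this sketch.
def normalize_link(link)
  uri = URI.parse(link)
  host = uri.host.to_s.downcase.sub(/\Awww\./, "")
  "#{host}#{uri.path}"
end

# Hypothetical helper: one vote per author per link, however many times
# that author points at it. Returns [link, votes] pairs, most popular first.
def tally_votes(posts)
  voters = Hash.new { |hash, key| hash[key] = Set.new }
  posts.each do |post|
    post[:links].each { |link| voters[normalize_link(link)] << post[:author] }
  end
  voters.map { |link, who| [link, who.size] }.sort_by { |link, n| [-n, link] }
end
```

Treating a post's own alternate link as a vote would just mean adding it to that post's `:links` before tallying.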
I’d also recommend that you invest some time into converting from a simple regular expression to a real HTML parser. You’ll need it anyway for titles.
Jay Goldman: On November 6th, 2007, Facebook launched a series of new tools to help advertisers target the 54 million people now regularly using their site. They’re still throwing around a 3% weekly growth rate and have a target of 60 million active users by the end of the year, so it’s not hard to picture the day in the not-so-distant future when hospitals Facebook babies before handing them over and the little bundle of joy comes with a neural implant that pokes their parental units when the diaper is full. [via Simon Willison]
Michael Pavone: I’ve started another little reverse engineering project. Google hasn’t released any documentation on their new VM so I decided to get some the hard way. Well, hard is relative here. A decompiled Java class is a bit easier to read than a disassembled 68K binary. Anyway, I’ve managed to write some documentation on the dex file format used by the VM. I hope to have some documentation on the actual instruction set used by the VM in a few days
In Characters vs. Bytes, Tim Bray mentions the Gothic letter faihu. Whether such a character will display properly in your browser depends on what operating system you use and what fonts you have installed. Whether or not you can handle such characters programmatically, however, depends on what programming language you use.
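As a small illustration in Ruby 1.9 or later, the sketch below reports how one string looks at the character, byte, and code unit level. The code point U+10346 for faihu is taken from the Unicode charts rather than from the post, and `character_report` is a made-up helper.

```ruby
# GOTHIC LETTER FAIHU, outside the Basic Multilingual Plane
# (code point assumed from the Unicode charts).
FAIHU = "\u{10346}"

# Hypothetical helper: count the same string three different ways.
def character_report(string)
  {
    characters:  string.length,                            # code points
    utf8_bytes:  string.bytesize,                          # bytes as UTF-8
    utf16_units: string.encode("UTF-16BE").bytesize / 2,   # 16-bit code units
    codepoints:  string.codepoints.map { |cp| format("U+%04X", cp) }
  }
end
```

A language that counts 16-bit units will call this a two-character string; one that counts bytes will call it four.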
Kevin Lawver: This was a surreal experience... Dan led a sing-along with a bunch of W3C folks, including Tim Berners-Lee, the inventor of the web, and lots and lots of folks who invented important pieces of it (like CSS, HTML, XHTML, etc). Fun, fun, fun.
com.google.android.xmppService.IXmppService.createXmppSession: Creates a XMPP session to the server, using username and password for the login. createXmppSession starts a new XMPP session if there isn’t one for the username, connects to and logs into the GTalk server. If there is already a running XMPP session for the username, then createsXmppSession just returns the running session.
Don Box: I have to say that the authentication story blows chunks. Having to hand-roll yet another “negotiate session key/sign URL” library for J. Random Facebook/Flickr/GData clone doesn’t scale. Personally, my dream stack would be ubiquitous WS-Security/WS-Trust over HTTP GET and POST and tossing out WSDL
I’d suggest that the root problem here has nothing to do with HTTP or SOAP, but rather that the owners and operators of properties such as Facebook, Flickr, and GData have vested interests that need to be considered.
Planet CreativeCommons is based on Venus. Unsurprisingly, given their mission, they visibly highlight the license under which each of the entries are published. The Universal Feed Parser and Venus take great care to ensure that license and rights information is present in the Atom feeds that are produced, but this is the first time that I’m aware of this data being exposed in the HTML page itself.
Mark Pilgrim: What follows are instructions for building and installing MySQL 5 on Ubuntu. These instructions should work perfectly on both Feisty (7.04) and Gutsy (7.10).
Priceless.
From what I hear, people have had trouble with Leopard and Vista. By contrast, and like others, I found that the default font for Firefox wasn’t to my liking on one of the three machines I installed Gutsy on.
Ben Laurie: I’ve been running a team at Google for a while now, implementing capabilities in Javascript. Fans of this blog will remember that long ago I did a thing called CaPerl. The idea in CaPerl was to compile a slightly modified version of Perl into Perl, enforcing capability security in the process.
Hopefully like the work of Douglas Crockford [via Patrick Logan], the parser itself is (or will be) written in Simplified JavaScript.
This could be useful, as an option, for CouchDB. I don’t yet see the value in allowing even a sanitized subset of scripts through the UFP to Venus.
Steven Lees: We will remove the “unpublished” element from the spec, i.e. we will remove sections 1.2.9, 2.6 and all of section 4. We decided that the concept of unpublished belongs at the application level, rather than the base SSE specification. We will include information in the SSE implementer’s guide that describes how applications can implement “unpublished” behavior on top of SSE.
It seems to me that SSE + RFC 5005 complement each other. RFC 5005 can help you identify which entries have changed, and SSE can help you identify what changes were made to those entries.
Sometime yesterday Jay Young's default feed switched back to RSS 2.0. The world didn’t end, and not everybody cares about such minutiae, but Jay clearly does. Jay may be in the minority, but this enhancement would enable Jay and others like him to simply drop in a plugin such as this one, activate it, and be on their way.
The patch does not change the default feed format from RSS 2.0. Perhaps that could be considered for a release like WordPress 4.0, and a plugin could be provided at that time to enable users to select the venerable RSS 2.0 feed format, but in any case such a change would require a separate ticket, as this patch does not do that.
Tim Berners-Lee: HTML is a big community, but there are others communities. Smaller communities are more in need of uri-extensibility than bigger ones.
For the past month, eight feeds hosted by blogs.sun.com were not updated on planet.intertwingly.net, a victim of a poisoned httplib2 cache. A victim of a permanent redirect. The evidence can be found here. Eventually, such feeds would have been viewed as inactive for 90 days, but luckily in this case I caught the problem earlier.
There is a bug in Ruby 1.8.6 that affects documents with a default namespace (even a vestigial one, like those sported by WordPress weblogs) which prevents non-namespace qualified attribute names from working in XPath expressions. The following monkey-patch fixes this:
Brendan Eich: The small-is-beautiful generalization alternates with don’t-break-the-web, again without specifics in reply to specific demonstrations of compatibility.
It is interesting how the don’t-break-the-web meme means different things to different organizations: Mozilla, Microsoft.
For all the reasons that Joseph Scott described, you really want to access WordPress AtomPub service documents using SSL/TLS. Unfortunately, if you look closely at the current APE report, you will see both https and authentication warnings.
There’s a discussion going on in CouchDB as to what an ideal dump format for a CouchDB database would look like. A CouchDB database is a collection of URIs, and while the content associated with any given URI is often JSON, CouchDB supports the notion of an attachment that could be pretty much anything.
If CouchDB could efficiently compute an ETag for startkey/endkey type queries using only the index, this could be a big win. Most shared-nothing applications would simply become a dispatcher, a few views, and a few templates. The most complicated thing your application would need to worry about is assembling a page using input from multiple views.
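To make the idea concrete, here is a rough sketch of what such a dispatcher could do with an index-derived ETag. Nothing here is CouchDB's API: `etag_for`, `respond`, and the `[key, revision]` row shape are all assumptions for the sake of the example.

```ruby
require "digest/sha1"

# Hypothetical: derive a validator from index rows alone, here assumed to be
# [key, revision] pairs, without fetching or rendering any documents.
def etag_for(rows)
  digest = Digest::SHA1.hexdigest(rows.map { |key, rev| "#{key}:#{rev}" }.join("\n"))
  %("#{digest}")
end

# Hypothetical dispatcher step: answer 304 when the client's validator still
# matches; only otherwise run the (expensive) template passed as a block.
def respond(rows, if_none_match)
  etag = etag_for(rows)
  return [304, { "ETag" => etag }, ""] if if_none_match == etag
  [200, { "ETag" => etag }, yield(rows)]
end
```

The win is that the common case, an unchanged view, never touches the templates at all.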
Anne van Kesteren: One of my side projects is XML5. Earlier this year I suggested the idea as XML 2.0, but in line with recent “jokes” about HTTP5, SVG5, and CSS5, XML5 makes perfect sense. The idea of XML5 is to provide a revision of XML 1.0, XML 1.1, Namespaces in XML 1.0, Namespaces in XML 1.1, and RFC 3023, that is backwards compatible and introduces HTML-like, although much more sane, error recovery.
Ian B. Jacobs: The cube and cube+'Semantic Web' can be distributed freely. They can be used for derivative works (including used with other imagery and modifications to the cube colors) without permission as long as: The cube shape is not changed; There is attribution of W3C (following some guidelines that we still need to draft).
Questions:
To what degree of precision must the cube shape be retained?
Given that this logo is itself a shape, if the shape is changed, must permission still be obtained? Does that mean that every cube-like image must be approved by the W3C?
I then subscribed to my own feed, and clicked on my SVG in HTML Momentum Building post. The SVG image on that post displayed correctly, and the comments were all fetched in the background.
In round numbers, it took me an hour to download Ubuntu 7.10 via BitTorrent. About 15 minutes to burn a CD. Another 15 minutes to install. I did this by putting a second 40 gig hard drive into a four year old Win XP machine and installing Gutsy there.
I know some people like Virtual Machines for these kinds of things. With machines as cheap as they are these days, I kinda prefer the real thing.
Lots of interesting discussion about SVG in browsers. Momentum is building towards supporting SVG in HTML5, and that makes me happy. It is clear that whatever form it takes won’t satisfy everybody. I’d still prefer that HTML5 support distributed extensibility.
Joe Cheng: Configuring an AtomPub blog needs to be equally easy. For some reason, people in the AtomPub community don’t seem to like RSD (only Six Apart puts Atom endpoints in RSD). We need another autodiscovery mechanism.
Hmmm. When I looked at RSD nearly five years ago, it didn’t seem so bad. In any case, here’s a ticket and a patch to get WordPress to support autodiscovery of AtomPub endpoints.
First, behold the benefits of automated testing: TRAC 5180. :-)
Goals I’d like to set for myself for the next release of WordPress are twofold: get the APE messages to 0 errors and 0 warnings; and clean up the code so that Atom entries are produced in exactly one place and consumed in exactly one place. (Pete Lacey has indicated that he shares the latter goal and has some additional goals).
James Clark: Overall I think we can do much better than S/MIME by designing something specifically for HTTP.
James' third reason looks like a killer one to me. One minor caution: There are scenarios where it is convenient for servers to be able to stream responses, and most signing algorithms are designed to accommodate such requirements. This would tend to indicate that a design that tacks signatures on the end would be preferred. YMMV.
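The streaming point can be sketched as follows. This uses an HMAC as a stand-in for whatever signature scheme would actually be chosen, and `stream_with_trailer` is a hypothetical name; the only claim is that the value can be computed incrementally and emitted after the body.

```ruby
require "openssl"

# Hypothetical helper: feed each chunk to the MAC as it is sent, so nothing
# has to be buffered, then hand back the value to append after the body.
def stream_with_trailer(chunks, key)
  hmac = OpenSSL::HMAC.new(key, OpenSSL::Digest.new("SHA256"))
  sent = []
  chunks.each do |chunk|
    hmac.update(chunk)
    sent << chunk          # stands in for writing the chunk to the socket
  end
  [sent.join, hmac.hexdigest]
end
```

A design that puts the signature in a leading header would force the server to produce the whole response before sending the first byte.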
Yaron Goland: The Emperor stands quietly completely hidden in his black robes while Darth Sudsy, covered in his black carapace, leads in Luke Restafarian in a battle torn uniform, long dreadlocks dragging and heavy fatigue evident in his face and stance. Guards stand by the door at stiff attention.
The hero in this dystopian tale is the spec (especially §3.2.1, §3.3, §4.1.1.1); and its trusty sidekick, the Feed Validator. Note the messages the latter produces on this feed, and think about how much more useful the feed would be to RSS Bandit if these warnings were heeded.
James Graham: html5lib 0.10 is now available for your HTML-parsing pleasure. html5lib is an implementation of the HTML 5 parsing algorithm, available in both Python and Ruby flavours. The HTML 5 algorithm is based on reverse engineering the behaviour of popular web browsers and so is compatible with the myriad of broken HTML encountered on the web.
Joe Cheng: The Windows Live Writer team is still on track to deliver AtomPub support in the next version, which I am looking forward to immensely. It’s definitely an exciting time to be in the blogging tools space!
The title that the mememe logic in Planet Venus extracts from this text/plain representation is amusing.
Rick Jelliffe: if you make up or maintain a public text format, and you don’t provide a mechanism for clearly stating the encoding, then, on the face of it, you are incompetent. If you make up or maintain a public text format, it is not someone else’s job to figure out the messy encoding details, it is your job.
I guess it would follow that Python and Perl are competent programming languages.
James Clark: My conclusion is that there’s a real need for a cache-friendly way to sign HTTP responses. (Being able to sign HTTP requests would also be useful, but that solves a different problem.)
Pete Lacey: A better name for SOA, then, might be network-oriented computing (NOC). This encompasses both WS-* and REST (and most everything else from the socket level up). We can, if we want, make SOA and resource-oriented architecture (ROA) a subset of NOC.
Kinda. NOC encompasses REST? That I’ll buy. But to say that NOC encompasses protocols which, by design, attempt to abstract away the network? Well, ... not so much.
Robert Scoble: I read 800 feeds and TechMeme doesn’t miss much
When was the last time TechMeme included a post by Werner Vogels, Steve Vinoski, or myself? Those are the top three topics of discussion amongst my circle of friends. Clearly the third in this list is a consequence of the fact that this is my circle of friends, but I would argue that so too are the first two.
Perhaps TechMeme doesn’t miss much to Robert Scoble because TechMeme closely tracks to Robert Scoble’s circle of friends — not that there is anything wrong with that. I seem to have a surprising (though clearly smaller) number of people who track my list too. Apparently, including Robert Scoble.
James Snell: I need to get a really solid answer to a really simple question: do I parse out the (X)HTML into a hash or leave it as a String. Both are useful in different contexts although the String form is obviously more generic and results in a less complicated JSON serialization. Answer that question and I think this serialization will fall into the “Not terrible” category.
I’m a strong believer in Darwin in these matters. I believe that the most interesting content is in the, well, content element. If you guys want to store content as a blob, why not go all the way and store the whole Atom element as a string? Meanwhile, I’ll continue to pursue data structures that make access to this data drop dead easy.
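The two options can be put side by side in a short sketch. The hash shapes below are invented for illustration and are not Abdera's actual JSON mapping; `text_of` is a hypothetical accessor.

```ruby
require "json"

# Option 1 (assumed shape): content kept as an opaque string.
def content_as_string
  { "type" => "xhtml", "value" => "<div><p>Hello <em>world</em></p></div>" }
end

# Option 2 (assumed shape): content parsed into nested hashes and arrays.
def content_as_tree
  { "type" => "xhtml",
    "value" => { "name" => "div", "children" => [
      { "name" => "p", "children" => [
        "Hello ", { "name" => "em", "children" => ["world"] }
      ] }
    ] } }
end

# With the tree form, pulling out the text is a few lines and needs no parser.
def text_of(node)
  node.is_a?(String) ? node : node["children"].map { |child| text_of(child) }.join
end
```

Both forms survive `JSON.generate` unchanged; the difference is whether the consumer has to bring its own XML parser to get at anything inside.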
Nick Bradbury: Most RSS aggregator developers (myself included) tackled this problem by completely removing all styles from feed content. Since then, I’ve experimented with stripping only “unsafe” CSS from feeds, and despite Adrian’s claim that doing so requires a lot of work, it’s actually quite easy to do
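One way to read "stripping only unsafe CSS" is an allowlist over individual declarations. The sketch below is an assumption about how that might look, not FeedDemon's code; `SAFE_PROPERTIES` and `sanitize_style` are made-up names, and the property list is deliberately tiny.

```ruby
# Hypothetical allowlist; a real one would be longer and reviewed with care.
SAFE_PROPERTIES = %w[
  color background-color font-weight font-style text-align text-decoration
].freeze

# Hypothetical helper: keep a declaration only if its property is allowed and
# its value is made of plain tokens (no url(...), no expression(...), no colons).
def sanitize_style(style)
  kept = style.split(";").map(&:strip).select do |declaration|
    property, value = declaration.split(":", 2).map { |part| part.strip.downcase }
    SAFE_PROPERTIES.include?(property) && value.to_s.match?(/\A[\w\s#%.,-]+\z/)
  end
  kept.join("; ")
end
```

Rejecting anything that is not positively recognized is what keeps this short; trying to enumerate the dangerous constructs instead is where the work piles up.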
Anant Jhingran: The freebase folks do not reveal much about their scaling. The scaleout models for google and wikipedia (where partitioning/replication strategies work quite well) do not quite work in such a networked graph (after all, a query on person="anant" with one or two pointer chases would end up pinging a few nodes under any partition model), so the question is, if we have billions of pieces of information in a dense graph, how does the query load on the system scale?
I, too, have found precious little about the internals of freebase, and likewise I’m interested in the question at the end of the above paragraph. But this post is about the stuff in the middle.
Early reports led me to believe that Dell was offering a minor discount for people who chose Ubuntu over Vista. I took a look today, and came to the conclusion that the discount is now three times what was earlier reported. I got the data from here: Ubuntu, Vista. Here’s a summary:
Gabe Rivera: Since the Techmeme Leaderboard reflects the reality that both blog-driven sites and traditional sites define today’s news, use it to discover new sources, recommend sites to others, or illustrate where tech news breaks. I hope you find it useful, and if you have a stake in tech reporting, not too infuriating.
This looks more current and better maintained than my previous test bed, so I’m now using it for my Venus test site. I gather that the master Techmeme list of sources is hand picked by Gabe to represent a given slice of the tech world, and the leaderboard produced represents a snapshot of what those sources tend to be following. And a mememe list produced across those feeds as sources would therefore be a list of sources to those sources. It appears to be a rather tangled hierarchy of sources, to say the least.
To all of you who have registered for comments, and gave a non-gmail.org XMPP id, you should be receiving new ‘buddy’ requests from an ID I just obtained from jabber.org. For those who have already sold your souls to the big G, you will be unaffected, but for everybody else the messages will route around the Google complex.
Byrne Reese: this is exactly what OpenID needs: Open iDNS, or “Open Id Domain Name System.” This service would work just like DNS, and would map email addresses to an OpenID provider designated by the owner.
Looking at Jabber recently caused me to see this prior discussion in a new light. With Google Talk, one’s gmail.com email-style address is one’s identity. I just created a second address, one without a gmail.com account behind it. And Google Talk and GMail seem to be doing just fine.
CNN: The NMAAHC is the first museum website to partake fully of the Web 2.0 social computing revolution. The site is based on cutting-edge, open source programming frameworks such as Ruby on Rails for collaborative website development. It employs concepts such as tags, or keywords, created by the users to help organize the content. As a result, the Museum on the Web is an example of the bottoms-up web, meaning it’s both a product of a site visitor’s participation, and an enabler of creating a community for them. The site runs on IBM System X web and database servers.
Matthew Mullenweg: I’m thrilled to announce that Version 2.3 “Dexter” of WordPress is now ready for the world ... if you’re a developer you’ll be interested in: 1. Full and complete Atom 1.0 support, including the publishing protocol.
It certainly is a dramatic improvement. And it was fun to be a part of the process. But full and complete?
James Snell: Abdera has always included the ability to serialize Atom entries to JSON. The mapping, however, was not all that ideal. So I rewrote it. The new serialization is VERY verbose but covers extensions, provides better handling of XHTML content, etc. I ran my initial try by Sam Ruby who offered some suggested refinements and I made some changes. The new output is demonstrated here (a json serialization of Sam Ruby’s blog feed). The formatting is very rough, which I’ll be working to fix up, but you should be able to get the basic idea.
Based on the comments, Patrick and Elias do not seem amused. Guys, I’ve got a use case in mind, and I wonder if you wouldn’t mind helping me?
Antonio Cangiano: We now have a working Python driver for DB2 which is currently undergoing internal testing. The driver is similar to the Ruby and PHP ones, which means that you get an advanced and very easy to use API. It also means that if you are confident with the Ruby driver, you will be able to use the Python one in no time.
Oh, and Mac looks like it is coming too; but I don’t do Mac.
Joseph Scott: Hopefully everyone takes away two things from this. One, you can’t depend on HTTP basic authentication working. Two, if you aren’t using SSL/TLS then your traffic isn’t secure.
Joel Spolsky: What’s going to happen? The winners are going to do what worked at Bell Labs in 1978: build a programming language, like C, that’s portable and efficient. It should compile down to “native” code (native code being JavaScript and DOMs) with different backends for different target platforms, where the compiler writers obsess about performance so you don’t have to. It’ll have all the same performance as native JavaScript with full access to the DOM in a consistent fashion, and it’ll compile down to IE native and Firefox native portably and automatically. And, yes, it’ll go into your CSS and muck around with it in some frightening but provably-correct way so you never have to think about CSS incompatibilities ever again. Ever. Oh joyous day that will be.
David Ascher: As Mitchell Baker just blogged, and as a press release from Mozilla will announce shortly, I have taken on an exciting new role within the Mozilla world, leading a new organization focused on email and internet communications. Wow indeed.
Question: instead of Couch.ini specifying the one and only language that views can be written in for this server, could views instead have a media type?
Tim Bray: I’m going to have to go back and patch up the code so it doesn’t emit any of those nasty colons and relative URI references that apparently hurt implementors’ fragile feelings.
As Tim continues to update his post with more and more aggregators that already do support these features, I’m gaining hope that some day I can retire the following Feed Validator message: Avoid Namespace Prefix.
Tony Garnock-Jones: Erlang represents strings as lists of (ASCII, or possibly iso8859-1) codepoints. In this regard, it’s weakly typed - there’s no hard distinction between a string, “ABC”, and a list of small integers, [65,66,67]
It is important to realize that Erlang was invented (in 1987) before utf-8 was (in 1992).
Phil Wilson: Since I am too lazy to manage my own subscriptions, I was subscribed to Planet Intertwingly. At 269 feeds though, the signal/noise ratio has taken a bad hit (what do you mean Sam doesn’t tailor his blogroll for me personally?) and I’m going to have to actually import the OPML and weed out stuff I’m not interested in. How annoying.
The issue isn’t the number of feeds, but the number of entries. And some of the people I subscribe to talk a lot, so it is time to prune. To help with this task, I wrote a little script.
Rick Jelliffe: One possibility for the co-existence that hadn’t grabbed my attention until today has probably been obvious to everyone else: when converting from OOXML to ODF just embed OOXML-namespaced elements inside the ODF where there is no direct equivalent. This allows good round-tripping, doesn’t require ODF to be extended with legacy Office-isms, allows developers who want to support more than the ODF base to do so, gives better fidelity for Office users ... ODF already allows foreign namespace elements. I guess what ODF would need to support this well would be a mechanism to say “This kind of foreign element should be stripped out when its context changes, but round-tripped otherwise.”
You know, that last idea could be handy for AtomPub too...
Dare Obasanjo: Recently I took a look at CouchDB because I saw it favorably mentioned by Sam Ruby and when Sam says some technology is interesting, he’s always right
Dare’s review of CouchDB is worth a read. (Update: so are Assaf Arkin's and Damien Katz's responses) He gets more things right than wrong. And he doesn’t get things wrong so much as he has a tendency to make unqualified statements that need to be qualified.
Tim Bray: Check out Sanjiva Weerawarana’s PHP Web services: “a PHP script can integrate with any system over Web services at full fidelity, including security and reliability”. I will restrain my urge to editorialize, but, hey Rails Envy guys: low-hanging-fruit alert.
Anant Jhingran: new architectures are successful when they attack a different problem. Providing an equivalent bit replacement (arguably at a lower cost, as Mike would say) is hardly the dominating argument that would cause people to switch. Quite simply, as my colleague Curt Cotner points out, database engine technology is a drop in the bucket in terms of the investment it takes to be a full player in database these days. You need APIs/drivers for all the application environments (JDBC,ODBC, OLE, ADO, .NET, Ruby, PHP, Perl, etc.). You need integration with the application servers (J2EE, persistence layers, XA protocols). You need system integration (workload management, etc.). You need tooling to save people cost on the admin side. And the list goes on. Ultimately, these AD/Admin issues consume 70% of the IT budget...
While I agree with Curt and you that adding and evangelizing a new API is generally harder than the implementation itself, the solution may very well be one that doesn’t do that.
Matt Asay: I put together a list of ten principles that I’ve gleaned from my open source experience, which I believe can be applied to just about any business. [via Simon Phipps]
atom2json.erl converts a directory of Atom files to a directory of JSON files. As with most real-life problems, this one has multiple layers. Add in requirements like coalescing consecutive XML text nodes, and the desire to spawn a separate thread per conversion, and the task seems like a fairly daunting one. Yet the resulting Erlang program is remarkably compact, clean, and simple.
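For readers who don't have Erlang handy, here is a minimal sketch of the innermost layer in Python. It is not a port of atom2json.erl: the JSON layout (the name, attrs, and children keys) is an assumption made up for illustration, and walking the directory and spawning a worker per file are left out. The text-node requirement comes cheaply here because ElementTree hands back adjacent character data as a single string.

```python
import json
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"


def element_to_json(elem):
    """Map an element to a dict holding its name, attributes, and children.

    The dict layout is hypothetical, chosen only for this sketch.
    """
    node = {"name": elem.tag.replace(ATOM, "")}
    if elem.attrib:
        node["attrs"] = dict(elem.attrib)
    children = []
    # elem.text is already one string, even if the source mixed
    # literal text with entity references.
    if elem.text and elem.text.strip():
        children.append(elem.text)
    for child in elem:
        children.append(element_to_json(child))
        # Text following a child element is stored on that child's tail.
        if child.tail and child.tail.strip():
            children.append(child.tail)
    if children:
        node["children"] = children
    return node


sample = (
    '<entry xmlns="http://www.w3.org/2005/Atom">'
    '<title type="text">a &amp; b</title>'
    "</entry>"
)
result = element_to_json(ET.fromstring(sample))
print(json.dumps(result, indent=2))
```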
Pete Lacey: Currently, we expose posts and uploads (media entries). Once 2.3 is released I hope to add support in 2.4 for WordPress pages and comments among other things.
With this fix, the code now works on PHP4 (evidence). In 2.4, I’d like to see WordPress support foreign markup, presumably as a sort of attachment. I also think the code deserves a good bit of refactoring, as there are two places where Atom is parsed (import/blogger.php and wp-app.php) and two places where Atom entries are created (wp-app.php and feed-atom.php).
Mike Champion: the recent book “RESTful Web Services” by Leonard Richardson and Sam Ruby is chock full of pragmatic REST goodness. Just wanted to set the record straight.
Simon Willison: Django vs feedparser on dates. Some useful tips in the comments. I find Python’s timezone stuff endlessly frustrating: I know it can do what I want, but it always takes me a ridiculously long time to figure out the necessary incantations.
My recommendation is to convert to UTC as early as possible, stay in UTC as long as possible, and convert to local time as late as possible — preferably in JavaScript on the client.
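A minimal sketch of that recommendation using only the standard library; the helper name to_utc and the sample timestamp are illustrative assumptions. The one design choice worth noting is that naive datetimes are rejected rather than silently assumed to be local time.

```python
from datetime import datetime, timedelta, timezone


def to_utc(value):
    """Normalize an aware datetime to UTC; refuse to guess for naive ones."""
    if value.tzinfo is None:
        raise ValueError("naive datetime: the source offset is unknown")
    return value.astimezone(timezone.utc)


# Convert as early as possible: right after parsing, while the
# original offset (here UTC-4) is still known.
received = datetime(2007, 9, 1, 21, 24, 31, tzinfo=timezone(timedelta(hours=-4)))
stored = to_utc(received)

# Stay in UTC: this is the value to store and to put on the wire.
wire = stored.isoformat()
print(wire)
```

Conversion to local time is deliberately absent; under this approach it belongs in the client.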
As a learning exercise, I tried converting the Universal Feed Parser to Python 3.0. I picked it because it is a relatively self-contained code base that I am familiar with, one that is actively in use, and one that has seen the wear and tear of dealing with compatibility (and the need to monkeypatch the occasional bug) of a number of Python releases.
$ python3.0
Python 3.0a1 (py3k, Aug 31 2007, 21:24:31)
[GCC 4.1.2 (Ubuntu 4.1.2-0ubuntu4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> print(len('Iñtërnâtiônàlizætiøn'))
20
>>>
Chris Wanstrath: Erlang’s Mnesia database is something like what I want: you write your queries in plain Erlang and they are translated into Mnesia-queries by walking the parse tree. Nice trick, but listen up: Ruby has a parse tree, too, and we can get at it pretty easily thanks to ParseTree. So, we do. Introducing Ambition. [via Stefan Tilkov]
Typically, solutions for problems like these require some level of persistence: a database, serialized results, or the like. But each of these options require additional maintenance commitments, and I’m looking for something with zero footprint and “just works”.
A simple program for parsing memes.atom. Clearly dumping XHTML fragments to stdout isn’t ideal (perhaps XHTML-IM?), and you wouldn’t want to dump every meme on every run, but those are problems for another day.
Kevin Burton: Last Friday night Greg Stein was mugged and seriously injured outside of his home in Mountain View, CA.
Greg certainly has had his share of bad luck lately. Luckily, however, he has some amazing friends who are rallying together to set things right. If you feel so inclined, please donate to the cause. Any overflow will go to Greg’s favorite foundation.
Longer term, this needs to be split up into a module which spawns threads and exports separate interfaces for things like authenticating, sending messages to a server, and registering callbacks for presence changes.
Russell Beattie: I’d even speculate that the recent interest in Erlang is driven by these same forces. The same people who swore they’d never use Java again if they could help it are looking for a good replacement for it on the server. The one spot where the JVM has no real peers is in long running server-side processes that need to execute hard-coded business logic as scalably as possible. Despite the fact that the JVM is a resource hungry beast that’s ponderously slow to start up and happy to eat as much memory as you can throw at it, it’s still the only real game in town for this type of application... for now. The reason people are looking at Erlang is not because of its beautiful syntax, great documentation, or up-to-date libraries. Trust me. It’s because the Erlang VM can run for long periods of time, scaling linearly across cores or processors filling the same niche that Java does right now on the server.
First, great article. I encourage everybody to read the full thing. I also like Brian McCallister's follow up.
Robin Sonefors: I now call it Tomboy Blogposter, since it really isn’t just for Wordpress, even though that continues to be my primary testing platform.
Jonathan Schwartz: JAVA is a technology whose value is near infinite to the internet, and a brand that’s inseparably a part of Sun (and our profitability). And so next week, we’re going to embrace that reality by changing our trading symbol, from SUNW to JAVA.
David Orchard: The TAG has reviewed the proposal in HTML5 and Distributed Extensibility. In short, we believe it is a very interesting start of a proposal for stronger support for distributed extensibility on the web in the HTML language. We hope that the Working Group will give it and its natural subsequent refinements or similar alternatives very serious consideration.
For the moment, this issue hasn’t even been added to the list of issues that “Ian has marked as needing to be dealt with”. I hope that this can be corrected.
Tim O’Reilly: In the first era of the computer industry, lock-in was provided by hardware; in the second era, it was provided by software; today, it is provided by centralized databases driven by winner-takes-all network effects.
If only there were a Foundation which concerned itself with Freedom in all venues Electronic...
Bill de hÓra: Social graph aggregation and fluidity allows for better cross-selling. All those recommendation algorithms of the form “you like/bought x and y likes/bought x, you and y might have something in common” work better with larger data sets. Especially if you can jump verticals - such as connecting last.fm data to facebook. So it’s gonna happen one way or another.
Dare seems to think that the root problem is oppression by the “man”. In this case, a 23 year old. Brad seems to view this as a technical problem.
Simon Johnston: The difference I see is that stackless is an enabling feature of the VM, that we will still need the language-level primitives for message send/receive I see in Erlang and then library support for managing local and remote distribution.
The “secret sauce” is pattern matching. Things like “if these conditions are met, this code fires, with the following variables set”
Bill de hÓra: I’m wondering how would one produce a URL space for a blog style archive, using Servlets+JSP, and do so in a way that isn’t a CGI/RPC explicit call?
My long Bets post attracted some interesting reactions. A number of supporters of various items in my list, as well as a few pushbacks. There’s not much to say about the former, except thanks for the validation. The remainder of this post will deal with the latter.
At times, sending something more than plain text is desirable. XEP-0071 XHTML-IM provides for that with Jabber. And sending such XHTML enriched messages with xmpppy turns out to be fairly straightforward. In fact, I’ve now set up my weblog so that I get notified whenever I’m online and a comment is made. Here’s how it works.
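As a rough illustration of what such a message looks like on the wire, the sketch below builds the stanza with the standard library's ElementTree instead of xmpppy, so no xmpppy calls are shown or implied. The two namespace URIs are the ones XEP-0071 specifies; the function name, the recipient address, and the message text are made-up placeholders, and the stream's default jabber:client namespace is left implicit.

```python
import xml.etree.ElementTree as ET

XHTML_IM = "http://jabber.org/protocol/xhtml-im"
XHTML = "http://www.w3.org/1999/xhtml"


def build_message(to, plain, html_fragment):
    """Build a chat message carrying a plain body plus an XHTML-IM alternative."""
    message = ET.Element("message", {"to": to, "type": "chat"})
    # The plain-text body is the fallback for clients without XHTML-IM.
    ET.SubElement(message, "body").text = plain
    html = ET.SubElement(message, "{%s}html" % XHTML_IM)
    body = ET.SubElement(html, "{%s}body" % XHTML)
    body.append(html_fragment)
    return message


# Hypothetical notification content.
fragment = ET.Element("{%s}p" % XHTML)
fragment.text = "New comment from "
strong = ET.SubElement(fragment, "{%s}strong" % XHTML)
strong.text = "a reader"

stanza = build_message("user@example.org", "New comment from a reader", fragment)
print(ET.tostring(stanza, encoding="unicode"))
```

Keeping the plain body and the XHTML body in agreement is up to the sender; nothing in the stanza enforces it.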
American Airlines ... will provide passengers with a high-speed Internet connection, VPN access and e-mail capabilities through Wi-Fi-enabled laptops and PDA devices. The system has the ability to adapt as technology evolves. The technology will be available in all classes of the B767-200 aircraft for a fee. If the connectivity solution is successful, it could be extended to the rest of American’s domestic fleet [via Simon Phipps]
The following started out as an exploration of erlang, but the side trip has proven interesting enough to merit its own entry. Accordingly, here are notes regarding the installation of a personal (or “workgroup”, if you prefer) Jabber server on a home LAN running Ubuntu Linux. Beware, specific user and host names are filled in, as well as a dummy password; adjust as required.
Rob Sayre: Come to think of it, we might want to standardize similar policies for restricted HTML parsing. There’s even a W3C mailing list working on this stuff. Turns out mail clients have the same issues that feed readers do. And Google Reader is just one example of a website that has this problem. Why can’t browsers borrow this policy from email clients and feed readers, and allow site authors to activate it? That way, sites wouldn’t get burned by faulty markup sanitization.
I’ve created Sanitization Rules. As it is a wiki page, free form additions and refactorings are welcome.
Joe Gregorio: I start at Google at the end of this month
In selecting a title for this post, I agonized briefly over changing Less to More. After all, Google seems to be hoovering in all of my friends, one by one.
Tim O’Reilly: One of the important truths of Web 2.0 is that it ain’t the personal computer era any more, Eben Moglen’s arguments to the contrary notwithstanding. A lot of really important software can’t even be exercised properly without very large networks of machines, very large data sets, and heavy performance demands. Yahoo! provides all of these. This means that Hadoop will work for the big boys, and not just for toy projects.
This notion deserves a separate name. Perhaps even a really awful one.
Pete Lacey: appfs is a utility that can mount remote resources exposed via the Atom Publishing Protocol (Atompub) as a local filesystem. [via Pete Lacey]
Since the workgroup demands use cases for any proposed new feature, I will provide one up front: this feature’s use case is to enable features without use cases. But before I proceed, it would be helpful to review a bit of background.
Alf Eaton: would it be possible to have CSV as an input format for the list of feed subscriptions in Venus? That way you could use Google Spreadsheets to collaboratively manage the list of feeds
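A minimal sketch of one way this could work today without touching Venus itself: export the spreadsheet as CSV and turn it into the ini-style subscription list, one section per feed URL with a name option. The column headings (url, name) and the function name are assumptions for illustration, and any global [Planet] settings would still need to be supplied separately.

```python
import configparser
import csv
import io


def csv_to_planet_config(csv_text):
    """Turn CSV rows with url and name columns into ini sections keyed by URL."""
    # Interpolation is disabled so a literal % in a feed title is harmless.
    config = configparser.ConfigParser(interpolation=None)
    for row in csv.DictReader(io.StringIO(csv_text)):
        url = row["url"].strip()
        if not url:
            continue  # skip blank spreadsheet rows
        config[url] = {"name": row["name"].strip()}
    out = io.StringIO()
    config.write(out)
    return out.getvalue()


sample = (
    "url,name\n"
    "http://example.org/feed.atom,Example Blog\n"
    "http://example.net/index.atom,Another Blog\n"
)
ini_text = csv_to_planet_config(sample)
print(ini_text)
```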
Stephen O’Grady: If I want to output to Atom from WordPress then, I have two problems: first, outputting the comment counts to the feed, and second getting FeedBurner to recognize the thread elements. Problem one is easily solved, from what James Snell tells me. Problem two is more complicated. My choices seem to be a.) sacrifice my comment count, b.) write a FeedFlare element that will handle RFC 4685, or c.) persuade someone at FeedBurner to support it
I must say that the out-of-the-box experience here is a bit suboptimal.
Tim Bray: Why Digital Signature? · This idea was first proposed by James Snell, and it’s a good one. Mind you, the benefits are a little bit theoretical, since no feed-reading clients that I’ve seen actually check a digital signature. The argument for this is similar to that for TLS; a bad guy who could somehow insert a fake press release into the feed could make zillions by gaming the share price. A verifiable digital signature would let someone reading the feed know that the news in it really truly did come from Sun.
From busted to valid to best practices, all in a little over ninety days. Kudos.
Alan Zeichick: Can you confirm or deny the accuracy of Intel’s comment, that the spec lead told Intel that Sun will not include field of use restrictions in the Java EE 6 licenses? Can you comment on whether Sun stands behind what the spec lead allegedly told Intel?
Update: Intel's comments on JSR 317 and JSR 318, with emphasis added:
Steve Dibb: It has long been a royal pain in the butt to manage planet’s files because it is essentially one large .ini file where you have dozens of entries. I’m first going to write a frontend that I can use to automatically generate the .ini files for each user, instead of all in one global file, and store them in a small database. That way, making minor changes will be a simple feat. For Planet Larry, I’m going to take it a step further and let users manage their feeds themselves. I’ll have an entire user authentication system where they can login, set their feed URLs, choose their language, set their location, etc. so that Alex or myself don’t have to do it all.
Brendan Eich: My Ajax Experience West keynote covers a lot of ground, with slant-wise truth telling the over-arching theme. Mozilla believes in fairly radical open source action, including open strategy. In that spirit, three new projects
That’s a compelling story. Combined, they describe a strategy to team (either directly or indirectly) with Microsoft, Mono, and Adobe to enable every browser and every device that runs Flash to also be able to run applications written in Python, Ruby, and the latest version of JavaScript.
You know, people use the term RSS like Kleenex. As long as Sun isn’t using RSS 2.0 to disclose financial results, it will probably be OK. Let’s take a look
Antonio Cangiano: If you prefer Atom 1.0 over RSS 2.0 (you should), this brief post will tell you how to migrate to FeedBurner and Atom.
That brings to 32 the number of Atom 1.0 feeds that I am subscribed to which are served from feedburner.com. I haven’t counted, but a number of others take advantage of the free MyBrand service.
Robert Scoble: what really is cooking here is that RSS has been moved to big companies to control.
Apparently a previous draft said “stolen”.
Whether Scoble meant actively stolen, or simply moved, either implies that the spec isn’t where it used to be. A quick Google, Yahoo!, or Windows Live search reveals that it still is where it has been for over 4 years.
Oh, and as to the spec “clarification” that was recently made to the alternate specification that also happens to call itself RSS 2.0? FeedBurner’s CTO voted against it.
Eric Siu: which file would i edit to remove the blank line? ... seems that a lot of them are having similar problems, but there is no solution yet
Unfortunately, the people who are subscribed to feedvalidator-users don’t seem to know the answers to questions such as these. I know I don’t. But perhaps one of the readers of this weblog does.
Jeff Hodges: I was recruited after Bob Aman of FeedTools fame saw me hyping my translation of Mark Pilgrim’s FeedParser from Python to Ruby, and thought it was pretty good. The translation, of course, is called rFeedParser and it really is pretty good. I’ll have a post on that soon. First, I want to fix the silly options bugs that I was turned on to a little while ago.
Much as we watched with amusement in the late 80’s and early 90’s while the PC’s reinvented mainframe operating systems, and thought to ourselves that any day now they will discover Virtual Machines; we seem to be in a period where the web is rediscovering Data Management, and thinking to ourselves that any day now they will discover Data Warehousing, though this time it will be without a fetish for Data Integrity as in the Web 2.0 world one size does not fit all.
Paul Fremantle: Another software problem that can never be solved by adding another layer of indirection is providing a simple, transparent and easy-to-use code [via Stefan Tilkov]
Matt Mullenweg: PHP core has never shown any particular regard for its biggest apps, as evidenced by the above bug and others, so I’m not sure why we should go out of our way to promote their upgrade. [emphasis added]
Approximately eight years ago I submitted an outline of an approach to integrate PHP4 with Java. The response was (in essence, I can’t find it at the moment) “here is your CVS account”. This fundamentally changed my notion of us and them.
Olivier Théreaux: Use it for Quality Control! Unlike HTML engines, XML processors are supposed to be very strict with the syntax they accept.
Let’s just see how strict XML processors really are. The feed is well formed. It even is valid Atom 1.0. Unfortunately, the summary is escaped too much, and parts of the content are not escaped enough.
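The feed in question isn't reproduced here, so the strings below are invented stand-ins; the sketch only shows the two general failure modes. Each layer of escaping must be matched by exactly one layer of unescaping on the consumer's side, and an XML parser can confirm well-formedness without ever noticing that the count is off.

```python
from html import escape, unescape

# What the author wants rendered, as HTML markup.
markup = "<p>Tom &amp; Jerry</p>"

# Escaped once: the right amount for an element declared as escaped HTML.
once = escape(markup, quote=False)
# Escaped twice: still well formed, but the reader now sees literal tags.
twice = escape(once, quote=False)

# A consumer undoes exactly one layer.
seen_when_correct = unescape(once)
seen_when_over_escaped = unescape(twice)

# The opposite mistake: plain text that was never escaped for HTML at all.
# Once the XML layer is removed, "<b and c>" reads as a start tag.
plain_text = "if a<b and c>d"
under_escaped = plain_text

print(seen_when_correct)
print(seen_when_over_escaped)
print(under_escaped)
```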
Neither appears to be directly related to the ASF Open Letter, but it does appear that the general consciousness as to the rampant proclivity of spec leads to create egregious JSPA violations has been raised.
It is time for a new version of the JCP. One in which license terms are declared up front, so as to not waste anybody’s time. One in which the specification process itself is “default open” as opposed to “default closed” as it originally was, and not up to the unchallenged whim of the spec lead as it currently is. One in which access to TCKs does not require an NDA; after all, the harness is now open source, so that particular excuse is now gone.
Clay Shirky: you will make more accurate predictions about software and — in this web drive world — about services if you ask yourself not “what’s the business model?”, but rather “do the people who like it take care of each other?”. That turns out to be the better predictor of longevity.
Simon Phipps: I’m with Dalibor, and asserting that regularly FWIW. On the subject of NDAs, note that it’s not necessarily Sun that requires them (it can easily be a requirement for participation by one of the other EG members) and I think it’s a mistake to tie this issue to the Harmony JCK issue.
I can’t conceive of any way in which a package can be “open source” and require an NDA.
WRAL: American Airlines said Thursday that it would upgrade its flight from Raleigh-Durham International Airport to London by moving the flight from Gatwick to Heathrow Airport.
Andrew C. Oliver: This leaves Apache in a bad spot. Continue to participate in a process that restricts a collaborative, consensus based development process by forking its communities into NDA-haves and have-nots and potentially prevents Apache from licensing its software as something meeting the Open Source Definition or disavow itself of this process and leave its projects in an unblessed state (they can continue to implement the specs as released to the public but not participate in their design) or potentially a third fragmentary response where restrictions are accepted for some projects (particularly those that do not restrict “Field Of Use”).
Should the ASF vote “NO” on a JSR Review ballot that established Sun to be the spec lead on yet another JSR? Oh no, says Henning Schmiedehausen, as that would be going nuclear.
Should the ASF cease the practice of requesting TCKs under condition of NDAs? Oh no, says Bill Barker, as that would be taking the nuclear option.
Richard Stallman: GPLv3 is now compatible with the Apache 2.0 license
With apologies to Inigo: You keep using that word — I do not think it means what you think it means.
The GPL V3 license is compatible with the ASF V2 license in precisely the same way that blood type AB is compatible with blood type O.
Note: I’m not saying that’s a bad thing. In fact, this change will positively benefit many. I just think that expressing this complex concept by using a word that has multiple (dare I say it) incompatible meanings will only promote confusion.
Seth Gordon: Planet (http://www.planetplanet.org/) was designed to crawl all the feeds on the blogroll and produce some appropriately formatted HTML page with all their contents; you could just set it up so it only read your own blog’s mod_atom feed, make some appropriate template, and voila!
That would certainly cover the front page, but that’s about it.
Fortunately, there are bits and pieces that cover the rest.
Chris Oliver: It’ll take a few more days before we post the code to OpenJFX, but in the meantime here’s a preview of the latest version of our SVG to FX translator. The translator converts an SVG document into a single JavaFX class.
Here’s a few tests that produce unexpected results:
Miguel de Icaza: The past 21 days have been some of the most intense hacking days that I have ever had and the same goes for my team that worked 12 to 16 hours per day every single day --including weekends-- to implement Silverlight for Linux in record time. We call this effort Moonlight.
It looks like my little demo was used to help debug. Sweet!
M. David Peterson: WebW3S is Microsoft’s answer to a RESTful web publishing protocol. In many ways it attempts to tackle the same problems solved by the Atom Publishing Protocol.
I took a look at “Web Structured, Schema’d & Searchable”, and found Structure, but was unable to find the Web, Schema, or Search.
Doug Purdy: I have pulled down my old content (both Radio Userland and DasBlog), written a “bare-bones” APP implementation (which you are viewing now), and will attempt to focus on more serious topics, rather than my usually strain on banal postings. We’ll see how it goes.
Anne Thomas Manes: Notice that the URL contains a method name (getInfo) and query string containing the method parameters. This is NOT REST!
It is statements like the one I quoted above that temper my enthusiasm when I hear that Burton sees the future of SOA and it is REST. Until we can agree on what the term REST means, we’re just replacing one meaningless buzzword with another.
This is not about the gorgeous new pond, at least not directly. Though you are welcome to marvel at it. I personally find the setting to be very serene, relaxing, and... dare I say it... restful.
No, this is about the porch behind the pond. At the moment, in addition to some temporarily displaced yard art, there are two new hammock chairs. Both have been slept in.
Now I am considering adding mosquito netting. This place sounds promising, though I welcome other recommendations.
I have yet to see an Atom/APP implementation application that is identity aware. That is, a server that has intrinsic user knowledge with regards to roles, authorization, authentication mechanisms and user relationships
Most certainly not file based. Resources posted and fetched are stored in the directory thus enabling synchronization, access control, search, etc.
Scott Adams: The Cheesecake Factory is a great business model, but if you take your wife there for your 25th wedding anniversary, you might not reach your 26th.
While we did go to the Cheesecake Factory for Valentines day, luckily we went to the Angus Barn for our 25th anniversary.
Whew. That was a close one. Perhaps we will make it 26 after all.
A while back, I commented that I would likely backport Jacques’s sanitizer to Python. I still haven’t gotten around to that, but I have ported it to html5lib (source, tests).
Rogers Cadenhead: If Randy wants to change elements to “elements and attributes” as a spec clarification, I’m comfortable solving the problem in that manner.
What does this mean? Essentially it means that the Ruby implementation is approaching functional parity with the Python implementation, where the accuracy of the preceding statement is a function of the unit test code coverage.
That question wasn’t chosen as it identifies a random HTTP header that happens to have a very practical benefit. That question was chosen as it identifies an HTTP header that happens to have a practical benefit AND requires one to pierce layers upon layers of “value-add” infrastructure to implement correctly.
Matthew Mullenweg: Full Atom support, including updating our Atom feeds to use the 1.0 standard spec and including an implementation of the Atom Publishing API
Henri Sivonen: I expected that it would make sense to use RELAX NG for expressing virtually all HTML5 conformance requirements that could theoretically be expressed in RELAX NG. This expectation turned out to be incorrect.