It’s just data

Long Bets Apologia

My Long Bets post attracted some interesting reactions: a number of readers supported various items on my list, and a few pushed back.  There’s not much to say about the former, except thanks for the validation.  The remainder of this post deals with the latter.


James Snell: To say, as Sam and Tim both do, that REST is important is like saying the fan in my laptop is “important”.  There’s really nothing to discuss about it.  RESTful services are fundamentally critical to the continued evolution of the Web.  It just is.  You just need to do things in a RESTful way.  Period.

When PCs first came out, they ran VisiCalc.  Shortly thereafter, people saw the potential of “client/server,” whereby a PC could serve as a fancy front end to a legacy backend.  To this day, I still occasionally see a department store or automotive shop with a white box desktop running a 3270 emulator on Windows.

I see today’s REST applications as the moral equivalent of those first tentative client/server steps: REST today is mainly used as a way to wrap legacy data sources.  I fundamentally believe that there will be a next step in this evolution, one in which applications are turned inside-out.  An archetypal example of this would be Google.  Yes, Google is a web server (many, actually).  But it is also a web client, and one that doesn’t passively wait for requests and do what is asked; instead, it actively seeks out data in anticipation of future requests.
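To make the “inside-out” idea a bit more concrete, here is a minimal Python sketch.  It assumes a caller-supplied `fetch(url)` function (hypothetical; in practice it might wrap `urllib.request`), and the class and method names are illustrative only; this is not a description of how Google actually works.  The same object acts as a client that pulls data ahead of time and as a server that answers requests purely from what it has already gathered.

```python
from typing import Callable, Dict, Iterable, List


class InsideOutService:
    """Hypothetical service that is both a web client and a web server."""

    def __init__(self, sources: Iterable[str], fetch: Callable[[str], str]):
        self.sources = list(sources)     # URLs to pull from proactively
        self.fetch = fetch               # caller-supplied fetcher (an assumption)
        self.index: Dict[str, str] = {}  # url -> last successfully fetched content

    def crawl(self) -> int:
        """Client role: gather data before anyone asks. Returns pages refreshed."""
        refreshed = 0
        for url in self.sources:
            try:
                self.index[url] = self.fetch(url)
                refreshed += 1
            except OSError:
                pass  # the network will fail sometimes; keep the last good copy
        return refreshed

    def handle_request(self, term: str) -> List[str]:
        """Server role: answer from what was already gathered, with no network I/O."""
        return sorted(url for url, body in self.index.items() if term in body)
```

In a real deployment `crawl()` would presumably run on its own schedule, independent of incoming requests; that decoupling of gathering from answering is the point of the sketch.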

The most concise description I have found of this reconceptualization of the architecture of global applications is by Sean McGrath.  His article was the inspiration for the goofy traffic-sign-style figure that I have used to identify blog entries such as these.


Dare Obasanjo: A Search Optimized Architecture isn’t for Everyone

Continuing with the train of thought above, I see the role of databases changing fundamentally.  Databases of the future will internalize the Fallacies of Distributed Computing; in particular, they will grok the inevitability of latency.  And they will not be locked into the data silo thinking that characterizes databases today.

Whether the database is on a cell phone or in a cloud, the scale and velocity of the data will be such that the database itself is “merely” an index or cache, and yes, optimized for search.
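As a toy illustration of a database that is “merely” an index, here is a small inverted index in Python.  It assumes, hypothetically, that each record is identified by a URI and that the authoritative copy lives at that URI; the index keeps no master copies, only which words appeared where, which is what search needs.

```python
from collections import defaultdict
from typing import Dict, Set


class SearchIndex:
    """An index, not a system of record: it maps words to the URIs they appeared in."""

    def __init__(self) -> None:
        self.postings: Dict[str, Set[str]] = defaultdict(set)

    def add(self, uri: str, text: str) -> None:
        """Record that each (lowercased) word in text was seen at uri."""
        for word in text.lower().split():
            self.postings[word].add(uri)

    def search(self, *words: str) -> Set[str]:
        """Return the URIs containing every query word (AND semantics)."""
        # .get() avoids creating empty entries in the defaultdict for unknown words.
        sets = [self.postings.get(w.lower(), set()) for w in words]
        return set.intersection(*sets) if sets else set()
```

A search returns locations, not data; fetching the actual content (and coping with it having moved or changed) is left to the caller.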

Foreign keys will no longer be ways to reliably locate other data within the same database.  Instead, they will provide hints as to where the data you are seeking might be, or once was; in any case, the master copy of that data will live outside the database.  At this point, I imagine there are a lot of Semantic Web cognoscenti nodding in approval.
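One way to picture a foreign key that is only a hint is the following Python sketch.  The `Hint` and `HintResolver` names, the injected `fetch` callable, and the five-minute staleness window are all assumptions made for illustration.  The reference carries the URI of the master copy plus the last copy seen; resolution prefers a fresh local copy, otherwise goes to the origin, and falls back to the possibly stale copy when the origin is unreachable.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Hint:
    """A 'foreign key' that only says where the data might be, or once was."""
    uri: str                      # where the master copy lives
    cached: Optional[str] = None  # last copy we saw, possibly stale
    fetched_at: float = 0.0       # when we last saw it (seconds since the epoch)


class HintResolver:
    def __init__(self, fetch: Callable[[str], str], max_age: float = 300.0):
        self.fetch = fetch      # hypothetical: retrieves the master copy by URI
        self.max_age = max_age  # seconds before a cached copy counts as stale

    def resolve(self, hint: Hint) -> Optional[str]:
        fresh = hint.cached is not None and time.time() - hint.fetched_at < self.max_age
        if fresh:
            return hint.cached
        try:
            hint.cached = self.fetch(hint.uri)
            hint.fetched_at = time.time()
        except OSError:
            pass  # origin unreachable: return whatever we last saw (may be None)
        return hint.cached
```

Latency and failure are treated as normal cases here rather than exceptions, which is the point of the Fallacies of Distributed Computing.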

Much as POJOs are winning out over EJBs, and REST is winning out over WS-*, I think it quite likely that file-system-like designs will begin to replace traditional relational databases.

I’ll close this section by observing that files on a distributed file system (DFS) tend to be more denormalized and more granular than rows or tuples.
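To illustrate what “denormalized” means here, this Python sketch (with made-up sample data and a hypothetical `to_document` helper) folds three normalized “tables” linked by foreign keys into a single self-contained document, the kind of thing that might be stored as one file on a DFS.

```python
import json

# Normalized form: three "tables" linked by foreign keys (sample data is made up).
authors = {1: {"name": "Example Author"}}
posts = {10: {"title": "Example Post", "author_id": 1}}
tags = [
    {"post_id": 10, "tag": "rest"},
    {"post_id": 10, "tag": "erlang"},
]


def to_document(post_id: int) -> str:
    """Denormalize one post into a self-contained JSON document."""
    post = posts[post_id]
    doc = {
        "title": post["title"],
        # The author's name is copied in rather than referenced by id.
        "author": authors[post["author_id"]]["name"],
        "tags": [t["tag"] for t in tags if t["post_id"] == post_id],
    }
    return json.dumps(doc, sort_keys=True)
```

`to_document(10)` returns `{"author": "Example Author", "tags": ["rest", "erlang"], "title": "Example Post"}`: everything needed to use the post travels together, at the cost of duplicating the author’s name wherever it appears.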


Anant Jhingran: how many more programming languages, however pure, does the world need, Sam?

Tim Bray: I just don’t think so. ... It’s too weird, and in my brief experiments, the implementation shows its age; we have in fact learned some things about software since way back then. And anyhow, I worry less about concurrency these days. The right way get the most mileage out of something like our T2 is load it up with a bunch of process-granular PHP or Rails or Django jobs. Which burns memory, but who cares? Or alternatively, to run something like Java EE, stay away from application-level threads, and let the JVM sweat the concurrency.

To Tim: Green Hardware needs Green Software.

To Anant: An analogy.

Java is a language.  Java is a Virtual Machine.  Java is a class library.

Java as a language is passé.  Java as a Virtual Machine still has legs.  Java as a class library has robust and battle-tested (though often over-engineered) classes for pretty much everything you might want.  A recent example I stumbled across: digital signatures.

I see Erlang in much the same way.  I don’t care if you both think that the syntax of the language is uglier than PHP’s.  I haven’t looked at the VM, nor is that what excites me.

Note that I didn’t say Erlang.  I said Erlang/OTP.  I challenge both of you to read this short white paper on Mnesia.  In particular, I would like to draw your attention to sections 1 and 5.  Let me know what you think.


I didn’t get much pushback here, but as I implement more active databases, I’m finding a need for application-initiated paging, text messaging, and IM capabilities.  These capabilities need to be able to traverse firewalls and NAT devices, and to work with IM clients and cell phones.

All of these requirements point me in the direction of Google Talk.

Additionally, just as the web encompasses both the human web and the programmable web, we need a messaging infrastructure that is not only human-friendly but also bot-friendly.  In this area, Jabber doesn’t disappoint.


James Snell: sure, conventions for class attributes are useful but they’re not groundbreaking. Folks should use Microformats as a best practice but that’s about as exciting as it gets.

In my thinking on this subject, this idea came up in three separate places.

First, one implication of pull-based architectures is that you have to resign yourself to the idea that you will always be a bottom feeder.  You rarely get to dictate the format of the data that you are given; the most you can do is build positive feedback loops that encourage small amounts of metadata to piggyback on the carrier signal that the pre-existing data formats provide.
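Here is what bottom feeding on a carrier signal can look like in practice: a deliberately simplified Python sketch, using only the standard library’s `html.parser`, that harvests two hCard properties (`fn` and `org`) from ordinary HTML.  Browsers render the markup as usual; a metadata-aware consumer gets a little more.  The `extract_hcards` helper is hypothetical, it assumes reasonably well-formed markup, and real hCard parsing has many more rules than this.

```python
from html.parser import HTMLParser
from typing import Dict, List, Set

VOID = {"br", "hr", "img", "input", "link", "meta"}  # elements with no end tag


class HCardExtractor(HTMLParser):
    """Collect 'fn' and 'org' text found inside elements with class 'vcard'."""

    PROPS = {"fn", "org"}

    def __init__(self) -> None:
        super().__init__()
        self.cards: List[Dict[str, str]] = []
        self.stack: List[Set[str]] = []  # class sets of currently open elements

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return  # never closed, so never pushed
        classes = set((dict(attrs).get("class") or "").split())
        if "vcard" in classes:
            self.cards.append({})  # a new card starts here
        self.stack.append(classes)

    def handle_endtag(self, tag):
        if tag not in VOID and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if not text or not self.cards:
            return
        open_classes = set().union(*self.stack)
        if "vcard" not in open_classes:
            return  # text outside any card is just carrier
        for prop in self.PROPS & open_classes:
            card = self.cards[-1]
            card[prop] = (card.get(prop, "") + " " + text).strip()


def extract_hcards(html: str) -> List[Dict[str, str]]:
    parser = HCardExtractor()
    parser.feed(html)
    parser.close()
    return parser.cards
```

Pages with no microformats simply yield an empty list, so the consumer degrades gracefully rather than failing.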

The second is APP (the Atom Publishing Protocol).  While feeds are pure pull, APP is full duplex.  Yet although the underlying format is extensible, my intuition is that, once again, treating the (X)HTML content as a carrier signal will be more robust, in that it will survive intact across a greater variety of both hosts and user agents.
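As an illustration of the XHTML-as-carrier approach in APP, this Python sketch builds an Atom entry whose `type="xhtml"` content carries an hCard purely through `class` attributes, rather than through a custom extension element.  The `entry_with_hcard` function is hypothetical, and elements that RFC 4287 requires of a complete entry (such as `atom:id`, `atom:updated`, and `atom:author`) are omitted for brevity.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
XHTML = "http://www.w3.org/1999/xhtml"
ET.register_namespace("atom", ATOM)
ET.register_namespace("xhtml", XHTML)


def entry_with_hcard(title: str, name: str, url: str) -> bytes:
    """Build a (partial) Atom entry whose XHTML content carries an hCard."""
    entry = ET.Element(f"{{{ATOM}}}entry")
    ET.SubElement(entry, f"{{{ATOM}}}title").text = title
    content = ET.SubElement(entry, f"{{{ATOM}}}content", type="xhtml")
    # RFC 4287: xhtml content is wrapped in a single XHTML div.
    div = ET.SubElement(content, f"{{{XHTML}}}div")
    card = ET.SubElement(div, f"{{{XHTML}}}span", {"class": "vcard"})
    link = ET.SubElement(card, f"{{{XHTML}}}a", {"class": "fn url", "href": url})
    link.text = name
    return ET.tostring(entry, encoding="utf-8")
```

A host that knows nothing about hCard sees only ordinary XHTML, which it is likely to store and display unchanged; a foreign-namespace extension element is, in my experience, more likely to be dropped along the way.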

Finally, there is Jabber, and I believe that the same dynamics will apply here.  Jabber too is extensible, but using XHTML-IM as a carrier signal will allow both existing clients and more fully metadata-aware clients to get value out of the same data stream.  While the visible surface area of such messages will continue to be limited to ten dozen or so characters, I can totally see hCard and Geo data piggybacking on the message, ready to be exploited by bots and cell phones alike.
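A sketch of the Jabber case in Python: an XMPP message stanza with a plain-text `body` for existing clients and an XHTML-IM (XEP-0071) alternative carrying hCard and geo markup, plus a bot-side reader that ignores the presentation and pulls out the coordinates.  The `build_message` and `read_geo` helpers, the JID, and the values are made up, and whether a particular client or server preserves the `class` attributes is an assumption, not something this code guarantees.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple


def build_message(to: str, text: str, name: str, lat: float, lon: float) -> str:
    """Build an XMPP chat message whose XHTML-IM part piggybacks hCard and geo."""
    # Namespaces are written as literal xmlns attributes so the output uses
    # default namespace declarations rather than generated ns0: prefixes.
    msg = ET.Element("message", {"xmlns": "jabber:client", "to": to, "type": "chat"})
    ET.SubElement(msg, "body").text = text  # plain text for every existing client
    html = ET.SubElement(msg, "html", {"xmlns": "http://jabber.org/protocol/xhtml-im"})
    body = ET.SubElement(html, "body", {"xmlns": "http://www.w3.org/1999/xhtml"})
    p = ET.SubElement(body, "p")
    p.text = text + " "
    card = ET.SubElement(p, "span", {"class": "vcard"})
    fn = ET.SubElement(card, "span", {"class": "fn"})
    fn.text, fn.tail = name, " @ "
    geo = ET.SubElement(card, "span", {"class": "geo"})
    lat_el = ET.SubElement(geo, "span", {"class": "latitude"})
    lat_el.text, lat_el.tail = str(lat), ", "
    ET.SubElement(geo, "span", {"class": "longitude"}).text = str(lon)
    return ET.tostring(msg, encoding="unicode")


def read_geo(stanza: str) -> Optional[Tuple[float, float]]:
    """Bot side: skip the presentation and harvest coordinates, if any."""
    found = {}
    for el in ET.fromstring(stanza).iter():
        classes = (el.get("class") or "").split()
        for key in ("latitude", "longitude"):
            if key in classes and el.text:
                found[key] = float(el.text)
    if "latitude" in found and "longitude" in found:
        return found["latitude"], found["longitude"]
    return None
```

A human sees “Meet me here Jane Doe @ 35.78, -78.64”; a bot calling `read_geo` on the same stanza gets a coordinate pair, and a message without geo markup yields `None`.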

Oh, and for the record, I consider RDFa in this category.