It’s just data

Long Bets Apologia

My Long Bets post attracted some interesting reactions: a number of supporters of various items in my list, as well as some pushback.  There’s not much to say about the former, except thanks for the validation.  The remainder of this post will deal with the latter.

REST

James Snell: To say, as Sam and Tim both do, that REST is important is like saying the fan in my laptop is “important”.  There’s really nothing to discuss about it.  RESTful services are fundamentally critical to the continued evolution of the Web.  It just is.  You just need to do things in a RESTful way.  Period.

When PCs first came out, they ran VisiCalc.  Shortly thereafter people saw the potential of “client/server” whereby a PC could serve as a fancy front end to a legacy backend.  To this day, I still occasionally see a department store or automotive shop with a white box desktop running a 3270 emulator on Windows.

I see today’s REST applications as the moral equivalent of those first tentative client/server steps: i.e., REST today is mainly used as a way to wrap legacy data sources.  I fundamentally believe that there will be a next step in this evolution.  One in which applications are turned inside-out.  An archetypal example of this would be Google.  Yes, Google is a web server (many, actually).  But it is also a web client.  And one that doesn’t passively wait for requests and do what is requested, but one that actively seeks out data in anticipation of future requests.

The most concise description of this reconceptualization of the architecture of global applications I have found was by Sean McGrath.  His article was the inspiration for the goofy traffic sign style figure that I have used to identify blog entries such as these.

Hadoop

Dare Obasanjo: A Search Optimized Architecture isn’t for Everyone

Continuing with the train of thought above, I see the role of databases fundamentally changing in the future.  Databases of the future will fundamentally internalize the Fallacies of Distributed Computing.  In particular, they will grok the inevitability of latency.  And not be locked into the data silo thinking that characterizes databases today.

Whether the database is on a cell phone or in a cloud, the scale and velocity of the data will be such that the database itself is “merely” an index or cache, and yes, optimized for search.

Foreign keys will no longer be ways to reliably locate other data within the same database.  They will instead provide hints to where the data you are seeking either might be, or once was, and in any case the master copy of that data will be outside of the database.  At this point, I imagine there are a lot of Semantic Web cognoscenti nodding in approval.
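To make the idea concrete, here is a minimal sketch of a foreign key treated as a hint rather than a guarantee. Everything in it is hypothetical and for illustration only: the `Hint` and `IndexStore` names, the `fetch` callable that stands in for a network request, and the example.org URLs. It assumes the simplest possible policy, which is to ask the outside source first and fall back to the local copy when that fails.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Hint:
    """A 'foreign key' that only suggests where the data might be."""
    key: str
    last_seen_at: str  # location where the record was last found (may be stale)


class IndexStore:
    """Hypothetical local store acting as an index/cache, never the master copy."""

    def __init__(self, fetch: Callable[[str], Optional[dict]]):
        self._fetch = fetch              # stand-in for a real network fetch
        self._cache: Dict[str, dict] = {}

    def resolve(self, hint: Hint) -> Optional[dict]:
        # The master copy lives outside this store, so ask there first.
        record = self._fetch(hint.last_seen_at)
        if record is not None:
            self._cache[hint.key] = record   # refresh the local copy
            return record
        # The data moved or is unreachable: serve what we last saw, if anything.
        return self._cache.get(hint.key)


# A dict plays the role of the outside world in this sketch.
remote = {"http://example.org/people/42": {"name": "Ada"}}
store = IndexStore(fetch=remote.get)
hint = Hint(key="person:42", last_seen_at="http://example.org/people/42")

first = store.resolve(hint)    # found where the hint said it would be
remote.clear()                 # the master copy moves or goes offline
second = store.resolve(hint)   # answered from the local cache instead
missing = store.resolve(Hint("person:7", "http://example.org/people/7"))
```

A store that takes latency seriously might well invert the order and answer from the cache first, refreshing in the background; the point of the sketch is only that a lookup can fail or go stale without the store treating that as corruption.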

Much as POJOs are winning out over EJBs, and REST is winning out over WS-*, I think it quite likely that File System like designs will begin to replace traditional RDBs.

I’ll close this section by observing that files on a DFS tend to be more denormalized and more granular than rows or tuples.

Erlang/OTP

Anant Jhingran: how many more programming languages, however pure, does the world need, Sam?

Tim Bray: I just don’t think so. ... It’s too weird, and in my brief experiments, the implementation shows its age; we have in fact learned some things about software since way back then. And anyhow, I worry less about concurrency these days. The right way to get the most mileage out of something like our T2 is load it up with a bunch of process-granular PHP or Rails or Django jobs. Which burns memory, but who cares? Or alternatively, to run something like Java EE, stay away from application-level threads, and let the JVM sweat the concurrency.

To Tim: Green Hardware needs Green Software.

To Anant: An analogy.

Java is a language.  Java is a Virtual Machine.  Java is a class library.

Java as a language is passé.  Java as a Virtual Machine still has legs.  Java as a class library has robust and battle tested (though often over-engineered) classes for pretty much everything you might want.  Recent example that I stumbled across: digital signatures.

Erlang is pretty much the same way to me.  I don’t care if you both think that the syntax of the language is uglier than PHP’s.  I haven’t looked at the VM, nor is that what excites me.

Note that I didn’t say Erlang. I said Erlang/OTP. I challenge both of you to read this short white paper on Mnesia.  In particular, I would like to draw your attention to sections 1 and 5.  Let me know what you think.

Jabber

I didn’t get much pushback here, but as I implement more active databases, I’m finding a need for application-initiated paging / text messaging / IM capabilities.  These capabilities need to be able to traverse firewalls and NAT devices, and work with IM clients and cell phones.

All of these requirements point me in the direction of Google Talk.

Additionally, just like the web encompasses both the human web and programmable web, we need a messaging infrastructure that is not only human friendly but also bot friendly.  In this area Jabber doesn’t disappoint.

Microformats

James Snell: sure, conventions for class attributes are useful but they’re not groundbreaking. Folks should use Microformats as a best practice but that’s about as exciting as it gets.

In my thinking on this subject, this came up three separate times.

One implication of pull based architectures is that you have to resign yourself to the idea that you will always be a bottom feeder.  You rarely get to dictate the format of the data that you are given; the most you can do is build positive feedback loops which encourage small amounts of metadata to piggy back over the carrier signal that the pre-existing data formats provide.

Second is in APP.  While feeds are pure pull, APP is full duplex.  Yet while the underlying format is extensible, my intuition is that again viewing the (X)HTML content as a carrier signal will be more robust in that it will survive intact with a greater variety of both hosts and user agents.

Finally, there is Jabber, and I believe that the same dynamics will apply here.  Jabber too is extensible, but using XHTML-IM as a carrier signal will allow both existing clients and more fully metadata-aware clients to get value out of the same datastream.  While the surface area of such messages will continue to be limited to ten dozen or so visible characters, I can totally see hCard and Geo data piggy backing on the message, ready to be exploited by bots and cell phones alike.
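As a rough illustration of the carrier-signal idea, the sketch below builds a message stanza with a plain-text body plus an XHTML-IM payload marked up with hCard and Geo class names, then reads it back two ways: as a plain client would, and as a metadata-aware bot would. The function names, the address, and the coordinates are all made up for the example, and it assumes that `class` attributes survive the trip, which a given client or server may not honor.

```python
import xml.etree.ElementTree as ET

JABBER_CLIENT = "jabber:client"
XHTML_IM = "http://jabber.org/protocol/xhtml-im"
XHTML = "http://www.w3.org/1999/xhtml"


def build_message(to: str, name: str, lat: str, lon: str) -> ET.Element:
    """Build a message with a plain body and an hCard/Geo-annotated XHTML body."""
    msg = ET.Element(f"{{{JABBER_CLIENT}}}message", {"to": to})
    # Plain-text fallback that every existing client can display.
    ET.SubElement(msg, f"{{{JABBER_CLIENT}}}body").text = f"{name} is at {lat}, {lon}"
    # The same sentence again, with microformat class names riding along.
    html = ET.SubElement(msg, f"{{{XHTML_IM}}}html")
    body = ET.SubElement(html, f"{{{XHTML}}}body")
    p = ET.SubElement(body, f"{{{XHTML}}}p", {"class": "vcard"})
    fn = ET.SubElement(p, f"{{{XHTML}}}span", {"class": "fn"})
    fn.text, fn.tail = name, " is at "
    geo = ET.SubElement(p, f"{{{XHTML}}}span", {"class": "geo"})
    la = ET.SubElement(geo, f"{{{XHTML}}}span", {"class": "latitude"})
    la.text, la.tail = lat, ", "
    ET.SubElement(geo, f"{{{XHTML}}}span", {"class": "longitude"}).text = lon
    return msg


def plain_text(msg: ET.Element) -> str:
    """What a client with no XHTML-IM support would show."""
    return msg.findtext(f"{{{JABBER_CLIENT}}}body")


def extract_microformats(msg: ET.Element) -> dict:
    """What a bot could pick out of the same stanza by class name."""
    wanted = {"fn", "latitude", "longitude"}
    found = {}
    for el in msg.iter():
        for cls in el.get("class", "").split():
            if cls in wanted:
                found[cls] = (el.text or "").strip()
    return found


# Serialize and re-parse to mimic the stanza crossing the wire.
wire = ET.tostring(build_message("bot@example.org", "Alice", "35.78", "-78.64"),
                   encoding="unicode")
received = ET.fromstring(wire)
plain = plain_text(received)
fields = extract_microformats(received)
```

Both readers consume the same stanza; the plain client simply ignores the part it does not understand.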

Oh, and for the record, I consider RDFa in this category.


[from manuel] Sam Ruby: Long Bets Apologia

Ruby is really rolling these days (and de hOra too)...

Excerpt from del.icio.us/network/coty at

Interesting... I disagreed with your Microformats assessment mainly because I was thinking of web pages. If people wanted a damn calendar or contact info, they’d just click a different link. Embedding was kind of silly, especially considering how web designers can be control freaks about naming conventions, etc.

However, using XHTML-IM over Jabber is a perfect use case: send the same data blob to multiple clients, and have them render differently if they can.

Also, REST is something of a dead horse. Everybody says it’s inevitable... why should we pay attention now? Regarding James’s comment: “RESTful services are fundamentally critical to the continued evolution of the Web.  It just is.  You just need to do things in a RESTful way.  Period.”

By that logic, Esperanto is critical to the evolution of the web. Period.

There’s still plenty of life and value in SOAP and SOA, somewhat less in the WS-* stack... Perhaps we’ll have something better than REST by the time change is needed... Like something that supports more than four verbs, more than a dozen error codes, and reliably encoded HTTP headers. More rants here: SOAP vs REST, Part 2

Posted by bex at

Is PHP Doomed?

There’s a tide of opinion suggesting that PHP is threatened with extinction as standard hardware begins to favour widening multicore processors. This would be a fair argument if PHP was a general......

Excerpt from maetl_ at

One more thing...

Besides MNesia, any reason to use Erlang over Stackless Python? The benchmark challenge in chapter 8 of the new Erlang book appears to favor Stackless Python:

erlang-vs-stackless-python-a-first-benchmark

MNesia is cool... but does the world need it PLUS a Hadoop-based database?

Posted by bex at

more about db futures

i must be tuning in to this notion of scalable data stores of the future being more like DFS and RDB. here’s a quote from sam ruby today that caught my eye: Foreign keys will no longer be ways to reliably locate other data within the same database....

Excerpt from mca blog at

There’s still plenty of life and value in SOAP and SOA

Agreed!  See, for instance, OASIS forms six committees to simplify SOA.  You just don’t see that kind of forward-thinking committee-making from the REST advocates.

Like something that supports more than four verbs

Hear hear!  REST should take a cue or two from Java(TM); see Execution in the kingdom of nouns.

Posted by Mark at

XMPP is very versatile.

It could/will be used to transport ATOM notifications. Hopefully, Atompub-notify-XMPP will be implemented into APP publishing tools, perhaps with the ATOM threading extension.

Embedding RDFa in XHTML-IM might be very interesting also, probably better than microformats which seem less rigorously defined than RDFa.

Using XMPP with SPARQL could be very interesting too, although it would require a GUI to ease its use for lay users; but with XMPP’s flexible (ad-hoc) data forms, it might not be too difficult.

It could also be used to create Wide-Area Networks, like VPN-over-XMPP and VNC-over-XMPP or Zeroconf-over-XMPP to easily share our data with our buddies, for example via the DAAP protocol used by iTunes.

I’m also thinking of SyncML events pushed over XMPP.

All we need is to make it mainstream and popular - currently, people know MSN or Yahoo Messenger but few GTalk users know that they use Jabber - and more implementations, especially PubSub and Atompub-notify-XMPP.

BTW, I have a disguised LazyWeb request for any interested hackers - :) - about a combination of FOAF, PEP/PubSub, Atompub-notify-XMPP and a bit of OpenID to unify the publication of social networks presence streams.

Posted by kael at

The Future’s Uncertain…

Colleague Matt Davey posts on Lucene.Net and HADOOP. Sam Ruby details his Long Bets including HADOOP, REST, ERLANG/OTP, JABBER and MicroFormats. Elaborating on HADOOP, he sees the role of databases fundamentally changing Whether the...

Excerpt from Development in a Blink at

I’ve not been following it too closely, but the stackless python comparison has been addressed on the erlang-questions list ([link])

The conclusion there seems to be pretty much that the benchmark had some flaws. :-)

Anyway, one’s choice of python or erlang may come down to which libraries you want to use, as Sam kind of gets at. Erlang/OTP to my knowledge has a lot more in the way of distributed concurrency libraries. Python probably has a lot more in the way of doing stuff within any given process.

Or maybe the post-modern thing is to use both in the right spots.

Posted by Patrick Logan at

I hear much talk about AtomPub (a.k.a. APP) these days, and from my understanding it sounds like an interesting move beyond WebDAV and a nice open standard against publication methods from MovableType and the like. I have not yet found a server implementation to play with outside of Abdera and eXist in Java, but plenty of clients are readily findable.
Perhaps I’m not looking hard enough?

Found while writing the comment... python and perl have implementations. Still hoping for ruby.

Posted by Scytrin at

The Amazing Gloppita Gloppita Machine

As seen in a comment on Sam Ruby’s blog ... There’s still plenty of life and value in SOAP and SOA Agreed! See, for instance, OASIS forms six committees to simplify SOA . You just don’t see that kind of forward-thinking committee-making from the...

Excerpt from Making it stick. at

If you’re playing with XMPP, you might find it amusing to trace the XML over the wire. Notice how the document root is opened at the handshake, with subelements being passed during chat. In flight the XMPP stream is malformed. The document is closed when the chat closes.

I have to say though, whatever REST as-in-HTTP can’t support, XMPP probably covers off. I’ve done systems integration with XMPP, and would not hesitate to recommend to have that protocol in your toolset.

[link]

Khare has more or less documented the architectural properties that XMPP can cover off in his PhD about ARRESTED:

[link]

Oh, consider not calling XMPP  “Jabber” - it’s a bit like using Atom and calling it “RSS2.0” :)

[link]

Posted by Bill de hÓra at

links for 2007-08-15

Energy Prices State by State - Data Center Knowledge this puts a significant dent in my datacenters-to-Maine plan (tags: energy prices costs datacenters power via:isabel) Discovery Channel :: News - Animals :: Squirrels Outwit Rattlesnakes in...

Excerpt from tecosystems at

"The conclusion there seems to be pretty much that the benchmark had some flaws. :-)"

hehehe... naturally... either way, people need to pick a horse. Erlang, or Python. Not both.

Here’s the rub: Erlang kind of does away with the N-tier model, in favor of a cloud. Thus, a lot of the ways people write web apps these days are completely wrong for Erlang.

For example, if the back-end is Erlang, but the front-end is Ruby on Rails, you lose all the benefits of Erlang. There’s no way Ruby’s front end could keep up, and it would always limit the amount of data that gets to the Erlang “tier”. Any bridge between Erlang and another language/app/library will be the scalability bottleneck. Maybe that’s a minor issue for massive data processing, but not for things like Twitter.

Ideally, Erlang needs to be front-end AND back-end... no non-native libraries.

Posted by bex at

Is XMPP the right choice? I think that as an architecture, the answer is probably yes, but the existing implementations are more focussed on chat-style applications, not data exchange (or maybe that’s where REST comes in). At my company last year, we experimented with Jabber but found the “karma” systems in XMPP servers getting in the way of what we needed to push SOAP calls (REST was not and is not appropriate due to technology limitations) over the wire.

Couldn’t systems like Twitter’s be used for application presence and messaging, as viable alternatives to XMPP/Jabber stacks?

Posted by Austin Ziegler at

Is XMPP the right choice? ... we needed to push SOAP calls ... over the wire.

If this was SOAP/RPC (or even Doc/Lit with a typical Request/response use pattern), then no, I would not expect that XMPP would be the right choice for your needs.

Posted by Sam Ruby at

Sam Ruby: Long Bets Apologia

Erlang is pretty much the same way to me. I don’t care if you both think that the syntax of the language is uglier than PHP’s. I haven’t looked at the VM, nor is that what excites me. Note that I didn’t say Erlang. I said Erlang/OTP. I chal...

Excerpt from del.icio.us/tag/erlang at

Sam: it is used as an RPC-style mechanism (doc/lit request-response) with .NET and gSOAP (C++) where the “client” is a web server and the “server” is an agent; we needed to turn the command-and-control mechanism on its head, so we looked at XMPP. XMPP itself was the wrong choice for our solution, but the implementation we have would be architecturally familiar to anyone who already knows XMPP. (Since we’re not using XMPP directly, we optimized our implementation, but we still use presence, message identifiers, etc.)

Posted by Austin Ziegler at

Erlang on IBM blogs

I was checking up on Anant Jhingran’s blog and noted that he mentioned Erlang so going back up to the top realized that he was discussing a post from Sam Ruby . Sam is discussing a set of “long bets” although some seem pretty short bets to me. It...

Excerpt from Simon Johnston at

Peter Saint-Andre: Betting Long

Sam Ruby includes Jabber in his recent long bets for the five technologies that will be especially influential over the next five to ten years. Tim Bray concurs . Sam further explains : Jabber I didn’t get much pushback here, but I’m finding that...

Excerpt from Planet Jabber at

SIF & Jabber

Sam Ruby’s elaboration on his latest technological “Long Bets” caused me to lose a few hours of work yesterday. Ruby includes Jabber in his five “bets,” which reminded me of how much better suited the Jabber protocol is for doing the kind of...

Excerpt from Tuttle SVC at

Yet Another Programming Weblog: Sam Ruby and Erlang

I’m watching Sam Ruby’s conversion to Erlang with a mix of envy and skepticism. Sam Ruby is a typical technology sniffer ™ He is always up to date on web standards and languages. In fact, he creates some of them :) As...

Excerpt from Planeta Código at

Blogosphere (from JavaSPEKTRUM 05/07)

Mashups as EAI 2.0, Atom, REST, Hadoop, Jabber and Erlang/OTP: this time we use the blogosphere to venture a look into the crystal ball. Sam Ruby shows how to build elegant “Web Services” even without XML and SOAP. The...

Excerpt from JavaSPEKTRUM Blogosphäre at

Dynamo, Hadoop, Memcached, And Groundhog Day!

Apparently, just like Google, Amazon realized that silly things like transactions don’t scale , neither do auto-incrementing counters, nor table joins... so they made some kind of wacky database called Amazon Dynamo that doesn’t require them. The...

Excerpt from Bex Huff - ... technology, lifehacks, and all that good stuff at

On Exactitude in (Computer) Science

More thoughts on Sam’s Long Bets , with the emphasis on giving up on write consistency , and settle for incomplete, inconsistent views: Brian ‘Bex’ Huff : For example: what’s on the internet right... NOW? OK, how about... NOW? How about NOW?...

Excerpt from Boxes and Glue at
