RestEchoApiOneUriForEachEntry


One of the biggest differences between Tim Bray's proposal and the draft RFC is that Tim's assumes each Echo Entry has its own URI. I didn't make that assumption in my RFC, and had to change the interface to accommodate that. I think this is a great point of discussion, and if we can get a consensus that each Echo Entry deserves its own URI then I think the API can get even more elegant.

This brings up two distinct questions:

1. Should each Echo Entry have its own URI, one you can do a GET on to retrieve the XML?
2. Should the RestEchoApi allow PUT/POST/DELETE on that URI as part of the API?

[TimBray] To the questions: yes and no. Yes, entries should have URIs. Why? Check the Architecture of the Web draft: the single most important web-architectural principle is that things that matter should have URIs. No, there's no need to support HTTP verbs other than GET for the individual entry URIs. Why not? It's harder and I haven't yet seen any real practical advantages. Route the transactions through the URI of the publication.

[DannyAyers] On that last point - why is it harder? Won't the material at the publication URI have to be unmarshalled before processing and routed/marshalled again after, whereas per-item is direct?

[TimBray] Well, when I create a new post or update an old one, in most publishing systems I'm going to want to route that through my publishing-system logic, right? To build a search index or apply security rules or build a category/date directory or create an AvantGo version, all sorts of things. So since it's really the publishing system I'm asking to do the work, it seems like I ought to post to the publishing system. Put another way, the publishing system is going to want to intercept PUT and DELETE requests against individual entries anyhow, so why not just send them there? As to why it's harder: it's trivial to set up software to watch a single URI, receive chunks of XML posted to it, and do something smart based on what it got; it's more work to configure things so that PUTs and DELETEs against all the URIs in your webspace get routed to your software. Not impossible, but (a) trickier, (b) kind of deceiving since it's really the pub sys doing the work and (c) doesn't really have any advantages that I can see.

[DaveWarnock RefactorOk] 1. Yes, that includes all types of entry (so it includes comments)

[AsbjornUlsberg, RefactorOk] What I seem to miss is why the publishing system has to have one URI and can't handle all URIs on a web site. With mod_rewrite on Apache, it's ridiculously easy to intercept all requests and rewrite each into another URI, which the publishing system can receive. E.g. "http://www.example.com/cars/opel/123.html" can be translated in mod_rewrite to "http://www.example.com/publish.cgi?category=opel&article=123".
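The translation described above can be sketched outside Apache as well. The following is only an illustration of the mapping, not mod_rewrite configuration; the `/cars/<category>/<article>.html` path pattern and the function name `rewrite` are assumptions made for this example.

```python
import re
from urllib.parse import urlencode, urlsplit

# Hypothetical public URL layout: /cars/<category>/<article>.html
ENTRY_PATH = re.compile(r"^/cars/(?P<category>[^/]+)/(?P<article>\d+)\.html$")


def rewrite(url):
    """Map a public entry URL onto the publishing script's URL.

    URLs that do not match the assumed layout are returned unchanged.
    """
    parts = urlsplit(url)
    match = ENTRY_PATH.match(parts.path)
    if match is None:
        return url
    query = urlencode({"category": match["category"], "article": match["article"]})
    return f"{parts.scheme}://{parts.netloc}/publish.cgi?{query}"


print(rewrite("http://www.example.com/cars/opel/123.html"))
```

The point of the sketch is that the entry keeps its own public URI while a single script still does all the work.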

Also I'm worried about entries and other resources which can't be retrieved over HTTP. The reason for the unretrievability may be a firewall, or that the resource was produced in an off-web environment (e.g. MS Word), etc. Therefore I state that we cannot require Echo resources to have URIs, but we MUST require them to have a globally unique ID, à la the Message-IDs of NNTP articles.
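As a rough illustration of what a Message-ID style identifier looks like, Python's standard library can generate one. This is a sketch only; the function name `new_entry_id` and the domain are assumptions, and nothing here is a proposed Echo ID format.

```python
from email.utils import make_msgid


def new_entry_id(domain):
    """Return a Message-ID style identifier such as <unique-part@domain>."""
    return make_msgid(domain=domain)


print(new_entry_id("example.com"))
```

Such an ID identifies an entry without promising that anything can be fetched from it, which is the distinction being drawn above.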

2. More complicated.

I thought PUT is normally suggested for new and replacement content (i.e. you PUT a document, because GET transfers the state to the client and then you PUT the new representation back to the server). Purest REST seems to suggest PUT for both new entries and edited entries, but this does not fit well with a server-allocated URI for new entries (unless we add an extra step to first ask for the URI to be used, which does not seem sensible here). Therefore our options are:

a) use PUT for edit and POST for new
b) use PUT for new and edit, but before new use POST to ask for the URI
c) use POST for new and edit

Of these, c) seems the simplest.

DELETE has fewer problems: it is clear what you want to do and the URI exists. The only problem is that apparently some tools don't support DELETE and many programmers are not used to using it. So we should use DELETE if we wish to be evangelists for REST with strong opinions about doing the right thing; otherwise, for pragmatic reasons, we should use POST to get going quicker.
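To compare options a), b) and c) side by side, here is a small sketch that lists the requests each one needs. The target names "publication" (the weblog's posting URI) and "entry" (the URI of one entry) are labels assumed for this illustration, as is the idea that new entries are POSTed to the publication URI.

```python
# Each step is (HTTP method, hypothetical target).
OPTIONS = {
    "a": {"new": [("POST", "publication")], "edit": [("PUT", "entry")]},
    "b": {
        # POST first to be allocated a URI, then PUT the entry to that URI.
        "new": [("POST", "publication"), ("PUT", "entry")],
        "edit": [("PUT", "entry")],
    },
    "c": {"new": [("POST", "publication")], "edit": [("POST", "entry")]},
}


def requests_needed(option, action):
    """Number of HTTP requests the client sends for one action."""
    return len(OPTIONS[option][action])


def methods_used(option):
    """Set of HTTP methods a client must support for this option."""
    return {method for steps in OPTIONS[option].values() for method, _ in steps}


for name in sorted(OPTIONS):
    print(name, requests_needed(name, "new"), sorted(methods_used(name)))
```

Seen this way, b) costs an extra round trip for every new entry, and c) is the only option a POST-only client can use.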

Discuss

[TimBray] Timothy Appnel wrote me to point out that when you create an entry, all you need to get back with the 201 Created code is the new URI. Lighter-weight, simpler. If you want to get the whole thing, that's what GET is for. I suspect TimA will blog it.
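A minimal sketch of that exchange follows, assuming the new URI travels in the `Location` header, as is conventional for 201 responses. The `EntryStore` class, its in-memory storage and the sequential numbering of entries are all invented for illustration.

```python
from http import HTTPStatus
from itertools import count


class EntryStore:
    """Toy in-memory store that allocates a URI for each new entry."""

    def __init__(self, base_uri):
        self.base_uri = base_uri.rstrip("/")
        self.entries = {}
        self._ids = count(1)

    def create(self, entry_xml):
        """Store the entry; answer with 201, the new URI and an empty body."""
        uri = f"{self.base_uri}/{next(self._ids)}"
        self.entries[uri] = entry_xml
        return HTTPStatus.CREATED, {"Location": uri}, b""

    def get(self, uri):
        """Return the stored XML, or 404 if the URI is unknown."""
        if uri not in self.entries:
            return HTTPStatus.NOT_FOUND, {}, b""
        return HTTPStatus.OK, {"Content-Type": "application/xml"}, self.entries[uri]


store = EntryStore("http://example.org/entries")
status, headers, body = store.create(b"<entry/>")
print(int(status), headers["Location"])
```

The client that wants the full entry back simply follows up with a GET on the returned URI.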

GeorgBauer

RefactorOk

[TimBray] For posting comments, why not use the same URI that's used for editing, just as you do for deleting? Then you can just do

...

</comment>

You could have some optional trackback-like fields identifying the source of the comment.

[JeremyGray] If CommentsAreEntries and the back-pointing reference is expressed via some attribute or child element of the <entry> then is it really necessary to wrap the <entry> above in a <comment>? Further, if TrackBacks were to be treated consistently with this then the only difference between a TrackBack and a Comment would be the location to which the new entry is submitted relative to the location of the entry to which it refers, in that a TrackBack is hosted at its author's normal publishing location and a Comment is hosted along with the entry to which it refers.

[NickChalko RefactorOk] I agree because CommentsAreEntries.

[DannyAyers RefactorOk] But this doesn't look much like the syntax used elsewhere, i.e. CommentEntryExample. See related comments in RestEchoApiDiscuss on syntax reuse.

[DaveWarnock RefactorOk] We have consensus that CommentsAreEntries. Tim, you have pointed out above that every entry should have a URI and we post to it. Therefore when we post a comment, it is exactly the same as posting an entry. The 201 is for the comment URI. We post to the comment URI to edit or delete it. Provided other things like TrackBack are also implemented as entries, we have an absolutely simple and consistent API just by taking what has been suggested for entries.

[MishaDynin RefactorOk] Blogger has distinct notions of posting (adding the post to the database), publishing (posting, rendering pages, and uploading them), and syndication (posting and emailing them out). How do I capture this with REST?

[JoeGregorio] Misha, is the state of an entry (posted, published, syndicated) something that the user would change multiple times? Is it something they would like to see when they are editing an Entry?
[MishaDynin] No. These are distinct actions that are independent of a state: when you create an entry you are "posting" it, optionally emailing it out to an address specified in blog settings, and optionally initiating a publish request for the entire blog. The state is not stored beyond the transaction (with the exception of "last published" timestamp for a blog.) The web UI has separate "post" and "post & publish" buttons (there's no per-post syndication control in the web UI.)
Posted but unpublished entries are visible in the editing area but not displayed externally. This is a peculiarity of Blogger.
[JoeGregorio] So "posting" applies to an Entry while publishing (the rendering and emailing part) applies to the whole weblog?

[JustinErenkrantz RefactorOk] I disagree with TimBray's comment that it is harder to have a single URI receive chunks of XML posted to it than to issue DELETE directly to the URIs of the entries themselves. In an Apache httpd-centric implementation, you can use a handler/script that relies on r->path_info, which is the 'unprocessed' components of the URI. (Most other HTTP servers have similar mechanisms.) Apache HTTP Server's mod_mbox uses this technique to its advantage - it registers the 'root' location and then different methods could be handled or virtual locations served. Therefore, I don't believe this is a significant challenge and such a setup would allow clean matching with the DELETE semantics of HTTP.
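The same idea can be sketched outside Apache with a WSGI application, where `PATH_INFO` plays a role similar to `r->path_info`: one handler is mounted at a root location and the rest of the URI selects the entry. The entry paths and in-memory `ENTRIES` dictionary below are assumptions for illustration; this is not how mod_mbox itself is written.

```python
# Hypothetical entry store keyed by the path below the handler's root.
ENTRIES = {"/2003/07/07/entry14": b"<entry/>"}


def entry_app(environ, start_response):
    """Serve GET and DELETE for any entry path below one mount point."""
    path = environ.get("PATH_INFO", "")
    method = environ["REQUEST_METHOD"]
    if path not in ENTRIES:
        start_response("404 Not Found", [("Content-Length", "0")])
        return [b""]
    if method == "GET":
        body = ENTRIES[path]
        start_response("200 OK", [("Content-Type", "application/xml"),
                                  ("Content-Length", str(len(body)))])
        return [body]
    if method == "DELETE":
        # The publishing logic (reindexing and so on) would hook in here.
        del ENTRIES[path]
        start_response("204 No Content", [])
        return []
    start_response("405 Method Not Allowed", [("Allow", "GET, DELETE")])
    return [b""]
```

A single registered handler thus sees every method against every entry URI, so the publishing system still intercepts each request.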

[TomasJogin, RefactorOk] What if the software that you use to manage your weblog is constrained to a cgi-bin directory only? I mean, is this a far-fetched not-gonna-happen scenario? One which shouldn't be taken into consideration in the making of this specification? Because, if your hosting provider restricts scripts to certain directories, you are going to have to route publishing operations through a couple of specific scripts in a specific location, not just any URL on the weblog. Furthermore, not all entries have a distinct URL, either. Some are just a-names on a daily, weekly or monthly archive (/weblog/archives/20030707.html#entry14). With that kind of setup, too, you have to route publishing operations through certain scripts, not just send PUT, POST, DELETE or whatever to the permalink URL in question.
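The a-name case can be made concrete: the fragment after `#` is not part of what an HTTP client sends to the server, so a request aimed at such a permalink addresses the whole archive page. A short sketch using the example permalink above (the function name `request_target` is made up for this example):

```python
from urllib.parse import urldefrag


def request_target(permalink):
    """Split a permalink into what the server sees and the client-side fragment."""
    url, fragment = urldefrag(permalink)
    return url, fragment


target, fragment = request_target("/weblog/archives/20030707.html#entry14")
print(target)    # the archive page, shared by every entry of that day
print(fragment)  # the entry anchor, which never reaches the server
```

So a DELETE against that permalink could not, on its own, tell the server which entry on the page was meant.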

Should you be limited to a cgi-bin directory, the simple solution is to put everything (your weblog and related resources) in that directory. Example URIs in this scenario:

http://wellformedweb.org/news/SixPlusOne

[AsbjornUlsberg, RefactorOk] The situation today may be that feed resources (entry, content etc.) don't each have a distinct URI attached to them, but I think this can't continue. I can live with <content> not having a URI (except <content> with "src" of course), but <entry> should be retrievable.

As I've stated earlier, not all Echo feeds may be retrievable over HTTP. We may decide that this is unacceptable behavior, but as we haven't yet, we have to consider it. In such cases, each resource should at least have a globally unique ID, so it can be identified by other systems. But if a feed is retrievable over HTTP, all sub-entries of that feed should be as well. And the retrievable URI should of course point to an Echo XML version of the resource, not an HTML version. If the resource has alternative views (as HTML is), this should somehow be noted in the Echo XML.

JeremyGray

[AsbjornUlsberg] What I mean by the "alternative view" proposal is that the situation today -- where you have no clue what you will find at the end of a URL pointing to an entry -- is unbearable. We have to set a standard for what format to expect at the end of a URL, and the format should of course be Echo/Atom/Whatever XML. Every view of an entry or other resource other than this XML view is an alternative and should be treated accordingly.

It's possible, if not very likely, that resources have many views other than the Echo/XML view. HTML is one of them, but in a transitional period they may have different RSS views as well. Plain text is another view. Binary PDF or DOC is another. And the list goes on. The point is that all formats other than Echo/XML are useless in an Echo environment and context, and should therefore not be allowed as the resource format at a default URL. All views or formats of a resource other than Echo/XML are alternative views and can be ignored by a consuming aggregator. The Echo/XML view, on the other hand, MUST be understood (of course).

This "alternative view" idea can be stretched further: every consuming aggregator service can add a URL for its version of a resource to the resource, so that aggregators of consuming aggregators can go to the closest available aggregator service to get the resource. This can quite easily be achieved by just plugging in a <link rel="alternative" href="..." />. Every consuming aggregator plugs this into the entry (or feed, or any other resource) somewhere to indicate: "We have this resource. Please visit us at this address to get it!"
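A rough sketch of what an aggregator would do under this proposal, using Python's standard XML library. The entry markup, the aggregator URL and the function name `add_alternative_link` are placeholders; only the `<link rel="alternative" href="..." />` shape comes from the suggestion above.

```python
import xml.etree.ElementTree as ET


def add_alternative_link(entry_xml, href):
    """Append a <link rel="alternative"> element pointing at this aggregator's copy."""
    entry = ET.fromstring(entry_xml)
    ET.SubElement(entry, "link", {"rel": "alternative", "href": href})
    return ET.tostring(entry, encoding="unicode")


original = "<entry><title>Hello</title></entry>"
print(add_alternative_link(original, "http://aggregator.example/entries/1"))
```

Each aggregator in a chain would append its own link, so a downstream consumer could pick whichever copy is closest.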

[JasonHx] +1 on entries having URIs. Also, to increase wikibility, I'd like to encourage '/' as the virtual file system path indicator in an echo-wiki-space URI and '#' to anchor named links inside a given entry's <content> payload to maintain the dimensions of expression a la HTML and more importantly for API design, to allow for XPath-like functional axes across entries. See Containers for more on this.

A recent demonstration of a Get / Post / Put / Delete model is Syncato.

[AsbjornUlsberg] +1, Jason.

CategoryArchitecture, CategoryModel, CategoryApi, CategoryRest