This page is spun off from AggregatorBehaviorRules. Moot or archived discussions can go in AggregatorApiArchived.


The purpose of AggregatorApi is to provide more efficient delivery of Necho/Atom data to clients. In particular, only changes will be delivered along the channel.

Functionally, the end result should be at least as powerful as polling "flat" XML files.

Getting up to speed with this page/where we are

This section will contain a summary of issues that have been resolved further down the page.

Things that we have agreed upon so far

Things that we have mostly agreed upon

Things that we are working on

Goals / Design Points

Issues to raise with respect to editing

These should bring to light other important things that are being swept under the rug at present, such as:

Aggregator use cases

We need a list of scenarios that we're trying to encompass in order to be able to accurately define the API operations we'll need to complete this work. Throughout we will ignore comments, trackbacks and so forth, since work is ongoing elsewhere to combine them with entries in the ConceptualModel.

Aggregator <-> SimpleProducer use cases

Efficient update of feed knowledge

An aggregator has an existing copy of a feed, which is probably out of date. We want to update the feed in a bandwidth-efficient fashion, ideally without being too processor-intensive.

(Note: to my knowledge, no one is currently putting forward a concrete proposal for an expiry mechanism. RSS has some solutions, and HTTP has an expiry mechanism which can give hints but isn't really designed for consuming user agents. The requirement is noted in AggregatorBehaviorRules.)
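As a concrete baseline, plain HTTP already supports a bandwidth-efficient check via conditional GET, which aggregators can use today even without any new API. A minimal sketch in Python (the feed URL below is a placeholder):

```python
import urllib.request

def build_conditional_request(url, etag=None, last_modified=None):
    # The server can answer 304 Not Modified (an empty body)
    # unless the feed has actually changed since the last fetch.
    req = urllib.request.Request(url)
    if etag:
        req.add_header("If-None-Match", etag)
    if last_modified:
        req.add_header("If-Modified-Since", last_modified)
    return req
```

An aggregator would store the ETag and Last-Modified values returned with the previous fetch and replay them on the next sweep; this only saves the body transfer, though, not the per-entry granularity discussed below.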

When considering a feed, the following changes can happen:

- a new entry appears
- an existing entry is modified
- an entry is deleted
- an entry expires

The last two are distinct: a deleted entry no longer exists (its permalink no longer functions); an expired entry is no longer considered 'recent' (ie: will no longer exist in the main feed, because it's too old / there are too many more recent entries).

I would argue that signalling expired entries isn't necessary here, because entries drop off the main feed largely to stop the feed growing so huge that transferring it isn't feasible. There seems no point in wasting bandwidth to say "this is no longer on my front page" (effectively).
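Given the change types above, the aggregator-side computation is straightforward if each side can be reduced to a map of entry id to last-modified timestamp. A hypothetical sketch (this representation is an assumption for illustration, not part of any spec):

```python
def diff_feed(old, new):
    # old, new: dicts mapping entry id -> last-modified timestamp,
    # for the aggregator's stored copy and the freshly fetched feed.
    added = [eid for eid in new if eid not in old]
    changed = [eid for eid in new if eid in old and new[eid] != old[eid]]
    deleted = [eid for eid in old if eid not in new]
    return added, changed, deleted
```

Note that with only the main feed to compare against, "deleted" and "expired" look identical to the client, which is exactly why the distinction above matters for any delta-delivery API.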

Aggregator <-> SuperProducer use cases

Efficient update of feed knowledge across multiple feeds

Similar to updating a single feed efficiently, if an aggregator is fetching several feeds from one site, it could combine the request for updates into one. Note that in the current EchoExample, there isn't a way of knowing what this feed is, so if they are all available for query via a single SuperAggregatorApi URL, there'll need to be a standard way of specifying them. (eg: [primary] URL of the main feed?)
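One possible encoding, purely as a strawman, identifies each feed by the [primary] URL of its main feed in a repeated form parameter (the parameter name `feed` is an assumption, not anything specified):

```python
from urllib.parse import urlencode

def multi_feed_query(feed_urls):
    # Repeat the (hypothetical) 'feed' parameter once per requested feed,
    # so a single request can cover every feed on the site.
    return urlencode([("feed", u) for u in feed_urls])
```

The resulting query string could be sent by GET or POST to the single SuperAggregatorApi URL.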

Efficient transfer of feed request

If we're asking for many feeds in one go, it would be nice not to have to explicitly specify the feeds each time.

Aggregator <-> SuperAggregator use cases

I think these are largely going to be the same as Aggregator <-> SuperProducer use cases. There may be some specific ones, which can go here ...

Use Authentication to provide additional feed data

Support for feed retrieval APIs by aggregators/news readers. A number of mainstream news organizations require registration (or subscription in some cases). These organizations would be more open to public feeds if the feed retrieval passed them the registration data that they apparently want to track.

A tag in the header of a feed could indicate that authentication is required, with an API for retrieving the feed with this additional data. One can envision pointing an aggregator at a site to subscribe to it and having a login dialog pop up: 'This site requires a user name and password.' The user enters them, and the aggregator stores this information with the subscription. Subsequent feed refreshes use the API to retrieve the feed rather than the unauthenticated HTTP request.

This would open up channels like the Wall Street Journal to providing feeds. It also provides the basis for a myYahoo feed.

[Refactored from a suggestion by JoshJacobs.]

Re-use of existing authentication mechanisms

For requesting the main feed document, it may be possible to use standard HTTP authentication mechanisms; certainly aggregators should implement these (as noted in AggregatorBehaviorRules) even if other solutions are preferable. Within the AggregatorApi, we can use the authentication mechanism of the AtomApi we build on. This should save us from having to invent new mechanisms, although we should ensure that any specification is clear on these matters, and that there are examples of such use.
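As one example of re-using a standard mechanism, here is a minimal sketch of attaching HTTP Basic credentials to a feed request (Basic sends credentials essentially in the clear, so it's only appropriate over a secure channel; the URL is a placeholder):

```python
import base64
import urllib.request

def add_basic_auth(req, username, password):
    # Standard HTTP Basic authentication: base64 of "username:password"
    # in the Authorization header, per the usual HTTP auth scheme.
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    req.add_header("Authorization", "Basic " + token)
    return req
```

The aggregator would build this header from the credentials it stored with the subscription, as described in the use case above.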

AggregatorApi Operations

We need only a few operations:

Discussion about what is returned

[KenMacLeod] I'm pretty much in favor of an API approach to querying (GET/POST with url-form parameters for subtyping the query). It should be noted, however, that some weblog software only produces static sites (FTP-hosted websites, for example), or there are cases where dynamic queries would be a burden and they'd prefer to snapshot certain types of queries (a la "a syndication feed"). In those cases, some sort of static fallback should be defined.

[JeremyGray RefactorOk] Joe's current spec includes an XML representation of search results which currently appears to return a list of entry identifiers instead of full entries. There's probably room (and situations) for either type of returned XML, so perhaps it might be best to try to use the same search mechanism but with added controls that select the type of returned XML. Any thoughts?
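A possible shape for such a control, purely illustrative (both parameter names below are assumptions, not anything in the current spec): the same search query carries a parameter selecting whether identifiers or full entries come back.

```python
from urllib.parse import urlencode

def search_query(terms, result_type="ids"):
    # result_type: "ids" for a list of entry identifiers,
    # "full" for complete entries. Both the 'q' and 'result-type'
    # parameter names are hypothetical.
    return urlencode([("q", terms), ("result-type", result_type)])
```

This keeps a single search mechanism while letting the client trade bandwidth against round trips.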

[DavidJanes RefactorOk] Check out the multiple feeds section of EchoFeed. I think this is what we're looking for in terms of a result string. I'm hoping to throw together a strawman as soon as I can get some spare time. I was thinking conceptually of three levels of service: level 0 -- flat files; level 1 -- incremental delivery, but basically flat files efficiently delivered; level 2 -- "something more complex", being able to do things like deliver an updated comment or metadata within an entry without resending the entire entry. You'll probably want a better explanation than this, but I'm rushing for work :)



The basic operations for the SuperAggregatorApi are

Subscription allows the client to select which feeds it is interested in, so the full list does not need to be sent every time. This could be a "real" operation or it could be some sort of "user preference" (debate?)

[MartinAtkins : RefactorOk] How does a client know whether it's dealing with a simple feed or an aggregate feed provider? Providing a UI in an aggregator to send subscription requests to a simple feed would be counter-intuitive to users, who shouldn't really have to know a great deal about what's going on under the hood.


Discussion about impact of multiple feeds

[JamesAylett RefactorOk] Probably allow different timestamps for different feeds, eg: <feed id='...' last-updated='...'/><feed id='...' last-updated='...'/>. My rationale here is that if I'm doing an hourly (say) sweep in my aggregator, and I updated one feed five minutes ago but there's another feed on the same URI I'm going to update anyway, I'd probably want to bundle it in just in case (where I wouldn't have bothered fetching the main feed again). This could happen with manual updates of feeds. Probably won't be needed for a SuperAggregator.
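A sketch of building that request fragment from the aggregator's stored state (the element and attribute names come from the suggestion above and are not a finalized format):

```python
def feed_update_request(feeds):
    # feeds: list of (feed_id, last_updated) pairs from the
    # aggregator's store; each becomes one <feed/> element so the
    # server can compute a per-feed delta.
    return "".join(
        f"<feed id='{fid}' last-updated='{ts}'/>" for fid, ts in feeds
    )
```

A real implementation would use a proper XML serializer to handle escaping; the string version is just to show the shape.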

Find me a home

[MartinAtkins : RefactorOk] I consider it good that people are starting to think higher-level than HTTP. HTTP works well for atomic entities, but something more fine-grained would definitely be a boon for Atom syndication, which has "entries", a smaller unit than the "feed". Part of this is realising that HTTP proxies aren't going to do as well as something more Atom-specific which has knowledge of the concepts of Atom and can cache at the entry level. See AggregateFeeds for (hopefully at some point) discussion on an aggregation/proxying layer for Atom.

[JamesAylett DavidJanes DeleteOk]

[FrançoisGranger DeleteOk] I think some of you probably already read this

CategoryArchitecture CategoryApi