mod speedyfeed

Garrett Rooney: What does it do?  In short, it allows you to only send new entries in your Atom feeds down to the clients. The client program adds a few HTTP headers (an If-Modified-Since to tell you what the last time they got was and an A-IM that indicates you support the 'feed' IM) and things just magically work.  Best of all, the content that's sent down, while smaller, remains valid Atom XML, so no real change is needed on the client side other than sending the new headers.

It looks to me that the delta required for clients that already support If-Modified-Since is to add exactly one header on the request: A-IM: feed.  This would work together with support for things like gzip, making the total request look something like this:

GET /asdf/atom.xml HTTP/1.1
Host: asdf.blogs.com
If-Modified-Since: Fri, 17 Sep 2004 00:18:36 GMT
Accept-Encoding: gzip
A-IM: feed

... with the last line being the only addition.  Note: compression techniques (like gzip) can be handled either as content-encoding or instance-manipulation.
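For client authors, the request above can be sketched in a few lines of Python. This is only an illustration: the function name `feed_request_headers` is made up for the example, and the resulting dictionary would be handed to whatever HTTP library the client already uses.

```python
from datetime import datetime, timezone
from email.utils import format_datetime

def feed_request_headers(last_fetch):
    """Return the headers for a conditional GET that opts in to the 'feed' IM.

    last_fetch is a timezone-aware UTC datetime of the previous successful fetch
    (how the client remembers it is up to the client; assumed here).
    """
    return {
        # HTTP dates are always expressed in GMT.
        "If-Modified-Since": format_datetime(last_fetch, usegmt=True),
        "Accept-Encoding": "gzip",
        # The one new line: tell the server we can handle a partial feed.
        "A-IM": "feed",
    }

headers = feed_request_headers(
    datetime(2004, 9, 17, 0, 18, 36, tzinfo=timezone.utc)
)
```

A client that already sends If-Modified-Since and Accept-Encoding only gains the final dictionary entry.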

Now, the question is: can we get one or more major hosting providers to deploy this filter (and patch)?  And clients to add this one line?

Update: Removed gzip from the A-IM example above.  It is legal to include it there, but for now let's settle on the example above as the recommendation.


Is there really a need for the 'A-IM' line? Is it just syntactic/protocol sugar? Forcing clients to change their code, especially when there are so many already in the wild, will greatly limit the effectiveness of this.

Posted by Mark Fletcher at

Mark, what amazes me is that we are less than a week away from FooCamp, and yet we are in a position to find out.

I do still firmly believe that a server installation is within its rights to vary the number of entries it sends back based on whatever criteria it chooses.

Posted by Sam Ruby at

The reason for the A-IM line is primarily (as far as I can tell) because this subtly changes the behavior of the feed, and you don't want to break clients who may have been relying on the fact that they used to get all the entries in a feed each time.  By requiring that the client specify A-IM: feed we force them to opt in to our little scheme.  It's not absolutely necessary, they'll still always get a complete, valid Atom feed each time they request one, it's just a little bit of backward compatibility paranoia at work.
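To make the opt-in concrete, here is a rough Python sketch of the decision a server could make. It is not the code in mod_speedyfeed (which is an Apache 2 filter); `select_entries` and the `(modified, entry_id)` pairs are hypothetical stand-ins for a parsed feed. The 226 status and the `IM` response header are what RFC 3229 defines for a response that has had an instance manipulation applied.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def select_entries(entries, request_headers):
    """Return (status, response_headers, entries_to_send).

    entries: list of (modified, entry_id) pairs with timezone-aware datetimes.
    request_headers: dict of request header names to values.
    """
    since = request_headers.get("If-Modified-Since")
    cutoff = parsedate_to_datetime(since) if since else None

    # Ordinary conditional GET: nothing has changed since the client's date.
    if cutoff is not None and all(modified <= cutoff for modified, _ in entries):
        return 304, {}, []

    # Only clients that list 'feed' in A-IM have opted in to partial feeds.
    a_im = request_headers.get("A-IM", "")
    tokens = [t.split(";")[0].strip().lower() for t in a_im.split(",")]
    if cutoff is None or "feed" not in tokens:
        return 200, {}, list(entries)

    newer = [(modified, entry_id) for modified, entry_id in entries if modified > cutoff]
    return 226, {"IM": "feed"}, newer

# Hypothetical feed with one old and one new entry.
entries = [
    (datetime(2004, 9, 10, tzinfo=timezone.utc), "old"),
    (datetime(2004, 9, 18, tzinfo=timezone.utc), "new"),
]
```

A client that never sends `A-IM: feed` falls through to the plain 200 branch and sees exactly what it saw before, which is the backward compatibility point being made here.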

Posted by Garrett Rooney at

Are there plans for an Apache 1.3 module?

Posted by Mark at

Unfortunately, it's not really possible to do what I'm doing in mod_speedyfeed in Apache 1.3.

The filters mechanism is one of the "cool new features" of Apache 2, and this is implemented as a filter.

I'm sure there are ways one could provide similar functionality in Apache 1.3, perhaps by serving up your feeds via a handler module or something, but it would probably require quite a bit more fiddling around to get up and running, where the current module "just works" for anything you serve up with the appropriate content type, with no real configuration at all.

That said, there's nothing keeping someone from doing something similar to this in Apache 1.3, I just don't see myself doing the work any time soon.

Posted by Garrett Rooney at

mod_speedyfeed

Rooneg: What does it do? In short, it allows you to only send new entries in your Atom feeds down to the clients. The client program adds a few HTTP headers (a If-Modified-Since to tell you what the last time they got was and an A-IM that indicates...

Excerpt from RSS at

Adding an "A-IM: feed" header to FeedDemon is simple enough.  Can anyone provide the URL of a feed which uses this technique so that I can test with it?

BTW, why is it necessary to add "gzip" to the A-IM header when "Accept-Encoding: gzip" has already been specified?

Posted by Nick Bradbury at

You shouldn't have to add the gzip to the A-IM header (in fact, if you do that it'll break mod_speedyfeed at the moment due to a bug in the way it parses the A-IM line, I'll fix it for the next release), it should work just fine with it specified in Accept-Encoding.

In fact, if you only specify it in A-IM it won't do any compression at all, since mod_speedyfeed doesn't know how to do compression, so you still need to put it in Accept-Encoding anyway, just to be safe.

Of course, I haven't actually tested it with compressed feeds, so I have no proof of this, but in theory...
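For what it's worth, a tolerant way to read the A-IM line might look like the Python sketch below. It is not the module's actual parser (that is C inside an Apache filter); `parse_a_im` and `SUPPORTED` are names invented for the example. The idea is simply to split on commas, drop any `;q=` parameter, and ignore methods the server does not implement rather than choke on them.

```python
# Instance manipulations this hypothetical server implements itself.
SUPPORTED = {"feed"}

def parse_a_im(value):
    """Split an A-IM header value into lowercase instance-manipulation names."""
    names = []
    for item in value.split(","):
        # Drop an optional ";q=..." parameter and surrounding whitespace.
        name = item.split(";", 1)[0].strip().lower()
        if name:
            names.append(name)
    return names

def manipulations_to_apply(value):
    """Keep only the requested methods the server knows how to perform."""
    return [name for name in parse_a_im(value) if name in SUPPORTED]
```

With this approach `A-IM: feed, gzip` still selects the feed manipulation, and the unsupported gzip is left for Accept-Encoding to handle.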

Oh, and unfortunately I don't currently have this up and running anywhere but my laptop.  I'll try to get a version up where people can play with it over the weekend.

Posted by Garrett Rooney at

kellan : mod speedyfeed - Interesting, tracking this for inclusion in Magpie....

Excerpt from HotLinks - Level 1 at

Nick Bradbury asked: "why is it necessary to add "gzip" to the A-IM header when "Accept-Encoding: gzip" has already been specified?"

It's not necessary, but would be an option. The distinction between "Accept-Encoding: gzip" and A-IM header with "gzip" is when the gzipping happens. If you use "Accept-Encoding" then compression is done after all instance manipulation as part of preparing the content-encoding for the instance. If you request gzip as part of instance-manipulation, then other IM methods can be applied after the gzip is done. (See: The flowchart in RFC3229 on page 10 and the discussion of the relationship between deltas and ranges in section 4.1)

For instance, the following says: "Do feed instance-manipulation first, then compress the result with gzip."

GET /atom.xml HTTP/1.1
Host: pubsub.com
If-None-Match: "123xyz"
A-IM: feed, gzip

The same result can be obtained by asking for:

GET /atom.xml HTTP/1.1
Host: pubsub.com
Accept-Encoding: gzip
If-None-Match: "123xyz"
A-IM: feed

However, the transfer of the response may have been terminated before it was complete. Thus, RFC3229 allows you to come back again and only get what you didn't get when you tried it the first time. Imagine that you had only received the first 1000 bytes of a larger response when the network connection was broken. You would then make a new request asking for the remainder with something like this:

GET /atom.xml HTTP/1.1
Host: pubsub.com
If-None-Match: "123xyz"
A-IM: feed, gzip, range
Range: bytes=1000-

What that request tells the server is: "First do feed instance-manipulation, then compress using gzip, then only send me the data starting at byte 1000."
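The ordering can be illustrated with a small Python sketch that applies the listed methods left to right, following the reading given above. Everything here is an assumption made for illustration: `apply_in_order` is a made-up name, the full and delta feeds are passed in as precomputed bytes, and gzip is called with `mtime=0` so that a retried request compresses to the same bytes as the interrupted one.

```python
import gzip

def apply_in_order(full_body, delta_body, a_im, range_start=0):
    """Apply the instance manipulations named in a_im, left to right.

    full_body / delta_body: the complete feed and the new-entries-only feed.
    Returns the resulting bytes and the list of methods actually applied.
    """
    body = full_body
    applied = []
    for name in (part.strip().lower() for part in a_im.split(",")):
        if name == "feed":
            body = delta_body                    # keep only the new entries
        elif name == "gzip":
            body = gzip.compress(body, mtime=0)  # deterministic output
        elif name == "range":
            body = body[range_start:]            # bytes of the compressed form
        else:
            continue                             # unknown methods are ignored
        applied.append(name)
    return body, applied

# Hypothetical feed bodies for the example.
full = b"<feed>" + b"<entry>old</entry>" * 50 + b"<entry>new</entry></feed>"
delta = b"<feed><entry>new</entry></feed>"
rest, applied = apply_in_order(full, delta, "feed, gzip, range", range_start=10)
```

Because the range is taken after compression, the client can append `rest` to the 10 bytes it already holds and decompress the whole thing, which is why gzip has to be an instance manipulation in this case rather than a content-encoding applied at the very end.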

As far as I can tell, the only time you would really need to specify gzip as an instance-manipulation method is if you had some sort of instance manipulation that you wanted to do after gzip had been applied. The most common case would be asking for a byte-range as in the example above. Since this is rare, I assume that the vast majority of clients would just rely on the tried-and-true "Accept-Encoding: gzip" header. However, a client that knew how to process byte-ranges would probably specify gzip in both ways and then let the server decide what to do. Servers that could process ranges would indicate that they had done so by including "gzip, range" in the IM headers of their responses. Other servers would ignore gzip and range in the A-IM header and respond in the normal fashion according to the "Accept-Encoding" header. A sample "do it either way" request would look like:

GET /atom.xml HTTP/1.1
Host: pubsub.com
If-None-Match: "123xyz"
Accept-Encoding: gzip
A-IM: feed, gzip, range
Range: bytes=1000-

In summary: specifying gzip as an instance-manipulation method isn't necessary but could be useful in some cases.

bob wyman

Posted by Bob Wyman at

Universal Feed Parser 3.4 will send the appropriate HTTP headers to request partial feeds: [link]

Posted by Mark at

mod_speedyfeed

Sam Ruby talks about an Apache modification which allows only parts of feeds to be downloaded instead of the whole thing....

Excerpt from Neil's World at


Bob, thanks for the in-depth explanation.  I plan to support "A-IM: feed" in the next build of FeedDemon.

Posted by Nick Bradbury at

As a side note, I'm curious as to how modified items in RSS should be handled.  For example, if an item published on Sept 10 is modified on Sept 15 and the client requests items posted since Sept 15, I'm assuming that it would be included in the delta feed.  However, a desktop aggregator would need to know the item has been modified - but unlike Atom, RSS doesn't have a 'modified' element.

Posted by Nick Bradbury at

Can RFC3229 ease RSS's bandwidth consumption?

Bob Wyman of PubSub has been talking about using RFC3229 to reduce RSS bandwidth requirements, and as Sam Ruby points out, adding support for this in a client-side RSS aggregator is incredibly simple. After looking over the spec and Wyman's...

Excerpt from Nick Bradbury at


The Bloglines crawler has been updated to include the 'A-IM: feed' HTTP header line.

Posted by Mark Fletcher at


RSS/Atom Feed Bandwidth Minimization

Bob Wyman recently had an interesting idea about using RFC 3229 in order to minimize the large bandwidth utilization of news feeds. After Sam Ruby investigated the matter, it seems that supporting it would be quite simple in most feed readers....

Excerpt from meeeep at

Cutting Download Size for Atom Feeds

Through Sam Ruby, a pointer to mod_speedyfeed, an Apache 2 module that allows feedreading clients to only download the entries that have changed since the last access instead of the entire RSS file.... [more]

Trackback from Windley's Enterprise Computing Weblog at

Cutting Atom Feeds Down to Size

Phil Windley: Through Sam Ruby, a pointer to mod_speedyfeed, an Apache 2 module that allows feedreading clients to only download the entries that have changed since the last access instead of the entire RSS file. This could cut the transfer amount...

Excerpt from Jeff's Radio Weblog at


Updates on RFC3229 with WordPress

Bob Wyman has a summary of RFC3229 events of late, and Sam Ruby comments on Garrett Rooney's mod_speedyfeed. Anthony Yager had a great suggestion for RFC3229 with Feeds for WordPress which removes an assumption I had made. Thanks. Latest source....

Excerpt from The Robinson House at


RSS Bandwidth Strategies

Much concern, hand wringing and advice on RSS bandwidth issues lately. (see Regular Sucking Schedule, and HowTo RSS Feed......

Excerpt from LaughingMeme at


Overhead

Cutting Atom Feeds Down to Size. Through Sam Ruby, a pointer to mod_speedyfeed, an Apache 2 module that allows feedreading clients to only download the entries that have changed since the last access instead of the entire RSS file. This could cut...

Excerpt from Pushing rectangles... at

Nice try, but...

In Sam Ruby’s comments I came across this proposed efficiency for Atom (and presumably RSS) feeds: mod-speedy-feed. From the description: Best of all, the content that’s sent down, while smaller, remains valid Atom XML, so no real change is needed...

Excerpt from Smalltalk Tidbits, Industry Rants at


Feed Deltas: What's Changed?

When searching/fetching items through the EUtils interface to PubMed, you can specify an ‘earliest’ date (the mindate parameter), meaning that it’s easy to fetch all the items added since a particular time. This is a feature missing from most feed...

Excerpt from HubLog: Feed Deltas: What's Changed? at
