Scripting News: Feedhose -- a firehose for feeds

By Dave Winer on Thursday, September 30, 2010 at 7:33 PM.

A strange confluence of events. I was writing a series about rebooting RSS. Then I had a bike accident that left me dazed and in pain. One evening I woke up and found I could sit for a few hours in front of the computer without pain. That was pretty cool, because even lying in bed was excruciating. My poor body was in need of some rest. Programming could provide it.

So there I was in front of the machine. Now what? I could write some code. But what?

I had been playing with long-polling for instant outlining, with excellent results. I wondered if the same could be done for feeds. I had found an OPML file on the NY Times site with all their feeds, so I hooked it up to River2. Now I had a relatively realtime flow of every story from the Times. So I wrote a minimal server that would wait until something new showed up and then return it. (Later I wrote a filter that eliminated duplicate titles, so even if a story appeared in two feeds it would only be shot out through the hose once.)
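The duplicate-title filter could look something like the following. This is only a sketch, not the actual Frontier code: the `TitleFilter` class, the `dedupe` function and the dictionary shape of an item are all hypothetical, and it assumes titles are compared by exact match.

```python
class TitleFilter:
    """Remembers titles already sent through the hose (hypothetical helper)."""

    def __init__(self):
        self._seen = set()

    def is_new(self, title):
        # Assumption: two items are duplicates when their titles match exactly.
        if title in self._seen:
            return False
        self._seen.add(title)
        return True


def dedupe(items, title_filter):
    """Return only the items whose title has not gone through the filter before."""
    return [item for item in items if title_filter.is_new(item["title"])]
```

Because the filter keeps its own set of seen titles, a story that shows up later in a second feed is still dropped, as long as the same filter object is reused.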

A small embellishment: let the caller give me a "seed" that indicates where he left off. If any new items came in while he was processing the last ones I returned, he wouldn't miss any. Every packet includes a seed which you can send back to me.
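Here is a rough in-memory sketch of how a long-poll wait with a seed might work. It is not the real server: the `Hose` class and its method names are made up for illustration, and the seed here is just an index into a list rather than the string the real server hands out.

```python
import threading


class Hose:
    """In-memory long-poll queue; the seed is a list index (an assumption)."""

    def __init__(self):
        self._items = []
        self._cond = threading.Condition()

    def publish(self, item):
        # Add an item and wake up every caller that is waiting.
        with self._cond:
            self._items.append(item)
            self._cond.notify_all()

    def wait_for_items(self, seed=None, timeout=3.0):
        """Block until something newer than seed exists, or until timeout.

        Returns (new_items, next_seed). With no seed, only items that
        arrive after the call starts are returned.
        """
        with self._cond:
            if seed is None:
                seed = len(self._items)
            self._cond.wait_for(lambda: len(self._items) > seed, timeout=timeout)
            return self._items[seed:], len(self._items)
```

Since the caller passes back the seed it got last time, anything published while it was busy is still waiting for it on the next call.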

Then I wrote a client. Then I added another stream, and broadened my client so it could handle multiple streams.

The result is a nice sweet little protocol that behaves like two familiar things:

1. A long-poll server.

2. RSS.

It's pretty cool. Let me know if you're writing some scripts to test it out. The format is described below.

Here's an example of a query:

http://hose.scripting.com/?name=nytimes&timeout=3

Click the link. It'll just sit there till it times out after 3 seconds or, if a new item comes in before the timeout, it will return with that item.
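The same query from a script might look like this. It is a minimal sketch using only the Python standard library; `build_query` and `fetch` are hypothetical helper names, and the extra ten seconds on the socket timeout is an arbitrary cushion so the client doesn't give up before the server answers.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

HOSE_URL = "http://hose.scripting.com/"


def build_query(name, timeout=None):
    """Build a hose URL from the name and timeout parameters shown above."""
    params = {"name": name}
    if timeout is not None:
        params["timeout"] = timeout
    return HOSE_URL + "?" + urlencode(params)


def fetch(name, timeout=3):
    """Make one long-poll request and return the raw response bytes."""
    url = build_query(name, timeout=timeout)
    # Assumption: allow the socket to wait longer than the server-side timeout.
    with urlopen(url, timeout=timeout + 10) as response:
        return response.read()
```

Calling `build_query("nytimes", timeout=3)` gives back the example URL above.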

Here's an example of a response with some items. There are two forks to the XML, items and metadata.

items are just a series of RSS items (or Atom entries, for feeds that are in Atom format). However, we add three elements to each item: feedUrl, feedTitle and feedLink. My app, on the other end of the hose, needed this info. Seemed reasonable to provide it since I had it there.
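A client could pull those three elements out of each item roughly like this. The exact XML layout isn't spelled out here, so the sample document below is a guess: the `feedhose` root element, the nesting and all the sample values are invented for illustration. Only the names items, feedUrl, feedTitle and feedLink come from the description above. Atom entries are not handled.

```python
import xml.etree.ElementTree as ET

# Made-up response; the root element name and all values are assumptions.
SAMPLE = """<feedhose>
  <items>
    <item>
      <title>Example story</title>
      <link>http://example.com/story</link>
      <feedUrl>http://example.com/feed.xml</feedUrl>
      <feedTitle>Example Feed</feedTitle>
      <feedLink>http://example.com/</feedLink>
    </item>
  </items>
</feedhose>"""


def parse_items(xml_text):
    """Return one dict per RSS item, including the three added elements."""
    root = ET.fromstring(xml_text)
    fields = ("title", "link", "feedUrl", "feedTitle", "feedLink")
    # findtext returns None when an element is missing from an item.
    return [
        {field: item.findtext(field) for field in fields}
        for item in root.iter("item")
    ]
```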

In metadata, there are two elements now -- I might add others in the future. seed is the one you need to include as a parameter in your next call. If the seed returned is xxx, then your next call would be http://hose.scripting.com/?name=nytimes&seed=xxx.
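Turning a response into the next call might be sketched as follows. The XML layout is again assumed (a seed element inside a metadata element, somewhere under a root whose name is made up), and `next_query` is a hypothetical helper. The seed is passed back untouched.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

HOSE_URL = "http://hose.scripting.com/"


def next_query(xml_text, name):
    """Read the seed from a response and build the URL for the next call."""
    root = ET.fromstring(xml_text)
    # Assumption: the seed lives at metadata/seed below the root element.
    seed = root.findtext(".//metadata/seed")
    params = {"name": name}
    if seed is not None:
        params["seed"] = seed
    return HOSE_URL + "?" + urlencode(params)
```

If the response has no seed, the sketch falls back to a plain query with just the name.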

The seed is supposed to be opaque, that is -- you shouldn't expect to understand its structure, but you can see mine is pretty straightforward. It's a walk of the hierarchy in my object database. Year-month-day-serialnum. So 2010-09-30-00651 is talking about the 651st news item on Sept 30 (today).

One more thing, I wrote a handler that returns the current seed in case you want to start off with a seed. You don't have to, but your code might be a little simpler if you have it to start out with. http://hose.scripting.com/seed

Update: I've released the source code for the Frontier tool that implements both sides of the protocol.