HowTo: Implementor's guide to rssCloud

Last update: Friday, October 16, 2009, 11:47:07 AM.

This document shows how RSS 2.0's <cloud> element can be used to connect a loosely-coupled Twitter-like network of people and 140-character status messages.

We walk through an implementation.

By design, each of the elements can and hopefully will be replaced by all variety of tools for different platforms, commercial and open source, for desktops, laptops, netbooks, cell phones, wrist watches, car computers, whatever.

Our focus has been on simplicity, being open (subject to replacement and user choice) and scaling.

I've provided an implementation of rssCloud, running on Amazon EC2, which serves as the example throughout this doc.

Three-sided cloud

There are three sides to the cloud:

1. The authoring tool. I edit and update a feed. It contains a <cloud> element that says how a subscriber requests notification of updates.

2. The cloud. It is notified of an update by the authoring tool, and then in turn notifies all subscribers.

3. An aggregator. Subscribes to feeds that may or may not be part of a cloud.

What they call real-time

[Figure: schema.gif]

1. The Writer gets an idea.

2. He or she enters it into the authoring tool and saves; it goes to a file, a feed.

3. The authoring software sends an Update ping to the Cloud (which is just a bit of software running on EC2).

4. The Cloud checks to see if anyone is subscribing to the Writer, and finds that indeed the Aggregator is.

5. He updated! says the Cloud to the Aggregator.

6. The aggregator then reads the feed, finds the new stuff and informs the Reader.

All this happened in less than a second!

That's what they call Real-time.

The authoring tool

The user enters individual entries of 140 characters or less.

My tool is called LifeLiner. It's the analog of Twitter authoring tools Tweetdeck, Seesmic, Tweetie, etc. In fact, I would love it if those tools would support rssCloud in addition to supporting Twitter. That way people could use one or both and all connect through the authoring tools. Everyone gets to play. Users have choice.

Each item can have an enclosure, a media object such as a photo, audio podcast or movie, using the RSS 2.0 <enclosure> element. The URL of the media object does not take up space in the 140 characters.

Each item can be tagged with one or more RSS 2.0 <category> elements. This does something similar to what hashtags do in Twitter, but without using up space in the 140 characters.

The feed includes a <cloud> element that defines a server that the aggregator can call to request notification of updates to the feed.
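As a rough sketch of what an authoring tool might emit, here is a minimal RSS 2.0 feed built in Python with one item that carries a category and an enclosure. The <cloud> attribute values are the ones this guide gives for rpc.rsscloud.org; the feed title, links and item contents are made up for illustration, and this is not how LifeLiner itself generates its feed.

```python
import xml.etree.ElementTree as ET

def build_feed(title, link, items):
    """Build a minimal RSS 2.0 feed whose channel carries a <cloud> element."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = title
    # Values taken from the <cloud> element quoted in this guide.
    ET.SubElement(channel, "cloud", {
        "domain": "rpc.rsscloud.org",
        "port": "5337",
        "path": "/rsscloud/pleaseNotify",
        "registerProcedure": "",
        "protocol": "http-post",
    })
    for entry in items:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "description").text = entry["text"]
        # Tags and media live in their own elements, not in the 140 characters.
        for tag in entry.get("tags", []):
            ET.SubElement(item, "category").text = tag
        if "enclosure" in entry:
            ET.SubElement(item, "enclosure", entry["enclosure"])
    return ET.tostring(rss, encoding="unicode")

# Hypothetical feed and item, for illustration only.
feed_xml = build_feed(
    "Example status feed",
    "http://example.com/",
    [{
        "text": "Testing rssCloud.",
        "tags": ["rssCloud"],
        "enclosure": {"url": "http://example.com/photo.jpg",
                      "length": "12345", "type": "image/jpeg"},
    }],
)
```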

There are two possible ways to notify the cloud of an update to the feed:

1. You can ping the cloud after you save the file. It's essential that you save the file first, since the cloud may verify the update before notifying subscribers.

2. You can use the cloud to save the RSS, and combine storage and notification into one operation.

My implementation of rssCloud offers both options.

Here's my feed, hooked into the cloud at rpc.rsscloud.org.

The cloud

This is the software that connects everything together.

Pinging via XML-RPC

All XML-RPC messages to my cloud are sent to: http://rpc.rsscloud.org:5337/RPC2.

Here's an example call:

["xmlrpc://rpc.rsscloud.org:5337/RPC2"].rssCloud.ping ("http://scripting.com/rss.xml")

This means call the procedure named rssCloud.ping at http://rpc.rsscloud.org:5337/RPC2 with a single parameter, the URL of the Scripting News feed. From now on, we'll abbreviate this as:

[server].rssCloud.ping (url)

When the cloud receives a ping, it verifies that the feed has changed by reading the feed and comparing a hash of its contents with the previous hash. If the feed has changed, the cloud notifies the subscribers.
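The change check can be sketched in a few lines of Python. The choice of SHA-256 and the rule that a never-seen feed counts as changed are assumptions made for this sketch; the guide does not say which hash the cloud uses or how it treats a first ping.

```python
import hashlib

class ChangeDetector:
    """Remember a hash of each feed's contents and report when it differs."""

    def __init__(self):
        self._last_hash = {}  # feed URL -> hex digest of the last contents seen

    def has_changed(self, url, contents: bytes) -> bool:
        digest = hashlib.sha256(contents).hexdigest()  # hash choice is an assumption
        changed = self._last_hash.get(url) != digest   # unseen feed counts as changed
        self._last_hash[url] = digest
        return changed
```

A cloud built this way would fetch the feed on each ping, pass the bytes to has_changed, and notify subscribers only when it returns True.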

If your feed is known by many URLs (for example, mine can also be reached at www.scripting.com), you must ping for each of them.
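For readers who prefer real code to the [server].procedure notation, here is a sketch of the ping using Python's standard xmlrpc.client module. ping_payload only builds the XML that goes over the wire, so it can be inspected without touching the network; ping_all makes the actual calls, one per URL. What the ping returns is not described in this guide, so the sketch simply passes it back.

```python
import xmlrpc.client

CLOUD_RPC_URL = "http://rpc.rsscloud.org:5337/RPC2"

def ping_payload(feed_url):
    """Return the XML-RPC request body for rssCloud.ping (no network access)."""
    return xmlrpc.client.dumps((feed_url,), methodname="rssCloud.ping")

def ping_all(feed_urls, server_url=CLOUD_RPC_URL):
    """Ping the cloud once for every URL the feed is known by."""
    server = xmlrpc.client.ServerProxy(server_url)
    return [server.rssCloud.ping(url) for url in feed_urls]

# Usage (performs network calls, so it is left commented out):
# ping_all(["http://scripting.com/rss.xml", "http://www.scripting.com/rss.xml"])
```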

Saving the feed through the cloud

This is the equivalent of the feed storage functionality of FeedBurner (we don't have any stats gathering).

When your feed is updated, instead of pinging, send us the text and we'll save it and return the URL.

[server].rssCloud.saveRss (username, password, rsstext)

You must send a valid username/password combination for identi.ca. If your name on identi.ca is "bull", your feed will be saved at:

http://static.lifeliner.org/bull/rss.xml

Identi.ca is a free service operated by Control Yourself of Montreal, Canada. They are the developers of the Laconi.ca open source microblogging service.

If the save works, the return value is the public URL of your feed. If it failed, the call throws an error.
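A possible client-side wrapper for saveRss, again using xmlrpc.client. It assumes that "throws an error" shows up on the client as an XML-RPC fault, which is how most XML-RPC libraries surface server errors but is not spelled out in this guide. The server argument is anything that behaves like xmlrpc.client.ServerProxy.

```python
import xmlrpc.client

def save_feed(server, username, password, rss_text):
    """Save a feed through the cloud.

    Returns (public_url, None) on success or (None, error_message) on failure.
    """
    try:
        url = server.rssCloud.saveRss(username, password, rss_text)
        return url, None
    except xmlrpc.client.Fault as fault:
        # Assumption: a failed save arrives as an XML-RPC fault.
        return None, fault.faultString

# Usage (performs a network call, so it is left commented out):
# server = xmlrpc.client.ServerProxy("http://rpc.rsscloud.org:5337/RPC2")
# url, error = save_feed(server, "bull", "password-here", rss_text)
```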

Requesting notification using REST

The aggregator requests notification based on the information in the <cloud> element in the feed. There's a lot of connecting going on here, so pay close attention; you may have to read it two or three times to get what's going on.

First, the <cloud> element says what protocol you must use to connect with the server. There's only one choice for this, for each server. My server, rpc.rsscloud.org, uses REST. I know that will surprise some people because I'm such a fan of XML-RPC. I wanted to show everyone that I'm flexible.

So I'll explain the REST interface first.

Here's what the <cloud> element in my feed says:

<cloud domain="rpc.rsscloud.org" port="5337" path="/rsscloud/pleaseNotify" registerProcedure="" protocol="http-post" />

What this means, in English, is: to request notification of updates for this feed, do an HTTP POST to:

http://rpc.rsscloud.org:5337/rsscloud/pleaseNotify

When you POST to that address, supply these parameters:

1. notifyProcedure (must be there, only relevant if protocol is xml-rpc or soap)

2. port

3. path

4. protocol -- either xml-rpc, soap or http-post

5. url1, url2, url3 ... urlN -- one or more URLs of feeds that you wish to be notified about

6. domain -- an optional parameter that specifies the machine that will receive notifications. If it is not specified, notifications are sent to the IP address the request came from. (This is new as of 10/16/09. See the Open Discussion page for the proposal that led to this addition. Also see the challenge parameter change that's related to this addition. DW)

Without the domain parameter, notifications are sent to the IP address the request came from, and you cannot request notification on behalf of another server.

The call returns an XML response called notifyResult with two attributes: success, which is true or false and indicates whether we registered a notification on your behalf, and msg, which explains what happened.

Note: Credit for the url numbering scheme goes to FriendFeed. I saw this in their API, and thought it was a good workaround to the fact that HTTP doesn't have a list type.
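The REST request described in this section can be sketched as follows. please_notify_body builds the form-encoded POST body, numbering the feed URLs url1, url2 and so on, and parse_notify_result reads the two attributes of the response. The handler port and path in the usage comment, and the sample response text in any test, are hypothetical.

```python
from urllib.parse import urlencode, parse_qsl
import xml.etree.ElementTree as ET

def please_notify_body(port, path, feed_urls, protocol="http-post",
                       notify_procedure="", domain=None):
    """Build the form-encoded body for a pleaseNotify POST."""
    fields = [
        ("notifyProcedure", notify_procedure),  # must be present, even if empty
        ("port", str(port)),
        ("path", path),
        ("protocol", protocol),
    ]
    # HTTP forms have no list type, so the URLs are numbered url1..urlN.
    fields += [("url%d" % n, url) for n, url in enumerate(feed_urls, start=1)]
    if domain is not None:
        fields.append(("domain", domain))       # optional parameter
    return urlencode(fields)

def parse_notify_result(xml_text):
    """Return (success, msg) from a notifyResult response."""
    root = ET.fromstring(xml_text)
    return root.get("success") == "true", root.get("msg")

# Usage (the POST itself is left commented out; 8080 and /feedUpdated are made up):
# from urllib.request import urlopen
# body = please_notify_body(8080, "/feedUpdated", ["http://scripting.com/rss.xml"])
# reply = urlopen("http://rpc.rsscloud.org:5337/rsscloud/pleaseNotify",
#                 data=body.encode("ascii")).read()
# ok, msg = parse_notify_result(reply)
```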

Requesting notification using XML-RPC or SOAP

If my cloud element looked like this:

<cloud domain="rpc.rsscloud.org" port="5337" path="/RPC2" registerProcedure="rssCloud.pleaseNotify" protocol="xml-rpc" />

Instead of using REST you would use XML-RPC to request notification. There are many excellent XML-RPC libraries, and it's also baked into Python and perhaps other popular scripting environments. I'll use the same notation here as I used in earlier sections.

You'd call it this way:

[server].rssCloud.pleaseNotify (notifyProcedure, port, path, protocol, urllist)

The five parameters are as above, in the REST case, except that, since XML-RPC has a list type, you use it instead of naming the parameters url1, url2 and so on.

It returns true if it worked, and throws an error if it didn't.
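Here is what the XML-RPC form of the request might look like with Python's built-in xmlrpc.client. The payload function shows the five parameters in order, with the feed URLs passed as a real list. Sending port as an integer is an assumption of this sketch, and the procedure name, port and path in the usage comment are hypothetical.

```python
import xmlrpc.client

def please_notify_payload(notify_procedure, port, path, protocol, feed_urls):
    """Return the XML-RPC request body for rssCloud.pleaseNotify (no network access)."""
    params = (notify_procedure, port, path, protocol, list(feed_urls))
    return xmlrpc.client.dumps(params, methodname="rssCloud.pleaseNotify")

# Usage (performs a network call, so it is left commented out):
# server = xmlrpc.client.ServerProxy("http://rpc.rsscloud.org:5337/RPC2")
# server.rssCloud.pleaseNotify("river.feedUpdated", 8080, "/RPC2", "xml-rpc",
#                              ["http://scripting.com/rss.xml"])
```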

I have implemented the XML-RPC interface on my server, but have not yet implemented SOAP. Tell me if you need me to move it up my list of priorities.

Pinging via REST

Chuck Shotton requested that I add a way to ping via REST in addition to XML-RPC. It made sense, so I added it.

The endpoint is: http://rpc.rsscloud.org:5337/rsscloud/ping.

POST to that address with a single parameter named url, the address of the feed that changed.

As with the XML-RPC method, it verifies that the feed has changed, and if so it notifies the subscribers.

The event is logged. The return value is an XML message named result, with two attributes, success and msg.
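A sketch of the REST ping in Python's standard library. build_ping_request prepares the POST without sending it, and parse_ping_result reads the success and msg attributes from the reply. The sample reply used to exercise the parser is invented; only the element name and its two attributes come from this guide.

```python
from urllib.parse import urlencode
from urllib.request import Request
import xml.etree.ElementTree as ET

PING_ENDPOINT = "http://rpc.rsscloud.org:5337/rsscloud/ping"

def build_ping_request(feed_url):
    """Prepare (but do not send) the POST that tells the cloud a feed changed."""
    body = urlencode({"url": feed_url}).encode("ascii")
    return Request(PING_ENDPOINT, data=body, method="POST")

def parse_ping_result(xml_text):
    """Return (success, msg) from the XML result message."""
    root = ET.fromstring(xml_text)
    return root.get("success") == "true", root.get("msg")

# Usage (performs a network call, so it is left commented out):
# from urllib.request import urlopen
# reply = urlopen(build_ping_request("http://scripting.com/rss.xml")).read()
# ok, msg = parse_ping_result(reply)
```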

Errors

My server does a test call of the handler before adding the subscription, to verify that the handler is functional and can be reached through firewalls and other obstacles. If the call fails, the registration fails.

Update 10/16/09: If the notification request included the optional domain parameter, the verification process works differently. 1. Instead of making a POST request we do a GET. 2. We include a challenge parameter, a random string of characters (for my cloud implementation I send 20 characters, but any number is allowed).

The cloud server accepts the subscription if and only if the aggregator returns a response code between 200 and 299 and the body of the response contains the challenge string.

The aggregator's server can reject the subscription if the url parameter is a feed it isn't interested in, either by returning an error code or by failing to return the challenge string. (This change was proposed by Joseph Scott of Automattic.)
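The cloud's side of the challenge check might be sketched like this. The 20-character length matches what this guide says the rpc.rsscloud.org implementation sends; the alphabet of letters and digits is an assumption, since the guide only says "a random string of characters".

```python
import secrets
import string

def make_challenge(length=20):
    """Generate a random challenge string (alphabet is an assumption)."""
    alphabet = string.ascii_letters + string.digits
    return "".join(secrets.choice(alphabet) for _ in range(length))

def accepts_subscription(status_code, body, challenge):
    """Accept only a 2xx response whose body contains the challenge string."""
    return 200 <= status_code <= 299 and challenge in body
```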

The server drops a subscription after three consecutive errors, but it waits till the top of the hour to drop subscriptions. So we're conservative about dropping subscriptions.
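One way to model the drop rule is to count consecutive failures as notifications go out, and only remove subscribers in a separate sweep that a scheduler runs at the top of each hour. The class and method names are invented for this sketch, which is not the code in rssCloud.root.

```python
class SubscriptionErrors:
    """Track consecutive notification failures per subscriber."""

    MAX_CONSECUTIVE_ERRORS = 3

    def __init__(self):
        self._errors = {}  # subscriber -> current run of consecutive failures

    def record(self, subscriber, ok):
        """Record one notification attempt; a success resets the count."""
        self._errors[subscriber] = 0 if ok else self._errors.get(subscriber, 0) + 1

    def sweep(self):
        """Run at the top of the hour; return the subscribers that were dropped."""
        dropped = [s for s, n in self._errors.items()
                   if n >= self.MAX_CONSECUTIVE_ERRORS]
        for subscriber in dropped:
            del self._errors[subscriber]
        return dropped
```

Because removal only happens in sweep, a subscriber that fails three times but recovers before the hour is up keeps its subscription.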

The aggregator

As with the authoring tool, this could be something new or an existing tool such as Tweetdeck, Seesmic, Tweetie, etc. However, because of the requirement that it be running on the public Internet, these programs will need extra glue to connect to the cloud, and that glue does not exist at this time. (However it's known how to do it.)

I've adapted river2.root to implement rssCloud. You can run this in the OPML Editor, and as with all such tools, full source is part of the release. All the code for rssCloud is in river2Suite.renewSubscriptions.

When reading a feed, the aggregator looks for a <cloud> element. If it finds one, and if it is running on the public Internet and not behind a firewall or NAT, it should send a request to the cloud asking to be notified. To do so it must have an XML-RPC server or HTTP server capable of receiving a POST request.
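Finding the <cloud> element and turning it into an endpoint could look like this in Python. The sketch assumes the element sits directly under <channel>, as in RSS 2.0, and that the domain, port and path attributes are all present; a feed that omits one would raise a KeyError here.

```python
import xml.etree.ElementTree as ET

def find_cloud(feed_xml):
    """Return the <cloud> attributes plus a computed endpoint, or None if absent."""
    cloud = ET.fromstring(feed_xml).find("channel/cloud")
    if cloud is None:
        return None  # the feed is not part of a cloud; fall back to polling
    info = dict(cloud.attrib)
    info["endpoint"] = "http://%(domain)s:%(port)s%(path)s" % info
    return info
```

The protocol value in the returned dictionary tells the aggregator whether to POST to the endpoint or to call registerProcedure over XML-RPC or SOAP.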

You must renew your subscription every 24 hours because subscriptions expire after 25 hours.
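The timing rule is simple enough to state as two small Python functions. The names are made up for this sketch; the 24 and 25 hour figures are the ones given above.

```python
from datetime import datetime, timedelta

RENEW_INTERVAL = timedelta(hours=24)  # when the aggregator should re-register
EXPIRY = timedelta(hours=25)          # when the cloud forgets the subscription

def needs_renewal(last_registered, now):
    """True once 24 hours have passed since the last successful registration."""
    return now - last_registered >= RENEW_INTERVAL

def has_expired(last_registered, now):
    """True once the cloud would have let the subscription lapse."""
    return now - last_registered >= EXPIRY
```

The one-hour gap between the two gives an aggregator some slack if a renewal attempt fails and has to be retried.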

If my server can't get through to yours three consecutive times, at the top of the hour it removes your subscription.

XML-RPC interface

Your update handler procedure takes a single parameter, the URL of the feed that updated. On receipt of the message you should read the feed and do what you normally do to locate and present new items to the user.

My server cares deeply what your server thinks, however it completely ignores the return value. (Pathetic attempt at humor, please excuse.)
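An update handler can be stood up with Python's built-in XML-RPC server. The procedure name river.feedUpdated and the port are hypothetical; use whatever name you passed as notifyProcedure when you requested notification. The handler returns True only because XML-RPC requires some return value.

```python
from xmlrpc.server import SimpleXMLRPCServer

updated_feeds = []  # feed URLs waiting to be re-read by the aggregator

def feed_updated(feed_url):
    """Called by the cloud with the URL of the feed that changed."""
    updated_feeds.append(feed_url)
    return True  # the cloud ignores this, but XML-RPC needs a return value

def make_server(host="0.0.0.0", port=8080):
    """Create (but do not start) a server exposing the update handler."""
    server = SimpleXMLRPCServer((host, port), logRequests=False)
    # Hypothetical procedure name; it must match your notifyProcedure.
    server.register_function(feed_updated, "river.feedUpdated")
    return server

# Usage (binds a socket and blocks, so it is left commented out):
# make_server().serve_forever()
```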

HTTP POST interface

Your server will receive a single parameter, the URL of the feed that updated. On receipt of the message you should read the feed, and do what you normally do to locate and present new items to the user. The return value is ignored by the cloud.
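A minimal handler for the HTTP POST interface, using Python's http.server. It also answers the GET verification request by echoing back the challenge. The form field name url for the notification and the query parameter name challenge are assumptions of this sketch, as is the port in the usage comment.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse

updated_feeds = []  # feed URLs waiting to be re-read by the aggregator

def parse_notification(body: bytes):
    """Extract the feed URL from a form-encoded notification body."""
    fields = parse_qs(body.decode("utf-8"))
    return fields.get("url", [None])[0]  # field name "url" is an assumption

class NotifyHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Notification from the cloud: note which feed changed.
        length = int(self.headers.get("Content-Length", 0))
        feed_url = parse_notification(self.rfile.read(length))
        if feed_url:
            updated_feeds.append(feed_url)
        self.send_response(200)  # the cloud ignores the body of the reply
        self.end_headers()

    def do_GET(self):
        # Verification request: echo the challenge string in the body.
        query = parse_qs(urlparse(self.path).query)
        body = query.get("challenge", [""])[0].encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

# Usage (binds a socket and blocks, so it is left commented out):
# HTTPServer(("", 8080), NotifyHandler).serve_forever()
```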

Testing your aggregator

A feed for testing. This one updates every 15 minutes, Murphy-willing, and is good for testing apps that request notification and respond to updates (aggregators).

My personal feed, updates at random times with pithy observations on the state of the world from yours truly.

If your calls to the cloud are getting through, you should see them in the log, along with all the calls the cloud makes to you and to and from other members of its network.

How rssCloud uses RSS

[Figure: yoQuiero.jpg]

I previewed the RSS used by rssCloud in this Scripting News post. That more or less covers how we use RSS. Read the comments on the post; they are excellent and identify some of the issues to come.

There's one additional feature I haven't described yet; it comes from the fact that my authoring tool is an outliner. For a long time I've felt that RSS should do hierarchies, as OPML does. This subject came up with some work I did with Microsoft in 2005, but they went with their own solution.

I've defined a new namespace called rssCloud. It contains a single element, <rssCloud:item>, which is exactly like RSS 2.0's item, with one exception: it can contain one or more <rssCloud:item>s. This makes it possible to build a hierarchy of RSS-like structures.
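To make the nesting concrete, here is a sketch that turns a small outline into nested <rssCloud:item> elements. The namespace URI below is a placeholder, because this guide does not give the real one, and the outline data is invented.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URI; the real one is not given in this guide.
NS = "http://example.com/rssCloud"
ET.register_namespace("rssCloud", NS)
ITEM_TAG = "{%s}item" % NS

def add_outline(parent, node):
    """Add node, and recursively its children, as nested <rssCloud:item> elements."""
    item = ET.SubElement(parent, ITEM_TAG)
    ET.SubElement(item, "description").text = node["text"]
    for child in node.get("children", []):
        add_outline(item, child)  # an item may contain further items
    return item

# Hypothetical outline with one level of nesting.
channel = ET.Element("channel")
add_outline(channel, {
    "text": "Things to do today",
    "children": [{"text": "Test the cloud"}, {"text": "Write the docs"}],
})
outline_xml = ET.tostring(channel, encoding="unicode")
```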

Example: Here's a bit of an outline. And here's the XML it generates.

Who knows what we'll do with it, perhaps nothing. But we've now got this out there, and it may turn out to be interesting.

Code I'm releasing

All these apps run in the OPML Editor.

river2.root -- Aggregator that supports the rssCloud interface.

rssCloud.root -- Implementation of the server-side of rssCloud.

lifeLiner.root -- Authoring tool.

Roadmap, philosophy

How to think of rssCloud: Loosely coupled 140-character networks.

The goal is to instantaneously flow 140-character messages, with metadata, including links, tags, enclosures, and whatever else (it's as open an architecture as RSS 2.0 is).

A network that works alongside Twitter, but outside the control of any company. You can be confident that no company will control it because it is being started by a person, not a company. Even if I wanted to crush you, I couldn't. Even if I wanted to control all the users' data, I can't -- since most of it won't flow through my server.

Re Google's PubSubHubBub, their goal appears to be to flow updates of blog posts in association with Feedburner. This is a good goal. I hope to help them get RSS compatibility into it. If your goal is to optimize the polling of existing RSS or Atom feeds, you can use the rssCloud network, but that's not something I want to get involved in personally. My interest is in the micro-blogging application, at least at the outset.

Dave Winer
July 2009

More info, questions

For a history of rssCloud, pointers to specs, etc, see the home page of this site.

If you're debugging an app using our cloud, the log is an essential tool, to see if you're getting through and what the cloud is doing with your requests and updates.

Ask questions in the FriendFeed group, on the mail list, or in comments below.

First published: Wednesday, July 22, 2009, 9:32:48 AM.