Internet technology hosted by Berkman Center

Saturday, November 15, 2003


Longhorn has a built-in RSS aggregator?

According to Scoble, Microsoft's new version of Windows, codename Longhorn, has a built-in RSS aggregator.

That was news to me. Is this for sure, or is it some kind of experiment?

If it's true, what do we know about the aggregator? How does it work? What formats does it support?

I remember a few years ago asking people at Microsoft to do something with RSS. I remember trying to convince Markoff at the NY Times that MS would adopt our vision of Web Services. If this is all true, at least I feel vindicated.

Further, I hope they're playing nice. So far they have been.

# Posted on 11/15/03; 10:38:05 AM -

Slash-delimited category names

The spec says about categories:

<category> sub-element of <item>

<category> is an optional sub-element of <item>.

It has one optional attribute, domain, a string that identifies a categorization taxonomy.

The value of the element is a forward-slash-separated string that identifies a hierarchic location in the indicated taxonomy. Processors may establish conventions for the interpretation of categories. Two examples are provided below:

<category>Grateful Dead</category>

<category domain="http://www.fool.com/cusips">MSFT</category>

You may include as many category elements as you need to, for different domains, and to have an item cross-referenced in different parts of the same domain.

But what if an element itself uses the forward slash, for example: Hydrogen/potassium ATPase, which -- it's been pointed out -- is a valid Library of Congress Subject Heading?

We think the simplest solution will be to escape the forward slash in that element, so for example:

Hydrogen%2fpotassium ATPase

We invite users of taxonomies in which this issue arises to comment on whether this will, in fact, work acceptably.
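
To make the proposal concrete, here is a minimal sketch in Python of how a processor might build and split category values under this convention. The function names join_category and split_category are hypothetical, and so is the parent heading "Enzymes" used in the comments. Escaping a literal percent sign as %25 is an assumption of this sketch, added so that a heading that already contains the text "%2f" survives a round trip; neither the spec nor the proposal above says anything about it.

```python
def join_category(segments):
    """Build a slash-delimited category value from a list of headings.

    A '/' inside a heading is written as %2f, as proposed above.
    Assumption of this sketch: a literal '%' is written as %25 first,
    so the escaping can always be undone without ambiguity.
    """
    escaped = [s.replace("%", "%25").replace("/", "%2f") for s in segments]
    return "/".join(escaped)


def split_category(value):
    """Split a category value on its slashes and undo the escaping."""
    parts = value.split("/")
    # Undo %2f before %25; after join_category every '%' is followed
    # by either "2f" or "25", so the two steps cannot collide.
    return [
        p.replace("%2f", "/").replace("%2F", "/").replace("%25", "%")
        for p in parts
    ]


# Example: a two-level path whose second heading contains a slash.
# join_category(["Enzymes", "Hydrogen/potassium ATPase"])
#   gives "Enzymes/Hydrogen%2fpotassium ATPase"
# split_category("Enzymes/Hydrogen%2fpotassium ATPase")
#   gives ["Enzymes", "Hydrogen/potassium ATPase"]
```

A processor that doesn't know about the convention would still split the value into the right number of levels; it would just show the %2f literally.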

# Posted on 12/15/03; 10:29:53 AM -

Links and permalinks

A common question about RSS 2.0 is: should link elements be permalinks or should they point to an external page? The spec talks about this:

An item may represent a "story" -- much like a story in a newspaper or magazine; if so its description is a synopsis of the story, and the link points to the full story. An item may also be complete in itself, if so, the description contains the text (entity-encoded HTML is allowed), and the link and title may be omitted.

Furthermore, there is a mechanism for specifying permalinks: the guid element. So, some recommendations could be proposed:

1. Use the guid element -- and make its value a permalink. This allows aggregators to know for sure what the permalink of an item is: there's no guessing as to whether or not the link element is a permalink.

2. Make the link element a permalink or an external link, depending on the nature of your feed. As in the quote from the spec, a full article could use an external link, but anything short of a full article could use a permalink. If you make the link element an external link, but also supply a permalink (#1 above), then you offer readers a choice.
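
As an illustration of the aggregator's side of these recommendations, here is a rough Python sketch that reads an item and reports its permalink and its link separately. The sample item, its example.com and example.org URLs, and the function names permalink_of and link_of are all made up for this sketch. It relies on the guid element's isPermaLink attribute, which the RSS 2.0 spec says is optional and defaults to true.

```python
import xml.etree.ElementTree as ET

# Hypothetical item: the link points to an external story,
# the guid is the permalink of the weblog post itself.
ITEM_XML = """<item>
  <title>An interesting story</title>
  <link>http://example.com/full-story.html</link>
  <guid isPermaLink="true">http://example.org/2003/12/15.html#a42</guid>
</item>"""


def permalink_of(item):
    """Return the item's permalink, or None if the feed doesn't supply one."""
    guid = item.find("guid")
    if guid is None or not guid.text:
        return None
    # isPermaLink is optional; when it is absent the guid is a permalink.
    if guid.get("isPermaLink", "true") != "true":
        return None
    return guid.text.strip()


def link_of(item):
    """Return the text of the link element, or None if it was omitted."""
    link = item.find("link")
    if link is None or not link.text:
        return None
    return link.text.strip()


item = ET.fromstring(ITEM_XML)
permalink = permalink_of(item)  # the weblog post
link = link_of(item)            # the external story
```

With both values in hand, an aggregator can offer the reader the choice described in #2, and it never has to guess whether the link is a permalink.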

# Posted on 12/15/03; 5:43:49 PM -

HTML in titles and descriptions

People sometimes ask about using HTML in titles and descriptions. The spec says that "entity-encoded HTML is allowed" in descriptions. It does not say that it's allowed for any other elements.

Most aggregators will render descriptions as HTML, though it should be noted that an aggregator might not have access to an HTML renderer, or might have only a very basic one available. (Consider a PDA, for example.)

Titles, however, should not contain HTML. The spec doesn't allow for it. The behavior of an aggregator when it encounters HTML in a title is undefined: some aggregators strip the HTML, others might display it with the HTML code visible.

You can think of titles as being like the titles of Web pages. When someone puts HTML in a Web page title, browsers often display the tags in the window title bar. This is because, according to the HTML 4 spec, "Titles may contain character entities (for accented characters, special characters, etc.), but may not contain other markup (including comments)."

You can also think of titles as similar to subjects of email messages. Though an email message may be HTML, the subject may not.
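
For feed generators that build titles from text that might contain markup, here is one possible way to reduce a title to plain text before writing it out, sketched in Python with the standard html.parser module. The class name TextOnly and the function name plain_title are hypothetical, and stripping is only one of the behaviors mentioned above, not something the spec requires.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collects the text of a fragment and ignores every tag and comment."""

    def __init__(self):
        # convert_charrefs=True turns references such as &amp; into characters.
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def plain_title(title):
    """Return the title with markup removed and whitespace collapsed."""
    parser = TextOnly()
    parser.feed(title)
    parser.close()  # flushes any text the parser is still holding
    return " ".join("".join(parser.chunks).split())


# Example:
# plain_title("Longhorn has a <b>built-in</b> RSS aggregator?")
#   gives "Longhorn has a built-in RSS aggregator?"
```

The result is plain text, so it still has to be XML-escaped like any other element value when the feed is written.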

# Posted on 12/15/03; 5:46:44 PM -