Scripting News: Friday, February 17, 2012

How to fork a format

Jon Postel, one of the main architects of the Internet, wrote a rule.

Be liberal in what you accept, and conservative in what you send.

I'm going to focus on the second part in this, I hope, short piece.

I'm thinking of Postel this morning because someone on Hacker News dug up a post written by a developer who then worked at Amazon, and who (I believe) now works at Google, in which he reasoned that we needed Atom because of a deficiency in RSS.

I remember reading the post at the time, but not commenting, because anytime I said anything in that discussion it became personal, even if I just talked about the merits of the proposal or offered another point of view. I saw this then, and now, as a political thing. For what reason, I don't know or care. But they were making a fundamental mistake in the evolution of formats. And now that we know how it turned out, I think it's even more obvious.

The author said, basically, that because the generator of an RSS feed can't tell a consumer whether the contents of a description element are encoded HTML, plain text, or perhaps some other kind of character-encoded content, it's impossible for a processor to know how to handle it. They called this "silent data loss," which makes it sound a lot more terrible than it is. We still process RSS 2.0 feeds to this day, somehow, without much apparent loss of data.

Further, if the problem is limited to the description element, why re-invent everything about RSS? Clearly, a better approach was to simply create a new format, call it whatever you like, and define it as RSS 2.0 except for the following differences, and then specify them. They could say that in this new format, description could not contain encoded text. That way there would be no ambiguity, you would know never to decode the value of a description. If you wanted to attach encoded HTML, you would use a different element, perhaps called content, and it would have an attribute that told you how to interpret its value. In every other way it would be identical to RSS 2.0. If they spotted another flaw in RSS 2.0 that they felt they had to correct, they could do it again, in exactly the same way.
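
To make that concrete, here's a rough sketch in Python of how a consumer might read an item in such a format. Nothing here is a real spec: the content element, its type attribute, and the read_item function are all hypothetical names I'm making up for illustration. The only thing it's meant to show is that the consumer never has to guess how to treat a value.

```python
import xml.etree.ElementTree as ET

# Hypothetical item in the forked format (an assumption, not a real spec):
# description is always plain text, and content carries a type attribute
# that says how to interpret its value.
ITEM_XML = """<item>
  <title>How to fork a format</title>
  <description>Keep it simple &amp; small</description>
  <content type="html">&lt;p&gt;Keep it &lt;b&gt;simple&lt;/b&gt;&lt;/p&gt;</content>
</item>"""


def read_item(item_xml):
    """Read one item without guessing how any value is encoded."""
    item = ET.fromstring(item_xml)

    # The XML parser has already resolved the XML entities. By the
    # (hypothetical) rule, description is plain text, so no further
    # decoding is ever applied to it.
    description = item.findtext("description", default="")

    body = None
    body_type = None
    content = item.find("content")
    if content is not None:
        # The type attribute tells us what the value is. Defaulting to
        # "text" when it is missing is also an assumption of this sketch.
        body_type = content.get("type", "text")
        body = content.text or ""

    return {
        "description": description,
        "content": body,
        "content_type": body_type,
    }


if __name__ == "__main__":
    print(read_item(ITEM_XML))
```

Everything else about the item, the title, the link, the enclosure, would be read exactly as it is in RSS 2.0.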

But then I can reduce that process another step and say why not follow the other branch of the RSS 2.0 roadmap, and instead of forking off a new format, just define a namespace that has the new elements that behave, in your opinion, correctly, and not invent a new name either? Keep it all as simple as possible. And support the second half of Postel's Law. This is exactly the approach Apple took with their iTunes extensions, and Yahoo took with Media RSS.
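
Here's a similar sketch for the namespace route, again in Python and again with made-up names. The namespace URI, the ext prefix, the content element and the read_items function are assumptions for illustration only. The feed is an ordinary RSS 2.0 feed, and a reader that has never heard of the namespace simply ignores the extra element.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI, invented for this sketch.
EXT_NS = "http://example.com/ns/typed-content"

# An ordinary RSS 2.0 feed with one extra namespaced element per item.
FEED_XML = """<rss version="2.0" xmlns:ext="http://example.com/ns/typed-content">
  <channel>
    <title>Example feed</title>
    <item>
      <title>How to fork a format</title>
      <description>Keep it simple</description>
      <ext:content type="html">&lt;p&gt;Keep it &lt;b&gt;simple&lt;/b&gt;&lt;/p&gt;</ext:content>
    </item>
  </channel>
</rss>"""


def read_items(feed_xml):
    """Read RSS 2.0 items, picking up the extension element when present."""
    root = ET.fromstring(feed_xml)
    items = []
    for item in root.iter("item"):
        # ElementTree addresses namespaced elements as {uri}localname.
        typed = item.find(f"{{{EXT_NS}}}content")
        items.append({
            # Core RSS 2.0 elements are read the same way as always.
            "title": item.findtext("title", default=""),
            "description": item.findtext("description", default=""),
            # The extension is optional; a feed without it still works.
            "content": typed.text if typed is not None else None,
            "content_type": typed.get("type") if typed is not None else None,
        })
    return items


if __name__ == "__main__":
    for entry in read_items(FEED_XML):
        print(entry)
```

The core elements keep their names and meanings, so existing RSS 2.0 code keeps working and only the new element needs new code.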

At this point it doesn't matter what the answer is. Atom 1.0 is a supported format. It has different names for almost everything in RSS 2.0. Where they use the same name it's got a different meaning and a different set of possible values. It's more work for every developer, but we're living with it.

In case anyone comes across this problem in the future, with some format other than RSS, I suggest they listen to Dr Postel, and ponder the wisdom in what he's saying, and learn from our experience in syndication. Keep the number of variables to a minimum. It makes it easier for everyone, increases compatibility (Postel called it robustness), keeps complexity down, and lets us build higher structures because we didn't use up all the complexity solving simple problems.

2/17/2012; 9:14:42 AM.
