NY Times Archive, Weblogs and RSS
Monday, June 16, 2003 by Dave Winer.
As news organizations like the San Jose Mercury News or the New York Times publish on the Web, they accumulate an archive of past stories: last week's, last month's, last year's, and so on. The Mercury and the Times were two of the first news organizations to publish on the Web, so their early issues, dating back to the mid-90s, form a history of the Web. But there's one important difference: the Mercury archive is long gone, and the Times archive is largely intact. One news organization burns its archive; the other hires curators to care for it. One sells news and ads; the other strives to be the paper of record (and sells news and ads).
Archives are one of the greatest things about the Web: they accumulate our history, and if lots of sources archive reliably, we get to view that history from many perspectives. As more information shows up on the Web, more of the past will be documented. Today, teenagers can read on the Web about events that happened in their infancy. I think that's very cool. As that generation grows up, they will expect to find things on the Web.
Of course, news archives are not a new idea. When I was growing up, I read about the Great War and World War II in the NY Times microfilm archive at the New York Public Library. I didn't pay to access this archive. To a child who loved history, there was something special about reading of famous events in the words of people who were there when they happened. Their point of view is preserved; even if they are dead, their thoughts and ideas live on.
Before moving on to the story of the New York Times archive, three notes.
1. Brewster Kahle's archive.org is doing a wonderful job of preserving the Web. Kahle became a mega-millionaire when he sold his company to AOL in the early days of the dotcom boom. It's great to see him put his money to such good use. Bravo.
2. The Harvard Crimson, the student newspaper of the university where I have a fellowship, has a Web archive that goes back to 1900. That may be the deepest free and open archive on the Internet. If there are others, I would certainly like to hear about them.
3. Other organizations that claim near-perfect archives: the BBC and the Guardian in the UK. The BBC is funded by public money, an important consideration; the Guardian is the beneficiary of the Scott Trust.
The New York Times Company is a publicly held company, listed on the New York Stock Exchange. Unlike the BBC, it is not government funded. It's a for-profit corporation.
On their corporate Web page, they say "The Company's core purpose is to enhance society by creating, collecting and distributing high-quality news, information and entertainment." That's certainly consistent with what we know about the Times, but it's also necessary to add that the Times exists to create value for shareholders.
So, while they wish to be the paper of record, their first purpose is ROI for their shareholders. Given a choice between serving the public good and serving the interests of their shareholders, they must choose the latter. This isn't wrong; it's the nature of what they are, a public, for-profit corporation.
If you browse the Times website, www.nytimes.com, you must be logged on with a username and password, which are available at no charge. The term for this is "Free subscription required." You'll often see this disclaimer next to links to Times articles from weblogs. The policy is so well known that many sites leave the disclaimer out, assuming the reader knows that a username and password will be requested.
There's a little-known exception to this rule. If you're browsing from a site that has a previously established relationship with the Times, the free-subscription requirement is waived. The company I founded, UserLand Software, has such a relationship with the Times, so when I point to a Times article from my weblog, I don't have to say "Free subscription required," because it is not required. You can click through even if you're not logged on to the Times site.
Now I'm going to try to explain, as simply as possible, how the Web archive of the Times works.
First, they charge $2.95 per article to access articles that are more than seven days old. I have been told, and accept at face value, that the archive is a profitable business for the Times; it's an important part of their business model.
However, if you link to an article in the Times archive from a weblog using a URL that comes from one of the Times's RSS feeds, provided in partnership with UserLand, your readers pay no access fee. This works for any kind of weblog, not just ones created with UserLand's weblog tools, as long as you use a URL from the RSS feeds.
This method is a compromise. It's certainly not the best we could wish for, but imho it's good enough. I can point to Times articles, and so can everyone else, and our archives will continue to work over time. If you run a weblog and want to point to Times articles, you probably have to read the Times through an aggregator, but imho, if you're serious about weblogs and news, you're probably already doing it this way. The Times business model is protected, and we may have established a template for other publications to continue charging for access to their archives while remaining open to weblog writers.
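The practical step for a weblog writer is to take the article URL from the feed item itself. As a minimal sketch, assuming a standard RSS 2.0 feed, the Python below pulls the title and `<link>` of each `<item>`. The sample feed, its example.com URLs, and the `item_links` helper are hypothetical stand-ins, not the actual Times feeds; a real aggregator would fetch the feed over HTTP instead of using an inline string.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample RSS 2.0 feed; real feeds would be fetched over HTTP.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example News</title>
    <link>http://www.example.com/</link>
    <item>
      <title>First story</title>
      <link>http://www.example.com/2003/06/16/first.html</link>
    </item>
    <item>
      <title>Second story</title>
      <link>http://www.example.com/2003/06/16/second.html</link>
    </item>
  </channel>
</rss>"""


def item_links(feed_xml: str) -> list[tuple[str, str]]:
    """Return (title, link) pairs for every <item> in an RSS 2.0 document.

    Only item-level links are returned; the channel's own <link> is skipped
    because it is not a child of an <item>.
    """
    root = ET.fromstring(feed_xml)
    pairs = []
    for item in root.iter("item"):
        title = item.findtext("title", default="").strip()
        link = item.findtext("link", default="").strip()
        if link:  # ignore items that carry no link at all
            pairs.append((title, link))
    return pairs


if __name__ == "__main__":
    for title, link in item_links(SAMPLE_FEED):
        print(f"{title}: {link}")
```

The point of copying the link verbatim, rather than rebuilding it from the page you land on, is that the feed's URL is the one the arrangement described above is tied to.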
I've started a new page for pointers to the Times RSS feeds, notes, and answers to questions as they come up.
PS: ROI stands for Return On Investment.