NY Times topics in OPML, the mother lode?

Wednesday, October 17, 2007 by Dave Winer.

Amyloo was digging around the NY Times code weblog and found this OPML file. It weighs in at a monstrous 3.3MB and contains some mysterious but rich data about the NY Times, plus a guide to using the Times to cover special topics that I don't think anyone outside the Times knew existed. But there it is, in a public folder, so let's have a look.

1. There are 10522 top-level headlines. There's no structure to the OPML; it's absolutely flat.

Here's an HTML rendering of the list: timestopics.html.

2. It's a subscription list. Each item has four attributes: type, title, htmlUrl and xmlUrl.

3. The htmlUrl for each element points to a page of stories for the topic. For example, here's a page of stories about table tennis. On that page is a link to an RSS 2.0 feed containing the same information.

4. The xmlUrl links for at least some of the elements are broken. The error appears to be very simple: if you replace the ampersand with a question mark, it works.
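
The points above can be sketched in a few lines of Python. This is only a rough illustration, not the Times' format verified against the real file: the sample OPML, the example.com URLs, and the function names `repair_xml_url` and `read_subscriptions` are all made up here. It assumes the outline elements sit directly under the body (the list is flat) and that a broken xmlUrl is one with an ampersand where the first question mark should be.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-item sample in the shape described above.
# The URLs are placeholders, not the real NY Times addresses.
SAMPLE_OPML = """<opml version="1.1">
  <head><title>Sample topics</title></head>
  <body>
    <outline type="rss" title="Table Tennis"
             htmlUrl="http://example.com/topics/table-tennis"
             xmlUrl="http://example.com/rss&amp;topic=table-tennis"/>
    <outline type="rss" title="Tea"
             htmlUrl="http://example.com/topics/tea"
             xmlUrl="http://example.com/rss?topic=tea"/>
  </body>
</opml>
"""


def repair_xml_url(url):
    """Swap the first ampersand for a question mark if the URL has no '?' yet."""
    if "?" not in url and "&" in url:
        return url.replace("&", "?", 1)
    return url


def read_subscriptions(opml_text):
    """Return one dict per outline element, with the xmlUrl repaired."""
    root = ET.fromstring(opml_text)
    body = root.find("body")
    items = []
    # The list is flat, so only direct children of <body> are read.
    for outline in body.findall("outline"):
        items.append({
            "type": outline.get("type"),
            "title": outline.get("title"),
            "htmlUrl": outline.get("htmlUrl"),
            "xmlUrl": repair_xml_url(outline.get("xmlUrl", "")),
        })
    return items


if __name__ == "__main__":
    for item in read_subscriptions(SAMPLE_OPML):
        print(item["title"], item["xmlUrl"])
```

URLs that already contain a question mark are left alone, so running the repair over the whole list shouldn't damage the links that already work.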

If you look around at the topics you'll see it's an incredibly rich set of data. Here are just some of the topics that begin with the letter T: Tableware, Taste, Tattoos, Tax Credits, Tax Evasion, Taxation, Taxicabs and Taxicab Drivers, Tea, Teachers and School Employees, TED Conference News, Teflon, Telephones and Telecommunications, Television, Television Sets, Table Tennis, Terra Cotta, Terrorism, Tests and Testing, Textbooks, Thanksgiving Day.

