NY Times topics in OPML, the mother lode?

Wednesday, October 17, 2007 by Dave Winer.

Amyloo was digging around the NY Times code weblog and found this OPML file, weighing in at a monstrous 3.3MB. It contains some mysterious but rich data about the NY Times, and a guide to using the Times to cover special topics that I don't think anyone outside the Times knew existed. But there it is, in a public folder, so let's have a look.

1. There are 10522 top-level headlines. There's no structure to the OPML, it's absolutely flat. Here's an HTML rendering of the list: timestopics.html.

2. It's a subscription list. Each item has four attributes: type, title, htmlUrl and xmlUrl.

3. The htmlUrl for each element points to a page of stories for the topic. For example, here's a page of stories about table tennis. On that page is a link to an RSS 2.0 feed containing the same information.

4. The xmlUrl links for at least some of the elements are broken. The error appears to be very simple: if you replace the ampersand with a question mark, it works.

If you look around at the topics you'll see it's an incredibly rich set of data. Here are just some of the topics that begin with the letter T: Tableware, Taste, Tattoos, Tax Credits, Tax Evasion, Taxation, Taxicabs and Taxicab Drivers, Tea, Teachers and School Employees, TED Conference News, Teflon, Telephones and Telecommunications, Television, Television Sets, Table Tennis, Terra Cotta, Terrorism, Tests and Testing, Textbooks, Thanksgiving Day.
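
If you want to poke at a flat subscription list like this yourself, here's a minimal sketch in Python of reading the four attributes and applying the ampersand fix. Everything specific in it is an assumption for illustration: the sample OPML, the example.com URLs and the function names are made up, not taken from the Times file, and I'm guessing that a broken xmlUrl is one with an ampersand but no question mark.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the real file: two flat outline elements,
# each with the four attributes. URLs are invented for illustration.
SAMPLE_OPML = """<?xml version="1.0"?>
<opml version="1.1">
  <head><title>Sample topics</title></head>
  <body>
    <outline type="rss" title="Table Tennis"
             htmlUrl="http://example.com/topics/table-tennis"
             xmlUrl="http://example.com/rss&amp;query=table-tennis"/>
    <outline type="rss" title="Tea"
             htmlUrl="http://example.com/topics/tea"
             xmlUrl="http://example.com/rss?query=tea"/>
  </body>
</opml>
"""


def repair_xml_url(url):
    """Swap the first ampersand for a question mark.

    Assumption: a URL is broken only when it has an ampersand and no
    question mark at all. URLs that already have one are left alone.
    """
    if "?" not in url and "&" in url:
        return url.replace("&", "?", 1)
    return url


def read_subscriptions(opml_text):
    """Return one dict per top-level outline element in the body."""
    root = ET.fromstring(opml_text)
    body = root.find("body")
    subscriptions = []
    # findall("outline") matches direct children only, which is all
    # a flat list has.
    for outline in body.findall("outline"):
        subscriptions.append({
            "type": outline.get("type"),
            "title": outline.get("title"),
            "htmlUrl": outline.get("htmlUrl"),
            "xmlUrl": repair_xml_url(outline.get("xmlUrl", "")),
        })
    return subscriptions


if __name__ == "__main__":
    for sub in read_subscriptions(SAMPLE_OPML):
        print(sub["title"], sub["xmlUrl"])
```

To run it against a real file, read the file into a string and pass it to read_subscriptions in place of the sample. Counting the items in the returned list is a quick way to check the number of top-level headlines.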