NY Times metadata (Scripting News)


NY Times metadata

Wednesday, October 17, 2007 by Dave Winer.

If you do a View Source on a NY Times story, you'll see that there's lots of metadata in the HTML, including keywords for most of the stories.
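Here's a rough sketch, in Python, of how you might pull those keywords out of a page's source. It assumes the keywords sit in a standard `<meta name="keywords" content="...">` tag as a comma-separated list; I haven't confirmed that's exactly how the Times marks them up. The names `KeywordMetaParser` and `extract_keywords` are made up for this example.

```python
from html.parser import HTMLParser


class KeywordMetaParser(HTMLParser):
    """Collects keywords from <meta name="keywords" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.keywords = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # attrs is a list of (name, value) pairs; value can be None.
        attr_map = {name.lower(): (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() != "keywords":
            return
        # Assumption: keywords are comma-separated inside the content attribute.
        for part in attr_map.get("content", "").split(","):
            keyword = part.strip()
            if keyword:
                self.keywords.append(keyword)


def extract_keywords(html_text):
    """Return the list of keywords found in the page's meta tags."""
    parser = KeywordMetaParser()
    parser.feed(html_text)
    return parser.keywords
```

You'd call `extract_keywords` on the text of a story page and get back a plain list of strings, one per keyword.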

Behind the keywords is a taxonomy that I haven't seen, but would like to. I asked them to make this public, both at my meeting there last Thursday and in a phone talk this morning. I think there could be a lot of value in the Times taxonomy; it might even set a standard.

In the meantime, I wrote a script last night that tracks the keywords in NY Times stories as they flow through the nytimesriver application. Here's a report that's updated once per hour.

http://nytimesriver.com/keywords.html

Obviously it would be interesting to be able to click on the keywords to see what articles reference each of the keywords. And it would also be nice to have a cumulative list and a daily list. Right now all we have is the cumulative version.
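A minimal sketch of how the cumulative list, a daily list, and a keyword-to-article lookup could fit together. This is not the script behind the report above; the class `KeywordIndex`, its methods, and the idea of passing in a day string are all hypothetical, just one way it could be structured.

```python
from collections import Counter, defaultdict


class KeywordIndex:
    """Tracks keyword counts overall, per day, and which stories use each keyword."""

    def __init__(self):
        self.cumulative = Counter()          # keyword -> number of stories, all time
        self.daily = defaultdict(Counter)    # day -> (keyword -> number of stories)
        self.articles = defaultdict(set)     # keyword -> set of story URLs

    def add_story(self, url, day, keywords):
        """Record one story; a story seen again is not counted twice for a keyword."""
        for keyword in set(keywords):
            if url in self.articles[keyword]:
                continue
            self.articles[keyword].add(url)
            self.cumulative[keyword] += 1
            self.daily[day][keyword] += 1

    def stories_for(self, keyword):
        """Return the URLs of stories that reference the keyword, sorted."""
        return sorted(self.articles.get(keyword, ()))
```

Each time a story flows through the river you'd call `add_story` with its URL, the date, and its keywords. The cumulative report comes from `cumulative`, a daily report from `daily[day]`, and the click-through list for a keyword from `stories_for`.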

But it's still pretty interesting, bordering on fascinating, to think of the possibilities if they provide the framework behind these keywords.

When the pros try to figure out how what they do will continue to make sense after the Internet achieves all its promise, this may be an example. The metadata is generated by librarians, and we don't as yet have our own librarians in the blogosphere (though some might disagree). And it's possible that after a release of the taxonomy something like Wikipedia may happen, with the public taking over maintenance of the taxonomy. No one knows what will happen, but one thing seems clear: there can be value in a news organization beyond the reporting and editing it does.


Last update: 10/17/07; 7:42:21 PM Pacific. "It's even worse than it appears."