
NY Times metadata

Wednesday, October 17, 2007 by Dave Winer.

If you do a View Source on a NY Times story, you'll see that there's lots of metadata in the HTML, including keywords for most of the stories.
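To make that concrete, here is a minimal sketch of pulling keywords out of a page's HTML. It assumes the keywords sit in a `<meta name="keywords" content="...">` tag as a comma-separated list, which is a common convention but not something confirmed here for the Times pages. The names `KeywordMetaParser` and `extract_keywords` and the sample keywords are made up for illustration.

```python
from html.parser import HTMLParser


class KeywordMetaParser(HTMLParser):
    """Collects keywords from <meta name="keywords" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.keywords = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        # Assumption: keywords live in a meta tag named "keywords",
        # separated by commas. A real page may use a different tag name.
        if (attr.get("name") or "").lower() == "keywords":
            content = attr.get("content") or ""
            self.keywords.extend(
                k.strip() for k in content.split(",") if k.strip()
            )


def extract_keywords(html_text):
    """Return the list of keywords found in the page's meta tags."""
    parser = KeywordMetaParser()
    parser.feed(html_text)
    return parser.keywords


# Made-up sample page, not an actual Times story.
sample = """<html><head>
<meta name="keywords" content="Libraries and Librarians, Computers and the Internet">
</head><body></body></html>"""

print(extract_keywords(sample))
```

The standard-library parser is enough for this because only the tag attributes matter, not the page structure.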

Behind the keywords is a taxonomy that I haven't seen, but would like to. I asked them to make this public, both at my meeting there last Thursday and in a phone talk this morning. I think there could be a lot of value in the Times taxonomy; it might even set a standard.

In the meantime, I wrote a script last night that tracks the keywords in NY Times stories as they flow through the nytimesriver application. Here's a report that's updated once per hour.

http://nytimesriver.com/keywords.html
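The script behind that report isn't shown here, so what follows is only a rough sketch of how a cumulative keyword tally could work, not the actual nytimesriver code. The story dictionaries, the function names `tally_keywords` and `render_report`, and the example.com URLs are all hypothetical.

```python
from collections import Counter


def tally_keywords(stories, counts=None):
    """Add each story's keywords to a running tally and return it."""
    counts = Counter() if counts is None else counts
    for story in stories:
        # Count a keyword once per story, even if it is listed twice.
        counts.update(set(story["keywords"]))
    return counts


def render_report(counts):
    """Return plain-text lines, most frequent keyword first."""
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [f"{count}\t{keyword}" for keyword, count in ranked]


# Hypothetical stories as they might arrive from a feed.
stories = [
    {"url": "http://example.com/a", "keywords": ["Politics", "Elections"]},
    {"url": "http://example.com/b", "keywords": ["Politics"]},
]

counts = tally_keywords(stories)
for line in render_report(counts):
    print(line)
```

Passing the same `counts` object back in on each hourly run would keep the tally cumulative.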

Obviously it would be interesting to be able to click on a keyword to see which articles reference it. And it would also be nice to have a cumulative list and a daily list. Right now all we have is the cumulative version.
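Neither of those features exists yet, but both are small extensions of the same idea. Here is a sketch, under the assumption that each story is stored with its URL, the day it appeared, and its keyword list. The field names and the functions `build_index` and `daily_counts` are hypothetical.

```python
from collections import Counter, defaultdict
from datetime import date


def build_index(stories):
    """Map each keyword to the URLs of the stories that carry it."""
    index = defaultdict(list)
    for story in stories:
        for keyword in sorted(set(story["keywords"])):
            index[keyword].append(story["url"])
    return index


def daily_counts(stories):
    """Return a mapping of day -> Counter of keywords seen that day."""
    per_day = defaultdict(Counter)
    for story in stories:
        per_day[story["day"]].update(set(story["keywords"]))
    return per_day


# Hypothetical stories; the URLs and keywords are made up.
stories = [
    {"url": "http://example.com/a", "day": date(2007, 10, 16),
     "keywords": ["Politics", "Elections"]},
    {"url": "http://example.com/b", "day": date(2007, 10, 17),
     "keywords": ["Politics"]},
]

index = build_index(stories)
daily = daily_counts(stories)

# The cumulative list falls out of the daily ones by adding them up.
cumulative = sum(daily.values(), Counter())
print(index["Politics"])
print(cumulative.most_common())
```

The index is what a clickable keyword would link to, and keeping the counts per day means the cumulative list never has to be stored separately.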

But it's still pretty interesting, bordering on fascinating, to think of the possibilities if they provide the framework behind these keywords.

When the pros try to figure out how what they do will continue to make sense after the Internet achieves all its promise, this may be an example. The metadata is generated by librarians, and we don't as yet have our own librarians in the blogosphere (though some might disagree). And it's possible that after a release of the taxonomy something like Wikipedia may happen, with the public taking over maintenance of the taxonomy. No one knows what will happen, but one thing seems clear: there can be value in a news organization beyond the reporting and editing it does.

© Copyright 1994-2007 Dave Winer.

Last update: 10/17/07; 7:42:21 PM Pacific. "It's even worse than it appears."
