Scripting News: NY Times metadata

Scripting News, the weblog started in 1997 that bootstrapped the blogging revolution.

NY Times metadata

If you do a View Source on a NY Times story, you'll see that there's lots of metadata in the HTML, including keywords for most of the stories.
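For a sense of what that metadata looks like to a script, here's a minimal sketch in Python that pulls keywords out of a page. It assumes the keywords sit in a `<meta name="keywords" content="...">` tag as a comma-separated list; the tag name the Times actually uses may differ, and the sample page and its keywords are made up for illustration.

```python
from html.parser import HTMLParser


class KeywordParser(HTMLParser):
    """Collects keywords from <meta name="keywords" content="..."> tags."""

    def __init__(self, meta_name="keywords"):
        super().__init__()
        self.meta_name = meta_name  # assumed tag name, not confirmed
        self.keywords = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        if (attrs.get("name") or "").lower() != self.meta_name:
            return
        # Split the comma-separated content and drop empty entries.
        for word in (attrs.get("content") or "").split(","):
            word = word.strip()
            if word:
                self.keywords.append(word)


def extract_keywords(html, meta_name="keywords"):
    """Return the list of keywords found in the page's meta tags."""
    parser = KeywordParser(meta_name)
    parser.feed(html)
    parser.close()
    return parser.keywords


# Made-up sample page, not real Times markup.
sample = """<html><head>
<title>Example story</title>
<meta name="keywords" content="Books and Literature, Libraries and Librarians, Computers and the Internet">
</head><body></body></html>"""

print(extract_keywords(sample))
```

The parser comes from the Python standard library, so nothing needs to be installed to try this on a saved page.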

Behind the keywords is a taxonomy that I haven't seen, but would like to. I asked them to make this public, both at my meeting there last Thursday and in a phone talk this morning. I think there could be a lot of value in the Times taxonomy; it might even set a standard.

In the meantime, I wrote a script last night that tracks the keywords in NY Times stories as they flow through the nytimesriver application. Here's a report that's updated once per hour.

http://nytimesriver.com/keywords.html
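The script itself isn't shown here. As a rough sketch of how a cumulative tally like this could work (an assumption, not the actual nytimesriver code), the Python below keeps a running count of keywords across stories. The function names and the sample keyword lists are hypothetical.

```python
from collections import Counter


def tally_keywords(stories, counts=None):
    """Add each story's keywords to a running (cumulative) count."""
    counts = Counter() if counts is None else counts
    for keywords in stories:
        # Count a keyword once per story, even if it is listed twice.
        counts.update(set(keywords))
    return counts


def format_report(counts):
    """One line per keyword, most frequent first, ties in alphabetical order."""
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0].lower()))
    return "\n".join(f"{n}\t{keyword}" for keyword, n in ordered)


# Made-up keyword lists standing in for stories from the river.
stories = [
    ["Politics", "Elections"],
    ["Elections", "Television"],
    ["Politics", "Elections", "Elections"],
]

counts = tally_keywords(stories)
print(format_report(counts))
```

Passing the previous `counts` back in on each hourly run is one way to keep the list cumulative.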

Obviously it would be interesting to be able to click on the keywords to see what articles reference each of the keywords. And it would also be nice to have a cumulative list and a daily list. Right now all we have is the cumulative version.
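Both of those wishes come down to keeping a little more data per story. Here's a hedged sketch in Python, assuming each story is recorded with its URL, the day it appeared, and its keywords. The record layout, the example.com URLs, and the dates are all hypothetical.

```python
from collections import Counter, defaultdict


def build_indexes(stories):
    """Return (keyword -> list of URLs, day -> Counter of keywords)."""
    articles_by_keyword = defaultdict(list)
    counts_by_day = defaultdict(Counter)
    for story in stories:
        # dict.fromkeys drops duplicate keywords but keeps their order.
        for keyword in dict.fromkeys(story["keywords"]):
            articles_by_keyword[keyword].append(story["url"])
            counts_by_day[story["day"]][keyword] += 1
    return articles_by_keyword, counts_by_day


# Made-up story records, for illustration only.
stories = [
    {"url": "http://example.com/a", "day": "2000-01-01", "keywords": ["Politics", "Elections"]},
    {"url": "http://example.com/b", "day": "2000-01-01", "keywords": ["Elections"]},
    {"url": "http://example.com/c", "day": "2000-01-02", "keywords": ["Elections", "Television"]},
]

articles_by_keyword, counts_by_day = build_indexes(stories)
print(articles_by_keyword["Elections"])
print(counts_by_day["2000-01-01"].most_common())
```

The first index is what a clickable keyword would link to; the second gives the daily list, and summing the daily counters gives back the cumulative one.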

But it's still pretty interesting, bordering on fascinating, to think of the possibilities if they provide the framework behind these keywords.

When the pros try to figure out how what they do will continue to make sense after the Internet achieves all its promise, this may be an example. The metadata is generated by librarians, and we don't as yet have our own librarians in the blogosphere (though some might disagree). And it's possible that after a release of the taxonomy something like Wikipedia may happen, with the public taking over maintenance of the taxonomy. No one knows what will happen, but one thing seems clear: there can be value in a news organization beyond the reporting and editing it does.

Last update: Thursday, June 3, 2010; 4:01:50 PM

~About the Author~

Dave Winer, 55, is a visiting scholar at NYU's Arthur L. Carter Journalism Institute. He pioneered the development of weblogs, syndication (RSS), podcasting, outlining, and web content management software; former contributing editor at Wired Magazine, research fellow at Harvard Law School, entrepreneur, and investor in web media companies. A native New Yorker, he received a Master's in Computer Science from the University of Wisconsin, a Bachelor's in Mathematics from Tulane University and currently lives in New York City.

"The protoblogger." - NY Times.

"The father of modern-day content distribution." - PC World.

One of BusinessWeek's 25 Most Influential People on the Web.

"Helped popularize blogging, podcasting and RSS." - Time.

"The father of blogging and RSS." - BBC.

"RSS was born in 1997 out of the confluence of Dave Winer's 'Really Simple Syndication' technology, used to push out blog updates, and Netscape's 'Rich Site Summary', which allowed users to create custom Netscape home pages with regularly updated data flows." - Tim O'Reilly.

Mail: scriptingnews1mail at gmail dot com.
