News and commentary from the cross-platform scripting community.
Mail Starting 5/21/97
Metadata is already important and will become even more important.
From: AdamT@smginc.com (Adam Turoff);
Sent at 5/21/97; 6:57:00 PM;
Unfortunately, I think Simson used an oversimplified example of purchasing a Kate Bush CD.
Kate Bush is associated with a lot of data: a discography, lyrics, websites, past tours, upcoming tour dates, concert tickets, posters, books, t-shirts, bootleg albums as well as a set of CDs currently on sale at CDNow. And that's just the tip of the iceberg!
A good Metadata framework would let you find any subset of that information and present it in a user-friendly fashion.
Looking at that broad sweep, it's obvious that a good metadata system will have to cross specialties and be both extensible and universal - not limited to CD catalogs and weather maps.
It's been done and done well before. Look at biological classification - Linnaeus couldn't imagine half of the flora in the rainforest, but his classification system can accommodate new plants as they are discovered by us westerners. A closer example would be a MIME registry - it keeps expanding and accommodating new types of media as soon as they come out.
Unfortunately, HTML is already a legacy system. Adding metadata now would require a new set of embedded data - new tags, fat pages, new HTTP headers or whatever. That could make a lot of work for content designers with a huge amount of existing content.
This might appear to be a bad thing since there's a reason not to add metadata to existing pages, but it isn't really that bad. Is there one search engine? Is there one online catalog? Should there be one source for metadata?
Look at Wired's NewBot - it gathers metadata for existing news sites. Once that becomes open and extensible and standardized (or something very much like it), look for metadata aggregators along the lines of Yahoo that focus on breadth, depth, accuracy and timeliness.
I really REALLY like this idea. But I don't think it will work for (as Garfinkel puts it) finding the best price on Kate Bush CDs. Why? Garfinkel himself puts his finger on it: it would force the big outfits to compete on price, service and delivery -- not on glitzy graphics.
From: firstname.lastname@example.org (Jack Bell);
Sent at 5/21/97; 1:20:55 PM;
So, while the smaller companies may try to build ontologies of pricing data, the big outfits will just use loss leaders and other come-ons as they have always done. They will build relationships with search engine companies and commit to other cross marketing arrangements to boost their site traffic. They will combine traditional media advertising with their web efforts (something the little guys can't match). And because of human nature I am willing to bet they will win.
Of course this MetaData idea might work pretty well for the other things the web does. Only it requires effort on the part of the people putting up sites; considering the current level of adoption of simple Meta Tags I am not confident of speedy acceptance.
Rumor has it that you're interested in semantic tagging of Web pages.
From: email@example.com (Philip Greenspun);
Sent at 5/21/97; 2:33:19 PM;
Rumor has it that you're interested in semantic tagging of Web pages. I tried to explain why this was necessary to Tim BL about three years ago and even wrote a paper about it, but I couldn't get through to him. I elaborated a bit on the paper in Chapter 15 of http://www-swiss.ai.mit.edu/wtr/dead-trees/
It's an interesting article. We have an application that does exactly what he is looking for. It is a shopping agent that collects information about products, prices, reviews, store directories, etc. in a variety of categories, including music. You can download a beta from http://www.jango.com/ that will run on Windows 95 and NT 4.0.
From: firstname.lastname@example.org (Ravi Pandya);
Sent at 5/21/97; 10:33:21 AM;
We built our own metadata, and "information adapters" for the various sites. Having the metadata at the site would make our work easier. However, I suspect the main problem with site based metadata will be trying to coordinate the usage across sites. Unless you really work hard to get a consistent interpretation, you can end up with a lot of subtle incompatibilities.
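The "information adapter" idea can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration, not a description of the actual product: the `Offer` schema, the two site formats, and the field names are all hypothetical. It shows how each adapter has to resolve small differences in interpretation (dollar strings versus integer cents, "Last, First" versus "First Last") before records from different sites can be compared.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    """Common record every adapter must produce (hypothetical schema)."""
    artist: str
    title: str
    price_cents: int  # always US cents, so sites quoting dollars must convert


def adapt_site_a(raw: dict) -> Offer:
    # Hypothetical site A quotes prices as a dollar string, e.g. "$14.99".
    dollars, cents = raw["price"].lstrip("$").split(".")
    return Offer(raw["artist"], raw["album"], int(dollars) * 100 + int(cents))


def adapt_site_b(raw: dict) -> Offer:
    # Hypothetical site B stores the artist as "Last, First" and the price in cents.
    last, first = [part.strip() for part in raw["performer"].split(",")]
    return Offer(f"{first} {last}", raw["name"], raw["cents"])


# One adapter per site; adding a site means adding one function here.
ADAPTERS = {"site_a": adapt_site_a, "site_b": adapt_site_b}


def cheapest(listings: list[tuple[str, dict]]) -> Offer:
    """Normalize (site, raw record) pairs and return the lowest-priced offer."""
    offers = [ADAPTERS[site](raw) for site, raw in listings]
    return min(offers, key=lambda offer: offer.price_cents)
```

If the sites published records in the shared schema themselves, the adapter functions would disappear, but only as long as every site interpreted each field the same way.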
I read Garfinkel's article on meta data that you linked to from scripting.com, and I believe we did come up with a solution for the time being to search for a Kate Bush album, for example.
From: henri@binaryCompass.com (Henri Asseily);
Sent at 5/21/97; 10:07:04 AM;
Garfinkel's article on meta data
At http://www.bizrate.com/, the trick we use is this: the user comes in, selects "Music and Video : CDs and Tapes", and is presented with a comprehensive listing of music sites on the web. Then he selects from the top right pop-up menu "View by Product Search", clicks on the "View" button, and the listing is changed to display all the search engines of each of the sites on one page.
Then it's simply a matter of going through the search engines.
I agree that it's not perfect, but it's a hell of a lot better than to sift through each of the websites (if you know them!) and look for the search engines.
Of course, we use this workaround because of the problem Garfinkel mentioned: no two search engines are the same. We've spent quite a few nights trying to figure out a solution, and that's the best we could come up with.
Almost every website that deals with weather gets its raw data from a set of machine-readable weather statements from the National Weather Service. Clever people then take this raw data, format it, categorize it, add context-sensitive graphics and poof, a content-rich weather web site.
From: email@example.com (Preston Holmes);
Sent at 5/21/97; 9:35:04 AM;
The government is actually leading the way with this sort of public data publishing on the web. In addition to weather data, there is river and ocean information, traffic information, complete sets of mapping data, zip codes, etc.
Coming up with one universal metadata format may never be possible given the infinite diversity of information, but particular categories of information (weather, news, CDs etc) should work on common formats where possible. I think more important than marking up the HTML version of the content is publishing a machine-readable text/binary version of the content alongside the HTML version. It's HTTP, not HTML, that is key here.
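One way to read "HTTP, not HTML" is content negotiation: the same URL serves an HTML page to browsers and a machine-readable version to programs, chosen from the request's Accept header. The Python sketch below is a simplified illustration under stated assumptions, not a full implementation of the HTTP rules: the weather values and the plain-text format are invented, and wildcards such as `*/*` are ignored.

```python
# Hypothetical representations of one weather report, keyed by media type.
REPRESENTATIONS = {
    "text/html": "<html><body><p>Boston: 61F, cloudy</p></body></html>",
    "text/plain": "station=BOS temp_f=61 sky=cloudy",
}


def parse_accept(header: str) -> list[tuple[str, float]]:
    """Return (media type, q) pairs from an Accept header, best first."""
    prefs = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        if not parts[0]:
            continue  # skip empty entries, e.g. from an empty header
        q = 1.0  # a missing q parameter means full preference
        for param in parts[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0  # treat a malformed q as "not acceptable"
        prefs.append((parts[0], q))
    # sorted() is stable, so equal q values keep the client's order.
    return sorted(prefs, key=lambda pair: pair[1], reverse=True)


def choose(header: str) -> tuple[str, str]:
    """Pick the representation the client prefers; default to HTML."""
    for media_type, q in parse_accept(header):
        if q > 0 and media_type in REPRESENTATIONS:
            return media_type, REPRESENTATIONS[media_type]
    return "text/html", REPRESENTATIONS["text/html"]
```

A script that sends `Accept: text/plain` would get the key=value line, while a browser that prefers `text/html` would get the formatted page from the same address.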
Garfinkel's article reminds me of the discussions which took place in the early days of Mac scripting. The subject then was object model vs. application specific scripting syntax. Can you imagine a web in which every site potentially implements a different syntax?
From: firstname.lastname@example.org (Reede Stockton);
Sent at 5/21/97; 9:25:24 AM;