News and commentary from the cross-platform scripting community.
Mail Starting 4/2/98

From: jjones@mail.ior.com (Jeffrey Jones);
Sent at Fri, 03 Apr 1998 18:36:14 -0800;
Data warehousing

Data warehousing is:

Storing large quantities of data, e.g. credit card transaction data. Some companies are doing this in the hope that they will learn something of value about their customers' purchasing habits and possibly get better at targeting advertising. They also hope to learn of any relationships between products, people, geography, season, time of day, etc.

Someone once suggested a relationship between the proximity of diapers and beer in a grocery store. The theory was that parents coming in late at night for diapers would also buy beer if it was next to the diapers. Some companies spent millions warehousing data based on that claim. Anyway, it sure sells expensive hardware and data analysis software.
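A minimal sketch of how such a product relationship might be looked for in warehoused transactions, by counting how often pairs of items share a basket. The transactions, item names, and the `pair_stats` helper are invented for illustration; "lift" here is the observed co-occurrence rate divided by the rate expected if the two items sold independently.

```python
from collections import Counter
from itertools import combinations

# Hypothetical baskets, invented for illustration only.
TRANSACTIONS = [
    {"diapers", "beer", "chips"},
    {"diapers", "beer"},
    {"diapers", "wipes"},
    {"beer", "chips"},
    {"milk", "bread"},
    {"diapers", "beer", "milk"},
]


def pair_stats(transactions):
    """Return {(a, b): (support, lift)} for every item pair that co-occurs."""
    n = len(transactions)
    item_counts = Counter(item for basket in transactions for item in basket)
    pair_counts = Counter(
        pair
        for basket in transactions
        for pair in combinations(sorted(basket), 2)
    )
    stats = {}
    for (a, b), count in pair_counts.items():
        support = count / n  # fraction of baskets containing both items
        expected = (item_counts[a] / n) * (item_counts[b] / n)
        stats[(a, b)] = (support, support / expected)
    return stats


if __name__ == "__main__":
    ranked = sorted(pair_stats(TRANSACTIONS).items(), key=lambda kv: -kv[1][1])
    for pair, (support, lift) in ranked:
        print(pair, round(support, 2), round(lift, 2))
```

A lift well above 1 suggests the pair sells together more often than chance would predict; with only a handful of baskets, as here, it proves nothing.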

From: jlevine@si.timeinc.com (Jason Levine);
Sent at Fri, 3 Apr 98 13:27:49 -0500;
COM interface on Frontier

The COM interface is amazing -- it's exactly what I've been looking for with Frontier. A few observations:

It's fast, both in operation and in development. I was able to create a new project in VB using the COM interface in under one minute -- granted, it just returns the time, but it's trivial to get it to return more than that, since THAT is controlled on the Frontier side of the equation. And the VB side isn't any more difficult, either -- I like the fact that I just have to declare the object, declare any subtables as objects, and run the scripts against the same object-based model that VB makes us all familiar with.

The biggest worry that I had with Frontier on Windows was that it didn't script other apps; now, I don't really need to be able to, since as long as everything implements a COM interface, I can use VB to integrate everything I need. (Granted, I'd love to let Frontier do the integration, and I have a feeling that I'll be able to at some point soon, but for now, this is PERFECT.)

A few things that I'll post to Frontier5-Win, but nonetheless are questions:

How do I pass multiple, perhaps complex parameters to the scripts? Would it just be as a comma-delimited list, like I would within Frontier itself? (Haven't tried it yet.)

What types of variables are supported coming OUT of the COM interface? Can I pass an entire binary object out, like a JPEG? That would be great for this Goodwill Games work that I'm putting together.

Is there any way to browse the object model that Frontier implements? The VB object browser won't let me look at it -- it actually doesn't show it at all. I'd love to know what other methods, properties, and events are available to me. (It would be GREAT if you could create custom events that are fired when certain things happen, like a new subtable is added to a given table.)

From: Kenneth.J.Meltsner@jci.com (Meltsner, Kenneth J);
Sent at Fri, 3 Apr 1998 08:17:20 -0800;
Re:What Would Shakespeare Think?

A long time ago, the Geometry Center (University of Minnesota) determined it was frequently quicker to send and display bitmapped graphics rather than vector graphics, especially when the vector graphics were relatively complex. Compression tended to make the bitmap files small and they were significantly easier to uncompress and display. The bitmap files were also frequently smaller than the vector files, especially when the drawings were complicated.

That said, bitmaps violate the "scalable content" rule since they can't be re-rendered at different resolutions for printing vs. display vs. display on small monitors, lose associated information (the precision of data, constraints, etc.), are more difficult to localize, and can't be text-searched.

There's a fairly decent standard for vector graphics already. It's called CGM, and while it's mired in all sorts of international standards (like SGML), the CGM gurus are defining a more Web/developer friendly version as well (like XML).

Personally, I'd like a graphics format that includes the ability to note semantics/meaning as well as the graphics structure, just as SGML allows users to specify structural markup (e.g. chapters, sections, subsections) and semantic markup (e.g. address, citation, date). You could then use a stylesheet to display the graphic in the format you want. (The semantic graphical format might just be XML, of course. You'd need a new style language, though.)

From: bhofmann@cypressres.com (Bill Hofmann);
Sent at Thu, 2 Apr 1998 16:37:27 -0800;
Re:What Would Shakespeare Think?

I've been doing my best to tell people *not* to write their own XML parser, but to use one of a number of publicly-available ones. Makes a whole lot more sense, for people who don't write parsers for a living.

On the other hand, I've been agitating in our product development to use XML as a wire/file format and to leverage existing XML vocabularies (OSD, for instance). The biggest obstacle is that people already know how to use INI files or Java Property files, and learning something new seems to be scary. Perhaps we need some stream to/from XML support in Java and C++; once that's done, we're pretty much set.
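A rough sketch of what the move from INI files to XML could look like, using an existing parser rather than a hand-written one. It relies only on Python's standard `configparser` and `xml.etree.ElementTree` modules. The element and attribute names (`config`, `section`, `entry`, `name`, `key`) are made up for the example and are not taken from OSD or any other published vocabulary.

```python
import configparser
import xml.etree.ElementTree as ET

# Hypothetical settings file, invented for illustration.
SAMPLE_INI = """
[server]
host = example.com
port = 8080
"""


def ini_to_xml(ini_text):
    """Convert INI text to an XML string (assumed, ad-hoc vocabulary)."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    root = ET.Element("config")
    for section in parser.sections():
        sec = ET.SubElement(root, "section", name=section)
        for key, value in parser.items(section):
            ET.SubElement(sec, "entry", key=key).text = value
    return ET.tostring(root, encoding="unicode")


def xml_to_dict(xml_text):
    """Read the same vocabulary back into nested dictionaries."""
    root = ET.fromstring(xml_text)
    return {
        sec.get("name"): {e.get("key"): e.text or "" for e in sec.findall("entry")}
        for sec in root.findall("section")
    }


if __name__ == "__main__":
    xml_text = ini_to_xml(SAMPLE_INI)
    print(xml_text)
    print(xml_to_dict(xml_text))
```

All values come back as strings, just as they would from the INI file; typing them would be a job for the vocabulary's definition, not the parser.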

From: alexhop@exchange.microsoft.com (Alex Hopmann);
Sent at Thu, 2 Apr 1998 09:55:11 -0800;
re: What Would Shakespeare Think?

I think you are overgeneralizing a bit about the Standards process and the role that big companies play there.

I've been spending the week in LA at the IETF, and I can report that there is lots of participation from all sorts of folks, from University students and professors, to people who work for small companies, to people who work for large companies.

The key factor is how much time and effort people are willing and able to contribute. Making Standards is an incredibly difficult and time-consuming process. There are big companies directly affected by these things who choose not to participate or send lots of people. There are small companies who decide for various reasons to participate (as I did back at ResNova), but obviously in a smaller company your time is often in more demand, and it's hard to make the size of commitment necessary to really drive new work.

This doesn't preclude people from also taking a smaller but important role, as Wesley has recently with his several useful comments on the WebDAV mailing list.

From: chewy@mcione.com (Paul F. Snively);
Sent at Thu, 2 Apr 1998 11:20:22 -0800;
Still More Ramblings

Hi Dave! Welcome back from all the traveling and trench-digging over the past few weeks. Wow! As always, this is some very, very cool stuff you're building. Not quite sure how Betty and the new ContentServer relate to each other just yet, but I'll get there. And the XML support keeps evolving in extremely nifty ways, too. How exciting!

With regard to your observations about how 17th-century English English and 20th-century American English differ, it gets worse: try reading untranslated Chaucer sometime. The only reason I could pull it off is that I'm a third-generation German-American who speaks German. Chaucer's Middle English is much, much closer to our ancestral tongue than modern English is, although you also find echoes of it in the writing of James Herriot, the world's favorite veterinarian, when he writes about being a farm vet in rural England in the 1930s, and some crusty old coot yells at him, "I ken more about this than ye!" ("Ken," to know, is cognate with the German infinitive "kennen," to know, which in the first-person singular in conversational German is often clipped to "kenn'," e.g. "Ich kenn'..." meaning "I know...")

At the end, you wrote: "I want to break out from behind HTML-imposed barriers, but I don't want to give up the low-tech understandability of HTML." Unfortunately, these may very well prove to be conflicting goals. If you were going to give a title to the next section, I think it should be:

"It's semantics, stupid."

Most of the time people use the word "semantics" incorrectly (that is, they exhibit a lack of understanding of the semantics of "semantics.") Someone will use one phrase to describe something, then someone else will use a different phrase, and someone will claim that it's "just semantics," when what they mean is that the two phrases have the same meaning--i.e., the same semantics--and the only difference, then, is not semantics at all, but syntax.

This confusion also seems to cloud discussions of the relationships among the various TLA's* being bandied about lately by the W3C and those who follow them. As far as I can tell, the confusion stems precisely from a very important observation that has been made before but bears repeating: HTML is semantics-free, a way of describing a presentation. You can think of it as a 1:1 mapping from a purely textual syntax to a visual syntax. In fact, it might be a fun academic exercise to attempt such a mapping as directly as possible, e.g. to feed an HTML grammar to a parser generator that allows you to attach code to the right-hand side of the grammar rules, and to make the RHS calls to MIT's "Functional PostScript" engine.
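A toy version of that exercise, with two substitutions stated plainly: Python's standard event-based `html.parser.HTMLParser` stands in for a parser generator, and a plain list of (styles, text) drawing instructions stands in for Functional PostScript. The `STYLES` table and the `PresentationMapper` class are hypothetical and cover only three tags.

```python
from html.parser import HTMLParser

# Hypothetical tag-to-visual-style table; real HTML has many more tags.
STYLES = {"b": "bold", "i": "italic", "h1": "heading"}


class PresentationMapper(HTMLParser):
    """Map HTML text directly onto a list of presentation instructions."""

    def __init__(self):
        super().__init__()
        self.active = []        # styles currently in effect, oldest first
        self.instructions = []  # (styles, text) pairs, in document order

    def handle_starttag(self, tag, attrs):
        if tag in STYLES:
            self.active.append(STYLES[tag])

    def handle_endtag(self, tag):
        style = STYLES.get(tag)
        if style in self.active:
            # Drop the most recently opened occurrence of this style.
            index = len(self.active) - 1 - self.active[::-1].index(style)
            del self.active[index]

    def handle_data(self, data):
        if data.strip():
            self.instructions.append((tuple(self.active), data))


def render(html):
    """Return the presentation instructions for an HTML fragment."""
    mapper = PresentationMapper()
    mapper.feed(html)
    mapper.close()
    return mapper.instructions


if __name__ == "__main__":
    for styles, text in render("<h1>Title</h1><p>Plain <b>bold <i>both</i></b></p>"):
        print(styles, repr(text))
```

Nothing in the handlers knows what a title or a paragraph means; each tag only switches a visual style on or off, which is the sense in which the mapping is syntax to syntax.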

Life around XML gets confusing, though, because XML isn't about representing "presentation" (that is, syntax). It really is about representing semantics, which is why, as I've written before, some portion of the influence over XML comes from people who either are or were in the Knowledge Representation community, a subset of the Artificial Intelligence community. This isn't necessarily a bad thing, but it does suggest rather strongly that we need to know where the "bottoming out" of the semantics occurs. That is, we can model the semantics of things at various levels of abstraction (cf. Locke's "An Essay Concerning Human Understanding") but we need to know what semantics we consider "primitive," irreducible, atomic. In the context of the slew of stuff coming out of the W3C, this is where DSSSL sits: one of its goals is to be the syntactic rubber meeting the semantic road. It does this, for better or for worse, by being a derivative of one of the only languages in the world to have a formal semantics, which is typically specified in the denotational semantics of Strachey, namely, Scheme.

Whew. Heavy stuff. Bottom line: DSSSL, a Scheme derivative, has a mathematically defined meaning; you can apply "proof procedures" to DSSSL programs, etc. So if you define everything--DTD's, the DOM, etc.--in terms of DSSSL, then you can make mathematically and logically provable statements about them because they'd have a shared formal semantics that anyone with a background in mathematics and logic has already at least tacitly agreed upon. To see how important this is, consider Java's current weaknesses with respect to its promise of "Write Once, Run Anywhere." As if it weren't bad enough that Java runtimes have to live atop OSes without a formal description of their semantics, Java itself has no formal semantics! Like too many languages, its semantics are described only in English, a notoriously vague and ambiguous language.

With DSSSL, we can say "what does this MEAN?" in a meaningful way. We should--in theory--be able to formalize what HTML 4.0 presents by creating a DTD for it that establishes its relationship to DSSSL. I'm being vague because I'm still learning about DSSSL and DTD's, but if DSSSL can generate the presentation itself or, if not, if it can be mapped onto a tool such as Functional PostScript, it should be. We should abandon our ad-hoc mappings of HTML to visual presentation and formalize them as much as possible using the same toolset that we'll use to describe other semantics on the Internet.

The problem with all of this is that if it's necessary to reduce all of our XML/DTD semantics to the denotational semantics in order to have conversations about them, a tiny number of people will be able to have those conversations! So I think what we'll see, eventually, is the formulation of a number of what I'll call "idioms" although they'll almost certainly be huge in scope compared to what we usually think of as idioms: they'll be XML/DTD pairs whose semantics, while still being formally described and even provable/testable, will be essentially taken for granted by the community. This will be particularly true if/when the community settles on one or two XML parsers that allow the easy attachment of code to the results of a parse, because nothing succeeds like success, even if the "success" is operational rather than formal (that is, one XML parser/behavior combination for XML code X is in Java and implements semantics A; one is in C++ and implements semantics B; finally one is in UserTalk and implements semantics C. It's probably safe to claim that A, B, and C are likely to be very similar, but there's no way whatsoever to prove that they are identical).
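A small sketch of that operational situation, assuming an invented `order`/`item` vocabulary: code is attached to the results of a parse through a table of handlers, and two different tables give the same XML two different "semantics." The `interpret` function and both handler tables are hypothetical; the parser is Python's standard `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Hypothetical document in an invented vocabulary.
ORDER = '<order><item price="2.50" qty="2"/><item price="1.00" qty="3"/></order>'


def interpret(xml_text, handlers):
    """Parse xml_text and evaluate it bottom-up with the given handler table."""
    def visit(elem):
        children = [visit(child) for child in elem]
        return handlers[elem.tag](elem, children)
    return visit(ET.fromstring(xml_text))


# "Semantics A": prices are binary floating-point numbers.
HANDLERS_A = {
    "item": lambda e, kids: float(e.get("price")) * int(e.get("qty")),
    "order": lambda e, kids: round(sum(kids), 2),
}

# "Semantics B": prices are exact decimals, rounded to cents at the end.
HANDLERS_B = {
    "item": lambda e, kids: Decimal(e.get("price")) * int(e.get("qty")),
    "order": lambda e, kids: float(sum(kids, Decimal(0)).quantize(Decimal("0.01"))),
}

if __name__ == "__main__":
    print(interpret(ORDER, HANDLERS_A))
    print(interpret(ORDER, HANDLERS_B))
```

Both tables agree on this sample document, but agreement on samples is only evidence. Nothing here proves the two interpretations identical for every document, which is exactly the gap a shared formal semantics would close.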

And therein lies the crux of the issue. There's only one way to provide semantics to XML/DTD combinations, and that's through some kind of code, in the programming sense of the word. The divergence of programming languages alone guarantees lack of consistency in the semantics, unless we all embed the exact same implementation of DSSSL in our software and all XML/DTD semantics are driven through it. Note that this probably isn't a bad idea; there are several very good embeddable implementations of Scheme (I'm thinking of MzScheme from Rice University, ELK from I forget where, and SCM from MIT) and some of them either have implemented or are implementing DSSSL as part of their implementations.

Any chance of seeing a DSSSL engine DLL for Frontier? And yes, I'm very much aware that the correct answer to this question is "Sure, Paul... just as soon as you're done writing it." :-)

As always, thanks for the excellent reporting on the bewildering, but exciting, state of the Internet union.


* TLA stands for Three-Letter Acronym. A famous computer scientist whose name escapes me at the moment, when asked what the biggest issue facing the industry was, pointed out that there are only some 17,000 possible TLA's.

This page was last built on Tuesday, April 7, 1998 at 6:12:54 PM, with Frontier version 5.0.1. Mail to: dave@scripting.com. © copyright 1997-98 UserLand Software.