News and commentary from the cross-platform scripting community.
cactus Mail Starting 11/7/97

From: andy@hmsi.com (Andy Freeman);
Sent at Fri, 7 Nov 1997 17:20:21 -0700;
three comments

(1) Interestingly enough, Intel just settled a lawsuit in this area. (See http://www.intel.com/procs/support/pentium/certif/continuance.htm - get there via the bottom of the pressroom link from www.intel.com.) The lawsuit didn't cost Intel a lot of money and brought very little bad press, but there's no reason to believe that Intel intentionally cheated either.

I'd guess that the only thing saving Sun is the fact that there's no real connection between purchasing decisions and these deceptions. When that changes....

(2) Sun's "anything is acceptable if it's aimed at Microsoft" attitude is beginning to grate.

Yes, Microsoft has a lot of power, and they got it without delivering the sexiest geek-toys, and they ship bugs. However, perfection isn't our alternative. For the purposes of this discussion, MS products are 90%. That's pretty bad, but the alternative is usually worse.

In some sense, this reminds me of Sun's early days. For a long time, 5-10% of Sun machines were dead on arrival. For a while, it was obvious that no one even bothered to try to turn them on before shipment. I sneered at a friend of mine who ordered those machines, saying that I wouldn't accept anything like that. He pointed out that his job was to provide working machines, and that he couldn't boycott the best supplier he had. Yup, Sun's 5-10% was both unacceptable and better than the alternatives.

(3) I haven't figured out if I'm annoyed when you post a link to (a page that I can't get to without registering) without mentioning the registration requirement. (The most recent example is the New York Times article where you say that McNealy advocates a hatemail campaign. I don't register, so I haven't seen the article in question.) I agree that there's a difference between such pages and "for pay" pages, but at times I don't see that it's a huge difference. (In one case, I'm paying cash, in another case, I'm paying personal info. Both are MINE and I'm careful about how I spend them.)

From: icp@webfayre.com (Ivan Phillips);
Sent at Fri, 07 Nov 1997 16:10:29 -0600;
CaffeineMark: 50% or 50 times...

I read your article entitled Rastas! today, and enjoyed your plain English explanation of pattern matching.

I wanted to explain how the 50 times (50x) and 50% relate to each other.

The overall CaffeineMark score is the geometric mean of the 9 sub-tests, i.e., it is the ninth root of the product of all the tests. If all individual test scores go up by, say, 30%, so does the overall score. If a single test score improves by a factor F, the overall score only rises by the 9th root of F.

What I noticed when Sun sent their scores to me was that the Logic test score was 50 times faster than any result I had seen before. It was easy to pick out because it looked like it had too many digits!

Sun's overall score was about 50% higher than the highest NT score we had published at that time. FYI, at that time, the published high score was an NT box with a score of about 3000. The Sun overall score was about 4600. Today's highest NT score is around 4000.
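Ivan's arithmetic can be checked in a few lines of code. Here is a minimal sketch in Python (not the CaffeineMark code itself, and the sub-scores are made-up round numbers): inflating one of nine sub-tests by a factor of 50 multiplies the geometric mean by the ninth root of 50, about 1.54, which is roughly the 50% jump he describes.

```python
# The overall CaffeineMark score is the geometric mean of 9 sub-test
# scores: the ninth root of their product. The sub-scores below are
# illustrative placeholders, not Pendragon's actual published numbers.

def geometric_mean(scores):
    product = 1.0
    for s in scores:
        product *= s
    return product ** (1.0 / len(scores))

honest = [3000] * 9          # nine identical sub-scores, for simplicity
inflated = honest[:]
inflated[0] *= 50            # one sub-test (e.g. Logic) scores 50x higher

print(geometric_mean(honest))    # about 3000
print(geometric_mean(inflated))  # about 4634, i.e. 3000 * 50 ** (1/9)
```

Since 50 ** (1/9) is about 1.544, a 50x result on a single sub-test turns a 3000 into roughly a 4600 - which lines up with the scores quoted above.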

On the alternate benchmark we created, NT, 95 and MacOS VMs scored the same as on the original test. However, in one case with Sun's Solaris 2.6 JIT, the Logic test score was 300 times lower (slower).

It's difficult to say exactly what Sun's overall score would have been without the use of their "optimization" technique.


From: sidney@sidney.com (Sidney Markowitz);
Sent at Fri, 07 Nov 1997 15:06:27 -0800;

I am appalled at the responses I have seen so far from Sun trying to spin what they did as optimizing their compiler for the benchmark. I don't think your school analogy went far enough at making clear just how awful what they are accused of doing really is.

Here is what is analogous to what Sun claims "everyone is doing":

You know that the final exam always has multiple choice questions based on factual material in the textbook. You cram for the exam by memorizing from the book, and get an A without necessarily understanding the course material. This happens. People criticize teachers who set things up this way and the students who take advantage of it. It is not cheating.

Another example is training for the Boston Marathon by running the course in Boston and getting familiar with every hill and curve, gaining an advantage over others who train elsewhere who might be better runners in general. That also happens and it also is not cheating.

Here is what Sun is being accused of doing:

One of the modules in the CaffeineMark test measures how fast a Java program can calculate a certain result. Instead of calculating the result, Sun's compiler is alleged to detect that it is being asked to run that test and simply outputs the correct result 50 times faster than the Microsoft compiler can calculate the result. That fast a time on the one module gave Sun's compiler an overall test score 50% better than Microsoft's.
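To make the accusation concrete, here is a purely hypothetical sketch in Python of the kind of special-casing being alleged. The function names, the detection test, and the workload are all invented for illustration; nothing here is taken from Sun's actual JIT:

```python
# Hypothetical illustration only. The alleged trick: recognize the exact
# workload the benchmark uses and return a precomputed answer instead of
# doing the computation.

KNOWN_BENCHMARK_RESULT = 362880  # precomputed: factorial(9)

def logic_test(n):
    """Stand-in for a benchmark sub-test that computes its result honestly."""
    result = 1
    for i in range(2, n + 1):
        result *= i
    return result

def rigged_logic_test(n):
    """Detects the one input the benchmark always uses and skips the work."""
    if n == 9:                    # the benchmark always asks for n == 9
        return KNOWN_BENCHMARK_RESULT
    return logic_test(n)          # anything else is computed for real
```

An alternate benchmark that varies the workload even slightly defeats the shortcut: the special case no longer fires, the honest path runs, and the true speed shows through - which is exactly how such a trick gets exposed.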

That is analogous to a student obtaining the answers to the final exam in advance, completing the exam faster than anyone else and getting an A. That's called cheating. The way they got caught is similar to that student completing their exam 50 times faster than the next best student, having a perfect score, and raising suspicions. That's called stupidity.

It is also like starting the Boston Marathon, then sneaking off the road to a waiting car that takes you near the finish line where you sneak in ahead of everyone else and come in first. Then when you get caught because you just "ran" 50 times faster than the world's record, you claim that you crossed the finish line correctly and were just trying for the best times like everyone else.

Sun is being accused of fraud, of claiming that their compiler can perform a certain computation at a certain speed when in fact it runs 300 times slower at that computation. This has nothing to do with questions about the usefulness of benchmarks or how other people tune products to perform well at tasks that will be measured.

As always in a situation like this, it is Sun's response to the news that says even more about them than whatever unethical action some compiler engineer may have taken with or without the knowledge of upper management. So far the response has been pretty damning, perhaps even a good candidate for Sun's own http://www.javasoft.com/features/fudwatch.html and http://www.javasoft.com/features/fudform.html

From: xtian@Eng.Sun.COM (xtian, by way of Dave Winer);
Sent at Fri, 7 Nov 1997 11:42:18 -0700;

I agree. However, people scrutinize CaffeineMarks and other benchmarking results so closely that they become myopic. Benchmarking software and the results it produces are interesting from a philosophical point of view, but as any software developer can tell you, benchmark results have little to do with the user.

As you know, software is very much like creative writing or poetry. Two writers can write two stories with the exact same plot lines, and the stories will still be remarkably different. When a software developer writes test tools or a benchmarking application, they are writing a story in their own way...another developer would write the software differently.

These two hypothetical benchmarking apps can take very different approaches and will probably make very different assumptions about fundamental things. When one product is run through both benchmarking apps, you will get a lot of similar results between the two apps, but you will also get some wild deviations around the edges.

I have no direct knowledge of what Sun did in this particular instance with the Pendragon software, but being shocked that a company tried to improve their test results is like being surprised when you see a gas station at the corner of a major intersection. This happens all the time in every industry. When big car companies submit their cars for testing by major magazines, they spend a week touching up the car, they put an extra spare tire in the trunk to give the back end some extra weight, maybe they beef up the sway bars a bit, etc, etc, etc.

And ya know what? When a consumer buys that car, unless they have extensive high performance driving experience, they won't be able to get the same acceleration times, braking distances, and mileage. As the small print says, "Your Mileage May Vary."

I think it is important for companies to shake the consumer hard every now and then, and break the user out of this statistical stupor. Benchmarking is not relevant to most of the "user experience". For instance, I don't care how fast your processor is: if the screen doesn't redraw quickly, the user will think the machine is slow, regardless of what any benchmarking tool says. It's not pleasant, but it is a fact of life that we just need to learn to deal with.

From: amy@home.cynet.net (Amy Wohl);
Sent at Fri, 7 Nov 1997 08:04:17 -0700;

Dave, thank you so much for this little insight into Sun and Pendragon. I agree, being able to trust the tests is very important. Otherwise the customers who don't know how to look behind the green curtain and find the powerless little fake wizard behind it are going to get ripped off.

This is why I think Microsoft is right when it says that Sun's insistence on running the Java compatibility tests itself, rather than publishing them so anyone could run them, is dead wrong.

What we really need, I think, are independent companies that make their money by being trusted sources of test data. That's not for me to do, of course; it's for someone with appropriately geeky credentials. But we sure need them.

Frankly, the vendors should each put up their dimes and dollars to get this started, because THIS is what would give Java performance numbers credibility.

I don't trust vendors who think cheating on performance numbers is "just business." Do you?

This page was last built on Tuesday, April 7, 1998 at 7:03:40 PM, with Frontier version 5.0.1. Mail to: dave@scripting.com. © copyright 1997-98 UserLand Software.