How To Tune a Porsche

This week, researchers from Microsoft published a paper about JSMeter, a project they created for measuring JavaScript performance.  In the paper, Ben Livshits and Ben Zorn examine the effectiveness of current JavaScript benchmarks and how accurately they reflect “real world applications”.  This is a great topic, commonly debated, and one where we know we should do better.

I like their intentions and their idea, except for one fundamental flaw.

To understand, let’s compare their research to how a mechanic might benchmark the performance of two cars.

In this case, we’ve got a Chevy Passenger Van and a Porsche 911 [1].  To measure the performance of the cars fairly, we take them to the track.  We recognize that the track is not indicative of real world driving, but it does give us a place to compare each car under the same conditions.

Normally, you’d probably want to drive both cars around the track, right?  Sure.  But these researchers decided to drive only the Van around the track.  Despite the obvious fact that the performance characteristics of the Van have little bearing on the performance characteristics of the Porsche, they then used the performance of the Van to make claims about how the Porsche should be tuned and how the track should be improved to be more like real driving conditions.  This claim is absurd.

In their last test, the researchers decided to drive both the Porsche and the Van around the track.  But in this test, they elected to have an elephant sit on top of each car as it went around the track.  Rather than observing that the Porsche carrying an elephant is still faster than a Van carrying an elephant, they document the fact that the Porsche with an elephant is 2x slower than the Porsche without an elephant, while a Van with an elephant is only 30% slower than a Van without an elephant.

Wow.  Read the report for yourself; this is exactly what they did.
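To put rough numbers on the elephant test, here is a minimal sketch in Python.  The baseline lap times are hypothetical, in arbitrary units; I am simply assuming the Van starts out 10x slower than the Porsche, which is an illustrative figure and not a measurement from the paper.  Only the 2x and 30% slowdowns come from the comparison above.

```python
def loaded_time(baseline, slowdown_factor):
    """Lap time with the elephant on the roof: baseline scaled by the slowdown."""
    return baseline * slowdown_factor

# Hypothetical baseline lap times (arbitrary units), assuming a 10x gap.
porsche_base = 1.0
van_base = 10.0

# Relative slowdowns as described above: 2x for the Porsche, 30% for the Van.
porsche_loaded = loaded_time(porsche_base, 2.0)
van_loaded = loaded_time(van_base, 1.3)

print("Porsche with elephant:", porsche_loaded)
print("Van with elephant:", van_loaded)
print("Porsche still faster:", porsche_loaded < van_loaded)
```

Under these assumed numbers, the Porsche takes the bigger relative hit but still finishes far ahead of the Van.  The relative slowdown alone says nothing about which car is faster.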

Now, don’t get me wrong – I’m not defending the existing benchmarks in any way.  We definitely need more and better benchmarks.  And their research, when done properly, will likely prove their hypothesis – that the existing benchmarks don’t accurately reflect real world websites.  (I thought we already knew that?) 

1. IE8’s JS engine has been well documented to be orders of magnitude slower than any other JS engine on every single test, so I believe the Passenger Van is a reasonable comparison; if there is a flaw, it is that the Van is too fast for this analogy (it’s not 10x slower than a Porsche), and I should have used a moped.

Note:  These views are mine alone and do not reflect those of my employer.

3 Responses to “How To Tune a Porsche”

  1. [...] This post was mentioned on Twitter by Stephen Shankland. Stephen Shankland said: Criticism of a Microsoft paper on how best to measure real-world JavaScript performance. Nice idea, bad implementation http://bit.ly/aSDddA [...]

  2. roy_hu says:

    I haven’t read their report yet, but they had an interview on C9 that you may want to check out: http://channel9.msdn.com/shows/Going+Deep/E2E-Research-Perspectives-on-JavaScript-with-Erik-Meijer-Ben-Zorn-and-Ben-Livshits/

  3. BJ says:

    Wait, what is your claim exactly? Somewhere in the animals and automobiles I’ve lost your critique.

    There is always some bias introduced by the specific engine being instrumented, but comparisons between real-world sites and benchmarks tend to be valid within the same tested browser. The heap use graphs would look very similar in shape (modulo GC activity) whether it were IE, Chrome, or what have you. The only real variety in results would come from mutually exclusive code paths (i.e., browser-sniffing code), which typically are just wrappers around native calls. The architecture of the browser has some impact, say on the bytecode distribution, but not on code size or other code-dependent metrics. Likewise, the conclusions about benchmarks are very similar regardless of browser.

    We have done a similar experiment using Safari, and the conclusions (regarding the appropriateness of industry benchmarks) are quite similar. The paper is at http://www.cs.purdue.edu/homes/jv/pubs/pldi10b.pdf and our tools are available from http://sss.cs.purdue.edu/projects/dynjs/

    Best,
    Brian
