Recently, a friend sent me a link to the video and methodology of Microsoft’s browser benchmark. I was a little bored, so I finally took the time to read the long PDF on the methodology. Here is a summary of my thoughts:
- The issue highlighted is very valid: micro-benchmarks do not test end-user scenarios well, and a more macro view is needed. However, completely discounting micro-benchmarks is not exactly right either. Micro-benchmarks provide a good, practically unbiased measure of a browser’s individual parts (since they have no dependency on the network stack, web servers, load balancing, etc.). A combination of micro- and macro-benchmarks would actually be best.
- W.r.t. measurement overhead, there is actually one approach that avoids measurement overhead entirely: measurement using video recording. We can record the visual cues the browsers produce and perform a manual/human comparison. This is very easy to do for web browsers, since the results of the computation are directly displayed on the screen. However, manual comparison may introduce inaccuracy, especially since the paper seems intent on measuring down to tens of milliseconds. Hence, automating this and improving the accuracy of the timing would be awesome (see the sketch after the quote below).
- While we are talking about milliseconds, I disagree with this liking for measuring browser load time down to tens of milliseconds (two decimal places for timings in seconds). Users are not going to care if the page loaded 50 ms faster; just measure to a hundred milliseconds (one decimal place for timings in seconds).
- The issue of extensibility completely contradicts the issue highlighted in the first section: that a benchmark should test what users will actually experience. Right now, more and more users (especially Firefox users) rely on add-ons to improve their browsing experience. I guess extensibility should generally be addressed separately; however, it should not be discounted so completely. I know Microsoft is trying to sell this thing (IE8), so I guess it’s acceptable.
- Inconsistent definitions: I really like this part; it is exactly as I imagine it should be. (The video-recording idea I suggested above is also based on my dislike of using browsers’ own onload mechanisms to determine that a page has fully loaded.)
[On inconsistent definitions:] Another factor that impacts benchmarking is having a consistent baseline of what it means to complete a task. In car and horse racing, the end is clear—there is a finish line that all contestants must cross to finish the race. Browser benchmarking needs a similar way to define when a webpage is done loading. The problem is that there is no standard which dictates when a browser can or should report that it is “done” loading a page—each browser operates in a different way, so relying on this marker could produce highly variable results.
Quoted from: Measuring Browser Performance (Microsoft)
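To make the video-recording idea above a bit more concrete, here is a minimal sketch of how the automation could work. This is my own illustration, not anything from Microsoft’s paper: assume you have a screen capture of a single page load, and treat the page as “done” at the last frame that differs noticeably from its predecessor. The difference threshold and the “settled” criterion are assumptions, and the recording’s frame rate caps the timing accuracy anyway (about 33 ms at 30 fps), which fits nicely with reporting to one decimal place.

```python
# Rough sketch: find when a recorded page load visually settles.
# Requires OpenCV (`pip install opencv-python`) and NumPy.
import sys
import cv2
import numpy as np

THRESHOLD = 1.0  # assumed: mean absolute pixel difference below this counts as "no change"

def last_change_time(video_path):
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    prev = None
    frame_idx = 0
    last_change = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        if prev is not None:
            # Compare this frame against the previous one; any visible
            # change pushes the "page still loading" marker forward.
            diff = np.mean(cv2.absdiff(gray, prev))
            if diff > THRESHOLD:
                last_change = frame_idx
        prev = gray
        frame_idx += 1
    cap.release()
    return last_change / fps  # seconds from the start of the recording

if __name__ == "__main__":
    print("page visually settled at %.1f s" % last_change_time(sys.argv[1]))
```

The nice part is that this says nothing about onload events or any browser-specific notion of “done”: it only looks at what actually appears on screen, which is all the user sees anyway.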
Honestly, right now, I don’t really care which is the fastest browser around. As long as they are in the same ballpark (not orders of magnitude slower), I care more about customization features. If you took a look at my two instances of Firefox (one FF3.1b3, the other FF3.0.8), both are heavily customized with heavy theming (yeah, one of them looks like Chrome) and tonnes of add-ons.
Oh, and being a Mac user, whatever IE does doesn’t directly affect me that much. ;)