Chrome vs IE9 JavaScript

Here are some results of benchmarking on my home computer.  They show that Chrome is still much faster than IE9 on all major JavaScript benchmarks.  The 32-bit version of IE fares a little better, but Win7 64-bit is outselling Win7 32-bit by a 3:1 margin, so the 64-bit results are what many users will experience.

[Benchmark result charts]

My system is an Intel Core 2 Duo E6550 @2.33GHz with 4GB of RAM running Win7 Build 7600.  The version of IE tested was 9.0.8080.16413 64-bit and the version of Chrome was 10.0.648.204.

The Era of Browser Preconnect

I was playing around on WebPageTest today, trying out its new IE9 test feature, and I noticed something new that IE9 does: preconnect.

What is preconnect?  Preconnect is making a connection to a site before you have a request to use that connection for.  The browser may have an inkling that it will need the connection, but if you don't have a request in hand yet, it is a speculative connection, and therefore a preconnect.
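
To make that concrete, here is a minimal sketch (my own illustration, not any browser's actual code) of what a speculative connect looks like at the socket level: resolve the host and open the TCP connection before any request exists, so that a later request can skip the connect round trip.

// Minimal sketch of a speculative "preconnect": open a TCP connection to a
// host before any request exists for it.  POSIX sockets; error handling is
// abbreviated.
#include <netdb.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cstdio>

// Returns a connected socket to host:port, or -1 on failure.  A browser
// would park this socket in its connection pool and hand it to the first
// request that needs this origin, skipping the connect round trip.
int Preconnect(const char* host, const char* port) {
  addrinfo hints = {};
  hints.ai_family = AF_UNSPEC;
  hints.ai_socktype = SOCK_STREAM;

  addrinfo* result = nullptr;
  if (getaddrinfo(host, port, &hints, &result) != 0)
    return -1;

  int fd = -1;
  for (addrinfo* ai = result; ai != nullptr; ai = ai->ai_next) {
    fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
    if (fd < 0) continue;
    if (connect(fd, ai->ai_addr, ai->ai_addrlen) == 0) break;  // Connected.
    close(fd);
    fd = -1;
  }
  freeaddrinfo(result);
  return fd;
}

int main() {
  // Speculatively warm up a connection we *think* the page will need.
  int fd = Preconnect("www.example.com", "80");
  if (fd >= 0) {
    printf("preconnected, fd=%d\n", fd);
    close(fd);
  }
  return 0;
}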

IE9 isn't the first to use preconnect, of course.  Chrome has been doing preconnect since ~Chrome 7, so it is nice to see other browsers validating our work.  But IE9 is the first browser I know of that appears to preconnect right out of the gate, without any data about a site.  Chrome, on the other hand, will only preconnect based on data it has learned by observing network activity through repeat visits to a site.  As such, Chrome usually issues the same number of connects and the same amount of network traffic, just with less delay.

Observations

Here is the trace where I first noticed this behavior on WebPageTest.  Notice that WebPageTest did not record any connect-time delay on the second request to dev.chromium.org.  How can this be?  Because the socket was already connected.

[WebPageTest waterfall: ie.preconnect]

To understand this better, I opened up Wireshark and watched the packets.  The packet trace clearly shows that IE9 simply opens 2 connections, back to back, for every domain the browser connects to.  This isn't a horrible assumption for the browser to make, since many sites will indeed require more than one connection per domain anyway.

Some Wastefulness

But it also wasn't hard to notice cases where it connects wastefully.  On belshe.com, for instance, there is a single link to a YouTube video, requiring only one resource.  IE9 opens two connections to YouTube anyway (WebPageTest doesn't show the unused connection in its waterfall, by the way, but it is in the packet trace!).  One connection loads the image; the other connection is wasted.  YouTube diligently kept that idle connection open for 4 minutes, too!  There are also a couple of 408 error responses from Akamai; it appears that the Akamai server will send a graceful 408 response on an idle, empty connection after some period of time.

But is this a problem?

As long as the level of accidental connects is minimal, probably not.  And much of the time, 2 connections are useful.  It would be great to hear from the IE9 team about their exact algorithm, and to see whether they have data on how many extra connections it really creates.

WebPageTest already offers some clues.  For belshe.com, for example, I can see that IE8 uses 20 connections, while IE9 now uses 23 connections to render the page.  Roughly 15% overhead is probably not the end of the world.

What about SSL?

I love SSL, so of course this got me wondering what IE9 does when preconnecting https sites.  Sure enough, IE9 happily preconnects SSL too.  [Sadly, it even forces the server to do two full SSL handshakes, wastefully generating 2 session IDs.  This is a bit more annoying, because it means the site was just put through double the number of PKI operations.  Fortunately, PKI operations are relatively cheap these days.  I'd complain more, but, tragically, Chrome is not much better yet.  Did I mention that SSL is the unoptimized frontier?]

What Would Brian Boitano Chrome Do?

As I mentioned, Chrome has been doing preconnect for some time.  But Chrome doesn't preconnect right out of the gate.  We were so worried about this over-connecting business that we added gloms of more complicated code (er, highly sophisticated artificial intelligence) before turning it on at all 🙂

Specifically, Chrome learns the network topology as you use it.  It learns that when you connect to www.cnn.com, you need 33 resources from i2.cdn.turner.com, 71 resources from i.cdn.turner.com, 5 resources from s0.2mdn.net, etc.  Over time, if these patterns hold, Chrome will use that data to initiate connections as soon as you start a page load.  Because it is based on data, we hope and expect that it will connect incorrectly much less often.  In fact, it should be making the same number of connections, just a little earlier than it otherwise would.  But all of this is an area under active research and development.  (By the way, if you want to see how Chrome works, check out the ultra-chic-but-uber-geek "about:dns" page in your Chrome browser.)
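
The real heuristics live in Chrome's source and keep evolving, but the shape of the idea is roughly the following (a simplified sketch of my own, not Chrome's actual predictor): remember which hostnames each page needed subresources from on past visits, and when that page is visited again, preconnect to the hosts it has needed consistently.

// Simplified sketch of learned preconnect: track which subresource hosts a
// page has needed on past visits, and preconnect to the reliable ones the
// next time the page is loaded.  Illustrative only.
#include <iostream>
#include <map>
#include <set>
#include <string>

class PreconnectPredictor {
 public:
  // Record that loading |page_host| required a resource from |resource_host|.
  void LearnReferral(const std::string& page_host,
                     const std::string& resource_host) {
    referral_counts_[page_host][resource_host]++;
  }

  // Called as a navigation to |page_host| starts: return the hosts seen often
  // enough to justify a speculative connection.
  std::set<std::string> HostsToPreconnect(const std::string& page_host,
                                          int min_visits = 2) const {
    std::set<std::string> hosts;
    auto it = referral_counts_.find(page_host);
    if (it == referral_counts_.end()) return hosts;  // No history: do nothing.
    for (const auto& entry : it->second) {
      if (entry.second >= min_visits) hosts.insert(entry.first);
    }
    return hosts;
  }

 private:
  // page host -> (subresource host -> number of visits that needed it)
  std::map<std::string, std::map<std::string, int>> referral_counts_;
};

int main() {
  PreconnectPredictor predictor;
  // Two past visits to www.cnn.com that both needed the Turner CDNs.
  for (int visit = 0; visit < 2; ++visit) {
    predictor.LearnReferral("www.cnn.com", "i2.cdn.turner.com");
    predictor.LearnReferral("www.cnn.com", "i.cdn.turner.com");
  }
  // On the next navigation, these hosts would be preconnected immediately.
  for (const auto& host : predictor.HostsToPreconnect("www.cnn.com"))
    std::cout << "preconnect: " << host << "\n";
  return 0;
}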

So does all this fancy stuff make my Internet faster?

Fortunately, we have pretty good evidence that it does.  We’re monitoring this all the time, and I’d say this work is still in its infancy.  But here is some data from Chrome’s testing in 2010.

Our Chrome networking test lab has a fleet of client machines (running Linux), a simulated network using dummynet (see Web-Page-Replay for more information), and some fast, in-memory servers.  We record content from the top-35 websites, and can play it back repeatedly with high fidelity.  Then we can change the network configuration and browser configuration and see how it all works out. 

In my test, I picked 4 different network configurations.  I then varied the RTT on each connection from 0 to 200ms.

Here is a graph of page load time (PLT) improvements in this test.

[Chart: preconnect.improvement]

Overall, we’re pretty happy with this result.  When the RTT is zero, preconnect doesn’t really help, because the connections are basically free (from a latency perspective).  But for connections with RTTs greater than ~50ms, we see a solid 7-9% improvement across the board.  (typical RTTs are 80-120ms)

The Larger Question

While this is great for performance now, I am worried about the wastefulness of HTTP on the Internet.  We used to have only one domain per site; now we have 7.  We used to have only 2 connections per domain; now we have 6.  And on top of that, we're now preconnecting even when we don't need to.

With its proliferation of TCP connections, HTTP has been systematically sidestepping TCP's slow start and congestion controls.  Sooner or later, will the internet break?  Throwing inefficiency at the problem can't go on forever.

So one last note – a blatant plug for SPDY

The downside of preconnect is a big part of why we’re working on SPDY.  HTTP has been nothing short of a heroic protocol and made the Internet as we know it possible.  But as we look to the next generation of rich media sites with low latencies, it is clear that today’s HTTP can’t perform at that level.

SPDY hopes to solve much of HTTP’s connection problems while also providing better performance and better security. 

With that, I guess I need to get back to work…

H.264 – Who Holds the Patents?

H.264 is in the news because Google Chrome won’t support it natively and instead will use WebM, an allegedly open and free video alternative.

Who gets paid when you license H.264?  The licensing is managed by MPEG-LA, which maintains a 70-page list of patents that allegedly contribute to H.264.  If you hear a complaint about Google, Mozilla, and Opera's stance on not supporting H.264, consult this list to see if the complainer has a conflict of interest.

H.264 Patent Holders

Apple Inc.
Cisco Systems Canada IP Holdings Company†
The Trustees of Columbia University in the City of New York
DAEWOO Electronics Corporation
Dolby Laboratories Licensing Corporation
Electronics and Telecommunications Research Institute
France Télécom, société anonyme
Fraunhofer‐Gesellschaft zur Foerderung der angewandten Forschung e.V.
Fujitsu Limited
Hewlett‐Packard Company
Hitachi, Ltd.
Koninklijke Philips Electronics N.V.
LG Electronics Inc.
Microsoft Corporation
Mitsubishi Electric Corporation
Nippon Telegraph and Telephone Corporation
NTT DOCOMO, INC.
Panasonic Corporation†
Polycom, Inc.
Robert Bosch GmbH
Samsung Electronics Co., Ltd.
Sedna Patent Services, LLC
Sharp Corporation
Siemens AG
Sony Corporation
Tandberg Telecom AS
Telefonaktiebolaget LM Ericsson
Toshiba Corporation
Victor Company of Japan, Limited

Google Will Rue The Day It Invited the Gov’t to Net Neutrality

A few years ago, Google started poking the government to act on Net Neutrality.  Google's motives are well intentioned, but practically, the effort is foolish.  We're inviting a beast into our industry that is more devastating than any beast we've yet imagined.  Eventually, Google will come to oppose the very legislation that it helped create.

The main problem with Net Neutrality is that we don’t need it.  The market works.  There are many choices for network access today, and you can access your favorite sites from Amazon to Wikileaks from almost anywhere in America.  We have access to the internet at home, in libraries, in schools and at work.  Who is not getting access again?

For individuals, the real debate is that some people want more bandwidth to more remote areas, and they want someone else to pay for it.  Steve Wozniak, the eccentric co-founder of Apple, was very clear about this.  He wants to live on a remote hill, pay $29/mo, and have everyone else be required to pay to run the cables to his secluded hideaway for fast internet access.  Steve's argument is not new.  Many people have made the same argument far more elegantly.  They claim it "costs too much" for the high-speed links, that dialup is unreasonably slow, or that "there is only one provider in my area", etc.  None of those arguments hold.  These very same people still have access through wireless, through dialup, at work, at school, at the library, and at about a half million Starbucks or McDonalds across the planet.  And their access grows every single day!  They just want it cheaper.

Finally, the most important part of net neutrality is ensuring that content is available to everyone.  (No, this doesn’t mean you should get to watch your “Family Guy” or your favorite TV show for free)  Most of us hold at least some fear that eventually a big company (Comcast, AT&T, or Verizon, or Google) will screw the little guy by using their monopoly to restrict content and maximize profits.  This fear is reasonable,  because censorship on a grand scale would be a horrible thing for all of us.  But it’s not happening, and there is no evidence of it happening any time soon.  Further, if it ever did happen, customers can and would revolt.  Competition provides us everything we need.

But our fears of corporations are grossly misplaced.  There is someone far more scary, with vastly greater power, that we should fear: the US government.  There is simply no company that can wreak devastation at the scale of the US government.  Whose rules are more scary: Comcast's rules (sorry to pick on you, Comcast!), which would only apply to those who pay Comcast money, or Uncle Sam's rules?  And every 4 years we elect a new set of politicians.  Even if we trust the politicians today, what happens when we get into a war, or have a 9/11-type event, and suddenly a "temporary" cessation of terrorist communications is required?  (Did we forget about the TSA already?)  Who's the terrorist?  Is Wikileaks a terrorist?  Is Wikipedia?  What if you have a science blog about particle physics?  Can you be shut down too?  The government is what you should fear.  Not a piddly little Microsoft, Google, or Comcast.

Ok, but why will Google rue this?

With continued prodding from Google and others, legislation will be passed, and today was a starting point.  Whatever they pass will be costly to companies, and that cost burden will be passed on to customers like you and me.  Further, it will put America at a disadvantage in the global marketplace.  All to solve a problem that doesn't exist.

The first problem they’ll create is that of cost.  Every law has enforcement, and that means we pay people to define the specific rules, monitor compliance with those rules, and enforce punishments for those that do not obey.  This will cost billions of dollars and be spread into the margins of every content provider and ISP in America.  Of course, those companies will pass the cost onto their customers.  This means the price of DSL, Cable, AOL, and Netflix will all rise.  (I still think costs are heading down overall, but they could decrease faster without net-neutrality)

Second, it will snowball from "fair access" into content filters (aka censorship).  Initially, it might include banning certain forms of pornography.  It might even seem like something you agree with.  But with each new regulation, our freedoms are diminished.  Then, we might enter into a particular international conflict "requiring" a ban on certain types of communications to keep us safe.  With the content filters in place, the variety and types of information we can publish and read diminish, and it is all out of our control.  You can't switch providers to escape the unfairness of it all.

Finally, remember that America is in a global marketplace.  If our legislators legislate too much, Internet companies will simply move out of the country, taking the jobs, the profits, and the tax revenues with them.  This has already happened: gambling is alive and well on the internet; it just runs out of Costa Rica, Antigua, and other disreputable places, leaving consumers at risk while simultaneously sticking America with a bill to ensure that gambling doesn't happen here.  How silly!  Now the government will need to block outside access, or credit card payments to certain areas, in order to keep Americans safe from information.

Google’s mission is “to organize the world’s information and make it universally accessible and useful.”  But with our own Government censors and the massive costs created to enforce “net neutrality”, Google will find this mission impossible to accomplish.  And that is when Google will rue the day…

 

Note: This article solely represents the views of a far-too-opinionated software engineer, and does not represent the views of his employer in any way.

Performance and the TLS Record Size

Today I ran into a problem with TLS (SSL) record sizes making the performance of my site sluggish.  The server was doing a good job of sending large messages down to the client, and I am using a recent version of the OpenSSL library, so why would this happen?

HTTP and TLS both seem like streaming protocols.  But with HTTP, the smallest "chunk" you can send is a single byte.  With TLS, the smallest chunk you can send is a TLS record.  As a TLS record arrives on the client, it cannot be passed to the application layer until the full record has been received and its checksum verified.  So, if you send large SSL records, all packets that make up that record must be received before any of the data can be used by the browser.

In my case, the HTTP-to-SPDY proxy in front of my webserver was reading chunks of 40KB from the HTTP server and then calling SSL_write() with the entire chunk over SPDY (which runs over SSL for now).  This meant that the client couldn't use any of the 40KB until all 40KB had been received.  And since 40KB of data will often incur extra round trips, this is a very bad thing.

It turns out this problem surfaces more with time-to-first-paint than with overall page-load-time (PLT), because it has to do with the browser seeing data incrementally rather than in a big batch.  But it still can impact PLT because it can cause multi-hundred-millisecond delays before discovering sub-resources.

The solution is easy: on your server, don't call SSL_write() in big chunks.  Chop the writes down to something smallish, say 1500-3000 bytes.  Below is a graph comparing the time-to-first-paint for my site with just this change; it shaved over 100ms off the time-to-first-paint.

[Chart: smallbuf.ttfp]
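
For concreteness, here is roughly what the chunked write path looks like (a sketch assuming an established, blocking OpenSSL SSL* handle; the proxy's real code differs in its details):

// Sketch: write application data in small slices so that each TLS record is
// small enough for the client to verify and consume without waiting on
// later packets.
#include <openssl/ssl.h>
#include <algorithm>
#include <cstddef>

// Roughly 2KB per SSL_write() keeps each TLS record within a packet or two.
static const size_t kMaxTlsRecordPayload = 2000;

// Returns true if all |len| bytes were written.
bool WriteInSmallRecords(SSL* ssl, const char* data, size_t len) {
  size_t offset = 0;
  while (offset < len) {
    size_t chunk = std::min(kMaxTlsRecordPayload, len - offset);
    int rv = SSL_write(ssl, data + offset, static_cast<int>(chunk));
    if (rv <= 0)
      return false;  // Real code would consult SSL_get_error() here.
    offset += static_cast<size_t>(rv);
  }
  return true;
}

In the proxy, a loop along these lines replaces the single 40KB SSL_write() call.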

Gettys on Bufferbloat

Jim Gettys has a nice tale of what he calls 'bufferbloat'.  Instinctively, it seems like bigger buffers should result in less packet loss.  As long as you can buffer it, the other guy doesn't have to retransmit, right?  But that is not the way TCP works.  It's going to retransmit if you don't reply fast enough.  And if you clog the buffers, it's going to take a long time before the endpoint can acknowledge the data.

One interesting anecdote to me (and it isn’t really a conclusion) is that the world’s love affair with Windows XP (which has an ancient TCP stack) may actually be helping the internet at large, even though the Vista TCP stack is measurably a better stack:

The most commonly used system on the Internet today remains Windows XP, which does not implement window scaling and will never have more than 64KB in flight at once. But the bufferbloat will become much more obvious and common as more users switch to other operating systems and/or later versions of Windows, any of which can saturate a broadband link with but a single TCP connection.

Gettys did conclude that this was a problem for video downloads, which is something everyone is doing these days.  He's not wrong, but real video services may not be as subject to this as it seems.  Video services live and die by bandwidth costs, so to keep those costs down, they avoid simply blasting out the whole video; instead they dribble it out manually, at the application layer.  If they depended on TCP for throttling, he'd be right, but I don't think many large-scale video services work this way.  Need more data! 🙂
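
To illustrate what "dribbling it out at the application layer" means, here is a simplified sketch (mine, not any particular video service's code): the sender paces its writes to roughly the playback bitrate instead of letting TCP blast the whole file as fast as the link allows.

// Sketch of application-layer pacing: send a video at roughly its playback
// bitrate rather than as fast as TCP will allow.  Illustrative only.
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <thread>
#include <vector>

// |send_fn| stands in for whatever actually writes bytes to the socket.
template <typename SendFn>
void PacedSend(const std::vector<char>& video, size_t bytes_per_second,
               SendFn send_fn) {
  const size_t kChunk = 64 * 1024;  // Send in 64KB slices.
  for (size_t offset = 0; offset < video.size(); offset += kChunk) {
    size_t len = std::min(kChunk, video.size() - offset);
    send_fn(video.data() + offset, len);
    // Sleep long enough that the average rate stays near bytes_per_second.
    std::this_thread::sleep_for(
        std::chrono::milliseconds(1000 * len / bytes_per_second));
  }
}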

Anyway, a great read.

Free SSL Certificates

Adam Langley slammed me today for using a self-signed cert on this site (https://www.belshe.com/), pointing out that there is no reason not to have a real certificate, especially when you can get them for free.

As usual, he is right, of course.  So I got myself a signed certificate from StartSSL.

Here are the step-by-step instructions.  You can do it too:

https://github.com/ioerror/duraconf/blob/master/startssl/README.markdown

Chrome Speeding up SSL with SSL FalseStart

The latest releases of Chrome now enable a feature called SSL False Start.  False Start is a client-side change which makes your SSL connections faster.  As of this writing, Chrome is the only browser implementing it.  Here is what it does.

In order to establish a secure connection, SSL uses a handshake in which the client and server exchange the basic information needed to set up the encrypted session.  Traditionally, the handshake has been implemented such that the client says "done", waits for the server, and then the server says "done".  However, this waiting-for-done is unnecessary, and SSL researchers have shown that we can remove one round trip from the process and allow the client to start sending data immediately after its "done".

To visualize this, let's look at some packet traces during the handshake sequence, comparing two browsers:

Chrome

  0ms SEND TCP SYN
83ms RECV TCP SYN ACK
83ms SEND TCP ACK
83ms SEND Client Hello
175ms RECV Server Hello
           Certificate
           Server Hello Done
176ms SEND Client Key Exchange
           Change Cipher Spec
           Enc Handshake Msg
           HTTP Request
274ms RECV Enc Handshake Msg
           Change Cipher Spec
           Enc Handshake Msg
275ms RECV HTTP Response

Browser w/o FalseStart

  0ms SEND TCP SYN
84ms RECV TCP SYN ACK
84ms SEND TCP ACK
84ms SEND Client Hello
173ms RECV Server Hello
           Certificate
           Server Hello Done
176ms SEND Client Key Exchange
           Change Cipher Spec
           Enc Handshake Msg
269ms RECV Enc Handshake Msg
           Change Cipher Spec
           Enc Handshake Msg
269ms SEND HTTP Request
524ms RECV HTTP Response

These two traces are almost identical; the difference is subtle.  Notice that Chrome sent the HTTP Request at time 176ms, bundled with its final handshake messages, which is a little more than one round-trip-time earlier than the other browser could send it (at 269ms).

(Note: it is unclear why the HTTP response for the non-FalseStart browser was ~250ms late; the savings here is, in theory, just 1 round trip, or 83ms.  There is always variance on the net, and I'll attribute this to bad luck.)

Multiplicative Effect on Web Pages

Today, almost all web pages combine data from multiple sites.  For SSL sites, this means that the handshake must be repeated with each server that is referenced by the page.  In our tests, we see that there are often 2-3 "critical path" connections while loading a web page.  If your round-trip-time is 83ms, as in this example, that's 249ms of savings, just for getting started with your page.  I hope to do a more thorough report on the effect of FalseStart on overall PLT in the future.

For more information on the topic, check out Adam Langley’s post on how Chrome deals with the very few sites that can’t handle FalseStart.

Linux Client TCP Stack Slower Than Windows

Conventional wisdom says that Linux has a better TCP stack than Windows.  But comparing the latest Linux with the latest Windows (or even Vista), there is at least one aspect where this is not true.  (My definition of better is simple: which one is faster.)

Over the past year or so, researchers have proposed raising TCP's initial congestion window (initcwnd) from its current value (2 packets, or ~4KB) up to about 10 packets.  These changes are still being debated, but it looks likely that a change will be ratified.  Even without official ratification, many commercial sites and commercially available load balancers have already increased initcwnd on their systems in order to reduce latency.

Back to the matter at hand: when a client makes a connection to a server, there are two variables which dictate how quickly the server can send data to the client.  The first variable is the client's "receive window".  The client tells the server, "please don't exceed X bytes without my acknowledgement", and this is a fundamental part of how TCP controls the flow of information.  The second variable is the server's cwnd, which, as stated previously, is generally the bottleneck and is usually initialized to 2 packets.

In the long-ago past,  TCP clients (like web browsers) would specify receive-window buffer sizes manually.  But these days, all modern TCP stacks use dynamic window size adjustments based on measurements from the network, and applications are recommended to leave it alone, since the computer can do it better.  Unfortunately, the defaults on Linux are too low. 
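
For reference, the manual sizing mentioned above is done with SO_RCVBUF before connect(); a sketch is below.  Note that this tunes the receive buffer and window scaling, but, as noted at the end of this post, it does not appear to lift Linux's small initial advertised window; that requires a kernel change.

// Sketch: manually requesting a larger receive buffer before connect().
// Modern stacks autotune this, so treat it as an illustration of the old
// practice rather than a recommendation.
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cstdio>

int main() {
  int fd = socket(AF_INET, SOCK_STREAM, 0);
  if (fd < 0) return 1;

  // Ask for a 64KB receive buffer, similar to the Vista/Mac numbers below.
  // This must be done before connect() to affect the window scaling offered
  // in the SYN.
  int rcvbuf = 64 * 1024;
  if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &rcvbuf, sizeof(rcvbuf)) != 0)
    perror("setsockopt(SO_RCVBUF)");

  int actual = 0;
  socklen_t len = sizeof(actual);
  if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &actual, &len) == 0)
    printf("kernel granted a receive buffer of %d bytes\n", actual);

  // ... connect() and use the socket as usual ...
  close(fd);
  return 0;
}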

On my systems, with a 1Gbps network, here are the initial window sizes.  Keep in mind your system may vary as each of the TCP stacks does dynamically change the window size based on many factors.

Vista:  64KB
Mac:    64KB
Linux:    6KB

6KB!  Yikes!  Well, the argument can be made that there is no need for the Linux client to use a larger initial receive window, since servers are supposed to abide by RFC 2581.  But there really isn't much downside to using a larger initial receive window, and we already know that many sites benefit from a larger cwnd.  Consider the arithmetic: if the server wants to send a 10-packet (~14KB) initial burst, a 6KB receive window forces it to stop after about 4 packets and wait a full round trip for acknowledgements.  The net result is that when the server is legitimately trying to use a larger cwnd, web browsing on Linux will be slower than web browsing on Mac or Windows, which don't artificially constrain the initial receive window.

Some good news: a patch is in the works to allow users to change the default, but you'll need to be a TCP whiz and install a kernel change to use it.  I don't know of any plans to change the default value on Linux yet.  Certainly if the cwnd changes are approved, the default initial receive window must also be changed.  I have yet to find any way to make Linux use a larger initial receive window without a kernel change.

Two last notes: 

1) This isn't theoretical.  It's very visible in network traces to existing servers on the web that use larger-than-2 cwnd values.  And you don't hit the stall just once; you hit it on every connection that tries to send more than 6KB of data in its initial burst.

2) As we look to make HTTP more efficient by using fewer connections (SPDY), this limit becomes yet another factor that favors protocols using many connections instead of just one.  TCP implementers lament that browsers routinely open 20-40 concurrent connections as part of making sites load quickly.  But if a connection has an initial window of only 6KB, the use of many connections is the only way to work around the artificially low throttle.

There is always one more configuration setting to tweak.