Archive for the ‘Chrome’ Category

Chrome 16 – World’s Most Popular Browser

Thursday, February 2nd, 2012

I haven’t read about this anywhere, but in Jan 2012, Chrome 16 was the World’s Most Popular Browser. A minor stat, but a testament to modern update policies. You won’t read about this in any press release, but this is the #1 reason why Chrome’s security is better than that of any other browser.

Firefox Idle Connection Reuse

Saturday, June 11th, 2011

HttpWatch did some anecdotal testing and concluded that Firefox’s new algorithm for selecting which idle connection to reuse has some strong benefits.

This is great stuff, and in general it should definitely help. This is part of why getting to one connection per domain is an important goal. HTTP’s use of 6 or more connections per domain means that each connection must “warm up” independently. A similar algorithm should land in Chrome soon too.

Fortunately, there is a protocol for this stuff too :-) Hopefully Firefox will pick that up soon as well.

How to Get a Small Cert Chain

Saturday, April 23rd, 2011

After my last article illustrated the length of our certificate chains, many people asked me, “OK, well, how do I get a small one?”

The obvious answer is to get your certificate signed as close to the root of a well-rooted Certificate Authority (CA) as possible. But that isn’t very helpful. To answer the question, let’s look at a few of the problems and tradeoffs.

Problem #1: Most CAs Won’t Sign At The Root

Most CAs won’t sign from the root. Root CAs are key to our overall trust on the web, so simply having them online is a security risk. If the roots are hacked, it can send a shockwave through our circle of trust. As such, most CAs keep their root servers offline most of the time, and only bring them online occasionally (every few months) to sign for a subordinate CA in the chain. The real signing is most often done from the subordinate.

While this is already considered a ‘best practice’ for CAs, Microsoft’s Windows Root CA Program Requirements were just updated last month to require that leaf certificates are not signed directly at the root.  From section F-2:

All root certificates distributed by the Program must be maintained in an offline state – that is, root certificates may not issue end-entity certificates of any kind, except as explicitly approved from Microsoft.

Unfortunately for latency, this is probably the right thing to do.  So expecting a leaf certificate directly from the root is unreasonable.  The best we can hope for is one level down.

Problem #2: “Works” is more important than “Fast”

Having your site be accessible to all of your customers is usually more important than being optimally fast.  If you use a CA not trusted by 1% of your customers, are you willing to lose those customers because they can’t reach your site?  Probably not.

To solve this, we wish that we could serve multiple certificates, and always present each client with a certificate we know that specific client will trust. (E.g. if an old Motorola phone from 2005 needs a different CA, we could use a different certificate just for that client.) But alas, SSL does not expose a user-agent as part of the handshake, so the server can’t do this. Then again, hiding the user agent is important from a privacy and security point of view.

Because we want to reach all of our clients, and because we don’t know which client is connecting to us, we simply have to use a certificate chain which we know all clients will trust.  And that leads us to either presenting a very long certificate chain, or only purchasing certificates from the oldest CAs.

I am sad that our SSL protocol gives the incumbent CAs an advantage over new ones.  It is hard enough for a CA to get accepted by all the modern browsers.  But how can a CA be taken seriously if it isn’t supported by 5-10% of the clients out there?  Or if users are left with a higher-latency SSL handshake?

Problem #3: Multi-Rooting of CAs

We like to think of the CA trust list as a well-formed tree where the roots are roots, and the non-roots are not roots. But, because the clients change their trust points over time, this is not the case. What is a root to one browser is not a root to another.

As an example, we can look at the certificate chain presented by www.skis.com. Poor skis.com has a certificate chain of 5733 bytes (4 packets, 2 round trips), with the following certificates:

  1. skis.com: 2445 bytes
  2. Go Daddy Secure Certification Authority: 1250 bytes
  3. Go Daddy Class 2 Certification Authority: 1279 bytes
  4. ValiCert Class 2 Policy Validation Authority: 747 bytes

In Firefox, Chrome and IE (see note below), the 3rd certificate in that chain (Go Daddy Class 2 Certification Authority) is already considered a trusted root.  The server sent certificates 3 and 4, and the client didn’t even need them.  Why?  This is likely due to Problem #2 above.  Some older clients may not consider Go Daddy a trusted root yet, and therefore, for compatibility, it is better to send all 4 certificates.
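As a sanity check on the skis.com numbers: the four certificate sizes add up to 5721 bytes, and TLS 1.0 through 1.2 prefixes each certificate in the Certificate message with a 3-byte length, which appears to account for the remaining 12 bytes. The packet and round-trip figures can be reproduced under assumptions that are mine, not the article’s: a 1460-byte MSS, an initial congestion window of 2 segments, no loss, and ignoring the Server Hello bytes that share the flight. The helper names are hypothetical.

```python
import math

# Sizes of the four certificates in the skis.com chain, from the list above.
SKIS_CHAIN = [2445, 1250, 1279, 747]


def certificate_list_length(cert_sizes):
    """Length of the certificate_list in a TLS 1.0-1.2 Certificate message:
    every DER-encoded certificate is preceded by a 3-byte length field."""
    return sum(3 + size for size in cert_sizes)


def packets_and_round_trips(total_bytes, mss=1460, init_cwnd=2):
    """Rough count of full-size TCP segments, and of slow-start round trips
    needed to deliver them (assumes no loss and cwnd doubling every round trip)."""
    packets = math.ceil(total_bytes / mss)
    remaining, cwnd, round_trips = packets, init_cwnd, 0
    while remaining > 0:
        remaining -= cwnd
        cwnd *= 2
        round_trips += 1
    return packets, round_trips


chain_bytes = certificate_list_length(SKIS_CHAIN)
print(chain_bytes)                           # 5733
print(packets_and_round_trips(chain_bytes))  # (4, 2)
```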

What Should Facebook Do?

Obviously I don’t know exactly what Facebook should do.  They’re smart and they’ll figure it out.  But FB’s large certificate chain suffers the same problem as the Skis.com site:  they include a cert they usually don’t need in order to ensure that all users can access Facebook.

Recall that FB sends 3 certificates.  The 3rd is already a trusted root in the popular browsers (DigiCert), so sending it is superfluous for most users.  The DigiCert cert is signed by Entrust.  I presume they send the DigiCert certificate (1094 bytes) because some older clients don’t have DigiCert as a trusted root, but they do have Entrust as a trusted root.  I can only speculate.

Facebook might be better served to move to a more well-rooted vendor.  This may not be cheap for them.

Aside: Potential SSL Protocol Improvements

If you’re interested in protocol changes, this investigation has already uncovered some potential improvements for SSL:

  • Exposing some sort of minimal user-agent would help servers ensure that they can select an optimal certificate chain for each customer. Or, exposing some sort of optional “I trust CA root list #1234”, would allow the server to select a good certificate chain without knowing anything about the browser, other than its root list. Of course, even this small amount of information does sacrifice some amount of privacy.
  • The certificate chain is not compressed.  It could be, and some of these certificates compress by 30-40%.
  • If SNI were required (sadly still not supported on Windows XP), sites could avoid lengthy lists of subject names in their certificates. Since many sites separate their desktop and mobile web apps (e.g. www.google.com vs m.google.com), this may be a way to serve better certificates to mobile vs. desktop clients.
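The first bullet could look something like this on the server side. This is a sketch of the idea only, not an existing SSL feature: `ServedChain`, `choose_chain`, and the client-advertised root set are all hypothetical. The sizes are the skis.com certificate sizes from above; the short chain stops below Go Daddy Class 2, while the long one carries all four certificates for older clients.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServedChain:
    label: str
    size_bytes: int
    anchor: str  # the root CA this chain is built to terminate at


def choose_chain(chains, client_roots, fallback):
    """Hypothetical server logic: send the smallest chain whose anchor the
    client says it trusts; with no hint (or no match), send the most
    compatible chain."""
    usable = [c for c in chains if c.anchor in client_roots]
    if not usable:
        return fallback
    return min(usable, key=lambda c: c.size_bytes)


GO_DADDY_CLASS_2 = "Go Daddy Class 2 Certification Authority"
VALICERT_CLASS_2 = "ValiCert Class 2 Policy Validation Authority"

short_chain = ServedChain("leaf + Go Daddy Secure", 2445 + 1250, GO_DADDY_CLASS_2)
long_chain = ServedChain("all four certificates", 2445 + 1250 + 1279 + 747, VALICERT_CLASS_2)
chains = [short_chain, long_chain]

print(choose_chain(chains, {GO_DADDY_CLASS_2, VALICERT_CLASS_2}, long_chain).label)
print(choose_chain(chains, {VALICERT_CLASS_2}, long_chain).label)
print(choose_chain(chains, set(), long_chain).label)
```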

Who Does My Browser Trust, Anyway?

All browsers use a “certificate store” which contains the list of trusted root CAs.

The certificate store can either be provided by the OS, or by the browser.

On Windows, Chrome and IE use the operating-system provided certificate store.  So they have the same points of trust.  However, this means that the trust list is governed by the OS vendor, not the browser.  I’m not sure how often this list is updated for Windows XP, which is still used by 50% of the world’s internet users.

On Mac, Chrome and Safari use the operating system provided store.

On Linux, there is no operating system provided certificate store, so each browser maintains its own certificate store, with its own set of roots.

Firefox, on all platforms (I believe; I might be wrong on this), uses its own certificate store, independent of the operating system store.

Finally, on mobile devices, everyone has their own certificate store.  I’d hate to guess at how many there are or how often they are updated.

Complicated, isn’t it?

Yeah Yeah, but Where Do I Get The Best Certificate?

If you read this far, you probably realize I can’t really tell you.  It depends on who your target customers are, and how many obscure, older devices you need to support.

From talking to others who are far more knowledgeable on this topic than I, it seems like you might have the best luck with either Equifax or Verisign.  Using the most common CAs will have the side benefit that the browser may have cached the OCSP responses for any intermediate CAs in the chain already.  This is probably a small point, though.

Some of the readers of this thread pointed me at what appears to be the smallest, well-rooted certificate chain I’ve seen.  https://api-secure.recaptcha.net has a certificate signed directly at the root by Equifax.  The total size is 871 bytes.  I don’t know how or if you can get this yourself.  You probably can’t.

Finally, Does This Really Matter?

SSL has two forms of handshakes:

  • Full Handshake
  • Session Resumption Handshake

All of this certificate transfer, OCSP and CRL verification only applies to the Full Handshake.  Further, OCSP and CRL responses are cacheable, and are persisted to disk (at least with the Windows Certificate Store they are). 

So, how often do clients do a full handshake, receiving the entire certificate chain from the server?  I don’t have perfect numbers to cite here, and it will vary depending on how frequently your customers return to your site.  But there is evidence that this is as high as 40-50% of the time.  Of course, the browser bug mentioned in the prior article affects these statistics (6 concurrent connections, each doing full handshakes).

And how often do clients need to verify the full certificate chain? This appears to happen substantially less often, thanks to the disk caching. Our current estimate is that fewer than 5% of SSL handshakes do OCSP checks, but we’re working to gather more precise measurements.

In all honesty, there are probably more important things for your site to optimize.  This is a lot of protocol gobbledygook.

Thank you to agl, wtc, jar, and others who provided great insights into this topic.

Certificate Validation Example: Facebook

Wednesday, April 20th, 2011

Most people know the concepts of SSL, but not the gory details.  By using Facebook as a walkthrough example, I’m going to discuss how it works from the browser’s viewpoint, and how it impacts latency to your site.  BTW, this is not intended as a criticism of Facebook – they’re doing all the right things to make sure your data is encrypted and authenticated and fast.  The failures highlighted here are failures of a system that wasn’t designed for speed.

Fetching the Certificate
When you first connect to an SSL site, the client and server use the server’s public key to exchange a secret which will be used to encrypt the session. So the first thing the client needs to do is get the server’s public key. The public key arrives in the server’s Certificate message, which the server sends in the same flight as its Server Hello. When we look at that flight from Facebook, we see that it sent us a Certificate which was 4325 bytes in size. This means that before your HTTP request even gets off your computer, the server has to send over 4KB of data to the client. That’s a pretty big bundle, considering that the entire Facebook login page is only 8.8KB. Now, if a public key is generally only 1024 or 2048 bits (at most 256 bytes), with elliptic curve keys being much smaller than that, how did Facebook’s certificate mushroom from 256 bytes to 4325 bytes? Clearly there is a lot of overhead. More on this later.

Trusting the Certificate
Once the browser has the server’s certificate, it needs to validate that the certificate is authentic.  After all, did we really get Facebook’s key? Maybe someone is trying to trick us.  To deal with this, public keys are always transferred as part of a certificate, and the certificate is signed by a source, which needs to be trusted.  Your operating system shipped with a list of known and trusted signers (certificate authority roots).  The browser will verify that the Facebook certificate was signed by one of these known, trusted signers.  There are dozens of trusted parties already known to your browser.  Do you trust them all? Well, you don’t really get a choice.  More on this later.

But very few, if any, certificates are actually signed directly by these CAs. Because the root CAs are so important to the overall system, they’re usually kept offline to minimize chances of hackery. Instead, these CAs periodically delegate authority to intermediate CAs, which then sign Facebook’s certificate. The browser doesn’t care who signs the certificate, as long as the chain of certificates ultimately leads to a trusted root CA.

And now we can see why Facebook’s certificate is so large. It’s actually not just one certificate; it is 3 certificates rolled into one bundle.

The browser must verify each link of the chain in order to authenticate that this is really Facebook.com.
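The chain walk can be sketched as a loop from the leaf toward a root the client already trusts. This models only the path-building step: signature, validity-period, and name checks are deliberately omitted, and the CA names are hypothetical labels, since the text only names DigiCert and Entrust. Following the description of Facebook’s chain elsewhere on this page, the third certificate is DigiCert’s, signed by Entrust; `certs_needed` is a made-up helper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cert:
    subject: str
    issuer: str


def certs_needed(chain, trusted_roots):
    """Walk the chain the server sent (leaf first) toward a trusted root.

    Returns the certificates the client actually had to use, or None if no
    trusted root is reached. Signature, expiry, and name checks are left out;
    a real verifier performs all of them at every link.
    """
    by_subject = {cert.subject: cert for cert in chain}
    path, seen = [], set()
    current = chain[0]
    while current is not None and current.subject not in seen:
        seen.add(current.subject)
        path.append(current)
        if current.issuer in trusted_roots:
            return path  # the client already holds the issuer as a root
        current = by_subject.get(current.issuer)
    return None


LEAF = "facebook.com"
INTERMEDIATE = "DigiCert intermediate CA"  # hypothetical label
DIGICERT_ROOT = "DigiCert root CA"         # hypothetical label
ENTRUST_ROOT = "Entrust root CA"           # hypothetical label

facebook_chain = [
    Cert(LEAF, INTERMEDIATE),
    Cert(INTERMEDIATE, DIGICERT_ROOT),
    Cert(DIGICERT_ROOT, ENTRUST_ROOT),  # the 3rd cert: DigiCert, signed by Entrust
]

print(len(certs_needed(facebook_chain, {DIGICERT_ROOT})))  # 2: the 3rd cert goes unused
print(len(certs_needed(facebook_chain, {ENTRUST_ROOT})))   # 3: an older store needs it
```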

Facebook, being as large as they are, would be well served by finding a way to reduce the size of this certificate, and by removing one level from their chain. They should talk to DigiCert about this immediately.

Verifying The Certificate
With the Facebook certificate in hand, the browser can almost verify that the site is really Facebook. There is one catch: the designers of certificates put in an emergency safety valve. What happens if someone obtains a fraudulent certificate (as happened last month with Comodo) or steals your private key? There are two mechanisms built into the browser to deal with this.

Most people are familiar with the concept of the “Certificate Revocation List” (CRL).  Inside the certificate, the signer puts a link to where the CRL for this certificate would be found.  If this certificate were ever compromised, the signer could add the serial number for this certificate to the list, and then the browser would refuse to accept the certificate. CRLs can be cached by the operating system, for a duration specified by the CA.

The second type of check is to use the Online Certificate Status Protocol (OCSP).  With OCSP, instead of the browser having to download a potentially very large list (CRL), the browser simply checks this one certificate to see if it has been revoked.  Of course it must do this for each certificate in the chain.  Like with CRLs, these are cacheable, for durations specified in the OCSP response.
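Both mechanisms boil down to the same client-side pattern: look up the certificate’s serial number in revocation data that the CA says is good until some date. A minimal sketch, assuming a cache keyed by where the data came from; `RevocationCache`, the URL, and the serial numbers are all hypothetical, and the July 12, 2011 date is the CRL expiration from the CDN example on this page.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class RevocationSnapshot:
    revoked_serials: frozenset  # serial numbers listed as revoked
    next_update: datetime       # when the CA says this data goes stale


class RevocationCache:
    """Hypothetical client-side cache, keyed by CRL URL or OCSP responder."""

    def __init__(self):
        self._by_source = {}

    def store(self, source, snapshot):
        self._by_source[source] = snapshot

    def status(self, source, serial, now):
        """'revoked' or 'good' from fresh cached data; otherwise 'fetch'."""
        snap = self._by_source.get(source)
        if snap is None or now >= snap.next_update:
            return "fetch"
        return "revoked" if serial in snap.revoked_serials else "good"


cache = RevocationCache()
crl_url = "http://crl.example.com/ca.crl"  # placeholder, not a real CA endpoint
cache.store(crl_url, RevocationSnapshot(frozenset({0x1234}), datetime(2011, 7, 12)))
print(cache.status(crl_url, 0xBEEF, datetime(2011, 4, 20)))  # good (served from cache)
print(cache.status(crl_url, 0x1234, datetime(2011, 4, 20)))  # revoked
print(cache.status(crl_url, 0xBEEF, datetime(2011, 7, 13)))  # fetch (stale)
```

The weakness is visible in the code: a certificate revoked after the snapshot was taken still reports “good” until `next_update` passes.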

In the Facebook.com example, the DigiCert certificates specify an OCSP server. So as soon as the browser received the Server Hello message, it paused the handshake with Facebook and instead issued a series of OCSP requests to verify that the certificates haven’t been revoked.

In my trace, this process was quick: with a 17ms RTT, and spanning 4 round trips (DNS, TCP, OCSP Request 1, OCSP Request 2), it took 116ms. That’s a pretty fast case. Most users have 100+ms RTTs and would have experienced approximately a ½ second delay. And again, this all happens before we’ve transmitted a single byte of actual Facebook content. And by the way, the two OCSP responses were 417 bytes and 1100 bytes, respectively.
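The jump from 116ms to roughly half a second is mostly round-trip arithmetic. A sketch, assuming (my assumption, not something the trace shows) that the 48ms of the measured 116ms not explained by four 17ms round trips, such as responder processing and local work, stays fixed as the RTT grows; `estimate_ocsp_delay_ms` is a hypothetical helper.

```python
def estimate_ocsp_delay_ms(rtt_ms, round_trips=4, measured_ms=116, measured_rtt_ms=17):
    """Scale the measured OCSP delay (DNS, TCP, two OCSP requests) to another RTT.

    Assumption: the part of the measured time not explained by round trips
    (116 - 4 * 17 = 48ms) stays fixed.
    """
    fixed_overhead_ms = measured_ms - round_trips * measured_rtt_ms
    return round_trips * rtt_ms + fixed_overhead_ms


for rtt in (17, 100, 120):
    print(f"RTT {rtt:3}ms -> ~{estimate_ocsp_delay_ms(rtt)}ms before any Facebook content")
```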

Oh but the CDN!
All major sites today employ Content Delivery Networks to speed the site, and Facebook is no exception.  For Facebook, the CDN site is “static.ak.facebook.com”, and it is hosted through Akamai. Unfortunately, the browser has no way of knowing that static.ak.facebook.com is related to facebook.com, and so it must repeat the exact same certificate verification process that we walked through before.

For Facebook’s CDN, the certificate bundle is 1717 bytes, consisting of 2 certificates.

Unlike the certificate for facebook.com, these certificates specify a CRL instead of an OCSP server.  By manually fetching the CRL from the Facebook certificate, I can see that the CRL is small – only 886 bytes. But I didn’t see the browser fetch it in my trace.  Why not?  Because the CRL in this case specifies an expiration date of July 12, 2011, so my browser already had it cached.  Further, my browser won’t re-check this CRL until July, 4 months from now.  This is interesting, for reasons I’ll discuss later.

Oh but the Browser Bug!
But for poor Facebook, there is a browser bug (present in all major browsers, including IE, FF, and Chrome) which is horribly sad. The main content from Facebook comes from www.facebook.com, but as soon as that page is fetched, it references 6 items from static.ak.facebook.com. The browser, being so smart, will open 6 parallel SSL connections to the static.ak.facebook.com domain. Unfortunately, each connection will resend the same SSL certificate (1717 bytes). That means the server sends over 10KB of certificate data to the browser (6 × 1717 = 10302 bytes), most of it redundant.

The reason this is a bug is that, when the browser doesn’t have certificate information cached for static.ak.facebook.com, it should complete one full handshake first (downloading the certificate information once), and then use the faster SSL session resumption for each of the other 5 connections.
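The fix described above amounts to letting only one connection per host do the full handshake while the others wait and then resume its session. A toy model that counts only certificate bytes: `HypotheticalTlsPool` and its methods are made up for illustration, `asyncio.sleep(0)` stands in for network time, and there is no real TLS here.

```python
import asyncio

CDN_CHAIN_BYTES = 1717  # static.ak.facebook.com chain size from the trace above


class HypotheticalTlsPool:
    """Toy model of one browser's connections to a single host."""

    def __init__(self, wait_for_first_handshake):
        self.wait_for_first = wait_for_first_handshake
        self.session_cached = False
        self.full_handshake_started = False
        self.full_handshake_done = asyncio.Event()
        self.cert_bytes_received = 0

    async def connect(self):
        if self.wait_for_first and self.full_handshake_started:
            # Another connection is already doing the full handshake:
            # wait for it, then resume its session instead of starting over.
            await self.full_handshake_done.wait()
        if not self.session_cached:
            self.full_handshake_started = True
            await asyncio.sleep(0)  # stand-in for the network round trips
            self.cert_bytes_received += CDN_CHAIN_BYTES  # full handshake: whole chain
            self.session_cached = True
            self.full_handshake_done.set()
        # Otherwise: abbreviated (session resumption) handshake, no certificates.


async def certificate_bytes_for(connections, wait_for_first_handshake):
    pool = HypotheticalTlsPool(wait_for_first_handshake)
    await asyncio.gather(*(pool.connect() for _ in range(connections)))
    return pool.cert_bytes_received


print(asyncio.run(certificate_bytes_for(6, wait_for_first_handshake=False)))  # 10302
print(asyncio.run(certificate_bytes_for(6, wait_for_first_handshake=True)))   # 1717
```

A real implementation would also have to handle the first handshake failing; in this sketch the waiting connections would simply hang.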

Putting It All Together
So, for Facebook, the overall impact of SSL on a first-time visitor is pretty large. On the first connection, we’ve got:

  • 2 round trips for the SSL handshake
  • 4325 bytes of Certificate information
  • 4 round trips of OCSP validation
  • 1500 bytes of OCSP response data

Then, for the CDN connections we’ve got:

  • 2 round trips for the SSL handshake
  • 10302 bytes of Certificate information (1717 duplicated 6 times)

The one blessing is that SSL is designed with a fast path to re-establish connectivity. So subsequent page loads from Facebook do get to cut out most of this work, at least until tomorrow, when the browser has probably forgotten most of it and has to start over again.

Making it Better

OCSP & CRLs are broken
In the above example, if the static.ak.facebook.com keys are ever compromised, browsers around the planet will not notice for 4 months. In my opinion, that is too long. OCSP results are usually cached for ~7 days. Leaving users exposed to compromised sites for 7 days is also a long time. And when Comodo was hacked a month ago, the browser vendors elected to immediately patch every browser user on the planet rather than wait for the OCSP caches to expire in a week. When it is easier to patch every browser than to rely on the built-in infrastructure, the industry clearly believes that revocation checking is broken.

But it is worse than that. What does a browser do when the OCSP check fails? Of course, it proceeds, usually without even letting the user know that it has done so (heck, users wouldn’t know what to do about this anyway)! Adam Langley points this out in great detail, but the browsers really don’t have an option. Imagine if DigiCert were down for an hour and, because of that, users couldn’t access Facebook. It’s far more likely that DigiCert had downtime than that the certificate has been revoked.

So why are we delaying our users so badly to do checks whose failures we’re just going to ignore anyway? Having a single point of failure for revocation checking makes it impossible to do anything else.
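The soft-fail behavior described above, contrasted with a hypothetical hard-fail policy. The enum and function names are illustrative, not any browser’s actual code.

```python
from enum import Enum


class OcspOutcome(Enum):
    GOOD = "good"
    REVOKED = "revoked"
    FAILED = "failed"  # responder down, timed out, or sent an unusable reply


def soft_fail_allows(outcome):
    """Soft-fail: only a definitive 'revoked' answer blocks the site.
    A failed check proceeds exactly as if the answer had been 'good'."""
    return outcome is not OcspOutcome.REVOKED


def hard_fail_allows(outcome):
    """Hard-fail alternative: any failure blocks, so a responder outage
    (say, an hour of CA downtime) takes the site down with it."""
    return outcome is OcspOutcome.GOOD


for outcome in OcspOutcome:
    print(outcome.value, soft_fail_allows(outcome), hard_fail_allows(outcome))
```

Under soft-fail, a failed lookup and a clean one lead to the same outcome, so the user pays the latency either way; under hard-fail, a CA outage becomes a site outage.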

Certificates are Too Wordy
I feel really sorry for Facebook with its 4KB certificate. I wish I could say theirs was somehow larger than average. They are so diligent about keeping their site efficient and small, and then they get screwed by the certificate. Keep in mind that their public key is only 2048 bits. We could transmit that with 256 bytes of data. Surely we can find ways to use fewer intermediate signers and also reduce the size of these certificates?

Certificate Authorities are Difficult to Trust
Verisign and others might claim that most of this overhead is necessary to provide integrity and all the features of SSL.  But is the integrity that we get really that much better than a leaner PGP-like system?  The browser today has dozens of root trust points, with those delegating trust authority to hundreds more.  China’s government is trusted by browsers today to sign certificates for google.com, or even facebook.com.  Do we trust them all?

A PGP model could reduce the size of the Certificates, provide decentralization so that we could enforce revocation lists, and eliminate worries about trusting China, the Iranian government, the US government, or any dubious entities that have signature authority today.

Better Browser Implementations
I mentioned above the flaw where the browser will simultaneously open multiple connections to a single site when it knows it doesn’t have the server’s certificate, and thus redundantly download potentially large certs. All browsers need to be smarter.

Although I expressed my grievances against the OCSP model above, it is used today. If browsers continue to use OCSP, they need to fully implement OCSP caching on the client, they need to support OCSP stapling, and they need to help push OCSP multi-stapling forward.

SSL Handshake Round Trips
The round trips in the handshake are tragic.  Fortunately, we can remove one, and Chrome users get this for free thanks to SSL False Start.  False Start is a relatively new, client-side only change.  We’ve measured that it is effective at removing one round trip from the handshake, and that it can reduce page load times by more than 5%.

Hopefully I got all that right. If you read this far, you deserve a medal.

Chrome Speeding up SSL with SSL FalseStart

Sunday, December 5th, 2010

The latest releases of Chrome now enable a feature called SSL False Start.  False Start is a client-side change which makes your SSL connections faster.  As of this writing, Chrome is the only browser implementing it.  Here is what it does.

In order to establish a secure connection, SSL uses a special handshake where the client and server exchange basic information to establish the secure connection. The very last message exchange has traditionally been implemented such that the client says, “done”, waits for the server, and then the server says, “done”. However, this waiting-for-done is unnecessary, and SSL researchers have discovered that we can remove one round trip from the process and allow the client to start sending application data immediately after it says “done”.
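A back-of-the-envelope model of where the round trip goes. The assumptions are mine: each flight is answered exactly one RTT later, no server think time, no loss, and a full handshake with no session resumption. `https_request_timeline` is a made-up helper.

```python
def https_request_timeline(rtt_ms, false_start):
    """Return (request_sent_ms, response_received_ms) for a fresh HTTPS connection."""
    t = rtt_ms   # TCP SYN -> SYN ACK; client then sends ACK + Client Hello
    t += rtt_ms  # Client Hello -> Server Hello, Certificate, Server Hello Done
    # The client now sends Client Key Exchange, Change Cipher Spec, and its "done".
    if not false_start:
        t += rtt_ms  # classic flow: wait for the server's Change Cipher Spec + "done"
    request_sent = t  # with False Start, the request goes out alongside the client's "done"
    return request_sent, request_sent + rtt_ms


for fs in (True, False):
    print("False Start" if fs else "classic    ", https_request_timeline(83, fs))
```

At an 83ms RTT this predicts the request leaving at 166ms versus 249ms; the traces below show 176ms and 269ms, since real round trips wobble.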

To visualize this, let’s look at some packet traces during the handshake sequence, comparing two browsers:

Chrome

  0ms SEND TCP SYN
 83ms RECV TCP SYN ACK
 83ms SEND TCP ACK
 83ms SEND Client Hello
175ms RECV Server Hello
           Certificate
           Server Hello Done
176ms SEND Client Key Exchange
           Change Cipher Spec
           Enc Handshake Msg
           HTTP Request
274ms RECV Enc Handshake Msg
           Change Cipher Spec
           Enc Handshake Msg
275ms RECV HTTP Response

Browser w/o FalseStart

  0ms SEND TCP SYN
 84ms RECV TCP SYN ACK
 84ms SEND TCP ACK
 84ms SEND Client Hello
173ms RECV Server Hello
           Certificate
           Server Hello Done
176ms SEND Client Key Exchange
           Change Cipher Spec
           Enc Handshake Msg
269ms RECV Enc Handshake Msg
           Change Cipher Spec
           Enc Handshake Msg
269ms SEND HTTP Request
524ms RECV HTTP Response

These two traces are almost identical. The subtle difference is when the HTTP Request goes out: Chrome sent it at 176ms, together with its final handshake messages, while the other browser had to wait until 269ms, a little more than one round-trip time later.

(Note: it is unclear why the HTTP response for the non-FalseStart browser arrived 250ms later than Chrome’s; the savings here is, in theory, just 1 round trip, or 83ms. There is always variance on the net, and I’ll attribute this to bad luck.)

Multiplicative Effect on Web Pages
Today, almost all web pages combine data from multiple sites. For SSL sites, this means that the handshake must be repeated to each server that is referenced by the page. In our tests, we see that there are often 2-3 “critical path” connections while loading a web page. With 3 such connections and an 83ms round-trip time, as in this example, that’s 249ms of savings, just for getting started with your page. I hope to do a more thorough report on the effect of FalseStart on overall page load time (PLT) in the future.

For more information on the topic, check out Adam Langley’s post on how Chrome deals with the very few sites that can’t handle FalseStart.

Linux Client TCP Stack Slower Than Windows

Friday, November 19th, 2010

Conventional wisdom says that Linux has a better TCP stack than Windows. But comparing the current latest Linux with the current latest Windows (or even Vista), there is at least one aspect where this is not true. (My definition of better is simple: whichever is faster.)

Over the past year or so, researchers have proposed raising TCP’s initial congestion window (initcwnd) from its current value (2 packets, or ~4KB) to about 10 packets. These changes are still being debated, but it looks likely that a change will be ratified. But even without official ratification, many commercial sites and commercially available load-balancing software have already increased initcwnd on their systems in order to reduce latency.

Back to the matter at hand: when a client makes a connection to a server, there are two variables which dictate how quickly the server can send data to the client. The first variable is the client’s “receive window”. The client tells the server, “please don’t exceed X bytes without my acknowledgement”, and this is a fundamental part of how TCP controls information flow. The second variable is the server’s cwnd, which is generally the bottleneck and, as noted above, is usually initialized to 2 packets.

In the long-ago past, TCP clients (like web browsers) would specify receive-window buffer sizes manually. But these days, all modern TCP stacks use dynamic window size adjustments based on measurements from the network, and applications are advised to leave it alone, since the computer can do it better. Unfortunately, the defaults on Linux are too low.

On my systems, with a 1Gbps network, here are the initial window sizes.  Keep in mind your system may vary as each of the TCP stacks does dynamically change the window size based on many factors.

Vista:  64KB
Mac:    64KB
Linux:   6KB

6KB! Yikes! Well, the argument can be made that there is no need for the Linux client to use a larger initial receive window, since servers are supposed to abide by RFC 2581. But there really isn’t much downside to using a larger initial receive window, and we already know that many sites benefit from a larger cwnd. The net result is that when the server is legitimately trying to use a larger cwnd, web browsing on Linux will be slower than web browsing on Mac or Windows, which don’t artificially constrain the initial receive window.
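The stall comes from the server being limited by the smaller of the two windows. A sketch, assuming a 1460-byte MSS and reading the table’s “6KB” and “64KB” as 6144 and 65536 bytes; `first_flight_bytes` is a hypothetical helper.

```python
def first_flight_bytes(init_cwnd_segments, client_rwnd_bytes, mss=1460):
    """Bytes a server may send on a new connection before waiting for an ACK:
    the smaller of its initial congestion window and the client's receive window."""
    return min(init_cwnd_segments * mss, client_rwnd_bytes)


for os_name, rwnd in (("Vista", 64 * 1024), ("Mac", 64 * 1024), ("Linux", 6 * 1024)):
    for cwnd in (2, 10):
        print(f"{os_name:5} initcwnd={cwnd:2}: {first_flight_bytes(cwnd, rwnd)} bytes")
```

With the traditional initcwnd of 2, every client looks the same (2920 bytes); at 10 segments, Linux’s 6144-byte window cuts the first flight to well under half of what the server could send (14600 bytes).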

Some good news: a patch is in the works to allow users to change the default, but you’ll need to be a TCP whiz and install a kernel change to use it. I don’t know of any plans to change the default value on Linux yet. Certainly if the cwnd changes are approved, the default initial receive window must also be changed. I have yet to find any way to make Linux use a larger initial receive window without a kernel change.

Two last notes: 

1) This isn’t theoretical.  It’s very visible in network traces to existing servers on the web that use larger-than-2 cwnd values.  And you don’t hit the stall just once, you hit it for every connection which tries to send more than 6KB of data in the initial burst.

2) As we look to make HTTP more efficient by using fewer connections (SPDY), this limit becomes yet another factor favoring protocols that use many connections instead of just one. TCP implementors lament that browsers routinely open 20-40 concurrent connections as part of making sites load quickly. But if a connection has an initial window of only 6KB, the use of many connections is the only way to work around the artificially low throttle.

There is always one more configuration setting to tweak.

Chrome Turns 2

Thursday, September 2nd, 2010

Chrome turns 2 today. Coinciding with the birthday, Chrome 6 shipped today as well.

That’s 6 stable releases in 2 years.  Not too bad.

I can’t believe I’ve been working on this project for 4 years now.  It’s entirely too fun.

Velocity 2010 Chrome Slides

Friday, June 25th, 2010

Yesterday I had the opportunity to speak at Velocity on behalf of Chrome as part of a browser panel discussion.  I met a lot of very smart folks and had some great discussions.  Representatives from Firefox and Internet Explorer were also there.  Many people asked, “where is Safari”.  I know Apple doesn’t like these sorts of events, but they were missed.  Anyway, here are my slides.

Chrome: Cranking Up The Clock

Friday, June 4th, 2010

Over the past couple of years, several of us have dedicated a lot of time to Chrome’s timer system. Because we do things a little differently, this has raised some eyebrows. Here is why and what we did.

Goal
Our goal was to have fast, precise, and reliable timers. By “fast”, I mean that the timers should fire repeatedly with a low period. Ideally we wanted microsecond timers, but we eventually settled for millisecond timers. By “precise”, I mean we wanted the timer system to work without drift – you should be able to monitor timers over short or long periods of time and still have them be precise. And by “reliable”, I mean that timers should fire consistently at the right times; if you set a 3.67ms timer, it should be able to fire repeatedly at 3.67ms without significant variance.

Why?
It may be surprising to hear that we had to do any work to implement these types of timers. After all, timers are a fundamental service provided by all operating systems. Lots of browsers use simpler mechanisms and they seem to work just fine. Unfortunately, the default timers really are too slow.

Specifically, Windows timers by default will only fire with a period of ~15ms. While processor speeds have increased from 500MHz to 3GHz over the past 15 years, the default timer resolution has not changed. And at 3GHz, 15ms is an eternity: 45 million clock cycles.

This problem does affect web pages in a very real way. Internally, browsers schedule time-based tasks to run a short distance in the future, and if the clock can’t tick faster than 15ms, that means the application will sleep for at least that long. To demonstrate, Erik Kay wrote a nice visual sorting test. Due to how JavaScript and HTML interact in a web page, applications such as this sorting test use timers to balance execution of the script with responsiveness of the webpage.

John Resig at Mozilla also wrote a great test for measuring the scalability, precision, and variance of timers. He conducted his tests on the Mac, but here is a quick test on Windows.

In this chart, we’re looking at the performance of IE8, which is similar to what Chrome’s timers looked like prior to our timer work. As you can see, the timers are slow and highly variable. They can’t fire faster than ~15ms. 


A Seemingly Simple Solution
Internally, Windows applications are often architected on top of Event Loops. If you want to schedule a task to run later, you must queue up the task and wake your process later. On Windows, this means you’ll eventually land in the function WaitForMultipleObjects(), which is able to wait for UI events, file events, timer events, and custom events.  (Here is a link to Chrome’s central message loop code) By default, the internal timer for all wait-event functions in Windows is 15ms. Even if you set a 1ms timeout on these functions, it will only wake up once every 15ms (unless non-timer related events are pumped through it).
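The effect of the default tick can be modeled as rounding every wait up to a whole number of ticks. A sketch, assuming a 15.625ms (64Hz) tick, the commonly cited Windows default that the text rounds to ~15ms, and assuming the wait starts exactly on a tick with no other event waking the thread; `observed_wait_ms` is a hypothetical helper.

```python
import math


def observed_wait_ms(requested_ms, tick_ms=15.625):
    """How long a timed wait lasts if the OS only wakes sleepers on clock ticks."""
    if requested_ms <= 0:
        return 0.0  # a zero timeout just polls and returns
    return math.ceil(requested_ms / tick_ms) * tick_ms


print(observed_wait_ms(1))                  # 15.625: a 1ms timeout still sleeps a full tick
print(observed_wait_ms(20))                 # 31.25: two ticks
print(observed_wait_ms(3.67, tick_ms=1.0))  # 4.0: even a 1ms tick rounds 3.67ms up
```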

To change the default timer, applications must call timeBeginPeriod(), which is part of the multimedia timers API. This function changes the clock frequency and is close to what we want. Its lowest granularity is still only 1ms, but that is a lot better than 15ms. Unfortunately, it also has a couple of seriously scary side effects. The first side effect is that it is system-wide. When you change this value, you’re impacting global thread scheduling among all processes, not just yours. Second, this API also affects the system’s ability to get into its lowest-power sleep states.
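For reference, the usual way to use this API is as a strictly paired call: every timeBeginPeriod() must be matched by a timeEndPeriod() with the same value. A sketch in Python via ctypes, Windows only and a no-op elsewhere. This is not Chrome’s code, and `raised_timer_resolution` is a hypothetical name.

```python
import contextlib
import ctypes
import sys

TIMERR_NOERROR = 0  # success code returned by timeBeginPeriod


@contextlib.contextmanager
def raised_timer_resolution(period_ms=1):
    """Request a finer system timer period for the duration of a `with` block.

    The setting is system-wide and can keep the CPU out of its deepest sleep
    states, so the scope is kept as small as possible. On non-Windows
    platforms this does nothing and yields False.
    """
    if sys.platform != "win32":
        yield False
        return
    winmm = ctypes.WinDLL("winmm")
    ok = winmm.timeBeginPeriod(period_ms) == TIMERR_NOERROR
    try:
        yield ok
    finally:
        if ok:
            winmm.timeEndPeriod(period_ms)  # every Begin must be matched by an End


if __name__ == "__main__":
    import time

    with raised_timer_resolution(1) as active:
        start = time.perf_counter()
        time.sleep(0.001)
        print(active, f"{(time.perf_counter() - start) * 1000:.2f} ms")
```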

Because of these two side effects, we were reluctant to use this API within Chrome. We didn’t want to affect any process other than Chrome’s own, and the full impact of the API was nebulous. Unfortunately, there were no other APIs that could make our message loop work quickly. Although Windows does have a high-performance cycle counter API (QueryPerformanceCounter), that API is slow to execute [1], has bugs on some AMD hardware [2], and has no effect on the system-wide wait functions.
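Reading a high-resolution counter is different from being woken up on time. Python exposes several clocks, and `time.get_clock_info` reports what backs each one. On Windows, CPython’s `perf_counter` is typically implemented with QueryPerformanceCounter, while `monotonic` is typically backed by GetTickCount64 with a resolution near the 15.6ms tick. The exact strings vary by platform and version, so treat this sketch (with a hypothetical helper name) as a way to inspect your own machine rather than a guaranteed result.

```python
import time


def describe_clocks(names=("time", "monotonic", "perf_counter")):
    """Map each clock name to (implementation, resolution in seconds)."""
    report = {}
    for name in names:
        info = time.get_clock_info(name)
        report[name] = (info.implementation, info.resolution)
    return report


for name, (impl, resolution) in describe_clocks().items():
    print(f"{name:>12}: {impl:<40} resolution={resolution:g}s")
```

A precise counter tells you what time it is. It does nothing for a thread blocked in a wait that is only woken at the next tick, which is why the counter alone could not fix the message loop.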

Justifying timeBeginPeriod
At one point during our development, we were about to give up on the high-resolution timers because they just seemed too scary. But then we discovered something. Using WinDbg to monitor Chrome, we found that every major multimedia browser plugin was already using this API, including Flash [3], Windows Media Player, and even QuickTime. Once we discovered this, we stopped worrying about Chrome’s use of the API. After all – what percentage of the time is Flash open when your browser is open? I don’t have an exact number, but it’s a lot. And since this API affects the system globally, most browsers were already running in this mode.

We decided to make this the default behavior in Chrome.  But we hit another roadblock for our timers.

Browser Throttles and Multi-Process
With the high-resolution timer in place, we could now fire events quickly for Chrome’s internals. Most internal delayed tasks are long timers and don’t need this feature, but there are a half dozen or so short timers in the code, and these did benefit materially. Nonetheless, the timer that matters most, the one behind the browser’s setTimeout and setInterval functions, did not yet benefit. This is because our WebKit code (and other browsers do this too) intentionally prevented any timer from firing more often than every 10ms.

There are probably several reasons for the 10ms timer floor in browsers. One is simply convention. Another is that some websites are poorly written and will set timers to run like crazy. If the browser attempts to service those timers, this can spin the CPU, and who gets the bug report when the browser is spinning? The browser vendor, of course. It doesn’t matter that the real bug is in the website and not the web browser; it is still important for the browser to address the issue.
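A back-of-the-envelope model shows why a clamp protects the CPU from such pages. This sketch assumes, hypothetically, a callback of constant cost that reschedules itself with setTimeout(callback, 0) as its last step. The clamp values 0, 1, 4 and 10ms correspond to the thresholds discussed in this article. Both function names are hypothetical.

```python
def clamped_delay_ms(requested_ms, min_delay_ms):
    """Delay a browser-style clamp would actually use for setTimeout."""
    return max(requested_ms, min_delay_ms)


def busy_fraction(callback_ms, requested_ms, min_delay_ms):
    """Fraction of time the CPU spends running a self-rescheduling timer.

    Assumes each callback takes `callback_ms` and, as its last step,
    reschedules itself with setTimeout(callback, requested_ms).
    """
    period = callback_ms + clamped_delay_ms(requested_ms, min_delay_ms)
    return callback_ms / period


for clamp in (0, 1, 4, 10):
    share = busy_fraction(callback_ms=1.0, requested_ms=0, min_delay_ms=clamp)
    print(f"clamp={clamp:>2}ms -> CPU busy {share:.0%}")
```

With no clamp, the page keeps the CPU busy all of the time. Even a few milliseconds of minimum delay cuts that share sharply, at the cost of slowing down well-behaved pages that use short timers.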

But the third, and probably most critical, reason is that most single-process browser architectures can become unresponsive if you allow websites to loop excessively with 0-millisecond delays in their JavaScript. Remember that browsers are generally written on top of Event Loops. If the slow JavaScript interpreter is constantly scheduling a wakeup through a 0ms timer, it clogs the Event Loop, which also processes mouse and keyboard events. The user is left with not just a spinning CPU, but a basically hung browser. While I was able to reproduce this behavior in single-process browsers, Chrome turned out to be immune, and the reason was Chrome’s multi-process architecture. Chrome puts the website into a separate process (called a “renderer”) from the browser’s keyboard and mouse handling process. Even if we spin the CPU in a renderer, the browser remains completely responsive, and unless the user is checking her Task Manager, she might not even notice.

So the multi-process architecture was the enabler. We wrote a simple test page to measure the fastest round trip through the setTimeout call and verified that a tight loop would not damage Chrome’s responsiveness. Then we modified WebKit to reduce the throttle from 10ms to 1ms and shipped the world’s peppiest beta browser: Chrome 1.0 beta.
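The original test page was JavaScript. As a Python analogue, this sketch imitates a browser-style clamp on top of asyncio and measures the average time per zero-delay hop. `set_timeout` and `time_zero_delay_chain` are hypothetical names. The 1ms and 4ms clamps mirror the values discussed in this article. asyncio’s own timing still depends on the OS clock, so on a coarse-clock system the results are noisier.

```python
import asyncio


def set_timeout(loop, callback, delay_ms, min_delay_ms=1.0):
    """Schedule `callback` like a browser setTimeout with a minimum clamp."""
    return loop.call_later(max(delay_ms, min_delay_ms) / 1000.0, callback)


async def time_zero_delay_chain(iterations, min_delay_ms):
    """Chain `iterations` setTimeout(f, 0) calls; return average ms per hop."""
    loop = asyncio.get_running_loop()
    done = loop.create_future()
    remaining = iterations
    start = loop.time()

    def hop():
        nonlocal remaining
        remaining -= 1
        if remaining == 0:
            done.set_result(loop.time() - start)
        else:
            # Each hop immediately asks for another zero-delay timer.
            set_timeout(loop, hop, 0, min_delay_ms)

    set_timeout(loop, hop, 0, min_delay_ms)
    elapsed_s = await done
    return elapsed_s * 1000.0 / iterations


for clamp in (1.0, 4.0):
    avg = asyncio.run(time_zero_delay_chain(50, clamp))
    print(f"clamp={clamp}ms: {avg:.2f}ms per setTimeout(f, 0)")
```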

Real World Problems
Our biggest fear in shipping the product was that we would find some website that was spinning the CPU and annoying users. We did find a couple of these, but they were relatively obscure sites. Finally, we found one that mattered: a small newspaper known as the New York Times. The NYTimes is a well-constructed site; they had just run into a little bug with a popular script called prototype.js, and it hadn’t been an issue before Chrome cranked up the clock. We filed a bug, but we had to change Chrome too. With a little experimentation, we found that increasing the minimum timer from 1ms to 4ms seemed to work reasonably well on most machines. Indeed, to this day, Chrome still uses a 4ms minimum tick.

Soon, a second problem emerged as well. Engineers at Intel pointed out that Chrome was causing laptops to consume a lot more power. This was a far more serious problem and harder to fix. We were not very concerned about the impact on desktops, because Flash, Windows Media Player, and QuickTime were already causing this. But for laptops, this was a big problem. To mitigate it, we started tapping into the Windows Power APIs to monitor when the machine is running on battery power. Before Chrome 1.0 shipped out of beta, we modified it to turn off fast timers when it detects that the system is running on batteries. Since we implemented this fix, we haven’t heard many complaints.
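Here is a minimal sketch of a power-aware policy. It is not Chrome’s code, and `on_ac_power` and `use_fast_timers` are hypothetical names. It reads the Win32 GetSystemPowerStatus() result through ctypes and enables fast timers only when the machine is definitely on AC power. On other platforms it reports the state as unknown.

```python
import ctypes
import sys


class SYSTEM_POWER_STATUS(ctypes.Structure):
    # Layout of the Win32 SYSTEM_POWER_STATUS structure.
    _fields_ = [
        ("ACLineStatus", ctypes.c_uint8),       # 0 = offline, 1 = online, 255 = unknown
        ("BatteryFlag", ctypes.c_uint8),
        ("BatteryLifePercent", ctypes.c_uint8),
        ("SystemStatusFlag", ctypes.c_uint8),
        ("BatteryLifeTime", ctypes.c_uint32),
        ("BatteryFullLifeTime", ctypes.c_uint32),
    ]


def on_ac_power():
    """Return True/False on Windows, or None when the state is unknown."""
    if sys.platform != "win32":
        return None
    status = SYSTEM_POWER_STATUS()
    if not ctypes.windll.kernel32.GetSystemPowerStatus(ctypes.byref(status)):
        return None
    return {0: False, 1: True}.get(status.ACLineStatus)


def use_fast_timers(ac_power):
    """Hypothetical policy: only raise timer resolution when surely on AC."""
    return ac_power is True


print("fast timers enabled:", use_fast_timers(on_ac_power()))
```

Treating an unknown state like battery power errs on the side of battery life. A real browser would also need to notice when the machine is plugged in or unplugged while it is running.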

Results
Overall, we’re pretty happy with the results. First, we can look at John Resig’s timer performance test. In contrast to the default implementation, Chrome has very smooth, consistent, and fast timers:

[Chart: Chrome timer test results on Windows]

Finally, here is the result of the visual sorting test mentioned above. With a faster clock in hand, performance doubles.

[Chart: visual sorting test results]

Future Work
We’d still like to eliminate the use of timeBeginPeriod. It is unfortunate that it has such side effects on the system. One solution might be to create a dedicated timer thread, built atop the machine cycle counter (despite the problems with QueryPerformanceCounter), that wakes message loops based on self-calculated, sub-millisecond timers. This sounds trivial, but if we forget any operating system call that is stuck in a wait and fail to wake it manually, we’ll have janky timers. We’d also like to bring the current 4ms timer back down to 1ms. We may be able to do this if we can better detect when web pages are accidentally spinning the CPU.
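Here is a minimal Python sketch of that idea, under loud assumptions. `TimerThread` and `post_delayed` are hypothetical names. A `queue.Queue` stands in for a message loop’s incoming queue, and `time.perf_counter` stands in for the cycle counter. The thread does a coarse OS wait until it is close to the earliest deadline, then spins on the counter for the rest. Spinning burns CPU, which is exactly the power cost discussed above, so this illustrates the shape of the design rather than a solution.

```python
import heapq
import itertools
import queue
import threading
import time


class TimerThread:
    """Hypothetical dedicated timer thread that wakes message loops."""

    def __init__(self, spin_margin_s=0.002):
        self._heap = []                 # (deadline, seq, target_queue, task)
        self._seq = itertools.count()   # tie-breaker for equal deadlines
        self._cv = threading.Condition()
        self._spin_margin_s = spin_margin_s
        self._stopped = False
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def post_delayed(self, target_queue, task, delay_s):
        """Deliver `task` into `target_queue` after `delay_s` seconds."""
        deadline = time.perf_counter() + delay_s
        with self._cv:
            heapq.heappush(self._heap, (deadline, next(self._seq), target_queue, task))
            self._cv.notify()  # the earliest deadline may have changed

    def stop(self):
        with self._cv:
            self._stopped = True
            self._cv.notify()
        self._thread.join()

    def _run(self):
        while True:
            with self._cv:
                while not self._stopped and not self._heap:
                    self._cv.wait()
                if self._stopped:
                    return
                deadline = self._heap[0][0]
                remaining = deadline - time.perf_counter()
                if remaining > self._spin_margin_s:
                    # Coarse OS wait; a newly posted timer wakes us early.
                    self._cv.wait(remaining - self._spin_margin_s)
                    continue
            # Final stretch: spin on the high-resolution counter.
            while time.perf_counter() < deadline:
                pass
            with self._cv:
                now = time.perf_counter()
                while self._heap and self._heap[0][0] <= now:
                    _, _, target_queue, task = heapq.heappop(self._heap)
                    target_queue.put(task)  # wakes a loop blocked in get()


inbox = queue.Queue()        # stands in for one message loop's queue
timers = TimerThread()
start = time.perf_counter()
timers.post_delayed(inbox, "run-task", 0.0005)   # 0.5ms, below a 1ms tick
task = inbox.get(timeout=1.0)
print(task, f"after {(time.perf_counter() - start) * 1000:.3f}ms")
timers.stop()
```

The hard part the article warns about is not shown here: every other place a thread can block in an OS wait would also need to be woken by this thread, or its timers would still be janky.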

From the operating system side, we’d like to see sub-millisecond event waits built in by default which don’t use CPU interrupts or otherwise prevent CPU sleep states. A millisecond is a long time.

1. Although written in 2003, the data in this article is still relatively accurate: Win32 Performance Measurement Options
2. http://developer.amd.com/assets/TSC_Dual-Core_Utility.pdf
3. Note: The latest versions of Flash (version 10) no longer use timeBeginPeriod.
NOTE: This article is my own view of events and does not reflect the views of my employer.

Google Chrome 2.0

Friday, May 22nd, 2009

Chrome 2.0 shipped out of beta today.

The New York Times seems to like it.