Archive for the ‘Technology’ Category

Comments on Microsoft’s SPDY Proposal

Thursday, March 29th, 2012

Microsoft published their SPDY proposal today to the IETF. They call it “HTTP + Mobility”. Here are some quick comments on their proposal.

a) It’s SPDY!
The Microsoft proposal is SPDY at its core. They’ve fully retained the major elements of SPDY, including multiplexing, prioritization, and compression, and they’ve even lifted the exact syntax of most of the framing layer – maintaining SYN_STREAM, RST_STREAM, SYN_REPLY, HEADERS, etc.

It’s a huge relief for me to see Microsoft propose SPDY with a few minor tweaks.

b) WebSockets Syntax
When SPDY started a couple of years ago, WebSockets didn’t exist. Microsoft is proposing taking existing SPDY, and changing the syntax to be more like WebSockets. This won’t have any feature impact on the protocol, but does make the protocol overall more like other web technologies.

Personally, I don’t think syntax matters much, and I also see value in symmetry across web protocols. I do think the WebSocket syntax is more complicated than SPDY today, but it’s not that big of a deal. Overall, this part of the Microsoft proposal may make sense. I’m happy that Microsoft has presented it.

c) Removal of Flow Control
The Microsoft proposal is quick to dismiss SPDY’s per-stream flow control as though it is already handled at the TCP layer. However, this is incorrect. TCP handles flow control for the TCP stream. Because SPDY introduces multiple concurrent flows, a new layer of flow control is necessary. Imagine you were sending 10 streams to a server, and one of those streams stalled out (for whatever reason). Without flow control, you either have to terminate all the streams, buffer unbounded amounts of memory, or stall all the streams. None of these are good outcomes, and TCP’s flow control is not the same as SPDY’s flow control.
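To make the stalled-stream scenario concrete, here is a minimal sketch of per-stream credit accounting. It is not SPDY's actual implementation: the `Stream` class, the `pump` helper, the 64 KB window, and the 16 KB frame size are all assumptions picked for illustration. The point is only that a stalled receiver caps its own stream at one window of data while the other streams keep flowing.

```python
from dataclasses import dataclass, field

INITIAL_WINDOW = 64 * 1024  # assumed per-stream window, in bytes
MAX_FRAME = 16 * 1024       # assumed maximum data frame size, in bytes


@dataclass
class Stream:
    stream_id: int
    window: int = INITIAL_WINDOW  # bytes the receiver will still accept
    pending: bytearray = field(default_factory=bytearray)
    sent: int = 0

    def queue(self, data: bytes) -> None:
        self.pending += data

    def next_chunk(self) -> bytes:
        """Return the next chunk this stream's window allows (may be empty)."""
        n = min(len(self.pending), self.window, MAX_FRAME)
        chunk = bytes(self.pending[:n])
        del self.pending[:n]
        self.window -= n
        self.sent += n
        return chunk

    def window_update(self, delta: int) -> None:
        """The receiver consumed `delta` bytes and grants that much new credit."""
        self.window += delta


def pump(streams, stalled_ids, rounds=20):
    """Round-robin over the streams; only healthy receivers return credit."""
    for _ in range(rounds):
        for s in streams:
            chunk = s.next_chunk()
            if chunk and s.stream_id not in stalled_ids:
                s.window_update(len(chunk))


streams = [Stream(stream_id=i) for i in (1, 3, 5)]
for s in streams:
    s.queue(b"x" * 200_000)

pump(streams, stalled_ids={3})  # the receiver of stream 3 has stalled
for s in streams:
    print(s.stream_id, "sent", s.sent, "still queued", len(s.pending))
```

Streams 1 and 5 drain completely. Stream 3 stops after exactly one window, so neither side has to buffer without bound or stall the whole connection.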

This may be an example of where SPDY’s implementation experience trumps any amount of protocol theory. For those who remember, earlier drafts of SPDY didn’t have flow control. We were aware of it long ago, but until we fully implemented SPDY, we didn’t know how badly it was needed nor how to do it in a performant and simple manner. I can’t emphasize enough with protocols how important it is to actually implement your proposals. If you don’t implement them, you don’t really know if it works.

d) Optional Compression
HTTP is full of “optional” features. Experience shows that if we make features optional, we lose them altogether due to implementations that don’t implement them, bugs in implementations, and bugs in the design. Examples of optional features in existing HTTP/1.1 include: pipelining, chunked uploads, absolute URIs, and there are many more.

Microsoft did not include any benchmarks for their proposal, so I don’t really know how well it performs. What I do know, however, is that the header compression which Microsoft is advocating be optional was absolutely critical to mobile performance for SPDY. If the Microsoft proposal were truly optimized for mobile, I suspect it would be taking more aggressive steps toward compression rather than pulling it out.

Lastly, I’m puzzled as to why anyone would propose removing the header compression. We could argue about which compression algorithm is best, but it has been pretty non-controversial that we need to start compressing headers with HTTP. (See also: SPDY spec, Mozilla example, UofDelaware research)
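A rough feel for why header compression matters: the sketch below pushes three similar request header blocks through one shared zlib stream. The header values are made up for illustration, and this is plain zlib without the preset dictionary that the SPDY spec defines, so treat the byte counts as directional only.

```python
import zlib


def header_block(path: str) -> bytes:
    """Build an illustrative (invented) set of request headers for `path`."""
    return (
        f"GET {path} HTTP/1.1\r\n"
        "Host: www.example.com\r\n"
        "User-Agent: Mozilla/5.0 (Linux; U; Android 2.3.6; en-us) "
        "AppleWebKit/533.1 (KHTML, like Gecko) Version/4.0 Mobile Safari/533.1\r\n"
        "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8\r\n"
        "Accept-Encoding: gzip,deflate\r\n"
        "Accept-Language: en-US,en;q=0.8\r\n"
        "Cookie: session=0123456789abcdef0123456789abcdef; prefs=lang-en_theme-light\r\n"
        "\r\n"
    ).encode("ascii")


# One compressor shared by every request on the connection, so later
# header blocks can refer back to bytes already seen in earlier ones.
compressor = zlib.compressobj()
sizes = []
for path in ("/index.html", "/style.css", "/logo.png"):
    raw = header_block(path)
    packed = compressor.compress(raw) + compressor.flush(zlib.Z_SYNC_FLUSH)
    sizes.append((path, len(raw), len(packed)))

for path, raw_len, packed_len in sizes:
    print(f"{path}: {raw_len} bytes raw, {packed_len} bytes compressed")
```

The first block shrinks somewhat. The second and third shrink dramatically, because almost every byte repeats an earlier request. On a slow mobile uplink, that is the difference that counts.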

e) Removal of SETTINGS frames
SPDY has the promise of “infinite flows” – that a client can make as many requests as it wants. But this is a jedi mind trick. Servers, for a variety of reasons, still want to limit a client to a reasonable number of flows. And different servers have very different ideas about what “reasonable” is. The SETTINGS frame is how servers communicate to the client that they want to do this.

I’m guessing this is an oversight in the Microsoft proposal.

f) Making Server Push Optional
Microsoft proposes to make server push optional. There is a fair discussion to be had about removing Server Push for a number of reasons, but to make it optional seems like the worst of all worlds. Server Push is not trivial, and is definitely one of the most radical portions of the protocol. To make it optional without removing it leaves implementors with the burden of all the complexity with potentially none of the benefits.

The authors offer opinions as to the merits of Server Push, but offer no evidence or data to back up those claims.

g) Removal of IP Pooling
The Microsoft writeup eliminates connection pooling, but it is unclear why. Connection pooling is an important element of SPDY both for performance and for efficiency on the network. I’m not sure why Microsoft would recommend removing this, especially without benchmarks, data, or implementation details. The benchmarks clearly show it has measurable benefit, and without this feature, mobile performance for the Microsoft proposal will surely be slower than for SPDY proper.

Conclusion
I’m happy with the writeup from Microsoft. I view their proposal as agreement that the core of SPDY is acceptable for HTTP/2.0, which should help move the standardization effort along more quickly. They’ve also raised a couple of very reasonable questions. It’s clear that Microsoft hasn’t done much testing or experimentation with their proposal yet. I’m certain that with data, we’ll come to resolution on all fronts quite quickly.

SPDY Momentum Fueled by Juggernauts

Wednesday, March 7th, 2012

Recent SPDY news comes from some big brands: Twitter, Mozilla, Amazon, Apache, Google.

Looking forward to seeing what comes next!

Rethinking SSL for Mobile Apps

Saturday, February 4th, 2012

Mobile Apps use HTTP. But they usually don’t use it to transfer HyperText – rather they are using it to transfer JSON, XML, or other data formats. Just like their web counterparts, secure transmission is desirable.

But, if you ever trace a fresh SSL connection, you know that it’s a nasty process:

  • DNS
  • TCP handshake
  • SSL handshake
  • Server sends certificate
  • DNS to CA
  • TCP to CA
  • OCSP to CA
  • Finish SSL handshake
  • Finally do what you wanted to do….

SSL is designed so that you can pick up some random certificate and check it dynamically. This is a good thing for the web, where the user coasts from site to site, happily discovering new content which needs new validation.

But this process is pretty costly, especially on mobile networks. For my own service, I just did a quick trace over 3G:

  • DNS (1334ms)
  • TCP handshake (240ms)
  • SSL handshake (376ms)
  • Follow certificate chain (1011ms) — server should have bundled this.
  • DNS to CA (300ms)
  • TCP to CA (407ms)
  • OCSP to CA #1 (598ms) — StartSSL CA uses connection close on each!
  • TCP to CA #2 (317ms)
  • OCSP to CA #2 (444ms)
  • Finish SSL handshake (1270ms)

With the web, this verification process makes some sense – you ask the CA to be your trust point and verify that he trusts the certificate provided.

But why do this with a mobile app? Your mobile app has a lot of trust with it – the user downloaded it from you, it’s signed by Apple, and if the code has been compromised, well, heck, your app isn’t really running anyway.

What we really want for mobile apps is to bake the server’s certificate into the app. If the server’s certificate needs to change, you can auto-update your app. In the example above, doing so would have shaved about 3000ms off application startup time.
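Where the 3000ms comes from can be checked against the 3G trace. The sketch below assumes that a baked-in certificate lets the app skip the certificate-chain fetch and every round trip to the CA. That choice of skipped steps is my reading of the trace, not something the trace itself states.

```python
# Step timings in milliseconds, copied from the 3G trace above.
trace_ms = [
    ("DNS", 1334),
    ("TCP handshake", 240),
    ("SSL handshake", 376),
    ("Follow certificate chain", 1011),
    ("DNS to CA", 300),
    ("TCP to CA", 407),
    ("OCSP to CA #1", 598),
    ("TCP to CA #2", 317),
    ("OCSP to CA #2", 444),
    ("Finish SSL handshake", 1270),
]

# Assumption: these are the steps a pinned certificate makes unnecessary.
skipped = {
    "Follow certificate chain",
    "DNS to CA",
    "TCP to CA",
    "OCSP to CA #1",
    "TCP to CA #2",
    "OCSP to CA #2",
}

total = sum(ms for _, ms in trace_ms)
saved = sum(ms for name, ms in trace_ms if name in skipped)
print(f"total {total}ms, saved {saved}ms, remaining {total - saved}ms")
```

Under that assumption the trace drops from 6297ms to 3220ms, a saving of 3077ms.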

The downside of this is that if your certificate changes, your app won’t verify. Then what to do? Simple – force an auto update.

There is another advantage to this approach. If you can verify your own certs, you don’t need a CA provided certificate anyway. These silly 1-2 year expirations are no longer necessary. Sign your own cert, and verify it yourself. Since our CAs have been getting hacked left and right in 2011, this is probably even more secure.
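A minimal sketch of what "verify it yourself" could look like, using Python's standard `ssl` module. Instead of shipping the whole certificate, this version ships a SHA-256 fingerprint of it and compares; that substitution, and the helper names `matches_pin` and `connect_pinned`, are my own choices for illustration. CA validation is deliberately switched off here, so all of the trust rests on the pin.

```python
import hashlib
import hmac
import socket
import ssl


def fingerprint(der_cert: bytes) -> str:
    """SHA-256 fingerprint of a DER-encoded certificate, as lowercase hex."""
    return hashlib.sha256(der_cert).hexdigest()


def matches_pin(der_cert: bytes, pinned_hex: str) -> bool:
    """Constant-time comparison of the certificate against the baked-in pin."""
    return hmac.compare_digest(fingerprint(der_cert), pinned_hex.lower())


def connect_pinned(host: str, port: int, pinned_hex: str, timeout: float = 10.0):
    """Open a TLS connection and accept it only if the server cert matches the pin."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False       # must be disabled before CERT_NONE
    ctx.verify_mode = ssl.CERT_NONE  # no CA chain walk, no OCSP: the pin is the trust
    sock = socket.create_connection((host, port), timeout=timeout)
    try:
        tls = ctx.wrap_socket(sock, server_hostname=host)
    except Exception:
        sock.close()
        raise
    der = tls.getpeercert(binary_form=True)
    if der is None or not matches_pin(der, pinned_hex):
        tls.close()
        raise ssl.SSLError("server certificate does not match the pinned fingerprint")
    return tls
```

The app would call `connect_pinned(host, 443, PIN)` with `PIN` compiled in. A mismatch is the signal to force the auto-update described above.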

PS: SSL is hard. In this one trace, I can spot at *least* 3 low-hanging-fruit optimizations. I haven’t mentioned them, because they are pervasive everywhere on the net. There are errors here at every level – the client is missing opportunities, the server is missing opportunities, and the CA is missing opportunities! It’s no wonder that SSL is slow. The chance that your combination of client + server + CA will have some dumb performance bug is ~99%.

SPDY configuration: tcp_slow_start_after_idle

Saturday, December 3rd, 2011

If you’re a SPDY server implementor, you’ve likely already read about the impact of CWND. Fortunately, the TCP implementors now largely agree that we can safely increase CWND, and the standard will likely change soon. The default Linux kernel implementation already has.

But, there is a second cwnd-related kernel flag which is not often mentioned. It’s important in all cases, but particularly important if you’re trying to establish long-lived connections. It’s not just important to SPDY – it’s important for HTTP keepalives or pipelines too. And many of the large web service providers are already tuning it:

    > sysctl -a | grep tcp_slow_start_after_idle
    net.ipv4.tcp_slow_start_after_idle = 1
    

At casual glance, you probably think “this sounds good, after a minute or so, it will go back into slow start mode”. That is fine, right?

Not quite. “Idle” in this case doesn’t mean a ‘minute or so’. In fact, it doesn’t even mean a second. This flag comes from RFC 2861’s recommendation, which states that cwnd be cut in half with each RTT of idleness. That means that a persistently held-open connection degrades back to the performance of an un-warmed connection very quickly.
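To see how fast "very quickly" is, here is a small model of the decay as described above: one halving per full RTT of idleness, never dropping below the initial window. The function name and the initial window of 10 segments are assumptions for illustration. Real kernels differ in the details of when and how they apply the decay.

```python
def cwnd_after_idle(cwnd: int, idle_ms: float, rtt_ms: float,
                    restart_window: int = 10) -> int:
    """Congestion window (in segments) left after `idle_ms` of idleness.

    Halves once per full RTT of idle time, with a floor at `restart_window`
    (an assumed initial cwnd of 10 segments).
    """
    halvings = int(idle_ms // rtt_ms)
    for _ in range(halvings):
        if cwnd <= restart_window:
            break
        cwnd = max(cwnd // 2, restart_window)
    return cwnd


# A well-warmed connection: 160 segments in flight on a 100ms RTT path.
for idle_ms in (50, 100, 200, 400, 5000):
    print(f"idle {idle_ms}ms -> cwnd {cwnd_after_idle(160, idle_ms, 100)}")
```

With a 100ms RTT, 400ms of silence is enough to take a 160-segment window all the way back to the initial 10. A user pausing to read a page is idle for far longer than that.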

So why does this matter? If you’re attempting to use a long-lived SPDY connection and think that the initial CWND won’t affect you because you’re only opening one connection anyway, you’re wrong. The slow-start-after-idle will still get you.

While there has been a tremendous amount of investigation and discussion about the initial cwnd value, I’m not aware of any recent debate about the slow-start-after-idle. I know that many websites are already disabling this flag to make HTTP keepalive connections perform more reasonably. Sadly, I can’t find any research which actually measured the effects of this behavior in the real world, so I can’t fall back on any real data. Given how aggressive TCP already is at backing off should network congestion change, I see no reason to enable this flag. Further, if you’re helping the net by dropping from N connections to 1, there is no reason you should be further penalized for your good deeds! Turn this one off.

IPv6 DNS Lookup Times

Wednesday, June 15th, 2011

A couple of weeks ago, I posted an article about slow IPv6 DNS query performance.  Several readers suggested in the comments that my observations were isolated to a few bad implementations and that perhaps Mac and Linux systems were not prone to this.  I hoped they were right, but I now have data to show they’re wrong.

Measuring performance is routine in Chrome, so a couple of days ago I was able to add a simple test to measure the speed of pure IPv4 lookups (A record) vs IPv6 lookups (A + AAAA records).  Today, with hundreds of millions of measurements in hand, we know the impact of IPv6 on DNS.

[Image: ipv6dns]

Windows users face a ~43% DNS latency increase and Mac users face a 146% DNS latency increase when they install an IPv6 client-side address on their machines. 

Today, there are two basic approaches to the IPv6 lookup problem:  you can issue the requests in serial or in parallel.  Obviously, issuing them in serial will add more latency than issuing them in parallel.  But even issuing them in parallel will be slower, as the maximum latency of two samples along a normal curve will be higher than the average latency of a single sample along the same curve.
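The statistical point is easy to check with a quick simulation. For two independent samples from a normal distribution with mean μ and standard deviation σ, the expected maximum is μ + σ/√π, which is always above μ. The 50ms mean and 15ms standard deviation below are invented numbers, not measurements from Chrome.

```python
import random
import statistics


def simulate(trials: int = 100_000, mean_ms: float = 50.0,
             stddev_ms: float = 15.0, seed: int = 1):
    """Mean latency of one lookup, two in parallel, and two in serial."""
    rng = random.Random(seed)

    def sample() -> float:
        # Clamp at zero so a rare negative draw cannot become a latency.
        return max(0.0, rng.gauss(mean_ms, stddev_ms))

    single = [sample() for _ in range(trials)]
    parallel = [max(sample(), sample()) for _ in range(trials)]  # wait for both
    serial = [sample() + sample() for _ in range(trials)]        # one after the other
    return (statistics.mean(single),
            statistics.mean(parallel),
            statistics.mean(serial))


single_ms, parallel_ms, serial_ms = simulate()
print(f"single {single_ms:.1f}ms, parallel {parallel_ms:.1f}ms, serial {serial_ms:.1f}ms")
```

Parallel lookups land between the single-lookup case and the serial case: better than doubling the latency, but still a tax on every resolution.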

Some readers may argue that these are dominated by older implementations.  Further investigation into the data does not confirm this.  Even the fastest 10% of Mac or Windows users (who should hopefully be using the best DNS resolver algorithms available) are seeing greater than 100% DNS lookup latency increases with IPv6.  Of course, we have conflation here, as DNS server response times may be slower in addition to the DNS client double-lookup being slower.  A deeper study about why these results are so bad is warranted.

Summary

Performance-sensitive users should be cautious about assigning a global IPv6 address on their clients.  As soon as they do, the browser will switch into IPv6 lookup mode, and take on a 40-100% increase in DNS latency.

Firefox Idle Connection Reuse

Saturday, June 11th, 2011

httpwatch does some anecdotal testing to conclude that Firefox’s new algorithm for selecting which idle connection to reuse has some strong benefits.

This is great stuff, and in general it should definitely help.  This is part of why getting to one-connection-per-domain is an important goal.  HTTP’s use of 6 or more connections per domain makes it so that each connection must “warm up” independently.  A similar algorithm should land in Chrome soon too.

Fortunately, there is a protocol for this stuff too :-)   Hopefully Firefox will pick that up soon too.

Codeflattery (n)

Wednesday, May 25th, 2011

codeflattery
[kohd-flat-uh-ree] noun
1.  when journalists write about your half-done, partial work as though it is news.

My employer (Google), my project (Chrome), and my team (SPDY) make such amazing products that the press even pays attention to my tiny hacks.  That makes me proud.  Codeflattery.

http://www.conceivablytech.com/7582/products/google-adds-possible-tcp-replacement-to-chrome

SSL FalseStart Performance Results

Thursday, May 19th, 2011

From the Chromium Blog:

Last year, Google’s Adam Langley, Nagendra Modadugu, and Bodo Moeller proposed SSL False Start, a client-side only change to reduce one round-trip from the SSL handshake.

We implemented SSL False Start in Chrome 9, and the results are stunning, yielding a significant decrease in overall SSL connection setup times. SSL False Start reduces the latency of an SSL handshake by 30% [1]. That is a big number. And reducing the cost of an SSL handshake is critical as more and more content providers move to SSL.

Our biggest concern with implementing SSL False Start was backward compatibility. Although nothing in the SSL specification (also known as TLS) explicitly prohibits FalseStart, there was no easy way to know whether it would work with all sites. Speed is great, but if it breaks user experience for even a small fraction of users, the optimization is non-deployable.

To answer this question, we compiled a list of all known https websites from the Google index, and tested SSL FalseStart with all of them. The result of that test was encouraging: 94.6% succeeded, 5% timed out, and 0.4% failed. The sites that timed out were verified to be sites that are no longer running, so we could ignore them.

To investigate the failing sites, we implemented a more robust check to understand how the failures occurred. We disregarded those sites that failed due to certificate failures or problems unrelated to FalseStart. Finally, we discovered that the sites which didn’t support FalseStart were using only a handful of SSL vendors. We reported the problem to the vendors, and most have fixed it already, while the others have fixes in progress. The result is that today, we have a manageable, small list of domains where SSL FalseStart doesn’t work, and we’ve added them to a list within Chrome where we simply won’t use FalseStart. This list is public and posted in the chromium source code. We are actively working to shrink the list and ultimately remove it.

All of this represents a tremendous amount of work with a material gain for Chrome SSL users. We hope that the data will be confirmed by other browser vendors and adopted more widely.


[1] Measured as the time between the initial TCP SYN packet and the end of the TLS handshake.

IPv6 Will Slow You Down (DNS)

Wednesday, May 18th, 2011

When you turn on IPv6 in your operating system, the web is going to get slower for you.  There are several reasons for this, but today I’m talking about DNS.  Every DNS lookup is 2-3x slower with IPv6.

What is the Problem?
The problem is that the current implementations of DNS will do both an IPv4 and an IPv6 lookup in serial rather than in parallel.  This is operating as per the specification.

We can see this on Windows:

     TIME   EVENT
       0    DNS Request A www.amazon.com
      39    DNS Response www.amazon.com
      39    DNS Request AAAA www.amazon.com
      79    DNS Response www.amazon.com
       <the browser cannot continue until here>

The “A” request there was the IPv4 lookup, and it took 39ms.  The “AAAA” request is the IPv6 lookup, and it took 40ms.   So, prior to turning IPv6 on, your DNS resolution finished in 39ms.  Thanks to your IPv6 address, it will now take 79ms, even if the server does not support IPv6!  Amazon does not advertise an IPv6 result, so this is purely wasted time.

Now you might think that 40ms doesn’t seem too bad, right?  But remember that this happens for every host you look up.  And of course, Amazon’s webpage uses many sub-domain hosts.  In the web page above, I saw more of these shenanigans, like this:

     TIME   EVENT
       0    DNS Request A g-ecx.images-amazon.com
      43    DNS Response g-ecx.images-amazon.com
      43    DNS Request AAAA g-ecx.images-amazon.com
     287    DNS Response g-ecx.images-amazon.com

Ouch – that extra request cost us 244ms.

But there’s more.  In this trace we also had a lookup for the OCSP server (on Amazon’s behalf, for SSL):

     TIME   EVENT
       0    DNS Request A ocsp.versign.com
     116    DNS Response ocsp.versign.com
     116    DNS Request AAAA ocsp.versign.com
     203    DNS Response ocsp.versign.com

Ouch – another 87ms down the drain.

The average website spans 8 domains.  A few milliseconds here, and a few milliseconds there, and pretty soon we’re talking about seconds.

The point is that DNS performance is key to web performance!  And in these 3 examples, we’ve slowed down DNS by 102%, 567%, and 75% respectively.  I’m not picking out isolated cases.  Try it yourself, this is “normal” with IPv6.
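For anyone checking the math, the percentages are just the extra AAAA time divided by the time the A lookup alone would have taken. The timestamps come straight from the three traces above. The helper name is mine.

```python
# (time the A response arrived, time the AAAA response arrived), in ms.
traces = {
    "www.amazon.com": (39, 79),
    "g-ecx.images-amazon.com": (43, 287),
    "ocsp.versign.com": (116, 203),
}


def slowdown_pct(ipv4_done_ms: int, both_done_ms: int) -> float:
    """Extra wait caused by the AAAA lookup, as a percentage of the A lookup."""
    return (both_done_ms - ipv4_done_ms) / ipv4_done_ms * 100


for host, (ipv4_done, both_done) in traces.items():
    extra = both_done - ipv4_done
    print(f"{host}: +{extra}ms ({int(slowdown_pct(ipv4_done, both_done))}% slower)")
```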

What About Linux?
Basically all of the operating systems do the same thing.  The common API for doing these lookups is getaddrinfo(), and it is used by all major browsers.  It does both the IPv4 and IPv6 lookups, sorts the results, and returns them to the application.

So on Linux, the behavior ends up being like this:

     TIME   EVENT
       0    DNS Request AAAA www.calpoly.edu
      75    DNS Response www.calpoly.edu
      75    DNS Request A www.calpoly.edu
      93    DNS Response www.calpoly.edu

In this particular case, we only wasted 75ms, when the actual request would have completed in 18ms (416% slower).
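If you want to try this on your own machine, Python exposes the same getaddrinfo() call. The sketch below times an IPv4-only lookup against the unrestricted lookup that browsers typically perform. The helper names are mine, and the comparison is rough: the second call may be answered from a resolver cache warmed by the first, so run it against several fresh hostnames before drawing conclusions.

```python
import socket
import time


def timed_lookup(host: str, family: int):
    """Resolve `host` with getaddrinfo() and return (elapsed_ms, addresses)."""
    start = time.perf_counter()
    infos = socket.getaddrinfo(host, None, family=family, type=socket.SOCK_STREAM)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    addresses = sorted({info[4][0] for info in infos})
    return elapsed_ms, addresses


def compare(host: str):
    """Time an A-only lookup, then an unrestricted (A + AAAA) lookup."""
    v4_ms, _ = timed_lookup(host, socket.AF_INET)
    both_ms, _ = timed_lookup(host, socket.AF_UNSPEC)
    return v4_ms, both_ms
```

Calling `compare("www.example.com")` on a machine with a global IPv6 address should show the pattern from the traces above, with the unrestricted lookup paying for the extra AAAA query.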

But It’s Even Worse
I wish I could say that DNS latencies were just twice as slow.  But it’s actually worse than that.  Because IPv6 is not commonly used, the results of IPv6 lookups are not heavily cached at the DNS servers like IPv4 addresses are. This means that it is more likely that an IPv6 lookup will need to jump through multiple DNS hops to complete a resolution. 

As a result, it’s not just that we’re doing two lookups instead of one.  It’s that we’re doing two lookups and the second lookup is fundamentally slower than the first.

Surely Someone Noticed This Before?
This has been noticed before.  Unfortunately, with nobody using IPv6, the current slowness was an acceptable risk.  Application vendors (namely browser vendors) have said, “this isn’t our problem, host resolution is the OS’s job”.

The net result is that everyone knows about this flaw.  But nobody fixed it.  (Thank goodness for DNS Prefetching!)

Just last year, the “Happy Eyeballs” RFC was introduced which proposes a work around to this problem by racing connections against each other.  This is an obvious idea, of course.  I don’t know of anyone implementing this yet, but it is certainly something we’re talking about on the Chrome team.
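The racing idea fits in a few lines. This is a simulation only: `fake_connect` sleeps instead of opening a socket, the latencies are invented, and a real Happy Eyeballs implementation has more to it than a plain race (for example, giving one address family a head start).

```python
import asyncio


async def fake_connect(family: str, latency_s: float) -> str:
    """Stand-in for a real connection attempt: sleep, then report which family won."""
    await asyncio.sleep(latency_s)
    return family


async def race(latencies: dict) -> str:
    """Start every attempt at once, keep the first to finish, cancel the rest."""
    tasks = [asyncio.create_task(fake_connect(family, latency))
             for family, latency in latencies.items()]
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    for task in pending:
        task.cancel()
    await asyncio.gather(*pending, return_exceptions=True)  # let cancellations settle
    return next(iter(done)).result()


# Invented latencies: a slow IPv6 path racing against a fast IPv4 path.
winner = asyncio.run(race({"IPv6": 0.20, "IPv4": 0.02}))
print("connected over", winner)
```

The user waits for the faster of the two paths rather than for both, which is exactly what the serial lookups in the traces above fail to deliver.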

What is The Operating System’s Job?
All browsers, be it Chrome or Firefox or IE, use the operating system to do DNS lookups.  Observers often ask, “why doesn’t Chrome (or Firefox, or IE) have its own asynchronous DNS resolver?”  The problem is that every operating system, from Windows to Linux to Mac has multiple name-resolution techniques, and resolving hostnames in the browser requires using them all, based on the user’s operating configuration.  Examples of non-DNS resolvers include:  NetBIOS/WINS, /etc/hosts files, and Yellow Pages.  If the browser simply bypassed all of these and exclusively used DNS, some users would be completely broken.

If these DNS problems had been fixed at the OS layer, I wouldn’t be writing this blog post right now.  But I don’t really blame Windows or Linux – nobody was turning this stuff on.  Why should they shine a part of their product that nobody uses?

Lesson Learned:  Only The Apps Can ‘Pull’ Protocol Changes
IPv6 deployment has been going on for over 10 years now, and there is no end in sight.  The current plan (like IPv6’s break the internet day) is the same plan we’ve been doing for 10 years.  When do we admit that the current plan to deploy IPv6 is simply never going to work?

A lesson learned from SPDY is that only the applications can drive protocol changes.  The OSes, bless their hearts, can only do so much and move too slowly to push new protocols.  There is an inevitable chicken-and-egg problem where applications won’t use it because OS support is not there and OSes won’t optimize it because applications aren’t there.

The only solution is at the Application Layer – the browser.  But that may be the best news of all, because it means that we can fix this!  More to come…

IPv6 latency analysis coming….

Friday, May 13th, 2011

Over the next few days, I’m going to be posting some blogs about IPv6 performance.

The results are pretty grim – but my aim is not to make everyone despair.

There is a solution, and I think I can see light at the end of the tunnel.  My theory is that we’ve been approaching IPv6 deployment incorrectly for the last 10 years.  It seems obvious now, but it wasn’t obvious 10 years ago, and things have certainly changed which enable this new mechanism. 

IPv6 break-the-world-day is approaching quickly.  If you haven’t started thinking about this, you should.