What caused the Google service interruption?

Posted by Andree Toonk - March 12, 2015 - BGP instability

This morning, people on Twitter reported that they were unable to reach Google services. Business Insider followed up with a story mentioning that the Google service interruption primarily affected European and Indian users.

In this blog post we’ll take a quick look at what exactly happened, using our BGP data. The first clue came from David Roy on Twitter, who noticed that traffic was being re-routed towards AS9498 in India. Digging through our BGP data, we can confirm that between 08:58 UTC and 09:14 UTC the routing paths for many Google prefixes changed to a path that includes the Indian AS9498.

Let’s take a look at an example. In my case www.google.com resolves to the following addresses:
www.google.com has address 74.125.226.19
www.google.com has address 74.125.226.20
www.google.com has address 74.125.226.17
www.google.com has address 74.125.226.16
www.google.com has address 74.125.226.18
www.google.com has IPv6 address 2607:f8b0:4006:806::1014

The IPv4 addresses are all in the 74.125.226.0/24 range. Looking at the BGP announcements for that prefix, we see a flurry of BGP updates from peers in Europe starting at 08:58:44 UTC. All of these paths have the following AS-path segment in common: 9498 17488 15169
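The two checks above can be reproduced with a minimal Python sketch: it confirms that the addresses returned for www.google.com fall inside 74.125.226.0/24 using the standard `ipaddress` module, and flags AS paths containing the 9498 17488 15169 segment. The helper names and the example peer AS64500 (a documentation ASN) are hypothetical, and the sample paths are illustrative rather than taken from our data.

```python
import ipaddress

# IPv4 addresses returned for www.google.com in the example above
ADDRESSES = ["74.125.226.19", "74.125.226.20", "74.125.226.17",
             "74.125.226.16", "74.125.226.18"]
PREFIX = ipaddress.ip_network("74.125.226.0/24")

# Segment shared by the leaked paths: Airtel -> Hathway -> Google
LEAK_SEGMENT = (9498, 17488, 15169)


def collapse_prepends(as_path):
    """Drop consecutive duplicate ASNs introduced by AS-path prepending."""
    collapsed = []
    for asn in as_path:
        if not collapsed or collapsed[-1] != asn:
            collapsed.append(asn)
    return collapsed


def contains_segment(as_path, segment=LEAK_SEGMENT):
    """Return True if `segment` appears as consecutive hops in `as_path`."""
    path = collapse_prepends(as_path)
    n = len(segment)
    return any(tuple(path[i:i + n]) == segment
               for i in range(len(path) - n + 1))


# Every address falls inside the announced /24
all_covered = all(ipaddress.ip_address(a) in PREFIX for a in ADDRESSES)

# Hypothetical paths as seen from a European peer (AS64500 is a documentation ASN)
leaked_path = [64500, 9498, 17488, 15169]
normal_path = [64500, 15169]
print(all_covered, contains_segment(leaked_path), contains_segment(normal_path))
# prints: True True False
```

Collapsing prepends first means a path such as 9498 9498 17488 15169 is still recognized as containing the leak segment.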

Google BGP leak

In all cases the prefix was originated by Google (AS15169), which peers with Hathway (AS17488), an Indian ISP. Hathway then leaked the prefix to its transit provider Airtel (AS9498). Airtel has a presence at many Internet Exchanges and propagated the announcements to its peers there. Once these peers had an alternative route to Google, they started to prefer this path via Airtel and Hathway, with traffic possibly going back to India, which would explain the outage.
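One common way to reason about why this counts as a leak is the valley-free model of inter-domain routing, used here purely as an illustration: a route learned from a peer should only be passed on to customers, never to a provider. The sketch below encodes the relationships described above (Google peers with Hathway, Hathway buys transit from Airtel) plus a hypothetical European network AS64500 that peers with Airtel at an IX; the relationship table and function name are our own assumptions.

```python
# Hypothetical relationship table, keyed by (sender, receiver) in the direction
# the route propagates:
#   "c2p": sender is a customer of receiver (route goes uphill)
#   "p2p": settlement-free peering
#   "p2c": sender is a provider of receiver (route goes downhill)
RELATIONSHIPS = {
    (15169, 17488): "p2p",  # Google peers with Hathway
    (17488, 9498): "c2p",   # Hathway buys transit from Airtel
    (9498, 64500): "p2p",   # Airtel peers at an IX with AS64500 (documentation ASN)
    (15169, 64500): "p2p",  # assumed direct Google peering, for comparison
}


def is_valley_free(as_path, rels):
    """Check the valley-free property for a BGP AS path.

    `as_path` is in BGP order: the receiving network first, the origin last.
    A valid path climbs zero or more customer->provider links, crosses at most
    one peer link, then only descends provider->customer links.
    """
    hops = list(reversed(as_path))  # propagation order, origin first
    descending = False
    for sender, receiver in zip(hops, hops[1:]):
        rel = rels[(sender, receiver)]
        if rel == "c2p":
            if descending:
                return False  # climbing again after a peer or provider link
        elif rel == "p2p":
            if descending:
                return False  # a second peer link, or a peer link after descending
            descending = True
        elif rel == "p2c":
            descending = True
        else:
            raise ValueError(f"unknown relationship {rel!r}")
    return True


print(is_valley_free([64500, 9498, 17488, 15169], RELATIONSHIPS))  # False
print(is_valley_free([64500, 15169], RELATIONSHIPS))               # True
```

The check fails at the Hathway to Airtel hop: a route Hathway learned from a peer (Google) was sent uphill to its provider, which matches the leak described above.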

In total, 336 Google prefixes were affected during this incident, belonging to 7 Google autonomous systems, including Postini (AS26910) and DoubleClick (AS6432). Only IPv4 prefixes were affected.
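Counting affected prefixes per origin AS boils down to grouping leaked announcements by the last ASN in their AS path. A minimal sketch, assuming updates are available as hypothetical (prefix, AS path) tuples; the sample records use documentation prefixes and illustrative paths and are placeholders, not incident data.

```python
from collections import defaultdict


def affected_by_origin(leaked_updates):
    """Group leaked prefixes by origin AS, i.e. the last ASN of each AS path.

    Duplicate announcements of the same prefix (for example seen by several
    peers) are counted once, because prefixes are stored in a set.
    """
    by_origin = defaultdict(set)
    for prefix, as_path in leaked_updates:
        by_origin[as_path[-1]].add(prefix)
    return by_origin


# Made-up sample records: documentation prefixes and illustrative paths,
# NOT the actual routes observed during the incident.
SAMPLE_UPDATES = [
    ("74.125.226.0/24", [64500, 9498, 17488, 15169]),
    ("74.125.226.0/24", [64501, 9498, 17488, 15169]),  # same prefix, another peer
    ("192.0.2.0/24", [64500, 9498, 17488, 26910]),
    ("198.51.100.0/24", [64501, 9498, 17488, 6432]),
]

groups = affected_by_origin(SAMPLE_UPDATES)
total_prefixes = sum(len(prefixes) for prefixes in groups.values())
print(total_prefixes, len(groups))  # 3 3
```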

The leaked routes were detected by a few dozen of our European peers at several European Internet Exchanges, including the London Internet Exchange, the Amsterdam Internet Exchange and the Moscow Internet Exchange. The networks that selected this leaked path to reach Google included a large national telecom provider based in Europe as well as a global Tier1 provider.

Why did these networks prefer the path via Airtel over the direct path to Google?
There are a number of possible explanations. A network may not have a peering with Google, or its peering with Google may have gone down, leaving the path learned over its peering with Airtel as the preferred route. In the case of the Tier1 and the large national telecom provider, it appears that Airtel is a customer of both networks. Since routes learned from customers are preferred over routes learned from peers (Google, in this case), traffic was sent to Airtel.
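The effect of preferring customer routes can be sketched with a toy version of BGP best-path selection. This assumes a hypothetical policy that maps the neighbor relationship to a local preference value and models only local preference followed by AS-path length; real routers evaluate further tie-breakers, and each network chooses its own values.

```python
from dataclasses import dataclass

# Hypothetical local-preference policy: customer routes > peer routes > provider
# routes. The exact numbers are illustrative; each network sets its own.
LOCAL_PREF = {"customer": 200, "peer": 100, "provider": 50}


@dataclass
class Route:
    prefix: str
    as_path: list      # BGP order, origin AS last
    learned_from: str  # "customer", "peer" or "provider"


def best_route(routes):
    """Pick a route by highest local preference, then shortest AS path.

    Only these two steps of the BGP decision process are modeled.
    """
    return min(routes, key=lambda r: (-LOCAL_PREF[r.learned_from], len(r.as_path)))


# Direct route learned over a peering with Google
direct = Route("74.125.226.0/24", [15169], "peer")
# Leaked route learned from Airtel, which is a customer of this network
leaked = Route("74.125.226.0/24", [9498, 17488, 15169], "customer")

print(best_route([direct, leaked]).as_path)  # [9498, 17488, 15169]
```

Even though the leaked path is two hops longer, local preference is compared before AS-path length, so the route learned from the customer wins.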

In this blog post we looked at what caused this morning’s service interruption for European Google users. Using BGP data, we established that the interruption started at 08:58 UTC and was resolved at 09:14 UTC. We were also able to determine the root cause: a BGP leak of Google prefixes by Airtel and Hathway. The data further shows that the leak was picked up only in Europe, which is consistent with the Twitter reports.