You might have experienced it or read about it Monday on the NANOG and Cisco mailing lists: widespread routing instability caused by a single AS. Such is the beauty of a distributed routing system.
AS47868 (SuproNet) was apparently experimenting with traffic engineering by using AS path prepending. AS path prepending is a frequently used method to make a certain announcement a bit less preferable by making the AS path longer. It helps network administrators influence which peering is preferred for traffic to a certain prefix. This is done by prepending your own AS one, two or maybe a few times. I guess it's fair to say that prepends of up to, let's say, 5 are fairly common. You will see longer ones as well, but in normal scenarios that shouldn't be necessary.
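As a rough illustration of why prepending steers traffic, here is a minimal Python sketch. The function names `prepend` and `shortest_path` are made up for this example, and the AS numbers come from the range reserved for documentation. It only compares path length; a real BGP implementation evaluates other attributes (such as local preference) before it looks at AS path length.

```python
def prepend(as_path, own_as, times):
    """Return a copy of as_path with own_as added `times` extra times in front."""
    return [own_as] * times + list(as_path)


def shortest_path(paths):
    """Pick the path with the fewest AS hops (simplified tie-breaker only)."""
    return min(paths, key=len)


# AS64500 announces one prefix over two upstreams (hypothetical ASNs).
via_a = [64501, 64500]                         # no prepending
via_b = [64502] + prepend([64500], 64500, 3)   # own AS prepended 3 times

# A remote router that sees both paths prefers the shorter one, via AS64501.
best = shortest_path([via_a, via_b])
print(len(via_a), len(via_b), best)
```

With three prepends the second path is five hops long instead of two, so inbound traffic is nudged toward the first upstream.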
AS47868 was prepending its AS many times, up to 252 times, resulting in an AS path length of 256. Although this is an insanely high number, considering that the average AS path length is about 4.3, it should definitely not cause the behavior we observed Monday.
A number of routers that apparently run older software were not capable of handling these long AS paths, and as a result a fair number of BGP sessions started to flap. This caused a wave of updates (many times higher than normal), which in turn caused instability. Good technical explanations can be found at Renesys and Arbor Security.
The key thing here is that a single AS, announcing a single odd announcement, was able to influence many BGP routers, resulting in worldwide instability.
So who do we 'blame'? Well, I would not blame AS47868 in this case. The real cause is the ASes running buggy BGP implementations. A single odd announcement should never be able to impact so many others.
Monitoring for Long AS paths
I added some extra functionality to the BGPmon software. It now collects the longest AS paths seen each day. It will also display the AS path and additional information. Check it out here: http://bgpmon.net/maxASpath.php
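To give an idea of what collecting the longest AS path per day involves, here is a minimal Python sketch. It is not the actual BGPmon code: the function name `longest_paths_per_day` and the input format (pairs of a date and a space-separated AS path string) are assumptions made for this example, and the AS numbers are from the documentation range.

```python
from datetime import date


def longest_paths_per_day(updates):
    """Map each day to the longest AS path (as a list of ASNs) seen that day.

    `updates` is an iterable of (day, as_path_string) pairs; this input
    format is a hypothetical one chosen for the example.
    """
    longest = {}
    for day, path in updates:
        hops = path.split()
        # Keep the path only if it beats the longest one recorded for this day.
        if day not in longest or len(hops) > len(longest[day]):
            longest[day] = hops
    return longest


sample = [
    (date(2009, 1, 1), "64501 64500"),
    (date(2009, 1, 1), "64502 64500 64500 64500 64500"),
    (date(2009, 1, 2), "64501 64503 64500"),
]
result = longest_paths_per_day(sample)
for day, hops in sorted(result.items()):
    print(day, len(hops), " ".join(hops))
```

Alerting on a threshold is then just a matter of comparing `len(hops)` against whatever length you consider suspicious.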
Interestingly, we saw a similar issue a few weeks ago as well. BGP sessions started flapping because of invalid data (AS_CONFED_SEQUENCE and AS_CONFED_SET) in the AS4_PATH attribute. This is actually a feature, not a bug, or, if you will, a bug in the RFC.
The RFC specified that a BGP speaker should tear down the session if it sees such an update. This was done to isolate the problem as much as possible, so that only direct neighbors would be affected. As it turns out, in some cases the direct neighbor does not detect this and propagates the update, and as a result routers further into the core start flapping.
The problem here is comparable: a single announcement is able to tear down BGP sessions all over the Internet, not just those of its direct neighbors. This results in lots of BGP updates and global BGP instability.
The above, I guess, could be described as a BGP denial-of-service attack.
However, it is important to realize that in one case the flaw was actually in the RFC, and this is being fixed. In the case we saw this week, it is a software bug. As with many of the BGP-related events we have seen lately, most are unintentional. Nevertheless, the impact can be huge.
Same kind of incident last week
Last week one of our upstream providers in Vancouver experienced a similar problem, causing some routing instability for them and all their customers.
According to the post-mortem we received, one of their peers sent them BGP updates with malformed AS paths. This is the exact same behavior many people experienced on Monday.
Looking back at some of the BGP data that I collect for BGPmon.net, I noticed that at the exact moment my upstream started to experience these problems, an AS path with a length of 257 was detected. In that case AS45307 had prepended its AS 251 times.
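Spotting this kind of excessive prepending in collected data is straightforward. The Python sketch below finds the longest run of one repeated AS in a path. The function name `longest_run` is made up for this example, and the five leading transit ASNs are placeholders from the documentation range, not the real path that was observed; only AS45307, the 251 prepends and the total length of 257 come from the data above.

```python
from itertools import groupby


def longest_run(as_path):
    """Return (asn, count) for the longest run of consecutive identical ASNs."""
    best = (None, 0)
    for asn, group in groupby(as_path):
        count = sum(1 for _ in group)
        if count > best[1]:
            best = (asn, count)
    return best


# Placeholder transit hops followed by AS45307: once as origin plus 251 prepends.
path = [64496, 64497, 64498, 64499, 64500] + [45307] * 252

asn, count = longest_run(path)
prepends = count - 1  # the first occurrence is the AS itself, the rest are prepends
print(len(path), asn, prepends)
```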
From a security perspective it's a real nightmare that things like this can have such a widespread impact. The scary part is that it's not hard to imagine the harm someone could do to the stability of the Internet if attacks like these were targeted, or by combining one or two of these attacks.
Contrary to what some believe, the issues described in this article will not be solved by secure routing proposals such as S-BGP, soBGP or RPKI. They will, however, be solved by good BGP implementations!