This archive is retained to ensure existing URLs remain functional. It will not contain any emails sent to this mailing list after July 1, 2024. For all messages, including those sent before and after this date, please visit the new location of the archive at https://mailman.ripe.net/archives/list/routing-wg@ripe.net/
[routing-wg] How BGP routes can get 'stuck' in the Default-Free Zone
- Previous message (by thread): [routing-wg] New on RIPE Labs: Does The Internet Route Around Damage? - Edition 2021
- Next message (by thread): [routing-wg] AS8003 and U.S. Department of Defense routing
Job Snijders
job at fastly.com
Wed Apr 21 14:00:52 CEST 2021
Dear group,

I'd like to draw your attention to an excellent article on an intricate interaction between BGP and TCP which can result in 'zombie routes' in the BGP Default-Free Zone.

https://blog.benjojo.co.uk/post/bgp-stuck-routes-tcp-zero-window

My current running theory on the root cause of some mishaps in the global routing system is that certain BGP implementations can end up in a broken state where such systems will still generate and send out KEEPALIVE messages, but are unable to process other BGP messages (and such a system instructs all its peers to not send new data by signalling a zero TCP receive window). This is "Problem #1".

"Problem #2" is that almost all BGP implementations are unable to robustly deal with systems suffering from Problem #1. Almost all BGP implementations assume that when KEEPALIVE messages don't make it across the wire, the remote system will initiate the session tear down. But of course, if the remote system is in such a broken state that it can't issue session tear downs ... the combined system state is perpetually broken.

The Security Section of https://datatracker.ietf.org/doc/html/draft-spaghetti-idr-bgp-sendholdtimer elaborates on three detrimental facets of the above situation.

It is quite rare for systems to end up in the "Problem #1" state, but when it happens, all systems connected to the broken node probably are better off disconnecting from such a system than perpetually forwarding (and potentially blackholing) Internet traffic into the broken system.

Kind regards,

Job
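
As a rough illustration of the local-side remedy hinted at above (disconnecting from a peer that keeps its TCP receive window at zero), here is a minimal sketch of a send-side hold timer. It is not taken from the linked article or the draft: the class name `SendHoldTimer`, its fields, and the 480-second value are all hypothetical, chosen only to make the idea concrete.

```python
from dataclasses import dataclass


@dataclass
class SendHoldTimer:
    """Hypothetical model of a per-session send-side hold timer."""

    send_hold_time: float        # seconds without send progress we tolerate (assumed value)
    last_progress: float = 0.0   # time of the last successful write towards the peer
    pending_bytes: int = 0       # bytes queued locally that TCP has not yet accepted

    def queue(self, nbytes: int, now: float) -> None:
        # Start the clock when data is first queued on an otherwise idle session.
        if self.pending_bytes == 0:
            self.last_progress = now
        self.pending_bytes += nbytes

    def on_write(self, accepted: int, now: float) -> None:
        # 'accepted' is how many bytes the socket took; 0 models a zero receive window.
        if accepted > 0:
            self.pending_bytes -= min(accepted, self.pending_bytes)
            self.last_progress = now

    def should_tear_down(self, now: float) -> bool:
        # Fire only if something is waiting to be sent and no progress was made for too long.
        stalled_for = now - self.last_progress
        return self.pending_bytes > 0 and stalled_for >= self.send_hold_time


# Illustrative run: the peer never opens its window, so the queued message never leaves.
timer = SendHoldTimer(send_hold_time=480.0)   # 480 s is an arbitrary value for this sketch
timer.queue(19, now=0.0)                      # a 19-byte KEEPALIVE is queued
timer.on_write(0, now=30.0)                   # the socket accepts nothing (zero window)
stalled_early = timer.should_tear_down(now=100.0)   # still within the tolerated stall
stalled_late = timer.should_tear_down(now=480.0)    # tolerated stall has elapsed
```

The point of the sketch is that the decision relies only on local state (queued data and time since the last successful write), so it does not depend on the broken peer initiating the tear down.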