This archive is retained to ensure existing URLs remain functional. It will not contain any emails sent to this mailing list after July 1, 2024. For all messages, including those sent before and after this date, please visit the new location of the archive at https://mailman.ripe.net/archives/list/routing-wg@ripe.net/
[routing-wg] Subject: RPKI ROA Deletion: Post-mortem
Job Snijders
job at ntt.net
Sun Apr 5 20:29:54 CEST 2020
Dear Danny, others,

On Fri, Apr 03, 2020 at 04:56:41PM -0400, Danny McPherson wrote:
> I also look forward to [your] analysis of the Rostelecom incident that
> occurred in the same timeframe.

I've taken a look at the incident.

2,666 VRPs disappeared around 2020-04-01T16:32Z. For the purpose of this analysis, the list of affected VRPs is:

    http://instituut.net/~job/deleted-vrps-ripe-2020-04-01-16-32.txt

Andree Toonk (BGPMon) was so kind as to compile a list of prefixes which were wrongly originated by Rostelecom during the incident at 2020-04-01T19:27Z:

    https://portal.bgpmon.net/data/12389_apr2020.txt

The above list is not the full list of prefixes affected by this leak. The leak appears to have included route announcements that 12389 received from some customers and some peers, in addition to 'bgp optimiser'-style more-specific hijacks. The full list is available here:

    https://map.internetintel.oracle.com/api/leak_prefixes/20764_12389_1585768500.pfxs

I'm leaving the 'merely leaked, otherwise untouched' routes out of this analysis, as those are outside the scope of Origin Validation: the fabricated routes in relation to missing RPKI VRPs are what matters for this analysis.

If we take the intersection of Andree's list with the list of missing VRPs, we have the IP addresses that were affected by both the RIPE NCC RPKI Deletion incident and the Rostelecom BGP incident.
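One way to compute such an intersection is sketched below in Python. This is not the script used for this analysis. It assumes that an announced prefix counts as "affected" when it is equal to, or more specific than, the prefix of a deleted VRP. The function names are hypothetical, the intersection demo uses documentation prefixes rather than the contents of the linked files, and only the `ANNOUNCED` list is copied from the announced_prefix column of the 12 prefixes reported in this message.

```python
import ipaddress

def covered_by_any(announced, vrp_networks):
    """True if `announced` is equal to or more specific than any VRP prefix."""
    net = ipaddress.ip_network(announced)
    return any(
        net.version == vrp.version and net.subnet_of(vrp)
        for vrp in vrp_networks
    )

def intersect(announced_prefixes, deleted_vrp_prefixes):
    """Announced prefixes that fall under at least one deleted VRP prefix."""
    vrps = [ipaddress.ip_network(p) for p in deleted_vrp_prefixes]
    return [p for p in announced_prefixes if covered_by_any(p, vrps)]

def address_count(prefixes):
    """Total number of addresses, assuming the prefixes do not overlap."""
    return sum(ipaddress.ip_network(p).num_addresses for p in prefixes)

# Placeholder data (documentation prefixes), for illustration only.
demo_announced = ["192.0.2.0/25", "198.51.100.0/24", "203.0.113.128/25"]
demo_deleted = ["192.0.2.0/24", "203.0.113.0/24"]
demo_result = intersect(demo_announced, demo_deleted)

# The 12 announced prefixes reported in this message.
ANNOUNCED = [
    "91.195.240.0/24", "62.122.170.0/24", "91.203.187.0/24",
    "109.206.164.0/23", "109.206.174.0/23", "109.206.178.0/23",
    "109.206.168.0/23", "109.206.180.0/23", "109.206.161.0/24",
    "109.206.170.0/24", "109.206.187.0/24", "109.206.166.0/24",
]

print(demo_result)               # ['192.0.2.0/25', '203.0.113.128/25']
print(address_count(ANNOUNCED))  # 4352
```

Summing `num_addresses` is only correct because none of the 12 prefixes overlap; overlapping input would need to be collapsed first (for example with `ipaddress.collapse_addresses`).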
The intersection consists of the following 12 prefixes (4,352 IP addresses):

peer_count  start_time           alert_type          base_prefix       base_as  announced_prefix  src_AS  Affected_ASname  example_ASPath
49          2020-04-01 19:30:34  more_spec_by_other  91.195.240.0/23   47846    91.195.240.0/24   12389   SEDO-AS, DE      24751 20764 12389
12          2020-04-01 19:29:55  more_spec_by_other  62.122.168.0/21   50245    62.122.170.0/24   12389   SERVEREL-AS, NL  18356 38794 4651 4651 20764 12389
11          2020-04-01 19:30:34  more_spec_by_other  91.203.184.0/22   41064    91.203.187.0/24   12389   SKYROCK, FR      29430 13030 20764 12389
6           2020-04-01 19:32:12  more_spec_by_other  109.206.160.0/19  50245    109.206.164.0/23  12389   SERVEREL-AS, NL  49673 24811 20764 12389
6           2020-04-01 19:32:12  more_spec_by_other  109.206.160.0/19  50245    109.206.174.0/23  12389   SERVEREL-AS, NL  49515 197595 20764 12389
6           2020-04-01 19:32:12  more_spec_by_other  109.206.160.0/19  50245    109.206.178.0/23  12389   SERVEREL-AS, NL  49673 24811 20764 12389
6           2020-04-01 19:32:12  more_spec_by_other  109.206.160.0/19  50245    109.206.168.0/23  12389   SERVEREL-AS, NL  49673 24811 20764 12389
6           2020-04-01 19:32:12  more_spec_by_other  109.206.160.0/19  50245    109.206.180.0/23  12389   SERVEREL-AS, NL  43317 20764 12389
5           2020-04-01 19:33:04  more_spec_by_other  109.206.160.0/19  50245    109.206.161.0/24  12389   SERVEREL-AS, NL  49515 197595 20764 12389
5           2020-04-01 19:33:04  more_spec_by_other  109.206.160.0/19  50245    109.206.170.0/24  12389   SERVEREL-AS, NL  49673 24811 20764 12389
5           2020-04-01 19:33:04  more_spec_by_other  109.206.160.0/19  50245    109.206.187.0/24  12389   SERVEREL-AS, NL  1126 24785 20562 20764 12389
5           2020-04-01 19:33:04  more_spec_by_other  109.206.160.0/19  50245    109.206.166.0/24  12389   SERVEREL-AS, NL  51514 20562 20764 12389

If we look at the list of ASNs which were most impacted, the top ten seem mostly anchored in the US (thus under the ARIN TAL), and almost all of them seem to be heavyweights in the cloud / CDN space:
    https://portal.bgpmon.net/data/12389_apr2020_affected_asns.txt

The incorrect routing information covering the above listed prefixes was observed by a limited number of BGPMon peers; for other affected routes the peer_count was around 170.

While the RPKI incident lasted a number of hours, the Rostelecom routing incident lasted ten minutes or so. (source: https://map.internetintel.oracle.com/leaks#/id/20764_12389_1585768500)

If we assume the generation & propagation of these hijacks was the result of operator error, I imagine the change could've been reverted almost immediately, but we'd still see a bit of sloshing for a few minutes through the routing system. Or perhaps the 'waves' we can see in Oracle's 3D rendering of the incident are the effects of Maximum Prefix limits kicking in and various timers firing off at different times.

Were these prefixes just unlucky because some BGP optimiser algorithm had chosen them for the purpose of traffic engineering? Was this the result of sophisticated planning? In any case, I can't judge the impact this routing incident had on the three above listed ASNs. I don't know what the victim IPs are used for.

We have to keep in mind that a large portion of RIPE NCC's RPKI repository, and of course the RPKI repositories of the other RIRs, were *not* affected. ISPs with 'invalid == reject' policies had a lot of RPKI data (~134,516 VRPs) available, and those VRPs did have positive effects on the scope and reach of the hijacks.

RPKI Invalid BGP announcements don't propagate as well as Not-Found announcements. It appears the 'peer_count' for RPKI protected prefixes was significantly lower (~140) than for prefixes not covered by RPKI ROAs (~160). The 'peer_count' value can be considered a proxy metric for a hijack's reach and impact. The RPKI Invalids in this leak propagated through ASNs which we know have not yet deployed RPKI OV.
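To make the Invalid versus Not-Found distinction concrete, here is a minimal Python sketch of the origin validation procedure as described in RFC 6811. It is a simplification, not any validator's or router's actual implementation: the `VRP` tuple and `validate` function are hypothetical names, and the prefixes and ASNs are documentation values, not data from this incident.

```python
import ipaddress
from typing import NamedTuple

class VRP(NamedTuple):
    prefix: str      # prefix authorised by the ROA
    max_length: int  # longest prefix length the origin may announce
    asn: int         # authorised origin AS

def validate(route_prefix, origin_asn, vrps):
    """Classify a route as 'valid', 'invalid' or 'not-found'."""
    route = ipaddress.ip_network(route_prefix)
    covered = False
    for vrp in vrps:
        vrp_net = ipaddress.ip_network(vrp.prefix)
        # A VRP covers the route if the route is equal or more specific.
        if route.version != vrp_net.version or not route.subnet_of(vrp_net):
            continue
        covered = True
        # A covering VRP matches if length and origin AS both agree.
        if route.prefixlen <= vrp.max_length and origin_asn == vrp.asn:
            return "valid"
    return "invalid" if covered else "not-found"

vrps = [VRP("192.0.2.0/24", 24, 64500)]

# A more-specific from the wrong origin is Invalid while the VRP exists...
print(validate("192.0.2.0/25", 64511, vrps))  # invalid
# ...but becomes Not-Found once the covering VRP has disappeared.
print(validate("192.0.2.0/25", 64511, []))    # not-found
# The legitimate announcement is Valid.
print(validate("192.0.2.0/24", 64500, vrps))  # valid
```

The second call shows the failure mode relevant here: once a covering VRP is gone, an 'invalid == reject' policy no longer has anything to reject the fabricated more-specific on.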
The data above confirms to me what most of us already suspected about unavailability of RPKI services during routing incidents, or lack of deployment of Origin Validation: it is inconvenient.

RIPE NCC's service interruption appears to have affected 4,352 out of the total of 5,945,764 misrouted IPs (roughly 0.07%), and the 'peer_count' for the illegitimate announcements was much lower (better) compared to other prefixes. This leads me to believe this was not a deliberate plan dependent on a process failure inside RIPE NCC: the incident's BGP data just doesn't seem to show that the incident maximally capitalised on the RPKI outage.

Kind regards,

Job