This archive is retained to ensure existing URLs remain functional. It will not contain any emails sent to this mailing list after July 1, 2024. For all messages, including those sent before and after this date, please visit the new location of the archive at https://mailman.ripe.net/archives/list/[email protected]/
[routing-wg] Subject: RPKI ROA Deletion: Post-mortem
- Previous message (by thread): [routing-wg] Subject: RPKI ROA Deletion: Post-mortem
- Next message (by thread): [routing-wg] Subject: RPKI ROA Deletion: Post-mortem
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Danny McPherson
danny at tcb.net
Mon Apr 6 15:54:09 CEST 2020
[top post only]

Thanks for this Job, interesting analysis.

Another question here: at what interval is data from a given RIR repository ingested / operationalized by a given network operator? Or put differently, any idea how much lag there is today between when an RIR RPKI repository has a change and when that becomes OV policy in _your_ routers?

I'm sure this varies but not sure by how much within a given operator, or across operators.

-danny

On 2020-04-05 14:29, Job Snijders wrote:
> Dear Danny, others,
>
> On Fri, Apr 03, 2020 at 04:56:41PM -0400, Danny McPherson wrote:
>> I also look forward to [your] analysis of the Rostelecom incident that
>> occurred in the same timeframe.
>
> I've taken a look at the incident. 2,666 VRPs disappeared around
> 2020-04-01T16:32Z. For the purpose of this analysis the list of
> affected VRPs is
> http://instituut.net/~job/deleted-vrps-ripe-2020-04-01-16-32.txt
>
> Andree Toonk (BGPMon) was so kind as to compile a list of prefixes
> which were wrongly originated by Rostelecom during the incident at
> 2020-04-01T19:27Z
> https://portal.bgpmon.net/data/12389_apr2020.txt
>
> The above list is not the full list of prefixes affected by this leak.
> The leak appears to have included route announcements that 12389
> received from some customers and some peers, in addition to 'bgp
> optimiser'-style more-specific hijacks. Full list is available here:
> https://map.internetintel.oracle.com/api/leak_prefixes/20764_12389_1585768500.pfxs
> I'm leaving the 'merely leaked otherwise untouched' routes out of this
> analysis as those are outside of scope of Origin Validation: the
> fabricated routes in relation to missing RPKI VRPs are what matters
> for this analysis.
>
> If we take the intersection of Andree's list with the list of missing
> VRPs, we have the IP addresses that were affected by both the RIPE NCC
> RPKI Deletion incident and the Rostelecom BGP incident.
> The following 12 prefixes (4352 IP addresses):
>
> peer_count  start_time           alert_type          base_prefix       base_as  announced_prefix  src_AS  Affected_ASname  example_ASPath
> 49          2020-04-01 19:30:34  more_spec_by_other  91.195.240.0/23   47846    91.195.240.0/24   12389   SEDO-AS, DE      24751 20764 12389
> 12          2020-04-01 19:29:55  more_spec_by_other  62.122.168.0/21   50245    62.122.170.0/24   12389   SERVEREL-AS, NL  18356 38794 4651 4651 20764 12389
> 11          2020-04-01 19:30:34  more_spec_by_other  91.203.184.0/22   41064    91.203.187.0/24   12389   SKYROCK, FR      29430 13030 20764 12389
> 6           2020-04-01 19:32:12  more_spec_by_other  109.206.160.0/19  50245    109.206.164.0/23  12389   SERVEREL-AS, NL  49673 24811 20764 12389
> 6           2020-04-01 19:32:12  more_spec_by_other  109.206.160.0/19  50245    109.206.174.0/23  12389   SERVEREL-AS, NL  49515 197595 20764 12389
> 6           2020-04-01 19:32:12  more_spec_by_other  109.206.160.0/19  50245    109.206.178.0/23  12389   SERVEREL-AS, NL  49673 24811 20764 12389
> 6           2020-04-01 19:32:12  more_spec_by_other  109.206.160.0/19  50245    109.206.168.0/23  12389   SERVEREL-AS, NL  49673 24811 20764 12389
> 6           2020-04-01 19:32:12  more_spec_by_other  109.206.160.0/19  50245    109.206.180.0/23  12389   SERVEREL-AS, NL  43317 20764 12389
> 5           2020-04-01 19:33:04  more_spec_by_other  109.206.160.0/19  50245    109.206.161.0/24  12389   SERVEREL-AS, NL  49515 197595 20764 12389
> 5           2020-04-01 19:33:04  more_spec_by_other  109.206.160.0/19  50245    109.206.170.0/24  12389   SERVEREL-AS, NL  49673 24811 20764 12389
> 5           2020-04-01 19:33:04  more_spec_by_other  109.206.160.0/19  50245    109.206.187.0/24  12389   SERVEREL-AS, NL  1126 24785 20562 20764 12389
> 5           2020-04-01 19:33:04  more_spec_by_other  109.206.160.0/19  50245    109.206.166.0/24  12389   SERVEREL-AS, NL  51514 20562 20764 12389
>
> If we look at the list of ASNs which were most impacted, the top ten
> seems mostly anchored to the US (thus under the ARIN TAL), and almost
> all of them seem heavyweights in the cloud / CDN space.
> https://portal.bgpmon.net/data/12389_apr2020_affected_asns.txt
>
> The incorrect routing information covering the above listed prefixes
> was observed by a limited number of BGPMon peers; for other affected
> routes the peer_count was around 170. The RPKI incident lasted a
> number of hours, but the Rostelecom routing incident lasted ten minutes
> or so. (source:
> https://map.internetintel.oracle.com/leaks#/id/20764_12389_1585768500)
>
> If we assume the generation & propagation of these hijacks was the
> result of operator error, I imagine the change could've been reverted
> almost immediately but we'd still see a bit of sloshing for a few
> minutes through the routing system. Or perhaps the 'waves' we can see
> in Oracle's 3D rendering of the incident are the effects of Maximum
> Prefix limits kicking in and various timers firing off at different
> times.
>
> Were these prefixes just unlucky because some BGP optimiser algorithm
> had chosen them for the purpose of traffic engineering? Was this the
> result of sophisticated planning? In any case, I can't judge the impact
> this routing incident had on the three above listed ASNs. I don't know
> what the victim IPs are used for.
>
> We have to keep in mind that a large portion of RIPE NCC's RPKI
> repository, and of course the RPKI repositories of the other RIRs were
> *not* affected. ISPs with 'invalid == reject' policies had a lot of
> RPKI data (~134,516 VRPs) available and those VRPs did have positive
> effects on the scope and reach of the hijacks. RPKI Invalid BGP
> announcements don't propagate as well as Not-Found announcements.
>
> It appears the 'peer_count' for RPKI protected prefixes was
> significantly lower (~140) than prefixes not covered by RPKI ROAs
> (~160). The 'peer_count' value can be considered a proxy metric for a
> hijack's reach and impact. The RPKI Invalids in this leak propagated
> through ASNs for which we know they have not yet deployed RPKI OV.
>
> The above suggests to me that unavailability of RPKI services during
> routing incidents, or lack of deployment of Origin Validation confirms
> what most of us already suspected: it is inconvenient.
>
> RIPE NCC's service interruption appears to have affected 4,352 out of
> the total of 5,945,764 misrouted IPs, and the 'peer_count' for the
> illegitimate announcements was much lower (better) compared to other
> prefixes.
>
> This leads me to believe this was not a deliberate plan dependent on a
> process failure inside RIPE NCC, the incident's BGP data just doesn't
> seem to show the incident maximally capitalised on the RPKI outage.
>
> Kind regards,
>
> Job
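
The address figures quoted in the message (12 prefixes, 4352 IP addresses, out of 5,945,764 misrouted IPs) can be sanity-checked with a few lines of Python. This is only a sketch: the prefix list is copied from the announced_prefix column of the table in the quoted message, the total is the figure stated there, and the names ANNOUNCED, TOTAL_MISROUTED and count_addresses are illustrative choices, not anything used in the original analysis.

```python
import ipaddress

# Announced more-specific prefixes, copied from the table in the quoted message.
ANNOUNCED = [
    "91.195.240.0/24",
    "62.122.170.0/24",
    "91.203.187.0/24",
    "109.206.164.0/23",
    "109.206.174.0/23",
    "109.206.178.0/23",
    "109.206.168.0/23",
    "109.206.180.0/23",
    "109.206.161.0/24",
    "109.206.170.0/24",
    "109.206.187.0/24",
    "109.206.166.0/24",
]

# Total number of misrouted IPs, as stated in the message (not recomputed here).
TOTAL_MISROUTED = 5_945_764


def count_addresses(prefixes):
    """Count the distinct addresses covered by a list of CIDR prefixes.

    collapse_addresses() merges adjacent or overlapping networks first,
    so an address covered by two prefixes is only counted once.
    """
    networks = [ipaddress.ip_network(p) for p in prefixes]
    return sum(n.num_addresses for n in ipaddress.collapse_addresses(networks))


affected = count_addresses(ANNOUNCED)
share = affected / TOTAL_MISROUTED
print(f"{len(ANNOUNCED)} prefixes, {affected} addresses, "
      f"{share:.3%} of the misrouted address space")
```

The count comes out at 4352 addresses, matching the message; relative to the stated total that is well under a tenth of a percent, which is the proportion the closing argument leans on.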