This archive is retained to ensure existing URLs remain functional. It will not contain any emails sent to this mailing list after July 1, 2024. For all messages, including those sent before and after this date, please visit the new location of the archive at https://mailman.ripe.net/archives/list/routing-wg@ripe.net/

[routing-wg] RPKI Outage Post-Mortem

Mick O'Donovan mick at mickod.ie
Tue Feb 25 19:16:16 CET 2020

I care, for one!

Furthermore I think it's very refreshing to have outages like this
called out for what they are and full transparency about the cause and
fix communicated. It's most helpful.

Fair play Nathalie and team!

- Mick (AS2110)

On Tue, Feb 25, 2020 at 07:57:41PM +0200, Max Tulyev wrote:
> Hello!
> 
> Just a summary: RPKI did not work for 3 days. Nobody cares ;)
> 
> On 25.02.20 16:12, Nathalie Trenaman wrote:
> > Dear colleagues,
> > 
> > From Saturday 22 February at 08:24 (CET), any newly created, modified, or deleted ROAs (176 in total) could not be added to our publication server due to a disk problem. From that moment on, all the data was stored in the database, but publication did not happen. The disk did not report any problems and, therefore, no engineer was alerted to this incident.
> > 
> > Due to the disk problem, starting from Sunday 23 February at 09:10 (CET), our CRL expired and our repository could not be properly updated. This was reported to us on Monday 24 February at 11:44 (CET). Our engineers fixed the disk problem immediately; however, since the CRL had expired, all underlying objects had also expired. Depending on the Relying Party software an operator used, this abnormal behaviour appeared differently.
> > 
> > Initially, our engineers tried to do a full re-population of the RPKI repository, but unfortunately, this did not update the CRL in the validation tree. At 15:03 (CET), we performed a full CA key-roll, which was completed at 21:02 (CET) and resolved the problem. At 19:58 (CET), all objects in the backlog were published.
> > 
> > We apologise for any inconvenience this may have caused and we are taking all the necessary steps to ensure this incident does not occur again in the future.
> > 
> > Kind regards,
> > 
> > Nathalie Trenaman
> > Routing Security Programme Manager
> > RIPE NCC
> > 
> 

-- 
- MickoD <mick at mickod.ie>
