This archive is retained to ensure existing URLs remain functional. It will not contain any emails sent to this mailing list after July 1, 2024. For all messages, including those sent before and after this date, please visit the new location of the archive at https://mailman.ripe.net/archives/list/routing-wg@ripe.net/
[routing-wg] Issue affecting rsync RPKI repository fetching
Job Snijders
job at fastly.com
Mon Apr 12 17:10:23 CEST 2021
On Mon, Apr 12, 2021 at 02:12:10PM +0100, Nick Hilliard wrote:
> Erik Bais wrote on 12/04/2021 11:41:
> > This looks to be a 3 line bash script fix on a cronjob … So why
> > isn’t this just tested on a testbed and updated before the end of
> > the week ?
>
> cache coherency and transaction guarantees are problems which are
> known to be difficult to solve. Long term, the RIPE NCC probably
> needs to aim for the degree of transaction read consistency that might
> be provided by an ACID database or something equivalent, and that will
> take time and probably a migration away from a filesystem-based data
> store.
>
> So the question is what to do in the interim? The bash script that
> Job posted may help to reduce some of the race conditions that are
> being observed, but it's unlikely to guarantee transaction consistency
> in a deep sense. Does the RIPE NCC have an opinion on whether the
> approach used in this script would help with some of the problems that
> are being seen and if so, would there be any significant negative
> effects associated with implementing it as an intermediate patch-up?

Perhaps the script [0] can be of use, or perhaps not. The script assumes
a POSIXish-compliant environment. It is not clear to me what software
process runs where and how RIPE NCC runs their publication service.

The core problem seems to me that while RSYNC clients are connected the
RIPE NCC RPKI server appears to 'pull the rug' from underneath them.
This practise reduces the reliability of the RIPE NCC RPKI service.

I can only guess how exactly the RIPE NCC RPKI publication service is
configured, but I imagine there is a 'Signer Server' which writes the
few thousand individual RPKI objects to disk, and separately there is
an RSYNC server (rpki.ripe.net) which serves the files to RSYNC clients.

Transferring sets of inter-related files around is a 'batch' operation;
the pipeline should be set up accordingly.
As such, calling 'rsync' from crontab to populate the rpki.ripe.net
rsync server would likely lead to inconsistent results.

There are (at least) two objectives to keep in mind:

1/ While the Signer software is writing new files out to disk, the
   'signer to publisher' replication process should not run, because
   the signer isn't finished yet.

2/ While a given RSYNC client is fetching from 'rpki.ripe.net', the
   'signer to publisher' replication process should not alter the
   contents of the filesystem hierarchy the RSYNC client is fetching
   from.

To satisfy the above two conditions, I suspect a number of solutions
are available:

A) Take ownership and control and only launch subsequent pipeline
   steps when the Signer is done signing the latest requests. After a
   consistent set of files has been written to disk, only then copy,
   stage, and switch to the new directory contents using a symlink
   swap (allowing already connected RSYNC clients to complete their
   fetch).

B) Use a load balancer to direct new RSYNC clients to an RSYNC server
   containing the latest (consistent) set of files.

C) Make the RSYNC service pull from the latest (allegedly consistent)
   RRDP snapshot.xml file, then move newly connected clients to the
   new content using either the symlink trick [0] or orchestrated
   draining/onramping via a load balancer like haproxy.

There is a wealth of knowledge available in this working group on how
POSIX-like systems work, how ISP operations work, and how the RPKI
works; I hope RIPE NCC can leverage that.

Kind regards,

Job

[0]: http://sobornost.net/~job/rpki-rsync-move.sh.txt
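
The 'copy, stage, and switch using a symlink swap' step from option A
can be illustrated with a short sketch. This is not the script at [0]
and not a description of how RIPE NCC publishes; the function name
publish, the releases/ layout and the 'current' symlink name are all
hypothetical. It assumes the signer has already finished writing, and
it assumes GNU coreutils, because 'mv -T' is a GNU option; other
systems would need an equivalent way to rename over a symlink that
points at a directory.

```shell
#!/bin/sh
# Illustrative sketch only: stage a complete copy of the signer output,
# then atomically repoint a symlink at it. All names and paths are
# assumptions made for this example.
set -eu

publish() {
    src=$1     # directory the signer has finished writing (assumed)
    base=$2    # directory holding releases/ and the 'current' symlink (assumed)

    mkdir -p "$base/releases"

    # Stage into a fresh, uniquely named directory that no client reads yet.
    new=$(mktemp -d "$base/releases/rel.XXXXXX")
    chmod 755 "$new"
    cp -R "$src/." "$new/"

    # Build the new symlink under a temporary name, then rename it over
    # 'current'. The rename replaces the old link in one step, so a reader
    # resolves 'current' to either the old tree or the new one, never a mix.
    tmp_link="$base/.current.$$"
    rm -f "$tmp_link"
    ln -s "releases/$(basename "$new")" "$tmp_link"
    mv -T "$tmp_link" "$base/current"   # -T is GNU-specific (assumption)
}
```

A call such as 'publish /path/to/signer-output /path/to/rsync-root'
(both paths made up) leaves the previous release directory untouched on
disk. Whether already connected clients keep reading that older tree
depends on the rsync daemon resolving the symlink once per connection,
which is assumed here rather than verified, and removing old releases
after clients have drained is left out of the sketch.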