[routing-wg] Delay in publishing RPKI objects

George Michaelson ggm at algebras.org
Thu Feb 18 01:59:31 CET 2021

On Thu, Feb 18, 2021 at 10:37 AM Randy Bush <randy at psg.com> wrote:
>
> > To refresh the stack, can you give me an instance please?
> >
> > >  then you need to fix operational deployment.
> >
> > Thats work-in-progress. We were hoping to move on a process design to
> > get there, while we finish that deployment. Almost all children NOT in
> > hosted, are RRDP active.  I would be very surprised if the majority
> > use case now, is not RRDP active.
> >
> > > then you  can measure the net to be sure everybody is serving rrdp properly.
> >
> > That sounds like a fine activity for somebody ELSE to do, to me.
>
> see our imc 2020 paper

The data is from January-April 2020. It would be interesting to see
how the landscape has changed by April 2021 I think. Two reasons: the
publishing side may well have changed, and the RP side has definitely
changed in some ways. Not that it invalidates the IMC paper: far from
it. The point would be to see if it can help show there has been a
substantive change in the system overall.

Do you think a re-measure is achievable as a low(ish) cost activity?

>
> >> but we have had this discussion before.
> >
> > Yea, I know, but the problem is we've arrived at needing to boost
> > resiliency against scale, and rsync is a really poor fit for the
> > problem because of the fact most CDN choices are tuned for HTTP and
> > not arbitrary TCP protocols.
>
> your emergency due to lack of planning and action does not motivate me

I think this is a poor characterisation of what should be done, and
what the cost/benefit issues are.

Suffice to say we have plans, and we are acting.

The "emergency", such as it is, is that during the deployment
and planning, service levels are going to continue to be open to
question. I have this work timed for Q3/4 in 2021 because I have a
larger body of un-related work in Q1/2.  The distribution of service
into self-hosted raises concerns for me that no amount of work in the
RIR will fix. We have been promoting "publish in parent" because it
helps to reduce the points of connection, which are going to tend to be
SPOFs for many self-hosted people until they also put their publication
states into a resilient fabric. We're improving our own resiliency all
the time. I discussed some RTT outcomes today with Job: in RRDP he can
see 300ms drop to 5ms from the CDN/DNS solution we use, which is a
significant improvement in RTT and load sharing. I cannot achieve
that in the non-web protocol because nobody can offer a cache for the
datastream in question. I can do better than 300ms delay (which RobA
frequently pointed out made APNIC look particularly egregious on the
long-haul datapath, because rsync is an innately serialised read/write
function) if I can get enough points of presence behind rsync, but
then I get a coherency problem, which the CDN-for-web guys solved. It's
just hard to fix this in rsync. You know this, and it's one of the
reasons I wanted to promote deprecation.
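
To put rough numbers on why the RTT matters so much for a serialised
protocol, here is a back-of-envelope sketch. The 300ms and 5ms figures
are the ones quoted above; the round-trip count is a made-up
illustrative value, not a measurement of rsync or RRDP, and the model
ignores bandwidth and server time entirely.

```python
def wait_time_ms(round_trips: int, rtt_ms: float) -> float:
    """Network wait when each exchange must finish before the next starts."""
    return round_trips * rtt_ms


# ASSUMPTION: 100 serialised exchanges per sync, chosen only for illustration.
ROUND_TRIPS = 100

long_haul_ms = wait_time_ms(ROUND_TRIPS, 300.0)  # long-haul path, 300ms RTT
cdn_edge_ms = wait_time_ms(ROUND_TRIPS, 5.0)     # nearby CDN edge, 5ms RTT

print(f"long haul: {long_haul_ms / 1000:.1f}s of waiting")
print(f"CDN edge:  {cdn_edge_ms / 1000:.1f}s of waiting")
print(f"ratio:     {long_haul_ms / cdn_edge_ms:.0f}x")
```

Whatever the real round-trip count is, the waiting time scales linearly
with RTT under this model, so the ratio between the two cases stays the
same.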

It might help if publication-as-a-service was a thing, and we all
decided to put the publication burden onto prime agents we paid to do
this under SLA. That has problems of its own in terms of governance;
maybe it needs to be a market. But that's kind of how the DNS works.
There's a label, it's served by different people, sometimes they
administrate the boxes directly, sometimes they use intermediaries, we
measure the effectiveness of them against load, it mostly works.

I wouldn't have a problem if there was a declared market price to do
publication protocol into AWS, Cloudflare, Fastly, GCP, same protocol
endpoint, they do the rest once you write objects in.  It might well
be significantly more resilient than what we're trying to do now.
Hosting the TA function, the HSM bound functions, I don't think we've
hit significant stresses yet. RIPE are looking at dual-redundant
signer models. There are cloud-HSM services.

-G

>
> randy
