This archive is retained to ensure existing URLs remain functional. It will not contain any emails sent to this mailing list after July 1, 2024. For all messages, including those sent before and after this date, please visit the new location of the archive at https://mailman.ripe.net/archives/list/ripe-atlas@ripe.net/

[atlas] New on RIPE Labs: Another Look at RIPE Atlas Probe Lifetimes

Robert Kisteleki robert at ripe.net
Thu Aug 25 19:44:24 CEST 2016

Hello,

Some clarifications below.

On 2016-08-25 16:51, Max Mühlbronner wrote:
> Hi,
> 
> I noticed another problem besides the hardware: the "controllers" are not
> really highly available? (E.g. ctr-ams07, NL)

They are not individually highly-available, but we have many of them for
redundancy. So if some of them go down, only a portion of the network is
affected.

> Sometimes the probes are disconnected from a RIPE controller for some time
> even though there is absolutely no network issue at the probe's site. I
> suppose this could happen from time to time, but it happens too often. (Maybe
> it's just me; maybe everyone should check the connection history at RIPE Atlas.)

Every now and then, the NCC network has a hiccup or undergoes scheduled
maintenance. This can break the connections of the probes currently
connected to us. We have controllers in other networks too (in Germany, US
east/west and Singapore), which of course can have similar issues.

As you probably know, probes keep an open TCP (SSH) connection to the
controller. If any part of the network between the probe and the controller
fails hard enough, there is a disconnection. In other words, probe
disconnection can be caused by issues in the probe's network, in ours, or in
anything in between :)

> Also, if a controller is not working, it sometimes seems to take forever
> until the probe uses a failover controller, and in the meantime it's
> "disconnected"... Any chance to improve the availability?

Generally speaking, probes try hard to get back to the same controller they
were connected to before disconnection. If this does not succeed for two
hours, then they connect to the registration servers ("trust anchors" if you
will) to ask for a new controller. In the meantime they continue to measure
and buffer results -- though they indeed show up as disconnected and
therefore can't participate in the newest measurements.

We have this mechanism in place to avoid probes jumping around from
controller to controller in case there's a glitch somewhere.
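
The reconnection behaviour described above can be sketched roughly as follows.
This is only an illustration, not the actual probe firmware: the class name,
method names and the second controller name are hypothetical, and the only
figures taken from this message are the two-hour window and the fact that
results are buffered while disconnected.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Two hours, as stated in the message; everything else here is an assumption.
RETRY_WINDOW_S = 2 * 60 * 60


@dataclass
class ProbeState:
    """Hypothetical model of a probe's controller-selection state."""
    controller: str
    disconnected_since: Optional[float] = None
    buffered: List[str] = field(default_factory=list)

    def on_disconnect(self, now: float) -> None:
        # Remember only the start of the outage, not every failed retry.
        if self.disconnected_since is None:
            self.disconnected_since = now

    def next_target(self, now: float) -> Tuple[str, Optional[str]]:
        """Decide where the probe should try to connect next."""
        if self.disconnected_since is None:
            return ("connected", self.controller)
        if now - self.disconnected_since < RETRY_WINDOW_S:
            # Stick to the previous controller to avoid jumping around.
            return ("retry_same_controller", self.controller)
        # After two hours, ask a registration server for a new controller.
        return ("ask_registration_server", None)

    def record(self, result: str) -> None:
        # While disconnected, results are buffered; when connected we
        # assume they are sent directly, so nothing is stored here.
        if self.disconnected_since is not None:
            self.buffered.append(result)

    def on_connect(self, controller: str) -> List[str]:
        """Mark the probe connected and return the buffered results to send."""
        self.controller = controller
        self.disconnected_since = None
        flushed, self.buffered = self.buffered, []
        return flushed
```

For example, a probe that loses ctr-ams07 at time 0 would keep retrying that
controller for the first two hours and only then fall back to a registration
server, while any results recorded in between are kept and handed over once a
connection is re-established.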

Hope this clarifies,
Robert

> Best Regards
> 
> 
> Max M.
> 
> 
> On 25.08.2016 16:37, Colin Johnston wrote:
>> Maybe virtual (VM) probes might be useful to investigate, and also a split
>> file system for v3 probes, with /var/log on a different file system?
>>
>> Colin
>>
>>> On 25 Aug 2016, at 15:30, Mirjam Kuehne <mir at ripe.net> wrote:
>>>
>>>
>>> Dear colleagues,
>>>
>>> We took another, more detailed look at probe lifetimes and the dynamics
>>> of probes connecting and disconnecting from the RIPE Atlas
>>> infrastructure to try to understand how to keep the network growing in
>>> the long term.
>>>
>>> https://labs.ripe.net/Members/wilhelm/another-look-at-ripe-atlas-probe-lifetimes?pk_campaign=labs&pk_kwd=list-mat
>>>
>>>
>>> Kind regards,
>>> Mirjam Kuehne
>>> RIPE NCC
>>>
>>
