This archive is retained to ensure existing URLs remain functional. It will not contain any emails sent to this mailing list after July 1, 2024. For all messages, including those sent before and after this date, please visit the new location of the archive at https://mailman.ripe.net/archives/list/ripe-atlas@ripe.net/

[atlas] Probe uptime?


Viktor Naumov vnaumov at ripe.net
Mon Feb 1 12:04:53 CET 2016

Hi Max,

ctr-nue19 was rebooted around 11 o'clock UTC because it was not responding.

The logic is as follows: if a controller doesn't send heartbeats for more 
than 2 hours, it is excluded from the list of available controllers.
The algorithm that assigns a probe to a controller tries to assign the 
probe to the controller it was last connected to. So in your case, your 
probe had to migrate to another controller (ctr-ams07) after 2 hours 
of trying to connect to the last controller.
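
As a rough illustration of the logic described above (this is only a sketch, not the actual controller code): the function names, the data layout and the fallback choice when the previous controller is unavailable are all assumptions made for the example. Only the 2-hour threshold and the "prefer the last controller" rule come from the description.

```python
from datetime import datetime, timedelta

# Threshold taken from the description above.
HEARTBEAT_TIMEOUT = timedelta(hours=2)


def available_controllers(last_heartbeat, now):
    """Return the controllers whose last heartbeat is at most 2 hours old.

    last_heartbeat maps a controller name to the time of its last
    heartbeat (hypothetical data layout).
    """
    return [
        name
        for name, seen in last_heartbeat.items()
        if now - seen <= HEARTBEAT_TIMEOUT
    ]


def assign_controller(last_controller, last_heartbeat, now):
    """Prefer the controller the probe was connected to last time."""
    candidates = available_controllers(last_heartbeat, now)
    if last_controller in candidates:
        return last_controller
    # Assumption: the real fallback policy is not described in this
    # thread; picking the first name alphabetically is a placeholder.
    return min(candidates) if candidates else None
```

With this sketch, a probe keeps being pointed at its previous controller until that controller has been silent for more than 2 hours, and only then is it given a different one.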

It is not necessarily downtime. It is shown in the report as 
disconnection time. During that time your probe still does measurements, 
but it doesn't send results to the controller.
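
For reference, the "Disconnected for" value in the report can be reproduced from the two timestamps in the connection history quoted below. This is a small sketch using only the Python standard library; the helper name is made up for the example, and the timestamp format is assumed from the way the report prints it.

```python
from datetime import datetime

# Timestamp format as it appears in the connection history report.
FMT = "%Y-%m-%d %H:%M:%S"


def disconnected_for(disconnected_utc, next_connected_utc):
    """Time between a disconnect and the next successful connection."""
    start = datetime.strptime(disconnected_utc, FMT)
    end = datetime.strptime(next_connected_utc, FMT)
    return end - start


# Disconnect from ctr-nue19, then the next connection to ctr-ams07.
gap = disconnected_for("2016-01-31 06:32:02", "2016-01-31 08:48:48")
print(gap)  # 2:16:46
```

That matches the 2h 16m in the report, so the value is the time without a controller connection, not a measured outage of the probe itself.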

We are introducing probe uptime metrics, but they are still in the 
making, not yet exposed to users and not included in the billing.

WBR

/vty

On 2/1/16 11:29 AM, Max Mühlbronner wrote:
>
> Even worse:
>
>
> Internet Address  Controller     Connected (UTC)      Connected for  Disconnected (UTC)   Disconnected for
> 1xx.xxx.xxx.xx    ctr-ams07, NL  2016-01-31 08:48:48  1d 1h 38m      Still Connected
> 1xx.xxx.xxx.xx    ctr-nue19, DE  2016-01-29 05:07:31  2d 1h 24m      2016-01-31 06:32:02  2h 16m
>
>
> Could it be a problem with the RIPE "controllers"? E.g. if they are 
> offline, my probe also has downtime. How fast do they fail over to 
> another controller? (Why was it offline for 2 hours before it switched 
> to another one?)
>
>
> Best Regards
>
> Max M.
>
> On 01.02.2016 09:50, Max Mühlbronner wrote:
>> Hi,
>>
>> I suspect that there is a problem with my Atlas probe. Sometimes it 
>> is just gone / offline for ~10 minutes (more or less) although there 
>> is clearly no issue on the network. Other checks and monitoring work 
>> fine at the same time and show about 100% uptime...
>>
>> I am not sure how to investigate. Is there any chance to check the 
>> uptime of the probe and not just the connection time? Maybe the 
>> device is hanging or rebooting?
>>
>>
>> Internet Address  Controller     Connected (UTC)      Connected for  Disconnected (UTC)   Disconnected for
>> xxx.xxx.xxx.xx    ctr-nue19, DE  2016-01-06 17:03:27  5d 22h 40m     2016-01-12 15:44:09  0h 5m
>> xxx.xxx.xxx.xx    ctr-nue19, DE  2016-01-01 00:29:44  5d 16h 16m     2016-01-06 16:46:33  0h 16m
>> xxx.xxx.xxx.xx    ctr-nue19, DE  2015-12-31 23:47:35  0h 38m         2016-01-01 00:25:40  0h 4m
>> xxx.xxx.xxx.xx    ctr-nue19, DE  2015-12-31 22:59:18  0h 41m         2015-12-31 23:40:32  0h 7m
>> xxx.xxx.xxx.xx    ctr-nue19, DE  2015-12-31 22:35:12  0h 20m         2015-12-31 22:56:08  0h 3m
>>
>>
>> Best Regards
>>
>>
>> Max M.
>>
>
>
>
