This archive is retained to ensure existing URLs remain functional. It will not contain any emails sent to this mailing list after July 1, 2024. For all messages, including those sent before and after this date, please visit the new location of the archive at https://mailman.ripe.net/archives/list/[email protected]/
[atlas] RIPE Atlas probe status issues
- Previous message (by thread): [atlas] Request to RIPE Atlas probe hosts in Russia
- Next message (by thread): [atlas] RIPE Atlas probe status issues
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Chris Amin
camin at ripe.net
Mon Apr 23 15:37:22 CEST 2018
Dear RIPE Atlas users,

There were various issues relating to the recorded status of RIPE Atlas probes over the weekend. This was brought to our attention by internal monitoring and by information that users provided on the mailing list. Throughout this period most probes did actually remain connected to controllers, and measurement results were collected as normal.

The side effects included:

* the number of probes reported as connected by the system was lower than it should have been
* the status (connected/disconnected) of many probes was incorrect
* new measurements took longer than usual to start
* fewer probes than usual were available for new measurements, leading in some cases to “no suitable” probes messages when trying to schedule new measurements
* various system tags were incorrectly applied, including many probes being marked as having USB problems when this was not the case
* temporary discrepancies with crediting/debiting of RIPE Atlas credits for the connected time of probes

The issues were caused by a bug fix deployment on Friday at 9AM UTC, in which a package was accidentally downgraded, causing a regression to an old bug in the task handling of the central system. This bug caused a backlog of messages to build, slowing down or stopping the registering of various status messages in the system. Problems built up gradually as the backlog increased, until the root cause was identified on Sunday morning. The issue was then fixed and the system stabilized completely by about 10AM UTC.

We have identified procedural and technical solutions that will stop this problem happening again, and are looking at ways to improve our monitoring of these kinds of issues.

We apologise for any inconvenience or confusion caused by this event and would like to thank all of you who took the time to notify us of what you were seeing.

Kind regards,
Chris Amin
RIPE NCC