This archive is retained to ensure existing URLs remain functional. It will not contain any emails sent to this mailing list after July 1, 2024. For all messages, including those sent before and after this date, please visit the new location of the archive at https://mailman.ripe.net/archives/list/[email protected]/
[atlas] Incident report for 2019-10-02 (was: Error: No suitable probes and delayed results)
- Previous message (by thread): [atlas] Incident report for 2019-10-02 (was: Error: No suitable probes and delayed results)
- Next message (by thread): [atlas] Credits for Research
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Moritz Muller
moritz.muller at sidn.nl
Thu Oct 3 13:15:44 CEST 2019
Hi Robert,

Thanks a lot for the update and good luck with the review.

Moritz

> On 3 Oct 2019, at 11:45, Robert Kisteleki <robert at ripe.net> wrote:
>
>
> On 2019-10-03 08:16, Moritz Muller wrote:
>> Hi,
>>
>> In our experiment we’re trying to assign certain probes to a ping measurement but until now we always get the error message "NO SUITABLE PROBES".
>> According to the documentation, this is a sign that a probe might not have enough resources.
>> However, when I check on the measurement a few hours later I do see results, but its state is still "NO SUITABLE PROBES".
>> See https://atlas.ripe.net/measurements/23016956/#!probes
>>
>> Is that a common problem when selecting certain probes for measurements?
>>
>> Moritz
>>
>
> Hello,
>
> Yesterday afternoon we had an operational problem within RIPE Atlas that
> had consequences visible to users. I strongly suspect the above is a
> side-effect of this.
>
> Due to a combination of two configuration errors and a spike in requests
> from users, the core infrastructure received an unreasonably high amount
> of measurement requests from an internal process related to IPmap. The
> measurements scheduler and participant management subsystems struggled
> to keep up with this load and eventually things started piling up.
>
> The issue started at approximately 12 UTC. We identified the root cause
> about an hour later. Processes started to normalise late in the
> afternoon, and processing the backlog finished sometime after midnight.
>
> We're working on a post-mortem, and reviewing the code and configuration
> in order to prevent this error from happening again.
>
> Apologies for the inconvenience,
> Robert
>

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 488 bytes
Desc: Message signed with OpenPGP
URL: </ripe/mail/archives/ripe-atlas/attachments/20191003/aa3f3c23/attachment.sig>