[atlas] Email or SMS alert when probe goes offline/online
Greg B - NANOG
gwbnanog at gmail.com
Wed Dec 14 18:36:29 CET 2011
Robert,

That's great, and I do hope the firmware update helps in at least some of these situations.

Looking at my list of the last 25 connections, I also see downtimes of 4 hours, 6 hours, and twice 1 hour over the last month. I'm pretty sure my internet connection wasn't actually down for these long periods, since I monitor it from another location (my office) and that monitoring doesn't show these outages.

So I do hope a feature is added in the near future that lets the probe host set the threshold for a probe-down notification in minutes, instead of the default of 5 days.

-Greg

On Wed, Dec 14, 2011 at 9:05 AM, Robert Kisteleki <robert at ripe.net> wrote:

> Hi,
>
> On 2011.12.14. 5:37, Greg B - NANOG wrote:
> > Hi,
> > I see there was a thread started back on September 7, 2011 with
> > subject: Email or SMS alert when probe goes offline/online
> > this was prior to me joining the mailing list.
> >
> > I'd like to voice my support for a user-configurable amount of time for
> > the Atlas system to send out an email notification that your probe is
> > down (and returned to service).
>
> Indeed, this is on our list -- but see also below.
>
> > My probe which I run on my home internet connection was apparently down
> > for 3.5 days before I just happened to log in to look at the stats.
> > Considering I was at home for much of these 3.5 days, and my Internet
> > connection was working, I assume the probe crashed because simply
> > power-cycling it "fixed" the problem.
> >
> > I know that if I got an email ~15 minutes after the probe was down, my
> > probe's downtime would probably have been closer to about 30 minutes
> > rather than 3.5 days.
>
> A little background story:
>
> We have identified a particular condition on the probes where the probe
> refuses to connect back to our infrastructure after a disconnect (which
> can be caused by a network hiccup anywhere between the probe and our
> infrastructure, for example). This particular issue happens in low memory
> situations. The probe still does measurements happily, it just cannot
> connect to us and send the results in.
>
> After a while, the storage on the probe fills up, so as a best effort the
> probe reboots -- which fixes the low memory situation and then everything
> is back to normal again. The punch line: the probe's local storage, as
> with the current configuration, fills up in about 3.5 days...
>
> We're rolling out a new firmware (4.280) to address this. So, unless there
> are other similar conditions, after upgrading you will not see 3.5 day
> downtimes. Fingers crossed :-)
>
> Regards,
> Robert
>
> > Thanks.
> >
> > -Greg
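To make the request above concrete, here is a minimal sketch of a per-host notification threshold. It is only an illustration of the idea discussed in this thread, not how RIPE Atlas implements or plans to implement alerts. The function names `should_alert` and `long_outages`, the `threshold_minutes` parameter, and the sample timestamps are all hypothetical.

```python
from datetime import datetime, timedelta


def should_alert(last_seen, now, threshold_minutes=15):
    """Return True once the probe has been silent longer than the threshold.

    threshold_minutes is a hypothetical per-host setting; the thread asks for
    something like this instead of a fixed 5-day default.
    """
    return now - last_seen > timedelta(minutes=threshold_minutes)


def long_outages(outages, threshold_minutes=15):
    """Filter (disconnected, reconnected) pairs down to those that would
    have triggered an alert under the given threshold."""
    limit = timedelta(minutes=threshold_minutes)
    return [(down, up) for down, up in outages if up - down > limit]


# Illustrative data only: one 5-minute blip and one 4-hour outage.
history = [
    (datetime(2011, 12, 1, 8, 0), datetime(2011, 12, 1, 8, 5)),
    (datetime(2011, 12, 3, 10, 0), datetime(2011, 12, 3, 14, 0)),
]

flagged = long_outages(history, threshold_minutes=15)
print(len(flagged))
print(should_alert(datetime(2011, 12, 10, 8, 0), datetime(2011, 12, 10, 8, 20)))
```

With a 15-minute threshold, the short blip is ignored and only the multi-hour outage is flagged, which matches the kind of alerting asked for above.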