<html><head></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; ">We also need to be quite clear in our communication about what "probe down" means, and that data is still being collected.<div><br></div><div>Daniel</div><div><br><div><div>On 14.12.2011, at 18:36, Greg B - NANOG wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite">Robert,<div>That's great and I do hope the firmware update helps at least some of these situations.</div><div><br></div><div>Looking at the list of my last 25 connections, I also see downtimes of 4 hours, 6 hours, and 1 hour (twice) over the last month. I'm pretty sure my Internet connection wasn't actually down for these long periods, since I monitor it from another location (my office) and that monitoring doesn't show these outages. So I do hope a feature is added in the near future that lets the probe host set the probe-down notification threshold in minutes, instead of the default of 5 days.</div>
<div><br></div><div>-Greg<br><br><div class="gmail_quote">On Wed, Dec 14, 2011 at 9:05 AM, Robert Kisteleki <span dir="ltr">&lt;<a href="mailto:robert@ripe.net">robert@ripe.net</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Hi,<br>
<div class="im"><br>
On 2011.12.14. 5:37, Greg B - NANOG wrote:<br>
> Hi,<br>
> I see there was a thread started back on September 7, 2011 with<br>
> subject: Email or SMS alert when probe goes offline/online<br>
> this was prior to me joining the mailing list.<br>
><br>
> I'd like to voice my support for a user-configurable amount of time for the<br>
> Atlas system to send out an email notification that your probe is down (and<br>
> returned to service).<br>
<br>
</div>Indeed, this is on our list -- but see also below.<br>
<div class="im"><br>
> My probe which I run on my home internet connection was apparently down for<br>
> 3.5 days before I just happened to log in to look at the stats. Considering I<br>
> was at home for much of these 3.5 days, and my Internet connection was<br>
> working, I assume the probe crashed because simply power-cycling it "fixed"<br>
> the problem.<br>
><br>
> I know that if I got an email ~15 minutes after the probe was down, my<br>
> probe's downtime would probably have been closer to 30 minutes rather<br>
> than 3.5 days.<br>
<br>
</div>A little background story:<br>
<br>
We have identified a particular condition on the probes where the probe<br>
refuses to connect back to our infrastructure after a disconnect (which can<br>
be caused by a network hiccup anywhere between the probe and our<br>
infrastructure, for example). This particular issue happens in low memory<br>
situations. The probe still does measurements happily, it just cannot<br>
connect to us and send the results in.<br>
<br>
After a while, the storage on the probe fills up, so as a best effort the<br>
probe reboots -- which fixes the low memory situation and then everything is<br>
back to normal again. The punch line: with the current configuration, the<br>
probe's local storage fills up in about 3.5 days...<br>
<br>
We're rolling out a new firmware (4.280) to address this. So, unless there<br>
are other similar conditions, after upgrading you will not see 3.5-day<br>
downtimes. Fingers crossed :-)<br>
<br>
Regards,<br>
Robert<br>
<br>
> Thanks.<br>
><br>
> -Greg<br>
<br>
</blockquote></div><br></div>
</blockquote></div><br></div></body></html>