To celebrate, they just sent this marketing email to all of our "serviceX-alert@" addresses:
Subject: We are joining the SolarWinds family
Body: Pingdom is moving forward quickly. We can’t wait to show you all the ideas we have for taking monitoring to the next level.
But, wait, our "serviceX-alert@" emails are all hooked up to PagerDuty so that any -alert email == SMS/call the engineer on duty.
So, right, basically all of our PagerDuty alerts are going off right now, due to a damn marketing email.
They've done this crap of sending marketing emails to alert addresses instead of just user addresses before (I know they are separate in the system; our serviceX-alert emails are not on the "Users" page), but we figured it was a one-time fluke and surely they would realize their mistake.
Guess not.
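For anyone in the same spot, one stopgap is a filter sitting between the alert mailbox and whatever pages you. This is only a rough sketch, and it assumes the marketing blast carries the usual bulk-mail headers (`List-Unsubscribe`, or `Precedence: bulk`), which I have not verified for this particular email. The function names `looks_like_bulk_mail` and `should_page` are made up for illustration and are not part of any Pingdom or PagerDuty API.

```python
from email import message_from_string
from email.message import Message

# Precedence values commonly used by mailing-list and marketing senders.
BULK_PRECEDENCE = {"bulk", "list", "junk"}


def looks_like_bulk_mail(msg: Message) -> bool:
    """Heuristic: treat mail with bulk/list headers as non-alert mail."""
    # Assumption: newsletters usually include List-Unsubscribe,
    # while machine-generated alerts usually do not.
    if msg.get("List-Unsubscribe") is not None:
        return True
    precedence = (msg.get("Precedence") or "").strip().lower()
    return precedence in BULK_PRECEDENCE


def should_page(raw_email: str) -> bool:
    """Return True if the raw email should be forwarded to the pager."""
    msg = message_from_string(raw_email)
    return not looks_like_bulk_mail(msg)
```

It errs on the side of paging: anything without those headers still goes through, so a real alert should not get swallowed, but a marketing email without them would still wake someone up.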
This just builds on my already huge frustration with their UI. Besides being generally confusing, if you have failed pings, you don't get the HTTP logs ("Root Cause analysis") for all of them; you only get the HTTP logs for the magical 1st one from when the incident triggered.
Oh, and if you make a duplicate alarm for the sole purpose of seeing the latest HTTP logs from your server, surprise, the 1st failed log won't have them. You first have to make your new alert pass, and then fail, and only then are you granted access to the magical Root Cause analysis logs.
Well, your comment about the 'root cause analysis' suggests you need something more substantial than this, but I've been very pleasantly surprised with www.uptimerobot.com.
I have both Pingdom and Uptime Robot doing basic monitoring of a single web endpoint, and Uptime Robot consistently alerts me to downtime by email faster than Pingdom.
Because Uptime Robot is free and doesn't seem to have a business model, I didn't take them that seriously. Their performance suggests I should have.
If you need something more robust and flexible, give www.panopta.com a try. We've been using them for a while and are pretty happy with their support, accuracy and flexibility.
Suggestions for competitors?