Interesting. Our contribution to monitoring DNS is to make sure our HTTP uptime checker runs over both IPv4 and IPv6 as separate checks. That way, if we mess up our AAAA record but not our A record (or vice versa), we still get a failure. I've looked at some commercial SaaS uptime checkers and no one seems to offer this as a feature. Have I missed one that does?
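To make the idea concrete, here is a minimal sketch of checking one host over each address family separately, using only the Python standard library. It is not any particular uptime checker's implementation; the function names (`addresses_for`, `can_connect`, `check_both`) are made up for illustration, and it only tests a TCP connect rather than a full HTTP request.

```python
import socket


def addresses_for(host, family):
    """Return the sorted unique addresses `host` resolves to in one family."""
    try:
        infos = socket.getaddrinfo(host, None, family, socket.SOCK_STREAM)
    except socket.gaierror:
        # No record of this family (e.g. a missing AAAA) ends up here.
        return []
    return sorted({info[4][0] for info in infos})


def can_connect(host, port, family, timeout=5.0):
    """Try a TCP connect to `host` using only addresses of `family`."""
    try:
        infos = socket.getaddrinfo(host, port, family, socket.SOCK_STREAM)
    except socket.gaierror:
        return False
    for _family, socktype, proto, _canon, sockaddr in infos:
        try:
            with socket.socket(family, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(sockaddr)
                return True
        except OSError:
            continue  # try the next address of the same family
    return False


def check_both(host, port=443):
    """Run the IPv4 and IPv6 checks independently so either one can fail."""
    return {
        "ipv4": can_connect(host, port, socket.AF_INET),
        "ipv6": can_connect(host, port, socket.AF_INET6),
    }
```

Calling `check_both("example.com")` returns a dict with one boolean per family, so a broken AAAA record shows up as `"ipv6": False` even while the A record keeps working.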
The way to do this in Prometheus is to use the following settings on the blackbox module:
https://github.com/prometheus/blackbox_exporter/blob/master/...
So you need two modules, one for each IP version. As for automating the setup, we deploy our Prometheus server with Salt, so we can use Jinja templating in all our Prometheus config files. That really cuts down on repeated boilerplate.
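As a rough sketch of the "one module per IP version" idea, the snippet below renders two blackbox_exporter HTTP modules from a single template. Assumptions: the link above is truncated, so I am guessing the settings meant are `preferred_ip_protocol` and `ip_protocol_fallback`, which are the blackbox_exporter options I know of for pinning the IP version. The module names (`http_2xx_ipv4`, `http_2xx_ipv6`) and the timeout are made up, and plain `str.format` stands in for the Jinja templating mentioned above.

```python
# Stand-in for Jinja templating: render one blackbox_exporter HTTP module
# per IP version. Module names and the 5s timeout are illustrative only.
MODULE_TEMPLATE = """\
  {name}:
    prober: http
    timeout: 5s
    http:
      preferred_ip_protocol: "{ip_protocol}"
      ip_protocol_fallback: false
"""


def render_modules(ip_protocols=("ip4", "ip6")):
    """Return the text of a `modules:` section with one module per protocol."""
    parts = ["modules:\n"]
    for proto in ip_protocols:
        # "ip4" -> "http_2xx_ipv4", "ip6" -> "http_2xx_ipv6"
        name = "http_2xx_ipv" + proto[-1]
        parts.append(MODULE_TEMPLATE.format(name=name, ip_protocol=proto))
    return "".join(parts)


if __name__ == "__main__":
    print(render_modules(), end="")
```

Turning off the fallback is the important part: with it enabled, a probe that prefers IPv6 could quietly succeed over IPv4 and hide a broken AAAA record.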
This is also interesting for other reasons: in host downtime situations you can sometimes see a host drop one type of traffic and not the other.
checkys.net does this, but it's not an uptime monitoring system; it is a connectivity checking system. You can specify ip6 or ip4, or just ip if you don't care. Some people have both A and AAAA on the same name while others do not, and most uptime monitoring systems are not technical enough to know or care about the difference.
I thought this was about monitoring all of the outgoing DNS queries.
If you're interested in that, CoreDNS has a Prometheus exporter that lets you see how many requests you've made through it. You can therefore use CoreDNS as a "proxy" DNS to log all of your DNS queries.
In my case, I modified the CoreDNS code to also include the request in the exported metrics - with this you can see all of your DNS requests (over time) and see if there was something odd.
> In my case, I modified the CoreDNS code to also include the request in the exported metrics - with this you can see all of your DNS requests (over time) and see if there was something odd.
The use case here was a single machine. Of course cardinality becomes a problem in the long run, but for a couple of days' worth of data it worked quite nicely.
On top of that, at least in my case, I don't resolve a huge number of new FQDNs.
I find the standard blackbox_exporter far too limiting and static, so I wrote an exporter which queries DNS zones from the Google API and creates targets dynamically from that.
It also has a feature which will query internal databases to find expected targets (kinda like service discovery). This covers more specific checks than what the DNS-based targets will provide.
Together, these mean that essentially no endpoint in our infrastructure goes unmonitored in some fashion.
The exporter performs SSL checks (lifetime remaining etc) as well as providing HTTP/TCP latency metrics.
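For readers wondering what a "lifetime remaining" check involves, here is a minimal standard-library sketch. It is not the exporter described above; `fetch_not_after` and `seconds_remaining` are hypothetical helper names, and the default TLS context verifies the certificate, so an already expired certificate fails the handshake instead of returning a negative value.

```python
import socket
import ssl
import time


def seconds_remaining(not_after, now=None):
    """Seconds until a certificate's notAfter time (negative if in the past).

    `not_after` uses the format returned by getpeercert(),
    e.g. "Jan  5 09:34:43 2018 GMT".
    """
    expires = ssl.cert_time_to_seconds(not_after)
    return expires - (time.time() if now is None else now)


def fetch_not_after(host, port=443, timeout=5.0):
    """Connect with TLS and return the peer certificate's notAfter string."""
    ctx = ssl.create_default_context()  # verifies chain and hostname
    with socket.create_connection((host, port), timeout=timeout) as raw:
        with ctx.wrap_socket(raw, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]
```

A check would call `seconds_remaining(fetch_not_after("example.com"))` and alert when the result drops below some threshold, such as two weeks.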
Telegraf can expose Prometheus-scrapable metrics and has a DNS plugin that can monitor several targets with a single config stanza, which was a problem with blackbox_exporter according to TFA. Maybe it could be an alternative?
https://github.com/influxdata/telegraf/blob/release-1.29/plu...
To be clear, this is the blog of one person at the University of Toronto, representing only himself, who has been working on Unix and Linux systems for, oh, probably 40 years at this point.
Monitoring whether your DNS servers are running is different from monitoring whether the world can resolve your domain and server names, whether they return v4 and v6 records, or even whether what they resolve to is in the right IP blocks. It depends on what you really need to know.
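The "right IP blocks" check can be sketched with Python's `ipaddress` module. The function name `outside_expected` is made up for illustration, and the addresses in the usage note are documentation ranges, not real infrastructure.

```python
import ipaddress


def outside_expected(addresses, expected_blocks):
    """Return the addresses that fall outside every expected network.

    `addresses` are strings as returned by a resolver (IPv4 or IPv6);
    `expected_blocks` are CIDR strings such as "192.0.2.0/24".
    """
    networks = [ipaddress.ip_network(block) for block in expected_blocks]
    stray = []
    for text in addresses:
        addr = ipaddress.ip_address(text)
        # Only compare against networks of the same IP version.
        if not any(addr.version == net.version and addr in net for net in networks):
            stray.append(text)
    return stray
```

For example, `outside_expected(["192.0.2.10", "198.51.100.7"], ["192.0.2.0/24", "2001:db8::/32"])` flags only `198.51.100.7`, which is the kind of result you would want to alert on.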
"the world" is a bit too wide, using some particular ones like in the example may trigger false alarms if they can't be reached or they have some internal problem. But for widely enough used ones that may be something to be aware of anyway.