{"id":1532,"date":"2010-07-03T16:12:00","date_gmt":"2010-07-03T15:12:00","guid":{"rendered":"http:\/\/www.devco.net\/?p=1532"},"modified":"2010-08-17T12:03:08","modified_gmt":"2010-08-17T11:03:08","slug":"aggregating_nagios_checks_with_mcollective","status":"publish","type":"post","link":"https:\/\/www.devco.net\/archives\/2010\/07\/03\/aggregating_nagios_checks_with_mcollective.php","title":{"rendered":"Aggregating Nagios Checks With MCollective"},"content":{"rendered":"

A very typical scenario I come across on many sites is the requirement to monitor something like Puppet across hundreds or thousands of machines.<\/p>\n

The typical approaches are to add a central check on your Puppet master or to check every node using NRPE or NSCA. For Puppet the option exists to check on the master and get a single check, but that isn’t always easily achievable.<\/p>\n

Think, for example, about monitoring mail queues on all your machines to make sure things like root mail aren’t getting stuck. In those cases you are forced to do per-node checks, which inevitably result in huge notification storms when your mail server is down and not receiving mail from the many nodes.<\/p>\n

MCollective<\/a> has long had a plugin that can run NRPE commands<\/a>; I’ve now added a Nagios plugin that uses this agent to combine results from many hosts.<\/p>\n

Sticking with the Puppet example, here are my needs:<\/p>\n