My previous homebrew backup system had a number of drawbacks. One of the biggest was that its daily emails were massive, listing every file that was backed up.
With a lot of machines being backed up, these mails can come to several MB per day, and human nature means I just didn’t pay them enough attention. For instance, if the tar died halfway through on a given day, I would only notice by manually inspecting the mail, which made the reports pretty useless.
Bacula provides good one-page job status emails on a daily basis, but I still tend not to look at them because I get about 20 of them a day. The ideal situation is to have it mail you only on errors, and it does support this. There is one problem with this, though: if anything prevents the mail from getting to you, or in fact if the whole Director process dies and no backups get run at all, you just won’t know about it.
I’ve written a per-job monitoring solution that uses Bacula’s ability to run a script on the client after a backup has completed successfully. The script writes a small status file with a timestamp, which I pull into Net-SNMP and query over the network using Nagios.
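To make the first step concrete, here is a minimal sketch of what such a post-job script could do. It is not the script from my wiki: the function name `write_status`, the `<job>.status` file naming and the choice of a plain Unix timestamp as the file content are all assumptions made for illustration.

```python
import os
import tempfile
import time
from pathlib import Path


def write_status(status_dir, job_name, now=None):
    """Record the time of the last successful run of job_name.

    Hypothetical helper: the directory layout and the ".status"
    suffix are illustrative, not something Bacula prescribes.
    """
    timestamp = int(time.time() if now is None else now)
    directory = Path(status_dir)
    directory.mkdir(parents=True, exist_ok=True)
    target = directory / f"{job_name}.status"

    # Write to a temporary file in the same directory and rename it
    # into place, so a reader never sees a half-written status file.
    fd, tmp_name = tempfile.mkstemp(dir=directory, prefix=target.name + ".")
    with os.fdopen(fd, "w") as handle:
        handle.write(f"{timestamp}\n")
    os.replace(tmp_name, target)
    return target
```

Because the file is only written after a successful job, a failed job, a dead client or a dead Director all look the same to the monitoring side: the timestamp simply stops advancing.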
Now if any of my jobs fail, or if the whole backup system collapses, Nagios will notify me via my existing notification systems, email and SMS in my case. I will still get the error mails from Bacula, but I do not rely on them at all; they are there purely for information, so I can use them to quickly investigate an error once Nagios has alerted me.
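The check that drives those alerts comes down to asking how old the last successful backup is. The sketch below shows that logic under some assumptions: the status file holds a single Unix timestamp, the function name `check_backup_age` and the threshold parameter are hypothetical, and the SNMP transport I actually use between client and Nagios is left out. The exit codes are the standard Nagios plugin ones.

```python
import time
from pathlib import Path

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_backup_age(status_file, max_age_seconds, now=None):
    """Return (exit_code, message) for a file holding a Unix timestamp.

    Hypothetical check: the file format and threshold are assumptions.
    """
    now = time.time() if now is None else now
    try:
        last_success = int(Path(status_file).read_text().strip())
    except FileNotFoundError:
        # No file means no job has ever reported success.
        return CRITICAL, f"CRITICAL: {status_file} missing, no successful backup recorded"
    except ValueError:
        return UNKNOWN, f"UNKNOWN: {status_file} does not contain a timestamp"

    age = int(now - last_success)
    if age > max_age_seconds:
        return CRITICAL, f"CRITICAL: last successful backup was {age}s ago"
    return OK, f"OK: last successful backup was {age}s ago"
```

For a daily job, a threshold somewhat over 24 hours leaves room for a backup that runs a little longer than usual without raising a false alarm.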
I’ve documented this and put up the short scripts I use to achieve it; you can see the document in my wiki.