R.I.Pienaar

Graphing on the CLI

01/20/2013

I’ve recently been thinking about ways to do graphs on the CLI. We’ve written a new Puppet Agent for MCollective that can gather all sorts of interesting data from your server estate and I’d really like to be able to show this data on the CLI. This post isn’t really about MCollective though the ideas applies to any data.

I already have sparklines in MCollective, here’s the distribution of ping times:

This shows you that most of the nodes responded quickly with a bit of a tail at the end being my machines in the US.

Sparklines are quite nice for a quick overview so I looked at adding some more of this to the UI and came up with this:

Which is quite nice – these are the nodes in my infrastructure stuck into buckets and the node counts for each bucket is shown. We can immediately tell something is not quite right – the config retrieval time shows a bunch of slow machines and the slowness does not correspond to resource counts etc. On investigation I found these are my dev machines – KVM nodes hosted on HP Micro Servers so that’s to be expected.

I am not particularly happy with these graphs though so am still exploring other options, one other option is GNU Plot.

GNU Plot can target its graphs for different terminals like PNG and also line printers – since the Unix terminal is essentially a line printer we can use this.

Here are 2 graphs of config retrieval time produced by MCollective using the same data source that produced the spark line above – though obviously from a different time period. Note that the axis titles and graph title is supplied automatically using the MCollective DDL:

$ mco plot resource config_retrieval_time
 
                   Information about Puppet managed resources
  Nodes
    6 ++-*****----+----------+-----------+----------+----------+----------++
      +      *    +          +           +          +          +           +
      |       *                                                            |
    5 ++      *                                                           ++
      |       *                                                            |
      |        *                                                           |
    4 ++       *      *                                                   ++
      |        *      *                                                    |
      |         *    * *                                                   |
    3 ++        *    * *                                                  ++
      |          *  *  *                                                   |
      |           * *   *                                                  |
    2 ++           *    *                         *        *              ++
      |                 *                         **       **              |
      |                  *                       * *      *  *             |
    1 ++                 *               *       *  *     *   **        * ++
      |                  *              * *     *   *     *     **    **   |
      +           +       *  +         * + *    *   +*   *     +     *     +
    0 ++----------+-------*************--+--****----+*****-----+--***-----++
      0           10         20          30         40         50          60
                              Config Retrieval Time

So this is pretty serviceable for showing this data on the console! It wouldn’t scale to many lines but for just visualizing some arbitrary series of numbers it’s quite nice. Here’s the GNU Plot script that made the text graph:

set title "Information about Puppet managed resources"
set terminal dumb 78 24
set key off
set ylabel "Nodes"
set xlabel "Config Retrieval Time"
plot '-' with lines
3 6
6 6
9 3
11 2
14 4
17 0
20 0
22 0
25 0
28 0
30 1
33 0
36 038 2
41 0
44 0
46 2
49 1
52 0
54 0
57 1

The magic here comes from the second line that sets the output terminal to dump and supplies some dimensions. Very handy, worth exploring some more and adding to your toolset for the CLI. I’ll look at writing a gem or something that supports both these modes.

There are a few other players in this space, I definitely recall coming across a Python tool to do graphs but cannot find it now, shout out in the comments if you know other approaches and I’ll add them to the post!

Updated: some links to related projects: sparkler, Graphite Spark

Solving monitoring state storage problems using Redis

01/06/2013

Redis is an in-memory key-value data store that provides a small number of primitives suitable to the task of building monitoring systems. As a lot of us are hacking in this space I thought I’d write a blog post summarizing where I’ve been using it in a little Sensu like monitoring system I have been working on on and off.

There’s some monitoring related events coming up like MonitoringLove in Antwerp and Monitorama in Boston – I will be attending both and I hope a few members in the community will create similar background posts on various interesting areas before these events.

I’ve only recently started looking at Redis but really like it. It’s a very light weight daemon written in C with fantastic documentation detailing things like each commands performance characteristics and most documantation pages are live in that they have a REPL right on the page like the SET page – note you can type into the code sample and see your changes in real time. It is sponsored by VMWare and released under the 3 clause BSD license.

Redis Data Types


Redis provides a few common data structures:

  • Normal key-value storage where every key has just one string value
  • Hashes where every key contains a hash of key-values strings
  • Lists of strings – basically just plain old Arrays sorted in insertion order that allows duplicate values
  • Sets are a bit like Lists but with the addition that a given value can only appear in a list once
  • Sorted Sets are sets that in addition to the value also have a weight associated with it, the set is indexed by weight

All the keys support things like expiry based on time and TTL calculation. Additionally it also supports PubSub.

At first it can be hard to imagine how you’d use a data store with only these few data types and capable of only storing strings for monitoring but with a bit of creativity it can be really very useful.

The full reference about all the types can be found in the Redis Docs: Data Types

Monitoring Needs


Monitoring systems generally need a number of different types of storage. These are configuration, event archiving and status and alert tracking. There are more but these are the big ticket items, of the 3 I am only going to focus on the last one – Status and Alert Tracking here.

Status tracking is essentially transient data. If you loose your status view it’s not really a big deal it will be recreated quite quickly as new check results come in. Worst case you’ll get some alerts again that you recently got. This fits well with Redis that doesn’t always commit data soon as it receives it – it flushes roughly every second from memory to disk.

Redis does not provide much by way of SSL or strong authentication so I tend to consider it a single node IPC system rather than say a generic PubSub system. I feed data into a node using system like ActiveMQ and then for comms and state tracking on a single node I’ll use Redis.

I’ll show how it can be used to solve the following monitoring related storage/messaging problems:

  • Check Status – a check like load on every node
  • Staleness Tracking – you need to know when a node is not receiving check results so you can do alive checks
  • Event Notification – your core monitoring system will likely feed into alerters like Opsgenie and metric storage like Graphite
  • Alert Tracking – you need to know when you last sent an alert and when you can alert again based on an interval like every 2 hours

Check Status


The check is generally the main item of monitoring systems. Something configures a check like load and then every node gets check results for this item, the monitoring system has to track the status of the checks on a per node basis.

In my example a check result looks more or less like this:

{"lastcheck"        => "1357490521", 
 "count"            => "1143", 
 "exitcode"         => "0", 
 "output"           => "OK - load average: 0.23, 0.10, 0.02", 
 "last_state_change"=> "1357412507",
 "perfdata"         => '{"load15":0.02,"load5":0.1,"load1":0.23}',
 "check"            => "load",
 "host"             => "dev2.devco.net"}

This is standard stuff and the most boring part – you might guess this goes into a Hash and you’ll be right. Note the count item there Redis has special handling for counters and I’ll show that in a minute.

By convention Redis keys are name spaced by a : so I’d store the check status for a specific node + check combination in a key like status:example.net:load

Updating or creating a new hash is real easy – just write to it:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
def save_check(check)
  key = "status:%s:%s" % [check.host, check.check]
 
  check.last_state_change = @redis.hget(key, "last_state_change")
  check.previous_exitcode = @redis.hget(key, "exitcode")
 
  @redis.multi do
    @redis.hset(key, "host", check.host)
    @redis.hset(key, "check", check.check)
    @redis.hset(key, "exitcode", check.exitcode)
    @redis.hset(key, "lastcheck", check.last_check)
    @redis.hset(key, "last_state_change", check.last_state_change)
    @redis.hset(key, "output", check.output)
    @redis.hset(key, "perfdata", check.perfdata)
 
    unless check.changed_state?
      @redis.hincrby(key, "count", 1)
    else
      @redis.hset(key, "count", 1)
    end
  end
 
  check.count = @redis.hget(key, "count")
end

Here I assume we have a object that represents a check result called check and we’re more or less just fetching/updating data in it. I first retrieve the previously saved state of exitcode and last state change time and save those into the object. The object will do some internal state management to determine if the current check result represents a changed state – OK to WARNING etc – based on this information.

The @redis.multi starts a transaction, everything inside the block will be written in an atomic way by the Redis server thus ensuring we do not have any half-baked state while other parts of the system might be reading the status of this check.

As I said the check determines if the current result is a state change when I set the previous exitcode on line 5 this means lines 16-20 will either set the count to 1 if it’s a change or just increment the count if not. We use the internal Redis counter handling on line 17 to avoid having to first fetch the count and then update it and saving it, this saves a round trip to the database.

You can now just retrieve the whole hash with the HGETALL command, even on the command line:

% redis-cli hgetall status:dev2.devco.net:load
 1) "check"
 2) "load"
 3) "host"
 4) "dev2.devco.net"
 5) "output"
 6) "OK - load average: 0.00, 0.00, 0.00"
 7) "lastcheck"
 8) "1357494721"
 9) "exitcode"
10) "0"
11) "perfdata"
12) "{\"load15\":0.0,\"load5\":0.0,\"load1\":0.0}"
13) "last_state_change"
14) "1357412507"
15) "count"
16) "1178"

References: Redis Hashes, MULTI, HSET, HINCRBY, HGET, HGETALL

Staleness Tracking


Staleness Tracking here means we want to know when last we saw any data about a node, if the node is not providing information we need to go and see what happened to it. Maybe it’s up but the data sender died or maybe it’s crashed.

This is where we really start using some of the Redis features to save us time. We need to track when last we saw a specific node and then we have to be able to quickly find all nodes not seen within certain amount of time like 120 seconds.

We could retrieve all the check results and check their last updated time and so figure it out but that’s not optimal.

This is what Sorted Lists are for. Remember Sorted Lists have a weight and orders the list by the weight, if we use the timestamp that we last received data at for a host as the weight it means we can very quickly fetch a list of stale hosts.

1
2
3
def update_host_last_seen(host, time)
  @redis.zadd("host:last_seen", time, host)
end

When we call this code like update_host_last_seen(“dev2.devco.net”, Time.now.utc.to_i) the host will either be added to or updated in the Sorted List based on the current UTC time. We do this every time we save a new result set with the code in the previous section.

To get a list of hosts that we have not seen in the last 120 seconds is really easy now:

1
2
3
def get_stale_hosts(age)
  @redis.zrangebyscore("host:last_seen", 0, (Time.now.utc.to_i - age))
end

If we call this with an age like 120 we’ll get an array of nodes that have not had any data within the last 120 seconds.

You can do the same check on the CLI, this shows all the machines not seen in the last 60 seconds:

% redis-cli zrangebyscore host:last_seen 0 $(expr $(date +%s) - 60)
 1) "dev1.devco.net"

Reference: Sorted Sets, ZADD, ZRANGEBYSCORE

Event Notification


When a check result enters the system thats either a state change, a problem or have metrics associated it we’d want to send those on to other pieces of code.

We don’t know or care who those interested parties are we only care that there might be some interested parties – it might be something writing to Graphite or OpenTSDB or both at the same time or something alerting to Opsgenie or Pager Duty. This is a classic use case for PubSub and Redis has a good PubSub subsystem that we’ll use for this.

I am only going to show the metrics publishing – problem and state changes are very similar:

1
2
3
4
5
6
7
8
9
10
11
def publish_metrics(check)
  if check.has_perfdata?
    msg = {"metrics" => check.perfdata, "type" => "metrics", "time" => check.last_check, "host" => check.host, "check" => check.check}.to_json
    publish(["metrics", check.host, check.check], msg)
  end
end
 
def publish(type, message)
  target = ["overwatch", Array(type).join(":")].join(":")
  @redis.publish(target, message)
end

This is pretty simple stuff, we’re just publishing some JSON to a named destination like overwatch:metrics:dev1.devco.net:load. We can now write small standalone single function tools that consume this stream of metrics and send it wherever we like – like Graphite or OpenTSDB.

We publish similar events for any incoming check result that is not OK and also for any state transition like CRITICAL to OK, these would be consumed by alerter handlers that might feed pagers or SMS.

We’re publishing these alerts to to destinations that include the host and specific check – this way we can very easily create individual host views of activity by doing pattern based subscribes.

Reference: PubSub, PUBLISH

Alert Tracking


Alert Tracking means keeping track of which alerts we’ve already sent and when we’ll need to send them again like only after 2 hours of the same problem and not on every check result which might come in every minute.

Leading on from the previous section we’d just consume the problem and state change PubSub channels and react on messages from those:

A possible consumer of this might look like this:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
@redis.psubscribe("overwatch:state_change:*", "overwatch:issues:*") do |on|
  on.pmessage do |channel, message|
    event = JSON.parse(message)
 
    case event["type"]
      when "issue"
        sender.notify_issue(event["issue"]["exitcode"], event["host"], event["check"], event["issue"]["output"])
      when "state_change"
        if event["state_change"]["exitcode"] == 0
          sender.notify_recovery(event["host"], event["check"], event["state_change"]["output"])
        end
    end
  end
end

This subscribes to the 2 channels and pass the incoming events to a notifier. Note we’re using the patterns here to catch all alerts and changes for all hosts.

The problem here is that without any special handling this is going to fire off alerts every minute assuming we check the load every minute. This is where Redis expiry of keys come in.

We’ll need to track which messages we have sent when and on any state change clear the tracking thus restarting the counters.

So we’ll just add keys called “alert:dev2.devco.net:load:3″ to indicate an UNKNOWN state alert for load on dev2.devco.net:

1
2
3
4
5
def record_alert(host, check, status, expire=7200)
  key = "alert:%s:%s:%d" % [host, check, status]
  @redis.set(key, 1)
  @redis.expire(key, expire)
end

This takes an expire time which defaults to 2 hours and tells redis to just remove the key when its time is up.

With this we need a way to figure out if we can send again:

1
2
3
4
def alert_ttl(host, check, status)
  key = "alert:%s:%s:%d" % [host, check, status]
  @redis.ttl(key)
end

This will return the amount of seconds till next alert and -1 if we are ready to send again

And finally on every state change we need to just purge all the tracking for a given node + check combo. The reason for this is if we notified on CRITICAL a minute ago then the service recovers to OK but soon goes to CRITICAL again this most recent CRITICAL alert will be suppressed as part of the previous cycle of alerts.

1
2
3
def clear_alert_ttls(host, check)
  @redis.del(@redis.keys.grep(/^alert:#{host}:#{check}:\d/))
end

So now I can show the two methods that will actually publish the alerts:

The first notifies of issues but only every @interval seconds and it uses the alert_ttl helper above to determine if it should or shouldn’t send:

1
2
3
4
5
6
7
8
9
10
11
12
def notify_issue(exitcode, host, check, output)
  if (ttl = @storage.alert_ttl(host, check, exitcode)) == -1
    subject = "%s %s#%s" % [status_for_code(exitcode), host, check]
    message = "%s: %s" % [subject, output]
 
    send(message, subject, @recipients)
 
    @redis.record_alert(host, check, exitcode, @alert_interval)
  else
    Log.info("Not alerting %s#%s due to interval restrictions, next alert in %d seconds" % [host, check, ttl])
  end
end

The second will publish recovery notices and we’d always want those and they will not repeat, here we clear all the previous alert tracking to avoid incorrect alert surpressions:

1
2
3
4
5
6
7
8
def notify_recovery(host, check, output)
  subject = "RECOVERY %s#%s" % [host, check]
  message = "%s: %s" % [subject, output]
 
  send_alert(message, subject, @recipients)
 
  @redis.clear_alert_ttls(host, check)
end

References: SET, EXPIRE, SUBSCRIBE, TTL, DEL

Conclusion


This covered a few Redis basics but it’s a very rich system that can be used in many areas so if you are interested spend some quality time with its docs.

Using its facilities saved me a ton of effort while working on a small monitoring system. It is fast and light weight and enable cross language collaboration that I’d have found hard to replicate in a performant manner without it.

Scaling Nagios NRPE checks

01/01/2013

Most Nagios systems does a lot of forking especially those built around something like NRPE where each check is a connection to be made to a remote system. On one hand I like NRPE in that it puts the check logic on the nodes using a standard plugin format and provides a fairly re-usable configuration file but on the other hand the fact that the Nagios machine has to do all this forking has never been good for me.

In the past I’ve shown one way to scale checks by aggregate all results for a specific check into one result but this is not always a good fit as pointed out in the post. I’ve now built a system that use the same underlying MCollective infrastructure as in the previous post but without the aggregation.

I have a pair of Nagios nodes – one in the UK and one in France – and they are on quite low spec VMs doing around 400 checks each. The problems I have are:

  • The machines are constantly loaded under all the forking, one would sit on 1.5 Load Average almost all the time
  • They use a lot of RAM and it’s quite spikey, if something is wrong especially I’d have a lot of checks concurrently so the machines have to be bigger than I want them
  • The check frequency is quite low in the usual Nagios manner, sometimes 10 minutes can go by without a check
  • The check results do not represent a point in time, I have no idea how the check results of node1 relate to those on node2 as they can be taken anywhere in the last 10 minutes

These are standard Nagios complaints though and there are many more but these ones specifically is what I wanted to address right now with the system I am showing here.

Probably not a surprise but the solution is built on MCollective, it uses the existing MCollective NRPE agent and the existing queueing infrastructure to push the forking to each individual node – they would do this anyway for every NRPE check – and read the results off a queue and spool it into the Nagios command file as Passive results. Internally it splits the traditional MCollective request-response system into a async processing system using the technique I blogged about before.

As you can see the system is made up of a few components:

  • The Scheduler takes care of publishing requests for checks
  • MCollective and the middleware provides AAA and transport
  • The nodes all run the MCollective NRPE agent which put their replies on the Queue
  • The Receiver reads the results from the Queue and write them to the Nagios command file

The Scheduler

The scheduler daemon is written using the excellent Rufus Scheduler gem – if you do not know it you totally should check it out, it solves many many problems. Rufus allows me to create simple checks on intervals like 60s and I combine these checks with MCollective filters to create a simple check configuration as below:

nrpe 'check_bacula_main', '6h', 'bacula::node monitored_by=monitor1'
nrpe 'check_disks', '60s', 'monitored_by=monitor1'
nrpe 'check_greylistd', '60s', 'greylistd monitored_by=monitor1'
nrpe 'check_load', '60s', 'monitored_by=monitor1'
nrpe 'check_mailq', '60s', 'monitored_by=monitor1'
nrpe 'check_mongodb', '60s', 'mongodb monitored_by=monitor1'
nrpe 'check_mysql', '60s', 'mysql::server monitored_by=monitor1'
nrpe 'check_pki', '60m', 'monitored_by=monitor1'
nrpe 'check_swap', '60s', 'monitored_by=monitor1'
nrpe 'check_totalprocs', '60s', 'monitored_by=monitor1'
nrpe 'check_zombieprocs', '60s', 'monitored_by=monitor1'

Taking the first line it says: Run the check_bacula_main NRPE check every 6 hours on machines with the bacula::node Puppet Class and with the fact monitored_by=monitor1. I had the monitored_by fact already to assist in building my Nagios configs using a simple search based approach in Puppet.

When the scheduler starts it will log:

W, [2012-12-31T22:10:12.186789 #32043]  WARN -- : activemq.rb:96:in `on_connecting' TCP Connection attempt 0 to stomp://nagios@stomp.example.net:6163
W, [2012-12-31T22:10:12.193405 #32043]  WARN -- : activemq.rb:101:in `on_connected' Conncted to stomp://nagios@stomp.example.net:6163
I, [2012-12-31T22:10:12.196387 #32043]  INFO -- : scheduler.rb:23:in `nrpe' Adding a job for check_bacula_main every 6h matching 'bacula::node monitored_by=monitor1', first in 19709s
I, [2012-12-31T22:10:12.196632 #32043]  INFO -- : scheduler.rb:23:in `nrpe' Adding a job for check_disks every 60s matching 'monitored_by=monitor1', first in 57s
I, [2012-12-31T22:10:12.197173 #32043]  INFO -- : scheduler.rb:23:in `nrpe' Adding a job for check_load every 60s matching 'monitored_by=monitor1', first in 23s
I, [2012-12-31T22:10:35.326301 #32043]  INFO -- : scheduler.rb:26:in `nrpe' Publishing request for check_load with filter 'monitored_by=monitor1'

You can see it reads the file and schedule the first check a random interval between now and the interval window this spread out the checks.

The Receiver


The receiver has almost no config, it just need to know what queue to read and where your Nagios command file lives, it logs:

I, [2013-01-01T11:49:38.295661 #23628]  INFO -- : mnrpes.rb:35:in `daemonize' Starting in the background
W, [2013-01-01T11:49:38.302045 #23631]  WARN -- : activemq.rb:96:in `on_connecting' TCP Connection attempt 0 to stomp://nagios@stomp.example.net:6163
W, [2013-01-01T11:49:38.310853 #23631]  WARN -- : activemq.rb:101:in `on_connected' Conncted to stomp://nagios@stomp.example.net:6163
I, [2013-01-01T11:49:38.310980 #23631]  INFO -- : receiver.rb:16:in `subscribe' Subscribing to /queue/mcollective.nagios_passive_results_monitor1
I, [2013-01-01T11:49:41.572362 #23631]  INFO -- : receiver.rb:34:in `receive_and_submit' Submitting passive data to nagios: [1357040981] PROCESS_SERVICE_CHECK_RESULT;node1.example.net;mongodb;0;OK: connected, databases admin local my_db puppet mcollective
I, [2013-01-01T11:49:42.509061 #23631]  INFO -- : receiver.rb:34:in `receive_and_submit' Submitting passive data to nagios: [1357040982] PROCESS_SERVICE_CHECK_RESULT;node2.example.net;zombieprocs;0;PROCS OK: 0 processes with STATE = Z
I, [2013-01-01T11:49:42.510574 #23631]  INFO -- : receiver.rb:34:in `receive_and_submit' Submitting passive data to nagios: [1357040982] PROCESS_SERVICE_CHECK_RESULT;node3.example.net;zombieprocs;0;PROCS OK: 1 process with STATE = Z

As the results get pushed to Nagios I see the following in its logs:

[1357042122] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;node1.example.net;zombieprocs;0;PROCS OK: 0 processes with STATE = Z
[1357042124] PASSIVE SERVICE CHECK: node1.example.net;zombieprocs;0;PROCS OK: 0 processes with STATE = Z

Did it solve my problems?


I listed the set of problems I wanted to solve so it’s worth evaluating if I did solve them properly.

Less load and RAM use on the Nagios nodes

My Nagios nodes have gone from load averages of 1.5 to 0.1 or 0.0, they are doing nothing, they use a lot less RAM and I have removed some of the RAM from the one and given it to my Jenkins VM instead, it was a huge win. The sender and receiver is quite light on resources as you can see below:

USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
nagios    9757  0.4  1.8 130132 36060 ?        S     2012   3:41 ruby /usr/bin/mnrpes-receiver --pid=/var/run/mnrpes/mnrpes-receiver.pid --config=/etc/mnrpes/mnrpes-receiver.cfg
nagios    9902  0.3  1.4 120056 27612 ?        Sl    2012   2:22 ruby /usr/bin/mnrpes-scheduler --pid=/var/run/mnrpes/mnrpes-scheduler.pid --config=/etc/mnrpes/mnrpes-scheduler.cfg

On the RAM side I now never get a pile up of many checks. I do have the stale detection enabled on my Nagios template so if something breaks in the scheduler/receiver/broker triplet Nagios will still try to do a traditional check to see what’s going on but that’s bearable.

Check frequency too low

With this system I could do my checks every 10 seconds without any problems, I settled on 60 seconds as that’s perfect for me. Rufus scheduler does a great job of managing that and the requests from the scheduler are effectively fire and forget as long as the broker is up.

Results are spread over 10 minutes

The problem with the results for load on node1 and node2 having no temporal correlation is gone too now, because I use MCollectives parallel nature all the load checks happen at the same time:

Here is the publisher:

I, [2013-01-01T12:00:14.296455 #20661]  INFO -- : scheduler.rb:26:in `nrpe' Publishing request for check_load with filter 'monitored_by=monitor1'

…and the receiver:

I, [2013-01-01T12:00:14.380981 #23631]  INFO -- : receiver.rb:34:in `receive_and_submit' Submitting passive data to nagios: [1357041614] PROCESS_SERVICE_CHECK_RESULT;node1.example.net;load;0;OK - load average: 0.92, 0.54, 0.42|load1=0.920;9.000;10.000;0; load5=0.540;8.000;9.000;0; load15=0.420;7.000;8.000;0; 
I, [2013-01-01T12:00:14.383875 #23631]  INFO -- : receiver.rb:34:in `receive_and_submit' Submitting passive data to nagios: [1357041614] PROCESS_SERVICE_CHECK_RESULT;node2.example.net;load;0;OK - load average: 0.00, 0.00, 0.00|load1=0.000;1.500;2.000;0; load5=0.000;1.500;2.000;0; load15=0.000;1.500;2.000;0; 
I, [2013-01-01T12:00:14.387427 #23631]  INFO -- : receiver.rb:34:in `receive_and_submit' Submitting passive data to nagios: [1357041614] PROCESS_SERVICE_CHECK_RESULT;node3.example.net;load;0;OK - load average: 0.02, 0.07, 0.07|load1=0.020;1.500;2.000;0; load5=0.070;1.500;2.000;0; load15=0.070;1.500;2.000;0; 
I, [2013-01-01T12:00:14.388754 #23631]  INFO -- : receiver.rb:34:in `receive_and_submit' Submitting passive data to nagios: [1357041614] PROCESS_SERVICE_CHECK_RESULT;node4.example.net;load;0;OK - load average: 0.07, 0.02, 0.00|load1=0.070;1.500;2.000;0; load5=0.020;1.500;2.000;0; load15=0.000;1.500;2.000;0; 
I, [2013-01-01T12:00:14.404650 #23631]  INFO -- : receiver.rb:34:in `receive_and_submit' Submitting passive data to nagios: [1357041614] PROCESS_SERVICE_CHECK_RESULT;node5.example.net;load;0;OK - load average: 0.03, 0.09, 0.04|load1=0.030;1.500;2.000;0; load5=0.090;1.500;2.000;0; load15=0.040;1.500;2.000;0; 
I, [2013-01-01T12:00:14.405689 #23631]  INFO -- : receiver.rb:34:in `receive_and_submit' Submitting passive data to nagios: [1357041614] PROCESS_SERVICE_CHECK_RESULT;node6.example.net;load;0;OK - load average: 0.06, 0.06, 0.07|load1=0.060;3.000;4.000;0; load5=0.060;3.000;4.000;0; load15=0.070;3.000;4.000;0; 
I, [2013-01-01T12:00:14.489590 #23631]  INFO -- : receiver.rb:34:in `receive_and_submit' Submitting passive data to nagios: [1357041614] PROCESS_SERVICE_CHECK_RESULT;node7.example.net;load;0;OK - load average: 0.06, 0.14, 0.14|load1=0.060;1.500;2.000;0; load5=0.140;1.500;2.000;0; load15=0.140;1.500;2.000;0;

All the results are from the same second, win.

Conclusion

So my scaling issues on my small site is solved and I think the way this is built will work for many people. The code is on GitHub and requires MCollective 2.2.0 or newer.

Having reused the MCollective and Rufus libraries for all the legwork including logging, daemonizing, broker connectivity, addressing and security I was able to build this in a very short time, the total code base is only 237 lines excluding packaging etc. which is a really low number of lines for what it does.

Simple Puppet Module Structure Redux

12/13/2012

Back in September 2009 I wrote a blog post titled “Simple Puppet Module Structure” which introduced a simple approach to writing Puppet Modules. This post has been hugely popular in the community – but much has changed in Puppet since then so it is time for an updated version of that post.

As before I will show a simple module for a common scenario. Rather than considering this module a blueprint for every module out there you should instead study its design and use it as a starting point when writing your own modules. You can build on it and adapt it but the basic approach should translate well to more complex modules.

I should note that while I work for Puppet Labs I do not know if this reflect any kind of standard suggested approach by Puppet Labs – this is what I do when managing my own machines and no more.

The most important deliverables


When writing a module I have a few things I keep in mind – these are all centered around down stream users of my module and future-me trying to figure out what is going on:

  • A module should have a single entry point where someone reviewing it can get an overview of it’s behavior
  • Modules that have configuration should be configurable in a single way and single place
  • Modules should be made up of several single-responsibility classes. As far as possible these classes should be private details hidden from the user
  • For the common use cases, users should not need to know individual resource names
  • For the most common use case, users should not need to provide any parameters, defaults should be used
  • Modules I write should have a consistant design and behaviour

The module layout I will present below is designed so that someone who is curious about the behaviour of the module only have to look in the init.pp to see:

  • All the parameters and their defaults used to configure the behaviour of the module
  • Overview of the internal structure of the module by way of descriptive class names
  • Relationships and notifications that exist inside the module and what classes they can notify

This design will never remove the need for documenting your modules but a clear design will guide your users in discovering the internals of your module and how they interact with it.

More important than what a module does is how accessible it is to you and others, how easy is it to understand, debug and extend.

Thinking about your module


For this post I will write a very simple module to manage NTP – it really is very simple, you should check the Forge for more complete ones.

To go from nowhere to having NTP on your machine you would have to do:

  • Install the packages and any dependencies
  • Write out appropriate configuration files with some environment specific values
  • Start the service or services you need once the configuration files are written. Restart it if the config file change later.

There is a clear implied dependency chain here and this basic pattern applies to most pieces of software.

These 3 points basically translate to distinct groups of actions and sticking with the above principal of single function classes I will create a class for each group.

To keep things clear and obvious I will call these class install, config and service. The names don’t matter as long as they are descriptive – but you really should pick something and stick with it in all your modules.

Writing the module


I’ll show the 3 classes that does the heavy lifting here and discuss parts of them afterwards:

class ntp::install {
   package{'ntpd':
      ensure => $ntp::version
   }
}
 
class ntp::config {
   $ntpservers = $ntp::ntpservers
 
   File{
      owner   => root,
      group   => root,
      mode    => 644,
   }
 
   file{'/etc/ntp.conf':
         content => template('ntp/ntp.conf.erb');
 
        '/etc/ntp/step-tickers':
         content => template('ntp/step-tickers.erb');
    }
}
 
class ntp::service {
   $ensure = $ntp::start ? {true => running, default => stopped}
 
   service{"ntp":
      ensure  => $ensure,
      enable  => $ntp::enable,
   }
}

Here I have 3 classes that serve a single purpose each and do not have any details like relationships, ordering or notifications in them. They roughly just do the one thing they are supposed to do.

Take a look at each class and you will see they use variables like $ntp::version, $ntp::ntpservers etc. These are variables from the the main ntp class, lets take a quick look at that class:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
# == Class: ntp
#
# A basic module to manage NTP
#
# === Parameters
# [*version*]
#   The package version to install
#
# [*ntpservers*]
#   An array of NTP servers to use on this node
#
# [*enable*]
#   Should the service be enabled during boot time?
#
# [*start*]
#   Should the service be started by Puppet
class ntp(
   $version = "present",
   $ntpservers = ["1.pool.ntp.org", "2.pool.ntp.org"],
   $enable = true,
   $start = true
) {
   class{'ntp::install': } ->
   class{'ntp::config': } ~>
   class{'ntp::service': } ->
   Class["ntp"]
}

This is the main entry point into the module that was mentioned earlier. All the variables the module use is documented in a single place, the basic design and parts of the module is clear and you can see that the service class can be notified and the relationships between the parts.

I use the new chaining features to inject the dependencies and relationships here which surfaces these important interactions between the various classes back up to the main entry class for users to see easily.

All this information is immediately available in the obvious place without looking at any additional files or by being bogged down with implementation details.

Line 26 here requires some extra explanation – This ensures that all the NTP member classes are applied before this main NTP class so that cases where someone say require => Class["ntp"] elsewhere they can be sure the associated tasks are completed. This is a light weight version of the Anchor Pattern.

Using the module

Let’s look at how you might use this module from knowing nothing.

Ideally simply including the main entry point on a node should be enough:

include ntp

This does what you’d generally expect – installs, configures and starts the NTP service.

After looking at the init.pp you can now supply some new values for some of the parameters to tune it for your needs:

class{"ntp": ntpservers => ["ntp1.example.com", "ntp2.example.com"]}

Or you can use the new data bindings in Puppet 3 and supply new data in Hiera to override these variables by supplying data for the keys like ntp::ntpservers.

Finally if for some or other related reason you need to restart the service you know from looking at the ntp class that you can notify the ntp::service class to achieve that.

Using classes for relationships

There’s a huge thing to note here in the main ntp class. I specify all relationships and notifies on the classes and not the resources themselves.

As personal style I only mention resources by name inside a class that contains that resource – if I ever have to access a resource outside of the class that it is contained in I access the class.

I would not write:

class ntp::service {
   service{"ntp": require => File["/etc/ntp.conf"]}
}

These are many issues with this approach that mostly come down to maintenance headaches. Here I require the ntp config file but what if a service have more than one file? Do you then list all the files? Do you later edit every class that reference these when another file gets managed?

These issues quickly multiply in a large code base. By always acting on class names and by creating many small single purpose classes as here I effectively contain these by grouping names and not individual resource names. This way any future refactoring of individual classes would not have an impact on other classes.

So the above snippet would rather be something like this:

class ntp::service {
   service{"ntp": require => Class["ntp::config"]}
}

Here I require the containing class and not the resource. This has the effect of requiring all resources inside that class. This has the effect of isolating changes to that class and avoiding a situation where users have to worry about the internal implementation details of the other class. Along the same lines you can also notify a class – and all resources inside that class gets notified.

I only include other classes at the top ntp level and never have include statements in my classes like ntp::confg and so forth – this means when I require the class ntp::config or notify ntp::service I get just what I want and no more.

If you create big complex classes you run the risk of having refreshonly execs that relate to configuration or installation associated with services in the same class which would have disastrous consequences if you notify the wrong thing or if a user do not study your code before using it.

A consistant style of small single purpose classes named descriptively avoid these and other problems.

What we learned and further links

There is a lot to learn here and much of it is about soft issues like the value of consistency and clarity of design and thinking about your users – and your future self.

On the technical side you should learn about the effects of relationships and notifications based on containing classes and not by naming resources by name.

And we came across a number of recently added Puppet features:

Parameterized Classes are used to provide multiple convenient methods for supplying data to your module – defaults in the module, specifically in code, using Hiera and (not shown here) an ENC.

Chaining Arrows are used in the main class to inject the dependencies and notifications in a way that is visible without having to study each individual class.

These are important new additions to Puppet. Some new features like Parameterised classes are not quite ready for prime time imho but in Puppet 3 when combined with the data bindings a lot of the pain points have been removed.

Finally there are a number of useful things I did not mention here. Specifically you should study the Puppet Style Guide and use the Puppet Lint tool to validate your modules comply. You should consider writing tests for your modules using rspec-puppet and finally share it on the Puppet Forge.

And perhaps most importantly – do not reinvent the wheel, check the Forge first.

Using MongoDB as Publish Subscribe middleware

08/31/2012

Yesterday I mentioned on Twitter that I was playing with the MongoDB pub/sub features and that it worked quite well for my needs.

What I didn’t mention was that the documentation and blog posts were a bit all over the show and the Ruby examples I saw didn’t actually do what they said they did so I’ll show in this post working code and some basic approaches I took to deal with per consumer destinations etc.

Why?


So why would anyone want to use MongoDB as a queue or indeed MongoDB at all since everyone knows it’s unusable and does not save any data ever and kills kittens?

Actually MongoDB is a really nice database but like most NoSQL databases the thing to know about it is what shortcuts it takes with your data to do it’s magic. Knowing this you have to evaluate its suitability to your specific problem and if it’s not suitable, don’t use it.

It’s fast and has a flexible query system to search over arbitrary structured JSON data. Yes it has some interesting ideas about data durability but this is well known by now and if your needs match it’s features it’s not bad.

For shops with needs well suited to MongoDB who might want to add some queueing ability it can be daunting to bring in new tech like RabbitMQ or ActiveMQ because it brings new unknowns requires an investment in more monitoring, training and learning by making mistakes. If you already have a Mongo instance and know its quirks using it for a queue might not be such a terrible thing.

Additionally MongoDB is really easy to get going and generally I find for my work loads it just works with little maintenance required.

So my interest in its queueing abilities lies in providing a simpler ‘getting started’ for MCollective. New MCollective has pluggable discovery which works really well when discovering against a MongoDB cache of registration data so it would be nice if a simple starter edition setup could include both the queue and discovery data in one simple bit of software.

There are other options of course like Redis and I’ll evaluate them but of the various options MongoDB is the only one that comes with both pubsub and searching/querying capabilities that does what I need, isn’t written in Java and has OS packages for most distros easily available.

Background


In MongoDB when you do a find on a collection the returned result set is a Cursor. Cursors can have a number of modes or flags associated with them. Further it has something called Capped Collections that are fixed size and rotate old data out when they fill up.

The combination of some of these Cursor flags and Capped Collections enables a kind of tail -f behavior that works like a queue.

When you have a collection it usually returns nil when you reached the end of your results as can be seen here:

>> coll = db.collection('commands')
=> Mongo::DB:0x7fa1ae005f58 ....>
>> cursor = coll.find()
=> Mongo::Cursor:0x3fd0d6f61184 ....>
>> cursor.skip(cursor.count)
=> Mongo::Cursor:0x3fd0d6f61184 ....>
>> cursor.next_document
=> nil

Here we opened a collections and did a find. We moved to the end of the results and fetched the next result which immediately returned a nil indicating there’s nothing new.

Lets see how we can change the behavior of this collection that instead of returning immediately it will block for a while waiting for a new document and then return a nil after after a timeout if nothing new was found:

>> cursor = coll.find()
=> Mongo::Cursor:0x3fd0d6f61184 ....>
>> cursor.skip(cursor.count)
=> Mongo::Cursor:0x3fd0d6f61184 ....>
>> cursor.add_option(Mongo::Cursor::OP_QUERY_TAILABLE)
=> 2
>> cursor.add_option(Mongo::Cursor::OP_QUERY_AWAIT_DATA)
=> 34
>> loop { puts "#{Time.now}> Tailing...."; p cursor.next_document }
Fri Aug 31 13:40:19 +0100 2012> Tailing....
nil
Fri Aug 31 13:40:21 +0100 2012> Tailing....
nil
Fri Aug 31 13:40:23 +0100 2012> Tailing....
nil

Now instead of immediately returning a nil it will wait 2 to 3 seconds at the end of the collection incase new data comes.

So this is your consumer to the queue called commands here, anyone who saves data into the collection are producers. It’s quite light on resources on both the client and the MongoDB server, on a fairly low spec VM I was easily able to run 50+ consumers, a MongoDB instance and some producers.

MongoDB calls this feature Tailable Cursors and the thing the Ruby docs don’t tell you and that the Ruby library does not do for you is set the Mongo::Cursor::OP_QUERY_AWAIT_DATA option as above. Without this option it will still return nil immediately and the example code has a sleep to combat a infinite high resource usage loop. The proposed sleeping solution to the problem makes it completely pointless as a high performance Queue but the Mongo::Cursor::OP_QUERY_AWAIT_DATA option sorts that out.

A simple message structure


In my use case I have to be able to send messages to all consumers or sometimes just to a specific consumer. In other middleware you do this with different queue names or perhaps headers and then do selective subscribes to the queue picking off just the messages you are interested in.

I chose to use a single capped collection and use a structure similar to middleware headers to identify message targets:

{"headers" : {"target" : "all"},
 "payload" : "data"}

{"headers" : {"target" : "some.consumer"},
 "payload" : "data"}

The 2 examples show different target headers in one I am targeting everyone consuming the queue and in the 2nd one just a specific consumer. The payload can be anything, text, hashes whatever your needs are.

Lets look at a consumer that has a consumer name and that’s interested in messages directed at it or all consumers:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
@consumer_identity = "example.com"
@database = "queue"
@collection = "commands"
 
def get_collection
  @db ||= Mongo::Connection.new().db(@database)
 
  until @db.connection.active?
     puts ">>> Retrying database connection..."
     @db.connection.close
     @db.connection.connect
 
     sleep 0.5 unless @db.connection.active?
  end
 
  unless @db.collections.include?(@collection)
    coll = @db.create_collection(@collection, :capped => true, :size => 10240)
  else
    coll = @db.collection(@collection)
  end
 
  coll
end
 
loop do
  begin
    cursor = get_collection.find({"headers.target" => {"$in" => [@consumer_identity, "all"]}})
 
    # ignore old stuff
    cursor.skip(cursor.count)
 
    # blocking tail reads
    cursor.add_option(Mongo::Cursor::OP_QUERY_TAILABLE)
    cursor.add_option(Mongo::Cursor::OP_QUERY_AWAIT_DATA)
 
    begin
      # fetch all the docs forever
      loop do
        if doc = cursor.next_document
          p doc["payload"]
        end
      end
    rescue Mongo::OperationFailure => e
      puts ">>> Cursor closed: %s (%s)" % [e.to_s, e.class]
    end
  rescue Mongo::ConnectionFailure
    puts ">>> DB connection failed"
  end
end

On line 30 we’re setting up a Cursor for all messages matching “all” and our identity. You can now simply publish data with correct headers to target specific consumers or all consumers. The 2 loops will forever attempt to reconnect to any failed database and forever read whatever new messages arrives after connection.

Using this method it’s really easy to come up with all kinds of addressing modes for your queue. For example you can give work being done a job name and combine it with the target header to create sets of named consumers that will all receive commands that match just the work they’re able to do.

Results


As I initially said I did all this to test a MCollective connector that uses MongoDB as a middleware. It worked surprisingly well and I have both broadcast and directed modes working:

$ mco ping
.
.
---- ping statistics ----
15 replies max: 57.94 min: 48.56 avg: 54.72

I’ll try out some other options for a small site or starter edition middleware and post follow up blog posts.

I’ll say I’ve been very surprised by how well this worked though. The connector is a bit complex and awkward because of how thread safety is handled in the MongoDB Ruby drivers but it’s not been a big problem overall to solve a pretty complex use case with this.

Specifically I noted performance didn’t degrade hugely with 50 nodes connected or with larger payloads which is very nice.

Newer Posts
Older Posts