R.I.Pienaar

Better Puppet Modules Using Hiera Data

12/08/2013

When writing Puppet Modules there tend to be a ton of configuration data – generally things like different paths for different operating systems. Today the general pattern to manage this data is a class module::param with a bunch of logic in it.

Here’s a simplistic example below – for an example of the full horror of this pattern see the puppetlabs-ntp module.

# ntp/manifests/init.pp
class ntp (
     $config = $ntp::params::config,
     $keys_file = $ntp::params::keys_file
   ) inherits ntp::params {
 
   file{$config:
      ....
   }
}

# ntp/manifests/params.pp
class ntp::params {
   case $::osfamily {
      'AIX': {
         $config = "/etc/ntp.conf"
         $keys_file = '/etc/ntp.keys'
      }
 
      'Debian': {
         $config = "/etc/ntp.conf"
         $keys_file = '/etc/ntp/keys'
      }
 
      'RedHat': {
         $config = "/etc/ntp.conf"
         $keys_file = '/etc/ntp/keys'
      }
 
      default: {
         fail("The ${module_name} module is not supported on an ${::osfamily} based system.")
      }
   }
}

This is the exact reason Hiera exists – to remove this kind of spaghetti code and move it into data, instinctively now whenever anyone see code like this they think they should refactor this and move the data into Hiera.

But there’s a problem. This works for your own modules in your own repos, you’d just use the Puppet 3 automatic parameter bindings and override the values in the ntp class – not ideal, but many people do it. If however you wanted to write a module for the Forge though there’s a hitch because the module author has no idea what kind of hierarchy exist where the module is used. If the site even used Hiera and today the module author can’t ship data with his module. So the only sensible thing to do is to embed a bunch of data in your code – the exact thing Hiera is supposed to avoid.

I proposed a solution to this problem that would allow module authors to embed data in their modules as well as control the Hierarchy that would be used when accessing this data. Unfortunately a year on we’re still nowhere and the community – and the forge – is suffering as a result.

The proposed solution would be a always-on Hiera backend that as a last resort would look for data inside the module. Critically the module author controls the hierarchy when it gets to the point of accessing data in the module. Consider the ntp::params class above, it is a code version of a Hiera Hierarchy keyed on the $::osfamily fact. But if we just allowed the module to supply data inside the module then the module author has to just hope that everyone has this tier in their hierarchy – not realistic. My proposal then adds a module specific Hierarchy and data that gets consulted after the site Hierarchy.

So lets look at how to rework this module around this proposed solution:

# ntp/manifests/init.pp
class ntp ($config, $keysfile)  {
   validate_absolute_path($config)
   validate_absolute_path($keysfile)
 
   file{$config:
      ....
   }
}

Next you configure Hiera to consult a hierarchy on the $::osfamily fact, note the new data directory that goes inside the module:

# ntp/data/hiera.yaml
---
:hierarchy:
  - "%{::osfamily}"

And finally we create some data files, here’s just the one for RedHat:

# ntp/data/RedHat.yaml
---
ntp::config: /etc/ntp.conf
ntp::keys_file: /etc/ntp/keys

Users of the module could add a new OS without contributing back to the module or forking the module by simply providing similar data to the site specific hierarchy leaving the downloaded module 100% untouched!

This is a very simple view of what this pattern allows, time will tell what the community makes of it. There are many advantages to this over the ntp::params pattern:

This helps the contributor to a public module:

  • Adding a new OS is easy, just drop in a new YAML file. This can be done with confidence as it will not break existing code as it will only be read on machines of the new OS. No complex case statements or 100s of braces to get right
  • On a busy module when adding a new OS they do not have to worry about complex merge problems, working hard at rebasing or any git escoteria – they’re just adding a file.
  • Syntactically it’s very easy, it’s just a YAML file. No complex case statements etc.
  • The contributor does not have to worry about breaking other Operating Systems he could not test on like AIX here. The change is contained to machines for the new OS
  • In large environments this help with change control as it’s just data – no logic changes

This helps the maintainer of a module:

  • Module maintenance is easier when it comes to adding new Operating Systems as it’s simple single files
  • Easier contribution reviews
  • Fewer merge commits, less git magic needed, cleaner commit history
  • The code is a lot easier to read and maintain. Fewer tests and validations are needed.

This helps the user of a module:

  • Well written modules now properly support supplying all data from Hiera
  • He has a single place to look for the overridable data
  • When using a module that does not support his OS he can deploy it into his site and just provide data instead of forking it

Today I am releasing my proposed code as a standalone module. It provides all the advantages above including the fact that it’s always on without any additional configuration needed.

It works exactly as above by adding a data directory with a hiera.yaml inside it. The only configuration being considered in this hiera.yaml is the hierarchy.

This module is new and does some horrible things to get itself activated automatically without any configuration, I’ve only tested it on Puppet 3.2.x but I think it will work in 3.x as is. I’d love to get feedback on this from users.

If you want to write a forge module that uses this feature simply add a dependency on the ripienaar/module_data module, soon as someone install this dependency along with your module the backend gets activated. Similarly if you just want to use this feature in your own modules, just puppet module install ripienaar/module_data.

Note though that if you do your module will only work on Puppet 3 or newer.

It’s unfortunate that my Pull Request is now over a year old and did not get merged and no real progress is being made. I hope if enough users adopt this solution we can force progress rather than sit by and watch nothing happen. Please send me your feedback and use this widely.

CLI Report viewer for Puppet

10/10/2013

When using Puppet you often run it in a single run mode on the CLI and then go afk. When you return you might notice it was slow for some or other reason but did not run it with –evaltrace and in debug mode so the information to help you answer this simply isn’t present – or scrolled off or got rotated away from your logs.

Typically you’d deploy something like foreman or report handlers on your masters which would receive and display reports. But while you’re on the shell it’s a big context switch to go and find the report there.

Puppet now saves reports in it’s state dir including with apply if you ran it with –write-catalog-summary and in recent versions these reports include the performance data that you’d only find from –evaltrace.

So to solve this problem I wrote a little tool to show reports on the CLI. It’s designed to run on the shell of the node in question and as root. If you do this it will automatically pick up the latest report and print it but it will also go through and check the sizes of files and show you stats. You can run it against saved reports on some other node but you’ll lose some utility. The main focus of the information presented is to let you see logs from the past run but also information that help you answer why it was slow to run.

It’s designed to work well with very recent versions of Puppet maybe even only 3.3.0 and newer, I’ve not tested it on older versions but will gladly accept patches.

Here are some snippets of a report of one of my nodes and some comments about the sections. A full sample report can be found here.

First it’s going to show you some metadata about the report, what node, when for etc:

sudo report_print.rb
Report for puppetmaster.example.com in environment production at Thu Oct 10 13:37:04 +0000 2013
 
             Report File: /var/lib/puppet/state/last_run_report.yaml
             Report Kind: apply
          Puppet Version: 3.3.1
           Report Format: 4
   Configuration Version: 1381412220
                    UUID: 99503fe8-38f2-4441-a530-d555ede9067b
               Log Lines: 350 (show with --log)

Some important information here, you can see it figured out where to find the report by parsing the Puppet config – agent section – what version of Puppet and what report format. You can also see the report has 350 lines of logs in it but it isn’t showing them by default.

Next up it shows you a bunch of metrics from the report:

Report Metrics:
 
   Changes:
                        Total: 320
 
   Events:
                        Total: 320
                      Success: 320
                      Failure: 0
 
   Resources:
                        Total: 436
                  Out of sync: 317
                      Changed: 317
                    Restarted: 7
            Failed to restart: 0
                      Skipped: 0
                       Failed: 0
                    Scheduled: 0
 
   Time:
                        Total: 573.671295
                      Package: 509.544123
                         Exec: 33.242635
      Puppetdb conn validator: 22.767754
             Config retrieval: 4.096973
                         File: 1.343388
                         User: 1.337979
                      Service: 1.180588
                  Ini setting: 0.127856
                       Anchor: 0.013984
            Datacat collector: 0.008954
                         Host: 0.003265
             Datacat fragment: 0.00277
                     Schedule: 0.000504
                        Group: 0.00039
                   Filebucket: 0.000132

These are numerically sorted and the useful stuff is in the last section – what types were to blame for the biggest slowness in your run. Here we can see we spent 509 seconds just doing packages.

Having seen how long each type of resource took it then shows you a little report of how many resources of each type was found:

Resources by resource type:
 
    288 File
     30 Datacat_fragment
     25 Anchor
     24 Ini_setting
     22 User
     18 Package
      9 Exec
      7 Service
      6 Schedule
      3 Datacat_collector
      1 Group
      1 Host
      1 Puppetdb_conn_validator
      1 Filebucket

From here you’ll see detail about resources and files, times, sizes etc. By default it’s going to show you 20 of each but you can increase that using the –count argument.

First we see the evaluation time by resource, this is how long the agent spent to complete a specific resource:

Slowest 20 resources by evaluation time:
 
    356.94 Package[activemq]
     41.71 Package[puppetdb]
     33.31 Package[apache2-prefork-dev]
     33.05 Exec[compile-passenger]
     23.41 Package[passenger]
     22.77 Puppetdb_conn_validator[puppetdb_conn]
     22.12 Package[libcurl4-openssl-dev]
     10.94 Package[httpd]
      4.78 Package[libapr1-dev]
      3.95 Package[puppetmaster]
      3.32 Package[ntp]
      2.75 Package[puppetdb-terminus]
      2.71 Package[mcollective-client]
      1.86 Package[ruby-stomp]
      1.72 Package[mcollective]
      0.58 Service[puppet]
      0.30 Service[puppetdb]
      0.18 User[jack]
      0.16 User[jill]
      0.16 User[ant]

You can see by far the longest here was the activemq package that took 356 seconds and contributed most to the 509 seconds that Package types took in total. A clear indication that maybe this machine is picking the wrong mirrors or that I should create my own nearby mirror.

File serving in Puppet is notoriously slow so when run as root on the node in question it will look for all File resources and print the sizes. Unfortunately it can’t know if a file contents came from source or content as that information isn’t in the report. Still this might give you some information on where to target optimization. In this case nothing really stands out:

20 largest managed files (only those with full path as resource name that are readable)
 
     6.50 KB /usr/local/share/mcollective/mcollective/util/actionpolicy.rb
     3.90 KB /etc/mcollective/facts.yaml
     3.83 KB /var/lib/puppet/concat/bin/concatfragments.sh
     2.78 KB /etc/sudoers
     1.69 KB /etc/apache2/conf.d/puppetmaster.conf
     1.49 KB /etc/puppet/fileserver.conf
     1.20 KB /etc/puppet/rack/config.ru
    944.00 B /etc/apache2/apache2.conf
    573.00 B /etc/ntp.conf
    412.00 B /usr/local/share/mcollective/mcollective/util/actionpolicy.ddl
    330.00 B /etc/apache2/mods-enabled/passenger.conf
    330.00 B /etc/apache2/mods-available/passenger.conf
    262.00 B /etc/default/puppet
    215.00 B /etc/apache2/mods-enabled/worker.conf
    215.00 B /etc/apache2/mods-available/worker.conf
    195.00 B /etc/apache2/ports.conf
    195.00 B /var/lib/puppet/concat/_etc_apache2_ports.conf/fragments.concat
    195.00 B /var/lib/puppet/concat/_etc_apache2_ports.conf/fragments.concat.out
    164.00 B /var/lib/puppet/concat/_etc_apache2_ports.conf/fragments/10_Apache ports header
    158.00 B /etc/puppet/hiera.yaml

And finally if I ran it with –log I’d get the individual log lines:

350 Log lines:
 
   Thu Oct 10 13:37:06 +0000 2013 /Stage[main]/Concat::Setup/File[/var/lib/puppet/concat]/ensure (notice): created
   Thu Oct 10 13:37:06 +0000 2013 /Stage[main]/Concat::Setup/File[/var/lib/puppet/concat/bin]/ensure (notice): created
   Thu Oct 10 13:37:06 +0000 2013 /Stage[main]/Concat::Setup/File[/var/lib/puppet/concat/bin/concatfragments.sh]/ensure (notice): defined content as '{md5}2fbba597a1513eb61229551d35d42b9f'
   .
   .
   .

The code is on GitHub, I’d like to make it available as a Puppet Forge module but there really is no usable option to achieve this. The Puppet Face framework is the best available option but the UX is so poor that I would not like to expose anyone to this to use my code.

Introduction to MCollective deck

06/14/2013

I’ve not had a good introduction to MCollective slide deck ever, I usually just give demos and talk through it. I was invited to talk in San Francisco about MCollective so made a new deck for this talk.

On the night I gave people the choice of talks between the new Introduction talk and the older Managing Puppet using MCollective and sadly the intro talk lost out.

Last night the excellent people at Workday flew me to Dublin to talk to the local DevOps group there and this group was predominantly Chef users who chose the Introduction talk so I finally had a chance to deliver it. This talk was recorded, hopefully it’ll be up soon and I’ll link to it once available.

This slide deck is a work in progress, it’s clear I need to add some more information about the non-cli orientated uses of MCollective but it’s good to finally have a deck that’s receiving good feedback.

We uploaded the slides back when I was in San Francisco to slideshare and those are the ones you see here.


Managing Puppet Using MCollective

02/03/2013

I recently gave a talk titled “Managing Puppet Using MCollective” at the Puppet Camp in Ghent.

The talk introduces a complete rewrite of the MCollective plugin used to manage Puppet. The plugin can be found on our Github repo as usual. Significantly this is one of a new breed of plugin that we ship as native OS packages and practice continuous delivery on.

The packages can be found on apt.puppetlabs.com and yum.puppetlabs.com and are simply called mcollective-puppet-agent and mcollective-puppet-client.

This set of plugins show case a bunch of recent MCollective features including:

  • Data Plugins
  • Aggregation Functions
  • Custom Validators
  • Configurable enabling and disabling of the Agent
  • Direct Addressing and pluggable discovery to significantly improve the efficiency of the runall method
  • Utility classes shared amongst different types of plugin
  • Extensive testing using rspec and our mcollective specific rspec plugins

It’s a bit of a beast coming at a couple thousand lines but this was mostly because we had to invent a rather sizeable wrapper for Puppet to expose a nice API around Puppet 2.7 and 3.x for things like running them and obtaining their status.

The slides from the talk can be seen below, hopefully a video will be up soon else I’ll turn it into a screencast.

Graphing on the CLI

01/20/2013

I’ve recently been thinking about ways to do graphs on the CLI. We’ve written a new Puppet Agent for MCollective that can gather all sorts of interesting data from your server estate and I’d really like to be able to show this data on the CLI. This post isn’t really about MCollective though the ideas applies to any data.

I already have sparklines in MCollective, here’s the distribution of ping times:

This shows you that most of the nodes responded quickly with a bit of a tail at the end being my machines in the US.

Sparklines are quite nice for a quick overview so I looked at adding some more of this to the UI and came up with this:

Which is quite nice – these are the nodes in my infrastructure stuck into buckets and the node counts for each bucket is shown. We can immediately tell something is not quite right – the config retrieval time shows a bunch of slow machines and the slowness does not correspond to resource counts etc. On investigation I found these are my dev machines – KVM nodes hosted on HP Micro Servers so that’s to be expected.

I am not particularly happy with these graphs though so am still exploring other options, one other option is GNU Plot.

GNU Plot can target its graphs for different terminals like PNG and also line printers – since the Unix terminal is essentially a line printer we can use this.

Here are 2 graphs of config retrieval time produced by MCollective using the same data source that produced the spark line above – though obviously from a different time period. Note that the axis titles and graph title is supplied automatically using the MCollective DDL:

$ mco plot resource config_retrieval_time
 
                   Information about Puppet managed resources
  Nodes
    6 ++-*****----+----------+-----------+----------+----------+----------++
      +      *    +          +           +          +          +           +
      |       *                                                            |
    5 ++      *                                                           ++
      |       *                                                            |
      |        *                                                           |
    4 ++       *      *                                                   ++
      |        *      *                                                    |
      |         *    * *                                                   |
    3 ++        *    * *                                                  ++
      |          *  *  *                                                   |
      |           * *   *                                                  |
    2 ++           *    *                         *        *              ++
      |                 *                         **       **              |
      |                  *                       * *      *  *             |
    1 ++                 *               *       *  *     *   **        * ++
      |                  *              * *     *   *     *     **    **   |
      +           +       *  +         * + *    *   +*   *     +     *     +
    0 ++----------+-------*************--+--****----+*****-----+--***-----++
      0           10         20          30         40         50          60
                              Config Retrieval Time

So this is pretty serviceable for showing this data on the console! It wouldn’t scale to many lines but for just visualizing some arbitrary series of numbers it’s quite nice. Here’s the GNU Plot script that made the text graph:

set title "Information about Puppet managed resources"
set terminal dumb 78 24
set key off
set ylabel "Nodes"
set xlabel "Config Retrieval Time"
plot '-' with lines
3 6
6 6
9 3
11 2
14 4
17 0
20 0
22 0
25 0
28 0
30 1
33 0
36 038 2
41 0
44 0
46 2
49 1
52 0
54 0
57 1

The magic here comes from the second line that sets the output terminal to dump and supplies some dimensions. Very handy, worth exploring some more and adding to your toolset for the CLI. I’ll look at writing a gem or something that supports both these modes.

There are a few other players in this space, I definitely recall coming across a Python tool to do graphs but cannot find it now, shout out in the comments if you know other approaches and I’ll add them to the post!

Updated: some links to related projects: sparkler, Graphite Spark

Newer Posts
Older Posts