Complex data and Puppet

08/31/2009

Often while writing Puppet manifests you find yourself needing data: things like the local resolver, SMTP relay, SNMP contact, root aliases and so on. Once you start thinking about it, the amount of data you deal with is quite staggering.

It’s strange then that Puppet provides no way to work with this in a flexible way.  By flexible I mean:

  • A way to easily retrieve it
  • A way to choose data per host, domain, location, data center or any other criteria you could possibly wish
  • A way to provide defaults that allow your code to degrade gracefully
  • A way to make it a critical error should expected data not exist
  • A way that works with LDAP nodes, External Nodes or normal node{} blocks

This is quite a list of requirements, and in vanilla Puppet you’d need to use case statements, if statements and so on.

For example, here’s a use case: set the SNMP contact and root user alias.  Machines for a specific client should have different contact details from the rest, and even individual machines may need their own contact details.  There should be a fallback default value should nothing be set specifically for a host.

You might attempt to do this with case and if statements:

class snmp::config {
   if $fqdn == "some.box.your.com" {
      $contactname = "Foo Sysadmin"
      $contactemail = "sysadmin@foo.com"
   }
 
   if $domain == "bar.com" {
      $contactname = "Bar Sysadmin"
      $contactemail = "sysadmin@bar.com"
   }
 
   if $location == "ldn_dc" and (! $contactname and ! $contactemail) {
      $contactname = "London Sysadmin"
      $contactemail = "ldnops@your.com"
   }
 
   if (! $contactname and ! $contactemail) {
      $contactname = "Sysadmin"
      $contactemail = "sysadmin@you.com"
   }
}

You can see that this might work, but your data ends up scattered all over the code, and soon enough you’ll be nesting selectors in case statements inside if statements.  It’s very unwieldy, not to mention not reusable throughout your code.

On top of that, if you wish to add more specifics in the future you will need tools like grep and find to locate every place in your code where you use this data and update them all.  You could of course come up with one file that contains all this logic, but it would be awful.  I’ve tried it, and it’s not viable.

What we really want to write is just the following, and it should take care of all the code above.  You should be able to call it wherever you want, with complete disregard for the specifics of the overrides in the data:

$contactname = extlookup("contactname")
$contactemail = extlookup("contactemail")

I’ve battled for ages with ways to deal with this and have come up with something that fits the bill perfectly.  I’ve been using it and promoting it for almost a year now and so far have found it to be a total life saver.

Sticking with the example above, first we should configure a lookup order that will work for us.  Here is the one I actually use:

$extlookup_precedence = ["%{fqdn}", "location_%{location}", "domain_%{domain}", "country_%{country}", "common"]

This sets up the lookup code to first look for data specified for the host, then the location the host is hosted at, then the domain, country and eventually a set of defaults.
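
To make the precedence concrete, here is a minimal Ruby sketch of how such a list might be expanded into an ordered list of data files for one host.  It is not the actual extlookup code: the helper names (expand, candidate_files) and the fact values are made up for illustration, and treating an unknown variable as an empty string is an assumption.

```ruby
# Hypothetical helper: replace %{name} placeholders with values from vars.
# Unknown names become an empty string (an assumption, not extlookup's rule).
def expand(template, vars)
  template.gsub(/%\{(\w+)\}/) { vars.fetch($1, "") }
end

# Hypothetical helper: turn a precedence list into CSV file names,
# most specific first.
def candidate_files(precedence, vars)
  precedence.map { |entry| expand(entry, vars) + ".csv" }
end

precedence = ["%{fqdn}", "location_%{location}", "domain_%{domain}",
              "country_%{country}", "common"]

# Example facts for one host (illustrative values only).
vars = {
  "fqdn"     => "some.box.your.com",
  "location" => "ldn_dc",
  "domain"   => "your.com",
  "country"  => "uk"
}

puts candidate_files(precedence, vars)
```

For this host the list starts with some.box.your.com.csv and ends with common.csv, which is the order in which data would be searched.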

My current version of this code uses CSV files to store the data simply because it was convenient and universally available with no barrier to entry.  It would be trivial to extend the code to use a database, LDAP or other system like that.

For my example if I put into the file some.box.your.com.csv the following:

contactemail,sysadmin@foo.com
contactname,Foo Sysadmin

And in common.csv if I put:

contactemail,sysadmin@you.com
contactname,Sysadmin

The lookup code will use this data whenever extlookup("contactemail") gets called on that machine, but will use the default when called from other hosts.  If you follow the logic above you’ll see that simple data files can completely replace the if statements shown earlier.
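
The first-match-wins behaviour described here can be sketched in a few lines of Ruby.  This is only an illustration under stated assumptions, not the real extlookup implementation: the function name first_match is hypothetical, the files are written to a temporary directory, and each row is assumed to be a simple key,value pair.

```ruby
require "tmpdir"

# Hypothetical sketch: return the first value found for key, searching
# the given CSV files from most specific to least specific.
def first_match(key, files)
  files.each do |file|
    next unless File.exist?(file)          # a missing level is simply skipped
    File.foreach(file) do |line|
      name, value = line.chomp.split(",", 2)
      return value if name == key          # first hit wins
    end
  end
  nil                                      # nothing found at any level
end

Dir.mktmpdir do |dir|
  specific = File.join(dir, "some.box.your.com.csv")
  common   = File.join(dir, "common.csv")
  File.write(specific, "contactemail,sysadmin@foo.com\ncontactname,Foo Sysadmin\n")
  File.write(common,   "contactemail,sysadmin@you.com\ncontactname,Sysadmin\n")

  # On some.box.your.com both files are candidates, so the host file wins.
  puts first_match("contactemail", [specific, common])

  # On any other host there is no host file, so common.csv supplies the value.
  puts first_match("contactemail", [File.join(dir, "other.host.csv"), common])
end
```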
 
Using a system like this you can model all your data needs and keep the data that is specific to a location, machine, domain and so on outside of your manifests.

The code is very flexible.  You can reuse existing variables from your manifests inside your data, for example:

ntpservers,1.pool.%{country}.ntp.org,2.pool.%{country}.ntp.org

In this case if you have $country defined in your manifest the code will use this variable and put it into the answer.  This snippet of data also shows that it supports arrays.
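
As a rough illustration of both points, the Ruby sketch below parses one data row, substitutes variables and returns an array when a key has more than one value.  The function name parse_row is hypothetical and the rules it applies (a single value stays a string, an unknown variable becomes an empty string) are assumptions, not a description of the real extlookup code.

```ruby
# Hypothetical sketch: parse one "key,value[,value...]" row.
# %{name} placeholders in the values are filled in from vars.
def parse_row(line, vars)
  key, *values = line.chomp.split(",")
  values = values.map do |value|
    value.gsub(/%\{(\w+)\}/) { vars.fetch($1, "") }
  end
  # One value stays a plain string, several values become an array.
  [key, values.length == 1 ? values.first : values]
end

row = "ntpservers,1.pool.%{country}.ntp.org,2.pool.%{country}.ntp.org"
key, value = parse_row(row, { "country" => "uk" })

puts key            # ntpservers
puts value.inspect  # ["1.pool.uk.ntp.org", "2.pool.uk.ntp.org"]
```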

Here is another use case:

package{"screen":
   ensure => extlookup("pkg_screen", "absent")
}

This code ensures that, unless otherwise specified, screen is not installed on any of my servers.  I could now decide that all machines in a domain, a location or a country, or just specific hosts, should have screen installed by simply setting pkg_screen to present in the relevant data file.
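
The difference between a lookup with a default and one without can be sketched as follows.  This is a simplified Ruby model rather than the real extlookup code: the data levels are plain hashes instead of CSV files, and the names lookup, MissingDataError and NO_DEFAULT are hypothetical.  It assumes that a missing key with no default should be a hard error, in line with the requirements listed at the top of this post.

```ruby
# Hypothetical error raised when expected data does not exist.
class MissingDataError < StandardError; end

# Sentinel so that nil or false can still be passed as a real default.
NO_DEFAULT = Object.new

# Hypothetical sketch: levels is an array of hashes, most specific first.
def lookup(key, levels, default = NO_DEFAULT)
  levels.each { |level| return level[key] if level.key?(key) }
  raise MissingDataError, "no data found for '#{key}'" if default.equal?(NO_DEFAULT)
  default
end

host_data   = { "pkg_screen" => "present" }   # override for one host
common_data = {}                              # nothing set by default

puts lookup("pkg_screen", [host_data, common_data], "absent")  # present
puts lookup("pkg_screen", [common_data], "absent")             # absent

begin
  lookup("contactname", [common_data])        # no default: critical error
rescue MissingDataError => e
  puts "error: #{e.message}"
end
```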

This makes the code not only configurable but configurable in a way that suits every possible case as it depends on the precedence defined above.  If your use case does not rely on countries for example you can just replace the country ordering with whatever works for you.

I use this code in all my manifests and it’s helped me to make an extremely configurable set of manifests.  It has proven to be very flexible as I can use the same code for different clients in different industries and with different needs and network layouts without changing the code.

The code as it stands is available here: http://www.devco.net/code/extlookup.rb

Follow the comments in the script for install instructions and full usage guides.