A Puppet 4 Hiera Based Node Classifier

When I first wrote Hiera I included a simple little hack called hiera_include() that would do a Array lookup and include everything it found. I only included it even because include at the time did not take Array arguments. In time this has become quite widely used and many people do their node classification using just this and the built in hierarchical nature of Hiera.

I’ve always wanted to do better though, like maybe write an actual ENC that uses Hiera data keys on the provided certname? Seemed like the only real win would be to be able to set the node environment from Hiera, I guess this might be valuable enough on it’s own.

Anyway, I think the ENC interface is really pretty bad and should be replaced by something better. So I’ve had the idea of a Hiera based classifier in my mind for years.

Some time ago Ben Ford made a interesting little hack project that used a set of rules to classify nodes and this stuck to my mind as being quite a interesting approach. I guess it’s a bit like the new PE node classifier.

Anyway, so I took this as a starting point and started working on a Hiera based classifier for Puppet 4 – and by that I mean the very very latest Puppet 4, it uses a bunch of the things I blogged about recently and the end result is that the module is almost entirely built using the native Puppet 4 DSL.

Simple list-of-classes based Classification

So first lets take a look at how this replaces/improves on the old hiera_include().

Not really much to be done I am afraid, it’s an array with some entries in it. It now uses the Knockout Prefix features of Puppet Lookup that I blogged about before to allow you to exclude classes from nodes:

So we want to include the sysadmins and sensu classes on all nodes, stick this in your common tier:

# common.yaml
classifier::extra_classes:
 - sysadmins
 - sensu

Then you have some nodes that need some more classes:

# clients/acme.yaml
classifier::extra_classes:
 - acme_sysadmins

At this point it’s basically same old same old, but lets see if we had some node that needed Nagios and not Sensu:

# nodes/example.net.yaml
classifier::extra_classes:
 - --sensu
 - nagios

Here we use the knockout prefix of — to remove the sensu class and add the nagios one instead. That’s already a big win from old hiera_include() but to be fair this is just as a result of the new Lookup features.

It really gets interesting later when you throw in some rules.

Rule Based Classification

The classifier is built around a set of Classifications and these are made up of one or many rules per Classification which if they match on a host means a classification applies to the node. And the classifications can include classes and create data.

Here’s a sample rule where I want to do some extra special handling of RedHat like machines. But I want to handle VMs different from Physical machines.

# common.yaml
classifier::rules:
  RedHat VMs:
    match: all
    rules:
      - fact: "%{facts.os.family}"
        operator: ==
        value: RedHat
      - fact: "%{facts.is_virtual}"
        operator: ==
        value: "true"
    data:
      redhat_vm: true
    classes:
      - centos::vm

  RedHat:
    rules:
      - fact: "%{facts.os.family}"
        operator: ==
        value: RedHat
    data:
      redhat_os: true
    classes:
      - centos::common

This shows 2 Classifications one called “RedHat VMs” and one just “RedHat”, you can see the VMs one contains 2 rules and it sets match: all so they both have to match.

End result here is that all RedHat machines get centos::common and RedHat VMs also get centos::vm. Additionally 2 pieces of data will be created, a bit redundant in this example but you get the idea.

Using the Classifier

So using the classifier in the basic sense is just like hiera_include():

node default {
  include classifier
}

This will process all the rules and include the resulting classes. It will also expose a bunch of information via this class, the most interesting is $classifier::data which is a Hash of all the data that the rules emit. But you can also access the the included classes via $classifier::classes and even the whole post processed classification structure in $classifier::classification. Some others are mentioned in the README.

You can do very impressive Hiera based overrides, here’s an example of adjusting a rule for a particular node:

# clients/acme.yaml
classifier::rules:
  RedHat VMs:
    classes:
      - some::other
    data:
      extra_data: true

This has the result that for this particular client additional data will be produced and additional classes will be included – but only on their RedHat VMs. You can even use the knockout feature here to really adjust the data and classes.

The classes get included automatically for you and if you set classifier::debug you’ll get a bunch of insight into how classification happens.

Hiera Inception

So at this point things are pretty neat, but I wanted to both see how the new Data Provider API look and also see if I can expose my classifier back to Hiera.

Imagine I am making all these classifications but with what I shown above it’s quite limited because it’s just creating data for the $classifier::data hash. What you really want is to create Hiera data and be able to influence Automatic Parameter Lookup.

So a rule like:

# clients/acme.yaml
classifier::rules:
  RedHat:
    data:
      centos::common::selinux: permissive

Here I am taking the earlier RedHat rule and setting centos::common::selinux: permissive, now you want this to be Data that will be used by the Automatic Parameter Lookup system to set the selinux parameter of the centos::common class.

You can configure your Environment with this hiera.yaml

# environments/production/hiera.yaml
---
version: 4
datadir: "hieradata"
hierarchy:
  - name: "%{trusted.certname}"
    backend: "yaml"

  - name: "classification data"
    backend: "classifier"
 
  # ... and the rest

Here I allow node specific YAML files to override the classifier and then have a new Data Provider called classifier that expose the classification back to Hiera. Doing it this way is super important, the priority the classifier have on a site is not a single one size fits all choice, doing it this way means the site admins can decide where in their life classification site so it best fits their workflows.

So this is where the inception reference comes in, you extract data from Hiera, process it using the Puppet DSL and expose it back to Hiera. At first thought this is a bit insane but it works and it’s really nice. Basically this lets you completely redesign hiera from something that is Hierarchical in nature and turn it into a rule based system – or a hybrid.

And you can even test it from the CLI:

% puppet lookup --compile --explain centos::common::selinux
Merge strategy first
  Data Binding "hiera"
    No such key: "centos::common::selinux"
  Data Provider "Hiera Data Provider, version 4"
    ConfigurationPath "environments/production/hiera.yaml"
    Merge strategy first
      Data Provider "%{trusted.certname}"
        Path "environments/production/hieradata/dev2.devco.net.yaml"
          Original path: "%{trusted.certname}"
          No such key: "centos::common::selinux"
      Data Provider "classification data"
        Found key: "centos::common::selinux" value: "permissive"
      Merged result: "permissive"
  Merged result: "permissive"

I hope to expose here which rule provided this data like the other lookup explanations do.

Clearly this feature is a bit crazy, so consider this a exploration of what’s possible rather than a strong endorsement of this kind of thing 🙂

Implementation

Implementing this has been pretty interesting, I got to use a lot of the new Puppet 4 features. Like I mentioned all the data processing, iteration and deriving of classes and data is done using the native Puppet DSL, take a look at the functions directory for example.

It also makes use of the new Type system and Type Aliases all over the place to create a strong schema for the incoming data that gets validated at all levels of the process. See the types directory.

The new Modules in Data is used to set lookup strategies so that there are no manual calling of lookup(), see the module data.

Writing a Data Provider ie. a Hiera Backend for the new lookup system is pretty nice, I think the APIs around there is still maturing so definitely bleeding edge stuff. You can see the bindings and data provider in the lib directory.

As such this module only really has a hope of working on Puppet 4.4.0 at least, and I expect to use new features as they come along.

Conclusion

There’s a bunch more going on, check the module README. It’s been quite interesting to be able to really completely rethink how Hiera data is created and what a modern take on classification can achieve.

With this approach if you’re really not too keen on the hierarchy you can totally just use this as a rules based Hiera instead, that’s pretty interesting! I wonder what other strategies for creating data could be prototyped like this?

I realise this is very similar to the PE node classifier but with some additional benefits in being exposed to Hiera via the Data Provider, being something you can commit to git and being adjustable and overridable using the new Hiera features I think it will appeal to a different kind of user. But yeah, it’s quite similar. Credit to Ben Ford for his original Ruby based implementation of this idea which I took and iterated on. Regardless the ‘like a iTunes smart list’ node classifier isn’t exactly a new idea and have been discussed for literally years 🙂

You can get the module on the forge as ripienaar/classifier and I’d greatly welcome feedback and ideas.