R.I.Pienaar

A look at the Puppet 4 Application Orchestration feature

07/28/2016

Puppet 4 got some new language constructs that let you model multi node applications and that assist with passing information between nodes for you. I recently wrote an open source orchestrator for this stuff as part of my Choria suite, so I figured I’d write up a bit about these multi node applications since they are now usable in open source.

The basic problem this feature solves is passing details between modules. Let’s say you have a LAMP stack: you’re going to have web apps that need access to a DB, and that DB will have an IP, user, password etc. Exported resources never worked for this because it’s just stupid to have your DB exporting web resources – there is an unlimited variety of web apps and configs, and no DB module supports them all. So something new is needed.

The problem is made worse by the fact that Puppet does not really have something like a Java interface. You can’t say that the foo-mysql module implements a standard interface called database so that you can swap out one mysql module for another – they’re all snowflakes. So an intermediate translation layer is needed.

In a way you can say this new feature brings a way to create an interface – let’s say SQL – and allows you to hook arbitrary modules into both sides of the interface: on one end a database and on the other a webapp. Puppet will then transfer the information across the interface for you, feeding the web app with knowledge of port, user, hosts etc.

LAMP Stack Walkthrough


Let’s walk through creating a standard definition for a multi node LAMP stack, and we’ll create 2 instances of the entire stack. It will involve 4 machines sharing data and duties.

These interfaces are called capabilities, here’s an example of a SQL one:

Puppet::Type.newtype :sql, :is_capability => true do
  newparam :name, :is_namevar => true
  newparam :user
  newparam :password
  newparam :host
  newparam :database
end

This is a generic interface to a database; you can imagine Postgres or MySQL etc. can all satisfy this interface. Perhaps you could add a field here to convey the type of database, but let’s keep it simple. The capability provides a translation layer between 2 unrelated modules.
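If you did want that, it would just be another parameter on the capability – a hypothetical variant with a flavour field added:

Puppet::Type.newtype :sql, :is_capability => true do
  newparam :name, :is_namevar => true
  newparam :user
  newparam :password
  newparam :host
  newparam :database
  newparam :flavour # hypothetical: "mysql", "postgres", etc.
end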

It’s a pretty big deal conceptually. I can see down the line there being some blessed official capabilities, and we’ll see forge modules starting to declare their compatibility. And finally we can get to a world of interchangeable infrastructure modules.

Now I’ll create a defined type to make the database for my LAMP stack app. I’m just going to stick a notify in instead of actually creating a database to keep it easy to demo:

define lamp::mysql (
  $db_user,
  $db_password,
  $host     = $::hostname,
  $database = $name,
) {
  notify{"creating mysql db ${database} on ${host} for user ${db_user}": }
}

I need to tell Puppet this defined type exists to satisfy the producing side of the interface. There’s some new language syntax to do this – it feels kind of out of place not having a logical file to stick this in, so I just put it in my lamp/manifests/mysql.pp:

Lamp::Mysql produces Sql {
  user     => $db_user,
  password => $db_password,
  host     => $host,
  database => $database,
}

Here you can see the mapping from the variables in the defined type to those in the capability above. $db_user feeds into the capability property $user etc.

With this in place, whether you have a lamp::mysql or one based on some other database, you can always query its properties via the standard user, password etc. – more on that below.
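For instance, a hypothetical Postgres backed equivalent could produce the very same capability and any consumer would be none the wiser. A sketch, assuming a lamp::postgres defined type of your own:

define lamp::postgres (
  $db_user,
  $db_password,
  $host     = $::hostname,
  $database = $name,
) {
  notify{"creating postgres db ${database} on ${host} for user ${db_user}": }
}

Lamp::Postgres produces Sql {
  user     => $db_user,
  password => $db_password,
  host     => $host,
  database => $database,
}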

So we have a database and we want to hook a web app onto it. Again we use a defined type for this, and again just notifies to show the data flow:

define lamp::webapp (
  $db_user,
  $db_password,
  $db_host,
  $db_name,
  $docroot = '/var/www/html'
) {
  notify{"creating web app ${name} with db ${db_user}@${db_host}/${db_name}": }
}

As this is the other end of the translation layer enabled by the capability, we tell Puppet that this defined type consumes a Sql capability:

Lamp::Webapp consumes Sql {
  db_user     => $user,
  db_password => $password,
  db_host     => $host,
  db_name     => $database,
}

This tells Puppet to read the value of user from the capability and stick it into db_user of the defined type. Thus we can plumb arbitrary modules found on the forge together with a translation layer between their properties!

So you have a data producer and a data consumer that communicate across a translation layer called a capability.

The final piece of the puzzle that defines our LAMP application stack is again some new language features:

application lamp (
  String $db_user,
  String $db_password,
  Integer $web_instances = 1
) {
  lamp::mysql { $name:
    db_user     => $db_user,
    db_password => $db_password,
    export      => Sql[$name],
  }
 
  range(1, $web_instances).each |$instance| {
    lamp::webapp {"${name}-${instance}":
      consume => Sql[$name],
    }
  }
}

Pay particular attention to the application keyword and the export and consume metaparameters here. These tell the system to feed data through the translation layer created above, between these defined types.

You should kind of think of lamp::mysql and lamp::webapp as node roles; they define what an individual node will do in this stack. If I create this application and set $web_instances = 10 I will need 1 x database machine and 10 x web machines. You can cohabit some of these roles on one machine but I think that’s an anti pattern. And since these are different nodes – as in entirely different machines – the magic here is that the capability based data system will feed these variables from one node to the next without you having to create any specific data on your web instances.

Finally, much like a traditional node, we now have a site which defines a bunch of nodes and allocates resources to them.

site {
  lamp{'app2':
    db_user       => 'user2',
    db_password   => 'secr3t',
    web_instances => 3,
    nodes         => {
      Node['dev1.example.net'] => Lamp::Mysql['app2'],
      Node['dev2.example.net'] => Lamp::Webapp['app2-1'],
      Node['dev3.example.net'] => Lamp::Webapp['app2-2'],
      Node['dev4.example.net'] => Lamp::Webapp['app2-3']
    }
  }
 
  lamp{'app1':
    db_user       => 'user1',
    db_password   => 's3cret',
    web_instances => 3,
    nodes         => {
      Node['dev1.example.net'] => Lamp::Mysql['app1'],
      Node['dev2.example.net'] => Lamp::Webapp['app1-1'],
      Node['dev3.example.net'] => Lamp::Webapp['app1-2'],
      Node['dev4.example.net'] => Lamp::Webapp['app1-3']
    }
  }
}

Here we are creating two instances of the LAMP application stack, each with its own database and with 3 web servers assigned to the cluster.

You have to be super careful about this stuff. If I tried to put the Mysql for app1 on dev1 and the Mysql for app2 on dev2 this would basically just blow up – it would be a cyclic dependency across the nodes. You’re generally best off avoiding sharing nodes across many app stacks, and if you do you need to really think this stuff through. It’s a pain.

You now have this giant multi node monolith with ordering problems not just between resources but between nodes too.

Deployment


Deploying these stacks with the abilities the system provides is pretty low tech. If you take a look at the site above you can infer the dependencies: first we have to run dev1.example.net, which will both produce the needed data and install the needed infrastructure, and then we can run all the web nodes in any order or even at the same time.

There’s a problem though: traditionally Puppet runs every 30 minutes and gets a new catalog every 30 minutes. We can’t have these nodes randomly get catalogs in random order since there’s no giant network aware lock/ordering system. So Puppet now has a new model: nodes are supposed to run cached catalogs forever and only get a new catalog when specifically told to. You tell it to deploy this stack, and once deployed Puppet goes into a remediation cycle, fixing the stack as it is with an unchanging catalog. If you want to change code, you again have to run this entire stack in this specific order.
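Under the hood this model maps to the agent’s use_cached_catalog setting. A minimal puppet.conf sketch – with the assumption that an orchestrator then triggers on demand runs with --no-use_cached_catalog when rolling out new code:

[agent]
# keep applying the catalog we already have, never fetch a new one on schedule
use_cached_catalog = true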

This is a nice improvement for release management and knowing your state, but without tooling to manage this process it’s a fail, and today that tooling is embryonic and PE only.

So Choria, which I released in Beta yesterday, provides at least some relief: it brings a manual orchestrator for these things so you can kick off an app deploy on demand. Later maybe some daemon will do this regularly, I don’t know yet.

Let’s take a look at Choria interacting with the above manifests, starting with the plan it computes: it shows all the defined stacks in your site and groups them in terms of what can run in parallel and in what order.

Let’s deploy the stack. Choria is used again, and it uses MCollective to do the runs via the normal Puppet agent. It tries to avoid humans interfering with a stack deploy by disabling and enabling Puppet at various stages.

It has options to limit the runs to a certain node batch size so you don’t nuke your entire site at once etc.

Let’s look at some of the logs and notifies:

07:46:53 dev1.example.net> puppet-agent[27756]: creating mysql db app2 on dev1 for user user2
07:46:53 dev1.example.net> puppet-agent[27756]: creating mysql db app1 on dev1 for user user1
 
07:47:57 dev4.example.net> puppet-agent[27607]: creating web app app2-3 with db user2@dev1/app2
07:47:57 dev4.example.net> puppet-agent[27607]: creating web app app1-3 with db user1@dev1/app1
 
07:47:58 dev2.example.net> puppet-agent[23728]: creating web app app2-1 with db user2@dev1/app2
07:47:58 dev2.example.net> puppet-agent[23728]: creating web app app1-1 with db user1@dev1/app1
 
07:47:58 dev3.example.net> puppet-agent[23728]: creating web app app2-2 with db user2@dev1/app2
07:47:58 dev3.example.net> puppet-agent[23728]: creating web app app1-2 with db user1@dev1/app1

All our data flowed nicely through the capabilities and the stack was built with the right usernames and passwords etc. Timestamps reveal dev{2,3,4} ran concurrently thanks to MCollective.

Conclusion


To be honest, this whole feature feels like an early tech preview and not something that should be shipped. This is basically the plumbing a user friendly feature should be built on, and that hasn’t happened yet. You can see from the above that it’s super complex – and you even have to write some pure Ruby stuff, wow.

If you wanted to figure this out from the docs, forget about it – the docs are a complete mess. I found a guide in the Learning VM which turned out to be the best resource showing an actual complete walkthrough. This is sadly par for the course with Puppet docs these days 🙁

There are some more features here – you can make cross node monitor checks to confirm the DB is actually up before attempting to start the web server, for example. Interesting. But implementing new checks is just such a chore – I can do it, but I doubt your average user will be bothered. Just make it so we can run Nagios checks: there are 1000s of these already written and we all have them and trust them. To be fair, I could probably write a generic Nagios checker myself for this; I doubt the average user can.

The way nodes depend on each other and are ordered is of course obvious – it should be this way, and these are actual dependencies. But at the same time this is run stages writ large. Stages failed because they layer a giant meta dependency over your catalog, and a failure in any one stage results in skipping entire other, later, stages. They’re a pain in the arse, hard to debug and hard to reason about. This feature implements the exact same model, but across nodes. Worse, there does not seem to be a way to do cross node notifies of resources. It’s just as horrible.

That said, given that this works as a graph across nodes, it’s the only real option. That outcome should have been enough to dissuade the graph approach from even being taken and something new should have been done, alas. It’s a very constrained system: it demos well, but building infrastructure with this is going to be a losing battle.

The site manifest has no data capabilities. You can’t really use hiera/lookup there in any sane way. This is unbelievable. I know there was a general lack of caring for external data at Puppet, but this is like being back in the Puppet 0.22 days before even extlookup existed, and about as usable. It’s unbelievable that there are no features for programmatic node assignment to roles, for example – though given how easy it is to make cycles and impossible scenarios I can see why. I know this is something being worked on though. External data is first class. External data modelling has to inform everything you do. No serious user uses Puppet without external data. It has to be a day 1 concern.

The underlying Puppet Server infrastructure that builds these catalogs is ok, I guess, but the catalog is very hard to consume, and everyone who wants to write a thing to interact with it will have to write some terrible sorting/ordering logic themselves – and probably all have their own set of interesting bugs. Hopefully one day there will be a gem or something, or just a better catalog format. Worse, it seems to happily compile and issue cyclic graphs without error; I filed a ticket for that.

The biggest problem for me is that this sits in the worst place of intersection between PE and OSS Puppet: it is hard or impossible to find out the roadmap, or the activity on this feature set, since it’s all behind private Jira tickets. Sometimes some bubble up and become public, but generally it’s a black hole.

Long story short, I think it’s just best avoided in general until it becomes more mature and it’s more visible what is happening. The technical issues are fine – it’s a new feature that’s basically new R&D, this stuff happens. The softer issues make it a tough one to consider using.

Fixing the mcollective deployment story

07/27/2016

Getting started with MCollective has always been an adventure: you have to learn a ton of new stuff like middleware etc. And once you get that going, the docs present you with a vast array of options and choices, including such arcane topics as which security plugin to use – while the security model chosen is entirely unique to mcollective. To get a true feeling for the horror see the official deployment guide.

This is not really a pleasant experience and probably results in many insecure or half built deployments out there – and most people just not bothering. This is of course entirely my fault; too many options with badly chosen defaults are to blame.

I saw the graph of the learning curve of Eve Online and it immediately made me think of mcollective 🙂 Hint: mcollective is not the WoW of orchestration tools.

I am in the process of moving my machines to Puppet 4 and the old deployment methods for MCollective just did not work; everything is falling apart under the neglect the project has been experiencing. You can’t even install any plugin packages on Debian as they will nuke your entire Puppet install etc.

So I figured why not take a stab at rethinking this whole thing and see what I can do. Today I’ll present the outcome of that – a new Beta distribution of MCollective tailored to the Puppet 4 AIO packaging that’s very easy to get going securely.

Overview


My main goal with these plugins was that they share as much security infrastructure with Puppet as possible. This means we get an understandable model and do not need to mess around with custom CAs and certs and so forth. Focussing on AIO Puppet means I can have sane defaults that work for everyone out of the box with very limited config. The deployment guide should be a single short page.

For a new user who has never used MCollective and now needs certificates, there should be no need to write a crazy ~/.mcollective file and configure a ton of SSL stuff; they should only need to do:

$ mco choria request_cert

This will make a CSR, submit it to the PuppetCA and wait for it to be signed like Puppet Agent. Once signed they can immediately start using MCollective. No config needed. No certs to distribute. Secure by default. Works with the full AAA stack by default.

Sites may wish to have tighter than default security around what actions can be made, and deploying these policies should be trivial.

Introducing Choria


Choria is a suite of plugins developed specifically with the Puppet AIO user in mind. It rewards using Puppet as designed with defaults and can yield a near zero configuration setup. It combines with a new mcollective module used to configure AIO based MCollective.

The deployment guide for a Choria based MCollective is a single short page. The result is:

  • A Security Plugin that uses the Puppet CA
  • A connector for NATS
  • A discovery cache that queries PuppetDB using the new PQL language
  • An open source Application Orchestrator for the new Puppet Multi Node Application stuff (naming is apparently still hard)
  • Puppet Agent, Package Agent, Service Agent and File Manager Agent all set up and ready to use
  • SSL and TLS used everywhere; any packet that leaves a node is secure. This cannot be turned off
  • A new packager that produces Puppet Modules for your agents etc. and supports every OS AIO Puppet does
  • The full Authentication, Authorization and Auditing stack set up out of the box, with secure default settings
  • Deployment scenarios that work by default, extensive support for SRV records and lightweight manual configuration for those with custom needs

It’s easy to configure using the new lookup system and gives you a full, secure, usable mcollective out of the box with minimal choices to make.

You can read how to deploy it at its deployment guide.

Status


This is really a Beta release at the moment; I’m looking for testers and feedback. I am particularly interested in feedback on NATS and the basic deployment model. In future I might give the current connectors the same treatment with chosen defaults etc.

The internals of the security plugin are quite interesting: it proposes a new internal message structure for MCollective which should be much easier to support in other languages and is more formalised. To be clear, these messages always existed, they were just a bit ad hoc.

Additionally it’s the first security plugin with specific support for building a quality, web stack compatible MCollective REST server that’s AAA compatible and would even allow centralised RBAC and signature authority.

Interacting with the Puppet CA from Ruby

07/20/2016

I recently ran into a known bug with the puppet certificate generate command that made it useless to me for creating user certificates.

So I had to do the CSR dance from Ruby myself to work around it, it’s quite simple actually but as with all things in OpenSSL it’s weird and wonderful.

Since the Puppet Agent is written in Ruby and it can do this, there’s an HTTP API somewhere. These are documented reasonably well – see /puppet-ca/v1/certificate_request/ and /puppet-ca/v1/certificate/. Not covered is how to make the CSRs and such.

First I have a little helper to make the HTTP client:

require "net/http"
require "openssl"

def ca_path; "/home/rip/.puppetlabs/etc/puppet/ssl/certs/ca.pem";end
def cert_path; "/home/rip/.puppetlabs/etc/puppet/ssl/certs/rip.pem";end
def key_path; "/home/rip/.puppetlabs/etc/puppet/ssl/private_keys/rip.pem";end
def csr_path; "/home/rip/.puppetlabs/etc/puppet/ssl/certificate_requests/rip.pem";end
def has_cert?; File.exist?(cert_path);end
def has_ca?; File.exist?(ca_path);end
def already_requested?;!has_cert? && File.exist?(key_path);end
 
def http
  http = Net::HTTP.new(@ca, 8140)
  http.use_ssl = true
 
  if has_ca?
    http.ca_file = ca_path
    http.verify_mode = OpenSSL::SSL::VERIFY_PEER
  else
    http.verify_mode = OpenSSL::SSL::VERIFY_NONE
  end
 
  http
end

This is an HTTPS client that uses full verification of the remote host if we have a CA. There’s a small chicken and egg problem where you have to ask the CA for its own certificate over an unverified connection. If this is a problem for you, you need to arrange to put the CA on the machine in a safe manner.

Let’s fetch the CA:

def fetch_ca
  return true if has_ca?
 
  req = Net::HTTP::Get.new("/puppet-ca/v1/certificate/ca", "Content-Type" => "text/plain")
  resp, _ = http.request(req)
 
  if resp.code == "200"
    File.open(ca_path, "w", 0o644) {|f| f.write(resp.body)}
    puts("Saved CA certificate to %s" % ca_path)
  else
    abort("Failed to fetch CA from %s: %s: %s" % [@ca, resp.code, resp.message])
  end
 
  has_ca?
end

At this point we have fetched and saved the CA; future requests will be verified against it. If you put the CA there using some other means this will do nothing.

Now we need to start making our CSR. First we have to make a private key; this is a 4096 bit key saved in PEM format:

def write_key
  key = OpenSSL::PKey::RSA.new(4096)
  File.open(key_path, "w", 0o640) {|f| f.write(key.to_pem)}
  key
end

And the CSR needs to be made using this key. Puppet CSRs are quite simple with few fields filled in – I can’t see why you couldn’t fill in more fields, and of course it now supports extensions. I didn’t add any of those here, just an OU:

def write_csr(key)
  csr = OpenSSL::X509::Request.new
  csr.version = 0
  csr.public_key = key.public_key
  csr.subject = OpenSSL::X509::Name.new(
    [
      ["CN", @certname, OpenSSL::ASN1::UTF8STRING],
      ["OU", "my org", OpenSSL::ASN1::UTF8STRING]
    ]
  )
  csr.sign(key, OpenSSL::Digest::SHA1.new)
 
  File.open(csr_path, "w", 0o644) {|f| f.write(csr.to_pem)}
 
  csr.to_pem
end

Let’s combine these to make the key and CSR and send the request to the Puppet CA; this request is verified using the CA:

def request_cert
  req = Net::HTTP::Put.new("/puppet-ca/v1/certificate_request/%s?environment=production" % @certname, "Content-Type" => "text/plain")
  req.body = write_csr(write_key)
  resp, _ = http.request(req)
 
  if resp.code == "200"
    puts("Requested certificate %s from %s" % [@certname, @ca])
  else
    abort("Failed to request certificate from %s: %s: %s: %s" % [@ca, resp.code, resp.message, resp.body])
  end
end

You’ll now have to sign the cert on your Puppet CA as normal, or use autosign, nothing new here.
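On the CA host that’s the familiar dance; with the certname used in these examples it would be something like:

$ puppet cert list
$ puppet cert sign rip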

And finally you can attempt to fetch the cert. This method is designed to return false if the cert is not yet ready on the master – i.e. not signed yet.

def attempt_fetch_cert
  return true if has_cert?
 
  req = Net::HTTP::Get.new("/puppet-ca/v1/certificate/%s" % @certname, "Content-Type" => "text/plain")
  resp, _ = http.request(req)
 
  if resp.code == "200"
    File.open(cert_path, "w", 0o644) {|f| f.write(resp.body)}
    puts("Saved certificate to %s" % cert_path)
  end
 
  has_cert?
end

Pulling this all together you have some code to make the keys and CSR, cache the CA and request that a cert be signed; it will then wait for the cert like Puppet does until things are signed.

def main
  abort("Already have a certificate '%s', cannot continue" % @certname) if has_cert?
 
  make_ssl_dirs
  fetch_ca
 
  if already_requested?
    puts("Certificate %s has already been requested, attempting to retrieve it" % @certname)
  else
    puts("Requesting certificate for '%s'" % @certname)
    request_cert
  end
 
  puts("Waiting up to 120 seconds for it to be signed")
  puts
 
  12.times do |time|
    print "Attempting to download certificate %s: %d / 12\r" % [@certname, time]
 
    break if attempt_fetch_cert
 
    sleep 10
  end
 
  abort("Could not fetch the certificate after 120 seconds") unless has_cert?
 
  puts("Certificate %s has been stored in %s" % [@certname, ssl_dir])
end
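The snippets above reference @ca and @certname without showing where they get set; a hypothetical way to tie it together – purely a structural sketch, not runnable as-is – is a small class that holds those and carries the methods:

class CertRequester
  def initialize(ca, certname)
    @ca = ca               # hostname of your Puppet CA
    @certname = certname   # the CN to request, "rip" in the paths above
  end

  # ... paste the helper methods and main from above in here ...
end

CertRequester.new("puppet.example.net", "rip").main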

Hiera Node Classifier 0.7

04/19/2016

A while ago I released a Puppet 4 Hiera based node classifier to see what is next for hiera_include(). This had the major drawback that you couldn’t set an environment with it like with a real ENC since Puppet just doesn’t have that feature.

I’ve released an update to the classifier that now includes a small real ENC which takes care of setting the environment based on certname and then boots up the classifier on the node.

Usage


ENCs tend to know only about the certname. You could imagine getting the most recently seen facts from PuppetDB etc., but I do not really want to assume things about people’s infrastructure, so for now this sticks to supporting classification based on certname only.

It’s really pretty simple. Let’s assume you want to classify node1.example.net: you just need to have a node1.example.net.yaml (or JSON) file somewhere in a path. Typically this is going to be in a directory environment somewhere, but it could of course also be a site wide hiera directory.

In it you put:

classifier::environment: development

And this node will form part of that environment. Past that, everything in the previous post applies, so you make rules or assign classes as normal, and while doing so you have full access to node facts.

The classifier now exposes some extra information to help you determine if the ENC is in use and what file it used to classify the node:

  • $classifier::enc_used – boolean that indicates if the ENC is in use
  • $classifier::enc_source – path to the data file that set the environment. undef when not found
  • $classifier::enc_environment – the environment the ENC is setting
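As a small sketch of using these – nothing you need, just an illustration – you could surface them in a manifest to see how a node was classified:

if $classifier::enc_used {
  notify{"classified into ${classifier::enc_environment} via ${classifier::enc_source}": }
}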

It supports a default environment which you configure when setting up Puppet to use an ENC, as below.

Configuring Puppet


Configuring Puppet is pretty simple for this:

[main]
node_terminus = exec
external_nodes = /usr/local/bin/classifier_enc.rb --data-dir /etc/puppetlabs/code/hieradata --node-pattern nodes/%%.yaml

Apart from these you can pass --default development to default to that environment rather than production, and you can add --debug /tmp/enc.log to get a bunch of debug output.
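Combining those options with the earlier config would look like this:

[main]
node_terminus = exec
external_nodes = /usr/local/bin/classifier_enc.rb --data-dir /etc/puppetlabs/code/hieradata --node-pattern nodes/%%.yaml --default development --debug /tmp/enc.log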

The data-dir above is for your classic Hiera single data dir setup, but you can also use globs to support environment data like --data-dir /etc/puppetlabs/code/environments/*/hieradata. It will now search the entire glob until it finds a match for the certname.

That’s really all there is to it; it produces a classification like this:

---
environment: production
classes:
  classifier:
    enc_used: true
    enc_source: /etc/puppetlabs/code/hieradata/node.example.yaml
    enc_environment: production

Conclusion


That’s about all there is to it. I think this might hit a high percentage of use cases and brings a key ability to the hiera classifiers. It’s a tad annoying that there’s really no way to do better granularity than per node here; I might come up with something else, but I don’t really want to go too deep down that hole.

In future I’ll look at adding a class to install the classifier into some path and configure Puppet; for now that’s up to the user. It’s shipped in the bin dir of the module.

A Puppet 4 Hiera Based Node Classifier

03/22/2016

When I first wrote Hiera I included a simple little hack called hiera_include() that would do an Array lookup and include everything it found. I only included it because include at the time did not take Array arguments. In time this has become quite widely used, and many people do their node classification using just this and the built in hierarchical nature of Hiera.
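For those who have not seen it, the classic pattern is a one liner in site.pp that includes whatever the hierarchy yields for a classes key:

node default {
  hiera_include("classes")
}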

I’ve always wanted to do better though, like maybe write an actual ENC that uses Hiera data keys on the provided certname? It seemed like the only real win would be the ability to set the node environment from Hiera – I guess this might be valuable enough on its own.

Anyway, I think the ENC interface is really pretty bad and should be replaced by something better. So I’ve had the idea of a Hiera based classifier in my mind for years.

Some time ago Ben Ford made an interesting little hack project that used a set of rules to classify nodes, and this stuck in my mind as quite an interesting approach. I guess it’s a bit like the new PE node classifier.

Anyway, I took this as a starting point and started working on a Hiera based classifier for Puppet 4 – and by that I mean the very latest Puppet 4. It uses a bunch of the things I blogged about recently, and the end result is that the module is almost entirely built using the native Puppet 4 DSL.

Simple list-of-classes based Classification


So first let’s take a look at how this replaces/improves on the old hiera_include().

Not much to be done here I am afraid – it’s an array with some entries in it. It now uses the Knockout Prefix feature of Puppet Lookup that I blogged about before to allow you to exclude classes from nodes.

Say we want to include the sysadmins and sensu classes on all nodes – stick this in your common tier:

# common.yaml
classifier::extra_classes:
 - sysadmins
 - sensu

Then you have some nodes that need some more classes:

# clients/acme.yaml
classifier::extra_classes:
 - acme_sysadmins

At this point it’s basically same old same old, but let’s say we had some node that needed Nagios and not Sensu:

# nodes/example.net.yaml
classifier::extra_classes:
 - --sensu
 - nagios

Here we use the knockout prefix of -- to remove the sensu class and add the nagios one instead. That’s already a big win over the old hiera_include(), but to be fair this is just a result of the new Lookup features.

It really gets interesting later when you throw in some rules.

Rule Based Classification


The classifier is built around a set of Classifications, each made up of one or many rules; if the rules match on a host, the classification applies to the node. Classifications can include classes and create data.

Here’s a sample rule where I want to do some extra special handling of RedHat like machines, but handle VMs differently from physical machines.

# common.yaml
classifier::rules:
  RedHat VMs:
    match: all
    rules:
      - fact: "%{facts.os.family}"
        operator: ==
        value: RedHat
      - fact: "%{facts.is_virtual}"
        operator: ==
        value: "true"
    data:
      redhat_vm: true
    classes:
      - centos::vm
 
  RedHat:
    rules:
      - fact: "%{facts.os.family}"
        operator: ==
        value: RedHat
    data:
      redhat_os: true
    classes:
      - centos::common

This shows 2 Classifications, one called “RedHat VMs” and one just “RedHat”. You can see the VMs one contains 2 rules, and it sets match: all so both have to match.

The end result here is that all RedHat machines get centos::common and RedHat VMs additionally get centos::vm. Also, 2 pieces of data will be created – a bit redundant in this example, but you get the idea.

Using the Classifier


So using the classifier in the basic sense is just like hiera_include():

node default {
  include classifier
}

This will process all the rules and include the resulting classes. It will also expose a bunch of information via this class; the most interesting is $classifier::data, which is a Hash of all the data that the rules emit. But you can also access the included classes via $classifier::classes and even the whole post processed classification structure in $classifier::classification. Some others are mentioned in the README.
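As a quick sketch of what you can do with that – purely illustrative – here the redhat_vm data emitted by the earlier rule drives some logic:

node default {
  include classifier

  # $classifier::data is a Hash of everything the matched rules emitted
  if $classifier::data["redhat_vm"] {
    notify{"this node was classified as a RedHat VM": }
  }
}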

You can do very impressive Hiera based overrides, here’s an example of adjusting a rule for a particular node:

# clients/acme.yaml
classifier::rules:
  RedHat VMs:
    classes:
      - some::other
    data:
      extra_data: true

This has the result that for this particular client additional data will be produced and additional classes will be included – but only on their RedHat VMs. You can even use the knockout feature here to really adjust the data and classes.

The classes get included automatically for you and if you set classifier::debug you’ll get a bunch of insight into how classification happens.

Hiera Inception


So at this point things are pretty neat, but I wanted to both see how the new Data Provider API looks and see if I could expose my classifier back to Hiera.

Imagine I am making all these classifications, but with what I’ve shown above it’s quite limited because it’s just creating data for the $classifier::data hash. What you really want is to create Hiera data and be able to influence Automatic Parameter Lookup.

So a rule like:

# clients/acme.yaml
classifier::rules:
  RedHat:
    data:
      centos::common::selinux: permissive

Here I am taking the earlier RedHat rule and setting centos::common::selinux: permissive. Now you want this to be data that the Automatic Parameter Lookup system uses to set the selinux parameter of the centos::common class.
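To see why that matters, imagine centos::common looks something like this – a hypothetical class, purely to illustrate the Automatic Parameter Lookup flow:

class centos::common (
  String $selinux = "enforcing",
) {
  # Automatic Parameter Lookup resolves centos::common::selinux from
  # the classifier data, so $selinux becomes "permissive" on nodes
  # the RedHat classification matched
}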

You can configure your Environment with this hiera.yaml:

# environments/production/hiera.yaml
---
version: 4
datadir: "hieradata"
hierarchy:
  - name: "%{trusted.certname}"
    backend: "yaml"
 
  - name: "classification data"
    backend: "classifier"
 
  # ... and the rest

Here I allow node specific YAML files to override the classifier, and then have a new Data Provider called classifier that exposes the classification back to Hiera. Doing it this way is super important: the priority the classifier has on a site is not a one size fits all choice, and doing it this way means site admins can decide where in their hierarchy classification sits so it best fits their workflows.

So this is where the inception reference comes in: you extract data from Hiera, process it using the Puppet DSL and expose it back to Hiera. At first thought this is a bit insane, but it works and it’s really nice. Basically this lets you completely redesign Hiera, turning something that is hierarchical in nature into a rule based system – or a hybrid.

And you can even test it from the CLI:

% puppet lookup --compile --explain centos::common::selinux
Merge strategy first
  Data Binding "hiera"
    No such key: "centos::common::selinux"
  Data Provider "Hiera Data Provider, version 4"
    ConfigurationPath "environments/production/hiera.yaml"
    Merge strategy first
      Data Provider "%{trusted.certname}"
        Path "environments/production/hieradata/dev2.devco.net.yaml"
          Original path: "%{trusted.certname}"
          No such key: "centos::common::selinux"
      Data Provider "classification data"
        Found key: "centos::common::selinux" value: "permissive"
      Merged result: "permissive"
  Merged result: "permissive"

I hope to eventually expose which rule provided this data, like the other lookup explanations do.

Clearly this feature is a bit crazy, so consider this an exploration of what’s possible rather than a strong endorsement of this kind of thing 🙂

Implementation


Implementing this has been pretty interesting, I got to use a lot of the new Puppet 4 features. Like I mentioned all the data processing, iteration and deriving of classes and data is done using the native Puppet DSL, take a look at the functions directory for example.

It also makes use of the new Type system and Type Aliases all over the place to create a strong schema for the incoming data that gets validated at all levels of the process. See the types directory.

The new Modules in Data feature is used to set lookup strategies so that there is no manual calling of lookup(); see the module data.

Writing a Data Provider, i.e. a Hiera backend for the new lookup system, is pretty nice. I think the APIs around there are still maturing, so this is definitely bleeding edge stuff. You can see the bindings and data provider in the lib directory.

As such this module only really has a hope of working on Puppet 4.4.0 or newer, and I expect to use new features as they come along.

Conclusion


There’s a bunch more going on, check the module README. It’s been quite interesting to be able to really completely rethink how Hiera data is created and what a modern take on classification can achieve.

With this approach if you’re really not too keen on the hierarchy you can totally just use this as a rules based Hiera instead, that’s pretty interesting! I wonder what other strategies for creating data could be prototyped like this?

I realise this is very similar to the PE node classifier, but with some additional benefits: being exposed to Hiera via the Data Provider, being something you can commit to git, and being adjustable and overridable using the new Hiera features. I think it will appeal to a different kind of user, but yeah, it’s quite similar. Credit to Ben Ford for his original Ruby based implementation of this idea, which I took and iterated on. Regardless, the “like an iTunes smart list” node classifier isn’t exactly a new idea and has been discussed for literally years 🙂

You can get the module on the forge as ripienaar/classifier and I’d greatly welcome feedback and ideas.
