R.I.Pienaar

Interacting with the Puppet CA from Ruby

07/20/2016

I recently ran into a known bug with the puppet certificate generate command that made it useless to me for creating user certificates.

So I had to do the CSR dance from Ruby myself to work around it. It’s quite simple actually, but as with all things OpenSSL it’s weird and wonderful.

Since the Puppet Agent is written in Ruby and can do this, there has to be an HTTP API somewhere. These are documented reasonably well – see /puppet-ca/v1/certificate_request/ and /puppet-ca/v1/certificate/. Not covered is how to make the CSRs and such.

First I have a little helper to make the HTTP client:

def ca_path; "/home/rip/.puppetlabs/etc/puppet/ssl/certs/ca.pem";end
def cert_path; "/home/rip/.puppetlabs/etc/puppet/ssl/certs/rip.pem";end
def key_path; "/home/rip/.puppetlabs/etc/puppet/ssl/private_keys/rip.pem";end
def csr_path; "/home/rip/.puppetlabs/etc/puppet/ssl/certificate_requests/rip.pem";end
def has_cert?; File.exist?(cert_path);end
def has_ca?; File.exist?(ca_path);end
def already_requested?;!has_cert? && File.exist?(key_path);end
 
def http
  http = Net::HTTP.new(@ca, 8140)
  http.use_ssl = true
 
  if has_ca?
    http.ca_file = ca_path
    http.verify_mode = OpenSSL::SSL::VERIFY_PEER
  else
    http.verify_mode = OpenSSL::SSL::VERIFY_NONE
  end
 
  http
end

This is an HTTPS client that does full verification of the remote host if we have a CA. There’s a small chicken and egg problem: you have to ask the CA for its own certificate over an unverified connection. If this is a problem you need to arrange to put the CA on the machine in a safe manner.

Let’s fetch the CA:
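
One way to soften the unverified first fetch is to compare the fingerprint of the downloaded CA with a value obtained out of band. A minimal sketch, assuming the PEM text is whatever was saved to ca_path; the fingerprint helper is a name made up for this illustration and not something Puppet provides:

```ruby
require "openssl"

# Hypothetical helper: SHA256 fingerprint of a PEM encoded certificate,
# formatted as colon separated upper case hex pairs.
def fingerprint(pem)
  cert = OpenSSL::X509::Certificate.new(pem)
  OpenSSL::Digest::SHA256.hexdigest(cert.to_der).upcase.scan(/../).join(":")
end
```

You would call it as fingerprint(File.read(ca_path)) and compare the result by eye or in code against a fingerprint you trust.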

def fetch_ca
  return true if has_ca?
 
  req = Net::HTTP::Get.new("/puppet-ca/v1/certificate/ca", "Content-Type" => "text/plain")
  resp, _ = http.request(req)
 
  if resp.code == "200"
    File.open(ca_path, "w", 0o644) {|f| f.write(resp.body)}
    puts("Saved CA certificate to %s" % ca_path)
  else
    abort("Failed to fetch CA from %s: %s: %s" % [@ca, resp.code, resp.message])
  end
 
  has_ca?
end

At this point we have the CA and have saved it; future requests will be verified against this CA. If you put the CA there by some other means this will do nothing.

Now we need to start making our CSR. First we have to make a private key; this is a 4096 bit key saved in PEM format:

def write_key
  key = OpenSSL::PKey::RSA.new(4096)
  File.open(key_path, "w", 0o640) {|f| f.write(key.to_pem)}
  key
end

And the CSR needs to be made using this key. Puppet CSRs are quite simple with few fields filled in. I can’t see why you couldn’t fill in more fields, and of course it now supports extensions. I didn’t add any of those here, just an OU:

def write_csr(key)
  csr = OpenSSL::X509::Request.new
  csr.version = 0
  csr.public_key = key.public_key
  csr.subject = OpenSSL::X509::Name.new(
    [
      ["CN", @certname, OpenSSL::ASN1::UTF8STRING],
      ["OU", "my org", OpenSSL::ASN1::UTF8STRING]
    ]
  )
  csr.sign(key, OpenSSL::Digest::SHA1.new)
 
  File.open(csr_path, "w", 0o644) {|f| f.write(csr.to_pem)}
 
  csr.to_pem
end
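
Before sending a CSR anywhere it can be handy to read it back and confirm it is self consistent. A small sketch, assuming the PEM text is what write_csr returned; inspect_csr is a hypothetical helper name:

```ruby
require "openssl"

# Hypothetical helper: parse a PEM CSR and report the subject it asks for
# and whether its signature matches the public key embedded in it.
def inspect_csr(pem)
  csr = OpenSSL::X509::Request.new(pem)

  {
    "subject" => csr.subject.to_s,
    "valid"   => csr.verify(csr.public_key)
  }
end
```

This only checks the request against itself; it says nothing about whether the CA will accept it.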

Let’s combine these to make the key and CSR and send the request to the Puppet CA. This request is verified using the CA:

def request_cert
  req = Net::HTTP::Put.new("/puppet-ca/v1/certificate_request/%s?environment=production" % @certname, "Content-Type" => "text/plain")
  req.body = write_csr(write_key)
  resp, _ = http.request(req)
 
  if resp.code == "200"
    puts("Requested certificate %s from %s" % [@certname, @ca])
  else
    abort("Failed to request certificate from %s: %s: %s: %s" % [@ca, resp.code, resp.message, resp.body])
  end
end

You’ll now have to sign the cert on your Puppet CA as normal, or use autosign, nothing new here.

And finally you can attempt to fetch the cert. This method is designed to return false if the cert is not yet ready on the master, i.e. not signed yet:

def attempt_fetch_cert
  return true if has_cert?
 
  req = Net::HTTP::Get.new("/puppet-ca/v1/certificate/%s" % @certname, "Content-Type" => "text/plain")
  resp, _ = http.request(req)
 
  if resp.code == "200"
    File.open(cert_path, "w", 0o644) {|f| f.write(resp.body)}
    puts("Saved certificate to %s" % cert_path)
  end
 
  has_cert?
end
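
Once the certificate is on disk you may want to confirm it really belongs to the key you generated and was signed by the CA you cached. A sketch of such a check, assuming RSA keys as used above; cert_usable? is a hypothetical helper and not part of the code in this post:

```ruby
require "openssl"

# Hypothetical check: true when the certificate was signed by the CA
# and its public key matches the given private key.
def cert_usable?(cert_pem, key_pem, ca_pem)
  cert = OpenSSL::X509::Certificate.new(cert_pem)
  key  = OpenSSL::PKey::RSA.new(key_pem)
  ca   = OpenSSL::X509::Certificate.new(ca_pem)

  cert.verify(ca.public_key) && cert.check_private_key(key)
end
```

It could be called with File.read(cert_path), File.read(key_path) and File.read(ca_path) after attempt_fetch_cert returns true.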

Pulling this all together you have some code to make keys and CSRs, cache the CA and request that a cert be signed. It will then wait for the cert like Puppet does until things are signed.

def main
  abort("Already have a certificate '%s', cannot continue" % @certname) if has_cert?
 
  make_ssl_dirs
  fetch_ca
 
  if already_requested?
    puts("Certificate %s has already been requested, attempting to retrieve it" % @certname)
  else
    puts("Requesting certificate for '%s'" % @certname)
    request_cert
  end
 
  puts("Waiting up to 120 seconds for it to be signed")
  puts
 
  12.times do |time|
    print "Attempting to download certificate %s: %d / 12\r" % [@certname, time + 1]
 
    break if attempt_fetch_cert
 
    sleep 10
  end
 
  abort("Could not fetch the certificate after 120 seconds") unless has_cert?
 
  puts("Certificate %s has been stored in %s" % [@certname, ssl_dir])
end
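
The main method relies on two helpers that are not shown here, make_ssl_dirs and ssl_dir, and the script as a whole needs net/http and openssl required and @ca and @certname set. Something along these lines would do for the two helpers, assuming the directory layout used by the path helpers at the top; treat it as a sketch under those assumptions:

```ruby
require "fileutils"

# Assumed base directory, taken from the paths used by ca_path and friends.
def ssl_dir; "/home/rip/.puppetlabs/etc/puppet/ssl"; end

# Hypothetical: create the directories the key, CSR and certificate
# helpers write into, with a tighter mode on the private key directory.
def make_ssl_dirs(base = ssl_dir)
  FileUtils.mkdir_p(File.join(base, "certs"), :mode => 0o755)
  FileUtils.mkdir_p(File.join(base, "certificate_requests"), :mode => 0o755)
  FileUtils.mkdir_p(File.join(base, "private_keys"), :mode => 0o750)
end
```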

Hiera Node Classifier 0.7

04/19/2016

A while ago I released a Puppet 4 Hiera based node classifier to see what is next for hiera_include(). This had the major drawback that you couldn’t set an environment with it like with a real ENC since Puppet just doesn’t have that feature.

I’ve released an update to the classifier that now includes a small real ENC that takes care of setting the environment based on certname and then boots up the classifier on the node.

Usage


ENCs tend to know only about the certname. You could imagine getting the most recently seen facts from PuppetDB etc., but I do not really want to assume things about people’s infrastructure. So for now this sticks to supporting classification based on certname only.

It’s really pretty simple. Let’s assume you want to classify node1.example.net: you just need to have a node1.example.net.yaml (or JSON) file somewhere in a path. Typically this is going to be in a directory environment somewhere but it could of course also be a site wide hiera directory.

In it you put:

classifier::environment: development

And this node will form part of that environment. Past that everything in the previous post just applies, so you make rules or assign classes as normal, and while doing so you have full access to node facts.

The classifier now exposes some extra information to help you determine if the ENC is in use and based on what file it’s classifying the node:

  • $classifier::enc_used – boolean that indicates if the ENC is in use
  • $classifier::enc_source – path to the data file that set the environment. undef when not found
  • $classifier::enc_environment – the environment the ENC is setting

It supports a default environment which you configure when configuring Puppet to use an ENC as below.

Configuring Puppet


Configuring Puppet is pretty simple for this:

[main]
node_terminus = exec
external_nodes = /usr/local/bin/classifier_enc.rb --data-dir /etc/puppetlabs/code/hieradata --node-pattern nodes/%%.yaml

Apart from these you can pass --default development to default to that instead of production, and you can add --debug /tmp/enc.log to get a bunch of debug output.

The data-dir above is for your classic Hiera single data dir setup, but you can also use globs to support environment data, like --data-dir /etc/puppetlabs/code/environments/*/hieradata. It will then search the entire glob until it finds a match for the certname.

That’s really all there is to it, it produces a classification like this:
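
To make the search concrete, here is a rough Ruby sketch of the idea: expand the glob, substitute the certname into the node pattern and stop at the first data file that sets classifier::environment. The real classifier_enc.rb ships in the module; find_environment and its return value are made up for this illustration and only YAML files are handled:

```ruby
require "yaml"

# Hypothetical lookup: returns [environment, source_file], falling back to
# [default, nil] when no data file sets classifier::environment.
def find_environment(data_dir_glob, node_pattern, certname, default = "production")
  Dir.glob(data_dir_glob).sort.each do |dir|
    file = File.join(dir, node_pattern.gsub("%%", certname))
    next unless File.exist?(file)

    # An empty YAML file loads as nil or false depending on the Ruby version
    data = YAML.load_file(file) || {}
    env = data["classifier::environment"]
    return [env, file] if env
  end

  [default, nil]
end
```

A call like find_environment("/etc/puppetlabs/code/environments/*/hieradata", "nodes/%%.yaml", "node1.example.net") mirrors the command line shown above.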

---
environment: production
classes:
  classifier:
    enc_used: true
    enc_source: /etc/puppetlabs/code/hieradata/node.example.yaml
    enc_environment: production

Conclusion


That’s really all there is to it. I think this might hit a high percentage of use cases and bring a key ability to the hiera classifiers. It’s a tad annoying there is no way really to do better granularity than just per node here. I might come up with something else but don’t really want to go too deep down that hole.

In future I’ll look at adding a class to install the classifier into some path and configure Puppet, for now that’s up to the user. It’s shipped in the bin dir of the module.

A Puppet 4 Hiera Based Node Classifier

03/22/2016

When I first wrote Hiera I included a simple little hack called hiera_include() that would do an Array lookup and include everything it found. I only included it because include at the time did not take Array arguments. In time this has become quite widely used and many people do their node classification using just this and the built in hierarchical nature of Hiera.

I’ve always wanted to do better though, like maybe write an actual ENC that uses Hiera data keys on the provided certname? Seemed like the only real win would be to be able to set the node environment from Hiera, I guess this might be valuable enough on its own.

Anyway, I think the ENC interface is really pretty bad and should be replaced by something better. So I’ve had the idea of a Hiera based classifier in my mind for years.

Some time ago Ben Ford made an interesting little hack project that used a set of rules to classify nodes, and this stuck in my mind as being quite an interesting approach. I guess it’s a bit like the new PE node classifier.

Anyway, so I took this as a starting point and started working on a Hiera based classifier for Puppet 4 – and by that I mean the very very latest Puppet 4, it uses a bunch of the things I blogged about recently and the end result is that the module is almost entirely built using the native Puppet 4 DSL.

Simple list-of-classes based Classification


So first let’s take a look at how this replaces/improves on the old hiera_include().

Not really much to be done I am afraid, it’s an array with some entries in it. It now uses the Knockout Prefix features of Puppet Lookup that I blogged about before to allow you to exclude classes from nodes:

So we want to include the sysadmins and sensu classes on all nodes, stick this in your common tier:

# common.yaml
classifier::extra_classes:
 - sysadmins
 - sensu

Then you have some nodes that need some more classes:

# clients/acme.yaml
classifier::extra_classes:
 - acme_sysadmins

At this point it’s basically same old same old, but let’s see what happens if we had some node that needed Nagios and not Sensu:

# nodes/example.net.yaml
classifier::extra_classes:
 - --sensu
 - nagios

Here we use the knockout prefix of -- to remove the sensu class and add the nagios one instead. That’s already a big win over old hiera_include(), but to be fair this is just a result of the new Lookup features.

It really gets interesting later when you throw in some rules.

Rule Based Classification


The classifier is built around a set of Classifications. Each Classification is made up of one or many rules, and if those rules match on a host the classification applies to the node. Classifications can include classes and create data.

Here’s a sample rule where I want to do some extra special handling of RedHat like machines. But I want to handle VMs differently from Physical machines.

# common.yaml
classifier::rules:
  RedHat VMs:
    match: all
    rules:
      - fact: "%{facts.os.family}"
        operator: ==
        value: RedHat
      - fact: "%{facts.is_virtual}"
        operator: ==
        value: "true"
    data:
      redhat_vm: true
    classes:
      - centos::vm
 
  RedHat:
    rules:
      - fact: "%{facts.os.family}"
        operator: ==
        value: RedHat
    data:
      redhat_os: true
    classes:
      - centos::common

This shows 2 Classifications one called “RedHat VMs” and one just “RedHat”, you can see the VMs one contains 2 rules and it sets match: all so they both have to match.

End result here is that all RedHat machines get centos::common and RedHat VMs also get centos::vm. Additionally 2 pieces of data will be created, a bit redundant in this example but you get the idea.
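
The module does this evaluation in the Puppet DSL, but the logic is easy to picture in Ruby. A simplified sketch, assuming the fact values have already been interpolated into the fact field, supporting only two of the operators, and assuming that a Classification without a match key behaves like match: any (that default is a guess made for this sketch); classification_matches? is a made up name:

```ruby
# Only == and =~ are sketched here; the module supports more operators.
OPERATORS = {
  "==" => ->(fact, value) { fact == value },
  "=~" => ->(fact, value) { !!(fact.to_s =~ Regexp.new(value.to_s)) }
}

# True when the rules of one classification match, honouring match: all/any
# and the optional invert flag on each rule.
def classification_matches?(classification)
  results = classification["rules"].map do |rule|
    outcome = OPERATORS.fetch(rule["operator"]).call(rule["fact"], rule["value"])
    rule["invert"] ? !outcome : outcome
  end

  classification.fetch("match", "any") == "all" ? results.all? : results.any?
end
```

On a physical RedHat machine the “RedHat VMs” classification would fail because one of its two rules is false, while “RedHat” would still match.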

Using the Classifier


So using the classifier in the basic sense is just like hiera_include():

node default {
  include classifier
}

This will process all the rules and include the resulting classes. It will also expose a bunch of information via this class, the most interesting is $classifier::data which is a Hash of all the data that the rules emit. But you can also access the included classes via $classifier::classes and even the whole post processed classification structure in $classifier::classification. Some others are mentioned in the README.

You can do very impressive Hiera based overrides, here’s an example of adjusting a rule for a particular node:

# clients/acme.yaml
classifier::rules:
  RedHat VMs:
    classes:
      - some::other
    data:
      extra_data: true

This has the result that for this particular client additional data will be produced and additional classes will be included – but only on their RedHat VMs. You can even use the knockout feature here to really adjust the data and classes.

The classes get included automatically for you and if you set classifier::debug you’ll get a bunch of insight into how classification happens.

Hiera Inception


So at this point things are pretty neat, but I wanted to both see how the new Data Provider API looks and also see if I can expose my classifier back to Hiera.

Imagine I am making all these classifications, but with what I’ve shown above it’s quite limited because it’s just creating data for the $classifier::data hash. What you really want is to create Hiera data and be able to influence Automatic Parameter Lookup.

So a rule like:

# clients/acme.yaml
classifier::rules:
  RedHat:
    data:
      centos::common::selinux: permissive

Here I am taking the earlier RedHat rule and setting centos::common::selinux: permissive. You want this to be data that will be used by the Automatic Parameter Lookup system to set the selinux parameter of the centos::common class.

You can configure your Environment with this hiera.yaml

# environments/production/hiera.yaml
---
version: 4
datadir: "hieradata"
hierarchy:
  - name: "%{trusted.certname}"
    backend: "yaml"
 
  - name: "classification data"
    backend: "classifier"
 
  # ... and the rest

Here I allow node specific YAML files to override the classifier and then have a new Data Provider called classifier that exposes the classification back to Hiera. Doing it this way is super important: the priority the classifier has on a site is not a one size fits all choice, so doing it this way means the site admins can decide where in their hierarchy classification sits so that it best fits their workflows.

So this is where the inception reference comes in, you extract data from Hiera, process it using the Puppet DSL and expose it back to Hiera. At first thought this is a bit insane but it works and it’s really nice. Basically this lets you completely redesign hiera from something that is Hierarchical in nature and turn it into a rule based system – or a hybrid.

And you can even test it from the CLI:

% puppet lookup --compile --explain centos::common::selinux
Merge strategy first
  Data Binding "hiera"
    No such key: "centos::common::selinux"
  Data Provider "Hiera Data Provider, version 4"
    ConfigurationPath "environments/production/hiera.yaml"
    Merge strategy first
      Data Provider "%{trusted.certname}"
        Path "environments/production/hieradata/dev2.devco.net.yaml"
          Original path: "%{trusted.certname}"
          No such key: "centos::common::selinux"
      Data Provider "classification data"
        Found key: "centos::common::selinux" value: "permissive"
      Merged result: "permissive"
  Merged result: "permissive"

I hope to expose here which rule provided this data like the other lookup explanations do.

Clearly this feature is a bit crazy, so consider this an exploration of what’s possible rather than a strong endorsement of this kind of thing 🙂

Implementation


Implementing this has been pretty interesting, I got to use a lot of the new Puppet 4 features. Like I mentioned all the data processing, iteration and deriving of classes and data is done using the native Puppet DSL, take a look at the functions directory for example.

It also makes use of the new Type system and Type Aliases all over the place to create a strong schema for the incoming data that gets validated at all levels of the process. See the types directory.

The new Data in Modules feature is used to set lookup strategies so that there is no manual calling of lookup(), see the module data.

Writing a Data Provider, i.e. a Hiera Backend for the new lookup system, is pretty nice. I think the APIs around there are still maturing so it’s definitely bleeding edge stuff. You can see the bindings and data provider in the lib directory.

As such this module only really has a hope of working on Puppet 4.4.0 at least, and I expect to use new features as they come along.

Conclusion


There’s a bunch more going on, check the module README. It’s been quite interesting to be able to really completely rethink how Hiera data is created and what a modern take on classification can achieve.

With this approach if you’re really not too keen on the hierarchy you can totally just use this as a rules based Hiera instead, that’s pretty interesting! I wonder what other strategies for creating data could be prototyped like this?

I realise this is very similar to the PE node classifier, but it has some additional benefits: it is exposed to Hiera via the Data Provider, it is something you can commit to git, and it is adjustable and overridable using the new Hiera features. I think it will appeal to a different kind of user. But yeah, it’s quite similar. Credit to Ben Ford for his original Ruby based implementation of this idea which I took and iterated on. Regardless, the ‘like an iTunes smart list’ node classifier isn’t exactly a new idea and has been discussed for literally years 🙂

You can get the module on the forge as ripienaar/classifier and I’d greatly welcome feedback and ideas.

Puppet 4 Type Aliases

03/18/2016

Back when I first took a look at Puppet 4 features I explored the new Data Types and said:

Additionally I cannot see myself using a Struct like above in the argument list – to which Henrik says they are looking to add a typedef thing to the language so you can give complex Struct’s a more convenient name and use that. This will help that a lot.

And since Puppet 4.4.0 this has now become a reality. So a quick post to look at that.

The Problem


I’ve been writing a Hiera based node classifier both to scratch an itch and to have something fairly complex to explore the new features in Puppet 4.

The classifier takes a set of classification rules and produces classifications – classes to include and parameters – from there. Here’s a sample classification:

classifier::rules:
  RedHat VMs:
    match: all
    rules:
      - fact: "%{facts.os.family}"
        operator: ==
        value: RedHat
      - fact: "%{facts.is_virtual}"
        operator: ==
        value: "true"
    data:
      redhat_vm: true
      centos::vm::someprop: someval
    classes:
      - centos::vm

This is a classification rule that has 2 rules to match against machines running RedHat like operating systems and that are virtual. In that case if both these are true it will:

  • Include the class centos::vm
  • Create some data redhat_vm => true and centos::vm::someprop => someval

You can have an arbitrary number of classifications made up of an arbitrary number of rules. This data lives in hiera so you can have all sorts of merging, overriding and knock out fun with it.

The amazing thing is since Puppet 4.4.0 there is now no Ruby code involved in doing what I said above. All the parsing, looping, evaluating of rules and building of data structures is done using functions written in the pure Puppet DSL.

There’s some Ruby there in the form of a custom backend for the new lookup based hiera system – but this is experimental, optional and a bit crazy.

Anyway, so here’s the problem, before Puppet 4.4.0 my main class had this in:

class classifier (
  Hash[String,
    Struct[{
      match    => Enum["all", "any"],
      rules    => Array[
        Struct[{
          fact     => String,
          operator => Enum["==", "=~", ">", ">=", "<", "<="],
          value    => Data,
          invert   => Optional[Boolean]
        }]
      ],
      data     => Optional[Hash[Pattern[/\A[a-z0-9_][a-zA-Z0-9_]*\Z/], Data]],
      classes  => Array[Pattern[/\A([a-z][a-z0-9_]*)?(::[a-z][a-z0-9_]*)*\Z/]]
    }]
  ] $rules = {}
) {
....
}

This describes the full valid rule as a Puppet Type. It’s pretty horrible. Worse, I have a number of functions and classes that all receive the full classification or parts of it, and I’d have to duplicate all this all over.

The Solution


So as of yesterday I can now make this a lot better:

class classifier (
  Classifier::Classifications  $rules = {},
) {
....
}

To do this I made a few files in the module:

# classifier/types/matches.pp
type Classifier::Matches = Enum["all", "any"]
# classifier/types/classname.pp
type Classifier::Classname = Pattern[/\A([a-z][a-z0-9_]*)?(::[a-z][a-z0-9_]*)*\Z/]

and a few more, eventually ending up in:

# classifier/types/classification.pp
type Classifier::Classification = Struct[{
  match    => Classifier::Matches,
  rules    => Array[Classifier::Rule],
  data     => Classifier::Data,
  classes  => Array[Classifier::Classname]
}]

Which you can see solves the problem quite nicely. Now in classes and functions where I need, let’s say, just a Rule, all I do is use Classifier::Rule instead of all the crazy.

This makes the native Puppet Data Types perfectly usable for me, well worth adopting these.

The Puppet 4 Lookup Function

03/13/2016

Puppet 4 has a new lookup subsystem exposed to the user in a few places:

  • The lookup() function
  • Automatic parameter lookups
  • Configuring the automatic parameter lookups via Data in Modules

I’ve not been able to figure out everything the docs have been trying to say about this function, but it turns out they were copied from the deep_merge gem, which actually has better examples in some cases. So I thought a post exploring it and its various forms is in order.

It’s pivotal to the use of data in Puppet, so while you probably don’t need to fully grasp all of its intricacies as covered in this post, a passing knowledge is valuable, as is knowing how to find good help for it. I do think there’s some opportunity for improving the UX of this function though.

As usual the challenge when faced with all these options isn’t in how to use them all but in which options to use when that won’t result in a giant unmaintainable mess down the line. I think this function is definitely on the wrong side of the line in this regard. It’s massive and unwieldy in that it is exposing internals of Puppet in a 1:1 manner to the user.

So I would not recommend writing code that calls this function directly except in extraordinary circumstances. With the Data in Modules and Automatic Parameter Lookup features you can avoid calling it yourself, see the last section of the post for that.

First though you need to know the behaviours and terminology of the lookup() function in order to get to a point where you can use the other methods, so let’s dive in.

Lookup Patterns

Basic usage


The function comes in a few forms past the most obvious lookup("thing"):

lookup("some::thing", String, "first", "default value")

Here we’re looking up the key some::thing and it has to be a String from the data store. It will do a first style lookup, which is your basic traditional Hiera first-match-wins, and there’s a default. Apparently there is no simple case lookup("some::thing", "default"), which seems like it would be the most common use. You can come kind of close though with (more on this below):

lookup({"name" => "some::thing", "default_value" => "default"})

Anyway, you’re not really going to be using the lookup function directly much so this is probably fine.

The things to note here are the lookup strategies, there are a few and you will always have to know them:

  • first – First match found is returned, just like in traditional hiera() default behaviour
  • unique – This is an array merge like old hiera_array()
  • hash – This is hiera_hash() without deep merging enabled
  • deep – This is hiera_hash() with deep merging enabled. You would not guess this from the description in the docs

So this is your basic replacement for the old hiera(), hiera_hash() and hiera_array() and as you can see from the last 2 the merge strategy isn’t set globally like in old Hiera, this is a big improvement.
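
If the four strategies are hard to keep apart, this rough Ruby model may help. It is only an illustration of the semantics described above, not how Puppet implements them: tiers is assumed to be a list of Hashes ordered from most to least specific, tier_lookup is a made up name, and array merging and knockouts inside the deep strategy are left out:

```ruby
# Recursively merge two hashes; values from the higher priority hash win.
def deep_merge(high, low)
  low.merge(high) do |_key, low_value, high_value|
    if low_value.is_a?(Hash) && high_value.is_a?(Hash)
      deep_merge(high_value, low_value)
    else
      high_value
    end
  end
end

# Collect the key from every tier that has it and combine per strategy.
def tier_lookup(tiers, key, strategy)
  found = tiers.select { |tier| tier.key?(key) }.map { |tier| tier[key] }
  return nil if found.empty?

  case strategy
  when "first"  then found.first
  when "unique" then found.flatten.uniq
  when "hash"   then found.reverse.reduce { |merged, value| merged.merge(value) }
  when "deep"   then found.reverse.reduce { |merged, value| deep_merge(value, merged) }
  else raise ArgumentError, "unknown strategy #{strategy}"
  end
end
```

The difference between hash and deep shows up with nested hashes: hash replaces a nested hash wholesale with the higher priority one, deep merges the nested keys too.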

I will not go into a full exploration of what Tiers mean, the old Hiera docs are pretty good for that. Effectively a merge strategy describes what Hiera does when it finds interesting data in many different levels of data or in different data sources.

Complex Strategies for Setting Defaults


From here it gets a bit crazy, but there are some really great things you can do with some of these so let’s look at them.

First I’ll look at the task of setting defaults. Hiera had quite basic features in this space which was enough to get going but lookup has some nice additions.

First the above lookup can also be written like this:

lookup({"name" => "some::thing", "value_type" => String, "default_value" => "default", "merge" => "first"})
lookup({"name" => "some::thing", "default_value" => "default"}) # though accepts any data type

So this is quite nice because now you can decide the order of arguments and which to include.

There’s a more powerful way to set defaults though:

function some_module::params() {
  $result = {
    "some_module::thing" => "default",
    "some_module::other_thing" => false
  }
}
 
lookup({"name" => "some_module::thing", "default_values_hash" => some_module::params()})

Which at first does not seem a huge improvement, but if you’re thinking about strategies to replace something like params.pp you could come up with some interesting patterns using this method. For example you can have a module function like here and an environment one (it supports environment level native functions) and combine them like environment_params() + some_module::params() to come up with layered sets of defaults. In effect this would be a micro hiera of its own, programmed in pure Puppet DSL.

And finally you can use a lambda to set the default:

lookup("some::thing") |$key| { "Could not find a value for key '${key}', please configure it in your hiera data" }

Here we return a custom string instead that tells the user what is going on rather than blow up badly and we can of course include any helpful information like fact values and such to help them find the right place in your possibly complex data store.

Sticking to the Lambda I saw Henrik mention this on IRC yesterday:

$result = with(lookup("some::thing")) |$value| { if $value =~ Array { $value } else { [$value] } }

This does a lookup and ensures that the result is always an array, like the Ruby code Array(thing). These 2 Lambda approaches can’t really be done without calling lookup() specifically, so probably a bit niche.
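
For those who have not met it, this is what the Ruby side of that comparison does:

```ruby
# Kernel#Array wraps a single value, leaves arrays alone and turns nil
# into an empty array.
Array("sensu")              # => ["sensu"]
Array(["sensu", "nagios"])  # => ["sensu", "nagios"]
Array(nil)                  # => []
```

The analogy is loose: Ruby’s Array() turns a Hash into a list of key and value pairs, while the Puppet lambda above would wrap a Hash in a single element array.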

I won’t go into all the details just now about Data in Modules and Merge Strategies, but to see how these things tie together you should know you can set these option hashes via your data layer, see the linked blog post for some details about this. The last section of this post shows an end to end working setup with Data in Modules and Merge Strategies in data.

Merge Strategies


The merge strategies in Hiera are where things really get interesting, and this function has even more than before. There are some that I honestly can’t imagine any use for, but I tend to lean on the less is more side of things wrt Puppet code.

We’ve seen the basic merge strategies above:

lookup("some::thing", String, "first", "default value")
lookup({"name" => "some::thing", "value_type" => String, "default_value" => "default", "merge" => "first"})

Here the strategy is first. But when the strategy is deep this can also be a hash with more merging options.

The most interesting for me is the knockout_prefix one. A common question when using Hiera for node classification is how to exclude a class from a certain node. This was kind of doable at least in Puppet 4 by using Arrays like:

include(hiera_array("classes", []) - hiera_array("exclude_classes", []))

Which will look up classes and exclude_classes and subtract the latter from the former. This is a hack, let’s look at a better option:

Given data like this:

# common.yaml
classification:
  classes:
    - sensu
    - sysadmin
# node1.example.net.yaml
classification:
  classes:
    - --sensu
    - nagios
    - webserver

What we’re trying to say is that the node1.example.net is not monitored by Sensu but by Nagios instead, the following lookup achieves this and includes the resulting classes:

$classification = lookup({"name" => "classification",
        "merge" => {
          "strategy" => "deep", 
          "knockout_prefix" => "--",
          "sort_merge_arrays" => true
        }
})
 
$classification["classes"].include

Additionally I sorted the merged arrays. The knockout_prefix tells it to remove data that matches the prefix. You can remove just some array member like here or entire keys from a resulting hash.

There’s another option where if some array member was a hash and you wanted to merge these hashes in the result sets you can set merge_hash_arrays. At that point you should probably rather rethink your data though tbh.

And the last one, which I cannot figure out any use for and was quite baffled at, is about turning Strings into Arrays. Henrik says they added this one for no reason other than that it’s available in the deep_merge gem.

Let’s change the data for our node to look like this:

# node1.example.net.yaml
classification:
  classes:
    - --sensu,nagios
    - webserver

While leaving the common data as is. If you set "unpack_arrays" => "," in the merge options it will take every string found, split it by "," which would turn this into an array of ["--sensu", "nagios"], then merge it up and then perform any knockouts, so you get the same outcome, i.e. ["nagios", "sysadmin", "webserver"].

You should probably rethink your data instead if you find this useful 🙂 That said though this --sensu,nagios does look like a search and replace, so perhaps in the context of a classifier utility it’s not all bad.
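
The order of operations is easier to see in code. This is a simplified Ruby model of unpack, knockout and sort on plain arrays; it is an illustration only and not the deep_merge gem’s implementation, and merge_classes is a made up name:

```ruby
# tiers is a list of arrays, one per hierarchy level.
def merge_classes(tiers, knockout_prefix: "--", unpack_arrays: nil)
  entries = tiers.flatten
  # unpack_arrays: split every string on the separator before merging
  entries = entries.flat_map { |entry| entry.split(unpack_arrays) } if unpack_arrays

  # entries carrying the prefix name the values that should be removed
  knocked_out, kept = entries.partition { |entry| entry.start_with?(knockout_prefix) }
  removed = knocked_out.map { |entry| entry.delete_prefix(knockout_prefix) }

  (kept - removed).uniq.sort
end

node   = ["--sensu,nagios", "webserver"]
common = ["sensu", "sysadmin"]
merge_classes([node, common], unpack_arrays: ",")
# => ["nagios", "sysadmin", "webserver"]
```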

CLI tool


Like in the old hiera there’s a CLI tool for this function, unlike the old hiera one it does not suck.

To recreate the above lookup on the cli you’d do (though only once PUP-6050 is fixed):

% puppet lookup --hiera_config hiera.yaml --merge deep --knock-out-prefix "--" --unpack-arrays "," --sort-merge-arrays classification
---
classes:
- sysadmin
- nagios
- webserver

This is fine, but it gets a lot nicer than that. If you add the option --explain you get this:

Merge strategy deep
  Options: {
    "sort_merge_arrays" => true,
    "merge_hash_arrays" => false,
    "knockout_prefix" => "--",
    "unpack_arrays" => ","
  }
  Data Binding "hiera"
    Found key: "classification" value: {
      "classes" => [
        "sysadmin",
        "nagios",
        "webserver"
      ]
    }
  Data Provider "EnvironmentDataProvider"
    No such key: "classification"
  Merged result: {
    "classes" => [
      "sysadmin",
      "nagios",
      "webserver"
    ]
  }

It’s a bit lacking in the case of old school hiera data, since old Hiera does not emit the right kinds of detail for it to show where it gets your data from. It’s handy though since you can see the merge options hash and what data providers are queried. See below for the full potential.

Bringing it all together

When I started this fairly epic post I said I do not recommend people use lookup() directly, so let’s take a look at pulling this all together.

I’ll make a simple classifier class like above in a module. Note that above the classes variable would be populated with the huge lookup() call, but not here. We do not want to call the lookup() function, so instead we use Automatic Parameter Lookup:

class classifier($classes) {
  $classes.include
}

I'll set it up for data in modules and add to it the lookup options:

# production/modules/classifier/data/common.yaml
lookup_options:
  classifier::classes:
    merge:
      strategy: deep
      knockout_prefix: "--"
      unpack_arrays: ","
      sort_merge_arrays: true

Note this is basically a lookup() call but attached to a specific key, classifier::classes. This way, as we add more classification data, we can have different strategies per key, and doing it here means it works across all types of Hiera data, old and new.
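
As a rough mental model of "merge options attached to a key", here is a small Ruby sketch that reads the same YAML and picks out the merge settings for one key. It is only an illustration of the data shape, not how Puppet resolves lookup_options internally; merge_options_for is a hypothetical helper, and falling back to the "first" strategy is my assumption about the default for keys without options.

```ruby
require "yaml"

# The same lookup_options data as in the module's common.yaml.
module_data = YAML.safe_load(<<~DATA)
  lookup_options:
    classifier::classes:
      merge:
        strategy: deep
        knockout_prefix: "--"
        unpack_arrays: ","
        sort_merge_arrays: true
DATA

# Hypothetical helper: find the merge settings for a key,
# falling back to a plain "first" strategy when none are set.
def merge_options_for(data, key)
  data.fetch("lookup_options", {})
      .fetch(key, {})
      .fetch("merge", { "strategy" => "first" })
end

classes_merge = merge_options_for(module_data, "classifier::classes")
other_merge   = merge_options_for(module_data, "classifier::other")

puts classes_merge.inspect
puts other_merge.inspect
```

The point is simply that each key carries its own strategy, so a second key could use a completely different merge without touching any manifest.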

Now the data. I am using the environment data provider here, so no classic hiera at all:

First we configure our production environment to have its own instance of Hiera and its own hiera.yaml. Take note, this is huge: per-environment Hiera and hierarchies now work!

# production/environment.conf
environment_data_provider = hiera
# production/hiera.yaml
---
version: 4
datadir: "hieradata"
hierarchy:
  - name: "%{trusted.certname}"
    backend: "yaml"
  - name: "common"
    backend: "yaml"
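
To show how those hierarchy names turn into files on disk, here is a tiny Ruby sketch that expands the %{...} tokens and builds the data file paths. It is an assumption-laden illustration rather than Hiera's real interpolation code: data_paths is a hypothetical helper and the scope is just a flat hash I made up.

```ruby
# Hypothetical helper: expand %{...} tokens in each hierarchy name using a
# flat scope hash, then map the name to a YAML file under the datadir.
def data_paths(datadir, hierarchy, scope)
  hierarchy.map do |level|
    name = level.gsub(/%\{([^}]+)\}/) { scope.fetch(Regexp.last_match(1)) }
    File.join(datadir, "#{name}.yaml")
  end
end

# Assumed scope for the node used in this post.
scope = { "trusted.certname" => "dev1.devco.net" }

paths = data_paths("hieradata", ["%{trusted.certname}", "common"], scope)
puts paths
```

The most specific level comes first, which is why node data gets to knock out entries from common.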

Here's our production environment data:

# production/hieradata/common.yaml
classifier::classes:
  - sensu
  - sysadmins
# production/hieradata/dev1.devco.net.yaml
classifier::classes:
  - nagios
  - --sensu
  - webserver

At this point it all works a charm, our node knocks out Sensu and brings in Nagios. This is a major wishlist item that old hiera_include() did not have!

Note that it is just Array data being knocked out here, not Hash data. The deep strategy is supposed to work with Hashes only, so I am a bit surprised it works, but I'll take it as it makes this classifier better.

% puppet lookup --environmentpath environments classifier::classes
---
- sysadmins
- nagios
- webserver

And if we add --explain you finally get the massive benefit of learning how Hiera finds your data:

% puppet lookup --environmentpath environments --explain classifier::classes
Merge strategy deep
  Options: {
    "knockout_prefix" => "--",
    "sort_merge_arrays" => true,
    "unpack_arrays" => ","
  }
  Data Binding "hiera"
    No such key: "classifier::classes"
  Data Provider "Hiera Data Provider, version 4"
    ConfigurationPath "/home/rip/temp/lookup/environments/production/hiera.yaml"
    Merge strategy deep
      Options: {
        "knockout_prefix" => "--",
        "sort_merge_arrays" => true,
        "unpack_arrays" => ","
      }
      Data Provider "%{trusted.certname}"
        Path "/home/rip/temp/lookup/environments/production/hieradata/dev1.devco.net.yaml"
          Original path: "%{trusted.certname}"
          Found key: "classifier::classes" value: [
            "nagios",
            "--sensu",
            "webserver"
          ]
      Data Provider "common"
        Path "/home/rip/temp/lookup/environments/production/hieradata/common.yaml"
          Original path: "common"
          Found key: "classifier::classes" value: [
            "sensu",
            "sysadmins"
          ]
      Merged result: [
        "sysadmins",
        "nagios",
        "webserver"
      ]
  Module "classifier" using Data Provider "Hiera Data Provider, version 4"
    ConfigurationPath "/home/rip/temp/lookup/environments/production/modules/classifier/hiera.yaml"
    Merge strategy deep
      Options: {
        "knockout_prefix" => "--",
        "sort_merge_arrays" => true,
        "unpack_arrays" => ","
      }
      Data Provider "%{trusted.certname}"
        Path "/home/rip/temp/lookup/environments/production/modules/classifier/data/dev1.devco.net.yaml"
          Original path: "%{trusted.certname}"
          Path not found
      Data Provider "common"
        Path "/home/rip/temp/lookup/environments/production/modules/classifier/data/common.yaml"
          Original path: "common"
          No such key: "classifier::classes"
  Merged result: [
    "sysadmins",
    "nagios",
    "webserver"
  ]

Every data file and every config file is shown, and the full merge logic in all its glory is included. Huge win over previous hiera.

The result is a bit dense but if you follow along you can see it all works quite nicely, and it's super helpful for debugging cases where hiera just doesn't work.

It's a bit awkward: here I am doing it on the node the data is for, but for other nodes you would need their facts. As I understand it, it basically compiles the catalog and profiles the lookups during that process, so it needs facts as usual.

Conclusion

So that's a rather epic exploration of the lookup() function which eventually ended up at: do not use the lookup() function 🙂

You can see how this is a big step forward. In the end, by using environment and module data, and no site data, I am not using old Hiera at all anymore as far as I know. This is purely the new lookup subsystem and it's really powerful.

  • Environments and Modules can have data and independent hierarchies
  • The lookup subsystem is fully exposed in lookup() but the bulk of the features are accessible via lookup_options, and so through Automatic Parameter Lookup
  • It has a really good CLI command which, once a few bugs are sorted, can bring amazing visibility to where your data comes from and what data is assigned to a node. Even without those bugs fixed, if you use lookup_options as in the last example it's totally usable today