Native Puppet 4 Data in Modules

Back in August 2012 I requested an enhancement to the general data landscape of Puppet and a natural progression on the design of Hiera to enable it to be used in modules that are shared outside of your own environments. I called this Data in Modules. There was lots of community interest in this but not much movement, eventually I made a working POC that I released in December 2013.

The basic idea around the feature is that we want to be able to use Hiera to model internal data found in modules as well as site specific data and that these 2 sets of data coexist and compliment each other. Full details of this can be found in my post titled Better Puppet Modules Using Hiera Data and some more background can be found in The problem with params.pp. These posts are a bit old now and some things have moved on but they’re good background reading.

It’s taken a while but as part of the Puppet 4 rework effort the data ingesting mechanisms have also been rewritten in finally in Puppet 4.3.0 native data in modules have arrived. The original Jira for this is 4474. It’s really pretty close to what I had in mind in my proposals and my POC and I am really happy with this. Along the way a new function called lookup() have been introduced to replace the old collection of hiera(), hiera_array() and hiera_hash().

The official docs for this feature can be found at the Puppet Labs Docs site. Here I’ll more or less just take my previous NTP example and show how you could use the new Data in Modules to simplify it as per the above mentioned posts.

This is the very basic Puppet class we’ll be working with here:

class ntp (
  String $config,
  String $keys_file
) {
 ...
}

In the past these variables would have needed to interact with the params.pp file like $config = $ntp::params::config, but now it’s just a simple class. At this point it’ll not yet use any data in the module, to do that you have to activate it in the metadata.json:

# ntp/metadata.json
{
  ...
  "data_provider": "hiera"
}

At this point Puppet knows you want to use the hiera data in the module. But key to the feature and really the whole reason it exists is because a module needs to be able to specify it’s own hierarchy. Imagine you want to set $keys_file here, you’ll have to be sure the hierarchy in question includes the OS Family and you must have control over that data. In the past with the hierarchy being controlled completely by the site hiera.yaml this was not possible at all and the outcome was that if you wanted to share a module outside of your environment you have to go the params.pp route as that was the only portable solution.

So now your modules can have their own hiera.yaml. It’s slightly different from the past but should be familiar to past hiera users, it goes in your module so this would be ntp/hiera.yaml:

---
version: 4
datadir: data
hierarchy:
  - name: "OS family"
    backend: yaml
    path: "os/%{facts.os.family}"

  - name: "common"
    backend: yaml

This is the new format for the hiera configuration, it’s more flexible and a future version of hiera will have some changing semantics that’s quite nice over the original design I came up with so you have to use that new format here.

Here you can see the module has it’s own OS Family tier as well as a common tier. Lets see the ntp/data/common.yaml:

---
ntp::config: "/etc/ntp.conf"
ntp::keys_file: "/etc/ntp.keys"

These are sane defaults to use for any non specifically supported operating systems.

Below are examples for AIX and Debian:

# data/os/AIX.yaml
---
ntp::config: "/etc/ntpd.conf"

# data/os/Debian.yaml
---
ntp::keys_file: "/etc/ntp/keys"

At this point the need for params.pp is gone – at least in this simplistic example – and this data along with the environment specific or site specific data cohabit really nicely. If you specified any of these data items in your site Hiera data your site data will override the module. The advantages of this might not be immediately obvious. I have a very long list of advantages over params.pp in my Better Puppet Modules Using Hiera Data post, be sure to read that for background.

There’s an alternative approach where you write a Puppet function that returns a hash of data and the data system will fetch the keys from there. This is really powerful and might end up being a interesting solution to something along the lines of a module specific custom hiera backend – but a lighter weight version of that. I might write that up later, this post is already a bit long.

The remaining problem is to do with data that needs to be merged as traditionally Hiera and Puppet has no idea you want this to happen when you do a basic lookup – hence these annoying hiera_hash() functions etc – , there’s a solution for this and I’ll post a blog post about that next week once the next Puppet 4 release is out and a bug I found that makes it unusable is fixed in that version.

This feature is a great addition to Puppet and I am really glad to finally see this land. My hacky modules in data code was used quite extensively with 72 000 downloads from the forge but I was never really happy with it and was desperate to see this land natively. This is a big step forward and I hope it sees wide adoption in the community.

A note about the old ripienaar-module_data module

As seen above the new built in feature is great and a very close match to what I had envisioned when creating the proof of concept module.

It would not be a good idea to support both these methods on Puppet 4 and turns out it is also quite difficult because we both use the hiera.yaml file in the module but with small differences in format. So the transition period will no doubt be a bit painful especially for those attempting to use this while supporting both Puppet 3 and 4 users.

Further the old module actually broke the Puppet 4 feature for a while in a way that was really difficult to debug. Puppet Labs kindly reached out and notified me of this and helped me fix it in MODULES-3102. So there is now a new release of the old module that works again on Puppet 4 BUT it warns very loudly that this is a bad idea.

The old module is now deprecated and unsupported. You should stop using it and imho stop using Puppet 3, but whatever you do stop using it on Puppet 4. I wish the metadata.json supported a supported Puppet version requirement so I can force this but alas it doesn’t so I can’t.

I will after a few months make a release that will raise an error on Puppet 4 and refuse to work there. You should move forward and adopt the excellent native implementation of this feature.

Native Puppet 4 Data in Modules

A note about the old ripienaar-module_data module

Licence