params.pp in Puppet 4


I do not like the params.pp pattern. Puppet 4 brought native Data in Modules, which is pretty awesome and to a large extent removes the traditional need for params.pp.

Thing is, we kind of do still need some parts of params.pp. To understand this we have to consider which areas of concern params.pp covers in the Puppet world:

  • Holds data, often in large if or case statements that ultimately resemble hiera data
  • Derives new data using logic based on situation specific data like facts
  • Validates data. This tended to live all over the place rather than in params.pp, since params.pp is generally not parameterised, but it’s closely related.

Points 1 and 3 are roughly sorted out by Puppet 4 types and data in modules, but what about the 2nd point and, to some extent, more complex data validation that falls outside of the type system?

Before I start looking at how to derive data though I’ll take a look at the new function API in Puppet 4.

Native Functions

Puppet has always allowed us to write functions but they needed to be in Ruby and nothing else. This isn’t really great. The message is kind of:

Puppet has a DSL for managing systems, we think it’s awesome and can do everything you need. But in order to use it you have to learn 2 programming languages with different models.

And I always felt the same about the general suggestion to write ENCs etc, luckily not something we hear much these days.

The old Ruby functions also had a few major issues:

  • They do not work right in environments, just like custom providers and types do not. This is a showstopper bug as environments have become indispensable in modern Puppet use.
  • They are not namespaced. This is a showstopper for putting them on the forge.

The Puppet 4 functions API fixes this: you can write functions in the native DSL and they work fine in environments. The Puppet 4 DSL, with its loops and blocks and so forth, has matured enough that it can do a lot of the things I’d need to do for deriving data from other data.

They live in your module in the functions directory, they’re namespaced and environment safe:

function mymod::myfunc(Integer $input) {
  $input * 2
}

And you’d use this like any other function: $x = mymod::myfunc(10). Simple stuff: the value of the last expression is returned, like in Ruby.

Update: these have now been documented in the Puppet Labs docs

Derived Data in Puppet 4

So we’re finally where I can show my preferred method for deriving data in Puppet 4, and that’s to use a native function.

As an example we’ll stick with Apache but this time a wrapper for the main class. From the previous blog post you’ll remember (or if not please read that post) that we wrapped the puppetlabs-apache module to create our own vhost define. Here I’ll show a wrapper for the main apache class.

class site::apache(
  Boolean $passenger = false,
  Hash $apache = {},
  Hash $module_options = {}
) {
  if $passenger {
    $_passenger_defaults = {
      "passenger_max_pool_size" => site::apache::passenger_poolsize(),
      # ...
    }

    class{"apache::mod::passenger":
      * => $_passenger_defaults + site::fetch($module_options, "passenger", {})
    }
  }

  $defaults = {
    "default_vhost" => false,
    # ....
  }

  class{"apache":
    * => $defaults + $apache
  }
}

Here I have a wrapper that does the basic Apache configuration with some overridable defaults via $apache and I have a way to configure Passenger, again with overridable defaults, via $module_options["passenger"].

The Passenger part uses 2 functions: site::apache::passenger_poolsize and site::fetch. These are namespaced to the site module and you can see them below:

First the site::apache::passenger_poolsize that follows typical community guidelines for the pool size based on core count, it’s also aware if the machine is virtual or physical. This is a good example of derived data that would be impossible to do using just Hiera, and so simply does not have a place there.

function site::apache::passenger_poolsize() {
  if $facts["is_virtual"] {
    $multiplier = 1.5
  } else {
    $multiplier = 2
  }

  floor($facts["processors"]["count"] * $multiplier)
}

And this is site::fetch that’s like Ruby’s Hash#fetch. stdlib will soon have dig() that does something similar.

function site::fetch(
  Hash $data,
  String $key,
  Data $default
) {
  if $data[$key] {
    $data[$key]
  } else {
    $default
  }
}

Why functions and not inlining the logic?

This seems like a bit more work than just sticking the site::apache::passenger_poolsize logic into the class that’s calling it so why bother? The first is obviously that it’s reusable so if you have anywhere else you might need this logic you could use it. But the second is about isolation.

I am not a big fan of writing Puppet rspec tests since I tend to shy away from Puppet logic in modules. But if I have to put logic in modules I’d like to isolate the logic so I can easily test it in isolation. I have no idea if rspec-puppet supports these functions yet, but if it did having this logic in as small a package as possible for testing is absolutely the right thing to do.

Further, today the function is quite limited, but I can see I might want to expand it later to consider total memory as well as core count. When that day comes, I only have to edit this function and nothing else. The potential fallout from logic errors and so forth is neatly contained and importantly I can be fairly sure that this function is used for 1 thing only and changing its internals is something I can safely do. The things calling it really should not care about its internals.

Early on here I touched on complex validation of data as a possible thing these functions could solve. The example here does not really do this, but imagine that for my site I never want to set the passenger_poolsize above some threshold that might relate to the memory on the machine. Given that this poolsize is user overridable I’d write a function like site::apache::validate_poolsize that takes care of this and fails when needed.

These validations could become very complex and situation specific (ie. based on facts) so this is more than we can expect from a Type system. Writing validations as native functions is easy and fits in neatly with the DSL.
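A minimal sketch of what such a validation function might look like. The function name comes from the text above, but the body is entirely my assumption: the 256 MB per-process budget is a made-up figure and the memory fact path assumes Facter 3 structured facts.

```puppet
# site/functions/apache/validate_poolsize.pp
#
# Hypothetical helper: fails the catalog compile when the requested pool
# size exceeds a ceiling derived from the memory on the machine.
# The 256 MB budget per Passenger process is an assumed figure.
function site::apache::validate_poolsize(Integer[1] $poolsize) {
  # Facter 3 structured fact, an Integer number of bytes
  $total_mb = $facts["memory"]["system"]["total_bytes"] / 1048576
  $ceiling  = $total_mb / 256

  if $poolsize > $ceiling {
    fail("passenger pool size ${poolsize} exceeds the limit of ${ceiling} for ${total_mb} MB of RAM")
  }

  # Return the validated value so the call can be used inline
  $poolsize
}
```

Because it returns its input, a caller could wrap the user supplied value directly, for example "passenger_max_pool_size" => site::apache::validate_poolsize($size), where $size is whatever the caller resolved.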


These functions are great, to me they are everything defined types should have been and more. I think they move Puppet as a whole a huge leap forward in that you can achieve more complex things using just the Puppet DSL and they combine very nicely with the recent epp native Puppet based templates.

They fix massive show stopper bugs of environment compatibility and make sharing modules like this on a forge a lot safer.

Using them in this manner here Puppet 4 can close the loop on all the functionality that params.pp had:

  • Pure data that is hierarchical in nature can live in modules.
  • Input validation can be done using the data type system.
  • Derived data can be done in isolation and in a reusable manner using native functions

When combined in this manner params.pp can be removed completely without any loss of functionality. Every one of the above points improves significantly on the old pattern.

I could not find docs for the new functions on the Puppet Labs site, hopefully we’ll see some soon.

I have a short wishlist for these functions:

  • I want to be able to specify the return type of functions, I think this is critical.
  • I want a return() function like in other languages. I know you can generally do without but sometimes that can lead to some pretty awkward code.
  • More docs

Bonus: The end of defined types?

These functions can create resources just like any other manifest can. This is a big difference from old Ruby functions, which had to do all kinds of nasty things, possibly via create_resources. But since they can create resources they might be a viable replacement for defined types.

There are a few issues with this idea: The immediate missing part is that you cannot export a function. Additionally as they are outside of the resource system you couldn’t do overrides and do any relations on them. You can’t say install a package before a vhost made by a function.

The first I don’t really personally care for since I do not and will never use exported resources. The 2nd is perhaps a more important issue: from an ordering perspective the MOAR ordering in Puppet 4 helps, but for doing notifies and such it might not be that hot.

It’s an interesting thought experiment though. People want to think of defined types as functions but they aren’t, and this is a hurdle in learning Puppet for newcomers. With a bit of work I think functions can eventually replace defined types, which could then be deprecated. That’s a good goal to work toward.

The Resource Wrapper Pattern in Puppet 4


One tends to need to wrap resources quite often in Puppet and prior to Puppet 4 this was extremely annoying and resulted in a high maintenance burden, but in Puppet 4 this has significantly improved so I thought I’ll write a quick post about that.

Why wrap resources?

The example I’ll show here is going to wrap the apache::vhost resource from the PuppetLabs Apache module. This resource has 139 possible attributes, it’s a beast.

While it has some helpers to create doc roots and logdirs I want to add some higher level support:

  • Copy standardly named SSL certs out – you should use something like hiera-eyaml for keys of course
  • Set sane defaults for a few of the properties like ServerAdmin but keep them overridable by callers
  • Create some directories and default locations for docroots and logroots
  • Perhaps down the line create standard monitoring or backup policies

In Puppet 3 your only option to retain full features was to do something like:

define my::vhost (
  $copy_ssl_source = undef,
  $manage_docroot  = true,
  $virtual_docroot = false,
  # all the rest of the 139 properties
) {
  if $copy_ssl_source {
    # the destination path here is illustrative only
    file{"/etc/httpd/ssl/${name}.crt":
       source => "${copy_ssl_source}/${name}.crt"
    }
    # and the same for the chain, key etc
  }

  apache::vhost{$name:
     docroot => $docroot,
     manage_docroot => $manage_docroot,
     virtual_docroot => $virtual_docroot,
     # and reproduce the rest of the 139 properties again
  }
}

As you can see this is a nightmare, it might be workable on a small module but the amount of work required to manage it on this large module is insane. You’ll forever have to play catch up with upstream and hope they never get the same property as one you’ve added in your wrapper.

In this module you’ll have to list 139 properties twice, once in the parameters and once when creating the apache::vhost resource.

It’s also fairly limited, you could no doubt come up with a way to set resource defaults that are overridable from the caller by using the pick() function, but it will almost certainly end up repeating the property list a 3rd time.

I never found this useable at all and it was a big reason why I never used the forge much as you end up needing this pattern a lot.

Below is an approach to do something better using Puppet 4 and its native built-in capabilities. You can get close to what’s below by using the stdlib merge() function together with create_resources(), using its 3rd argument to set defaults. I wouldn’t recommend anyone use create_resources though; it’s a stop gap till something better comes along, which is what is below.
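For completeness, this is roughly what that stop gap could look like. It is a sketch only: the define name my::legacy_vhost and the defaults in it are placeholders of mine, and merge() comes from puppetlabs-stdlib.

```puppet
# Stop-gap sketch using create_resources(), not a recommendation.
# The define name and the default values are illustrative.
define my::legacy_vhost (
  $copy_ssl_source = undef,
  $vhost           = {}
) {
  $defaults = {
    "serveradmin" => "webmaster@example.net",
    "docroot"     => "/srv/www/${name}_docroot"
  }

  if $copy_ssl_source {
    $ssl_options = { "access_log_file" => "ssl_access.log" }
  } else {
    $ssl_options = {}
  }

  # create_resources() takes a hash of title => attributes. Its 3rd
  # argument supplies defaults that the per-instance attributes override.
  create_resources(
    "apache::vhost",
    { "${name}" => $vhost },
    merge($defaults, $ssl_options)
  )
}
```

The caller still passes a single vhost hash, so the 139 properties are never listed, but the resource creation is hidden inside a function call rather than expressed in the DSL.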

Puppet 4

Since Puppet 4 there are now a few new features that make this completely comfortable and easy and allow your wrapper to focus on the features they add and nothing more.

The outcome of using the wrapper will be this:

my::vhost{"example.net":
  copy_ssl_source => "puppet:///modules/${module_name}/ssl/example_net",
  vhost => {
    "access_log_file" => "secure_access.log",
    "serveralias" => ["www.example.net", "other.com"],
    # and any other of the 139 properties
  }
}

So basically my decorated properties are the top level and there’s a hash that directly matches the wrapped resource.

The wrapper will look like this:

define my::vhost (
  Optional[String] $copy_ssl_source = undef,
  Hash             $vhost           = {}
) {
  if $copy_ssl_source {
    ["crt", "key", "chain"].each |$item| {
      # the destination path here is illustrative only
      file{"/etc/httpd/ssl/${name}.${item}":
         source => "${copy_ssl_source}.${item}",
         # .....
      }
    }

    $ssl_options = {
      "access_log_file" => "ssl_access.log"
    }
  } else {
    $ssl_options = {}
  }

  $defaults = {
    "docroot" => "/srv/www/${name}_docroot",
    "serveradmin" => "webmaster@example.net",
    "logroot" => "/var/log/httpd/${name}",
    "logroot_ensure" => "directory",
    "access_log_file" => "access.log",
    "error_log_file" => "error.log"
  }

  apache::vhost{$name:
    * => $defaults + $ssl_options + $vhost
  }
}

Here you can see the new wrapper, there are a few things to note here:

  • It’s pretty much focussed on just what the wrapper is supposed to achieve with almost no details of the wrapped resources. Primarily this relates to copying out SSL certs but it also sets some sane defaults like ServerAdmin as per my site policies.
  • Any of the defaults the wrapper sets can be overridden from the caller, here we set a custom access_log_file and serveralias in the caller, they would override the ones from the defaults
  • Hashes are immutable but here is an example of setting a set of custom options depending on other internal state. The $ssl_options variable will override the $defaults while still remaining overridable by the site user
  • There is almost no ongoing maintenance required just because the Apache module gets an update – unless it changes behaviour on one of the properties we’re defaulting. Everything else is not a concern of the wrapper.
  • I do plain hash merges here to create the parameters but you could use something like the deep_merge() function to really handle complex data, but for me less is more.
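To make the difference between a plain hash merge and deep_merge() concrete, here is a small sketch. The "limits" data is made up for illustration and is not an apache::vhost property; deep_merge() comes from puppetlabs-stdlib.

```puppet
# Illustrative data only, not real apache::vhost properties
$defaults = { "limits" => { "soft" => 1024, "hard" => 4096 } }
$override = { "limits" => { "soft" => 2048 } }

# The + operator is a shallow merge: the whole "limits" value is
# replaced by the right hand side, so "hard" is lost.
$shallow = $defaults + $override
# {"limits" => {"soft" => 2048}}

# deep_merge() recurses into nested hashes, so "hard" survives.
$deep = deep_merge($defaults, $override)
# {"limits" => {"soft" => 2048, "hard" => 4096}}

notice($shallow)
notice($deep)
```

For a flat set of resource properties like the ones in this wrapper the two behave the same, which is why the plain + is enough here.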


I find this really nice. I know there’s been some community interest in basically adding inheritance to defined types but honestly I do not see the use. This has a number of advantages over inheritance: for example I’d have no chance ever of shadowing an inherited property, which would be quite a surprise.

The handling of defaults and merging in other set of defaults as here with the $ssl_options is super handy and without adding a lot of extra stuff would become quite awkward with an inheritance scheme. Especially given the immutable nature of Puppet variables.

Puppet 4 data lookup strategies


I recently wrote about the new Data in Modules support in Puppet 4, there’s another new feature that goes hand in hand with this to finally rid us of functions like hiera_hash() and such.

Up to now we’ve had to do something ugly like this to handle merged class parameters:

class users($local = hiera_hash("users::local", {})) {
  # ...
}

This is functional but quite ugly and ties your module to having hiera. These days that’s a reasonably safe assumption, but with the ability to specify different environment data sources this will not always be the case. For example there’s a new kid on the block called Jerakia that lives in this world so having Hiera specific calls in modules is going to be a limiting strategy.

A much safer abstraction is to be able to rely on the automatic parameter lookup feature – but it had no way to know about the fact that this item should be a hash merge and so the functions were used as above.

Worse things like merge strategies were set globally, a module could not say a certain key should be deep merged and others just shallow merged etc, and if a module required a specific way it had no control over this.

A solution for this problem landed in recent Puppet 4 via a special merged hash called lookup_options. This is documented quite lightly in the official docs so I thought I’d put up an example here.

lookup() function

To understand how this works you first have to understand the lookup() function, it’s documented here. It is basically the replacement for the various hiera() functions and has a matching puppet lookup CLI tool.

If you wanted to do a hiera_hash() lookup that is doing the old deeper hash merge you’d do something like:

$local = lookup("users::local", Hash, {"strategy" => "deep", "merge_hash_arrays" => true})

This would merge just this key rather than, say, setting the merge strategy to deeper globally in hiera, and it’s something the module author can control. The Hash above describes the data type the result should match and supports all the various complex composite type definitions, so you can describe the desired result data in real detail, almost like a Schema.
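As a sketch of what a more detailed type could look like in place of the plain Hash: the per-user fields below ("uid" and "shell") are assumptions of mine about what users::local might contain, not something the users module defines.

```puppet
# Same lookup as above, but with a composite type describing the result.
# The "uid" and "shell" fields are hypothetical.
$local = lookup(
  "users::local",
  Hash[String, Struct[{
    "uid"             => Integer,
    Optional["shell"] => String,
  }]],
  {"strategy" => "deep", "merge_hash_arrays" => true}
)
```

If the merged data does not match, for example a uid given as a string, the lookup fails with a type mismatch instead of handing bad data to the class.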

There is much more to the lookup function and its CLI, they’re both pretty awesome and you can now see where data comes from etc, I guess there’s a follow up blog post about that coming.

lookup_options hiera key

We saw above how to instruct the lookup() function to do a hiera_hash() but wouldn’t it be great if we could somehow tell Puppet that a specific key should always be merged in this way? That way a simple lookup("users::local") would do the merge and crucially so would the automatic parameter lookups, even across backends and data providers.

We just want:

class users(Hash $local = {}) {
  # ...
}

For this to make sense the users module must be able to indicate this in the data layer. And since we now have data in modules there’s an obvious place to put this.

If you set up the users module here to use the hiera data service for data in modules as per my previous blog post you can now specify the merge strategy in your data:

# users/data/common.yaml
lookup_options:
  users::local:
    merge:
      strategy: deep
      merge_hash_arrays: true

Note how this matches exactly the following lookup():

$local = lookup("users::local", Hash, {"strategy" => "deep", "merge_hash_arrays" => true})

Data type validation is done on the class parameters, where it also validates explicitly supplied data, while the strategies for processing the data live at the module data level.

The way this works is that puppet will do a lookup_options lookup from the data source that is merged together – so you could set this at site level as well – but there is a check to ensure a module can only set keys for itself so it can not change behaviours of other modules.

At this point a simple lookup("users::local") will do the merge and therefore so will this code:

class users(Hash $local = {}) {
  # ...
}

No more hiera_hash() here. The old hiera() function is not aware of this – it’s a lookup() feature but with this in place we’ll hopefully never see hiera*() functions being used in Puppet 4 modules.

This is a huge win and really shows what can be done with the Data in Modules features and something that’s been impossible before. This really brings the automatic parameter lookup feature a huge way forward and combines for me to be one of the most compelling features of Puppet 4.

I am not sure who proposed this behaviour, the history is a bit muddled but if someone can tweet me links to mailing list threads or something I’ll link them here for those who want to discover the background and reasoning that went into it. UPDATE: Henrik informs me that Rob Nelson was the driving force on this – it’s something they wanted to do for a while but really without Rob sticking with it and working with the devs it would not have been done.


The lookup function and the options are a great move forward, however I find the UX of the various lookup options and merge strategies quite bad. It’s really hard for me to go from reading the documentation to knowing what a certain option will do with my data. In fact I still have no idea what some of these do; the only way to discover it seems to be spending time playing with it, which I haven’t had. It would be great for new users to get some more clarity there.

Some doc updates that provide a translation from old Hiera terms to new strategies would be great and maybe some examples of what these actually do.

Native Puppet 4 Data in Modules


Back in August 2012 I requested an enhancement to the general data landscape of Puppet and a natural progression on the design of Hiera to enable it to be used in modules that are shared outside of your own environments. I called this Data in Modules. There was lots of community interest in this but not much movement, eventually I made a working POC that I released in December 2013.

The basic idea around the feature is that we want to be able to use Hiera to model internal data found in modules as well as site specific data and that these 2 sets of data coexist and complement each other. Full details of this can be found in my post titled Better Puppet Modules Using Hiera Data and some more background can be found in The problem with params.pp. These posts are a bit old now and some things have moved on but they’re good background reading.

It’s taken a while but as part of the Puppet 4 rework effort the data ingesting mechanisms have also been rewritten and finally, in Puppet 4.3.0, native data in modules has arrived. The original Jira for this is 4474. It’s really pretty close to what I had in mind in my proposals and my POC and I am really happy with this. Along the way a new function called lookup() has been introduced to replace the old collection of hiera(), hiera_array() and hiera_hash().

The official docs for this feature can be found at the Puppet Labs Docs site. Here I’ll more or less just take my previous NTP example and show how you could use the new Data in Modules to simplify it as per the above mentioned posts.

This is the very basic Puppet class we’ll be working with here:

class ntp (
  String $config,
  String $keys_file
) {
  # ...
}

In the past these variables would have needed to interact with the params.pp file like $config = $ntp::params::config, but now it’s just a simple class. At this point it’ll not yet use any data in the module, to do that you have to activate it in the metadata.json:

# ntp/metadata.json
{
  ...
  "data_provider": "hiera"
}

At this point Puppet knows you want to use the hiera data in the module. But key to the feature, and really the whole reason it exists, is that a module needs to be able to specify its own hierarchy. Imagine you want to set $keys_file here, you’ll have to be sure the hierarchy in question includes the OS Family and you must have control over that data. In the past with the hierarchy being controlled completely by the site hiera.yaml this was not possible at all and the outcome was that if you wanted to share a module outside of your environment you had to go the params.pp route as that was the only portable solution.

So now your modules can have their own hiera.yaml. It’s slightly different from the past but should be familiar to past hiera users, it goes in your module so this would be ntp/hiera.yaml:

---
version: 4
datadir: data
hierarchy:
  - name: "OS family"
    backend: yaml
    path: "os/%{facts.os.family}"

  - name: "common"
    backend: yaml

This is the new format for the hiera configuration, it’s more flexible and a future version of hiera will have some changing semantics that’s quite nice over the original design I came up with so you have to use that new format here.

Here you can see the module has its own OS Family tier as well as a common tier. Let’s see the ntp/data/common.yaml:

ntp::config: "/etc/ntp.conf"
ntp::keys_file: "/etc/ntp.keys"

These are sane defaults to use for any non specifically supported operating systems.

Below are examples for AIX and Debian:

# data/os/AIX.yaml
ntp::config: "/etc/ntpd.conf"
# data/os/Debian.yaml
ntp::keys_file: "/etc/ntp/keys"

At this point the need for params.pp is gone – at least in this simplistic example – and this data along with the environment specific or site specific data cohabit really nicely. If you specified any of these data items in your site Hiera data your site data will override the module. The advantages of this might not be immediately obvious. I have a very long list of advantages over params.pp in my Better Puppet Modules Using Hiera Data post, be sure to read that for background.

There’s an alternative approach where you write a Puppet function that returns a hash of data and the data system will fetch the keys from there. This is really powerful and might end up being an interesting solution to something along the lines of a module specific custom hiera backend, but a lighter weight version of that. I might write that up later, this post is already a bit long.
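A rough sketch of that function based approach, reusing the NTP data from above. I am assuming here that the module sets "data_provider": "function" in its metadata.json and that Puppet then calls a function named after the module with ::data appended; check the data provider docs for your Puppet version before relying on this.

```puppet
# ntp/functions/data.pp
#
# Assumed to be called by the function data provider; returns a hash of
# fully qualified keys, mirroring the YAML files shown earlier.
function ntp::data() {
  $base = {
    "ntp::config"    => "/etc/ntp.conf",
    "ntp::keys_file" => "/etc/ntp.keys",
  }

  # Per OS family overrides, equivalent to the os/ tier of the hierarchy
  $os_overrides = $facts["os"]["family"] ? {
    "AIX"    => { "ntp::config"    => "/etc/ntpd.conf" },
    "Debian" => { "ntp::keys_file" => "/etc/ntp/keys" },
    default  => {},
  }

  $base + $os_overrides
}
```

The hierarchy is expressed in code rather than in a hiera.yaml, which is more flexible but also gives up the declarative layout that makes the YAML version easy to read.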

The remaining problem is to do with data that needs to be merged, as traditionally Hiera and Puppet have no idea you want this to happen when you do a basic lookup, hence these annoying hiera_hash() functions etc. There’s a solution for this and I’ll post a blog post about that next week, once the next Puppet 4 release is out and a bug I found that makes it unusable is fixed in that version.

This feature is a great addition to Puppet and I am really glad to finally see this land. My hacky data in modules code was used quite extensively with 72 000 downloads from the forge but I was never really happy with it and was desperate to see this land natively. This is a big step forward and I hope it sees wide adoption in the community.

A note about the old ripienaar-module_data module

As seen above the new built in feature is great and a very close match to what I had envisioned when creating the proof of concept module.

It would not be a good idea to support both these methods on Puppet 4 and turns out it is also quite difficult because we both use the hiera.yaml file in the module but with small differences in format. So the transition period will no doubt be a bit painful especially for those attempting to use this while supporting both Puppet 3 and 4 users.

Further the old module actually broke the Puppet 4 feature for a while in a way that was really difficult to debug. Puppet Labs kindly reached out and notified me of this and helped me fix it in MODULES-3102. So there is now a new release of the old module that works again on Puppet 4 BUT it warns very loudly that this is a bad idea.

The old module is now deprecated and unsupported. You should stop using it and imho stop using Puppet 3, but whatever you do stop using it on Puppet 4. I wish the metadata.json supported a supported Puppet version requirement so I can force this but alas it doesn’t so I can’t.

I will after a few months make a release that will raise an error on Puppet 4 and refuse to work there. You should move forward and adopt the excellent native implementation of this feature.

Iterating in Puppet


Iteration in Puppet has been a long standing pain point, Puppet 4 addresses this by adding blocks, loops etc. Here I capture the various approaches to working with some complex data in Puppet before and after Puppet 4.

To demonstrate this I’ll take some data from a previous blog post and see how to deal with it, here’s the data that will be in $domains in the examples below:

{
  "domains": {
    "x.net": {
      "nexthop": "70.x.x.x",
      "spamdestination": "rip@devco.net",
      "spamthreshold": 1500,
      "enable_antispam": 1
    },
    "x.co.uk": {
      "nexthop": "70.x.x.x",
      "spamdestination": "rip@devco.net",
      "spamthreshold": 1500,
      "enable_antispam": 1
    }
  }
}

First we’re going to need some defined type that can create an individual domain, we’ll call that mail::domain but I won’t show the code here, as that’s not really important.

Puppet 3 + stdlib

The first approach I’ll show is your basic Puppet 3 approach. The basic idea here is to get a list of domains and use the array iteration Puppet has always had on name.

The trick here is to get the domain names using the keys() function and then pass all the data into every instance of the define. Each instance fetches its data from the data passed into the define.

$domain_names = keys($domains)

mail::domains{$domain_names:
  domains => $domains
}

define mail::domains($domains) {
  $domain = $domains[$name]

  mail::domain{$name:
    nexthop => $domain["nexthop"]
  }
}

Puppet 3 + create_resources

A hacky riff on eval() was added to Puppet during 3 to make it a bit easier to deal with data from Hiera or similar, it takes some data in a standard format and creates instances of a defined type:

create_resources("mail::domain", $domains, {"spamthreshold" => 1500, "enable_antispam" => 1})

This replaces all the code above plus adds some default handling in the case that the data is not uniform. Some people love it, some hate it, I think it’s a bit too magical so prefer to avoid it.

Puppet 4 – each loop

This is the approach you’d probably want to use in Puppet 4; it uses a simple each loop over the data:

$domains.each |$name, $domain| {
  mail::domain{$name:
    nexthop => $domain["nexthop"]
  }
}

It’s quite readable and obvious what’s happening here, it’s more typing than the create_resources example but I think this is the preferred way due to clarity etc

Below this we get into the more academic solutions to the problem, mainly showing off some Puppet 4 features.

Puppet 4 – wildcard shortcut

If listing every key is tedious like above and if you know your hashes map 1:1 to the defined type parameters you can short circuit things a bit, this is quite close to the create_resources convenience:

each($domains) |$name, $domain| {
  mail::domain{$name:
    * => $domain
  }
}

The splat operator takes all the data in the hash and maps it right onto properties of the defined type, quite handy.

Puppet 4 – wildcard and defaults

Your data might not all be complete so you’d want to get some defaults merged in, this is something create_resources also supports so this is how you’d do it without create_resources:

$defaults = {
  "spamthreshold" => 1500,
  "enable_antispam" => 1
}

$domains.each |$name, $domain| {
  mail::domain{$name:
    * => $defaults + $domain  # + now merges hashes
  }
}

Puppet 4 – wildcard and resource defaults

An alternative to the above that’s a bit more verbose but might be more readable can be seen below:

$defaults = {
  "spamthreshold" => 1500,
  "enable_antispam" => 1
}

$domains.each |$name, $domain| {
  mail::domain{
    default:
      * => $defaults;

    $name:
      * => $domain
  }
}

Puppet 4 – Native DSL create_resources()

Puppet 4 supports functions written in the native DSL, this means you can use the above and generalize it a bit and end up with a reimplementation of create_resources. Not sure I’d recommend this but it does show some techniques that are related:

function my::create_resources (
  String $type,
  Hash $instances,
  Hash $defaults = {}
) {
  $instances.each |$r_name, $r_properties| {
    Resource[$type] {$r_name:
      * => $defaults + $r_properties
    }
  }
}

The magic here is the Resource[$type] that lets you reference a type programmatically. It also works for classes.

As far as I can tell this is a close equivalent to create_resources.


That’s about it, there are many more iteration tricks in Puppet 4 but this shows you how to achieve what you did with create_resources in the past and a couple of possible approaches to solving that problem.
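As a taste of those other tricks, here is a small sketch using filter, map and reduce on data shaped like $domains. The sample values are made up, and I declare the hash inline only so the snippet stands on its own.

```puppet
# Sample data in the same shape as $domains above (values are illustrative)
$domains = {
  "x.net"   => { "nexthop" => "70.x.x.x", "spamthreshold" => 1500, "enable_antispam" => 1 },
  "x.co.uk" => { "nexthop" => "70.x.x.x", "spamthreshold" => 900,  "enable_antispam" => 0 },
}

# filter: keep only the domains with antispam enabled (returns a Hash)
$antispam = $domains.filter |$domain_name, $domain| {
  $domain["enable_antispam"] == 1
}

# map: turn every entry into a descriptive string (returns an Array)
$routes = $domains.map |$domain_name, $domain| {
  "${domain_name} via ${domain['nexthop']}"
}

# reduce: fold the hash into a single value. Each entry arrives as a
# [key, value] pair, so the domain data is at index 1.
$threshold_total = $domains.reduce(0) |$memo, $entry| {
  $memo + $entry[1]["spamthreshold"]
}

notice($antispam)
notice($routes)
notice($threshold_total)
```

The filtered hash could be fed straight into any of the each loops above to create only the antispam enabled domains.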

Not sure which I’d recommend, but I suspect the choice comes down to personal style and situation.
