params.pp in Puppet 4

I do not like the params.pp pattern. Puppet 4 has brought native Data in Modules, which is pretty awesome and to a large extent removes the traditional need for params.pp.

Thing is, we kind of do still need some parts of params.pp. To understand this we have to consider which areas of concern params.pp covers in the Puppet world:

1. Holds data, often in large if or case statements that ultimately resemble Hiera data
2. Derives new data using logic based on situation-specific data like facts
3. Validates data. This was kind of all over the place rather than in params.pp, since params.pp is generally not parameterised, but it’s closely related.
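As a reminder of what the first two concerns tend to look like in practice, here is a minimal sketch of the old pattern. The module name mymod, the package names and the $workers variable are hypothetical and only there for illustration.

```puppet
# Hypothetical params.pp for an imaginary module "mymod".
class mymod::params {
  # Concern 1: plain data, picked by a case statement on a fact.
  case $facts["os"]["family"] {
    "RedHat": {
      $package = "httpd"
    }
    "Debian": {
      $package = "apache2"
    }
    default: {
      fail("mymod does not support ${facts['os']['family']}")
    }
  }

  # Concern 2: data derived from facts with a bit of logic.
  $workers = $facts["processors"]["count"] * 2
}
```

The case statement is really just Hiera-style data in disguise, while the last line is the kind of derived value that data files alone cannot express.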

Points 1 and 3 are roughly sorted out by Puppet 4 types and data in modules, but what about the 2nd point and, to some extent, the more complex data validation that falls outside of the type system?

Before I start looking at how to derive data though I’ll take a look at the new function API in Puppet 4.

Native Functions

Puppet has always allowed us to write functions but they needed to be in Ruby and nothing else. This isn’t really great. The message is kind of:

Puppet has a DSL for managing systems, we think it’s awesome and can do everything you need. But in order to use it you have to learn 2 programming languages with different models.

And I always felt the same about the general suggestion to write ENCs etc, luckily not something we hear much these days.

The old Ruby functions also had a few major issues:

They do not work right in environments, just like custom providers and types do not. This is a showstopper bug as environments have become indispensable in modern Puppet use.
They are not namespaced. This is a showstopper for putting them on the forge.

The Puppet 4 functions API fixes this: you can write functions in the native DSL and they work fine in environments. The Puppet 4 DSL, with its loops and blocks and so forth, has matured enough that it can do a lot of the things I’d need to do for deriving data from other data.

They live in your module in the functions directory, they’re namespaced and environment safe:

# Integer is the Puppet data type for whole numbers (Fixnum is a Ruby class)
function mymod::myfunc(Integer $input) {
  $input * 2
}

And you’d use this like any other function: $x = mymod::myfunc(10). Simple stuff; the value of the last expression is returned, like in Ruby.
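To give a feel for the loops and blocks mentioned above, here is a minimal sketch of a second function. The name mymod::double_all is hypothetical; given that name, it would live in mymod/functions/double_all.pp.

```puppet
# Hypothetical example, assumed to live in mymod/functions/double_all.pp
function mymod::double_all(Array[Integer] $inputs) {
  # map runs the block once per element and collects the results
  $inputs.map |$n| { $n * 2 }
}
```

Calling mymod::double_all([1, 2, 3]) from any manifest would give back a new array with every element doubled.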

Update: these have now been documented in the Puppet Labs docs

Derived Data in Puppet 4

So we’re finally where I can show my preferred method for deriving data in Puppet 4, and that’s to use a native function.

As an example we’ll stick with Apache but this time a wrapper for the main class. From the previous blog post you’ll remember (or if not please read that post) that we wrapped the puppetlabs-apache module to create our own vhost define. Here I’ll show a wrapper for the main apache class.

class site::apache(
  Boolean $passenger = false,
  Hash $apache = {},
  Hash $module_options = {}
) {
  if $passenger {
    # site defaults for Passenger, the pool size is derived from facts
    $_passenger_defaults = {
      "passenger_max_pool_size" => site::apache::passenger_poolsize(),
      # ...
    }

    # keys in $module_options["passenger"] override the defaults
    class{"apache::mod::passenger":
      * => $_passenger_defaults + site::fetch($module_options, "passenger", {})
    }
  }

  $defaults = {
    "default_vhost" => false,
    # ....
  }

  # keys in $apache override the defaults
  class{"apache":
    * => $defaults + $apache
  }
}

Here I have a wrapper that does the basic Apache configuration with some overridable defaults via $apache, and I have a way to configure Passenger, again with overridable defaults, via $module_options["passenger"].
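As a rough sketch of how this wrapper might be declared, assuming it is in place and with purely illustrative values:

```puppet
# Hypothetical declaration of the wrapper; the values are examples only.
class { "site::apache":
  passenger      => true,
  apache         => { "default_vhost" => true },
  module_options => {
    "passenger" => { "passenger_max_pool_size" => 12 },
  },
}
```

When two hashes are merged with +, keys from the right-hand hash win, so the 12 here would replace the derived pool size and default_vhost would end up true.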

The Passenger part uses 2 functions: site::apache::passenger_poolsize and site::fetch. Both are namespaced to the site module and are shown below.

First the site::apache::passenger_poolsize function, which follows typical community guidelines for the pool size based on core count. It’s also aware of whether the machine is virtual or physical. This is a good example of derived data that would be impossible to do using just Hiera, and so it simply does not have a place there.

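function site::apache::passenger_poolsize() {
  # assumes is_virtual is a real boolean, not the string "false"
  if $facts["is_virtual"] {
    $multiplier = 1.5
  } else {
    $multiplier = 2
  }

  # floor() rounds the result down to a whole number of processes
  floor($facts["processors"]["count"] * $multiplier)
}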

And this is site::fetch that’s like Ruby’s Hash#fetch. stdlib will soon have dig() that does something similar.

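function site::fetch(
  Hash $data,
  String $key,
  Data $default
) {
  # match on NotUndef rather than truthiness so that a stored
  # false is returned instead of falling through to the default
  if $data[$key] =~ NotUndef {
    $data[$key]
  } else {
    $default
  }
}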
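A quick sketch of how site::fetch behaves for a present and an absent key, assuming the function is available in the site module. The "php" key and the values are made up for illustration.

```puppet
$module_options = {
  "passenger" => { "passenger_max_pool_size" => 8 },
}

# Key present: the stored hash is returned.
$passenger = site::fetch($module_options, "passenger", {})

# Key absent: the supplied default is returned instead.
$php = site::fetch($module_options, "php", {})

notice($passenger)
notice($php)
```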

Why functions and not inlining the logic?

This seems like a bit more work than just sticking the site::apache::passenger_poolsize logic into the class that’s calling it, so why bother? The first reason is obviously that it’s reusable, so if you need this logic anywhere else you can use it there too. But the second is about isolation.

I am not a big fan of writing Puppet rspec tests since I tend to shy away from Puppet logic in modules. But if I have to put logic in modules I’d like to isolate the logic so I can easily test it in isolation. I have no idea if rspec-puppet supports these functions yet, but if it did having this logic in as small a package as possible for testing is absolutely the right thing to do.

Further, today the function is quite limited, but I can see I might want to expand it later to consider total memory as well as core count. When that day comes, I only have to edit this function and nothing else. The potential fallout from logic errors and so forth is neatly contained, and importantly I can be fairly sure that this function is used for 1 thing only and that changing its internals is something I can safely do. The things calling it really should not care about its internals.

Early on here I touched on complex validation of data as a possible thing these functions could solve. The example here does not really do this, but imagine that for my site I never want to set the Passenger pool size above some threshold that might relate to the memory on the machine. Given that this pool size is user overridable, I’d write a function like site::apache::validate_poolsize that takes care of this and fails when needed.
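A minimal sketch of what such a validation function could look like. The function body is my assumption of one way to do it, and the 256 MB per process figure is a placeholder rather than a recommendation; it also assumes the structured memory fact from recent Facter versions.

```puppet
# Hypothetical sketch, assumed to live in
# site/functions/apache/validate_poolsize.pp
function site::apache::validate_poolsize(Integer $poolsize) {
  # assumed budget per Passenger process, purely illustrative
  $bytes_per_process = 256 * 1024 * 1024

  # integer division: how many processes fit into total memory
  $max = $facts["memory"]["system"]["total_bytes"] / $bytes_per_process

  if $poolsize > $max {
    fail("Passenger pool size ${poolsize} exceeds the limit of ${max} on this machine")
  }

  # hand the value back so the call can be used inline
  $poolsize
}
```

Because the function returns its argument when the check passes, it could wrap whatever pool size the user supplied before that value is handed to apache::mod::passenger.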

These validations could become very complex and situation specific (i.e. based on facts), so this is more than we can expect from a type system. Writing validations as native functions is easy and fits in neatly with the DSL.

Conclusion

These functions are great; to me they are everything defined types should have been and more. I think they move Puppet as a whole a huge leap forward in that you can achieve more complex things using just the Puppet DSL, and they combine very nicely with the recent EPP native Puppet based templates.

They fix massive showstopper bugs of environment compatibility and make sharing modules like this on a forge a lot safer.

Using them in this manner here Puppet 4 can close the loop on all the functionality that params.pp had:

Pure data that is hierarchical in nature can live in modules.
Input validation can be done using the data type system.
Derived data can be done in isolation and in a reusable manner using native functions

When combined in this manner params.pp can be removed completely without any loss of functionality. Every one of the above points improves significantly on the old pattern.

I could not find docs for the new functions on the Puppet Labs site, hopefully we’ll see some soon.

I have a short wishlist for these functions:

I want to be able to specify the return type of a function, I think this is critical.
I want a return() function like in other languages. I know you can generally do without but sometimes that can lead to some pretty awkward code.
More docs

Bonus: The end of defined types?

These functions can create resources just like any other manifest can. This is a big difference from old Ruby functions, which had to do all kinds of nasty things, possibly via create_resources. But since they can create resources they might be a viable replacement for defined types.

There are a few issues with this idea. The immediate missing part is that you cannot export a function. Additionally, as they are outside of the resource system, you couldn’t do overrides or any relationships on them. You can’t say install a package before a vhost made by a function.
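To make the relationship problem concrete, here is a rough sketch. The function site::docroot, the package name and the paths are all hypothetical; the function is assumed to live in site/functions/docroot.pp and the rest in an ordinary manifest.

```puppet
# Hypothetical function that declares a resource.
function site::docroot(String $vhost) {
  file { "/srv/www/${vhost}":
    ensure => directory,
  }
}

package { "httpd":
  ensure => installed,
}

# Called like any other function; the file resource lands in the catalog.
site::docroot("example.com")

# There is no Site::Docroot["example.com"] to refer to, so ordering
# has to name the underlying resource instead:
Package["httpd"] -> File["/srv/www/example.com"]
```

With a defined type the last line could point at the define itself and cover everything inside it, which is exactly what is lost here.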

The first I don’t really personally care for since I do not and will never use exported resources. The 2nd is perhaps a more important issue. From an ordering perspective the MOAR ordering in Puppet 4 helps, but for doing notifies and such it might not be that hot.

It’s an interesting thought experiment though. People want to think of defined types as functions but they aren’t, and this is a hurdle in learning Puppet for newcomers. With some work I think functions can eventually replace defined types, and defined types can be deprecated. That’s a good goal to work toward.