I do not like the params.pp pattern. Puppet 4 has brought native Data in Modules that’s pretty awesome and to a large extent it removes the traditional need for params.pp.
Thing is, we kind of do still need some parts of params.pp. To understand this we have to consider which areas of concern params.pp has in the Puppet world:
- Holds data, often in large if or case statements that ultimately resemble hiera data
- Derives new data using logic based on situation specific data like facts
- Validates data. This was kind of all over the place, and usually not in params.pp since that class is generally not parameterised. But it’s closely related.
Points 1 and 3 are roughly sorted out by Puppet 4 types and data in modules, but what about the 2nd point and, to some extent, more complex data validation that falls outside of the type system?
Before I start looking at how to derive data though I’ll take a look at the new function API in Puppet 4.
Native Functions
Puppet has always allowed us to write functions but they needed to be in Ruby and nothing else. This isn’t really great. The message is kind of:
Puppet has a DSL for managing systems, we think it’s awesome and can do everything you need. But in order to use it you have to learn 2 programming languages with different models.
And I always felt the same about the general suggestion to write ENCs etc, luckily not something we hear much these days.
And the old Ruby functions had a few major issues:
- They do not work right in environments, just like custom providers and types do not. This is a showstopper bug as environments have become indispensable in modern Puppet use.
- They are not namespaced. This is a showstopper for putting them on the forge.
The Puppet 4 functions API fixes this: you can write functions in the native DSL and they work fine in environments. The Puppet 4 DSL, with its loops and blocks and so forth, has matured enough that it can do a lot of the things I’d need to do for deriving data from other data.
They live in your module in the functions directory, they’re namespaced and environment safe:
function mymod::myfunc(Integer $input) {
  # Integer is the Puppet data type for whole numbers
  $input * 2
}
And you’d use this like any other function: $x = mymod::myfunc(10). Simple stuff, whatever is the last value is returned like in Ruby.
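As a rough sketch of the loops and blocks mentioned above, a native function can also iterate with a lambda. The name mymod::double_all is hypothetical, made up purely for illustration, and it assumes the function is saved as functions/double_all.pp in the mymod module:

```puppet
# Hypothetical helper, not part of any real module.
function mymod::double_all(Array[Integer] $values) {
  # map runs the block once per element and collects the results;
  # the resulting array is the last value, so it is what gets returned
  $values.map |$value| { $value * 2 }
}
```

With this in place, mymod::double_all([1, 2, 3]) would give [2, 4, 6].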
Update: these have now been documented in the Puppet Labs docs
Derived Data in Puppet 4
So we’re finally where I can show my preferred method for deriving data in Puppet 4, and that’s to use a native function.
As an example we’ll stick with Apache but this time a wrapper for the main class. From the previous blog post you’ll remember (or if not please read that post) that we wrapped the puppetlabs-apache module to create our own vhost define. Here I’ll show a wrapper for the main apache class.
class site::apache(
  Boolean $passenger = false,
  Hash $apache = {},
  Hash $module_options = {},
) {
  if $passenger {
    $_passenger_defaults = {
      "passenger_max_pool_size" => site::apache::passenger_poolsize(),
      # ...
    }

    class{"apache::mod::passenger":
      * => $_passenger_defaults + site::fetch($module_options, "passenger", {}),
    }
  }

  $defaults = {
    "default_vhost" => false,
    # ....
  }

  class{"apache":
    * => $defaults + $apache,
  }
}
Here I have a wrapper that does the basic Apache configuration with some overridable defaults via $apache, and I have a way to configure Passenger, again with overridable defaults, via $module_options["passenger"].
The Passenger part uses 2 functions: site::apache::passenger_poolsize and site::fetch. These are namespaced to the site module and you can see them below:
First the site::apache::passenger_poolsize that follows typical community guidelines for the pool size based on core count; it’s also aware of whether the machine is virtual or physical. This is a good example of derived data that would be impossible to do using just Hiera, and so simply does not have a place there.
function site::apache::passenger_poolsize {
  if $is_virtual {
    $multiplier = 1.5
  } else {
    $multiplier = 2
  }

  floor($facts["processors"]["count"] * $multiplier)
}
And this is site::fetch that’s like Ruby’s Hash#fetch. stdlib will soon have dig() that does something similar.
function site::fetch(
  Hash $data,
  String $key,
  Data $default
) {
  if $data[$key] {
    $data[$key]
  } else {
    $default
  }
}
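To make the behaviour of site::fetch concrete, here is a small usage sketch. The hash contents and the "status" key are made up for illustration:

```puppet
# Illustrative data only; the option value is an assumption.
$module_options = {
  "passenger" => { "passenger_max_pool_size" => 12 },
}

# Key present: returns the nested hash stored under "passenger"
$passenger = site::fetch($module_options, "passenger", {})

# Key absent: returns the supplied default, here an empty hash
$status = site::fetch($module_options, "status", {})
```

One thing to be aware of: because the function tests $data[$key] for truthiness, a key that is present but set to false or undef also falls back to the default, which is a little different from Ruby’s Hash#fetch.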
Why functions and not inlining the logic?
This seems like a bit more work than just sticking the site::apache::passenger_poolsize logic into the class that’s calling it, so why bother? The first reason is obviously that it’s reusable, so if you need this logic anywhere else you can use it. But the second is about isolation.
I am not a big fan of writing Puppet rspec tests since I tend to shy away from Puppet logic in modules. But if I have to put logic in modules I’d like to isolate the logic so I can easily test it in isolation. I have no idea if rspec-puppet supports these functions yet, but if it did having this logic in as small a package as possible for testing is absolutely the right thing to do.
Further, today the function is quite limited, but I can see I might want to expand it later to consider total memory as well as core count. When that day comes, I only have to edit this function and nothing else. The potential fallout from logic errors and so forth is neatly contained and, importantly, I can be fairly sure that this function is used for 1 thing only and changing its internals is something I can safely do; the things calling it really should not care about its internals.
Early on here I touched on complex validation of data as a possible thing these functions could solve. The example here does not really do this, but imagine that for my site I never want to set the passenger_poolsize above some threshold that might relate to the memory on the machine. Given that this poolsize is user overridable I’d write a function like site::apache::validate_poolsize that takes care of this and fails when needed.
These validations could become very complex and situation specific (ie. based on facts) so this is more than we can expect from a Type system. Writing validations as native functions is easy and fits in neatly with the DSL.
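A minimal sketch of what such a site::apache::validate_poolsize could look like follows. The rule of at most one Passenger process per 256 MB of RAM is an arbitrary assumption for illustration, not a recommendation, and the sketch assumes the structured memory fact from Facter 3 is available:

```puppet
# Hypothetical validation function; the 256 MB per process limit is made up.
function site::apache::validate_poolsize(Integer $poolsize) {
  # Assumes Facter 3 structured facts; total_bytes is an Integer
  $memory_mb = $facts["memory"]["system"]["total_bytes"] / (1024 * 1024)

  # Integer division, so the limit is rounded down
  $max_poolsize = $memory_mb / 256

  if $poolsize > $max_poolsize {
    fail("Passenger pool size ${poolsize} exceeds the limit of ${max_poolsize} for ${memory_mb} MB of memory")
  }

  # Hand the value back so the call can be used inline
  $poolsize
}
```

Because it returns its input when the check passes, a call like site::apache::validate_poolsize($size) could be dropped in wherever the pool size is used.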
Conclusion
These functions are great, to me they are everything defined types should have been and more. I think they move Puppet as a whole a huge leap forward in that you can achieve more complex things using just the Puppet DSL and they combine very nicely with the recent epp native Puppet based templates.
They fix massive showstopper bugs of environment compatibility and make sharing modules like this on the forge a lot safer.
Using them in this manner here Puppet 4 can close the loop on all the functionality that params.pp had:
- Pure data that is hierarchical in nature can live in modules.
- Input validation can be done using the data type system.
- Derived data can be done in isolation and in a reusable manner using native functions
When combined in this manner params.pp can be removed completely without any loss of functionality. Every one of the above points improves significantly on the old pattern.
I could not find docs for the new functions on the Puppet Labs site, hopefully we’ll see some soon.
I have a short wishlist for these functions:
- I want to be able to specify the return type of functions, I think this is critical.
- I want a return() function like in other languages. I know you can generally do without but sometimes that can lead to some pretty awkward code.
- More docs
Bonus: The end of defined types?
These functions can create resources just like any other manifest can. This is a big difference from old Ruby functions, which had to do all kinds of nasty things, possibly via create_resources. But since they can create resources they might be a viable replacement for defined types.
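As a rough illustration of the idea, a function that declares a resource instead of returning data might look like the sketch below. The function name and the file resource are hypothetical examples:

```puppet
# Hypothetical function used the way one might otherwise use a defined type.
function site::vhost_docroot(String $path) {
  # Declares a file resource in the catalog when the function is called
  file { $path:
    ensure => directory,
  }
}
```

Calling site::vhost_docroot("/srv/www/example") from a class would then add that directory to the catalog, though unlike a defined type there is no named container for the call itself.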
There are a few issues with this idea: The immediate missing part is that you cannot export a function. Additionally as they are outside of the resource system you couldn’t do overrides and do any relations on them. You can’t say install a package before a vhost made by a function.
The first I don’t really personally care for since I do not and will never use exported resources. The 2nd is perhaps a more important issue. From an ordering perspective the MOAR ordering in Puppet 4 helps, but for doing notifies and such it might not be that hot.
It’s an interesting thought experiment though. People want to think of defined types as functions but they aren’t, and this is a hurdle in learning Puppet for newcomers. With some work I think functions can eventually replace defined types and defined types can be deprecated. That’s a good goal to work toward.
Specifying return type is high on the list – coming soon. Being able to do return(), as well as next() and break() in iteration is being discussed.
There are several aspects to user defined resource types (defines):
* They are lightweight ‘resource types’ (easier than implementing a real resource type in Ruby, but also a lot less powerful)
* They are used as if they are functions
* They have identity and containment
Functions will never take over the “identity and containment” as they will continue to remain pure functions. Identifying every invocation of a function and then using that as a named container would be anything but pure.
We are working on cleaning up the other concepts as well, but that has a longer fuse.
Hi,
Personally I’m using the “function” data provider in my modules (via ./functions/data.pp) *and* a params.pp class.
1. in ./functions/data.pp I define smart default values (when it’s possible) of the params.pp parameters.
2. params.pp does absolutely nothing (empty body).
3. The other classes of the module have no parameter (and I don’t use inheritance at all), I just *include* module::params and use the variables $param1 = $::module::params::param1, $param2 = $::module::params::param2 etc.
Thus, I keep the advantage of the data binding mechanism of Puppet 4 but if needed I can use parameters of the module in the functions/data.pp of another module modB (and of course I set “module” as dependency of modB) via a simple (and harmless) “include module::params” in the functions/data.pp of modB . Thus I can define default values of modB equal to (some) _values_ (not default values) of the module.
The less data I put in hiera, the better it is for me, and being able to have (some) default values of a module A equal to values of a module B is sometimes useful. For now, I’m satisfied with this design.
Regards.
flaf
Flaf, not sure what you mean by it being better not to put data in Hiera. It seems to me this scheme will work equally well with the data in Hiera or in a function?
I’m not sure I understand how to do what I have described with hiera. If you can, tell me more. Let me explain my case in more detail.
In the ./functions/data.pp of modB, I have something like this:
——————————————————–
function modB::data {
  # Imagine that the smartest default value of the parameter param2 in modB is
  # the _value_ of ::modA::params::param2.
  include '::modA::params'
  $param2 = $::modA::params::param2;
  {
    'modB::params::param1'   => 'a smart default value',
    'modB::params::param2'   => $param2,
    'modB::params::password' => undef, # no smart default value for this parameter.
  }
}
——————————————————–
In the ./manifests/params.pp of modB, I have just this (empty body, an include of this class is totally harmless everywhere):
——————————————————–
class modB::params (
  $param1,
  $param2,
  $password,
) {
}
——————————————————–
And for instance, in the manifests/init.pp of modB, I have this (no parameter for this class):
——————————————————–
class modB {
  include '::modB::params'
  $param1 = $::modB::params::param1
  $param2 = $::modB::params::param2
  $password = $::modB::params::password
  # Some code…
}
——————————————————–
Imagine that modA has the same design. Now, the default value of param2 in modB will be the value of param2 in modA (but if I want, I can set the value manually in hiera with the key modB::params::param2).
Of course, in modB/metadata.json I put modA as dependency and it should be documented in the README of modB.
Each module has a “params” class which is harmless (an include is without any risk) and which contains all its data. Thus, _if_ it’s useful, a module can pick out data (not default data) of another module.
Of course, I don’t abuse that; this kind of dependency is not so frequent, but it happens sometimes between to modules. The gain for me is avoiding data duplication in hiera. As I said, the less data I put in hiera (in other words, the more I keep to the _strict_ minimum of data in hiera), the better it is for me. I always prefer manipulating code to manipulating data (I think code is easier and more flexible than data, imho of course).
So, how would I do the same with hiera? That’s not clear to me.
Of course, I have absolutely no certainty about this topic. This design seems to me to work correctly, but maybe I am wrong. I’m really open to discussion and criticism, with interest.
Sorry for the typo => between _two_ modules
“I don’t really personally care for since I do not and will never use exported resources.”
Probably stated elsewhere, but, how do you manage inter-host dependencies without exported resources?
There are many options to share info between nodes. The problem with exported resources is that they’re badly coupled. Let’s say you want to configure a web app to talk to a DB. The DB module is the one with the information needed, so with exported resources the DB module has to export the web app config.
Web apps are all different and written in different languages etc, and soon your DB module becomes this nightmare of information it shouldn’t know, and that makes such modules impossible to share on the forge as they’re full of unrelated rubbish.
Exported resources have a very, very limited application where they’re a good fit; everything else is just a matter of using them because nothing else exists.