When writing Puppet modules there tends to be a ton of configuration data, generally things like different paths for different operating systems. Today the general pattern for managing this data is a class module::params with a bunch of logic in it.
Here’s a simplistic example below – for an example of the full horror of this pattern see the puppetlabs-ntp module.

    # ntp/manifests/init.pp
    class ntp (
      $config    = $ntp::params::config,
      $keys_file = $ntp::params::keys_file
    ) inherits ntp::params {
      file{$config:
        ....
      }
    }

    # ntp/manifests/params.pp
    class ntp::params {
      case $::osfamily {
        'AIX': {
          $config    = "/etc/ntp.conf"
          $keys_file = '/etc/ntp.keys'
        }
        'Debian': {
          $config    = "/etc/ntp.conf"
          $keys_file = '/etc/ntp/keys'
        }
        'RedHat': {
          $config    = "/etc/ntp.conf"
          $keys_file = '/etc/ntp/keys'
        }
        default: {
          fail("The ${module_name} module is not supported on an ${::osfamily} based system.")
        }
      }
    }

This is the exact reason Hiera exists: to remove this kind of spaghetti code and move it into data. Instinctively now, whenever anyone sees code like this, they think they should refactor it and move the data into Hiera.
But there’s a problem. This works for your own modules in your own repos: you’d just use the Puppet 3 automatic parameter bindings and override the values in the ntp class. Not ideal, but many people do it. If, however, you wanted to write a module for the Forge, there’s a hitch: the module author has no idea what kind of hierarchy exists where the module is used, or whether the site uses Hiera at all, and today the module author can’t ship data with his module. So the only sensible thing to do is to embed a bunch of data in your code, the exact thing Hiera is supposed to avoid.
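As a rough illustration of the automatic parameter bindings mentioned above: Puppet 3 looks up each class parameter in Hiera under the key `<class>::<parameter>`. Assuming a site whose hierarchy includes a catch-all tier named `common` (the file name and the chosen values are assumptions for illustration only), overriding the ntp class from site data might look like this:

```yaml
# common.yaml in the site's Hiera data directory (file name is an assumption)
---
# Bound automatically to the $config parameter of class ntp
ntp::config: /etc/ntp.conf
# Bound automatically to the $keys_file parameter of class ntp
ntp::keys_file: /etc/ntp/keys
```

This only works because the site owner knows their own hierarchy, which is exactly the knowledge a Forge module author lacks.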
I proposed a solution to this problem that would allow module authors to embed data in their modules as well as control the hierarchy that would be used when accessing this data. Unfortunately, a year on we’re still nowhere, and the community and the Forge are suffering as a result.
The proposed solution is an always-on Hiera backend that, as a last resort, looks for data inside the module. Critically, the module author controls the hierarchy once the lookup reaches the data in the module. Consider the ntp::params class above: it is a code version of a Hiera hierarchy keyed on the $::osfamily fact. If we just allowed the module to supply data inside the module, the module author would have to hope that everyone has this tier in their hierarchy, which is not realistic. My proposal therefore adds a module-specific hierarchy and data that get consulted after the site hierarchy.
So let’s look at how to rework this module around this proposed solution:

    # ntp/manifests/init.pp
    # Parameter names must match the Hiera keys ntp::config and ntp::keys_file
    class ntp ($config, $keys_file) {
      validate_absolute_path($config)
      validate_absolute_path($keys_file)

      file{$config:
        ....
      }
    }

Next you configure Hiera to consult a hierarchy on the $::osfamily fact. Note the new data directory that goes inside the module:

    # ntp/data/hiera.yaml
    ---
    :hierarchy:
      - "%{::osfamily}"

And finally we create some data files, here’s just the one for RedHat:

    # ntp/data/RedHat.yaml
    ---
    ntp::config: /etc/ntp.conf
    ntp::keys_file: /etc/ntp/keys

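The other operating systems handled by the ntp::params class would follow the same shape. A sketch for Debian and AIX, with the values copied from the case statement shown earlier and the file names assumed to match the corresponding $::osfamily values:

```yaml
# ntp/data/Debian.yaml (file name assumed from the 'Debian' osfamily value)
---
ntp::config: /etc/ntp.conf
ntp::keys_file: /etc/ntp/keys

# ntp/data/AIX.yaml (file name assumed from the 'AIX' osfamily value)
---
ntp::config: /etc/ntp.conf
ntp::keys_file: /etc/ntp.keys
```

Each `---` starts a separate file; they are shown together here only for brevity.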
Users of the module could add a new OS without contributing back to the module or forking it, simply by providing similar data in the site-specific hierarchy, leaving the downloaded module 100% untouched!
This is a very simple view of what this pattern allows; time will tell what the community makes of it. There are many advantages to this over the ntp::params pattern:
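A minimal sketch of what that site-level data might look like, assuming the site hierarchy has a tier keyed on `%{::osfamily}` and that the site's data lives in a directory such as /etc/puppet/hieradata (both are assumptions). The OS and the paths are placeholders for illustration, not verified values:

```yaml
# /etc/puppet/hieradata/Solaris.yaml (hypothetical site data file; location assumed)
---
# Placeholder paths for an OS the downloaded module ships no data for
ntp::config: /etc/inet/ntp.conf
ntp::keys_file: /etc/inet/ntp.keys
```

Because the site hierarchy is consulted before the module's own data, these values would be found first and the module itself would not need to change.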
This helps the contributor to a public module:
- Adding a new OS is easy: just drop in a new YAML file. This can be done with confidence because the file is only read on machines running the new OS, so it cannot break existing code. No complex case statements or 100s of braces to get right
- On a busy module, when adding a new OS they do not have to worry about complex merge problems, working hard at rebasing or any git esoterica. They’re just adding a file.
- Syntactically it’s very easy, it’s just a YAML file. No complex case statements etc.
- The contributor does not have to worry about breaking other operating systems he could not test on, like AIX here. The change is contained to machines running the new OS
- In large environments this helps with change control as it’s just data, with no logic changes
This helps the maintainer of a module:
- Module maintenance is easier when it comes to adding new operating systems, as each one is a simple, single file
- Easier contribution reviews
- Fewer merge commits, less git magic needed, cleaner commit history
- The code is a lot easier to read and maintain. Fewer tests and validations are needed.
This helps the user of a module:
- Well written modules now properly support supplying all data from Hiera
- He has a single place to look for the overridable data
- When using a module that does not support his OS he can deploy it into his site and just provide data instead of forking it
Today I am releasing my proposed code as a standalone module. It provides all the advantages above including the fact that it’s always on without any additional configuration needed.
It works exactly as above by adding a data directory with a hiera.yaml inside it. The only configuration being considered in this hiera.yaml is the hierarchy.
This module is new and does some horrible things to get itself activated automatically without any configuration. I’ve only tested it on Puppet 3.2.x, but I think it will work in 3.x as is. I’d love to get feedback on this from users.
If you want to write a Forge module that uses this feature, simply add a dependency on the ripienaar/module_data module; as soon as someone installs this dependency along with your module, the backend gets activated. Similarly, if you just want to use this feature in your own modules, just run puppet module install ripienaar/module_data.
Note, though, that if you do, your module will only work on Puppet 3 or newer.
It’s unfortunate that my Pull Request is now over a year old, has not been merged, and no real progress is being made. I hope that if enough users adopt this solution we can force progress rather than sit by and watch nothing happen. Please send me your feedback and use this widely.
Thanks for this, RI. It has definitely been holding up my module development, and I will use this until we can get a reasonable solution in the core.
Sure beats my hack of a second yaml back end that looks in the modules.
– Chad
Sane and simple to understand for existing Hiera users and above all solves a common issue facing module developers. I hope this gets merged sometime soon.
Overall I think that I like this idea. I would have to test it more before I’m certain. I can think of one drawback, which perhaps you have a recommendation for:
Suppose you have a multiply nested hierarchy, of $::operatingsystem and $::operatingsystemrelease. You could have multiple directories like so:
but if on a Debian system the package name or config value doesn’t change across operatingsystemrelease, but on a Redhat system it does, you’ll end up needing different files for each RedHat system, but for the Debian system you’ll have to duplicate identical files.
This problem compounds itself when most operatingsystemrelease’s have the same config value, but when there is one exception.
The other way that this is solved uses the well known puppet:
Which is more expressive. It would be great if this sort of logic could be easily expressed in hiera somehow to keep the data out of the Puppet code. With the hiera solution, I’m not sure how you can avoid duplicating the same data everywhere. Maybe that’s not a huge issue.
Ideas?
Would it work to put just ::operatingsystem below your ::operatingsystemrelease tier? Then you can have Debian defaults with per release override possibilities.
so something like:
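A minimal sketch of such a hierarchy, using the data/hiera.yaml format shown in the post and assuming one directory per operating system with one file per release (the layout is an assumption):

```yaml
# <module>/data/hiera.yaml (sketch; directory-per-OS layout is assumed)
---
:hierarchy:
  # Consulted first: release-specific overrides, only where a value differs
  - "%{::operatingsystem}/%{::operatingsystemrelease}"
  # Consulted next: defaults shared by every release of the operating system
  - "%{::operatingsystem}"
```

With this ordering, a value missing from the release-specific file falls through to the per-OS file, so identical data does not need to be repeated per release.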
I think that’s what you are asking at least 🙂
Actually, yeah I think this would work! If it doesn’t find the value higher up, it will default to looking in the lower files… Right? I should test to make sure there aren’t any gotchas.
Great stuff! I’m not sure if most people appreciate the value of this.
Exactly, contrast that with the puppetlabs-ntp module that’s FULL of duplication. You can avoid all that here (though you might choose not to for readability or clarity when looking at the data files).
Hello R.I.
I have just tested your module and it works like a charm with apply and agent.
I will use this for my modules from now on. It should not be an issue to have a second dependency besides stdlib.
Many thanks again for making this work so smoothly without any configuration!
Zipkid.
I’m a systems guy by trade that knows enough code to get into trouble when needed. I still prefer the first example because its very readable to me and easy to comprehend and seems to fit my subjective view of what a “Desired state” would be as “systems as code”. within the limitations of the current manifests and module paradigms. While i understand the idea of code separation, it just seems its done so for separation sake (hey look, pretty code!) and not simplicity. Having 20 modules with 20 include paths just seems like more work than simply seeing the data in the params scope already.. but this is a concept that needs a better solution, so i agree with what you’re trying.
For me, the elegant solution would be to have these yamls not in the module path, but in a global path that gets implemented in such a way like a provider… Much like how puppet has built in providers for basic interfaces, i think yaml is a great way to abstract out the OS details so OS providers can provide their own conventions there but i believe puppet is much more stronger when modules behave as providers so you can not only implement your desired config, but you can describe it as well. tl;dr – change modules from modules to providers and keep the pp code simple for the system configs/definitions. (one could actually just use pure yaml to implement a provider if the providers can provide the scope for such)
I don’t know if i’m explaining myself as well as i’m trying to envision my answer 🙂 i’m not a pure developer by anymeans and may be mixing up terminologies in my post workout haze
There is now native support for this in Puppet 4 https://www.devco.net/archives/2016/01/08/native-puppet-4-data-in-modules.php
sweet; thanks for updating this article w/ 2016 relevance! cheers.