
Native Puppet 4 Data in Modules

Back in August 2012 I requested an enhancement to the general data landscape of Puppet: a natural progression of Hiera’s design that would enable it to be used in modules shared outside of your own environments. I called this Data in Modules. There was lots of community interest in this but not much movement, so eventually I built a working POC that I released in December 2013.

The basic idea behind the feature is that we want to be able to use Hiera to model the internal data found in modules as well as site specific data, and to have these two sets of data coexist and complement each other. Full details of this can be found in my post titled Better Puppet Modules Using Hiera Data and some more background can be found in The problem with params.pp. These posts are a bit old now and some things have moved on, but they’re good background reading.

It’s taken a while, but as part of the Puppet 4 rework the data ingestion mechanisms have also been rewritten, and finally, in Puppet 4.3.0, native data in modules has arrived. The original Jira for this is 4474. It’s really pretty close to what I had in mind in my proposals and my POC, and I am really happy with it. Along the way a new function called lookup() has been introduced to replace the old collection of hiera(), hiera_array() and hiera_hash().

The official docs for this feature can be found at the Puppet Labs Docs site. Here I’ll more or less just take my previous NTP example and show how you could use the new Data in Modules to simplify it as per the above mentioned posts.

This is the very basic Puppet class we’ll be working with here:

class ntp (
  String $config,
  String $keys_file
) {
 ...
}
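For contrast, the old params.pp pattern this replaces would have looked something like the following hypothetical sketch, with each parameter defaulting to a value from ntp::params:

class ntp (
  $config    = $ntp::params::config,
  $keys_file = $ntp::params::keys_file
) inherits ntp::params {
 ...
}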

Now it’s just a simple class with no params.pp indirection. At this point it won’t yet use any data in the module; to do that you have to activate it in the metadata.json:

# ntp/metadata.json
{
  ...
  "data_provider": "hiera"
}

At this point Puppet knows you want to use the Hiera data in the module. But key to the feature – and really the whole reason it exists – is that a module needs to be able to specify its own hierarchy. Imagine you want to set $keys_file here: you’ll have to be sure the hierarchy in question includes the OS family, and you must have control over that data. In the past, with the hierarchy controlled completely by the site hiera.yaml, this was not possible at all, and the outcome was that if you wanted to share a module outside of your environment you had to go the params.pp route as that was the only portable solution.

So now your modules can have their own hiera.yaml. It’s slightly different from before but should be familiar to past Hiera users. It goes in your module, so this would be ntp/hiera.yaml:

---
version: 4
datadir: data
hierarchy:
  - name: "OS family"
    backend: yaml
    path: "os/%{facts.os.family}"

  - name: "common"
    backend: yaml

This is the new format for the Hiera configuration. It’s more flexible, and a future version of Hiera will bring some changed semantics that improve nicely on the original design I came up with, so you have to use this new format here.

Here you can see the module has its own OS family tier as well as a common tier. Let’s see the ntp/data/common.yaml:

---
ntp::config: "/etc/ntp.conf"
ntp::keys_file: "/etc/ntp.keys"

These are sane defaults to use for any operating system that isn’t specifically supported.

Below are examples for AIX and Debian:

# data/os/AIX.yaml
---
ntp::config: "/etc/ntpd.conf"
# data/os/Debian.yaml
---
ntp::keys_file: "/etc/ntp/keys"

At this point the need for params.pp is gone – at least in this simplistic example – and this data cohabits really nicely with your environment specific or site specific data. If you specify any of these data items in your site Hiera data, the site data will override the module’s. The advantages of this might not be immediately obvious; I have a very long list of advantages over params.pp in my Better Puppet Modules Using Hiera Data post, be sure to read that for background.
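As a quick illustration of that precedence, here’s a minimal sketch (the site-level data shown in the comment is hypothetical):

# if your site Hiera data also defines the key, e.g. in your site common.yaml:
#   ntp::config: "/etc/ntp/site.conf"
# then a lookup resolves to the site value rather than the module default:
notice(lookup("ntp::config"))  # => "/etc/ntp/site.conf", not "/etc/ntp.conf"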

There’s an alternative approach where you write a Puppet function that returns a hash of data and the data system fetches the keys from there. This is really powerful and might end up being an interesting solution to something along the lines of a module specific custom Hiera backend – but a lighter weight version of that; a rough sketch follows. I might write that up later, this post is already a bit long.
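Assuming the documented function data provider – where metadata.json sets "data_provider": "function" and the module supplies a function named <module>::data returning a hash – it would look something like this:

# ntp/functions/data.pp (sketch; same keys as the YAML data above)
function ntp::data() {
  {
    "ntp::config"    => "/etc/ntp.conf",
    "ntp::keys_file" => "/etc/ntp.keys",
  }
}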

The remaining problem is data that needs to be merged: traditionally Hiera and Puppet have no idea you want a merge to happen when you do a basic lookup – hence the annoying hiera_hash() functions and friends. There’s a solution for this too, and I’ll post about it next week once the next Puppet 4 release is out, as a bug I found that makes it unusable is fixed in that version.

This feature is a great addition to Puppet and I am really glad to finally see it land. My hacky data in modules code was used quite extensively, with 72,000 downloads from the forge, but I was never really happy with it and was desperate to see this land natively. This is a big step forward and I hope it sees wide adoption in the community.

A note about the old ripienaar-module_data module


As seen above the new built in feature is great and a very close match to what I had envisioned when creating the proof of concept module.

It would not be a good idea to support both of these methods on Puppet 4, and it turns out to be quite difficult anyway because both use a hiera.yaml file in the module, with small differences in format. So the transition period will no doubt be a bit painful, especially for those attempting to use this while supporting both Puppet 3 and 4 users.

Further the old module actually broke the Puppet 4 feature for a while in a way that was really difficult to debug. Puppet Labs kindly reached out and notified me of this and helped me fix it in MODULES-3102. So there is now a new release of the old module that works again on Puppet 4 BUT it warns very loudly that this is a bad idea.

The old module is now deprecated and unsupported. You should stop using it, and imho stop using Puppet 3, but whatever you do stop using it on Puppet 4. I wish metadata.json supported a Puppet version requirement so I could force this, but alas it doesn’t, so I can’t.

After a few months I will make a release that raises an error on Puppet 4 and refuses to work there. You should move forward and adopt the excellent native implementation of this feature.

Iterating in Puppet

Iteration in Puppet has been a long standing pain point; Puppet 4 addresses this by adding blocks, loops and more. Here I capture the various approaches to working with some complex data in Puppet before and after Puppet 4.

To demonstrate this I’ll take some data from a previous blog post and see how to deal with it. Here’s the data that will be in $domains in the examples below:

{
    "x.net": {
      "nexthop": "70.x.x.x",
      "spamdestination": "rip@devco.net",
      "spamthreshold": 1500,
      "enable_antispam": 1
    },
    "x.co.uk": {
      "nexthop": "70.x.x.x",
      "spamdestination": "rip@devco.net",
      "spamthreshold": 1500,
      "enable_antispam": 1
    }
}

First we’re going to need a defined type that can create an individual domain. We’ll call it mail::domain, but I won’t show the code here as it’s not really important.

Puppet 3 + stdlib

The first approach I’ll show is your basic Puppet 3 approach. The basic idea here is to get a list of domain names and use the array iteration Puppet has always had over resource names.

The trick here is to get the domain names using the keys() function and then pass all the data into every instance of the define – each instance fetches its own data from the hash passed into the define.

$domain_names = keys($domains)
 
mail::domains{$domain_names:
  domains => $domains
}
 
define mail::domains($domains) {
  $domain = $domains[$name]
 
  mail::domain{$name:
    nexthop => $domain["nexthop"]
    .
    .
  }
}

Puppet 3 + create_resources

A hacky riff on eval() was added during the Puppet 3 series to make it a bit easier to deal with data from Hiera or similar. It takes some data in a standard format and creates instances of a defined type:

create_resources("mail::domain", $domains, {"spamthreshold" => 1500, "enable_antispam" => 1})

This replaces all the code above and adds some default handling for cases where the data is not uniform. Some people love it, some hate it; I think it’s a bit too magical and prefer to avoid it.

Puppet 4 – each loop

This is the approach you’d probably want to use in Puppet 4; it uses a simple each loop over the data:

$domains.each |$name, $domain| { 
  mail::domain{$name:
    nexthop => $domain["nexthop"]
    .
    .
  }
}

It’s quite readable and obvious what’s happening here. It’s more typing than the create_resources example, but I think this is the preferred way due to its clarity.

Below this we get into the more academic solutions to the problem, mainly showing off some Puppet 4 features.

Puppet 4 – wildcard shortcut

If listing every key as above is tedious, and you know your hashes map 1:1 onto the defined type’s parameters, you can short-circuit things a bit. This is quite close to the create_resources convenience:

each($domains) |$name, $domain| { 
  mail::domain{$name:
    * => $domain
  }
}

The splat operator takes all the data in the hash and maps it straight onto parameters of the defined type – quite handy.

Puppet 4 – wildcard and defaults

Your data might not all be complete, so you’d want some defaults merged in. This is something create_resources also supports; here’s how you’d do it without create_resources:

$defaults = {
  "spamthreshold" => 1500,
  "enable_antispam" => 1
}
 
$domains.each |$name, $domain| { 
  mail::domain{$name:
    * => $defaults + $domain  # + merges the hashes
  }
}

Puppet 4 – wildcard and resource defaults

An alternative to the above that’s a bit more verbose but might be more readable can be seen below:

$defaults = {
  "spamthreshold" => 1500,
  "enable_antispam" => 1
}
 
$domains.each |$name, $domain| { 
  mail::domain{
    default:
      * => $defaults;
 
    $name:
      * => $domain
  }
}

Puppet 4 – Native DSL create_resources()


Puppet 4 supports functions written in the native DSL, which means you can take the above and generalise it a bit, ending up with a reimplementation of create_resources. I’m not sure I’d recommend this, but it does show some related techniques:

function my::create_resources (
  String $type,
  Hash $instances,
  Hash $defaults = {}
) {
  $instances.each |$r_name, $r_properties| {
    Resource[$type] {$r_name:
      * => $defaults + $r_properties
    }
  }
}

The magic here is Resource[$type], which lets you reference a resource type programmatically. It also works for classes.

So this is, as close as I can tell, an equivalent of create_resources.
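Calling it would then look much like the real thing; a quick sketch using the $domains data from earlier:

my::create_resources("mail::domain", $domains, {"spamthreshold" => 1500, "enable_antispam" => 1})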

Conclusion

That’s about it. There are many more iteration tricks in Puppet 4, but this shows how to achieve what you previously did with create_resources, along with a couple of possible approaches to solving that problem.

Not sure which I’d recommend, but I suspect the choice comes down to personal style and situation.

Using ruby mocha outside of unit testing frameworks

I find myself currently writing a lot of orchestration code that manages hardware. This is very difficult because I like doing little test.rb scripts, or testing things out in irb or pry, to see if APIs are comfortable to use.

The problem with hardware is that in order to properly populate my objects I need to query things like the iDRACs or gather inventories from all my switches to figure out where a piece of hardware is, and this takes a lot of time and requires constant access to my entire lab.

Of course my code has unit tests, so all the objects that represent servers and switches etc. are already designed to be somewhat comfortable to load stub data into and easy to mock. So I ended up using rspec as my test.rb environment of choice.

I figured there had to be a way to use mocha in a non-rspec environment, and it turns out there is and it’s quite easy.

The magic here is line 1 and line 5: including Mocha::API will extend Object and Class with all the stubbing and mocking methods. I’d avoid using expectations and instead use stubs in this scenario.

At this point I’d be dropped into a pry shell loaded up with the service fixture in my working directory where I can quickly interact with my faked up hardware environment. I have many such hardware captures for each combination of hardware I support – and the same data is used in my unit tests too.

Below I iterate over all the servers, find the MAC addresses of the primary interfaces in each partition and then find all the switches they are connected to. Behind the scenes, in real life, this would walk all my switches looking for the port each MAC is connected to and so forth – quite a time consuming operation that would require me to dedicate the lab hardware to myself. Now I can just snapshot the hardware, load up my models later, and it’s really quick.

I found this incredibly handy and will be using it pretty much all the time now, so thought it worth sharing 🙂

Translating Webhooks with AWS API Gateway and Lambda

Webhooks are great and many services now support them, but I’ve found actually doing anything with them to be a pain: there are no standards for what goes in them, and any 3rd party service you wish to integrate with has to support the particular hooks you are producing.

For instance, I want to use SignalFX for my metrics and events but they have very few integrations. A translator could take an incoming hook, turn it into a SignalFX event and pass it onward.

For a long time I’ve wanted to build such a translator but never got around to it because I did not feel like self-hosting it and writing a whole bunch of supporting infrastructure. With the release of AWS API Gateway this has become quite easy and really convenient, as there are no infrastructure or instances to manage.

I’ll walk through how I built a translator that sends events to SignalFX. Note I do not do any kind of queueing or retrying on the gateway at present, so it’s lossy and best effort.

AWS Lambda runs stateless functions on demand. At launch it only supported ingesting AWS’s own events, but the recently launched API Gateway lets you front it with a REST API of your own design, which makes this a lot easier.

For the rest of this post I assume you’re over the basic hurdles of signing up for AWS and are already familiar with the basics, so some stuff will be skipped but it’s not really that complex to get going.

The Code


To get going you need some JS code to handle the translation, here’s a naive method to convert a GitHub push notification into a SignalFX event:

This will be the meat of the processing, and it includes a bit of code to create a request using the https module, including the SignalFX authentication header.

Note this attaches dimensions to the event being sent – you can think of them as key=value tags for the event. In the SignalFX UI I can select events like this:

Any other added dimension can be used too. Events show up as little diamonds on graphs, so if I am graphing a service using these dimensions I can pick out events that relate to the branches and repositories that influence the data.

This is called as below:

There’s some stuff not shown here for brevity; it’s all in GitHub. The entry point here is handleGitHubPushNotifications, which is the Lambda function that will be run. I can put many different ones in here alongside the previous code and share the same zip file across many functions; all I have to do is tell Lambda to run handleGitHubPushNotifications or handleOpsGeniePushNotifications etc., so this is a library of functions. See the next section for how.

Setting up the Lambda functions

We have to create a Lambda function. For now I’ll use the console, but you can use Terraform for this and it helps quite a lot.

As this repo is made up of a few files, your only option is to zip it up. You’ll have to clone it and make your own config.js based on the sample prior to creating the zip file.

Once you have that, create a Lambda function – I’ll call mine gitHubToSFX – and choose your zip file as the source. While setting it up you have to supply a handler; this is how Lambda finds your function to call.

In my case I specify index.handleGitHubPushNotifications, which uses the handleGitHubPushNotifications function found in index.js.

It ends up looking like this:

Once created you can test it right there if you have a sample GitHub commit message.

The REST End Point

Now we need to create somewhere for GitHub to send the POST request to. Gateway works with resources and methods. A resource is something like /github-hook and a method is POST.

I’ve created the resource and method, and told it to call the Lambda function here:

You have to deploy your API – just hit the big Deploy API button and follow the steps. You can create stages like development, staging and production and deploy APIs through such a life cycle. I just went straight to prod.

Once deployed it gives you a URL like https://12344xnb.execute-api.eu-west-1.amazonaws.com/prod, and your GitHub hook would be configured to hit https://12344xnb.execute-api.eu-west-1.amazonaws.com/prod/github-hook.

Conclusion


That’s about it, once you’ve configured GitHub you’ll start seeing events flow through.

Both Lambda and API Gateway can write logs to CloudWatch, and from the JS side you can do something like console.log("hello") and it will show up in the CloudWatch logs to help with debugging.

I hope to start gathering a lot of translations like these. I am still learning Node, so I’m not really sure yet how to make packages or classes, but so far this seems really easy to use.

Cost wise it’s really cheap: you’d pay $3.50 per million API calls received on the Gateway and $0.09/GB for transfer costs, but given the nature of these events this will be negligible. Lambda is free for the first 1 million requests and you’ll pay some tiny amount for the time used. Both are eligible for the free tier too, in case you’re new to AWS.

There are many advantages to this approach:

  • It’s very cheap as there are no instances to run, just the requests
  • Adding webhooks to many services is a clickfest hell. This gives me an API whose underlying logic I can change without updating GitHub etc.
  • Today I use SignalFX, but its event feature is pretty limited; I can move all the events elsewhere on the backend without any API changes
  • I can use my own domain and SSL certs
  • As the REST API is pretty trivial I can later move it in-house if I need, again without changing any 3rd parties – assuming I set up my own domain

I have 2 outstanding issues to address:

  • How to secure it; API Gateway supports headers as tokens, but this is not something webhooks tend to support
  • Monitoring it; I do not want some webhook sender to get stuck in a loop and send 100s of thousands of requests without it being noticed

Shiny new things in Puppet 4

Puppet 4 has been out a while, but given the nature of the update – new packaging requiring new modules to manage it, etc. – I’ve been reluctant to upgrade and did not really have the time. Ditto for CentOS 7. But Docker will stop supporting CentOS 6 Soon Now, so this meant I had to look at both a bit closer.

Puppet 4 really is a whole new thing. It maintains backward compatibility, but in terms of actually using its features I think you’d be better off just starting fresh. I am moving the bulk of my services out of CM anyway, so my code base will be tiny and it’s not a big deal for me to throw it all out and start over.

I came across a few really interesting new things among its many features and wanted to highlight a few of them. This is by no means an exhaustive list, it’s just a whirlwind tour of a few things I picked up on.

The Forge

Not really a Puppet 4 thing per se, more a general ecosystem comment: I have 23 modules in my freshly minted Puppet repo, with 13 of them coming from the forge. To my mind that is a really impressive figure; it makes the new starter experience so much better.

Things I still do on my own: exim, iptables, motd, pki, roles/profiles of course and users.

In the case of exim I have almost no config; it’s just a package/config/service and all it does is set up a local config that talks to my smart relays. It does use my own CA though, and that’s why I also have my own PKI module to configure the CA and distribute certs and keys and such. The big one is iptables really, and I just haven’t had the time to consider a replacement – whatever I choose needs to play well with Docker, and that’s probably going to be a tall order.

Anyway, big kudos to the forge team, and shout outs to forge users puppetlabs, jfryman, saz and garethr.

There are still some things to fix – the puppet module tool is pretty grim wrt errors and feedback, and I think there’s work left to do on the discoverability of good modules and on finding ways to encourage people to invest time in making better ones – but this is a big change from 2 years ago for sure.

Puppet 4 Type System


Puppet 4 has a data type system. It’s kind of optional, which is weird as these things go, but you can almost think of it as a built-in way to do validate_hash and friends. Almost. The implications of having it, though, are huge – it means down the line there will be a lot fewer edge cases with things just behaving weirdly.

Data used to go from Hiera to manifests and end up as strings even when it was Boolean; now Puppet knows about actual Booleans and does not mess them up. Things will become pretty consistent and solid, and it will be easy to write well behaved code.

For now though it’s the opposite: there are many more edge cases as a result of it.

In particular, functions that previously took a number and did something with it might have assumed the number was a string with a number in it. Now they get an actual number, and this causes breakage. There are a few of these in stdlib but they are getting fixed – expect this to catch out many templates and functions, so there will be a settling-in period, but it’s well worth the effort.

Here’s an example:

define users::user(
  ...
  Enum["present", "absent"] $ensure       = "present",
  Optional[String] $ssh_authorized_file   = undef,
  Optional[String] $email                 = undef,
  Optional[Integer] $uid                  = undef,
  Optional[Integer] $gid                  = undef,
  Variant[Boolean, String] $sudoer        = false,
  Boolean $setup_shell                    = false,
  Boolean $setup_rbenv                    = false
) {
...
}

If I passed ensure => bob to this I get:

Error: Expected parameter 'ensure' of 'Users::User[rip]' to have type Enum['present', 'absent'], got String

Pretty handy, though the errors could improve a lot – something I know is on the roadmap already.

You can get pretty complex with this, like describing the entire contents of a hash; Puppet will then ensure any hash you receive matches it. Doing this would have been really hard even with all the stuff in the old stdlib:

Struct[{mode            => Enum[read, write, update],
        path            => Optional[String[1]],
        NotUndef[owner] => Optional[String[1]]}]

I suggest you spend a good amount of time with the docs About Values and Data Types, Data Types: Data Type Syntax and Abstract Data Types. There are many interesting types like ones that do Pattern matching etc.

Case statements and selectors have also become type aware, as have normal expressions that test equality etc.:

$enable_real = $enable ? {
  Boolean => $enable,
  String  => str2bool($enable),
  Numeric => num2bool($enable),
  default => fail('Illegal value for $enable parameter'),
}
 
if 5 =~ Integer[1,10] {
  notice("it's a number between 1 and 10")
}

It’s not all wonderful though; I think the syntax choices are pretty poor. I scan parameter lists: a) to discover module features, b) to remind myself of the names, c) to find things to edit. With the type preceding the variable name, every single use case I have for reading module code has become worse, and I fear I’ll have to resort to lots of indentation to make the variable names stand out from the type definitions. I cannot think of a single case where I will want to know the variable’s data type before knowing its name. So from a readability perspective this is not great at all.

Additionally I cannot see myself using a Struct like the above in an argument list – to which Henrik says they are looking to add a typedef to the language so you can give complex Structs a more convenient name and use that. This will help a lot. Something like this:

type MyData = Struct[{ .... }]
 
define foo(MyData $bar) {
}

That’ll be handy, and Henrik says this is high on the priority list; it’s pretty essential from a usability perspective.

UPDATE: As of 4.4.0 this has been delivered, see Puppet 4 Type Aliases

Native data merges

You can merge arrays and hashes easily:

$ puppet apply -e '$a={"a" => "b"}; $b={"c" => "d"}; notice($a+$b)'
Notice: Scope(Class[main]): {a => b, c => d}
 
$ puppet apply -e 'notice([1,2,3] + [4,5,6])'
Notice: Scope(Class[main]): [1, 2, 3, 4, 5, 6]
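There’s also a << operator for appending a single element to an array; a quick sketch in the same one-liner style as above:

$ puppet apply -e 'notice([1,2,3] << 4)'
Notice: Scope(Class[main]): [1, 2, 3, 4]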

And yes you can now use a ; instead of awkwardly making new lines all the time for quick one-liner tests like this.

Resource Defaults

There’s a new way to do resource defaults. I know this is a widely loathed syntax but I quite like it:

file {
  default:
    mode   => '0600',
    owner  => 'root',
    group  => 'root',
    ensure => file,
  '/etc/ssh_host_key':
  ;
  '/etc/ssh_host_dsa_key.pub':
    mode => '0644',
}

The specific mode on /etc/ssh_host_dsa_key.pub will override the defaults, which is pretty handy. It also addresses a problem with old style defaults, which would leak all over the scope and make a mess of things; this is confined to just these files.

Accessing resource parameter values

This is something people often ask for. It seems exciting, but I don’t think it will be of much practical use because it’s order dependent, just like defined().

notify{"hello": message => "world"}
 
$message = Notify["hello"]["message"]  # would be 'world'

So this fetches another resource’s parameter value.

You can also fetch class parameters this way, as in the sketch below, though this seems redundant. There are several ordering caveats, so test your code carefully.
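A minimal sketch of the class case (using a hypothetical ntp class, and only valid once the class has been evaluated):

include ntp
$config = Class["ntp"]["config"]  # fetches the ntp class's $config parameter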

Loops

This doesn’t really need comment, perhaps only OMFG FINALLY is needed.

["puppet", "facter"].each |$file| {
  file{"/usr/bin/${file}":
    ensure => "link",
    target => "/opt/puppetlabs/bin/${file}"
  }
}

More complex functions like map, filter and reduce exist too; here reduce builds up a new hash:

$domains = ["foo.com", "bar.com"]
$domain_definition = $domains.reduce({}) |$memo, $domain| {
  $memo + {$domain => {"relay" => "mx.${domain}"}}
}

This yields a new hash made up of all the parts:

{
 "foo.com" => {"relay" => "mx.foo.com"},
 "bar.com" => {"relay" => "mx.bar.com"}
}
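map works similarly, returning a transformed copy of the data; a minimal sketch using stdlib's upcase():

$upcased = ["foo.com", "bar.com"].map |$domain| {
  upcase($domain)
}
# $upcased is now ["FOO.COM", "BAR.COM"]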

See Iterating in Puppet for more details on this.

. syntax

If you’re from Ruby this might be a bit more bearable: you can use any function in either style, interchangeably it seems:

$x = join(["a", "b"], ",")
 
$y = ["a", "b"].join(",")

Both result in a,b.

Default Ordering

By default it now does manifest ordering. This is a big deal: I’ve had to write no ordering code at all. None. Not a single require or ordering arrow. It just does things top down by default, while parameters like notify and explicit requires still influence it. Such an amazingly massive time saver. It’s good times when things that were always obviously dumb ideas go away.

It’s clever enough to also do things in the order they are included. So if you had:

class myapp {
  include myapp::install
  include myapp::config
  include myapp::service
}

Ordering will magically be right. Containment is still an issue though.

Facts hash

Ever since the first contributor summit I’ve been campaigning for $facts["foo"]. It went all around, with people wanting to invent some new hash-like construct and worse, but finally we now have a facts hash enabled by default.

Unfortunately we are still stuck with $settings::vardir, but hopefully some new hash will be created for that too.

It’s a reserved word everywhere, so you can safely just do $facts["location"] and not even have to worry about $::facts, though you might still do that in the interest of consistency.

Facter 3

Facter 3 is really fast:

$ time facter
facter  0.08s user 0.03s system 44% cpu 0.248 total

This makes everything better. It also returns structured data, but this is still a bit awkward in Puppet:

$x = $facts["foo"]["bar"]["baz"]

There seems to be no elegant way to handle a missing ‘foo’ or ‘bar’ key; things just fail badly, in ways you can’t catch or recover from. On the CLI you can do facter foo.bar.baz, so we’re already careful not to have "." in a key. We need some function to extract data from hashes, like:

$x = $facts.fetch("foo.bar.baz", "default")

It’ll make it a lot easier to deal with.
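Until then, the defensive version is wordy; a minimal sketch (Puppet's and short-circuits, so each level is only indexed if the one above exists):

# guard each level by hand; still yields undef if only the final key is missing
if $facts["foo"] and $facts["foo"]["bar"] {
  $x = $facts["foo"]["bar"]["baz"]
} else {
  $x = "default"
}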

Hiera 3

Hiera 3 is out, and at first I thought it didn’t handle hashes well, but it does:

:hierarchy:
  - "%{facts.fqdn}"
  - "location_%{facts.location}"
  - "country_%{facts.country}"
  - common

That’s how you’d fetch values out of hashes, and it’s pretty great. Notice I didn’t do ::facts – that’s because facts is reserved, so there will be no scope layering issues.

Much better parser

You can use functions almost everywhere:

$ puppet apply -e 'notify{hiera("rsyslog::client::server"): }'
Notice: loghost.example.net

There is an immeasurable number of small improvements over things the old parser did badly. Now it’s really nice to use; things just work the way I expect them to from other languages.

Even horrible stuff like this works:

$x = hiera_hash("something")["foo"]

Which previously needed an intermediate variable.
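On the old parser, that meant something like:

$something = hiera_hash("something")
$x = $something["foo"]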

puppet apply --test

A small thing, but --test in apply now works like it does in agent – colors, verbose output etc. Really handy.

Data in Modules

I did a PoC to enable Hiera in modules a few years ago and many people loved the idea. This has finally landed in recent Puppet 4 versions and it’s pretty awesome. It lets you have a data directory and a hiera.yaml in your module, which goes some way towards removing what is currently done with params.pp.

I wrote a blog post about it: Native Puppet 4 Data in Modules. An additional post that covers this is params.pp in Puppet 4, which shows how it ties together with some other new things.

Native create_resources

create_resources is a hack that exists because it was easier to hack up than to fix Puppet. Puppet has now been fixed, so below is the new create_resources:

each($domains) |$name, $domain| { 
  mail::domain{$name:
    * => $domain
  }
}

See Iterating in Puppet for extensive examples.

No more hiera_hash() and hiera_array()

There’s a new function called lookup() that’s pretty powerful. When combined with the new Data in Modules feature you can replace these functions AND have your automatic parameter lookups do merges.

See Puppet 4 data lookup strategies for an extensive look at these.

Native language functions

You can now write namespaced functions using the Puppet DSL:

function site::fetch(
  Hash $data,
  String $key,
  $default
) {
  if $data[$key] {
    $data[$key]
  } else {
    $default
  }
}

And you’ll use this like any function really:

$h = {}
 
$item = site::fetch($h, "thing", "default") # $item is now 'default'

It also has a splat argument handler:

function site::thing(*$args) {
  $args
}
 
$list = site::thing("one", "two", "three") # $list becomes ["one", "two", "three"]

AIO Packaging

Of course by now almost everyone knows we’re getting omnibus style packaging. I am a big supporter of this direction; the new bundled Ruby is fast and easy to get onto older machines.

The execution of this is unspeakably bad though. It’s half baked and leaves so much to be desired.

Here’s a snippet from the current concat module:

    if defined('$is_pe') and str2bool("${::is_pe}") { # lint:ignore:only_variable_string
      if $::kernel == 'windows' {
        $command_path = "${::env_windows_installdir}/bin:${::path}"
      } else {
        $command_path = "/opt/puppetlabs/puppet/bin:/opt/puppet/bin:${::path}"
      }
    } elsif $::kernel == 'windows' {
      $command_path = $::path
    } else {
      $command_path = "/opt/puppetlabs/puppet/bin:${::path}"
    }
 
    exec{"...":
       path => $command_path
    }

There are no words. Without this abomination it would try to use the system Ruby to run the #!/usr/bin/env ruby script. Seriously, if something ships that causes this kind of code to be written by users, you’ve failed. Completely.

Things like the OS not being properly set up with symlinks into /usr/bin – I can kind of understand that as a way to avoid conflicts with an existing Puppet, except the RPM already conflicts with the puppet package, so it’s not that. It just makes the whole thing feel unpolished, as if it comes without batteries included.

The file system choices are completely arbitrary:

# puppet apply --configprint vardir
/opt/puppetlabs/puppet/cache

This is intuitive to exactly no-one who has ever used any Unix or Windows machine, or any computer.

Again, I totally support the AIO direction, but the UX of this is so poor that, while I’ve been really positive about Puppet 4 up to now, I’d say this makes the entire thing Alpha quality. The team absolutely must go back to the drawing board and consider how this is done from the perspective of usability by people who have likely used Unix before.

Users have decades of experience to build on, and the system as a whole needs to be coherent and complement that experience – it should be a natural and comfortable fit. This and many other layout choices just do not make sense. Sure, the location is arbitrary; it makes no technical difference whether it’s /opt/puppetlabs/puppet/cache or some other directory.

It DOES, though, make a massive cognitive difference to users who think of the option vardir, draw on their entire career’s experience of what that means, and then cannot for the life of them find where these files go without investing effort in discovering it and then remembering it as the odd one out. Even knowing things are under $prefix you still can’t find this dir, because it’s been arbitrarily renamed to cache, and instead of using well known tools like find I now have to completely context switch.

Not only is this a senseless choice, but frankly it’s insulting that this software seems to think it’s so special that I have to remember its paths differently from the 100s of other programs out there. It’s not; it’s just terrible and makes it a nightmare to use. Sure, put the stuff in /opt/puppetlabs, but don’t just make things up and deviate from what we’ve learned over years of supporting Puppet. It’s an insult.

Your users have invested countless hours in learning the software, countless hours in supporting others, and in some cases paid for this knowledge. Arbitrarily changing vardir to mean cache trivialises that investment and puts an unneeded toll on those of us who support others in the community.

Conclusion

There’s a whole lot more to show about Puppet 4; I’ve only been at it for a few nights after work, but overall I am super impressed by the work done on Puppet core. The packaging lets the effort down, though, and I’d be wary of recommending anyone go to Puppet 4 as a result. It’s a curiosity to investigate in your spare time while, hopefully, the packaging improves to the level of a production usable system.

Collecting links to free services for developers

A while ago Brandon Burton tweeted the following:

And I said someone should make a list. I looked around, could not find one, and so decided to make one myself.

I am gathering links in a flat Markdown file at the moment, but once I get an idea of the categories and kinds of links I’ll look at setting up a site with better UX than one big README. If you have design chops for a site like this and want to help, get in touch.

I’ve already gathered quite a few and had some good links sent as PRs. If you have any links, or if you work for a company you think might fit the bill, please send them my way.

I am looking for links to services that are free, especially for Open Source developers. Beyond that, services that provide a developer account with some free resources also qualify – like a monitoring service that allows 5 free devices. I’d probably favour links useful to infrastructure coders rather than, say, mobile app developers, as there are huge lists of those around.

I am not after services in Private Beta or Free During Beta or the like; the only ones of those I’d accept are ones that specifically state beta accounts will become free dev accounts or similar in the future.