
NOTE: This ended up being a proof of concept for a more complete system called Hiera; please consider that instead.

Back in 2009 I wrote the first implementation of extlookup for Puppet; later on, after a much-needed rewrite, it got merged into mainstream Puppet. If you don’t know what extlookup does, please go and read that post first.

The hope at the time was that someone would turn it into something better than a hacky function that uses global variables for its config. I was exploring some ideas and showing how rich data would apply to the particular use case and language of Puppet, but sadly nothing has come of these hopes.

The complaints about extlookup fall into various categories:

  • CSV does not make a good data store
  • I have a personal hate for the global variable abuse in extlookup; I was hoping Puppet config items would become pluggable at some point, but alas.
  • Using functions does not let you introspect the data usage inside your modules for UI creation
  • Other complaints fall into the ‘Not Invented Here’ category and the ‘TL;DR’ category of people who simply didn’t bother to understand what extlookup does

The complaint that data handled through functions is not visible to external sources is valid. But Puppet has not yet made it easy for ENCs to introspect classes and their parameters, so this seems to me to come from people who don’t understand that extlookup is simply a data model, not a prescription for how to use the data. In a follow-up post I will show an extlookup-based ENC that supports parameterized classes and magical data resolution for those parameterized classes, using the exact same extlookup data store and precedence rules.

Not much can be done for the last group of people, but as @jordansissel said, “haters gonna hate, coders gonna code!”

I have now addressed the first complaint by making extlookup pluggable, so you can bring different backends.

First, of course, in bold defiance of the Ruby Way, I made it backward compatible with older versions of extlookup and gave it a 1:1 compatible CSV backend.

I addressed my global variable hate by adding a config file that might live in /etc/puppet/extlookup.yaml.
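To illustrate how a single `:parser:` setting in extlookup.yaml can select a backend, here is a minimal Ruby sketch. The `ExtlookupSketch` module, its `register` and `backend_for` methods, and the placeholder `CsvBackend` and `YamlBackend` classes are hypothetical names for this example, not the plugin's actual API. Only the `:parser:` key and the precedence list come from the configuration described in this post.

```ruby
require "yaml"

# Hypothetical registry mapping :parser: names to backend classes.
# These names are illustrative; they are not the plugin's real API.
module ExtlookupSketch
  BACKENDS = {}

  def self.register(name, klass)
    BACKENDS[name] = klass
  end

  # Keys written as :parser: load as Ruby Symbols, so Symbol must be
  # explicitly permitted when using safe_load.
  def self.load_config(text)
    YAML.safe_load(text, permitted_classes: [Symbol])
  end

  # Pick the backend named by :parser:, failing loudly on unknown names.
  def self.backend_for(config)
    name = config.fetch(:parser)
    BACKENDS.fetch(name) { raise ArgumentError, "unknown parser: #{name}" }
  end
end

# Placeholder backends; real ones would implement the actual lookups.
class CsvBackend; end
class YamlBackend; end

ExtlookupSketch.register("CSV", CsvBackend)
ExtlookupSketch.register("YAML", YamlBackend)

config = ExtlookupSketch.load_config(<<~'CONFIG')
  ---
  :parser: YAML
  :precedence:
  - environment_%{environment}
  - common
CONFIG

backend = ExtlookupSketch.backend_for(config)
puts "#{config[:parser]} -> #{backend}"
```

Keeping the backend choice in data rather than in global variables means adding a new store is a matter of registering one more class.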


I wrote this code yesterday afternoon, so you can already guess that there might be bugs and some redesigns ahead, and that it will most likely destroy your infrastructure. I will add unit tests and so on, so please keep an eye on it; it will mature for sure.

So far I have written backends for CSV, YAML and Puppet manifests. A JSON one will follow, and later perhaps ones that query Foreman and similar data stores.

The code lives at

Basic Configuration

Precedence is a setting that applies equally to all backends. The config file should live in the same directory as your puppet.conf and should be called extlookup.yaml; examples follow below.
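Because precedence is shared by every backend, its entries can be interpolated the same way regardless of parser. The following is a minimal Ruby sketch. It assumes the Puppet scope can be modelled as a plain Hash and that an undefined variable expands to an empty string. `expand_precedence` is a hypothetical helper name.

```ruby
# Hypothetical helper: expand %{name} placeholders in each precedence entry
# using variables from the current Puppet scope, modelled here as a Hash.
# Unknown variables become empty strings (an assumption for this sketch).
def expand_precedence(precedence, scope)
  precedence.map do |entry|
    entry.gsub(/%\{(\w+)\}/) { scope.fetch(Regexp.last_match(1), "").to_s }
  end
end

scope = { "environment" => "production", "country" => "uk" }
sources = expand_precedence(["environment_%{environment}", "country_%{country}", "common"], scope)
puts sources.inspect
```

Each backend then only has to turn the expanded names into whatever it reads, such as files in a data directory or Puppet classes.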

CSV Backend

The CSV backend is backward compatible: it will also respect your old-style global variables for configuration, but the other backends won’t. To configure it, simply put something like this in your config file:

:parser: CSV
:precedence:
- environment_%{environment}
- common
:csv:
  :datadir: /etc/puppet/extdata
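Here is a rough sketch of what a CSV backend might do with that configuration. It assumes three things: each data file is `<datadir>/<name>.csv`, the first matching row in the first file (in precedence order) wins, and a row with several values is returned as an array. These are assumptions about the extlookup CSV format rather than this plugin's code. `csv_lookup` is a hypothetical name.

```ruby
require "csv"
require "tmpdir"

# Hypothetical sketch of a CSV backend: walk the already-expanded
# precedence list, read <datadir>/<source>.csv, and return the value of the
# first row whose first column equals the key. Rows with several values
# come back as an Array (an assumption about the format).
def csv_lookup(key, datadir, sources)
  sources.each do |source|
    path = File.join(datadir, "#{source}.csv")
    next unless File.exist?(path)

    CSV.foreach(path) do |row|
      next unless row.first == key

      values = row.drop(1)
      return values.length == 1 ? values.first : values
    end
  end
  nil # not found in any source
end

# Demo with throwaway files standing in for /etc/puppet/extdata.
docroot, ntp, missing = Dir.mktmpdir do |datadir|
  File.write(File.join(datadir, "environment_production.csv"), "docroot,/srv/www\n")
  File.write(File.join(datadir, "common.csv"), "docroot,/var/www\nntpservers,ntp1,ntp2\n")

  sources = ["environment_production", "common"]
  [csv_lookup("docroot", datadir, sources),
   csv_lookup("ntpservers", datadir, sources),
   csv_lookup("nosuchkey", datadir, sources)]
end
puts docroot, ntp.inspect, missing.inspect
```

The environment-specific file overrides `docroot`, while `ntpservers` falls through to common.csv.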

YAML Backend

The most commonly proposed alternatives to extlookup seem to be YAML based. The various implementations out there, though, are pretty weak and seem to have lost interest before reaching feature parity with extlookup. With a pluggable backend it was easy enough for me to create a YAML data store that has all the extlookup features.

For simple string results I have kept the extlookup feature that interpolates variables like %{country} in the returned data from the current scope (something mainline Puppet extlookup actually broke recently in a botched commit). If you put hash or array data in the YAML files, though, I don’t touch the data.
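The rule that strings are interpolated but structured data is left alone can be sketched as follows. This assumes data sources are already-parsed YAML hashes searched in precedence order and that the scope is modelled as a Hash. `yaml_lookup`, `interpolate` and the sample keys are illustrative only.

```ruby
require "yaml"

# Hypothetical sketch: search parsed YAML sources in precedence order;
# when the answer is a String, interpolate %{var} from the scope,
# otherwise return Hash/Array data exactly as stored.
def interpolate(value, scope)
  value.gsub(/%\{(\w+)\}/) { scope.fetch(Regexp.last_match(1), "").to_s }
end

def yaml_lookup(key, sources, scope)
  sources.each do |data|
    next unless data.is_a?(Hash) && data.key?(key)

    answer = data[key]
    return answer.is_a?(String) ? interpolate(answer, scope) : answer
  end
  nil
end

# Illustrative data; the keys and values are made up for this sketch.
common = YAML.safe_load(<<~'DATA')
  docroot: /var/www/%{country}
  ntpservers:
  - ntp1.%{country}.example.net
  - ntp2.%{country}.example.net
DATA

scope = { "country" => "uk" }
docroot = yaml_lookup("docroot", [common], scope)
ntp = yaml_lookup("ntpservers", [common], scope)
puts docroot, ntp.inspect
```

Note that the placeholders inside the array survive untouched, which is the behavior described above for non-string data.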

Sample data:

country: uk
docroot: /var/www/

All of this data is accessible using the exact same extlookup function. Configuration of the YAML backend:

:parser: YAML
:precedence:
- environment_%{environment}
- common
:yaml:
  :datadir: /etc/puppet/extdata

Puppet Backend

Nigel Kersten has been working on a proposal for a new data format called the PDL. I had pretty high hopes for the initial targeted feature list, but it now seems to have been watered down to a minimal feature set: extlookup with a different name and backend.

I implemented the proposed data lookup in classes and modules as an extlookup backend and made it as full-featured as you’d expect from extlookup, with fully configurable lookup orders and custom overrides, just like we’ve had for years in the CSV version.

Personally I think if you’re going to spend hours creating data that describes your infrastructure you should:

  • Not stick it in a language that’s already proven to be bad at dealing with data
  • Not stick it in a place where nothing else can query the data
  • Not stick it in code that requires code audits for simple data changes – as most change control boards really just won’t see the difference.
  • Not artificially restrict what kind of data can go into the data store by prescribing an unmovable convention with no configuration.

When I show the extlookup-based ENC I am making, I will really show why putting data in the Puppet language turns it into a graveyard for useful information without actually making anything better.

You can configure this backend to behave exactly the way Nigel designed it using this config file:

:parser: Puppet
:precedence:
- "%{calling_class}"
- "%{calling_module}"
:puppet:
  :datasource: data

This will look up data in these classes:

  • data::$calling_class
  • data::$calling_module
  • $calling_class::data
  • $calling_module::data
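A sketch of how that search list could be assembled from the config is below. It assumes the `calling_class` and `calling_module` values are available in a Hash-modelled scope. `puppet_data_classes` is a hypothetical name, and removing duplicates is a choice made for this example rather than documented plugin behavior.

```ruby
# Hypothetical sketch of the class search order for the Puppet backend:
# each precedence entry is interpolated and prefixed with the configured
# namespace (:datasource:, "data" in the config above), then
# <calling_class>::data and <calling_module>::data are always appended.
def puppet_data_classes(precedence, datasource, scope)
  expand = ->(entry) { entry.gsub(/%\{(\w+)\}/) { scope.fetch(Regexp.last_match(1), "").to_s } }

  configured = precedence.map { |entry| "#{datasource}::#{expand.call(entry)}" }
  always = ["#{scope.fetch('calling_class')}::data", "#{scope.fetch('calling_module')}::data"]
  # Dropping duplicates (e.g. class ntp in module ntp) is a choice made
  # for this sketch, not documented plugin behavior.
  (configured + always).uniq
end

scope = { "calling_class" => "ntp::config", "calling_module" => "ntp" }
order = puppet_data_classes(["%{calling_class}", "%{calling_module}"], "data", scope)
puts order
```

For a class `ntp::config` in module `ntp`, this yields the four classes listed above, in the same order.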

Or you can do better and configure a proper precedence, replacing the first two entries above with ones for datacenter, country or whatever suits you. The last two will always be in the list. An alternative might be:

  • data::$customer
  • data::$environment
  • $calling_class::data
  • $calling_module::data
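To show that precedence in action, the following sketch models each data class as a Hash of variables and returns the value from the first class in the order that defines the key. The customer name `acme`, the class contents and `lookup_in_classes` are made up for illustration. A real backend would read variables from compiled Puppet classes instead.

```ruby
# Hypothetical sketch: resolve a key against the alternative order above.
# Puppet data classes are modelled as a Hash of class name => variables,
# and the first class in the order that defines the key wins.
def lookup_in_classes(key, order, classes)
  order.each do |klass|
    vars = classes.fetch(klass, {})
    return vars[key] if vars.key?(key)
  end
  nil
end

# Made-up values: customer "acme", environment "production", class ntp::config.
order = ["data::acme", "data::production", "ntp::config::data", "ntp::data"]
classes = {
  "data::production" => { "servers" => ["ntp1.example.net"] },
  "ntp::data"        => { "servers" => ["pool.ntp.org"], "driftfile" => "/var/lib/ntp/drift" }
}

servers = lookup_in_classes("servers", order, classes)
driftfile = lookup_in_classes("driftfile", order, classes)
puts servers.inspect, driftfile
```

The environment-level class overrides the module default for `servers`, while `driftfile` still comes from the module's own data class.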

You configure this behavior simply through the extlookup precedence setting. Pretty nice for those of you feeling nostalgic for config.php files, as hated by sysadmins everywhere.

And as you can see you can also configure the initial namespace – data – in the config file.