Puppet Query Language


For a few releases now PuppetDB had a new query language called Puppet Query Language or PQL for short. It’s quite interesting, I thought a quick post might make a few more people aware of it.


To use it you need a recent PuppetDB and as this is quite a new feature you really want the latest PuppetDB. There is nothing to enable when you install it the feature is already active. The feature is marked as experimental so some things will change as it moves to production.

PQL Queries look more or less like this:

nodes { certname ~ 'devco' }

This is your basic query it will return a bunch of nodes, something like:

    "deactivated": null,
    "latest_report_hash": null,
    "facts_environment": "production",
    "cached_catalog_status": null,
    "report_environment": null,
    "latest_report_corrective_change": null,
    "catalog_environment": "production",
    "facts_timestamp": "2016-11-01T06:42:15.135Z",
    "latest_report_noop": null,
    "expired": null,
    "latest_report_noop_pending": null,
    "report_timestamp": null,
    "certname": "devco.net",
    "catalog_timestamp": "2016-11-01T06:42:16.971Z",
    "latest_report_status": null

There are a bunch of in-built relationships between say a node and it’s facts and inventory, so queries can get quite complex:

inventory[certname] { 
  facts.osfamily = "RedHat" and
  facts.dc = "linodeldn" and
  resources { 
    type = "Package" and
    title = "java" and
    parameters.ensure = "1.7.0" 

This finds all the RedHat machines in a particular DC with Java 1.7.0 on them. Be aware this will also find machines that are deactivated.

I won’t go into huge details of the queries, the docs are pretty good – examples, overview.

So this is quite interesting, this finally gives us a reasonably usable DB to do queries that mcollective discovery used to be used for – but of course its not a live view nor does it have any clue what the machines are up to but as a cached data source for discovery this is interesting.



You can of course query this stuff on the CLI and I suggest you familiarise yourself with JQ.

First you’ll have to set up your account:

  "puppetdb": {
    "server_urls": "https://puppet:8081",
    "cacert": "/home/rip/.puppetlabs/etc/puppet/ssl/certs/ca.pem",
    "cert": "/home/rip/.puppetlabs/etc/puppet/ssl/certs/rip.mcollective.pem",
    "key": "/home/rip/.puppetlabs/etc/puppet/ssl/private_keys/rip.mcollective.pem"

This is in ~/.puppetlabs/client-tools/puppetdb.conf which is a bit senseless to me since there clearly is a standard place for config files, but alas.

Once you have this and you installed the puppet-client-tools package you can do queries like:

$ puppet query "nodes { certname ~ 'devco.net' }"

Puppet Code

Your master will have the puppetdb-termini package on it and this brings with it Puppet functions to query PuppetDB so you do not need to use a 3rd party module anymore:

$nodes = puppetdb_query("nodes { certname ~ 'devco' }")

Puppet Job

At the recent PuppetConf Puppet announced that their enterprise tool puppet job supports using this as discovery, if I remember right it’s something like:

$ puppet job run -q 'nodes { certname ~ 'devco' }'


At PuppetConf I integrated this into MCollective and my Choria tool, mco feature still due a release (MCO-776, choria):

Run Puppet on all the nodes matched by the query:

$ puppet query "nodes { certname ~ 'devco.net' }"|mco rpc puppet runonce

The above is a bit limited in that the apps in question have to specifically support this kind of STDIN discovery – the rpc app does.

I then also added support to the Choria CLI:

$ mco puppet runonce -I "pql:nodes[certname] { certname ~ 'devco.net' }"

These queries are a bit special in that they must return just the certname as here, I’ll document this up. The reason for this is that they are actually part of a much larger query done in the Choria discovery system (that uses PQL internally and is a good intro on how to query this API from code).

Here’s an example of a complex query – as used by Choria internally – that limits nodes to ones in a particular collective, our PQL query, which ones have mcollective installed and running. You can see you can nest and combine queries into quite complex ones:

nodes[certname, deactivated] { 
  # finds nodes in the chosen subcollective via a fact
  (certname in inventory[certname] { 
    facts.mcollective.server.collectives.match("\d+") = "mcollective" 
  }) and 
  # does the supplied PQL query, note the specific certname match
  (certname in nodes[certname] {
    certname ~ 'devco.net'
  }) and
  # limited to machines with mcollective installed
  (resources {
    type = "Class" and title = "Mcollective"
  }) and 
  # who also have the service started
  (resources {
    type = "Class" and title = "Mcollective::Service"


This is really handy and I hope more people will become familiar with it. I don’t think this quite rolls off the fingers easily – but neither does SQL or any other similar system so par for the course. What is amazing is that we can get nearer to having a common language across CLI, Code, Web UIs and 3rd party tools for describing queries of our estate so this is a major win.