PuppetDB has had a new query language for a few releases now, called the Puppet Query Language, or PQL for short. It's quite interesting, so I thought a quick post might make a few more people aware of it.
Overview
To use it you need a recent PuppetDB, and as this is quite a new feature you really want the latest PuppetDB. There is nothing to enable: when you install it the feature is already active. The feature is marked as experimental, so some things will change as it moves to production.
PQL queries look more or less like this:
nodes { certname ~ 'devco' }
This is your basic query; it will return a bunch of nodes, something like:
[ { "deactivated": null, "latest_report_hash": null, "facts_environment": "production", "cached_catalog_status": null, "report_environment": null, "latest_report_corrective_change": null, "catalog_environment": "production", "facts_timestamp": "2016-11-01T06:42:15.135Z", "latest_report_noop": null, "expired": null, "latest_report_noop_pending": null, "report_timestamp": null, "certname": "devco.net", "catalog_timestamp": "2016-11-01T06:42:16.971Z", "latest_report_status": null } ] |
There are a bunch of in-built relationships between, say, a node and its facts and inventory, so queries can get quite complex:
inventory[certname] {
  facts.osfamily = "RedHat" and
  facts.dc = "linodeldn" and
  resources { type = "Package" and title = "java" and parameters.ensure = "1.7.0" }
}
This finds all the RedHat machines in a particular DC with Java 1.7.0 on them. Be aware this will also find machines that are deactivated.
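If you only want machines that are still active, you can nest a nodes query into the filter to exclude deactivated and expired entries – a rough sketch, assuming the is null operator and the nesting style shown later in this post:

inventory[certname] {
  facts.osfamily = "RedHat" and
  facts.dc = "linodeldn" and
  certname in nodes[certname] { deactivated is null and expired is null }
}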
I won't go into huge detail on the queries; the docs are pretty good – see the examples and the overview.
So this is quite interesting: it finally gives us a reasonably usable database for the kind of queries that mcollective discovery used to be used for. Of course it's not a live view, nor does it have any clue what the machines are up to, but as a cached data source for discovery it is interesting.
Using
CLI
You can of course query this stuff on the CLI, and I suggest you familiarise yourself with jq.
First you’ll have to set up your account:
{ "puppetdb": { "server_urls": "https://puppet:8081", "cacert": "/home/rip/.puppetlabs/etc/puppet/ssl/certs/ca.pem", "cert": "/home/rip/.puppetlabs/etc/puppet/ssl/certs/rip.mcollective.pem", "key": "/home/rip/.puppetlabs/etc/puppet/ssl/private_keys/rip.mcollective.pem" } } |
This lives in ~/.puppetlabs/client-tools/puppetdb.conf, which is a bit senseless to me since there clearly is a standard place for config files, but alas.
Once you have this and have installed the puppet-client-tools package, you can do queries like:
$ puppet query "nodes { certname ~ 'devco.net' }" |
Puppet Code
Your master will have the puppetdb-termini package on it, and this brings with it Puppet functions to query PuppetDB, so you do not need to use a 3rd party module anymore:
$nodes = puppetdb_query("nodes { certname ~ 'devco' }")
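The function returns the same structures the API does, so in a manifest you can extract just the pieces you need – a small sketch, with illustrative variable names:

$nodes     = puppetdb_query("nodes { certname ~ 'devco' }")
# each entry is a hash like the JSON shown earlier, so map out the certname key
$certnames = $nodes.map |$node| { $node['certname'] }
notice("matched nodes: ${certnames.join(', ')}")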
Puppet Job
At the recent PuppetConf, Puppet announced that their enterprise tool puppet job supports using this for discovery; if I remember right it's something like:
$ puppet job run -q 'nodes { certname ~ "devco" }'
MCollective
At PuppetConf I integrated this into MCollective and my Choria tool; the mco feature is still due a release (MCO-776, choria):
Run Puppet on all the nodes matched by the query:
$ puppet query "nodes { certname ~ 'devco.net' }"|mco rpc puppet runonce |
The above is a bit limited in that the apps in question have to specifically support this kind of STDIN discovery – the rpc app does.
I then also added support to the Choria CLI:
$ mco puppet runonce -I "pql:nodes[certname] { certname ~ 'devco.net' }"
These queries are a bit special in that they must return just the certname, as here; I'll document this up. The reason is that they are actually part of a much larger query done in the Choria discovery system (which uses PQL internally and is a good intro on how to query this API from code).
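Any query that yields only certnames should work with the pql: discovery method, so you can for instance reuse the inventory endpoint from earlier – a hedged sketch, not taken from the Choria docs:

$ mco puppet runonce -I "pql:inventory[certname] { facts.osfamily = 'RedHat' }"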
Here's an example of a complex query – as used by Choria internally – that limits nodes to ones in a particular collective, applies the supplied PQL query, and checks which ones have mcollective installed and running. You can see you can nest and combine queries into quite complex ones:
nodes[certname, deactivated] {
  # finds nodes in the chosen subcollective via a fact
  (certname in inventory[certname] { facts.mcollective.server.collectives.match("\d+") = "mcollective" }) and

  # does the supplied PQL query, note the specific certname match
  (certname in nodes[certname] { certname ~ 'devco.net' }) and

  # limited to machines with mcollective installed
  (resources { type = "Class" and title = "Mcollective" }) and

  # who also have the service started
  (resources { type = "Class" and title = "Mcollective::Service" })
}
Conclusion
This is really handy and I hope more people will become familiar with it. I don’t think this quite rolls off the fingers easily – but neither does SQL or any other similar system so par for the course. What is amazing is that we can get nearer to having a common language across the CLI, code, web UIs and 3rd party tools for describing queries of our estate, so this is a major win.
“I don’t think this quite rolls off the fingers easily – but neither does SQL or any other similar system so par for the course.”
Um… actually, SQL rolls off the fingers quite a bit better. Especially for select statements.
“select <columns> from <table> where <conditions>”
Remember, SQL was designed so that not only could the programmers use it effectively, but that managers could ask questions of the database without having to get a programmer involved.
There’s a clear syntax and a cadence to an SQL query that is oriented around interactive use (why else would you have the ‘select *’ shortcut[1]?). It’s one of those rare cases where aping natural language seems to have helped.
Couple that with the ability to list the data sources and describe their structure, and you have a tool that significantly empowers the average user. Who isn’t a programmer. You just teach them “select stuff from location where conditions” and they’re off and running.
To be fair, every database vendor puts their own twist on SQL, which means bridging from one vendor to another means dealing with an impedance mismatch. That never rolls off one’s fingers, it seems. And adding in joins (several kinds), subselects, stored procedures, and functions will quickly take a query into quite a complicated place.
I’ve also noticed that a lot of programmers don’t stack their SQL in the same way that they stack other queries. They run the query into one long string, instead of aligning the major keywords. That makes the SQL a lot more intimidating and hides the natural structure of the query.
[1] That programmers should never use in their code.