
The currently supported way of making a Puppet node aware of the state of other nodes is Exported Resources; there is some movement in this space though, see for example a recent thread on the developers list.

I’ve personally never found exported resources to be a good fit. I have many masters spread over long network distances, and they often operate in a split-brain scenario where the masters cannot see each other. This makes using something like MySQL as the storage backend hard, since you’d effectively need a single write point and MySQL replication is point to point.

There are other problems too. Consider a use case where you have modules for phpbb, wordpress, rails apps, phpmyadmin and several others that all need to talk to the master database. Maybe you also want the MOTD to show a list of resources for each customer.

Every one of these modules has configuration parameters that differ in some way or another. The exported resources model more or less expects the MySQL master to export the configs for all the apps using the database. That isn’t viable when you don’t know in advance everything that will talk to this MySQL server, and the resource abstraction doesn’t work well when the resulting config needs to look like this:

$write_database = "db1.your.net";
$read_databases = "db2.your.net", "db3.your.net", "db4.your.net";

Filling in that array with exported resources is pretty hard.

So I favor searching for nodes and walking the results. I’ve tended to use extlookup for this, maintaining the host lists by hand. That works, but anything maintained by hand is fragile; there has to be a better way.

So I wrote an mcollective registration system that stores all node data in MongoDB instances.

Simply using mcollective to manage the data store has several advantages:

  • You can run many instances of the registration receiver, one per puppet master. As writes are done via MCollective rather than by the masters, there’s no need for a complex database replication setup; mcollective gives you that for free
  • Using ActiveMQ’s meshed network architecture you get better resiliency and more up-to-date data than with simple point-to-point database replication. Routing around internet outages is much easier this way.
  • If you bootstrap machines via mcollective your registration data will be there before you even do your first Puppet run

And storing the data in a document database, specifically Mongo over some of the other contenders, has its own advantages:

  • Writes are very fast, milliseconds per record, since each update is written as a single document rather than hundreds or thousands of rows
  • Mongo has a query language for the documents, where other NoSQL systems insist on map-reduce queries. Map-reduce would be a bad fit inside Puppet manifests without writing a lot of boilerplate query functions, something I don’t have time for. With the query language a lookup stays a one-liner, as sketched below.
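
For illustration, a lookup with the pre-2.0 mongo Ruby driver against the node collection used later in this post might look like this. This is a minimal sketch, and “acme” is a made-up customer name:

require 'mongo' # the pre-2.0 mongo ruby driver

# one document per node means finding all the MySQL servers for a
# customer is a single query, no map reduce needed
coll = Mongo::Connection.new("localhost").db("puppet").collection("nodes")

coll.find("facts.customer" => "acme", "classes" => "mysql::server").each do |node|
  puts node["fqdn"]
end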

The mcollective registration system stores the following data per node (a sketch of a receiver follows the list):

  • A list of all the Agents
  • A list of all the facts
  • A list of all the Puppet classes assigned to this node
  • FQDN, time of most recent update and a few other bits
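
A registration receiver that writes this into Mongo is just a small MCollective agent. Here is a minimal sketch of what such a receiver can look like, again assuming the pre-2.0 mongo Ruby driver; the real code in my repository differs in the details:

require 'mongo'

module MCollective
  module Agent
    # registration receiver: mcollectived hands every registration
    # message seen on the collective to handlemsg
    class Registration
      attr_reader :timeout

      def initialize
        @timeout = 2
        @coll = Mongo::Connection.new("localhost").db("puppet").collection("nodes")
      end

      def handlemsg(msg, connection)
        node = msg[:body]

        return nil unless node[:facts] && node[:facts]["fqdn"]

        node[:fqdn] = node[:facts]["fqdn"]
        node[:lastseen] = Time.now.to_i

        # the whole record is one document, so this is a single upsert
        @coll.update({"fqdn" => node[:fqdn]}, node, :upsert => true)

        nil
      end
    end
  end
end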

You can now use this in Puppet manifests to fill some of the stated needs. The code below uses the search engine to find all the machines with the class “mysql::server” for a specific customer and then displays this data in the MOTD:

# Connect to the database, generally do this in site.pp
search_setup("localhost", "puppet", "nodes")
 
# A define to add the hostname and ip address of the MySQL server to motd
define motd::register_mysql {
   $node = load_node($name)
 
   motd::register{"MySQL Server:  ${node['fqdn']} / ${node['facts']['ipaddress']}"
}
 
$nodes = search_nodes("{'facts.customer' => '$customer', 'classes' => 'mysql::server'}")
 
motd::register_mysql{$nodes: }
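
search_setup, load_node and search_nodes above are custom parser functions. To give an idea of their shape, a minimal sketch of load_node, assuming the Puppet::Util::MongoQuery helper discussed at the end of this post, could be as simple as:

# lib/puppet/parser/functions/load_node.rb
module Puppet::Parser::Functions
  # returns the full Mongo document for the node with the given fqdn;
  # find_node here is an assumed companion to the find_nodes call
  # used in the template below
  newfunction(:load_node, :type => :rvalue) do |args|
    Puppet::Util::MongoQuery.instance.find_node("fqdn" => args[0])
  end
end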

Here is a template that does a search too; it builds a comma-separated list of MySQL slave IP addresses for a customer:

$read_databases = <%= 
nodes = Puppet::Util::MongoQuery.instance.find_nodes({"facts.customer" => "#{customer}", "classes" => "mysql::slave"})
 
nodes.map {|n| '"' + n["facts"]["ipaddress"] + '"' }.join(", ")
%>

This code can probably be cleaned up a bit; it would help if Puppet gave us better ways to extend it to enable this kind of thing.

I find this approach much less restrictive and easier to get my head around than exported resources. I obviously see the need for tracking actual state across an infrastructure, and exported resources will solve that in the future, when you can require a database on another node to be up before starting a webserver process, but that seems a long way off for Puppet. And I think even then a combination with a system like this will have value.

A note about the code and how to install it: Puppet has a very limited extension system at present; you cannot add Puppet::Util classes via pluginsync, and there’s no support for any kind of pooling of resources from plugins. So this code needs to be installed into your Ruby libdir directly, and I use a Singleton to maintain a database pool per Puppetmaster process. This is not ideal; I’ve filed feature enhancement requests with Puppet Labs and hopefully things will become easier in time.
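
To show the shape of that helper, here is a minimal sketch of such a singleton. The method names follow the find_nodes call used in the template earlier, but treat this as an outline rather than the exact code:

require 'singleton'
require 'mongo' # pre-2.0 mongo ruby driver

module Puppet
  module Util
    # one instance, and so one database connection, per puppetmaster
    # process; search_setup() in site.pp would end up calling setup()
    class MongoQuery
      include Singleton

      def setup(host, db, collection)
        @coll = Mongo::Connection.new(host).db(db).collection(collection)
      end

      def find_node(query)
        @coll.find_one(query)
      end

      def find_nodes(query)
        @coll.find(query).to_a
      end
    end
  end
end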

The code to enable all of the above is in my GitHub account.