Making machine metadata visible

I’m quite the fan of data, metadata and querying these to interact with my infrastructure rather than interacting by hostnames and wanted to show how far I am down this route.

This is more an iterative ongoing process than a fully baked idea at this point since the concept of hostnames is so heavily embedded in our Sysadmin culture. Today I can’t yet fully break away from it due to tools like nagios etc still relying heavily on the hostname as the index but these are things that will improve in time.

The background is that in the old days we attempted to capture a lot of metadata in hostnames, domain names and so forth. This was kind of OK since we had static networks with relatively small amounts of hosts. Today we do ever more complex work on our servers and we have more and more servers. The advent of cloud computing has also brought with it a whole new pain of unpredictable hostnames, rapidly changing infrastructures a much bigger emphasis on role based computing.

My metadata about my machines comes from 3 main sources:

My Puppet manifests – classes and modules that gets put on a machine
Facter facts with the ability to add many per machine easily
MCollective stores the meta data in a MongoDB and let me query the network in real time

Puppet manifests based on query

When setting up machines I keep some data like database master hostnames in extlookup but in many cases I am now moving to a search based approach to finding resources. Here’s a sample manifest that will find the master database for a customers development machines:

$masterdb = search_nodes("{'facts.customer': '${customer}', 'facts.environment':${environment}, classes: 'mysql::master'}")

This is MongoDB query against my infrastructure database, it will find for a given node the name of a node that has the class mysql::master on it, by convention there should be only one per customer in my case. When using it in a template I can get back full objects with all the meta data for a node. Hopefully with Puppet 2.6 I can get full hashes into puppet too!

Making Metadata Visible

With machines doing a lot of work, filling a lot of roles etc and with more and more machines you need to be able to tell immediately what machine you are on.

I do this in several places, first my MOTD can look something like this:

   Welcome to Synchronize Your Dogmas 
            hosted at Hetzner, Germany
 
        Puppet Modules:
                - apache
                - iptables
                - mcollective member
                - xen dom0 skeleton
                - mw1.xxx.net virtual machine

I build this up using snippet from my concat module, each important module like apache can just put something like this in:

motd::register{"Apache Web Server": }

Being managed by my snippet library, if you just remove the include line from the manifests the MOTD will automatically update.

With a big block of welcome done, I now need to also be able to show in my prompts what a machine does, who its for a importantly what environment it is in.

Above a shot of 2 prompts in different environments, you see customer name, environment and major modules. Like with the motd I have a prompt::register define that module use to register into the prompt.

SSH Based on Metadata

With all this meta data in place, mcollective rolled out and everything integrated it’s very easy to now find and access machines based on this.

MCollective does real time resource discovery, so keeping with the mysql example above from puppet:

$ mc-ssh -W "environment=development customer=acme mysql::master"
Running: ssh db1.acme.net
Last login: Thu Jul 29 00:22:58 2010 from xxxx

$

Here i am ssh’ing to a server based on a query, if it found more than one machine matching the query a menu would be presented offering me a choice.

Monitoring Based on Metatdata

Finally setting up monitoring and keeping it in sync with reality can be a big challenge especially in dynamic cloud based environments, again I deal with this through discovery based on meta data:

$ check-mc-nrpe -W "environment=development customer=acme mysql::master"  check_load
check_load: OK: 1 WARNING: 0 CRITICAL: 0 UNKNOWN: 0|total=1 ok=1 warn=0 crit=0 unknown=0 checktime=0.612054

Summary

This is really the tip of the ice berg, there is a lot more that I already do – like scheduling puppet runs on groups of machines based on metadata – but also a lot more to do this really is early days down this route. I am very keen to get views from others who are struggling with shortcomings in hostname based approaches and how they deal with it.