Bootstrapping Puppet on EC2 with MCollective

07/14/2010

The problem of getting EC2 images to do what you want is quite significant, mostly I find the whole thing a bit flakey and with too many moving parts.

  • When and what AMI to start
  • Once started how to do you configure it from base to functional. Especially in a way that doesn’t become a vendor lock.
  • How do you manage the massive sprawl of instances, inventory them and track your assets
  • Monitoring and general life cycle management
  • When and how do you shut them, and what cleanup is needed. Being billed by the hour means this has to be a consideration

These are significant problems and just a tip of the ice berg. All of the traditional aspects of infrastructure management – like Asset Management, Monitoring, Procurement – are totally useless in the face of the cloud.

A lot of work is being done in this space by tools like Pool Party, Fog, Opscode and many other players like the countless companies launching control panels, clouds overlaying other clouds and so forth. As a keen believer in Open Source many of these options are not appealing.

I want to focus on the 2nd step above here today and show how I pulled together a number of my Open Source projects to automate that. I built a generic provisioner that hopefully is expandable and usable in your own environments. The provisioner deals with all the interactions between Puppet on nodes, the Puppet Master, the Puppet CA and the administrators.

<rant> Sadly the activity in the Puppet space is a bit lacking in the area of making it really easy to get going on a cloud. There are suggestions on the level of monitoring syslog files from a cronjob and signing certificates based on that. Really. It’s a pretty sad state of affairs when that’s the state of the art.

Compare the ease of using Chef’s Knife with a lot of the suggestions currently out there for using Puppet in EC2 like these: 1, 2, 3 and 4.

Not trying to have a general Puppet Bashing session here but I think it’s quite defining of the 2 user bases that Cloud readiness is such an after thought so far in Puppet and its community. </rant>

My basic needs are that instances all start in the same state, I just want 1 base AMI that I massage into the desired final state. Most of this work has to be done by Puppet so it’s repeatable. Driving this process will be done by MCollective.

I bootstrap the EC2 instances using my EC2 Bootstrap Helper and I use that to install MCollective with just a provision agent. It configures it and hook it into my collective.

From there I have the following steps that need to be done:

  • Pick a nearby Puppet Master, perhaps using EC2 Region or country as guides
  • Set up the host – perhaps using /etc/hosts – to talk to the right master
  • Revoke and clean any old certs for this hostname on all masters
  • Instruct the node to create a new CSR and send it to its master
  • Sign the certificate
  • Run my initial bootstrap Puppet environment, this sets up some hard to do things like facts my full build needs
  • Run the final Puppet run in my normal production environment.
  • Notify me using XMPP, Twitter, Google Calendar, Email, Boxcar and whatever else I want of the new node

This is a lot of work to be done on every node. And more importantly it’s a task that involves many other nodes like puppet masters, notifiers and so forth. It has to adapt dynamically to your environment and not need reconfiguring when you get new Puppet Masters. It has to deal with new data centers, regions and countries without needing any configuration or even a restart. It has to happen automatically without any user interaction so that your auto scaling infrastructure can take care of booting new instances even while you sleep.

The provisioning system I wrote does just this. It follows the above logic for any new node and is configurable for which facts to use to pick a master and how to notify you of new systems. It adapts automatically to your ever changing environments thanks to discovery of resources. The actions to perform on the node are easily pluggable by just creating an agent that complies to the published DDL like the sample agent.

You can see it in action in the video below. I am using Amazon’s console to start the instance, you’d absolutely want to automate that for your needs. You can also see it direct on blip.tv here. For best effect – and to be able to read the text – please fullscreen.

In case the text is unreadable in the video a log file similar to the one in the video can be seen here and an example config here

Past this point my Puppet runs are managed by my MCollective Puppet Scheduler.

While this is all done using EC2 nothing prevents you from applying these same techniques to your own data center or non cloud environment.

Hopefully this shows that you can wrap all the logic needed to do very complex interactions with systems that are perhaps not known for their good reusable API’s in simple to understand wrappers with MCollective, exposing those systems to the network at large with APIs that can be used to reach your goals.

The various bits of open source I used here are: