Tag Archives: puppet

Rapid Puppet runs with MCollective

The typical Puppet use case is to run the daemon every 30 minutes or so and just let it manage your machines. Sometimes, though, you want to be able to run it on all your machines as quickly as your Puppet master can handle.

This is tricky as you generally do not have a way to cap the concurrency and it’s hard to orchestrate that. I’ve extended the MCollective Puppet Agent to do this for you so you can do a rapid run at roll out time and then go back to the more conservative slow pace once your window is over.

The basic logic I implemented is this:

  1. Discover all nodes, sort them alphabetically
    1. Count how many nodes are active now, wait till it’s below threshold
    2. Run a node by just starting a --onetime background run
    3. Sleep a second
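
The loop above can be sketched in plain Ruby. This is only an illustration under stated assumptions, not the actual mc-puppetd code: running_count and start_run are hypothetical callables standing in for the MCollective requests that count active Puppet runs and start a background --onetime run, and pause is injectable so the sleep can be stubbed out.

```ruby
# Hypothetical sketch of the run-all loop; not the real mc-puppetd code.
# running_count: callable returning how many nodes are running Puppet now.
# start_run:     callable that starts a background --onetime run on a node.
# pause:         callable used for sleeping, injectable so it can be stubbed.
def run_all(nodes, concurrency, running_count, start_run, pause = ->(secs) { sleep(secs) })
  nodes.sort.each do |node|
    # Wait until the number of active runs drops below the threshold.
    pause.call(1) while running_count.call >= concurrency

    start_run.call(node)

    # Sleep a second before moving on to the next node.
    pause.call(1)
  end
end
```

Passing the two callables in keeps the throttling logic separate from however the node state is actually gathered.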

This should churn through your nodes very quickly without overwhelming the resources of your master. You can see it in action here: it started 3 nodes, and by the time it got to the 4th, 3 were already running, so it waited for one of them to finish:

% mc-puppetd -W /dev_server/ runall 2
Thu Aug 05 17:47:21 +0100 2010> Running all machines with a concurrency of 2
Thu Aug 05 17:47:21 +0100 2010> Discovering hosts to run
Thu Aug 05 17:47:23 +0100 2010> Found 4 hosts
Thu Aug 05 17:47:24 +0100 2010> Running dev1.one.net, concurrency is 0
Thu Aug 05 17:47:26 +0100 2010> dev1.one.net schedule status: OK
Thu Aug 05 17:47:28 +0100 2010> Running dev1.two.net, concurrency is 1
Thu Aug 05 17:47:30 +0100 2010> dev1.two.net schedule status: OK
Thu Aug 05 17:47:32 +0100 2010> Running dev2.two.net, concurrency is 2
Thu Aug 05 17:47:34 +0100 2010> dev2.two.net schedule status: OK
Thu Aug 05 17:47:35 +0100 2010> Currently 3 nodes running, waiting
Thu Aug 05 17:48:00 +0100 2010> Running dev3.two.net, concurrency is 2
Thu Aug 05 17:48:05 +0100 2010> dev3.two.net schedule status: OK

This is integrated into the existing mc-puppetd client script, so you don’t need to roll out anything new to your servers, just the client side.

Using this to run each of 47 machines with a concurrency of just 4, I was able to complete a cycle in 8 minutes. That doesn’t sound too impressive, but my average run time is around 40 seconds per node, with some taking 90 to 150 seconds. My puppetmaster server, which usually sits at a steady 0.2mbit out, was serving a constant 2mbit/sec for the duration of this run.


Making machine metadata visible

I’m quite the fan of data, metadata and querying these to interact with my infrastructure rather than interacting by hostnames and wanted to show how far I am down this route.

This is more an iterative ongoing process than a fully baked idea at this point since the concept of hostnames is so heavily embedded in our Sysadmin culture. Today I can’t yet fully break away from it due to tools like nagios etc still relying heavily on the hostname as the index but these are things that will improve in time.

The background is that in the old days we attempted to capture a lot of metadata in hostnames, domain names and so forth. This was kind of OK since we had static networks with relatively small numbers of hosts. Today we do ever more complex work on our servers and we have more and more servers. The advent of cloud computing has also brought with it a whole new pain of unpredictable hostnames, rapidly changing infrastructures and a much bigger emphasis on role based computing.

My metadata about my machines comes from 3 main sources:

  • My Puppet manifests – classes and modules that get put on a machine
  • Facter facts, with the ability to add many per machine easily
  • MCollective, which stores the metadata in a MongoDB and lets me query the network in real time

Puppet manifests based on query

When setting up machines I keep some data like database master hostnames in extlookup, but in many cases I am now moving to a search based approach to finding resources. Here’s a sample manifest that will find the master database for a customer’s development machines:

$masterdb = search_nodes("{'facts.customer': '${customer}', 'facts.environment': '${environment}', classes: 'mysql::master'}")

This is a MongoDB query against my infrastructure database. For a given node it will find the name of a node that has the class mysql::master on it; by convention there should be only one per customer in my case. When using it in a template I can get back full objects with all the metadata for a node. Hopefully with Puppet 2.6 I can get full hashes into Puppet too!
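
To make the matching concrete, here is a rough sketch of what such a query selects, using an in-memory array instead of MongoDB. The find_nodes helper, the sample_nodes data and the web1/db1.other hostnames are all made up for illustration; the real search_nodes function talks to the database and is not shown here.

```ruby
# Hypothetical stand-in for the infrastructure database: in the post this
# lives in MongoDB; here it is just an array of hashes for illustration.
def sample_nodes
  [
    { "fqdn" => "db1.acme.net",
      "facts" => { "customer" => "acme", "environment" => "development" },
      "classes" => ["mysql::master"] },
    { "fqdn" => "web1.acme.net",
      "facts" => { "customer" => "acme", "environment" => "development" },
      "classes" => ["apache"] },
    { "fqdn" => "db1.other.net",
      "facts" => { "customer" => "other", "environment" => "production" },
      "classes" => ["mysql::master"] }
  ]
end

# Return the names of nodes whose facts all match and that include the class.
def find_nodes(nodes, facts, klass)
  matching = nodes.select do |node|
    facts.all? { |name, value| node["facts"][name] == value } &&
      node["classes"].include?(klass)
  end

  matching.map { |node| node["fqdn"] }
end

masters = find_nodes(sample_nodes,
                     { "customer" => "acme", "environment" => "development" },
                     "mysql::master")
puts masters.first # prints db1.acme.net
```

Because the convention is one master per customer and environment, taking the first match is enough in this sketch.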

Making Metadata Visible

With machines doing a lot of work, filling a lot of roles etc and with more and more machines you need to be able to tell immediately what machine you are on.

I do this in several places, first my MOTD can look something like this:

   Welcome to Synchronize Your Dogmas 
            hosted at Hetzner, Germany
 
        Puppet Modules:
                - apache
                - iptables
                - mcollective member
                - xen dom0 skeleton
                - mw1.xxx.net virtual machine

I build this up using snippets from my concat module; each important module like apache can just put something like this in:

motd::register{"Apache Web Server": }

Since this is managed by my snippet library, if you just remove the include line from the manifests the MOTD will automatically update.

With a big block of welcome done, I now also need to be able to show in my prompts what a machine does, who it’s for and, importantly, what environment it is in.

Above is a shot of 2 prompts in different environments; you see the customer name, environment and major modules. As with the MOTD, I have a prompt::register define that modules use to register into the prompt.

SSH Based on Metadata

With all this meta data in place, mcollective rolled out and everything integrated it’s very easy to now find and access machines based on this.

MCollective does real time resource discovery, so keeping with the mysql example above from puppet:

$ mc-ssh -W "environment=development customer=acme mysql::master"
Running: ssh db1.acme.net
Last login: Thu Jul 29 00:22:58 2010 from xxxx

$

Here I am ssh’ing to a server based on a query. If it had found more than one machine matching the query, a menu would be presented offering me a choice.

Monitoring Based on Metadata

Finally, setting up monitoring and keeping it in sync with reality can be a big challenge, especially in dynamic cloud based environments. Again I deal with this through discovery based on metadata:

$ check-mc-nrpe -W "environment=development customer=acme mysql::master"  check_load
check_load: OK: 1 WARNING: 0 CRITICAL: 0 UNKNOWN: 0|total=1 ok=1 warn=0 crit=0 unknown=0 checktime=0.612054
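
The summary line above is an aggregate over every node that matched the filter. As a rough sketch of that kind of aggregation, the code below tallies standard Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The helper names are made up, and the rule for the overall exit code is an assumption rather than something taken from check-mc-nrpe.

```ruby
# Map a standard Nagios plugin exit code to its status name.
# Anything outside 0..3 is treated as UNKNOWN.
def status_name(code)
  { 0 => "OK", 1 => "WARNING", 2 => "CRITICAL", 3 => "UNKNOWN" }.fetch(code, "UNKNOWN")
end

# Count how many nodes reported each status.
def tally(exit_codes)
  counts = { "OK" => 0, "WARNING" => 0, "CRITICAL" => 0, "UNKNOWN" => 0 }
  exit_codes.each { |code| counts[status_name(code)] += 1 }
  counts
end

# Assumption: the worst status wins, CRITICAL over WARNING over UNKNOWN.
def overall_exit_code(counts)
  return 2 if counts["CRITICAL"] > 0
  return 1 if counts["WARNING"] > 0
  return 3 if counts["UNKNOWN"] > 0

  0
end

counts = tally([0])
puts counts.map { |name, count| "#{name}: #{count}" }.join(" ")
```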

Summary

This is really the tip of the iceberg. There is a lot more that I already do – like scheduling Puppet runs on groups of machines based on metadata – but also a lot more to do; this really is early days down this route. I am very keen to get views from others who are struggling with shortcomings in hostname based approaches and how they deal with them.


Bootstrapping Puppet on EC2 with MCollective

The problem of getting EC2 images to do what you want is quite significant. Mostly I find the whole thing a bit flaky, with too many moving parts:

  • When and what AMI to start
  • Once started, how do you configure it from base to functional? Especially in a way that doesn’t become a vendor lock.
  • How do you manage the massive sprawl of instances, inventory them and track your assets
  • Monitoring and general life cycle management
  • When and how do you shut them down, and what cleanup is needed? Being billed by the hour means this has to be a consideration

These are significant problems and just the tip of the iceberg. All of the traditional aspects of infrastructure management – like Asset Management, Monitoring, Procurement – are totally useless in the face of the cloud.

A lot of work is being done in this space by tools like Pool Party, Fog, Opscode and many other players like the countless companies launching control panels, clouds overlaying other clouds and so forth. As a keen believer in Open Source, I don’t find many of these options appealing.

I want to focus on the 2nd step above here today and show how I pulled together a number of my Open Source projects to automate that. I built a generic provisioner that hopefully is expandable and usable in your own environments. The provisioner deals with all the interactions between Puppet on nodes, the Puppet Master, the Puppet CA and the administrators.

<rant> Sadly the activity in the Puppet space is a bit lacking in the area of making it really easy to get going on a cloud. There are suggestions on the level of monitoring syslog files from a cronjob and signing certificates based on that. Really. It’s a pretty sad state of affairs when that’s the state of the art.

Compare the ease of using Chef’s Knife with a lot of the suggestions currently out there for using Puppet in EC2 like these: 1, 2, 3 and 4.

I’m not trying to have a general Puppet bashing session here, but I think it’s quite defining of the 2 user bases that Cloud readiness is such an afterthought so far in Puppet and its community. </rant>

My basic needs are that instances all start in the same state, I just want 1 base AMI that I massage into the desired final state. Most of this work has to be done by Puppet so it’s repeatable. Driving this process will be done by MCollective.

I bootstrap the EC2 instances using my EC2 Bootstrap Helper and I use that to install MCollective with just a provision agent. It configures it and hooks it into my collective.

From there I have the following steps that need to be done:

  • Pick a nearby Puppet Master, perhaps using EC2 Region or country as guides
  • Set up the host – perhaps using /etc/hosts – to talk to the right master
  • Revoke and clean any old certs for this hostname on all masters
  • Instruct the node to create a new CSR and send it to its master
  • Sign the certificate
  • Run my initial bootstrap Puppet environment, this sets up some hard to do things like facts my full build needs
  • Run the final Puppet run in my normal production environment.
  • Notify me using XMPP, Twitter, Google Calendar, Email, Boxcar and whatever else I want of the new node

This is a lot of work to be done on every node. And more importantly it’s a task that involves many other nodes like puppet masters, notifiers and so forth. It has to adapt dynamically to your environment and not need reconfiguring when you get new Puppet Masters. It has to deal with new data centers, regions and countries without needing any configuration or even a restart. It has to happen automatically without any user interaction so that your auto scaling infrastructure can take care of booting new instances even while you sleep.

The provisioning system I wrote does just this. It follows the above logic for any new node and is configurable for which facts to use to pick a master and how to notify you of new systems. It adapts automatically to your ever changing environments thanks to discovery of resources. The actions to perform on the node are easily pluggable by just creating an agent that complies to the published DDL like the sample agent.
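
As an illustration of the first step, picking a master from configurable facts, here is a minimal sketch. It is not the provisioner’s actual code: pick_master, the master hostnames and the fact names (ec2_region, country) are assumptions made up for this example, and in the real system the list of masters comes from discovery rather than a hard coded array.

```ruby
# Pick a Puppet Master for a new node by comparing facts in priority order.
# All names and facts here are hypothetical examples.
def pick_master(node_facts, masters, fact_names)
  fact_names.each do |fact|
    wanted = node_facts[fact]
    next if wanted.nil?

    # First master that shares this fact value with the node wins.
    match = masters.find { |master| master[:facts][fact] == wanted }
    return match[:name] if match
  end

  # No fact matched: fall back to the first known master, if any.
  masters.empty? ? nil : masters.first[:name]
end

masters = [
  { name: "puppet1.example.net", facts: { "ec2_region" => "us-east-1", "country" => "us" } },
  { name: "puppet2.example.net", facts: { "ec2_region" => "eu-west-1", "country" => "ie" } }
]

node = { "ec2_region" => "eu-west-1", "country" => "ie" }
puts pick_master(node, masters, %w[ec2_region country]) # prints puppet2.example.net
```

Trying the facts in order means a region match is preferred and country is only a fallback, which mirrors the "Region or country as guides" idea in the step list.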

You can see it in action in the video below. I am using Amazon’s console to start the instance, you’d absolutely want to automate that for your needs. You can also see it direct on blip.tv here. For best effect – and to be able to read the text – please fullscreen.

In case the text is unreadable in the video a log file similar to the one in the video can be seen here and an example config here

Past this point my Puppet runs are managed by my MCollective Puppet Scheduler.

While this is all done using EC2 nothing prevents you from applying these same techniques to your own data center or non cloud environment.

Hopefully this shows that you can wrap all the logic needed to do very complex interactions with systems that are perhaps not known for their good reusable APIs in simple to understand wrappers with MCollective, exposing those systems to the network at large with APIs that can be used to reach your goals.

The various bits of open source I used here are:


Puppet resources on demand with MCollective

Some time ago I wrote about how to reuse Puppet providers in your Ruby script. I’ll take that a bit further here and show you how to create any kind of resource.

Puppet works based on resources and catalogs. A catalog is a collection of resources and it will apply the catalog to a machine. So in order to do something you can do as before and call the type’s methods directly but if you wanted to build up a resource and say ‘just do it’ then you need to go via a catalog.

Here’s some code, I don’t know if this is the best way to do it, I dug around the code for ralsh to figure this out:

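require 'puppet'

# Describe the resource just as you would in a manifest
params = { :name => "rip",
           :comment => "R.I.Pienaar",
           :password => '......' }

# Build a user resource from the params
pup = Puppet::Type.type(:user).new(params)

# Resources are applied via a catalog
catalog = Puppet::Resource::Catalog.new
catalog.add_resource pup
catalog.apply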

That’s really simple and doesn’t require you to know much about the inner workings of a type, you’re just mapping the normal Puppet manifest to code and applying it. Nifty.

The natural progression – to me anyway – is to put this stuff into a MCollective agent and build a distributed ralsh.

Here’s a sample use case: I wanted to change my user’s password everywhere:

$ mc-rpc puppetral do type=user name=rip password='$1$xxx'

And that will go out, find all my machines and use the Puppet RAL to change my password for me. You can do anything Puppet can: manage /etc/hosts, add users, remove users, packages and services; even your own custom types can be used. Distributed and in parallel over any number of hosts.
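
On the agent side, the key=value arguments have to become a type name plus a params hash before they can be handed to the catalog code shown earlier. The sketch below shows one way that mapping could look; parse_ral_args is a hypothetical helper, not the actual puppetral agent code, and the Puppet calls are only mentioned in a comment so the example runs without Puppet installed.

```ruby
# Turn ["type=user", "name=rip", ...] into [:user, { :name => "rip", ... }].
# Hypothetical helper for illustration; not the real puppetral agent.
def parse_ral_args(args)
  params = {}

  args.each do |arg|
    # Split on the first "=" only, so values may themselves contain "=".
    key, value = arg.split("=", 2)
    if key.nil? || key.empty? || value.nil?
      raise ArgumentError, "expected key=value, got #{arg.inspect}"
    end

    params[key.to_sym] = value
  end

  type = params.delete(:type)
  raise ArgumentError, "type is required" if type.nil?

  [type.to_sym, params]
end

type, params = parse_ral_args(["type=user", "name=rip", 'password=$1$xxx'])

# With Puppet loaded, these could then be used as in the catalog example:
#   Puppet::Type.type(type).new(params)
puts "#{type}: #{params.keys.join(', ')}"
```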

Some other examples:

Add a user:

$ mc-rpc puppetral do type=user name=foo comment="Foo User" managehome=true

Run a command using exec, with the magical creates option:

$ mc-rpc puppetral do type=exec name="/bin/date > /tmp/date" user=root timeout=5 creates="/tmp/date"

Add an aliases entry:

$ mc-rpc puppetral do type=mailalias name=foo recipient="rip@devco.net" target="/etc/aliases"

Install a package:

$ mc-rpc puppetral do type=package name=unix2dos ensure=present


Puppet Concat 20100507

I’ve had quite a lot of contributions to my Puppet Concat module and after some testing by various people I’m ready to do a new release.

Thanks to Paul Elliot, Chad Netzer and David Schmitt for patches and assistance.

For background of what this is about please see my earlier post: Building files from fragments with Puppet

You can download the release here. Please pay special attention to the upgrade instructions below.

Changes in this release

  • Several robustness improvements to the helper shell script.
  • Removed all hard coded paths in the helper script to improve portability.
  • We now use file{} to copy the combined file to its location. This means you can now change the ownership of a file by just changing the owner/group in concat{}.
  • You can specify ensure => "/some/other/file" in concat::fragment to include the contents of another file in the fragment. Even files not managed by puppet.
  • The code is now hosted on Github and we’ll accept patches there.

Upgrading

When upgrading to this version you need to take particular care. All the fragments are now owned by root, the shell script runs as root and we use file{} to copy the resulting file out.

This means you’ll see the diff of not just the fragments but also the final file when running puppetd --test. Unfortunately it also means that the first time you run Puppet with the new code it will fire off all the notifies that you have on your concat{} resources. You’ll also see a lot of changes to resources in the fragments directory on the first run. This is normal and expected behavior.

So if, say, you’re using concat to create my.cnf and notify the service to restart automatically, then simply upgrading this module will result in MySQL restarting. This is a one off notify that happens only the first time; from then on it will be as normal. So I’d suggest disabling those notifies when upgrading, until this upgrade is running everywhere, and then putting them back.


Puppet localconfig parser – 20100330

I had a few reports of problems with Puppet 0.25 and my localconfig.yaml parser. Finally Andy Asquelt sent me a patch that resolved the problem; you can download the latest here.

For background about what this is see my original post: What does Puppet manage on a node?


Scheduling Puppet with MCollective

Scheduling Puppet runs is a hard problem, you either run the daemon or run it through cron, both have drawbacks. There’s been some discussion about decoupling this or to improve the remote control abilities of Puppet, this is my entry into that discussion.

Running the daemon, it’s all about the memory problems of pretty much everything involved. You also suffer if a dom0 reboots, as the 20 domUs on it will pile up and cause huge concurrent runs.

Running from cron, you have problems scheduling it nicely. The simplest approach is just to sleep a random period of time, but this means clients don’t always run at a predictable time and you still get concurrency issues.

I’ve written an mcollective based Command and Control for Puppet that launches Puppet runs. The aim is to spread the CPU load on my masters out evenly to ensure I can use lower spec machines for masters. Or in my case I can re-use my master machines as monitoring and middleware nodes.

It basically has these features:

  • Discover the list of nodes to manage based on a supplied filter, I have regional masters so I will manage groups of Puppet nodes independently
  • Evenly spreads out the Puppet runs over an interval, if I have 10 nodes and a 30 minute interval I will get a run every 3 minutes.
  • Nodes run at a predictable time every time, even after reboots since the node list is just run through alphabetically. If the node list stays constant you’ll always run at the same time give or take 10 seconds. If nodes get added the behavior will be predictable.
  • Before scheduling a run it checks the overall concurrency of Puppet runs; if it exceeds a limit it will skip a background run. I want to give priority to runs that I run by hand with --test, and this ensures that happens.
  • If the client it is about to run ran its catalog recently – maybe via --test – it will skip that run
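
The spacing and skip rules above boil down to a little arithmetic and two checks. Here is a minimal sketch of them; it is not the puppetcommander.rb source. The function names are made up, and "recently" is assumed to mean "within the last interval", which the feature list does not actually specify.

```ruby
# Seconds to sleep between runs so that all nodes fit evenly in the interval.
def sleep_between_runs(node_count, interval_minutes)
  (interval_minutes * 60.0) / node_count
end

# Decide what to do with the next node in the alphabetical list.
# Assumption: a run counts as recent if it happened within the last interval.
def schedule_decision(running_now, max_concurrent, seconds_since_last_run, interval_minutes)
  return :skip_concurrency if running_now >= max_concurrent
  return :skip_recent if seconds_since_last_run < interval_minutes * 60

  :run
end

puts sleep_between_runs(10, 30) / 60 # prints 3.0 (minutes between runs)
puts sleep_between_runs(3, 1)        # prints 20.0 (seconds between runs)
```

With 10 nodes and a 30 minute interval this gives the run every 3 minutes mentioned in the list.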

The result is pretty good, spreading 6 nodes out over 30 minutes I get a nice even CPU spread, the spike in the graph after the change is when the node itself runs Puppet. The 2nd graph is eth0 network output, the dip is when localhost is running:

The resulting CPU usage is much smoother, there aren’t periods of no CPU usage and there are no spikes caused by nodes bunching up together.

Below is output from a C&C session managing 3 machines with an interval of 1 minute and a max concurrency of 1. These machines were still running cron based puppetd, so you can see the C&C is not scheduling runs when it hits the concurrency limit due to cron runs:

$ puppetcommander.rb --interval 1 -W /dev_server/ --max-concurrent 1
Wed Mar 17 08:31:29 +0000 2010> Looping clients with an interval of 1 minute(s)
Wed Mar 17 08:31:29 +0000 2010> Restricting to 1 concurrent puppet run(s)
Wed Mar 17 08:31:31 +0000 2010> Found 3 puppet nodes, sleeping for ~20 seconds between runs
Wed Mar 17 08:31:31 +0000 2010> Current puppetds running: 1
Wed Mar 17 08:31:31 +0000 2010> Puppet run for client dev1.my.net skipped due to current concurrency of 1
Wed Mar 17 08:31:31 +0000 2010> Sleeping for 20 seconds
Wed Mar 17 08:31:51 +0000 2010> Current puppetds running: 1
Wed Mar 17 08:31:51 +0000 2010> Puppet run for client dev2.my.net skipped due to current concurrency of 1
Wed Mar 17 08:31:51 +0000 2010> Sleeping for 20 seconds
Wed Mar 17 08:32:12 +0000 2010> Current puppetds running: 0
Wed Mar 17 08:32:12 +0000 2010> Running agent for dev3.my.net
Wed Mar 17 08:32:15 +0000 2010> Sleeping for 16 seconds

There are many advantages to this approach over some others that have been suggested:

  • No host lists to maintain, it reconfigures itself dynamically on demand.
  • It doesn’t rely on some other on-master fact like signed certificates that breaks models where the CA is separate
  • It doesn’t rely on stored configs, which don’t work well at scale or on a setup with many regional masters.
  • It doesn’t suffer from issues if a node isn’t available but it’s in your host lists.
  • It understands the state of the entire platform and so you can control concurrency and therefore resources on your master.
  • It’s easy to extend with your own logic or demands; the current version of the code is only 90 lines of Ruby including CLI options parsing.
  • Concurrency control can mitigate other problems. Have a cluster of 10 nodes and don’t want your config change to restart them all at the same time? No problem, just make sure you only run 2 at a time.

In reality this means I can remove 256MB RAM from my master – since I can now run fewer puppetmasterd processes, this will save me $15/month hosting fee on this specific master, it’s small change but always good to control my platform costs.


Puppet Concat 20100312

I am pleased to announce the next version of my Puppet Concat script, we now have 0.24.8 and newer support and a few smaller bits mentioned below.

For background of what this is about please see my earlier post: Building files from fragments with Puppet

New in this release

Paul Elliot sent in most of the patches that enabled this release, lots of thanks Paul!

  • 0.24.8 and newer is supported
  • You can now prepend warnings to generated files as a shell style comment using the warn property
  • You can enable the ability to create empty concat files using the force property
  • You can configure the location of your sort binary in setup.pp

The code should auto configure for 0.24.8 use, if it does not work please see setup.pp.

You can grab the code here.

Known issues

As with my earlier attempts at making a concat tool for 0.24.x, this version will raise some false notifies when used on 0.24. Basically the method we use to clear the concat store of unmanaged files has a side effect, and on the next run you will get an unneeded notify. Puppet’s behavior has improved in 0.25 so it works as expected there; for 0.24, though, there is no known workaround.

You cannot change the owner of a file, I know how to work around this issue and will have something in the next release.


Puppet localconfig parser – 20100303

I’ve had some good feedback on my previous post about the puppet localconfig parser, have implemented the requested features so here’s a new version.

First the ability to limit what resources are being printed:

# parselocalconfig.rb --limit package
Classes included on this node:
        fqdn
        common::linux
 
Resources managed by puppet on this node:
        package{redhat-lsb: }
                defined in common/modules/puppet/manifests/init.pp:15

You should only see package resources. You can also disable the classes list using --no-classes and on 0.25.x disable the tags list with --no-tags.

I’ve improved the detection of where to find the yaml file for 0.25 nodes and added an option --config if your config file is not in the usual place.

You can get the latest version here.


What does Puppet manage on a node?

Last year I wrote a tool to parse the localconfig.yaml from Puppet 0.24 and display a list of resources and classes. This script failed when 0.25 came out, I’ve updated it for 0.25 support.

The yaml cache has some added features in 0.25 so now I can also show the list of tags on a node, output would be:

# parselocalconfig.rb /var/lib/puppet/client_yaml/catalog/fqdn.yaml
Classes included on this node:
        fqdn
        common::linux
        <snip>
 
Tags for this node:
        fqdn
        common::linux
        <snip>
 
Resources managed by puppet on this node:
        yumrepo{centos-base: }
                defined in common/modules/yum/manifests/init.pp:24
 
        file{/root/.ssh: }
                defined in common/modules/users/manifests/root.pp:20
 
        <snip>

You can get the script that supports both 0.24 and 0.25 here.
