Real time Puppet events and network wide callbacks

07/03/2011

I’ve wanted to be notified the moment Puppet changes a resource for ages, I’ve often been told this cannot be done without monkey patching nasty Puppet internals.

Those following me on Twitter have no doubt noticed my tweets complaining about the complexities in getting this done. I’ve now managed to get this done so am happy to share here.

The end result is that my previously mentioned event system is now getting messages sent the moment Puppet does anything. And I have callbacks at the event system layer so I can now instantly react to change anywhere on my network.

If a new webserver in a certain cluster comes up I can instantly notify the load balancer to add it – perhaps via a Puppet run of of its own. This lets me build reactive cross node orchestration. Apart from the callbacks I also get run metrics in Graphite as seen in the image.

The best way I found to make Puppet do this is by creating your own transaction reports processor that is the client side of a normal puppet report, here’s a shortened version of what I did. It’s overly complex and a pain to get working since Puppet Pluginsync still cannot sync out things like Applications.

The code overrides the finalize_report method to send the final run metrics and the add_resource_status method to publish events for changing resources. Puppet will use the add_resource_status method to add the status of a resource right after evaluating that resource. By tapping into this method we can send events to the network the moment the resource has changed.

require 'puppet/transaction/report'
 
class Puppet::Transaction::UnimatrixReport < Puppet::Transaction::Report
  def initialize(kind, configuration_version=nil)
    super(kind, configuration_version)
 
    @um_config = YAML.load_file("/etc/puppet/unimatrix.yaml")
  end
 
  def um_connection
    @um_connection ||= Stomp::Connection.new(@um_config[:user], @um_config[:password], @um_config[:server], @um_config[:port], true)
  end
 
  def um_newevent
    event = { ... } # skeleton event, removed for brevity
  end
 
  def finalize_report
    super
 
    metrics = um_newevent
    sum = raw_summary
 
    # add run metrics from raw_summary to metrics, removed for brevity
 
    Timeout::timeout(2) { um_connection.publish(@um_config[:portal], metrics.to_json) }
  end
 
  def add_resource_status(status)
    super(status)
 
    if status.changed?
      event = um_newevent
 
      # add event status to event hash, removed for brevity
 
      Timeout::timeout(2) { um_connection.publish(@um_config[:portal], event.to_json) }
    end
  end
end

Finally as I really have no need for sending reports to my Puppet Masters I created a small Application that replace the standard agent. This application has access to the report even when reporting is disabled so it will never get saved to disk or copied to the masters.

This application sets up the report using the class above and creates a log destination that feeds the logs into it. This is more or less exactly the Puppet::Application::Agent so I retain all my normal CLI usage like –tags and –test etc.

require 'puppet/application'
require 'puppet/application/agent'
require 'rubygems'
require 'stomp'
require 'json'
 
class Puppet::Application::Unimatrix < Puppet::Application::Agent
  run_mode :unimatrix
 
  def onetime
    unless options[:client]
      $stderr.puts "onetime is specified but there is no client"
      exit(43)
      return
    end
 
    @daemon.set_signal_traps
 
    begin
      require 'puppet/transaction/unimatrixreport'
      report = Puppet::Transaction::UnimatrixReport.new("unimatrix")
 
      # Reports get logs, so we need to make a logging destination that understands our report.
      Puppet::Util::Log.newdesttype :unimatrixreport do
        attr_reader :report
 
        match "Puppet::Transaction::UnimatrixReport"
 
        def initialize(report)
          @report = report
        end
 
        def handle(msg)
          @report << msg
        end
      end
 
      @agent.run(:report => report)
    rescue => detail
      puts detail.backtrace if Puppet[:trace]
      Puppet.err detail.to_s
    end
 
    if not report
      exit(1)
    elsif options[:detailed_exitcodes] then
      exit(report.exit_status)
    else
      exit(0)
    end
  end
end

And finally I can now create a callback in my event system this example is over simplified but the gist of it is that I am triggering a Puppet run on my machines with class roles::frontend_lb as a result of a web server starting on any machine with the class roles::frontend_web – effectively immediately causing newly created machines to get into the pool. The Puppet run is triggered via MCollective so I am using it’s discovery capabilities to run all the instances of load balancers.

add_callback(:name => "puppet_change", :type => ["archive"]) do |event|
   data = event["extended_data"]
 
   if data["resource_title"] && data["resource_type"]
      if event["name"] == "Service[httpd]"
         # only when its a new start, this is short for brevity you want to do some more checks
         if data["restarted"] == false && data["tags"].include?("roles::frontend_web")
           # does a puppet run via mcollective on the frontend load balancer
           UM::Util.run_puppet(:class_filter => "roles::frontend_lb")
         end
     end
  end
end

Doing this via a Puppet run demonstrates to me where the balance lie between Orchestration and CM. You still need to be able to build a new machine, and that new machine needs to be in the same state as those that were managed using the Orchestration tool. So by using MCollective to trigger Puppet runs I know I am not doing anything out of bounds of my CM system, I am simply causing it to work when I want it to work.