Tag Archives: ruby

MCollective and other languages

I often get asked about MCollective and other programming languages. Thus far we only support Ruby but my hope is in time we’ll be able to be more generic.

Initially I had a few requirements from serialization:

  • It must retain data types
  • Encoding the same data – like a hash – twice should give the same result from the point of view of md5()

That was about it really. This was while we used a pre-shared key to validate requests and so the result of the encode and decode should be the same on the sender as on the receiver. With YAML this was never the case so I used Marshal.
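As a rough illustration of why byte-for-byte stable encoding mattered with a pre-shared key, here is a minimal sketch. It is not the actual security plugin: the PSK value, the message layout and the helper names sign, encode_request and valid_request? are all assumptions made for this example.

```ruby
require 'digest/md5'

# Hypothetical pre-shared key, known to both sender and receiver (assumption).
PSK = "example-psk"

# Digest over the exact serialized bytes plus the shared key.
def sign(body, psk)
  Digest::MD5.hexdigest(body + psk)
end

# Sender: serialize once with Marshal and attach the digest.
def encode_request(data, psk)
  body = Marshal.dump(data)
  { :body => body, :hash => sign(body, psk) }
end

# Receiver: recompute the digest over the bytes that arrived.
def valid_request?(msg, psk)
  sign(msg[:body], psk) == msg[:hash]
end

request = { :agent => "discovery", :ttl => 60, :filter => ["country=uk"] }
msg     = encode_request(request, PSK)
decoded = Marshal.load(msg[:body])

puts valid_request?(msg, PSK)  # does the digest match on the receiving side?
puts decoded == request        # did symbols, integers and arrays survive?
```

The digest only validates if both ends hash exactly the same bytes, which is why a serializer that can emit the same hash in different forms was a problem under this scheme.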

We recently had an SSL-based security plugin contributed that relaxed the 2nd requirement, so we can go back to using YAML. We could in theory relax the 1st requirement too, but that would considerably limit the kind of tools you can build with MCollective, so I’d strongly suggest it remains a must-have.

Today there are very few cross-language serializers that let you just deal with arbitrary data. YAML is one that seems to have a lot of language support. Prior to version 1.0.0 of MCollective the SSL security system only supported Marshal, but we’ll support YAML in addition to Marshal in 1.0.0.

This enabled me to write a Perl client that speaks to your standard Ruby collective (if it runs this new plugin).

You can see the Perl client here. The Perl code is roughly a mc-find-hosts written in Perl and without command line options for filtering – though you can just adjust the filters in the code. It’s been years since I wrote any Perl so that’s just the first thing that worked for me.

The point is that someone should be able to take any language that has the Syck YAML bindings and write a client library to talk to MCollective. I tried the non-Syck bindings in PHP and found them unusable. I suspect the PHP Syck bindings will work better, but I didn’t try them.

As mentioned on the user list, post 1.0.0 I intend to focus on long running and scheduled requests. I’ll then also work on some kind of interface between MCollective and agents written in other languages, since that is more or less how long running scheduled tasks would work anyway. This will then use Ruby as a transport, hooking clients and agents in different languages together.

I can see that I’ll enable this but I am very unlikely to write the clients myself. I am therefore keen to speak to community members who want to speak to MCollective from languages like Python and who have some time to work on this.


Rapid Puppet runs with MCollective

The typical Puppet use case is to run the daemon every 30 minutes or so and just let it manage your machines. Sometimes though you want to be able to run it on all your machines as quickly as your puppet master can handle.

This is tricky as you generally do not have a way to cap the concurrency and it’s hard to orchestrate that. I’ve extended the MCollective Puppet Agent to do this for you so you can do a rapid run at roll out time and then go back to the more conservative slow pace once your window is over.

The basic logic I implemented is this:

  1. Discover all nodes, sort them alphabetically
    1. Count how many nodes are active now, wait till it’s below threshold
    2. Run a node by just starting a --onetime background run
    3. Sleep a second
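
The loop above can be sketched in a few lines of Ruby. This is not the mc-puppetd code: run_all and the two callables it takes for counting active runs and starting a run are hypothetical stand-ins for the real MCollective calls.

```ruby
# nodes         - list of discovered host names
# concurrency   - maximum number of simultaneous Puppet runs
# count_running - callable returning how many nodes are running now (stand-in)
# start_run     - callable that starts a background run on one node (stand-in)
def run_all(nodes, concurrency, count_running, start_run, pause = 1)
  nodes.sort.each do |node|
    # Wait until the number of active runs drops below the threshold.
    sleep(pause) while count_running.call >= concurrency

    start_run.call(node)
    sleep(pause)
  end
end

# Simulated collective: canned answers for the "how many are running" poll.
polls   = [0, 1, 2, 2, 1]
started = []

run_all(%w[dev2 dev1 dev3], 2,
        lambda { polls.empty? ? 0 : polls.shift },
        lambda { |node| started << node },
        0)

puts started.inspect
```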

This should churn through your nodes very quickly without overwhelming the resources of your master. You can see it in action here: it started 3 nodes, and by the time it got to the 4th, 3 were already running, so it waited for one of them to finish:

% mc-puppetd -W /dev_server/ runall 2
Thu Aug 05 17:47:21 +0100 2010> Running all machines with a concurrency of 2
Thu Aug 05 17:47:21 +0100 2010> Discovering hosts to run
Thu Aug 05 17:47:23 +0100 2010> Found 4 hosts
Thu Aug 05 17:47:24 +0100 2010> Running dev1.one.net, concurrency is 0
Thu Aug 05 17:47:26 +0100 2010> dev1.one.net schedule status: OK
Thu Aug 05 17:47:28 +0100 2010> Running dev1.two.net, concurrency is 1
Thu Aug 05 17:47:30 +0100 2010> dev1.two.net schedule status: OK
Thu Aug 05 17:47:32 +0100 2010> Running dev2.two.net, concurrency is 2
Thu Aug 05 17:47:34 +0100 2010> dev2.two.net schedule status: OK
Thu Aug 05 17:47:35 +0100 2010> Currently 3 nodes running, waiting
Thu Aug 05 17:48:00 +0100 2010> Running dev3.two.net, concurrency is 2
Thu Aug 05 17:48:05 +0100 2010> dev3.two.net schedule status: OK

This is integrated into the existing mc-puppetd client script, so you don’t need to roll out anything new to your servers, just the client side.

Using this to run each of 47 machines with a concurrency of just 4 I was able to complete a cycle in 8 minutes. It doesn’t sound too impressive, but my average run time is around 40 seconds on every node, with some taking 90 to 150 seconds. My puppetmaster server, which usually sits at a steady 0.2mbit out, was serving a constant 2mbit/sec for the duration of this run.


Monitoring ActiveMQ

I have a number of ActiveMQ servers, 7 in total: 3 in a network of brokers, the rest standalone. For MCollective I use topics extensively, so I don’t really need to monitor them much other than for availability. I also do a lot of queued work, though, where lots of machines put data in a queue and others process the data.

In the queue scenario you absolutely need to monitor queue sizes, memory usage and such. You also need to graph things like rates of messages, consumer counts and memory use. I am busy writing a number of Nagios and Cacti plugins to help with this; you can find them on GitHub.

To use these you need to have the ActiveMQ Statistics Plugin enabled.

First we need to monitor queue sizes:

$ check_activemq_queue.rb --host localhost --user nagios --password passw0rd --queue exim.stats --queue-warn 1000 --queue-crit 2000
OK: ActiveMQ exim.stats has 1 messages

This will connect to localhost and monitor the queue exim.stats, warning when it holds 1000 messages and going critical at 2000.
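
The threshold logic behind a check like this can be sketched as follows. This is an illustration rather than the plugin's code: queue_status is a hypothetical helper and the use of >= for the comparisons is an assumption. The exit codes are the standard Nagios ones.

```ruby
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2

# Returns [exit_code, message] for a queue depth.
# Comparing with >= is an assumption of this sketch.
def queue_status(queue, size, warn, crit)
  label, code =
    if size >= crit
      ["CRITICAL", CRITICAL]
    elsif size >= warn
      ["WARNING", WARNING]
    else
      ["OK", OK]
    end

  [code, "#{label}: ActiveMQ #{queue} has #{size} messages"]
end

code, message = queue_status("exim.stats", 1, 1000, 2000)
puts message
# A real plugin would finish with: exit(code)
```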

I still need to add the ability to monitor memory usage; this will come over the next few days.

I also have a plugin for Cacti. It can output stats for the broker as a whole and also for a specific queue. First the whole broker:

$ activemq-cacti-plugin.rb --host localhost --user nagios --password passw0rd --report broker
stomp+ssl:stomp+ssl storePercentUsage:81 size:5597 ssl:ssl vm:vm://web3 dataDirectory:/var/log/activemq/activemq-data dispatchCount:169533 brokerName:web3 openwire:tcp://web3:6166 storeUsage:869933776 memoryUsage:1564 tempUsage:0 averageEnqueueTime:1623.90502285799 enqueueCount:174080 minEnqueueTime:0.0 producerCount:0 memoryPercentUsage:0 tempLimit:104857600 messagesCached:0 consumerCount:2 memoryLimit:20971520 storeLimit:1073741824 inflightCount:9 dequeueCount:169525 brokerId:ID:web3-44651-1280002111036-0:0 tempPercentUsage:0 stomp:stomp://web3:6163 maxEnqueueTime:328585.0 expiredCount:0

Now a specific queue:

$ activemq-cacti-plugin.rb --host localhost --user nagios --password passw0rd --report exim.stats
size:0 dispatchCount:168951 memoryUsage:0 averageEnqueueTime:1629.42897052992 enqueueCount:168951 minEnqueueTime:0.0 consumerCount:1 producerCount:0 memoryPercentUsage:0 destinationName:queue://exim.stats messagesCached:0 memoryLimit:20971520 inflightCount:0 dequeueCount:168951 expiredCount:0 maxEnqueueTime:328585.0
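
If you want to post-process this output yourself, the name:value pairs are easy to pick apart. A minimal sketch, assuming the space-separated format shown above; parse_stats is a hypothetical helper and the sample line is shortened from the queue report:

```ruby
# Turn one line of plugin output into a Hash of name => value strings.
def parse_stats(line)
  line.split(" ").each_with_object({}) do |pair, stats|
    # Only split on the first colon: values such as queue://exim.stats
    # contain colons of their own.
    name, value = pair.split(":", 2)
    stats[name] = value
  end
end

sample = "size:0 dispatchCount:168951 consumerCount:1 " \
         "destinationName:queue://exim.stats maxEnqueueTime:328585.0"

stats = parse_stats(sample)
puts stats["destinationName"]
puts stats["dispatchCount"].to_i
```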

Grab the code on GitHub and follow there, I expect a few updates in the next few weeks.


EC2 Bootstrap Helper

I’ve been working a bit on streamlining the builds I do on EC2 and wanted a better way to provision my machines. I use CentOS and things are pretty rough to non existent for nicely built EC2 images. I’ve used the Rightscale ones till now and while they’re nice they are also full of lots of code copyrighted by Rightscale.

What I really wanted was something as full-featured as Ubuntu’s CloudInit, but I also didn’t feel much like touching any Python. I hacked up something that more or less does what I need. You can get it on GitHub. It’s written and tested on CentOS 5.5.

The idea is that you’ll have a single multi-purpose AMI that you can easily bootstrap onto your puppet/mcollective infrastructure using this system. See below for some details.

I prepare my base CentOS AMI with the following mods:

  • Install Facter and Puppet – but not enabled
  • Install the EC2 utilities
  • Setup the usual getsshkeys script
  • Install the ec2-boot-init RPM
  • Add a custom fact that reads /etc/facts.txt – see later why. Get one here.

With this in place you need to create some Ruby scripts that you will use to bootstrap your machines. Examples would be installing mcollective and configuring it to find your current activemq, or setting up puppet and doing your initial run.

We host these scripts on any webserver – ideally S3 – so that when a machine boots it can grab the logic you want to execute on it. This way you can bug fix your bootstrapping without having to make new AMIs as well as add new bootstrap methods in future to existing AMIs.

Here’s a simple example that just runs a shell command:

newaction("shell") do |cmd, ud, md, config|
    if cmd.include?(:command)
        system(cmd[:command])
    end
end

You want to host this on any webserver in a file called shell.rb. Now create a file list.txt in the same location that just has this:

shell.rb

You can list as many scripts as you want. Now when you boot your instance pass it data like this:

--- 
:facts: 
  role: webserver
:actions: 
- :url: http://your.net/path/to/actions/list.txt
  :type: :getactions
- :type: :shell
  :command: date > /tmp/test

The above will fetch the list of actions – our shell.rb – from http://your.net/path/to/actions/list.txt and then run using the shell action the command date > /tmp/test. The actions are run in order so you probably always want getactions to happen first.
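
To make the flow concrete, here is a rough sketch of how a registry of actions and an in-order dispatcher could fit together. It is not the ec2-boot-init implementation: ACTIONS, run_actions and the record action are hypothetical, and the hash stands in for the structure the YAML user data above would load into.

```ruby
# Registry of named actions (hypothetical, not the ec2-boot-init internals).
ACTIONS = {}

# Mirrors the newaction call shown above: store the block under its name.
def newaction(name, &block)
  ACTIONS[name.to_sym] = block
end

# Run each requested action in the order given in the user data.
def run_actions(user_data, meta_data = {}, config = {})
  user_data.fetch(:actions, []).map do |cmd|
    action = ACTIONS[cmd[:type]]
    if action
      action.call(cmd, user_data, meta_data, config)
      [cmd[:type], :ran]
    else
      [cmd[:type], :unknown]
    end
  end
end

# Harmless stand-in action that records its argument instead of running it.
LOG = []
newaction("record") do |cmd, ud, md, config|
  LOG << cmd[:command] if cmd.include?(:command)
end

user_data = {
  :facts   => { "role" => "webserver" },
  :actions => [
    { :type => :record, :command => "date > /tmp/test" },
    { :type => :missing }
  ]
}

results = run_actions(user_data)
puts results.inspect
puts LOG.inspect
```

Because actions are looked up at the moment they run, an action that fetches and registers more actions has to come first in the list, which matches the advice about getactions above.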

Other actions that this script will take:

  • Cache all the user and meta data in /var/spool/ec2boot
  • Create /etc/facts.txt with all your facts that you passed in as well as a flat version of the entire instance meta data.
  • Create a MOTD that shows some key data like AMI ID, Zone, Public and Private hostnames

The boot library provides a few helpers that help you write scripts for this environment specifically around fetching files and logging:

["rubygems-1.3.1-1.el5.noarch.rpm",
 "rubygem-stomp-1.1.6-1.el5.noarch.rpm",
 "mcollective-common-#{version}.el5.noarch.rpm",
 "mcollective-#{version}.el5.noarch.rpm",
 "server.cfg.templ"].each do |pkg|
    EC2Boot::Util.log("Fetching pkg #{pkg}")
    EC2Boot::Util.get_url("http://foo.s3.amazonaws.com/#{pkg}", "/mnt/#{pkg}")
end

This code fetches a bunch of files from an S3 bucket and saves them into /mnt. Each one gets logged to console and syslog. Using this GET helper has the advantage that it has sane retrying etc. built in for you already.
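
For readers wondering what built-in retrying might look like, here is a generic sketch. It does not reproduce EC2Boot::Util.get_url: with_retries is a hypothetical helper, and the failing block simulates a flaky download instead of touching the network.

```ruby
# Retry the given block up to `attempts` times, pausing between tries.
def with_retries(attempts = 3, pause = 1)
  tries = 0
  begin
    tries += 1
    yield tries
  rescue StandardError
    # Give up and re-raise once the attempts are used up.
    raise if tries >= attempts
    sleep(pause)
    retry
  end
end

# Simulated download that fails twice before succeeding.
calls  = 0
result = with_retries(5, 0) do |try|
  calls += 1
  raise IOError, "temporary failure" if try < 3
  "fetched on try #{try}"
end

puts result
```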

It’s fairly early days for this code, but it works and I am using it. I’ll probably be adding a few more features soon; let me know in the comments if you need anything specific or even if you find it useful.


Puppet resources on demand with MCollective

Some time ago I wrote about how to reuse Puppet providers in your Ruby scripts. I’ll take that a bit further here and show you how to create any kind of resource.

Puppet works based on resources and catalogs. A catalog is a collection of resources, and Puppet applies the catalog to a machine. So in order to do something you can do as before and call the type’s methods directly, but if you want to build up a resource and say ‘just do it’ then you need to go via a catalog.

Here’s some code. I don’t know if this is the best way to do it; I dug around the code for ralsh to figure this out:

params = { :name => "rip",
           :comment => "R.I.Pienaar",
           :password => '......' }
 
pup = Puppet::Type.type(:user).new(params)
 
catalog = Puppet::Resource::Catalog.new
catalog.add_resource pup
catalog.apply

That’s really simple and doesn’t require you to know much about the inner workings of a type. You’re just mapping the normal Puppet manifest to code and applying it. Nifty.

The natural progression – to me anyway – is to put this stuff into a MCollective agent and build a distributed ralsh.

Here’s a sample use case. I wanted to change my user’s password everywhere:

$ mc-rpc puppetral do type=user name=rip password='$1$xxx'

And that will go out, find all my machines and use the Puppet RAL to change my password for me. You can do anything Puppet can: manage /etc/hosts, add users, remove users, manage packages and services, and even use your own custom types. It all runs distributed and in parallel over any number of hosts.

Some other examples:

Add a user:

$ mc-rpc puppetral do type=user name=foo comment="Foo User" managehome=true

Run a command using exec, with the magical creates option:

$ mc-rpc puppetral do type=exec name="/bin/date > /tmp/date" user=root timeout=5 creates="/tmp/date"

Add an aliases entry:

$ mc-rpc puppetral do type=mailalias name=foo recipient="rip@devco.net" target="/etc/aliases"

Install a package:

$ mc-rpc puppetral do type=package name=unix2dos ensure=present
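
The key=value arguments in the commands above map naturally onto the params hash used in the catalog code earlier. A minimal sketch of that mapping, assuming a hypothetical parse_resource_args helper (the real agent may well do this differently):

```ruby
# Split key=value arguments into a resource type and a parameter hash.
def parse_resource_args(args)
  params = {}

  args.each do |arg|
    # Split on the first "=" only, since values may contain "=" themselves.
    key, value = arg.split("=", 2)
    raise ArgumentError, "expected key=value, got #{arg.inspect}" if value.nil?
    params[key.to_sym] = value
  end

  type = params.delete(:type)
  raise ArgumentError, "a type= argument is required" if type.nil?

  [type.to_sym, params]
end

type, params = parse_resource_args(
  ["type=exec", "name=/bin/date > /tmp/date", "user=root", "creates=/tmp/date"]
)

puts type.inspect
puts params.inspect

# On the node these would feed the catalog code shown earlier, for example:
#   Puppet::Type.type(type).new(params)
```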


Tutorial: Writing MCollective Agents

I’ve recorded a screencast that walks you through the process of developing a SimpleRPC agent, giving it a DDL and writing a simple client to communicate with it.

The tutorial creates a small echo agent that takes input and returns it unmodified. It validates that you are sending a string and includes a sample of dealing with intermittent failure.

Once you’ve watched this, or even during, you can use the following links as reference material: Writing Agents, Data Definition Language and Writing Clients.

You can view it directly on blip.tv which will hopefully be better quality.

I used a few VIM snippets during the demo to boilerplate the agent and DDL. You’ll find these in the ext/vim directory of the tarball for the upcoming 0.4.7 release; they are already on GitHub too.


Recent MCollective releases and roadmap.

I’ve had two successive Marionette Collective releases recently. I was hoping to have one big one, but I was waiting for the Stomp maintainers to do a release and it was taking a while.

Both are major feature releases. See lower down for a breakdown of it all.

We’re nearing feature completeness for the SimpleRPC layer as I am adding a number of features of interest to Enterprise and Large users especially around security and web UIs.

Once we’re at the end of this cycle I’ll do a 1.0.0 release and then from there move onto the next major feature cycle. The next cycle will focus on queuing long running tasks, background scheduling, future scheduling of tasks and a lot of related work. I posted some detail about these plans to the list recently.

Over the next few days or weeks I’ll do a number of screencasts exploring some of these new features in depth. For now, here is the list of what’s new:

Security

Connectivity

We can use Ruby Gem Stomp 1.1.6 which brings a lot of enhancements:

  • Connection pools for failover between multiple ActiveMQs
  • Lots of tunables about the connection pools such as retry frequencies etc
  • SSL/TLS between node and ActiveMQ

Writing Web and Dynamic UIs

  • A DDL that describes agents, inputs and outputs:
    • Creates auto generated documentation
    • Can be used to auto generate user interfaces
    • The client library will only make requests that validate against the DDL
    • In future input validations will move into the DDL and will be done automatically for you
  • Web UI’s can bypass or do their own discovery and use the DDL to auto generate user interfaces

Usability

  • Fire-and-forget style requests for when you just want something done but do not care about results. These requests are very quick as they do not do any discovery.
  • Agents can now be reloaded without restarting the daemon
  • A new mc-inventory tool that can be used to view facts, agents and classes for a node
  • Many UI enhancements to the CLI tools

MCollective pgrep

The unix pgrep utility is great: it lets you grep through your process list and find interesting things. I wanted to do something similar but for my entire server group, so I built something quick on top of MCollective.

I am using the Ruby sys-proctable gem to do the hard work. It returns a massive amount of information about each process, and I have written a simple agent on top of it.

The agent supports grepping the process tree but also supports kill and pgrep+kill, though I have not yet implemented more than the basic grep on the command line. Frankly the grep+kill combination scares me and I might remove it. A simple grep slip-up and you will kill all processes on all your machines :) Sometimes too much power is too much and should just be avoided.

At the moment mc-pgrep outputs a set format, but I intend to make that configurable on the command line. Here’s a sample:

% mc-pgrep -C /dev_server/ ruby
 
 * [ ============================================================> ] 4 / 4
 
dev1.my.com
       root   9833  ruby /usr/sbin/mcollectived --pid=/var/run/mcollectived.pid 
       root  21608  /usr/lib/ruby/gems/1.8/gems/passenger-2.2.2/lib/phusion_pass
 
dev2.my.com
       root  14568  /usr/lib/ruby/gems/1.8/gems/passenger-2.2.2/lib/phusion_pass
       root  31595  ruby /usr/sbin/mcollectived --pid=/var/run/mcollectived.pid 
 
dev3.my.com
       root   1620  /usr/lib/ruby/gems/1.8/gems/passenger-2.2.2/lib/phusion_pass
       root  14093  ruby /usr/sbin/mcollectived --pid=/var/run/mcollectived.pid 
 
dev4.my.com
       root   3231  /usr/lib/ruby/gems/1.8/gems/passenger-2.2.2/lib/phusion_pass
       root  20557  ruby /usr/sbin/mcollectived --pid=/var/run/mcollectived.pid 
 
   ---- process list stats ----
        Matched hosts: 4
    Matched processes: 8
        Resident Size: 37.264KB
         Virtual Size: 629.578MB

You can also limit it to only find zombies with the -z option.
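
The matching itself is simple. Here is a sketch of the idea using plain structs rather than the sys-proctable gem, so it runs anywhere; ProcInfo, its field names, the "Z" state marker and the pgrep helper are all assumptions for this illustration and not the agent's code.

```ruby
# Stand-in for the per-process records a process table library returns.
# The field names are assumptions made for this sketch.
ProcInfo = Struct.new(:pid, :user, :state, :cmdline)

# Return the processes whose command line matches the pattern,
# optionally limited to zombies (state "Z").
def pgrep(processes, pattern, zombies_only = false)
  regex = Regexp.new(pattern)

  processes.select do |process|
    next false if zombies_only && process.state != "Z"
    process.cmdline =~ regex
  end
end

table = [
  ProcInfo.new(9833,  "root", "S", "ruby /usr/sbin/mcollectived --pid=/var/run/mcollectived.pid"),
  ProcInfo.new(21608, "root", "S", "/usr/lib/ruby/gems/1.8/gems/passenger-2.2.2/lib/phusion_pass"),
  ProcInfo.new(4242,  "root", "Z", "defunct-worker"),
  ProcInfo.new(1,     "root", "S", "init")
]

puts pgrep(table, "ruby").map(&:pid).inspect
puts pgrep(table, ".", true).map(&:pid).inspect
```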

This has been quite interesting for me. If I limit the pgrep to “.” (the pattern is a regex), every machine will send back a Sys::ProcTable hash for all its processes. This is a 50 to 70 KByte payload per server. I’ve so far seen no problem getting this much traffic through ActiveMQ + MCollective and processing it all in a very short time:

% time mc-pgrep -F "country=/uk|us/" .
 
   ---- process list stats ----
        Matched hosts: 20
    Matched processes: 1958
        Resident Size: 1.777MB
         Virtual Size: 60.072GB
 
mc-pgrep -F "country=/uk|us/" .  0.19s user 0.06s system 7% cpu 3.420 total

That 3.4 seconds includes a 2 second discovery overhead, with the client machine in Germany and the filter matching UK and US machines all the way to the West Coast. My biggest delay here is the network and not MC or ActiveMQ.

The code can be found on my GitHub account and is still a bit of a work in progress; wiki pages will follow once I am happy with it.

And as an aside, I am slowly migrating at least my code to GitHub if not wiki and ticketing. So far my Plugins have moved, MC will move soon too.


Puppet Concat 20100507

I’ve had quite a lot of contributions to my Puppet Concat module and after some testing by various people I’m ready to do a new release.

Thanks to Paul Elliot, Chad Netzer and David Schmitt for patches and assistance.

For background of what this is about please see my earlier post: Building files from fragments with Puppet

You can download the release here. Please pay special attention to the upgrade instructions below.

Changes in this release

  • Several robustness improvements to the helper shell script.
  • Removed all hard coded paths in the helper script to improve portability.
  • We now use file{} to copy the combined file to its location. This means you can now change the ownership of a file by just changing the owner/group in concat{}.
  • You can specify ensure => "/some/other/file" in concat::fragment to include the contents of another file in the fragment, even files not managed by Puppet.
  • The code is now hosted on Github and we’ll accept patches there.

Upgrading

When upgrading to this version you need to take particular care. All the fragments are now owned by root, the shell script runs as root and we use file{} to copy the resulting file out.

This means you’ll see the diff of not just the fragments but also the final file when running puppetd --test. Unfortunately it also means that the first time you run Puppet with the new code, it will fire off all the notifies that you have on your concat{} resources. You’ll also see a lot of changes to resources in the fragments directory on the first run. This is normal and expected behavior.

So if, say, you’re using concat to create my.cnf and notify the service to restart automatically, then simply upgrading this module will result in MySQL restarting. This is a one-off notify that happens only the first time; from then on it will be as normal. I’d suggest disabling those notifies when upgrading until the upgrade is running everywhere, and then putting them back.


Authorization plugins for MCollective SimpleRPC

Till now The Marionette Collective has relied on your middleware to provide all authorization and authentication for requests. You’re able to restrict certain middleware users from certain agents, but nothing more fine grained.

In many cases you want to provide much finer-grained control over who can do what. Some cases could be:

  • A certain user can only request service restarts on machines with a fact customer=acme
  • A user can do any service restart but only on machines that have a certain configuration management class
  • You want to deny all users except root from being able to stop services, others can still restart and start them

This kind of thing is required for large infrastructures with lots of admins all working in their own group of machines, where perhaps a central NOC needs to be able to work on all the machines. You need fine-grained control over who can do what, and we did not have this till now. It would also be needed if you wanted to give clients control over their own servers but not others.

Version 0.4.5 will have support for this kind of scheme for SimpleRPC agents. We won’t provide an authorization plugin out of the box with the core distribution, but I’ve made one which will be available as a plugin.

So how would you write an auth plugin? First, a typical agent would be:

module MCollective
    module Agent
         class Service<RPC::Agent
             authorized_by :action_policy
 
             # ....
         end
    end
end

The new authorized_by keyword tells MCollective to use the class MCollective::Util::ActionPolicy to do any authorization on this agent.

The ActionPolicy class can be pretty simple: if it raises any kind of exception the action will be denied.

module MCollective
    module Util
         class ActionPolicy
              def self.authorize(request)
                  unless request.caller == "uid=500"
                      raise("You are not allowed access to #{request.agent}::#{request.action}")
                  end
              end
         end
    end
end

This simple check will deny all requests from anyone but Unix user id 500.

It’s pretty simple to come up with your own schemes. I wrote one that allows you to make policy files like the one below for the service agent:

policy default deny
allow   uid=500 *                    *                *
allow   uid=502 status               *                *
allow   uid=600 *                    customer=acme    acme::devserver

This will allow user 500 to do everything with the service agent. User 502 can get the status of any service on any node. User 600 will be able to do any actions on machines with the fact customer=acme that also has the configuration management class acme::devserver on them. Everything else will be denied.
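
To show how little logic such a policy needs, here is a small evaluator for the file above. It is a sketch and not the actual plugin: Rule, parse_policy and allowed? are hypothetical names, columns are assumed to be whitespace separated with a single fact and a single class per rule, and the first matching rule is assumed to decide the outcome.

```ruby
Rule = Struct.new(:verdict, :caller_id, :actions, :facts, :classes)

# Parse policy text into a default verdict plus an ordered list of rules.
def parse_policy(text)
  policy = { :default => :deny, :rules => [] }

  text.each_line do |line|
    line = line.strip
    next if line.empty?

    if line =~ /\Apolicy\s+default\s+(allow|deny)\z/
      policy[:default] = $1.to_sym
    else
      verdict, *fields = line.split(/\s+/)
      policy[:rules] << Rule.new(verdict.to_sym, *fields)
    end
  end

  policy
end

# "*" matches anything, otherwise the pattern must be among the values.
def field_matches?(pattern, values)
  pattern == "*" || values.include?(pattern)
end

# First matching rule decides; with no match the default applies (assumption).
def allowed?(policy, caller_id, action, facts, classes)
  fact_pairs = facts.map { |name, value| "#{name}=#{value}" }

  rule = policy[:rules].find do |r|
    r.caller_id == caller_id &&
      field_matches?(r.actions, [action]) &&
      field_matches?(r.facts, fact_pairs) &&
      field_matches?(r.classes, classes)
  end

  (rule ? rule.verdict : policy[:default]) == :allow
end

policy = parse_policy(<<POLICY)
policy default deny
allow   uid=500 *                    *                *
allow   uid=502 status               *                *
allow   uid=600 *                    customer=acme    acme::devserver
POLICY

puts allowed?(policy, "uid=502", "status", {}, [])
puts allowed?(policy, "uid=502", "restart", {}, [])
```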

You can do multiple facts and multiple classes in a simple space separated list. The entire plugin to implement such policy controls was only 120 – heavily commented – lines of code.

I think this is an elegant and easy to use layer that provides a lot of functionality. We might in future pass more information about the caller to the nodes. There are some limitations, specifically that the caller information is essentially user provided, so you need to keep that in mind.

As mentioned this will be in MCollective 0.4.5.
