A Look At MCollective 2.0.0 and Beyond

It’s been a long time since I wrote any blog posts about MCollective, I’ll be rectifying that by writing a big series of blog posts over the next weeks or months.

MCollective 2.0 was recently released and it represents a massive internal restructure and improvement cycle. In 2.0 not a lof of the new functionality is visible immediately on the command line but the infrastructure now exist to innovate quite rapidly in areas of resource discovery and management. The API has had a lot of new capabilities added that allows MCollective to be used in many new use cases as well as improving on some older ones.

Networking and Addressing has been completely rewritten and reinvented to be both more powerful and more generic. You can now use MCollective in ways that were previously not possible or unsuitable for certain use cases, it is even more performant and more pluggable. Other parts of the ecosystem like ActiveMQ and the STOMP protocol has had major improvements and MCollective is utilising these improvements to further its capabilities.

The process of exposing new features based on this infrastructure rewrite to the end user has now started. Puppet Labs have recently released version 2.1.0 which is the first in a new development cycle and this release have hugely improved the capabilities of the discovery system – you can now literally discover against any conceivable source of data on either the client side or out on your network or a mix of both. You can choose when you want current network conditions to be your source of truth or supply the source of truth from any data source you might have. In addition an entirely new style of addressing and message delivery has been introduced that creates many new usage opportunities.

The development pace of MCollective has taken a big leap forward, I am now full time employed by Puppet Labs and working on MCollective. Future development is secure and the team behind is growing as we look at expending it’s feature set.

I’ll start with a bit of a refresher about MCollective for those new to it or those who looked in the past at but maybe want to come back for another look. In the coming weeks I’ll follow up with a deeper look into some of the aspects highlighted below and also the new features introduced since 2.0.0 came out.

Version 2.0 represents a revolutionary change to MCollective so there is lots of ground to cover each blog post in the series will focus on one aspect of the new features and capabilities.

The Problem

Modern systems management has moved on from just managing machines with some reasonably limited set of software on them to being a really large challenge in integrating many different systems together. More and more the kinds of applications we are required to support are made up of many internal components spread across 100s of machines in ever increasing complexity. We are now Integration Experts above all – integrate legacy systems with cloud ones, manage hi-brid public and private clouds, integrate external APIs with in house developed software and often using cutting edge technologies that tend to be very volatile. Today we might be building our infrastructure on some new technology that does not exist tomorrow.

Worse still the days of having a carefully crafted network that’s a known entity with individually tuned BIOS settings and hand compiled kernels is now in the distant past. Instead we have machines being created on demand and shutdown when the demand for their resources have passed. Yet we still need to be able to manage them, monitor them and configure them. The reality of a platform where at some point of the day it can be 200 nodes big and later on the same day it can be 50 nodes has invalidated many if not most of our trusted technologies like monitoring, management, dns and even asset tracking.

Developers have had tools that allow them to cope with this ever changing landscape by abstracting the communications between 2 entities via a well defined interface. Using an interface to define a communications contract between component A and component B means if we later wish to swap out B for C that if we’re able to create a wrapper around C that complies to the published interface we’ll be able to contain the fallout from a major technology change. They’ve had more dynamic service registries that’s much more capable of coping with change or failure than the older rigid approach to IT management afforded.

Systems Administrators has some of this in that most of our protocols are defined in RFCs and we can generally assume that it would be feasible to swap one SMTP server for another. But what about the management of the actual mail server software in question? You would have dashboards, monitoring, queue management, alerting on message rates, trend analysis to assist in capacity planning. You would have APIs to create new domains, users or mail boxes in the mail system often accessed directly by frontend web dashboards accessible by end users. You would expose all or some of these to various parts of your business such as your NOC, Systems Operators and other technical people who have a stake in the mail systems.

The cost of changing your SMTP server is in fact huge and the fact that the old and new server both speak SMTP is just a small part of the equation as all your monitoring, management capability and integration with other systems will need to be redeveloped often resulting in changes in how you manage those systems leading to retraining of staff and a cycle of higher than expected rate of human error. The situation is much worse if you had to run a heterogeneous environment made up of SMTP software from multiple vendors.

In very complex environments where many subsystems and teams would interact with the mail system you might find yourself with a large mixture of Authentication Methods, Protocols, User Identities, Auditing and Authorization – if you’re lucky to have them at all. You might end up with a plethora of systems from front-end web applications to NOCs or even mobile workforce all having some form of privileged access to the systems under management – often point to point requiring careful configuration management. Managing this big collection of AAA methods and network ACL controls is very complex often leading to environments with weak AAA management that are almost impossible to make compliant to systems like PCI or SOX.

A Possible Solution

One part of a solution to these problems is a strong integration framework. One that provides always present yet pluggable AAA. One that lets you easily add new capabilities to the network in a way that is done via a contract between the various systems enabling networks made up of heterogeneous software stacks. One where interacting with these capabilities can be done with ease from the CLI, Web or other mediums and that remains consistent in UX as your needs change or expand.

You need novel ways to address your machines that are both dynamic and rigid when appropriate. You need a platform thats reactive to change, stable yet capable of operating sanely in degraded networking conditions. You need a framework that’s capable of doing the simplest task on a remote node such as running a single command to being the platform you might use to build a cloud controller for your own PAAS.

MCollective is such an framework aimed at the Systems Integrator. It’s used by people just interacting with it on a web UI to do defined tasks to commercial PAAS vendors using it as the basis of their cloud management. There are private clouds built using MCollective and libvirt that manages 100s of dom0s controlling many more virtual machines. It’s used in many industries solving a wide range of integration problems.

The demos you might have seen have usually been focussed on CLI based command and control but it’s more than that – CLIs are easy to demo, long running background orchestration of communication between software subsystems is much harder to demo. As a command and control channel for the CLI MCollective shines and is a pleasure to use but MCollective is an integration framework that has all the components you might find in larger enterprise integration systems, these include:

Resource discovery and addressing
Flexible registration system capable of building CMDBs, Asset Systems or any kind of resource tracker
Contract based interfaces between client and servers
Strong introspective abilities to facilitate generic user interfaces
Strong input validation on both clients and servers for maximum protection
Pluggable AAA that allows you to change or upgrade your security independant of your code
Overlay networking based on Message Orientated Middleware where no 2 components require direct point to point communications
Industry standard security via standard SSL as delivered by OpenSSL based on published protocols like STOMP and TLS
Auto generating documentation
Auto generating packaging for system components that’s versioned and managed using your OS capabilities without reinventing packaging into yet another package format
Auto generating code using generators to promote a single consistant approach to designing network components

MCollective is built as distributed system utilising Message Orientated Middleware. It presents a Remote Procedure Call based interface between your code and the network. Unlike other RPC systems it’s a parallel RPC system where a single RPC call can affect one or many nodes at nearly the same time affording great scale, performance and flexibility – while still maintaining a more traditional rolling request cycle approach.

Distributed systems are hard, designing software to be hosted across a large number of hosts is difficult. MCollective provides a series of standards, conventions and enforced relationships that when embraced allow you to rapidly write code and run it across your network. Code that do not need to be aware of the complexities of AAA, addressing, network protocols or where clients are connecting from – these are handled by layers around your code.

MCollective specifically is designed for your productivity and joy – these are the ever present benchmarks every feature is put against before merging. It uses the Ruby language that’s very expressive and easy to pick up. It has a bunch of built in error handling that tend to do just the right thing and when used correctly you will almost never need to write a user interface – but when you do need custom user interfaces it provides a easy to use approach for doing so full of helpers and convention to make it easy to create a consistant experience for your software.

How to design interaction between loosely coupled systems is often a question people struggle with, MCollective provides a single way to design the components and provides a generic way to interact with those components. This means as a Systems Integrator you can focus on your task at hand and not be sucked into the complexities of designing message passing, serialization and other esoteric components of distributed systems. But it does not restrict you to the choices we made as framework developers as almost every possible components of MCollective is pluggable from network transport, encryption systems, AAA, serializers and even the entire RPC system can be replaced or complimented by different one that meets your needs.

The code base is meticulously written to be friendly, obvious and welcoming to newcomers to this style of programming or even the Ruby language. The style is consistant throughout, the code is laid out in a very obvious manner and commented where needed. You should not have a problem just reading the code base to learn how it works under the hood. Where possible we avoid meta programming and techniques that distract from the readability of the code. This coding style is a specific goal and required for this kind of software and an aspect we get complimented on weekly.

You can now find it pre-packaged in various distributions such as Ubuntu, Fedora and RHEL via EPEL. It’s known to run on many platforms and different versions of Ruby and has even been embedded into Java systems or ran on iPhones.

Posts in this series

This series is still being written, posts will be added here as they get written:

A Look At MCollective 2.0.0 and Beyond

The Problem

A Possible Solution

Posts in this series

Licence