R.I.Pienaar

Choria Update

02/12/2017

Recently at Config Management Camp I had many discussions about Orchestration, Playbooks and Choria, so I thought it’s time for another update on its status.

I am nearing version 1.0.0; there are a few things to deal with but it’s getting close. Foremost I wanted to get the project its own space on all the various locations like GitHub, Forge, etc.

Inevitably this means getting a logo. It’s been a bit of a slog, but after working through loads of feedback on Twitter and offers of assistance from various companies I decided to go to a private designer called Isaac Durazo and the outcome can be seen below:


 

The process of getting the logo was quite interesting and I am really pleased with the outcome, I’ll blog about that separately.

Other than the logo, the project now has its own GitHub organisation at https://github.com/choria-io and I have moved all the forge modules to their own space as well at https://forge.puppet.com/choria.

There are various other places the logo shows up, like in the Slack notifications and so forth.

On the project front there are a few improvements:

  • There is now a registration plugin that records a bunch of internal stats on disk, the aim is for them to be read by Collectd and Sensu
  • A new Auditing plugin that emits JSON structured data
  • Several new Data Stores for Playbooks – files, environment.
  • Bug fixes on Windows
  • All the modules, plugins etc have moved to the Choria Forge and GitHub
  • Quite extensive documentation site updates including branding with the logo and logo colors.

There are now very few things left to do to get 1.0.0 out, but I guess another release or two will be done before then.

So from now on, to update to coming versions you need to use the choria/mcollective_choria module, which will pull in all its dependencies from the Choria project rather than my own Forge.

Still no progress on moving the actual MCollective project forward but I’ve discussed a way to deal with forking the various projects in a way that seems to work for what I want to achieve. In reality I’ll only have time to do that in a couple of months so hopefully something positive will happen in the meantime.

Head over to Choria.io to take a look.

Choria Playbooks – Data Sources

01/23/2017

About a month ago I blogged about Choria Playbooks – a way to write series of actions like MCollective, Shell, Slack, Web Hooks and others – contained within a YAML script with inputs, node sets and more.

Since then I added quite a few tweaks, features and docs; it’s well worth a visit to choria.io to check it out.

Today I want to blog about a major new integration I did into them and a major step towards version 1 for Choria.

Overview


In the context of a playbook, or even a script calling out to other systems, there are many reasons to have a Data Source. A playbook designed to manage distributed systems has some special needs of its Data Source, needs that tools like Consul and etcd specifically fulfil.

So today I released version 0.0.20 of Choria that includes a Memory and a Consul Data Source; below I will show how these integrate into the Playbooks.

I think using a distributed data store is important in this context, rather than expecting to pass variables from the Playbook around like on the CLI, because the store handles the business of consistency, locking and so forth. I can’t know all the systems you wish to interact with, but if those can speak to Consul you can prepare an execution environment for them.

For those who don’t agree there is a memory Data Store that exists within the memory of the Playbook. Your playbook should remain the same apart from declaring the Data Source.

Using Consul


Defining a Data Source


Like with Node Sets you can have multiple Data Sources and they are identified by name:

data_stores:
  pb_data:
    type: consul
    timeout: 360
    ttl: 20

This creates a Consul Data Source called pb_data; you need to have a local Consul Agent already set up. I’ll cover the timeout and ttl a bit later.

Playbook Locks


You can create locks in Consul and by their nature they are distributed across the Consul network. This means you can ensure a playbook can only be executed once per Consul DC. By giving a custom lock name you can extend that to any group of related playbooks, or even to other systems that can make Consul locks.

---
locks:
  - pb_data
  - pb_data/custom_lock

This will create 2 locks in the pb_data Data Store – one called custom_lock and another called choria/locks/playbook/pb_name where pb_name is the name from the metadata.
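
As a rough illustration of that naming rule (this is not Choria’s own code), a lock entry can be read as a Data Store name followed by an optional lock key. The helper name resolve_lock and its playbook_name argument are made up for this sketch:

```python
def resolve_lock(entry, playbook_name):
    """Split a lock entry into (data_store, lock_key).

    Hypothetical helper that mirrors the naming rule described in the
    text; it is not part of Choria.
    """
    store, _, key = entry.partition("/")
    if not key:
        # Bare Data Store name: use the per-playbook default lock name.
        key = "choria/locks/playbook/" + playbook_name
    return store, key


print(resolve_lock("pb_data", "pb_name"))
print(resolve_lock("pb_data/custom_lock", "pb_name"))
```

The first call yields the default per-playbook lock, the second the custom one.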

It will try to acquire a lock for up to timeout seconds (360 here); if it can’t, the playbook run fails. The associated session has a TTL of 20 seconds and Choria will renew the session around 5 seconds before the TTL expires.

The TTL will ensure that should the playbook die, crash, machine die or whatever, the lock will release after 20 seconds.
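
To make the interplay of timeout and ttl concrete, here is a small in-memory simulation. It is only a sketch: TtlLock and acquire_with_timeout are invented names, time is a plain counter of seconds rather than a real clock, and none of this is the Consul or Choria API.

```python
class TtlLock:
    """In-memory stand-in for a session-backed lock (not the Consul API)."""

    def __init__(self, ttl):
        self.ttl = ttl
        self.owner = None
        self.expires_at = 0

    def acquire(self, owner, now):
        # The lock is free if nobody holds it or the holder's session lapsed.
        if self.owner is None or now >= self.expires_at:
            self.owner = owner
            self.expires_at = now + self.ttl
            return True
        return False

    def renew(self, owner, now):
        # Only a holder whose session is still alive can extend it.
        if self.owner == owner and now < self.expires_at:
            self.expires_at = now + self.ttl
            return True
        return False


def acquire_with_timeout(lock, owner, start, timeout, step=1):
    """Retry every `step` seconds for up to `timeout` seconds.

    Returns the simulated time at which the lock was acquired, or None.
    """
    now = start
    while now <= start + timeout:
        if lock.acquire(owner, now):
            return now
        now += step
    return None


lock = TtlLock(ttl=20)
lock.acquire("playbook-a", now=0)   # A holds the lock until t=20
lock.renew("playbook-a", now=15)    # renewed 5s before expiry, now valid until t=35
# A crashes here and stops renewing; B keeps trying for up to 360 seconds.
print(acquire_with_timeout(lock, "playbook-b", start=16, timeout=360))  # 35
```

Playbook B gets the lock at t=35, which is 20 seconds after A’s last renewal.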

Binding Variables


Playbooks already have a way to bind CLI arguments to variables called Inputs. Data Sources extend inputs with extra capabilities.

We now have two types of Input. A static input is one where you give the data on the CLI and the data stays static for the life of the playbook. A dynamic input is one bound against a Data Source and the value of it is fetched every time you reference the variable.

inputs:
  cluster:
    description: "Cluster to deploy"
    type: "String"
    required: true
    data: "pb_data/choria/kv/cluster"
    default: "alpha"

Here we have an input called cluster bound to the choria/kv/cluster key in Consul. This starts life as a static input and if you give this value on the CLI it will never use the Data Source.

If however you do not specify a CLI value it becomes dynamic and will consult Consul. If there’s no such key in Consul the default is used, but the input remains dynamic and will continue to consult Consul on every access.

You can force an input to be dynamic using the dynamic: true property on the Input; it will then not show up on the CLI and will only speak to a data source.
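
The lookup order described above can be sketched roughly like this. The Input class is a toy model written for this post, a plain dict stands in for the Data Source, and treating dynamic: true as "ignore any CLI value" is my simplification of "does not show up on the CLI":

```python
class Input:
    """Toy model of a playbook input (hypothetical, not Choria's implementation)."""

    def __init__(self, store, key, default=None, cli_value=None, dynamic=False):
        self.store = store      # dict standing in for a Data Source
        self.key = key
        self.default = default
        # A forced-dynamic input never takes a value from the CLI.
        self.cli_value = None if dynamic else cli_value

    @property
    def value(self):
        if self.cli_value is not None:
            return self.cli_value   # static: fixed for the life of the playbook
        # dynamic: consult the store on every access, fall back to the default
        return self.store.get(self.key, self.default)


store = {}
cluster = Input(store, "choria/kv/cluster", default="alpha")
print(cluster.value)                    # alpha
store["choria/kv/cluster"] = "bravo"
print(cluster.value)                    # bravo
```

Because the value is looked up on every access, a write to the store between two reads changes what the same input returns.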

Writing and Deleting Data


Of course if you can read data you should be able to write and delete it; I’ve added tasks to let you do this:

locks:
  - pb_data
 
inputs:
  cluster:
    description: "Cluster to deploy"
    type: "String"
    required: true
    data: "pb_data/choria/kv/cluster"
    default: "alpha"
    validation: ":shellsafe"
 
hooks:
  pre_book:
    - data:
        action: "delete"
        key: "pb_data/choria/kv/cluster"
 
tasks:
  - shell:
      description: Deploy to cluster {{{ inputs.cluster }}}
      command: /path/to/script --cluster {{{ inputs.cluster }}}
 
  - data:
      action: "write"
      value: "bravo"
      key: "pb_data/choria/kv/cluster"
 
  - shell:
      description: Deploy to cluster {{{ inputs.cluster }}}
      command: /path/to/script --cluster {{{ inputs.cluster }}}

Here I have a pre_book task list that ensures there is no stale data; the lock ensures no other Playbook will mess around with the data while we run.

I then run a shell command that uses the cluster input. With nothing there it uses the default and so deploys cluster alpha; it then writes a new value and deploys cluster bravo.

This is a bit verbose. I hope to add the ability to have arbitrarily named task lists that you can branch to; then you can have 1 deploy task list and use the main task list to set up variables for it and call it repeatedly.

Conclusion


That’s quite a mouthful, but the possibilities of this are quite amazing. On one hand we have a really versatile data store in the Playbooks, but more significantly we have expanded the integration possibilities by quite a bit: you can now have other systems manage the environment your playbooks run in.

I will soon add task level locks and of course Node Set integration.

For now only Consul and Memory are supported; I can add others if there is demand.

Choria Playbooks

12/26/2016

Today I am very pleased to release something I’ve been thinking about for years and actively working on since August.

After many POCs and thrown away attempts at this over the years I am finally releasing a Playbook system that lets you run workflows on your MCollective network. It can integrate with a near endless set of remote services in addition to your MCollective to create a multi service playbook system.

This is an early release with only a few integrations but I think it’s already useful and I’m looking for feedback and integrations to build this into something really powerful for the Puppet ecosystem.

The full docs can be found on the Choria Website, but below you can get some details.

Overview


Today playbooks are basic YAML files. Eventually I envision a Service to execute playbooks on your behalf, but today you just run them in your shell, so they are pure data.

Playbooks have a basic flow that is more or less like this:

  1. Discover named Node Sets
  2. Validate the named Node Sets meet expectations such as reachability and versions of software available on them
  3. Run a pre_book task list that lets you do prep work
  4. Run the main tasks task list where you do your work, around every task certain hook lists can be run
  5. Run either the on_success or on_fail task list for notifications to Slack etc
  6. Run the post_book task list for cleanups etc
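
The ordering of the task lists in steps 3 to 6 can be sketched as below. This is only an illustration with made-up names: the playbook is a plain dict, run_task is any callable returning True or False, Node Set discovery and validation (steps 1 and 2) and the per-task hooks are left out, and I am assuming that a pre_book failure counts as a failed run and that post_book always runs.

```python
def run_playbook(playbook, run_task):
    """Run the task lists in the order described above; return True on success.

    Hypothetical sketch, not Choria's implementation.
    """
    def run_list(name):
        # all() stops at the first failing task in the list
        return all(run_task(task) for task in playbook.get(name, []))

    try:
        ok = run_list("pre_book") and run_list("tasks")
        run_list("on_success" if ok else "on_fail")
        return ok
    finally:
        # Assumption: cleanup runs whether or not the main tasks passed.
        run_list("post_book")


log = []

def run_task(task):
    log.append(task)
    return task != "deploy"     # pretend the deploy task fails

playbook = {
    "pre_book": ["prep"],
    "tasks": ["deploy", "verify"],
    "on_success": ["notify ok"],
    "on_fail": ["notify failure"],
    "post_book": ["cleanup"],
}
print(run_playbook(playbook, run_task), log)
```

Here the failing deploy task stops the main list, so verify never runs, the on_fail list is used and cleanup still happens.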

Today a task can be an MCollective request, a shell script or a Slack notification. I imagine this list will grow huge: I am thinking you will want to ping webhooks, or interact with Razor to provision machines and wait for them to finish building, run Terraform or make EC2 API requests. The list of potential integrations is endless and you can use any task in any of the above task lists.

A Node Set is simply a named set of nodes; in MCollective that would be certnames of nodes but the playbook system itself is not limited to that. Today Node Sets can be resolved from MCollective Discovery, PQL Queries (PuppetDB), YAML files with groups of nodes in them or a shell command. Again the list of integrations that make sense here is huge. I imagine querying PE or Foreman for node groups, querying etcd or Consul for service members. Talking to random REST services that return node lists or DB queries. Imagine using Terraform outputs as Node Set sources or EC2 API queries.

In cases where you wish to manage nodes via MCollective but you are using a cached discovery source you can ask for node sets to be tested for reachability over MCollective. Node sets that need certain MCollective agents can express this as SemVer version ranges and the valid network state will be asserted before any playbook is run.

Playbooks do not have a pseudo programming language in them though I am not against the idea. I do not anticipate YAML to be the end format of playbooks but it’s good enough for today.

Example


I’ll show an example here of what I think you will be able to achieve using these Playbooks.

Here we have a web stack and we want to do Blue/Green deploys against it; the nodes in each sub cluster have a fact called cluster. The deploy process for a cluster is:

  • Gather input from the user such as cluster to deploy and revision of the app to deploy
  • Discover the Haproxy node using Node Set discovery from PQL queries
  • Discover the Web Servers in a particular cluster using Node Set discovery from PQL queries
  • Verify the Haproxy nodes and Web Servers are reachable and running the versions of agents we need
  • Upgrade the specific web tier using:
    1. Tell the ops room on slack we are about to upgrade the cluster
    2. Disable puppet on the webservers
    3. Wait for any running puppet runs to stop
    4. Disable the nodes on a particular haproxy backend
    5. Upgrade the apps on the servers using appmgr#upgrade to the input revision
    6. Do up to 10 NRPE checks post upgrade with 30 seconds between checks to ensure the load average is GREEN; you’d use a better check here, something app specific
    7. Enable the nodes in haproxy once NRPE checks pass
    8. Fetch and display the status of the deployed app – like what version is there now
    9. Enable Puppet
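
Step 6 above is a bounded retry loop. A minimal sketch of that pattern, with the invented name wait_for_check and any zero-argument callable standing in for the NRPE check, could look like this:

```python
import time


def wait_for_check(check, tries=10, delay=30, sleep=time.sleep):
    """Run `check` until it returns True, at most `tries` times.

    Sleeps `delay` seconds between attempts, but not after the last one.
    Returns the number of attempts used, or None if every attempt failed.
    Hypothetical helper, not a Choria task.
    """
    for attempt in range(1, tries + 1):
        if check():
            return attempt
        if attempt < tries:
            sleep(delay)
    return None


# Pretend the check fails twice and then passes; record the waits
# instead of really sleeping.
results = iter([False, False, True])
waits = []
print(wait_for_check(lambda: next(results), sleep=waits.append), waits)  # 3 [30, 30]
```

Passing the sleep function in as an argument keeps the example quick to run; with the default it would really wait 30 seconds between checks.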

Should the task list FAIL we run these tasks:

  1. Call a webhook on AWS Lambda
  2. Tell the ops room on slack
  3. Run a whole other playbook called deploy_failure_handler with the same parameters

Should the task list PASS we run these tasks:

  1. Call a webhook on AWS Lambda
  2. Tell the ops room on slack

This example and sample playbooks etc can be found on the Choria Site.

Status


Above is the eventual goal. Today the major missing piece is that I think MCollective needs to be extended with the ability for Agent plugins to deliver a Macro plugin. A macro might be something like Puppet.wait_till_idle(:timeout => 600); this would be something you call after disabling the nodes when you want to be sure Puppet is making no more changes. You can see the workflow above needs this.

There are no such Macros today. I will add a stop gap solution as a task that waits for a certain condition, but adding Macros to MCollective is high on my todo list.

Other than that it works. There is no web service yet so you run them from the CLI, and the integrations listed above are all that exist. They are quite easy to write, so I’m hoping some early adopters will either give me ideas or send PRs!

This is available today if you upgrade to version 0.0.12 of the ripienaar-mcollective_choria module.

See the Choria Website for many more details on this feature and a detailed roadmap.

UPDATE: Since posting this blog I had some time and added: Terraform Node Sets, ability to create GET and POST Webhook requests and the much needed ability to assert and wait for remote state.

An update on my Choria project

12/13/2016

Some time ago I mentioned that I am working on improving the MCollective Deployment story.

I started a project called Choria that aimed to massively improve the deployment UX and yield a secure and stable MCollective setup for those using Puppet 4.

The aim is to make installation quick and secure. Towards that, it seems a common end to end install from scratch by someone new to the project using a clustered NATS setup can take less than an hour, which is a huge improvement.

Further I’ve had really good user feedback, especially around NATS. One user reports 2000 nodes on a single NATS server consuming 300MB RAM and it being very performant, much more so than the previous setup.

It’s been a few months, this is what’s changed:

  • The module now supports every OS AIO Puppet supports, including Windows.
  • Documentation is available on choria.io, installation should take about an hour max.
  • The PQL language can now be used to do completely custom infrastructure discovery against PuppetDB.
  • Many bugs have been fixed, many things have been streamlined and made easier to get going with better defaults.
  • Event Machine is not needed anymore.
  • A number of POC projects have been done to flesh out next steps, things like a very capable playbook system and a revisit to the generic RPC client, these are on GitHub issues.

Meanwhile I am still trying to get to a point where I can take over maintenance of MCollective again, at first Puppet Inc was very open to the idea but I am afraid it’s been 7 months and it’s getting nowhere, calls for cooperation are just being ignored. Unfortunately I think we’re getting pretty close to a fork being the only productive next step.

For now though, I’d say the Choria plugin set is production ready and stable. Anyone using Puppet 4 AIO should consider using these; it’s about the only working way to get MCollective on FOSS Puppet now due to the state of the other installation options.

Starting a newsletter

11/08/2016

I share a ton of links, I have a thing that harvests my twitter account and over a few years it has collected over 4 500 links. Twitter is great for that but also it’s hard to add a bit of annotation or comment to those shared links in the constrained space. Of course I also curate my Free for Dev list which has grown quite huge, so I come across a lot of interesting stuff.

Everyone these days seems to have a newsletter and I’ve toyed with the idea for a long time. The idea would be to share the random things I come across, no real theme, goal or set frequency, just a list of stuff I find with some short comments whenever I feel I have a good selection of things to share. I could stick them up here as blog posts but I don’t want one of those sad blogs where the entire frontpage is weekly link lists 😉

Well I’ve decided to give it a go, I signed up with Mail Chimp and set something up there. Last week I tweeted twice about it and tried to get some early adopters on board for the first mail in order to get some feedback and so forth. This has been quite positive; thanks a lot to those who took the time to send me a note.

So if you’d like to subscribe please head over to https://www.devco.net/ and sign up at the bar above, it’s the usual Mail Chimp thing so you’ll get a confirmation email to double opt-in with. If you want to see what you’re getting yourself into the archive can be found at Mail Chimp, there is one mail there already.

The next one should go out tomorrow or so, looking forward to trying this out!
