
Moving a service from Puppet to Docker

I’ve moved a number of my more complex infrastructure components from being Puppet managed to being Docker managed. There are many reasons for this, the main one being that my Puppet code is ancient and, faced with a rewrite to make it Puppet 4 ready or a chance to rethink things, I’m leaning towards rethinking. I don’t think CM is solving the right problem for certain aspects of my infrastructure, and new approaches can bring more value for my use case.

There are a lot of posts around talking about Docker that concentrate on the image building side of it or just the running of a container – which I find quite uninteresting and in fact pretty terrible. The real benefit for me comes from the workflow, the API, the events out of the daemon and the container stats. People look at the image and container aspects in isolation and go on about how this is not new technology, but that’s missing the point.

Mainly a workflow problem

I’ll look at an example of moving rbldnsd from Puppet managed to Docker managed and what I gain from that. Along the way I’ll also throw in some examples from a more complex migration I did for my bind servers. In case you don’t know, rbldnsd is a daemon that maintains DNS based RBLs using config files that look something like this:

$DATASET dnset senderhost
.digitalmarketeer.com   :127.0.0.2:Connection rejected after user complaints.

You can then query it using the usual ways your MTA supports and decide policy based on that.
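
To give an idea of what such a lookup looks like programmatically, here’s a minimal Ruby sketch of an RBL query – it assumes an instance listening on localhost port 5301 and reuses the senderhost.bl.rbl zone from the test session further down; it’s illustrative only, not part of my tooling:

# Minimal sketch of the RBL lookup an MTA performs, assuming an rbldnsd
# instance on localhost:5301 with the senderhost.bl.rbl zone (as in the
# dev session below); not the actual tooling from this post.
require "resolv"

resolver = Resolv::DNS.new(nameserver_port: [["127.0.0.1", 5301]])
name = "sh8.mailingliststart.com.senderhost.bl.rbl"

# An A record of 127.0.0.2 means the host is listed; the TXT record carries the reason
listed = resolver.getresources(name, Resolv::DNS::Resource::IN::A)
reasons = resolver.getresources(name, Resolv::DNS::Resource::IN::TXT)

if listed.any?
  puts "blocked: #{reasons.map { |r| r.strings.join }.join(', ')}"
else
  puts "not listed"
end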

The life cycle of this service is typical of the ones I am targeting:

  • A custom RPM had to be built and maintained and served from yet another piece of infrastructure.
  • The basic package, config, service triplet. So vanilla it’s almost not worth looking at the code; it looks like all other package, config, service code.
  • Requires ongoing data management – I add/remove hosts from the blacklists constantly. But this overlaps with the config part above.
  • Requires the ability to test DNS queries work in development before even committing the change
  • Requires rapid updating of configuration data

The last 3 points here deserve some explanation. When I am editing these configuration files I want to be able to test them right there in my shell without even committing them to git. This means starting up a rbldnsd instance and querying it with dig. This is pretty annoying to do with the Puppet workflow, which I won’t go into here as it’s a huge subject on its own. Suffice to say it doesn’t work for me and ends up not being production like at all.

When I am updating these config files on the running service there’s a daemon that will load them into its running memory. I need to be pretty sure that the daemon I am testing against is identical to what’s in production right now – ideally bit for bit identical. Again this is pretty hard as many, if not most, dev environments tend to be a few steps ahead of production. I need a way to say: give me the bits running in production, throw this config at them and then do an end to end test with no hassle, in 5 seconds.

I need a way to orchestrate that config data update to happen when I need it to happen – not when Puppet next runs – and ideally it has to be quick, not at the pace at which Puppet manages my 600 resources. Services should let me introspect them to figure out how to update their data, and a generic updater should be able to update all my services that follow this flow.

I’ve never really solved the last 3 points with my Puppet workflows for anything I work on; it’s a fiendishly complex problem to solve correctly. Everyone does it with Vagrant instances or ever more complex environments. Or they make their change, commit it, make sure there is test coverage and only get feedback later when something like Beaker runs. That is way too slow for me in this scenario – I just want to block 1 annoying host. Vagrant especially does not work for me as I refuse to run things on my desktop or laptop; I develop on VMs that are remote, so Vagrant isn’t an option. Additionally, Vagrant environments become so complex they are basically a whole new environment, yet built in annoyingly different ways, so keeping them matched with production can be a challenge – or just prohibitively slow if you’re building them out with Puppet. So you end up, again, not testing in an environment that’s remotely production like.

These are pretty major things that I’ve never been able to solve to my liking with Puppet. I first moved a bunch of my web sites, then bind and now rbldnsd to Docker, and I think I’ve managed to come up with a workflow and toolchain that solves this for me.

Desired outcome

So maybe to demonstrate what I am after I should show what I want the outcome to look like. Here’s a rbldnsd dev session. I want to block *.mailingliststart.com; specifically I saw sh8.mailingliststart.com in my logs. I want to test that the hosts are going to be blocked correctly before pushing to prod or even committing to git – it’s so embarrassing to make fix commits for obvious dumb things 😛

So I add to the zones/bl file:

.mailingliststart.com :127.0.0.2:Excessive spam from this host

The dev session then looks like this:

$ vi zones/bl
$ rake test:host
Host name to test: sh8.mailingliststart.com
Testing sh8.mailingliststart.com
 
Starting the rbldnsd container...
>>> Testing black list
docker exec rbldnsd dig -p 5301 +noall +answer any sh8.mailingliststart.com.senderhost.bl.rbl @localhost
sh8.mailingliststart.com.senderhost.bl.rbl. 2100 IN A 127.0.0.2
sh8.mailingliststart.com.senderhost.bl.rbl. 2100 IN TXT "Excessive spam from this host"
 
>>> Testing white list
.
.
.
 
Removing the rbldnsd container...
$ git commit zones -m 'block mailingliststart.com'
$ git push origin master

Here I added the bits to the config file and want to be sure the hostname I saw in my logs/headers will actually be blocked. The rake task takes care of the rest (a rough sketch of such a task follows this list):

  • It prepares the latest container by default and mounts my working directory into the container with -v ${PWD}:/service.
  • Container starts up just like it would in production using the same bits that are running in production – but reads the new uncommitted config
  • It uses dig to query the running rbldnsd and runs any in-built validation steps the service has (this container has none yet)
  • Cleans up everything
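
For reference, here is a rough sketch of what such a rake task could look like – only the -v ${PWD}:/service mount and the dig query come from above, the image name, sleep and exact commands are assumptions rather than my actual Rakefile:

# Hypothetical sketch of the test:host task described above; the image name
# and timings are assumptions, only the volume mount and the dig query are
# taken from the post.
namespace :test do
  desc "Start the production rbldnsd image against the local zone files and query it"
  task :host do
    print "Host name to test: "
    host = STDIN.gets.chomp
    puts "Testing #{host}"

    # run the same image as production, but with the working directory mounted in
    sh "docker run -d --name rbldnsd -v #{Dir.pwd}:/service ripienaar/rbldnsd"
    sleep 2 # give the daemon a moment to load the uncommitted zone files

    begin
      puts ">>> Testing black list"
      sh "docker exec rbldnsd dig -p 5301 +noall +answer any #{host}.senderhost.bl.rbl @localhost"
    ensure
      puts "Removing the rbldnsd container..."
      sh "docker rm -f rbldnsd"
    end
  end
end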

The whole thing takes about 4 seconds on a virtual machine running in VirtualBox on a circa 2009 Mac. I saw the host was blacklisted and not somehow also whitelisted; looks good, commit and push.

Once pushed, a webhook triggers my update orchestration and the running containers get only the new config files. The whole edit, test and deploy process takes less than a minute. The data though is in git, which means tonight when my containers get rebuilt from fresh they will get this change baked in and rolled out as new instances.

There’s one more pretty mind blowing early feedback story I wanted to add here. My bind zones used to be made with Puppet defines:

bind::zone{"foo.com": owner => "Bob", masterip => "1.2.3.4", type => $server_type}

I had no idea what this actually did just by reading that line of code. I could guess, sure. But you only know with certainty when you run Puppet in production, since no matter what the hype says, you’ll only see the diff against the actual production file when the change hits the production box via Puppet. Not OK. You also learn nothing this way; it’s always bothered me that Puppet ends up being a crutch, like a calculator. I have all these abstractions, so a junior using this define might never even know what it does or learn how bind works. Desirable in some cases, not for me.

In my Docker bind container I have a YAML file:

zones:
  Bob:
    options:
      masterip: 1.2.3.4
    domains:
     - foo.com

It’s the same data I had in my manifest, just structured a bit differently. Same basic problem though: I have no idea what this does by looking at it. In the Docker world though you need to bake this YAML into bind config, and this has to be done during development so that a docker build can get to the final result. So I add a new domain, bar.com:

$ vi zones.yaml
$ rake construct:files
Reading specification file buildsettings.yaml
Reading scope file zones.yaml
Rendering conf/named_slave_zones with mode 644 using template templates/slave_zones.erb
Rendering conf/named_master_zones with mode 644 using template templates/master_zones.erb
 
 conf/named_master_zones | 10 ++++++++++
 conf/named_slave_zones  |  9 +++++++++
 2 files changed, 19 insertions(+)
$ git diff
+// Bob
+zone "bar.com" {
+  type slave;
+  file "/srv/named/zones/slave/bar.com";
+  masters {
+    1.2.3.4;
+  };
+};

The rake construct:files task just runs a bunch of ERB templates over the zones hash – it’s basically identical to the templates I had in Puppet with just a few variable name changes and slightly different looping, no more or less complex.
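
To illustrate, a stripped down sketch of that rendering step might look like this – the file names match the session above, but the scope handling is simplified and assumed:

# Minimal sketch of rendering the zone configs from zones.yaml with ERB;
# file names match the session above, everything else is an assumption.
require "erb"
require "yaml"

zones = YAML.load_file("zones.yaml")["zones"]

{
  "templates/master_zones.erb" => "conf/named_master_zones",
  "templates/slave_zones.erb"  => "conf/named_slave_zones"
}.each do |template, target|
  result = ERB.new(File.read(template)).result_with_hash(zones: zones)
  File.write(target, result)
  File.chmod(0o644, target)
end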

This is the actual change that will hit production. No ifs or buts, that’s what will change in prod. When I run rake test here without committing this, this actual production change is being tested against the actual bits of the named binary that runs production today.

$ time rake test
docker run -ti --rm -v /home/rip/work/docker_bind:/srv/named -e TEST=1 ripienaar/bind
>> Checking named.conf syntax in master mode
>> Checking named.conf syntax in slave mode
>> Checking zones..
rake test  0.18s user 0.33s system 7% cpu 3.858 total

Again my work dir is mounted into the container, which is the version currently running in production, and my uncommitted change is tested using the bit for bit identical version of bind that is currently in prod. This is a massive confidence boost and the feedback cycle is just a few seconds.

Implementation Details

I won’t go into all the Dockerfile details, it’s just normal stuff. The image building and running of containers is not exciting. The layout of the service is something like this:

/service/bin/start.sh
/service/bin/update.sh
/service/bin/validate.sh
/service/zones/{bl,gl,wl}
/opt/rbldnsd-0.997a/rbldnsd

What is exciting is that I can introspect a running container. The Dockerfile has lines like this:

ENV UPDATE_METHOD /service/bin/update.sh
ENV VALIDATE_METHOD /service/bin/validate.sh

And an external tool can find out how this container likes to be updated or validated – and later monitored:

$ docker inspect rbldnsd
.
.
        "Env": [
            "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
            "UPDATE_METHOD=/service/bin/update.sh",
            "VALIDATE_METHOD=/service/bin/validate.sh",
            "GIT_REF=fa9dd19d93e6d6cb7d5b2ebdc57f99cd2906df6f"
        ],

My update webhook basically just does this:

mco rpc docker runtime_update container=rbldnsd -S container("rbldnsd").present=1 --batch 1

So I use mcollective to target an update operation on all machines that run the rbldnsd container – 1 at a time. The mcollective agent uses docker inspect to introspect the container; once it knows how the container wants to be updated it calls that command using docker exec.
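
The core of that agent is simple enough to sketch. This is a hypothetical simplification rather than the real mcollective agent – only the UPDATE_METHOD variable and the docker inspect and docker exec steps come from the flow described above:

# Hypothetical simplification of the update logic: read the advertised
# UPDATE_METHOD from the container environment and run it via docker exec.
require "json"
require "open3"

def runtime_update(container)
  out, status = Open3.capture2("docker", "inspect", container)
  raise "could not inspect %s" % container unless status.success?

  env = JSON.parse(out).first["Config"]["Env"]
  update = env.map { |e| e.split("=", 2) }.to_h["UPDATE_METHOD"]
  raise "%s does not advertise an UPDATE_METHOD" % container unless update

  system("docker", "exec", container, update)
end

runtime_update("rbldnsd")

The same pattern extends to VALIDATE_METHOD and, later on, monitoring.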

Outcome Summary

For me this turned out to be a huge win. I had to do a lot of work on the image building side of things, the orchestration, deployment etc – things I had to do with Puppet too anyway. But this basically ticks all the boxes I listed at the beginning of this post and quite a few more:

  • A reasonable facsimile of the package, config, service triplet that yields idempotent builds
  • A comfortable way to develop and test my changes locally with instant feedback, like I would with unit tests for normal code, but for integration tests of infrastructure components using the same bits as in production.
  • Much better visibility over what’s actually going to change, especially in complex cases where config files are built using templates
  • An approach where my services are standalone and all have to think about their run, update and validation cadences, with those being introspectable and callable from the outside.
  • My services are standalone artefacts and versioned as a whole. Not spread around the place on machines, in package repos, in data and in CM code that attempts to tie it all together. It’s one thing, from one git repo, stored in one place with a version.
  • With validation built into the container and the container being a runnable artefact, I get to do this during CI before rolling anything out, just like I do on my CLI – and always against the actual bits in use, or proposed to be used, in production.
  • Overall I have a lot more confidence in my production changes now than I had with the Puppet workflow.
  • Changes can be rolled out to running containers very rapidly – less than 10 seconds and not at the slow Puppet run pace.
  • My dev environment is hugely simplified yet much more flexible as I can run current, past and future versions of anything. With less complexity.
  • I have a very nice middle ground between immutable servers and the need for updating content. Containers are still rebuilt and redeployed every night on schedule and they are still disposable, but not at the cost of day to day updates.

I’ve built this process into a number of containers now – some, like this one, are services, and some are web ones like my wiki where I edit markdown files and they get rolled out to the running containers immediately on push.

I still have some way to go with monitoring, and these services are standalone rather than complex multi-component ones, but I don’t foresee huge issues there.

I couldn’t have achieved all these outcomes without a rapid way to stand up and destroy production environments that are isolated from the machine I am developing on – especially if the final service is some loosely coupled combination of parts from many different sources. I’d love to talk to people who think they have something approaching this without using Docker or similar and be proven wrong, but for now this is a huge step forward for me.

So Puppet and CM tools are irrelevant now?

Getting back to the Puppet part of this post: I could come up with some way to mix Puppet in here too. There are though other interesting aspects of the Docker life cycle that I might blog about later which I think make it a bit of a square peg in a round hole to combine these two tools. Especially, I think people today who think they should use Puppet to build or configure containers are a bit misguided and missing out. I hope they keep working on that though and get somewhere interesting, because omfg Dockerfiles, but I don’t think the current attempts are interesting.

It kind of gets back to the old realisation that Puppet is not a good choice for managing deployments of Applications but is OK for Infrastructure. I am reconsidering what is infrastructure and what are applications.

So I chose to rethink things from the ground up – how would a nameserver service have looked if I considered it an Application and not Infrastructure, and how should an Application development life cycle around that service look?

This is not a new realisation for me. I’ve often wished, and expressed the desire, that Puppet Labs would focus a lot more on the workflow and the development cycle, provide tools and hooks for that, and think about how to make it better; I don’t think that’s really happened. So the conclusion for me was that for this Application or Service development and deployment life cycle Puppet was the wrong tool. I also realise I don’t even remotely resemble their paying target audience.

I am also not saying Puppet or other CM tools are irrelevant because of Docker – that’s just madness. I think there’s a place where the 2 worlds meet, and for me I am starting to notice that a lot of what I thought was Infrastructure is actually Applications, and these have different development and deployment needs which CM, and Puppet especially, do not address.

Soon there will not be a single mention of DNS related infrastructure in my Puppet code. The container and related files are about equal in complexity and lines of code to what was in Puppet, the final outcome is about the same and it’s as configurable across my environments. The workflow though is massively improved, because now I have the advantages that Application developers have had for this piece of Infrastructure. Finally a much larger part of the Infrastructure As Code puzzle is falling together, and it actually feels like I am working on code with the same feedback cycles and single verifiable artefact outcomes. And that’s pretty huge. Infrastructure is still being CM managed – I just hope to have a radically reduced Infrastructure footprint.

The big take away here isn’t that Docker is some magic bullet killing off vast parts of the existing landscape or destroying a subset of tools like CM completely. It brings workflow and UX improvements that are pretty unique and well worth exploring – an area the CM folk have basically just not focussed on. The single biggest win is probably the single artefact aspect, as this enables everything I mentioned here.

It also brings a lot of other things on the daemon side – the API, the events, the stats etc – that I didn’t talk about here, and those are very big deals too with regard to the future work they enable. But that’s for future posts.

Technically I think I have a lot of bad things to say about almost every aspect of Docker, but those are outweighed by the rapid feedback and increased overall confidence in making change at the pace I would like to.

Some travlrmap updates

It’s been a while since I posted here about my travlrmap web app; I’ve been out of town the whole of February – first to Config Management Camp and then on holiday to Spain and Andorra.

I released version 1.5.0 last night, which brought a fair few tweaks and changes like updating to the latest Bootstrap, improved Ruby 1.9.3 UTF-8 support, a visual spruce up using the Map Icons Collection, and gallery support.

I take a lot of photos and of course these photos often coincide with travels. I wanted to make it easy to put my travels and photos on the same map, so I have started adding a gallery ability to the map app. For now it’s very simplistic: it makes a point with a custom HTML template that just opens a new tab to the Flickr slideshow feature. This is not exactly what I am after; ideally when you click view gallery it would open an overlay above the map and show the gallery, with escape to close taking you right back to the map. There are some Bootstrap plugins for this but they all seem to have some pain points, so that’s not done yet.

Today there’s only Flickr support. A gallery takes a spec like :gallery: flickr,user=ripienaar,set=12345 and from there it renders the Flickr set. Once I get the style of popup gallery figured out I’ll make this pluggable through gems so other photo gallery tools can be supported with plugins.
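
The spec format is simple enough to parse; here’s a hypothetical sketch of what a parser could look like – the method name and return shape are made up, since the plugin interface isn’t settled yet:

# Hypothetical parser for a gallery spec like "flickr,user=ripienaar,set=12345";
# the method name and return shape are assumptions, not travlrmap's actual API.
def parse_gallery_spec(spec)
  provider, *options = spec.split(",")
  {"provider" => provider}.merge(options.map { |o| o.split("=", 2) }.to_h)
end

parse_gallery_spec("flickr,user=ripienaar,set=12345")
# => {"provider"=>"flickr", "user"=>"ripienaar", "set"=>"12345"}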

As you can see from above, the trip to Spain was a road trip. I kept GPX tracks of almost the entire trip and will be adding support to show and render those on the map. Again they’ll appear as a point just like galleries, and clicking on them will show their details like a map view of the route and stats. This is the aim for the 1.6.0 release, hopefully.

Running a secure docker registry behind Apache

I host a local Docker registry and used to just have it on port 5000 over plain HTTP. I wanted to put it behind SSL on port 443, and it was annoying enough that I thought I’d write it up.

I start my registry pretty much as per the docs:

% docker run --restart=always -d -p 5000:5000 -v /srv/docker-registry:/tmp/registry --name registry registry

This starts it, ensures it stays running, makes it listen on port 5000 and also uses a directory on my host for file storage so I can remove and upgrade the registry without issues.

The problem with this is there’s no SSL, so you need to configure docker specifically with:

docker -d --insecure-registry registry.devco.net:5000

At first I thought fronting it with Apache would be as easy as:

<VirtualHost *:443>
   ServerName registry.devco.net
   ServerAdmin webmaster@devco.net
 
   SSLEngine On
   SSLCertificateFile /etc/httpd/conf.d/ssl/registry.devco.net.cert
   SSLCertificateKeyFile /etc/httpd/conf.d/ssl/registry.devco.net.key
   SSLCertificateChainFile /etc/httpd/conf.d/ssl/registry.devco.net.chain
 
   ErrorLog /srv/www/registry.devco.net/logs/error_log
   CustomLog /srv/www/registry.devco.net/logs/access_log common
 
   ProxyPass / http://0.0.0.0:5000/
   ProxyPassReverse / http://0.0.0.0:5000/
</VirtualHost>

This worked at a basic level, but as soon as I tried to push to the registry I got errors. It seems after the initial handshake the docker daemon would get instructions from the registry to connect to http://0.0.0.0:5000/, which then fails.

Some digging into the registry code showed that it uses the Host header of the request to build the X-Docker-Endpoints header it returns in replies to the initial handshake, and future requests from the docker daemon use the endpoints advertised there for communication.

By default Apache does not keep the Host header in the proxied request; I had to add ProxyPreserveHost on to the vhost and after that it was all good – no more insecure registries or having to specify ugly ports in my image tags.

So the final vhost looks like:

<VirtualHost *:443>
   ServerName registry.devco.net
   ServerAdmin webmaster@devco.net
 
   SSLEngine On
   SSLCertificateFile /etc/httpd/conf.d/ssl/registry.devco.net.cert
   SSLCertificateKeyFile /etc/httpd/conf.d/ssl/registry.devco.net.key
   SSLCertificateChainFile /etc/httpd/conf.d/ssl/registry.devco.net.chain
 
   ErrorLog /srv/www/registry.devco.net/logs/error_log
   CustomLog /srv/www/registry.devco.net/logs/access_log common
 
   ProxyPreserveHost on
   ProxyPass / http://127.0.0.1:5000/
   ProxyPassReverse / http://127.0.0.1:5000/
</VirtualHost>

I also made sure it only listens on localhost for the port 5000 traffic, and now I can start my registry like this, ensuring I do not even have that port on the internet facing interfaces:

% docker run --restart=always -d -p 127.0.0.1:5000:5000 -v /srv/docker-registry:/tmp/registry --name registry registry

Some travlrmap updates

Last weekend I finally got to the point of releasing 1.0.0 of my travel map software; this week, in between other things, I made a few improvements:

  • Support for 3rd party tile sets like OpenStreetMap, MapQuest, Watercolor, Toner, Dark Matter and Positron. These let you customise your look a bit; the Demo Site has them all enabled.
  • Map sets are supported, I use this to track my Travel Wishlist vs Places I’ve been.
  • Rather than listing every individual YAML file in a directory to define a set, you can now just point at a directory and everything in it will get loaded (see the sketch after this list).
  • You can designate a single yaml file as writable, the geocoder can then save points to disk directly without you having to do any YAML things.
  • The geocoder renders better on mobile devices and supports geocoding based on your current position, to make it easy to add points on the go.
  • Lots of UX improvements to the geocoder
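
Loading a directory of point files is about as simple as it sounds; a rough sketch of the idea, with the file layout and method name assumed:

# Rough sketch of loading every YAML point file in a set directory;
# the directory layout and method name are assumptions.
require "yaml"

def load_point_set(directory)
  Dir.glob(File.join(directory, "*.yaml")).flat_map do |file|
    YAML.load_file(file) # each file is assumed to hold an array of points
  end
end

points = load_point_set("points/visited")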

Seems like a huge amount of work but it was all quite small additions, mainly done in an hour or so after work.

Travlrmap 1.0.0

As mentioned in my previous 2 posts I’ve been working on rebuilding my travel tracker app. It’s now reached something I am happy to call version 1.0.0 so this post introduces it.

I’ve been tracking major travels, day trips etc since 1999 and plotting them on maps using various tools like the defunct Xerox PARC Map Viewer and XPlanet, and eventually I wrote a PHP based app to draw them on Google Maps. Over the years I’ve also looked at various services to use instead so I don’t have to keep doing this myself, but they all die, change business focus or hold data ransom, so I am still fairly happy doing this myself.

The latest iteration of this can be seen at travels.devco.net. It’s a Ruby app that you can host on the free tier at Heroku quite easily. Feature wise, version 1.0.0 has:

  • Responsive design that works on mobile and PC
  • A menu of pre-defined views so you can draw attention to a certain area of the map
  • Points can be categorized by type of visit, like places you've lived, visited or transited through, each with their own icon.
  • Points can have urls, text, images and dates associated with them
  • Point clustering that combines many points into one when zoomed out with extensive configuration options
  • Several sets of colored icons for point types and clusters. Ability to add your own.
  • A web based tool to help you visually construct the YAML snippets needed using search
  • Optional authentication around the geocoder
  • Google Analytics support
  • Export to KML for viewing in tools like Google Earth
  • Full control over the Google Map like enabling or disabling the street view options

It’s important to note the intended use isn’t something like a private Foursquare or Facebook checkin service; it’s not about tracking every coffee shop. Instead it’s for tracking major city or attraction level places you’ve been to. I’m contemplating adding a mobile app to make it easier to log visits while you’re out and about, but it won’t become a checkin type service.

I specifically do not use a database or anything like that, it’s just YAML files that you can check into GitHub, easily back up and hopefully never lose. Data longevity is the most important aspect for me, so the input format is simple and easy to convert to others like JSON or KML. This also means I do not currently let the app write into any data files where it’s hosted – I do not want to have to figure out the mechanics of not losing some YAML file that sits nowhere else but on a webserver. Though, as mentioned above, I am planning to enable writing to an incoming YAML file.

Getting going with your own is really easy. Open up a free Heroku account and set up a free app with one dyno. Clone the demo site into your own GitHub and push to Heroku. That’s it – you should have your own instance up and running with placeholder content, ready to start receiving your own points which you can make using the included geocoder. You can also host it on any Ruby app server like Passenger without any modification from the Heroku one.

The code is on GitHub at ripienaar/travlrmap under Apache 2. Docs for using it and configuration references are on its dedicated gh-pages page.

Marker clustering using GMaps.js

In a previous post I showed that I am using a KML file as input into GMaps.js to put a bunch of points on a map for my travels site. This worked great, but I really want to do some marker clustering since too many points looks pretty bad, as can be seen below.

I’d much rather do some clustering and expand out to multiple points when you zoom in like here:

Turns out there are a few libraries for this already. I started out with one called Marker Clusterer but ended up with an improved version of it called Marker Clusterer Plus. And GMaps.js supports cluster libraries natively, so this should be easy, right?

Turns out the way Google Maps loads KML files is via a layer over the map and the points are just not accessible to any libraries, so the cluster libraries do nothing. OK, so back to drawing points using my own code.

I added an endpoint to the app that emits my points as JSON (a rough sketch of such a route follows the sample data):

[
  {"type":"visit",
   "country":"Finland",
   "title":"Helsinki",
   "lat":60.1333,
   "popup_html":"<p>\n<font size=\"+2\">Helsinki</font>\n<hr>\nBusiness trip in 2005<br /><br />\n\n</p>\n",
   "comment":"Business trip in 2005",
   "lon":25,
   "icon":"/markers/marker-RED-REGULAR.png"}
]
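
In the Sinatra app such a route only needs a few lines. This is a hedged sketch – the YAML file name is an assumption, and the per-point popup_html and icon rendering visible in the real output above is omitted for brevity:

# Hedged sketch of a /points/json style route; the YAML file name is an
# assumption and the popup_html/icon rendering seen in the real output
# is left out.
require "sinatra"
require "json"
require "yaml"

get "/points/json" do
  content_type :json
  points = YAML.load_file("points.yaml") # assumed data file
  points.to_json
end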

Now adding all the points and getting them clustered is pretty easy:

<script type="text/javascript">
    var map;
    function addPoints(data) {
      var markers_data = [];
      if (data.length > 0) {
        for (var i = 0; i < data.length; i++) {
          markers_data.push({
            lat: data[i].lat,
            lng: data[i].lon,
            title: data[i].title,
            icon: data[i].icon,
            infoWindow: {
              content: data[i].popup_html
            },
          });
        }
      }
      map.addMarkers(markers_data);
    }
 
    $(document).ready(function(){
      infoWindow = new google.maps.InfoWindow({});
      map = new GMaps({
        div: '#main_map',
        zoom: 15,
        lat: 0,
        lng: 20,
        markerClusterer: function(map) {
          options = {
            gridSize: 40
          }
 
          return new MarkerClusterer(map, [], options);
        }
      });
 
      points = $.getJSON("/points/json");
      points.done(addPoints);
    });
</script>

This is pretty simple: the GMaps() object takes a markerClusterer option that expects an instance of the clusterer. I fetch the JSON data and each row gets added as a point, then it all just happens automagically. Marker Clusterer Plus can take a ton of options that let you specify custom icons and grid sizes, tweak when clustering kicks in, etc. Here I am just setting gridSize to show how to do that. In this example I have custom icons used for the clustering; I might blog about that later once I figure out how to get them to behave perfectly.

You can see this in action on my travel site. As an aside, I’ve taken a bit of time to document how the Sinatra app works and put together a demo deployable to Heroku that should give people hints on how to get going if anyone wants to make a map of their own.