
Managing email forwarding in Exim with Puppet

I have a number of mail servers where mail enters, gets spam scanned and so forth, and is then forwarded to mailbox servers. This used to be customer facing and had web interfaces, statistics and so on, but I am now scaling all this down to just manage my own and some friends’ domains.

Rather than maintain all the web interfaces that I really do not care for, I’d rather manage this with Puppet. My ideal end result would be:

exim::route{"devco.net":
  nexthop         => "my.mailbox.server",
  spamthreshold   => 10,
  spamdestination => ":blackhole:",
  has_greylist    => 1,
  has_spam_check  => 1,
  has_whitelist   => 1
}

This should add all the required configuration to deliver mail arriving at the mail relay for devco.net to the server my.mailbox.server. It will set up SpamAssassin scans and send all mail that scores more than 10 to the Exim-specific destination :blackhole:, which simply deletes the mail. I could specify any valid mail destination here, like a file or another email address. I won’t be covering the has_* entries in this guide; they just control various policies in my ACLs on a per-domain basis.

I’ll first cover the Exim side of things. Clearly I do not want to be editing exim.conf each time, so I will read the domain information from a file stored on the server. These files will be stored as /etc/exim/routes/devco.net and look like:

nexthop: my.mailbox.server
spamthreshold: 10
spamdestination: :blackhole:

In order to accept mail for a domain Exim needs a list of valid domains it will accept mail for; as our route files are named after the domains, we can just leverage that to build the list:

domainlist mw_domains = dsearch;/etc/exim/routes

Next we should pull from the file the various settings we store there:

NEXTHOP = ${lookup{nexthop}lsearch{/etc/exim/routes/${domain}}}
DOMAINREJECTSCORE = ${eval:10*${lookup{spamthreshold}lsearch{/etc/exim/routes/${domain}}}}
DOMAINSPAMDEST = ${lookup{spamdestination}lsearch{/etc/exim/routes/${domain}}}
 
ACL_SPAMSCORE = acl_m3

This creates handy variables that we can just use in our routers and spam configuration. I won’t go into the actual setup of SpamAssassin scanning as that’s pretty standard stuff better documented elsewhere; in the SpamAssassin ACLs just store your $spam_score_int in ACL_SPAMSCORE.

To deliver the mail either to the specific spam destination or to the next hop we just need to add two routers to the routers section. These are order dependent, so they should be in the order below:

spamblock:
  driver          = redirect
  condition       = ${if >= {$ACL_SPAMSCORE}{DOMAINREJECTSCORE}{true}{false}}
  data            = DOMAINSPAMDEST
  headers_add     = X-MW-Note: Redirecting mail to domain spam destination
  domains         = +mw_domains
  no_verify

Here we’re just doing a quick if check over the stored spam score to see if it’s greater than or equal to the threshold stored in DOMAINREJECTSCORE, and then set the data of the router – where the mail should go – to the configured address from DOMAINSPAMDEST. This router will only be active for domains that this Exim server is a relay for, and it adds a little debug note as a header.

The actual mail delivery router that is used in place of the normal dnslookup router is here:

mw_domains:
  driver          = manualroute
  transport       = remote_smtp
  domains         = +mw_domains
  user            = root
  headers_add     = "X-MW-Recipient: ${local_part}@${domain}\n\
                     X-MW-Sender: $sender_address\n\
                     X-MW-Server: $primary_hostname"
  route_data      = NEXTHOP

This router is also restricted to only our relay domains. It adds some headers for debug purposes and finally sets the route_data of the email to the next hop from NEXTHOP, thus delivering the mail to its destination.

That’s all there is to do on the Exim side; it’s pretty standard stuff. Next up, the Puppet define:

define exim::route($nexthop, $spamthreshold, $spamdestination, $ensure = "present") {
  file{"/etc/exim/routes/${name}":
    ensure  => $ensure,
    content => template("exim/route.erb")
  }
}

And the template for this define is also extremely simple:

nexthop: <%= nexthop %>
spamthreshold: <%= spamthreshold %>
spamdestination: <%= spamdestination %>

I could stop here and just create a bunch of exim::route resources, but that would mean code changes and I prefer just changing data. So I am going to create a JSON file called mailrelay.json and store it with my Hiera data.

{
  "relay_domains": {
    "devco.net": {
      "nexthop": "my.mailbox.server",
      "spamdestination": ":blackhole:",
      "spamthreshold": 10,
      "has_dkim": 1
    },
    "another.com": {
      "nexthop": "ASPMX.L.GOOGLE.COM.",
      "spamdestination": ":blackhole:",
      "spamthreshold": 10
    }
  }
}

I assign all my incoming mail servers a single class that would look roughly like this:

class roles::mailrelay {
  include exim
  include exim::mailrelay
 
  $routes = hiera("relay_domains", "", "mailrelay")
  $domains = keys($routes)
 
  exim::routemap{$domains:
    routes => $routes
  }
}

The call to Hiera fetches the entire hash from the mailrelay.json file and stores it in $routes. I then use the keys function from puppetlabs-stdlib to extract just the list of domains into an array. I then pass that into a define exim::routemap that iterates the list and builds up individual exim::route resources.

The routemap define is just as below. I’ve shortened it a fair bit as I also have validation logic in here to make sure I pass valid data in the hash from Hiera; the stdlib module has various validator functions that are really handy for this:

define exim::routemap($routes) {
  exim::route{$name:
    nexthop => $routes[$name]["nexthop"],
    spamthreshold => $routes[$name]["spamthreshold"],
    spamdestination => $routes[$name]["spamdestination"]
  }
 
  if ($routes[$name]["has_dkim"] == 1) {
    exim::dkim_domain{$name: }
  } else {
    exim::dkim_domain{$name: ensure => absent}
  }
}

And that’s about it. Now my mail routing setup, DKIM signing and other policies are managed by a simple JSON file in my Puppet manifests.

Writing Oldskool Plugins

Earlier this week I wrote about Oldskool which is a Gem extendable search tool. Today I want to show how to create a plugin for it to query some custom source.

We’ll build a plugin that shows Puppet type references; the screenshot shows how it will look.

The end result is that I can just search for “type exec” to get the correct documentation for my Puppet install. I’ll go quite quickly through all the various bits here; the complete working plugin is in my GitHub.

The nice thing about rendering the type references locally is that you can choose exactly which version to render the docs for, and you could possibly also render docs for locally written types that are not part of Puppet – though I have not tried to see how you might render custom types.

Plugins are made up of a few things that should have predictable names. In the case of our Puppet plugin I decided to call it puppet, which means we need a class called Oldskool::PuppetHandler that does the work for that plugin. You can see the one here; it goes in lib/oldskool/puppet_handler.rb in your gem:

module Oldskool
  class PuppetHandler
    def initialize(params, keyword, config)
      @params = params
      @keyword = keyword
      @config = config
      self
    end
 
    def plugin_template(template)
      File.read(File.expand_path("../../../views/#{template}.erb", __FILE__))
    end
 
    def handle_request(keyword, query)
      type = Puppetdoc.new(query)
 
      menu = [{:title => "Type Reference", :url => "http://docs.puppetlabs.com/references/stable/type.html"},
              {:title => "Function Reference", :url => "http://docs.puppetlabs.com/references/stable/function.html"},
              {:title => "Language Guide", :url => "http://docs.puppetlabs.com/guides/language_guide.html"}]
 
      {:template => plugin_template(:type), :type => type.doc, :topmenu => menu}
    end
  end
end

The initialize and plugin_template methods will rarely change; handle_request is where the magic happens. It gets called with the keyword and the query, so I set this up to respond to searches like type exec. If you need any kind of configuration data from the main Oldskool config file you’d just add data to that YAML file and it would be available in @config.

The keyword would be type and the query would be exec. The idea is that we could route, for example, both type and function keywords into the plugin and then do different things with the query string.
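Purely as a hedged illustration of that routing idea – this is not code from the real plugin, and FunctionDoc plus the :function view are invented – handle_request could branch on the keyword:

def handle_request(keyword, query)
  case keyword
  when "type"
    {:template => plugin_template(:type), :type => Puppetdoc.new(query).doc}
  when "function"
    # FunctionDoc and the :function view exist only in this sketch
    {:template => plugin_template(:function), :function => FunctionDoc.new(query).doc}
  else
    {:template => nil, :error => "Unknown keyword #{keyword}"}
  end
end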

I wrote a class called Puppetdoc that takes care of the Puppet introspection. I won’t go into the details, but you can see it here; it just returns a hash with all the Markdown for each parameter, meta parameter and the type itself.

We then create a simple menu that’s just an array of title and url pairs that will be used to build the top menu that you see in the screenshot.

And finally we just return a hash. The hash that you return must include a template key; the rest is optional. I override the meaning of the word template a bit – I probably should have chosen a better name – and a short sketch follows the list below:

  • If it’s a string it’s assumed to be an ERB template and Sinatra will just render that
  • When it’s the symbol :redirect your hash must also have a :url item; this will just redirect the user to another URL
  • When it’s the symbol :error or just nil you can optionally add an :error key whose value will be shown to the user
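To make those cases concrete, here is a hedged example of what a handler might return in each situation – the URL, keyword and messages are invented for this sketch:

def handle_request(keyword, query)
  if query.empty?
    # a nil template plus an :error key shows the message to the user
    {:template => nil, :error => "Please supply a search term"}
  elsif keyword == "docs"
    # :redirect sends the user straight to the given :url
    {:template => :redirect, :url => "http://docs.puppetlabs.com/"}
  else
    # a string is treated as an ERB template and rendered by Sinatra
    {:template => plugin_template(:type), :type => Puppetdoc.new(query).doc}
  end
end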

You can see in the code above that I passed the menu in as :topmenu; you could also pass it back as :sidemenu, which will create buttons down the side of the page. You can use both menus and buttons at the same time.

This takes care of creating the data to display but not yet the displaying aspect. The call to plugin_template(:type) will read the contents of type.erb in the plugin’s views directory and return it. The Oldskool framework will then render that template, making your returned hash available in @result.

Here’s the first part of the view in question, you can see the whole thing here:

<% unless @error %>
  <h2><%= @result[:type][:name].to_s.capitalize %> version <%= @result[:type][:version] %></h2>
<% end %>

Your view can check if @error is set to show some text to the user in the case of exceptions and so on; otherwise it just displays the results. You can see here that the @result variable is the data handle_request returned.

Finally there’s a convention for gem names – this one would be oldskool-puppet – so you should create a similarly named Ruby file to load the various bits; place this in lib/oldskool-puppet.rb:

require 'oldskool/puppetdoc'
require 'oldskool/puppet_handler'

From there you just need to build the gem, the Rakefile below does that:

require 'rubygems'
require 'rake/gempackagetask'
 
spec = Gem::Specification::new do |spec|
  spec.name = "oldskool-puppet"
  spec.version = "0.0.3"
  spec.platform = Gem::Platform::RUBY
  spec.summary = "oldskool-puppet"
  spec.description = "Generate documentation for Puppet types"
 
  spec.files = FileList["lib/**/*.rb", "views/*.erb"]
  spec.executables = []
 
  spec.require_path = "lib"
 
  spec.has_rdoc = false
  spec.test_files = nil
  spec.add_dependency 'puppet'
  spec.add_dependency 'redcarpet'
 
  spec.extensions.push(*[])
 
  spec.author = "R.I.Pienaar"
  spec.email = "rip@devco.net"
  spec.homepage = "http://devco.net/"
end
 
Rake::GemPackageTask.new(spec) do |pkg|
  pkg.need_zip = false
  pkg.need_tar = false
end

% rake gem
mkdir -p pkg
WARNING:  no rubyforge_project specified
  Successfully built RubyGem
  Name: oldskool-puppet
  Version: 0.0.3
  File: oldskool-puppet-0.0.3.gem
mv oldskool-puppet-0.0.3.gem pkg/oldskool-puppet-0.0.3.gem
% gem push pkg/oldskool-puppet-0.0.3.gem

If your gem command is set up for it, this will publish the gem to rubygems.org ready for use. In this case all I did was add it to the Gemfile for my webapp:

gem 'puppet', '2.6.9'
gem 'facter'
gem 'oldskool-puppet', '>= 0.0.3'

I then used Bundler to update my site, and after that everything worked.

Oldskool: A Gem extendible search engine

Back in the day The Well had a text-based conference system: you used to dial in, then telnet and later ssh to their server, and interacted with other members through a text system called PicoSpan. Eventually things moved to the web and it became a lot more forum-like. The thing that I really loved was that the web version of the forums had a command line. You could type many of the same commands into the web CLI as you would into the Unix one and have the same effects: posting, searching, jumping through conferences. It was the web with CLI power for those who wanted it.

The browser is more and more our interface to all things online and frankly it sux a bit; I want CLI speed for accessing the web sites that I like. Ages ago I created a PHP system I called cmd that simply routed a command like “guk greenwich” to the Google UK search engine with results restricted to those from the UK. There are of course various online tools that do the same, but I found that their ‘book’ keyword would search Amazon US while I wanted UK, so I just made one that I can tweak to my liking.

Recently, thanks to Google’s widely hated changes to their search UI, simply redirecting to Google searches with keywords filled in was just not enough anymore. I want web search back the way it was before they made it suck. So I did what hackers do and wrote a Ruby-based pluggable search system. You can see a screenshot of it here showing a Google search.

What you’re seeing here is the oldskool-gcse plugin in action. It uses the Google JSON API to query a Google Custom Search Engine and format the results in a way that does not suck. The Custom Search Engines are quite nice as you can customize all sorts of things in them – which sites to exclude, which to favor, limiting results to certain countries or languages – allowing you to really tailor your search experience. The only downside to the GCSE approach is that Google limits API calls to 100 a day; for me that’s enough for searching but YMMV.

Using this method of searching can have some privacy wins: Google recently announced they are merging all their online accounts into one and will have all your online activity influence your searches. I wasn’t too worried since by then I had already written Oldskool and can simply access their search API with a different Google account than the one I use to read my work mail, for example. Simple, effective win.

My default search in oldskool is a GCSE that resembles a normal Google search, but I can also search for “puppet exec” and oldskool will route that request to a specific GCSE that bumps the official Puppet Labs docs to the top, excludes some annoying things and so on. Having oldskool as a single entry point in front of many different GCSE backends is quite powerful.

As I said it’s pluggable, and I’ve written one other plugin that uses my Passmakr gem to generate random passwords. I can just search for pass 10 to get a 10 character password.

Writing your own plugins is very easy and I hope to see ones that query Redmine instances or other internal databases that you might have, using the Oldskool framework to display all the data in one handy place.

It retains the most basic feature of simple keyword-based redirects, so I can search for book reamde to get Amazon UK book results instantly.

Config is through a simple YAML file:

---
:google_api_key: your.key
:username: http_auth_user
:password: http_auth_pass
:keywords:
- :type: :gcse
  :cx: you_gcse
  :keywords:
  - :default
- :type: :gcse
  :cx: your_gcse
  :keywords:
  - puppet
- :type: :url
  :url: http://amazon.co.uk/exec/obidos/search-handle-url/index=books-uk&field-keywords=%Q%
  :keywords:
  - book
  - books
- :type: :password
  :keywords: pass

This sets up two GCSE searches – one marked as my default search – plus the mentioned book search and one that uses the password plugin shown above.

It needs no writable access to the webserver it runs on and it’s all managed by Bundler and Sinatra – perfect for hosting on the free Heroku tier.

As this is effectively my Web CLI I want it integrated in as many places as possible. I use a lot of desktops – 3 regularly – so the browser is my unified UI to all of this. Your instance will publish OpenSearch meta data which will make it seamlessly integrate into Firefox, Chrome, IE, Gnome DO, Gnome Shell and many many other places.

Here’s the Firefox search box the first time you browse to a new instance:

And here is Chrome: you do not even have to add it, just start typing the URL of your instance and press Tab, and the URL bar magically transforms into an Oldskool search box. You can add it permanently and make it the default by right-clicking on the URL bar and choosing Edit Search Engines….

The code is in my GitHub – Oldskool, Oldskool GCSE and Oldskool Password. I will blog again tomorrow or on another day about creating your own plugins etc.

Common Messaging Patterns Using Stomp – Part 5

This is a post in a series about Middleware for Stomp users; please read the preceding parts starting at part 1 before continuing below.

Today I am changing things around a bit: not so much talking about using Stomp from Ruby, but rather about how we would monitor ActiveMQ. The ActiveMQ broker has a statistics plugin that you can interact with over Stomp, which is particularly nice – being able to interrogate it over the same protocol you use it with.

I’ll run through some basic approaches to monitor:

  • The size of queues
  • The memory usage of persisted messages on a queue
  • The rate of messages through a topic or a queue
  • Various memory usage statistics for the broker itself
  • Message counts and rates for the broker as a whole

These are the standard kinds of things you need to know about a running broker, in addition to things like monitoring the length of garbage collections and so on, which is standard when dealing with Java applications.

Keeping an eye on your queue sizes is very important. I’ve focused a lot on how Queues help you scale by facilitating horizontally adding consumers. Monitoring facilitates the decision making process for how many consumers you need – when to remove some and when to add some.

First you’re going to want to enable the plugin for ActiveMQ: open up your activemq.xml, add the plugin as below and restart when you are done:

<plugins>
   <statisticsBrokerPlugin/>
</plugins>

A quick word about the output format of the messages you’ll see below. They are a serialized JSON (or XML) representation of a data structure. Unfortunately it isn’t immediately usable without some pre-parsing into a real data structure. The Nagios and Cacti plugins you will see below have a method in them for converting this structure into a normal Ruby hash.

The basic process for requesting stats is a Request Response pattern as per part 3.

stomp.subscribe("/temp-topic/stats", {"transformation" => "jms-map-json"})

# request stats for the random generator queue from part 2
stomp.publish("/queue/ActiveMQ.Statistics.Destination.random_generator", "", {"reply-to" => "/temp-topic/stats"})

puts stomp.receive.body

First we subscribe to a temporary topic, which you first saw in Part 2, and we specify that while ActiveMQ will output a JMS Map it should convert this for us into a JSON document rather than the Java structures.

We then request Destination stats for the random_generator queue and finally wait for the response and print it; what you’ll get back can be seen below:

{"map":{"entry":[{"string":"memoryUsage","long":0},{"string":"dequeueCount","long":13},{"string":"inflightCount","long":0},{"string":"messagesCached","long":0},
{"string":"averageEnqueueTime","double":0.46153846153846156},{"string":["destinationName","queue:\/\/mcollective.nodes"]},{"string":"size","long":0},
{"string":"memoryPercentUsage","int":0},{"string":"producerCount","long":0},{"string":"consumerCount","long":56},{"string":"minEnqueueTime","double":0},
{"string":"maxEnqueueTime","double":1},{"string":"dispatchCount","long":13},{"string":"expiredCount","long":0},{"string":"enqueueCount","long":13},
{"string":"memoryLimit","long":83886080}]}}

Queue Statistics
Queue sizes are handled basically as you saw above: hit the statistics plugin at /queue/ActiveMQ.Statistics.Destination.<queue name> and you get stats back for the queue in question.

The table below lists the meaning of these values as I understand them – it’s quite conceivable I am wrong about the specifics of ones like enqueueTime, so I am happy to be corrected in the comments:

destinationName: The name of the queue in JMS URL format
enqueueCount: Number of messages that were sent to the queue and committed to it
inflightCount: Messages sent to consumers but not yet consumed – they might be sitting in the prefetch buffers
dequeueCount: The opposite of enqueueCount – messages sent from the queue to consumers
dispatchCount: Like dequeueCount but includes messages that might have been rolled back
expiredCount: Messages can have a maximum life; these are the ones that have expired
maxEnqueueTime: The maximum amount of time a message sat on the queue before being consumed
minEnqueueTime: The minimum amount of time a message sat on the queue before being consumed
averageEnqueueTime: The average amount of time a message sat on the queue before being consumed
memoryUsage: Memory used by messages stored in the queue
memoryPercentUsage: Percentage of available queue memory used
memoryLimit: Total amount of memory this queue can use
size: How many messages are currently in the queue
consumerCount: Consumers currently subscribed to this queue
producerCount: Producers currently producing messages

I have written a Nagios plugin that can check the queue sizes:

$ check_activemq_queue.rb --host localhost --user nagios --password passw0rd --queue random_generator --queue-warn 10 --queue-crit 20
OK: ActiveMQ random_generator has 1 messages

You can see there’s enough information about the specific queue to be able to graph the rate of messages, consumer counts and all sorts of other useful information. I also have a quick script that will return all this data in a format suitable for use by Cacti:

$ activemq-cacti-plugin.rb --host localhost --user nagios --password passw0rd --report exim.stats
size:0 dispatchCount:168951 memoryUsage:0 averageEnqueueTime:1629.42897052992 enqueueCount:168951 minEnqueueTime:0.0 consumerCount:1 producerCount:0 memoryPercentUsage:0 destinationName:queue://exim.stats messagesCached:0 memoryLimit:20971520 inflightCount:0 dequeueCount:168951 expiredCount:0 maxEnqueueTime:328585.0
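As mentioned, there is enough data here to derive message rates. A rough sketch of the idea – assuming the stats_to_hash helper sketched earlier and an existing subscription to /temp-topic/stats as in the first example – would be to sample enqueueCount twice and divide by the interval (in Cacti you would normally just let a COUNTER/DERIVE data source do this for you):

# Hypothetical helper: approximate messages/second flowing into a queue.
def enqueue_rate(stomp, queue, interval = 60)
  samples = []

  2.times do |i|
    stomp.publish("/queue/ActiveMQ.Statistics.Destination.#{queue}", "",
                  {"reply-to" => "/temp-topic/stats"})
    samples << stats_to_hash(stomp.receive.body)["enqueueCount"].to_f

    sleep interval if i == 0
  end

  (samples[1] - samples[0]) / interval
end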

Broker Statistics
Getting stats for the broker is more of the same: just send a message to /queue/ActiveMQ.Statistics.Broker and tell it where to reply to, and you’ll get a message back with these properties. I am only listing the ones not seen above; the meanings are the same, except that in the broker stats they are totals for all queues and topics.

storePercentUsage: Total percentage of storage used for all queues
storeLimit: Total storage space available
storeUsage: Storage space currently used
tempLimit: Total temporary space available
brokerId: Unique ID for this broker that you will see in Advisory messages
dataDirectory: Where the broker is configured to store its data for queue persistence etc
brokerName: The name this broker was given in its configuration file

Additionally there will be a value for each of your connectors listing its URL, including protocol and port.

Again I have a Cacti plugin to get these values out in a format usable in Cacti data sources:

$ activemq-cacti-plugin.rb --host localhost --user nagios --password passw0rd --report broker
stomp+ssl:stomp+ssl storePercentUsage:81 size:5597 ssl:ssl vm:vm://web3 dataDirectory:/var/log/activemq/activemq-data dispatchCount:169533 brokerName:web3 openwire:tcp://web3:6166 storeUsage:869933776 memoryUsage:1564 tempUsage:0 averageEnqueueTime:1623.90502285799 enqueueCount:174080 minEnqueueTime:0.0 producerCount:0 memoryPercentUsage:0 tempLimit:104857600 messagesCached:0 consumerCount:2 memoryLimit:20971520 storeLimit:1073741824 inflightCount:9 dequeueCount:169525 brokerId:ID:web3-44651-1280002111036-0:0 tempPercentUsage:0 stomp:stomp://web3:6163 maxEnqueueTime:328585.0 expiredCount:0

You can find the plugins mentioned above in my GitHub account.

In the same location is a generic checker that publishes a message and waits for its return within a specified number of seconds – a good turnaround test for your broker.
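The checker itself is in GitHub; as a hedged sketch of the idea only (the queue name and timeout are made up here), it amounts to subscribing to a test queue, publishing a uniquely tagged message to it and timing how long the broker takes to hand it back:

require 'timeout'

# Rough sketch: returns the round trip time in seconds, or nil on timeout.
def check_roundtrip(stomp, queue = "/queue/nagios.roundtrip", secs = 5)
  stomp.subscribe(queue)

  token = "check-#{Time.now.to_f}-#{$$}"
  start = Time.now
  stomp.publish(queue, token)

  Timeout::timeout(secs) do
    loop do
      return Time.now - start if stomp.receive.body == token
    end
  end
rescue Timeout::Error
  nil
end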

I don’t really have good templates to share but you can see a Cacti graph I built below with the above plugins.

Common Messaging Patterns Using Stomp – Part 4

This is an ongoing post in a series of posts about Middleware for Stomp users; please read parts 1, 2 and 3 of this series first before continuing below.

Back in Part 2 we wrote a little system to ship metrics from nodes into Graphite via Stomp. This solved the problem as stated then, but now let’s see what to do when our needs change.

Graphite is like RRD in that it summarizes data over time and eventually discards old data. Contrast that with OpenTSDB, which never summarizes or deletes data and can store billions of data points. Imagine we want to use Graphite as a short term reporting service for our data but we also need to store the data long term without losing any of it. So we really want to store the data in two locations.

We have a few options open to us:

  • Send the metric twice from every node, once to Graphite and once to OpenTSDB.
  • Write a software router that receives metrics on one queue and then routes them to two other queues in the middleware.
  • Use facilities internal to the middleware to do the routing for us

The first option is obviously a bad idea and should just be avoided – this would be the worst case scenario for data collection at scale. The third seems like the natural choice here, but first we need to understand the facilities the middleware provides. Today’s article will explore what ActiveMQ can do for you in this regard.

The second seems an odd fit, but as you’ll see below the capabilities for internal routing at the middleware layer aren’t all that exciting – useful in some cases, but I think most projects will reach for some kind of message router in code sooner or later.

Virtual Destinations
If you think back to part 2 you’ll remember we have a publisher that publishes data into a queue and any number of consumers that consumes the queue. The queue will load balance the messages for us thus helping us scale.

In order to also create OpenTSDB data we essentially need to double up the consumer side into 2 groups. Ideally each set of consumers will be scalable horizontally and the sets of consumers should both get a copy of all the data – in other words we need 2 queues with all the data in it, one for Graphite and one for OpenTSDB.

You will also remember that Topics have the behavior of duplicating data they receive to all consumers of the topics. So really what we want is to attach 2 queues to a single topic. This way the topic will duplicate the data and the queues will be used for the scalable consumption of the data.

ActiveMQ provides a feature called Virtual Topics that solves this exact problem by convention. You publish messages to a predictably named topic and then you can create any number of queues that will all get a copy of that message.

The image above shows the convention:

  • Publish to /topic/VirtualTopic.metrics
  • Create consumers for /queue/Consumer.Graphite.VirtualTopic.metrics

Create as many of the consumer queues as you want, changing Graphite for some unique name, and each of the resulting queues will behave like a normal queue with all the load balancing, storage and other queue-like behaviors – but all the queues will get a copy of all the data.
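In Stomp terms, and reusing the metrics example from part 2, the convention looks roughly like this (the metric line format is just for illustration):

# the producer publishes each metric once, to the virtual topic
stomp.publish("/topic/VirtualTopic.metrics", "load.web1 0.72 #{Time.now.to_i}")

# the Graphite consumers subscribe to their own queue...
stomp.subscribe("/queue/Consumer.Graphite.VirtualTopic.metrics")

# ...and the OpenTSDB consumers to theirs; both queues receive every metric
stomp.subscribe("/queue/Consumer.OpenTSDB.VirtualTopic.metrics")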

You can customize the name pattern of these queues by changing the ActiveMQ configuration files. I really like this approach to solving the problem versus approaches found in other brokers, since this is all done by convention and you do not need to change your code to set up a bunch of internal structures that describe the routing topology. I consider routing topology that lives in the code of the consumers to be a form of hard coding. Using this approach, all I need to do is make sure the names of the destinations to publish to and consume from are configurable strings.

Our Graphite consumer would not need to change other than the name of the queue it should read and ditto for the producer.

If we find that we simply cannot change the code for the consumers/producer, or if it just was not a configurable setting, you can still achieve this behavior by using something called Composite Destinations in ActiveMQ, which can describe this behavior purely in the config file with any arbitrarily named queues and topics.

Selective Consumers
Imagine we wish to give each one of our thousands of servers a unique destination on the middleware so that we can send machines a command directly. You could simply create queues like /queue/nodes.web1.example.com and keep creating queues per server.

The problem with this approach is that internally to ActiveMQ each queue is a thread. So you’d be creating thousands of threads – not ideal.

As we saw before in Part 3 messages can have headers – there we used the reply-to header. Below you’ll find some code that sets an arbitrary header:

stomp.publish("/queue/nodes", "service restart httpd", {"fqdn" => "web1.example.com"})

We are publishing a message with the text service restart httpd to a queue and we are setting a fqdn header.

Now, with the knowledge of Queues you have at this point, if every server in our estate subscribed to this one queue it would have the effect of sending the restart request to some random one of our servers – not ideal!

The JMS specification allows for something called selectors to be used when subscribing to a destination:

stomp.subscribe("/queue/nodes", {"selector" => "fqdn = 'web1.example.com'"})

The selector header sets the logic to apply to every message, which decides whether or not you get the message on your subscription. The selector language is defined using SQL-92 syntax and you can generally apply logic to any header in the message.

This way we set up a queue for all our servers without the overhead of 1000s of threads.

The choice for when to use a queue like this and when to use a traditional queue comes down to weighing up the overhead of evaluating all the SQL statements versus creating all the threads. There are also some side effects if you have a cluster of brokers – the queue traffic gets duplicated to all cluster brokers, whereas with traditional queues the traffic only gets sent to a broker if that broker actually has subscribers interested in this data.

So you need to carefully consider the implications and do some tests with your workload, message sizes, message frequencies, number of consumers and so on.

Conclusion
There is a 3rd option that combines these 2 techniques. You’d create queues sourcing from the topic based on JMS Selectors deciding what data hits what queue. You would set up this arrangement in the ActiveMQ config file.

This, as far as I am aware, covers all the major areas internal to ActiveMQ that you can use to apply some routing and duplication of messages.

These methods are useful and solve some problems, but as I pointed out they are not really that flexible. In a later part of this series I will look into software routers like Apache Camel and how to write your own.

From a technology choice point of view, future self is now thanking past self for building the initial metrics system using MOM: rather than go back to the drawing board when our needs changed, we were able to solve our problems by virtue of the fact that we built it on a flexible foundation using well known patterns – and without changing much, if any, actual code.

This series continue in part 5.

Common Messaging Patterns Using Stomp – Part 3

Yesterday I showed a detailed example of an Asynchronous system using MOM. Please read part 1 and part 2 of this series first before continuing below.

The system shown yesterday was Asynchronous since there is no coupling, no conversation or time constraints. The Producer does not know or care what happens to the messages once Published or when that happens. This is a kind of nirvana for distributed systems but sadly it’s just not possible to solve every problem using this pattern.

Today I’ll show how to use MOM technologies to solve a different kind of problem. Specifically I will show how large retailers scale their web properties using these technologies to create web sites that are more resilient to failure, easier to scale and easier to manage.

Imagine you are just hitting buy on some online retailer’s web page; perhaps they have implemented a 1-click buying system where the very next page is a Thank You page showing some details of your order and also recommendations of other things you might like. It would have some personalized menus and in some cases even a personalized look and feel.

By the time you see this page your purchase is not complete; it is still going on in the background, but you have a fast acknowledgement back and are immediately being enticed to spend more money on relevant products.

To achieve this in a PHP or Rails world you would typically have a page that runs top down and does things like generate your CSS, generate your personalized menu, then write some record into a database – perhaps for a system like delayed job to process the purchase later on – and finally do a bunch of SQL queries to find the related items.

This approach is very difficult to scale. All the hard work happens in your front controller; it has to be able to communicate with all the technology you choose in the backend, and you end up with a huge monolithic chunk of code that can rapidly become a nightmare. If you need more capacity to render the recommendations you have no choice but to scale up the entire stack.

The better approach is to decouple all of the bits needed to generate a web page. If you take the narrative above you would have small single-purpose services that do the following:

  • Take a 1-click order request, save it and provide an order number back. Start an Asynchronous process to fulfill the order.
  • Generate CSS for the custom look and feel for user X
  • Generate Menu for logged in user X
  • Generate recommendation for product Y based on browsing history for user X

Here we have 4 possible services that could exist on the network and that do not really relate to each other in any way. They are decoupled, do not share state with each other and can do their work in parallel independently from each other.

Your front controller would now become something that simply publishes requests to the MOM for each of the 4 services, providing just the information each service needs, and then waits for the responses. Once all 4 responses are received the page is assembled and rendered. If some response does not arrive in time you can fail gracefully – render a generic menu, or skip the recommendations and only show the Thank You text.

There are many benefits to this approach; I’ll highlight some I find compelling below:

  • You can scale each Service independently based on performance patterns – more order processors as this requires slower ACID writes into databases etc.
  • You can use different technology where appropriate. Your payment systems might be .Net while your CSS generation is in Node.JS and recommendations are in Java
  • Each system can be thought of as a simple standalone island with its own tech stack, monitoring etc, thinking about and scaling small components is much easier than a monolithic system
  • You can separate your payment processing from the rest of your network for PCI compliance by only allowing the middleware to connect into a private subnet where all actual credit information lives
  • Individual services can be upgraded or replaced with new ones much easier than in a monolithic system thus making the lives of Operations much better and lowering risk in ongoing maintenance of the system.
  • Individual services can be reused – the recommendation engine isn’t just an engine that gets called at the end of a sale but also while browsing through the store; the same service can serve both types of request

This pattern is often known as Request Response or similar terms. You should only use it when absolutely needed, as it increases coupling and effectively turns your service into a Synchronous system, but it does have its uses and advantages as seen above.

Sample Code
I’ll show two quick samples of how this conversation flow works in code and expand a bit into the details with regard to the ActiveMQ JMS Broker. The examples will just have the main working part of the code, not the bits that set up connections to the brokers and so on; look in part 2 for some of that.

My example will create a service that generates random numbers using OpenSSL; maybe you have some reason to create a very large number of these and you need to distribute the work across many machines so you do not run out of entropy.

As this is basically a Client / Server relationship I will use these terms. First the client part – the part that requests a random number from the server:

require 'timeout'

stomp.subscribe("/temp-queue/random_replies")
stomp.publish("/queue/random_generator", "random", {"reply-to" => "/temp-queue/random_replies"})

Timeout::timeout(2) do
   msg = stomp.receive

   puts "Got random number: #{msg.body}"
end

This is pretty simple; the only new thing here is that we first subscribe to a Temporary Queue that we will receive the responses on, and we send the request including this queue name. There is more detail on temp queues and temp topics below. The timeout part is important: you need it to be able to handle the case where all of the number generators have died or the service is just too overloaded to handle the request.

Here is the server part, it gets a request then generates the number and replies to the reply-to destination.

require 'openssl'

stomp.subscribe("/queue/random_generator")

loop do
   begin
      msg = stomp.receive

      number = OpenSSL::Random.random_bytes(8).unpack("Q").first

      stomp.publish(msg.headers["reply-to"], number)
   rescue
      puts "Failed to generate random number: #{$!}"
   end
end

You could start instances of this code on 10 servers and the MOM will load share the requests across the workers thus spreading out the available entropy across 10 machines.

When run this will give you nice big random numbers like 11519368947872272894. The web page in our example would follow a very similar process, only it would post requests to each of the services mentioned and then just wait for all the responses to come in before rendering them.
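As a rough sketch only – reusing the connected stomp object from the samples above, with made-up queue names, hypothetical helper methods and the correlation-id header that is covered at the end of this post – the front controller’s gather step could look like this:

require 'timeout'

services = ["order", "css", "menu", "recommendations"]
replies  = {}

stomp.subscribe("/temp-queue/page_replies")

# fan the requests out, tagging each with the service name so the replies
# can be told apart; each service is assumed to copy the correlation-id
# onto its reply
services.each do |service|
  stomp.publish("/queue/#{service}", request_for(service),   # request_for is hypothetical
                {"reply-to" => "/temp-queue/page_replies", "correlation-id" => service})
end

begin
  Timeout::timeout(2) do
    until replies.size == services.size
      msg = stomp.receive
      replies[msg.headers["correlation-id"]] = msg.body
    end
  end
rescue Timeout::Error
  # graceful failure: anything missing gets a generic fallback when rendering
end

render_page(replies)   # render_page is hypothetical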

Temporary Destination
The big thing here is that we’re using a Temporary Queue for the replies. The behavior of temporary destinations differs from broker to broker, and how the Stomp library needs to be used also changes. For ActiveMQ the behavior, special headers and so on can be seen in their docs.

When you subscribe to a temporary destination like the client code above, ActiveMQ internally sets up a queue that has your connection as an exclusive subscriber. Internally the name is something else entirely from what you gave it; it is unique and exclusive to you. Here is an example for a Temporary Queue set up on a remote broker:

/remote-temp-queue/ID:stomp1.us.xx.net-39316-1323647624072-3:3005:1

If you were to puts the contents of msg.headers["reply-to"] in the server code you would see the translated queue name as above. The broker does this transparently for you.

Other processes can write to this unique destination, but your connection is the only one able to consume messages from it. As soon as your connection closes or you unsubscribe from it the broker will free the queue and delete any messages on it, and anyone else trying to write to it will get an exception.

Temporary queues and this magical translation happen even across a cluster of brokers, so you can spread this out geographically and it will still work.

Setting up a temporary queue and informing a network of brokers about it is a costly process so you should always try to set up a temporary queue early on in the process life time and reuse it for all the work you have to do.

If you need to correlate responses to their requests then you should use the correlation-id header for that – set it on the request and when constructing the reply read it from the request and set it again on the new reply.
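Applied to the random number example above, that would look something like this (the id scheme here is made up):

# client side: tag the request with an id of our choosing
stomp.publish("/queue/random_generator", "random",
              {"reply-to" => "/temp-queue/random_replies",
               "correlation-id" => "req-#{Time.now.to_f}"})

# server side: copy the id from the request onto the reply
stomp.publish(msg.headers["reply-to"], number,
              {"correlation-id" => msg.headers["correlation-id"]})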

This series continue in part 4.