Choria Network Protocols – Transport

The old MCollective protocols are now ancient and was quite Ruby slanted – full of Symbols and used YAML and quite language specific – in Choria I’d like to support other Programming Languages, REST gateways and so forth, so a rethink was needed.

I’ll look at the basic transport protocol used by the Choria NATS connector, usually it’s quite unusual to speak of Network Protocols when dealing with messages on a broker but really for MCollective it is exactly that – a Network Protocol.

The messages need enough information for strong AAA, they need to have an agreed on security structure and within them live things like RPC requests. So a formal specification is needed which is exactly what a Protocol is.

While creating Choria the entire protocol stack has been redesigned on every level except the core MCollective messages – Choria maintains a small compatibility layer to make things work. To really achieve my goal I’d need to downgrade MCollective to pure JSON data at which point multi language interop should be possible and easy.

Networks are Onions

Network protocols tend to come in layers, one protocol within another within another. The nearer you go to the transport the more generic it gets. This is true for HTTP within TCP within IP within Ethernet and likewise it’s true for MCollective.

Just like for TCP/IP and HTTP+FTP one MCollective network can carry many protocols like the RPC one, a typical MCollective install uses 2 protocols at this inner most layer. You can even make your own, the entire RPC system is a plugin!

( middleware protocol
  ( transport packet that travels over the middleware
      ( security plugin internal representation
        ( mcollective core representation that becomes M::Message
          ( MCollective Core Message )
          ( RPC Request, RPC Reply )
          ( Other Protocols, .... )
        )
      )
    )
  )
)

Here you can see when you do mco rpc puppet status you’ll be creating a RPC Request wrapped in a MCollective Message, wrapped in a structure the Security Plugin dictates, wrapped in a structure the Connector Plugin dictates and from there to your middleware like NATS.

Today I’ll look at the Transport Packet since that is where Network Federation lives which I spoke about yesterday.

Transport Layer

The Transport Layer packets are unauthenticated and unsigned, for MCollective security happens in the packet carried within the transport so this is fine. It’s not inconceivable that a Federation might only want to route signed messages and it’s quite easy to add later if needed.

Of course the NATS daemons will only accept TLS connections from certificates signed by the CA so these network packets are encrypted and access to the transport medium is restricted, but the JSON data you’ll see below is sent as is.

In all the messages shown below you’ll see a seen-by header, this is a feature of the NATS Connector Plugin that records the connected NATS broker, we’ll soon expose this information to MCollective API clients so we can make a traceroute tool for Federations. This header is optional and off by default though.

I’ll show messages in Ruby format here but it’s all JSON on the wire.

Message Targets

First it’s worth knowing where things are sent on the NATS clusters. The targets used by the NATS connector is pretty simple stuff, there will no doubt be scope for improvement once I look to support NATS Streaming but for now this is adequate.

Broadcast Request for agent puppet in the mycorp sub collective – mycorp.broadcast.agent.puppet
Directed Request to a node for any agent in the mycorp sub collective – mycorp.node.node1.example.net
Reply to a node identity dev1.example.net with pid 9999 and a message sequence of 10 – mycorp.reply.node1.example.net.9999.10

As the Federation Brokers are independent of Sub Collectives they are not prefixed with any collective specific token:

Requests from a Federation Client to a Federation Broker Cluster called production – choria.federation.production.federation queue group production_federation
Replies from the Collective to a Federation Broker Cluster called production – choria.federation.production.collective queue group production_collective
production cluster Federation Broker Instances publishes statistics – choria.federation.production.stats

These names are designed so that in smaller setups or in development you could use a single NATS cluster with Federation Brokers between standalone collectives. Not really a recommended thing but it helps in development.

Unfederated Messages

Your basic Unfederated Message is pretty simple:

{
  "data" => "... any text ...",
  "headers" => {
    "mc_sender" => "dev1.example.net",
    "seen-by" => ["dev1.example.net", "nats1.example.net"],
    "reply-to" => "mcollective.reply.dev1.example.net.999999.0",
  }
}

it’s is a discovery request within the sub collective mcollective and would be published to mcollective.broadcast.agent.discovery.
it is sent from a machine identifying as dev1.example.net
we know it’s traveled via a NATS broker called nats1.example.net.
responses to this message needs to travel via NATS using the target mcollective.reply.dev1.example.net.999999.0.

The data is completely unstructured as far as this message is concerned it just needs to be some text, so base64 encoded is common. All the transport care for is getting this data to its destination with metadata attached, it does not care what’s in the data.

The reply to this message is almost identical:

{
  "data" => "... any text ...",
  "headers" => {
    "mc_sender" => "dev2.example.net",
    "seen-by" => ["dev1.example.net", "nats1.example.net", "dev2.example.net", "nats2.example.net"],
  }
}

This reply will travel via mcollective.reply.dev1.example.net.999999.0, we know that the node dev2.example.net is connected to nats2.example.net.

We can create a full traceroute like output with this which would show dev1.example.net -> nats1.example.net -> nats2.example.net -> dev2.example.net

Federated Messages

Federation is possible because MCollective will just store whatever Headers are in the message and put them back on the way out in any new replies. Given this we can embed all the federation metadata and this metadata travels along with each individual message – so the Federation Brokers can be entirely stateless, all the needed state lives with the messages.

With Federation Brokers being clusters this means your message request might flow over a cluster member a but the reply can come via b – and if it’s a stream of replies they will be load balanced by the members. The Federation Broker Instances do not need something like Consul or shared store since all the data needed is in the messages.

Lets look at the same Request as earlier if the client was configured to belong to a Federation with a network called production as one of its members. It’s identical to before except the federation structure was added:

{
  "data" => "... any text ...",
  "headers" => {
    "mc_sender" => "dev1.example.net",
    "seen-by" => ["dev1.example.net", "nats1.fed.example.net"],
    "reply-to" => "mcollective.reply.dev1.example.net.999999.0",
    "federation" => {
       "req" => "68b329da9893e34099c7d8ad5cb9c940",
       "target" => ["mcollective.broadcast.agent.discovery"]
    }
  }
}

it’s is a discovery request within the sub collective mcollective and would be published via a Federation Broker Cluster called production via NATS choria.federation.production.federation.
it is sent from a machine identifying as dev1.example.net
it’s traveled via a NATS broker called nats1.fed.example.net.
responses to this message needs to travel via NATS using the target mcollective.reply.dev1.example.net.999999.0.
it’s federated and the client wants the Federation Broker to deliver it to it’s connected Member Collective on mcollective.broadcast.agent.discovery

The Federation Broker receives this and creates a new message that it publishes on it’s Member Collective:

{
  "data" => "... any text ...",
  "headers" => {
    "mc_sender" => "dev1.example.net",
    "seen-by" => [
      "dev1.example.net",
      "nats1.fed.example.net", 
      "nats2.fed.example.net", 
      "fedbroker_production_a",
      "nats1.prod.example.net"
    ],
    "reply-to" => "choria.federation.production.collective",
    "federation" => {
       "req" => "68b329da9893e34099c7d8ad5cb9c940",
       "reply-to" => "mcollective.reply.dev1.example.net.999999.0"
    }
  }
}

This is the same message as above, the Federation Broker recorded itself and it’s connected NATS server and produced a message, but in this message it intercepts the replies and tell the nodes to send them to choria.federation.production.collective and it records the original reply destination in the federation header.

A node that replies produce a reply, again this is very similar to the earlier reply except the federation header is coming back exactly as it was sent:

{
  "data" => "... any text ...",
  "headers" => {
    "mc_sender" => "dev2.example.net",
    "seen-by" => [
      "dev1.example.net",
      "nats1.fed.example.net", 
      "nats2.fed.example.net", 
      "fedbroker_production_a", 
      "nats1.prod.example.net",
      "dev2.example.net",
      "nats2.prod.example.net"
    ],
    "federation" => {
       "req" => "68b329da9893e34099c7d8ad5cb9c940",
       "reply-to" => "mcollective.reply.dev1.example.net.999999.0"
    }
  }
}

We know this node was connected to nats1.prod.example.net and you can see the Federation Broker would know how to publish this to the client – the reply-to is exactly what the Client initially requested, so it creates:

{
  "data" => "... any text ...",
  "headers" => {
    "mc_sender" => "dev2.example.net",
    "seen-by" => [
      "dev1.example.net",
      "nats1.fed.example.net", 
      "nats2.fed.example.net", 
      "fedbroker_production_a", 
      "nats1.prod.example.net",
      "dev2.example.net",
      "nats2.prod.example.net",
      "nats3.prod.example.net",
      "fedbroker_production_b",
      "nats3.fed.example.net"
    ],
  }
}

Which gets published to mcollective.reply.dev1.example.net.999999.0.

Route Records

You noticed above there’s a seen-by header, this is something entirely new and never before done in MCollective – and entirely optional and off by default. I anticipate you’d want to run with this off most of the time once your setup is done, it’s a debugging aid.

As NATS is a full mesh your message probably only goes one hop within the Mesh. So if you record the connected server you publish into and the connected server your message entered it’s destination from you have a full route recorded.

The Federation Broker logs and MCollective Client and Server logs all include the message ID so you can do a full trace in message packets and logs.

There’s a PR against MCollective to expose this header to the client code so I will add something like mco federation trace some.node.example.net which would send a round trip to that node and tell you exactly how the packet travelled. This should help a lot in debugging your setups as they will now become quite complex.

The structure here is kind of meh and I will probably improve on it once the PR in MCollective lands and I can see what is the minimum needed to do a full trace.

By default I’ll probably record the identities of the MCollective bits when Federated and not at all when not Federated. But if you enable the setting to record the full route it will produce a record of MCollective bits and the NATS nodes involved.

In the end though from the Federation example we can infer a network like this:

Federation NATS Cluster

Federation Broker production_a -> nats2.fed.example.net
Federation Broker production_b -> nats3.fed.example.net
Client dev1.example.net -> nats1.fed.example.net

Production NATS Cluster:

Federation Broker production_a -> nats1.prod.example.net
Federation Broker production_b -> nats3.prod.example.net
Server dev2.example.net -> nats2.prod.example.net

We don’t know the details of all the individual NATS nodes that makes up the entire NATS mesh but this is good enough.

Of course this sample is the pathological case where nothing is connected to the same NATS instances anywhere. In my tests with a setup like this the overhead added across 10 000 round trips against 3 nodes – so 30 000 replies through 2 x Federation Brokers – was only 2 seconds, I couldn’t reliably measure a per message overhead as it was just too small.

The NATS gem do expose the details of the full mesh though since NATS will announce it’s cluster members to clients, I might do something with that not sure. Either way, auto generated network maps should be totally possible.

Conclusion

So this is how Network Federation works in Choria. It’s particularly nice that I was able to do this without needing any state on the cluster thanks to past self making good design decisions in MCollective.

Once the seen-by thing is figured out I’ll publish JSON Schemas for these messages and declare protocol versions.

I can probably make future posts about the other message formats but they’re a bit nasty as MCollective itself is not yet JSON safe, the plan is it would become JSON safe one day and the whole thing will become a lot more elegant. If someone pings me for this I’ll post it otherwise I’ll probably stop here.