{"id":3597,"date":"2017-03-20T12:05:53","date_gmt":"2017-03-20T11:05:53","guid":{"rendered":"https:\/\/www.devco.net\/?p=3597"},"modified":"2017-03-21T23:16:31","modified_gmt":"2017-03-21T22:16:31","slug":"choria-network-federation","status":"publish","type":"post","link":"https:\/\/www.devco.net\/archives\/2017\/03\/20\/choria-network-federation.php","title":{"rendered":"Choria Network Federation"},"content":{"rendered":"

Running large or distributed MCollective networks has always been a pain. As much as middleware is an enabler, it starts actively working against you as you grow and as latency increases; this is felt especially in geographically distributed networks.<\/p>\n

Federation has been discussed often in the past but nothing ever happened. NATS ended up forcing my hand because it only supports a full mesh mode, something that would not be suitable for a globe-spanning network.<\/p>\n

Overview<\/H3>
\nI spent the last week or two first building Federation into the Choria network protocol and later adding a Federation Broker. Federation can be used to connect entirely separate collectives into one from the perspective of a client.<\/p>\n

[Diagram: a Federation of Collectives spanning London, Tokyo and New York, bridged into a central Federation network]<\/center><\/p>\n

Here we can see a distributed Federation of Collectives<\/strong>. Effectively London<\/em>, Tokyo<\/em> and New York<\/em> are entirely standalone collectives. They are smaller, they have their own middleware infrastructure, they even function just like a normal collective and can have clients communicating with those isolated collectives like always.<\/p>\n

I set up 5 node NATS meshes in every region. We then add a Federation Broker cluster that provides bridging services to a central Federation network. I’d suggest running one Federation Broker instance on each of your NATS nodes, but you can run as many as you like.<\/p>\n

Correctly configured Clients that connect to the central Federation network will interact with all the isolated collectives as if they are one. All current MCollective features keep working and Sub Collectives can span the entire Federation.<\/p>\n

Impact<\/H3>
\nThere are obvious advantages in large networks – instead of one giant 100 000 node middleware network you now build 10 x 10 000 node networks, something that is a lot easier to do. With NATS, it’s more or less trivial.<\/p>\n

Not so obvious is how this scales with respect to MCollective. MCollective has a mode called Direct Addressing where the client needs to create 1 message for every node targeted in the request. Generally very large requests are discouraged, so this works OK. <\/p>\n

Each of these messages, created on the client, ends up having to travel individually all across the globe, and this is where it starts to hurt.<\/p>\n

With Federation the client divides the task of producing these per-node messages into groups of 200 and passes the request to the Federation Broker Cluster. The cluster then, in a load-shared fashion, does the work for the client. <\/p>\n

Since the Federation Brokers tend to be near the individual Collectives this yields a massive reduction in client work and network traffic. The Federation Broker Instances are entirely state free, so you can run as many as you like and they will share the workload more or less evenly across them.<\/p>\n
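The batching step described above is simple to picture in code. This is a minimal sketch, not Choria's actual implementation; the function and node names are illustrative only, and the assumed batch size of 200 comes from the text above.

```python
# Sketch of how a federated client might split the per-node messages of a
# Direct Addressing request into batches for the Federation Broker Cluster,
# rather than publishing one message per node across the globe itself.
# Names here are hypothetical, not Choria's real API.

def chunk_targets(targets, batch_size=200):
    """Split the target node list into batches of at most batch_size."""
    return [targets[i:i + batch_size] for i in range(0, len(targets), batch_size)]

# 450 hypothetical target nodes
targets = ["node%d.example.net" % i for i in range(450)]
batches = chunk_targets(targets)

print(len(batches))                       # 3 batches
print(len(batches[0]), len(batches[-1]))  # 200 and 50
```

Each batch then travels as a single federated message; the broker instances near the destination collectives expand it into the per-node messages, so the expensive fan-out happens close to the targets instead of at the client.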

\r\n$ mco federation observe --cluster production\r\nFederation Broker: production\r\n\r\nFederation\r\n  Totals:\r\n    Received: 1024  Sent: 12288\r\n\r\n  Instances:\r\n    1: Received: 533 (52.1%) Sent: 6192 (50.4%)\r\n    2: Received: 491 (47.9%) Sent: 6096 (49.6%)\r\n<\/pre>\n

Above you can see the client offloading the work onto a Federation Broker with 2 cluster members. The client sent 1024 messages but the broker sent 12288 messages on the client’s behalf. The 2 instances do a reasonable job of sharing the load of creating and federating the messages between them.<\/p>\n
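As a quick sanity check, a few lines of arithmetic reproduce the percentages in the output above and show the fan-out the broker performs on the client's behalf. The interpretation of sent/received as an average fan-out per federated message is my reading of this particular test run, not a fixed property of the broker.

```python
# Reproduce the statistics from the `mco federation observe` output above.
received = {1: 533, 2: 491}  # messages each broker instance received
sent = {1: 6192, 2: 6096}    # messages each instance sent onward

total_received = sum(received.values())  # 1024, matching what the client sent
total_sent = sum(sent.values())          # 12288, sent on the client's behalf

# On average each federated message was expanded into this many
# per-node messages by the broker in this run:
fan_out = total_sent // total_received
print(fan_out)

# Per-instance share of the received load, as percentages:
for instance, count in received.items():
    print(instance, round(100 * count / total_received, 1))
```

The per-instance shares come out at 52.1% and 47.9%, matching the broker's own report, with a fan-out of 12 targets per federated message in this run.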

In my tests against large collectives this sped up requests significantly and greatly reduced the client load.<\/p>\n

In the simple broadcast case there is no speed-up, but when doing 10 000 requests in a loop the overhead of Federation was about 2 seconds over the 10 000 requests, roughly 0.2 ms each – so hardly noticeable.<\/p>\n

Future Direction<\/H3>
\nThe Choria protocol supports Federation in a way that is not tied to its specific Federation Broker implementation. The basic POC Federation Broker was around 200 lines so not really a great challenge to write. <\/p>\n

I imagine in time we might see a few options here:<\/p>\n