MCollective 1.0.1 demo using AWS CloudFormation

02/26/2011

Amazon is keeping things ticking along nicely by constantly adding features to their offerings. I’m almost more impressed by the pace and diversity of their innovation than by the final solutions.

During the week they announced AWS CloudFormation. Rather than add to the already unbearable tedium of “it’s not a Chef or Puppet killer” blog posts, I thought I’d just go ahead and do something with it.

Until now, people who wanted to evaluate MCollective had to go through a manual process: first start the ActiveMQ instance, gather some data about it, and then start a number of other instances, passing the ActiveMQ instance’s details in as user data. This was by no means painful, but CloudFormation can make it much better.

I’ve created a new demo using CloudFormation and recorded a video showing how to use it; you can read all about it here.

The demo has been upgraded to 1.0.1, the production MCollective release that came out recently. This collective has the same features as the previous demos: registration, the package agent and a few other bits.

Impact
Overall I think this is a very strong entry into the market; it still needs work, but it’s a step in the right direction. I dislike typing JSON about as much as I dislike typing XML, but this isn’t an Amazon problem to fix; that’s what frameworks and API clients are for.

It’s still markup aimed at machines, and a snippet like the following invites user error just as much as XML does:

 "UserData" : {
     "Fn::Base64" : {
         "Fn::Join" : [ ":", [
             "PORT=80",
             "TOPIC=", { "Ref" : "logical name of an AWS::SNS::Topic resource" },
             "ACCESS_KEY=", { "Ref" : "AccessKey" },
             "SECRET_KEY=", { "Ref" : "SecretKey" }
         ] ]
     }
 },
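
To see why this is so easy to get wrong, it helps to evaluate the intrinsic functions by hand. Below is a minimal Python sketch, assuming only Ref, Fn::Join and Fn::Base64 need handling; the evaluate helper, the MyTopic logical name and all parameter values are hypothetical stand-ins, not anything CloudFormation provides. Note that the ":" delimiter lands between every element, including straight after "TOPIC=", and the SNS topic ARN that Ref returns itself contains colons, so the resulting string is awkward to split back apart.

```python
import base64


def evaluate(node, params):
    """Evaluate a tiny subset of CloudFormation intrinsics locally.

    Supports only {"Ref": name}, {"Fn::Join": [delimiter, [parts]]} and
    {"Fn::Base64": value}; anything else is returned unchanged.
    """
    if isinstance(node, dict) and len(node) == 1:
        (key, value), = node.items()
        if key == "Ref":
            return params[value]
        if key == "Fn::Join":
            delimiter, parts = value
            return delimiter.join(evaluate(p, params) for p in parts)
        if key == "Fn::Base64":
            text = evaluate(value, params)
            return base64.b64encode(text.encode("utf-8")).decode("ascii")
    return node


# Hypothetical logical names and values, for illustration only.
user_data = {
    "Fn::Join": [":", [
        "PORT=80",
        "TOPIC=", {"Ref": "MyTopic"},
        "ACCESS_KEY=", {"Ref": "AccessKey"},
        "SECRET_KEY=", {"Ref": "SecretKey"},
    ]]
}
params = {
    "MyTopic": "arn:aws:sns:us-east-1:123456789012:demo",
    "AccessKey": "AKID",
    "SecretKey": "SECRET",
}
print(evaluate(user_data, params))
# PORT=80:TOPIC=:arn:aws:sns:us-east-1:123456789012:demo:ACCESS_KEY=:AKID:SECRET_KEY=:SECRET
```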

CloudFormation represents a great opportunity for framework builders like Puppet Labs and Opscode, as it can enhance their offerings a long way. This is especially true for Puppet, where a platform-wide view is desperately needed, not to mention basic cloud integration.

Tools like Fog and its peers will no doubt support this feature soon, and so will knife as a side effect.

Issues
I have a few issues with the current offering. It feels a bit like a first iteration, and I have no doubt things will improve. The issues are fairly simple ones, and I am really surprised they weren’t addressed in the first release.

Fn::GetAtt is too limited
You can access properties of other resources using the Fn::GetAtt function. For instance, say you created an RDS database and need its endpoint address:

"Fn::GetAtt" : [ "MyDB" , "Endpoint.Address"]

This is pretty great, but unfortunately the list of attributes it can access is extremely limited.

For example, given an EC2 instance you might want to find out its private IP address: what if it’s offering an internal service, like my ActiveMQ instance does for MCollective? You can’t get at this; the only attribute available is the public IP address. That is a problem, because if you talk to the public address you get billed for the traffic, even between your own instances! So you have to do a PTR lookup on the IP and then either use the public DNS name or do another forward lookup and rely on Amazon’s split-horizon DNS to direct you to the private IP. This is unfortunate, since it adds a lot of error-prone steps to the process.
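
The workaround described above can be sketched with Python’s standard socket module. This is a minimal sketch assuming the code runs on an EC2 instance (outside EC2 the forward lookup simply hands back the public IP again); private_address is a hypothetical helper, and the resolver arguments exist only so the two lookups can be swapped out for testing. Either lookup can raise socket.herror or socket.gaierror, which is exactly the kind of extra failure point complained about here.

```python
import socket
from typing import Callable, List, Tuple


def private_address(
    public_ip: str,
    reverse: Callable[[str], Tuple[str, List[str], List[str]]] = socket.gethostbyaddr,
    forward: Callable[[str], str] = socket.gethostbyname,
) -> str:
    """Turn an EC2 public IP into the address other instances should use.

    Step 1: a PTR lookup of the public IP yields the public DNS name.
    Step 2: a forward lookup of that name. Inside EC2 the split-horizon
    DNS answers with the private IP; outside EC2 you get the public IP.
    """
    public_name, _aliases, _addresses = reverse(public_ip)
    return forward(public_name)
```

On an instance you would call private_address with the public IP that Fn::GetAtt exposes; with the default arguments both lookups go to the real resolver.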

The same limitation applies to more or less every resource type CloudFormation can manage. Hopefully this will improve rapidly.

Lack of options
When creating a stack, it seems obvious that you might want to create, say, two webservers. At the moment, unless I am completely blind and missed something, you have to specify each instance in full rather than set a simple property telling it to make multiple copies of the same resource.

This seems a very obvious omission; it’s such a big one that I am sure I just missed something in the documentation.
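
Until something like a count property exists, one option is to generate the repetitive JSON rather than type it, which is what frameworks and API clients are for. A minimal Python sketch, assuming the template is held as plain dicts; replicate, the WebServer logical name, the AMI ID and the instance type are illustrative placeholders. It does not rewrite Refs that point at the original logical name, which a real generator would have to handle.

```python
import copy
import json


def replicate(resources, name, count):
    """Return a new resources dict in which `name` is replaced by `count`
    independent copies named name1..nameN.

    Refs elsewhere in the template that point at `name` are NOT rewritten.
    """
    original = resources[name]
    result = {key: value for key, value in resources.items() if key != name}
    for i in range(1, count + 1):
        # deepcopy so that editing one clone later does not change the others
        result[f"{name}{i}"] = copy.deepcopy(original)
    return result


# Hypothetical resource definition; the AMI ID is a placeholder.
resources = {
    "WebServer": {
        "Type": "AWS::EC2::Instance",
        "Properties": {"ImageId": "ami-12345678", "InstanceType": "m1.small"},
    }
}
template = {"Resources": replicate(resources, "WebServer", 2)}
print(json.dumps(template, indent=2))
```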

Some stuff just doesn’t work
I’ll preface this by again saying I might just not be getting something. You’re supposed to be able to create a Security Group that allows all machines in another Security Group to communicate with it.

The documentation says you can set SourceSecurityGroupName and then you don’t have to set SourceSecurityGroupOwnerId.

Unfortunately, try as I might, I can’t get it to stop complaining that SourceSecurityGroupOwnerId isn’t set when I set SourceSecurityGroupName. That’s just crazy talk, since there’s no way to look up the current owner ID through any GetAtt property.

Additionally, it claims the FromPort and ToPort properties are compulsory, but the documentation for the individual APIs says you cannot set those if you set SourceSecurityGroupName.

For the purposes of my demo I’ve given up on making the security groups the way I want them, but I am fairly sure this is a bug.

Slow
If you have a Stack with 10 resources, it will do some ordering based on your Refs; for example, if 5 instances require the public IP of a 6th, it will create that 6th one first.

Unfortunately, even when there is no reason for things to happen in a particular order, it just sits there creating the resources one by one, serially, rather than starting the independent requests in parallel.

Even with a moderately complex Stack this can be very tedious: starting a 6 node Stack can easily take 15 minutes. It also shuts the stack down serially, so just booting and shutting down 6 nodes wastes 30 minutes!
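
Some back-of-the-envelope arithmetic shows what parallel creation could save. The Python sketch below assumes every resource takes the same 2.5 minutes (15 minutes over 6 nodes, taken from the numbers above) and models the example where 5 instances Ref a 6th; creation_times and the resource names are hypothetical. It orders resources with graphlib.TopologicalSorter (Python 3.9+), then compares a one-by-one total with an idealised schedule where each resource starts as soon as everything it refers to exists. This models what could be possible, not how CloudFormation actually schedules work.

```python
from graphlib import TopologicalSorter


def creation_times(deps, minutes):
    """Compare serial creation time with an idealised parallel schedule.

    deps:    resource -> set of resources it Refs (must exist first)
    minutes: resource -> minutes needed to create it
    Returns (creation order, serial total, parallel total).
    """
    order = list(TopologicalSorter(deps).static_order())
    serial = sum(minutes[r] for r in order)
    finish = {}
    for r in order:
        # static_order() yields every dependency before its dependents,
        # so finish[d] is already known here.
        start = max((finish[d] for d in deps.get(r, ())), default=0)
        finish[r] = start + minutes[r]
    return order, serial, max(finish.values())


# Five instances that all need the ActiveMQ node's public IP.
deps = {f"Node{i}": {"ActiveMQ"} for i in range(1, 6)}
minutes = {name: 2.5 for name in ["ActiveMQ", *deps]}
order, serial, parallel = creation_times(deps, minutes)
print(order[0], serial, parallel)  # ActiveMQ 15.0 5.0
```

Under these assumptions the dependency depth, not the node count, bounds the wait: two "waves" of 2.5 minutes instead of six.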

There’s definitely room for improvement here. I’d like to give developers a self-service environment, and they won’t enjoy sitting and waiting for half an hour before they can get going.

Shoddy Documentation
I’ve come across a number of documentation inconsistencies that can be really annoying: little things like PublicIP vs PublicIp that turn writing the JSON files into an error-prone cycle of try, try and try again. This is a very easy thing to fix, and it’s only worth mentioning because of the next point.

Given how slow it is to create and tear down stacks, getting something like this wrong can really hurt your productivity.
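
Because each round trip is so expensive, a tiny local check for this class of typo can pay for itself. A minimal Python sketch, assuming you keep your own list of attribute spellings you have confirmed work: find_case_mismatches is hypothetical, and the {"PublicIp"} set in the example is an assumption for illustration rather than an authoritative list of valid CloudFormation attributes. It only flags Fn::GetAtt names that differ from a known spelling by letter case.

```python
def find_case_mismatches(node, known, path="$"):
    """Walk a parsed template and report Fn::GetAtt attribute names that
    differ from a known-good spelling only by letter case."""
    lowered = {name.lower(): name for name in known}
    problems = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "Fn::GetAtt" and isinstance(value, list) and len(value) == 2:
                attr = value[1]
                expected = lowered.get(attr.lower())
                if expected is not None and expected != attr:
                    problems.append(f"{path}: {attr!r} should probably be {expected!r}")
            problems.extend(find_case_mismatches(value, known, f"{path}.{key}"))
    elif isinstance(node, list):
        for i, item in enumerate(node):
            problems.extend(find_case_mismatches(item, known, f"{path}[{i}]"))
    return problems


# Hypothetical template fragment with a case typo in the attribute name.
snippet = {"Outputs": {"BrokerIP": {"Value": {"Fn::GetAtt": ["ActiveMQ", "PublicIP"]}}}}
print(find_case_mismatches(snippet, known={"PublicIp"}))
# ["$.Outputs.BrokerIP.Value: 'PublicIP' should probably be 'PublicIp'"]
```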

The AWS Console
Given that the docs are a bit poor, you’ll be spending some time in the AWS console or the CLI tools. I didn’t try the CLI tools in this case, but the console is not great at all.

I am seeing weird behaviour where I upload a new template into a new Formation and it picks up an older one at first. Given how it works (it saves the JSON to a bucket under your name, each file with a unique name), I chalked this up to user error. But then I tried harder not to mess it up, and it does seem to keep happening.

I’m still not quite ready to blame the AWS console for this, though; it might be browser caching or something else. Either way, it makes the whole thing a lot more frustrating to use.

What I am 100% ready to blame it for is the general unfriendliness of its error messages, and I am guessing the feedback you’d get from the API is equally bad:

"Invalid template resource property IP"

Needless to say, the string ‘IP’ appears nowhere in my template.

When I eventually tracked it down, it turned out to be a user error on my side combined with the caching mentioned above: the console was working on an old JSON file. I didn’t spot it because it wasn’t clear that it was using an old file rather than the one I had just uploaded through my browser.

So my recommendation is simply not to use the AWS console while building stacks. It’s an end-user tool and excels at that; when building stacks, use the CLI, as it includes tools to validate the JSON locally and avoids annoying browser caches.
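
If you are stuck with the console, or just want a faster first pass, even a plain JSON syntax check before uploading catches a lot. The Python sketch below is not the CLI validation tooling mentioned above and does not replace AWS’s own checks; check_template and the check_template.py file name are hypothetical, and it only verifies that the file parses as JSON and has a non-empty top-level Resources object, reporting the line and column of any syntax error.

```python
import json
import sys


def check_template(text):
    """Cheap local sanity check to run before uploading a template.

    Returns a list of problems; an empty list means the text is valid JSON
    whose top level is an object with a non-empty Resources section.
    It does not check resource types, properties or attribute names.
    """
    try:
        doc = json.loads(text)
    except json.JSONDecodeError as err:
        return [f"line {err.lineno}, column {err.colno}: {err.msg}"]
    if not isinstance(doc, dict):
        return ["template must be a JSON object"]
    resources = doc.get("Resources")
    if not isinstance(resources, dict) or not resources:
        return ["missing or empty Resources section"]
    return []


if __name__ == "__main__":
    # Usage: python check_template.py stack.json [more.json ...]
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as fh:
            problems = check_template(fh.read())
        for problem in problems:
            print(f"{path}: {problem}")
        if not problems:
            print(f"{path}: OK")
```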