Some more about my continuing experiences with Ruby. In my last post I said that the language does what you’d expect, and that, as my next post would show, spending a week with it on and off is enough to write a capable multi-threaded socket server.
As it turns out, I quickly lived to regret saying that. Soon after I hit publish I started running into problems with that very same socket server.
A bit of background: Adobe has changed how cross-domain authorization works, moving away from their previous crossdomain.xml file served over HTTP to a new model that requires you to run a special TCP server on port 843 serving up XML over its own protocol. I won’t go into how brain dead I think this is; suffice to say I needed to run one of those for a client. Adobe of course does provide a server for this, but it has some issues. I chose the simplest of their examples, Perl under xinetd, and quickly discovered that it has no concept of timeouts, or of clients that don’t speak its protocol. The end result is an ever-growing number of Perl processes sitting around waiting for ages.
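The protocol itself is tiny: the Flash client sends the string <policy-file-request/> followed by a null byte, and the server answers with the policy XML, also terminated by a null byte. Here is a minimal Ruby sketch of one such exchange with the timeout the Perl example lacks. The policy contents, the 10 second default and the names serve_policy and POLICY_XML are illustrative assumptions, not Adobe's code or flashpolicyd's. It works on any connected IO, for example a socket returned by TCPServer#accept.

```ruby
require 'socket'

# The exact request a Flash client sends, including the trailing null byte.
POLICY_REQUEST = "<policy-file-request/>\0"

# Example policy only (an assumption): allows every domain on every port.
POLICY_XML = '<?xml version="1.0"?>' \
             '<cross-domain-policy>' \
             '<allow-access-from domain="*" to-ports="*"/>' \
             '</cross-domain-policy>'

# Serves one policy exchange on an already-connected IO and returns a symbol
# describing what happened. A client that sends nothing for `timeout` seconds
# is given up on instead of being waited for forever.
def serve_policy(io, policy_xml = POLICY_XML, timeout = 10)
  buffer = String.new
  while buffer.bytesize < POLICY_REQUEST.bytesize
    # IO.select returns nil when nothing arrives within the timeout.
    return :timeout unless IO.select([io], nil, nil, timeout)
    begin
      buffer << io.readpartial(POLICY_REQUEST.bytesize - buffer.bytesize)
    rescue EOFError
      return :closed # peer hung up before finishing the request
    end
  end
  return :bogus unless buffer == POLICY_REQUEST
  io.write(policy_xml + "\0") # the reply is null-terminated as well
  :served
end
```

Returning a symbol rather than raising makes it easy to count the :timeout, :bogus and :closed cases, which are exactly the clients the Perl version left hanging around.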
I took this as a challenge to write something real in Ruby, using it as a learning experience as well, so I set out to write a multi-threaded server for this. At first glance it looks almost laughably trivial: the Ruby standard library includes GServer, a very nice class that does the hard work of thread management for you. You just inherit from it, supply the logic for your protocol and let it do the rest. Awesome.
I wrote this, put in logging, option parsing and all the various bits I needed, and tested it locally: 10 concurrent workers doing 200,000 requests, and it served them in no time at all with little CPU impact. I then wrote RC scripts, config files and so on, and deployed it at my client.
Very soon after deploying it I noticed the wheels coming off a bit. Out of curiosity, I put in some periodic logging that prints lines like:
Jun 23 08:23:37 xmpp1 flashpolicyd: Had 10042 clients of which 285 were bogus. Uptime 0 days 14 hours 2 min. 23 client(s) connected now.
Notice how that line claims 23 connections at present? That’s complete b/s. I added the ability to dump the threads actually created, and there just weren’t enough threads for 23 clients; the TCP stack agreed. It turns out GServer has issues handling bad network connections (my clients are on GPRS, modems and all sorts), and it seems threads die without GServer removing them from its list of active connections.
This would be a small problem, except that GServer uses the connection count to decide whether you’ve hit its max connections setting. I could just set that to some huge figure, but the miscount indicates there’s a memory leak: the array grows forever. Not to mention it leaves me with a bad taste in my mouth over the quality of my new and improved solution.
Naturally I gave up on GServer. I didn’t feel like installing all sorts of gems on the servers, so I figured I’d just write my own thread handling. While it’s not trivial, it’s by far not the most complex thing I’ve ever done, and in this case I’m happy with a bit of wheel reinventing for the sake of learning.
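The core of rolling your own thread handling is making sure a connection is forgotten no matter how its thread dies, which is exactly where GServer’s count went wrong. Below is a minimal sketch of one way to do that, assuming a plain thread-per-connection design; ConnectionTracker and accept_loop are hypothetical names, not what flashpolicyd actually uses.

```ruby
require 'socket'

# Keeps an accurate list of live connection threads. Each worker removes
# itself in an `ensure` block, so threads that die from network errors or
# exceptions are still dropped from the list.
class ConnectionTracker
  def initialize
    @lock = Mutex.new
    @threads = []
  end

  # Starts a worker thread running the given block and registers it.
  def start(&work)
    # Holding the lock while creating and registering the thread means the
    # worker's cleanup cannot run before the thread has been added.
    @lock.synchronize do
      thread = Thread.new do
        begin
          work.call
        ensure
          @lock.synchronize { @threads.delete(Thread.current) }
        end
      end
      @threads << thread
      thread
    end
  end

  def count
    @lock.synchronize { @threads.size }
  end
end

# Accepts connections forever, refusing new ones while `max_connections`
# workers are alive. The handler block holds the protocol logic for one client.
def accept_loop(server, tracker, max_connections, &handler)
  loop do
    client = server.accept
    if tracker.count >= max_connections
      client.close # over the limit: refuse rather than queue
      next
    end
    tracker.start do
      begin
        handler.call(client)
      ensure
        client.close unless client.closed?
      end
    end
  end
end
```

Something like `accept_loop(TCPServer.new(843), ConnectionTracker.new, 200) { |client| ... }` would run it (the limit of 200 is arbitrary, and binding port 843 usually needs root). Because removal happens in `ensure`, the count can only drift if a thread is killed in a way that skips `ensure`, which ordinary exceptions and closed sockets do not do.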
I chose the Ruby standard library’s Logger for logging, and even added the ability to alter the log level on the fly by sending signals to the process, which is very nice. I was able to reuse much of the option parsing code etc. from my previous attempt, so this only took a few hours.
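Changing the log level with signals takes very little code. The sketch below assumes USR1 should make the log more verbose and USR2 quieter; which signals flashpolicyd really uses isn’t stated here, and the helper names are hypothetical.

```ruby
require 'logger'

# Hypothetical helpers: step one severity towards DEBUG (more output) or
# towards FATAL (less output), clamped at the ends of the range.
def more_verbose(level)
  [level - 1, Logger::DEBUG].max
end

def less_verbose(level)
  [level + 1, Logger::FATAL].min
end

# Assumed signal choice: USR1 = more verbose, USR2 = quieter. The handlers
# only assign the level and do not log, because Logger's write lock cannot
# be taken from inside a trap handler.
def install_level_signals(logger)
  Signal.trap("USR1") { logger.level = more_verbose(logger.level) }
  Signal.trap("USR2") { logger.level = less_verbose(logger.level) }
end
```

With that in place, `kill -USR1 <pid>` turns on more detailed output without restarting the server.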
I did the development on my Mac using TextMate, the really kick-arse Mac text editor that has nice Ruby support; the Mac is on Ruby 1.8.6. I intended to run this on RHEL 4 and 5, which have Ruby 1.8.1 and 1.8.5 respectively, so I was really setting myself up for problems all of my own making.
Turns out Logger has a bug, fixed in revision 6262 without any useful svn log message, that only bit me on the RHEL 4 machine. It opens the initial log correctly with line buffering enabled, but once it rotates the log, the new log and all subsequent ones lose line buffering. In my case that means log lines show up once every 5 hours!
This sucks a lot, and it’s unlikely that Red Hat will backport such a small fix. Since RHEL 4 will be around until 2012, I guess I’ll just have to patch it myself or move this server to RHEL 5, something I planned to do anyway.
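One possible stopgap, if neither patching nor upgrading can happen right away, is to bypass Logger’s own rotation entirely. This is an assumption on my part, not the fix from revision 6262: open the file yourself with sync enabled, hand the File to Logger without any rotation arguments, and let an external tool rotate it. The method name is hypothetical.

```ruby
require 'logger'

# Workaround sketch: open the log in append mode with sync enabled and give
# the File to Logger. No rotation arguments are passed, so Logger never
# reopens the file and the sync setting cannot be lost. Rotation is left to
# an external tool such as logrotate with its copytruncate option; append
# mode makes later writes land at the new end of the truncated file.
def build_line_buffered_logger(path)
  file = File.open(path, File::WRONLY | File::APPEND | File::CREAT)
  file.sync = true # every write goes straight to the OS, no buffering
  Logger.new(file)
end
```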
So something that should have been fairly trivial turned into a bit of a pain. It’s not really Ruby’s fault that I am using 1.8.1 when much newer versions are out, but it’s not nice regardless. At the end of it all, my Flash policy server is working really well, handling clients perfectly with no leaks or anything bad:
I, [2008-06-26T23:02:36.607920 #22532]  INFO -- : -604398464: Had 15611 clients of which 423 were bogus. Uptime 0 days 13 hours 41 min. 0 connection(s) in use now.
The bogus clients are ones that time out or otherwise never complete a request; these are the ones that used to trip up GServer.
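A statistics line in that format needs only two thread-safe counters and an uptime calculation. A minimal sketch follows; ServerStats and its methods are hypothetical, and a client counts as bogus under whatever rule the caller applies (for example a timeout or a malformed request).

```ruby
# Hypothetical sketch of the periodic statistics line.
class ServerStats
  def initialize(started_at = Time.now)
    @lock = Mutex.new
    @started_at = started_at
    @clients = 0
    @bogus = 0
  end

  # Called once per finished client, from any worker thread.
  def record(bogus:)
    @lock.synchronize do
      @clients += 1
      @bogus += 1 if bogus
    end
  end

  # Turns a number of seconds into "D days H hours M min".
  def self.format_uptime(seconds)
    days, rest = seconds.to_i.divmod(86_400)
    hours, rest = rest.divmod(3_600)
    "#{days} days #{hours} hours #{rest / 60} min"
  end

  def summary(in_use, now = Time.now)
    clients, bogus = @lock.synchronize { [@clients, @bogus] }
    "Had #{clients} clients of which #{bogus} were bogus. " \
      "Uptime #{self.class.format_uptime(now - @started_at)}. " \
      "#{in_use} connection(s) in use now."
  end
end
```

For example, `ServerStats.format_uptime(49_260)` gives "0 days 13 hours 41 min", matching the uptime in the log line above.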
Once I’ve finished documenting it, I’ll be releasing the Flash policy server here.