
Managing web traffic with ruby-pdns

A short while ago I wrote about releasing a Ruby development framework for PowerDNS. The release is still in its early days: it is feature complete but needs some robustness tweaks, and a new release will be out in a week or so to address that.

I wanted, though, to highlight some success that I’ve had using it. I have a small static farm for a client that handles around 2MiB/sec of 200×200 jpg files. This setup is for a startup, so out of necessity it’s all built to be cheap: I host on networks I don’t own, yet I need pretty good control over it, such as which IPs will be used to serve traffic and so forth.

The graph above shows the situation before, caused by the Windows DNS bug: the bottom host is working pretty hard and getting a large chunk of the bandwidth.

This is a problem because come mid month this poor machine has already used up its allocation of 2.5TiB of transfer and I need to remove it from the pool.

So my goal was to shift the traffic to the yellow and green machines and just generally balance things out a bit. I used the Weighted Round Robin feature of ruby-pdns to adjust the biases. It took a bit of fiddling because, for some other reason, even when this machine gets fewer requests per second it still seems to manage more in terms of bandwidth. This is the eventual code snippet:

ips = ["213.x.x.232",                     # dark blue
       "88.x.x.201",                      # lighter blue
       "82.x.x.180",                      # yellow
       "82.x.x.181"].randomize([1,2,2,3]) # green

answer.shuffle false
answer.content [:A, ips[0]]
answer.content [:A, ips[1]]
answer.content [:A, ips[2]]

The weights seem odd, but that’s what worked after some fiddling; see the graph below.

This is much more nicely balanced. It’s not perfect, and I doubt I will get it perfect with just 4 machines to play with, but I believe it’s already at the point where I can use all my machines for the entire month without hitting any limits.

Here’s another graph over the week showing things side by side:

The improvement is very obvious in this graph and you can see I’ve not lost anything in performance between first day and last day on the graph in terms of throughput (the lower days were days where lower traffic is expected).

If I look at my actual transfer used, it’s better balanced now. First let’s see the 12th:

08/12/09    12.67 GiB |  46.42 GiB |  59.09 GiB |   5.74 Mbit/s
08/12/09     7.71 GiB |  21.32 GiB |  29.04 GiB |   2.82 Mbit/s
08/12/09     9.05 GiB |  23.05 GiB |  32.10 GiB |   3.12 Mbit/s
08/12/09     6.94 GiB |  16.56 GiB |  23.50 GiB |   2.28 Mbit/s

Again the skew is very clear, with 23GiB on the least used machine compared to 59GiB on the most used. On the 17th it looked a lot better:

08/17/09     7.84 GiB |  28.55 GiB |  36.39 GiB |   3.53 Mbit/s
08/17/09     8.46 GiB |  25.66 GiB |  34.12 GiB |   3.31 Mbit/s
08/17/09    11.21 GiB |  30.70 GiB |  41.91 GiB |   4.07 Mbit/s
08/17/09    10.25 GiB |  28.20 GiB |  38.46 GiB |   3.73 Mbit/s

Obviously much better when looking at the 2nd to last column, the total. The first column is traffic received; the increase there is down to a slightly lower hit ratio on the caching proxy on these machines, meaning they are fetching more files from origin than the others.
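To put a number on the skew, one option is the ratio of the busiest machine’s daily total to the quietest one’s. A small Ruby sketch, using only the totals from the two listings above (the `skew` helper is a made-up name for this illustration):

```ruby
# Daily totals in GiB per machine, copied from the two listings above.
totals = {
  "08/12/09" => [59.09, 29.04, 32.10, 23.50],
  "08/17/09" => [36.39, 34.12, 41.91, 38.46],
}

# Ratio of the busiest machine to the quietest one.
# 1.0 would mean a perfectly even spread.
def skew(values)
  (values.max / values.min).round(2)
end

totals.each do |day, values|
  puts "#{day}: busiest/quietest = #{skew(values)}"
end
```

By this measure the busiest machine went from carrying roughly 2.5 times the traffic of the quietest one to roughly 1.2 times.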

Overall I am extremely pleased with this solution. I agree one should not be using DNS as a hammer for all your nails, but for startups and cloud based people who do not have control over networks, BGP tables and so forth, this really does represent a viable alternative to what would otherwise be an extremely expensive problem to solve.

News from SA

So today I decided to follow @iol on my twitter client to try and keep in touch with things back home.

These are some of the messages I got in the first 11 hours:

  • Still no trace of missing baby http://bit.ly/cUafy
  • ‘I could not clear away images of the dead’ http://bit.ly/NjQxq
  • Woman forgives parents for killing her lover http://bit.ly/xd2P7
  • ‘Mass hysteria’ sweeps six E Cape schools http://bit.ly/yE30R
  • Bona magazine editor killed http://bit.ly/13JN08
  • Suspect shoots himself in groin http://bit.ly/29UNDM
  • Baby’s body found in car boot http://bit.ly/349oq
  • Chabaan guilty of assault http://bit.ly/XpKZs
  • Hammer attack hit straight to teacher’s core http://bit.ly/YbdD6
  • MJC rejects the slaughter of chickens claim http://bit.ly/23igtE
  • ‘I just can’t wait till this is all over’ http://bit.ly/2mWbg
  • Judge’s death: Unanswered questions remain http://bit.ly/4aqPW
  • Slain editor: Family believes he knew killers http://bit.ly/anMLt

That’s all in 11 hours of headlines.  Shocking.

Ruby PowerDNS Framework

Regular readers here will know I patch Bind with GeoIP extensions. This has served me well, but my needs have now outgrown simply doing geo related replies.

For a long time I’ve had an itch to be able to do completely custom DNS: maybe respond to monitoring, time of day or geographical location, or even work around some unbelievably annoying bugs in Windows that break all round robin DNS. This has not been possible with Bind.

PowerDNS has a backend that simply speaks via STDIN and STDOUT to any script. The documentation is pretty shoddy, but I quickly realized this is the way to go.  Once I figured out all the various weird things about PDNS and the Pipe backend, I set about writing a framework to host many records in a single PDNS server, in a way that hides and abstracts all the PowerDNS details from the code.
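For a feel of what the Pipe backend expects from a script, here is a minimal sketch of version 1 of its tab separated protocol: PowerDNS sends a HELO line and then Q lines, and the script answers with DATA lines followed by END. This is not how ruby-pdns is implemented; `RECORDS`, `handle_line` and `serve` are names invented for the example, and the hostname and address are placeholders.

```ruby
# Minimal sketch of a PowerDNS pipe backend (ABI version 1), not ruby-pdns itself.
# RECORDS, handle_line and serve are made-up names for this example.
RECORDS = { "www.example.net" => "192.0.2.10" }.freeze

# Takes one tab separated line from PowerDNS and returns the lines to write back.
def handle_line(line)
  fields = line.chomp.split("\t")

  case fields[0]
  when "HELO"
    # Handshake: PowerDNS sends "HELO\t1", we reply OK plus a free form banner.
    ["OK\texample backend ready"]
  when "Q"
    # Query layout: Q, qname, qclass, qtype, id, remote-ip-address
    _, qname, qclass, qtype, id, _remote_ip = fields
    ip = RECORDS[qname.to_s.downcase]
    reply = []
    if ip && %w[A ANY].include?(qtype)
      # Answer layout: DATA, qname, qclass, qtype, ttl, id, content
      reply << ["DATA", qname, qclass, "A", 300, id, ip].join("\t")
    end
    reply << "END" # every query is closed with END, even with no answers
  else
    ["FAIL"]
  end
end

# Reads questions from input and writes answers to output, flushing after
# each one so PowerDNS is not left waiting on a buffer.
def serve(input, output)
  input.each_line do |line|
    handle_line(line).each { |reply| output.puts reply }
    output.flush
  end
end
```

A real script would finish with `serve($stdin, $stdout)`. Keeping the protocol handling in `handle_line` means it can be exercised without a running PowerDNS.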

The end goal is that I would dump some Ruby code into a file on the server and it should just be served. When I get new code I just want to overwrite the old code; no restarts or anything, it must just serve it.

I wanted the code to be trivially simple, something like this:

module Pdns
  newrecord("www.your.net") do |query, answer|
    case country(query[:remoteip])
    when "US", "CA"
      answer.content "64.xx.xx.245"

    when "ZA", "ZW"
      answer.content "196.xx.xx.10"

    else
      answer.content "78.xx.xx.140"
    end
  end
end

That should be all that is needed to do GeoIP based serving. Really complex things, like weighted random round robins that effectively work around bugs in client resolvers such as the Windows one above, should be just as simple:

        ips = ["1.x.x.x", "2.x.x.x", "3.x.x.x", "4.x.x.x", "5.x.x.x"]

        ips = ips.randomize([1,5,3,3,3])

        answer.shuffle false
        answer.ttl 300
        answer.content ips[0]
        answer.content ips[1]
        answer.content ips[2]

This code will take 5 IP addresses, shuffle them giving the first one the least weight and the 2nd one the most weight, and return only 3 of the 5 results. This would be impossible in Bind but is trivial to imagine coding if only you could hook into the nameserver.
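To make the idea of a weighted shuffle concrete, here is one way it could be written in plain Ruby. This is only a sketch of the general technique (repeatedly draw one item with probability proportional to its weight, without replacement); `weighted_shuffle` is a hypothetical stand-in, and the `randomize` method in ruby-pdns may well work differently.

```ruby
# Hypothetical stand-in for the randomize call above; the real ruby-pdns
# implementation may differ. A higher weight makes an item more likely
# to end up near the front of the result.
def weighted_shuffle(items, weights, rng: Random.new)
  raise ArgumentError, "need one weight per item" unless items.size == weights.size

  pool = items.zip(weights)
  result = []

  until pool.empty?
    # Pick a point in [0, total weight) and walk the pool until we pass it.
    target = rng.rand * pool.sum { |_, weight| weight }
    index = pool.index { |_, weight| (target -= weight) < 0 } || pool.size - 1
    result << pool.delete_at(index).first
  end

  result
end

ips = ["1.x.x.x", "2.x.x.x", "3.x.x.x", "4.x.x.x", "5.x.x.x"]

# Return only the first 3 of the 5 shuffled addresses, as in the snippet above.
puts weighted_shuffle(ips, [1, 5, 3, 3, 3]).first(3)
```

With these weights the second address should come first in about a third of the answers (5 out of a total weight of 15) and the first address in about one in fifteen.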

Anyway, so I wrote a framework that enables exactly this; the code snippets above are actual working snippets.  The code is hosted on Google Code as ruby-pdns and is at version 0.3 at present.

I’ve released tarball and RPM versions of the code; the code is publicly browsable and licensed under the GPLv2.

At present I think I’ve documented it all fairly well with a good set of Wiki pages, though the install instructions for non RPM based installs leave a bit to be desired. I’ll work on improving that.

I’ve been running this code myself, serving tens of thousands of queries a day, and have used the technique above to work around Windows bugs.  I’m looking for testers to start using the code and sending me feedback; there are groups, tickets and all set up for that on Google Code.

SSH socks proxies hanging

I use SSH’s socks proxy feature a lot. In fact I use it all the time: most of my browsing, IM, etc. all goes over it and out via my hosted virtual machines.

I do this to simplify my life for things like firewall rules and also to get around things like age blocks on mobile networks.  I work for a site deemed adult by most of them so I can’t even see my nagios without age verifying.

Recently these proxies have been driving me nuts. Every now and then the whole session would just lock up and sit there doing nothing. I’ve not seen this happen before and was a bit stumped.

Turns out it sometimes chooses to speak to TCP/53 instead of UDP/53 for resolving. I’m not sure why exactly and I’ve not tried to figure out which queries cause this, though I know there are limits to response sizes which will force it to go over TCP.  Why it has only started doing this now I don’t know; maybe an update changed behavior. I’ve never had TCP/53 open on the cache.

My firewall was blocking TCP/53 on the local cache, so this would lock up the whole ssh session. Maybe the whole ssh process is single threaded and so waiting in the SYN_SENT state just hangs the whole thing. That’s a bit sucky; I might need a better proxy.
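A quick way to check for this kind of problem is to try a TCP connection to port 53 on the cache with a short timeout. The sketch below uses Ruby’s standard socket library; `tcp_open?` is a made-up helper name and `127.0.0.1` is only an assumed address for the local cache.

```ruby
require "socket"

# Returns true if a TCP connection to host:port succeeds within the timeout.
# A firewall that silently drops the SYN shows up as a timeout, so false.
def tcp_open?(host, port, timeout: 2)
  Socket.tcp(host, port, connect_timeout: timeout) { true }
rescue SystemCallError, SocketError, IOError
  false
end

resolver = "127.0.0.1" # assumption: address of the local DNS cache
puts "TCP/53 reachable on #{resolver}: #{tcp_open?(resolver, 53)}"
```

If this prints false while UDP lookups work fine, any query that falls back to TCP will stall in the same way.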

Imposter Alert!

You’d be thinking based on the last 2 posts that someone is trying to convince the world that I’ve gone mad and do actually like Debian.

Actually I am letting some other people guest blog here. The first is Mark Webster aka LSD; he’s a developer, systems dude and all round kewl guy working in London on all sorts of interesting stuff, most recently optimizing Linux kernels to get insane amounts of packets per second out of them.

Look out for more great posts from Mark hopefully detailing more of his experiences tuning kernels and such.

I’d also be interested to hear from other like minded people who want to guest blog here. Over the next while I’ll take out some of the links and stuff that make this site personal, to make it more friendly to guest bloggers.

Introductions

Hello World.

I’m Mark, and I do a lot of programming, designing of systems, working ridiculous hours, ranting about many things, and I am frequently guilty of re-inventing wheels (which shall henceforth be referred to as either ‘improvement’ or ‘learning’ :-). Currently, I am involved in re-inventing (er, designing) a new suite of telephone conference call back-end systems for a rapidly expanding conference call company, from the ground up.
Doing this kind of work on the carrier grade level involves the convergence of a lot of technologies, and there’s a truckload of R&D involved, which is bloody fantastic since I get bored too easily when there’s nothing new to learn, or not enough diversity.
I arrived in this part of the IT industry after doing some weird things:
  • Five years in the games industry (hellishly boring; trust me – I’ll explain why another time)
  • A few years developing bespoke systems, providing services and Linux “appliances” for businesses around South Africa
  • Loads of freelance development on various platforms (mobile handsets, 8-bit embedded systems, even Windows apps)
  • An entire childhood & adolescence involved (misspent?) in the demoscene. Epic fun. Low-level programming, register fiddling, cycle counting and reverse engineering is the shit!
Anyway, talk is cheap, and I’ve clogged the Intertubes quite enough!
Coming up next, something that has been a constant thorn in my side as a wretched Debian user: building custom kernels.