
Bayes Host Classification

I run a small anti-spam service and often try out different strategies to combat spam.  At present I have a custom nameserver, which I wrote, that runs lots of regex checks against hostnames and tries to determine whether a host is on a dynamic or a static IP.  I use the server in standard RBL lookups.

The theory is that dynamic hosts are suspicious and so get a greylist penalty.  Running lots of regular expressions is not the best option though, and I often have to fiddle with them to keep them effective, so I thought I’d try a Bayesian approach using Ruby Classifier.

I pulled 400 known dynamic hostnames and 400 good ones out of my stats and used them to train the classifier:

require 'rubygems'
require 'stemmer'
require 'classifier'

classifier = Classifier::Bayes.new('bad', 'good')

classifier.train_bad("3e70dcb2.adsl.enternet.hu")
# ... more known dynamic hostnames ...

classifier.train_good("mail193.messagelabs.com")
# ... more known good hostnames ...

I then fed 100 known good and 100 known bad hostnames – ones not in the initial dataset – through it and got a 100% hit rate on the good names, with only 5 bad hosts classified as good.
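The gem hides the details, but the underlying idea is simple enough to sketch in a few lines of plain Ruby.  This is a minimal naive Bayes over hostname tokens – not the gem’s actual implementation – and the class name and training data here are just illustrative:

```ruby
# Split a hostname into word/number tokens, e.g.
# "3e70dcb2.adsl.enternet.hu" -> ["3e70dcb2", "adsl", "enternet", "hu"]
def tokens(host)
  host.downcase.split(/[^a-z0-9]+/).reject(&:empty?)
end

# A toy naive Bayes classifier over hostname tokens
class HostClassifier
  def initialize
    @counts = { bad: Hash.new(0), good: Hash.new(0) } # per-category token counts
    @totals = Hash.new(0)                             # per-category total tokens
  end

  def train(category, host)
    tokens(host).each do |t|
      @counts[category][t] += 1
      @totals[category] += 1
    end
  end

  # Pick the category with the highest summed log-probability,
  # with add-one smoothing so unseen tokens don't zero things out
  def classify(host)
    @counts.keys.max_by do |cat|
      tokens(host).sum do |t|
        Math.log((@counts[cat][t] + 1.0) / (@totals[cat] + 1.0))
      end
    end
  end
end

c = HostClassifier.new
c.train(:bad,  "3e70dcb2.adsl.enternet.hu")
c.train(:good, "mail193.messagelabs.com")
c.classify("customer.adsl.example.hu")  # tokens like "adsl" pull it towards :bad
```

With enough training data the token counts do all the work that my hand-maintained regexes used to – "adsl", "dyn", "pool" and friends end up strongly weighted towards the bad category on their own.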

This is very impressive and more than acceptable for my needs, now if only there was a good Net::DNS port to Ruby that also included the Nameserver classes.

Load Balancing with HAProxy

Load balancers are some of the most expensive bits of equipment that small to medium-sized sites are likely to buy, even more expensive than database servers.

Since I help a number of small, young startups, a good open source load balancer is essential; I use HAProxy for this purpose.

HAProxy is a high-performance, non-threaded load balancer.  It supports a lot of really excellent features, like regular-expression-based logic to route certain types of requests to different backend servers and session tracking using cookies or URL parts, and it has extensive documentation.
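For example, a fragment along these lines routes static content to its own backend and pins sessions to an application server with an inserted cookie.  The names and addresses are made up, but the directives (acl, use_backend, cookie … insert) are standard HAProxy configuration:

```
frontend www
    bind :80
    # send anything under /images or /css to the static farm
    acl is_static path_beg /images /css
    use_backend static if is_static
    default_backend app

backend static
    server static1 10.0.0.10:80 check

backend app
    # insert a cookie so a client sticks to the same app server
    cookie SERVERID insert indirect
    server app1 10.0.0.20:80 cookie app1 check
    server app2 10.0.0.21:80 cookie app2 check
```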

Getting a fully redundant set of load balancers going with it requires the help of something like Linux-HA, which I use extensively for this purpose.  The combination of HAProxy and Linux-HA gives you a full active-passive cluster with failover capabilities that really does work a charm.
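With the old Heartbeat v1 style of Linux-HA configuration, the whole arrangement can come down to a single haresources line.  The hostname, address and interface below are hypothetical, and I’m assuming haproxy is managed by a standard init script:

```
# /etc/ha.d/haresources: the active node claims the floating IP and
# starts haproxy; on failure the passive node takes both over
lb1.example.com IPaddr2::192.0.2.10/24/eth0 haproxy
```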

I recently had to reload an HAProxy instance after about 100 days of uptime; its performance stats were 1.8 billion requests, 15TB out and just short of 2TB in.


Worth checking out HAProxy before shelling out GBP 15,000 for two hardware load balancers.

SixXS IPv6 and CentOS

I thought it’s high time I spent some time with IPv6, so I signed up for a static tunnel from sixxs.net.  Apart from taking some time, it’s a fairly painless process to get going.

I chose a static tunnel since I am just 9ms from one of their brokers and my machine is up all the time anyway.  They have some docs on how to get RedHat machines talking to them, but they were not particularly accurate; this is what I did:

You’ll get a mail from them listing your details, something like this:

Tunnel Id          : T21201
PoP Name           : dedus01 (de.speedpartner [AS34225])
Your Location      : Gunzenhausen, de
SixXS IPv6         : 2a01:x:x:x::1/64
Your IPv6          : 2a01:x:x:x::2/64
SixXS IPv4         : 91.184.37.98
Tunnel Type        : Static (Proto-41)
Your IPv4          : 78.x.x.x

Using this you can now configure your CentOS machine to bring the tunnel up; you need to edit these files:

/etc/sysconfig/network:

NETWORKING_IPV6=yes
IPV6_DEFAULTDEV=sit1

/etc/sysconfig/network-scripts/ifcfg-sit1:

DEVICE=sit1
BOOTPROTO=none
ONBOOT=yes
IPV6INIT=yes
IPV6_TUNNELNAME="sixxs"
IPV6TUNNELIPV4="91.184.37.98"
IPV6TUNNELIPV4LOCAL="78.x.x.x"
IPV6ADDR="2a01:x:x:x::2/64"
IPV6_MTU="1280"
TYPE=sit

Just replace the values from your email in the files above.  Once you have this in place, reboot or restart your networking and you should see something like this:

sit1      Link encap:IPv6-in-IPv4  
          inet6 addr: 2a01:x:x:x::2/64 Scope:Global
          inet6 addr: fe80::4e2f:c3c6/128 Scope:Link
          UP POINTOPOINT RUNNING NOARP  MTU:1480  Metric:1
          RX packets:9796 errors:0 dropped:0 overruns:0 frame:0
          TX packets:7301 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:7181061 (6.8 MiB)  TX bytes:1277642 (1.2 MiB)

% ping6 -c 3 -n noc.sixxs.net
PING noc.sixxs.net(2001:838:1:1:210:dcff:fe20:7c7c) 56 data bytes
64 bytes from 2001:838:1:1:210:dcff:fe20:7c7c: icmp_seq=0 ttl=57 time=20.2 ms
64 bytes from 2001:838:1:1:210:dcff:fe20:7c7c: icmp_seq=1 ttl=57 time=28.4 ms
64 bytes from 2001:838:1:1:210:dcff:fe20:7c7c: icmp_seq=2 ttl=57 time=20.1 ms
--- noc.sixxs.net ping statistics ---

3 packets transmitted, 3 received, 0% packet loss, time 2008ms

rtt min/avg/max/mdev = 20.181/22.934/28.406/3.869 ms, pipe 2

Since this is a remote machine, it took me some time to figure out how to get browsing going through it, but once I reconnected my SSH SOCKS tunnel it immediately became IPv6 aware and was happily routing me to sites like ipv6.google.com.  To do this, just run from your desktop:

ssh -D 1080 yourbox.net

Now set the network.proxy.socks_remote_dns setting to true in Firefox’s about:config, and point your browser at localhost:1080 as a SOCKS proxy; your SSH session should now work as a perfectly effective IPv4-to-6 gateway.  You can test it by browsing to either the sixxs.net homepage or ipv6.google.com – watch out for the special Google logo.

Location aware Bind for RedHat 5.3

Previously I wrote about RPMs I built to GeoIP enable Bind using the original patches at http://www.caraytech.com/geodns/.
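As a quick reminder of what the patch enables – it lets you serve different answers per country from within Bind views.  The zone names and files below are made up, and I’m assuming the patch’s automatically created country_XX client ACLs work as described in that earlier article:

```
view "usa" {
    match-clients { country_US; };
    zone "example.com" {
        type master;
        file "example.com.us";
    };
};

view "other" {
    match-clients { any; };
    zone "example.com" {
        type master;
        file "example.com.default";
    };
};
```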

I have now refreshed this for the latest CentOS 5.3.  The details of the patch, the install instructions and so on have not changed – read the previous article I wrote for those.  The new RPMs are below:

NOTE: When you install these RPMs you won’t see an /etc/named.conf being created, and a few other odd things; these are bugs that exist in the CentOS-provided RPMs too – they do the same.

bind-9.3.4-10.P1geodns.el5.i386.rpm
bind-chroot-9.3.4-10.P1geodns.el5.i386.rpm
bind-devel-9.3.4-10.P1geodns.el5.i386.rpm
bind-libbind-devel-9.3.4-10.P1geodns.el5.i386.rpm
bind-libs-9.3.4-10.P1geodns.el5.i386.rpm
bind-utils-9.3.4-10.P1geodns.el5.i386.rpm
bind-sdb-9.3.4-10.P1geodns.el5.i386.rpm
caching-nameserver-9.3.4-10.P1geodns.el5.i386.rpm

bind-9.3.4-10.P1geodns.el5.x86_64.rpm
bind-chroot-9.3.4-10.P1geodns.el5.x86_64.rpm
bind-libbind-devel-9.3.4-10.P1geodns.el5.x86_64.rpm
bind-devel-9.3.4-10.P1geodns.el5.x86_64.rpm
bind-libs-9.3.4-10.P1geodns.el5.x86_64.rpm
bind-sdb-9.3.4-10.P1geodns.el5.x86_64.rpm
bind-utils-9.3.4-10.P1geodns.el5.x86_64.rpm
caching-nameserver-9.3.4-10.P1geodns.el5.x86_64.rpm

bind-9.3.4-10.P1geodns.el5.src.rpm

bind.spec-diff

CentOS 5.3

CentOS 5.3 was released on the 1st of April; I’ve since updated a whole lot of my machines to this version and have been very happy.

There are a few gotchas, mostly well covered in the release notes.  The only other odd thing I found was that /etc/snmp/snmpd.options has now moved to /etc/sysconfig/snmpd.options, and ditto for snmptrapd.options.  It’s a bit of a weird change: while it makes the SNMPD config a bit more like the rest of the RedHat system, it is still different.  Based on all the other files in /etc/sysconfig, you’d think this one would have been called /etc/sysconfig/snmpd rather than having the .options bit tacked on.

Another change I noticed is that Xen behaves a lot better now on suspends.  If I reboot a dom0 and then bring it back up, the domUs resume where they were, and unlike in the past the clocks do not go all over the place; in fact I’ve even seen SSH sessions stay up between reboots.  SNMP does still sometimes stop working after a resume, though.

The overall look of the distribution is much better too; the artwork has been redone throughout and now forms a nice, cohesive look and feel.

While investigating the cause of the /etc/snmp/snmpd.options file mysteriously going missing, I once again had the misfortune of having to deal with #centos on Freenode.  It really is one of the most hostile channels I’ve come across in the open source world; people are just outright arseholes, everyone including the project leaders.

They immediately assume you have no clue and don’t know what you’re talking about, and generally treat everyone who dares suggest something is broken like shit, with the usual ‘works for me’, ‘read the docs’, ‘it’s in the release notes’ or ‘looking at the source will not help’ style responses to every question.  As it turns out, every one of those remarks was just plain wrong.  No, it didn’t work for them; their files also got moved by the installer.  No, it was not in the docs or release notes.  And looking at the source would have helped a lot more than they did, because I would then have been able to see for myself that the post-install script of the RPM moves the files.  It took literally over an hour to get even one of them to make the effort to be helpful, compared to the roughly 2 minutes it would have taken if the SRPMs had been available at release time.

I think they’re doing the project a big disservice by not sorting out the IRC channel; in fact they actively defend and even promote the hostility shown there.  In contrast to the Puppet IRC channel, for instance, it really is a barbaric bit of the 3rd world.