Tag Archives: sysadmin

Effective adhoc commands in clusters

Last night I had a bit of a mental dump on Twitter about structured and unstructured data when communicating with a cluster of servers. Twitter fails at this kind of stuff, so I figured I’d follow up with a blog post.

I started off asking for a list of tools in the cluster admin space and got some great pointers which I am reproducing here:

fabric, cap, func, clusterssh, sshpt, pssh, massh, clustershell, controltier, rash (related), dsh, chef knife ssh, pdsh+dshbak and of course mcollective. I was also sent a list of ssh related tools which is awesome.

The point I feel needs to be made is that in general these tools just run commands on remote servers. They are not aware of the command’s output structure, what denotes pass or fail in the context of the command, etc. Basically the commands people run were designed ages ago to be looked at by human eyes and then parsed by a human mind. Yes, they are easy to pipe and grep and chop up, but ultimately they were always designed to be run on one server at a time.

The parallel ssh’ers run these commands in parallel and you tend to get a mash of output. STDOUT and STDERR are mixed, and output from different machines is often interleaved, so you get a stream of text that is hard to decipher even on 2 machines, not to mention 200 at once.

Take as an example a simple yum command to install a package:

% yum install zsh
Loaded plugins: fastestmirror, priorities, protectbase, security
Loading mirror speeds from cached hostfile
372 packages excluded due to repository priority protections
0 packages excluded due to repository protections
Setting up Install Process
Package zsh-4.2.6-3.el5.i386 already installed and latest version
Nothing to do

When run on one machine you pretty much immediately know what’s going on: the package was already there, so nothing got done. Now let’s see cap invoke:

# cap invoke COMMAND="yum -y install zsh"
  * executing `invoke'
  * executing "yum -y install zsh"
    servers: ["web1", "web2", "web3"]
    [web2] executing command
    [web1] executing command
    [web3] executing command
 ** [out :: web2] Loaded plugins: fastestmirror, priorities, protectbase, security
 ** [out :: web2] Loading mirror speeds from cached hostfile
 ** [out :: web3] Loaded plugins: fastestmirror, priorities, protectbase
 ** [out :: web3] Loading mirror speeds from cached hostfile
 ** [out :: web3] 495 packages excluded due to repository priority protections
 ** [out :: web2] 495 packages excluded due to repository priority protections
 ** [out :: web3] 0 packages excluded due to repository protections
 ** [out :: web3] Setting up Install Process
 ** [out :: web2] 0 packages excluded due to repository protections
 ** [out :: web2] Setting up Install Process
 ** [out :: web1] Loaded plugins: fastestmirror, priorities, protectbase
 ** [out :: web3] Package zsh-4.2.6-3.el5.x86_64 already installed and latest version
 ** [out :: web3] Nothing to do
 ** [out :: web1] Loading mirror speeds from cached hostfile
 ** [out :: web1] Install       1 Package(s)
 ** [out :: web2] Package zsh-4.2.6-3.el5.x86_64 already installed and latest version
 ** [out :: web2] Nothing to do
 ** [out :: web1] 548 packages excluded due to repository priority protections
 ** [out :: web1] 0 packages excluded due to repository protections
 ** [out :: web1] Setting up Install Process
 ** [out :: web1] Resolving Dependencies
 ** [out :: web1] --> Running transaction check
 ** [out :: web1] ---> Package zsh.x86_64 0:4.2.6-3.el5 set to be updated
 ** [out :: web1] --> Finished Dependency Resolution
 ** [out :: web1]
 ** [out :: web1] Dependencies Resolved
 ** [out :: web1]
 ** [out :: web1] ================================================================================
 ** [out :: web1] Package      Arch            Version                Repository            Size
 ** [out :: web1] ================================================================================
 ** [out :: web1] Installing:
 ** [out :: web1] zsh          x86_64          4.2.6-3.el5            centos-base          1.7 M
 ** [out :: web1]
 ** [out :: web1] Transaction Summary
 ** [out :: web1] ================================================================================
 ** [out :: web1] Install       1 Package(s)
 ** [out :: web1] Upgrade       0 Package(s)
 ** [out :: web1]
 ** [out :: web1] Total download size: 1.7 M
 ** [out :: web1] Downloading Packages:
 ** [out :: web1] Running rpm_check_debug
 ** [out :: web1] Running Transaction Test
 ** [out :: web1] Finished Transaction Test
 ** [out :: web1] Transaction Test Succeeded
 ** [out :: web1] Running Transaction
 ** [out :: web1] Installing     : zsh                                                      1/1
 ** [out :: web1]
 ** [out :: web1]
 ** [out :: web1] Installed:
 ** [out :: web1] zsh.x86_64 0:4.2.6-3.el5
 ** [out :: web1]
 ** [out :: web1] Complete!
    command finished
zlib(finalizer): the stream was freed prematurely.
zlib(finalizer): the stream was freed prematurely.
zlib(finalizer): the stream was freed prematurely.

Most of this scrolled off my screen and at the end all I had was the last bit of output. I could scroll up and still figure out what was going on: 2 of the 3 already had it installed, one got it. Now imagine the output of 100 or 500 of these machines all mixed in. Just parsing this output would be prone to human error and you’re likely to miss that something failed.
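
As a rough illustration of how much work the human is doing here, this is a minimal Python sketch that regroups interleaved output by host. It assumes the line prefix looks like the cap output above; the `err` alternative in the pattern and the function name are my own assumptions, not part of cap.

```python
import re
from collections import defaultdict

# Matches prefixed lines such as " ** [out :: web1] Complete!".
# Accepting "err" as well as "out" is an assumption about the tool's format.
LINE_RE = re.compile(r"^\s*\*+ \[(?:out|err) :: ([^\]]+)\] ?(.*)$")


def group_by_host(text):
    """Return {host: [lines]}, keeping each host's lines in their original order."""
    grouped = defaultdict(list)
    for raw in text.splitlines():
        match = LINE_RE.match(raw)
        if match:
            host, line = match.groups()
            grouped[host].append(line)
    return dict(grouped)


SAMPLE = """\
 ** [out :: web2] Loaded plugins: fastestmirror, priorities, protectbase, security
 ** [out :: web3] Loaded plugins: fastestmirror, priorities, protectbase
 ** [out :: web3] Nothing to do
 ** [out :: web1] Install       1 Package(s)
 ** [out :: web2] Nothing to do
 ** [out :: web1] Complete!
"""

if __name__ == "__main__":
    for host, lines in sorted(group_by_host(SAMPLE).items()):
        print(f"--- {host} ---")
        for line in lines:
            print(line)
```

Even regrouped like this, you still have to read yum’s prose for every host to know whether it worked, which is exactly the problem.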

So here is my point: your cluster management tool needs to provide an API around everyday commands like package management, process listing, etc. It should return structured data, and you can use that structured data to create tools better suited to use on large numbers of machines. Because the output is standardized, the tool should provide generic front ends that just do the right thing out of the box.

With the package example above, knowing that all 500 machines spewed out a bunch of stuff while installing isn’t important; you just want to know the result in a nice way. Here’s what mcollective does:

$ mc-package install zsh
 
 * [ ============================================================> ] 3 / 3
 
web2.my.net                      version = zsh-4.2.6-3.el5
web3.my.net                      version = zsh-4.2.6-3.el5
web1.my.net                      version = zsh-4.2.6-3.el5
 
---- package agent summary ----
           Nodes: 3 / 3
        Versions: 3 * 4.2.6-3.el5
    Elapsed Time: 16.33 s

In the case of a package you just want to know the version after the event and a summary of status. Just by looking at the stats I know the desired result was achieved; had different versions been listed, I could very quickly identify the problem ones.
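
To make the idea concrete, here is a small Python sketch of that kind of summary. It is not mcollective’s code; the `replies` mapping and the function name are assumptions made for illustration, and it simply treats the most common version as the expected one.

```python
from collections import Counter


def summarize_versions(replies):
    """replies maps host -> installed version.

    Returns (Counter of versions, sorted list of hosts that differ
    from the most common version).
    """
    if not replies:
        return Counter(), []
    counts = Counter(replies.values())
    expected, _ = counts.most_common(1)[0]
    outliers = sorted(host for host, version in replies.items() if version != expected)
    return counts, outliers


if __name__ == "__main__":
    replies = {
        "web1.my.net": "4.2.6-3.el5",
        "web2.my.net": "4.2.6-3.el5",
        "web3.my.net": "4.2.6-3.el5",
    }
    counts, outliers = summarize_versions(replies)
    print("Versions:", ", ".join(f"{n} * {v}" for v, n in counts.most_common()))
    print("Outliers:", ", ".join(outliers) or "none")
```

Because the input is one value per host rather than a page of text, the summary works the same for 3 hosts or 500.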

Here’s another example – NRPE this time:

% mc-rpc nrpe runcommand command=check_disks
 
 * [ ============================================================> ] 47 / 47
 
 
dev1.my.net                      Request Aborted
   CRITICAL
          Exit Code: 2
   Performance Data:  /=4111MB;3706;3924;0;4361 /boot=26MB;83;88;0;98 /dev/shm=0MB;217;230;0;256
             Output: DISK CRITICAL - free space: / 24 MB (0% inode=86%);
 
 
Finished processing 47 / 47 hosts in 766.11 ms

Notice that I didn’t use an NRPE-specific mc- command here. I just used the generic rpc caller, and the caller knows that I am only interested in seeing the results of machines that are in WARNING or CRITICAL state. If you run this on your console, the ‘Request Aborted’ would be red and the ‘CRITICAL’ would be yellow, immediately pulling your eye to the important information. Also note how the result shows human friendly field names like ‘Performance Data’.

The formatting, highlighting, knowledge to only show failing resources and human friendly headings all happen automatically. No programming of client side UI is required; you get all of this for free simply because mcollective focuses on putting structure around outputs.

Here’s the earlier package install example with the standard rpc caller rather than a specialized package frontend:

% mc-rpc package install package=zsh
Determining the amount of hosts matching filter for 2 seconds .... 47
 
 * [ ============================================================> ] 47 / 47
 
Finished processing 47 / 47 hosts in 2346.05 ms

Everything worked: all 47 machines have the package installed and your desired action was taken. So there’s no point in spamming you with pages of junk; who cares to see all the Yum output? Had an install failed, you’d have had a usable error message just for the host that failed. The output would be equally usable on one or a thousand hosts, with very little margin for human error in knowing the result of your request.

This happens because mcollective has a standard structure for responses. Each response has an absolute success value that tells you whether the request failed or not, and by using this you can get generic CLI, web, etc. tools that display large amounts of data from a network of hosts in a way that is appropriate and context aware.

For reference here’s the response as received on the client:

{:sender=>"dev1.my.net",
 :statuscode=>1,
 :statusmsg=>"CRITICAL",
 :data=>
  {:perfdata=>
    " /=4111MB;3706;3924;0;4361 /boot=26MB;83;88;0;98 /dev/shm=0MB;217;230;0;256",
   :output=>"DISK CRITICAL - free space: / 24 MB (0% inode=86%);",
   :exitcode=>2}}
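
With responses shaped like that, a generic "only show me the failures" display needs very little code. The sketch below is Python rather than mcollective’s own Ruby, it assumes a `statuscode` of 0 means success, and the second host (dev2.my.net) is made up to give the filter something to hide.

```python
def failed(responses):
    """Keep only responses whose statuscode is non-zero (assumption: 0 == success)."""
    return [r for r in responses if r["statuscode"] != 0]


def render(response):
    """Format one response: sender and status message, then every data field."""
    lines = [f"{response['sender']:<32} {response['statusmsg']}"]
    for key, value in response["data"].items():
        lines.append(f"    {key:>10}: {value}")
    return "\n".join(lines)


responses = [
    {
        "sender": "dev1.my.net",
        "statuscode": 1,
        "statusmsg": "CRITICAL",
        "data": {
            "perfdata": " /=4111MB;3706;3924;0;4361 /boot=26MB;83;88;0;98 /dev/shm=0MB;217;230;0;256",
            "output": "DISK CRITICAL - free space: / 24 MB (0% inode=86%);",
            "exitcode": 2,
        },
    },
    {
        # Hypothetical healthy host, not from the original run.
        "sender": "dev2.my.net",
        "statuscode": 0,
        "statusmsg": "OK",
        "data": {"perfdata": "", "output": "DISK OK", "exitcode": 0},
    },
]

if __name__ == "__main__":
    bad = failed(responses)
    for response in bad:
        print(render(response))
    print(f"Finished processing {len(responses)} hosts, {len(bad)} failed")
```

The display code never needs to know it is looking at NRPE results; it only relies on the common envelope around the data.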

Only by thinking about CLI and admin tasks in this way do I believe we can take the Unix utilities that we call on remote hosts and turn them into something appropriate for large scale parallel use that doesn’t overwhelm the human at the other end with information. Additionally, since this is an API that is computer friendly, it makes those tools usable in many other places like code deployers, for example to enable continuous deployment built on robust use of Unix tools via such an API.

There are many other advantages to this approach. Requests are authorized at a very fine level and requests are audited. API wrappers are code that’s versioned and can be tested in development, which makes the margin for error much smaller than just running random unix commands ad hoc. Finally, whether you’re using the code ad hoc on a CLI as above or in your continuous deployer, you share the same code that you’ve already tested and trust.

What does puppet manage on a node?

Sometimes it’s nice to try and figure out what resources of a machine are being managed by puppet. Puppet keeps a state file called localconfig.yaml, in either YAML or Marshal format, that is full of useful information. I wrote a quick script to parse it and show you what’s being managed.

Typical output is:

Classes included on this node:
        nephilim.ml.org
        common::linux
        <snip>

Resources managed by puppet on this node:
        service{smokeping: }
                defined in common/modules/smokeping/manifests/service.pp:6

        file{/etc/cron.d/mrtg: }
                defined in common/modules/puppet/manifests/init.pp:201
<snip>

It will show all classes and all resources including where in your manifests the resource comes from.  Unfortunately for resources created by defines it shows the define as the source but I guess you can’t have it all.

You can get the code here. It’s pretty simple: just pass it the path to your localconfig.yaml file. It supports both YAML and Marshal formats.

The file also has every property of the resources in it etc, so you can easily extend this to print a lot of other information, just use something like pp to dump out the contents of Puppet::TransObject objects to see what’s possible.

MySQL Defaults and Load time

We all know not to use the default mysql config, right?

Well, I accidentally left a machine on the defaults, then tried to load a massive dump file into it. A month later I finally killed the process loading the data. I had given up on it ages ago, but by then I was simply curious to see just how long it would take.

As you can see from above, it was pretty dismal, slowly creeping up over time. The big jump in the beginning is when I scp’d the data onto the machine. After killing it I had another look at the config and noticed it was the default distributed one, so I tuned it to make better use of the memory for InnoDB buffers and got the result below.

That’s just short of 2 days to load the data, still pretty crap, but so much better at the same time.

iptables chains

A lifetime ago when I still gave a damn for FreeBSD I wrote about ipfw tables. I still really love ipfw’s simple syntax and really wish there was something similar for Linux rather than the Human Error Guaranteed convoluted syntax mess that’s iptables.

Anyway, in my case I have machines all over: one-off VPS machines, dom0s with a subnet routed to them and so forth. I often have rules that need to match on all my IPs, things like allowing data into my backup server, allowing config retrieval from my puppetmaster etc. I do not want to maintain my total list of IPs 10 times over, so how to deal with it?

This is a good fit for ipfw tables, you create a table – essentially an object group like in a Cisco PIX or ASA – and then use it to match source IPs.

In the last week I’ve asked quite a few people how they’d do something similar with iptables, but no-one seemed to know. Some were happy to maintain the same list many times, some would use tools like sed to insert it into their rules, and everything in between. I think I know a better way, so I figured I’d blog about it because it’s obviously something people just do not understand.

Iptables of course uses chains, and you can jump to and from chains all you want. This is very simple, so let’s create a chain with all my IPs:

-A my_ips -s 192.168.1.1 -m comment --comment "box1.com" -j ACCEPT
-A my_ips -s 192.168.2.1 -m comment --comment "box2.com" -j ACCEPT
-A my_ips -s 192.168.3.1 -m comment --comment "box3.com" -j ACCEPT

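This fills a chain my_ips that just accepts all traffic from my IP addresses (the chain itself has to exist first, via iptables -N my_ips or a :my_ips - [0:0] line in an iptables-restore file). Now let’s see how we’d allow all my IP addresses into my webserver: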

-A INPUT -p tcp --dport 80 -m tcp -j my_ips

So this is something almost as good as an ipfw table: I can reuse it many times on many machines and my overall configuration is much simpler. It’s not quite as powerful as a table but for my needs it’s fine.

Combined with a tool like Puppet that manages your configurations you can ensure that this chain is installed on any machine that uses iptables, ready to be used and also trivial to update whenever you need to, without having to worry about the human error that comes from maintaining many copies of essentially the same data.

In my environment, when I update this table I check it into SVN, and within 30 minutes every machine in my control has the new table and has reloaded its iptables rules to activate it. Testing is very easy since Puppet lets you use environments similar to those Rails has, so if I really need to I can easily test firewall changes on a small contained set of machines. That’s distributed object group management with version control and everything rolled into one.
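
The single-source-of-truth idea can be sketched outside Puppet too. This Python snippet is only an illustration, not my Puppet template: it generates iptables-restore style lines for the chain from one mapping of IPs to names. The function names are made up, and the port rule is written in the order iptables-save prints it.

```python
import ipaddress

# The one list of IPs to maintain (addresses taken from the example above).
MY_IPS = {
    "192.168.1.1": "box1.com",
    "192.168.2.1": "box2.com",
    "192.168.3.1": "box3.com",
}


def chain_rules(chain, hosts):
    """Build iptables-restore lines: declare the chain, then accept each source IP."""
    rules = [f":{chain} - [0:0]"]  # declares the user-defined chain
    for ip, name in hosts.items():
        ipaddress.ip_address(ip)  # raises ValueError on a malformed address
        rules.append(f'-A {chain} -s {ip} -m comment --comment "{name}" -j ACCEPT')
    return rules


def allow_port(chain, port):
    """Send TCP traffic for one port through the shared chain."""
    return f"-A INPUT -p tcp -m tcp --dport {int(port)} -j {chain}"


if __name__ == "__main__":
    print("\n".join(chain_rules("my_ips", MY_IPS)))
    print(allow_port("my_ips", 80))
```

Adding a machine means adding one entry to the mapping; every rule that jumps to the chain picks it up without being touched.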

Extracting only certain lines from a file

This is probably old news to most people but I need to remember this so I figured I may as well blog it.

I made a mysqldump that just dumps all databases into a single file. Already I want to kick myself, because I know that if I ever need to import it there will be trouble: the target server will already have the mysql database etc.

Really I should have used MySQL Parallel Dump, which makes a file per table and is much faster, but it didn’t exist at the time.

So how do you pull lines 8596 to 9613 from this big file? It’s trivial with sed. Here is a small sample file to demonstrate:

$ cat > file.txt
line 1
line 2
line 3
line 4
line 5
^D
$ sed -n '2,4p;4q' file.txt
line 2
line 3
line 4

The sed command prints the range from the start line to the end line (2,4p) and quits as soon as it hits the end line (4q), so it never reads the rest of a big file. Really kewl.
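
For when the same trick is needed from a script, here is a rough Python equivalent using itertools.islice, which also stops reading once the last wanted line has been produced. The function name is my own.

```python
import os
import tempfile
from itertools import islice


def extract_lines(path, first, last):
    """Yield lines first..last (1-indexed, inclusive) and stop reading after `last`."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        yield from islice(handle, first - 1, last)


if __name__ == "__main__":
    # Recreate the five-line sample file from the sed example in a temp location.
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
        tmp.write("".join(f"line {n}\n" for n in range(1, 6)))
    print("".join(extract_lines(tmp.name, 2, 4)), end="")
    os.unlink(tmp.name)
```

For the dump file above the call would be extract_lines(path, 8596, 9613).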

QNAP TS-209 pro NAS

I have been looking for a good solid SOHO Network Attached Storage device for a while. I was all set on the Lacie 2big 1.5TB Network device: it is attractive, does what I needed (not much more than share files) and supports multiple drives.

The problem is that I have since discovered that Lacie UK are the most incompetent people on the planet. I placed the order with them after their site showed they had the unit in stock on a 3 day delivery time. After placing my order the site said the same, so I was confident it was all in order. Needless to say the device never came. I emailed their sales line, no response. I emailed their support line, no response. I called them (after spending about an hour tracking down phone numbers) and they didn’t reply to voice mails.

After about 10 calls I eventually spoke to someone who was unhelpful to say the least. I was told next week, next week etc a few times; next week came and went with no drive unit, so I eventually just canceled my order. No more Lacie devices in my future, ever, that is a certainty.

Some searching later I found a few excellent reviews over at SmallNetBuilder for this and other devices, they even have a very awesome tool for comparing different NAS devices for speed etc, I decided based on their review to get the QNAP TS-209 pro.

The TS-209 pro is an attractive yet very well built little system, all the screws and connectors are proper solid bits of kit like you’d expect on real hardware.  It is a Linux box and you can ssh to it:

# uname -a
Linux vault 2.6.12.6-arm1 #2 Thu Nov 1 03:31:14 CST 2007 armv5tejl unknown
# cat /proc/cpuinfo
Processor       : ARM926EJ-Sid(wb) rev 0 (v5l)
BogoMIPS        : 332.59
Features        : swp half thumb fastmult
# cat /proc/mdstat
<snip>md0 : active raid1 sdb3[2] sda3[0]
731423296 blocks [2/1] [U_]
[==>..................]  recovery = 10.6% (78181504/731423296) finish=144.0min speed=75574K/sec

So a proper little box then. I put 2 x Seagate 750GB drives into it for the same amount of storage as I would have had in the Lacie; the total price ended up being about GBP50 more or so.

That GBP50 is money really well spent in this case. The device has hot swap drives. I tested it by yanking one out live without any problems: a few beeps, a few emailed alerts and log entries.


The device has a ton of features. The usual SMB shares are there, but also NFS, AppleTalk, FTP and web access. It has a MySQL server built in and a webserver with PHP, so you can deploy whatever you want on it. There is an iTunes server for your MP3s and a typical UPnP media server that will work with your PS3 etc.

This is a really capable device built on solid technology. So far I am very happy with it and would recommend it to anyone. If anything significant changes in my experience I’ll post more later, but I suggest you read the review linked above and seriously consider this for your SOHO NAS needs.

Exim on CentOS 4

I recently bought a new machine from Layeredtech for my commercial mail anti spam system and am having endless troubles with it. I have a similar machine at Hetzner also running CentOS 5 and it too is having problems, though less frequently.

The short of it is that the drives disconnect, file systems go read only and the box needs a reboot:

ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
ata1.00: (BMDMA stat 0x4)
ata1.00: tag 0 cmd 0xca Emask 0x4 stat 0x40 err 0x0 (timeout)
ata1: soft resetting port
ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
ata1.00: qc timeout (cmd 0xec)
ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
ata1.00: revalidation failed (errno=-5)
sd 0:0:0:0: SCSI error: return code = 0x00040000
end_request: I/O error, dev sda, sector 226813249
Buffer I/O error on device sda3, logical block 27835568
lost page write due to I/O error on sda3
sd 0:0:0:0: SCSI error: return code = 0x00040000
end_request: I/O error, dev sda, sector 227360961
Buffer I/O error on device sda3, logical block 27904032
lost page write due to I/O error on sda3

So in an effort to figure out if this is a CentOS 5 problem (both ISPs certify CentOS 4 on their hardware) I needed to get my application going on CentOS 4. This turned out to be quite a mission, involving getting Exim with MySQL and the recently integrated exiscan rather than the patched version.

I looked at the various options and decided to just backport CentOS 5’s Exim package to CentOS 4.

As it turns out I haven’t yet had a machine re-installed with CentOS 4, as I found some posts suggesting some kernel parameters that might fix things. I’ve applied these to the machines and am now waiting to see.

My Exim RPMs can be found below:

exim-4.63-3.src.rpm
exim-4.63-3.i386.rpm
exim-mon-4.63-3.i386.rpm
exim-sa-4.63-3.i386.rpm

As with the CentOS 5 ones you’ll need various DB client libraries installed as this supports speaking to Postgres, MySQL, SQLite etc.

This should be useful to anyone who just wants a more recent version of Exim on their CentOS/RedHat 4 machines.

Physical Memory Info under Linux

I’ve a number of machines that need memory upgrades, and I didn’t want to turn them off to see what is inside in order to plan this. Under Windows it’s pretty easy: just download and run CPU-Z and you’ll know all there is to know.
I did a lot of searching and eventually came across dmidecode. You just run it as root and it reads the DMI tables via /dev/mem, parses them and prints them in human readable form.
It shows a lot of useful information; on my IBM HS20 Blade it shows model, serial, hardware numbers etc. Here is a sample of the memory section:

Handle 0x0017
DMI type 16, 15 bytes.
Physical Memory Array
Location: Proprietary Add-on Card
Use: System Memory
Error Correction Type: Multi-bit ECC
Maximum Capacity: 16 GB
Error Information Handle: Not Provided
Number Of Devices: 4
Handle 0x0018
DMI type 17, 21 bytes.
Memory Device
Array Handle: 0x0017
Error Information Handle: Not Provided
Total Width: 72 bits
Data Width: 64 bits
Size: 512 MB
Form Factor: DIMM
Set: 1
Locator: DIMM1
Bank Locator: Slot 1
Type: DDR
Type Detail: Synchronous
Handle 0x0019
DMI type 17, 21 bytes.
Memory Device
Array Handle: 0x0017
Error Information Handle: Not Provided
Total Width: 72 bits
Data Width: 64 bits
Size: 512 MB
Form Factor: DIMM
Set: 1
Locator: DIMM2
Bank Locator: Slot 2
Type: DDR
Type Detail: Synchronous
Handle 0x001A
DMI type 17, 21 bytes.
Memory Device
Array Handle: 0x0017
Error Information Handle: Not Provided
Total Width: 72 bits
Data Width: 64 bits
Size: 512 MB
Form Factor: DIMM
Set: 2
Locator: DIMM3
Bank Locator: Slot 3
Type: DDR
Type Detail: Synchronous
Handle 0x001B
DMI type 17, 21 bytes.
Memory Device
Array Handle: 0x0017
Error Information Handle: Not Provided
Total Width: 72 bits
Data Width: 64 bits
Size: 512 MB
Form Factor: DIMM
Set: 2
Locator: DIMM4
Bank Locator: Slot 4
Type: DDR
Type Detail: Synchronous

So I have 4 memory slots in total and each already holds a 512MB DDR module. With no free slots, upgrading means throwing it all away and buying new RAM.
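
Reading that by eye is fine for one blade, but across many machines it is easier to parse. The following Python sketch pulls the "Memory Device" blocks out of dmidecode text and totals them. The helper names are made up, the sample is a trimmed copy of the output above, and the size parsing only handles values like "512 MB" or "2 GB", treating anything else (such as an empty slot) as 0.

```python
def memory_devices(dmi_text):
    """Return one {field: value} dict per 'Memory Device' block in dmidecode output."""
    devices, current = [], None
    for raw in dmi_text.splitlines():
        line = raw.strip()
        if line.startswith("Handle "):
            current = None  # a new DMI structure begins
        elif line == "Memory Device":
            current = {}
            devices.append(current)
        elif current is not None and ": " in line:
            key, value = line.split(": ", 1)
            current[key] = value
    return devices


def size_mb(device):
    """Module size in MB; 0 when the Size field is not '<number> MB' or '<number> GB'."""
    parts = device.get("Size", "").split()
    if len(parts) == 2 and parts[0].isdigit() and parts[1] in ("MB", "GB"):
        return int(parts[0]) * (1024 if parts[1] == "GB" else 1)
    return 0


SAMPLE = """\
Handle 0x0018
DMI type 17, 21 bytes.
Memory Device
Size: 512 MB
Locator: DIMM1
Type: DDR
Handle 0x0019
DMI type 17, 21 bytes.
Memory Device
Size: 512 MB
Locator: DIMM2
Type: DDR
"""

if __name__ == "__main__":
    devices = memory_devices(SAMPLE)
    populated = [d for d in devices if size_mb(d) > 0]
    print(f"{len(devices)} slots, {len(populated)} populated, "
          f"{sum(size_mb(d) for d in devices)} MB total")
    for d in devices:
        print(f"  {d.get('Locator', '?')}: {d.get('Size', '?')} {d.get('Type', '')}")
```

On a real machine the text would come from running dmidecode as root instead of the SAMPLE string.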

Further lighttpd details

I’ve previously written that I am trying out lighttpd for serving up my static files. I’ve now been running lighttpd and Apache in parallel for a while and must say the results are very much in favour of lighttpd.
First a graphic to show the change:


This is a capture out of cacti showing the requests per second for some servers. Look at the yellow line: till about 12 it was running Apache 1.x, then I took that server out. Around 12:30 I put in a lighttpd server on the same box and enabled stats from it around 13:00. This is on the same hardware with the same files and the same IP address, and you can clearly see that in terms of requests per second lighttpd totally flies compared to Apache on the same box.
The Apache is a stock Debian Apache 1.3.33. I could probably have sped it up with some tuning, but installing lighttpd is much less work than a painstaking cycle of monitoring, tuning, monitoring, tuning.

Webserver Performance

I look after a site that serves up a lot of static content. Early on I ran into issues with Apache coping on one machine, and bandwidth at my main site is pretty expensive, so I started farming off my static content to a number of machines hosted at other ISPs. I typically pay around 100 pounds per machine per month; as long as I push out less than around 100Gb/month it’s a pretty good deal.
The only problem is that I get tons of SYN_RECV connections on each of my machines, around 300 of them at any given time. Typically these indicate a lot of connections waiting to be served, but the servers handle new requests immediately and there is no time spent waiting for IO on the servers; in fact the CPUs are always 98% idle.
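
For keeping an eye on that number over time, this is a minimal Python sketch that counts TCP connection states from text in the format of Linux’s /proc/net/tcp, where the fourth column is the state as a hex code (03 is SYN_RECV). The sample rows are made up for illustration and the function name is my own.

```python
from collections import Counter

# Kernel TCP state codes as they appear in the "st" column of /proc/net/tcp.
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def count_states(proc_net_tcp_text):
    """Count connections per TCP state; unknown codes are kept as their raw hex value."""
    counts = Counter()
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) > 3:
            counts[TCP_STATES.get(fields[3], fields[3])] += 1
    return counts


# Made-up sample: one listener, one half-open connection, one established.
SAMPLE = """\
  sl  local_address rem_address   st tx_queue rx_queue tr tm->when retrnsmt   uid  timeout inode
   0: 00000000:0050 00000000:0000 0A 00000000:00000000 00:00000000 00000000     0        0 1234 1
   1: 0100A8C0:0050 0200A8C0:C001 03 00000000:00000000 01:00000010 00000000     0        0 0 1
   2: 0100A8C0:0050 0300A8C0:C002 01 00000000:00000000 00:00000000 00000000     0        0 5678 1
"""

if __name__ == "__main__":
    for state, count in count_states(SAMPLE).most_common():
        print(f"{state:<12} {count}")
```

On a Linux box the input would be the contents of /proc/net/tcp rather than SAMPLE, and the SYN_RECV count could be fed straight into a graphing tool.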
In an effort to try and resolve this (after much tuning of server sysctls) I asked Jaco if he’d seen it before and he suggested giving Lighttpd a try.
I installed it on one of my 3 static servers and have run it for a few days now, and the results are encouraging. I still have 300 SYN_RECVs but the machine is performing much faster than its siblings. On average before Lighttpd I was getting 30 requests/sec out of each of my machines; now this one is doing around 50/sec, and it is also pushing out about 30Kb/sec more than the other two. Comparing the Lighttpd machine to the 2 Apache machines on a graph shows it consistently outperforms the others by about 20%.


Uptime 4 days 21 hours 19 min 20 s
Requests 18 Mreq
Traffic 20.77 Gbyte
Requests 105 req/s
Traffic 89.99 kbyte/s


There has been some discussion about Lighttpd and Apache benchmarks, and one Apache user has written a debunk of the benchmarks. This is linked to from the Lighttpd home page, so it might be worth investigating some more. I’ve done a lot of the typical things that Apache people recommend but they didn’t help much. I think I’ll try to tune my one Apache server to hit the same performance as the Lighttpd one and see if it’s possible. For now though I’m quite happy with the results of a quick 30 minutes spent upgrading to Lighttpd.
