XEN ‘No space left on device’ silliness.

12/05/2007

Yesterday, while trying to get an i386 DomU going on my x86_64 Xen server, I ran into some hassles with ‘No space left on device’ errors.
Anyone who sees that would immediately go for the df command, but it would be futile in this instance.
What happens is that the xenstore, where Xen keeps the metadata and state of the running VMs, gets corrupted.
You can try running ‘xenstore-control check’, but it will give you some b/s answer suggesting all is well. It’s not: check /var/log/messages and you’ll see stuff like:

xenstored: corruption detected by connection 0: \
err No such file or directory: Write  failed
xenstored: clean_store: '/local/domain/0/backend/vbd/16/51712/sectors' is orphaned!
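
If you just want a quick yes/no on whether your log has these messages, something like the following sketch should do. The function name xenstore_log_check is made up for this example, and it assumes your syslog lands in /var/log/messages like mine does; pass a different path if yours goes elsewhere. It only matches the two message shapes shown above.

```shell
# Hypothetical helper: print xenstored corruption/orphan messages from a log file.
# Assumes syslog goes to /var/log/messages unless another path is given.
xenstore_log_check() {
    log="${1:-/var/log/messages}"
    # Match the two message shapes quoted in this post.
    grep -E 'xenstored: (corruption detected|clean_store: .* is orphaned)' "$log"
}
```

Run it as root with no arguments. Any output at all means the store is in trouble, while no output only means these two patterns were not found.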

At this point you’re pretty much screwed: reboot and xend won’t even run, and no VMs will start.
Fixing it is pretty easy in the end, once you’ve done tons of Googling and found the 2-year-old bug in the Xen bug tracker about this exact problem, complete with the Xen guys trying to close it in a routine ‘cleanup of tickets’ rather than actually fixing the bug.
First, shut down all things Xen; if you can, even boot from a non-Xen kernel. Once you’re sure it’s all down, just delete /var/lib/xenstored/tdb* and reboot. It should all be fine after that.
Make sure xenstored isn’t running while you do this, otherwise it will write its corrupted in-memory state back to disk when you reboot, and it will look like your fix didn’t work.
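
The delete step can be wrapped in a small guard so you can’t run it while xenstored is still alive. This is only a sketch: the function name clean_xenstore is made up, it assumes pgrep is installed, and it leaves stopping the Xen services and the reboot to you. The directory defaults to the path above, but you can point it at a scratch directory to try it out first.

```shell
# Hypothetical helper: remove the xenstore tdb files only if xenstored is not running.
clean_xenstore() {
    store_dir="${1:-/var/lib/xenstored}"
    # Refuse to continue while xenstored is alive, since it would write its
    # corrupted in-memory state back to disk.
    # (If pgrep is unavailable this check is effectively skipped, so verify by hand.)
    if pgrep -x xenstored >/dev/null 2>&1; then
        echo "xenstored is still running, stop it first" >&2
        return 1
    fi
    # Delete only the tdb* files; anything else in the directory is left alone.
    rm -f "$store_dir"/tdb*
}
```

Run clean_xenstore as root with no arguments, then reboot.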