Fighting email harvesters and other unfriendlies.
Since I put up this site, I have been paying attention to my log files to see how it gets accessed. One of my main motivations for putting up a personal site is not to publish content or personal ideas, but to study the blogging world: how it communicates and how information flows.
Obviously, RSS [1, 2] and other XML technologies underlie interesting services such as Technorati, Feedster, Blogosphere, Geoblog, Blogshares and many more, so studying them is essential. I have been looking for the RSS book for a while and might have to resort to ordering it from Amazon.
There is, however, a lot more to a website than an XML file. The net is constantly being trawled by unwelcome guests. These range from email address harvesters and services that “monitor” your server to badly behaved search engine crawlers and bad people like the RIAA.
Here I present some strategies for combating these services, from simply asking the well-behaved ones to go away with a robots.txt file to forcing the bad ones to go away with mod_rewrite and other such methods.
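To make the robots.txt half of that concrete, here is a minimal sketch in Python of how a well-behaved crawler would read such a file before fetching anything. The robots.txt contents, the bot names "BadBot" and "GoodBot", and the paths are all made up for illustration; they are not taken from this site's real configuration. It uses the standard library's urllib.robotparser.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt. "BadBot" and the /private/ path are
# placeholders for illustration only, not this site's real rules.
ROBOTS_TXT = """\
User-agent: BadBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(user_agent: str, path: str) -> bool:
    """Return True if the rules above permit user_agent to fetch path."""
    parser = RobotFileParser()
    # parse() takes the file's lines; no network access is needed here.
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    checks = [
        ("BadBot", "/index.html"),           # banned from the whole site
        ("GoodBot", "/index.html"),          # falls under the "*" rules
        ("GoodBot", "/private/notes.html"),  # "*" rules disallow /private/
    ]
    for agent, path in checks:
        print(agent, path, is_allowed(agent, path))
```

Note that this check happens entirely on the crawler's side. A robot that never looks at robots.txt is not slowed down by it at all, which is why the badly behaved ones have to be turned away by the server itself.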