Easy transparent PHP input filtering
I have been working on a site that will have potentially quite a few random third parties accessing it and inserting data into a MySQL database. I am thus quite keen on a good solid input filtering method for PHP to prevent things like XSS and SQL Injection.
There are several options out there. Of the ones I found, Inspekt is the closest match to my way of working: it imports $_GET, $_POST etc. and wraps them in an object, which you then use to access variables in a filtered manner. By default it then NULLs the original variables so you cannot access them anymore; if backward compatibility is desired it can leave the originals untouched. This is not optimal, as it gives an unsafe-by-default result if you want to maintain backwards compatibility.
Another problem with this approach is that it is a lot of work to change existing code. You might think that is just par for the course, but I was convinced I needed to find a way to do so more transparently.
I could, for example, walk through the $_GET etc. arrays at program start and apply some filtering to them using addslashes() and such, but this is very restrictive. What if you need the unfiltered value, especially if you perform destructive filtering? How would you go about filtering some variables as phone numbers, some as email addresses, etc.?
The answer lies in PHP 5's Standard PHP Library (SPL), specifically in the ArrayAccess interface, which is the way to go if you don't need to support older versions of PHP.
The basic advantage of this is that you can expose properties of your objects by using array notation rather than object notation:
$result = $foo->getBar();
compared to:
$result = $foo['bar'];
Both statements give access to the private variable $bar, just using different syntax. Using this technique we can write a transparent filter for input variables. The basic usage of the final library would be something along these lines:
$_GET = new ArrayArmor($_GET);
print ("Filtered Variable: $_GET[test]<br>\n");
print ("Unfiltered Variable: " . $_GET->getRaw("test"));
A possible output from this script can be seen below:
Filtered Variable: 1234\';delete from accounts;--
Unfiltered Variable: 1234';delete from accounts;--
You can see that the default behavior is to protect the input but even for destructive filtering methods the raw unfiltered data would be available if the programmer needed it. You can provide all sorts of extra methods to validate emails, post codes and such.
A quick and dirty example of a class that provides this kind of filtering can be seen below:
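<?php
// Wraps an input array and returns escaped values through array notation.
class ArrayArmor implements ArrayAccess {
    private $original;

    public function __construct($variable) {
        // Keep a private copy of the raw input.
        $this->original = $variable;
    }

    public function offsetExists($offset) {
        return isset($this->original[$offset]);
    }

    // Filtered access, used for $_GET['name'].
    public function offsetGet($offset) {
        return addslashes($this->original[$offset]);
    }

    // The wrapped input is read-only, so writes are ignored.
    public function offsetSet($offset, $value) {
    }

    public function offsetUnset($offset) {
    }

    // Unfiltered access for the cases where the raw value is needed.
    public function getRaw($offset) {
        return $this->original[$offset];
    }
}
?>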
So that's it, a simple method that is very easy to put into existing code. This is clearly not a full example as addslashes() is hardly the be-all and end-all of input protection, but if you build on this you can get a very easy to use and flexible input filter that is safe by default.
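As a rough illustration of the "extra methods" idea, here is a minimal sketch that adds typed accessors built on PHP's filter extension. The class name ValidatingArmor and the methods getEmail() and getInt() are made up for this example, and it assumes the filter extension (PHP 5.2 or later) is available. The #[\ReturnTypeWillChange] lines only matter on PHP 8.1 and later; older versions read them as comments.

```php
<?php
// Hypothetical sketch: ValidatingArmor, getEmail() and getInt() are
// illustrative names, not part of the ArrayArmor class shown above.
class ValidatingArmor implements ArrayAccess
{
    private $original;

    public function __construct(array $input)
    {
        $this->original = $input;
    }

    #[\ReturnTypeWillChange]
    public function offsetExists($offset)
    {
        return isset($this->original[$offset]);
    }

    // Default access stays filtered; missing keys yield null.
    #[\ReturnTypeWillChange]
    public function offsetGet($offset)
    {
        return isset($this->original[$offset])
            ? addslashes($this->original[$offset])
            : null;
    }

    // Input is treated as read-only, so writes and unsets are ignored.
    #[\ReturnTypeWillChange]
    public function offsetSet($offset, $value)
    {
    }

    #[\ReturnTypeWillChange]
    public function offsetUnset($offset)
    {
    }

    public function getRaw($offset)
    {
        return isset($this->original[$offset]) ? $this->original[$offset] : null;
    }

    // Returns the address if it passes PHP's email validation, else null.
    public function getEmail($offset)
    {
        $value = filter_var($this->getRaw($offset), FILTER_VALIDATE_EMAIL);
        return $value === false ? null : $value;
    }

    // Returns an integer if the raw value is a valid integer string, else null.
    public function getInt($offset)
    {
        $value = filter_var($this->getRaw($offset), FILTER_VALIDATE_INT);
        return $value === false ? null : $value;
    }
}

$input = new ValidatingArmor(array(
    'email' => 'user@example.com',
    'age'   => '42',
    'name'  => "O'Brien",
));

var_dump($input->getEmail('email')); // the address, as a string
var_dump($input->getEmail('name'));  // NULL: not a valid address
var_dump($input->getInt('age'));     // an integer
echo $input['name'], "\n";           // escaped by addslashes()
```

Returning null on failure keeps the safe-by-default behaviour: a caller that forgets to check gets nothing rather than unvalidated input.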
Nasty PHP Authentication Handling
Sometimes you come across things that just make you wonder what is going on in people's minds.
For years everyone who wrote applications compatible with the standard HTTP Authentication method has used the REMOTE_USER server variable, as set by Apache, to check the username that was logged in by the webserver. This has worked well for everyone: CGIs and all would just grab it there and everyone would be happy.
Along comes PHP and makes a great big mess of it. PHP suggests that we use $_SERVER['PHP_AUTH_USER'] instead, and gives some good reasons for this too, except that it is severely crippled for anything but Basic and Digest authentication. Take the following code from main/main.c:
if (auth && auth[0] != '\0' && strncmp(auth, "Basic ", 6) == 0) {
    char *pass;
    char *user;

    user = php_base64_decode(auth + 6, strlen(auth) - 6, NULL);
    if (user) {
        pass = strchr(user, ':');
        if (pass) {
            *pass++ = '\0';
            SG(request_info).auth_user = user;
            SG(request_info).auth_password = estrdup(pass);
            ret = 0;
        } else {
            efree(user);
        }
    }
}
As you can see above, PHP only imports the user and password from Apache if the AuthType is Basic, which makes no sense at all. Why not just check with Apache and, if it set the username, import it? Surely Apache knows if a user has authenticated? Ditto for the password. It is so broken, in fact, that PHP in CGI mode also doesn't work, since those headers don't get set there either. Countless comments and nasty hacks can be found in the PHP user contributed notes about this, but it is all just silliness.
The reason this is annoying me is that I have written a Single Sign-On system in PHP. You can host an identity server on any domain and hook any site in any other domain into the SSO system; it's a bit like TypeKey.
Of course it's nice to have an easy to use SSO system in PHP, but what is the point if you can't make legacy apps like Nagios, Cacti, RT etc. play along with the SSO? To solve this I extended Apache::AuthCookie with a new mod_perl module that plugs into Apache and does authentication using my SSO, plus a small bit of glue that you put on your RT/Cacti/Nagios box.
All's great: I have SSO to Nagios, RT and countless other things working flawlessly. The exception, of course, is Cacti: it's written along the lines of the PHP manual and uses PHP_AUTH_USER instead of REMOTE_USER, so my fancy new AuthType in Apache does not work with it. As it turns out it's a quick two-line fix in the Cacti code, but you would think PHP would be a bit more generic in this regard. As it stands, I think a lot of people who want to do SSO using hardware tokens and such have issues with PHP being silly.
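The kind of fix involved can be sketched roughly as below. This is not the actual Cacti change, just a minimal illustration of preferring REMOTE_USER and falling back to PHP_AUTH_USER; the function name authenticated_user() is made up for the example.

```php
<?php
// Hypothetical helper: the name authenticated_user() is illustrative and is
// not taken from Cacti or from the PHP manual.
function authenticated_user(array $server)
{
    // REMOTE_USER is set by the web server once an auth module has accepted
    // the user, whatever the AuthType.
    if (isset($server['REMOTE_USER']) && $server['REMOTE_USER'] !== '') {
        return $server['REMOTE_USER'];
    }
    // Fall back to the value PHP extracts itself from the Authorization header.
    if (isset($server['PHP_AUTH_USER']) && $server['PHP_AUTH_USER'] !== '') {
        return $server['PHP_AUTH_USER'];
    }
    return null;
}

$user = authenticated_user($_SERVER);
```

Passing the array in, rather than reading $_SERVER inside the function, makes the lookup easy to exercise outside a web request.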
Macs and MS Keyboards
Previously I posted about the 17" iMac that I got in January 2006. I have now upgraded to a bigger Mac, this time a 24" iMac Core 2 Duo Extreme with 2GB RAM.

I still have the 17" and will keep it; it's replacing the really old AMD Linux desktop on my desk. But the 17" has been getting a bit long in the tooth with Parallels, MS Office and all sorts of other stuff that I have been running on it now that I am working full time from home.
Previously I bought at the bottom of the spectrum and the machine lasted well, but I was hoping to keep it as my primary machine for at least 3 years. I guess my needs have increased though, so this time I bought at the top end of the range and will upgrade it to 4GB RAM soon, just not from Apple, as buying direct from Crucial will save me about 200 pounds.
What immediately annoyed me, to the point of cramps in my hands and general unhappiness, was the amazingly crap thinline keyboard that comes with the machines. I soon started looking at other options and found no 3rd party Mac keyboards, but I did notice that Microsoft keyboards have a utility to configure the various additional keys etc., so I took the plunge and got a MS Natural Ergonomic 400 keyboard to replace my very old MS Office keyboard.

I am extremely pleased with this keyboard; everything works as it should. The configuration utility lets you configure every key on the keyboard, and everything is mapped correctly as expected. Even the function keys like 'new' work right out of the box by sending 'apple key-n' etc. This is the case with all the MS keyboards on the market today, so I can happily recommend any MS keyboard to Mac users.
The iMac itself is lovely and I am really happy with it. Speed wise the Core 2 Duo Extreme chip has made a huge improvement: with Parallels running Windows the machine idles at about 2% while I have Firefox, NetNewsWire, iTerm, several Terminal.app windows, Adium, Skype and all sorts of background stuff going. I really could not have asked for more from a desktop machine.
Detailed Apache Stats
Apache has its native mod_status status page that many people use to pull stats into tools such as Cacti and other RRDTool based stats packages. This works well but does not always provide enough detail; questions such as these remain unanswered:
- How many of my requests are GET and how many are POST?
- How many 404 errors and 5xx errors do I get on my site as a whole and for script.php specifically?
- What is the average response time for the whole server, and for script.php?
- How many Closed, Keep Alive and Aborted connections do I have?
To answer this I wrote a script that keeps a running track of your Apache process. It has many fine-grained controls that let you tune exactly what to keep stats on. I got the initial idea from an old ONLamp article titled Profiling LAMP Applications with Apache's Blackbox Logs.
The article proposes a custom log format that provides the equivalent of an airplane's black box, a flight recorder that records more detail per request than the usual common log formats do. I suggest you read the article for background information. The article stops short of a full data parser though, so I wrote one for a client who kindly agreed that I can open source it.
Using this and some glue in my Cacti, I now have graphs showing a profile of the requests I receive for the whole site. As you are able to apply fine-grained controls to select exactly what you'll see, you could get per-server overview stats as well as details for just a specific script's performance and statuses.
The script creates, at a regular interval, a file that contains the performance data. The data is presented as variable=value pairs. I will soon provide a Cacti and Nagios plugin to parse this output to ease integration into these tools.
The performance data includes values such as:
- Number of requests in total
- Total size of requests, separated into in and out bytes
- Average response time
- Total processing time
- Counts of connections in Close, Keep Alive and Aborted states
- Counts for each valid HTTP status code, and aggregates for 1xx, 2xx, 3xx, 4xx and 5xx
- The number of GET and POST requests
- Detail for each and every unique request the server serves
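To make the list above concrete, here is a minimal sketch of how such counters could be accumulated from already-parsed log records. It is not the script described in this post: the function name summarise_requests() and the record keys (method, status, connection, time_us) are assumptions for illustration. The connection markers follow Apache's %X log field, where "X" means aborted, "+" keep alive and "-" closed.

```php
<?php
// Hypothetical sketch: summarise_requests() and the record keys are
// illustrative, not the format used by the script described above.
function summarise_requests(array $records)
{
    $stats = array(
        'requests'       => 0,
        'methods'        => array(),
        'status_classes' => array(),
        'connections'    => array('closed' => 0, 'keepalive' => 0, 'aborted' => 0),
        'total_time_us'  => 0,
        'avg_time_us'    => 0,
    );
    // Apache's %X field: "X" aborted, "+" keep alive, "-" closed.
    $names = array('X' => 'aborted', '+' => 'keepalive', '-' => 'closed');

    foreach ($records as $r) {
        $stats['requests']++;

        // Count requests per HTTP method (GET, POST, ...).
        $method = $r['method'];
        if (!isset($stats['methods'][$method])) {
            $stats['methods'][$method] = 0;
        }
        $stats['methods'][$method]++;

        // Aggregate status codes into 1xx..5xx buckets.
        $class = ((int) ($r['status'] / 100)) . 'xx';
        if (!isset($stats['status_classes'][$class])) {
            $stats['status_classes'][$class] = 0;
        }
        $stats['status_classes'][$class]++;

        // Count connection states, skipping markers we do not recognise.
        if (isset($names[$r['connection']])) {
            $stats['connections'][$names[$r['connection']]]++;
        }

        $stats['total_time_us'] += $r['time_us'];
    }

    if ($stats['requests'] > 0) {
        $stats['avg_time_us'] = $stats['total_time_us'] / $stats['requests'];
    }
    return $stats;
}

$stats = summarise_requests(array(
    array('method' => 'GET',  'status' => 200, 'connection' => '+', 'time_us' => 1200),
    array('method' => 'POST', 'status' => 500, 'connection' => '-', 'time_us' => 3000),
    array('method' => 'GET',  'status' => 404, 'connection' => 'X', 'time_us' => 1800),
));
print_r($stats);
```

Per-request detail could be kept the same way, by keying a second set of counters on the request path.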
See the Sample Stats for a good example; the variables are pretty self-explanatory. To keep the data set small and manageable, 2 selectors exist: one to choose which requests to keep details for and one to choose which to keep stats for. These can be combined with standard Apache directives such as Location to provide very fine-grained stats for all or a subset of your site.
You would need some glue to plug this into Cacti and Nagios. I will provide a script for this as soon as I have time to write up some docs for it.
The install guide etc. can be found on my Wiki, and there is also extensive Perldoc documentation in the script. The Wiki also has links for downloading the script; the latest is always available here
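Until then, the glue mostly comes down to reading the variable=value pairs back in. A minimal sketch follows, assuming one pair per line. The function name parse_stats() and the sample variable names are made up for the example; the real file may use different names and layout.

```php
<?php
// Hypothetical sketch: parse_stats() and the sample variable names are
// illustrative; the real stats file may be laid out differently.
function parse_stats($text)
{
    $stats = array();
    foreach (preg_split('/\r?\n/', $text) as $line) {
        $line = trim($line);
        // Skip blank lines and anything that is not a variable=value pair.
        if ($line === '' || strpos($line, '=') === false) {
            continue;
        }
        // Split on the first "=" only, so values may contain "=" themselves.
        list($name, $value) = explode('=', $line, 2);
        $stats[trim($name)] = trim($value);
    }
    return $stats;
}

$sample = "requests=120\nget=100\npost=20\n";
$stats = parse_stats($sample);
echo $stats['requests'], "\n";
```

In practice the text would come from file_get_contents() on wherever the stats file is written, and the resulting array can be printed in whatever format the monitoring tool expects.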
Passport
Today my UK passport finally arrived. I did have to go for an interview where they re-established that I am who I say I am.
The interview was quite interesting. When I went to write my Britishness Test there were only 4 or so other people at the test, and I kind of looked at that as an indication that perhaps, as usual, there is more hype in the media related to immigration than is really needed.
They arrange the interview meetings in blocks of 45 minutes and only allow you in 10 minutes before your block starts, so you know that everyone in there is applying for their first passport. I went to the interview center in Elephant & Castle. It's a fairly big facility with almost 30 interview cubicles, and in my 45 minute block there were about 80 or so people waiting for the interview.
This kind of brought it home a bit more that yes, really, there is a massive immigration problem. The interview centers are open 6 days a week, and if they are usually anywhere near as busy as when I went, noon on a Friday, then I'd say the rate of immigration is totally unsustainable. It is easier to visualize the problem in a setting like this than to read some big figure in a newspaper.
The interview itself was very professionally done. The woman who interviewed me was friendly and thorough, and I think the whole thing was actually pretty enjoyable apart from the obvious inconvenience involved. They mostly asked me to confirm what I had already filled in on my application forms, but also some extra things like my car registration, what bank accounts I have and when they were opened. It was all simple stuff and went really quickly.

