Have you ever tried to search the internet for reviews of a gadget? I typically search for 'whatever review', and my results are usually a mess of fake review sites or sites that rely solely on user rants.
While trying various ways to filter these from my searches I turned to PHP. Read on for some information about using the Google API to filter your searches.
Initially I tried to limit my searches by adding "-whatever" to the search terms, but this ran into the 10 word search term limit quite quickly. There are other methods of getting the most out of 10 words, such as careful use of the "*", but these are just not good enough.
To get going with the API you will need your own developer key. Keys are free, but you need to register at http://api.google.com/. Google provides libraries for Java and .NET, but you can use any language with SOAP bindings.
I am using PHP and the NuSOAP library. I will not go into the details of SOAP here; you can find a good tutorial on the Zend website.
It is very simple to query Google, and the following snippet will do the work. For a full explanation of the various parameters, see the API Reference.
$ggle = new soapClient('http://api.google.com/search/beta2');
First we create a soapClient instance that points to the API.
$params = array('key' => $gglKey,
                'q' => $query,
                'start' => 0,
                'maxResults' => 10,
                'filter' => false,
                'restrict' => '',
                'safeSearch' => false,
                'lr' => 'en',
                'ie' => '',
                'oe' => '');
This is an array that contains our search parameters. The most important ones are $gglKey and $query; you need to assign these values in your code.
$result = $ggle->call("doGoogleSearch", $params, "urn:GoogleSearch", "urn:GoogleSearch");
This is where it all happens: you make a call to the Google web service and the response ends up in $result.
At this point $result should hold a structured array that contains either warnings, errors or actual search results.
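One way to tell those cases apart before touching the results is sketched below. It assumes the client is a NuSOAP client, which exposes a fault property and a getError() method. The helper name searchOutcome and the 'faultstring' key are my assumptions for illustration, not something taken from the API Reference.

```php
<?php
// Hypothetical helper: classifies the outcome of a NuSOAP call.
// Assumes $client has NuSOAP's `fault` property and `getError()` method.
function searchOutcome($client, $result)
{
    if ($client->fault) {
        // Assumption: fault details arrive in $result under 'faultstring'.
        $hasDetail = is_array($result) && isset($result['faultstring']);
        return 'fault: ' . ($hasDetail ? $result['faultstring'] : 'unknown fault');
    }

    $err = $client->getError();
    if ($err) {
        return 'error: ' . $err;
    }

    // A successful call with no hits may not carry a usable list of elements.
    if (!is_array($result)
        || !isset($result['resultElements'])
        || !is_array($result['resultElements'])) {
        return 'no results';
    }

    return 'ok';
}
```

Calling searchOutcome($ggle, $result) right after the call() line lets you skip the display code unless it returns 'ok'.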
For a full list of entries in the result refer to the API Reference. A few simple ones to look at are:
$result['estimatedTotalResultsCount']
$result['searchTime']
$result['searchComments']
$result['searchTips']
The variable names are pretty self-explanatory; if in doubt, read the reference. Walking through the actual results is slightly more complicated, and the code below will do it for you.
if (isset($result['resultElements']) && is_array($result['resultElements'])) {
    // Print each hit as a linked title, its snippet with cached size, and the raw URL.
    foreach ($result['resultElements'] as $r) {
        print("<a href='" . $r['URL'] . "'>" . $r['title'] . "</a>\n");
        print($r['snippet'] . " (" . $r['cachedSize'] . ")\n");
        print("<br><a href='" . $r['URL'] . "'>" . $r['URL'] . "</a>\n");
        print("<p>\n");
    }
}
Those are the basics. To remove the junk URLs from your search, simply keep a list of sites to filter and test each result's $r['URL'] against it before showing the result.
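The filtering step could look something like the sketch below. The function names isIgnored and filterResults and the domains in the ignore list are placeholders of mine, and matching on the host name (including subdomains) is just one reasonable way to do the test.

```php
<?php
// Hypothetical ignore list: these domains are placeholders.
$ignoreList = array('fake-reviews.example', 'spam.example.com');

// Returns true when the URL's host equals an ignored domain
// or is a subdomain of one.
function isIgnored($url, $ignoreList)
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host)) {
        return false; // no host to compare, so nothing to filter on
    }
    $host = strtolower($host);
    foreach ($ignoreList as $domain) {
        $domain = strtolower($domain);
        $suffix = '.' . $domain;
        if ($host === $domain || substr($host, -strlen($suffix)) === $suffix) {
            return true;
        }
    }
    return false;
}

// Keeps only the result elements whose URL is not on the ignore list.
function filterResults($resultElements, $ignoreList)
{
    $kept = array();
    foreach ($resultElements as $r) {
        if (!isIgnored($r['URL'], $ignoreList)) {
            $kept[] = $r;
        }
    }
    return $kept;
}
```

You would then loop over filterResults($result['resultElements'], $ignoreList) in the display code instead of over the raw result elements.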
One drawback of the Google API is that it returns at most 10 results per query. Once you start filtering, you may discard so many results that too few are left to be useful. To get around this, put your search in a loop and run multiple queries, moving the start parameter forward each time, until you have enough results.
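Here is a minimal sketch of such a loop. To keep it independent of the SOAP call, it takes two hypothetical callables: $search, which returns one page of result elements for a given start offset (for example by wrapping the doGoogleSearch call), and $isIgnored, which returns true for URLs to drop. The function name, the page size of 10 and the cap on API calls are my assumptions.

```php
<?php
// Collects up to $wanted unfiltered results, querying page after page.
// $search and $isIgnored are hypothetical callables supplied by the caller.
function collectResults($search, $isIgnored, $wanted = 20, $maxCalls = 5)
{
    $kept = array();
    $ignored = 0;
    $calls = 0;
    $start = 0;

    while (count($kept) < $wanted && $calls < $maxCalls) {
        $page = call_user_func($search, $start);
        $calls++;
        if (!is_array($page) || count($page) == 0) {
            break; // no more results, or the call failed
        }
        foreach ($page as $r) {
            if (call_user_func($isIgnored, $r['URL'])) {
                $ignored++;
            } elseif (count($kept) < $wanted) {
                $kept[] = $r;
            }
        }
        $start += 10; // assumed page size: the API returns at most 10 per call
    }

    return array('results' => $kept, 'ignored' => $ignored, 'calls' => $calls);
}
```

The cap on calls is there so that a very aggressive ignore list cannot burn through your daily query allowance in a single search.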
I have a very simple implementation of the looping concept that you can play with here. For a good example search, type "apple ipod review" into the box, leave the ignore list as default and hit search. You should see about 7 pages ignored, which means it had to do 3 API calls to show 20 hits. Compare that to a normal Google query for the same term. Much better :)

I'm very curious how you went about doing the looping sequence to do multiple queries. Your looping concept link is broken: http://www.devco.net/~rip/google2/
I'm getting a "404 Not Found" for that URL.