Here's a snippet of what I've been trying to do. This probably isn't the best approach, but I can't quite figure out how to pull a child of a resulting element: PHP keeps returning an error whenever I try to use firstChild.
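//start our result counter
$i = 0;
//initialize the output buffer so the first .= doesn't raise an undefined variable notice
$output = '';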
//try setting higher than 1000
while ($i < 1000)
{
//show status so we don't get lost
echo "Currently extracting data from records ".$i." through ".($i + 10)."...";
$raw = new DOMDocument();
$clean = new DOMDocument();
//special to Google
$url = 'http://maps.google.com/maps?f=l&hl=en&q='.urlencode($what).'&near='.urlencode($where).'&view=text&start='.$i."&radius=".$radius;
@$raw->loadHTMLFile($url);
$HTML = $raw->saveHTML();
@$clean->loadHTML($HTML);
$xpath = new DOMXPath($clean);
$xNodes = $clean->getElementsByTagName('td');
foreach ($xNodes as $xNode)
{
if ($xNode->getAttribute('valign') == "top")
{
//echo $xNode->nodeValue."\n";
$output .= $xNode->nodeValue."";
}
}
echo "...done\n";
//add to our counter
//10 results per page, so we add 10
$i = $i + 10;
}
//fix bugged double comma, can't figure out where this is happening
$output = preg_replace("/,,/",",",$output);
$somecontent = make_csv(strip_non_ascii($output));
echo $somecontent;
There's a bit of extra and unrelated code here, but that's the basic process I'm using.
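For what it's worth, here is a minimal sketch of what I suspect is going wrong with firstChild. Two things I'm fairly sure of: property names in PHP are case-sensitive, so `$node->firstchild` comes back as null, and even the correctly spelled `firstChild` is often a text node (whitespace or loose text) rather than an element, so calling something like getAttribute() on it is a fatal error. The sample markup and the `first_element_child()` helper below are made up for illustration and are not what Google actually returns. The sketch assumes PHP 7.1 or later for the nullable return type.

```php
<?php
// Hypothetical sample markup standing in for one page of results.
$html = '<table>'
      . '<tr><td valign="top"> <a href="#">Acme Cafe</a> 123 Main St</td></tr>'
      . '<tr><td valign="bottom">ignored</td></tr>'
      . '</table>';

$doc = new DOMDocument();
// Collect parser warnings internally instead of silencing them with @.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

// Hypothetical helper: return the first child that is an element,
// skipping text and comment nodes, or null if there is none.
function first_element_child(DOMNode $node): ?DOMElement
{
    for ($child = $node->firstChild; $child !== null; $child = $child->nextSibling) {
        if ($child instanceof DOMElement) {
            return $child;
        }
    }
    return null;
}

$xpath = new DOMXPath($doc);
$names = [];

// Let XPath pick the cells instead of testing valign inside the loop.
foreach ($xpath->query('//td[@valign="top"]') as $cell) {
    $first = first_element_child($cell);
    if ($first !== null) {
        $names[] = trim($first->nodeValue);
    }
}

print_r($names);
```

If this is on the right track, the same helper could be dropped into the loop above in place of reading nodeValue from the whole cell, which might also make it easier to see where the doubled commas come from.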