Sunday, November 4, 2012

Introducing PGBrowser

I finally got tired of complaining that PHP doesn't have a mechanize-style scraping library that handles forms and cookies, and decided to make one. There aren't many bells and whistles (yet), but I did include support for ASP.NET doPostBack actions.

To test it, I used this classic example from the ScraperWiki blog:

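require 'pgbrowser/pgbrowser.php';

$b = new PGBrowser();
$page = $b->get('http://data.fingal.ie/ViewDataSets/');

// Keep paging for as long as the "next" link is present and still has an href.
while($nextLink = $page->at('//a[@id="lnkNext"][@href]')){
  echo "I found another page!\n";
  // Submit the page's form the way the link's doPostBack call would.
  $page = $page->form()->doPostBack($nextLink->getAttribute('href'));
}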
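
For anyone wondering what a doPostBack link actually asks for: on an ASP.NET page the href looks something like javascript:__doPostBack('lnkNext',''), and the browser copies those two arguments into the hidden form fields __EVENTTARGET and __EVENTARGUMENT before submitting the form. Here is a rough sketch of just the parsing step. The parsePostBackHref function is a made-up helper for illustration, not part of PGBrowser and not necessarily how PGBrowser does it, and the sample href is an assumption about what the link contains.

```php
<?php
// Hypothetical helper (not part of PGBrowser): pull the two arguments out of
// an href such as javascript:__doPostBack('lnkNext','').
// Returns the hidden-field values an ASP.NET postback would set,
// or null when the href is not a doPostBack call.
function parsePostBackHref(string $href): ?array
{
    $pattern = "/__doPostBack\(\s*'([^']*)'\s*,\s*'([^']*)'\s*\)/";
    if (!preg_match($pattern, $href, $m)) {
        return null;
    }
    return [
        '__EVENTTARGET'   => $m[1], // the control that triggered the postback
        '__EVENTARGUMENT' => $m[2], // optional argument, often empty
    ];
}

// Assumed sample href; a real page may use a longer control name.
$fields = parsePostBackHref("javascript:__doPostBack('lnkNext','')");
```

Those two values get posted along with the rest of the form's fields (including the other hidden ones such as __VIEWSTATE), which is why the loop above goes through $page->form() rather than requesting the link directly.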

I expect this to really take the pain out of scraping forms with PHP from now on.
View the project or download the source.
