Sunday, November 4, 2012

Introducing PGBrowser

I finally got tired of complaining that PHP doesn't have a mechanize-style scraping library that handles forms and cookies, and decided to make one. There aren't many bells and whistles (yet), but I did include support for ASP.NET doPostBack actions.

To test it, I used this classic example from the ScraperWiki blog.

require 'pgbrowser/pgbrowser.php';

$b = new PGBrowser();
$page = $b->get('http://data.fingal.ie/ViewDataSets/');

while($nextLink = $page->at('//a[@id="lnkNext"][@href]')){
  echo "I found another page!\n";
  $page = $page->form()->doPostBack($nextLink->getAttribute('href'));
}
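For readers unfamiliar with ASP.NET postbacks: a "next" link on such a page usually has an href like javascript:__doPostBack('target','argument'), and clicking it copies those two values into the hidden __EVENTTARGET and __EVENTARGUMENT fields before submitting the form. The snippet below is only a rough sketch of that parsing step, not PGBrowser's actual code; the function name parsePostBackHref and the sample href are made up for illustration.

```php
<?php
// Hypothetical helper (not part of PGBrowser): extract the two arguments
// from an href such as javascript:__doPostBack('lnkNext','').
function parsePostBackHref(string $href): ?array
{
    // Capture the two single-quoted arguments of __doPostBack(...).
    $pattern = "/__doPostBack\(\s*'([^']*)'\s*,\s*'([^']*)'\s*\)/";
    if (!preg_match($pattern, $href, $m)) {
        return null; // not a postback link
    }
    // These are the hidden form fields an ASP.NET page expects on submit.
    return [
        '__EVENTTARGET'   => $m[1],
        '__EVENTARGUMENT' => $m[2],
    ];
}

// Sample href, invented for this example.
$fields = parsePostBackHref("javascript:__doPostBack('lnkNext','')");
var_dump($fields);
```

A scraper would then post these two values back together with the form's other hidden fields (such as __VIEWSTATE) to get the next page.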

I expect this to really take the pain out of scraping forms with PHP from now on.
View the project or download the source.

2 comments:

  1. Hi, I don't know if you are still around after three years, but anyhow it's worth a try.
    I just want to get the handicap (hcp) result from this page:

    $data = array(
    'name' => 'NAME',
    'surname1' => 'SURNAME1',
    'surname2' => 'SURNAME2'
    );

    $url ="www.rfegolf.es/PaginasServicios/ServicioHandicap.aspx?HNom=".$data['name']."&HAp1=".$data['surname1']."&HAp2=".$data['surname2'];


    I used your example and library, which is quite handy:

    $b = new PGBrowser();
    $page = $b->get($url);

    while($nextLink = $page->at('//a[contains((@id,"LinkButton1")]')){
    echo "I found another page!\n";
    $page = $page->form()->doPostBack($nextLink->getAttribute('href'));
    }


    but it does not work.

    It seems in:

    public function search($query, $dom = null){
    if($this->is_xpath($query))
    return $dom ? $this->xpath->query($query, $dom) : $this->xpath->query($query);

    it can find $query in $this->is_xpath, but it cannot in $this->xpath->query.

    Any help is appreciated. Thanks a lot.
