Saturday, November 3, 2012

Choosing a Php HTML parser

Today we'll be comparing some HTML parsing libraries for php and picking a winner. The ideal candidate will support css3 selectors, be DOM based easy to use. I've rounded up 3 candidates:
  • Simple HTML Dom - A favorite with most php users from what I've seen.
  • phpQuery - An interesting project, claiming to be a jQuery port.
  • Ganon - A new challenger.

The Setup


Mitt Romney
Barack Obama

The Test

# simple html dom
require('simple_html_dom.php');
$doc = str_get_html($html);
echo $doc->find('label ~ span[id$=2]', 0)->innerText();

# phpquery
require('phpQuery.php');
$doc = phpQuery::newDocumentHTML($html);
phpQuery::selectDocument($doc);
echo pq('label ~ span[id$=2]')->text();

# ganon
require('ganon.php');
$doc = str_get_dom($html);
echo $doc('label ~ span[id$=2]', 0)->getInnerText();

The results

Simple html dom was the only one to fail the css3 test. This is clearly a deal breaker, without support for simple css like sibling selector (~), there's just no way I could justify looking any further at this library.

phpQuery passed the css3 test, its selection syntax feels the cleanest and it's Dom based, unlike the other two. I also like that it's PEAR installable.

Ganon also passed the css3 test, and it actually outperformed phpQuery for 10K iterations (7 seconds to phpQuery's 8). Definitely a strong contender.

The winner

phpQuery - Full css3 support, DOM based, PEAR installable, and a clean syntax edges out Ganon for the top spot this time.

11 comments:

  1. Study ExcelR PMP Certification where you get a great experience and better knowledge.

    PMP Certification


    Location 1:

    ExcelR - Data Science, Data Analytics Course Training in Bangalore 49, 1st Cross, 27th Main BTM Layout stage 1 Behind Tata Motors Bengaluru, Karnataka 560068 Phone: 096321 56744 Hours: Sunday - Saturday 7AM - 11PM.

    Comments

    ReplyDelete
  2. This is an awesome motivating article.I am practically satisfied with your great work.You put truly extremely supportive data. Keep it up. Continue blogging. Hoping to perusing your next post
    data scientist training in hyderabad

    ReplyDelete
  3. All of these posts were incredibly perfect. It would be great if you’ll post more updates.
    data scientist training and placement in hyderabad

    ReplyDelete
  4. Such a helpful article. Interesting to peruse this article.I might want to thank you for the endeavors you had made for composing this wonderful article.
    data science coaching in hyderabad

    ReplyDelete
  5. Hi buddies, it is a great written piece entirely defined, continuing the good work constantly.
    data science coaching in hyderabad

    ReplyDelete