Friday, November 2, 2012

Do it the right way with css

Looking around at other people's scraping projects, I still see a lot of people doing it the wrong way. And I'm not talking about parsing html with regex this time. I'm talking about using xpath expressions instead of css selectors. Let's take a look:
Barack Obama

There are two ways for me to get at the data I want:
# using xpath
doc.at('//label[@for="name"]/following-sibling::span').text
# using css
doc.at('label[for=name] + span').text

So which one is better? Unless you're a machine, the answer is always css. Because css is a human-friendly way to select the data you want, your code will be easier to maintain than the hot mess created with xpath expressions. There's a good reason why web designers have been using it for so long.
