The goals of the project are:
- Learn how to scrape ecommerce data to Magento's configurable product import format
- Get some sexy Magento sample store data for use in future testing and mock-ups
Let's go over some of the code. First we instantiate our CSV object (yes, it's a global variable. I'm okay with that.) Then we load the listings page and iterate through each listing. Pretty self-explanatory so far.
$csv = new CSV('products.csv', $fields, ",", null); // no utf-8 BOM

// and start scraping
$url = 'http://www.spicylingerie.com/';
$page = $browser->get($url);
foreach($page->search('//div[@class="fp-pro-name"]/a') as $a){
  scrape($a);
  echo '.';
}
So now we pass the a elements that hold the details page URLs to our scrape function. Because we set $browser->convertUrls = true earlier, we no longer need to worry about converting our relative hrefs to absolute URLs. The library takes care of that for us.
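For readers without the browser library at hand, here is a rough sketch of the same two steps (find the listing links, make their hrefs absolute) using PHP's built-in DOMDocument and DOMXPath. The sample markup, the example.com base URL, and the absolutize() helper are all assumptions made up for illustration; the helper only handles hrefs that are already absolute or that start at the site root, which is far less than the library's convertUrls option presumably does.

```php
<?php
// Stand-in for the listings page; this markup is invented for illustration.
$html = '<div class="fp-pro-name"><a href="/item-1.html">Item One</a></div>'
      . '<div class="fp-pro-name"><a href="http://www.example.com/item-2.html">Item Two</a></div>';

// Hypothetical helper: resolve an href against a base URL.
// It only copes with absolute URLs and root-relative paths.
function absolutize($href, $base) {
    if (preg_match('#^https?://#i', $href)) {
        return $href; // already absolute, nothing to do
    }
    $parts = parse_url($base);
    return $parts['scheme'] . '://' . $parts['host'] . '/' . ltrim($href, '/');
}

libxml_use_internal_errors(true); // keep parser warnings about sloppy HTML quiet
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Same XPath expression as in the loop above.
$links = array();
foreach ($xpath->query('//div[@class="fp-pro-name"]/a') as $a) {
    $links[] = absolutize($a->getAttribute('href'), 'http://www.example.com/');
}

print_r($links);
```

Each entry in $links is what would be handed on to the details-page step.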
Now we get the page for the link and start building our $item array, which we will pass to the save() function. Other than the ugly expression for description, this was easy.
$url = $a->getAttribute('href');
$page = $browser->get($url);
$item = array();
$item['name'] = trim($a->nodeValue);
$item['description'] = $item['short_description'] = trim($page->at('//div[@class="pro-det-head"]/h4/text()[normalize-space()][position()=last()]')->nodeValue);
if(!preg_match('/Sale price: \$(\d+\.\d{2})/', $page->body, $m)) die('missing price!');
$item['price'] = $m[1];
if(!preg_match('/Style# : ([\w-]+)/', $page->body, $m)) die('missing sku!');
$item['sku'] = $m[1];
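To see what the description expression and the two regular expressions actually pick out, here is a small self-contained sketch that runs them against a made-up details page. The markup and its values are assumptions, not the real site's HTML, and it uses DOMXPath directly instead of the browser library. The XPath keeps only the non-blank text nodes under the h4 and then takes the last of them, which is why trailing whitespace after the description does not get in the way.

```php
<?php
// Invented markup that mimics the structure the expressions above expect.
$body = '<div class="pro-det-head"><h4>Style# : AB-123<br/>'
      . 'Sale price: $19.95<br/>'
      . 'A two-piece set with adjustable straps.<br/>   </h4></div>';

libxml_use_internal_errors(true); // keep parser warnings quiet
$doc = new DOMDocument();
$doc->loadHTML($body);
$xpath = new DOMXPath($doc);

$item = array();

// Last text node under the h4 that is not just whitespace.
$nodes = $xpath->query(
    '//div[@class="pro-det-head"]/h4/text()[normalize-space()][position()=last()]'
);
if ($nodes->length === 0) {
    throw new RuntimeException('missing description!');
}
$item['description'] = $item['short_description'] = trim($nodes->item(0)->nodeValue);

// Price: digits, a dot, then exactly two decimals after "Sale price: $".
if (!preg_match('/Sale price: \$(\d+\.\d{2})/', $body, $m)) {
    throw new RuntimeException('missing price!');
}
$item['price'] = $m[1];

// SKU: word characters and hyphens after "Style# : ".
if (!preg_match('/Style# : ([\w-]+)/', $body, $m)) {
    throw new RuntimeException('missing sku!');
}
$item['sku'] = $m[1];

print_r($item);
```

Throwing an exception instead of calling die() is only a choice for this sketch; it makes the failure easier to catch when a page is missing a field.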
Next we save the image for later import/upload, identify the categories we care about, and construct our items. The options need to look like:
$options = array(
array('size' => '12', 'color' => 'purple'),
array('size' => '10', 'color' => 'yellow')
);
The array keys are the attributes that you have set up as configurable product attributes (Global, Dropdown, Is used in Configurable Products).
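As a rough idea of how an options array like this could be turned into import rows, the sketch below expands one item into a parent row plus one simple-product row per option set. This is not the post's save() function: buildRows() is a hypothetical helper, and the column names type, configurable_attributes and child_skus are placeholders that you would need to map onto whatever columns your import tool expects.

```php
<?php
// Hypothetical helper: expand one scraped item into a configurable parent row
// followed by one simple row per option set. Column names are illustrative only.
function buildRows(array $item, array $options) {
    $rows = array();
    $childSkus = array();

    foreach ($options as $option) {
        // Child SKU is the parent SKU plus the option values, e.g. AB-123-12-purple.
        $sku = $item['sku'] . '-' . implode('-', $option);
        $childSkus[] = $sku;
        $rows[] = array_merge($item, $option, array('sku' => $sku, 'type' => 'simple'));
    }

    // The parent row records which attributes vary and which children belong to it.
    $attributes = $options ? array_keys($options[0]) : array();
    array_unshift($rows, array_merge($item, array(
        'type' => 'configurable',
        'configurable_attributes' => implode(',', $attributes),
        'child_skus' => implode(',', $childSkus),
    )));

    return $rows;
}

// Sample values, made up for illustration.
$item = array('name' => 'Sample Item', 'sku' => 'AB-123', 'price' => '19.95');
$options = array(
    array('size' => '12', 'color' => 'purple'),
    array('size' => '10', 'color' => 'yellow'),
);

$rows = buildRows($item, $options);
print_r($rows);
```

With two option sets this produces three rows: the configurable parent first, then the two simple products that carry the size and color values.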
That's all there is to it. I won't go into the save function because hopefully that one will just work for you.
Thank you for sharing. For people with some technical knowledge it will be very valuable, but for ordinary store owners who do not know where to paste a piece of code or change some lines, here is also an article that might help with importing Magento configurable products without coding: http://blog.mag-manager.com/2013/08/import-magento-configurable-products.html
Hello, thanks for the article. I am a beginner Magento programmer and don't understand where this code goes, so can you give some help with that? Also, I cannot download the source; the link leads me to an XML file that says access denied.
That link should be fixed now.
Does this still work? Given it is 3 years old and there have been updates to Magento, and I just can't get it to work, I figured I'd ask so I can figure out if I'm doing something wrong or if I need to focus on trying to get it to work.
Hi Carlos,
The code works, but you need to customize it for whatever website you're scraping.
Blank page when running the code. Error reporting is on. products.csv is created but contains only the headers. Any chance you could update the code? Thanks!