I recently discovered that IKEA is really scraping-friendly: their categories and the products belonging to each category are all linked very cleanly, and the product pages themselves include, as JSON, the complete product data needed to display a product somewhere else (say, on an iPad). They also link the assembly instructions as PDF, and more fun stuff. Just open any product page and browse the source; you will discover a field named "jProductData", which is just what you want.

So I decided it was time to try Nokogiri, a beautiful and fast library for parsing HTML and searching through it. I managed to write a working scraper in only a few lines. Since the JSON is embedded in regular JavaScript, I had to use RKelly to parse the JavaScript part and extract the right data.
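
To give an idea of the extraction step, here is a minimal sketch. It is not the scraper's actual code: instead of nokogiri and rkelly it uses only Ruby's standard library, scanning for the assignment to "jProductData" and matching braces to cut out the object literal. The sample markup, the `var jProductData = {...};` form and the helper name `extract_json_assignment` are all assumptions made for illustration, and the approach only works if the embedded literal is strict JSON.

```ruby
require 'json'

# Made-up page fragment for illustration; the real markup may look different.
SAMPLE_HTML = <<~HTML
  <html><head>
  <script type="text/javascript">
    var jProductData = {"product": {"name": "EXAMPLE", "items": [{"id": "123", "note": "braces } { inside a string"}]}};
  </script>
  </head><body></body></html>
HTML

# Hypothetical helper: finds `name = { ... }` in the source and returns the
# parsed object, or nil if no such assignment exists.
def extract_json_assignment(source, name)
  start = source.index(/\b#{Regexp.escape(name)}\s*=\s*\{/)
  return nil unless start

  open_at = source.index('{', start)
  depth = 0
  in_string = false
  escaped = false

  # Walk forward from the opening brace, tracking nesting depth and
  # ignoring braces that appear inside double-quoted strings.
  source[open_at..-1].each_char.with_index do |ch, i|
    if in_string
      if escaped
        escaped = false
      elsif ch == '\\'
        escaped = true
      elsif ch == '"'
        in_string = false
      end
    elsif ch == '"'
      in_string = true
    elsif ch == '{'
      depth += 1
    elsif ch == '}'
      depth -= 1
      return JSON.parse(source[open_at, i + 1]) if depth.zero?
    end
  end
  nil
end

data = extract_json_assignment(SAMPLE_HTML, 'jProductData')
puts data['product']['name']
```

A real JavaScript parser like rkelly is the more robust choice, since it also copes with literals that are valid JavaScript but not valid JSON (unquoted keys, single quotes, comments).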

For the data model, I used Ohm with Redis, which takes next to no time to set up and works as advertised.

If you are curious, the source is available on GitHub.