Posts


  • Fun with nokogiri: Screen-scraping ikea.com

    I recently discovered that IKEA is really scraping-friendly: the categories and the products belonging to each category are linked very cleanly, and the product pages themselves include, as JSON, the complete product data necessary to display a product somewhere else (say, on an iPad). They also link the assembly instructions as PDF, along with more fun stuff. Just open any product page and browse the source; you will discover a field named "jProductData", which is just what you want.

    So I decided it was time to try nokogiri, a beautiful and fast library for parsing HTML documents and searching through them. I managed to write a working scraper in just a few lines. Since the JSON is embedded in regular JavaScript, I had to use rkelly to parse the JavaScript part and extract the right data.
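    The extraction step can be sketched with nothing but Ruby's standard library. This is not the scraper from the post: it swaps the nokogiri and rkelly combination for a plain regular expression, and the page fragment, the product fields and the helper name `extract_product_data` are all made up for illustration. It also assumes the value assigned to "jProductData" is strict JSON that ends with `};` at the end of a line, which may not hold for the real pages.

```ruby
require "json"

# Hypothetical page fragment; the real markup on ikea.com may look different.
html = <<~HTML
  <html>
    <script type="text/javascript">
      var jProductData = {"product": {"name": "Example shelf", "items": [{"partNumber": "000"}]}};
    </script>
  </html>
HTML

# Hypothetical helper: finds the object literal assigned to jProductData and
# parses it as JSON. A regular expression stands in for a real JavaScript
# parser here, so this only works while the literal is valid JSON.
def extract_product_data(html)
  match = html.match(/var\s+jProductData\s*=\s*(\{.*?\});\s*$/m)
  return nil unless match

  JSON.parse(match[1])
end

data = extract_product_data(html)
puts data["product"]["name"]          # prints "Example shelf"
puts data["product"]["items"].length  # prints 1
```

    A real JavaScript parser is the more robust choice as soon as the embedded data uses unquoted keys, comments or anything else that plain JSON does not allow.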

    For the data model, I used Ohm with Redis, which takes next to no time to set up and works as advertised.

    If you are curious, the source is available on GitHub.


  • Fun with CFHTTPMessage, Headers and the HTTP Standard

    TL;DR: CFHTTPMessage combines duplicate headers into a single string value containing all the values, comma-separated. Thank you.

    I am currently developing a proxy app that visualizes requests and lets you filter and modify them in a plug-and-play fashion, which makes it quite a powerful tool for debugging web applications, mobile apps and so on.

    Of course, this tool is built using Cocoa and Core Foundation. Interestingly enough, Core Foundation provides a type called "CFHTTPMessage", which handles a lot of low-level message parsing and processing, and is really handy and quite easy to use.

    There is one drawback: CFHTTPMessage is not designed to preserve either the order in which the headers arrived in a message or duplicate header fields. The first point is somewhat irrelevant, as the HTTP/1.1 standard says that the order of header fields with different names is not significant, so it's really only a minor issue. The latter point is complicated. HTTP relies on duplicate headers a lot; consider this HTTP request (real life!):


    POST /wp-admin/admin-ajax.php HTTP/1.1
    X-Requested-With: XMLHttpRequest
    Accept-Charset: ISO-8859-1
    Accept-Charset: utf-8;q=0.7
    Accept-Charset: *;q=0.3
    Accept-Encoding: gzip,deflate,sdch
    Content-Type: application/x-www-form-urlencoded
    Origin: http://momo.brauchtman.net
    User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_0) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/13.0.782.112 Safari/535.1
    Cookie: word
    Cookie: wordpress_xxxxxxxx
    Cookie: wordpress_logged_in_xxxxxx; httponly; expires=Tue
    Referer: /wp-admin/post-new.php
    Host: momo.brauchtman.net
    Accept-Language: de-DE
    Accept-Language: de;q=0.8
    Accept-Language: en-US;q=0.6
    Accept-Language: en;q=0.4
    Accept: */*
    Content-Length: 1160

    There are quite a few duplicates here. So did the clever guys in Cupertino just forget to handle duplicate headers at all? No, not really. They just didn't document what they do with them, which isn't that clever at all, but fortunately the sources are available. After a bit of digging, I found out that a duplicate is simply appended to the earlier header with the same key, which sucks: the delimiter used is a simple comma, which is quite regularly found in header values, which makes splitting an art in itself.
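    To illustrate why splitting is an art, here is a small Ruby sketch that imitates the merging behaviour described above and then tries to undo it. It is not CFHTTPMessage code: the helper names `combine_headers` and `naive_split` are hypothetical, and the exact delimiter (a comma followed by a space) is an assumption. The header values are taken from the request shown above.

```ruby
# Hypothetical helper imitating the behaviour described above: a repeated
# header field is appended to the earlier value with the same name.
# The ", " delimiter is an assumption, not taken from the CFNetwork sources.
def combine_headers(pairs, delimiter = ", ")
  pairs.each_with_object({}) do |(name, value), merged|
    merged[name] = merged.key?(name) ? merged[name] + delimiter + value : value
  end
end

# Naive attempt to recover the original header lines by splitting on commas.
def naive_split(value)
  value.split(",").map(&:strip)
end

pairs = [
  ["Accept-Encoding", "gzip,deflate,sdch"], # one header line, two commas
  ["Accept-Language", "de-DE"],             # four separate header lines
  ["Accept-Language", "de;q=0.8"],
  ["Accept-Language", "en-US;q=0.6"],
  ["Accept-Language", "en;q=0.4"]
]

merged = combine_headers(pairs)

puts merged["Accept-Language"]                     # de-DE, de;q=0.8, en-US;q=0.6, en;q=0.4
puts naive_split(merged["Accept-Language"]).length # => 4, matches the four header lines
puts naive_split(merged["Accept-Encoding"]).length # => 3, although only one line was sent
```

    After merging, a value that came from several header lines looks the same as a single header whose value already contained commas, so the original number of lines cannot be recovered from the combined string alone.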


  • Talking about code debt..

    A mess is not a technical debt. A mess is just a mess.

    You should really read this article clarifying that, well, a mess has nothing to do with code debt.


  • Apple's Network Link Conditioner only built for certain CPUs..

    Xcode 4.1 comes with a nice tool called Network Link Conditioner, which is basically a GUI for configuring certain settings of your network to make it behave like a "Lossy Edge Connection" or something similar. Why would anyone want to do this? In a nutshell, a bunch of bugs in iOS apps are only visible in such environments and hard to catch in office settings with perfect connections. So, to work around that perfect connection, this tool was invented.

    Apple-like, it looks great and ... doesn't work (which is not that Apple-like at all). At least on my brand-new Early 2011 MBP, it just crashes. So I got curious and asked some friends to double-check it, and bingo, they had similar issues on comparable i7-equipped machines.

    Back at home, I fired up the tool on my ancient 2009 iMac, and voilà, it works just fine. So.. what the hell is going on there?

    [update] I tested the Network Link Conditioner on my new i5 iMac, not a problem there.

    [update] It seems Nick Pannuto found a solution for the problem, which you can find filed in Apple's Radar under the ID 11013262. Thanks!


  • Finally, a fine Go Write-Up/Comparison

    That is worth linking to just for this highlight:

    Look at your average web application's lib directory. I wouldn't be surprised to see a hundred JAR files there, all just for a simple search database or shopping site which even PHP would do in 10k LOC. And should you be adventurous, try to build it yourself. A world of fun! Setting up a Linux system from scratch, without any step-by-step instructions, is easier. Trust me, I have done both. Be sure you know how to spell "dependency hell" forwards and backwards before you begin.

    Here you go: Why all C-like languages except one suck.. ( CC, btw ).

