I stumbled across Dapper today, a pretty phenomenal UI & utility that can create sophisticated screen scrapes from any website. I have two reactions:
1) WOW! These guys did a great job of making it easy to scrape! Fantastic interface, boiled down to very simple tasks!
2) Jesus Christ! Who in their right mind would try to build anything off an unknown screen scraper?
The latter is really where I finished thinking about it. It seems that in our Web-2.0 world, people think that hacks can somehow be sustained. Hacks can’t. Hacks are hacks. And fundamentally, screen scraping is a hack. One small div-hierarchy change and the whole thing breaks. One slight UI one-off, and the data provided is bogus. I usually try to avoid being a purist, but screen scraping is just one approach I can’t support.
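To make the div-hierarchy point concrete, here is a minimal sketch of a scraper that finds a value by its exact tag path. It is not how Dapper works internally (I have no idea what it does under the hood); the `PathScraper` class, the `scrape` helper, and both sample pages are made up purely for illustration. Only Python's standard `html.parser` module is used.

```python
from html.parser import HTMLParser


class PathScraper(HTMLParser):
    """Hypothetical scraper: collects text found at one exact tag path."""

    def __init__(self, path):
        super().__init__()
        self.path = list(path)   # e.g. ["html", "body", "div", "span"]
        self.stack = []          # tags currently open while parsing
        self.matches = []        # text found at exactly self.path

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop only on a matching close; good enough for the well-formed samples below.
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()

    def handle_data(self, data):
        if self.stack == self.path and data.strip():
            self.matches.append(data.strip())


def scrape(html, path):
    parser = PathScraper(path)
    parser.feed(html)
    parser.close()
    return parser.matches


# Assumed layout the scraper was "trained" on.
PRICE_PATH = ["html", "body", "div", "span"]

# Made-up page before a redesign...
before = "<html><body><div><span>$19.99</span></div></body></html>"
# ...and after someone wraps the content in one extra div.
after = "<html><body><div><div><span>$19.99</span></div></div></body></html>"

print(scrape(before, PRICE_PATH))  # ['$19.99']
print(scrape(after, PRICE_PATH))   # []
```

The price never changed and the page still looks the same to a human, but the one extra wrapper div means the scraper silently comes back empty.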
Conclusion: Dapper represents a lot of very nice work behind what is ultimately a futile effort.
One thought on “Screen Scraping Made Easy”
Screen scraping is futile? Come on, then what about all the thousands of companies that use it, including big ones like Oracle, Time Warner, and Seagate? Not to mention big universities like Stanford, Columbia, and Berkeley?
Want to get all the data from a few public online databases? Get product information from a shopping site? Compare insurance or loans from different sources? Archive web pages? Automate complex web tasks involving multiple forms? So on and so forth? Screen scraping.
Sure, it’s not very elegant in most cases. Web services would often be better, but the fact is that many times screen scraping is the best solution for the problem. Futile? Come on, now that’s just foolish.
Some educational reading: