Mashup Guide : Screen-scraping references

Screen-scraping references

Even though my book focuses on the use of formal APIs for mashups, I'd like to give my readers some guidance on screen-scraping and other forms of reverse engineering. There's plenty of mashup work that can be done even if you confine yourself to formal APIs. Sometimes, though, it's handy or even necessary to supplement formal APIs with other ways to get at the data, functionality, or user-interface elements that you want to recombine or mash up.

Some inter-related areas to cover (or at least to make reference to):

screen-scraping (see the Wikipedia articles on screen scraping and web scraping)
spidering (e.g., Spidering Hacks by Kevin Hemenway and Tara Calishain)
parsing of HTML with libraries such as Beautiful Soup
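
To make the HTML-parsing item concrete, here is a minimal sketch of pulling links out of a page meant for human viewing. It uses Python's standard-library html.parser as a stand-in so that it runs without extra installs; a library such as Beautiful Soup offers a more forgiving and convenient interface for the same job. The sample markup, the LinkExtractor class, and the extract_links helper are all made up for illustration.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect (href, link text) pairs from <a> tags (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self.links = []      # finished (href, text) pairs
        self._href = None    # href of the <a> currently open, if any
        self._text = []      # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside an open link.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html):
    """Return a list of (href, text) pairs found in the given HTML string."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical page fragment, standing in for HTML fetched from a real site.
SAMPLE_HTML = """
<html><body>
  <h1>Reading list</h1>
  <ul>
    <li><a href="/books/1">Spidering Hacks</a></li>
    <li><a href="/books/2">Webbots, Spiders, and <em>Screen Scrapers</em></a></li>
    <li>No link here</li>
  </ul>
</body></html>
"""

if __name__ == "__main__":
    for href, text in extract_links(SAMPLE_HTML):
        print(href, "->", text)
```

Real pages are rarely this tidy, which is a large part of why tolerant parsers like Beautiful Soup are popular for scraping.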

Some issues I want to address:

In the book, we seek to exploit as much as we can of the structured information designed for consumption by programs before we move on to interpreting output meant primarily for human viewing. Screen-scraping raises many issues, both technical and social, that we can return to once you have learned how to use APIs.
legal issues and terms of use (see the Wikipedia article on web scraping)
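
One small, mechanical piece of the social side is checking a site's robots.txt before spidering it. The sketch below uses Python's standard-library urllib.robotparser on a made-up robots.txt; the rules, the ExampleBot user agent, and the example.com URLs are assumptions for illustration only. Honoring robots.txt is a courtesy convention and does not replace reading a site's actual terms of use.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real scraper would fetch this
# from the target site instead of hard-coding it.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""


def build_policy(robots_txt):
    """Parse robots.txt text into a RobotFileParser without network access."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


policy = build_policy(ROBOTS_TXT)

if __name__ == "__main__":
    for url in (
        "http://example.com/public/page.html",
        "http://example.com/private/data.html",
    ):
        allowed = policy.can_fetch("ExampleBot", url)
        print(url, "allowed" if allowed else "disallowed")
    print("crawl delay:", policy.crawl_delay("ExampleBot"))
```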

Some references:

No Starch Press: Webbots, Spiders, and Screen Scrapers

Posted by raymond.yee on Thursday, May 24, 2007, at 6:15 am. Filed under Chapter 02, screen scraping.
