Even though my book focuses on the use of formal APIs for mashups, I'd like to give my readers some guidance on screen-scraping and other forms of reverse engineering. There's plenty of mashup work that can be done even if you confine yourself to formal APIs. Sometimes, however, it's handy or even necessary to supplement formal APIs with other ways of getting at the data, functionality, or user-interface elements that you want to recombine or mash up.
Some inter-related areas to cover (or at least to make reference to):
- screen-scraping (see the Wikipedia articles on screen scraping and web scraping)
- spidering (e.g., Spidering Hacks by Kevin Hemenway and Tara Calishain)
- parsing of HTML with libraries such as Beautiful Soup.
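To give a flavor of what parsing HTML for scraping or spidering involves, here is a minimal sketch in Python. It uses the standard library's `html.parser` module as a stand-in for Beautiful Soup so that it runs without any extra installs; a library like Beautiful Soup would typically make the same job shorter and more forgiving of messy markup. The names `LinkExtractor`, `extract_links`, and `SAMPLE_HTML`, along with the sample markup itself, are made up for illustration.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect (href, link text) pairs from <a> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []      # finished (href, text) pairs
        self._href = None    # href of the <a> currently open, if any
        self._text = []      # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # attrs is a list of (name, value) tuples
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside an <a> that has an href
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html):
    """Return a list of (href, text) pairs found in the given HTML string."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


# Made-up markup standing in for a page meant for human viewing
SAMPLE_HTML = """
<html><body>
  <h1>Example listing</h1>
  <ul>
    <li><a href="/items/1">First <b>item</b></a></li>
    <li><a href="/items/2">Second item</a></li>
    <li><a name="no-href">Not a link</a></li>
  </ul>
</body></html>
"""

if __name__ == "__main__":
    for href, text in extract_links(SAMPLE_HTML):
        print(href, "->", text)
```

A spider would repeat this step on each fetched page, following the extracted links, which is also where the fragility of scraping shows up: the code depends on markup that the site owner is free to change at any time.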
Some issues I want to address:
- In the book, we seek to exploit as much as we can of the structured information designed for consumption by programs before we move on to interpreting output meant primarily for human viewing. Screen-scraping brings up a lot of issues, both technical and social, that we can return to once you have learned how to use APIs.
- legal issues and terms of use (see the Wikipedia article on web scraping)
Some references: