<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Mashup Guide &#187; screen scraping</title>
	<atom:link href="http://blog.mashupguide.net/category/screen-scraping/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.mashupguide.net</link>
	<description>A blog about Raymond Yee&#039;s book Pro Web 2.0 Mashups: Remixing Data and Web Services</description>
	<lastBuildDate>Wed, 23 Feb 2011 13:35:50 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
<image>
  <link>http://blog.mashupguide.net</link>
  <url>http://blog.mashupguide.net/favicon.ico</url>
  <title>Mashup Guide</title>
</image>
		<item>
		<title>Decay in the Amazon APIs</title>
		<link>http://blog.mashupguide.net/2011/02/23/decay-in-the-amazon-apis/</link>
		<comments>http://blog.mashupguide.net/2011/02/23/decay-in-the-amazon-apis/#comments</comments>
		<pubDate>Wed, 23 Feb 2011 13:35:50 +0000</pubDate>
		<dc:creator>raymondyee</dc:creator>
				<category><![CDATA[Amazon]]></category>
		<category><![CDATA[Amazon EC2]]></category>
		<category><![CDATA[Chapter 17]]></category>
		<category><![CDATA[screen scraping]]></category>

		<guid isPermaLink="false">http://blog.mashupguide.net/?p=315</guid>
		<description><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=Decay+in+the+Amazon+APIs&amp;rft.aulast=Yee&amp;rft.aufirst=Raymond&amp;rft.subject=Amazon&amp;rft.subject=Amazon+EC2&amp;rft.subject=Chapter+17&amp;rft.subject=screen+scraping&amp;rft.source=Mashup+Guide&amp;rft.date=2011-02-23&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://blog.mashupguide.net/2011/02/23/decay-in-the-amazon-apis/&amp;rft.language=English"></span>
In Chapter 17 of Pro Web 2.0 Mashups, I created a mashup of an Amazon Wishlist and Google Spreadsheets. When I returned to examine my code last night, I learned that it no longer worked.  Why?  First, the Amazon Ecommerce API morphed into the Amazon Product Advertising API; I was puzzled why the API wasn't [...]]]></description>
			<content:encoded><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=Decay+in+the+Amazon+APIs&amp;rft.aulast=Yee&amp;rft.aufirst=Raymond&amp;rft.subject=Amazon&amp;rft.subject=Amazon+EC2&amp;rft.subject=Chapter+17&amp;rft.subject=screen+scraping&amp;rft.source=Mashup+Guide&amp;rft.date=2011-02-23&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://blog.mashupguide.net/2011/02/23/decay-in-the-amazon-apis/&amp;rft.language=English"></span>
<p>In <a href="http://mashupguide.net/1.0/html/ch17.xhtml">Chapter 17</a> of <em>Pro Web 2.0 Mashups</em>, I created a <a href="http://mashupguide.net/1.0/html/ch17s08.xhtml#d0e29460">mashup of an Amazon Wishlist and Google Spreadsheets</a>. When I returned to examine my code last night, I learned that it no longer worked.  Why?  First, the <a href="http://www.programmableweb.com/api/amazon-ecommerce">Amazon Ecommerce API</a> morphed into the <a href="https://affiliate-program.amazon.com/gp/advertising/api/detail/main.html">Amazon Product Advertising API</a>; I was puzzled why the <a href="http://aws.amazon.com/">API wasn't listed where I expected it to be</a>.  Unfortunately, Amazon, in its infinite and inscrutable wisdom, also decided to kill the <code>ListLookup</code> operation, the one call that I depended on to retrieve the contents of <a href="http://www.amazon.com/gp/registry/wishlist/1U5EXVPVS3WP5">my Amazon wishlist</a>.  (I'm not alone in having <a href="https://forums.aws.amazon.com/thread.jspa?threadID=53342">broken applications</a> because of this change.)</p>
<p>So what to do now?  Interestingly enough, someone has just announced a <a href="https://forums.aws.amazon.com/thread.jspa?threadID=54338&amp;tstart=0">JSON feed service for any given wishlist</a>; see, for example, <a href="http://gifterate.appspot.com/wishlist/BUWBWH9K2H77">Jeff Bezos' wishlist</a> and <a href="http://gifterate.appspot.com/wishlist/1U5EXVPVS3WP5">mine</a> (in JSON).  I hope it stays around.  How does it work, given the demise of the <code>ListLookup</code> operation?  My guess is that some sort of screen-scraping is going on.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.mashupguide.net/2011/02/23/decay-in-the-amazon-apis/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Screen-scraping references</title>
		<link>http://blog.mashupguide.net/2007/05/24/screen-scraping-references/</link>
		<comments>http://blog.mashupguide.net/2007/05/24/screen-scraping-references/#comments</comments>
		<pubDate>Thu, 24 May 2007 14:15:22 +0000</pubDate>
		<dc:creator>raymond.yee</dc:creator>
				<category><![CDATA[Chapter 02]]></category>
		<category><![CDATA[screen scraping]]></category>

		<guid isPermaLink="false">http://blog.mashupguide.net/2007/05/24/screen-scraping-references/</guid>
		<description><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=Screen-scraping+references&amp;rft.aulast=Yee&amp;rft.aufirst=Raymond&amp;rft.subject=Chapter+02&amp;rft.subject=screen+scraping&amp;rft.source=Mashup+Guide&amp;rft.date=2007-05-24&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://blog.mashupguide.net/2007/05/24/screen-scraping-references/&amp;rft.language=English"></span>
Even though my book focuses on the use of formal APIs for mashups, I'd like to provide guidance on screen-scraping and other forms of reverse engineering to my readers. There's plenty of mashup work that can be done even if you confine yourself to using only formal APIs. Sometimes, it's handy or even necessary to [...]]]></description>
			<content:encoded><![CDATA[	
	<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&amp;rfr_id=info%3Asid%2Focoins.info%3Agenerator&amp;rft.title=Screen-scraping+references&amp;rft.aulast=Yee&amp;rft.aufirst=Raymond&amp;rft.subject=Chapter+02&amp;rft.subject=screen+scraping&amp;rft.source=Mashup+Guide&amp;rft.date=2007-05-24&amp;rft.type=blogPost&amp;rft.format=text&amp;rft.identifier=http://blog.mashupguide.net/2007/05/24/screen-scraping-references/&amp;rft.language=English"></span>
<p>Even though my book focuses on the use of formal APIs for mashups, I'd like to provide guidance on screen-scraping and other forms of reverse engineering to my readers. There's plenty of mashup work that can be done even if you confine yourself to using only formal APIs. Sometimes, however, it's handy or even necessary to supplement your use of formal APIs with other ways to get at the data, functionality, or user-interface elements that you want to recombine or mash up.</p>
<p>Some interrelated areas to cover (or at least to refer to):</p>
<ul>
<li>screen-scraping (<a href="http://en.wikipedia.org/wiki/Screen_scraping" class="external">Screen scraping - Wikipedia, the free encyclopedia</a> and <a href="http://en.wikipedia.org/wiki/Web_scraping" class="external">Web scraping - Wikipedia, the free encyclopedia</a>)</li>
<li>spidering (e.g., <a href="http://www.amazon.com/Spidering-Hacks-Kevin-Hemenway/dp/0596005776" class="external">Amazon.com: Spidering Hacks: Books: Kevin Hemenway, Tara Calishain</a>)</li>
<li>parsing of HTML with libraries such as <a href="http://www.crummy.com/software/BeautifulSoup/" class="external">Beautiful Soup</a></li>
</ul>
<p>Some issues I want to address:</p>
<ul>
<li>In the book, we seek to exploit as much of the structured information designed for consumption by programs as we can before moving on to interpreting output meant primarily for human viewing. Screen-scraping raises many issues, technical and social, that we can return to once you have learned how to use APIs.</li>
<li>legal issues and terms of use (see <a href="http://en.wikipedia.org/wiki/Web_scraping#Legal_issues" class="external">Web scraping - Wikipedia, the free encyclopedia</a>)</li>
</ul>
<p>Some references:</p>
<ul>
<li><a href="http://www.nostarch.com/webbots.htm" class="external">No Starch Press: Webbots, Spiders, and Screen Scrapers</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://blog.mashupguide.net/2007/05/24/screen-scraping-references/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

