May 2007

Google Reader API?

I don't think that there is an official API for Google Reader, although Niall Kennedy documented an unofficial Google Reader API a while back.

Chapter 05
Google
Google Reader
RSS
weblogging


Barnes and Noble and ISBN-13

The online Barnes and Noble store (barnesandnoble.com) uses ISBN-13 in its links to books (e.g., RESTful Web Services), whereas Amazon.com uses ISBN-10. That's something to keep in mind to get LibraryLookup to work for Barnes and Noble.
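Since the two stores key their links on different forms of the ISBN, a bookmarklet-style tool would need to convert between them. Here is a minimal sketch of that conversion in Python; the function names are my own, and it only handles ISBN-13s with the 978 prefix (the only ones that have an ISBN-10 equivalent).

```python
def isbn10_to_isbn13(isbn10: str) -> str:
    """Convert an ISBN-10 to its 978-prefixed ISBN-13 form."""
    digits = isbn10.replace("-", "").replace(" ", "")
    if len(digits) != 10:
        raise ValueError("expected a 10-character ISBN")
    core = "978" + digits[:9]  # drop the old ISBN-10 check character
    # ISBN-13 check digit: weights alternate 1, 3 over the first 12 digits.
    total = sum(int(ch) * (1 if i % 2 == 0 else 3) for i, ch in enumerate(core))
    check = (10 - total % 10) % 10
    return core + str(check)


def isbn13_to_isbn10(isbn13: str) -> str:
    """Convert a 978-prefixed ISBN-13 back to ISBN-10."""
    digits = isbn13.replace("-", "").replace(" ", "")
    if len(digits) != 13 or not digits.startswith("978"):
        raise ValueError("only 978-prefixed ISBN-13s have an ISBN-10 form")
    core = digits[3:12]
    # ISBN-10 check character: weights 10 down to 2, modulo 11; 10 is written "X".
    total = sum(int(ch) * (10 - i) for i, ch in enumerate(core))
    check = (11 - total % 11) % 11
    return core + ("X" if check == 10 else str(check))
```

For example, `isbn10_to_isbn13("0306406152")` gives `"9780306406157"`, and `isbn13_to_isbn10` reverses it.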

Chapter 01: Learning from Specific Mashups
ISBN
LibraryLookup


Using technorati and other blog search engines

Even though I've been weblogging since March 2000, I have not worked systematically to cultivate a readership for my blogs. But now that I've left the comforts of my university staff position, I've become much more interested in developing an audience for my websites. Of course, one of the basic ways to develop a readership is to be a good reader of other people's work. We are involved in conversations on the web, after all.

One little technique to keep my blogs in the mix is to have them ping technorati whenever I write a post on my wordpress blogs. Following http://technorati.com/developers/ping/wordpress.html, I figure that I should go to WP->Options->Writing->Update services to add http://rpc.technorati.com/rpc/ping . Update Services « WordPress Codex provides a list of ping notification services in addition to technorati, including the one for Google Blog Search -- which I've started using in addition to technorati. I'd like to figure out which of the blog search engines are the best ones to use.
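For the curious, what WordPress sends to an update service is a small XML-RPC call to the conventional weblogUpdates.ping method, with the blog's name and URL as arguments. A rough sketch in Python using the standard library follows; the helper names are my own, the blog name and URL are placeholders, and I'm assuming the technorati endpoint accepts the standard two-argument ping (it may no longer be reachable).

```python
import xmlrpc.client

# Endpoint taken from the post; treat its availability as an assumption.
PING_ENDPOINT = "http://rpc.technorati.com/rpc/ping"


def build_ping_request(blog_name: str, blog_url: str) -> str:
    """Return the XML-RPC request body for a weblogUpdates.ping call."""
    return xmlrpc.client.dumps((blog_name, blog_url),
                               methodname="weblogUpdates.ping")


def send_ping(blog_name: str, blog_url: str, endpoint: str = PING_ENDPOINT):
    """Send the ping over HTTP and return whatever the service replies."""
    server = xmlrpc.client.ServerProxy(endpoint)
    return server.weblogUpdates.ping(blog_name, blog_url)
```

Calling `build_ping_request("My Blog", "http://example.com/")` shows the payload without touching the network; `send_ping` does the real call.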

Chapter 05
technorati


Screen-scraping references

Even though my book focuses on the use of formal APIs for mashups, I'd like to provide guidance on screen-scraping and other forms of reverse engineering to my readers. There's plenty of mashup work that can be done even if you confine yourself to using only formal APIs. Sometimes, it's handy or even necessary to supplement your use of formal APIs with other ways to get at the data, functionality, or user-interface elements that you want to recombine or mashup.

Some inter-related areas to cover (or at least to make reference to):

Some issues I want to address:

  • In the book, we seek to exploit as much as we can of the structured information designed for consumption by programs before we move on to interpreting output meant primarily for human viewing. Screen-scraping brings up a lot of issues, technical and social, that we can get back to once you learn how to use APIs.
  • legal issues, terms of use -- see Web scraping - Wikipedia, the free encyclopedia

Some references:

Chapter 02
screen scraping


Chapter 16: What services to cover?

I've been captivated by the potential of Amazon S3 (Amazon's "Simple Storage Service"), which is described in the following way:

    Amazon S3 provides a simple web services interface that can be used to store and retrieve any amount of data, at any time, from anywhere on the web. It gives any developer access to the same highly scalable, reliable, fast, inexpensive data storage infrastructure that Amazon uses to run its own global network of web sites.

With S3 (combined with Amazon EC2 -- the elastic computing cluster), I keep thinking that I have the raw ingredients for a relatively inexpensive supercomputer. Now what to do with that computing power -- that's the subject of another post.

At a basic level, Chapter 16 is meant to be a tutorial of Amazon S3 and rival/parallel/comparable services. What are other services that I'd like to cover (if I manage to have enough time to write up)? On the list of possible services to cover are:

First place to start: learn to program Amazon S3. One thing I'm reading is Amazon Web Services Developer Connection : Monster Muck Mashup - Mass Video Conversion Using AWS, which touches on not only S3 but also Amazon EC2 and the Simple Queue Service.

Amazon S3
Chapter 16
storage


Work to do for the second draft of Chapter 2

Chapter 2 analyzes Flickr for what makes it a mashup platform par excellence through which you can learn how to remix a specific application and exploit features that make it so remixable. The chapter compares and contrasts Flickr with other remixable platforms: del.icio.us, Google Maps, and amazon.com. On my plate is writing the second draft of Chapter 2. Besides correcting small scale errors, refining the prose of the chapter and giving it a jazzier and more accurate title, my focus is on providing more details about mashups that could actually be created from the features I write about. I write that "a goal of this chapter is to train you [readers] to deconstruct applications for their remix and mashup potential." While I do spell out in substantial detail the ways URLs are constructed and organized for Flickr, amazon.com, Google Maps, and del.icio.us, I need to describe how to generalize these ideas to other circumstances and suggest possible mashups that can be built.

Here are some other issues to work out:

URLs as little languages and the connection to REST

I spend a lot of effort in Chapter 2 on the notion of "URLs as little languages to understand and to speak." I think that it's easy for experienced programmers to take these ideas about URLs (e.g., Hacking the URL) for granted. But I want to show the importance of being able to link to specific resources. For instance, LibraryLookup depends on being able to point to a book by constructing a URL based on an ISBN. If you can't easily link to a resource, you are going to be hard-pressed to reuse it, especially if there is no formal API. (Note: Some library catalogues have odd session-dependent cookies that make it difficult to forge such a URL to the book. You can sometimes manage to create a URL that will work (temporarily) through multi-step screen-scraping -- in contrast to just dropping an ISBN into a URL.)

Having a simple URL to represent a specific resource means one of the simplest mashup design patterns is possible: you can substitute some parameters and get the corresponding web page. For websites that don't have formal APIs, such URLs are the closest one comes to a programming interface. (Sometimes, even if there is an API, it is simpler to use the human user interface URL and do a bit of screen-scraping. And sometimes even with an API that does not cover the functionality that you care about, having access to the URL is the only way to go.)
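To make the "substitute some parameters" pattern concrete, here is a small Python sketch. The URL templates are illustrative assumptions of mine: the amazon.com pattern is one I believe works, and catalog.example.org stands in for a hypothetical library catalog whose URL language you would have to work out yourself.

```python
from urllib.parse import quote

# Assumed URL "little languages": one template per site, keyed by a short name.
# The library entry is hypothetical; real catalogs each have their own pattern.
URL_TEMPLATES = {
    "amazon": "http://www.amazon.com/dp/{isbn}",
    "example_library": "http://catalog.example.org/search?index=isbn&term={isbn}",
}


def book_url(site: str, isbn: str) -> str:
    """Build a link to a book on `site` by dropping the ISBN into its template."""
    clean = isbn.replace("-", "").replace(" ", "")
    return URL_TEMPLATES[site].format(isbn=quote(clean, safe=""))
```

With this in place, `book_url("example_library", "0-596-52926-0")` yields `http://catalog.example.org/search?index=isbn&term=0596529260`, which is essentially what a LibraryLookup-style bookmarklet does in JavaScript.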

I have a sense that there are deep connections between RESTful architecture and the importance of little URL languages -- but I can't put my finger on the specific connections. I just ordered a copy of Leonard Richardson and Sam Ruby's RESTful Web Services to help me better understand REST. Some impressions that I have about REST that I believe to be correct:

  • A fundamental idea behind REST is using URLs to represent resources.
  • If the website that you are trying to mashup is truly RESTful, then figuring out the structures of URLs is akin to figuring how resources are named in the application -- what are the "nouns".
  • There would be pretty strong continuity between the structure of the human-facing website and any API in a RESTful site.
  • Coherent, clean URL languages correlate with good REST design.

Identifiers as glue

I want to strengthen my description of how to use identifiers, tags, and search terms to correlate similar or the same things within and across websites and applications. Think about the use of an ISBN in LibraryLookup and latitude and longitude in Google Maps in Flickr -- how those identifiers and broadly used ways of describing things connect websites together.

How the mashups we studied in Chapter 1 make use of the techniques of Chapter 2

To make the three mashups we studied in chapter 1, their creators had to understand the functioning of the constituent applications they were recombining. For instance:

  • for LibraryLookup, Udell needed to understand the use of ISBNs as identifiers among library catalogs and other book-oriented websites (such as amazon.com and other bookstores). Then you can use this ISBN (and speak the URL languages of various library catalogs) to glue together these various websites via JavaScript. (There are some challenges: it was difficult for Jon Udell to craft a totally user-friendly system for easily creating the LibraryLookup bookmarklet just for your library.)
  • for GMiF, a Greasemonkey script (which is very much about remixing the existing user interface of an application), CK Yuan needed to understand the user interface of Flickr in order to insert the GMap icon among the other icons, and how user tagging can be hacked to hold location data (a system that Flickr ultimately productized as machine tags). Moreover, on a prosaic level, you have to understand how to form URLs for each of the pictures.
  • housingmaps.com depends on craigslist, which has no formal API. Hence, Paul Rademacher had to parse the HTML and understand the URL structure of craigslist: what cities are covered, and how to make use of the RSS and supplement that data with screen-scraping.
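The tag hack mentioned for GMiF can be sketched in a few lines. I'm assuming the community convention of storing coordinates in tags of the form geo:lat=... and geo:lon=... (sometimes geo:long=...); the function name and sample values are mine.

```python
def location_from_tags(tags):
    """Return (lat, lon) parsed from geo:lat= / geo:lon= style tags, else None."""
    lat = lon = None
    for tag in tags:
        key, sep, value = tag.partition("=")
        if not sep:
            continue  # an ordinary tag such as "geotagged" or "berkeley"
        try:
            number = float(value)
        except ValueError:
            continue  # looks like key=value, but the value is not numeric
        key = key.lower()
        if key == "geo:lat":
            lat = number
        elif key in ("geo:lon", "geo:long"):
            lon = number
    if lat is None or lon is None:
        return None
    return (lat, lon)
```

A photo tagged `["geotagged", "geo:lat=37.8716", "geo:lon=-122.2727"]` comes back as `(37.8716, -122.2727)`, ready to hand to a map; a photo without both tags returns `None`.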

What you get by studying the application and not just the API

My point is that developers need to understand apps as end-users too and not just jump into the API. Learn the application first (if you are an experienced developer and user of these types of applications, it won't take that long). It's worth the investment of time. Why not just jump into the API?

  • You're more likely to make a more useful mashup by availing yourself of knowledge as an end-user
  • You can plug the mashup into the context of how users are already using the application
  • You understand what is currently missing from the application and can be improved
  • You see hooks into the application that are not necessarily obvious from the API alone
  • You can more easily make sense of the API when you know what key data entities are and some of the functionality -- you can ask, how might it be reflected in the APIs.

Looking for signs of mashability; ties to further chapters

Chapter 2 is also a prelude to the chapters that immediately follow it, which cover the elements of a website that make it more remixable. Indeed, those topics are the basis of a checklist of questions to pose in assessing the mashability/remixability/recombinatorial potential of applications:

  • Are tags used to describe resources on the website? (Described in greater detail in Chapter 3)
  • Are RSS and other syndication feeds available? (We will deal with this issue in greater depth in Chapter 4)
  • Do you see functionality for integrating with weblogs? (Chapter 5)
  • Is there an API for the application? (Chapters 6, 7, and 8)

In addition, you would look for the existence of browser toolbars, desktop clients, and mobile interfaces that interact with the websites -- they not only show that the website is remixable but often show how you can remix it. (I will have to give specific examples here in the chapter, but I have some already installed in my own browser: the del.icio.us Firefox extension and Amazon S3 Firefox Organizer (S3Fox).)

Data formats, nouns, and Verbs

"What is the underlying data format?" -- and a related question "What are the core entities or resources in the website" -- are useful questions to pose when studying an application. If we use grammatical analogies, what are the "nouns"? When we look then at what functionality there is around the entities, we are asking what the "verbs" are. If there is an API, it will make a lot more sense if you have a sense of what those entities and their functionality are.

Chapter 02
REST


The further development of Yahoo! Pipes

I currently have an example of how to use Yahoo! Pipes in Chapter 4 of my book. Pipes continues to advance quickly beyond the point at which I last took a close look. At the start of its life, Pipes dealt only with accepting inputs and creating outputs that are RSS or Atom feeds. Now, it seems to be able to handle a broader range of inputs: XML documents in general (I think) but also JSON. When I last looked, the output was still primarily feeds -- though KML is now one of the outputs of Pipes, allowing for the generation of mapping data that can be plotted directly on Google Earth and Google Maps. I'm looking forward to being able to produce a broader range of outputs as well as being able to plug in third-party filters and transforms.
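To give a feel for what "feed items in, KML out" amounts to, here is a rough Python sketch. This is not how Pipes itself works; it just illustrates the transformation, assuming each item is a dict with title, lat, and lon keys (my own naming) and using the KML 2.2 namespace that was standardized after this post was written.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"  # assumed namespace (later OGC standard)


def items_to_kml(items) -> str:
    """Turn dicts with 'title', 'lat', 'lon' into a KML document string."""
    ET.register_namespace("", KML_NS)
    kml = ET.Element(f"{{{KML_NS}}}kml")
    document = ET.SubElement(kml, f"{{{KML_NS}}}Document")
    for item in items:
        placemark = ET.SubElement(document, f"{{{KML_NS}}}Placemark")
        ET.SubElement(placemark, f"{{{KML_NS}}}name").text = item["title"]
        point = ET.SubElement(placemark, f"{{{KML_NS}}}Point")
        # KML orders coordinates as longitude,latitude (not lat,lon).
        coords = f"{item['lon']},{item['lat']}"
        ET.SubElement(point, f"{{{KML_NS}}}coordinates").text = coords
    return ET.tostring(kml, encoding="unicode")
```

The one detail worth remembering is the coordinate order: KML wants longitude first, which is the reverse of how most people write a location.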

For an engaging talk on the subject by the creator of Pipes, see it on Google Video.

Chapter 04
KML
XML
Yahoo
Yahoo! Pipes


Chapter 6 (first draft) posted

I just posted my first draft of Chapter 6 ("Learning XML Web services APIs through Flickr").

Chapter 6 is a large and complex chapter that aims to do quite a few things. (The current draft runs 34 pages.) I'm excited about chapter 6 because with some refinement, I think the chapter will be able to pull off these goals. By working through this chapter, I want users to have a pretty solid understanding of the capabilities of the Flickr API and basic PHP programming and a conceptual foundation for HTTP and HTML and web services.

The overall structure is almost completely in place. I will say that I need to provide a fair amount more explanatory prose. I was striving for conceptual and technical completeness. I will balance it out with more descriptive prose in the next pass.

There is a tension over how much PHP hand-holding I want to provide. I still have in mind an audience like my students, who by and large are not programmers but who can be led into a nice programming example without having to learn all the grammar of PHP up front. This is the approach I take here.

Major changes that I still want to make so that the chapter is really complete:

  • I'd like to expand the section on the Flickr API Explorer. Users can learn a lot from using it. I would like to write a lot more about how to use it. There are a lot of things to do to improve this section, which I think is a key section of this chapter and of the book: 1) explain why the Flickr API Explorer is so cool 2) show its full capability 3) do gets and then set-type operations 4) make some more exercises that challenge the reader to really understand the API of Flickr 5) explain the differences in how you can call the various methods 6) use this as a way to explore the capability of Flickr 7) include screen-shots.
  • I'd like to weave in explanations of server vs client-side programming, HTML vs XML, and CSS. By the end of the chapter, I want to have my readers go through a core of PHP programming that covers pretty extensive use of the Flickr API. No database yet. No JavaScript yet.
  • I'd like to put in a section on CSS to get that into the mix. I didn't do that because I will admit that I've not put a lot of emphasis on styling myself -- but it would be useful to discuss CSS.
  • It would be great to be able to come through and add Java/.Net/Ruby/Python code snippets to do all the same things once the entire book is ready to go with PHP....
  • I'd like to explain Flickr uploading to not only complete out an explanation of the tricky bits of the Flickr API but also to illustrate the use of POST in PHP and HTTP headers.
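As a taste of what the Flickr API Explorer generates behind the scenes, a REST-style Flickr call is just a URL with a method name, an API key, and the method's arguments as query parameters. The chapter itself uses PHP; the sketch below is in Python only for illustration. The helper name is mine, YOUR_API_KEY is a placeholder, and I'm assuming the current https endpoint.

```python
from urllib.parse import urlencode

# Assumed REST endpoint for the Flickr API.
FLICKR_REST_ENDPOINT = "https://api.flickr.com/services/rest/"


def flickr_rest_url(method: str, api_key: str, **params) -> str:
    """Build the URL for an unsigned (read-only) Flickr REST call."""
    query = {"method": method, "api_key": api_key}
    query.update(params)
    # Sort the parameters so the same call always yields the same URL.
    return FLICKR_REST_ENDPOINT + "?" + urlencode(sorted(query.items()))
```

For instance, `flickr_rest_url("flickr.photos.search", "YOUR_API_KEY", tags="puppy", per_page=3)` produces a URL you can paste into a browser to see the XML response. Calls that change data (the "set-type" operations) additionally require authentication and signing, which this sketch leaves out.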

Chapter 06


New chapters posted: Seeking readers!

Today I posted first drafts of 5 more chapters for my book. Take a look -- I'd love to get some feedback on these drafts. (Send me email at raymondyee AT mashupguide DOT net.):

 

Chapter 3: "Tagging and Folksonomies." (2007-05-02 07:58:41)

Chapter 4: "RSS and Atom and syndication; integration with news readers" (2007-04-24 17:52:20)

Chapter 5: "Integration with Weblogs and Wikis" (2007-05-02 08:08:41)

Chapter 8: "Learning AJAX/Javascript widgets and their APIs" (2007-04-06)

Chapter 14: "Social Bookmarking and bibliographic systems" (2007-05-02 08:33:09)

Chapter 03
Chapter 04
Chapter 05
Chapter 08
Chapter 14
