
Chapter 16: What services to cover?

I've been captivated by the potential of Amazon S3 (Amazon's "Simple Storage Service"), which is described in the following way:

    Amazon S3 provides a simple web services interface that can be used to store and retrieve any amount of data, at any time, from anywhere on the web. It gives any developer access to the same highly scalable, reliable, fast, inexpensive data storage infrastructure that Amazon uses to run its own global network of web sites.

With S3 (combined with Amazon EC2 — the Elastic Compute Cloud), I keep thinking that I have the raw ingredients for a relatively inexpensive supercomputer. Now what to do with that computing power — that's the subject of another post.

At a basic level, Chapter 16 is meant to be a tutorial on Amazon S3 and rival/parallel/comparable services. What are other services that I'd like to cover (if I manage to find enough time to write them up)? On the list of possible services to cover are:

First place to start: learn to program Amazon S3. One thing I'm reading is Amazon Web Services Developer Connection : Monster Muck Mashup – Mass Video Conversion Using AWS, which touches on not only S3 but also Amazon EC2 and the Simple Queue Service.
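
To make the S3 model concrete: objects live in buckets, and each bucket/key pair maps to a predictable URL under S3's path-style REST addressing. A minimal sketch (the bucket and key names are made up; real requests against private data would also need a signed Authorization header):

```python
def s3_object_url(bucket, key):
    # Path-style addressing: every object lives at a predictable URL.
    return f"http://s3.amazonaws.com/{bucket}/{key}"

print(s3_object_url("mashupguide", "chapters/ch16.pdf"))
```

The same naming scheme underlies PUT, GET, and DELETE against objects, which is much of what makes S3 feel so simple.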

Work to do for the second draft of Chapter 2

Chapter 2 analyzes Flickr for what makes it a mashup platform par excellence, through which you can learn how to remix a specific application and exploit the features that make it so remixable. The chapter compares and contrasts Flickr with other remixable platforms: del.icio.us, Google Maps, and amazon.com. On my plate is writing the second draft of Chapter 2. Besides correcting small-scale errors, refining the prose of the chapter, and giving it a jazzier and more accurate title, my focus is on providing more details about mashups that could actually be created from the features I write about. I write that "a goal of this chapter is to train you [readers] to deconstruct applications for their remix and mashup potential." While I do spell out in substantial detail the ways URLs are constructed and organized for Flickr, amazon.com, Google Maps, and del.icio.us, I need to describe how to generalize these ideas to other circumstances and suggest possible mashups that can be built.

Here are some other issues to work out:

URLs as little languages and the connection to REST

I spend a lot of effort in Chapter 2 on the notion of "URLs as little languages to understand and to speak." I think that it's easy for experienced programmers to take these ideas about URLs (e.g., Hacking the URL) for granted. But I want to show the importance of being able to link to specific resources. For instance, LibraryLookup depends on being able to point to a book by constructing a URL based on an ISBN. If you can't easily link to a resource, you are going to be hard-pressed to reuse it, especially if there is no formal API. (Note: Some library catalogs have odd session-dependent cookies that make it difficult to forge such a URL to a book. You can sometimes manage to create a URL that will work (temporarily) through multi-step screen-scraping — in contrast to just dropping an ISBN into a URL.)

Having a simple URL to represent a specific resource means one of the simplest mashup design patterns is possible: you can substitute some parameters and get the corresponding web page. For websites that don't have formal APIs, such URLs are the closest one comes to a programming interface. (Sometimes, even if there is an API, it is simpler to use the human user interface URL and do a bit of screen-scraping. And when the API does not cover the functionality that you care about, having access to the URL may be the only way to go.)
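
A sketch of that substitution pattern, with hypothetical templates (the site URLs below are invented for illustration, not the actual patterns documented in Chapter 2):

```python
from string import Template

# Hypothetical URL "sentences" for two book-oriented sites.
TEMPLATES = {
    "bookstore": Template("http://bookstore.example.com/dp/$isbn"),
    "library":   Template("http://catalog.example.edu/search?index=isbn&term=$isbn"),
}

def lookup_urls(isbn):
    # The simplest mashup design pattern: substitute a parameter,
    # get the corresponding page on each site.
    return {site: t.substitute(isbn=isbn) for site, t in TEMPLATES.items()}

for site, url in lookup_urls("0123456789").items():
    print(site, url)
```

Anything beyond this (session cookies, multi-step navigation) is where the screen-scraping fallback comes in.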

I have a sense that there are deep connections between RESTful architecture and the importance of little URL languages — but I can't put my finger on the specific connections. I just ordered a copy of Leonard Richardson and Sam Ruby's RESTful Web Services to help me better understand REST. Some impressions that I have about REST that I believe to be correct:

  • A fundamental idea behind REST is using URLs to represent resources.
  • If the website that you are trying to mash up is truly RESTful, then figuring out the structure of its URLs is akin to figuring out how resources are named in the application — what the "nouns" are.
  • There would be pretty strong continuity between the structure of the human-facing website and any API in a RESTful site.
  • Coherent, clean URL languages correlate with good REST design.
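
Those impressions can be illustrated with a made-up example: if a site names its resources cleanly, the human-facing URL and the corresponding API URL differ only superficially. Everything below (the site, the paths, the .xml suffix) is hypothetical:

```python
# A RESTful site names its resources ("nouns") with stable URLs, and
# the API tends to mirror the human-facing paths.
def photo_url(user, photo_id, api=False):
    path = f"/users/{user}/photos/{photo_id}"
    host = "http://photos.example.com"
    return f"{host}/api{path}.xml" if api else f"{host}{path}"

print(photo_url("alice", 42))            # human-facing page
print(photo_url("alice", 42, api=True))  # corresponding API resource
```

The continuity between the two URLs is the point: once you've decoded the human-facing URL language, much of the API's naming falls out for free.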

Identifiers as glue

I want to strengthen my description of how to use identifiers, tags, and search terms to correlate similar or identical things within and across websites and applications. Think about the use of an ISBN in LibraryLookup and of latitude and longitude in Google Maps and Flickr — how those identifiers and broadly used ways of describing things connect websites together.
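
A small sketch of the glue idea, with invented data: two services agree (approximately) on how to name a location, so rounding coordinates onto a shared grid is enough to join their items:

```python
# Invented data: items from two services joined by a shared identifier,
# here geographic coordinates rounded onto a common grid.
def grid_key(lat, lon, precision=2):
    return (round(lat, precision), round(lon, precision))

flickr_photos = [{"title": "Campanile", "lat": 37.8721, "lon": -122.2578}]
map_markers   = [{"label": "UC Berkeley", "lat": 37.8719, "lon": -122.2582}]

photo_index = {grid_key(p["lat"], p["lon"]): p for p in flickr_photos}
for marker in map_markers:
    match = photo_index.get(grid_key(marker["lat"], marker["lon"]))
    if match:
        print(marker["label"], "<->", match["title"])
```

An ISBN plays the same role for books that the grid key plays here: a name both sides already speak.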

How the mashups we studied in Chapter 1 make use of the techniques of Chapter 2

To make the three mashups we studied in Chapter 1, their creators had to understand the functioning of the constituent applications they were recombining. For instance:

  • for LibraryLookup, Udell needed to understand the use of ISBNs as identifiers among library catalogs and other book-oriented websites (such as amazon.com and other bookstores). With an ISBN in hand (and speaking the URL languages of various library catalogs), you can glue these websites together via JavaScript. (There are some challenges: it was difficult for Jon Udell to craft a totally user-friendly system for easily creating the LibraryLookup bookmarklet for your particular library.)
  • for GMiF — a Greasemonkey script, which is very much about remixing the existing user interface of an application — CK Yuan needed to understand the user interface of Flickr in order to insert the GMap icon among the other icons, and how user tagging had been exploited to hold location data (a convention that Flickr ultimately productized as machine tags). Moreover, on a prosaic level, he had to understand how to form URLs for each of the pictures.
  • housingmaps.com depends on craigslist, which has no formal API. Hence, Paul Rademacher had to parse craigslist's HTML, understand its URL structure and which cities it covers, and make use of its RSS feeds, supplementing that data with screen-scraping.
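
For the craigslist case, the feed-plus-scraping technique might be sketched like this (the feed snippet is invented; craigslist actually served RSS 1.0/RDF feeds, so the real parsing differs in detail):

```python
import xml.etree.ElementTree as ET

# Stand-in for a craigslist feed; real listings would then be
# screen-scraped for details the feed omits (price, location, etc.).
RSS = """<rss version="2.0"><channel>
<item><title>$1800 / 2br - sunny flat</title>
<link>http://sfbay.craigslist.org/apa/123.html</link></item>
</channel></rss>"""

def parse_listings(rss_text):
    root = ET.fromstring(rss_text)
    return [(item.findtext("title"), item.findtext("link"))
            for item in root.iter("item")]

for title, link in parse_listings(RSS):
    print(title, "->", link)
```

The feed gives you the skeleton cheaply; scraping the linked pages fills in what the feed leaves out.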

What you get by studying the application and not just the API

My point is that developers need to understand apps as end-users too, not just jump into the API. Learn the application first (if you are an experienced developer and a user of these types of applications, it won't take that long); it's worth the investment of time. Why not just jump into the API?

  • You're more likely to make a more useful mashup by availing yourself of knowledge as an end-user
  • You can plug the mashup into the context of how users are already using the application
  • You understand what is currently missing from the application and what can be improved
  • You see hooks into the application that are not necessarily obvious from the API alone
  • You can more easily make sense of the API when you know what the key data entities and core functionality are — you can ask how they might be reflected in the API.

Looking for signs of mashability; ties to further chapters

Chapter 2 is also a prelude to the chapters that immediately follow it, which examine elements of a website that make it more remixable. Indeed, the topics are the basis of a checklist of questions to pose in assessing the mashability/remixability/recombinatorial potential of applications:

  • Are tags used to describe resources on the website? (Described in greater detail in Chapter 3.)
  • Are RSS and other syndication feeds available? (We will deal with this issue in greater depth in Chapter 4.)
  • Do you see functionality for integrating with weblogs? (Chapter 5)
  • Is there an API for the application? (Chapters 6, 7, and 8)

In addition, you would look for the existence of browser toolbars, desktop clients, and mobile interfaces that interact with the website — they not only show that the website is remixable but often show how you can remix it. (I will have to give specific examples in the chapter, but I have some already installed in my own browser: the del.icio.us Firefox extension and Amazon S3 Firefox Organizer (S3Fox).)

Data formats, nouns, and verbs

"What is the underlying data format?" — and the related question "What are the core entities or resources in the website?" — are useful questions to pose when studying an application. If we use grammatical analogies, what are the "nouns"? When we then look at what functionality surrounds those entities, we are asking what the "verbs" are. If there is an API, it will make a lot more sense if you have a sense of what those entities are and what can be done with them.

The further development of Yahoo! Pipes

I currently have an example of how to use Yahoo! Pipes in Chapter 4 of my book. Pipes continues to advance quickly beyond the point where I last took a close look. At the start of its life, Pipes dealt only with accepting inputs and creating outputs that are RSS or Atom feeds. Now it seems to be able to handle a broader range of inputs: XML documents in general (I think) but also JSON. When I last looked, the outputs were still primarily feeds — though KML is now among them, allowing for the generation of mapping data that can be plotted directly on Google Earth and Google Maps. I'm looking forward to being able to produce a broader range of outputs as well as being able to plug in third-party filters and transforms.
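
To make the KML point concrete, here is a hand-rolled sketch of the kind of output Pipes can now emit: a feed item with a location becomes a Placemark that Google Earth or Google Maps can plot. The name and coordinates are invented; note that KML writes coordinates as longitude,latitude:

```python
# Minimal KML by string assembly, just to show the shape of the output.
def placemark(name, lat, lon):
    return (f"<Placemark><name>{name}</name>"
            f"<Point><coordinates>{lon},{lat}</coordinates></Point></Placemark>")

kml = ('<kml xmlns="http://www.opengis.net/kml/2.2"><Document>'
       + placemark("Campanile", 37.8721, -122.2578)
       + "</Document></kml>")
print(kml)
```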

For an engaging talk on the subject by the creator of Pipes, see it on Google Video.

Chapter 6 (first draft) posted

I just posted my first draft of Chapter 6 ("Learning XML Web services APIs through Flickr").

Chapter 6 is a large and complex chapter that aims to do quite a few things. (The current draft runs 34 pages.) I'm excited about Chapter 6 because, with some refinement, I think the chapter will be able to pull off these goals. By working through this chapter, I want readers to come away with a pretty solid understanding of the capabilities of the Flickr API, basic PHP programming, and a conceptual foundation for HTTP, HTML, and web services.

The overall structure is almost completely in place. I will say that I need to provide a fair amount more explanatory prose. I was striving for conceptual and technical completeness. I will balance it out with more descriptive prose in the next pass.

There is a tension over how much PHP hand-holding to provide. I still have in mind an audience like my students, who by and large are not programmers but who can be led through a nice programming example without having to learn all the grammar of PHP up front. This is the approach I take here.
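
As a taste of what the chapter builds up to (sketched here in Python rather than the chapter's PHP), a Flickr REST call is just a URL with a method name and parameters. flickr.photos.search is a real method; the API key is a placeholder and the parameters are illustrative:

```python
from urllib.parse import urlencode

# Flickr's REST convention: one endpoint, with the method and its
# arguments passed as query parameters.
def flickr_rest_url(method, **params):
    query = {"method": method, "api_key": "YOUR_API_KEY"}
    query.update(params)
    return "http://api.flickr.com/services/rest/?" + urlencode(query)

print(flickr_rest_url("flickr.photos.search", tags="berkeley", per_page=5))
```

Fetching that URL (in PHP, via file_get_contents or cURL) returns an XML response to parse; the chapter works through that round trip in detail.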

Major changes that I still want to make to get the chapter really complete:

  • I'd like to expand the section on the Flickr API Explorer. Users can learn a lot from using it, and I would like to write a lot more about how to use it. There are a lot of things to do to improve this section, which I think is a key section of this chapter and of the book: 1) explain why the Flickr API Explorer is so cool; 2) show its full capability; 3) do gets and then set-type operations; 4) make some more exercises that challenge the reader to really understand the Flickr API; 5) explain the differences in how you can call the various methods; 6) use it as a way to explore the capability of Flickr; 7) include screenshots.
  • I'd like to weave in explanations of server vs client-side programming, HTML vs XML, and CSS. By the end of the chapter, I want to have my readers go through a core of PHP programming that covers pretty extensive use of the Flickr API. No database yet. No JavaScript yet.
  • I'd like to put in a section on CSS to get that into the mix. I didn't do that because I will admit that I've not put a lot of emphasis on styling myself — but it would be useful to discuss CSS.
  • It would be great to go through and add Java/.NET/Ruby/Python code snippets to do all the same things once the entire book is ready to go with PHP…
  • I'd like to explain Flickr uploading, not only to round out the explanation of the tricky bits of the Flickr API but also to illustrate the use of POST in PHP and HTTP headers.

New chapters posted: Seeking readers!

Today I posted first drafts of five more chapters for my book. Take a look — I'd love to get some feedback on these drafts. (Send me email at raymondyee AT mashupguide DOT net.):


Chapter 3: "Tagging and Folksonomies." (2007-05-02 07:58:41)

Chapter 4: "RSS and Atom and syndication; integration with news readers" (2007-04-24 17:52:20)

Chapter 5: "Integration with Weblogs and Wikis" (2007-05-02 08:08:41)

Chapter 8: "Learning AJAX/Javascript widgets and their APIs" (2007-04-06)

Chapter 14: "Social Bookmarking and bibliographic systems" (2007-05-02 08:33:09)

Browser extension mechanisms for various browsers

I know about the Firefox add-on/extension mechanism, but what about the corresponding mechanisms in other web browsers? Here's what a series of quick web searches turned up:

Firefox

Opera

Internet Explorer

Safari

  • Pimp My Safari: about:
      It was started as a reaction to the sites cataloguing Firefox extensions. Many excellent plugins for Safari have been developed, but because Safari doesn’t have an official ‘extension architecture’, many don’t know of these extensions.

I'll see how much I'll be able to cover these various mechanisms in detail in the book.

Existing pieces on social bookmarks to draw on for Chapter 14

If I weren't sick with a cold, I'd be energetically plowing away on Chapter 14 on the topic of social bookmarks. I'm trying to get through a first pass of the chapter this afternoon, which is likely to be way too optimistic. Fortunately, I am drawing from some work that I've already assembled:

Mining the data in ProgrammableWeb for Design Patterns in Mashups

In Chapter 9, I look in detail at some individual mashups. I also want to know more about mashups in general, to do a macro-analysis of mashups. That is, I would look at the broadest range of mashups to look for design patterns that cross many examples.

One way forward would be an analysis using ProgrammableWeb, probably the single best compilation of mashups and corresponding APIs available on the public web. There are some patterns that are immediately obvious from a study of the site; I say immediately obvious because John Musser, its creator, has surfaced these elements in the interface. Let me point out some of the data about mashups:

  • You can get an overview of the mashup world (newly registered mashups, what's popular) at the Mashup Dashboard.
  • "mapping" is the most popular tag associated with mashups, followed by "photo"
  • The Web 2.0 Mashup Matrix displays mashups by their use of every combination of 2 APIs in the ProgrammableWeb database.

In addition to what is obvious in the data, I would like to pose more questions that should be answerable from what is in ProgrammableWeb.com:

  • How many APIs are used by the mashups? That is, what's the distribution — how many use 1, 2, 3, etc. APIs?
  • What's the most common pair of APIs being used? Most common threesome?
  • Is there any correlation between the popularity of an API and the popularity of mashups that use that API?
  • Are there broader correlations among usage patterns of APIs if we cluster them by categories? Are mashups likely to use more than one API in the same category or across categories?

As of this writing, there is no formal API to programmableweb.com — so answering these and allied questions requires some other form of access to the data. I'm working with John to get such access.
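
Once I have that access, the analysis itself is straightforward. Here is a sketch of how two of the questions above could be answered, using invented mashup-to-API data in place of the real ProgrammableWeb dump:

```python
from collections import Counter
from itertools import combinations

# Invented stand-in for ProgrammableWeb's mashup -> API data.
mashups = {
    "housingmaps": ["Google Maps", "craigslist"],
    "GMiF": ["Google Maps", "Flickr"],
    "LibraryLookup": ["Amazon"],
}

# Distribution: how many mashups use 1, 2, 3, ... APIs?
size_dist = Counter(len(apis) for apis in mashups.values())

# Most common pair of APIs used together.
pairs = Counter(pair for apis in mashups.values()
                for pair in combinations(sorted(apis), 2))

print(size_dist)
print(pairs.most_common(2))
```

The correlation questions (API popularity vs. mashup popularity, clustering by category) would need extra columns in the data but the same counting machinery.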

Comments from Jon Udell on my writeup on LibraryLookup

Jon Udell was kind enough to make some comments on what I've written on his LibraryLookup bookmarklet in Chapter 1 (which I post here with his permission):

    Under "How can it be extended": OCLC xISBN! That service solves a key problem with the bookmarklet version: that an ISBN does not uniquely identify a work. But it creates a new problem: a bookmarklet alone cannot perform client-side remixing (i.e., calling xISBN and then using its output to splice Amazon and the library). I've incorporated xISBN into several solutions:

  • http://weblog.infoworld.com/udell/2006/01/30.html

    Being a Greasemonkey hack, this has limited reach. I've been meaning to try to produce a universal version that'd work with IE, probably using Turnabout (http://www.reifysoft.com/turnabout.php), and you've reminded me to prioritize that.

  • http://weblog.infoworld.com/udell/2006/01/26.html

    This is actually a different kind of mashup, involving Amazon wishlists. It's very cool. But again, it has limited reach in the sense that RSS notification is geeky.

  • http://elmcity.info/services

    This version eliminates RSS in favor of email, the idea being to appeal more broadly. Except it hasn't, because the conceptual barrier — multipurpose your Amazon wishlist in order to receive notifications about availability in your local library — is formidable.

    All three of these solutions could, and perhaps should, be generalized for multiple OPACs and multiple libraries, in the way that the bookmarklet generator has generalized the bookmark hack.

    That's it! Hopefully somebody will read this and take on one or more of these challenges, in case I don't get to them.

Mashup tools to look at

In Chapters 9 and 11, I analyze service composition frameworks (tools that make it easier to create mashups) for "design patterns" among mashups. That is, if a tool offers a template, it's likely that there is a design pattern behind that template. If time allows, I'd like to study at least the following frameworks. openkapow is one such system. I want to look at the ones highlighted in John Musser's recent presentation at Web 2.0 Expo. See Open APIs Talk at Web 2.0 Expo and specifically the quote from Digg floats API, phishing mashups to come:

    "The tool space is going to explode, both for developers and nondevelopers," Musser said. Of particular note were data mashup tools such as Yahoo Pipes, RSSBus, and Grazr; scraping tools for making structured data from unstructured data, such as Kapow and Dapper; and visual development tools, including JackBe, Teqlo, Bungee, and IBM's QEDWiki.

I'm already studying Yahoo! Pipes, Kapow, Dapper, and QEDWiki but have yet to look at:

How well do these tools work? We'll see.