
At work on Chapter 9

I'm working today on the first draft of Chapter 9, "Dissecting mashups and remixes". Ideally, I'd like to create the equivalent of the Gang of Four's Design Patterns for mashups. Such a project is a long-term effort. For this chapter, I plan to identify several emerging patterns from an analysis of a handful of specific mashups. I will also outline how we can look for patterns latent in the ProgrammableWeb database of mashups.

Where can we look for analyses of mashup-related patterns?

Which Creative Commons license to choose?

My publisher and I have agreed to release my book under the By-NC-SA 2.5 Creative Commons license. Should we go even further in openness and license under a By-SA license and remove the commercial reuse restrictions? David Wiley's post Why Universities Choose NC, and What You Can Do at iterating toward openness prompted my own comment:

    I’m very interested in this issue. I’m currently working on a book to be published by Apress on mashups (http://blog.mashupguide.net/about). The deal I have with my publisher is to publish the book under a By-NC-SA-2.5 license. As David knows, I was debating with myself on what license to choose from among By-NC-SA, By-SA, and the GNU FDL. I finally decided not to go all the way to By-SA because I was afraid that if we didn’t go with the NC restriction, a commercial player could undercut Apress (and me) by taking all the materials and selling it in a more commercially advantageous position. That is, I’m afraid of the prospect of someone printing and selling paper copies at cheaper cost or putting up my book on a commercial site and realizing advertising revenue for cheap (since they did not put in the money to produce the book in the first place.) I will admit that my fears may not be well-founded — so I’m interested in figuring out whether I should revisit the issue of licensing with my publisher. (It’s not that Apress is adverse to publishing books under the GFDL — see http://www.djangobook.com/license/, for instance.) I will say that the incident with Seth Godin’s book did not help with my fears though. (http://sethgodin.typepad.com/seths_blog/2007/02/please_dont_buy.html)

    Bottom line: how do I make my work as open as possible while not opening my publisher and me up to being unfairly taken advantage of commercially? I’m not predicting that I’ll be making tons of royalties off my book, but I don’t want to have what little might be coming my way to be taken away either! :-) Since I recently left the long-term employ of the University of California, I’m a bit more dependent on income from writing than I used to be.

ProgrammableWeb points me to yet another Google release!

It's very difficult to keep up with the world of public APIs and mashups, even for someone like me who is writing a book and teaching a course on the subject! Now that I have a bit more time to work on the topic, I intend to be more assiduous in reading online news sources. My first priority is a consistent read of John Musser's ProgrammableWeb. I already make steady use of its database of APIs and often point others to the Mashup Dashboard when people ask me for concrete examples of mashups. Now I want to keep up with his blog. Today's post (April 20) concerns the Google AJAX Feed API, which the Google documentation describes thus:

    With the AJAX Feed API, you can download any public Atom or RSS feed using only JavaScript, so you can easily mash up feeds with your content and other APIs like the Google Maps API.

Hmmm… I get to take a detailed look at Chapter 4, on RSS and other feeds, next week. I've written about Yahoo Pipes for remixing RSS. This Google AJAX widget demands a close look too! (Thanks, John, for alerting me to this new development! I also need to subscribe to the Google AJAX Search API Blog in my news reader.)

Machine tagging in Flickr (and elsewhere?)

"Machine tags" (see Flickr: Discussing Machine tags in Flickr API) have been introduced into Flickr as a generalization of things like geotagging. Machine tags are also known by many as "triple tags". These are tags with a specific syntax aimed primarily at "machine consumption" (that is, by programs) and not directly at display to the typical end user. You can use machine tags to store extra data elements for a given photo. I think that it's fair to say that the most important example of such data has so far been the latitude and longitude associated with a photo. This data is so important that Flickr ultimately introduced specialized functionality to handle it, rather than having people shoehorn that info into tags.

I'd really like to know what uptake there has been on machine tags. I was hoping to be able to do searches for namespaces in use — but as I document below, I don't know how to formulate a query to do so.
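To make the "triple" in "triple tags" concrete, here is a minimal Python sketch that splits a machine tag into its three parts. It assumes the namespace:predicate=value syntax (for example, geo:lat=37.8721). The regular expression is my own approximation of what counts as a valid namespace and predicate, not Flickr's official rule, and the names MachineTag and parse_machine_tag are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class MachineTag(NamedTuple):
    namespace: str
    predicate: str
    value: str


# Assumed syntax: namespace:predicate=value, e.g. geo:lat=37.8721.
# The character classes below are an approximation, not Flickr's official rule.
_MACHINE_TAG = re.compile(r"^([A-Za-z][A-Za-z0-9_]*):([A-Za-z][A-Za-z0-9_]*)=(.*)$")


def parse_machine_tag(raw: str) -> Optional[MachineTag]:
    """Return the three parts of a machine tag, or None for an ordinary tag."""
    match = _MACHINE_TAG.match(raw)
    if match is None:
        return None
    return MachineTag(*match.groups())
```

Calling parse_machine_tag("geo:long=-122.2578") keeps the minus sign and the decimal point in the value, which is exactly the information a geotagging application needs to get back intact.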

Some kinks have been fixed in the Flickr machine tags:

Some outstanding issues remain:

  • Missing negative values and decimals in the machine tag API. See yws-flickr : Message: Re: Machine Tag Bug – missing negative values and decimals?. When I do a query for my pictures that use the geo: machine tag namespace and ask for the machine tags, I get the "-" and "." stripped out. (I confirmed the problem on March 11, but as of April 19, I've not seen any resolution.)
  • No response to yws-flickr : Message: Re: Ladies and gentlemen : Machine tags. I'd love to have a whole set of query functionality, including being able to look up all domains in use. I know about "geo", but what others are being used, and with what frequency? For instance, we can pull up all the geo: machine tags by searching for "geo:" with the API call. See http://tinyurl.com/yt5k2f. You will find more than 500,000 photos under that domain. I tried variants on "geo:" to try to pull up all machine tags ("*:", "*", "*:*"), but I couldn't find any that would get me all the machine tags….
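For readers who want to repeat the "geo:" query, here is a rough Python sketch that only builds the request URL; it does not call Flickr. It assumes the flickr.photos.search method with its machine_tags and extras parameters, and the REST endpoint shown is my assumption about the current address, so check it against the Flickr API documentation. The function name and the "YOUR_API_KEY" placeholder are hypothetical.

```python
from urllib.parse import urlencode

# Assumed REST endpoint; verify against the current Flickr API documentation.
FLICKR_REST = "https://api.flickr.com/services/rest/"


def machine_tag_search_url(api_key: str, machine_tags: str, per_page: int = 10) -> str:
    """Build (but do not send) a flickr.photos.search request for a machine tag query."""
    params = {
        "method": "flickr.photos.search",
        "api_key": api_key,
        "machine_tags": machine_tags,   # e.g. "geo:" for everything in the geo namespace
        "extras": "machine_tags",       # ask for each photo's machine tags in the response
        "per_page": per_page,
        "format": "json",
        "nojsoncallback": 1,
    }
    return FLICKR_REST + "?" + urlencode(params)


print(machine_tag_search_url("YOUR_API_KEY", "geo:"))
```

Swapping "geo:" for another namespace is the only way I know of to probe what else is in use, which is why a "list all namespaces" query would be so welcome.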

What is the equivalent of machine tags in other systems? What to look at:

Riya-Flickr mashup / What is visual search?

In a sidebar to Chapter 3, I ask whether we will ultimately be able to rely on visual searching instead of tags. I mention that companies like Riya are hard at work to bring us visual search. What other companies are out there? Trying out Riya has been on my list for a while, especially because it has an API. If you have experience with Riya, especially the API, please contact me. I'd like to see whether I can do a mashup between Flickr and Riya in which I could feed Riya my photos, using the tags I already have in Flickr to train it to recognize faces of friends, and then ask Riya to tag the rest of my photos. I'm sure someone must have tried to do so already. How have they succeeded?

What might a non-word-based search look like? Would you draw something that you want to look for, and the search engine would bring up pictures that look like what you drew? Or would you present a photo to the search engine, and it would bring up similar photos? The fact that we still have to type words into a search engine to search for pictures, video, or music shows how dependent we are on words for search and for describing nonverbal objects. That's why tags are so central in Flickr, where the dominant form of data is visual.

Chapter 3: What I could possibly cover

This week, I'm working on version 1.5 (a cleaned-up first draft) of Chapter 3, "Tagging and Folksonomies". The important task for me is to work with what I already have in my manuscript. Still, I'd like to step back for a moment and write out what I'd ideally like to be able to convey to my readers about tagging and folksonomies, and what functionality I would like to have in hand for my own computer systems.

  • I'd like to be able to integrate the various implementations of tagging, wherever they are. For example, I use tags in Flickr, del.icio.us, and Technorati. They all have somewhat different ways of expressing similar functionality, but I'd like to use these systems in as seamless a way as possible.
  • I'd like to apply tags to digital content, whether it is on my desktop computer, on one of the Web 2.0 sites, or in custom online collaborative spaces.
  • I'd like to be able to apply tags at a very fine-grained level. For example, not to a video clip as a whole, but to a specific time segment.
  • I'd like to have better tools to visualize and query my tags, and tools that assist me in tagging new materials based on what and how I've already tagged. I'd like to be offered suggested tags based on some mix of what has already been typed in, what my social network uses, global usage patterns, and other things that I've tagged already.
  • I'd like tools for name reconciliation and reduction of tags. I might have used singular and plural tags inconsistently and want tools that will help me clean up those inconsistencies.
  • Right now I have a mess of tags in delicious. I'd like some tool to help me clean them up. (Does del.icio.us direc.tor fit the bill?)
  • I'd like to have better reconciliation between controlled vocabulary systems (e.g., LCSH) and folksonomies.
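As a small illustration of the singular/plural cleanup mentioned in the list above, here is a toy Python sketch. The singularization rule is a deliberately naive assumption of mine (strip a trailing "s", turn "ies" into "y") and will get many English words wrong; the function names are hypothetical. It only reports likely duplicates and leaves the decision about which form to keep to a human.

```python
from collections import defaultdict


def naive_singular(tag: str) -> str:
    """Very rough English singularization; an assumption for illustration only."""
    t = tag.strip().lower()
    if t.endswith("ies") and len(t) > 4:
        return t[:-3] + "y"
    if t.endswith("s") and not t.endswith("ss") and len(t) > 3:
        return t[:-1]
    return t


def find_variants(tags):
    """Group tags that collapse to the same naive singular form."""
    groups = defaultdict(set)
    for tag in tags:
        groups[naive_singular(tag)].add(tag)
    # Only keep groups where more than one spelling is in use.
    return {key: sorted(variants) for key, variants in groups.items() if len(variants) > 1}


print(find_variants(["mashup", "mashups", "API", "folksonomy", "folksonomies", "glass"]))
```

A real tool would need a proper stemmer or dictionary, plus a way to rename tags in bulk through each site's API.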

The observations I've written are abstract. I should think in terms of concrete systems. Systems I tag a lot in:

  • Flickr
  • delicious
  • Amazon.com
  • Librarything.com
  • I do make categories on my WordPress blogs which seem to turn into technorati tags.

It would be useful to have a big list of systems that support tagging. I believe such a list already exists on Wikipedia. Probably on the list are a lot of the social bookmarking sites, photo sharing sites other than Flickr, and video sites like YouTube.

First impressions of tagging in Thunderbird 2.0

Since I'm writing Chapter 3 right now and learned that Thunderbird 2.0 supports tags, I decided to take a look immediately. The interface looks a bit spiffier. I will say that I'm generally pleased with Thunderbird, so I was hoping that the tags would be a great new feature.

So far, I don't quite see how I would use them. I thought that they would be more like Flickr tags than virtual folders. Moreover, how does one get rid of a tag in Thunderbird 2.0? I can see how to disassociate a tag from a given message, but Thunderbird tags function more like folders: even if you don't have a tag associated with any messages, the tag still exists (like an empty folder). I can't find any relevant help by searching for thunderbird tag in MozillaZine. Help!

Firefox plugins for Amazon EC2 and S3; developer forums for Amazon

In my post Mashup Guide :: Amazon URL structures, I mentioned that I will pose my questions concerning the Amazon web services in the appropriate forums. I'm looking at the list of forums at Amazon Web Services Developer Connection : Developer Connection, but I'm not yet certain whether any of them is appropriate. Along the way I did find the Amazon EC2 Firefox Extension. I plan to come back to that plugin, in combination with S3 Firefox Organizer (S3Fox), as a way to jumpstart one's exploration of Amazon S3 and EC2.

Amazon URL structures

I spend a substantial part of Chapter 2 on the topic of understanding the syntax and semantics of URLs in web applications. Knowing how URLs are formed not only lays the foundation for mashing them up later but also enables users to recombine content from various sites without much programming.

In the chapter, I look at URLs in Flickr, Google Maps, del.icio.us, and amazon.com. Below is an excerpt of the chapter about amazon.com. One major question I have is whether someone has documented the URL structures for amazon.com in a more comprehensive fashion, akin to what Google Map Parameters – Google Mapki does for Google Maps. I will post that question on the appropriate forums when I figure out what they are. Anyone out there know the answer?

Amazon walkthrough

Amazon.com is another interesting site to look at. Not only is it a popular e-commerce site, it is a pioneering e-commerce platform that is easily remixed and recombined with other content. Although we will study the Amazon APIs later, we focus here on amazon.com from the view of an end user. Moreover, the goal in this section is not to learn all the features of amazon.com but rather to study the structure of URLs used in amazon.com, specifically the question of how to link to the site. (While Amazon sells a lot of merchandise other than books, we will look at books to focus our walk-through. Moreover, we focus here on amazon.com, the site geared to the USA, instead of the network of sites aimed at customers outside the USA.)

The strategy we follow here is to discern the key entities of the amazon.com site through a combination of using and experimenting with the site, sifting through documentation, and seeing what other users have done. Note that since some of the conclusions are not supported by official documentation from amazon.com, there is no long-term guarantee behind the URLs.

Amazon items

It doesn't take much use of amazon.com to see that the central entity of the site is an item for sale (akin to a photo in Flickr). By looking at the URL of a given item and looking throughout a page describing it, you will see that Amazon uses the ASIN (Amazon Standard Identification Number) as a unique identifier for its products.[1] For books that have an ISBN, the ASIN is the same as the ISBN for the book. According to the Wikipedia article, you can point to a product on amazon.com by its ASIN with the following URL:

http://www.amazon.com/gp/product/[ASIN]

Take, for instance, Czeslaw Milosz’s New and Collected Poems (paperback edition), which has an ISBN of 0060514485. You can find it on amazon.com at

http://www.amazon.com/gp/product/0060514485

It is important to know that the way to link to amazon.com has changed in the past and will likely continue to change. For instance, you can also link to the book with

http://www.amazon.com/exec/obidos/ASIN/0060514485

or even a shorter form:

http://amazon.com/o/ASIN/0060514485

The use of this syntax would ideally be founded on some official documentation from amazon.com. Where would one find definitive documentation on how to structure a link to a product of a given ASIN? A search through the Amazon developers' site leads to the technical documentation[2], whose latest version at the time of writing is the 2007-04-04 edition of the technical docs.[3] That trail leads ultimately to a page on the use of identifiers, which, alas, does not spell out how to formulate the URL for an item with a given ASIN.[4] The bottom line for now: Wikipedia plus experimentation is the best way to discern the URL structures of amazon.com.
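The three link forms above can be generated from an ASIN with a few lines of Python. This is only a sketch based on the URL patterns observed in this walkthrough, which Amazon does not officially document. The function name and the 10-character check on the ASIN are my own assumptions.

```python
import re

# Assumption: an ASIN is 10 letters or digits (a 10-digit ISBN such as 0060514485 fits).
_ASIN = re.compile(r"^[A-Z0-9]{10}$")


def amazon_item_urls(asin: str) -> dict:
    """Return the three observed (undocumented) ways of linking to an item."""
    asin = asin.strip().upper()
    if not _ASIN.match(asin):
        raise ValueError("does not look like an ASIN: " + asin)
    return {
        "gp_product": f"http://www.amazon.com/gp/product/{asin}",
        "obidos": f"http://www.amazon.com/exec/obidos/ASIN/{asin}",
        "short": f"http://amazon.com/o/ASIN/{asin}",
    }


for name, url in amazon_item_urls("0060514485").items():
    print(name, url)
```

Keeping all three forms in one place makes it easy to switch if one of them stops working.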

Let's apply this approach to other functions of amazon.com. For instance, can we generate a URL for a full-text search? Go to amazon.com and drop in your favorite search term. Take for example, flower. When you hit submit, you'll get a URL that looks like:

http://amazon.com/s/ref=nb_ss_gw/102-1755462-2944952?url=search-alias%3Daps&field-keywords=flower&Go.x=0&Go.y=0

If you do the search again, say in a different browser, you will get another URL. I got:

http://amazon.com/s/ref=nb_ss_gw/102-8204915-1347316?url=search-alias%3Daps&field-keywords=flower&Go.x=0&Go.y=0&Go=Go

Notice where the two URLs are similar and where they differ from one another. Looking for what's common (the http://amazon.com/s prefix and ?url=search-alias%3Daps&field-keywords=flower&Go.x=0&Go.y=0&Go=Go argument), you might try to eliminate the sections which are different:

http://amazon.com/s/?url=search-alias%3Daps&field-keywords=flower&Go.x=0&Go.y=0&Go=Go

which seems to work fine. You can even eliminate &Go.x=0&Go.y=0&Go=Go to boil the request down to

http://amazon.com/s/?url=search-alias%3Daps&field-keywords=flower
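The same boiling-down can be done mechanically. The Python sketch below keeps only the two parameters that experimentation showed to be sufficient (url and field-keywords) and discards the extra path segments and the Go.* parameters. The function name is hypothetical, and the choice of which parameters matter comes from the experiment above, not from Amazon documentation.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Parameters that the walkthrough found sufficient; everything else is dropped.
ESSENTIAL_PARAMS = ("url", "field-keywords")


def reduce_search_url(url: str) -> str:
    """Strip the extra path segments and the Go.* parameters from a search URL."""
    parts = urlsplit(url)
    if not parts.path.startswith("/s/"):
        raise ValueError("not an amazon.com search URL: " + url)
    kept = [(k, v) for k, v in parse_qsl(parts.query) if k in ESSENTIAL_PARAMS]
    # Rebuild the URL with the bare /s/ path and only the essential parameters.
    return urlunsplit((parts.scheme, parts.netloc, "/s/", urlencode(kept), ""))


long_url = ("http://amazon.com/s/ref=nb_ss_gw/102-1755462-2944952"
            "?url=search-alias%3Daps&field-keywords=flower&Go.x=0&Go.y=0")
print(reduce_search_url(long_url))
```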

How to limit it to books? If you go to amazon.com and select the book section and use a flower keyword, you will get a URL similar to

http://amazon.com/s/ref=nb_ss_gw/102-6984159-2338509?url=search-alias%3Dstripbooks&field-keywords=flower&Go.x=12&Go.y=6

Stripping away the same parameters as before gives you:

http://amazon.com/s/?url=search-alias%3Dstripbooks&field-keywords=flower

This trick works for the other departments. For example, to do a search on flowers in Home & Garden:

http://amazon.com/s/?url=search-alias%3Dgarden&field-keywords=flower
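Going the other way, a search URL can be assembled from a keyword and a department alias. This Python sketch uses only the three aliases seen in this walkthrough (aps for all departments, stripbooks, and garden). The function name is hypothetical, and the URL pattern is inferred from experimentation, so treat it as an assumption that may stop working.

```python
from urllib.parse import urlencode


def amazon_search_url(keywords: str, alias: str = "aps") -> str:
    """Build a search URL for a department alias such as aps, stripbooks, or garden."""
    query = urlencode({
        "url": f"search-alias={alias}",   # the "=" is percent-encoded to %3D
        "field-keywords": keywords,
    })
    return "http://amazon.com/s/?" + query


print(amazon_search_url("flower", "stripbooks"))
print(amazon_search_url("flower", "garden"))
```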

Let's run through the syntax of other organizational structures:

Lists

To go to the wishlist section:

http://www.amazon.com/gp/registry/wishlist/

If you are logged in, you will see a list of your lists on the left. Look at the URL of one of them, which will look like

http://www.amazon.com/gp/registry/wishlist/1U5EXVPVS3WP5/ref=cm_wl_rlist_go/102-5889202-4328156

You'll see that the right-hand number (e.g., 102-5889202-4328156) remains the same while one number (e.g., 1U5EXVPVS3WP5) changes for each list, which suggests that 1U5EXVPVS3WP5 is the identifier for the list. You can point to a list by its list identifier with

http://www.amazon.com/gp/registry/wishlist/1U5EXVPVS3WP5
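Pulling the list identifier out of a longer wishlist URL is a one-line regular expression. In this Python sketch, the pattern (uppercase letters and digits right after /gp/registry/wishlist/) is an assumption based on the single identifier shown here, and both function names are hypothetical.

```python
import re
from typing import Optional

# Assumption: list identifiers consist of uppercase letters and digits.
_WISHLIST = re.compile(r"/gp/registry/wishlist/([A-Z0-9]+)")


def wishlist_id(url: str) -> Optional[str]:
    """Return the list identifier embedded in a wishlist URL, if any."""
    match = _WISHLIST.search(url)
    return match.group(1) if match else None


def wishlist_url(list_id: str) -> str:
    """Build the short link to a wishlist from its identifier."""
    return f"http://www.amazon.com/gp/registry/wishlist/{list_id}"


long_url = ("http://www.amazon.com/gp/registry/wishlist/1U5EXVPVS3WP5"
            "/ref=cm_wl_rlist_go/102-5889202-4328156")
print(wishlist_url(wishlist_id(long_url)))
```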

Tags

Tags are a recent introduction to Amazon.com. You will see links like

http://www.amazon.com/tag/czeslaw%20milosz/ref=tag_dp_ct/102-8204915-1347316

which can be reduced to

http://www.amazon.com/tag/czeslaw%20milosz/
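Note that the space in the tag is percent-encoded as %20. Here is a short Python sketch for going between a tag and its URL, again assuming the undocumented /tag/ pattern observed above; the function names are hypothetical.

```python
from urllib.parse import quote, unquote, urlsplit


def amazon_tag_url(tag: str) -> str:
    """Build the short tag URL; spaces become %20."""
    return f"http://www.amazon.com/tag/{quote(tag, safe='')}/"


def tag_from_url(url: str) -> str:
    """Recover the tag from a URL whose path looks like /tag/<encoded tag>/..."""
    segments = urlsplit(url).path.split("/")
    position = segments.index("tag")   # raises ValueError if this is not a tag URL
    return unquote(segments[position + 1])


print(amazon_tag_url("czeslaw milosz"))
```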

Subject headings

In looking through the Browse-subject section of amazon.com (http://www.amazon.com/Subjects-Books/b/?ie=UTF8&node=1000), you can find a link such as

http://www.amazon.com/b/ref=amb_link_1760642_21/104-0367717-9318361?ie=UTF8&node=5&pf_rd_m=ATVPDKIKX0DER&pf_rd_s=center-3&pf_rd_r=0J0MADE0YSN1VRBA6XZS&pf_rd_t=101&pf_rd_p=233185601&pf_rd_i=1000

(which refers to the Computers & Internet section), which can be reduced to

http://www.amazon.com/b/?ie=UTF8&node=5

(The fact that the node is specified by number rather than any word-based descriptor makes one concerned about the long term stability of the link. Will 5 always refer to computers or if there is another section added that goes before it alphabetically, will the link break?)
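For completeness, here is the same kind of Python sketch for browse nodes: one function extracts the node number from a long link, and another rebuilds the short form with just the ie and node parameters. As with the other examples, the function names are hypothetical and the pattern comes from observation, not from Amazon documentation.

```python
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlsplit


def node_from_url(url: str) -> Optional[int]:
    """Return the numeric browse node in a /b/ URL, or None if there is none."""
    values = parse_qs(urlsplit(url).query).get("node")
    return int(values[0]) if values else None


def browse_node_url(node: int) -> str:
    """Build the short browse link, keeping only the ie and node parameters."""
    return "http://www.amazon.com/b/?" + urlencode({"ie": "UTF8", "node": node})


long_url = ("http://www.amazon.com/b/ref=amb_link_1760642_21/104-0367717-9318361"
            "?ie=UTF8&node=5&pf_rd_m=ATVPDKIKX0DER&pf_rd_s=center-3"
            "&pf_rd_r=0J0MADE0YSN1VRBA6XZS&pf_rd_t=101&pf_rd_p=233185601&pf_rd_i=1000")
print(browse_node_url(node_from_url(long_url)))
```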

There are plenty of other entities whose URL structures can be discerned, including Listmania lists (e.g., http://www.amazon.com/favorite-literary-poles/lm/1FH0E3G892IA/ and http://www.amazon.com/lm/1FH0E3G892IA/), So You'd Like to... guides (e.g., http://www.amazon.com/gp/richpub/syltguides/fullview/3T3I3YDBG889B), and personal profiles (e.g., http://www.amazon.com/gp/pdp/profile/A2D978B87TKMS2/).



 

[1] http://en.wikipedia.org/wiki/Amazon_Standard_Identification_Number

[2] http://developer.amazonwebservices.com/connect/kbcategory.jspa?categoryID=19

[3] http://developer.amazonwebservices.com/connect/entry.jspa?externalID=703&categoryID=19

[4] http://docs.amazonwebservices.com/AWSECommerceService/2007-04-04/DG/ItemIdentifiers.html

Chapter 2: First draft

I have posted the first draft of Chapter 2 (pdf), "Looking at Flickr, Del.icio.us, Google Maps, and Amazon.com as end-user tools". The chapter analyzes Flickr (our primary extended example) to show what makes it the remix platform par excellence, both for learning how to remix a specific application and for exploiting the many features that make it so remixable. The chapter then compares and contrasts Flickr with other remixable platforms: del.icio.us, Google Maps, and amazon.com.