I spend a substantial part of Chapter 2 on understanding the syntax and semantics of URLs in web applications. Knowing how URLs are formed lays the foundation for mashing them up later, and it also enables users to recombine content from various sites without much programming.
In the chapter, I look at URLs in Flickr, Google Maps, del.icio.us, and amazon.com. Below is an excerpt of the chapter about amazon.com. One major question I have is whether someone has documented the URL structures for amazon.com in a more comprehensive fashion, akin to what Google Map Parameters – Google Mapki does for Google Maps. I will post that question on the appropriate forums when I figure out what they are. Does anyone out there know the answer?
Amazon walkthrough
Amazon.com is another interesting site to look at. Not only is it a popular e-commerce site, it is also a pioneering e-commerce platform that is easily remixed and recombined with other content. Although we will study the Amazon APIs later, we focus here on how amazon.com looks from the view of an end user. Moreover, the goal in this section is not to learn all the features of amazon.com but rather to study the structure of the URLs used in amazon.com, specifically the question of how to link to the site. (While Amazon sells a lot of merchandise other than books, we will look at books to focus our walk-through. We also focus here on amazon.com, the site geared to the USA, instead of the network of sites aimed at customers outside the USA.)
The strategy we follow here is to discern the key entities of the amazon.com site through a combination of using and experimenting with the site, sifting through documentation, and seeing what other users have done. Note that since some of the conclusions are not supported by official documentation from amazon.com, there is no long-term guarantee behind the URLs.
Amazon items
It doesn't take much use of amazon.com to see that the central entity of the site is an item for sale (akin to a photo in Flickr). By looking at the URL of a given item and looking throughout a page describing it, you will see that Amazon uses an ASIN (Amazon Standard Identification Number) as the unique identifier for each of its products. For books that have a 10-digit ISBN, the ASIN is the same as that ISBN. According to the Wikipedia article on the ASIN, you can point to a product with a given ASIN on amazon.com using the following URL:
http://www.amazon.com/gp/product/[ASIN]
Take, for instance, Czeslaw Milosz’s New and Collected Poems (paperback edition), which has an ISBN of 0060514485. You can find it on amazon.com at
http://www.amazon.com/gp/product/0060514485
It is important to know that the way to link to amazon.com has changed in the past and will likely continue to change. For instance, you can also link to the book with
http://www.amazon.com/exec/obidos/ASIN/0060514485
or even a shorter form:
http://amazon.com/o/ASIN/0060514485
The use of this syntax would ideally be founded on some official documentation from amazon.com. Where would one find definitive documentation on how to structure a link to a product with a given ASIN? A search through the Amazon developers' site leads to the technical documentation, whose latest version at the time of writing is the 2007-04-04 edition. That trail leads ultimately to a page on the use of identifiers, which, alas, does not spell out how to formulate the URL for an item with a given ASIN. The bottom line for now: Wikipedia plus experimentation is the best way to discern the URL structures of amazon.com.
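The three link styles shown above can be captured in a small Python sketch. The function name `item_url` and the style labels are made up for illustration, the check that an ASIN is 10 alphanumeric characters is an assumption, and the URL patterns themselves are the undocumented ones discussed in the text, so they may stop working.

```python
# Hypothetical helper: these patterns come from observation and Wikipedia,
# not from official Amazon documentation.
ITEM_URL_PATTERNS = {
    "product": "http://www.amazon.com/gp/product/{asin}",
    "obidos": "http://www.amazon.com/exec/obidos/ASIN/{asin}",
    "short": "http://amazon.com/o/ASIN/{asin}",
}


def item_url(asin: str, style: str = "product") -> str:
    """Return a link to the item with the given ASIN in the chosen style."""
    asin = asin.strip()
    # Assumption: an ASIN is exactly 10 letters and digits.
    if len(asin) != 10 or not asin.isalnum():
        raise ValueError(f"expected a 10-character ASIN, got {asin!r}")
    if style not in ITEM_URL_PATTERNS:
        raise ValueError(f"unknown link style {style!r}")
    return ITEM_URL_PATTERNS[style].format(asin=asin)
```

Calling `item_url("0060514485")` gives the /gp/product/ link for the Milosz book, and passing `"obidos"` or `"short"` as the second argument gives the two alternative forms.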
Let's apply this approach to other functions of amazon.com. For instance, can we generate a URL for a full-text search? Go to amazon.com and drop in your favorite search term. Take, for example, flower. When you hit submit, you'll get a URL that looks like:
http://amazon.com/s/ref=nb_ss_gw/102-1755462-2944952?url=search-alias%3Daps&field-keywords=flower&Go.x=0&Go.y=0
If you do the search again, say in a different browser, you will get another URL. I got:
http://amazon.com/s/ref=nb_ss_gw/102-8204915-1347316?url=search-alias%3Daps&field-keywords=flower&Go.x=0&Go.y=0&Go=Go
Notice where the URLs are similar and where they differ. Keeping what's common (the http://amazon.com/s prefix and a nearly identical query string, ?url=search-alias%3Daps&field-keywords=flower&Go.x=0&Go.y=0&Go=Go in the second case), you might try to eliminate the path segments that differ:
http://amazon.com/s/?url=search-alias%3Daps&field-keywords=flower&Go.x=0&Go.y=0&Go=Go
which seems to work fine. You can even eliminate &Go.x=0&Go.y=0&Go=Go to boil the request down to
http://amazon.com/s/?url=search-alias%3Daps&field-keywords=flower
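The manual trimming above can be approximated in Python with the standard `urllib.parse` module. This is only a sketch: the function name `simplify_search_url` is made up, and it rests on the assumptions that path segments shaped like 102-1755462-2944952 are per-session identifiers, that segments starting with `ref=` are referral markers, and that only the `url` and `field-keywords` parameters matter.

```python
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: a segment of 3-7-7 digits is a session identifier.
SESSION_SEGMENT = re.compile(r"^\d{3}-\d{7}-\d{7}$")


def simplify_search_url(url: str, keep=("url", "field-keywords")) -> str:
    """Drop ref= and session path segments and all but the kept parameters."""
    parts = urlsplit(url)
    segments = [
        seg
        for seg in parts.path.split("/")
        if seg and not seg.startswith("ref=") and not SESSION_SEGMENT.match(seg)
    ]
    path = "/" + "/".join(segments) + "/"
    # parse_qsl decodes the values; urlencode re-encodes them (= becomes %3D).
    query = urlencode([(k, v) for k, v in parse_qsl(parts.query) if k in keep])
    return urlunsplit((parts.scheme, parts.netloc, path, query, ""))
```

Applied to either of the two captured flower URLs, the function returns the same short form reached by hand above.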
How do you limit the search to books? If you go to amazon.com, select the Books section, and search on the keyword flower, you will get a URL similar to
http://amazon.com/s/ref=nb_ss_gw/102-6984159-2338509?url=search-alias%3Dstripbooks&field-keywords=flower&Go.x=12&Go.y=6
Stripping away the same parameters as before gives you:
http://amazon.com/s/?url=search-alias%3Dstripbooks&field-keywords=flower
This trick works for the other departments. For example, to do a search on flowers in Home & Garden:
http://amazon.com/s/?url=search-alias%3Dgarden&field-keywords=flower
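Going the other way, a search link can be generated from a keyword and a department alias. The sketch below assumes the reduced URL form observed above keeps working; the function name `search_url` is made up, and the only aliases known from the text are aps (all departments), stripbooks, and garden.

```python
from urllib.parse import urlencode


def search_url(keywords: str, alias: str = "aps") -> str:
    """Build a keyword search link for the given search-alias value."""
    # urlencode percent-encodes the "=" inside the url parameter as %3D,
    # matching the URLs captured from the browser.
    query = urlencode({
        "url": f"search-alias={alias}",
        "field-keywords": keywords,
    })
    return f"http://amazon.com/s/?{query}"
```

For example, `search_url("flower", "garden")` reproduces the Home & Garden link above. Note that `urlencode` turns spaces in multi-word keywords into `+`; whether amazon.com treats that the same as `%20` is something to verify by experiment.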
Let's run through the syntax of other organizational structures:
Lists
To go to the wishlist section:
http://www.amazon.com/gp/registry/wishlist/
If you are logged in, you will see a list of your lists on the left. Look at the URL of one of them, which will look like
http://www.amazon.com/gp/registry/wishlist/1U5EXVPVS3WP5/ref=cm_wl_rlist_go/102-5889202-4328156
You'll see that the right-hand number (e.g., 102-5889202-4328156) remains the same while one identifier (e.g., 1U5EXVPVS3WP5) changes from list to list, which suggests that 1U5EXVPVS3WP5 is the identifier for the list. You can point to a list by its list identifier with
http://www.amazon.com/gp/registry/wishlist/1U5EXVPVS3WP5
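A short Python sketch can pull the list identifier out of a longer wish list URL and rebuild the short link. The name `wishlist_link` is made up, and the pattern assumes that list identifiers consist only of uppercase letters and digits, which is a guess based on the single example above.

```python
import re

# Assumption: the identifier is the run of uppercase letters and digits
# that follows /gp/registry/wishlist/.
WISHLIST_ID = re.compile(r"/gp/registry/wishlist/([A-Z0-9]+)")


def wishlist_link(url: str) -> str:
    """Return the short link for the wish list named in a longer URL."""
    match = WISHLIST_ID.search(url)
    if match is None:
        raise ValueError(f"no wish list identifier found in {url!r}")
    return f"http://www.amazon.com/gp/registry/wishlist/{match.group(1)}"
```

Given the long URL with the ref and session segments, the function returns the short form ending in 1U5EXVPVS3WP5; given the bare wish list section URL, it raises an error because there is no identifier to extract.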
Tags
Tags are a recent introduction to Amazon.com. You will see links like
http://www.amazon.com/tag/czeslaw%20milosz/ref=tag_dp_ct/102-8204915-1347316
which can be reduced to
http://www.amazon.com/tag/czeslaw%20milosz/
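Because tags can contain spaces, a tag link needs percent-encoding. The following Python sketch builds the reduced tag URL and, in the other direction, reads the tag back out of a captured link. Both function names are made up, and the assumption is that the tag is always the path segment right after /tag/.

```python
from urllib.parse import quote, unquote, urlsplit


def tag_url(tag: str) -> str:
    """Build the short tag link; a space becomes %20."""
    return f"http://www.amazon.com/tag/{quote(tag, safe='')}/"


def tag_from_url(url: str) -> str:
    """Return the decoded tag from a /tag/... link, ignoring later segments."""
    segments = [seg for seg in urlsplit(url).path.split("/") if seg]
    if len(segments) < 2 or segments[0] != "tag":
        raise ValueError(f"not a tag URL: {url!r}")
    return unquote(segments[1])
```

Here `tag_url("czeslaw milosz")` yields the reduced link shown above, and `tag_from_url` applied to the longer captured link recovers the tag with its space restored.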
Subject headings
In looking through the Browse-subject section of amazon.com (http://www.amazon.com/Subjects-Books/b/?ie=UTF8&node=1000), you can find a link such as
http://www.amazon.com/b/ref=amb_link_1760642_21/104-0367717-9318361?ie=UTF8&node=5&pf_rd_m=ATVPDKIKX0DER&pf_rd_s=center-3&pf_rd_r=0J0MADE0YSN1VRBA6XZS&pf_rd_t=101&pf_rd_p=233185601&pf_rd_i=1000
(which refers to the Computers & Internet section) that can be reduced to
http://www.amazon.com/b/?ie=UTF8&node=5
(The fact that the node is specified by number rather than by any word-based descriptor makes one concerned about the long-term stability of the link. Will 5 always refer to computers, or will the link break if another section is added that precedes it alphabetically?)
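The browse-node reduction amounts to keeping only the ie and node parameters. A minimal Python sketch, with the made-up name `browse_node_url` and the assumption that the pf_rd_* parameters and the ref and session path segments are not needed to reach the page:

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def browse_node_url(url: str) -> str:
    """Reduce a captured browse link to its ie and node parameters."""
    params = parse_qs(urlsplit(url).query)
    if "node" not in params:
        raise ValueError(f"no node parameter in {url!r}")
    query = urlencode({
        # Assumption: fall back to UTF8 when the captured link has no ie.
        "ie": params.get("ie", ["UTF8"])[0],
        "node": params["node"][0],
    })
    return f"http://www.amazon.com/b/?{query}"
```

Run on the long Computers & Internet link above, this returns the short form with node=5. It does nothing about the stability concern: the node number is copied through as is.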
There are plenty of other entities whose URL structures can be discerned, including Listmania lists (e.g., http://www.amazon.com/favorite-literary-poles/lm/1FH0E3G892IA/ and http://www.amazon.com/lm/1FH0E3G892IA/), So You'd Like to... guides (e.g., http://www.amazon.com/gp/richpub/syltguides/fullview/3T3I3YDBG889B), and personal profiles (e.g., http://www.amazon.com/gp/pdp/profile/A2D978B87TKMS2/).
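One way to put these observations to work is a small table of path patterns that maps a URL to the kind of entity it names and that entity's identifier. The Python sketch below is only an illustration: the names `ENTITY_PATTERNS` and `identify` are made up, the patterns are inferred from the example URLs in this section alone, and the guess that identifiers are uppercase letters and digits is an assumption.

```python
import re
from typing import Optional, Tuple
from urllib.parse import urlsplit

# Patterns inferred from the examples in the text, not from documentation.
ENTITY_PATTERNS = [
    ("listmania", re.compile(r"/lm/([A-Z0-9]+)")),
    ("guide", re.compile(r"^/gp/richpub/syltguides/fullview/([A-Z0-9]+)")),
    ("profile", re.compile(r"^/gp/pdp/profile/([A-Z0-9]+)")),
]


def identify(url: str) -> Optional[Tuple[str, str]]:
    """Return (kind, identifier) for a recognized URL, or None otherwise."""
    path = urlsplit(url).path
    for kind, pattern in ENTITY_PATTERNS:
        match = pattern.search(path)
        if match:
            return kind, match.group(1)
    return None
```

Both Listmania forms above resolve to the same identifier, which is consistent with the descriptive slug before /lm/ being optional.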