SOAP by Example

By Sam Ruby, December 20, 2002.

This document provides a working example of a functional SOAP client in Python, using only HTTP and XML modules.  The Google API is used as an example.  Familiarity with HTTP, XML, and the concepts described in A Gentle Introduction to SOAP is presumed.

Overview #

The Google API is described and implemented in terms of a simple document exchange, where the documents themselves are expressed in XML.  

Declaring the request #

The request is defined as a templated XML document, with named placeholders such as "%(key)s" marking the places where substitutable parameters are to be placed.

template = """<?xml version="1.0" encoding="UTF-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
<soap:Body>
<gs:doGoogleSearch xmlns:gs="urn:GoogleSearch">
<key>%(key)s</key>
<q>%(q)s</q>
<start>0</start>
<maxResults>10</maxResults>
<filter>true</filter>
<restrict/>
<safeSearch>false</safeSearch>
<lr/>
<ie>latin1</ie>
<oe>latin1</oe>
</gs:doGoogleSearch>
</soap:Body>
</soap:Envelope>"""

As you can see, the innermost section is a series of name/value pairs, where the names are defined by the Google documentation.  This is wrapped inside an envelope, a body, and an element defined in the urn:GoogleSearch namespace.
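
To make the substitution step concrete, here is a minimal sketch of filling in such a template.  It assumes a shortened copy of the template (only the key and q elements are kept), and the names short_template and fill, as well as the sample key and query, are made up for illustration.  It is written for Python 3, and it parses the result to check that the escaped values still produce well-formed XML.

```python
from xml.dom import minidom
from xml.sax.saxutils import escape

# Shortened stand-in for the full template: only the substituted fields are kept.
short_template = """<?xml version="1.0" encoding="UTF-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
<soap:Body>
<gs:doGoogleSearch xmlns:gs="urn:GoogleSearch">
<key>%(key)s</key>
<q>%(q)s</q>
</gs:doGoogleSearch>
</soap:Body>
</soap:Envelope>"""

def fill(key, q):
    # escape() replaces &, < and > so that the values cannot break the markup.
    return short_template % {'key': escape(key), 'q': escape(q)}

request = fill("made-up-key", "cats & dogs <3")

# Parse the filled-in document and read the query back out of it.
document = minidom.parseString(request.encode("utf-8"))
q_node = document.getElementsByTagName("q")[0]
query_text = "".join(child.data for child in q_node.childNodes)
print(query_text)
```

Without the call to escape, the ampersand and the less-than sign in the query would make the request malformed.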

Issuing the request #

The search request is issued using HTTP POST as follows:

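import httplib
from xml.dom import minidom
from xml.sax.saxutils import escape

def do_search(key, q):
  headers = {'Content-type':'text/xml', 'SOAPAction':'urn:GoogleSearchAction'}
  # Fill in the template, escaping XML special characters in both values.
  request = template % {'key':escape(key), 'q':escape(q)}

  # Connect to the host and POST the request.
  connection = httplib.HTTPConnection("api.google.com", 80)
  connection.request("POST", "/search/beta2", request, headers)
  response = connection.getresponse()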

As you can see, two headers are defined: Content-type and SOAPAction.  The first declares that the message is indeed XML, and the second can be used to determine what object is accessed.  The request itself is the filled-in template (with the XML special characters appropriately escaped).

Then a connection is made to the api.google.com host at port 80.

Finally a POST request is made to the /search/beta2 URL, passing in the request and headers.
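
For readers on Python 3, where the httplib module was renamed http.client, the same three steps might be sketched as follows.  The split into build_request and send_request is an arrangement chosen here for illustration and is not part of the original script; the template is a shortened stand-in, and the key is made up.  Nothing is sent over the network unless send_request is called, and the service may no longer respond at this address.

```python
import http.client
from xml.sax.saxutils import escape

# Shortened stand-in for the request template shown earlier.
template = """<?xml version="1.0" encoding="UTF-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
<soap:Body>
<gs:doGoogleSearch xmlns:gs="urn:GoogleSearch">
<key>%(key)s</key>
<q>%(q)s</q>
</gs:doGoogleSearch>
</soap:Body>
</soap:Envelope>"""

def build_request(key, q):
    """Return the headers and the encoded body, without sending anything."""
    headers = {'Content-type': 'text/xml',
               'SOAPAction': 'urn:GoogleSearchAction'}
    body = template % {'key': escape(key), 'q': escape(q)}
    return headers, body.encode('utf-8')

def send_request(headers, body, host="api.google.com", path="/search/beta2"):
    """POST the body to the host and return (status, raw response bytes)."""
    connection = http.client.HTTPConnection(host, 80)
    try:
        connection.request("POST", path, body, headers)
        response = connection.getresponse()
        return response.status, response.read()
    finally:
        connection.close()

headers, body = build_request("made-up-key", "absurd obfuscation")
print(headers['SOAPAction'])
print(len(body), "bytes ready to POST")
```

Keeping the request construction separate from the network call makes it possible to inspect the exact bytes that would be sent.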

Parsing the response #

The response to a SOAP request is an XML document in either case: a normal result or a fault.  Faults are always accompanied by a 500 HTTP status code, so we can use that to determine whether to return a faultstring or a list of URLs.

  document = minidom.parseString(response.read())
  if response.status == 500:
    return document.getElementsByTagName("faultstring")
  else:
    return document.getElementsByTagName("URL")
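
The branch on the status code can be tried offline.  The sketch below assumes two hand-written response bodies, trimmed to the elements the code looks for; they are illustrative and not captured from the real service, and the helper names extract and node_text are hypothetical.  It is written for Python 3.

```python
from xml.dom import minidom

# Hypothetical fault body, reduced to the standard SOAP 1.1 fault elements.
FAULT = b"""<?xml version="1.0" encoding="UTF-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
<soap:Body>
<soap:Fault>
<faultcode>soap:Server</faultcode>
<faultstring>Invalid authorization key</faultstring>
</soap:Fault>
</soap:Body>
</soap:Envelope>"""

# Hypothetical result body; only the URL elements matter to this example.
RESULT = b"""<?xml version="1.0" encoding="UTF-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
<soap:Body>
<item><URL>http://example.com/a</URL></item>
<item><URL>http://example.com/b</URL></item>
</soap:Body>
</soap:Envelope>"""

def extract(status, body):
    """Return faultstring nodes for a 500 status, URL nodes otherwise."""
    document = minidom.parseString(body)
    if status == 500:
        return document.getElementsByTagName("faultstring")
    return document.getElementsByTagName("URL")

def node_text(node):
    # Concatenate the text held in the node's child nodes.
    return "".join(child.data for child in node.childNodes)

fault_messages = [node_text(node) for node in extract(500, FAULT)]
urls = [node_text(node) for node in extract(200, RESULT)]
print(fault_messages)
print(urls)
```

Because getElementsByTagName searches the whole tree, the code does not need to know how deeply the elements are nested.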

Pulling it all together #

Now that all of the hard work is done, the do_search function can be called with a key and a query string, thus:

key = "00000000000000000000000000000000"
for node in do_search(key, "absurd obfuscation"):
  print "".join([child.data for child in node.childNodes])

Clearly, one should substitute in one's own key, and it might be nice to vary the query string based on the value of a command-line argument, but you should get the idea.  The only code of moderate complexity in the above logic is the concatenation of the textual data associated with the child nodes of the elements returned by the query.
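
One way to take the query from the command line is sketched below.  The names query_from_args and DEFAULT_QUERY are hypothetical, and falling back to the original query string when no arguments are given is an assumption made for this example.

```python
import sys

# Used when the script is run without arguments (an assumption for this sketch).
DEFAULT_QUERY = "absurd obfuscation"

def query_from_args(argv):
    """Join every argument after the script name into one query string."""
    words = argv[1:]
    return " ".join(words) if words else DEFAULT_QUERY

if __name__ == "__main__":
    print(query_from_args(sys.argv))
```

The returned string would then be passed to do_search in place of the literal query.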

Results #

Sample outputs from the full script for the case where the key is not changed:

Exception from service object: 
Invalid authorization key: 00000000000000000000000000000000

And from when the key is changed:

http://diveintomark.org/archives/2002/04/19.html 
http://radio.weblogs.com/0101679/2002/04/19.html 
http://inessential.com/2002/04/18.php 
http://www.digitalabonder.se/2002_04_01_arkiv.shtml 
http://archipelago.phrasewise.com/2002/04/19 
http://www.kuro5hin.org/story/2002/7/11/13538/9378 
http://www.keele.ac.uk/socs/ks40/absurd.htm 
http://www.intertwingly.net/slides/2002/devcon/23.html 
http://www.usip.org/library/tc/doc/reports/el_salvador/tc_es_03151993_intro.html 
http://media.iww.org/static/file168.html

Conclusion #

The purpose of this example is to show that invoking SOAP-based web services need not be difficult.  The only real complexity is where it should be: in the data that is sent back and forth.  In this case, much more data is returned than was processed here, including snippets, titles, and directory categories.  The full Web Services Description Language (WSDL) for the service can be found here, and the instructions on how to decipher this information can be found here.
