Expect More
By Sam Ruby, July 29, 2002.
This document describes the considerations that go into making an extensible
wire-level protocol that can gracefully evolve over time. A basic
understanding of SOAP and WSDL is required.
Preface 1: Tolerance
The following is from Tim Berners-Lee's Principles
of Design:
"Be liberal in what you require but conservative in what you do"
This is the expression of a principle which applies pretty well in life,
(it is a typical UU tenet), and is commonly employed in design across the
Internet.
This principle can be contentious. When browsers are lax about what they
expect, the system works better but also it encourages laxness on the part of
web page writers. The principle of tolerance does not blunt the need for a
perfectly clear protocol specification which draws a precise distinction
between a conformance and non-conformance. The principle of tolerance is no
excuse for a product which contravenes a standard.
Preface 2: Evolvable Systems
The following is from Clay Shirky's In Praise of
Evolvable Systems:
Centrally designed protocols start out strong and improve logarithmically.
Evolvable protocols start out weak and improve exponentially. It's dinosaurs
vs. mammals, and the mammals win every time. The Web is not the perfect
hypertext protocol, just the best one that's also currently practical.
Infrastructure built on evolvable protocols will always be partially
incomplete, partially wrong and ultimately better designed than its
competition.
A simple interaction
Consider the following request
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <interop:echoString xmlns:interop="http://soapinterop.org/">
      <inputString>hello world</inputString>
    </interop:echoString>
  </soap:Body>
</soap:Envelope>
And the associated response:
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <interop:echoStringResponse xmlns:interop="http://soapinterop.org/">
      <echoStringResult>hello world</echoStringResult>
    </interop:echoStringResponse>
  </soap:Body>
</soap:Envelope>
How would you go about discovering the XML schema for the two bodies?
Or the WSDL? Look closely: there are no links to either provided in the
messages themselves. There is no requirement to do so in the SOAP
specification. In fact, not only is there no requirement, there is no
provision for doing so even if you wanted to. I can tell you as a developer of
a SOAP implementation that at times it would be handy if there were such a
provision, but for now the reality is that there is not. A related question:
how would you go about implementing this service using ASP.NET? At this
point, most would agree that there are probably multiple correct answers.
Is it possible that there are also multiple correct answers for the XML schema
and WSDL questions above? This essay will attempt to convince you that the
answer is emphatically yes.
An implementation with a twist
For those familiar with the SOAPBuilders interop test suite Round 2, I
intentionally provided a bit of misdirection. While the above was intended
to evoke the concept of the echoString service defined there, here is the actual
implementation of the web service I used:
<%@ WebService Language="c#" Class="Echo" %>
using System.Web.Services;
using System.Web.Services.Protocols;
[SoapRpcService]
[WebService(Namespace="http://soapinterop.org/")]
public class Echo {
  [WebMethod]
  public string echoString(string inputString, int repeat) {
    string result = "";
    // i <= repeat: the body runs repeat + 1 times, so repeat == 0 yields one copy
    for (int i=0; i<=repeat; i++)
      result += inputString;
    return result;
  }
}
Note the additional repeat parameter. In this service, you not only get
your inputString echoed, you optionally get it repeated any number of additional
times that you care to specify. The way ASP.NET works is that omitted
parameters are supplied "zero-ish" values. Depending on the
target data type, that may be values like 0, 0.0, false, or null. Note that,
the way this service was coded, if repeat is zero you get a total of one copy
of the inputString returned. More on why this was done later.
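To make the "zero-ish" rule concrete, here is a minimal Python sketch of the same behavior. It is an analogue, not ASP.NET's actual binding code: the names ZEROISH, bind_arguments, and SIGNATURE are invented for illustration, and the table of defaults covers only the types mentioned above.

```python
from typing import Any, Dict, Optional

# Illustrative "zero-ish" defaults per declared parameter type (assumed table).
ZEROISH: Dict[type, Any] = {int: 0, float: 0.0, bool: False, str: None}


def bind_arguments(signature: Dict[str, type], supplied: Dict[str, Any]) -> Dict[str, Any]:
    """Return one value per declared parameter, defaulting any that are omitted."""
    return {name: supplied.get(name, ZEROISH[kind]) for name, kind in signature.items()}


def echo_string(inputString: Optional[str], repeat: int) -> str:
    """Analogue of the C# method: the loop body runs repeat + 1 times."""
    result = ""
    for _ in range(repeat + 1):
        result += inputString or ""
    return result


SIGNATURE = {"inputString": str, "repeat": int}

# A request that omits repeat is bound to repeat == 0, which yields one copy.
args = bind_arguments(SIGNATURE, {"inputString": "hello world"})
print(echo_string(**args))
```

Because the loop bound is repeat + 1, the defaulted value of zero is also the value that preserves the plain echo behavior, which is what lets the parameter be left out safely.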
RPC Definition
The above is crafted conforming to what WSDL describes as style="rpc"
use="encoded". Here are the relevant sections of the WSDL
describing the input and output messages, as produced by ASP.NET:
<message name="echoStringSoapIn">
  <part name="inputString" type="s:string" />
  <part name="repeat" type="s:int" />
</message>
<message name="echoStringSoapOut">
  <part name="echoStringResult" type="s:string" />
</message>
Note that there is absolutely no indication that repeat is optional.
This is OK, as the SOAP specification indicates that it is acceptable for
messages to have omitted accessor elements. It does not, however, specify what
values the receiver will see for the omitted arguments, so caveat emptor.
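On the sending side, an omitted accessor simply means the element is never written. The following is a minimal Python sketch using the standard library's xml.etree.ElementTree. The helper name build_request is invented for illustration, and the sketch leaves out details, such as an encodingStyle attribute, that a real rpc/encoded toolkit would normally add.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
INTEROP_NS = "http://soapinterop.org/"


def build_request(operation: str, parts: dict) -> str:
    """Serialize an rpc-style request, emitting only the parts that were supplied."""
    ET.register_namespace("soap", SOAP_NS)
    ET.register_namespace("interop", INTEROP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{INTEROP_NS}}}{operation}")
    for name, value in parts.items():
        if value is None:
            continue  # omitted accessor: no element is written at all
        ET.SubElement(call, name).text = str(value)
    return ET.tostring(envelope, encoding="unicode")


with_repeat = build_request("echoString", {"inputString": "hello world", "repeat": 2})
without_repeat = build_request("echoString", {"inputString": "hello world", "repeat": None})
print(without_repeat)
```

Both messages are produced from the same two-part description; what the receiver does with the second one is exactly the part the specification leaves open.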
Literal Definition
An alternative definition for this service would be style="document"
use="literal". There are subtle differences: in the request and
response, two instances of "interop:" and one instance of
":interop" would be deleted. In the C# code, the second
using statement and the [SoapRpcService] custom attribute would be
removed. The resulting WSDL looks fairly different, with the relevant
portions of the schema being:
<s:schema elementFormDefault="qualified" targetNamespace="http://soapinterop.org/">
  <s:element name="echoString">
    <s:complexType>
      <s:sequence>
        <s:element minOccurs="0" maxOccurs="1" name="inputString" type="s:string" />
        <s:element minOccurs="1" maxOccurs="1" name="repeat" type="s:int" />
      </s:sequence>
    </s:complexType>
  </s:element>
  <s:element name="echoStringResponse">
    <s:complexType>
      <s:sequence>
        <s:element minOccurs="0" maxOccurs="1" name="echoStringResult" type="s:string" />
      </s:sequence>
    </s:complexType>
  </s:element>
  <s:element name="string" nillable="true" type="s:string" />
</s:schema>
Note that again there is absolutely no indication that repeat is
optional. In fact, if you look closely, you will see that repeat is
specified as minOccurs="1". From this you might infer that
repeat is required. I can assure you that no fault is produced even in the
document literal case, and that the effect is the same as if a zero value were
provided in the message.
And Another Thing
So far, all we have explored are sins of omission: cases where requests have less
information than expected. What happens when the client
provides more information than is required, in essence telling the service way more
than it needs to know? In this case, let's add a <color>blue</color>
inside the echoString element in the request. I'm not exactly sure
what purpose such an element would have in this case, nor does the
implementation of echoString above have any provision for such a
parameter. What would you expect to happen? A SOAP Fault perhaps?
With the current version of ASP.NET, in both the doc/lit
and rpc/enc cases, the unnecessary parameter is simply ignored. One can
infer a behavior not unlike the one specified in the original HTML
Internet Draft, that any undefined tags may be ignored by parsers. This
behavior has been a critical factor in the rapid evolution of that standard.
Is this wrong? Again, if you believe that this behavior is wrong, tell Keith
Ballinger, not me.
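The receiving side of this behavior can be sketched in a few lines of Python. This is an illustration of the idea only, not a description of how ASP.NET is implemented: the KNOWN table and the read_arguments helper are hypothetical, and the request is the earlier one with the extra color element added.

```python
import xml.etree.ElementTree as ET

REQUEST = """\
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <interop:echoString xmlns:interop="http://soapinterop.org/">
      <inputString>hello world</inputString>
      <color>blue</color>
    </interop:echoString>
  </soap:Body>
</soap:Envelope>
"""

NS = {
    "soap": "http://schemas.xmlsoap.org/soap/envelope/",
    "interop": "http://soapinterop.org/",
}

# Parameters the (hypothetical) handler knows about: converter and default.
KNOWN = {"inputString": (str, ""), "repeat": (int, 0)}


def read_arguments(xml_text: str) -> dict:
    """Pick out known parameters; unknown children such as <color> are skipped."""
    call = ET.fromstring(xml_text).find("soap:Body/interop:echoString", NS)
    args = {name: default for name, (_, default) in KNOWN.items()}
    for child in call:
        if child.tag in KNOWN and child.text is not None:
            convert, _ = KNOWN[child.tag]
            args[child.tag] = convert(child.text)
        # Anything not listed in KNOWN is ignored rather than raising a fault.
    return args


print(read_arguments(REQUEST))
```

The handler never sees color at all, so adding it to a request cannot break a service that was written before the element existed.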
However, I am going to try to make the case that this is exactly
what should be done, as it provides for precisely the level of loose coupling
that made the web successful in the first place. The argument is the same one
that I made in the overview of A Busy Developers Guide to WSDL 1.1. That is to
say that in the context
of web services, one should view WSDL and XML schema as prescriptive (i.e., if
you format a message to these specifications, it will be accepted) as opposed to
restrictive (i.e., the only messages that will be accepted are those that conform
to these specifications).
Deployment Matters
Imagine a web service which is deployed on the scale of the internet. Not a
centralized server servicing clients which were designed to its whim and
fancy. No, imagine a system in which
there are hundreds of implementations of a given web service and uncountable
hundreds of thousands of clients. Basically, an interface which is implemented by
everybody and owned by nobody.
Now imagine what would happen if you needed to
extend this interface. One approach would be to define a new interface
without affecting the original. For the moment, imagine perfect
communications, perfect knowledge as to all of the clients and servers which
utilize this interface, and perfect consensus on what needs to be done and how
to do it.
Even so, imagine the immense logistics involved. Since servers generally can
support both the old and new interfaces simultaneously, there is generally not much
of a problem there, and servers can be upgraded in any order. However, the
client situation is quite different. If any client is upgraded before every
server upgrade is complete, then the upgraded client may send requests to a
server that will not understand the new request. This may be manageable if the
client can anticipate exactly what fault might be returned by any given server,
as it could then add fallback and retry logic. Messy, but workable given that
assumption.
An alternative would be for each client to consult configuration
information as to which server understands which request and issue the
appropriate request accordingly. The coordination costs for such an
approach are rather steep, and it is generally only practical in topologies
where each client is bound to a single server.
Contrast this to the more straightforward
approach where the addition is made directly to the message with no other
change. If the implementations are tolerant both in the sense of the Tim
Berners-Lee quote above and as demonstrated by the current implementation of
ASP.NET, then there is no need for perfect knowledge, perfect communication, or
even perfect consensus. If two simple rules are applied, then clients and
servers can be upgraded in any order. These rules are to (1) provide
reasonable defaults for missing elements, and (2) ignore any extra elements
that are received. Such rules can also handle simple deletes.
Note: I am not suggesting anything as crazy as saying that it is OK to
substitute a Purchase Order for a Medical
Record. Merely that it should not be a given that every extension to a
SOAP message, however minor, should require that existing clients and servers be
invalidated.
Concrete Scenarios
The first and most obvious scenario is one where there is a change in
requirements. An example would be a new law that has been passed requiring
insurance claims to capture an additional piece of data effective January 1st of
the following year. Regulations such as these often have such effective dates,
allowing for a smooth
transition. This is possible if applications can be upgraded in any
order.
A second scenario is local adaptation. Imagine a generalized
description of a car. The California marketplace
for cars is large enough that the regulatory requirements on emission controls
are quite different than in other parts of the country, not to mention the
world. While not all cars can be sold in California, cars made to California
standards can often be sold in other places. It generally would be less
than ideal if there were separate web services interfaces for every
locality. It also is impractical to expect every local extension to be
approved as a part of an international standard prior to implementation.
What would be preferable is if a generalized description of a car permitted the
addition of an arbitrary number of California specific elements.
A variation on this which is not geographically centered is integration with other
products or protocols, ones that may be proprietary or not even exist at the
time the original interfaces were implemented. A specific example of this
would be a public calendar with linkages to a private calendar which is stored
within a company's firewall using different software. This is but one of the
scenarios intended to be supported by the now-defunct Hailstorm initiative.
From the foreword to the Microsoft® .NET My
Services Specification:
Think outside of the box. Read our schemas and understand that this
is a baseline for you to build upon. Anywhere you see {any}, read
this as a location where you can extend the schema with your own freeform,
namespace-qualified XML. You can choose to publish your namespaces and
schema so that others can understand and build upon your schema additions, or
you can choose to hold these additions as proprietary information that only
your software can act upon.
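Namespace qualification is what keeps such additions from colliding with the base vocabulary. As a rough Python sketch, returning to the car example: both namespace URIs, the element names, and the split_by_namespace helper below are made up for illustration and do not come from any published schema.

```python
import xml.etree.ElementTree as ET

CAR_NS = "http://example.org/car"        # hypothetical base vocabulary
CA_NS = "http://example.org/california"  # hypothetical local extension

DOCUMENT = f"""\
<car xmlns="{CAR_NS}" xmlns:ca="{CA_NS}">
  <make>Example Motors</make>
  <model>Roadster</model>
  <ca:smogCertificate>2002-0042</ca:smogCertificate>
</car>
"""


def split_by_namespace(root: ET.Element, base_ns: str):
    """Separate children in the base namespace from namespace-qualified extensions."""
    core, extensions = {}, {}
    prefix = f"{{{base_ns}}}"
    for child in root:
        if child.tag.startswith(prefix):
            core[child.tag[len(prefix):]] = child.text
        else:
            extensions[child.tag] = child.text
    return core, extensions


core, extensions = split_by_namespace(ET.fromstring(DOCUMENT), CAR_NS)
print(core)
print(extensions)
```

A consumer that knows only the base vocabulary can work from core and drop the rest, while software that understands the extension namespace can look up its own elements in extensions.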
Another example of an existing protocol that has been extended and repurposed
extensively is RSS. In
RSS 1.0, support for extensibility has been formalized in the form of
modules. As Rael Dornfest said:
What we effectively did was, we allowed people to extend RSS both on an ad
hoc basis, being able to add tags as they wanted ... while at the same time
promoting the concept of standard modules, where folks get together -- those
interested in aggregation would get together, those interested in taxonomies
would get together, those interested in weblogs would get together -- and come
up with modules that suit the purposes of that particular usage or community.
A number of such modules have been created.
But is it valid?
As demonstrated above, the ASP.NET generated client proxy classes from a
given WSDL will not only work with a given implementation of a service, but will
also continue to work - with no change required - given a wide range of
extensions that can be made to the server implementation. But, given the
default WSDL which is generated by ASP.NET at the present time, such usages will
not validate against the enhanced schemas which will be contained in such
augmented WSDLs.
This actually is easily correctable. Perhaps a future version of
ASP.NET will produce WSDLs which more accurately describe the range of messages
that will be accepted by a given implementation. Elements can be declared
optional by specifying minOccurs="0".
Placeholders for future growth can be declared using the any
and anyAttribute
elements. Extensions can be declared using Element
Substitution Groups.
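The effect of those declarations on which messages validate can be illustrated with a deliberately tiny model in Python. It is a toy, not an XML Schema validator: the ContentModel class is invented here, and it ignores element order, maxOccurs, namespaces, and types. It only tracks minOccurs and whether a wildcard is present.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ContentModel:
    """Toy stand-in for an xs:sequence; not a real XML Schema validator."""
    min_occurs: Dict[str, int]  # element name -> minOccurs (0 or 1)
    allow_any: bool = False     # True models the presence of an xs:any wildcard

    def problems(self, element_names: List[str]) -> List[str]:
        """List the reasons a message with these child elements would be rejected."""
        found = []
        for name, minimum in self.min_occurs.items():
            if minimum > 0 and name not in element_names:
                found.append(f"missing required element: {name}")
        for name in element_names:
            if name not in self.min_occurs and not self.allow_any:
                found.append(f"unexpected element: {name}")
        return found


# As generated today: repeat is required and nothing extra is permitted.
generated = ContentModel({"inputString": 0, "repeat": 1})
# As the implementation actually behaves: repeat optional, extras tolerated.
tolerant = ContentModel({"inputString": 0, "repeat": 0}, allow_any=True)

message = ["inputString", "color"]
print(generated.problems(message))
print(tolerant.problems(message))
```

The same message is rejected twice over by the first model and accepted by the second, which is the gap between what the generated WSDL says and what the service accepts.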
Dare Obasanjo
has published W3C
XML Schema Design Patterns: Dealing With Change which deals with this
subject in depth. This includes analogies to common analysis and design
techniques practiced in object oriented programming. In the process, a few
pitfalls and strategies for dealing with them are identified.
It can be done.
Conclusions 
It is trite but true: change is inevitable.
Given the distributed nature of the internet, what is of paramount importance is
that systems be designed and implemented in ways that enable change to be accommodated
in both an upward and a downward compatible manner. This means not only
ensuring that old clients work with new servers, but also that new clients work
with old servers. This essay outlined one possible way to achieve this and
demonstrated that this approach is highly compatible with the existing ASP.NET
implementation.
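The claim about old and new clients and servers can be checked with a small simulation. The Python sketch below assumes a hypothetical history in which the repeat parameter was added to an existing echo service. Messages are modeled as plain dictionaries rather than SOAP, and all of the names are invented for illustration.

```python
from itertools import product


def old_server(message: dict) -> str:
    """Predates repeat. Rule 2: any element it does not know is ignored."""
    return message.get("inputString", "")


def new_server(message: dict) -> str:
    """Knows repeat. Rule 1: a missing repeat defaults to zero."""
    return message.get("inputString", "") * (message.get("repeat", 0) + 1)


# What each generation of client puts on the wire.
CLIENTS = {
    "old": {"inputString": "hi"},
    "new": {"inputString": "hi", "repeat": 1},
}
SERVERS = {"old": old_server, "new": new_server}

# Try every pairing, as would occur during an uncoordinated rollout.
results = {
    (client, server): SERVERS[server](CLIENTS[client])
    for client, server in product(CLIENTS, SERVERS)
}
for (client, server), reply in results.items():
    print(f"{client} client -> {server} server: {reply!r}")
```

Every pairing produces a reply rather than a fault. The new client talking to the old server gets a single copy instead of two, so the extension degrades gracefully rather than failing, and nobody has to upgrade in lockstep.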
© Copyright 2002 Sam Ruby.
Last update: 9/1/2002; 6:53:18 PM.