Dealing with Diversity
By Sam Ruby, February 22, 2002.
This document explores the role that metadata about object types plays in the goal of
achieving meaningful, robust, and reliable web services interoperability between
diverse platforms.
What is '1' + '2' ?

Let's ask a few languages:
C:
#include <stdio.h>
int main(void) {
    /* '1' and '2' are char values, so + adds their character codes */
    putc('1' + '2', stdout);
    return 0;
}
Unfortunately, C, Perl, and Python give three different answers: 'c', 3, and
"12".
What's going on here?
Strong Typing

C and Perl both look at the operation '+', realize that
it is an operation that applies to numeric operands, coerce the inputs to comply
and then execute the operation. Whereas C coerces based on the underlying bit
pattern of the physical representation, Perl coerces based on the assumption
that the arguments are string representations of numeric quantities.
Python on the other hand looks at the operands, sees that they are strings,
and then selects the appropriate operation based on the input data type.
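The three behaviors can be placed side by side in a single language. The following is a minimal sketch in Python; the function names are made up for illustration, and `perl_style` simplifies Perl's numeric coercion to whole numbers.

```python
# Three readings of '1' + '2', reproduced in Python for comparison.
# All three function names are hypothetical, chosen for this sketch.

def c_style(a: str, b: str) -> str:
    """Add the character codes, as C does with char operands."""
    return chr(ord(a) + ord(b))

def perl_style(a: str, b: str) -> int:
    """Treat both strings as numbers (simplified here to integers)."""
    return int(a) + int(b)

def python_style(a: str, b: str) -> str:
    """Let the operand type pick the operation: str + str concatenates."""
    return a + b

print(c_style("1", "2"))       # c   (49 + 50 = 99, the code for 'c')
print(perl_style("1", "2"))    # 3
print(python_style("1", "2"))  # 12
```

The inputs are identical in all three calls. Only the rule for interpreting them changes.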
I selected Perl and C as they are perhaps the most canonical examples of a
scripting language and a language which is most definitely not a scripting
language. Both, however, are weakly typed.
Python, on the other hand, is strongly typed. I'm using that term in
the sense that Mark Pilgrim does in Dive
Into Python, namely that once a variable has a data type, it actually
matters.
Many other languages follow this approach: Java, JavaScript, C#, Ruby, etc.
This distinction cuts across the artificial dividing line between scripting and
non-scripting languages. Perhaps one could explore whether compiled vs. interpreted
is the true distinction; or if static typing is the true distinction between the two, but
in either case, I can provide counter examples and at the moment I really don't want to digress too far.
What it does mean is that metadata about arguments is often very helpful.
In statically typed languages, this data is most useful at design time. In
dynamically typed languages, this information is useful at runtime. Either
way, if you want to reliably get the results you want, you need to
call the correct operation with the correct data types and provide information
about those data types. That doesn't mean that some languages can't accept
a wider range of operands, or even that the recipient language cares about
what data type the sender thinks it is sending.
To take a concrete example, COM's IDispatch interface provided this information
at runtime. The result was rather one-sided: clients were expected
to send the information about the types of the data along with the data itself,
and servers were expected to compensate.
Other Common Examples

Strings and numerics tend to be primitives in most modern languages. However, it is worth noting that
WebService usage will likely deviate from
intra-language usage in terms of the common profile of data types for procedure
arguments.
A prime example is XML itself. I have no concrete data to back
this up yet, but my intuition is that passing XML as arguments is something that
will occur with greater frequency with WebServices than you have seen to date
with platform APIs. Passing XML as an argument can be achieved in
multiple ways. One is to recognize that XML is inherently nestable, which
would lead one to the logical conclusion that directly embedding the XML is the
right choice. Other alternatives are to encode the stream as base64, hex,
or even as a string (using entity notation such as &lt;). These
approaches not only bloat the datastream, they tend to obscure the distinction
between the metadata (e.g. element names) and the content.
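To make the comparison concrete, here is a small sketch using only the Python standard library. The `<arg>` wrapper element, the sample payload, and the three function names are assumptions for illustration; they are not taken from SOAP or from any toolkit.

```python
import base64
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical XML document to be passed as an argument.
payload = '<order><item qty="2">widget</item></order>'

def embed_directly(xml_text: str) -> str:
    """Nest the argument as real child elements of the wrapper."""
    arg = ET.Element("arg")
    arg.append(ET.fromstring(xml_text))
    return ET.tostring(arg, encoding="unicode")

def embed_as_string(xml_text: str) -> str:
    """Escape the markup so the argument travels as character data."""
    return "<arg>" + escape(xml_text) + "</arg>"

def embed_as_base64(xml_text: str) -> str:
    """Encode the argument bytes as base64 text."""
    encoded = base64.b64encode(xml_text.encode("utf-8")).decode("ascii")
    return "<arg>" + encoded + "</arg>"

for build in (embed_directly, embed_as_string, embed_as_base64):
    message = build(payload)
    # Only the directly embedded form still exposes <item> as an element.
    item = ET.fromstring(message).find("order/item")
    print(build.__name__, len(message), item is not None)
```

In this sketch the escaped and base64 forms are both longer than the nested form, and a receiver parsing them sees only opaque text where the element names used to be.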
More subtle examples can be found here.
In general, the more knowledge about the intent of the operation the runtime
has, the more accurately and automatically can the desired result be
obtained.
Passing the Time in Perl

Times are often important in network based protocols, yet tend
not to be primitive in most programming languages. In Perl, for example,
times are commonly dealt with as an integral number of seconds since some
reference point (typically January 1, 1970). The runtime knows of it as an
integer. Perl also has another common representation for times,
namely as an array of 7 integers with 0 based months, 1 based days, and 1900
based years. Passing either representation, as is, via a web service would
only be decipherable if the receiving side knew that the sender was using the Perl
conventions, what epoch is used on that machine, and that the intent was for this
data to be interpreted as a date. This is a bit much to expect, and goes
against the goals of platform and language neutrality that many find appealing
with SOAP.
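The amount of unstated knowledge involved shows up as soon as a receiver tries to decode either form. The sketch below plays the receiving side in Python; the function names are hypothetical, the epoch is assumed to be January 1, 1970 UTC, and only the first six broken-down fields are used.

```python
from datetime import datetime, timezone

def from_epoch_seconds(seconds: int) -> datetime:
    """Read an integer as seconds since 1970-01-01T00:00:00Z (assumed epoch)."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)

def from_perl_fields(sec, minute, hour, mday, mon, year) -> datetime:
    """Read Perl-style fields: 0-based month, 1-based day, year minus 1900."""
    return datetime(year + 1900, mon + 1, mday, hour, minute, sec,
                    tzinfo=timezone.utc)

# Two encodings of the same instant. Nothing in the values says "date".
as_integer = 1014395527
as_fields = (7, 32, 16, 22, 1, 102)

print(from_epoch_seconds(as_integer).isoformat())  # 2002-02-22T16:32:07+00:00
print(from_perl_fields(*as_fields).isoformat())    # 2002-02-22T16:32:07+00:00
```

A receiver that skipped the `+ 1` or the `+ 1900` would still get a value, just not the intended one.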
Alternatively, the developer could add the necessary conversions on both
sides. In fact, Perl excels at pulling apart strings and putting them back
together. A typical SOAP encoded date would look like 2002-02-22T16:32:07Z.
One could clearly dissect this with a regular expression such as:
/(\d*)-(\d*)-(\d*)T(\d*):(\d*):(\d*\.?\d*)([+-]\d*)?Z?/.
Or one could realize that much of this work has been done before, and
use HTTP::Date in combination with gmtime to do the work for you. Going
the other way, producing a SOAP encoded date can be done with strftime, thus:
strftime "%Y-%m-%dT%H:%M:%SZ", gmtime($value);
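For comparison, the same hand-written round trip can be sketched in Python. This is an illustration under simplifying assumptions: it accepts only the UTC "Z" form, discards fractional seconds, and the names `parse_soap_date` and `format_soap_date` are made up for this example.

```python
import re
from datetime import datetime, timezone

# Stricter than a general pattern: fixed field widths, UTC ("Z") only.
SOAP_DATE = re.compile(
    r"(\d{4})-(\d{2})-(\d{2})T(\d{2}):(\d{2}):(\d{2})(?:\.\d+)?Z"
)

def parse_soap_date(text: str) -> datetime:
    """Pull the fields apart by hand and rebuild a UTC datetime."""
    match = SOAP_DATE.fullmatch(text)
    if match is None:
        raise ValueError("not a UTC timestamp in the expected form: " + text)
    year, month, day, hour, minute, second = (int(g) for g in match.groups())
    return datetime(year, month, day, hour, minute, second, tzinfo=timezone.utc)

def format_soap_date(moment: datetime) -> str:
    """Go the other way with strftime; expects a timezone-aware datetime."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

stamp = parse_soap_date("2002-02-22T16:32:07Z")
print(format_soap_date(stamp))  # 2002-02-22T16:32:07Z
```

Even this short version has to make decisions about time zones and fractional seconds that every developer writing the conversion by hand would have to repeat.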
All this does is show that the work CAN be done. But done this way, the
process of communicating a simple timestamp in a web service is both tedious and
error prone. Furthermore, it requires the developer to be an expert on the
underlying transport protocol. By contrast, SOAP implementations which are
aware of the intent on both sides of the wire can do all of this conversion
automatically and seamlessly.
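What "aware of the intent" could mean in practice is sketched below: an encoder that picks the wire representation from the runtime type of the value. The function `encode_value` is hypothetical and is not the API of any SOAP toolkit; the type names are the XML Schema built-ins, and the mapping shown is only one plausible choice.

```python
from datetime import datetime, timezone

def encode_value(value):
    """Return an (XML Schema type name, lexical form) pair for a value."""
    if isinstance(value, bool):  # checked first: bool is a subclass of int
        return ("xsd:boolean", "true" if value else "false")
    if isinstance(value, int):
        return ("xsd:int", str(value))
    if isinstance(value, datetime):  # expects a timezone-aware datetime
        utc = value.astimezone(timezone.utc)
        return ("xsd:dateTime", utc.strftime("%Y-%m-%dT%H:%M:%SZ"))
    if isinstance(value, str):
        return ("xsd:string", value)
    raise TypeError("no mapping for " + type(value).__name__)

moment = datetime(2002, 2, 22, 16, 32, 7, tzinfo=timezone.utc)
print(encode_value(moment))      # ('xsd:dateTime', '2002-02-22T16:32:07Z')
print(encode_value(1014395527))  # ('xsd:int', '1014395527')
```

The same instant goes out as a date only when the runtime knows it is a date. Passed as a bare integer, it is sent as a number and the intent is lost.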
Social Engineering

Here's an excerpt from chapter 6 of "Programming Perl":
Languages have different personalities. You can classify computer
languages by how introverted or extroverted they are; for instance Icon and
Lisp are stay-at-home languages, while Tcl and various shells are party
animals. Self-sufficient languages prefer to compete with other
languages, while social languages prefer to cooperate with other
languages. As usual, Perl tries to do both.
This is a distinction that I do find useful. There are languages designed to
produce efficient, reliable, and robust components. These languages tend
to be better for the production of the serving side of web services than the
consumption or client side. Over time emphasis has moved from the first criterion
(efficiency) towards the last criterion (robustness), and accordingly one does
see a movement of programmers from languages like C to languages like Java.
On the other side, there are languages which are more adaptable. They
can be used to produce reusable components, but they can also be used to script or
glue together components of various origins. As a general rule, if you
picture a tree representing components arranged so that they solve a particular
problem, you will typically find social languages at the 'top' and less social
languages at the 'leaves'.
Given this reality, models like COM's IDispatch have things backwards. It is
the adaptable languages that are expected to dictate, and the rigid languages
which are expected to comply.
Conclusion

Until recently, most programming activities were mono-cultures focused around
a language-centric base. A much more successful model is emerging: one in which the service
provider has an opportunity to suggest what data types it would prefer.
This gives the intended consumer an opportunity to adapt if it so chooses.
Ultimately the service provider will decide whether or not it can process the
request as sent.
© Copyright 2002 Sam Ruby. Last update: 9/1/2002; 6:53:32 PM.