Avoid XML Schema Wildcards For Web Service Interfaces

By James Pasley

Developers risk negative side effects when they attempt to make Web services interfaces extensible without understanding the context in which various mechanisms are applied. Given the overuse and misapplication of the HTML example, developers often litter their interfaces with XML Schema wildcards. This increases complexity and results in ambiguous interface definitions. A more appropriate versioning strategy for Web services development can help developers avoid these problems.

When designing Web Service interfaces, developers quite rightly want to define them so as to minimize the effort involved in rolling out new versions. This desire naturally leads to another desire: to make Web Service interfaces—and XML Schemas, in particular—extensible.
Much has been written on versioning and extensibility, and it would seem easiest to simply copy the approach of those who’ve previously succeeded in this venture. Standards bodies such as the World Wide Web Consortium (W3C) and the Organization for the Advancement of Structured Information Standards (OASIS) have defined numerous XML vocabularies described by XML Schemas that include extensibility features. These approaches rely heavily on the XML Schema wildcard constructs xsd:any and xsd:anyAttribute, which allow an XML Schema to remain unchanged as data formats evolve.
However, simply copying these approaches without understanding their underlying reasoning and application context won’t produce the desired results. In fact, it might result in many undesirable side effects, such as an increase in development-process complexity. Here, I explore the negative side effects of using xsd:any and xsd:anyAttribute in Web Service interfaces. I also discuss why standards bodies use them and in what context. Finally, I offer a more appropriate versioning strategy for typical Web Services, along with a solution that supports the strategy.

Extensibility—At a Price

When developers simply follow the example set by standards bodies when defining their own Web service interfaces, several problems can arise. To illustrate these problems, I offer the following example. Figure 1 shows an XML Schema in which the Charges element is extensible in two ways: you can add child elements or attributes that are not defined within the XML Schema. The xsd:any and xsd:anyAttribute wildcards, respectively, make this possible. Figure 2 shows a simple message that conforms to the XML Schema.
By using xsd:any, you can add new data to Figure 2’s message without modifying the XML Schema. To do this, you create a new XML Schema to define the new element (see Figure 3). This new schema uses a different targetNamespace to distinguish it from the original one. You can now add the new element into the message, as Figure 4 shows.

<xsd:schema 
  targetNamespace="http://example.org/extensible.schema"
  xmlns:xsd="http://www.w3.org/2001/XMLSchema"> 
  <xsd:element name="Charges"> 
    <xsd:complexType> 
      <xsd:sequence> 
        <xsd:element name="Goods" type="xsd:decimal"/> 
        <xsd:element name="Shipping" type="xsd:decimal"/> 
        <xsd:element name="Extension" minOccurs="0"> 
          <xsd:complexType> 
            <xsd:sequence> 
              <xsd:any namespace="##other" processContents="lax" maxOccurs="unbounded"/> 
            </xsd:sequence> 
          </xsd:complexType> 
        </xsd:element> 
      </xsd:sequence> 
      <xsd:anyAttribute namespace="##other"/> 
    </xsd:complexType> 
  </xsd:element> 
  <xsd:attribute name="mustUnderstand" type="xsd:boolean" /> 
</xsd:schema> 

Figure 1. An extensible data structure expressed in XML Schema. The xsd:any and xsd:anyAttribute wildcards make it possible to add additional data into messages conforming to this XML Schema.

<tns:Charges xmlns:tns="http://example.org/extensible.schema"> 
  <Goods>130.39</Goods> 
  <Shipping>12.00</Shipping> 
</tns:Charges> 

Figure 2. An XML message defined by Figure 1’s XML Schema. This message includes only elements explicitly defined within the original XML Schema.

<xsd:schema
  targetNamespace="http://example.org/new.data" 
  xmlns:xsd="http://www.w3.org/2001/XMLSchema"> 
  <xsd:element name="Discount" type="xsd:decimal"/> 
</xsd:schema> 

Figure 3. An XML Schema describing new data. This schema defines elements not considered when the original XML Schema was designed, creating Figure 4’s new message.

<tns:Charges
  xmlns:tns="http://example.org/extensible.schema" 
  xmlns:xsd1="http://example.org/new.data"> 
  <Goods>130.39</Goods> 
  <Shipping>12.00</Shipping> 
  <Extension> 
    <xsd1:Discount tns:mustUnderstand="true">5.00</xsd1:Discount> 
  </Extension> 
</tns:Charges> 

Figure 4. The XML message with the added data. Although the original XML Schema remains unmodified, developers can include new data that will not be rejected when the message is validated against the original XML Schema.

The Costs of XML Schema Wildcards

So, what have we achieved? Yes, we’ve added new data into the message without modifying the original XML Schema. However, using the XML Schema wildcard xsd:any means that recipients can ignore new data; in this case—and many cases—the new data is actually important. This forces us to adopt another common convention: a mustUnderstand flag. This lets the message sender indicate that the recipient is not allowed to ignore the new data.

Using such an extensibility mechanism is not without costs, particularly when applying the mechanism to many data structures across several applications. Those costs can be significant and include increased complexity, handwritten validation, and vague interface contracts.

Increased Complexity. Our simple example element with two values now contains several other definitions, increasing the data structures’ complexity. In addition to declaring the new value within a separate XML Schema, we had to construct the new messages from elements that both schemas define. Further, to identify which elements are associated with which XML Schema, we had to use XML namespaces.

How far should we take this approach? When we need a subsequent change, will we have to introduce a third XML Schema? If so, should the new schema’s elements be extensible as well? Obviously, the complexity can keep increasing.

Handwritten Validation. Because the mustUnderstand semantics aren’t part of the XML Schema specification, validating XML parsers won’t enforce them. As a result, we have to implement the semantics in each application that uses the format. Further, we must test each application to ensure it behaves correctly when it receives new data that it can ignore, as well as new data it must reject because it doesn’t understand it.
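
To give a feel for the hand-written logic this implies, here is a minimal Python sketch of a mustUnderstand check for messages shaped like Figure 4. It is an illustration under stated assumptions rather than a reference implementation: the function name check_extensions and its understood parameter are hypothetical, and the message is assumed to use the targetNamespace declared in Figure 1.

```python
import xml.etree.ElementTree as ET

# Namespace of the extensible schema in Figure 1, which declares mustUnderstand.
BASE_NS = "http://example.org/extensible.schema"
MUST_UNDERSTAND = f"{{{BASE_NS}}}mustUnderstand"


def check_extensions(message_xml, understood=frozenset()):
    """Return the extension elements that may be ignored.

    `understood` is a hypothetical set of qualified element names that this
    application knows how to process. A ValueError is raised when an unknown
    extension is flagged mustUnderstand.
    """
    root = ET.fromstring(message_xml)
    ignored = []
    extension = root.find("Extension")  # unqualified, as in Figure 4
    if extension is None:
        return ignored
    for child in extension:
        if child.tag in understood:
            continue
        # XML Schema booleans may be written as "true" or "1".
        if child.get(MUST_UNDERSTAND, "false").strip() in ("true", "1"):
            raise ValueError(f"cannot process required extension {child.tag}")
        ignored.append(child.tag)
    return ignored
```

Every application that consumes the format would need its own equivalent of this check, together with tests for both the ignore path and the reject path.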

Vague Interface Contract. The XML Schema is no longer a precise definition of the application’s interface. The application supports the schema’s defined data set, plus some set of extensions. However, there’s no standard way to express which extensions it supports. In our example, there’s nothing to indicate that the new Discount element should be used in conjunction with the existing Charges element. That the (original) XML Schema validates the message is no longer a good indication that the message will be processed. In a sense, an XML Schema is a contract between the message sender and recipient. The extension mechanisms make this contract vague. As any lawyer will tell you, vague contracts lead to problems.

Ease of Use? Maybe for Gurus

Once introduced into an XML Schema, the way developers should use xsd:any is the subject of much debate. Even in a simple example like ours, there are several variations on the use of the namespace and processContents attributes. Developers also debate whether the xsd:any itself should be placed inside an extension element (as in our case), or simply included directly after the other elements in a sequence.

Wildcards are one of XML Schema’s advanced features, and their use raises the bar on who can author or edit them. For example, any authoring approach that requires a precise understanding of XML Schema’s determinism constraint eliminates a significant number of people. (A good description of the determinism constraint can be found in Extending and Versioning XML Languages with XML Schema,1 which advocates XML Schema wildcard use; a description of xsd:any appears toward the end of the XML Schema Primer’s third "Advanced Concepts" section.2 A perusal of either document should be warning enough as to wildcard ease of use.)

The (Not So) "Compelling" HTML Comparison

HTML is the most often quoted example of a successful extensible language. In an attempt to emulate HTML’s success in preserving compatibility, developers add wildcards to XML Schema. Before seeking to emulate HTML, however, it’s important to first understand how the way in which HTML is used differs from how Web services are used.

HTML Compatibility Requirements. There are two kinds of compatibility: backward and forward. In backward compatibility, a new application correctly processes a message that uses an old format. In forward compatibility, an old application correctly processes a message that uses a new format. HTML has been highly successful in supporting forward compatibility. In XML Schema, wildcards support forward compatibility, but aren’t needed for backward compatibility.

Once we introduce the concept of forward compatibility, we must address the issue of processing semantics. What happens when an old application receives a new message containing data that it doesn’t understand? HTML has a well-defined processing semantic, which we can summarize as follows: Any unrecognized tags are to be ignored. However, the application should attempt to process the tags’ contents as normal. The success of this “must ignore container” rule1 played a significant role in the Web’s success, ensuring that developers could introduce new features without breaking compatibility with existing Web browsers.

So, can we really use this as a precedent for Web services? The HTML scenario has two very specific factors. First, the HTML tags are typically associated with the text’s presentation layer formatting. As a result, the tags have limited impact on the target content’s semantics. For example, although <b> indicates that the subsequent text is to be rendered using a bold font, an unrecognized bold tag has limited effect. Second, HTML’s content is targeted at humans, who are relatively adaptable. When a Web browser doesn’t understand tags to format a table’s text, users might struggle to make sense of the text, but they’ll probably succeed. (The experience might also motivate them to upgrade their Web browser.)

We can’t expect such sophisticated behavior of applications that aren’t updated to handle a new data structure. When they receive new data, application behavior is typically limited to one of two possibilities: they’ll either ignore the new data or reject the entire message. Developers typically control this behavior using a mustUnderstand flag, which lets message senders specify their preferences.

Who Controls the Schema? Finally, we must consider how modifications to an XML Schema are managed. The upgrade cycle can be either engaged or disengaged.3 In an engaged upgrade cycle, users and developers are in contact, and users thereby receive regular upgrades. In a disengaged scenario, old XML Schemas might remain in use for a long time. Like most XML standards, HTML operates in disengaged scenarios in which the XML Schema owners (the standards bodies themselves) are often not consulted when people extend their data formats. Consequently, they must rely heavily on XML Schema extensibility mechanisms to support forward compatibility. Although these mechanisms are costly, the price is repaid over the standard’s lifetime. This is obviously a very different scenario than when a single organization uses XML Schema to describe its services and retains complete control over schema modification.

An Alternative Versioning Strategy

To select a versioning strategy, developers must first clearly state the context in which the XML Schemas will be used. Next, they must define the Web services’ required behavior given different message-format versions. Finally, they must describe the mechanism by which this behavior will be achieved. Together, these steps yield authoring conventions for the XML Schema and a statement of which aspects of the strategy the validating XML parser must enforce and which the application must implement.

Defining the Strategy Context

For our example strategy, a single person will own the XML Schemas and actively manage them to incorporate new requirements. The owner will deliver new XML Schema versions to users and handle user requests for updates in a timely manner.

This single, active XML Schema owner seeking to satisfy a single organization’s needs is the key differentiator here. In contrast, HTML evolved as multiple competing implementations that were extended in various proprietary ways before converging on standard mechanisms.4

Naturally, there’s a range of possibilities between these two positions. When choosing your own versioning strategy, you should carefully consider your place on the spectrum. Fortunately, the W3C Technical Architecture Group’s Versioning XML Languages5 offers useful background reading that can help you correctly assess your position. It also highlights why these issues matter and presents many best practices. In XML Schema Versioning Use Cases,6 the authors attempt to present all possible versioning strategies for XML documents. Both documents are ambitious; at times this makes for difficult reading. Because a standards body produced them, the documents cover all use cases to ensure that future specifications will solve the appropriate problems. As part of this work, the Web-Services Description Language (WSDL) working group might add processing flags to automatically support must ignore rules within validating XML parsers. This will reduce the cost and complexity of implementing such rules once future standards are available. Until then, however, users must deal with the complexities of wildcard use to resolve XML Schemas’ forward compatibility problems.

Required Behavior

The proposed strategy’s goal is to define a Web Service’s desired behavior when it receives a message created using a different XML Schema version. We’ll therefore define the following behavior:

  • When an application using the new XML Schema receives an old document, it should process it successfully.
  • When an application using the old XML Schema receives a new document, it should reject the document if it contains any unrecognized elements.

In the first case, we achieve backward compatibility by modifying the XML Schemas only in such a way as to ensure that any message validated against the original XML Schema is still valid according to the new version. In the second case, we decide not to attempt forward compatibility; applications that must process new data will be updated accordingly. Until the updates occur, we avoid the use of wildcards and thus ensure that new data will not be ignored. The validating XML parser will automatically reject new messages inadvertently sent to old applications.
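
The two required behaviors can be sketched in a few lines of Python. This is only a simplified stand-in for a validating XML parser: the content models below are hand-written approximations of the Charges sequences in Figures 5 and 6, the accepts function is a hypothetical name, and the check covers nothing beyond a flat sequence of distinctly named children.

```python
import xml.etree.ElementTree as ET

# Simplified stand-ins for the Charges content models in Figures 5 and 6:
# (child name, minOccurs, maxOccurs), listed in sequence order.
SCHEMA_1_00 = [("Goods", 1, 1), ("Shipping", 1, 1)]
SCHEMA_1_01 = SCHEMA_1_00 + [("Discount", 0, 1)]


def accepts(content_model, message_xml):
    """Return True if the root element's children match the sequence exactly."""
    children = [child.tag for child in ET.fromstring(message_xml)]
    pos = 0
    for name, min_occurs, max_occurs in content_model:
        count = 0
        while pos < len(children) and children[pos] == name and count < max_occurs:
            pos += 1
            count += 1
        if count < min_occurs:
            return False
    # Anything left over is an element the schema does not declare.
    return pos == len(children)


OLD_MESSAGE = (
    '<tns:Charges xmlns:tns="http://example.org/simple.schema">'
    "<Goods>130.39</Goods><Shipping>12.00</Shipping></tns:Charges>"
)
NEW_MESSAGE = OLD_MESSAGE.replace(
    "</tns:Charges>", "<Discount>5.00</Discount></tns:Charges>"
)

print(accepts(SCHEMA_1_01, OLD_MESSAGE))  # new application, old document: True
print(accepts(SCHEMA_1_00, NEW_MESSAGE))  # old application, new document: False
```

Because no wildcard is present, the leftover Discount element is what causes the old content model to reject the new message.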

Additional Considerations

The proposed versioning strategy also extends to Web services that use the XML Schema. A new version of a Web service (using an updated XML Schema) will coexist with the older version for some time, rather than replacing it. An application can discover a service’s XML Schema version by querying the Web service itself using the ?wsdl convention or WS-MetadataExchange7 to retrieve the WSDL and XML Schemas. This service coexistence lets clients migrate to new service versions, either on their own or when their service provider migrates them using message routing. In any case, because XML Schemas are backward compatible, the migration should be straightforward.

Including a version number in request messages is helpful for message routing. Backward compatible services can also use the version number to ensure that they don’t send data back to an incompatible client.
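
As a rough illustration of version-based routing, the following Python sketch picks a service endpoint from a version number carried in the request. Several details are assumptions that the article does not specify: the version is read from a hypothetical version attribute on the message’s root element (which the schema would have to declare), the endpoint URLs are invented, and a request is routed to the oldest deployed service whose schema version is at least the request’s.

```python
import xml.etree.ElementTree as ET

# Hypothetical endpoints, keyed by the schema version each deployed service uses.
SERVICES = {
    (1, 0): "http://example.org/charges/v1.00",
    (1, 1): "http://example.org/charges/v1.01",
}


def parse_version(text):
    """Turn a version string such as "1.01" into a comparable tuple (1, 1)."""
    major, minor = text.split(".")
    return int(major), int(minor)


def route(message_xml, services=SERVICES):
    """Pick the oldest deployed service that can process the message's version."""
    root = ET.fromstring(message_xml)
    # Assumption: the sender records the schema version in a "version"
    # attribute on the root element; messages without it count as 1.00.
    requested = parse_version(root.get("version", "1.00"))
    # Backward compatible services can handle messages from older versions.
    candidates = [version for version in services if version >= requested]
    if not candidates:
        raise LookupError(f"no service supports schema version {requested}")
    return services[min(candidates)]
```

A service could apply the same comparison before replying, leaving out data that the client’s schema version does not declare.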

Implementing the Strategy

My proposed versioning strategy requires that XML Schemas be backward compatible, which is easy to accomplish given XML’s inherent extensibility. When new data is introduced, we update the XML Schema to describe it. We then update new Web services versions to use the new XML Schema. Just as plain old Java objects (POJOs) have gained new respectability in the light of Enterprise Java Beans’ complexity, simple XML Schemas deserve similar treatment in the light of the complexities introduced by wildcards. Figure 5 shows the original XML Schema stripped of all its wildcards.

When it’s time to add a new value to the message format, we update the original XML Schema itself. Figure 6 shows the updated XML Schema, in which the new element appears just as any other. To ensure backward compatibility, I marked the element as optional (minOccurs="0"). Figure 7 shows an example of a new message, which is quite simple in comparison to that in Figure 4. As the figures show, my approach keeps XML Schemas simple and ensures that modifications are backward compatible.

<xsd:schema
  targetNamespace="http://example.org/simple.schema" 
  version="1.00" 
  xmlns:xsd="http://www.w3.org/2001/XMLSchema"> 
  <xsd:element name="Charges">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="Goods" type="xsd:decimal"/>
        <xsd:element name="Shipping" type="xsd:decimal"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>

Figure 5. A simple XML Schema sans wildcards.

<xsd:schema
  targetNamespace="http://example.org/simple.schema"
  version="1.01"
  xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="Charges">
    <xsd:complexType>
      <xsd:sequence> 
        <xsd:element name="Goods" type="xsd:decimal"/> 
        <xsd:element name="Shipping" type="xsd:decimal"/> 
        <xsd:element name="Discount" type="xsd:decimal" minOccurs="0" />
      </xsd:sequence> 
    </xsd:complexType> 
  </xsd:element> 
</xsd:schema>

Figure 6. The XML Schema, modified with a new element.

<tns:Charges xmlns:tns="http://example.org/simple.schema"> 
  <Goods>130.39</Goods> 
  <Shipping>12.00</Shipping> 
  <Discount>5.00</Discount> 
</tns:Charges> 

Figure 7. A sample XML message as described by the modified XML Schema.

Simple Rules for Modifying XML Schema

The versioning approach I advocate requires that XML Schemas be modified in a backward compatible way. So, let’s look at several ways in which we can modify XML Schemas while preserving backward compatibility.

  • Add optional data. As I described earlier, we can add new data as optional elements or attributes. Of course, as Figure 8 shows, the descendants of these new elements need not be optional.
  • Make existing data optional. In Figure 9, I’ve added minOccurs="0" to the Shipping element. Because the minOccurs default value is 1, we could more generally state this rule as, "reduce the value of minOccurs."
  • Increase the value of maxOccurs. To accomplish this, we can update existing elements to let them occur more than once (see Figure 10). If the element can already occur multiple times, we can increase the maxOccurs value.
  • Provide choices. Let’s say that we need to replace an existing element with a new one (perhaps we’ve changed its semantics and want to rename the element to reflect this). When developers discuss changes to semantics, they often recommend changing the XML Schema’s targetNamespace to deliberately break the message format’s backward compatibility. That approach can be overkill, however. As Figure 11 shows, we can express the semantic change by introducing a new element and placing it and the existing element in a choice content model.
  • Use the version attribute. To make an XML Schema’s version easy to determine, we must include version information in the schema itself and update it on every change, no matter how minor. XML Schema lets users specify a version attribute on the schema’s root element.

Several other modifications will also let us accomplish backward-compatible updates to an XML Schema, such as using substitution groups or deriving new types from existing complex types. These mechanisms can be useful, but you should always use the simplest possible mechanism to extend the XML Schema.
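
The first three rules lend themselves to a mechanical check. The Python sketch below compares the occurrence constraints of two schema versions and reports changes that could break backward compatibility. It is deliberately limited and rests on assumptions: the function names are hypothetical, only the flat sequence of the first top-level element is inspected, element order is not checked, and a choice content model is not understood, so a change like the one in Figure 11 would be flagged conservatively.

```python
import xml.etree.ElementTree as ET

XSD = "{http://www.w3.org/2001/XMLSchema}"


def occurrence_bounds(schema_xml):
    """Map each element declared directly in the first top-level element's
    sequence to its (minOccurs, maxOccurs) pair."""
    root = ET.fromstring(schema_xml)
    sequence = root.find(f"{XSD}element/{XSD}complexType/{XSD}sequence")
    bounds = {}
    if sequence is None:
        return bounds
    for decl in sequence.findall(f"{XSD}element"):
        low = int(decl.get("minOccurs", "1"))
        high = decl.get("maxOccurs", "1")
        high = float("inf") if high == "unbounded" else int(high)
        bounds[decl.get("name")] = (low, high)
    return bounds


def compatibility_problems(old_xsd, new_xsd):
    """List the ways new_xsd could reject a message that old_xsd accepts."""
    old, new = occurrence_bounds(old_xsd), occurrence_bounds(new_xsd)
    problems = []
    for name, (old_min, old_max) in old.items():
        if name not in new:
            problems.append(f"{name}: removed from the sequence")
            continue
        new_min, new_max = new[name]
        if new_min > old_min:
            problems.append(f"{name}: minOccurs raised")
        if new_max < old_max:
            problems.append(f"{name}: maxOccurs lowered")
    for name, (new_min, _) in new.items():
        if name not in old and new_min > 0:
            problems.append(f"{name}: new element is not optional")
    return problems


# Compact equivalents of the Charges schemas in Figures 5 and 6.
HEAD = ('<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">'
        '<xsd:element name="Charges"><xsd:complexType><xsd:sequence>'
        '<xsd:element name="Goods" type="xsd:decimal"/>'
        '<xsd:element name="Shipping" type="xsd:decimal"/>')
TAIL = "</xsd:sequence></xsd:complexType></xsd:element></xsd:schema>"
V1_00 = HEAD + TAIL
V1_01 = HEAD + '<xsd:element name="Discount" type="xsd:decimal" minOccurs="0"/>' + TAIL

print(compatibility_problems(V1_00, V1_01))  # []
```

A check along these lines could run whenever the schema owner publishes a new version, so that an accidental required element is caught before deployment.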

<xsd:schema
  targetNamespace="http://example.org/simple.schema" 
  version="1.01" 
  xmlns:xsd="http://www.w3.org/2001/XMLSchema"> 
  <xsd:element name="Charges"> 
    <xsd:complexType> 
      <xsd:sequence> 
        <xsd:element name="Goods" type="xsd:decimal"/> 
        <xsd:element name="Shipping" type="xsd:decimal"/> 
        <xsd:element name="ComplexDiscount" minOccurs="0" > 
          <xsd:complexType> 
            <xsd:sequence> 
              <xsd:element name="Code" type="xsd:string"/> 
              <xsd:element name="Amount" type="xsd:decimal"/> 
            </xsd:sequence> 
          </xsd:complexType> 
        </xsd:element> 
      </xsd:sequence> 
    </xsd:complexType> 
  </xsd:element> 
</xsd:schema> 

Figure 8. The XML Schema with an optional complex element.

<xsd:schema
  targetNamespace="http://example.org/simple.schema" 
  version="1.01" 
  xmlns:xsd="http://www.w3.org/2001/XMLSchema"> 
  <xsd:element name="Charges"> 
    <xsd:complexType> 
      <xsd:sequence> 
        <xsd:element name="Goods" type="xsd:decimal"/> 
        <xsd:element name="Shipping" type="xsd:decimal" minOccurs="0"/> 
      </xsd:sequence> 
    </xsd:complexType> 
  </xsd:element> 
</xsd:schema> 

Figure 9. The XML Schema modified to make existing data optional.

<xsd:schema 
  targetNamespace="http://example.org/simple.schema" 
  version="1.01"
  xmlns:xsd="http://www.w3.org/2001/XMLSchema"> 
  <xsd:element name="Charges">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="Goods" type="xsd:decimal"/>
        <xsd:element name="Shipping" type="xsd:decimal" maxOccurs="unbounded"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>

Figure 10. The XML Schema modified to let the Shipping element occur multiple times.

<xsd:schema
  targetNamespace="http://example.org/simple.schema"
  version="1.01"
  xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="Charges">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="Goods" type="xsd:decimal"/>
        <xsd:choice>
          <xsd:element name="Shipping" type="xsd:decimal"/>
          <xsd:element name="InternationalShipping" type="xsd:decimal"/>
        </xsd:choice>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>

Figure 11. The XML Schema modified to introduce a choice content model.

A spectrum of measures exists for effectively managing message format versioning and extension. Using XML Schema wildcards represents one end of that spectrum and is useful when the schema’s authors can’t maintain control over the ways in which it’s extended. The versioning option I describe here uses mechanisms that are more appropriate for individual companies seeking to actively manage and control their data formats.

When you avoid wildcards, XML Schemas offer a well-defined description of the Web service’s interface. Reviewing the schema for a given Web service version provides a complete description of the messages it can process. The schema also protects the application from receiving data it doesn’t understand, automatically enforcing the mustUnderstand semantics without requiring that subsequent logic be built into the application.

Managing change in a distributed environment such as Web services is a significant undertaking. Developers must plan for and actively manage the introduction of new versions, and keep them to a minimum. In some cases, the tool support available has been influenced by existing versioning strategies and assumes that once XML Schemas are created, they won’t be changed. This might cause minor difficulties when following the strategy I’m advocating. In any case, selecting a new versioning strategy requires significant planning and discipline to ensure that developers will actually implement the new practices.

Acknowledgments

I thank Pete Hendry, John Maughan, John O’Shea, Adrian Pasciuta, and Fergal Somers for their comments on this article.

References

  1. D. Orchard, "Extending and Versioning XML Languages with XML Schema," Proc. XML 2004, Int’l Digital Enterprise Alliance, 2004; http://www.idealliance.org/proceedings/xml04/abstracts/paper248.html.
  2. D.C. Fallside and P. Walmsley, XML Schema Part 0: Primer Second Edition, World Wide Web Consortium (W3C) recommendation, 28 Oct. 2004; http://www.w3.org/TR/xmlschema-0/.
  3. H.S. Thompson, "Versioning Made Easy with W3C XML Schema and Pipelines," discussion notes, World Wide Web Consortium (W3C) XML Schema working group, Apr. 2004; http://www.markup.co.uk/XMLEu2004/.
  4. T. Berners-Lee, "Evolvability," keynote speech, Mar. 1998; http://www.w3.org/DesignIssues/Evolution.html.
  5. N. Walsh and D. Orchard, Versioning XML Languages Proposed TAG Finding, World Wide Web Consortium (W3C) editorial draft, 16 Nov. 2003; http://www.w3.org/2001/tag/doc/versioning.html.
  6. H. Sue, "XML Schema Versioning Use Cases," World Wide Web Consortium (W3C), draft discussion document, 31 Jan. 2006; http://www.w3.org/XML/2005/xsd-versioning-use-cases/.
  7. K. Ballinger et al., Web Services Metadata Exchange, Microsoft Research, initial public draft release, Sept. 2004; http://download.boulder.ibm.com/ibmdl/pub/software/dw/specs/ws-mex/metadataexchange.pdf.

This article was originally published by the IEEE Computer Society in the May/June 2006 edition of IEEE Internet Computing. © 2006 IEEE.

