XML Schema Doesn’t Need Inheritance

Got a chance to talk to Dare today about the XML vs. Objects stuff we’ve been blogging on. We started by talking about the convergence of objects, databases and XML. He mentioned a chapter of Tim Neward’s Effective Enterprise Java book where Ted recommends designing the data first. Ted started Effective Enterprise Java 2 days before Patrick was born so I haven’t been reading it. The issue with designing the data first is that typically, a developer is predisposed towards one of the three poles of data design (XML, objects or relational DB) that will color that design unintentionally.

I realized I still lean towards OO when Dare pointed out the fact that XML (actually, I should say XSD) doesn’t really need derivation. Because of my OO background, it took me a while to digest that concept. But since XML is just data without behavior, it doesn’t need polymorphism the way that objects do. Consider the following schema:

<xs:complexType name="ctAddress">
  <xs:sequence>
    <xs:element name="Street" type="xs:string" maxOccurs="1" minOccurs="1" />
    <xs:element name="City" type="xs:string" maxOccurs="1" minOccurs="1" />
    <xs:element name="State" type="xs:string" maxOccurs="1" minOccurs="1" />
    <xs:element name="ZipCode" type="xs:string" maxOccurs="1" minOccurs="1" />
    <xs:any namespace="##targetNamespace" />
  </xs:sequence>
</xs:complexType>

This is version one of the address complex type schema for some arbitrary web service. Over time, we realize that we want to be more global in our addressing schema, so we want to add a country element. Since not all of the other services we interact with will be updating to the new schema, we need to make country an option element (i.e. minOccurs=”0″). While we could use schema inheritance to do this, we could also just duplicate the existing elements and add a country element:

<xs:complexType name="ctAddress">
  <xs:sequence>
    <xs:element name="Street" type="xs:string" maxOccurs="1" minOccurs="1" />
    <xs:element name="City" type="xs:string" maxOccurs="1" minOccurs="1" />
    <xs:element name="State" type="xs:string" maxOccurs="1" minOccurs="1" />
    <xs:element name="ZipCode" type="xs:string" maxOccurs="1" minOccurs="1" />
    <xs:element name="Country" type="xs:string" maxOccurs="1" minOccurs="0" />
    <xs:any namespace="##targetNamespace" />
  </xs:sequence>
</xs:complexType>

What’s interesting is that even though these two schema types are not related by inheritance, I can still validate XML addresses against either schemas. In fact, XML addresses with and without addresses validate against both schemas! That’s difficult to model in a typing system where objects have a bound to a single specific type. Other types of changes can be introduced so that break validation to one of the schema but not the other. For example, if we changed the minOccurs of ZipCode to zero, all messages that validate to the first schema would also validate to the second, but the reverse would not always be true. This is like a IsA relationship in OO, but in the wrong direction (a base message “is a” derived message, but a derived message is not always a base message).

The upshot of all this is that I think my argument against XML Serialization as a general concept is strengthened. While it does work in many scenarios, I can easily build XML messages and XSD schemas that don’t cleanly conform to an OO typing system. Since the flexibility in XML is critical (that’s what I’m using for loosely coupled public interop interfaces), I know I don’t want my schema design to be constrained to the limited set of scenarios that are supported by XML Serialization.