Capt. Horatio T.P. Webb
The DTD
Parks -- FALL 2000
Last Updated 11AM 8/29/2000

An XML file contains nested tags (the markup) and the content (the data between the tags). An XML file may also contain an optional DTD (Document Type Definition) that specifies requirements for the tags and the XML structure. When a DTD is used, its definitions appear before the content, like this:

<?xml version="1.0"?>
<!DOCTYPE name-of-document-type [
 .
 .
 . Element definitions fit inside these square brackets for this type of document
 .
 . ]>
 
Following the DTD are the XML tags and data. The first and last tags are:  
<name-of-document-type>
.
.
. the detailed XML tags and data are contained within this outside tag pair
.
</name-of-document-type>

These two "outside" tags thus contain the whole collection of XML tags and data, just as the <BODY></BODY> tag pair contains the visible content of an HTML page.

Tag names ARE CASE SENSITIVE. <FRED> and <fred> are different tags.
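
The case-sensitivity rule can be checked with any XML parser. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the helper name is_well_formed is made up for this illustration and is not part of any library.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts text as well-formed XML.

    Hypothetical helper for this example only.
    """
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# The start and end tags match exactly, so the parser accepts this.
print(is_well_formed("<fred>hello</fred>"))

# <FRED> is not closed by </fred>: the names differ in case.
print(is_well_formed("<FRED>hello</fred>"))
```

The second call reports a mismatched tag because the parser treats FRED and fred as two unrelated names.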

The nested element definitions then appear in order between the square brackets of the DTD. For each element of the name-of-document-type, its definition in the DTD will appear as:

<!ELEMENT name-of-element ( list-of-elements or data definition) >

The list-of-elements names the elements (i.e., tags) that may appear inside the element and specifies the requirements for each using these symbols:

  1. , (a comma between names means the elements must appear in that strict order)
  2. ? (after a name: the element is optional)
  3. + (after a name: the element appears one or more times)
  4. * (after a name: the element appears zero or more times)
  5. | (between names: select one of the elements)
  6. ( ) (groups elements together)
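
These symbols behave much like regular-expression operators, which suggests a rough way to experiment with them. The sketch below is an illustration only, not how a real validating parser works: it turns a content model into a Python regular expression and tests lists of child element names against it. The function names and the sample model (name, nickname, phone, note, email, fax) are invented for this example, and #PCDATA is not handled.

```python
import re


def content_model_to_regex(model):
    """Translate a DTD list-of-elements into a compiled regular expression.

    Each element name becomes a group matching "name;". The symbols
    ? + * | ( ) already mean the same thing in a regular expression,
    and a comma (strict order) is plain concatenation, so it is dropped.
    """
    pattern = re.sub(
        r"[A-Za-z_][\w.\-]*",
        lambda m: "(?:" + re.escape(m.group(0)) + ";)",
        model,
    )
    return re.compile(re.sub(r"[\s,]", "", pattern))


def children_match(child_names, model):
    """Return True if the sequence of child names satisfies the model."""
    joined = "".join(name + ";" for name in child_names)
    return content_model_to_regex(model).fullmatch(joined) is not None


# Hypothetical content model that uses every symbol in the list above.
MODEL = "(name, nickname?, phone+, note*, (email | fax))"

print(children_match(["name", "phone", "email"], MODEL))         # minimal valid set
print(children_match(["phone", "name", "email"], MODEL))         # wrong order
print(children_match(["name", "email"], MODEL))                  # phone+ needs one phone
print(children_match(["name", "phone", "email", "fax"], MODEL))  # | allows only one
```

Only the first call succeeds; each of the others breaks one of the rules noted in its comment.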

For example, a purchase-order might contain:

  1. buyer-name
  2. address
  3. city
  4. state
  5. zip
  6. then multiple order-line entries, each with:

    1. product
    2. quantity
    3. price

The DTD would then begin like this:

<?xml version="1.0"?>
<!DOCTYPE purchase-order [
<!ELEMENT purchase-order (buyer-name, address+, city, state, zip, order-line+) >
.
.
. ]>

Thus the purchase-order document has purchase-order as its root element. Each purchase-order has: one buyer-name tag pair, one or more address tag pairs, one each of the city, state and zip tag pairs, and one or more order-line tag pairs. Each order-line has a set of product, quantity and price tags. Elements that hold only text are declared as:

<!ELEMENT name-of-element (#PCDATA)>

(#PCDATA) means that the content of the ELEMENT (i.e., the value between the tag pairs) is parsed character data. PCDATA cannot contain the literal characters "<" or "&", and ">" is normally escaped as well. To include these characters as data use "&lt;" for <, "&gt;" for >, and "&amp;" for &. Alternatively, you can wrap the data in a CDATA section, <![CDATA[ ... ]]>, which is unparsed character data where the characters <, >, and & are allowed.
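
To see the escaping rules in action, here is a small sketch using Python's standard library (xml.sax.saxutils.escape and xml.etree.ElementTree). The sample text "Shirts & Ties <blue>" is made up for the illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = "Shirts & Ties <blue>"  # made-up data containing &, < and >

# escape() replaces &, < and > with &amp;, &lt; and &gt;.
escaped = escape(raw)
print(escaped)  # Shirts &amp; Ties &lt;blue&gt;

# When the element is parsed, the entities turn back into the characters.
product = ET.fromstring("<product>" + escaped + "</product>")
print(product.text == raw)

# A CDATA section carries the same characters without any escaping.
cdata = ET.fromstring("<product><![CDATA[Shirts & Ties <blue>]]></product>")
print(cdata.text == raw)
```

Either way the program reading the file sees the same text; the two forms differ only in how the characters are written in the file.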

The full DTD would be:

 
<?xml version="1.0"?>
<!DOCTYPE purchase-order [
<!ELEMENT purchase-order (buyer-name, address+, city, state, zip, order-line+) >
<!ELEMENT buyer-name (#PCDATA) >
<!ELEMENT address (#PCDATA) >
<!ELEMENT city (#PCDATA) >
<!ELEMENT state (#PCDATA) >
<!ELEMENT zip (#PCDATA) >
<!ELEMENT order-line ( product, quantity, price) >
<!ELEMENT product (#PCDATA) >
<!ELEMENT quantity (#PCDATA) >
<!ELEMENT price (#PCDATA) >
]>
 

Then adding some sample content data, the full xml file would be:

 
<?xml version="1.0"?>
<!DOCTYPE purchase-order [
<!ELEMENT purchase-order (buyer-name, address+, city, state, zip, order-line+) >
<!ELEMENT buyer-name (#PCDATA) >
<!ELEMENT address (#PCDATA) >
<!ELEMENT city (#PCDATA) >
<!ELEMENT state (#PCDATA) >
<!ELEMENT zip (#PCDATA) >
<!ELEMENT order-line ( product, quantity, price) >
<!ELEMENT product (#PCDATA) >
<!ELEMENT quantity (#PCDATA) >
<!ELEMENT price (#PCDATA) >
]>
<purchase-order>
 <buyer-name>Michael S. Parks</buyer-name>
 <address>4099 Bayview Street</address>
 <address>Apartment 5</address>
 <city>Houston</city>
 <state>TX</state>
 <zip>77001</zip>
 <order-line>
   <product>Wool Sweater</product>
   <quantity>2</quantity>
   <price>49.95</price>
 </order-line>
 <order-line>
   <product>Gloves</product>
   <quantity>1</quantity>
   <price>19.95</price>
 </order-line>
</purchase-order>
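
As a rough illustration of how a program might read such a file, the sketch below parses a copy of the purchase order with Python's standard xml.etree.ElementTree module and totals the order lines. Note the assumption: ElementTree accepts the DTD but does not validate against it, so this only shows that the file is well formed and how the data comes out. The variable names are invented for the example.

```python
from decimal import Decimal
import xml.etree.ElementTree as ET

# A copy of the purchase order, with its DTD, as one string.
DOCUMENT = """<?xml version="1.0"?>
<!DOCTYPE purchase-order [
<!ELEMENT purchase-order (buyer-name, address+, city, state, zip, order-line+) >
<!ELEMENT buyer-name (#PCDATA) >
<!ELEMENT address (#PCDATA) >
<!ELEMENT city (#PCDATA) >
<!ELEMENT state (#PCDATA) >
<!ELEMENT zip (#PCDATA) >
<!ELEMENT order-line ( product, quantity, price) >
<!ELEMENT product (#PCDATA) >
<!ELEMENT quantity (#PCDATA) >
<!ELEMENT price (#PCDATA) >
]>
<purchase-order>
 <buyer-name>Michael S. Parks</buyer-name>
 <address>4099 Bayview Street</address>
 <address>Apartment 5</address>
 <city>Houston</city>
 <state>TX</state>
 <zip>77001</zip>
 <order-line>
   <product>Wool Sweater</product>
   <quantity>2</quantity>
   <price>49.95</price>
 </order-line>
 <order-line>
   <product>Gloves</product>
   <quantity>1</quantity>
   <price>19.95</price>
 </order-line>
</purchase-order>
"""

# The root element is the "outside" tag pair named in the DOCTYPE.
order = ET.fromstring(DOCUMENT)
print(order.tag)  # purchase-order

# address+ allowed more than one address line; collect them all.
addresses = [a.text for a in order.findall("address")]
print(addresses)

# Total = sum of quantity * price over every order-line.
total = sum(
    Decimal(line.findtext("quantity").strip()) * Decimal(line.findtext("price").strip())
    for line in order.findall("order-line")
)
print(total)  # 119.85
```

Decimal is used instead of float so that the prices add up exactly.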
 

If we wished to have multiple purchase orders, we could simply modify the DTD to be:

 
<!DOCTYPE stack-of-purchase-orders [
<!ELEMENT stack-of-purchase-orders (purchase-order+) >
<!ELEMENT purchase-order (buyer-name, address+, city, state, zip, order-line+) >
<!ELEMENT buyer-name (#PCDATA) >
<!ELEMENT address (#PCDATA) >
<!ELEMENT city (#PCDATA) >
<!ELEMENT state (#PCDATA) >
<!ELEMENT zip (#PCDATA) >
<!ELEMENT order-line ( product, quantity, price) >
<!ELEMENT product (#PCDATA) >
<!ELEMENT quantity (#PCDATA) >
<!ELEMENT price (#PCDATA) >
]>
 

Now a stack-of-purchase-orders is just one or more (+) purchase-order tag pairs.

Any ELEMENT may also have attributes. These are parameters, written inside the beginning tag, that describe the ELEMENT. Many HTML tags use attributes. The <BODY> tag, for example, can have attributes like BGCOLOR, LINK, VLINK and ALINK. The general format for the attribute declaration in the DTD is:

<!ATTLIST name-of-element name-of-attribute
    CDATA   or   ( list-of-attribute-values separated by |'s )
    #REQUIRED   or   #IMPLIED   or   #FIXED "value"   or   "default value" >

#REQUIRED means the attribute must always be present
#IMPLIED means that the attribute has no default value and is NOT required
#FIXED means the default value cannot be replaced

For example, if the product element always needs a buyer-size attribute of either "S", "M", "L", or "XL" the product ELEMENT in the DTD would be:

<!ELEMENT product (#PCDATA)>
<!ATTLIST product buyer-size ( S | M | L | XL ) #REQUIRED >

A typical content tag for product might be:

<product buyer-size="M">gloves</product>
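
A program reading the file gets the attribute from the beginning tag. The sketch below uses Python's standard xml.etree.ElementTree module, which does not enforce the DTD, so the ( S | M | L | XL ) #REQUIRED rule is re-checked by hand here. The names ALLOWED_SIZES and buyer_size_is_valid are invented for this illustration.

```python
import xml.etree.ElementTree as ET

# Mirrors the enumerated list ( S | M | L | XL ) from the ATTLIST.
ALLOWED_SIZES = {"S", "M", "L", "XL"}


def buyer_size_is_valid(product):
    """Hand-written stand-in for the #REQUIRED enumerated attribute rule.

    get() returns None when the attribute is missing, and None is not
    in ALLOWED_SIZES, so a missing attribute is rejected as well.
    """
    return product.get("buyer-size") in ALLOWED_SIZES


good = ET.fromstring('<product buyer-size="M">gloves</product>')
print(good.get("buyer-size"))     # M
print(buyer_size_is_valid(good))

bad_value = ET.fromstring('<product buyer-size="XXL">gloves</product>')
print(buyer_size_is_valid(bad_value))

missing = ET.fromstring("<product>gloves</product>")
print(buyer_size_is_valid(missing))
```

A validating parser would make these checks itself from the ATTLIST declaration; the manual check only shows what the declaration demands.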

You can use an online XML validator (hosted by the Language Technology Group at the University of Edinburgh) to check your XML syntax at:

http://www.ltg.ed.ac.uk/~richard/xml-check.html

Be sure to check "validate" on this page. Several other XML validators are also shown.