Capt. Horatio T.P. Webb
Markup Languages
(SGML, HTML, XML, CSS, XSL, XHTML, DOM (DHTML), SHTML)

Last Updated 10:30AM 11/9/99

To simplify, the following references rely primarily on the W3C (World Wide Web Consortium) at www.w3.org, the authority for the markup languages currently evolving from SGML. The evolution goes like this:

  1. SGML (Standard Generalized Markup Language)

    SGML is the "mother" of markup languages and was defined in the original international standard (ISO 8879) in 1986. SGML defines how you embed descriptive markup in a document so that the content is treated as data rather than as characters AND provides a way to describe the structure of the document. There are three basic parts:

    1. Structure is obtained through a DTD, or Document Type Definition. The DTD describes the structure of a document by defining the elements, their arrangement and the rules for marking up a document. The DTD can be embedded in the document or contained in an external file.
    2. Content is the actual information (text, pics, etc.) that is located within the document. The content is "tagged" (i.e., surrounded by tags that define the markup action to be performed).
    3. Style is the specification that actually does the formatting. There are two: the older OS (Output Specification) and the newer DSSSL.

      Arbortext says: "The OS is in the form of a particular DTD that allows the user to create a Formatting Output Specification Instance, or FOSI, for both printed and electronic output. A FOSI is essentially a powerful style sheet that specifies the formatting for each tag in a DTD. With the FOSI, the document, and the DTD, you have a complete interchange package for printed documents.

      In 1996, the International Standards Organization (ISO) approved the final draft of the Document Style Semantics and Specification Language (DSSSL) for SGML-based documents. The complete DSSSL standard covers a broad scope, so subsets are being developed to handle varying levels of functionality. A subset whose functionality is approximately equivalent to FOSIs is expected, and work on tools to convert FOSIs to and from DSSSL is under way."
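
    The three parts above (structure, content, style) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the memo document, its DTD and the STYLE table are made up for the example, the STYLE table is a stand-in for a FOSI or DSSSL sheet rather than either of those formats, and Python's standard parser reads the DTD but does not validate against it.

```python
import xml.etree.ElementTree as ET

# Structure: an internal DTD declaring what a memo may contain.
# Content: the tagged text inside <memo>. Both are hypothetical examples.
DOCUMENT = """<?xml version="1.0"?>
<!DOCTYPE memo [
  <!ELEMENT memo (to, body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>
<memo><to>Class</to><body>Read chapter 3.</body></memo>"""

# Style: one formatting rule per tag (a stand-in for a real style sheet).
STYLE = {"to": "TO: {}", "body": "    {}"}


def render(document, style):
    """Apply the style rule for each child element of the root."""
    root = ET.fromstring(document)  # the DTD is read, not enforced
    return [style[child.tag].format(child.text or "") for child in root]


lines = render(DOCUMENT, STYLE)
print("\n".join(lines))
```

    Swapping in a different STYLE table changes the output without touching the document, which is the point of keeping the three parts separate.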

    See the short tutorial at: Arbortext

    Now see the SGML/XML Bibliography

  2. HTML (Hypertext Markup Language)

    HTML, the markup we know and love, is derived from SGML and was first proposed in 1992.

    See W3C's HTML page

    and their HTML 4 Reference

  3. XML (eXtensible Markup Language)

    W3C says:
    Abstract

    The Extensible Markup Language (XML) is a subset of SGML ... Its goal is to enable generic SGML to be served, received, and processed on the Web in the way that is now possible with HTML. XML has been designed for ease of implementation and for interoperability with both SGML and HTML.

    Introduction

    Extensible Markup Language, abbreviated XML, describes a class of data objects called XML documents and partially describes the behavior of computer programs which process them. XML is an application profile or restricted form of SGML, the Standard Generalized Markup Language [ISO 8879]. By construction, XML documents are conforming SGML documents.

    XML documents are made up of storage units called entities, which contain either parsed or unparsed data. Parsed data is made up of characters, some of which form character data, and some of which form markup. Markup encodes a description of the document's storage layout and logical structure. XML provides a mechanism to impose constraints on the storage layout and logical structure.

    A software module called an XML processor is used to read XML documents and provide access to their content and structure. It is assumed that an XML processor is doing its work on behalf of another module, called the application. This specification describes the required behavior of an XML processor in terms of how it must read XML data and the information it must provide to the application.

    Origin and Goals

    XML was developed by an XML Working Group (originally known as the SGML Editorial Review Board) formed under the auspices of the World Wide Web Consortium (W3C) in 1996. It was chaired by Jon Bosak of Sun Microsystems with the active participation of an XML Special Interest Group (previously known as the SGML Working Group) also organized by the W3C. The membership of the XML Working Group is given in an appendix. Dan Connolly served as the WG's contact with the W3C.

    The design goals for XML are:

    • XML shall be straightforwardly usable over the Internet.
    • XML shall support a wide variety of applications.
    • XML shall be compatible with SGML.
    • It shall be easy to write programs which process XML documents.
    • The number of optional features in XML is to be kept to the absolute minimum, ideally zero.
    • XML documents should be human-legible and reasonably clear.
    • The XML design should be prepared quickly.
    • The design of XML shall be formal and concise.
    • XML documents shall be easy to create.
    • Terseness in XML markup is of minimal importance.

    This specification, together with associated standards (Unicode and ISO/IEC 10646 for characters, Internet RFC 1766 for language identification tags, ISO 639 for language name codes, and ISO 3166 for country name codes), provides all the information necessary to understand XML Version 1.0 and construct computer programs to process it.

    See XML at W3C (document date: 2/10/98).
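
    The split between an XML processor and the application, as described in the introduction quoted above, can be sketched with Python's standard xml.sax module. The class name OutlineApp and the sample document are assumptions made for this example; the parser plays the role of the processor and the handler plays the role of the application.

```python
import xml.sax


class OutlineApp(xml.sax.ContentHandler):
    """The 'application': it only sees what the processor reports."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def characters(self, content):
        # The processor may deliver text in several pieces; skip pure whitespace.
        if content.strip():
            self.events.append(("text", content.strip()))

    def endElement(self, name):
        self.events.append(("end", name))


app = OutlineApp()
# The 'processor': reads the XML data and calls back into the application.
xml.sax.parseString(b"<note><to>Class</to></note>", app)

for event in app.events:
    print(event)
```

    The application never touches the raw characters of the document; it receives structure (start and end of elements) and content (text) from the processor.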

  4. CSS (Cascading Style Sheets)

    1. CSS Level I Reference (1996, revised 1/11/99)

      Allows significantly greater control over presentation through the new <STYLE> object. This gives control over an object's font, color, text and the new BOX object. The style can be controlled at both the document and element level. A hierarchy defines how the objects inherit their properties. Most important is the ability to access and modify these object parameters from script languages like VBScript and JavaScript.

    2. CSS Level 2 Reference (May 1998)

      Adds things like:

      • Relative and absolute positioning, including fixed positioning.
      • The ability to control content overflow, clipping, and visibility in the visual formatting model.
      • The ability to specify minimum and maximum widths and heights in the visual formatting model

    3. CSS Level 3 (in progress)

      Will (may) add:

      • User interface enhancements
      • Scalable graphics

  5. XSL (eXtensible Style Language)

    XSL Page at W3C

    Here is what W3C says on XSL vs CSS (11/10/99):


    "...The fact that W3C has started developing XSL in addition to CSS has caused some confusion. Why develop a second style sheet language when implementors haven't even finished the first one? The answer can be found in the table below:

                                CSS    XSL
    Can be used with HTML?      yes    no
    Can be used with XML?       yes    yes
    Transformation language?    no     yes
    Syntax                      CSS    XML

    The unique features are that CSS can be used to style HTML documents. XSL, on the other hand, is able to transform documents. For example, XSL can be used to transform XML data into HTML/CSS documents on the Web server. This way, the two languages complement each other and can be used together.

    Both languages can be used to style XML documents.

    CSS and XSL will use the same underlying formatting model and designers will therefore have access to the same formatting features in both languages. W3C will work hard to ensure that interoperable implementations of the formatting model is available.

    A W3C Note on Using XSL and CSS together is available."
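
    The idea of transforming XML data into HTML/CSS on the server can be sketched in Python. Note that this is not XSL: the sketch swaps in Python's ElementTree to do the same kind of transformation by hand, and the element names (list, item) and the inline style are made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML data to be transformed.
SOURCE = "<list><item>SGML</item><item>XML</item></list>"


def to_html(xml_text):
    """Turn each <item> in the source into an <li> inside a styled <ul>."""
    source = ET.fromstring(xml_text)
    ul = ET.Element("ul", {"style": "color: navy"})  # CSS carried by the HTML
    for item in source.findall("item"):
        li = ET.SubElement(ul, "li")
        li.text = item.text
    return ET.tostring(ul, encoding="unicode")


html = to_html(SOURCE)
print(html)
```

    The source document has no notion of bullets or colors; the transformation decides the HTML structure and the CSS decides how it looks, which is how the two languages complement each other.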


  6. XHTML (Extensible HyperText Markup Language -- A Reformulation of HTML 4.0 in XML 1.0; from the 1998 draft, revised May 1999)

    We are now attempting to resolve some of the issues that arise from HTML being an application of SGML rather than an application of XML. To make web documents conform to XML, we have to reformulate the HTML as XHTML by doing things like:

    • documents must be well formed (all start tags must have end tags and elements must be correctly nested -- i.e., no overlapping tags)
    • element and attribute names must be lower case
    • attribute values must be in quotes
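
    The three rules above can be checked mechanically. The following is a rough sketch, not an XHTML validator: the function name xhtml_problems is made up for the example, and it only tests the rules listed here by leaning on Python's standard XML parser.

```python
import xml.etree.ElementTree as ET


def xhtml_problems(fragment):
    """Return a list of problems; an empty list means the checks passed."""
    try:
        root = ET.fromstring(fragment)
    except ET.ParseError as err:
        # Missing end tags, overlapping tags and unquoted attribute values
        # all make the fragment ill-formed, so the XML parser rejects it.
        return [f"not well formed: {err}"]
    problems = []
    for element in root.iter():
        if element.tag != element.tag.lower():
            problems.append(f"element name not lower case: {element.tag}")
        for name in element.attrib:
            if name != name.lower():
                problems.append(f"attribute name not lower case: {name}")
    return problems


print(xhtml_problems('<p class="x">hi<br/></p>'))
print(xhtml_problems("<p><b>bold<i>both</b></i></p>"))
print(xhtml_problems('<P Class="x">hi</P>'))
```

    The first fragment passes, the second fails because its tags overlap, and the third is well formed but uses upper case names.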

    XHTML Working Draft


    Note that all the discussion above reduces to:

    1. Formatting data within a document. This is the heart of the idea of markup languages. Though SGML is the most general standard, two variants have emerged: HTML (an application of SGML -- where a specific set of tags has been identified) and XML (a subset of SGML that allows us to define our own documents and the objects they contain). CSS provides additional specifications for more finely controlling the appearance of an HTML document.
    2. XSL provides a way to translate XML into HTML/CSS.
    3. XHTML provides ways to convert HTML from an application of SGML to an application of XML.

    For all the fuss and furor, as programmers, we have to be able to access all this marked up data in the programming languages (i.e., how can we create, access or modify the objects)...

    so, we get...

  7. DOM (Document Object Model or how to DHTML)

    DHTML (Dynamic HTML) describes HTML pages with dynamic content. This just implies that HTML, CSS and the scripting languages (VBScript and JavaScript) operate together to create dynamic pages rather than static ones. There is no standard here -- it is just a phrase that denotes the interoperation of the three elements.

    However, most important to the concept of DHTML is the ability to access the document elements from a scripting language. As we have seen, to work with the document in the script languages we need a way to access and modify the properties of its objects. Thus far we have seen only a limited portion of the ways to access the HTML objects. To expose ALL the objects in a document to the script language, W3C proposed DOM (the Document Object Model) as a way to programmatically reference the document objects. See the current DOM recommendations at W3C (www.w3.org).
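
    To make the idea of creating, accessing and modifying document objects concrete, here is a small sketch using Python's xml.dom.minidom, which follows the W3C DOM interfaces. In a browser the same calls would be made from VBScript or JavaScript; Python is used here only so the example can run on its own, and the sample page is made up.

```python
from xml.dom import minidom

# A hypothetical page, parsed into a tree of document objects.
page = minidom.parseString('<html><body><p id="greeting">Hello</p></body></html>')

# Access: find an element in the document tree.
paragraph = page.getElementsByTagName("p")[0]

# Modify: change an attribute and the text node inside the element.
paragraph.setAttribute("class", "highlight")
paragraph.firstChild.data = "Hello, DOM"

# Create: build a new element and attach it to the tree.
extra = page.createElement("p")
extra.appendChild(page.createTextNode("Added by script"))
paragraph.parentNode.appendChild(extra)

result = page.documentElement.toxml()
print(result)
```

    Every element, attribute and piece of text is reachable as an object, which is what "exposing ALL the objects" means in practice.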


  8. SHTML

    Used to indicate that Server Side Includes (SSI) are being added by the server to the HTML data stream shipped to the client. An "include" is a file or the output of a command that is "inserted" in the HTML as it is being sent to the client. SHTML is not a language; it is a file extension (.shtml) for HTML files that indicates SSI is being used.
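
    What the server does with an include can be sketched as a simple text substitution. This is an assumption-laden illustration, not how any particular web server is implemented: the FILES table stands in for files on the server's disk, the function name expand_includes is made up, and only the file form of the include directive is handled (no command output).

```python
import re

# Hypothetical in-memory "files" standing in for what the server has on disk.
FILES = {"footer.html": "<p>Last Updated 11/9/99</p>"}

# Matches an SSI directive of the form <!--#include file="name" -->
INCLUDE = re.compile(r'<!--#include\s+file="([^"]+)"\s*-->')


def expand_includes(page, files):
    """Replace each include directive with the named file's contents."""
    return INCLUDE.sub(lambda match: files[match.group(1)], page)


served = expand_includes('<body>Hello<!--#include file="footer.html" --></body>', FILES)
print(served)
```

    The client only ever sees the expanded HTML; the directive itself never leaves the server.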

  9. BNF (Backus-Naur Form)

    A standard way to express syntax rules. Much of the above documentation is shown in BNF. See a short description at: Th. Estier's BNF page or Lars Marius Garshol's page on EBNF (that is "E"xtended BNF).
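
    As a hedged illustration of how syntax rules written this way map onto a program, here is a tiny grammar in EBNF-style notation and a hand-written Python recognizer for it. The grammar is a deliberately simplified, made-up one for a single tagged element; it is not the grammar from the XML specification.

```python
# Grammar (simplified, for illustration only):
#   element ::= "<" name ">" text "</" name ">"
#   name    ::= letter+
#   text    ::= (any character except "<")*


def expect(s, i, literal):
    """Match a quoted terminal from the grammar, such as "<" or "</"."""
    if not s.startswith(literal, i):
        raise ValueError(f"expected {literal!r} at position {i}")
    return i + len(literal)


def parse_name(s, i):
    """name ::= letter+   Returns (name, next_index)."""
    j = i
    while j < len(s) and s[j].isalpha():
        j += 1
    if j == i:
        raise ValueError(f"expected a name at position {i}")
    return s[i:j], j


def parse_element(s, i=0):
    """element ::= "<" name ">" text "</" name ">"   Returns (name, text, next_index)."""
    i = expect(s, i, "<")
    name, i = parse_name(s, i)
    i = expect(s, i, ">")
    start = i
    while i < len(s) and s[i] != "<":  # text ::= (any character except "<")*
        i += 1
    text = s[start:i]
    i = expect(s, i, "</")
    closing, i = parse_name(s, i)
    if closing != name:
        # Matching names is an extra constraint the grammar itself cannot state.
        raise ValueError(f"end tag {closing!r} does not match start tag {name!r}")
    i = expect(s, i, ">")
    return name, text, i


print(parse_element("<to>Class</to>"))
```

    Each rule in the grammar becomes one function, and each quoted terminal becomes one call to expect, which is why specifications written in BNF are comparatively easy to turn into parsers.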
