As an instructor of XML at a local community college,
I thought it would be a good idea to create a Web site for my students that
provides a practical example of how XML can be used in an Internet application.
I designed this site mainly in response to a question I hear often at the
beginning of my course: "Why would I want to use a tagging language in
application development?" However, the site goes beyond simply addressing this
question. By using a handy mapping technique and design pattern, and
transforming XML to HTML on the server side, I created a scalable Web site
that's easy to maintain and that can be delivered using just two Active Server
Pages (ASP). You could probably extend this technology to a more interactive
Web site, such as one with heavy forms to process, or maybe even to an
e-commerce Web site, but I will leave that decision and implementation up to
you.
My goals for this site were that it be scalable, easily maintainable, and, of
course, constructed using XML technology. Since most of the information is of a
static nature, I decided that my project should take on the look and feel of a
corporate Web site (here we are, this is what we do, here is some information,
and so on). Thus the site needed a common menu system throughout for easy site
navigation.
The Web site is an information resource, where class members can obtain a course
outline, lab assignments, homework assignments, sample code, resources/links,
and the course syllabus. This way, instead of photocopying many documents,
taking them to class, and handing them out (which is environmentally incorrect
and wrecks my back), I can direct students to the Web site. They make the
decision whether or not to print the information.
I also wanted to be able to add information to the site as needed. For example,
I wanted to make lab and homework assignments available on the Web the same day
that I planned to hand them out in class. I did not want to edit an HTML page
to do this. Basically, I wanted to "turn on" features of the site as they were
needed.
The Epiphany
Have you ever noticed a similarity between an XML document and a Web site
diagram like the one presented in Microsoft FrontPage? A site diagram is a
hierarchy of the pages in a Web site, all of which stem from the home, start,
or default page. An XML document is also a hierarchy, except it is a hierarchy
of elements that stem from the document or root element. The only difference
that I saw between the two was that the Web site diagram is typically viewed
top-down and then left to right (see Figure 1), whereas an XML document is
typically viewed from left to right and then down (see Figure 2).
Because of the similarity between a site diagram and an XML document, I
concluded that I could use an XML document structure to map the contents of a
Web site; in other words, construct a site diagram using an XML document. The
difference between a standard HTML-based site and the one I wanted to build was
that each page in our site would be created using Extensible Stylesheet
Language Transformations (XSLT). I decided to name the XML document containing
the Web site hierarchy Master, or Master.xml.
A Web site is a collection of Web pages; one page is designated as the main or
default page and all other pages are subordinate to it. The Master.xml document
has a root element of Master and a subordinate element of Page that designates
a page participating in the Web site. Each Page element carries attributes that
state whether the page participates in the Main Menu for the site, whether it
is active (displayable), its Name (which is used in display), its ID or
internal identifier, and whether it is the default or start page.
The ID attribute would be used for site navigation to abstract the site contents
from the user. This ID attribute is designated as an ID in the DTD, so its
value must be unique within the XML document. A page can also carry an IDREFS
attribute, which lets it reference other pages within the site by their IDs.
To fulfill my design goal of making aspects of the system available on an
as-needed basis, I created an Active attribute that simply contains a value of
"Yes" (available for display) or "No" (not available for display). This allows
me to release pages for the site as they are completed. Each page must
designate if it participates in the Main or common menu of the Web site. In
addition, since the root element of an XML document is there only to anchor the
others, I had to designate a page as the Default page for the system. I did
this with the Default='Yes' attribute. Finally, since I was using server-side
transformations to create an HTML document, each Page contained subelements
describing the data and presentation parts of the rendered HTML page. I named
this element the Doc element; each page has two.
The Doc element contains a Type attribute, which designates the element as
content (xml) or presentation (xsl). A Doc element with a type of xml
references an XML document that contains the data for the page. Conversely, an
xsl type references an XML document that contains the transformations necessary
to present the page to the user of the Web site. The specific name and location
of the XML document are contained within the URL attribute of the Doc element.
For a diagram outlining the hierarchy of an XML document structure in a Web
site, see Figure 3.
This sample of the contents of Master.xml contains the default page for the
site, which is appropriately named Home:
<Master>
  <Page active='Yes' default='Yes'
        MainMenu='Yes' id="P00" Name="Home">
    <doc Type="xml" URL="\xml\home.xml" />
    <doc Type="xsl" URL="\xsl\home.xsl" />
  </Page>
  <!-- remaining Page elements omitted -->
</Master>
So, a Web site contains pages that contain XML documents that are used to create
the HTML page using XSLT.
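To make the mapping concrete, here is a minimal sketch of reading such a site map. It is written in Python with the standard library rather than the VBScript and MSXML used on the actual site, the element and attribute names are spelled as in the Master.xml sample, and the second page (P01, "Labs") is a hypothetical addition for illustration.

```python
import xml.etree.ElementTree as ET

# Site map modeled on the Master.xml sample; the P01 page is hypothetical.
MASTER_XML = r"""
<Master>
  <Page active='Yes' default='Yes' MainMenu='Yes' id="P00" Name="Home">
    <doc Type="xml" URL="\xml\home.xml" />
    <doc Type="xsl" URL="\xsl\home.xsl" />
  </Page>
  <Page active='No' default='No' MainMenu='Yes' id="P01" Name="Labs">
    <doc Type="xml" URL="\xml\labs.xml" />
    <doc Type="xsl" URL="\xsl\labs.xsl" />
  </Page>
</Master>
"""

def list_pages(master_text):
    """Return one dict per Page element: id, name, active flag, doc URLs by type."""
    root = ET.fromstring(master_text)
    pages = []
    for page in root.findall("Page"):
        docs = {doc.get("Type"): doc.get("URL") for doc in page.findall("doc")}
        pages.append({
            "id": page.get("id"),
            "name": page.get("Name"),
            "active": page.get("active") == "Yes",
            "docs": docs,
        })
    return pages

pages = list_pages(MASTER_XML)
```

Each entry in `pages` pairs a page with its content document and its presentation document, which is all the information needed to build that page.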
The Transformation
Now that we can structure a Web site in the form of an XML document, how do we
present it to the user? The W3C provides for the transformation of one XML
document into another through a separate specification, XSLT. The XSLT spec is at
recommendation status; therefore it is considered final and application vendors
may safely write code to the specification without fear of a syntax change.
Until the Extensible Stylesheet Language (XSL) reaches recommendation status,
the only way to present XML documents in the browser is through XSLT
transformations.
Note that the XML document containing the XSLT instructions can also contain
HTML tags. In our case, this is mandatory because we deliver the transformed
document to a browser. Furthermore, since the transformation document is an XML
document, it must be well formed. This means that each HTML tag within the XSLT
transformation document must have an end tag. Start and end tags must match in
case, and elements must nest properly rather than overlap. For example, every
<p> tag must have a </p> end tag, and you cannot use a <P> tag with a </p> end
tag (incorrect case).
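These rules are easy to check with any XML parser. The following is a small sketch in Python (standard library only, not the MSXML parser used on the site) that tests a few hypothetical HTML fragments for well-formedness.

```python
import xml.etree.ElementTree as ET

def is_well_formed(fragment):
    """Return True if the fragment parses as XML, False otherwise."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False

# Start and end tags match in case and nest properly.
matched = is_well_formed("<body><p>Welcome</p></body>")

# <P> closed by </p>: the names differ in case, so the parser rejects it.
wrong_case = is_well_formed("<body><P>Welcome</p></body>")

# <b> and <i> overlap instead of nesting.
overlapping = is_well_formed("<body><b><i>Welcome</b></i></body>")

# <br> has no end tag; in a style sheet it would be written <br/>.
unclosed = is_well_formed("<body><p>Welcome<br></p></body>")
```

Only the first fragment is accepted; the other three would prevent the style sheet from loading at all.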
There are two ways to transform an XML document: embedding a processing
instruction within the "to be transformed" or target document, and using DOM
methods to perform the transformation. If you load an XML document into IE 5.0,
you may notice that the document is color-coded and viewable as a formatted
tree structure. This is because IE 5.0 uses a default style sheet to render the
document. It will do this only if the XML document does not have a processing
instruction specifying an XSLT style sheet. The processing instruction
references an external document that has the XSLT commands to render the
document. If a processing instruction is present that designates a style sheet,
the XML document will be rendered in IE 5.0 according to that designated style
sheet's instructions. A sample processing instruction to render the Home page
in our system would be:
<?xml-stylesheet type="text/xsl" href="..\xsl\home.xsl" ?>
Since I wanted the course Web site to be widely viewable (not only by individuals
who have IE 5.0), I decided to perform server-side transformations of the XML
documents. The server-side transformation will use DOM to transform the XML
documents on the server and send them to the client as formatted HTML (see
Figure 4).
Figure 4. XML to HTML Transformation.
The XML parser for the Microsoft platform, or MSXML.DLL, is a COM object, so it
can be used to instantiate DOM objects within application code such as Visual
Basic, Java, C++, and Visual FoxPro. We will implement this using ASP. We will
use VBScript to create our DOM objects. The DOM specification is defined by the
World Wide Web Consortium (W3C). The purpose of the specification is to provide
a common set of interfaces to process an XML document within application code.
The specification is defined using Interface Definition Language (IDL) syntax,
which makes it language neutral.
The specification includes no methods to load or open an XML document, but
Microsoft has extended its DOM implementation to include a method to read an
XML document (so has Sun in its Java implementation). Microsoft has two methods
to load an XML document: Load, which uses the URL of the XML document, and
LoadXML, which uses a string variable containing XML elements. We will use both
of these methods in our code.
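Other DOM implementations draw the same distinction between loading from a location and loading from a string. As a rough analogy only (this is Python's `xml.dom.minidom`, not MSXML), `parse` plays the role of Load and `parseString` plays the role of LoadXML; the sample document and file name below are hypothetical.

```python
import os
import tempfile
from xml.dom import minidom

SAMPLE = '<Master><Page id="P00" Name="Home"/></Master>'

# Analog of Load: parse a document from a location on disk.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "master.xml")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(SAMPLE)
    from_file = minidom.parse(path)

# Analog of LoadXML: parse a document from a string already in memory.
from_string = minidom.parseString(SAMPLE)

# Both routes yield a DOM tree that exposes the same interfaces.
name_from_file = from_file.documentElement.firstChild.getAttribute("Name")
name_from_string = from_string.documentElement.firstChild.getAttribute("Name")
```

Once the tree exists, the code that works with it does not need to know which of the two loaders produced it.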
Since the XML parser is a validating parser, it loads and validates the
document. Parsing ensures that the document is well formed, and validation
ensures that, if the document has a DTD or Schema, it is also valid. Once
loaded, the XML document
is available to that application through the remainder of the methods and
properties contained within the instance of the DOM object. Another method that
we use is TransformNodeToObject. This method takes as its arguments two DOM
objects. One DOM object contains transformation instructions and the other is
an empty DOM object that becomes the recipient of the transformation. This
method is performed on the DOM object that has loaded the XML data document to
be transformed.
The Pattern
Now we have the Web site modeled as an XML document, with each page in the site
containing two XML documents: one with the content and the other with the
presentation logic. We know that we can present this to the user with XSLT. How
do we do this without having to create an ASP page for each HTML page in our
Web site?
To answer this question, I present to you the Director/Builder design pattern.
This pattern uses the object-oriented technique of delegation, whereby
specialized work is delegated to a specific component of a system that can
perform the job. The theory is that by segmenting an application in this way,
it becomes easier to construct and debug. In an object-oriented language such
as Java, an application creates the Director and the Builder objects. The
Director is responsible for validating and assembling the metadata to construct
a Web page. It then communicates this information to the Builder, which
constructs the Web page and presents it to the client.
Since I was using ASP, I ran into a problem with this pattern. ASP with
VBScript is not an object-oriented environment. I got around this limitation by
creating two Active Server Pages: one is the Director and the other the Builder.
They communicate
with one another through query strings (an argument to the page) and session
variables (hidden values on the server that are unique to a client session). I
also used other techniques specific to ASP to create and cache the common menu,
which is a transformation of the XML document containing the site diagram
(Master.xml), and cache the Master.xml file.
The implementation of the pattern in our application is straightforward and
depends heavily on our site map, or Master.xml. The Director, through the DOM,
searches the master file and locates the page requested by its ID. If no ID is
supplied, the Director searches the Master.xml for the Page that is designated
as the start page for the site (Default='Yes'). It then determines if this page
is active (Page Active attribute is "Yes"). If it is not active, the Director
displays an error message to the user. If the page is active, the Director
accesses the Doc elements for the Page, creating session variables for the Doc
element that contains the XML file for the page content (Type='xml') and the
Doc element that contains the XML file for the presentation (Type='xsl'). The
Director then redirects to the Builder page, which uses the session variables
to locate the content and presentation XML files. These are each loaded into
DOM objects and then transformed into an HTML page.
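The Director's decision logic can be sketched in a few lines. This is a Python illustration under stated assumptions, not the Director.asp code: attribute names are spelled as in the Master.xml sample, the inactive P01 page and the `PageUnavailable` exception are hypothetical, and the returned dictionary stands in for the ASP session variables.

```python
import xml.etree.ElementTree as ET

# Site map modeled on the Master.xml sample; the P01 page is hypothetical.
MASTER_XML = r"""
<Master>
  <Page active='Yes' default='Yes' MainMenu='Yes' id="P00" Name="Home">
    <doc Type="xml" URL="\xml\home.xml" />
    <doc Type="xsl" URL="\xsl\home.xsl" />
  </Page>
  <Page active='No' default='No' MainMenu='Yes' id="P01" Name="Labs">
    <doc Type="xml" URL="\xml\labs.xml" />
    <doc Type="xsl" URL="\xsl\labs.xsl" />
  </Page>
</Master>
"""

class PageUnavailable(Exception):
    """Raised when the requested page is missing or not active."""

def direct(master_text, page_id=None):
    """Resolve a page request into the values the Builder needs."""
    pages = ET.fromstring(master_text).findall("Page")
    if page_id is None:
        # No ID supplied: fall back to the page marked as the start page.
        matches = [p for p in pages if p.get("default") == "Yes"]
    else:
        matches = [p for p in pages if p.get("id") == page_id]
    if not matches:
        raise PageUnavailable("page not found")
    page = matches[0]
    if page.get("active") != "Yes":
        raise PageUnavailable("page %s is not active" % page.get("id"))
    docs = {doc.get("Type"): doc.get("URL") for doc in page.findall("doc")}
    # Stand-in for session variables: page name plus content and presentation files.
    return {"name": page.get("Name"), "xml": docs["xml"], "xsl": docs["xsl"]}

session = direct(MASTER_XML)           # no ID, so the default page is chosen
same_page = direct(MASTER_XML, "P00")  # explicit request by ID
```

Requesting `"P01"` raises `PageUnavailable` because that page is not active, which corresponds to the error message the Director displays.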
Our implementation of the transformation process within the Builder creates
three instances of the DOM: one to load the data document, another to load the
transformation document, and a third to receive the transformation. The method
used is TransformNodeToObject. It is a Microsoft extension to the W3C DOM
specification. This transformation is then shipped back to the client through a
Response.write command. This transformation can also be performed in other
ways. You can add a processing instruction that references the presentation
document to the content XML file and then issue a LoadXML command on an empty
DOM object with the input coming from the target DOM object through the XML
property. You can also perform the transformation with another Microsoft
extension method to the DOM known as transformNode, which takes a DOM object
containing the style sheet as an argument and outputs a string containing the
transformation.
I chose the TransformNodeToObject method because it provides better error
diagnostics. Specifically, I saw three areas where something could go wrong:
the data XML document might not be valid or well formed, the presentation XSLT
document might not be well formed, or the transformation just might not work.
By creating three objects, one for data, one for presentation, and one for
transformation, it's easier to diagnose an error should one occur.
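The diagnostic benefit of keeping the three stages separate can be illustrated as follows. This is only a sketch in Python: the standard library has no XSLT engine, so the transformation step is passed in as a function, and `fake_transform` is a hypothetical stand-in rather than a real XSLT processor.

```python
import xml.etree.ElementTree as ET

def build_page(data_text, style_text, transform):
    """Run the three Builder stages separately and name the stage that fails."""
    try:
        data = ET.fromstring(data_text)
    except ET.ParseError as error:
        return "data document error: %s" % error
    try:
        style = ET.fromstring(style_text)
    except ET.ParseError as error:
        return "presentation document error: %s" % error
    try:
        return transform(data, style)
    except Exception as error:
        return "transformation error: %s" % error

def fake_transform(data, style):
    # Hypothetical stand-in for an XSLT engine: wraps the root text in a paragraph.
    return "<p>%s</p>" % data.text

ok = build_page("<home>Welcome</home>", "<stylesheet/>", fake_transform)
bad_data = build_page("<home>Welcome", "<stylesheet/>", fake_transform)
bad_style = build_page("<home>Welcome</home>", "<stylesheet>", fake_transform)
```

Because each stage reports under its own label, a failure points directly at the content file, the style sheet, or the transformation itself.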
Because we are using server-side transformations, the Director/Builder design
pattern permits us to add as many pages as we desire without adding additional
ASP pages to the system. This is power!
System Specifics
Let's discuss some features of this system's construction, namely the caching
of the Master.xml, the creation of a common menu, and the use of multiple views on a
single source document. A site designed in this manner depends highly on the
Master.xml file, the XML document that lists the pages in the site and
instructions on how they are to be built. It is accessed every time the
Director.asp page is invoked, and the Builder uses it to create the common
menu.
Because of this dependency, I decided to cache the contents of the Master.xml
into global storage at application startup. This feature of ASP is implemented
in the Application_OnStart event in the global.asa for the site. This event, as
its name implies, is invoked once the application is started on the server. To
save resources, and because of threading issues in the current implementation
of the Microsoft parser, the Master.xml document was cached as a string
variable and not as a DOM object. (A free-threaded instance of the DOM did not
exist in version 2 of the MS parser but has been incorporated in version 3,
which is available now from the MS Web site.) This caching as a string variable
was done with the xml property of the DOM, whereby the contents of an XML
document can be assigned to a string. Therefore, whenever the Director or
Builder requires the Master.xml as a DOM object, it can create a DOM object and
issue a LoadXML method with the global string variable as an argument.
One of the site's original design goals was to "turn on" features of the system
as they became available, but with the Master.xml cached as a string variable
on application startup, the only way this can be achieved is to recycle the
application (shut it down and start it up). To get around this problem, when
the Master.xml is cached, we also save its file creation and modification
dates. Then, whenever a client session is started (with the global.asa
Session_OnStart event) we compare the create and modify dates of the master
file to the create and modified dates that were saved at application startup.
If they don't match, the Master.xml has been altered, and we reload and recache
it into the global string variable. Once it is recached, new and existing
clients will get the latest version of the system. This technique also
correctly builds the common menu for the system.
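The recache-on-change idea is independent of ASP. Here is a minimal sketch in Python, assuming a hypothetical `MasterCache` class; for brevity it compares only the file's modification time, whereas the site compares both creation and modification dates.

```python
import os
import tempfile

class MasterCache:
    """Holds the site map as a string and reloads it when the file changes."""

    def __init__(self, path):
        self.path = path
        self.reload()

    def reload(self):
        # Analog of Application_OnStart: cache the text and remember the timestamp.
        with open(self.path, encoding="utf-8") as handle:
            self.text = handle.read()
        self.stamp = os.path.getmtime(self.path)

    def on_session_start(self):
        # Analog of Session_OnStart: recache only if the file has been modified.
        if os.path.getmtime(self.path) != self.stamp:
            self.reload()
        return self.text

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "Master.xml")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("<Master/>")
    cache = MasterCache(path)
    first = cache.on_session_start()

    # Edit the site map, then force a visibly newer timestamp for the demo.
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("<Master><Page id='P00'/></Master>")
    os.utime(path, (cache.stamp + 10, cache.stamp + 10))
    second = cache.on_session_start()
```

The first session sees the original site map; the session started after the edit sees the new one, without restarting the application.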
The Web site's common menu system is built from the Page elements of the site
map contained in the Master.xml, specifically those elements having an
attribute of MainMenu='Yes'. The common menu is built with a transformation of
the master file into an HTML <Body> section. In our Web site, this section
contains the Raritan Valley Community College graphic and a table
containing links to the other pages in the system that are to be included in
the main menu. This transformation is done in the same manner as all
transformations are done for all pages in the site: with the Master Page element
that contains a Doc element with a type attribute of xsl. Since the common menu
is also heavily referenced throughout the application (every page has the
common menu), we cache it similarly to the way we cache the Master.xml file,
with the exception that the string variable contains the transformed
Master.xml, or the HTML syntax that will render the common menu in the browser.
The code in Listing 1 shows how this is done.
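For readers without the listing at hand, the selection step behind the common menu can be approximated as below. This Python sketch builds the menu table directly instead of using an XSLT style sheet as the site does; the P01 and P02 pages and the `id` query-string parameter name are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Site map modeled on the Master.xml sample; P01 and P02 are hypothetical.
MASTER_XML = """
<Master>
  <Page active='Yes' default='Yes' MainMenu='Yes' id="P00" Name="Home"/>
  <Page active='Yes' default='No' MainMenu='Yes' id="P01" Name="Labs"/>
  <Page active='Yes' default='No' MainMenu='No' id="P02" Name="Credits"/>
</Master>
"""

def build_menu(master_text, director_url="director.asp"):
    """Return an HTML table with one link per page flagged MainMenu='Yes'."""
    root = ET.fromstring(master_text)
    table = ET.Element("table")
    row = ET.SubElement(table, "tr")
    for page in root.findall("Page"):
        if page.get("MainMenu") != "Yes":
            continue
        cell = ET.SubElement(row, "td")
        # Assumed link format: the Director receives the page ID as a query string.
        href = "%s?id=%s" % (director_url, page.get("id"))
        link = ET.SubElement(cell, "a", href=href)
        link.text = page.get("Name")
    return ET.tostring(table, encoding="unicode")

menu_html = build_menu(MASTER_XML)
```

The resulting string is what would be cached and reused on every page, since it changes only when the site map changes.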
The string resulting from the transformation of the Master file contains the
common menu HTML. As the Builder builds each page, this HTML transformation is
loaded into a DOM object. That's right: the transformed document, even though
it contains HTML, is also a well-formed XML document that can be loaded,
searched, and updated in the same manner as any XML document.
The Builder.asp loads the HTML and changes the caption to the title of the
current page being rendered. It can do this with the Microsoft extension to the
DOM specification, selectSingleNode. This method takes an XPath expression and
returns a Node that meets the expression criteria, or returns a NULL node if
there isn't a match with the criteria. For our purposes, we search the XML
document containing HTML tags for the first occurrence of the underline tag, or
<U>, and change its text value to be the value of the name of the page. The
Director sets the name of the page in a session variable.
Listing 2 shows the code from the Builder that creates the common menu.
Since this type of processing is specific to a Web site, the code contained
within the buildCommonHeader function should probably be done with an include
so that the Builder.asp is not tightly coupled to the site.
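The caption update described above amounts to a find-and-replace on a parsed tree. As a sketch only, here is the same idea in Python, where `find` with a path expression plays the role of selectSingleNode; the cached menu markup shown is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical cached menu: the first <U> is the caption, the second a menu item.
MENU_HTML = ("<body><h2><U>Caption</U></h2>"
             "<table><tr><td><U>Home</U></td></tr></table></body>")

def set_caption(menu_html, page_name):
    """Replace the text of the first <U> element with the current page name."""
    body = ET.fromstring(menu_html)
    caption = body.find(".//U")  # first match in document order, or None
    if caption is not None:
        caption.text = page_name
    return ET.tostring(body, encoding="unicode")

labs_header = set_caption(MENU_HTML, "Labs")
```

Only the first `<U>` element changes; the rest of the cached menu is passed through untouched.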
Finally we come to the concept of a single document with multiple views. One of
the great things about XSLT is that you can separate content from
presentation, or present one piece of content in several ways. I wanted students
to be able to download homework assignments, lab assignments, or sample
code/data. Instead of creating a separate XML document for each type of
download, I created a single Document element that covers all download types.
I gave this element an attribute of Category that would describe the type of
download it was
(homework, lab, and so on) and then another attribute to designate it as being
available (Yes/No). The Document element has subelements of Description, which
contains a description of the document, and another element called URL to
designate the location of the document to be downloaded.
The transformation process creates a table row for every Downloadable document
for the category, lists its descriptions, and creates a hyperlink if the item
is available. This facilitated my design goal of turning features on within the
system. When I am ready to hand out an assignment, I just set the available
attribute to Yes and the transformation process creates the hyperlink using the
value of the URL tag. All the data relating to a download is placed in one
document named Downloadable.xml.
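A per-category view of this file might look like the following sketch. It is written in Python rather than XSLT, and the root element name (`Downloads`), the attribute spellings (`Category`, `Available`), and the sample entries are assumptions; only the Document, Description, and URL element names come from the description above.

```python
import xml.etree.ElementTree as ET

# Hypothetical contents for Downloadable.xml.
DOWNLOADS_XML = """
<Downloads>
  <Document Category="homework" Available="Yes">
    <Description>Homework 1</Description>
    <URL>hw1.zip</URL>
  </Document>
  <Document Category="homework" Available="No">
    <Description>Homework 2</Description>
    <URL>hw2.zip</URL>
  </Document>
  <Document Category="lab" Available="Yes">
    <Description>Lab 1</Description>
    <URL>lab1.zip</URL>
  </Document>
</Downloads>
"""

def rows_for(downloads_text, category):
    """Return one HTML table row per Document in the given category."""
    root = ET.fromstring(downloads_text)
    rows = []
    for doc in root.findall("Document"):
        if doc.get("Category") != category:
            continue
        row = ET.Element("tr")
        cell = ET.SubElement(row, "td")
        description = doc.findtext("Description")
        if doc.get("Available") == "Yes":
            # Available items become hyperlinks to the download.
            link = ET.SubElement(cell, "a", href=doc.findtext("URL"))
            link.text = description
        else:
            # Unavailable items are listed as plain text only.
            cell.text = description
        rows.append(ET.tostring(row, encoding="unicode"))
    return rows

homework_rows = rows_for(DOWNLOADS_XML, "homework")
lab_rows = rows_for(DOWNLOADS_XML, "lab")
```

Changing the availability flag of "Homework 2" to Yes is the only edit needed to turn its row into a link.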
Figure 5 shows separate transformations that I created based on the
category.
Weighing the Benefits
After implementing these concepts, I took a step back and tried to determine
whether it was worthwhile. How will implementing a Web site like this
positively affect
the way I work? Has this increased my ability to maintain a Web site? I
realized that I now need two types of documents to create one HTML page. Is
this better? In my opinion, yes. Separating content from presentation can only
increase site maintainability.
In general, if you have done a good job analyzing the problem domain (the data
needs or structure of the XML documents that will be used in your site) the
application will be fairly stable, requiring only the inevitable adjustment for
changes in business processes. With a separation of presentation from content,
the presentation is more likely to change than the content. However, separating
content from presentation in the XML world doesn't come without a price. You
have to learn to code and debug the XSLT language with its template and match
commands. The Microsoft MSDN Web site (msdn.microsoft.com/xml)
has many examples of XSLT. Debugging XSLT can be difficult without a tool. The
one that I use is Microsoft's XSL Debugger, downloadable from
http://msdn.microsoft.com/workshop/c-frame.htm?/workshop/xml/index.asp.
Site maintainability has increased dramatically. Because data is contained
within XML documents, content can be added, removed, or adjusted without
ruining the presentation of that content. If you want to add a page to the
site, all you have to do is create a page entry in the Master.xml and set its
active property to Yes. Of course, you must build the corresponding XML and
XSLT pieces of the page before turning this on. To change the presentation of a
page, just redeploy the XSLT document that transforms it into HTML. Because of
the common menu concept, a change in the menu structure of the site is
automatically propagated throughout every page in the site. Finally, because of
the design pattern, you can deliver the entire content of your site with just
two ASP pages.
One extra benefit: the combined VBScript code of the application, including
comments, is less than 500 lines. More importantly, other than the
customization of a common site menu, these very same ASP pages can be used to
produce any site that you want to deploy. That is impressive. You can visit the
course Web site and check it out for yourself at
http://hol-nt1.raritanval.edu/cis227y/director.asp. Or, of course, you
can build and deploy your own site. You'll go to the head of the class.
Andrew C. Mayo is an adjunct professor at Raritan Valley Community College in New
Jersey and the principal of Carlton Software Solutions, Inc., an information
technology consulting company that provides customized development, mentoring,
and training to corporate clients on the practical application of XML
technology. In his spare time, he can be found skiing in Vermont with his wife
Joan. Reach Andrew at XMLProfessor@CarltonSolutions.com.