
Debugging XTF Stylesheets

Table of Contents

Debugging XTF Stylesheets
Debug Step Mode
Using "Raw" Mode
One Frame at a Time
Using <xsl:message>
Turning on "debug" log mode
Running Saxon from the Command-Line
Indexing Subsets of Data
Since XTF involves a fair amount of customization in XSLT, you're likely to create stylesheet bugs as you develop. This section covers ways to efficiently track down the source of a problem. In particular, we focus here on debugging strategies that can be used in almost any XTF installation, without external tools. Integrated development environments such as <oXygen/> and Eclipse can often provide more powerful debugging tools; setting up these environments to work with XTF is covered in the section on External Tools.

Debug Step Mode

A great way to get a hands-on feel for how crossQuery works is to use the built-in "Debug Step" mode. Simply add ;debugStep=1 to any search URL. The generated web page will let you step through the entire process, with detailed explanations and real data. The first step looks like this:
[Figure: SearchDebugStep.gif]
Clicking on the links allows you to step through the process and see the input and output of each stage in the crossQuery data flow, from the initial parameters to the query parser to the result formatter.

Currently, dynaXML doesn't support a debug step mode, mainly because doubling the number of frames in the already frame-heavy display mode would be quite clumsy. However, this might be added to dynaXML at some point in the future. Still, the data flow in dynaXML is much the same as in crossQuery, so learning one will help you to understand the other.

Using "Raw" Mode

Both crossQuery and dynaXML support a special "raw" mode, useful if you just want to know the exact XML input that's being sent to the formatter (Result Formatter or Document Formatter, respectively).

This mode is activated by simply adding ;raw=1 to an XTF URL in your browser. Instead of calling the formatter stylesheet, XTF sends the full XML directly to your browser window. Be warned that, depending on the size of the request (number of hits requested in crossQuery or size of the document in dynaXML), the output may be quite large, and your web browser may take some time to process all the data before displaying it.
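
If you would rather capture the raw XML in a file than view it in the browser, something like the following sketch may help. It assumes curl is installed and that the URL already contains its query part; the function name save_raw is hypothetical and not part of XTF.

```shell
# Hypothetical helper: fetch an XTF URL in raw mode and save the XML.
#   $1 = a crossQuery or dynaXML URL (assumed to already contain its "?..." part)
#   $2 = file that receives the raw XML
save_raw() {
    # Appending ;raw=1 asks XTF to skip the formatter stylesheet.
    curl --silent --show-error --output "$2" "$1;raw=1"
}
```

For example, save_raw 'http://localhost:8080/xtf-sf/view?docId=tei/ft958009mm/ft958009mm.xml' raw.xml would write the formatter input for that document to raw.xml, adjusting host, webapp name, and docId to your installation.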

One Frame at a Time

Because dynaXML uses multiple frames (table of contents, button bar, content) it can be difficult to isolate problems using any of the methods below. That's because there are actually four requests made by the browser to the servlet: first, to get the frame set, and then three additional requests, one for each frame. To make matters even more interesting, these latter three requests are processed simultaneously — in parallel!

To simplify debugging, you can append a doc.view parameter to the URL string to select the particular frame you're trying to debug. For the default document formatter that comes with XTF, here are the values you can add to the URL:
;doc.view=toc        Table of contents (left-hand frame)
;doc.view=bbar       Button bar (top frame)
;doc.view=content    Main content page (right frame)
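
As a small convenience, the three single-frame URLs for a document can be generated in one go. This is only a sketch: frame_urls is a hypothetical name, and the view names are those of the default document formatter listed above.

```shell
# Hypothetical helper: print one URL per frame of the default document formatter.
#   $1 = base dynaXML URL, e.g. http://localhost:8080/xtf-sf/view?docId=...
frame_urls() {
    for view in toc bbar content; do
        printf '%s;doc.view=%s\n' "$1" "$view"
    done
}

# Example base URL; substitute your own host, webapp name, and docId.
frame_urls 'http://localhost:8080/xtf-sf/view?docId=tei/ft958009mm/ft958009mm.xml'
```

Each printed URL can be opened on its own, so only one request reaches the servlet at a time.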

Using <xsl:message>

Often the behavior of an XSLT program can be mysterious, and you would really like to find out which templates are firing, or what the value of a variable is at a certain point. While IDEs (Integrated Development Environments) such as Oxygen excel at answering this sort of question, they do take time to set up and learn.

A simple, tried and true way to find out what's going on during stylesheet processing is to use <xsl:message> to print messages and/or variables when the stylesheet runs. Here's some sample code:
<xsl:variable name="myVar">
    <!-- ... complicated stuff here that isn't working ... -->
</xsl:variable>
<xsl:message>
    The value of myVar is: <xsl:copy-of select="$myVar"/>
</xsl:message>
The message, along with the value of $myVar, will come out in the servlet container's log file. This varies by container, but for Resin it's typically resin-dir/log/jvm-httpd.log, and for Tomcat it's typically tomcat-dir/logs/catalina.out. From a UNIX-style command prompt, you can interactively see the output of one of these files with a command like this:
$ tail -f tomcat-dir/logs/catalina.out

Turning on "debug" log mode

Another way to see the input and output of each stylesheet (and that works in both dynaXML and crossQuery) is to turn on debug logging in the servlet configuration. Edit the configuration file for the servlet you're working with (e.g. conf/dynaXML.conf or conf/crossQuery.conf), and look for this line:
<logging level="info"/>
Change info to debug, and then watch the output of your servlet container. This varies by container, but for Resin it's typically resin-dir/log/jvm-httpd.log, and for Tomcat it's typically tomcat-dir/logs/catalina.out. The XTF servlet will output messages showing the input and output of each stylesheet. Here is sample log output from a dynaXML request:
2006-10-09:13:07:42 Log level: debug
2006-10-09:13:07:42 Processing request: http://localhost:8080/xtf-sf/view?docId=tei/ft958009mm/ft958009mm.xml;doc.view=content
2006-10-09:13:07:43 StylesheetCache: Generated. Path=/Applications/Tomcat/webapps/xtf-sf/style/dynaXML/docReqParser.xsl
2006-10-09:13:07:43 *** docReqParser input ***
2006-10-09:13:07:43   <?xml version="1.0" encoding="UTF-8"?>
<parameters>
  <param name="docId" value="tei/ft958009mm/ft958009mm.xml">
    <token value="tei/ft958009mm/ft958009mm" isWord="yes"/>
    <token value="." isWord="no"/>
    <token value="xml" isWord="yes"/>
  </param>
</parameters>
2006-10-09:13:07:43 *** docReqParser output ***
2006-10-09:13:07:43   <?xml version="1.0" encoding="UTF-8"?>
<style path="style/dynaXML/docFormatter/tei/teiDocFormatter.xsl"/>
<source path="data/tei/ft958009mm/ft958009mm.xml"/>
<index configPath="conf/textIndexer.conf" name="default"/>
<auth access="allow" type="all"/>
2006-10-09:13:07:43 Processing auth spec: access=allow type=all
2006-10-09:13:07:43 Checking IP "127.0.0.1" vs reverse proxy IP "null"
2006-10-09:13:07:43 Auth allow all
2006-10-09:13:07:44 StylesheetCache: Generated. Path=/Applications/Tomcat/webapps/xtf-sf/style/dynaXML/docFormatter/tei/teiDocFormatter.xsl
2006-10-09:13:07:44 Latency: 1473 msec for request: http://localhost:8080/xtf-sf/view?docId=tei/ft958009mm/ft958009mm.xml;doc.view=content
As you can see, there is more than just the stylesheet input and output; the servlet includes debugging information on the various internal caches, the authentication process, and how long the request took to process. These pieces of information can be useful, though not as often. Also, observe that we added a parameter to debug only the content frame (as recommended in the section on debugging one frame at a time), to avoid mixing up all the frames simultaneously.
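
If you switch between info and debug often, the configuration edit described above can be scripted. The following is a minimal sketch, assuming the file contains the logging line exactly in the form shown earlier; set_log_level is a hypothetical helper, not an XTF command.

```shell
# Hypothetical helper: change the <logging level="..."/> value in a servlet config.
#   $1 = config file, e.g. conf/dynaXML.conf or conf/crossQuery.conf
#   $2 = new level, e.g. debug or info
set_log_level() {
    # Write to a temporary file first, since "sed -i" is not portable.
    sed "s|<logging level=\"[a-z]*\"/>|<logging level=\"$2\"/>|" "$1" > "$1.tmp" &&
        mv "$1.tmp" "$1"
}
```

Running set_log_level conf/dynaXML.conf debug turns debug logging on, and the same call with info restores the original setting once you are done.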

Running Saxon from the Command-Line

It is often useful, especially when writing formatters such as dynaXML's Document Formatter, to avoid the hassle of switching between the stylesheet, browser, and command line. You can run the Saxon XSLT processor (used internally by XTF) directly from the command line, which gives you a convenient way to quickly see the result of a stylesheet change.

Here is a sample of running Saxon from the command line:
$ cd $XTF_HOME
$ java -classpath WEB-INF/xtf.jar net.sf.saxon.Transform \
       data/tei/ft958009mm/ft958009mm.xml \
       style/dynaXML/docFormatter/tei/teiDocFormatter.xsl
 
<!DOCTYPE html
  PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd"> 
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8; charset=UTF-8" />
<title>The Opening of the Apartheid Mind</title></head>
<frameset rows="120,*">
<frame frameborder="1" scrolling="no" title="Navigation Bar" name="bbar"
...
If you redirect the XML output to a file, you won't get any results on your screen... unless you have added <xsl:message> to debug a particular section. In fact, combining command-line Saxon with <xsl:message> output is a powerful and fairly efficient debugging combination that can be used to track down many problems.
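
The redirect-plus-message combination can be wrapped in a small function. This is a sketch built on the Saxon command shown above; saxon_to_file is a hypothetical name, and it assumes you run it from $XTF_HOME.

```shell
# Hypothetical wrapper around the Saxon command shown above.
#   $1 = input XML file
#   $2 = stylesheet
#   $3 = file that receives the transformed output
saxon_to_file() {
    # Only standard output is redirected, so <xsl:message> text
    # (which Saxon writes to standard error) still appears on screen.
    java -classpath WEB-INF/xtf.jar net.sf.saxon.Transform "$1" "$2" > "$3"
}
```

For example, saxon_to_file data/tei/ft958009mm/ft958009mm.xml style/dynaXML/docFormatter/tei/teiDocFormatter.xsl out.html leaves the generated page in out.html and shows only your debugging messages in the terminal.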

The example above is for a single stylesheet — a dynaXML document formatter — but of course there are many other transformations in XTF. How do you obtain the proper input documents for these? Consult the table below:
Servlet      Stylesheet               How to get input data file
-----------  -----------------------  --------------------------
crossQuery   Query Router             Turn on debug step mode, grab step 1 file from browser using Save As...
crossQuery   Query Parser             Turn on debug step mode, grab step 2 file from browser using Save As...
crossQuery   Result Formatters        Turn on debug step mode, grab step 4 file from browser using Save As...
dynaXML      Document Request Parser  Turn on debug logging mode, copy/paste docReqParser input from the log file.
dynaXML      Document Formatters      Use raw mode, then grab marked up input from browser using Save As...
textIndexer  Document Selector        Turn on debug logging mode, grab docSelector input from log file.
textIndexer  Document Prefilter       Just use your unmodified XML source file. For non-XML sources, create a small prefilter that dumps the XML input file that XTF creates to the screen using <xsl:message>, then copy/paste that.
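
Copying a stylesheet's input out of a long log by hand gets tedious. The sketch below automates it with awk, based only on the format of the sample dynaXML log shown earlier (a "*** name input ***" marker, a timestamp on the first XML line, and a "*** name output ***" marker). The function name is hypothetical, and other log layouts may need a different pattern.

```shell
# Hypothetical helper: print the first "<name> input" block found in a debug log.
#   $1 = log file, e.g. tomcat-dir/logs/catalina.out
#   $2 = stylesheet name as it appears in the log, e.g. docReqParser
extract_stylesheet_input() {
    awk -v name="$2" '
        # Start copying after the input marker line.
        index($0, "*** " name " input ***")  { grab = 1; next }
        # Stop at the matching output marker (first request only).
        index($0, "*** " name " output ***") { if (grab) exit }
        # Strip the leading timestamp, if any, and print the XML line.
        grab { sub(/^[0-9]+-[0-9]+-[0-9]+:[0-9:]+ +/, ""); print }
    ' "$1"
}
```

Redirecting the result to a file, as in extract_stylesheet_input catalina.out docReqParser > docReqParser-input.xml, gives you an input document you can feed to Saxon on the command line.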

Indexing Subsets of Data

When debugging Document Prefilter stylesheets for use with the textIndexer, it can be very time-consuming to re-index your entire document collection every time you make a small change. You can achieve a much faster edit-run-fix cycle if you index only small subsets of your data. In other words, when you discover a bug, re-index just the one or few documents you need for debugging and testing, and once you've fixed the bug, re-index the entire set of content.

There are at least two ways to index small sets. The first is to set up a separate data directory and a new index configuration. An often-overlooked alternative is to use your original index configuration and data directory, but specify a single sub-directory of your data to index. Here's a sample command doing just that:
$ textIndexer -dir tei/ft958009mm -index default
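
When the test set spans a few sub-directories, the same command can be repeated in a loop. This is a sketch only: index_subdirs is a hypothetical helper, and it assumes each run adds to the existing index rather than replacing it, which is worth verifying on your installation.

```shell
# Hypothetical helper: run textIndexer once per listed data sub-directory.
#   $1 = index name; remaining arguments = sub-directories of the data directory
index_subdirs() {
    index_name=$1
    shift
    for dir in "$@"; do
        # Stop at the first failure so the error is easy to spot.
        textIndexer -dir "$dir" -index "$index_name" || return 1
    done
}
```

For example, index_subdirs default tei/ft958009mm followed by any further sub-directories re-indexes just those documents into the default index.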