org.cdlib.xtf.textIndexer
Class XMLConfigParser
Object
DefaultHandler
XMLConfigParser
- All Implemented Interfaces:
- ContentHandler, DTDHandler, EntityResolver, ErrorHandler
public class XMLConfigParser
- extends DefaultHandler
This class parses TextIndexer configuration XML files.
The TextIndexer uses a configuration file that describes one or more index
names. Each index description identifies the source text and Lucene database
directories associated with the index, and the chunk size and overlap for
the index.
The format of the configuration file is as follows:
<?xml version="1.0" encoding="utf-8"?>
<textIndexer-config>
    <index name="IndexName">
        <db path="LuceneIndexPath"/>
        <src path="XMLSourcePath"/>
        <chunk size="ChunkSize" overlap="ChunkOverlap"/>
        <skip files="*.xxx*, *.yyy, ... "/>
        <inputfilter path="XSLPreFilterFile"/>
    </index>
</textIndexer-config>
Each element should appear at most once for each index specified. If an
element is specified more than once for an index, the last one is used.
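The last-one-wins rule can be illustrated with a small SAX handler that simply overwrites its fields each time a `<chunk>` element is seen. This is a minimal sketch, not the XTF implementation: the class name `LastWinsSketch`, its fields, and the `-1` "not specified" marker are all assumptions made for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration class; not part of XTF.
class LastWinsSketch extends DefaultHandler {
    int chunkSize = -1;     // -1 means "not specified" (assumed convention)
    int chunkOverlap = -1;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if ("chunk".equals(qName)) {
            // Plain assignment: a later <chunk> overwrites an earlier one.
            String size = atts.getValue("size");
            String overlap = atts.getValue("overlap");
            if (size != null) chunkSize = Integer.parseInt(size.trim());
            if (overlap != null) chunkOverlap = Integer.parseInt(overlap.trim());
        }
    }

    // Parses the XML text and returns {chunkSize, chunkOverlap}.
    static int[] parseChunk(String xml)
            throws ParserConfigurationException, SAXException, IOException {
        LastWinsSketch handler = new LastWinsSketch();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return new int[] { handler.chunkSize, handler.chunkOverlap };
    }
}
```

Because the handler never checks whether a value was already set, an index block containing two `<chunk>` elements ends up with the values from the second one.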
A simple example of a TextIndexer config file might look as follows:
<?xml version="1.0" encoding="utf-8"?>
<textIndexer-config>
    <index name="AllText">
        <db path="./IndexDBs"/>
        <src path="./SourceText"/>
        <chunk size="100" overlap="50"/>
        <skip files="*.mets*, *AuthMech*"/>
        <inputfilter path="./BasicFilter.xsl"/>
    </index>
</textIndexer-config>
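To make the format concrete, the following sketch reads a config like the one above with the JDK's SAX parser and collects every attribute into a map. It is only an illustration under stated assumptions: `ConfigSketch` and the `"element.attribute"` key scheme are hypothetical and are not how XMLConfigParser stores its results.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical stand-in; not the XTF XMLConfigParser itself.
class ConfigSketch extends DefaultHandler {
    // Collected settings keyed as "element.attribute", e.g. "chunk.size".
    final Map<String, String> settings = new LinkedHashMap<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // A default SAXParserFactory is not namespace-aware, so the tag
        // and attribute names are taken from the qualified names.
        for (int i = 0; i < atts.getLength(); i++) {
            settings.put(qName + "." + atts.getQName(i), atts.getValue(i));
        }
    }

    // Parses config XML held in a string and returns the collected settings.
    static Map<String, String> parse(String xml)
            throws ParserConfigurationException, SAXException, IOException {
        ConfigSketch handler = new ConfigSketch();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.settings;
    }
}
```

Calling `ConfigSketch.parse(...)` on the example file's text yields entries such as `index.name`, `db.path`, `chunk.size`, and `chunk.overlap`, all as strings; converting the chunk values to integers is left to the caller in this sketch.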
- Notes:
- This class is derived from the SAX DefaultHandler class so that its
startElement() and endElement() methods can be called internally from the
Java SAXParser class.
To use this class, simply create an instance and then call its configure()
method.
Method Summary |
int |
configure(IndexerConfig cfgInfo)
This method parses a config file and stores the resulting parameters in
a config info structure. |
void |
endElement(String uri,
String localName,
String qName)
Method called when an end tag is encountered in the config file. |
void |
startElement(String uri,
String localName,
String qName,
Attributes atts)
Method called when a start tag is encountered in the config file. |
Methods inherited from class DefaultHandler |
characters, endDocument, endPrefixMapping, error, fatalError, ignorableWhitespace, notationDecl, processingInstruction, resolveEntity, setDocumentLocator, skippedEntity, startDocument, startPrefixMapping, unparsedEntityDecl, warning |
Methods inherited from class Object |
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
isConfigFile
private boolean isConfigFile
indexNameFound
private boolean indexNameFound
inNamedIndexBlock
private boolean inNamedIndexBlock
configInfo
private IndexerConfig configInfo
XMLConfigParser
public XMLConfigParser()
configure
public int configure(IndexerConfig cfgInfo)
throws Exception
- This method parses a config file and stores the resulting parameters in
a config info structure.
To read indexing configuration info, create an instance of this class and
call this method with a config structure whose cfgFilePath field holds the
path/name of the config file to read.
- Parameters:
cfgInfo
- Upon entry, a config structure with the path/name of the config file to
read in the cfgFilePath field. Upon return, the same config structure with
parameter values from the config file stored in their respective fields.
- Throws:
Exception
- Any internal exceptions generated while parsing the
configuration file.
- Notes:
- The format of the XML file is explained in greater detail in the description
for the XMLConfigParser class.
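The calling pattern described for configure(), a structure that carries the file path in and the parsed values out, might be sketched as follows. Everything here is a stand-in: `SketchConfig` is a hypothetical substitute for IndexerConfig (only the `cfgFilePath` field name comes from the documentation, `dbPath` and `srcPath` are assumed), and the meaning of the returned int (0 for success, -1 when no `<textIndexer-config>` root is seen) is an assumption, since the documentation does not say what the real return value means.

```java
import java.io.File;
import java.io.IOException;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical stand-in for IndexerConfig.
class SketchConfig {
    String cfgFilePath; // input: path of the config file to read
    String dbPath;      // output (assumed field name)
    String srcPath;     // output (assumed field name)
}

// Hypothetical illustration class; not part of XTF.
class ConfigureSketch extends DefaultHandler {
    private final SketchConfig cfg;
    private boolean sawRoot = false;

    private ConfigureSketch(SketchConfig cfg) {
        this.cfg = cfg;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if ("textIndexer-config".equals(qName)) {
            sawRoot = true;
        } else if ("db".equals(qName)) {
            cfg.dbPath = atts.getValue("path");
        } else if ("src".equals(qName)) {
            cfg.srcPath = atts.getValue("path");
        }
    }

    // Reads cfg.cfgFilePath and fills in the output fields of the same object.
    // Assumed return convention: 0 on success, -1 if the root element is missing.
    static int configure(SketchConfig cfg)
            throws ParserConfigurationException, SAXException, IOException {
        ConfigureSketch handler = new ConfigureSketch(cfg);
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new File(cfg.cfgFilePath), handler);
        return handler.sawRoot ? 0 : -1;
    }
}
```

A caller would set `cfgFilePath`, invoke `ConfigureSketch.configure(cfg)`, and then read `dbPath` and `srcPath` back from the same object.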
startElement
public void startElement(String uri,
String localName,
String qName,
Attributes atts)
throws SAXException
- Method called when a start tag is encountered in the config file.
This class is derived from the SAX DefaultHandler
class so that the parser can call this method each time a start tag is
encountered in the XML config file.
- Specified by:
startElement
in interface ContentHandler
- Overrides:
startElement
in class DefaultHandler
- Parameters:
uri
- The current namespace URI in use.
localName
- The local name (i.e., without prefix) of the current
element, or the empty string if namespace processing is
disabled.
qName
- The qualified name (i.e., with prefix) for the current
element, or the empty string if qualified names are
disabled.
atts
- The specified or defaulted attributes for the current
element. These consist of any xxx = "yyy"
style attributes for the element within the < and >.
- Throws:
SAXException
- Any internal exceptions generated due to
syntax problems in the element.
- Notes:
- For an explanation of the config file format, see the main description
for the XMLConfigParser class.
endElement
public void endElement(String uri,
String localName,
String qName)
throws SAXException
- Method called when an end tag is encountered in the config file.
This class is derived from the SAX DefaultHandler
class so that the parser can call this method each time an end tag
is encountered in the XML config file.
- Specified by:
endElement
in interface ContentHandler
- Overrides:
endElement
in class DefaultHandler
- Parameters:
uri
- The current namespace URI in use.
localName
- The local name (i.e., without prefix) of the current
element, or the empty string if namespace processing is
disabled.
qName
- The qualified name (i.e., with prefix) for the current
element, or the empty string if qualified names are
disabled.
- Throws:
SAXException
- Any internal exceptions generated due to
syntax problems in the element.
- Notes:
- For an explanation of the config file format, see the main description
for the XMLConfigParser class.
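How startElement() and endElement() can cooperate to pick one named index out of a file with several is sketched below. The flag names mirror the private fields listed for this class (isConfigFile, indexNameFound, inNamedIndexBlock), but the logic is an assumption about how such flags could be used, not the actual XTF code; `NamedIndexSketch`, `wantedIndex`, and `dbPath` are hypothetical names.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration class; not part of XTF.
class NamedIndexSketch extends DefaultHandler {
    private final String wantedIndex;  // name of the index to extract
    boolean isConfigFile = false;      // saw the <textIndexer-config> root
    boolean indexNameFound = false;    // saw an <index> with the wanted name
    boolean inNamedIndexBlock = false; // currently inside that <index>
    String dbPath;                     // <db path="..."> of the wanted index

    NamedIndexSketch(String wantedIndex) {
        this.wantedIndex = wantedIndex;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if ("textIndexer-config".equals(qName)) {
            isConfigFile = true;
        } else if ("index".equals(qName)) {
            // Enter the block only when the name attribute matches.
            inNamedIndexBlock = wantedIndex.equals(atts.getValue("name"));
            if (inNamedIndexBlock) indexNameFound = true;
        } else if (inNamedIndexBlock && "db".equals(qName)) {
            dbPath = atts.getValue("path");
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // Leaving any <index> block ends the region of interest.
        if ("index".equals(qName)) inNamedIndexBlock = false;
    }

    // Parses the XML text and returns the handler so its flags can be inspected.
    static NamedIndexSketch parse(String xml, String wantedIndex)
            throws ParserConfigurationException, SAXException, IOException {
        NamedIndexSketch handler = new NamedIndexSketch(wantedIndex);
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler;
    }
}
```

Tracking the block with a boolean set in startElement() and cleared in endElement() keeps the handler stateless beyond a few flags, which suits SAX's streaming model where no document tree is available to query afterwards.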