org.cdlib.xtf.textIndexer
Class TextIndexer

Object
  extended by TextIndexer

public class TextIndexer
extends Object

This class is the main class for the TextIndexer program.

Internally, this class retrieves the command line arguments and processes them to index source XML files into one or more Lucene databases. The command line arguments required by the TextIndexer program are as follows:

TextIndexer -config CfgFilePath { {-clean|-incremental}? {-trace errors|warnings|info|debug}? -index IndexName }+
The -config argument identifies an XML configuration file that defines one or more indices to be created, updated, or deleted. This argument must be the first argument passed, and it must be passed only once. For a complete description of the contents of the configuration file, see the XMLConfigParser class.

The -clean / -incremental argument is an optional argument that specifies whether Lucene indices should be rebuilt from scratch (-clean) or should be updated (-incremental). If this argument is not specified, the default behavior is incremental.

The -buildlazy / -nobuildlazy argument is an optional argument that specifies whether the indexer should build a persistent ("lazy") version of each document during the indexing process. The lazy files are stored in the index directory, and they speed dynaXML access later. If this argument is not specified, the default behavior is to build lazy versions of the documents.

The -optimize / -nooptimize argument is an optional argument that specifies whether the indexer should optimize the indexes after they are built. Optimization improves query speed, but can take a very long time to complete depending on the index size. If this argument is not specified, the default behavior is to optimize.

The -trace argument is an optional argument that sets the level of output displayed by the text indexer. The output levels are defined as follows:
errors - Only error messages are displayed.
warnings - Both error and warning messages are displayed.
info - Error, warning, and informational messages are displayed.
debug - Low level debug output is displayed in addition to error, warning and informational messages.

If this argument is not specified, the TextIndexer defaults to displaying informational (info) level messages.
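The cumulative behavior of the four trace levels can be illustrated with a small Java sketch. The enum name TraceLevel and its methods parse and shows are hypothetical, chosen only for this illustration; they are not taken from the XTF source and may not match how the TextIndexer actually filters output.

```java
import java.util.Locale;

/** Hypothetical sketch of the four -trace levels; not the XTF implementation. */
enum TraceLevel {
    // Declared from least to most verbose, matching the list above.
    ERRORS, WARNINGS, INFO, DEBUG;

    /** Level used when -trace is not given, per the documented default. */
    static final TraceLevel DEFAULT = INFO;

    /** Maps a command line value such as "warnings" to a level. */
    static TraceLevel parse(String arg) {
        return valueOf(arg.toUpperCase(Locale.ROOT));
    }

    /** True if a message of the given level is displayed at this setting. */
    boolean shows(TraceLevel messageLevel) {
        return messageLevel.ordinal() <= this.ordinal();
    }
}
```

With this ordering, a setting of warnings displays error and warning messages but suppresses info and debug output.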

The -index argument identifies the name of the index to be created or updated. The name must be one of the index names contained in the configuration file specified as the first parameter. As mentioned above, the -config parameter must be specified first. After that, the remaining arguments may be used one or more times to update a single index or multiple indices.


A simple example of command line parameters for the TextIndexer might look like this:

TextIndexer -config IdxConfig.xml -clean -index AllText
This example assumes that the config file is called IdxConfig.xml, that the config file contains an entry for an index called AllText, and that the user wants the index to be rebuilt from scratch (because of the -clean argument).
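The argument grammar can be made concrete with a minimal parsing sketch. The class IndexerArgsSketch and its nested IndexRequest are hypothetical names invented for this illustration and are not part of XTF. The sketch covers only the arguments shown in the usage line (-config, -clean, -incremental, -trace, -index). It also assumes that options return to their defaults after each -index, which the description above does not state.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the documented argument grammar; not the XTF parser. */
class IndexerArgsSketch {

    /** One -index request together with the options that preceded it. */
    static class IndexRequest {
        final String indexName;
        final boolean clean;      // false means incremental, the documented default
        final String traceLevel;  // "info" is the documented default

        IndexRequest(String indexName, boolean clean, String traceLevel) {
            this.indexName = indexName;
            this.clean = clean;
            this.traceLevel = traceLevel;
        }
    }

    String configPath;
    final List<IndexRequest> requests = new ArrayList<>();

    static IndexerArgsSketch parse(String[] args) {
        // -config must be the first argument and must appear only once.
        if (args.length < 2 || !args[0].equals("-config")) {
            throw new IllegalArgumentException("-config CfgFilePath must come first");
        }
        IndexerArgsSketch parsed = new IndexerArgsSketch();
        parsed.configPath = args[1];

        boolean clean = false;
        String trace = "info";
        for (int i = 2; i < args.length; i++) {
            switch (args[i]) {
                case "-clean":
                    clean = true;
                    break;
                case "-incremental":
                    clean = false;
                    break;
                case "-trace":
                    trace = valueAfter(args, i);
                    i++; // skip the value just consumed
                    break;
                case "-index":
                    parsed.requests.add(new IndexRequest(valueAfter(args, i), clean, trace));
                    i++; // skip the value just consumed
                    // Assumption: options reset to defaults for the next group.
                    clean = false;
                    trace = "info";
                    break;
                default:
                    // A repeated -config also lands here and is rejected.
                    throw new IllegalArgumentException("Unexpected argument: " + args[i]);
            }
        }
        if (parsed.requests.isEmpty()) {
            throw new IllegalArgumentException("At least one -index IndexName is required");
        }
        return parsed;
    }

    /** Returns the value following the option at position i, or fails if it is missing. */
    private static String valueAfter(String[] args, int i) {
        if (i + 1 >= args.length) {
            throw new IllegalArgumentException("Missing value after " + args[i]);
        }
        return args[i + 1];
    }
}
```

Parsing the example arguments above with this sketch yields a single request for the AllText index with clean set to true and the trace level left at info.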


Field Summary
static String CURRENT_VERSION
          The version of the text indexer (placed into any indexes created).
static String REQUIRED_VERSION
          The minimum index version that we can read
 
Constructor Summary
TextIndexer()
           
 
Method Summary
static void main(String[] args)
          Main entry-point for the Text Indexer.
 
Methods inherited from class Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

CURRENT_VERSION

public static final String CURRENT_VERSION
The version of the text indexer (placed into any indexes created).

See Also:
Constant Field Values

REQUIRED_VERSION

public static final String REQUIRED_VERSION
The minimum index version that we can read

See Also:
Constant Field Values
Constructor Detail

TextIndexer

public TextIndexer()
Method Detail

main

public static void main(String[] args)
Main entry-point for the Text Indexer.

This function takes the command line arguments passed and uses them to create or update the specified indices with the specified source text.

Parameters:
args - Command line arguments to process. The command line arguments required by the TextIndexer program are as follows:
TextIndexer -config CfgFilePath { {-clean|-incremental}? {-trace errors|warnings|info|debug}? -index IndexName }+
For a complete description of each command line argument, see the TextIndexer class description.