org.cdlib.xtf.textIndexer
Class TextIndexer

Object
  extended by TextIndexer

public class TextIndexer
extends Object

This class is the main class for the TextIndexer program.

Internally, this class retrieves command line arguments, and processes them in order to index source XML files into one or more Lucene databases. The command line arguments required by the TextIndexer program are as follows:

TextIndexer -config CfgFilePath { {-clean|-incremental}? {-trace errors|warnings|info|debug}? -index IndexName }+
The -config argument identifies an XML configuration file that defines one or more indices to be created, updated, or deleted. This argument must be the first argument passed, and it must be passed only once. For a complete description of the contents of the configuration file, see the XMLConfigParser class.

The -clean / -incremental argument is an optional argument that specifies whether Lucene indices should be rebuilt from scratch (-clean) or should be updated (-incremental). If this argument is not specified, the default behavior is incremental.

The -buildlazy / -nobuildlazy argument is an optional argument that specifies whether the indexer should build a persistent ("lazy") version of each document during the indexing process. The lazy files are stored in the index directory, and they speed dynaXML access later. If this argument is not specified, the default behavior is to build lazy versions of the documents.

The -optimize / -nooptimize argument is an optional argument that specifies whether the indexer should optimize the indexes after they are built. Optimization improves query speed, but can take a very long time to complete depending on the index size. If this argument is not specified, the default behavior is to optimize.

The -trace argument is an optional argument that sets the level of output displayed by the text indexer. The output levels are defined as follows:
errors - Only error messages are displayed.
warnings - Both error and warning messages are displayed.
info - Error, warning, and informational messages are displayed.
debug - Low level debug output is displayed in addition to error, warning and informational messages.

If this argument is not specified, the TextIndexer defaults to displaying informational (info) level messages.
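
The cumulative nature of the four output levels can be sketched with an ordered enum. This is only an illustration: the `TraceLevel` type and its `fromArg` and `shows` methods are hypothetical names, not part of the TextIndexer API, and the real indexer may represent levels differently.

```java
import java.util.Locale;

// Hypothetical enum for illustration; declared from least to most verbose.
enum TraceLevel {
    ERRORS, WARNINGS, INFO, DEBUG;

    // The documentation states that info is the default level.
    static final TraceLevel DEFAULT = INFO;

    // Maps the word following -trace to a level; null means "not specified".
    static TraceLevel fromArg(String name) {
        if (name == null) {
            return DEFAULT;
        }
        // valueOf throws IllegalArgumentException for an unknown level name.
        return valueOf(name.toUpperCase(Locale.ROOT));
    }

    // A message is displayed when its level is at or below the configured one.
    boolean shows(TraceLevel messageLevel) {
        return messageLevel.ordinal() <= this.ordinal();
    }
}
```

Under this sketch, a configured level of warnings displays error and warning messages but suppresses informational and debug output.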

The -index argument identifies the name of the index to be created or updated. The name must be one of the index names contained in the configuration file specified as the first parameter. As mentioned above, the -config parameter must be specified first. After that, the remaining arguments may be used one or more times to update a single index or multiple indices.


A simple example of command line parameters for the TextIndexer might look like this:

TextIndexer -config IdxConfig.xml -clean -index AllText
This example assumes that the config file is called IdxConfig.xml, that the config file contains an entry for an index called AllText, and that the user wants the index to be rebuilt from scratch (because of the -clean argument).
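
The command line grammar described above can be sketched as a small parser. This is a minimal illustration, not the TextIndexer's actual parsing code: the `ArgSketch`, `IndexRequest`, and `Parsed` names are hypothetical, only the options shown in the usage line are handled, and resetting the options to their defaults after each -index is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the documented grammar:
//   -config CfgFilePath { {-clean|-incremental}? {-trace level}? -index IndexName }+
final class ArgSketch {
    // One group of options closed by an -index argument.
    static final class IndexRequest {
        final String indexName;
        final boolean clean;
        final String traceLevel;

        IndexRequest(String indexName, boolean clean, String traceLevel) {
            this.indexName = indexName;
            this.clean = clean;
            this.traceLevel = traceLevel;
        }
    }

    static final class Parsed {
        final String configPath;
        final List<IndexRequest> requests;

        Parsed(String configPath, List<IndexRequest> requests) {
            this.configPath = configPath;
            this.requests = requests;
        }
    }

    private static final List<String> LEVELS =
        Arrays.asList("errors", "warnings", "info", "debug");

    static Parsed parse(String[] args) {
        // -config must come first and must be followed by a path.
        if (args.length < 2 || !args[0].equals("-config")) {
            throw new IllegalArgumentException("-config must be the first argument");
        }
        String configPath = args[1];
        List<IndexRequest> requests = new ArrayList<>();
        boolean clean = false;   // documented default: incremental
        String trace = "info";   // documented default: info

        for (int i = 2; i < args.length; i++) {
            String arg = args[i];
            if (arg.equals("-clean")) {
                clean = true;
            } else if (arg.equals("-incremental")) {
                clean = false;
            } else if (arg.equals("-trace") || arg.equals("-index")) {
                if (++i >= args.length) {
                    throw new IllegalArgumentException(arg + " requires a value");
                }
                if (arg.equals("-trace")) {
                    if (!LEVELS.contains(args[i])) {
                        throw new IllegalArgumentException("Unknown trace level: " + args[i]);
                    }
                    trace = args[i];
                } else {
                    requests.add(new IndexRequest(args[i], clean, trace));
                    // Assumption: options revert to defaults for the next group.
                    clean = false;
                    trace = "info";
                }
            } else if (arg.equals("-config")) {
                throw new IllegalArgumentException("-config may be passed only once");
            } else {
                throw new IllegalArgumentException("Unknown argument: " + arg);
            }
        }
        if (requests.isEmpty()) {
            throw new IllegalArgumentException("At least one -index argument is required");
        }
        return new Parsed(configPath, requests);
    }
}
```

Parsing the example command line with this sketch would yield the config path IdxConfig.xml and a single request for the index AllText with the clean flag set and the default info trace level.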


Field Summary
static String CURRENT_VERSION
          The version of the text indexer (placed into any indexes created)
static String REQUIRED_VERSION
          The minimum index version that we can read and append to
static String SHOW_VERSION
          The version to be shown to the user (does not need to string-compare as higher than the previous version)
 
Constructor Summary
TextIndexer()
           
 
Method Summary
private static void doIndexing(IndexerConfig cfgInfo, File xtfHomeFile)
          Handles the main work of adding and removing documents to/from the index.
private static void doRotation(IndexerConfig cfgInfo)
          Rotates a rotation-enabled index.
private static void doValidation(IndexerConfig cfgInfo)
           
static void main(String[] args)
          Main entry-point for the Text Indexer.
private static SubDirFilter makeSubDirFilter(File srcRootFile, IndexerConfig cfgInfo)
          Create a subdirectory filter, using the specified source root directory and the given configuration info.
private static void renameOrElse(File from, File to)
          Utility function to perform a rename, and throw an exception if it fails.
private static void writeScanDirs(File indexFile, IndexInfo idxInfo)
          Append the current subdirectories we're about to scan to the scanDirs.list file.
 
Methods inherited from class Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

SHOW_VERSION

public static final String SHOW_VERSION
The version to be shown to the user (does not need to string-compare as higher than the previous version)

See Also:
Constant Field Values

CURRENT_VERSION

public static final String CURRENT_VERSION
The version of the text indexer (placed into any indexes created)

See Also:
Constant Field Values

REQUIRED_VERSION

public static final String REQUIRED_VERSION
The minimum index version that we can read and append to

See Also:
Constant Field Values
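
The field descriptions suggest that index versions are checked by string comparison. A minimal sketch of such a check follows, assuming plain lexicographic comparison; the `VersionSketch` class, the `canAppendTo` method, and the version values are hypothetical placeholders, not the constants' real values.

```java
// Hypothetical sketch; the version strings below are placeholders only.
final class VersionSketch {
    static final String CURRENT_VERSION = "2.0";
    static final String REQUIRED_VERSION = "1.5";

    // Assumption: an existing index is usable when its stored version
    // compares, as a string, at or above the minimum required version.
    static boolean canAppendTo(String indexVersion) {
        return indexVersion.compareTo(REQUIRED_VERSION) >= 0;
    }
}
```

Because `String.compareTo` is lexicographic, a version such as "10.0" would sort below "9.0" under this scheme, which is one reason a separate user-facing version that need not sort correctly can be useful.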
Constructor Detail

TextIndexer

public TextIndexer()
Method Detail

main

public static void main(String[] args)
Main entry-point for the Text Indexer.

This function takes the command line arguments passed and uses them to create or update the specified indices with the specified source text.

Parameters:
args - Command line arguments to process. The command line arguments required by the TextIndexer program are as follows:
TextIndexer -config CfgFilePath { {-clean|-incremental}? {-trace errors|warnings|info|debug}? -index IndexName }+
For a complete description of each command line argument, see the TextIndexer class description.


doIndexing

private static void doIndexing(IndexerConfig cfgInfo,
                               File xtfHomeFile)
                        throws Exception
Handles the main work of adding and removing documents to/from the index.

Throws:
Exception

writeScanDirs

private static void writeScanDirs(File indexFile,
                                  IndexInfo idxInfo)
                           throws IOException
Append the current subdirectories we're about to scan to the scanDirs.list file. This file is used in incremental index rotation to figure out which data and lazy subdirectories need to be scanned for changes.

Throws:
IOException
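
Appending to a list file such as scanDirs.list can be sketched as below. This is an assumption-laden illustration: the `ScanDirsSketch` class and `appendScanDirs` method are hypothetical, the file is assumed to live directly in the given directory, and the one-entry-per-line format is a guess rather than the documented file layout.

```java
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.util.List;

// Hypothetical sketch of appending subdirectory names to scanDirs.list.
final class ScanDirsSketch {
    static void appendScanDirs(File indexDir, List<String> subDirs) throws IOException {
        File listFile = new File(indexDir, "scanDirs.list");
        // The second FileWriter argument (true) opens the file in append mode,
        // so entries written by earlier runs are preserved.
        try (BufferedWriter out = new BufferedWriter(new FileWriter(listFile, true))) {
            for (String dir : subDirs) {
                out.write(dir);
                out.newLine(); // assumed format: one subdirectory per line
            }
        }
    }
}
```

Opening the file in append mode is what lets successive incremental runs accumulate the set of subdirectories that later need to be scanned.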

makeSubDirFilter

private static SubDirFilter makeSubDirFilter(File srcRootFile,
                                             IndexerConfig cfgInfo)
Create a subdirectory filter, using the specified source root directory and the given configuration info.


doRotation

private static void doRotation(IndexerConfig cfgInfo)
                        throws IOException
Rotates a rotation-enabled index.

Throws:
IOException - if anything goes wrong

doValidation

private static void doValidation(IndexerConfig cfgInfo)
                          throws IOException
Throws:
IOException

renameOrElse

private static void renameOrElse(File from,
                                 File to)
                          throws IOException
Utility function to perform a rename, and throw an exception if it fails.

Throws:
IOException
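
A rename-or-throw helper of this kind can be sketched as follows. The `RenameSketch` class and `renameOrThrow` method are hypothetical names used for illustration, and the exception message is an assumption; only `File.renameTo` is a real library call.

```java
import java.io.File;
import java.io.IOException;

// Hypothetical sketch of a rename helper that fails loudly.
final class RenameSketch {
    static void renameOrThrow(File from, File to) throws IOException {
        // File.renameTo reports failure only through its boolean result,
        // so the result is converted into an exception here.
        if (!from.renameTo(to)) {
            throw new IOException("Unable to rename '" + from + "' to '" + to + "'");
        }
    }
}
```

Wrapping the call this way keeps callers from silently ignoring a failed rename, which matters when directories are being swapped during index rotation.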