org.cdlib.xtf.textIndexer
Class SrcTreeProcessor

Object
  extended by SrcTreeProcessor

public class SrcTreeProcessor
extends Object

This class is the main processing shell for files in the source text tree. It optimizes Lucene database access by opening the index once at the beginning, processing every source file in the source tree (and skipping any XML files in the tree that are not source files), and closing the database once at the end.

Internally, this class uses the XMLTextProcessor class to actually split the source files up into chunks and add them to the Lucene index.
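
The open-once, process-everything, close-once pattern described above can be sketched as follows. This is a minimal illustration, not the XTF implementation: SketchIndex, LifecycleSketch, and the isSource predicate are hypothetical stand-ins invented for this example, and the real class delegates this work to XMLTextProcessor.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical stand-in for the index writer; NOT the XMLTextProcessor API.
class SketchIndex {
    final List<String> events = new ArrayList<>();
    private final List<String> queued = new ArrayList<>();
    private boolean open;

    void open() {
        open = true;
        events.add("open");
    }

    // Queue a file for indexing; nothing is written until flush().
    void queue(String file) {
        if (!open) throw new IllegalStateException("index is not open");
        queued.add(file);
    }

    // Write everything queued so far, then clear the queue.
    void flush() {
        for (String f : queued) events.add("index " + f);
        queued.clear();
    }

    void close() {
        open = false;
        events.add("close");
    }
}

class LifecycleSketch {
    // Opens the index once, queues only the files accepted by isSource
    // (an assumed skip rule), flushes the queue, and closes once.
    static List<String> run(List<String> files, Predicate<String> isSource) {
        SketchIndex index = new SketchIndex();
        index.open();
        try {
            for (String f : files) {
                if (isSource.test(f)) index.queue(f);  // others are skipped
            }
            index.flush();
        } finally {
            index.close();  // closed exactly once, even if processing fails
        }
        return index.events;
    }
}
```

The try/finally block is a design choice of this sketch: it keeps the single close paired with the single open even when a file fails part-way through.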


Field Summary
private  IndexerConfig cfgInfo
           
private  StringBuffer dirBuf
           
private  StringBuffer docBuf
           
private  DocSelCache docSelCache
           
private  File docSelCacheFile
           
private  Templates docSelector
           
private  String docSelPath
           
private  int nScanned
           
private  StylesheetCache stylesheetCache
           
private  XMLTextProcessor textProcessor
           
 
Constructor Summary
SrcTreeProcessor()
          Default constructor.
 
Method Summary
(package private)  String calcIndexPath()
           
 void close()
          Indexing close function.
 void loadCache(IndexerConfig cfgInfo)
          Load the previous docSelector cache.
 void open(IndexerConfig cfgInfo)
          Indexing open function.
 void processDir(File curDir, SubDirFilter subDirFilter, boolean topLevel)
          Process a directory containing source XML files.
 boolean processFile(String dir, EasyNode parentEl)
          Process file.
 void saveCache()
          Save the docSelector cache.
 
Methods inherited from class Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

cfgInfo

private IndexerConfig cfgInfo

textProcessor

private XMLTextProcessor textProcessor

stylesheetCache

private StylesheetCache stylesheetCache

docSelector

private Templates docSelector

nScanned

private int nScanned

docBuf

private StringBuffer docBuf

dirBuf

private StringBuffer dirBuf

docSelPath

private String docSelPath

docSelCacheFile

private File docSelCacheFile

docSelCache

private DocSelCache docSelCache

Constructor Detail

SrcTreeProcessor

public SrcTreeProcessor()
Default constructor.

Instantiates the XMLTextProcessor used internally to process individual XML source files.

Method Detail

open

public void open(IndexerConfig cfgInfo)
          throws Exception
Indexing open function.

Calls the XMLTextProcessor open() method to actually create/open the Lucene index.

Parameters:
cfgInfo - The IndexerConfig that identifies the Lucene index, source text tree, and other parameters required to perform indexing.

Throws:
IOException - Any I/O exceptions generated by the XMLTextProcessor open() method.

Exception

close

public void close()
           throws IOException
Indexing close function.

Calls the XMLTextProcessor processQueuedTexts() method to flush all the pending Lucene writes to disk. Then it calls the XMLTextProcessor close() method to actually close the Lucene index.

Throws:
IOException - Any I/O exceptions generated by the XMLTextProcessor close() method.


calcIndexPath

String calcIndexPath()

loadCache

public void loadCache(IndexerConfig cfgInfo)
Load the previous docSelector cache.

Parameters:
cfgInfo - The IndexerConfig that identifies the Lucene index, source text tree, and other parameters required to perform indexing.


saveCache

public void saveCache()
Save the docSelector cache.


processDir

public void processDir(File curDir,
                       SubDirFilter subDirFilter,
                       boolean topLevel)
                throws Exception
Process a directory containing source XML files.

This method iterates through a source directory's contents, indexing any valid files it finds and processing any sub-directories.

Parameters:
curDir - The current directory to be processed.
subDirFilter - Sub-dirs to scan, or null for all.
topLevel - true for the top-level directory, false otherwise.
Throws:
Exception - Any exceptions generated internally by the File class or the XMLTextProcessor class.
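
The recursive traversal described above might look roughly like the sketch below. It is an illustration under assumptions, not the actual processDir code: DirWalkSketch is a hypothetical class, a nullable Predicate<File> stands in for SubDirFilter (whose real interface is not shown here), and files are only collected by name rather than indexed.

```java
import java.io.File;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

class DirWalkSketch {
    // Collects the names of files under curDir, descending only into
    // sub-directories accepted by subDirFilter (null means "scan all").
    static void walk(File curDir, Predicate<File> subDirFilter, List<String> found) {
        File[] entries = curDir.listFiles();
        if (entries == null) return;   // not a directory, or unreadable
        Arrays.sort(entries);          // deterministic order by path
        for (File entry : entries) {
            if (entry.isDirectory()) {
                if (subDirFilter == null || subDirFilter.test(entry)) {
                    walk(entry, subDirFilter, found);
                }
            } else {
                found.add(entry.getName());  // a real indexer would process it here
            }
        }
    }
}
```

Passing null for the filter visits every sub-directory, which mirrors the "null for all" convention documented for subDirFilter.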


processFile

public boolean processFile(String dir,
                           EasyNode parentEl)
                    throws Exception
Process file.

This method processes a single source file, which may be a source text XML file, a PDF file, etc.

Parameters:
parentEl - DOM element representing the current file to be processed. This may be a source XML file, PDF file, etc.

Returns:
true if the document was processed, false if it was skipped by the skipping rules.

Throws:
Exception - Any exceptions generated internally by the File class or the XMLTextProcessor class.
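
A caller can use the boolean result to tell processed files from skipped ones. The sketch below shows one way to tally them; TallySketch and the Predicate<String> standing in for processFile are hypothetical and only mimic the documented true/false contract.

```java
import java.util.List;
import java.util.function.Predicate;

class TallySketch {
    // Returns {processed, skipped}. The processFile predicate is assumed to
    // return true when a file was handled and false when a skip rule applied.
    static int[] tally(List<String> files, Predicate<String> processFile) {
        int processed = 0;
        int skipped = 0;
        for (String f : files) {
            if (processFile.test(f)) {
                processed++;
            } else {
                skipped++;
            }
        }
        return new int[] { processed, skipped };
    }
}
```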