Class Summary | |
---|---|
AccentFoldingFilter | Improves query results by removing diacritics, converting accented characters to their unaccented equivalents. |
CrimsonBugWorkaround | There's a very nasty bug in the Apache Crimson XML parser. |
CrimsonBugWorkaround.BlockEnum | Presents the input stream as a series of blocks of data |
FacetTokenizer | Performs special tokenization for facet fields. |
HTMLIndexSource | Transforms an HTML file to a single-record XML file. |
HTMLToString | This class provides a single static convert() method that converts an HTML file into an XML string that can be pre-filtered and added to a Lucene database by the XMLTextProcessor class. |
IdxTreeCleaner | This class purges "incomplete" documents from a Lucene index. |
IdxTreeCuller | This class provides a simple mechanism for removing documents from an index when the source text no longer exists in the document library. |
IdxTreeDictMaker | This class provides a simple mechanism for generating a spelling correction dictionary after new documents have been added or updated. |
IdxTreeOptimizer | This class provides a simple mechanism for optimizing Lucene indices after new documents have been added, updated, or removed. |
IndexDump | This class dumps the contents of user-selected fields from an XTF text index. |
IndexerConfig | This class records configuration information about the current state of the TextIndexer application. |
IndexInfo | This class maintains configuration information about the current index that the TextIndexer program is processing. |
IndexMerge | This class merges the contents of two or more XTF indexes, with certain caveats. |
IndexMerge.DirInfo | |
IndexRecord | A single record within an IndexSource. |
IndexSource | Represents a single source of data for an XTF index. |
IndexStats | This class calculates and prints out some useful statistics about an existing index, such as number of documents, size, etc. |
MARCIndexSource | Supplies MARC data to an XTF index, breaking it up into individual MARCXML records. |
MSWordIndexSource | Transforms a Microsoft Word file to a single-record XML file. |
PDFIndexSource | Transforms a PDF file to a single-record XML file. |
PDFToString | This class provides a single static convert() method that converts the text in a PDF file into an XML string that can be pre-filtered and added to a Lucene database by the XMLTextProcessor class. |
PluralFoldingFilter | Improves query results by converting plural words to their singular forms. |
SectionInfo | This class maintains information about the current section in a text document that the TextIndexer program is processing. |
SectionInfoStack | This class maintains information about the current nesting of sections in a text document that the TextIndexer program is processing. |
SpellWritingFilter | Adds words from the token stream to a SpellWriter. |
SrcTreeProcessor | This class is the main processing shell for files in the source text tree. |
SrcTreeProcessor.CacheEntry | One entry in the docSelector cache |
StartEndFilter | Ensures that the tokens at the start and end of the stream are indexed both with and without the special start-of-field/end-of-field markers. |
StructuredFileProxy | Used to put off actually creating a structured store until it is needed. |
TagFilter | Spots XML elements in a token stream and marks them specially. |
TextIndexer | This class is the main class for the TextIndexer program. |
TextIndexSource | Transforms a plain text file to a single-record XML file. |
XMLConfigParser | This class parses TextIndexer configuration XML files. |
XMLIndexSource | Supplies a single file containing a single record to the XMLTextProcessor. |
XMLTextProcessor | This class performs the actual parsing of the XML source text files and generates index information for it. |
XtfSpecialTokensFilter | The XtfSpecialTokensFilter class is used by the XTFTextAnalyzer class to convert special "bump" count values in text chunks to actual position increments for words prior to adding them to a Lucene index. |
XTFTextAnalyzer | The XTFTextAnalyzer class performs the task of breaking up a contiguous chunk of text into a list of separate words (tokens, in Lucene parlance). |
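
To illustrate the kind of normalization that AccentFoldingFilter is described as performing, here is a minimal sketch that strips diacritics from a single term using the JDK's `java.text.Normalizer`. It is not the XTF implementation, which works on a token stream; the class name `AccentFoldingSketch` and the method `fold` are hypothetical names chosen for this example.

```java
import java.text.Normalizer;

public class AccentFoldingSketch {
    /**
     * Hypothetical helper: decomposes each character into a base letter plus
     * combining marks (NFD), then removes the combining marks.
     */
    public static String fold(String term) {
        String decomposed = Normalizer.normalize(term, Normalizer.Form.NFD);
        // \p{M} matches Unicode combining marks such as acute accents.
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        // "caf\u00e9" is "cafe" with an acute accent on the final letter.
        System.out.println(fold("caf\u00e9"));
    }
}
```

Folding both indexed text and query terms the same way is what lets an unaccented query match accented text. Characters that do not decompose into a base letter plus a mark (for example "ø") pass through this sketch unchanged.
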
Exception Summary | |
---|---|
TextIndexerException | This exception is thrown by classes related to the textIndexer tool. |
Contains all the classes that make up the textIndexer tool.
The TextIndexer class is the main command-line interface, while XMLTextProcessor does most of the heavy lifting: scanning documents, breaking them into chunks, and passing the chunks to Lucene.
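
The scan, chunk, and hand-off flow described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the XMLTextProcessor code: the class `ChunkingSketch`, its methods, and the fixed `chunkSize` parameter are hypothetical, and a plain `Consumer<String>` stands in for the step that would add each chunk to a Lucene index.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

public class ChunkingSketch {
    /** Splits text on whitespace and groups the words into chunks of at most chunkSize words. */
    public static List<String> chunk(String text, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<String> chunks = new ArrayList<>();
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return chunks; // nothing to index
        }
        String[] words = trimmed.split("\\s+");
        for (int start = 0; start < words.length; start += chunkSize) {
            int end = Math.min(start + chunkSize, words.length);
            chunks.add(String.join(" ", Arrays.copyOfRange(words, start, end)));
        }
        return chunks;
    }

    /** Passes each chunk to a sink; the sink is an assumed placeholder for the indexing step. */
    public static void indexDocument(String text, int chunkSize, Consumer<String> sink) {
        for (String c : chunk(text, chunkSize)) {
            sink.accept(c);
        }
    }

    public static void main(String[] args) {
        indexDocument("the quick brown fox jumps", 2, System.out::println);
    }
}
```

Keeping the sink abstract avoids tying the sketch to any particular Lucene version; a real indexer would replace it with code that builds and adds index documents.
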