public class LuceneIndexToDict
extends Object
SpellWritingAnalyzer or SpellWritingFilter) since that will
grab non-stored as well as stored fields. Still, if that isn't an option or
if you simply want to test out spelling correction, after-the-fact dictionary
creation may be useful.

Constructor and Description |
---|
LuceneIndexToDict() |
Modifier and Type | Method and Description |
---|---|
static void | createDict(Directory indexDir, File dictDir) Read a Lucene index and make a spelling dictionary from it. |
static void | createDict(Directory indexDir, File dictDir, ProgressTracker prog) Read a Lucene index and make a spelling dictionary from it. |
static void | createDict(IndexReader indexReader, Analyzer analyzer, SpellWriter spellWriter, ProgressTracker prog) Read a Lucene index and make a spelling dictionary from it. |
static void | main(String[] args) Command-line interface for building a dictionary directly from a Lucene index without writing any code. |
static void | queueWords(IndexReader reader, Analyzer analyzer, SpellWriter writer, ProgressTracker prog) Re-tokenize all the words in stored fields within a Lucene index, and queue them to a spelling dictionary. |
public static void createDict(Directory indexDir, File dictDir) throws IOException
StopAnalyzer.ENGLISH_STOP_WORDS).
Parameters:
indexDir - directory containing the Lucene index
dictDir - directory to receive the spelling dictionary
Throws:
IOException
public static void createDict(Directory indexDir, File dictDir, ProgressTracker prog) throws IOException
StopAnalyzer.ENGLISH_STOP_WORDS).
Parameters:
indexDir - directory containing the Lucene index
dictDir - directory to receive the spelling dictionary
prog - tracker called periodically to display progress
Throws:
IOException
public static void createDict(IndexReader indexReader, Analyzer analyzer, SpellWriter spellWriter, ProgressTracker prog) throws IOException
StopAnalyzer.ENGLISH_STOP_WORDS).
Parameters:
indexReader - used to read fields from a Lucene index
analyzer - used to tokenize fields from the index; generally,
this should do minimal filtering, taking care to avoid substantive
token modification (such as stemming or depluralization). A good
choice is MinimalAnalyzer.
spellWriter - receives words to be added to the dictionary
prog - tracker called periodically to display progress
Throws:
IOException
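The advice to avoid stemming or depluralization can be illustrated with a small plain-Java sketch. It does not use the Analyzer or MinimalAnalyzer classes documented here; `AnalyzerChoiceDemo`, `minimalTokens`, `naiveStem`, and `stemmedTokens` are hypothetical names, and the stemmer is a deliberately crude assumption. The point is only that a dictionary built from stemmed tokens no longer contains the surface forms a user would actually type.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical illustration; not part of the documented library. */
public class AnalyzerChoiceDemo {

    /** Minimal filtering: lowercase and split on non-letter characters only. */
    public static Set<String> minimalTokens(String text) {
        Set<String> out = new LinkedHashSet<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (!token.isEmpty()) {
                out.add(token);
            }
        }
        return out;
    }

    /** Crude stemmer (an assumption, far simpler than any real stemmer). */
    public static String naiveStem(String word) {
        if (word.endsWith("ies")) {
            return word.substring(0, word.length() - 3) + "i";
        }
        if (word.endsWith("s") && !word.endsWith("ss")) {
            return word.substring(0, word.length() - 1);
        }
        return word;
    }

    /** Substantive token modification: every minimal token is stemmed. */
    public static Set<String> stemmedTokens(String text) {
        Set<String> out = new LinkedHashSet<>();
        for (String token : minimalTokens(text)) {
            out.add(naiveStem(token));
        }
        return out;
    }

    public static void main(String[] args) {
        String field = "Libraries hold archives";
        System.out.println(minimalTokens(field)); // [libraries, hold, archives]
        System.out.println(stemmedTokens(field)); // [librari, hold, archive]
    }
}
```

With the stemmed token set, a misspelling such as "librareis" could only be corrected to "librari", which is not a word a user would want suggested.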
public static void queueWords(IndexReader reader, Analyzer analyzer, SpellWriter writer, ProgressTracker prog) throws IOException
Parameters:
reader - used to read fields from a Lucene index
analyzer - used to tokenize fields from the index; generally,
this should do minimal filtering, taking care to avoid substantive
token modification (such as stemming or depluralization). A good
choice is MinimalAnalyzer.
writer - receives words to be added to the dictionary
prog - tracker called periodically to display progress
Throws:
IOException
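The general idea behind re-tokenizing stored fields and queuing the words can be sketched in plain Java. This is a rough stand-in under stated assumptions, not the library's implementation: it uses no Lucene or SpellWriter classes, `StoredFieldWordQueue` is a hypothetical name, the stored field values are passed in as plain strings, and the stop-word set is a small illustrative list rather than StopAnalyzer.ENGLISH_STOP_WORDS.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in; not part of the documented library. */
public class StoredFieldWordQueue {

    // Small illustrative stop-word set (an assumption, not the library's list).
    private static final Set<String> STOP_WORDS =
            Set.of("a", "an", "and", "of", "the", "to");

    // Words queued so far with occurrence counts, in first-seen order.
    private final Map<String, Integer> counts = new LinkedHashMap<>();

    /** Lowercase, split on non-letters, and drop stop words; no stemming. */
    public static List<String> tokenize(String fieldValue) {
        List<String> tokens = new ArrayList<>();
        for (String raw : fieldValue.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (!raw.isEmpty() && !STOP_WORDS.contains(raw)) {
                tokens.add(raw);
            }
        }
        return tokens;
    }

    /** Re-tokenize each stored field value and queue the resulting words. */
    public void queueWords(List<String> storedFieldValues) {
        for (String value : storedFieldValues) {
            for (String word : tokenize(value)) {
                counts.merge(word, 1, Integer::sum);
            }
        }
    }

    public Map<String, Integer> counts() {
        return counts;
    }

    public static void main(String[] args) {
        StoredFieldWordQueue queue = new StoredFieldWordQueue();
        queue.queueWords(List.of("The History of Libraries",
                                 "Libraries and the Archive"));
        System.out.println(queue.counts()); // {history=1, libraries=2, archive=1}
    }
}
```

Only text that was stored in the index is available to a pass like this, which is why the class description notes that index-time dictionary creation also captures non-stored fields.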
public static void main(String[] args)