org.cdlib.xtf.textEngine
Class XtfChunkedWordIter
Object
BasicWordIter
ChunkedWordIter
XtfChunkedWordIter
- All Implemented Interfaces:
- Cloneable, WordIter
public class XtfChunkedWordIter
- extends ChunkedWordIter
Handles iterating over XTF's tokenized documents, including special
tracking of node numbers and word offsets.
- Author:
- Martin Haye
Constructor Summary
XtfChunkedWordIter(IndexReader reader, DocNumMap docNumMap, int mainDocNum, String field, Analyzer analyzer)
Construct the iterator and read in starting text from the given
chunk.
Method Summary
MarkPos getPos(int startOrEnd)
Create an uninitialized MarkPos structure
void getPos(MarkPos pos, int startOrEnd)
Get the position of the start of the current word
Methods inherited from class Object
equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
XtfChunkedWordIter
public XtfChunkedWordIter(IndexReader reader,
DocNumMap docNumMap,
int mainDocNum,
String field,
Analyzer analyzer)
- Construct the iterator and read in starting text from the given
chunk.
- Parameters:
reader
- where to read chunks from
docNumMap
- maps main doc num to chunk numbers
mainDocNum
- doc ID of the main document
field
- field to tokenize and iterate
analyzer
- used to tokenize the field
getPos
public MarkPos getPos(int startOrEnd)
- Create an uninitialized MarkPos structure
- Specified by:
getPos
in interface WordIter
- Overrides:
getPos
in class BasicWordIter
- Parameters:
startOrEnd
- FIELD_START for the very start of the field;
TERM_START for the first character of the word;
TERM_END for the last character of the word;
TERM_END_PLUS for the last character plus any trailing
punctuation and/or spaces;
FIELD_END for the very end of the field.
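The five startOrEnd modes can be pictured with a small toy that maps each mode to a character offset in a plain string. This is only a sketch: the constant values, the offsetFor helper, and the use of an exclusive end offset for TERM_END are assumptions made for illustration, and none of it is XTF's implementation.

```java
// Toy illustration only: the constant values and the offsetFor helper are
// hypothetical and are not part of XTF.
final class StartOrEndDemo {
    static final int FIELD_START = 0;
    static final int TERM_START = 1;
    static final int TERM_END = 2;
    static final int TERM_END_PLUS = 3;
    static final int FIELD_END = 4;

    /**
     * Returns a character offset into text for a word occupying
     * [wordStart, wordEnd), according to the requested mode.
     */
    static int offsetFor(String text, int wordStart, int wordEnd, int startOrEnd) {
        switch (startOrEnd) {
            case FIELD_START:
                return 0;
            case TERM_START:
                return wordStart;
            case TERM_END:
                return wordEnd;
            case TERM_END_PLUS: {
                // Skip trailing punctuation and spaces up to the next word.
                int i = wordEnd;
                while (i < text.length() && !Character.isLetterOrDigit(text.charAt(i))) {
                    i++;
                }
                return i;
            }
            case FIELD_END:
                return text.length();
            default:
                throw new IllegalArgumentException("unknown mode: " + startOrEnd);
        }
    }
}
```

For the text "Hello, world!" and the word "Hello" at [0, 5), TERM_END gives 5 while TERM_END_PLUS skips the comma and space and gives 7.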
getPos
public void getPos(MarkPos pos,
int startOrEnd)
- Get the position of the start of the current word
- Specified by:
getPos
in interface WordIter
- Overrides:
getPos
in class ChunkedWordIter
- Parameters:
startOrEnd
- FIELD_START for the very start of the field;
TERM_START for the first character of the word;
TERM_END for the last character of the word;
TERM_END_PLUS for the last character plus any trailing
punctuation and/or spaces;
FIELD_END for the very end of the field.
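A common reason for pairing an allocating overload with a fill-in overload is to let callers reuse one position object across many words. The toy below sketches that general pattern. ToyPos, ToyIter, and their constants are hypothetical stand-ins and do not reflect how XTF implements either getPos method.

```java
// Hypothetical stand-ins; none of these types exist in XTF.
final class ToyPos {
    int offset = -1; // -1 marks "not yet filled in"
}

final class ToyIter {
    static final int TERM_START = 0;
    static final int TERM_END = 1;

    private final int[][] words; // each entry is {start, end} of one word
    private int cur = 0;

    ToyIter(int[][] words) {
        this.words = words;
    }

    /** Advances to the next word; returns false when already on the last one. */
    boolean next() {
        if (cur + 1 >= words.length) {
            return false;
        }
        cur++;
        return true;
    }

    /** Allocating form: creates a new position and fills it in. */
    ToyPos getPos(int startOrEnd) {
        ToyPos pos = new ToyPos();
        getPos(pos, startOrEnd);
        return pos;
    }

    /** Fill-in form: overwrites a caller-supplied position, avoiding allocation. */
    void getPos(ToyPos pos, int startOrEnd) {
        pos.offset = (startOrEnd == TERM_START) ? words[cur][0] : words[cur][1];
    }
}
```

In this sketch a caller would allocate once with getPos(int) and then pass the same object to getPos(ToyPos, int) after each call to next().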