org.cdlib.xtf.textEngine
Class XtfChunkedWordIter

Object
  extended by BasicWordIter
      extended by ChunkedWordIter
          extended by XtfChunkedWordIter
All Implemented Interfaces:
Cloneable, WordIter

public class XtfChunkedWordIter
extends ChunkedWordIter

Handles iterating over XTF's tokenized documents, including special tracking of node numbers and word offsets.

Author:
Martin Haye

Field Summary
 
Fields inherited from class ChunkedWordIter
chunk, chunkSource
 
Fields inherited from class BasicWordIter
maxWordPos, text, tokens, tokNum, wordPos
 
Fields inherited from interface WordIter
FIELD_END, FIELD_START, TERM_END, TERM_END_PLUS, TERM_START
 
Constructor Summary
XtfChunkedWordIter(IndexReader reader, DocNumMap docNumMap, int mainDocNum, String field, Analyzer analyzer)
          Construct the iterator and read in starting text from the given chunk.
 
Method Summary
 MarkPos getPos(int startOrEnd)
          Create a new MarkPos set to the position selected by startOrEnd
 void getPos(MarkPos pos, int startOrEnd)
          Get the position selected by startOrEnd (by default the start of the current word), storing it in pos
 
Methods inherited from class ChunkedWordIter
createPos, next, prev, reseek, reseek, seekFirst, seekLast
 
Methods inherited from class BasicWordIter
clone, term
 
Methods inherited from class Object
equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

XtfChunkedWordIter

public XtfChunkedWordIter(IndexReader reader,
                          DocNumMap docNumMap,
                          int mainDocNum,
                          String field,
                          Analyzer analyzer)
Construct the iterator and read in starting text from the given chunk.

Parameters:
reader - where to read chunks from
docNumMap - maps main doc num to chunk numbers
mainDocNum - doc ID of the main document
field - field to tokenize and iterate over
analyzer - used to tokenize the field

Method Detail

getPos

public MarkPos getPos(int startOrEnd)
Create a new MarkPos set to the position selected by startOrEnd

Specified by:
getPos in interface WordIter
Overrides:
getPos in class BasicWordIter
Parameters:
startOrEnd - FIELD_START for the very start of the field; TERM_START for the first character of the word; TERM_END for the last character of the word; TERM_END_PLUS for the last character plus any trailing punctuation and/or spaces; FIELD_END for the very end of the field.
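
The five startOrEnd selectors described above can be illustrated with a small, self-contained sketch that works on a plain string. This is not the XTF implementation: the class name, the numeric constant values, and the choice of an exclusive end offset for TERM_END are all assumptions made for illustration, and a real MarkPos also tracks node numbers and word offsets, which this toy ignores.

```java
/** Toy model of the five startOrEnd selectors; NOT the XTF implementation. */
public class StartOrEndSketch {
    // Hypothetical values: the real constants are defined in WordIter and
    // their numeric values are not shown on this page.
    public static final int FIELD_START = 0;
    public static final int TERM_START = 1;
    public static final int TERM_END = 2;
    public static final int TERM_END_PLUS = 3;
    public static final int FIELD_END = 4;

    /**
     * Returns the character offset selected by startOrEnd for a word that
     * occupies [wordStart, wordEnd) within text.
     */
    public static int offsetFor(String text, int wordStart, int wordEnd, int startOrEnd) {
        switch (startOrEnd) {
            case FIELD_START:
                return 0;
            case TERM_START:
                return wordStart;
            case TERM_END:
                // Assumption: an exclusive end offset, just past the last character.
                return wordEnd;
            case TERM_END_PLUS: {
                // Skip trailing punctuation and spaces until the next word or the field end.
                int i = wordEnd;
                while (i < text.length() && !Character.isLetterOrDigit(text.charAt(i))) {
                    i++;
                }
                return i;
            }
            case FIELD_END:
                return text.length();
            default:
                throw new IllegalArgumentException("unknown selector: " + startOrEnd);
        }
    }

    public static void main(String[] args) {
        String text = "Hello, world!  Bye.";
        int start = 7, end = 12; // the word "world"
        System.out.println(text.substring(
            offsetFor(text, start, end, TERM_START),
            offsetFor(text, start, end, TERM_END)));
        System.out.println("[" + text.substring(
            offsetFor(text, start, end, TERM_START),
            offsetFor(text, start, end, TERM_END_PLUS)) + "]");
    }
}
```

In this toy model, TERM_END_PLUS differs from TERM_END only when the word is followed by punctuation or whitespace, which is the distinction the parameter description draws.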

getPos

public void getPos(MarkPos pos,
                   int startOrEnd)
Get the position selected by startOrEnd (by default the start of the current word), storing it in pos

Specified by:
getPos in interface WordIter
Overrides:
getPos in class ChunkedWordIter
Parameters:
pos - the MarkPos to fill in with the requested position
startOrEnd - FIELD_START for the very start of the field; TERM_START for the first character of the word; TERM_END for the last character of the word; TERM_END_PLUS for the last character plus any trailing punctuation and/or spaces; FIELD_END for the very end of the field.
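
The two getPos overloads follow a familiar pairing: one returns a new position object, the other fills in an object supplied by the caller. The sketch below shows one common way such a pair can be related, with the allocating overload delegating to the fill-in-place one. Whether XTF implements it this way is not confirmed by this page; ToyPos and ToyIter are hypothetical stand-ins for MarkPos and the iterator, and a boolean replaces the startOrEnd constant for brevity.

```java
/** Toy illustration of an allocating overload paired with a fill-in-place overload. */
public class GetPosOverloadSketch {
    /** Hypothetical stand-in for MarkPos: just a character offset. */
    public static class ToyPos {
        public int offset = -1; // -1 means "not yet filled in"
    }

    /** Hypothetical stand-in for a word iterator positioned on a single word. */
    public static class ToyIter {
        private final int wordStart;
        private final int wordEnd;

        public ToyIter(int wordStart, int wordEnd) {
            this.wordStart = wordStart;
            this.wordEnd = wordEnd;
        }

        /** Fills an existing position object, so callers can reuse one object across many words. */
        public void getPos(ToyPos pos, boolean start) {
            pos.offset = start ? wordStart : wordEnd;
        }

        /** Allocates a fresh position object, then delegates to the fill-in-place overload. */
        public ToyPos getPos(boolean start) {
            ToyPos pos = new ToyPos();
            getPos(pos, start);
            return pos;
        }
    }

    public static void main(String[] args) {
        ToyIter iter = new ToyIter(7, 12);

        // Allocating form: convenient for one-off lookups.
        ToyPos fresh = iter.getPos(true);
        System.out.println(fresh.offset);

        // Fill-in-place form: reuses the same object.
        ToyPos reused = new ToyPos();
        iter.getPos(reused, false);
        System.out.println(reused.offset);
    }
}
```

The fill-in-place form is useful in tight loops over many words, since it avoids allocating a new position object per call.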