org.apache.lucene.chunk
Class ChunkedWordIter

Object
  extended by BasicWordIter
      extended by ChunkedWordIter
All Implemented Interfaces:
Cloneable, WordIter
Direct Known Subclasses:
XtfChunkedWordIter

public class ChunkedWordIter
extends BasicWordIter
implements Cloneable

Iterates over the words of a large document that has been broken up into many overlapping Chunks. Section limits are applied at empty chunks. Any method that honors section limits can be made to cross them by setting its 'force' parameter to true.
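
The following is a minimal, self-contained sketch of the idea described above, not the actual implementation. It assumes a hypothetical Chunk stand-in that stores the absolute position of its first word, treats an empty chunk as a section limit, and uses names (ChunkWalkSketch, demoChunks, walk) invented for illustration. The real class reads its chunks from a ChunkSource, which is not modeled here.

```java
import java.util.Arrays;
import java.util.List;

class ChunkWalkSketch {
    /** Hypothetical stand-in for a chunk: absolute position of its first word, plus its words. */
    static final class Chunk {
        final int firstPos;
        final String[] words;
        Chunk(int firstPos, String... words) { this.firstPos = firstPos; this.words = words; }
        boolean isEmpty() { return words.length == 0; }
        int endPos() { return firstPos + words.length; } // exclusive
    }

    private final List<Chunk> chunks;
    private int chunkIdx = 0;
    private int wordPos; // absolute position of the current word

    /** Assumes the first chunk is non-empty. */
    ChunkWalkSketch(List<Chunk> chunks) {
        this.chunks = chunks;
        this.wordPos = chunks.get(0).firstPos;
    }

    String term() {
        Chunk c = chunks.get(chunkIdx);
        return c.words[wordPos - c.firstPos];
    }

    /** Advance one word; an empty chunk stops the walk unless force is true. */
    boolean next(boolean force) {
        int target = wordPos + 1;
        if (target < chunks.get(chunkIdx).endPos()) {
            wordPos = target;
            return true;
        }
        // Current chunk exhausted: find a later chunk that holds the target position.
        for (int i = chunkIdx + 1; i < chunks.size(); i++) {
            Chunk n = chunks.get(i);
            if (n.isEmpty()) {
                if (!force) return false; // section limit, position unchanged
                continue;
            }
            if (target < n.endPos()) {
                chunkIdx = i;
                // Skip words that overlap with the previous chunk.
                wordPos = Math.max(target, n.firstPos);
                return true;
            }
        }
        return false; // end of document
    }

    /** Two overlapping chunks, an empty chunk (section limit), then one more chunk. */
    static List<Chunk> demoChunks() {
        return Arrays.asList(
            new Chunk(0, "a", "b", "c"),
            new Chunk(2, "c", "d", "e"),
            new Chunk(5),
            new Chunk(5, "f", "g"));
    }

    /** Collect the current term and every term reachable with the given force setting. */
    static String walk(ChunkWalkSketch it, boolean force) {
        StringBuilder sb = new StringBuilder(it.term());
        while (it.next(force)) sb.append(' ').append(it.term());
        return sb.toString();
    }
}
```

In this sketch the overlapping word "c" is visited only once, and a walk with force set to false stops in front of the empty chunk without moving the iterator.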


Field Summary
protected  Chunk chunk
          Chunk whose tokens are currently being traversed
protected  ChunkSource chunkSource
          Source for fetching chunks
 
Fields inherited from class BasicWordIter
maxWordPos, text, tokens, tokNum, wordPos
 
Fields inherited from interface WordIter
FIELD_END, FIELD_START, TERM_END, TERM_END_PLUS, TERM_START
 
Constructor Summary
ChunkedWordIter(ChunkSource chunkSource)
          Construct the iterator to access text from the given chunk source.
 
Method Summary
 MarkPos createPos()
           
 void getPos(MarkPos pos, int startOrEnd)
          Replace the position within a MarkPos created by WordIter.getPos(int) using the iterator's current position.
 boolean next(boolean force)
          Advance to the next word.
 boolean prev(boolean force)
          Back up to the previous word.
protected  void reseek(Chunk toChunk)
           
protected  void reseek(int targetPos)
           
 void seekFirst(int targetPos, boolean force)
          Reposition the iterator at the first word whose position is greater than or equal to 'targetPos'.
 void seekLast(int targetPos, boolean force)
          Reposition the iterator at the last word whose position is less than or equal to 'targetPos'.
 
Methods inherited from class BasicWordIter
clone, getPos, term
 
Methods inherited from class Object
equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

chunkSource

protected ChunkSource chunkSource
Source for fetching chunks


chunk

protected Chunk chunk
Chunk whose tokens are currently being traversed

Constructor Detail

ChunkedWordIter

public ChunkedWordIter(ChunkSource chunkSource)
Construct the iterator to access text from the given chunk source.

Parameters:
chunkSource - Source to read chunks from.
Method Detail

next

public boolean next(boolean force)
Description copied from interface: WordIter
Advance to the next word. If 'force' is set, ignore any section boundary between this word and the next.

Specified by:
next in interface WordIter
Overrides:
next in class BasicWordIter
Parameters:
force - true to ignore section boundaries
Returns:
true if there was another word to advance to, false if we've reached the end (or, if 'force' is false, a section boundary).

prev

public boolean prev(boolean force)
Description copied from interface: WordIter
Back up to the previous word. If 'force' is set, ignore any section boundary between this word and the previous.

Specified by:
prev in interface WordIter
Overrides:
prev in class BasicWordIter
Parameters:
force - true to ignore section boundaries
Returns:
true if there was room to back up, false if we've reached the start (or, if 'force' is false, a section boundary).
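
As a caller-side illustration of next and prev, the sketch below gathers a few words of context around a hit and lets 'force' decide whether the context may cross a section boundary. It is only a sketch under stated assumptions: ArrayIter is a hypothetical array-backed stand-in, not the real WordIter, and the sectionStart flags are an invented way to mark boundaries.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class ContextWindowSketch {
    /** Hypothetical iterator; sectionStart[i] == true means a boundary lies just before word i. */
    static final class ArrayIter {
        final String[] words;
        final boolean[] sectionStart;
        int pos;

        ArrayIter(String[] words, boolean[] sectionStart, int pos) {
            this.words = words;
            this.sectionStart = sectionStart;
            this.pos = pos;
        }

        String term() { return words[pos]; }

        boolean next(boolean force) {
            if (pos + 1 >= words.length) return false;          // reached the end
            if (!force && sectionStart[pos + 1]) return false;  // section boundary
            pos++;
            return true;
        }

        boolean prev(boolean force) {
            if (pos == 0) return false;                         // reached the start
            if (!force && sectionStart[pos]) return false;      // section boundary
            pos--;
            return true;
        }
    }

    /** Collect up to radius words on each side of the word at index hit. */
    static String window(String[] words, boolean[] sectionStart,
                         int hit, int radius, boolean force) {
        Deque<String> out = new ArrayDeque<>();
        ArrayIter it = new ArrayIter(words, sectionStart, hit);
        out.add(it.term());
        for (int i = 0; i < radius && it.prev(force); i++) out.addFirst(it.term());
        it.pos = hit; // return to the hit before walking forward
        for (int i = 0; i < radius && it.next(force); i++) out.addLast(it.term());
        return String.join(" ", out);
    }
}
```

Both methods leave the position untouched when they return false, so the caller can check the return value and stop without any cleanup.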

reseek

protected void reseek(int targetPos)

reseek

protected void reseek(Chunk toChunk)

seekFirst

public void seekFirst(int targetPos,
                      boolean force)
Description copied from interface: WordIter
Reposition the iterator at the first word whose position is greater than or equal to 'targetPos'.

Specified by:
seekFirst in interface WordIter
Overrides:
seekFirst in class BasicWordIter
Parameters:
targetPos - Position to seek to
force - true to ignore section boundaries

seekLast

public void seekLast(int targetPos,
                     boolean force)
Description copied from interface: WordIter
Reposition the iterator at the last word whose position is less than or equal to 'targetPos'.

Specified by:
seekLast in interface WordIter
Overrides:
seekLast in class BasicWordIter
Parameters:
targetPos - Position to seek to
force - true to ignore section boundaries
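
The difference between the two seek methods only shows when no word sits exactly at the target position. The sketch below illustrates the two comparisons with a binary search over a sorted array of word positions that may contain gaps. It is an assumption-laden illustration: the class name SeekSketch and the returned indexes are invented, the real methods return void and reposition the iterator instead, and the 'force' parameter and section boundaries are not modeled.

```java
class SeekSketch {
    /** Index of the first position >= target, or positions.length if there is none. */
    static int seekFirst(int[] positions, int target) {
        int lo = 0, hi = positions.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (positions[mid] < target) lo = mid + 1;
            else hi = mid;
        }
        return lo;
    }

    /** Index of the last position <= target, or -1 if there is none. */
    static int seekLast(int[] positions, int target) {
        int lo = 0, hi = positions.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (positions[mid] <= target) lo = mid + 1;
            else hi = mid;
        }
        return lo - 1;
    }
}
```

With positions {0, 1, 2, 7, 8} and a target of 4, the first variant lands on the word at position 7 and the second on the word at position 2. When the target is an existing position, both land on the same word.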

createPos

public MarkPos createPos()

getPos

public void getPos(MarkPos pos,
                   int startOrEnd)
Description copied from interface: WordIter
Replace the position within a MarkPos created by WordIter.getPos(int) using the iterator's current position.

Specified by:
getPos in interface WordIter
Overrides:
getPos in class BasicWordIter
Parameters:
startOrEnd - FIELD_START for the very start of the field; TERM_START for the first character of the word; TERM_END for the last character of the word; TERM_END_PLUS for the last character plus any trailing punctuation and/or spaces; FIELD_END for the very end of the field.
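
To make the five startOrEnd choices concrete, the sketch below maps each one to a character offset for a single word in a field's text. Everything here is assumed for illustration: the Edge enum stands in for the integer constants (whose values are not given on this page), the word end offset is treated as exclusive, "trailing punctuation and/or spaces" is approximated as any run of characters that are not letters or digits, and MarkPos itself is not modeled.

```java
class MarkOffsetSketch {
    /** Hypothetical stand-in for the FIELD_START ... FIELD_END constants. */
    enum Edge { FIELD_START, TERM_START, TERM_END, TERM_END_PLUS, FIELD_END }

    /**
     * Character offset selected by edge for the word occupying
     * [wordStart, wordEnd) in text (wordEnd is exclusive).
     */
    static int offsetFor(String text, int wordStart, int wordEnd, Edge edge) {
        switch (edge) {
            case FIELD_START:
                return 0;
            case TERM_START:
                return wordStart;
            case TERM_END:
                return wordEnd;
            case TERM_END_PLUS: {
                // Extend past trailing punctuation and spaces, up to the next word.
                int i = wordEnd;
                while (i < text.length() && !Character.isLetterOrDigit(text.charAt(i))) i++;
                return i;
            }
            case FIELD_END:
                return text.length();
            default:
                throw new IllegalArgumentException("unknown edge: " + edge);
        }
    }
}
```

For the text "Hello, world! Bye" and the word "world" at offsets 7 to 12, TERM_END selects offset 12 while TERM_END_PLUS selects 14, so the span from TERM_START to TERM_END_PLUS covers "world! " including the exclamation mark and the space.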