org.cdlib.xtf.textEngine
Class BoundedWordIter
Object
BasicWordIter
BoundedWordIter
- All Implemented Interfaces:
- Cloneable, WordIter
class BoundedWordIter
- extends BasicWordIter
Behaves like a BasicWordIter, except that it enforces "soft" boundaries
when the source text contains XTF "bump markers" whose size meets or
exceeds boundSize. This prevents snippets from spanning section boundaries,
or the boundaries between different fields of the same name.
- Author:
- Martin Haye
Field Summary
(package private) int boundSize
Constructor Summary
BoundedWordIter(String text, TokenStream stream, int boundSize)
- Construct a bounded word iterator on the given text.
Method Summary
MarkPos getPos(int startOrEnd)
- Create a new place to hold position info.
void getPos(MarkPos pos, int startOrEnd)
- Get the position selected by startOrEnd, storing it in pos.
boolean next(boolean force)
- Advance to the next token.
boolean prev(boolean force)
- Go to the previous token.
Methods inherited from class Object
equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
boundSize
int boundSize
BoundedWordIter
public BoundedWordIter(String text,
TokenStream stream,
int boundSize)
throws IOException
- Construct a bounded word iterator on the given text. The tokens from
the stream must refer to the same text. The skip() method works as
normal, but next() and prev() will enforce a soft boundary for any
tokens where the position offset meets or exceeds boundSize.
- Throws:
IOException
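The soft-boundary behavior described for the constructor can be illustrated with a small stand-alone sketch. This is not the XTF implementation: the class SoftBoundarySketch, its array-based constructor, and the term() accessor are hypothetical, and the sketch assumes that the "position offset" of a token is its position increment relative to the previous token.

```java
/**
 * Hypothetical stand-in for illustration only; NOT the XTF BoundedWordIter.
 * Assumes each token carries a position increment, and that an increment
 * of at least boundSize marks a soft boundary (a "bump").
 */
final class SoftBoundarySketch {
    private final String[] terms;
    private final int[] posIncrs; // position increment of each token
    private final int boundSize;
    private int cur = -1;         // index of current token; -1 = before first

    SoftBoundarySketch(String[] terms, int[] posIncrs, int boundSize) {
        this.terms = terms;
        this.posIncrs = posIncrs;
        this.boundSize = boundSize;
    }

    /** Advance one token; refuse to cross a boundary unless force is true. */
    boolean next(boolean force) {
        int n = cur + 1;
        if (n >= terms.length) return false;
        // The gap in front of the next token is stored on that token.
        if (!force && cur >= 0 && posIncrs[n] >= boundSize) return false;
        cur = n;
        return true;
    }

    /** Go back one token; refuse to cross a boundary unless force is true. */
    boolean prev(boolean force) {
        if (cur <= 0) return false;
        // Moving backwards crosses the gap stored on the current token.
        if (!force && posIncrs[cur] >= boundSize) return false;
        cur--;
        return true;
    }

    /** Text of the current token (hypothetical accessor). */
    String term() {
        return terms[cur];
    }

    public static void main(String[] args) {
        SoftBoundarySketch it = new SoftBoundarySketch(
            new String[] {"alpha", "beta", "gamma", "delta"},
            new int[] {1, 1, 5, 1}, // "gamma" starts a new section
            5);
        System.out.println(it.next(false)); // true  (alpha)
        System.out.println(it.next(false)); // true  (beta)
        System.out.println(it.next(false)); // false (boundary before gamma)
        System.out.println(it.next(true));  // true  (forced across to gamma)
        System.out.println(it.prev(false)); // false (boundary behind gamma)
        System.out.println(it.prev(true));  // true  (forced back to beta)
    }
}
```

In this sketch a refused move leaves the iterator on its current token, so a caller building a snippet can stop extending it at the boundary, or pass force = true when crossing is acceptable.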
next
public final boolean next(boolean force)
- Advance to the next token.
- Specified by:
next
in interface WordIter
- Overrides:
next
in class BasicWordIter
- Parameters:
force
- true to ignore section boundaries
- Returns:
- true if ok, false if no more.
prev
public final boolean prev(boolean force)
- Go to the previous token.
- Specified by:
prev
in interface WordIter
- Overrides:
prev
in class BasicWordIter
- Parameters:
force
- true to ignore section boundaries
- Returns:
- true if ok, false if no more.
getPos
public MarkPos getPos(int startOrEnd)
- Create a new place to hold position info
- Specified by:
getPos
in interface WordIter
- Overrides:
getPos
in class BasicWordIter
- Parameters:
startOrEnd
- FIELD_START for the very start of the field;
TERM_START for the first character of the word;
TERM_END for the last character of the word;
TERM_END_PLUS for the last character plus any trailing
punctuation and/or spaces;
FIELD_END for the very last end of the field.
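To make the five startOrEnd selectors concrete, here is a minimal sketch that maps each one to a character offset for a single word inside a field. Everything in it is an assumption made for illustration: the class PositionSelectorSketch, the numeric constant values, the use of plain int offsets instead of MarkPos, and the choice of exclusive end offsets are not taken from XTF.

```java
/**
 * Hypothetical illustration of the startOrEnd selectors; NOT XTF code.
 * Constant values are invented here, and offsets are plain ints with
 * exclusive ends (an assumption; XTF reports positions through MarkPos).
 */
final class PositionSelectorSketch {
    static final int FIELD_START = 0;
    static final int TERM_START = 1;
    static final int TERM_END = 2;
    static final int TERM_END_PLUS = 3;
    static final int FIELD_END = 4;

    /**
     * Return the character offset selected by startOrEnd for the word
     * occupying [wordStart, wordEnd) within the field text.
     */
    static int offsetFor(String text, int wordStart, int wordEnd, int startOrEnd) {
        switch (startOrEnd) {
            case FIELD_START:
                return 0;
            case TERM_START:
                return wordStart;
            case TERM_END:
                return wordEnd;
            case TERM_END_PLUS: {
                // Extend past trailing punctuation and spaces, stopping at
                // the next letter or digit (or at the end of the field).
                int i = wordEnd;
                while (i < text.length() && !Character.isLetterOrDigit(text.charAt(i))) {
                    i++;
                }
                return i;
            }
            case FIELD_END:
                return text.length();
            default:
                throw new IllegalArgumentException("unknown selector: " + startOrEnd);
        }
    }

    public static void main(String[] args) {
        String text = "Hello, world!  Bye";
        // The word "world" occupies offsets 7 (inclusive) to 12 (exclusive).
        System.out.println(offsetFor(text, 7, 12, TERM_START));    // 7
        System.out.println(offsetFor(text, 7, 12, TERM_END));      // 12
        System.out.println(offsetFor(text, 7, 12, TERM_END_PLUS)); // 15
        System.out.println(offsetFor(text, 7, 12, FIELD_END));     // 18
    }
}
```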
getPos
public void getPos(MarkPos pos,
int startOrEnd)
- Get the position selected by startOrEnd, storing it in pos.
- Specified by:
getPos
in interface WordIter
- Overrides:
getPos
in class BasicWordIter
- Parameters:
startOrEnd
- FIELD_START for the very start of the field;
TERM_START for the first character of the word;
TERM_END for the last character of the word;
TERM_END_PLUS for the last character plus any trailing
punctuation and/or spaces;
FIELD_END for the very last end of the field.