Object
TokenStream
Tokenizer
FastTokenizer
public class FastTokenizer
Like Lucene's StandardTokenizer, but handles the easy cases very quickly. Hard cases are punted to a real StandardTokenizer; they are rare enough that the overall speed increase is very substantial. Chinese/Japanese/Korean text is not currently supported, but adding this support would be pretty easy.
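The fast-path/slow-path split described above can be illustrated with a minimal, self-contained sketch. This is not the FastTokenizer source: the class `FastPathSketch`, the `isEasy` rule (plain ASCII letters and digits), and the `slowPath` function standing in for a StandardTokenizer are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch of a fast path with a fallback; not the real FastTokenizer.
class FastPathSketch {
    // Assumption: an "easy" character is a plain ASCII letter or digit.
    static boolean isEasy(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9');
    }

    // Splits text on whitespace. Runs made only of easy characters are emitted
    // directly; any other run is handed to slowPath (the stand-in for a full tokenizer).
    static List<String> tokenize(String text, Function<String, List<String>> slowPath) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < text.length()) {
            if (Character.isWhitespace(text.charAt(pos))) {
                pos++;
                continue;
            }
            int start = pos;
            boolean easy = true;
            while (pos < text.length() && !Character.isWhitespace(text.charAt(pos))) {
                if (!isEasy(text.charAt(pos))) {
                    easy = false;
                }
                pos++;
            }
            String run = text.substring(start, pos);
            if (easy) {
                tokens.add(run);                     // fast path: no further analysis
            } else {
                tokens.addAll(slowPath.apply(run));  // slow path: delegate the hard case
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // The slow path here only marks the run so the delegation is visible.
        Function<String, List<String>> slowPath = run -> Arrays.asList("<" + run + ">");
        System.out.println(tokenize("hello world O'Neil x", slowPath));
    }
}
```

The design pays off only when hard runs are rare, which is the condition the class description relies on.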
Nested Class Summary | |
---|---|
private class |
FastTokenizer.DribbleReader
Used when the fast tokenizer encounters a questionable situation: dribbles out characters to a standard tokenizer that can do a more complete job. |
Field Summary | |
---|---|
private static char[] |
charType
|
private FastTokenizer.DribbleReader |
dribbleReader
Used to dribble out tokens to a standard tokenizer; used when we encounter a case that's hard to figure out. |
(package private) static char |
fakeChar
We use a special character to mark the end of a FastTokenizer.DribbleReader. |
(package private) static String |
fakeWord
This is the special word used by DribbleReader |
private int |
pos
Position within the source array |
private char[] |
source
Array of characters to read from |
private Tokenizer |
stdTokenizer
Standard tokenizer, used for hard cases only |
Fields inherited from class Tokenizer |
---|
input |
Constructor Summary | |
---|---|
FastTokenizer(FastStringReader reader)
Create a tokenizer that will tokenize the stream of characters from the given reader. |
Method Summary | |
---|---|
Token |
next()
Retrieve the next token in the stream, or null if there are no more. |
private static void |
setCharType(char type,
char from,
char to)
Utility method used when setting up the character type table |
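A character type table filled by ranges, in the spirit of `charType` and `setCharType`, might look like the sketch below. Only the two names and the `setCharType(char type, char from, char to)` signature come from this page; the type constants, the table size, the inclusive range bounds, and the `typeOf` helper are assumptions.

```java
// Hypothetical sketch of a range-filled character type table.
class CharTypeTableSketch {
    // Assumed type codes; the real values are not documented on this page.
    static final char TYPE_OTHER = 0;
    static final char TYPE_LETTER = 1;
    static final char TYPE_DIGIT = 2;
    static final char TYPE_SPACE = 3;

    // One entry per possible char value; entries default to TYPE_OTHER (0).
    private static final char[] charType = new char[0x10000];

    // Marks every character in [from, to] (inclusive, by assumption) with the given type.
    private static void setCharType(char type, char from, char to) {
        // An int counter avoids wrap-around when 'to' is '\uffff'.
        for (int c = from; c <= to; c++) {
            charType[c] = type;
        }
    }

    static {
        setCharType(TYPE_LETTER, 'a', 'z');
        setCharType(TYPE_LETTER, 'A', 'Z');
        setCharType(TYPE_DIGIT, '0', '9');
        setCharType(TYPE_SPACE, ' ', ' ');
        setCharType(TYPE_SPACE, '\t', '\r'); // tab, LF, VT, FF, CR
    }

    // Classifying a character is a single array lookup.
    static char typeOf(char c) {
        return charType[c];
    }
}
```

With such a table, the per-character test in the tokenizer's inner loop becomes one array read rather than a chain of comparisons.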
Methods inherited from class Tokenizer |
---|
close |
Methods inherited from class Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
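private char[] source
Array of characters to read from
private int pos
Position within the source array
static final char fakeChar
We use a special character to mark the end of a FastTokenizer.DribbleReader.
static final String fakeWord
This is the special word used by DribbleReader
private FastTokenizer.DribbleReader dribbleReader
Used to dribble out tokens to a standard tokenizer; used when we encounter a case that's hard to figure out.
private Tokenizer stdTokenizer
Standard tokenizer, used for hard cases only
private static final char[] charType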
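One way a reader can mark its end with a special word, as the `fakeChar` and `fakeWord` fields suggest, is sketched below. How DribbleReader actually does this is not documented here, so the class `SentinelReaderSketch`, the sentinel text, and the idea of appending the marker once the window of source characters is used up are all assumptions.

```java
import java.io.Reader;

// Hypothetical sketch: a Reader over part of a char array that appends a
// sentinel word after the real characters, so a downstream consumer sees a
// recognizable marker before end of stream.
class SentinelReaderSketch extends Reader {
    // Assumed marker text; the real fakeWord value is not shown on this page.
    static final String SENTINEL_WORD = " xEndOfDribblex ";

    private final char[] source; // characters to hand out
    private final int end;       // index one past the last character to hand out
    private int pos;             // next index in source
    private int sentinelPos = 0; // next index in SENTINEL_WORD

    SentinelReaderSketch(char[] source, int start, int end) {
        this.source = source;
        this.pos = start;
        this.end = end;
    }

    @Override
    public int read(char[] buf, int off, int len) {
        int n = 0;
        // First hand out the real characters.
        while (n < len && pos < end) {
            buf[off + n++] = source[pos++];
        }
        // Then hand out the sentinel word, once.
        while (n < len && pos >= end && sentinelPos < SENTINEL_WORD.length()) {
            buf[off + n++] = SENTINEL_WORD.charAt(sentinelPos++);
        }
        // Report end of stream only after the sentinel has been fully delivered.
        return (n == 0 && len > 0) ? -1 : n;
    }

    @Override
    public void close() {
        // Nothing to release: the reader only wraps an in-memory array.
    }
}
```

A consumer that tokenizes this stream can treat a token equal to the sentinel as the signal that the delegated span is finished.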
Constructor Detail |
---|
public FastTokenizer(FastStringReader reader)
Create a tokenizer that will tokenize the stream of characters from the given reader.
Parameters:
reader - Reader to get data from.
Method Detail |
---|
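private static void setCharType(char type, char from, char to)
Utility method used when setting up the character type table
public Token next() throws IOException
Retrieve the next token in the stream, or null if there are no more.
Specified by:
next in class TokenStream
Throws:
IOException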