org.cdlib.xtf.textEngine
Class StdTermFilter

Object
  extended by StdTermFilter

public class StdTermFilter
extends Object

Performs standard tokenization activities for terms, such as mapping to lowercase, removing apostrophes, etc.

Author:
Martin Haye
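As a rough illustration of the two mappings named in the class description, the sketch below lowercases a term and deletes ASCII apostrophes. It is an assumption-laden stand-in: the class name TermNormalizerSketch and method normalize are hypothetical, the real StdTermFilter works through a Lucene TokenStream, and the exact apostrophe handling there (for example, whether only possessive "'s" is stripped) may differ.

```java
import java.util.Locale;

// Hypothetical sketch: approximates the "standard" mappings named in the
// class description (lowercasing, apostrophe removal). The real StdTermFilter
// uses a Lucene TokenStream; simply deleting every apostrophe is an assumption.
final class TermNormalizerSketch {
    private TermNormalizerSketch() {}

    /** Lowercases the term (locale-independently) and drops ASCII apostrophes. */
    static String normalize(String term) {
        String lower = term.toLowerCase(Locale.ROOT);
        return lower.replace("'", "");
    }

    public static void main(String[] args) {
        System.out.println(normalize("Don't"));   // prints dont
        System.out.println(normalize("O'Brien")); // prints obrien
    }
}
```

Locale.ROOT is used so the result does not depend on the JVM's default locale (for example, the Turkish dotless i).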

Nested Class Summary
private  class StdTermFilter.DribbleStream
           
 
Field Summary
private  StdTermFilter.DribbleStream dribble
           
private  TokenStream filter
           
private static String SAVE_WILD_QMARK
          During tokenization, the '?' wildcard has to be changed to a word to keep it from being removed.
private static String SAVE_WILD_STAR
          During tokenization, the '*' wildcard has to be changed to a word to keep it from being removed.
 
Constructor Summary
StdTermFilter()
          Constructs the term filter.
 
Method Summary
 String filter(String term)
          Apply the standard mapping to the given term.
protected static String restoreWildcards(String s)
          Restores wildcards saved by saveWildcards(String).
protected static String saveWildcards(String s)
          Converts wildcard characters into word-looking bits that would never occur in real text, so the standard tokenizer will keep them as part of words.
 
Methods inherited from class Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

dribble

private StdTermFilter.DribbleStream dribble

filter

private TokenStream filter

SAVE_WILD_STAR

private static final String SAVE_WILD_STAR
During tokenization, the '*' wildcard has to be changed to a word to keep it from being removed.

See Also:
Constant Field Values

SAVE_WILD_QMARK

private static final String SAVE_WILD_QMARK
During tokenization, the '?' wildcard has to be changed to a word to keep it from being removed.

See Also:
Constant Field Values
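To show why SAVE_WILD_STAR and SAVE_WILD_QMARK are needed, the toy tokenizer below keeps only runs of letters and digits, so a bare '*' disappears while a word-like marker survives inside the token. This is a simplified stand-in, not Lucene's StandardTokenizer, and the marker text "xwildstarx" is hypothetical; the real constant values are on the Constant Field Values page.

```java
import java.util.ArrayList;
import java.util.List;

// Toy tokenizer (an assumption, not Lucene's StandardTokenizer): emits runs of
// letters and digits and discards every other character, which is roughly why
// a bare '*' or '?' would vanish during tokenization.
final class WildcardLossDemo {
    private WildcardLossDemo() {}

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetterOrDigit(c)) {
                current.append(c);
            } else if (current.length() > 0) {
                // Any non-alphanumeric character ends the current token and is dropped.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("comput*"));             // prints [comput]
        // "xwildstarx" is a hypothetical marker standing in for SAVE_WILD_STAR.
        System.out.println(tokenize("comput" + "xwildstarx")); // prints [computxwildstarx]
    }
}
```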
Constructor Detail

StdTermFilter

public StdTermFilter()
Constructs the term filter.

Method Detail

filter

public String filter(String term)
Apply the standard mapping to the given term.

Returns:
the changed version of the term, or the original term if no change is required.
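The return contract above (hand back the original term when nothing changes) can be sketched as follows. The class FilterContractSketch is hypothetical, and lowercasing stands in for the real Lucene-based mapping; only the "return the caller's own String when unchanged" behavior is taken from the description.

```java
import java.util.Locale;

// Sketch of the documented return contract of filter(String). The mapping
// (lowercasing only) is a stand-in assumption for the real filtering.
final class FilterContractSketch {
    private FilterContractSketch() {}

    static String filter(String term) {
        String mapped = term.toLowerCase(Locale.ROOT);
        // If the mapping was a no-op, return the caller's original reference.
        return mapped.equals(term) ? term : mapped;
    }

    public static void main(String[] args) {
        String unchanged = "apple";
        System.out.println(filter(unchanged) == unchanged); // prints true
        System.out.println(filter("Apple"));                // prints apple
    }
}
```

Returning the same reference lets a caller cheaply detect "no change" with an identity check.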

saveWildcards

protected static String saveWildcards(String s)
Converts wildcard characters into word-looking bits that would never occur in real text, so the standard tokenizer will keep them as part of words. Restore them with restoreWildcards(String).


restoreWildcards

protected static String restoreWildcards(String s)
Restores wildcards saved by saveWildcards(String).
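A minimal round-trip sketch of saveWildcards and restoreWildcards follows. The marker strings are hypothetical placeholders (the real SAVE_WILD_STAR and SAVE_WILD_QMARK values appear on the Constant Field Values page and are not reproduced here); they are lowercase letters only, on the assumption that they must survive both tokenization and lowercasing.

```java
// Round-trip sketch of saveWildcards/restoreWildcards. The marker values are
// hypothetical; the real constants are not reproduced here.
final class WildcardSaverSketch {
    private WildcardSaverSketch() {}

    // Hypothetical letters-only markers that should "never occur in real text".
    private static final String SAVE_WILD_STAR = "xwildstarx";
    private static final String SAVE_WILD_QMARK = "xwildqmarkx";

    /** Replaces '*' and '?' with word-like markers so a tokenizer keeps them. */
    static String saveWildcards(String s) {
        return s.replace("*", SAVE_WILD_STAR).replace("?", SAVE_WILD_QMARK);
    }

    /** Turns the markers back into the original wildcard characters. */
    static String restoreWildcards(String s) {
        return s.replace(SAVE_WILD_STAR, "*").replace(SAVE_WILD_QMARK, "?");
    }

    public static void main(String[] args) {
        String saved = saveWildcards("wom?n*");
        System.out.println(saved);                   // prints womxwildqmarkxnxwildstarx
        System.out.println(restoreWildcards(saved)); // prints wom?n*
    }
}
```

In a filter built this way, the term would be passed through saveWildcards, then tokenized and normalized, then passed through restoreWildcards.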