org.cdlib.xtf.textIndexer
Class AccentFoldingFilter

Object
  extended by TokenStream
      extended by TokenFilter
          extended by AccentFoldingFilter

public class AccentFoldingFilter
extends TokenFilter

Improves query results by folding accented characters to their unaccented equivalents, that is, by removing diacritics from each token.

Author:
Martin Haye
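
The folding described above can be illustrated with a minimal, stand-alone sketch. It does not use the actual CharMap class, whose API is not shown on this page; a plain java.util.Map stands in for it, and the class name AccentFoldingSketch, the method fold, and the sample map entries are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

class AccentFoldingSketch {
    // Hypothetical stand-in for an accent map: accented char -> unaccented char.
    // The entries below are examples only, not the contents of any real CharMap.
    public static final Map<Character, Character> ACCENT_MAP = new HashMap<>();
    static {
        ACCENT_MAP.put('\u00e9', 'e'); // e with acute accent
        ACCENT_MAP.put('\u00e8', 'e'); // e with grave accent
        ACCENT_MAP.put('\u00fc', 'u'); // u with diaeresis
        ACCENT_MAP.put('\u00f1', 'n'); // n with tilde
    }

    // Replace every mapped character in the term; leave all others unchanged.
    public static String fold(String term, Map<Character, Character> map) {
        StringBuilder sb = new StringBuilder(term.length());
        for (int i = 0; i < term.length(); i++) {
            char c = term.charAt(i);
            Character replacement = map.get(c);
            if (replacement != null) {
                sb.append(replacement.charValue());
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(fold("caf\u00e9", ACCENT_MAP)); // prints "cafe"
    }
}
```

Because the same folding is applied to indexed text and to query terms, a query for "cafe" can match text containing the accented form.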

Field Summary
private  CharMap accentMap
          Map of accented characters to their unaccented counterparts
 
Fields inherited from class TokenFilter
input
 
Constructor Summary
AccentFoldingFilter(TokenStream input, CharMap accentMap)
          Construct a token stream to remove accents from the input tokens.
 
Method Summary
 Token next()
          Retrieve the next token in the stream.
 
Methods inherited from class TokenFilter
close
 
Methods inherited from class Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

accentMap

private CharMap accentMap
Map of accented characters to their unaccented counterparts

Constructor Detail

AccentFoldingFilter

public AccentFoldingFilter(TokenStream input,
                           CharMap accentMap)
Construct a token stream to remove accents from the input tokens.

Parameters:
input - Input stream of tokens to process
accentMap - Map of accented characters to their unaccented counterparts.

Method Detail

next

public Token next()
           throws IOException
Retrieve the next token from the input stream, with its accented characters folded according to accentMap.

Specified by:
next in class TokenStream
Throws:
IOException
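
The way a filter of this kind typically wraps its input can be sketched without any Lucene or XTF dependency. Everything in the block below is a hypothetical stand-in: TermStream replaces TokenStream and yields plain strings instead of Token objects, a java.util.Map replaces CharMap, and returning null at end of stream is an assumption modelled on the convention of older Lucene token streams. It is not the actual implementation of this class.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

class FoldingFilterSketch {
    // Hypothetical stand-in for a token stream: returns null at end of stream.
    interface TermStream {
        String next() throws IOException;
    }

    // Wraps another stream and folds each term pulled from it.
    static class FoldingStream implements TermStream {
        private final TermStream input;
        private final Map<Character, Character> accentMap;

        FoldingStream(TermStream input, Map<Character, Character> accentMap) {
            this.input = input;
            this.accentMap = accentMap;
        }

        @Override
        public String next() throws IOException {
            String term = input.next();
            if (term == null) {
                return null; // end of stream: pass it through unchanged
            }
            StringBuilder sb = new StringBuilder(term.length());
            for (int i = 0; i < term.length(); i++) {
                char c = term.charAt(i);
                Character replacement = accentMap.get(c);
                sb.append(replacement != null ? replacement.charValue() : c);
            }
            return sb.toString();
        }
    }

    // Demo helper: a stream over a fixed list of terms.
    static TermStream of(String... terms) {
        Iterator<String> it = Arrays.asList(terms).iterator();
        return () -> it.hasNext() ? it.next() : null;
    }

    // Demo helper: pull every term from a stream into a list.
    static List<String> drain(TermStream stream) {
        List<String> out = new ArrayList<>();
        try {
            for (String t = stream.next(); t != null; t = stream.next()) {
                out.add(t);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<Character, Character> map = Collections.singletonMap('\u00e9', 'e');
        TermStream folded = new FoldingStream(of("r\u00e9sum\u00e9", "plain"), map);
        System.out.println(drain(folded)); // prints "[resume, plain]"
    }
}
```

Keeping the folding in a wrapper means the upstream tokenizer needs no knowledge of accents, and the same wrapper can be placed in both the indexing and the query analysis chains.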