Speed up queries containing stop-words. The default behavior in Lucene is to ignore stop-words; this keeps query processing efficient because the engine does not have to evaluate every occurrence of words such as "the", "is", and "and". In practice, however, users do query for these words and expect results that reflect them.
A technique that Doug Cutting calls "n-grams" has been developed to deal with this. At index time, stop-words are combined with adjacent non-stop-words instead of being thrown away. For instance, "man in the moon" might be indexed as "man man-in in-the the-moon moon". A similar transformation is performed at query time, so a user's query for "the moon" might become "(moon OR the-moon)". The scoring system will naturally favor hits on "the-moon" (since it is rarer than "moon" alone), so the user gets hits that reflect the stop-word they typed, while the query remains very efficient to process.
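The two transformations described above can be sketched in a few lines of plain Java. This is only an illustration of the idea, not the code in this package: the class name StopWordBigrams, its methods, and the small stop-word list are all hypothetical, and the query-side rule (OR each non-stop-word with the pairs it forms with neighboring stop-words) is an assumption chosen so that the output matches the examples in the text.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration class; it does not use or mirror the Lucene API.
class StopWordBigrams {
    // Assumed stop-word list, for demonstration only.
    static final Set<String> STOP_WORDS =
        new HashSet<>(Arrays.asList("the", "is", "and", "in", "a", "of"));

    static boolean isStop(String word) {
        return STOP_WORDS.contains(word);
    }

    // Lower-case the text and split it on whitespace.
    static String[] split(String text) {
        String t = text.trim().toLowerCase();
        return t.isEmpty() ? new String[0] : t.split("\\s+");
    }

    // Index time: keep each non-stop-word, and emit a hyphenated pair for
    // every adjacent pair of words in which at least one is a stop-word.
    static List<String> indexTokens(String text) {
        String[] w = split(text);
        List<String> out = new ArrayList<>();
        for (int i = 0; i < w.length; i++) {
            if (!isStop(w[i])) {
                out.add(w[i]);
            }
            if (i + 1 < w.length && (isStop(w[i]) || isStop(w[i + 1]))) {
                out.add(w[i] + "-" + w[i + 1]);
            }
        }
        return out;
    }

    // Query time (assumed rule): each non-stop-word becomes an OR over the
    // word itself and the pairs it forms with adjacent stop-words.
    static String queryString(String text) {
        String[] w = split(text);
        List<String> clauses = new ArrayList<>();
        for (int i = 0; i < w.length; i++) {
            if (isStop(w[i])) {
                continue;
            }
            List<String> alts = new ArrayList<>();
            alts.add(w[i]);
            if (i > 0 && isStop(w[i - 1])) {
                alts.add(w[i - 1] + "-" + w[i]);
            }
            if (i + 1 < w.length && isStop(w[i + 1])) {
                alts.add(w[i] + "-" + w[i + 1]);
            }
            clauses.add(alts.size() == 1
                ? alts.get(0)
                : "(" + String.join(" OR ", alts) + ")");
        }
        return String.join(" ", clauses);
    }

    public static void main(String[] args) {
        // prints: man man-in in-the the-moon moon
        System.out.println(String.join(" ", indexTokens("man in the moon")));
        // prints: (moon OR the-moon)
        System.out.println(queryString("the moon"));
    }
}
```

Note that in this sketch a query made up entirely of stop-words produces an empty string; a real implementation would need to decide how to handle that case.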
This package contains two classes that perform n-gramming (in this case, 2-gramming):