public class BigramQueryRewriter extends QueryRewriter
Nested classes/interfaces inherited from class QueryRewriter: QueryRewriter.SpanClauseJoiner
Modifier and Type | Field and Description |
---|---|
protected int | maxSlop: Maximum slop to allow in a query, based on the index being queried |
protected HashSet | removedTerms: Keeps track of all stop-words removed from the query |
protected Set | stopSet: Set of stop-words (e.g. |
Constructor and Description |
---|
BigramQueryRewriter(Set stopSet, int maxSlop): Constructs a rewriter using the given stopword set. |
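The stop-word set given to the constructor can be built from a delimited string; makeStopSet(String) is documented as accepting a space, comma, or semicolon delimited list. The following is a minimal standalone sketch of that kind of parsing, not the library's own code. The class name StopSetSketch and the method parseStopWords are hypothetical, and leaving the case of each word untouched is an assumption.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical stand-in for illustration; not part of the documented API.
class StopSetSketch {
    // Splits the input on runs of whitespace, commas, or semicolons and
    // collects the non-empty pieces. Case is left as-is (an assumption).
    static Set<String> parseStopWords(String stopWords) {
        Set<String> set = new LinkedHashSet<>();
        for (String word : stopWords.split("[\\s,;]+")) {
            if (!word.isEmpty()) { // a leading delimiter yields one empty piece
                set.add(word);
            }
        }
        return set;
    }

    public static void main(String[] args) {
        // Mixed delimiters are all treated alike.
        System.out.println(parseStopWords("a an, the;of  and")); // prints [a, an, the, of, and]
    }
}
```

A set built this way has the same shape as the stopSet argument above, although the real class uses the raw Set type.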
Modifier and Type | Method and Description |
---|---|
protected SpanQuery[] | bigramQueries(SpanQuery[] clauses, int slop, QueryRewriter.SpanClauseJoiner joiner): Removes stop words from a set of consecutive queries by combining them with adjacent non-stop-words. |
protected SpanQuery | bigramTermsExact(Query[] queries, String[] terms, QueryRewriter.SpanClauseJoiner joiner): Given a sequence of terms consisting of mixed stop and real words, figures out the bigrammed sequence required to get an exact match with the index. |
protected SpanQuery | bigramTermsInexact(Query[] queries, String[] terms, QueryRewriter.SpanClauseJoiner joiner): Given a sequence of terms consisting of mixed stop and real words, figures out the bigrammed sequence that will give hits on at least the real words, and gives priority to ones that are near the closest stop words. |
protected SpanQuery | convertToSpanQuery(Query q): Converts non-span queries to span queries, and passes span queries through unchanged. |
protected Term | extractTerm(Object obj): Given a term query, span term query (or plain term), extracts the Term itself. |
protected String | extractTermText(Object obj): Given a term, term query, span term query (or plain string), extracts the term text. |
protected SpanQuery | glomInside(SpanChunkedNotQuery nq, SpanTermQuery term, boolean before): Gloms the term onto each clause within a NOT query. |
protected SpanQuery | glomInside(SpanNotNearQuery nq, SpanTermQuery term, boolean before): Gloms the term onto each clause within a NOT query. |
protected SpanQuery | glomInside(SpanOrQuery oq, SpanTermQuery term, boolean before): Gloms the term onto each clause within an OR query. |
protected Query | glomQueries(Query q1, Query q2): Joins a stop word to a real word, or vice-versa. |
static boolean | isBigram(Set stopWords, String str): Determines if the given string is a bi-gram of a real word with a stop-word. |
static Set | makeStopSet(String stopWords): Makes a stop set given a space, comma, or semicolon delimited list of stop words. |
protected Term | newTerm(String field, String text): Constructs a term given its text and field name. |
protected void | reduceBoost(Query query): Reduces the boost factor of a query (typically the non-bigram of a pair in an OR) so that the bigram will get scored higher. |
protected Query | rewrite(BooleanQuery bq): Rewrites a BooleanQuery. |
protected Query | rewrite(SpanNearQuery q): Rewrites a span NEAR query. |
protected Query | rewrite(SpanOrNearQuery q): Rewrites a span OR-NEAR query. |
protected Query | rewrite(SpanOrQuery q): Rewrites a span-based OR query. |
protected Query | rewriteClauses(Query oldQuery, SpanQuery[] oldClauses, boolean shuntSingle, boolean bigram, int slop, QueryRewriter.SpanClauseJoiner joiner): Utility function that takes care of rewriting a series of span query clauses. |
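To make the bigramming idea in the method summary concrete, here is a simplified sketch that works on plain strings instead of Lucene queries. It is an illustration under stated assumptions, not the class's actual algorithm: the class BigramSketch, its method names, and the '~' joiner character are all hypothetical choices made for this example. Each adjacent pair in which at least one word is a stop word is joined into a bi-gram; a real word with no stop-word neighbor is kept on its own.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration; the names and the '~' joiner are assumptions.
class BigramSketch {
    static final char JOINER = '~';

    // True if str looks like "first~second" and at least one half is a stop word.
    static boolean isBigram(Set<String> stopWords, String str) {
        int pos = str.indexOf(JOINER);
        if (pos <= 0 || pos == str.length() - 1) {
            return false; // no joiner, or an empty half
        }
        String first = str.substring(0, pos);
        String second = str.substring(pos + 1);
        return stopWords.contains(first) || stopWords.contains(second);
    }

    // Joins every adjacent pair that involves a stop word; real words that
    // are not part of any such pair are emitted unchanged.
    static List<String> bigramExact(List<String> terms, Set<String> stopWords) {
        List<String> out = new ArrayList<>();
        boolean prevJoined = false; // was the current term joined to the previous one?
        for (int i = 0; i < terms.size(); i++) {
            String cur = terms.get(i);
            boolean hasNext = i + 1 < terms.size();
            boolean joinNext = hasNext
                && (stopWords.contains(cur) || stopWords.contains(terms.get(i + 1)));
            if (joinNext) {
                out.add(cur + JOINER + terms.get(i + 1));
            } else if (!prevJoined && !stopWords.contains(cur)) {
                out.add(cur); // stand-alone real word
            }
            prevJoined = joinNext;
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> stops = new HashSet<>(Arrays.asList("of", "the"));
        List<String> out = bigramExact(Arrays.asList("man", "of", "the", "world"), stops);
        System.out.println(out);                       // prints [man~of, of~the, the~world]
        System.out.println(isBigram(stops, "man~of")); // prints true
    }
}
```

In this sketch a stop word with no neighbor at all is simply dropped. The documented methods additionally handle slop, boosts, and nested span queries, none of which is modeled here.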
Methods inherited from class QueryRewriter: combineBoost, copyBoost, copyBoost, forceRewrite, rewrite, rewrite, rewrite, rewrite, rewrite, rewrite, rewrite, rewrite, rewriteClauses, rewriteQuery
protected Set stopSet
protected int maxSlop
protected HashSet removedTerms
public BigramQueryRewriter(Set stopSet, int maxSlop)
stopSet - Set of stopwords to remove or bi-gram. This can be constructed easily by calling makeStopSet(String).
maxSlop - Maximum slop to allow in a query, based on the index being queried.

public static Set makeStopSet(String stopWords)
stopWords - String of words to make into a set
BigramQueryRewriter.

public static boolean isBigram(Set stopWords, String str)
stopWords - The set of stop-words
str - The string to check

protected Query rewrite(BooleanQuery bq)
Overrides: rewrite in class QueryRewriter
bq - The query to rewrite

protected Query rewrite(SpanNearQuery q)
Overrides: rewrite in class QueryRewriter
q - The query to rewrite

protected Query rewrite(SpanOrNearQuery q)
Overrides: rewrite in class QueryRewriter
q - The query to rewrite

protected Query rewrite(SpanOrQuery q)
Overrides: rewrite in class QueryRewriter
q - The query to rewrite

protected Query rewriteClauses(Query oldQuery, SpanQuery[] oldClauses, boolean shuntSingle, boolean bigram, int slop, QueryRewriter.SpanClauseJoiner joiner)
oldQuery - Query being rewritten
oldClauses - Clauses to rewrite
shuntSingle - true to allow single-clause result to be returned, false to force wrapping.
bigram - true to bigram stop-words, false to simply remove them
slop - if bigramming, 0 for phrase, non-zero for near
joiner - Handles joining new clauses into wrapper query

protected SpanQuery[] bigramQueries(SpanQuery[] clauses, int slop, QueryRewriter.SpanClauseJoiner joiner)
clauses - array of queries to work on
slop - zero for exact matching, non-zero for 'near' matching.
joiner - used to join the resulting bi-grammed clauses

protected SpanQuery bigramTermsInexact(Query[] queries, String[] terms, QueryRewriter.SpanClauseJoiner joiner)
queries - Original queries in the sequence
terms - Corresponding term text of each query
joiner - Used to join the resulting bi-grammed clauses

protected SpanQuery convertToSpanQuery(Query q)
q - Query to convert (span or non-span)

protected Term newTerm(String field, String text)
text - Text for the new term
field - Field being queried

protected SpanQuery bigramTermsExact(Query[] queries, String[] terms, QueryRewriter.SpanClauseJoiner joiner)
queries - Original queries in the sequence
terms - Corresponding term text of each query
joiner - Used to join the resulting bi-grammed clauses

protected Query glomQueries(Query q1, Query q2)
q1 - First query
q2 - Second query

protected SpanQuery glomInside(SpanOrQuery oq, SpanTermQuery term, boolean before)
oq - Query to glom into
term - Term to glom on
before - true to prepend the term, false to append.

protected SpanQuery glomInside(SpanChunkedNotQuery nq, SpanTermQuery term, boolean before)
nq - Query to glom into
term - Term to glom on
before - true to prepend the term, false to append.

protected SpanQuery glomInside(SpanNotNearQuery nq, SpanTermQuery term, boolean before)
nq - Query to glom into
term - Term to glom on
before - true to prepend the term, false to append.

protected String extractTermText(Object obj)
obj - String, Term, TermQuery, or SpanTermQuery to check

protected Term extractTerm(Object obj)
obj - Term, TermQuery, or SpanTermQuery to check

protected void reduceBoost(Query query)