Object
  extended by QueryRewriter
      extended by BigramQueryRewriter
public class BigramQueryRewriter
extends QueryRewriter
Rewrites a query to eliminate stop words by combining them with adjacent non-stop-words, forming "bi-grams" (terms made of two words). This is a fairly involved process, because bi-gramming across NEAR and OR queries is complex.
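As a rough illustration of the idea (not XTF's actual rules), the sketch below bi-grams a plain phrase held as a list of words. It assumes a hypothetical `~` separator between the two halves of a bi-gram and one plausible scheme: every adjacent pair that contains a stop word becomes a bi-gram, and a real word survives on its own only when neither neighbor is a stop word. The class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Illustrative sketch only; not the BigramQueryRewriter implementation. */
final class BigramPhraseSketch {
    // Hypothetical separator placed between the two words of a bi-gram.
    static final String SEP = "~";

    /**
     * Rewrites a phrase so that no stop word stands alone: each adjacent pair
     * containing a stop word becomes one bi-gram, and a real word is kept by
     * itself only if neither neighbor is a stop word.
     */
    static List<String> bigram(List<String> words, Set<String> stop) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < words.size(); i++) {
            String w = words.get(i);
            boolean isStop = stop.contains(w);
            boolean prevStop = i > 0 && stop.contains(words.get(i - 1));
            boolean hasNext = i + 1 < words.size();
            boolean nextStop = hasNext && stop.contains(words.get(i + 1));

            // A real word with no stop-word neighbor is searched for as-is.
            if (!isStop && !prevStop && !nextStop) {
                out.add(w);
            }
            // Any adjacent pair that involves a stop word turns into a bi-gram.
            if (hasNext && (isStop || nextStop)) {
                out.add(w + SEP + words.get(i + 1));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> stop = Set.of("of", "the");
        System.out.println(bigram(List.of("man", "of", "the", "world"), stop));
    }
}
```

With "of" and "the" as stop words, "man of the world" becomes `man~of`, `of~the`, `the~world`. A query made only of stop words yields an empty list, a case a real rewriter has to handle separately.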
Nested Class Summary |
---|
Nested classes/interfaces inherited from class QueryRewriter |
---|
QueryRewriter.SpanClauseJoiner |
Field Summary | |
---|---|
protected int | maxSlop: Maximum slop to allow in a query, based on the index being queried |
protected HashSet | removedTerms: Keeps track of all stop-words removed from the query |
protected Set | stopSet: Set of stop-words to remove or bi-gram |
Constructor Summary | |
---|---|
BigramQueryRewriter(Set stopSet, int maxSlop): Constructs a rewriter using the given stop-word set. |
Method Summary | |
---|---|
protected SpanQuery[] | bigramQueries(SpanQuery[] clauses, int slop, QueryRewriter.SpanClauseJoiner joiner): Removes stop words from a set of consecutive queries by combining them with adjacent non-stop-words. |
protected SpanQuery | bigramTermsExact(Query[] queries, String[] terms, QueryRewriter.SpanClauseJoiner joiner): Given a sequence of terms consisting of mixed stop and real words, figures out the bi-grammed sequence required to get an exact match with the index. |
protected SpanQuery | bigramTermsInexact(Query[] queries, String[] terms, QueryRewriter.SpanClauseJoiner joiner): Given a sequence of terms consisting of mixed stop and real words, figures out the bi-grammed sequence that will give hits on at least the real words, and gives priority to hits that are near the closest stop words. |
protected SpanQuery | convertToSpanQuery(Query q): Converts non-span queries to span queries, and passes span queries through unchanged. |
protected Term | extractTerm(Object obj): Given a term query, span term query, or plain term, extracts the Term itself. |
protected String | extractTermText(Object obj): Given a term, term query, span term query, or plain string, extracts the term text. |
protected SpanQuery | glomInside(SpanChunkedNotQuery nq, SpanTermQuery term, boolean before): Gloms the term onto each clause within a NOT query. |
protected SpanQuery | glomInside(SpanNotNearQuery nq, SpanTermQuery term, boolean before): Gloms the term onto each clause within a NOT query. |
protected SpanQuery | glomInside(SpanOrQuery oq, SpanTermQuery term, boolean before): Gloms the term onto each clause within an OR query. |
protected Query | glomQueries(Query q1, Query q2): Joins a stop word to a real word, or vice versa. |
static boolean | isBigram(Set stopWords, String str): Determines if the given string is a bi-gram of a real word with a stop-word. |
static Set | makeStopSet(String stopWords): Makes a stop set from a space-, comma-, or semicolon-delimited list of stop words. |
protected Term | newTerm(String field, String text): Constructs a term given its text and field name. |
protected void | reduceBoost(Query query): Reduces the boost factor of a query (typically the non-bi-gram of a pair in an OR) so that the bi-gram scores higher. |
protected Query | rewrite(BooleanQuery bq): Rewrites a BooleanQuery. |
protected Query | rewrite(SpanNearQuery q): Rewrites a span NEAR query. |
protected Query | rewrite(SpanOrNearQuery q): Rewrites a span OR-NEAR query. |
protected Query | rewrite(SpanOrQuery q): Rewrites a span-based OR query. |
protected Query | rewriteClauses(Query oldQuery, SpanQuery[] oldClauses, boolean shuntSingle, boolean bigram, int slop, QueryRewriter.SpanClauseJoiner joiner): Utility method that rewrites a series of span query clauses. |
Methods inherited from class QueryRewriter |
---|
combineBoost, copyBoost, copyBoost, forceRewrite, rewrite, rewrite, rewrite, rewrite, rewrite, rewrite, rewrite, rewrite, rewriteClauses, rewriteQuery |
Methods inherited from class Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
protected Set stopSet
protected int maxSlop
protected HashSet removedTerms
Constructor Detail |
---|
public BigramQueryRewriter(Set stopSet, int maxSlop)
Parameters:
stopSet - Set of stop-words to remove or bi-gram. This can be constructed easily by calling makeStopSet(String).
maxSlop - Maximum slop to allow in a query, based on the index being queried.
Method Detail |
---|
public static Set makeStopSet(String stopWords)
Parameters:
stopWords - String of words to make into a set
Returns:
a stop-word set suitable for constructing a BigramQueryRewriter.
public static boolean isBigram(Set stopWords, String str)
Parameters:
stopWords - The set of stop-words
str - The string to check
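A minimal sketch of how these two static helpers could behave, using `Set<String>` generics rather than the raw `Set` of the signatures above. Splitting on whitespace, commas, and semicolons follows the summary description of makeStopSet; the `~` separator and the rule that a bi-gram has exactly two non-empty halves, at least one of them a stop-word, are assumptions for illustration.

```java
import java.util.HashSet;
import java.util.Set;

/** Illustrative sketch of makeStopSet and isBigram; not the library source. */
final class StopSetSketch {
    // Hypothetical separator between the two words of a bi-gram.
    static final String SEP = "~";

    /** Splits on runs of whitespace, commas, or semicolons, skipping empty pieces. */
    static Set<String> makeStopSet(String stopWords) {
        Set<String> set = new HashSet<>();
        for (String word : stopWords.split("[\\s,;]+")) {
            if (!word.isEmpty()) {
                set.add(word);
            }
        }
        return set;
    }

    /**
     * Treats str as a bi-gram if it is exactly two non-empty parts joined by
     * SEP and at least one part is a stop word (assumed rule).
     */
    static boolean isBigram(Set<String> stopWords, String str) {
        int pos = str.indexOf(SEP);
        if (pos <= 0 || pos + SEP.length() >= str.length()) {
            return false; // no separator, or an empty half
        }
        String first = str.substring(0, pos);
        String second = str.substring(pos + SEP.length());
        if (second.contains(SEP)) {
            return false; // more than two parts
        }
        return stopWords.contains(first) || stopWords.contains(second);
    }
}
```

For example, `makeStopSet("the, a;an  of")` produces the four words `the`, `a`, `an`, `of`, and with that set `the~cat` counts as a bi-gram while `big~cat` does not.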
protected Query rewrite(BooleanQuery bq)
Overrides:
rewrite in class QueryRewriter
Parameters:
bq - The query to rewrite
protected Query rewrite(SpanNearQuery q)
Overrides:
rewrite in class QueryRewriter
Parameters:
q - The query to rewrite
protected Query rewrite(SpanOrNearQuery q)
Overrides:
rewrite in class QueryRewriter
Parameters:
q - The query to rewrite
protected Query rewrite(SpanOrQuery q)
Overrides:
rewrite in class QueryRewriter
Parameters:
q - The query to rewrite
protected Query rewriteClauses(Query oldQuery, SpanQuery[] oldClauses, boolean shuntSingle, boolean bigram, int slop, QueryRewriter.SpanClauseJoiner joiner)
Parameters:
oldQuery - Query being rewritten
oldClauses - Clauses to rewrite
shuntSingle - true to allow a single-clause result to be returned, false to force wrapping
bigram - true to bi-gram stop-words, false to simply remove them
slop - if bi-gramming, 0 for phrase, non-zero for near
joiner - Handles joining new clauses into a wrapper query
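The parameters above suggest a control flow like the following. This sketch models clauses as plain strings; `Joiner` is a hypothetical stand-in for QueryRewriter.SpanClauseJoiner, the bi-gramming step is passed in as a function, and returning null when every clause was a stop word is an assumption, not documented behavior.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.function.BiFunction;

/** Illustrative sketch of rewriteClauses; strings stand in for span queries. */
final class RewriteClausesSketch {
    /** Hypothetical stand-in for QueryRewriter.SpanClauseJoiner. */
    interface Joiner {
        String join(List<String> clauses);
    }

    static String rewriteClauses(List<String> oldClauses, Set<String> stop,
                                 boolean shuntSingle, boolean bigram, int slop,
                                 BiFunction<List<String>, Integer, List<String>> bigrammer,
                                 Joiner joiner) {
        List<String> newClauses;
        if (bigram) {
            // Combine stop words with neighbors; slop 0 = phrase, non-zero = near.
            newClauses = bigrammer.apply(oldClauses, slop);
        } else {
            // Simply drop the stop words.
            newClauses = new ArrayList<>();
            for (String clause : oldClauses) {
                if (!stop.contains(clause)) {
                    newClauses.add(clause);
                }
            }
        }
        if (newClauses.isEmpty()) {
            return null; // assumption: nothing left to search for
        }
        if (shuntSingle && newClauses.size() == 1) {
            return newClauses.get(0); // a lone clause needs no wrapper
        }
        return joiner.join(newClauses); // wrap, e.g. in a NEAR or OR query
    }
}
```

The shuntSingle flag only matters when exactly one clause survives: with it set, the caller gets that clause back directly instead of a one-element wrapper.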
protected SpanQuery[] bigramQueries(SpanQuery[] clauses, int slop, QueryRewriter.SpanClauseJoiner joiner)
Parameters:
clauses - Array of queries to work on
slop - Zero for exact matching, non-zero for 'near' matching
joiner - Used to join the resulting bi-grammed clauses
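The slop parameter decides which kind of bi-gramming applies. The sketch below shows only that decision: clauses are modelled as strings, the exact and inexact strategies are injected as functions (hypothetical stand-ins for bigramTermsExact and bigramTermsInexact), and the early return when no clause is a stop word is an assumption.

```java
import java.util.List;
import java.util.Set;
import java.util.function.BiFunction;

/** Illustrative sketch of choosing a bi-gramming strategy from the slop. */
final class BigramDispatchSketch {
    static List<String> bigramQueries(
            List<String> clauses, int slop, Set<String> stop,
            BiFunction<List<String>, Set<String>, List<String>> exact,
            BiFunction<List<String>, Set<String>, List<String>> inexact) {
        // Assumed shortcut: with no stop words there is nothing to combine.
        if (clauses.stream().noneMatch(stop::contains)) {
            return clauses;
        }
        // Slop 0 is a phrase, so the bi-grams must line up exactly with the index;
        // any other slop is a NEAR query, where the real words only need to match.
        return (slop == 0) ? exact.apply(clauses, stop) : inexact.apply(clauses, stop);
    }
}
```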
protected SpanQuery bigramTermsInexact(Query[] queries, String[] terms, QueryRewriter.SpanClauseJoiner joiner)
Parameters:
queries - Original queries in the sequence
terms - Corresponding term text of each query
joiner - Used to join the resulting bi-grammed clauses
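For a NEAR query the summary asks for hits on at least the real words, with priority for matches near the stop words. One way to express that, sketched here under stated assumptions, is to give each real word an OR of alternatives: its bi-grams with any adjacent stop word, plus the bare word as a fallback. Only immediately adjacent stop words are considered, the `~` separator is hypothetical, and the nested lists stand in for OR clauses that a joiner would combine.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Illustrative sketch of inexact (NEAR-style) bi-gramming; not the library source. */
final class InexactBigramSketch {
    // Hypothetical separator between the two words of a bi-gram.
    static final String SEP = "~";

    /**
     * Returns, for each real word in order, the alternatives an OR clause
     * would contain: bi-grams with an adjacent stop word (if any) followed by
     * the bare word. Stop words never appear on their own.
     */
    static List<List<String>> alternatives(List<String> words, Set<String> stop) {
        List<List<String>> result = new ArrayList<>();
        for (int i = 0; i < words.size(); i++) {
            String word = words.get(i);
            if (stop.contains(word)) {
                continue; // only reachable through a bi-gram
            }
            List<String> alts = new ArrayList<>();
            if (i > 0 && stop.contains(words.get(i - 1))) {
                alts.add(words.get(i - 1) + SEP + word); // stop word before
            }
            if (i + 1 < words.size() && stop.contains(words.get(i + 1))) {
                alts.add(word + SEP + words.get(i + 1)); // stop word after
            }
            alts.add(word); // fallback so the real word always matches
            result.add(alts);
        }
        return result;
    }
}
```

For "the cat in hat" with stop words "the" and "in", this yields `[[the~cat, cat~in, cat], [in~hat, hat]]`: each real word still matches alone, and the bi-gram alternatives can be scored higher.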
protected SpanQuery convertToSpanQuery(Query q)
Parameters:
q - Query to convert (span or non-span)
protected Term newTerm(String field, String text)
Parameters:
field - Field being queried
text - Text for the new term
protected SpanQuery bigramTermsExact(Query[] queries, String[] terms, QueryRewriter.SpanClauseJoiner joiner)
Parameters:
queries - Original queries in the sequence
terms - Corresponding term text of each query
joiner - Used to join the resulting bi-grammed clauses
protected Query glomQueries(Query q1, Query q2)
Parameters:
q1 - First query
q2 - Second query
protected SpanQuery glomInside(SpanOrQuery oq, SpanTermQuery term, boolean before)
Parameters:
oq - Query to glom into
term - Term to glom on
before - true to prepend the term, false to append.
protected SpanQuery glomInside(SpanChunkedNotQuery nq, SpanTermQuery term, boolean before)
Parameters:
nq - Query to glom into
term - Term to glom on
before - true to prepend the term, false to append.
protected SpanQuery glomInside(SpanNotNearQuery nq, SpanTermQuery term, boolean before)
Parameters:
nq - Query to glom into
term - Term to glom on
before - true to prepend the term, false to append.
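Glomming a stop word onto a compound query means pushing it down into each clause, so that `the (cat OR dog)` becomes `the~cat OR the~dog`. The sketch below collapses glomQueries and the glomInside overloads into two methods over a tiny hypothetical query model (`Term`, `Or`, `Not`, not the Lucene or XTF classes). The `~` separator is hypothetical, and the stop-word set is passed explicitly, whereas the real glomQueries presumably consults the rewriter's stopSet field. It needs Java 16 or later for records and pattern matching.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Illustrative sketch of glomQueries / glomInside on a toy query model. */
final class GlomSketch {
    // Hypothetical query model: a single term, an OR of clauses, and a NOT.
    interface Q {}
    record Term(String text) implements Q {}
    record Or(List<Q> clauses) implements Q {}
    record Not(Q include, Q exclude) implements Q {}

    // Hypothetical separator between the two words of a bi-gram.
    static final String SEP = "~";

    /** Attaches a stop-word term to q; before = true prepends it, false appends it. */
    static Q glomInside(Q q, Term stop, boolean before) {
        if (q instanceof Term t) {
            return new Term(before ? stop.text() + SEP + t.text()
                                   : t.text() + SEP + stop.text());
        }
        if (q instanceof Or or) {
            // Every alternative of the OR gets its own bi-gram.
            List<Q> out = new ArrayList<>();
            for (Q clause : or.clauses()) {
                out.add(glomInside(clause, stop, before));
            }
            return new Or(out);
        }
        if (q instanceof Not not) {
            // Both the included and the excluded clause are glommed.
            return new Not(glomInside(not.include(), stop, before),
                           glomInside(not.exclude(), stop, before));
        }
        throw new IllegalArgumentException("unsupported query: " + q);
    }

    /** Joins a stop word to a real word (or compound query), in either order. */
    static Q glomQueries(Q q1, Q q2, Set<String> stopWords) {
        if (q1 instanceof Term t1 && stopWords.contains(t1.text())) {
            return glomInside(q2, t1, true);
        }
        if (q2 instanceof Term t2 && stopWords.contains(t2.text())) {
            return glomInside(q1, t2, false);
        }
        throw new IllegalArgumentException("neither query is a stop word");
    }
}
```

The before flag preserves word order: a stop word that precedes the query is prepended to each clause, one that follows is appended.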
protected String extractTermText(Object obj)
Parameters:
obj - String, Term, TermQuery, or SpanTermQuery to check
protected Term extractTerm(Object obj)
Parameters:
obj - Term, TermQuery, or SpanTermQuery to check
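Both extractors dispatch on the runtime type of their argument. In this sketch, `Term`, `TermQuery` and `SpanTermQuery` are hypothetical records standing in for the Lucene classes of the same names, and throwing IllegalArgumentException for any other type is an assumption; the real methods may handle that case differently. Java 16 or later is needed for records and pattern matching.

```java
/** Illustrative sketch of extractTerm / extractTermText on stand-in types. */
final class ExtractSketch {
    // Hypothetical stand-ins for Lucene's Term, TermQuery and SpanTermQuery.
    record Term(String field, String text) {}
    record TermQuery(Term term) {}
    record SpanTermQuery(Term term) {}

    /** Accepts a Term, TermQuery or SpanTermQuery; anything else is rejected. */
    static Term extractTerm(Object obj) {
        if (obj instanceof Term t) {
            return t;
        }
        if (obj instanceof TermQuery tq) {
            return tq.term();
        }
        if (obj instanceof SpanTermQuery stq) {
            return stq.term();
        }
        throw new IllegalArgumentException("not a term or term query: " + obj);
    }

    /** Like extractTerm, but also accepts a plain String, which is its own text. */
    static String extractTermText(Object obj) {
        if (obj instanceof String s) {
            return s;
        }
        return extractTerm(obj).text();
    }
}
```

The only difference between the two is the String case, matching the parameter descriptions above: extractTermText accepts a plain string, extractTerm does not.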
protected void reduceBoost(Query query)
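The Method Summary says reduceBoost lowers the boost of the non-bi-gram member of a pair in an OR so that the bi-gram scores higher. A minimal sketch of that idea on a hypothetical mutable query class follows; the halving factor and the preferBigram helper are assumptions for illustration, since the actual reduction is not documented here.

```java
import java.util.List;

/** Illustrative sketch of preferring a bi-gram over its bare word in an OR. */
final class BoostSketch {
    /** Hypothetical query with a mutable boost. */
    static final class Query {
        final String text;
        float boost = 1.0f;

        Query(String text) {
            this.text = text;
        }
    }

    // Assumed factor; the real amount of reduction is not documented.
    static final float REDUCTION = 0.5f;

    static void reduceBoost(Query query) {
        query.boost *= REDUCTION;
    }

    /** Hypothetical helper: OR alternatives with the bi-gram at full boost, the bare word reduced. */
    static List<Query> preferBigram(String bigram, String word) {
        Query bi = new Query(bigram);
        Query bare = new Query(word);
        reduceBoost(bare); // still matches on its own, but scores lower
        return List.of(bi, bare);
    }
}
```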