Object
  SnippetMaker
public class SnippetMaker
Does the heavy lifting of interpreting span hits using the actual document text stored in the index. Marks the hit and any matching terms, and includes a configurable amount of context words.
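The description above can be illustrated with a minimal sketch. It is not the actual SnippetMaker code: the class `SnippetSketch`, the bracket and asterisk markers, and the strategy of growing the context one word at a time on each side are assumptions made for illustration. Only the idea of a character budget for hit plus context (compare `maxContext`) comes from this page.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;

// Hypothetical illustration; not part of the documented API.
final class SnippetSketch {
    /**
     * Builds a snippet around the hit covering words [hitStart, hitEnd).
     * Context words are added on both sides while the total stays within
     * maxContext characters (each word is counted with one separator).
     */
    static String makeSnippet(List<String> words, int hitStart, int hitEnd,
                              Set<String> terms, int maxContext) {
        if (hitStart < 0 || hitEnd > words.size() || hitStart >= hitEnd)
            throw new IllegalArgumentException("invalid hit range");

        int start = hitStart, end = hitEnd, length = 0;
        for (int i = hitStart; i < hitEnd; i++)
            length += words.get(i).length() + 1;

        // Grow outward, one word per side per pass, until the budget is spent.
        boolean grew = true;
        while (grew) {
            grew = false;
            if (start > 0 && length + words.get(start - 1).length() + 1 <= maxContext) {
                start--;
                length += words.get(start).length() + 1;
                grew = true;
            }
            if (end < words.size() && length + words.get(end).length() + 1 <= maxContext) {
                length += words.get(end).length() + 1;
                end++;
                grew = true;
            }
        }

        // Emit the words, wrapping the hit in [ ] and each matching term in * *.
        StringBuilder sb = new StringBuilder();
        for (int i = start; i < end; i++) {
            if (i > start) sb.append(' ');
            if (i == hitStart) sb.append('[');
            String w = words.get(i);
            sb.append(terms.contains(w) ? "*" + w + "*" : w);
            if (i == hitEnd - 1) sb.append(']');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList(
            "the quick brown fox jumps over the lazy dog".split(" "));
        // prints: quick [brown *fox*] jumps
        System.out.println(makeSnippet(words, 2, 4, Set.of("fox"), 25));
    }
}
```

This sketch marks terms everywhere in the snippet. A setting such as `termMode` would presumably restrict marking, for example to terms inside the hit only.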
Nested Class Summary | |
---|---|
class |
SnippetMaker.StartEndStripper
Strips the special start-of-field/end-of-field markers from tokens. |
Field Summary | |
---|---|
private CharMap |
accentMap
Accented chars to remove diacritics from |
private static Pattern |
ampPattern
|
private Analyzer |
analyzer
Lucene analyzer used for tokenizing text |
private int |
chunkOverlap
Amount of overlap between adjacent index chunks |
private int |
chunkSize
Max # of words in an index chunk |
private DocNumMap |
docNumMap
Keeps track of which chunks belong to which source document in the index. |
private static Pattern |
gtPattern
|
private static Pattern |
ltPattern
|
private int |
maxContext
Target # of characters to include in the snippet. |
private WordMap |
pluralMap
Plural words to convert to singular |
IndexReader |
reader
Lucene index reader used to fetch text data |
private Set |
stopSet
Set of stop-words removed (e.g. "the", "a", "and", etc.) |
private int |
termMode
Where to mark terms (all, only in spans, etc.) |
Constructor Summary | |
---|---|
SnippetMaker(IndexReader reader,
DocNumMap docNumMap,
Set stopSet,
WordMap pluralMap,
CharMap accentMap,
int maxContext,
int termMode)
Constructs a SnippetMaker, ready to make snippets using the given index reader to load text data. |
Method Summary | |
---|---|
CharMap |
accentMap()
Obtain the set of accented chars to remove diacritics from. |
DocNumMap |
docNumMap()
Obtain the document number map used to make snippets. |
Snippet[] |
makeSnippets(FieldSpans fieldSpans,
int mainDocNum,
String fieldName,
boolean getText)
Full-blown snippet formation process. |
(package private) String |
mapXMLChars(String s)
Replaces 'special' characters in the given string with their XML equivalent. |
String |
markField(Document doc,
FieldSpans fieldSpans,
String fieldName,
String value)
Marks all the terms within the given text. |
WordMap |
pluralMap()
Obtain the set of plural words to convert to singular form. |
Set |
stopSet()
Obtain the set of stop-words removed from the index (e.g. "the", "a", "and", etc.) |
Methods inherited from class Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
public IndexReader reader
private Analyzer analyzer
private DocNumMap docNumMap
private int chunkSize
private int chunkOverlap
private Set stopSet
private WordMap pluralMap
private CharMap accentMap
private int maxContext
private int termMode
private static final Pattern ampPattern
private static final Pattern ltPattern
private static final Pattern gtPattern
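The relationship between `chunkSize` and `chunkOverlap` can be sketched as follows. This assumes that each chunk holds at most `chunkSize` words and that consecutive chunks start `chunkSize - chunkOverlap` words apart. This page does not confirm that layout, and the class `ChunkSketch` is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of overlapping word chunks; not the indexer's code.
final class ChunkSketch {
    static List<List<String>> chunk(List<String> words, int chunkSize, int chunkOverlap) {
        if (chunkSize <= 0 || chunkOverlap < 0 || chunkOverlap >= chunkSize)
            throw new IllegalArgumentException("need 0 <= chunkOverlap < chunkSize");

        int step = chunkSize - chunkOverlap; // assumed distance between chunk starts
        List<List<String>> chunks = new ArrayList<>();
        for (int start = 0; start < words.size(); start += step) {
            int end = Math.min(start + chunkSize, words.size());
            chunks.add(new ArrayList<>(words.subList(start, end)));
            if (end == words.size()) break; // last chunk reached the end of the text
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("a", "b", "c", "d", "e", "f", "g");
        // prints: [[a, b, c, d], [c, d, e, f], [e, f, g]]
        System.out.println(chunk(words, 4, 2));
    }
}
```

Because adjacent chunks share words, a hit that straddles a chunk boundary can still be found whole in one chunk, which is one plausible reason a mapping such as `docNumMap` is needed to tie chunks back to their source document.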
Constructor Detail |
---|
public SnippetMaker(IndexReader reader, DocNumMap docNumMap, Set stopSet, WordMap pluralMap, CharMap accentMap, int maxContext, int termMode)
reader - Index reader to fetch text data from
docNumMap - Maps chunk numbers to document numbers
stopSet - Stop words removed (e.g. "the", "a", "and", etc.)
pluralMap - Plural words to convert to singular
accentMap - Accented chars to remove diacritics from
maxContext - Target # chars for hit + context
termMode - Where to mark terms (all, only in spans, etc.)
Method Detail |
---|
public Set stopSet()
public WordMap pluralMap()
public CharMap accentMap()
public DocNumMap docNumMap()
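To show how a stop-word set, a plural map and an accent map might work together when matching terms, here is a rough sketch. It uses a plain `Set<String>` and `Map<String, String>` as stand-ins for the `Set` and `WordMap` types, and `java.text.Normalizer` as a stand-in for `CharMap`. The class `TermNormalizerSketch` and the order of the steps are assumptions, not the library's documented behavior.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration; WordMap and CharMap are replaced by JDK types.
final class TermNormalizerSketch {
    /** Removes diacritics by decomposing characters and dropping combining marks. */
    static String stripAccents(String word) {
        String decomposed = Normalizer.normalize(word, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}", "");
    }

    /** Strips accents, maps plurals to singular, then drops stop words. */
    static List<String> normalize(List<String> tokens, Set<String> stopSet,
                                  Map<String, String> pluralMap) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            String t = stripAccents(token);
            t = pluralMap.getOrDefault(t, t);
            if (!stopSet.contains(t)) out.add(t);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("the", "caf\u00e9s", "and", "cats");
        Set<String> stopSet = Set.of("the", "a", "and");
        Map<String, String> pluralMap = Map.of("cafes", "cafe", "cats", "cat");
        // prints: [cafe, cat]
        System.out.println(normalize(tokens, stopSet, pluralMap));
    }
}
```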
public Snippet[] makeSnippets(FieldSpans fieldSpans, int mainDocNum, String fieldName, boolean getText)
fieldSpans - record of the matching spans, and all search terms
mainDocNum - document ID of the main doc
fieldName - name of the field we're making snippets of
getText - true to get the full text of the snippet, false if we only want the start/end offsets.
public String markField(Document doc, FieldSpans fieldSpans, String fieldName, String value)
doc - document to get matching spans from
fieldName - name of the field to mark.
value - value of the field to mark
String mapXMLChars(String s)
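A method like `mapXMLChars` could plausibly be built on three precompiled patterns, which would match the `ampPattern`, `ltPattern` and `gtPattern` fields listed above. The sketch below is an assumption about that approach, not the actual implementation. The class `XmlCharMapper`, the pattern contents, and the choice of entities are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical illustration of escaping XML-special characters.
final class XmlCharMapper {
    // Assumed patterns; the names mirror the fields documented on this page.
    private static final Pattern AMP_PATTERN = Pattern.compile("&");
    private static final Pattern LT_PATTERN = Pattern.compile("<");
    private static final Pattern GT_PATTERN = Pattern.compile(">");

    static String mapXMLChars(String s) {
        // Escape ampersands first so the entities added below are not escaped again.
        if (s.indexOf('&') >= 0) s = AMP_PATTERN.matcher(s).replaceAll("&amp;");
        if (s.indexOf('<') >= 0) s = LT_PATTERN.matcher(s).replaceAll("&lt;");
        if (s.indexOf('>') >= 0) s = GT_PATTERN.matcher(s).replaceAll("&gt;");
        return s;
    }

    public static void main(String[] args) {
        // prints: a &lt; b &amp;&amp; c &gt; d
        System.out.println(mapXMLChars("a < b && c > d"));
    }
}
```

The `indexOf` checks skip the regex work for the common case of text with no special characters.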