public class SnippetMaker
extends Object
Modifier and Type | Class and Description |
---|---|
class | SnippetMaker.StartEndStripper: Strips the special start-of-field/end-of-field markers from tokens. |
Modifier and Type | Field and Description |
---|---|
private CharMap | accentMap: Accented chars to remove diacritics from |
private static Pattern | ampPattern |
private Analyzer | analyzer: Lucene analyzer used for tokenizing text |
private int | chunkOverlap: Amount of overlap between adjacent index chunks |
private int | chunkSize: Max # of words in an index chunk |
private DocNumMap | docNumMap: Keeps track of which chunks belong to which source document in the index. |
private static Pattern | gtPattern |
private static Pattern | ltPattern |
private int | maxContext: Target # of characters to include in the snippet. |
private WordMap | pluralMap: Plural words to convert to singular |
IndexReader | reader: Lucene index reader used to fetch text data |
private Set<String> | returnMetaFields: Set of metadata fields to return in the doc hits, or null for all |
private Set | stopSet: Set of stop-words removed (e.g. "the", "a", "and", etc.) |
private int | termMode: Where to mark terms (all, only in spans, etc.) |
private Set | tokFields: The fields that were specified as tokenized at index time. |
Constructor and Description |
---|
SnippetMaker(IndexReader reader, DocNumMap docNumMap, Set stopSet, WordMap pluralMap, CharMap accentMap, Set tokFields, int maxContext, int termMode, String returnMetaFields): Constructs a SnippetMaker, ready to make snippets using the given index reader to load text data. |
Modifier and Type | Method and Description |
---|---|
CharMap | accentMap(): Obtain the set of accented chars to remove diacritics from. |
DocNumMap | docNumMap(): Obtain the document number map used to make snippets |
Snippet[] | makeSnippets(FieldSpans fieldSpans, int mainDocNum, String fieldName, boolean getText): Full-blown snippet formation process. |
(package private) String | mapXMLChars(String s): Replaces 'special' characters in the given string with their XML equivalent. |
String | markField(Document doc, FieldSpans fieldSpans, String fieldName, String value): Marks all the terms within the given text. |
WordMap | pluralMap(): Obtain the set of plural words to convert to singular form. |
Set | returnMetaFields(): Obtain the set of fields that should be returned in doc hits (null for all) |
Set | stopSet(): Obtain the set of stop-words in the index (e.g. "the", "a", "and", etc.) |
Set | tokFields(): Obtain the set of tokenized fields |
public IndexReader reader
private Analyzer analyzer
private DocNumMap docNumMap
private int chunkSize
private int chunkOverlap
private Set stopSet
private WordMap pluralMap
private CharMap accentMap
private Set tokFields
private int maxContext
private int termMode
private Set<String> returnMetaFields
private static final Pattern ampPattern
private static final Pattern ltPattern
private static final Pattern gtPattern
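The chunkSize and chunkOverlap fields above describe index chunks that share some words with their neighbours. The following is a minimal sketch of how such a layout could be computed, assuming the overlap is measured in words and each chunk starts chunkSize - chunkOverlap words after the previous one. The class ChunkLayout and the method chunkStarts are hypothetical names used for illustration only; they are not part of SnippetMaker.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of SnippetMaker: lays out overlapping word chunks. */
final class ChunkLayout {
    /**
     * Returns the starting word offset of each chunk.
     * Assumption: chunkOverlap is a word count smaller than chunkSize.
     */
    static List<Integer> chunkStarts(int totalWords, int chunkSize, int chunkOverlap) {
        if (chunkSize <= 0 || chunkOverlap < 0 || chunkOverlap >= chunkSize) {
            throw new IllegalArgumentException("need 0 <= chunkOverlap < chunkSize");
        }
        // Each chunk begins this many words after the previous one.
        int stride = chunkSize - chunkOverlap;
        List<Integer> starts = new ArrayList<>();
        for (int start = 0; start < totalWords; start += stride) {
            starts.add(start);
            // Stop once the current chunk already reaches the end of the text.
            if (start + chunkSize >= totalWords) {
                break;
            }
        }
        return starts;
    }

    public static void main(String[] args) {
        // 10 words, chunks of 4 words, 1 word shared between neighbours.
        System.out.println(chunkStarts(10, 4, 1));
    }
}
```

Under these assumptions, a word near a chunk boundary appears in two adjacent chunks, which is why a snippet maker needs a map from chunk numbers back to the source document.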
public SnippetMaker(IndexReader reader, DocNumMap docNumMap, Set stopSet, WordMap pluralMap, CharMap accentMap, Set tokFields, int maxContext, int termMode, String returnMetaFields)
reader - Index reader to fetch text data from
docNumMap - Maps chunk numbers to document numbers
stopSet - Stop words removed (e.g. "the", "a", "and", etc.)
pluralMap - Plural words to convert to singular
accentMap - Accented chars to remove diacritics from
maxContext - Target # chars for hit + context
termMode - Where to mark terms (all, only in spans, etc.)
returnMetaFields - Optional comma-delimited subset of fields to return (instead of all by default).
public Set stopSet()
public WordMap pluralMap()
public CharMap accentMap()
public DocNumMap docNumMap()
public Set tokFields()
public Set returnMetaFields()
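The constructor accepts returnMetaFields as an optional comma-delimited string, while the accessor returns a Set that is null when all fields should be returned. One way that conversion could look is sketched below. MetaFieldParser and parse are hypothetical names, and the trimming of whitespace around field names is an assumption rather than documented behaviour.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical helper, not part of SnippetMaker. */
final class MetaFieldParser {
    /**
     * Turns a comma-delimited field list into a set.
     * Returns null when no subset is given, meaning "return all fields".
     */
    static Set<String> parse(String spec) {
        if (spec == null || spec.trim().isEmpty()) {
            return null;
        }
        Set<String> fields = new LinkedHashSet<>();
        for (String name : spec.split(",")) {
            String trimmed = name.trim(); // assumption: surrounding spaces are ignored
            if (!trimmed.isEmpty()) {
                fields.add(trimmed);
            }
        }
        // A spec with only separators (e.g. ",") is treated like no spec at all.
        return fields.isEmpty() ? null : fields;
    }

    public static void main(String[] args) {
        System.out.println(parse("title, creator ,date"));
        System.out.println(parse(null));
    }
}
```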
public Snippet[] makeSnippets(FieldSpans fieldSpans, int mainDocNum, String fieldName, boolean getText)
fieldSpans - record of the matching spans, and all search terms
mainDocNum - document ID of the main doc
fieldName - name of the field we're making snippets of
getText - true to get the full text of the snippet, false if we only want the start/end offsets.
public String markField(Document doc, FieldSpans fieldSpans, String fieldName, String value)
doc - document to get matching spans from
fieldName - name of the field to mark.
value - value of the field to mark
String mapXMLChars(String s)
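mapXMLChars is described as replacing 'special' characters with their XML equivalents, and the class holds three static patterns named ampPattern, ltPattern and gtPattern. A minimal sketch of that kind of escaping follows, assuming the three patterns match the literal characters &, < and > and that the replacements are the standard XML entities. XmlCharMapper is a hypothetical stand-in; the actual method body is not shown in this documentation.

```java
import java.util.regex.Pattern;

/** Hypothetical stand-in illustrating the escaping that mapXMLChars describes. */
final class XmlCharMapper {
    // Compiled once and reused, in the spirit of the static pattern fields.
    private static final Pattern AMP = Pattern.compile("&");
    private static final Pattern LT = Pattern.compile("<");
    private static final Pattern GT = Pattern.compile(">");

    static String mapXMLChars(String s) {
        // Replace '&' first so the entities produced below are not escaped again.
        String out = AMP.matcher(s).replaceAll("&amp;");
        out = LT.matcher(out).replaceAll("&lt;");
        out = GT.matcher(out).replaceAll("&gt;");
        return out;
    }

    public static void main(String[] args) {
        // prints: a &lt; b &amp;&amp; c &gt; d
        System.out.println(mapXMLChars("a < b && c > d"));
    }
}
```

The order of replacement matters: handling the ampersand last would turn an already produced `&lt;` into `&amp;lt;`.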