org.apache.lucene.chunk
Class ChunkSource

Object
  extended by ChunkSource
Direct Known Subclasses:
XtfChunkSource

public class ChunkSource
extends Object

Reads and caches chunks from an index.


Field Summary
protected  Analyzer analyzer
          Analyzer to use for tokenizing the text
protected  int chunkBump
          Number of words per chunk minus the overlap
protected  LinkedList chunkCache
          Cache of recently loaded chunks
protected  int chunkCacheSize
          Maximum number of chunks to cache
protected  int chunkOverlap
          Number of words one chunk overlaps with the next
protected  int chunkSize
          Max number of words per chunk
protected  DocNumMap docNumMap
          Map of document to chunk numbers
protected  String field
          Field to read from the chunks
protected  int firstChunk
          First chunk in the document
protected  int lastChunk
          Last chunk in the document
protected  int mainDocNum
          The main document number
protected  IndexReader reader
          Reader to load chunk text from
 
Constructor Summary
ChunkSource(IndexReader reader, DocNumMap docNumMap, int mainDocNum, String field, Analyzer analyzer)
          Construct the iterator and read in starting text from the given chunk.
 
Method Summary
protected  Chunk createChunkTokens(int chunkNum)
          Create a new storage place for chunk tokens (derived classes may wish to override)
 int getChunkOverlap()
          Retrieve the number of words one chunk overlaps with the next
 int getChunkSize()
          Retrieve the max number of words per chunk
 boolean inMainDoc(int chunkNum)
          Check if the given chunk is contained within the main document for this chunk source.
 Chunk loadChunk(int chunkNum)
          Read in and tokenize a chunk.
protected  void loadText(int chunkNum, Chunk chunk)
          Read the text for the given chunk (derived classes may wish to override)
 
Methods inherited from class Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

reader

protected IndexReader reader
Reader to load chunk text from


docNumMap

protected DocNumMap docNumMap
Map of document to chunk numbers


mainDocNum

protected int mainDocNum
The main document number


chunkSize

protected int chunkSize
Max number of words per chunk


chunkOverlap

protected int chunkOverlap
Number of words one chunk overlaps with the next


chunkBump

protected int chunkBump
Number of words per chunk minus the overlap
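
The relationship between chunkSize, chunkOverlap and chunkBump can be illustrated with a small sketch. This is not the class's own code: ChunkWindowSketch and its methods are hypothetical names, and the idea that the chunk at zero-based index n starts at word n * chunkBump is an assumption inferred from the field descriptions.

```java
// Hypothetical helper illustrating the field descriptions; not part of ChunkSource.
final class ChunkWindowSketch {
    /** Words advanced from the start of one chunk to the start of the next. */
    static int chunkBump(int chunkSize, int chunkOverlap) {
        if (chunkOverlap < 0 || chunkOverlap >= chunkSize) {
            throw new IllegalArgumentException("overlap must be in [0, chunkSize)");
        }
        return chunkSize - chunkOverlap;
    }

    /** Assumed word offset at which the chunk with the given zero-based index begins. */
    static int startWord(int chunkIndex, int chunkSize, int chunkOverlap) {
        return chunkIndex * chunkBump(chunkSize, chunkOverlap);
    }
}
```

With a chunk size of 200 words and an overlap of 20, each chunk would begin 180 words after the previous one, so the last 20 words of one chunk are repeated at the start of the next.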


firstChunk

protected int firstChunk
First chunk in the document


lastChunk

protected int lastChunk
Last chunk in the document


field

protected String field
Field to read from the chunks


analyzer

protected Analyzer analyzer
Analyzer to use for tokenizing the text


chunkCache

protected LinkedList chunkCache
Cache of recently loaded chunks


chunkCacheSize

protected int chunkCacheSize
Maximum number of chunks to cache

Constructor Detail

ChunkSource

public ChunkSource(IndexReader reader,
                   DocNumMap docNumMap,
                   int mainDocNum,
                   String field,
                   Analyzer analyzer)
Construct the iterator and read in starting text from the given chunk.

Parameters:
reader - where to read the chunks from
docNumMap - provides a mapping from main document number to chunk numbers
mainDocNum - the document ID of the main doc
field - the name of the field to read in
analyzer - will be used to tokenize the stored field contents
Method Detail

createChunkTokens

protected Chunk createChunkTokens(int chunkNum)
Create a new storage place for chunk tokens (derived classes may wish to override)


inMainDoc

public boolean inMainDoc(int chunkNum)
Check if the given chunk is contained within the main document for this chunk source. A chunk is not in the main doc if its number falls before the first chunk or after the last chunk, or if the chunk has been deleted.
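
The check described here might look roughly like the sketch below. MainDocRangeSketch is a hypothetical stand-in: it tracks deletions in a BitSet, whereas the real class presumably consults its IndexReader, so the deletion lookup is an assumption.

```java
import java.util.BitSet;

// Hypothetical stand-in for the range-and-deletion test; not the real ChunkSource.
final class MainDocRangeSketch {
    private final int firstChunk;
    private final int lastChunk;
    private final BitSet deleted = new BitSet(); // assumed deletion bookkeeping

    MainDocRangeSketch(int firstChunk, int lastChunk) {
        if (firstChunk < 0 || lastChunk < firstChunk) {
            throw new IllegalArgumentException("invalid chunk range");
        }
        this.firstChunk = firstChunk;
        this.lastChunk = lastChunk;
    }

    void markDeleted(int chunkNum) {
        deleted.set(chunkNum);
    }

    /** True only if chunkNum lies in [firstChunk, lastChunk] and is not deleted. */
    boolean inMainDoc(int chunkNum) {
        if (chunkNum < firstChunk || chunkNum > lastChunk) {
            return false;
        }
        return !deleted.get(chunkNum);
    }
}
```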


loadText

protected void loadText(int chunkNum,
                        Chunk chunk)
                 throws IOException
Read the text for the given chunk (derived classes may wish to override)

Throws:
IOException
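
Because createChunkTokens and loadText are protected hooks, a subclass can change how chunk text is obtained without touching loadChunk. The sketch below shows the pattern with stand-in classes. SketchChunk, SketchSource and UpperCaseSource are hypothetical names, and the way loadChunk calls the hooks and wraps the IOException is an assumption, since the documented loadChunk declares no checked exception.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Locale;

// Hypothetical stand-in for Chunk: holds only the chunk number and its text.
class SketchChunk {
    final int chunkNum;
    String text = "";

    SketchChunk(int chunkNum) {
        this.chunkNum = chunkNum;
    }
}

// Hypothetical stand-in for ChunkSource, showing the two overridable hooks.
class SketchSource {
    protected SketchChunk createChunkTokens(int chunkNum) {
        return new SketchChunk(chunkNum);
    }

    protected void loadText(int chunkNum, SketchChunk chunk) throws IOException {
        chunk.text = "stored text " + chunkNum; // placeholder for an index read
    }

    public SketchChunk loadChunk(int chunkNum) {
        SketchChunk chunk = createChunkTokens(chunkNum);
        try {
            loadText(chunkNum, chunk);
        } catch (IOException e) {
            throw new UncheckedIOException(e); // assumed error handling
        }
        return chunk;
    }
}

// A derived class that post-processes the text the base class loads.
class UpperCaseSource extends SketchSource {
    @Override
    protected void loadText(int chunkNum, SketchChunk chunk) throws IOException {
        super.loadText(chunkNum, chunk);
        chunk.text = chunk.text.toUpperCase(Locale.ROOT);
    }
}
```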

loadChunk

public Chunk loadChunk(int chunkNum)
Read in and tokenize a chunk. Maintains a cache of recently loaded chunks for speed.
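
A bounded cache of recently loaded chunks backed by a LinkedList, in the spirit of the chunkCache and chunkCacheSize fields, could be sketched as follows. RecentChunkCacheSketch is a hypothetical class, and the policy shown (move a hit to the front, evict from the tail) is an assumption; the documentation says only that recently loaded chunks are cached.

```java
import java.util.Iterator;
import java.util.LinkedList;
import java.util.function.IntFunction;

// Hypothetical sketch of a small most-recent-first cache; not the real ChunkSource.
final class RecentChunkCacheSketch<C> {
    private static final class Entry<C> {
        final int chunkNum;
        final C chunk;

        Entry(int chunkNum, C chunk) {
            this.chunkNum = chunkNum;
            this.chunk = chunk;
        }
    }

    private final LinkedList<Entry<C>> cache = new LinkedList<>();
    private final int maxSize;
    private final IntFunction<C> loader; // stands in for reading and tokenizing

    RecentChunkCacheSketch(int maxSize, IntFunction<C> loader) {
        if (maxSize < 1) {
            throw new IllegalArgumentException("maxSize must be at least 1");
        }
        this.maxSize = maxSize;
        this.loader = loader;
    }

    C load(int chunkNum) {
        // Linear scan is acceptable because the cache is kept small.
        Iterator<Entry<C>> it = cache.iterator();
        while (it.hasNext()) {
            Entry<C> e = it.next();
            if (e.chunkNum == chunkNum) {
                it.remove();
                cache.addFirst(e); // a hit becomes the most recent entry
                return e.chunk;
            }
        }
        C chunk = loader.apply(chunkNum);
        cache.addFirst(new Entry<>(chunkNum, chunk));
        while (cache.size() > maxSize) {
            cache.removeLast(); // drop the least recently used entry
        }
        return chunk;
    }

    int size() {
        return cache.size();
    }
}
```

A linked list keeps insertion at the head and removal at the tail cheap, at the cost of a linear lookup, which is a reasonable trade when only a handful of chunks are cached at once.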


getChunkSize

public int getChunkSize()
Retrieve the max number of words per chunk


getChunkOverlap

public int getChunkOverlap()
Retrieve the number of words one chunk overlaps with the next