org.cdlib.xtf.textEngine
Class XtfDocNumMap

Object
  extended by XtfDocNumMap
All Implemented Interfaces:
DocNumMap

public class XtfDocNumMap
extends Object
implements DocNumMap

Maps chunk indexes to the corresponding document index, and vice versa. The load is performed only when necessary (typically dynaXML uses the DocNumMap, while crossQuery does not).

Author:
Martin Haye
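
The load-on-demand behavior mentioned in the class description can be illustrated with a minimal sketch. This is not the XTF source: the class name LazyDocNums, the Supplier standing in for the IndexReader, and the loadCount field are all hypothetical and exist only for illustration.

```java
import java.util.function.Supplier;

// Hypothetical illustration of load-on-demand; not the actual XtfDocNumMap code.
class LazyDocNums {
    // Stands in for reading the docInfo chunks from an IndexReader (assumption).
    private final Supplier<int[]> loader;
    // Stays null until some accessor actually needs the data.
    private int[] docNums;
    // Counts how many times the expensive load really ran (for demonstration only).
    int loadCount = 0;

    LazyDocNums(Supplier<int[]> loader) {
        this.loader = loader;
    }

    // Performs the load only on first use; later calls return immediately.
    private void load() {
        if (docNums != null) {
            return;
        }
        docNums = loader.get();
        loadCount++;
    }

    int getDocCount() {
        load();
        return docNums.length;
    }

    public static void main(String[] args) {
        LazyDocNums map = new LazyDocNums(() -> new int[] {3, 7, 12});
        System.out.println(map.loadCount);     // nothing loaded yet
        System.out.println(map.getDocCount()); // triggers the load
        System.out.println(map.loadCount);
    }
}
```

A caller that never asks for a mapping never pays for the load. The sketch is not thread-safe; whether the real class needs to be is not stated in this page.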

Field Summary
private  int chunkOverlap
          Number of words one chunk overlaps with the next
private  int chunkSize
          Max number of words in a chunk
private  int[] docNums
          Array of indexes, one for each docInfo chunk
private  int high
          Used in binary searching
private  int low
          Used in binary searching
private  int nDocs
          Total number of docInfo chunks found
private  int prevNum
          Caches result of previous scan, used for speed
private  IndexReader reader
          Where to get the data from
 
Constructor Summary
XtfDocNumMap(IndexReader reader, int chunkSize, int chunkOverlap)
          Make a map for the given reader.
 
Method Summary
 int getChunkOverlap()
          Get the number of words one chunk overlaps with the next
 int getChunkSize()
          Get the max number of words per chunk
 int getDocCount()
          Return a count of the number of documents (not chunks) in the index.
 int getDocNum(int chunkNumber)
          Given a chunk number, return the number of the document it is part of.
 int getFirstChunk(int docNum)
          Given a document number, this method returns the number of its first chunk.
 int getLastChunk(int docNum)
          Given a document number, this method returns the number of its last chunk.
private  void load()
           
private  void scan(int num)
          Perform a binary search looking for the given number.
 
Methods inherited from class Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

reader

private IndexReader reader
Where to get the data from


chunkSize

private int chunkSize
Max number of words in a chunk


chunkOverlap

private int chunkOverlap
Number of words one chunk overlaps with the next


nDocs

private int nDocs
Total number of docInfo chunks found


docNums

private int[] docNums
Array of indexes, one for each docInfo chunk


prevNum

private int prevNum
Caches result of previous scan, used for speed


low

private int low
Used in binary searching


high

private int high
Used in binary searching

Constructor Detail

XtfDocNumMap

public XtfDocNumMap(IndexReader reader,
                    int chunkSize,
                    int chunkOverlap)
             throws IOException
Make a map for the given reader. This reads in all the docInfo chunks to determine the range of text chunks for each document.

Throws:
IOException
Method Detail

load

private void load()

getChunkSize

public int getChunkSize()
Get the max number of words per chunk

Specified by:
getChunkSize in interface DocNumMap

getChunkOverlap

public int getChunkOverlap()
Get the number of words one chunk overlaps with the next

Specified by:
getChunkOverlap in interface DocNumMap
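
To make the relationship between the two values concrete, the following sketch estimates how many chunks a run of words would occupy. It assumes every chunk except possibly the last holds exactly chunkSize words and that each chunk starts chunkSize - chunkOverlap words after the previous one. That layout is an assumption for illustration, and chunkCount is a hypothetical helper, not part of DocNumMap.

```java
// Hypothetical helper; the real XTF chunking rules may differ.
class ChunkCountSketch {
    // Estimates the number of chunks needed for totalWords words, assuming
    // full-size chunks that each start (chunkSize - chunkOverlap) words
    // after the previous one.
    static int chunkCount(int totalWords, int chunkSize, int chunkOverlap) {
        if (chunkSize <= 0 || chunkOverlap < 0 || chunkOverlap >= chunkSize) {
            throw new IllegalArgumentException("need 0 <= chunkOverlap < chunkSize");
        }
        if (totalWords <= 0) {
            return 0;
        }
        if (totalWords <= chunkSize) {
            return 1;
        }
        int stride = chunkSize - chunkOverlap;
        int remaining = totalWords - chunkSize;
        // Each chunk after the first contributes `stride` new words; round up.
        return 1 + (remaining + stride - 1) / stride;
    }

    public static void main(String[] args) {
        // 500 words, 200 per chunk, 20 shared with the next chunk.
        System.out.println(chunkCount(500, 200, 20));
    }
}
```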

getDocCount

public final int getDocCount()
Return a count of the number of documents (not chunks) in the index.

Specified by:
getDocCount in interface DocNumMap

getDocNum

public final int getDocNum(int chunkNumber)
Given a chunk number, return the number of the document it is part of. Like all Lucene document numbers, the result is ephemeral and applies only to the given reader. If no match is found, returns -1; in practice this only happens when the chunk number is greater than all document numbers.

Specified by:
getDocNum in interface DocNumMap
Parameters:
chunkNumber - Chunk number to translate
Returns:
Document index, or -1 if no match.
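
The lookup described above can be sketched as a search over a sorted array of document numbers. The sketch assumes that a document's number is the index of its docInfo chunk and that this chunk comes after the document's text chunks, which is consistent with -1 being returned only for chunk numbers beyond every document number. The class DocNumLookupSketch is hypothetical and uses java.util.Arrays.binarySearch rather than the class's own scan method.

```java
import java.util.Arrays;

// Hypothetical illustration; not the actual XtfDocNumMap implementation.
class DocNumLookupSketch {
    // Returns the smallest document number that is >= chunkNumber, or -1
    // if chunkNumber is greater than every entry. sortedDocNums must be
    // sorted ascending with no duplicates.
    static int getDocNum(int[] sortedDocNums, int chunkNumber) {
        int pos = Arrays.binarySearch(sortedDocNums, chunkNumber);
        if (pos < 0) {
            // Not found: binarySearch returns -(insertionPoint) - 1.
            pos = -pos - 1;
        }
        return pos < sortedDocNums.length ? sortedDocNums[pos] : -1;
    }

    public static void main(String[] args) {
        int[] docNums = {3, 7, 12}; // made-up docInfo chunk indexes
        System.out.println(getDocNum(docNums, 5));
        System.out.println(getDocNum(docNums, 13));
    }
}
```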

getFirstChunk

public final int getFirstChunk(int docNum)
Given a document number, this method returns the number of its first chunk.

Specified by:
getFirstChunk in interface DocNumMap

getLastChunk

public final int getLastChunk(int docNum)
Given a document number, this method returns the number of its last chunk.

Specified by:
getLastChunk in interface DocNumMap
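
As a rough illustration of the reverse mapping, the sketch below derives a document's chunk range from a sorted array of document numbers. It assumes a layout in which each document's text chunks are immediately followed by its docInfo chunk, whose index serves as the document number. That layout, the class ChunkRangeSketch, and the -1 result for an unknown document number are all assumptions, not documented behavior.

```java
import java.util.Arrays;

// Hypothetical illustration under an assumed index layout:
// [text chunks of doc A][docInfo A][text chunks of doc B][docInfo B]...
class ChunkRangeSketch {
    // First chunk: one past the previous document's docInfo chunk,
    // or 0 for the first document in the index.
    static int getFirstChunk(int[] sortedDocNums, int docNum) {
        int pos = Arrays.binarySearch(sortedDocNums, docNum);
        if (pos < 0) {
            return -1; // assumption: unknown document numbers yield -1
        }
        return pos == 0 ? 0 : sortedDocNums[pos - 1] + 1;
    }

    // Last chunk: the chunk just before the document's own docInfo chunk.
    static int getLastChunk(int[] sortedDocNums, int docNum) {
        int pos = Arrays.binarySearch(sortedDocNums, docNum);
        if (pos < 0) {
            return -1; // assumption: unknown document numbers yield -1
        }
        return docNum - 1;
    }

    public static void main(String[] args) {
        int[] docNums = {3, 7, 12}; // made-up docInfo chunk indexes
        System.out.println(getFirstChunk(docNums, 7) + ".." + getLastChunk(docNums, 7));
    }
}
```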

scan

private void scan(int num)
Perform a binary search looking for the given number. On exit, the 'low' and 'high' member variables will be indexes into the array that bracket the value.

Parameters:
num - The number to look for.
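
A bracketing binary search of this kind can be sketched as follows. The class BracketScanSketch is hypothetical, and the convention it uses for values outside the array (low = -1 or high = array length) is an assumption; the page does not say how the real method handles those cases.

```java
// Hypothetical illustration of a bracketing binary search.
class BracketScanSketch {
    int low;   // on exit: index of the last entry <= num, or -1 if none
    int high;  // on exit: index of the first entry >= num, or values.length if none
    private final int[] values;

    BracketScanSketch(int[] sortedValues) {
        this.values = sortedValues;
    }

    void scan(int num) {
        int lo = 0;
        int hi = values.length - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (values[mid] < num) {
                lo = mid + 1;
            } else if (values[mid] > num) {
                hi = mid - 1;
            } else {
                // Exact match: both brackets point at the same entry.
                low = mid;
                high = mid;
                return;
            }
        }
        // No exact match: hi now indexes the last smaller entry and
        // lo the first larger entry, so together they bracket num.
        low = hi;
        high = lo;
    }

    public static void main(String[] args) {
        BracketScanSketch s = new BracketScanSketch(new int[] {3, 7, 12});
        s.scan(5);
        System.out.println(s.low + " " + s.high);
    }
}
```

When low equals high the value was found exactly; otherwise the entry at high (if it exists) is the next larger value and the entry at low (if it exists) is the next smaller one.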