
Boost Sets

crossQuery includes a feature that allows a Boost Set to be specified in the query produced by the Query Parser. This set assigns each document a boost factor that is applied to that document's score. This allows experimentation with different algorithms that assign a global factor to each document (Google's PageRank algorithm, for instance, assigns just such a factor). A subsequent query could specify a completely different boost set, allowing quick side-by-side testing of various global ranking algorithms.

This feature should be considered very experimental, and may be removed at some future time.

A boost set is specified by adding a boostSet="boostFilePath" attribute to the top-level <query> element produced by the Query Parser stylesheet. The attribute value should be the path, relative to the XTF base directory, of a boost set file. You must also specify which meta-data field the keys in that file should match, by adding a boostSetField="fieldName" attribute to the same element.
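To make the two attributes concrete, the following Java sketch reads them from a query element and resolves the boost file path against the base directory. It only illustrates the rules just described and is not XTF's own code: the class BoostSetSpec, its fromQuery method, and the example paths are hypothetical, and it relies only on the standard JAXP DOM parser.

```java
import java.io.IOException;
import java.io.StringReader;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Hypothetical holder for the boost-related attributes of a query element. */
final class BoostSetSpec {
    final Path boostFile; // boostSet value, resolved against the XTF base directory
    final String field;   // boostSetField value: the meta-data field the keys match

    private BoostSetSpec(Path boostFile, String field) {
        this.boostFile = boostFile;
        this.field = field;
    }

    /** Returns null when the query element carries no boostSet attribute. */
    static BoostSetSpec fromQuery(String queryXml, Path xtfBaseDir) {
        Element query;
        try {
            query = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(queryXml)))
                    .getDocumentElement();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("cannot parse query XML", e);
        }
        if (!"query".equals(query.getTagName())) {
            throw new IllegalArgumentException("top-level element must be <query>");
        }
        String path = query.getAttribute("boostSet"); // "" when the attribute is absent
        if (path.isEmpty()) {
            return null;
        }
        String field = query.getAttribute("boostSetField");
        if (field.isEmpty()) {
            throw new IllegalArgumentException("boostSet also requires boostSetField");
        }
        return new BoostSetSpec(xtfBaseDir.resolve(path), field);
    }
}
```

For a query such as <query boostSet="boost/rank.txt" boostSetField="identifier">, with a hypothetical base directory /xtf, the sketch yields the file /xtf/boost/rank.txt and the field identifier.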

The format of a boost set file is very simple: it should consist of one text line per document, and each line should contain a value from the meta-data field specified by boostSetField, followed by a | symbol, followed by a factor to be multiplied into the score for that document. For example:
  doc1|1.5
  doc2|2.0
  doc4|0.722
Boost factors greater than 1.0 will increase the ranking of a document; factors between 0.0 and 1.0 will decrease the ranking of the document; factors less than 0.0 are not valid.
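The line format is simple enough to parse in a few lines. The hypothetical Java sketch below (BoostEntry is an illustrative name, not part of XTF) splits each line at its last | symbol, which assumes the factor itself never contains one, and rejects negative or non-numeric factors. Skipping blank lines and trimming whitespace are assumptions of the sketch, not documented behavior.

```java
import java.util.ArrayList;
import java.util.List;

/** One "value|factor" line from a boost set file (hypothetical helper). */
final class BoostEntry {
    final String key;   // value of the boostSetField meta-data field
    final float factor; // multiplied into the document's score

    BoostEntry(String key, float factor) {
        this.key = key;
        this.factor = factor;
    }

    /** Parses a line such as "doc4|0.722"; rejects a missing key, NaN, or a negative factor. */
    static BoostEntry parseLine(String line) {
        int bar = line.lastIndexOf('|');
        if (bar <= 0) {
            throw new IllegalArgumentException("expected value|factor: " + line);
        }
        String key = line.substring(0, bar).trim();
        float factor = Float.parseFloat(line.substring(bar + 1).trim());
        if (Float.isNaN(factor) || factor < 0.0f) {
            throw new IllegalArgumentException("invalid boost factor: " + line);
        }
        return new BoostEntry(key, factor);
    }

    /** Parses every non-blank line, preserving file order. */
    static List<BoostEntry> parseAll(List<String> lines) {
        List<BoostEntry> out = new ArrayList<>();
        for (String line : lines) {
            if (!line.trim().isEmpty()) { // skipping blank lines is an assumption
                out.add(parseLine(line));
            }
        }
        return out;
    }
}
```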

Boost values are multiplied into the document's basic score as calculated by the Text Engine. However, the impact can be subtle. To see more clearly what is going on, you can turn off score normalization by adding normalizeScores="false" to the query element generated by your Query Parser stylesheet. This disables the default behavior, which scales all the scores so that the top document receives a score of 100.
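A small calculation shows why normalization can hide the effect of a boost. In the hypothetical sketch below, documents are keyed by their boostSetField value and the base scores are made-up numbers; normalize only mimics the documented rule that the top document receives 100 and is not the Text Engine's actual code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration of boosting followed by optional normalization. */
final class BoostedScoring {
    /** Multiplies each base score by its boost; documents without a boost use 1.0. */
    static Map<String, Float> applyBoosts(Map<String, Float> baseScores, Map<String, Float> boosts) {
        Map<String, Float> out = new LinkedHashMap<>();
        for (Map.Entry<String, Float> e : baseScores.entrySet()) {
            out.put(e.getKey(), e.getValue() * boosts.getOrDefault(e.getKey(), 1.0f));
        }
        return out;
    }

    /** Scales all scores so the highest becomes 100 (the default normalizeScores behavior). */
    static Map<String, Float> normalize(Map<String, Float> scores) {
        float max = 0.0f;
        for (float s : scores.values()) {
            max = Math.max(max, s);
        }
        Map<String, Float> out = new LinkedHashMap<>();
        for (Map.Entry<String, Float> e : scores.entrySet()) {
            // Leave scores untouched if every score is zero (avoids dividing by zero).
            out.put(e.getKey(), max > 0.0f ? e.getValue() * 100.0f / max : e.getValue());
        }
        return out;
    }
}
```

With base scores of 2.0 for doc1 and 4.0 for doc2, normalizing alone gives 50 and 100. A boost of 2.0 on doc1 raises its raw score to 4.0, and after normalization both documents show 100, so the boost appears only as a change relative to the other documents, never as a larger absolute number.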

The lines in the file must be listed in ascending order by value. Each value in the file will be matched to an entry in the index. Any document with no matching line in the file is considered to have a boost factor of 1.0 (that is, its score is unaltered).

Warnings will be logged if values in the file cannot be matched to index entries, and also if any lines are out of order in the file.
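The ordering and matching rules, together with the warnings just described, might look like this in Java. The sketch is illustrative only: BoostSetValidator is a hypothetical name, the set of indexed values stands in for the real index, warnings are collected in a list rather than logged, unmatched lines are simply dropped, and ascending order is assumed to be plain String.compareTo order, which may differ from the order XTF actually uses.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical validator applying the ordering and matching rules for boost set lines. */
final class BoostSetValidator {
    /**
     * Builds a value-to-factor map from "value|factor" lines in file order. Adds a warning
     * for each line that is out of ascending order or matches no indexed value; unmatched
     * and malformed lines are left out of the map.
     */
    static Map<String, Float> build(List<String> lines, Set<String> indexedValues,
                                    List<String> warnings) {
        Map<String, Float> boosts = new HashMap<>();
        String prev = null;
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i);
            int bar = line.lastIndexOf('|');
            if (bar < 0) {
                warnings.add("line " + (i + 1) + ": missing | separator");
                continue;
            }
            String key = line.substring(0, bar);
            float factor = Float.parseFloat(line.substring(bar + 1));
            // Assumed ordering: String.compareTo (UTF-16 code unit order).
            if (prev != null && key.compareTo(prev) < 0) {
                warnings.add("line " + (i + 1) + ": out of order: " + key);
            }
            prev = key;
            if (!indexedValues.contains(key)) {
                warnings.add("line " + (i + 1) + ": no index entry for " + key);
                continue;
            }
            boosts.put(key, factor);
        }
        return boosts;
    }

    /** Documents with no line in the file keep their score: factor 1.0. */
    static float boostFor(Map<String, Float> boosts, String value) {
        return boosts.getOrDefault(value, 1.0f);
    }
}
```

For the lines doc1|1.5, doc4|0.722, doc2|2.0 and doc9|3.0, checked against an index containing doc1, doc2 and doc4, the sketch reports two warnings (doc2 is out of order, doc9 matches nothing), and boostFor returns 1.0 for any document not listed, such as doc3.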

Note that if the boost file is very large, it may take some time to read and process the file the first time it is used, but subsequent accesses will be very fast as the result is cached in memory by the crossQuery servlet.
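The load-once behavior can be pictured as a cache keyed by file path. This is a hypothetical illustration, not the crossQuery servlet's code: BoostSetCache is an invented name, it uses ConcurrentHashMap.computeIfAbsent so that concurrent requests for the same file trigger a single load, and its minimal parsing skips the validation and warnings discussed above.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical in-memory cache: each boost file is read once, then served from memory. */
final class BoostSetCache {
    private final Map<Path, Map<String, Float>> cache = new ConcurrentHashMap<>();
    final AtomicInteger loads = new AtomicInteger(); // counts actual file reads, for illustration

    /** Returns the parsed boost map, reading the file only on first use. */
    Map<String, Float> get(Path boostFile) {
        return cache.computeIfAbsent(boostFile, this::load);
    }

    private Map<String, Float> load(Path boostFile) {
        loads.incrementAndGet();
        try {
            Map<String, Float> boosts = new HashMap<>();
            for (String line : Files.readAllLines(boostFile)) {
                int bar = line.lastIndexOf('|');
                if (bar > 0) { // minimal parsing: malformed lines are skipped silently
                    boosts.put(line.substring(0, bar), Float.parseFloat(line.substring(bar + 1)));
                }
            }
            return Collections.unmodifiableMap(boosts);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

As written, the sketch never re-reads a file once it is cached, so it would not notice later edits to that file.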