Table of Contents

Spelling Correction
Creating a Spelling Correction Dictionary
Activating Spelling Correction
Formatting the Suggestions

Spelling Correction

Users often misspell words when they're querying an XTF index, and it would be nice if the system could catch the most obvious errors and automatically suggest an appropriate spelling correction. XTF provides a facility for easily achieving this "Did you mean...?" functionality.

The following sections discuss how to create a spelling dictionary at index time, how to conditionally activate spelling correction at query time, and how the resulting suggestions reach the result formatter. If you're interested in the details of the algorithm or data structures used, please see XTF Under the Hood.

Creating a Spelling Correction Dictionary

Before spelling correction can be activated in crossQuery, XTF needs a specially calculated spelling correction dictionary. This dictionary is calculated at the time the XTF index is created, and is based on the words found in the indexed documents (rather than on a standard word dictionary). The advantage of this dynamic process is that proper names and foreign words that wouldn't appear in a standard dictionary can and will be included in the XTF spelling dictionary.

To turn on spelling dictionary generation, enable it in the textIndexer configuration file, conf/textIndexer.conf, by adding a tag like this:
<spellcheck createDict="yes"/>
Now when the indexer runs, it will accumulate words from the document text, and at the end of the indexing run it will generate the spelling correction dictionary. Note that dictionary generation is quite cheap, generally adding only a small fraction to the overall indexing time and disk space, so it usually makes sense to leave it enabled (it is by default).
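
For orientation, here is a minimal sketch of where the tag might sit inside an index definition. Only the <spellcheck> tag comes from the text above; the surrounding element names, the index name, and the paths are assumptions for illustration, so check them against your own conf/textIndexer.conf rather than copying them.

```xml
<!-- conf/textIndexer.conf (sketch only) -->
<!-- Assumption: everything except <spellcheck> is illustrative and may differ in your file -->
<textIndexer-config>
  <index name="default">
    <!-- Hypothetical source and index locations -->
    <src path="./data"/>
    <db path="./index"/>

    <!-- Generate the spelling correction dictionary at the end of the indexing run -->
    <spellcheck createDict="yes"/>
  </index>
</textIndexer-config>
```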

Activating Spelling Correction

To turn on spelling correction in crossQuery, modify the Query Parser stylesheet to add a new element just under the top-level <query> tag:
<spellcheck/>
This requests that the servlet use its default parameters to decide when a query is probably misspelled (i.e. 10 or fewer document hits), which terms in a user's query are probably misspelled, and the best suggestions for those terms. If you wish to exert finer control over the process, see the Spelling Correction Tag reference for details.

If the documents that result from a query are of insufficient quantity or score (thresholds controlled by attributes of the <spellcheck> tag), the Text Engine will check each query term for possible spelling correction. If any corrections are found, they're ranked for "best fit" and the best suggestion for each term is sent to the Result Formatter stylesheet.
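
As a rough sketch, the fragment of a Query Parser stylesheet that emits the <spellcheck/> element might look like the following. The template match, the attribute values on <query>, and the way the rest of the query is built are assumptions for illustration; your stylesheet will already have its own logic for those parts, and only the placement of <spellcheck/> directly under <query> is taken from the text above.

```xml
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical top-level template; attribute values below are placeholders -->
  <xsl:template match="/">
    <query indexPath="index"
           style="style/crossQuery/resultFormatter/default/resultFormatter.xsl">

      <!-- Ask the servlet for spelling suggestions using its default parameters -->
      <spellcheck/>

      <!-- Assumption: existing templates build the remainder of the query -->
      <xsl:apply-templates/>
    </query>
  </xsl:template>

</xsl:stylesheet>
```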

Formatting the Suggestions

Spelling corrections appear in the results under a special Spelling Result tag, with one Spelling Suggestion tag per misspelled word in the original query. The Result Formatter stylesheet needs to recognize these tags and display the suggestions in the final HTML result sent to the user's browser.

For example, if the user queried for "who kiled harrypotter" (and your indexed documents discuss that young wizard), then crossQuery might send the following to your Result Formatter:
<spelling>
  <suggestion origTerm="kiled" suggestedTerm="killed"/>
  <suggestion origTerm="harrypotter" suggestedTerm="harry potter"/>
</spelling>
Your job then is to take these suggestions, display them to the user, and provide a link that runs the corrected query. The default stylesheets that come with the XTF distribution do this, but if you have customized the query parsing and result formatting, you may need to modify this XSLT code to match your customizations.
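
If you are writing this from scratch, one possible approach is sketched below in XSLT 2.0. It is not the code from the default stylesheets. It assumes the user's original query text is available in a stylesheet parameter named text, that the search URL takes the form search?text=..., and that replacing whole whitespace-separated words (case-sensitively) is good enough; the local: function and its namespace URI are hypothetical names introduced for this example.

```xml
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                xmlns:local="http://example.org/local"
                exclude-result-prefixes="xs local">

  <!-- Assumption: the original query string arrives in a parameter named "text" -->
  <xsl:param name="text" as="xs:string" select="''"/>

  <!-- Hypothetical helper: applies each suggestion in turn to the query,
       replacing any whole word equal to @origTerm with @suggestedTerm -->
  <xsl:function name="local:apply-suggestions" as="xs:string">
    <xsl:param name="query" as="xs:string"/>
    <xsl:param name="suggestions" as="element(suggestion)*"/>
    <xsl:choose>
      <xsl:when test="empty($suggestions)">
        <xsl:sequence select="$query"/>
      </xsl:when>
      <xsl:otherwise>
        <xsl:variable name="s" select="$suggestions[1]"/>
        <xsl:variable name="replaced" as="xs:string"
                      select="string-join(
                                for $w in tokenize($query, '\s+')
                                return if ($w eq $s/@origTerm)
                                       then string($s/@suggestedTerm)
                                       else $w,
                                ' ')"/>
        <!-- Recurse over the remaining suggestions -->
        <xsl:sequence select="local:apply-suggestions($replaced, subsequence($suggestions, 2))"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:function>

  <!-- Render one "Did you mean...?" link covering all suggested corrections -->
  <xsl:template match="spelling">
    <xsl:variable name="corrected" select="local:apply-suggestions($text, suggestion)"/>
    <p class="spelling">
      <xsl:text>Did you mean </xsl:text>
      <!-- Assumption: search?text=... is the query URL; adjust to your servlet mapping -->
      <a href="search?text={encode-for-uri($corrected)}">
        <xsl:value-of select="$corrected"/>
      </a>
      <xsl:text>?</xsl:text>
    </p>
  </xsl:template>

</xsl:stylesheet>
```

With the example query above passed in as text, the link text produced by this sketch would read who killed harry potter. The template only fires if your stylesheet applies templates to the spelling element, and a real formatter would also need to carry over any other URL parameters (sort order, facets, and so on) that your customized query parser expects.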