The eXtensible Text Framework (XTF) is a powerful open source platform for providing access to digital content. Developed and maintained by the California Digital Library (CDL), XTF functions as the primary access technology for the CDL’s digital collections and other digital projects worldwide.
XTF consists of Java and XSLT 2.0 code that indexes, queries, and displays digital objects. CDL developers actively maintain and support the software, which is in use at institutions across the world. XTF builds on open source software (e.g., Lucene, Saxon) and is itself freely available for developers to download, install, and configure. Developers from government agencies, university presses, and other cultural heritage institutions such as OCLC are currently experimenting with XTF.
Features and Benefits
XTF allows end users to:
- Search using Boolean commands, truncation/wildcard operators, and exact phrases.
- Perform structure-aware searching (e.g., search only this chapter) and view search terms in context.
- Browse hierarchical facets.
- Create RSS feeds from searches.
- Choose from several default languages for the interface: English, French, Spanish, German, Italian.
XTF offers the following benefits to developers:
- Easy to deploy: Drops right into a Java application server such as Tomcat; has been tested on SunOS, Linux, and Windows.
- Easy to configure: Can create indexes on any XML element or attribute; entire presentation layer is customizable via XSLT.
- Robust: Optimized to perform well on large documents (e.g., a single text that exceeds 10 MB of encoded text); scales to perform well on collections of millions of documents; provides full Unicode support.
- Major “out-of-the-box” features: user interface with search/browse and document views, spell checker, bookbags, similar item suggestions, RSS feeds, book reader for Internet Archive/HathiTrust books, support for major file types, interface globalization, and more.
- Works well with a variety of authentication systems (e.g., IP address lists, LDAP, Shibboleth).
- Provides an interface for external data lookups to support thesaurus-based term expansion, recommender systems, etc.
- Can power other digital library services (e.g., OAI-PMH data provider that allows others to harvest metadata, SRU interface that exposes searches to federated search engines).
- Modular components can be deployed as separate pieces of a third-party system (e.g., the module that displays snippets of matching text).
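As an illustration of the OAI-PMH data provider role mentioned in the list above, the following minimal sketch builds the kind of request URL a harvester would send. The `verb=ListRecords` and `metadataPrefix=oai_dc` parameters come from the OAI-PMH protocol itself; the base URL `https://example.org/xtf/oai` and the class and method names are assumptions made for this example, not values taken from an XTF installation.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; not part of XTF.
public class OaiRequestSketch {

    // Builds an OAI-PMH ListRecords request URL for the given provider base URL.
    public static String listRecordsUrl(String baseUrl, String metadataPrefix) {
        return baseUrl + "?verb=ListRecords&metadataPrefix="
                + URLEncoder.encode(metadataPrefix, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The base URL below is a placeholder; substitute the real endpoint.
        System.out.println(listRecordsUrl("https://example.org/xtf/oai", "oai_dc"));
    }
}
```

A harvester would issue an HTTP GET against the resulting URL and parse the XML response; that step is omitted here.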
XTF provides out-of-the-box support for the following types of documents:
- Microsoft Word
- Web pages (HTML/HTM)
- XML-encoded documents
- Plain text
- Scanned books from Internet Archive and HathiTrust
The XTF system is divided into four components:
- crossQuery: The front-end to the collection search system.
- dynaXML: Interface to individual documents.
- Text Engine: Used by crossQuery and dynaXML to perform text searches.
- Indexer: Full-text indexer based on Lucene.
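The division of labor among these four components can be sketched with a toy model. This is not XTF's code and does not use Lucene: every class and method name below is hypothetical, the "index" is a plain in-memory map, and matching is exact-term only (no stemming, wildcards, or Boolean operators). It is meant only to show which role does what.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.regex.Pattern;

// Toy model of the four roles; all names are hypothetical, not XTF's API.
public class XtfRolesSketch {

    // Stands in for the document store that the display component reads from.
    private final Map<String, String> documents = new LinkedHashMap<>();

    // Stands in for the index: term -> ids of the documents containing it.
    private final Map<String, Set<String>> index = new TreeMap<>();

    // "Indexer" role: tokenize a document and record which terms it contains.
    public void indexDocument(String docId, String text) {
        documents.put(docId, text);
        for (String term : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!term.isEmpty()) {
                index.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
            }
        }
    }

    // "Text Engine" role: look up a single term in the index.
    public Set<String> search(String term) {
        return index.getOrDefault(term.toLowerCase(Locale.ROOT), new TreeSet<>());
    }

    // "crossQuery" role: turn a user query into a list of matching document ids.
    public List<String> collectionSearch(String term) {
        return new ArrayList<>(search(term));
    }

    // "dynaXML" role: fetch one document and mark the search term in context.
    public String displayDocument(String docId, String term) {
        String text = documents.get(docId);
        if (text == null) {
            return "";
        }
        return text.replaceAll("(?i)\\b(" + Pattern.quote(term) + ")\\b", "[$1]");
    }

    public static void main(String[] args) {
        XtfRolesSketch xtf = new XtfRolesSketch();
        xtf.indexDocument("doc1", "Whales are marine mammals.");
        xtf.indexDocument("doc2", "Ships hunted the whale for oil.");
        System.out.println(xtf.collectionSearch("whale"));        // [doc2]
        System.out.println(xtf.displayDocument("doc2", "whale")); // Ships hunted the [whale] for oil.
    }
}
```

In this sketch both the collection search and the document display go through the same lookup and store, mirroring the description above in which crossQuery and dynaXML both rely on the Text Engine.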
The following diagrams give a general overview of how documents are indexed, stored, queried, retrieved, and displayed using XTF:
A general illustration showing the roles the XTF components play in the user experience.
A more detailed view of the collection searching process, covering query parsing and results formatting.
Individual Object Display
A more detailed view of the object display and internal search mechanisms, covering request parsing, authentication, and document formatting.
An illustration of the workflow for the creation of collection indexes.