Interface Summary | |
DocumentIterator | An iterator over documents and their intervals. |
IntervalIterator | An interface that allows one to iterate over intervals. |
Class Summary | |
AbstractIntersectionDocumentIterator | An abstract iterator on documents, generating the intersection of the documents returned by a number of document iterators. |
AndDocumentIterator | An iterator on documents that returns the AND of a number of document iterators. |
ConsecutiveDocumentIterator | An iterator returning documents containing consecutive intervals satisfying the underlying queries; the intervals must be in query order. |
DocumentIterators | A class providing static methods and objects that do useful things with document iterators. |
Interval | An integral interval. |
IntervalIterators | A class providing static methods and objects that do useful things with interval iterators. |
IntervalIterators.EmptyIntervalIterator | An iterator returning no intervals. |
IntervalIterators.FakeIterator | An iterator that throws an exception on all method calls, except for IntervalIterators.FakeIterator.hasNext(), which has a settable value. |
Intervals | A class providing static methods and objects that do useful things with intervals. |
LowPassDocumentIterator | A document iterator that filters another document iterator, returning just intervals (and containing documents) whose length does not exceed a given threshold. |
NotDocumentIterator | A document iterator that returns documents not returned by its underlying iterator, and returns just IntervalIterators.TRUE on all interval iterators. |
OrDocumentIterator | A document iterator that ORs given component iterators. |
Iterators over documents, and composition thereof.
This package contains the classes used to compose iterators over
documents. Such iterators are returned, for instance, by
IndexReader.documents(int).
MG4J provides minimal-interval semantics. That is, if the index
is full-text, a document iterator will provide a list of documents and, for
each document, a list of minimal intervals. These intervals denote ranges of
positions in the document that satisfy the iterator: for instance, if you
compose two document iterators using an AndDocumentIterator, you will get
as a result the intersection of the document lists of the underlying
iterators. Moreover, for each document you will get the minimal intervals
that contain at least one interval from the first iterator and one from the
second.
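As an illustration of the AND semantics just described, here is a minimal brute-force sketch. It is not the algorithm MG4J uses, and the names Span and MinimalAndSketch are hypothetical helpers introduced only for this example (they are not classes of this package). Every pair of intervals, one from each list, yields a candidate spanning both. A candidate is kept only if it contains no other candidate.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper (not MG4J's Interval): a closed range [left..right] of positions. */
final class Span {
    final int left, right;

    Span(int left, int right) { this.left = left; this.right = right; }

    /** True if the other span lies entirely inside this one. */
    boolean contains(Span o) { return left <= o.left && o.right <= right; }

    @Override public boolean equals(Object o) {
        return o instanceof Span && ((Span) o).left == left && ((Span) o).right == right;
    }
    @Override public int hashCode() { return 31 * left + right; }
    @Override public String toString() { return "[" + left + ".." + right + "]"; }
}

final class MinimalAndSketch {
    /** Minimal spans containing at least one interval from each list (quadratic brute force). */
    static List<Span> and(List<Span> first, List<Span> second) {
        // Each pair (a, b) gives the smallest span covering both a and b.
        Set<Span> candidates = new LinkedHashSet<>();
        for (Span a : first)
            for (Span b : second)
                candidates.add(new Span(Math.min(a.left, b.left), Math.max(a.right, b.right)));

        // Keep a candidate only if no different candidate fits inside it.
        List<Span> minimal = new ArrayList<>();
        for (Span c : candidates) {
            boolean isMinimal = true;
            for (Span d : candidates)
                if (!c.equals(d) && c.contains(d)) { isMinimal = false; break; }
            if (isMinimal) minimal.add(c);
        }
        minimal.sort((x, y) -> Integer.compare(x.left, y.left));
        return minimal;
    }

    public static void main(String[] args) {
        // One term occurs at positions 2 and 10, the other at positions 5 and 11.
        List<Span> a = Arrays.asList(new Span(2, 2), new Span(10, 10));
        List<Span> b = Arrays.asList(new Span(5, 5), new Span(11, 11));
        System.out.println(and(a, b)); // prints [[2..5], [5..10], [10..11]]
    }
}
```

In this example the candidate [2..11] is discarded because it contains [2..5], so only the tightest co-occurrences of the two terms survive.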
This information is of course very useful if you're going to assign a
score to the document, as smaller intervals mean a more precise match. At
the basic level (e.g., iterators returned by an index), the intervals
returned for a document have length one and contain just the term
that was used to generate the iterator. Intervals for compound iterators
are built in a natural way, preserving minimality. More details can be
found in Charles L. A. Clarke and Gordon V. Cormack, Shortest-Substring
Retrieval and Ranking (ACM Transactions on Information Systems,
vol. 18, no. 1, Jan 2000, pages 44-78). Scorers for documents may be
found in the it.unimi.dsi.mg4j.search.score package.
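To make "preserving minimality" concrete for two other compositions, the following sketch shows one plausible reading of OR and of a low-pass filter on interval lists. It is an illustration under stated assumptions, not MG4J's implementation: Extent and MinimalOrSketch are hypothetical names, and the length of an interval is assumed to be the number of positions it covers (right - left + 1).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper (not MG4J's Interval): a closed range [left..right] of positions. */
final class Extent {
    final int left, right;

    Extent(int left, int right) { this.left = left; this.right = right; }

    /** Assumed definition of length: number of positions covered. */
    int length() { return right - left + 1; }

    boolean contains(Extent o) { return left <= o.left && o.right <= right; }

    @Override public boolean equals(Object o) {
        return o instanceof Extent && ((Extent) o).left == left && ((Extent) o).right == right;
    }
    @Override public int hashCode() { return 31 * left + right; }
    @Override public String toString() { return "[" + left + ".." + right + "]"; }
}

final class MinimalOrSketch {
    /** OR: merge both lists, then drop every interval that contains another one. */
    static List<Extent> or(List<Extent> first, List<Extent> second) {
        Set<Extent> all = new LinkedHashSet<>(first);
        all.addAll(second);
        List<Extent> minimal = new ArrayList<>();
        for (Extent c : all) {
            boolean isMinimal = true;
            for (Extent d : all)
                if (!c.equals(d) && c.contains(d)) { isMinimal = false; break; }
            if (isMinimal) minimal.add(c);
        }
        minimal.sort((x, y) -> Integer.compare(x.left, y.left));
        return minimal;
    }

    /** Low-pass: keep only intervals whose length does not exceed the threshold. */
    static List<Extent> lowPass(List<Extent> intervals, int threshold) {
        List<Extent> kept = new ArrayList<>();
        for (Extent e : intervals)
            if (e.length() <= threshold) kept.add(e);
        return kept;
    }

    public static void main(String[] args) {
        List<Extent> a = Arrays.asList(new Extent(2, 5), new Extent(8, 8));
        List<Extent> b = Arrays.asList(new Extent(3, 4), new Extent(8, 12));
        List<Extent> either = or(a, b);
        System.out.println(either);             // prints [[3..4], [8..8]]
        System.out.println(lowPass(either, 1)); // prints [[8..8]]
    }
}
```

Removing intervals from a list in which no interval contains another cannot break that property, which is why the low-pass step needs no extra minimality check.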
Note that MG4J provides minimal-interval semantics for a set of indices. This extension is a significant improvement over single-index semantics. However, defining the exact meaning of a query is a nontrivial problem that will be fully dealt with in a forthcoming paper.