java.lang.Object
  it.unimi.dsi.mg4j.index.Index
An abstract representation of an index.
An instance of this class stores index data such as the basename, flags, etc., and makes it easy to build index readers and document iterators over the given index.
This class also provides a main method that can be used to dump data about an index for diagnostic purposes.
Field Summary
Type | Field | Description
String | basename | The basename of this index.
int | countCoding | The coding for counts.
it.unimi.dsi.mg4j.index.Index.EmptyDocumentIterator | emptyDocumentIterator | A singleton for an iterator returning no documents based on this index.
int | frequencyCoding | The coding for frequencies.
boolean | hasCounts | Whether this index contains counts.
boolean | hasPositions | Whether this index contains positions.
File | indexFile | The file containing the index.
boolean | isCaseSensitive | Whether this index is case sensitive.
int | maxDocPos | The maximum number of positions in a position list, or -1 if it is not known.
int | numberOfDocuments | The number of documents in the collection.
int | numberOfTerms | The number of terms in the collection.
LongList | offsets | The offset of each term, if offsets were loaded or specified at creation time, or null.
int | pointerCoding | The coding for pointers.
int | positionCoding | The coding for positions.
Properties | properties | The properties of this index.
Set | singletonSet | A singleton set containing just this index.
IntList | sizes | The size of each document, or null if sizes are not necessary in this index.
TermMap | termMap | The term list for this index, or null if the term list was not loaded.
Fields inherited from interface it.unimi.dsi.mg4j.index.CompressionFlags
ARITH, CODING_NAME, COUNTS_DEFAULT, COUNTS_DELTA, COUNTS_GAMMA, COUNTS_SHIFT, DELTA, FREQUENCIES_DEFAULT, FREQUENCIES_DELTA, FREQUENCIES_GAMMA, FREQUENCIES_SHIFT, GAMMA, GOLOMB, INTERP, NIBBLE, NO_COUNTS, NO_POSITIONS, NONE, POINTERS_DEFAULT, POINTERS_DELTA, POINTERS_GAMMA, POINTERS_GOLOMB, POINTERS_SHIFT, POSITIONS_ARITH, POSITIONS_DEFAULT, POSITIONS_DELTA, POSITIONS_GAMMA, POSITIONS_GOLOMB, POSITIONS_INTERP, POSITIONS_SHIFT, POSITIONS_SKEWED_GOLOMB, SKEWED_GOLOMB, UNARY, ZETA
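The flags parameter of the data-based constructor, together with the *_SHIFT constants listed above, suggests that the coding of each component (frequencies, pointers, counts, positions) is packed into its own field of a single long. The sketch below illustrates that general bit-field technique only. The constant values, the 4-bit field width, and the convention that NONE means "component absent" are hypothetical and are not taken from CompressionFlags.

```java
/**
 * Sketch: packing one coding per index component into a single long bit mask.
 * HYPOTHETICAL constants: the real values are defined in
 * it.unimi.dsi.mg4j.index.CompressionFlags and may differ.
 */
class FlagsSketch {
    // Hypothetical coding identifiers, 4 bits each.
    static final int NONE = 0, GAMMA = 1, DELTA = 2, GOLOMB = 3;

    // Hypothetical start bit of each component's 4-bit field in the mask.
    static final int FREQUENCIES_SHIFT = 0;
    static final int POINTERS_SHIFT = 4;
    static final int COUNTS_SHIFT = 8;
    static final int POSITIONS_SHIFT = 12;
    static final long FIELD_MASK = 0xFL;

    /** Returns flags with the 4-bit field starting at shift replaced by coding. */
    static long with(long flags, int coding, int shift) {
        return (flags & ~(FIELD_MASK << shift)) | ((long) coding << shift);
    }

    /** Reads the coding stored in the 4-bit field starting at shift. */
    static int coding(long flags, int shift) {
        return (int) ((flags >>> shift) & FIELD_MASK);
    }

    public static void main(String[] args) {
        long flags = 0L;
        flags = with(flags, GAMMA, FREQUENCIES_SHIFT);
        flags = with(flags, GOLOMB, POINTERS_SHIFT);
        flags = with(flags, GAMMA, COUNTS_SHIFT);
        // Positions are left as NONE; in this sketch that means "no positions stored".
        boolean hasPositions = coding(flags, POSITIONS_SHIFT) != NONE;
        System.out.println("pointers=" + coding(flags, POINTERS_SHIFT)
                + " hasPositions=" + hasPositions);
    }
}
```

Running main prints pointers=3 hasPositions=false. Clearing the field before setting it lets a coding be overwritten without disturbing its neighbours.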
Constructor Summary
Modifier | Constructor | Description
protected | Index(CharSequence basename, TermMap termMap, boolean loadOffsets, ProgressMeter pm) | Creates a new index using the given basename.
public | Index(String basename, File indexFile, Properties properties, int numberOfDocuments, int numberOfTerms, int maxDocPos, boolean isCaseSensitive, long flags, TermMap termMap, LongList offsets, IntList sizes) | Creates a new index using the given data.
Method Summary
Modifier and Type | Method | Description
static Index | getInstance(CharSequence basename) | Creates a new index using the given basename.
static Index | getInstance(CharSequence basename, boolean loadOffsets, ProgressMeter pm) | Creates a new index using the given basename, loading offsets if requested.
static Index | getInstance(CharSequence basename, ProgressMeter pm) | Creates a new index using the given basename, loading offsets.
static Index | getInstance(CharSequence basename, TermMap termMap, boolean loadOffsets, ProgressMeter pm) | Creates a new index using the given basename and term map, loading offsets if requested.
IndexReader | getReader() | Creates and returns a new IndexReader based on this index.
IndexReader | getReader(int bufferSize) | Creates and returns a new IndexReader based on this index, using the given buffer size.
static TermMap | loadTermMap(String filename) | Utility static method that loads a term map.
static void | main(String[] arg) | Dumps data about an index for diagnostic purposes.
static LongList | readOffsets(InputBitStream in, int T, ProgressMeter pm) | Utility method to load a compressed offset file into a list.
static IntList | readSizes(InputBitStream in, int N, ProgressMeter pm) | Utility method to load a compressed size file into a list.
String | toString() |
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
Field Detail
public final String basename
The basename of this index; possibly null if this index has been built using Index(String, File, Properties, int, int, int, boolean, long, TermMap, LongList, IntList).
public final Properties properties
The properties of this index; possibly null if this index has been built using Index(String, File, Properties, int, int, int, boolean, long, TermMap, LongList, IntList).
public final File indexFile
The file containing the index; possibly null if this index has been built using Index(String, File, Properties, int, int, int, boolean, long, TermMap, LongList, IntList).
public final int numberOfDocuments
public final int numberOfTerms
public final int maxDocPos
public final int frequencyCoding
public final int pointerCoding
public final int countCoding
public final int positionCoding
public final boolean hasCounts
public final boolean hasPositions
public final boolean isCaseSensitive
public final Set singletonSet
public final LongList offsets
The offset of each term, if offsets were loaded or specified at creation time, or null.
public final IntList sizes
The size of each document, or null if sizes are not necessary in this index.
public final TermMap termMap
The term list for this index, or null if the term list was not loaded.
public final it.unimi.dsi.mg4j.index.Index.EmptyDocumentIterator emptyDocumentIterator
Constructor Detail
protected Index(CharSequence basename, TermMap termMap, boolean loadOffsets, ProgressMeter pm) throws IOException
Parameters:
basename - the basename of the index.
termMap - the term list for this index, or null if there is no term list.
loadOffsets - offsets are loaded only if this parameter is true.
pm - an optional progress meter. If null, no progress information will be displayed.
public Index(String basename, File indexFile, Properties properties, int numberOfDocuments, int numberOfTerms, int maxDocPos, boolean isCaseSensitive, long flags, TermMap termMap, LongList offsets, IntList sizes) throws IOException
This constructor provides an index that is initialised exactly to the provided data.
It is mainly useful for debugging and testing purposes, when you are creating an index file (for instance, as a memory-based bit stream) but have no property file to use with getInstance(CharSequence).
It is usually safe to provide null for basename, indexFile and properties, but the caller is responsible for the consistency of the data.
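Because data consistency is left to the caller, a test harness might validate its inputs before calling this constructor. The checks below are a hypothetical sketch, not part of MG4J: plain arrays stand in for LongList and IntList, the helper name problems is invented, and the rule that an offset list holds numberOfTerms + 1 non-decreasing entries is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * HYPOTHETICAL pre-flight checks before handing raw data to the data-based
 * Index constructor. Arrays stand in for LongList/IntList; null means "absent".
 */
class ConsistencyCheck {
    static List<String> problems(int numberOfDocuments, int numberOfTerms, int maxDocPos,
                                 long[] offsets, int[] sizes) {
        List<String> problems = new ArrayList<>();
        if (numberOfDocuments < 0) problems.add("negative numberOfDocuments");
        if (numberOfTerms < 0) problems.add("negative numberOfTerms");
        // maxDocPos uses -1 for "not known"; anything smaller is invalid.
        if (maxDocPos < -1) problems.add("maxDocPos must be >= -1");
        if (offsets != null) {
            // ASSUMPTION: one offset per term plus a final entry for the total length.
            if (offsets.length != numberOfTerms + 1) {
                problems.add("offsets length != numberOfTerms + 1");
            }
            for (int i = 1; i < offsets.length; i++) {
                if (offsets[i] < offsets[i - 1]) {
                    problems.add("offsets decrease at index " + i);
                    break;
                }
            }
        }
        if (sizes != null) {
            // One size per document.
            if (sizes.length != numberOfDocuments) {
                problems.add("sizes length != numberOfDocuments");
            }
            for (int s : sizes) {
                if (s < 0) {
                    problems.add("negative document size");
                    break;
                }
            }
        }
        return problems;
    }
}
```

An empty result means none of these particular checks failed; it does not prove the data matches the index file.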
Parameters:
basename - the basename of this index, or null.
indexFile - the file containing this index, or null.
properties - the properties of this index, or null.
numberOfDocuments - the number of documents in this index.
numberOfTerms - the number of terms in this index.
maxDocPos - the maximum length of an occurrence list, or -1 if it is not known.
isCaseSensitive - whether this index is case sensitive.
flags - a bit mask setting the coding techniques to be used (see CompressionFlags).
termMap - the term list for this index, or null if there is no term list.
offsets - the offset list; may be null if you do not plan to use IndexReader.position(int).
sizes - the size list; may be null if your code does not require it.
Method Detail
public static LongList readOffsets(InputBitStream in, int T, ProgressMeter pm) throws IOException
Utility method to load a compressed offset file into a list.
Parameters:
in - the input bit stream providing the offsets (see IndexWriter).
T - the number of terms indexed.
pm - an optional progress meter. If null, no progress information will be displayed.
Returns:
a list of offsets, including an additional element of index T that gives the number of bytes of the index file.
Throws:
IOException
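If the offset list carries one entry per term plus a final entry of index T for the total length, as the description of the return value above suggests, the length of each term's inverted list is a difference of consecutive entries. The sketch below assumes that layout, and additionally assumes, purely for illustration, that the file stores gaps that must be prefix-summed; the actual on-disk format is defined by IndexWriter. The class and method names are hypothetical.

```java
/**
 * Sketch under an ASSUMED layout: T + 1 non-decreasing offsets, where the
 * entry of index T is the total length of the index file.
 */
class OffsetsSketch {
    /** ASSUMPTION for illustration: the file stores T + 1 gaps; prefix sums give absolute offsets. */
    static long[] fromGaps(long[] gaps) {
        long[] offsets = new long[gaps.length];
        long sum = 0;
        for (int i = 0; i < gaps.length; i++) {
            sum += gaps[i];
            offsets[i] = sum;
        }
        return offsets;
    }

    /** Length of the inverted list of term t, for 0 <= t < T. */
    static long listLength(long[] offsets, int t) {
        return offsets[t + 1] - offsets[t];
    }

    /** Total length of the index file: the entry of index T. */
    static long indexLength(long[] offsets) {
        return offsets[offsets.length - 1];
    }

    public static void main(String[] args) {
        long[] offsets = fromGaps(new long[] {0, 12, 5, 23}); // T = 3 terms
        for (int t = 0; t < offsets.length - 1; t++) {
            System.out.println("term " + t + ": " + listLength(offsets, t));
        }
        System.out.println("index length: " + indexLength(offsets));
    }
}
```

With these gaps the offsets are 0, 12, 17, 40, so the three lists have lengths 12, 5 and 23, and the final entry records the total of 40.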
public static IntList readSizes(InputBitStream in, int N, ProgressMeter pm) throws IOException
Utility method to load a compressed size file into a list.
Parameters:
in - the input bit stream providing the document sizes (see IndexWriter).
N - the number of documents indexed.
pm - an optional progress meter. If null, no progress information will be displayed.
Throws:
IOException
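GAMMA and DELTA appear among the CompressionFlags codings. To illustrate how a reader in the style of readSizes can decode exactly N compressed values from a bit stream, here is a minimal Elias gamma codec over a String of '0'/'1' characters. Using gamma coding for the size file, and coding each value x as gamma(x + 1) so that zero sizes are representable, are assumptions for illustration; the real format is whatever IndexWriter writes, and the class name is hypothetical.

```java
/**
 * Sketch of an Elias gamma codec for non-negative ints below Integer.MAX_VALUE.
 * Each value x is coded as gamma(x + 1): (length - 1) zeros, then x + 1 in binary.
 * ASSUMPTION: illustration only; bits live in a String, not an InputBitStream.
 */
class GammaSketch {
    static String encode(int[] values) {
        StringBuilder bits = new StringBuilder();
        for (int v : values) {
            String binary = Integer.toBinaryString(v + 1); // always starts with '1'
            for (int i = 1; i < binary.length(); i++) bits.append('0');
            bits.append(binary);
        }
        return bits.toString();
    }

    /** Decodes exactly n values, as readSizes reads N document sizes. */
    static int[] decode(String bits, int n) {
        int[] values = new int[n];
        int pos = 0;
        for (int k = 0; k < n; k++) {
            // Count leading zeros: they give the number of bits after the leading '1'.
            int zeros = 0;
            while (bits.charAt(pos) == '0') {
                zeros++;
                pos++;
            }
            // The next zeros + 1 bits are the binary form of value + 1.
            int x = Integer.parseInt(bits.substring(pos, pos + zeros + 1), 2);
            pos += zeros + 1;
            values[k] = x - 1;
        }
        return values;
    }

    public static void main(String[] args) {
        int[] sizes = {7, 0, 12, 3};
        String bits = encode(sizes);
        System.out.println(bits + " -> " + java.util.Arrays.toString(decode(bits, sizes.length)));
    }
}
```

Because gamma codes are self-delimiting, the decoder needs only the count N, not the bit length of the stream, which mirrors the N parameter of readSizes.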
public static TermMap loadTermMap(String filename) throws IOException
Utility static method that loads a term map.
Parameters:
filename - the name of the file containing the term map.
Returns:
the term map, or null if the file did not exist.
Throws:
IOException - if some IOException (other than FileNotFoundException) occurred.
public static Index getInstance(CharSequence basename, TermMap termMap, boolean loadOffsets, ProgressMeter pm) throws IOException
Parameters:
basename - the basename of the index.
termMap - the term list for this index, or null if there is no term list.
loadOffsets - offsets are loaded only if this parameter is true.
pm - an optional progress meter. If null, no progress information will be displayed.
Throws:
IOException
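getInstance accepts a null termMap, and loadTermMap returns null when the term-map file does not exist, so the two fit together. The sketch below reproduces only that loadTermMap contract (null on FileNotFoundException, other I/O errors propagated) with a hypothetical loadOrNull helper. It assumes, without support from this documentation, that the map was stored with Java serialization.

```java
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.ObjectInputStream;

/**
 * HYPOTHETICAL helper mirroring the documented loadTermMap contract:
 * returns null if the file does not exist; any other I/O problem propagates.
 * ASSUMPTION: the object was written with Java serialization.
 */
class LoadOrNull {
    static Object loadOrNull(String filename) throws IOException {
        final FileInputStream fis;
        try {
            fis = new FileInputStream(filename);
        } catch (FileNotFoundException e) {
            return null; // missing file: no term map available
        }
        // Both streams are closed automatically, even if reading fails.
        try (FileInputStream f = fis; ObjectInputStream in = new ObjectInputStream(f)) {
            return in.readObject();
        } catch (ClassNotFoundException e) {
            // Surface deserialization problems as I/O errors, like the documented method.
            throw new IOException(e);
        }
    }
}
```

Only the opening of the file is guarded by the FileNotFoundException handler, so a corrupt but existing file still raises an exception instead of silently yielding null.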
public static Index getInstance(CharSequence basename, boolean loadOffsets, ProgressMeter pm) throws IOException
null
.
Parameters:
basename - the basename of the index.
loadOffsets - offsets are loaded only if this parameter is true.
pm - an optional progress meter. If null, no progress information will be displayed.
Throws:
IOException
public static Index getInstance(CharSequence basename, ProgressMeter pm) throws IOException
null
.
Parameters:
basename - the basename of the index.
pm - an optional progress meter. If null, no progress information will be displayed.
Throws:
IOException
public static Index getInstance(CharSequence basename) throws IOException
null
.
This method provides no progress report.
Parameters:
basename - the basename of the index.
Throws:
IOException
public IndexReader getReader() throws IOException
Creates and returns a new IndexReader based on this index, which can then be used to read the index.
Returns:
an IndexReader to read this index.
Throws:
IOException
public IndexReader getReader(int bufferSize) throws IOException
Creates and returns a new IndexReader based on this index, which can then be used to read the index.
Parameters:
bufferSize - the size of the buffer to be used when opening the InputBitStream underlying the returned IndexReader.
Returns:
an IndexReader to read this index.
Throws:
IOException
public String toString()
public static void main(String[] arg) throws IOException
Throws:
IOException