it.unimi.dsi.mg4j.tool
Class SecondPass

java.lang.Object
  extended by it.unimi.dsi.mg4j.tool.SecondPass
All Implemented Interfaces:
CompressionFlags

public final class SecondPass
extends Object
implements CompressionFlags

Builds an inverted index by merging the occurrence batches produced by FirstPass (and possibly reordered by MiddlePass). Some statistics are printed to standard output at the end of the indexing process.

These are the files currently generated:

basename.index
The inverted index.
basename.offsets
For each term, the byte offset in basename.index at which its inverted list starts. More precisely, the first integer is the offset for term 0 in γ coding, and for i > 0 the i-th integer is the difference between the i-th and the (i−1)-th offset, also in γ coding. If T terms were indexed, this file contains T+1 integers, the last being the difference (in bytes) between the length of the entire inverted index and the offset of the last inverted list.
basename.globcounts
For each term, the number of its occurrences throughout the whole document collection, in γ coding. More precisely, the i-th integer of the file (starting from 0) is the number of occurrences of the term with index i.
basename.properties
This class adds some information to the property file produced by FirstPass. Currently, the following keys are generated:
compressionflags
the mask of compression flags used when generating the index;
maxcount
the maximum count in the collection, that is, the maximum number of occurrences of a term in a single document, maximised over all terms and documents.
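
The layout of basename.offsets described above (a first absolute offset followed by γ-coded differences, T+1 integers in total) can be turned back into absolute offsets with a running sum. The following is only a minimal sketch under stated assumptions, not the code used by this class: OffsetsSketch, BitReader and decodeOffsets are hypothetical names, and the bit-level convention (bits read most-significant first, a natural number x stored as the γ code of x+1) is an assumption that should be checked against the bit-stream classes shipped with MG4J, which are the better choice for real files.

```java
import java.util.Arrays;

public final class OffsetsSketch {

    /** Hypothetical helper: reads bits MSB-first from a byte array (assumed bit order). */
    static final class BitReader {
        private final byte[] data;
        private long pos; // current position, in bits

        BitReader(byte[] data) { this.data = data; }

        int readBit() {
            int b = (data[(int) (pos >>> 3)] >>> (7 - (int) (pos & 7))) & 1;
            pos++;
            return b;
        }

        /** Reads a natural number in gamma coding, assuming x is stored as the code of x+1. */
        long readGamma() {
            int n = 0;
            while (readBit() == 0) n++;           // unary part: n zeros, then a one
            long x = 1;                           // implicit leading one of x+1
            for (int i = 0; i < n; i++) x = (x << 1) | readBit();
            return x - 1;
        }
    }

    /**
     * Decodes T+1 gamma-coded integers into cumulative values: entries 0..T-1 are
     * the offsets of the inverted lists, entry T is the length of the whole index.
     */
    static long[] decodeOffsets(byte[] data, int terms) {
        BitReader in = new BitReader(data);
        long[] result = new long[terms + 1];
        long current = 0;
        for (int i = 0; i <= terms; i++) {
            current += in.readGamma();            // the first value is absolute, the rest are gaps
            result[i] = current;
        }
        return result;
    }

    public static void main(String[] args) {
        // Hand-built example for T = 2: bits 1 | 00110 | 011 encode 0, 5, 2.
        byte[] encoded = { (byte) 0x99, (byte) 0x80 };
        System.out.println(Arrays.toString(decodeOffsets(encoded, 2))); // prints [0, 5, 7]
    }
}
```

In the hand-built example, term 0 starts at offset 0, term 1 at offset 5, and the final value 7 is the total length of the index. Under the same assumed coding, basename.globcounts could be read with readGamma() alone, since its integers are stored directly rather than as differences.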

Since:
0.6
Author:
Sebastiano Vigna

Field Summary
 
Fields inherited from interface it.unimi.dsi.mg4j.index.CompressionFlags
ARITH, CODING_NAME, COUNTS_DEFAULT, COUNTS_DELTA, COUNTS_GAMMA, COUNTS_SHIFT, DELTA, FREQUENCIES_DEFAULT, FREQUENCIES_DELTA, FREQUENCIES_GAMMA, FREQUENCIES_SHIFT, GAMMA, GOLOMB, INTERP, NIBBLE, NO_COUNTS, NO_POSITIONS, NONE, POINTERS_DEFAULT, POINTERS_DELTA, POINTERS_GAMMA, POINTERS_GOLOMB, POINTERS_SHIFT, POSITIONS_ARITH, POSITIONS_DEFAULT, POSITIONS_DELTA, POSITIONS_GAMMA, POSITIONS_GOLOMB, POSITIONS_INTERP, POSITIONS_SHIFT, POSITIONS_SKEWED_GOLOMB, SKEWED_GOLOMB, UNARY, ZETA
 
Method Summary
static void main(String[] arg)
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Method Detail

main

public static void main(String[] arg)