it.unimi.dsi.mg4j.util
Class CRC32SignedMinimalPerfectHash

java.lang.Object
  extended by it.unimi.dsi.mg4j.index.AbstractTermMap
      extended by it.unimi.dsi.mg4j.util.MinimalPerfectHash
          extended by it.unimi.dsi.mg4j.util.SignedMinimalPerfectHash
              extended by it.unimi.dsi.mg4j.util.CRC32SignedMinimalPerfectHash
All Implemented Interfaces:
Serializable, TermMap

public class CRC32SignedMinimalPerfectHash
extends SignedMinimalPerfectHash

CRC-32 signed order-preserving minimal perfect hash tables.

The source of this class exemplifies a signed minimal perfect hash table that signs each word with a 32-bit CRC, thus avoiding false positives with high probability.
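
As a rough illustration of what a 32-bit CRC signature looks like, the following minimal sketch computes one with the standard java.util.zip.CRC32 class. The class name Crc32SignatureSketch, the method name signature and the choice of UTF-8 bytes as input are assumptions made for this example; the way this class actually feeds characters to the CRC may differ.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical helper, not part of MG4J.
final class Crc32SignatureSketch {

    /** Returns the CRC-32 of the UTF-8 bytes of the given word as an int (assumed encoding). */
    static int signature(CharSequence word) {
        CRC32 crc = new CRC32();
        crc.update(word.toString().getBytes(StandardCharsets.UTF_8));
        // getValue() returns the 32-bit checksum in the low bits of a long.
        return (int) crc.getValue();
    }

    public static void main(String[] args) {
        // "123456789" is the conventional CRC-32 check input.
        System.out.println(Integer.toHexString(signature("123456789"))); // prints "cbf43926"
    }
}
```

Storing one such int per term costs four bytes per word. A word outside the original set is then accepted only if its CRC happens to coincide with the stored one.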

Since:
0.4
Author:
Sebastiano Vigna, Marco Olivo
See Also:
Serialized Form

Field Summary
 
Fields inherited from class it.unimi.dsi.mg4j.util.SignedMinimalPerfectHash
serialVersionUID
 
Fields inherited from class it.unimi.dsi.mg4j.util.MinimalPerfectHash
g, m, n, n4, rightShift, t, TERM_THRESHOLD, terms, VERBOSE, WEIGHT_UNKNOWN, WEIGHT_UNKNOWN_SORTED_TERMS, weight0, weight1, weight2, weightLength
 
Constructor Summary
CRC32SignedMinimalPerfectHash(Collection words)
          Creates a new CRC32-signed order-preserving minimal perfect hash table for the given set of words, using as many weights as the longest word in the collection.
CRC32SignedMinimalPerfectHash(Collection words, int weightLength)
          Creates a new CRC32-signed order-preserving minimal perfect hash table for the given set of words using the given number of weights.
CRC32SignedMinimalPerfectHash(String wordFile)
          Creates a new CRC32-signed order-preserving minimal perfect hash table for the given file of words, using as many weights as the longest word in the file.
CRC32SignedMinimalPerfectHash(String wordFile, String encoding, int weightLength)
          Creates a new CRC32-signed order-preserving minimal perfect hash table for the given file of words using the given number of weights.
 
Method Summary
 boolean checkSignature(CharSequence word, int index)
          Checks a signature.
 void initSignatures(Collection words)
          Sets up the signature system from a collection.
 
Methods inherited from class it.unimi.dsi.mg4j.util.SignedMinimalPerfectHash
get
 
Methods inherited from class it.unimi.dsi.mg4j.util.MinimalPerfectHash
get, getFromT, getWeightLength, hash, hash, hash, main, size, weightLength
 
Methods inherited from class it.unimi.dsi.mg4j.index.AbstractTermMap
get, get
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

CRC32SignedMinimalPerfectHash

public CRC32SignedMinimalPerfectHash(Collection words,
                                     int weightLength)
Creates a new CRC32-signed order-preserving minimal perfect hash table for the given set of words using the given number of weights.

Parameters:
words - some words to hash; it is assumed that this Collection does not contain words with a common prefix of weightLength characters.
weightLength - the number of weights used to generate the intermediate hash functions.
See Also:
MinimalPerfectHash.MinimalPerfectHash(Collection,int)
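
The assumption on words can be checked before construction. The sketch below reads it as "no two words agree on their first weightLength characters", which is one interpretation of the parameter description rather than something the documentation states; the class PrefixCheckSketch and the method hasDistinctPrefixes are hypothetical names, and words shorter than weightLength are compared in full.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

// Hypothetical pre-check, not part of MG4J.
final class PrefixCheckSketch {

    /** Returns true if no two words share the same prefix of weightLength characters. */
    static boolean hasDistinctPrefixes(Collection<? extends CharSequence> words, int weightLength) {
        Set<String> seen = new HashSet<>();
        for (CharSequence word : words) {
            // Words shorter than weightLength are used whole (an assumption).
            int end = Math.min(word.length(), weightLength);
            String prefix = word.subSequence(0, end).toString();
            if (!seen.add(prefix)) {
                return false; // a second word with the same prefix
            }
        }
        return true;
    }
}
```

For example, "alpha" and "alpine" share the three-character prefix "alp", so they fail the check for weightLength 3 but pass it for weightLength 4.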

CRC32SignedMinimalPerfectHash

public CRC32SignedMinimalPerfectHash(Collection words)
Creates a new CRC32-signed order-preserving minimal perfect hash table for the given set of words, using as many weights as the longest word in the collection.

Parameters:
words - some words to hash; it is assumed that this Collection does not contain duplicates.
See Also:
MinimalPerfectHash.MinimalPerfectHash(Collection)

CRC32SignedMinimalPerfectHash

public CRC32SignedMinimalPerfectHash(String wordFile,
                                     String encoding,
                                     int weightLength)
Creates a new CRC32-signed order-preserving minimal perfect hash table for the given file of words using the given number of weights.

Parameters:
wordFile - a file in the given encoding containing one word on each line; it is assumed that it does not contain words with a common prefix of weightLength characters.
encoding - the encoding of wordFile; if null, it is assumed to be the platform default encoding.
weightLength - the number of weights used to generate the intermediate hash functions.
See Also:
MinimalPerfectHash.MinimalPerfectHash(String,String,int)

CRC32SignedMinimalPerfectHash

public CRC32SignedMinimalPerfectHash(String wordFile)
Creates a new CRC32-signed order-preserving minimal perfect hash table for the given file of words, using as many weights as the longest word in the file.

Parameters:
wordFile - a file in the platform default encoding containing one word on each line; it is assumed that the file does not contain the same word twice.
See Also:
MinimalPerfectHash.MinimalPerfectHash(String)
Method Detail

initSignatures

public void initSignatures(Collection words)
Description copied from class: SignedMinimalPerfectHash
Sets up the signature system from a collection.

This abstract method must be overridden by implementing subclasses. It must set up all data structures that are necessary to handle signatures; in particular, it will usually compute signatures for all terms in the given collection.

Specified by:
initSignatures in class SignedMinimalPerfectHash
Parameters:
words - the collection of terms given to the constructor of this class.
See Also:
initSignatures(Collection), LiterallySignedMinimalPerfectHash.initSignatures(Collection)

checkSignature

public boolean checkSignature(CharSequence word,
                              int index)
Description copied from class: SignedMinimalPerfectHash
Checks a signature.

This abstract method must be overridden by implementing subclasses. It must check whether the signature of the given character sequence matches the one stored for the index-th term.

Specified by:
checkSignature in class SignedMinimalPerfectHash
Parameters:
word - a character sequence.
index - an integer denoting a term in the indexed collection.
Returns:
true iff the signature of the given character sequence matches the one stored for the index-th term.
See Also:
checkSignature(CharSequence,int), LiterallySignedMinimalPerfectHash.checkSignature(CharSequence,int)
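
The interplay of initSignatures and checkSignature can be sketched in a self-contained way as follows. This is not the MG4J source: the class SignedLookupSketch, its unsignedIndex stand-in for the underlying minimal perfect hash, the use of UTF-8 bytes for the CRC and the convention of returning -1 for a rejected word are all assumptions made for illustration. The sketch also assumes a non-empty word list.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.zip.CRC32;

// Hypothetical illustration of a CRC-32 signed lookup; not part of MG4J.
final class SignedLookupSketch {
    private final List<String> terms;
    private int[] signatures;

    SignedLookupSketch(List<String> words) {
        this.terms = words;
        initSignatures(words);
    }

    private static int crc32(CharSequence word) {
        CRC32 crc = new CRC32();
        crc.update(word.toString().getBytes(StandardCharsets.UTF_8));
        return (int) crc.getValue();
    }

    /** Computes and stores one 32-bit signature per term, in term order. */
    void initSignatures(List<String> words) {
        signatures = new int[words.size()];
        for (int i = 0; i < words.size(); i++) {
            signatures[i] = crc32(words.get(i));
        }
    }

    /** Returns true iff the CRC of word matches the one stored for the index-th term. */
    boolean checkSignature(CharSequence word, int index) {
        return signatures[index] == crc32(word);
    }

    /** Stand-in for an unsigned hash: always yields some index, even for unknown words. */
    private int unsignedIndex(CharSequence word) {
        int i = terms.indexOf(word.toString());
        return i >= 0 ? i : Math.floorMod(word.toString().hashCode(), terms.size());
    }

    /** Returns the index of word, or -1 (assumed convention) if its signature does not match. */
    int get(CharSequence word) {
        int index = unsignedIndex(word);
        return checkSignature(word, index) ? index : -1;
    }
}
```

With the terms "apple", "banana" and "cherry", get("banana") yields 1. An unknown word still reaches some index through unsignedIndex, and the signature comparison is what rejects it unless its CRC happens to collide with the stored one.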