|
||||||||||
PREV CLASS NEXT CLASS | FRAMES NO FRAMES | |||||||||
SUMMARY: NESTED | FIELD | CONSTR | METHOD | DETAIL: FIELD | CONSTR | METHOD |
java.lang.Objectit.unimi.dsi.mg4j.index.AbstractTermMap
it.unimi.dsi.mg4j.util.MinimalPerfectHash
it.unimi.dsi.mg4j.util.SignedMinimalPerfectHash
Signed order-preserving minimal perfect hash tables.
IMPORTANT: In MG4J 1.0, the serialisation format is incompatible with previous versions.
Minimal perfect hash tables will always return a result, even for terms that were not present in the collection indexed by the table. Sometimes you may prefer to single out terms that were not present in the collection.
To this purpose, MG4J provides signed minimal perfect tables. In a signed
table, every term in the collection gets a signature that is used to
tell false positives. Signature may go from the simple, CRC based signatures
provided by the sample CRC32SignedMinimalPerfectHash
and
HashCodeSignedMinimalPerfectHash
classes, to sophisticated
cryptographic signatures, to (at the other extreme) a class that actually stores
the terms (and thus completely avoids false positives) such as LiterallySignedMinimalPerfectHash
.
A signed table extends this class, and provides two methods: a initSignatures(Collection)
method that sets up the necessary data
structures, and a checkSignature(CharSequence,int)
method that
checks a given character sequence against the signature stored for
a term having given index.
It is good practise, of course, to replicate the constructors of this
class in all implementing subclasses (by simply invoking super
with the same arguments). Moreover, to be useful classes implementing this
class must be serialisable.
Field Summary | |
static long |
serialVersionUID
|
Fields inherited from class it.unimi.dsi.mg4j.util.MinimalPerfectHash |
g, m, n, n4, rightShift, t, TERM_THRESHOLD, terms, VERBOSE, WEIGHT_UNKNOWN, WEIGHT_UNKNOWN_SORTED_TERMS, weight0, weight1, weight2, weightLength |
Constructor Summary | |
SignedMinimalPerfectHash(Collection terms)
Creates a new signed order-preserving minimal perfect hash table for the given set of terms, using as many weights as the longest term in the collection. |
|
SignedMinimalPerfectHash(Collection terms,
int weightLength)
Creates a new signed order-preserving minimal perfect hash table for the given set of terms using the given number of weights. |
|
SignedMinimalPerfectHash(String termFile)
Creates a new CRC32-signed order-preserving minimal perfect hash table for the given file of terms, using as many weights as the longest term in the file. |
|
SignedMinimalPerfectHash(String termFile,
String encoding,
int weightLength)
Creates a new CRC32-signed order-preserving minimal perfect hash table for the given file of terms using the given number of weights. |
Method Summary | |
protected abstract boolean |
checkSignature(CharSequence term,
int index)
Checks a signature. |
int |
get(CharSequence term)
Hashes a given term. |
protected abstract void |
initSignatures(Collection terms)
Sets up the signature system from a collection. |
Methods inherited from class it.unimi.dsi.mg4j.util.MinimalPerfectHash |
get, getFromT, getWeightLength, hash, hash, hash, main, size, weightLength |
Methods inherited from class it.unimi.dsi.mg4j.index.AbstractTermMap |
get, get |
Methods inherited from class java.lang.Object |
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
public static final long serialVersionUID
Constructor Detail |
public SignedMinimalPerfectHash(Collection terms)
After calling the corresponding constructor of MinimalPerfectHash
, this
constructor will invoke initSignatures(Collection)
.
terms
- some terms to hash; it is assumed that this Collection
does not contain duplicates.MinimalPerfectHash.MinimalPerfectHash(Collection)
public SignedMinimalPerfectHash(Collection terms, int weightLength)
After calling the corresponding constructor of MinimalPerfectHash
, this
constructor will invoke initSignatures(Collection)
.
terms
- some terms to hash; it is assumed that this Collection
does not contain terms with a common prefix of
weightLength
characters.weightLength
- the number of weights used generating the intermediate hash functions.MinimalPerfectHash.MinimalPerfectHash(Collection,int)
public SignedMinimalPerfectHash(String termFile, String encoding, int weightLength)
After calling the corresponding constructor of MinimalPerfectHash
, this
constructor will invoke initSignatures(Collection)
.
termFile
- an UTF-8 file containing one term on each line; it is assumed that
it does not contain terms with a common prefix of
weightLength
characters.encoding
- the encoding of termFile
; if null
, it
is assumed to be the platform default encoding.weightLength
- the number of weights used generating the
intermediate hash functions.MinimalPerfectHash.MinimalPerfectHash(String,String,int)
public SignedMinimalPerfectHash(String termFile)
After calling the corresponding constructor of MinimalPerfectHash
, this
constructor will invoke initSignatures(Collection)
.
termFile
- a file in the platform default encoding containing one term on each line; it is assumed that the file does
not contain twice the same term.MinimalPerfectHash.MinimalPerfectHash(String)
Method Detail |
public int get(CharSequence term)
get
in interface TermMap
get
in class MinimalPerfectHash
term
- a term to hash.
protected abstract void initSignatures(Collection terms)
This abstract method must be overriden by implementing subclasses. It must set up all data structures that are necessary to handle signatures; in particular, it will usually compute signatures for all terms in the given collection.
terms
- the collection of terms given to the constructor of this class.CRC32SignedMinimalPerfectHash.initSignatures(Collection)
,
LiterallySignedMinimalPerfectHash.initSignatures(Collection)
protected abstract boolean checkSignature(CharSequence term, int index)
This abstract method must be overriden by implementing subclasses.
It must check whether the signature of the given character sequence matches
the one stored for the index
-th term.
term
- a character sequence.index
- an integer denoting a term in the indexed collection.
index
-th term.CRC32SignedMinimalPerfectHash.checkSignature(CharSequence,int)
,
LiterallySignedMinimalPerfectHash.checkSignature(CharSequence,int)
|
||||||||||
PREV CLASS NEXT CLASS | FRAMES NO FRAMES | |||||||||
SUMMARY: NESTED | FIELD | CONSTR | METHOD | DETAIL: FIELD | CONSTR | METHOD |