Class ImmutableStringTrie
- All Implemented Interfaces:
ToIntFunction<String>
Each node of the tree is represented as a series of char
s using this layout:
+---------------------------------+ | number of branches | +---------------------------------+---------------------------------+---- | char for branch 0 | char for branch 1 | ... +---------------------------------+---------------------------------+---- | key-delta/leaf/bud for branch 0 | key-delta/leaf/bud for branch 1 | ... +---------------------------------+---------------------------------+---- | offset to jump to branch 1 | offset to jump to branch 2 | ... +---------------------------------+---------------------------------+----Each node is immediately followed by its child nodes according to branch order.
The key-delta is used to skip over a section of the input key when we know it should always match given the recently matched char (assumes only strings from the original list are queried).
Leaves mark a definite end of the match, while buds mark a potential end which could continue down the trie if there are more characters to match. The key-delta for buds is implicitly 1.
The jump for branch 0 is assumed to be 0 and is always ommitted, that is any continuation of the trie for branch 0 immediately follows the current node. The entire jump section is omitted when all the branches from a node are leaves.
Simple example trie with 2 strings "getValue" and "setValue":
+---+---+---+--------+--------+ | 2 | g | s | 0x8000 | 0x8001 | +---+---+---+--------+--------+In this case the first character is enough to determine the index result.
Example of a trie with a 'bud' that contains 2 strings "getName" and "getNameAndValue":
+---+---+---+---+---+--------+---+---+--------+ | 1 | g | 6 | 1 | e | 0x4000 | 1 | A | 0x8001 | +---+---+---+---+---+--------+---+---+--------+After matching 'g' we skip to the end of 'getName' before checking if there are any more characters to match.
More complex example with 3 strings "getName", "getValue", "getVersion":
+---+---+---+---+---+---+--------+---+---+---+---+---+--------+--------+ | 1 | g | 3 | 2 | N | V | 0x8000 | 1 | 0 | 2 | a | e | 0x8001 | 0x8002 | +---+---+---+---+---+---+--------+---+---+---+---+---+--------+--------+After matching 'g' we skip past the 'get'. If the next character is 'N' we know this is 'getName' otherwise we skip over the 'V' and jump to the last check between '...alue' and '...ersion'.
-
Nested Class Summary
Nested ClassesModifier and TypeClassDescriptionprivate static final class
Immutable trie that delegates searches that lie outside its range to an overflow trie. -
Field Summary
FieldsModifier and TypeFieldDescriptionprivate static final char
Marks a 'bud' in the tree; the same as a leaf except the trie continues beneath it.private static final char
Marks a leaf in the trie, where the rest of the bits are the index to be returned.private static final int
Maximum number of rows that can be indexed by a single trie.private final char[]
The compressed trie. -
Constructor Summary
Constructors -
Method Summary
Modifier and TypeMethodDescriptionint
applyAsInt
(String key) Returns the index assigned in the trie to the given string.private static void
buildSubTrie
(StringBuilder buf, String[] table, int column, int row, int rowLimit) Recursively builds a trie for a slice of rows at a particular column.private static ToIntFunction
<String> buildTrie
(StringBuilder buf, String[] table, int row, int rowLimit) Builds a trie, overflowing to additional tries if there are too many rowsstatic ToIntFunction
<String> buildTrie
(Collection<String> table) Builds an immutable trie that indexes the given table of strings.private static int
nextPivotColumn
(String[] table, int column, int row, int rowLimit) Finds the next column in the current row whose character differs in at least one other row.private static int
nextPivotRow
(String[] table, char pivot, int column, int row, int rowLimit) Finds the next row that has a different character in the selected column to the given one, or is too short to include the column.private static int
singletonTrie
(String key)
-
Field Details
-
LEAF_MARKER
private static final char LEAF_MARKERMarks a leaf in the trie, where the rest of the bits are the index to be returned.- See Also:
-
BUD_MARKER
private static final char BUD_MARKERMarks a 'bud' in the tree; the same as a leaf except the trie continues beneath it.- See Also:
-
MAX_ROWS_PER_TRIE
private static final int MAX_ROWS_PER_TRIEMaximum number of rows that can be indexed by a single trie.- See Also:
-
trie
private final char[] trieThe compressed trie.
-
-
Constructor Details
-
ImmutableStringTrie
ImmutableStringTrie(char[] data)
-
-
Method Details
-
singletonTrie
-
applyAsInt
Returns the index assigned in the trie to the given string.Note: a return value of
-1
means the string is definitely not in the trie, but a non-negative index may be returned for strings that closely match those in the trie. This is acceptable because we will only call this method with strings that we know exist in the trie.- Specified by:
applyAsInt
in interfaceToIntFunction<String>
-
buildTrie
Builds an immutable trie that indexes the given table of strings.The table of strings must be sorted in lexical order.
-
buildTrie
private static ToIntFunction<String> buildTrie(StringBuilder buf, String[] table, int row, int rowLimit) Builds a trie, overflowing to additional tries if there are too many rows -
buildSubTrie
private static void buildSubTrie(StringBuilder buf, String[] table, int column, int row, int rowLimit) Recursively builds a trie for a slice of rows at a particular column. -
nextPivotRow
Finds the next row that has a different character in the selected column to the given one, or is too short to include the column. This determines the span of rows that fall under the given character in the trie.Returns the row just after the end of the range if all rows have the same character.
-
nextPivotColumn
Finds the next column in the current row whose character differs in at least one other row. This helps identify the longest common prefix from the current pivot point to the next one.Returns the column just after the end of the current row if all rows are identical.
-