Coverage Report

Created: 2026-04-29 06:53

next uncovered line (L), next uncovered region (R), next uncovered branch (B)
/rust/registry/src/index.crates.io-1949cf8c6b5b557f/zerotrie-0.2.4/src/builder/mod.rs
Line
Count
Source
1
// This file is part of ICU4X. For terms of use, please see the file
2
// called LICENSE at the top level of the ICU4X source tree
3
// (online at: https://github.com/unicode-org/icu4x/blob/main/LICENSE ).
4
5
//! # [`ZeroTrie`](crate::ZeroTrie) Builder
6
//!
7
//! There are two implementations of the [`ZeroTrie`](crate::ZeroTrie) Builder:
8
//!
9
//! - [`konst::ZeroTrieBuilderConst`] allows for human-readable const construction
10
//! - [`nonconst::ZeroTrieBuilder`] has the full feaure set but requires `alloc`
11
//!
12
//! The two builders follow the same algorithm but have different capabilities.
13
//!
14
//! ## Builder Algorithm Overview
15
//!
16
//! The tries are built backwards, from the last node to the first node. The key step of the
17
//! algorithm is **determining what is the next node to prepend.**
18
//!
19
//! In the simple case of [`ZeroTrieSimpleAscii`], all nodes are binary-search, so if the input
20
//! strings are provided in lexicographic order, there is a simple, deterministic method for
21
//! identifying the next node. This insight is what enables us to make the const builder.
22
//!
23
//! The builder works with the following intermediate state variables:
24
//!
25
//! - `prefix_len` indicates the byte index we are currently processing.
26
//! - `i` and `j` bracket a window of strings in the input that share the same prefix.
27
//! - `current_len` is the length in bytes of the current self-contained trie.
28
//! - `lengths_stack` contains metadata for branch nodes.
29
//!
30
//! Consider a trie containing the following strings and values:
31
//!
32
//! - "" → 11
33
//! - "ad" → 22
34
//! - "adef" → 33
35
//! - "adghk" → 44
36
//!
37
//! Suppose `prefix_len = 2`, `i = 1`, and `j = 4`. This would indicate that we
38
//! have are evaluating the strings with the "ad" prefix, which extend from
39
//! index 1 (inclusive) to index 4 (exclusive).
40
//!
41
//! What follows is a verbal explanation of the build steps for the above trie.
42
//! When a node is prepended, it is shown in **boldface**.
43
//!
44
//! 1. Initialize the builder by setting `i=3`, `j=4`, `prefix_len=5` (the last string),
45
//!    `current_len=0`, and `lengths_stack` empty. Start the main loop.
46
//! 2. Top of loop. The string at `i` is equal in length to `prefix_len`, so we prepend
47
//!    our first node: a **value node 44**, which requires a 2-byte varint. Increase
48
//!    `current_len` to 2.
49
//! 3. Reduce `prefix_len` to 4, read our `key_ascii="k"`, and recalculate `i` and `j`
50
//!    _(this calculation is a long chunk of code in the builder impls)_. Since there is no
51
//!    other string with the prefix "adgh", `i` and `j` stay the same, we prepend an
52
//!    **ASCII node "k"**, increase `current_len` to 3, and continue the main loop.
53
//! 4. Top of loop. The string at `i` is of length 5, but `prefix_len` is 4, so there is
54
//!    no value node to prepend.
55
//! 5. Reduce `prefix_len` to 3, read our `key_ascii="h"`, and recalculate `i` and `j`.
56
//!    There are no other strings sharing the prefix "abg", so we prepend an
57
//!    **ASCII node "h"**, increase `current_len` to 4, and continue the main loop.
58
//! 6. Top of loop. There is still no value node to prepend.
59
//! 7. Reduce `prefix_len` to 2, read our `key_ascii="g"`, and recalculate `i` and `j`.
60
//!    We find that `i=1` and `j=4`, the range of strings sharing the prefix "ad". Since
61
//!    `i` or `j` changed, proceed to evaluate the branch node.
62
//! 8. The last branch byte `ascii_j` for this prefix is "g", which is the same as `key_ascii`,
63
//!    so we are the _last_ target of a branch node. Push an entry onto `lengths_stack`:
64
//!    `BranchMeta { ascii: "g", cumulative_length: 4, local_length: 4, count: 1 }`.
65
//! 9. The first branch byte `ascii_i` for this prefix is "e", which is NOT equal to `key_ascii`,
66
//!    so we are _not the first_ target of a branch node. We therefore start evaluating the
67
//!    string preceding where we were at the top of the current loop. We set `i=2`, `j=3`,
68
//!    `prefix_len=4` (length of the string at `i`), and continue the main loop.
69
//! 10. Top of loop. Since the string at `i` is equal in length to `prefix_len`, we prepend a
70
//!     **value node 33** (which requires a 2-byte varint) and increase `current_len` to 2.
71
//! 11. Reduce `prefix_len` to 3, read our `key_ascii="f"`, and recalculate `i` and `j`.
72
//!     They stay the same, so we prepend an **ASCII node "f"**, increase `current_len` to 3,
73
//!     and continue the main loop.
74
//! 12. Top of loop. No value node this time.
75
//! 13. Reduce `prefix_len` to 2, read our `key_ascii="e"`, and recalculate `i` and `j`.
76
//!     They go back to `i=1` and `j=4`.
77
//! 14. The last branch byte `ascii_j` for this prefix is "g", which is NOT equal to `key_ascii`,
78
//!     so we are _not the last_ target of a branch node. We peek at the entry at the front of
79
//!     the lengths stack and use it to push another entry onto the stack:
80
//!     `BranchMeta { ascii: "e", cumulative_length: 7, local_length: 3, count: 2 }`
81
//! 15. The first branch byte `ascii_i` for this prefix is "e", which is the same as `key_ascii`,
82
//!     wo we are the _first_ target of a branch node. We can therefore proceed to prepend the
83
//!     metadata for the branch node. We peek at the top of the stack and find that there are 2
84
//!     tries reachable from this branch and they have a total byte length of 5. We then pull off
85
//!     2 entries from the stack into a local variable `branch_metas`. From here, we write out
86
//!     the **offset table**, **lookup table**, and **branch head node**, which are determined
87
//!     from the metadata entries. We set `current_len` to the length of the two tries plus the
88
//!     metadata, which happens to be 11. Then we return to the top of the main loop.
89
//! 16. Top of loop. The string at `i` is length 2, which is the same as `prefix_len`, so we
90
//!     prepend a **value node 22** (2-byte varint) and increase `current_len` to 13.
91
//! 17. Reduce `prefix_len` to 1, read our `key_ascii="d"`, and recalculate `i` and `j`.
92
//!     They stay the same, so we prepend an **ASCII node "d"**, increase `current_len` to 14,
93
//!     and continue the main loop.
94
//! 18. Top of loop. No value node this time.
95
//! 19. Reduce `prefix_len` to 0, read our `key_ascii="a"`, and recalculate `i` and `j`.
96
//!     They change to `i=0` and `j=4`, since all strings have the empty string as a prefix.
97
//!     However, `ascii_i` and `ascii_j` both equal `key_ascii`, so we prepend **ASCII node "a"**,
98
//!     increase `current_len` to 15, and continue the main loop.
99
//! 16. Top of loop. The string at `i` is length 0, which is the same as `prefix_len`, so we
100
//!     prepend a **value node 11** and increase `current_len` to 16.
101
//! 17. We can no longer reduce `prefix_len`, so our trie is complete.
102
//!
103
//! ## Perfect Hash Reordering
104
//!
105
//! When the PHF is added to the mix, the main change is that the strings are no longer in sorted
106
//! order when they are in the trie. To resolve this issue, when adding a branch node, the target
107
//! tries are rearranged in-place in the buffer to be in the correct order for the PHF.
108
//!
109
//! ## Example
110
//!
111
//! Here is the output of the trie described above.
112
//!
113
//! ```
114
//! use zerotrie::ZeroTrieSimpleAscii;
115
//!
116
//! const DATA: [(&str, usize); 4] =
117
//!     [("", 11), ("ad", 22), ("adef", 33), ("adghk", 44)];
118
//!
119
//! // As demonstrated above, the required capacity for this trie is 16 bytes
120
//! const TRIE: ZeroTrieSimpleAscii<[u8; 16]> =
121
//!     ZeroTrieSimpleAscii::from_sorted_str_tuples(&DATA);
122
//!
123
//! assert_eq!(
124
//!     TRIE.as_bytes(),
125
//!     &[
126
//!         0x8B, // value node 11
127
//!         b'a', // ASCII node 'a'
128
//!         b'd', // ASCII node 'd'
129
//!         0x90, // value node 22 lead byte
130
//!         0x06, // value node 22 trail byte
131
//!         0xC2, // branch node 2
132
//!         b'e', // first target of branch
133
//!         b'g', // second target of branch
134
//!         3,    // offset
135
//!         b'f', // ASCII node 'f'
136
//!         0x90, // value node 33 lead byte
137
//!         0x11, // value node 33 trail byte
138
//!         b'h', // ASCII node 'h'
139
//!         b'k', // ASCII node 'k'
140
//!         0x90, // value node 44 lead byte
141
//!         0x1C, // value node 44 trail byte
142
//!     ]
143
//! );
144
//!
145
//! assert_eq!(TRIE.get(b""), Some(11));
146
//! assert_eq!(TRIE.get(b"ad"), Some(22));
147
//! assert_eq!(TRIE.get(b"adef"), Some(33));
148
//! assert_eq!(TRIE.get(b"adghk"), Some(44));
149
//! assert_eq!(TRIE.get(b"unknown"), None);
150
//! ```
151
152
#![allow(clippy::panic)]
153
154
mod branch_meta;
155
#[cfg(all(feature = "alloc", feature = "dense"))]
156
pub(crate) mod dense;
157
pub(crate) mod konst;
158
#[cfg(feature = "litemap")]
159
mod litemap;
160
#[cfg(feature = "alloc")]
161
pub(crate) mod nonconst;
162
pub(crate) mod slice_indices;
163
164
use slice_indices::ByteSliceWithIndices;
165
166
use super::ZeroTrieSimpleAscii;
167
168
impl<const N: usize> ZeroTrieSimpleAscii<[u8; N]> {
169
    /// **Const Constructor:** Creates an [`ZeroTrieSimpleAscii`] from a sorted slice of keys and values.
170
    ///
171
    /// This function needs to know the exact length of the resulting trie at compile time. To
172
    /// figure out `N`, first set `N` to be too large (say 0xFFFF), then look at the resulting
173
    /// compile error which will tell you how to set `N`, like this:
174
    ///
175
    /// > the evaluated program panicked at 'Buffer too large. Size needed: 17'
176
    ///
177
    /// That error message says you need to set `N` to 17.
178
    ///
179
    /// Also see [`Self::from_sorted_str_tuples`].
180
    ///
181
    /// # Panics
182
    ///
183
    /// Panics if `items` is not sorted or if `N` is not correct.
184
    ///
185
    /// # Examples
186
    ///
187
    /// Create a `const` [`ZeroTrieSimpleAscii`](crate::ZeroTrieSimpleAscii) at compile time:
188
    ///
189
    /// ```
190
    /// use zerotrie::ZeroTrieSimpleAscii;
191
    ///
192
    /// // The required capacity for this trie happens to be 17 bytes
193
    /// const TRIE: ZeroTrieSimpleAscii<[u8; 17]> =
194
    ///     ZeroTrieSimpleAscii::from_sorted_u8_tuples(&[
195
    ///         (b"bar", 2),
196
    ///         (b"bazzoo", 3),
197
    ///         (b"foo", 1),
198
    ///     ]);
199
    ///
200
    /// assert_eq!(TRIE.get(b"foo"), Some(1));
201
    /// assert_eq!(TRIE.get(b"bar"), Some(2));
202
    /// assert_eq!(TRIE.get(b"bazzoo"), Some(3));
203
    /// assert_eq!(TRIE.get(b"unknown"), None);
204
    /// ```
205
    ///
206
    /// Panics if strings are not sorted:
207
    ///
208
    /// ```compile_fail,E0080
209
    /// # use zerotrie::ZeroTrieSimpleAscii;
210
    /// const TRIE: ZeroTrieSimpleAscii<[u8; 17]> = ZeroTrieSimpleAscii::from_sorted_u8_tuples(&[
211
    ///     (b"foo", 1),
212
    ///     (b"bar", 2),
213
    ///     (b"bazzoo", 3),
214
    /// ]);
215
    /// ```
216
    ///
217
    /// Panics if capacity is too small:
218
    ///
219
    /// ```compile_fail,E0080
220
    /// # use zerotrie::ZeroTrieSimpleAscii;
221
    /// const TRIE: ZeroTrieSimpleAscii<[u8; 15]> = ZeroTrieSimpleAscii::from_sorted_u8_tuples(&[
222
    ///     (b"bar", 2),
223
    ///     (b"bazzoo", 3),
224
    ///     (b"foo", 1),
225
    /// ]);
226
    /// ```
227
    ///
228
    /// Panics if capacity is too large:
229
    ///
230
    /// ```compile_fail,E0080
231
    /// # use zerotrie::ZeroTrieSimpleAscii;
232
    /// const TRIE: ZeroTrieSimpleAscii<[u8; 20]> = ZeroTrieSimpleAscii::from_sorted_u8_tuples(&[
233
    ///     (b"bar", 2),
234
    ///     (b"bazzoo", 3),
235
    ///     (b"foo", 1),
236
    /// ]);
237
    /// ```
238
0
    pub const fn from_sorted_u8_tuples(tuples: &[(&[u8], usize)]) -> Self {
239
        use konst::*;
240
0
        let byte_str_slice = ByteSliceWithIndices::from_byte_slice(tuples);
241
0
        let s = ZeroTrieBuilderConst::<N>::from_tuple_slice::<100>(byte_str_slice);
242
0
        Self::from_store(s.build_or_panic())
243
0
    }
244
245
    /// **Const Constructor:** Creates an [`ZeroTrieSimpleAscii`] from a sorted slice of keys and values.
246
    ///
247
    /// This function needs to know the exact length of the resulting trie at compile time. To
248
    /// figure out `N`, first set `N` to be too large (say 0xFFFF), then look at the resulting
249
    /// compile error which will tell you how to set `N`, like this:
250
    ///
251
    /// > the evaluated program panicked at 'Buffer too large. Size needed: 17'
252
    ///
253
    /// That error message says you need to set `N` to 17.
254
    ///
255
    /// Also see [`Self::from_sorted_u8_tuples`].
256
    ///
257
    /// # Panics
258
    ///
259
    /// Panics if `items` is not sorted, if `N` is not correct, or if any of the strings contain
260
    /// non-ASCII characters.
261
    ///
262
    /// # Examples
263
    ///
264
    /// Create a `const` [`ZeroTrieSimpleAscii`](crate::ZeroTrieSimpleAscii) at compile time:
265
    ///
266
    /// ```
267
    /// use zerotrie::ZeroTrieSimpleAscii;
268
    ///
269
    /// // The required capacity for this trie happens to be 17 bytes
270
    /// const TRIE: ZeroTrieSimpleAscii<[u8; 17]> =
271
    ///     ZeroTrieSimpleAscii::from_sorted_str_tuples(&[
272
    ///         ("bar", 2),
273
    ///         ("bazzoo", 3),
274
    ///         ("foo", 1),
275
    ///     ]);
276
    ///
277
    /// assert_eq!(TRIE.get(b"foo"), Some(1));
278
    /// assert_eq!(TRIE.get(b"bar"), Some(2));
279
    /// assert_eq!(TRIE.get(b"bazzoo"), Some(3));
280
    /// assert_eq!(TRIE.get(b"unknown"), None);
281
    /// ```
282
    ///
283
    /// Panics if the strings are not ASCII:
284
    ///
285
    /// ```compile_fail,E0080
286
    /// # use zerotrie::ZeroTrieSimpleAscii;
287
    /// const TRIE: ZeroTrieSimpleAscii<[u8; 100]> = ZeroTrieSimpleAscii::from_sorted_str_tuples(&[
288
    ///     ("bár", 2),
289
    ///     ("båzzöo", 3),
290
    ///     ("foo", 1),
291
    /// ]);
292
    /// ```
293
0
    pub const fn from_sorted_str_tuples(tuples: &[(&str, usize)]) -> Self {
294
        use konst::*;
295
0
        let byte_str_slice = ByteSliceWithIndices::from_str_slice(tuples);
296
        // 100 is the value of `K`, the size of the lengths stack. If compile errors are
297
        // encountered, this number may need to be increased.
298
0
        let s = ZeroTrieBuilderConst::<N>::from_tuple_slice::<100>(byte_str_slice);
299
0
        Self::from_store(s.build_or_panic())
300
0
    }
301
}