#ifndef HEADER_fd_src_discof_forest_fd_forest_h
#define HEADER_fd_src_discof_forest_fd_forest_h

/* Forest is an API for repairing blocks as they are discovered from the
   cluster via Turbine or Gossip.  Shreds (from Turbine) and
   confirmations (from Tower) inform forest that a slot exists.  Repair
   ensures each such block is received in its entirety by requesting
   repairs for the block's missing shreds.

   Note that forest needs to track the strict subset of shreds that are
   known by fec_resolver, store, and reasm.  If any of these structures
   has evicted shreds, forest needs to clear out the corresponding FEC
   sets so they can be re-requested.  It's okay if shreds are evicted
   from reasm, re-requested, and passed through fec_resolver again.
   Although we could be creating duplicate ctxs, that's fine: we might
   later get an evict notice for the second incomplete ctx, which just
   produces some useless messages that are eventually ignored once we
   publish past them.

   Like other fork-aware structures, forest maintains a tree that
   records the ancestry of slots.  It also maintains references to the
   tips of each known fork (the frontier map), and to the latest slot
   we finished repairing on each fork (the consumed map).  Any slot
   that doesn't yet have a known ancestry connecting it back to the root
   is part of the orphaned map, and the head of every orphaned tree is
   part of the subtrees map.  While this seems verbose, it allows for
   fast iteration and lookup of the different types of slots.

   fd_policy makes orphan requests that recover the gaps between
   orphaned subtrees and the main ancestry tree, and the forest iterator
   then suggests repairs to make forward progress on the tree (using
   BFS). */

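As a minimal sketch of that breadth-first traversal order over a left-child, right-sibling tree (the names node_t, bfs_order, and NIL_IDX are illustrative, not the real fd_forest API):

```c
typedef unsigned long ulong;

#define NIL_IDX ((ulong)-1)

/* Toy left-child, right-sibling node mirroring the pool-index links
   used by forest (illustrative only, not fd_forest_blk_t). */
typedef struct { ulong slot; ulong child; ulong sibling; } node_t;

/* Visit nodes level by level starting from root_idx, writing each
   visited node's slot into out.  Returns the number of nodes visited.
   Caller guarantees out and the internal queue are large enough. */
static ulong
bfs_order( node_t const * pool, ulong root_idx, ulong * out ) {
  ulong queue[ 64 ];
  ulong head = 0UL, tail = 0UL, cnt = 0UL;
  queue[ tail++ ] = root_idx;
  while( head < tail ) {
    ulong idx = queue[ head++ ];
    out[ cnt++ ] = pool[ idx ].slot;
    /* enqueue all children: leftmost child, then its right siblings */
    for( ulong c = pool[ idx ].child; c != NIL_IDX; c = pool[ c ].sibling )
      queue[ tail++ ] = c;
  }
  return cnt;
}
```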
/* Merkle root tracking.
   For each FEC set in the slot, we record the merkle root of the first
   shred we receive in `.merkle_roots[ fec_set_idx / 32 ]`.  Then for any
   shred in the same FEC inserted later, the merkle root of the new
   shred is compared to the merkle root we have stored.

   If they are the same -> good.
   If they are different -> we're going to mark this merkle root as
                            incorrect.  We do this by setting the merkle
                            root to a null hash for later detection.

   Note we don't verify the chain on each FEC arrival, because we can't
   tell whether the CMR of the following FEC is incorrect or the
   current MR we have is incorrect.  We can only verify the chain when
   we get a confirmation of a block_id.

   Eventually one of three things happens:
   1. We are able to complete the version of the FEC with the merkle root
      we have stored.  This is the common case, and means we only saw one
      version of the merkle root.

   2. We are not able to complete any version of the FEC.
      - Imagine we get shreds 0-15 of FEC_A, then get shreds 16-31 of
        FEC_B.  We would have set the merkle root to the null hash for
        that FEC set, but fec_resolver would not be able to complete the
        FEC because from a shred index POV, we don't have anything we
        need to repair (and we won't be making any new requests for that
        FEC set).
        It's difficult to differentiate between a slot where we haven't
        finished repairing, and a slot we can't repair because the
        version we have is a bad version.  So merkle chaining
        verification can only be performed on slots that have all
        shreds received.

   3. We receive some shreds for both FEC_A and FEC_B, but get a FEC
      completion for FEC_B.
      - Could possibly happen during turbine: we repair some data
        shreds from FEC_A, but get a completion for FEC_B through
        turbine.  At this point we'll take whatever we have completed
        first, so we overwrite our merkle root entry.  It's likely being
        overwritten from the null_hash to the FEC_B merkle root.

   So unfortunately, because of case 2, we determine "slot completion"
   status when all the shreds in the slot have been received, NOT when
   the slot completes with all the FEC completions.  We can rely on the
   fact that at least some version of every shred in the slot will
   arrive eventually.

   As soon as we have a confirmed block id, we can verify the slot by
   verifying the chain of merkle roots backwards.  As the CMRs correctly
   chain, the verified status on each FEC set is set.  If they don't
   chain, we dump & repair that specific FEC set.  For example, say the
   2nd & 3rd FEC sets are incorrect.  In this case, the merkle roots
   array and bitset will look like the following after one call of
   chain_verify(slot, confirmed_bid):
                               actual last fec
                                     |
                                     v
   merkle_roots    [ A, B', C', D, E, F, confirmed_bid] <- confirmed_bid stored for convenience
   merkle_verified [ 0, 0,  0,  1, 1, 1, 1 ]

   At this point, C' will be dumped and repaired.  Since D is verified,
   and its CMR entry contains the correct version of C's merkle root, we
   can now verify any shred of FEC set C that arrives and reject it if
   its merkle root doesn't match the CMR entry in D.

   After C is successfully repaired, the after_fec call in repair_tile
   will re-trigger chain_verify on the slot.  After this call of
   chain_verify, the merkle roots array and bitset will look like this:

   merkle_roots    [ A, B', C, D, E, F, confirmed_bid] <- confirmed_bid stored for convenience
   merkle_verified [ 0, 0,  1, 1, 1, 1, 1 ]

   At this point, C is verified, but B' is detected as incorrect.  The
   same dump and repair process is repeated for B'.  Once the after_fec
   on B is called, the merkle roots array and bitset will look like this:

   merkle_roots    [ A, B, C, D, E, F, confirmed_bid] <- confirmed_bid stored for convenience
   merkle_verified [ 1, 1, 1, 1, 1, 1, 1 ]
   confirmed = 1

   The chain verify progresses beyond this slot, and the ancestors of
   this slot will also be traversed until a confirmed slot is found, or
   another incorrect FEC is detected.  Note that because earlier
   confirmations may have confirmed ancestors, and because there is one
   verification in progress at any time, confirmation status can look
   like:

   slot 1 - slot 2 - slot 3 - slot 4 - slot 5 - slot 6 - slot 7 ....
   confirmed:  1        1        0        0        0        1        1

   i.e. there will be up to two contiguous chains of confirmed slots in
   the forest, but not more.  There can be unconfirmed slots after slot 7.
   There may be forks as well, but only one fork can be confirmed. */

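The backward chain walk above can be sketched as follows.  This is a toy model, not the real chain_verify: merkle roots are plain integers rather than 32-byte hashes, and mr/cmr are split into parallel arrays for clarity.

```c
typedef unsigned long ulong;

/* Toy model of the backward merkle-chain walk: mr[i] is the merkle
   root recorded for FEC set i, cmr[i] is the chained merkle root FEC
   set i carries (what it claims mr[i-1] should be).  Starting from the
   confirmed block id (the merkle root of the actual last FEC set),
   mark FEC sets verified until the first mismatch.  Returns the count
   of verified FEC sets; verified[f] is set to 1 as we go. */
static ulong
chain_verify_toy( ulong const * mr, ulong const * cmr, ulong fec_cnt,
                  ulong confirmed_bid, int * verified ) {
  ulong ok     = 0UL;
  ulong expect = confirmed_bid; /* root the next-lower FEC set must have */
  for( ulong i = fec_cnt; i > 0UL; i-- ) {
    ulong f = i - 1UL;
    if( mr[ f ] != expect ) break; /* mismatch: dump & repair FEC set f */
    verified[ f ] = 1;
    ok++;
    expect = cmr[ f ];            /* chain backwards to the previous FEC set */
  }
  return ok;
}
```

Running this on the worked example above (B' and C' incorrect) verifies only D, E, F on the first pass, exactly as the merkle_verified bitset shows.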
/* FD_FOREST_USE_HANDHOLDING:  Define this to non-zero at compile time
   to turn on additional runtime checks and logging. */

#include "../../disco/fd_disco_base.h"
#include "../../disco/shred/fd_fec_set.h"

#ifndef FD_FOREST_USE_HANDHOLDING
#define FD_FOREST_USE_HANDHOLDING 1
#endif

#define FD_FOREST_VER_UNINIT (0UL)
#define FD_FOREST_VER_INVAL  (ULONG_MAX)

#define FD_FOREST_MAGIC (0xf17eda2ce7b1c0UL) /* firedancer forest version 0 */

#define SET_NAME fd_forest_blk_idxs
#define SET_MAX  FD_SHRED_BLK_MAX
#include "../../util/tmpl/fd_set.c"

/* fd_forest_blk_t implements a left-child, right-sibling n-ary
   tree.  Each ele maintains the `pool` index of its left-most child
   (`child_idx`), its immediate-right sibling (`sibling_idx`), and its
   parent (`parent_idx`).

   This tree structure is gaddr-safe and supports accesses and
   operations from processes with separate local forest joins. */

struct __attribute__((aligned(128UL))) fd_forest_blk {
  ulong slot;        /* map key */
  ulong parent_slot; /* map key of the parent. invariant: if parent is populated, parent_slot is populated. the converse is not necessarily true. */
  ulong next;        /* internal use by fd_pool, fd_map_chain */
  ulong parent;      /* pool idx of the parent in the tree */
  ulong child;       /* pool idx of the left-child */
  ulong sibling;     /* pool idx of the right-sibling */

  ulong head;        /* reserved by dlist. not all blks will be part of a dlist. */
  ulong tail;        /* reserved by dlist */

  uint buffered_idx; /* highest contiguous buffered shred idx */
  uint complete_idx; /* shred idx with SLOT_COMPLETE_FLAG, ie. last shred idx in the slot */

  fd_forest_blk_idxs_t idxs[fd_forest_blk_idxs_word_cnt]; /* data shred idxs */
  struct {
    fd_hash_t mr;
    fd_hash_t cmr;
  } merkle_roots[ FD_FEC_BLK_MAX ]; /* merkle root and chained merkle root per FEC set */

  fd_hash_t confirmed_bid; /* confirmed block id - can't be wrapped in the above struct because we can create sentinel blocks
                              on confirmation, and don't know the index of the last fec set until we repair the slot.
                              hash_null if unknown.  Otherwise populated by the child slot's CMR on confirmation,
                              or by a confirmation msg from tower.  Has no bearing on whether the full slot is correct or not. */
  uint lowest_verified_fec; /* lowest fec index that has been verified so far, inclusive.  e.g. complete_idx / 32UL if only
                               the last merkle root is verified, 5 if every merkle root from fec set 5*32 onwards is verified */

  uchar chain_confirmed; /* 1 if all the FECs in the slot have been confirmed via fec_chain_verify, 0 otherwise.  Note
                            confirmed_bid can be populated before this is set to 1. */

  uchar consumed; /* 1 if the slot has been consumed (i.e., all shreds received, and parents are complete), 0 otherwise */
                  /* only consumed slots may be fec_chain_verified */

  int est_buffered_tick_recv; /* tick of shred at buffered_idx.  Note since we don't track all the ticks received,
                                 this will be a lower bound estimate on the highest tick we have seen.  But this is
                                 only used for limiting eager repair, so an exact value is not necessary. */

  /* Metrics */

  fd_forest_blk_idxs_t code[fd_forest_blk_idxs_word_cnt]; /* code shred idxs */
  long first_shred_ts;  /* tick of first shred received in slot != complete_idx */
  long first_req_ts;    /* tick of first request sent in slot != complete_idx */
  uint turbine_cnt;     /* number of shreds received from turbine */
  uint repair_cnt;      /* number of data shreds received from repair */
  uint recovered_cnt;   /* number of shreds recovered from reedsol recovery */
};
typedef struct fd_forest_blk fd_forest_blk_t;

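Linking a new blk under its parent in this layout is O(1): the new node becomes the leftmost child, and the previous leftmost child becomes its right sibling.  A minimal sketch (link_child and blk_t are hypothetical, not the real insert path):

```c
typedef unsigned long ulong;

#define NIL_IDX ((ulong)-1)

/* Toy blk carrying only the tree links (illustrative). */
typedef struct { ulong slot; ulong parent; ulong child; ulong sibling; } blk_t;

/* Link pool[ child_idx ] as a child of pool[ parent_idx ].  The new
   node is pushed on as the leftmost child; the previous leftmost child
   (if any) becomes its immediate-right sibling. */
static void
link_child( blk_t * pool, ulong parent_idx, ulong child_idx ) {
  pool[ child_idx ].parent  = parent_idx;
  pool[ child_idx ].sibling = pool[ parent_idx ].child;
  pool[ parent_idx ].child  = child_idx;
}
```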
#define POOL_NAME fd_forest_pool
#define POOL_T    fd_forest_blk_t
#include "../../util/tmpl/fd_pool.c"

#define MAP_NAME  fd_forest_ancestry
#define MAP_ELE_T fd_forest_blk_t
#define MAP_KEY   slot
#include "../../util/tmpl/fd_map_chain.c"

#define MAP_NAME  fd_forest_frontier
#define MAP_ELE_T fd_forest_blk_t
#define MAP_KEY   slot
#include "../../util/tmpl/fd_map_chain.c"

#define MAP_NAME  fd_forest_orphaned
#define MAP_ELE_T fd_forest_blk_t
#define MAP_KEY   slot
#include "../../util/tmpl/fd_map_chain.c"

#define MAP_NAME  fd_forest_subtrees
#define MAP_ELE_T fd_forest_blk_t
#define MAP_KEY   slot
#include "../../util/tmpl/fd_map_chain.c"

#define DLIST_NAME  fd_forest_subtlist /* thread a dlist through the subtree elements for fast iteration */
#define DLIST_ELE_T fd_forest_blk_t
#define DLIST_NEXT  head
#define DLIST_PREV  tail
#include "../../util/tmpl/fd_dlist.c"

/* A reference to a forest element.

   The following maps/pools are used to track future requests.

   Requests:
   - slots that branch from the main tree (ancestry) that are being
     repaired / have yet to be repaired.  Maintained in a dlist, where
     the head is the current slot being repaired.

   Orphreqs (orphaned requests):
   - slots that branch from the unconnected trees (subtrees/orphans)
     that are being repaired / have yet to be repaired.  Maintained in
     a dlist, where the head is the current orphan request being
     repaired.

   Note that orphan requests are specifically an optimization for when
   we are catching up from very far behind.  In the usual case, when we
   boot and are catching up from close behind, needing orphans resolves
   very quickly and has a negligible cost on total repair time.  But
   during special cases where we are catching up from very far behind,
   needing orphans can take significant time because orphan requests
   cannot be pipelined.  In this case, we can use the time spent waiting
   for orphan requests to respond to also repair the full slots of these
   orphan trees.

   Consumed:
   - slots where the entire ancestry up to the root has been completed.
     This is what we are repairing next.  There should be <= num forks
     elements in the consumed map. */
struct fd_forest_ref {
  ulong idx;  /* forest pool idx of the ele this ref refers to */
  ulong next; /* reserved by dlist */
  ulong prev; /* reserved by dlist */
  ulong hash; /* reserved by pool and map_chain */
};
typedef struct fd_forest_ref fd_forest_ref_t;

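Because next/prev/hash are reserved link fields, one fd_forest_ref_t can sit in a pool, a map chain, and a dlist at the same time.  A toy push-to-tail over such an intrusive dlist (ref_t, list_t, and list_push_tail are illustrative names, not the tmpl-generated API):

```c
typedef unsigned long ulong;

#define NIL_IDX ((ulong)-1)

/* Toy intrusive ref: the element itself carries the dlist links, as
   fd_forest_ref_t does with its reserved next/prev fields. */
typedef struct { ulong idx; ulong next; ulong prev; } ref_t;
typedef struct { ulong head; ulong tail; } list_t;

/* Append pool[ i ] to the tail of the list in O(1).  head/tail are
   pool indices, NIL_IDX when the list is empty. */
static void
list_push_tail( list_t * l, ref_t * pool, ulong i ) {
  pool[ i ].next = NIL_IDX;
  pool[ i ].prev = l->tail;
  if( l->tail != NIL_IDX ) pool[ l->tail ].next = i;
  else                     l->head = i;
  l->tail = i;
}
```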
#define MAP_NAME  fd_forest_requests
#define MAP_ELE_T fd_forest_ref_t
#define MAP_KEY   idx
#define MAP_NEXT  hash
#include "../../util/tmpl/fd_map_chain.c"

#define DLIST_NAME  fd_forest_reqslist
#define DLIST_ELE_T fd_forest_ref_t
#define DLIST_NEXT  next
#define DLIST_PREV  prev
#include "../../util/tmpl/fd_dlist.c"

#define POOL_NAME fd_forest_reqspool
#define POOL_T    fd_forest_ref_t
#define POOL_NEXT hash
#include "../../util/tmpl/fd_pool.c"

/* Below is for fast tracking of contiguous completed slots */
#define MAP_NAME  fd_forest_consumed
#define MAP_ELE_T fd_forest_ref_t
#define MAP_KEY   idx
#define MAP_NEXT  hash
#include "../../util/tmpl/fd_map_chain.c"

#define DLIST_NAME  fd_forest_conslist
#define DLIST_ELE_T fd_forest_ref_t
#define DLIST_NEXT  next
#define DLIST_PREV  prev
#include "../../util/tmpl/fd_dlist.c"

#define POOL_NAME fd_forest_conspool
#define POOL_T    fd_forest_ref_t
#define POOL_NEXT hash
#include "../../util/tmpl/fd_pool.c"

/* Reuse reqslist for the orphan requests list, and share its pool */

/* Internal use only for BFSing */
#define DEQUE_NAME fd_forest_deque
#define DEQUE_T    ulong
#include "../../util/tmpl/fd_deque_dynamic.c"

/* fd_forest_t is the top-level structure that holds the root of
   the tree, as well as the memory pools and map structures.

   These structures are bump-allocated and laid out contiguously in
   memory from the fd_forest_t * pointer which points to the
   beginning of the memory region.

   --------------------- <- fd_forest_t *
   | metadata          |
   |-------------------|
   | version fseq      |
   |-------------------|
   | pool              |
   |-------------------|
   | ancestry          |
   |-------------------|
   | frontier          |
   |-------------------|
   | subtrees          |
   |-------------------|
   | orphaned          |
   |-------------------|
   | subtlist          |
   |-------------------|
   | requests          |
   |-------------------|
   | reqslist          |
   |-------------------|
   | reqspool          |
   |-------------------|
   | consumed          |
   |-------------------|
   | conslist          |
   |-------------------|
   | conspool          |
   |-------------------|
   | orphreqs          |
   |-------------------|
   | orphlist (reqlist)|
   |-------------------|
   | deque             |
   ---------------------

   A valid, initialized forest is always non-empty.  After
   `fd_forest_init` the forest will always have a root ele unless
   modified improperly outside of forest's API. */

struct fd_forest_iter {
  ulong ele_idx;
  uint  shred_idx;
  ulong list_gaddr; /* wksp gaddr of the list this iterator corresponds to */
};
typedef struct fd_forest_iter fd_forest_iter_t;

struct __attribute__((aligned(128UL))) fd_forest {
  ulong root;           /* pool idx of the root */
  ulong wksp_gaddr;     /* wksp gaddr of fd_forest in the backing wksp, non-zero gaddr */
  ulong ver_gaddr;      /* wksp gaddr of version fseq, incremented on write ops */
  ulong pool_gaddr;     /* wksp gaddr of fd_pool */
  ulong ancestry_gaddr; /* wksp gaddr of fd_forest_ancestry */
  ulong frontier_gaddr; /* leaves that need repair */
  ulong subtrees_gaddr; /* heads of orphaned trees */
  ulong orphaned_gaddr; /* map of parent_slot to singly-linked list of eles orphaned by that parent slot */

  ulong subtlist_gaddr; /* wksp gaddr of fd_forest_subtlist - linked list of subtree elements */

  /* Request trackers */

  ulong requests_gaddr; /* map of slot to pool idx of the completed repair frontier */
  ulong reqslist_gaddr; /* wksp gaddr of fd_forest_reqslist */
  ulong reqspool_gaddr; /* wksp gaddr of fd_forest_reqspool */

  ulong consumed_gaddr; /* wksp gaddr of fd_forest_consumed */
  ulong conslist_gaddr; /* wksp gaddr of fd_forest_conslist */
  ulong conspool_gaddr; /* wksp gaddr of fd_forest_conspool */

  ulong orphreqs_gaddr; /* wksp gaddr of fd_forest_orphreqs */
  ulong orphlist_gaddr; /* wksp gaddr of fd_forest_orphlist */

  fd_forest_iter_t iter;     /* requests iterator corresponding to head of requests list */
  fd_forest_iter_t orphiter; /* orphan requests iterator corresponding to head of orphan requests list */

  ulong deque_gaddr; /* wksp gaddr of fd_forest_deque. internal use only for BFSing */
  ulong magic;       /* ==FD_FOREST_MAGIC */
};
typedef struct fd_forest fd_forest_t;

FD_PROTOTYPES_BEGIN

/* Constructors */

/* fd_forest_{align,footprint} return the required alignment and
   footprint of a memory region suitable for use as a forest with up
   to ele_max eles. */

FD_FN_CONST static inline ulong
fd_forest_align( void ) {
  return alignof(fd_forest_t);
}

FD_FN_CONST static inline ulong
fd_forest_footprint( ulong ele_max ) {
  return FD_LAYOUT_FINI(
    FD_LAYOUT_APPEND(
    FD_LAYOUT_APPEND(
    FD_LAYOUT_APPEND(
    FD_LAYOUT_APPEND(
    FD_LAYOUT_APPEND(
    FD_LAYOUT_APPEND(
    FD_LAYOUT_APPEND(
    FD_LAYOUT_APPEND(
    FD_LAYOUT_APPEND(
    FD_LAYOUT_APPEND(
    FD_LAYOUT_APPEND(
    FD_LAYOUT_APPEND(
    FD_LAYOUT_APPEND(
    FD_LAYOUT_APPEND(
    FD_LAYOUT_APPEND(
    FD_LAYOUT_APPEND(
    FD_LAYOUT_APPEND(
    FD_LAYOUT_INIT,
      alignof(fd_forest_t),       sizeof(fd_forest_t)                     ),
      fd_fseq_align(),            fd_fseq_footprint()                     ),
      fd_forest_pool_align(),     fd_forest_pool_footprint    ( ele_max ) ),
      fd_forest_ancestry_align(), fd_forest_ancestry_footprint( ele_max ) ),
      fd_forest_frontier_align(), fd_forest_frontier_footprint( ele_max ) ),
      fd_forest_subtrees_align(), fd_forest_subtrees_footprint( ele_max ) ),
      fd_forest_orphaned_align(), fd_forest_orphaned_footprint( ele_max ) ),
      fd_forest_subtlist_align(), fd_forest_subtlist_footprint()          ),
      fd_forest_requests_align(), fd_forest_requests_footprint( ele_max ) ),
      fd_forest_reqslist_align(), fd_forest_reqslist_footprint()          ),
      fd_forest_reqspool_align(), fd_forest_reqspool_footprint( ele_max ) ),
      fd_forest_consumed_align(), fd_forest_consumed_footprint( ele_max ) ),
      fd_forest_conslist_align(), fd_forest_conslist_footprint()          ),
      fd_forest_conspool_align(), fd_forest_conspool_footprint( ele_max ) ),
      fd_forest_requests_align(), fd_forest_requests_footprint( ele_max ) ), /* orphreqs */
      fd_forest_reqslist_align(), fd_forest_reqslist_footprint()          ), /* orphlist */
      fd_forest_deque_align(),    fd_forest_deque_footprint   ( ele_max ) ),
    fd_forest_align() );
}

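Each FD_LAYOUT_APPEND above is morally "round the running offset up to the region's alignment, then add its size".  A hedged re-derivation of that arithmetic (layout_append is illustrative, not the real fd_util macro definitions):

```c
typedef unsigned long ulong;

/* Bump-allocator footprint step: align the running offset off up to
   align (which must be a power of 2), then reserve sz bytes. */
static ulong
layout_append( ulong off, ulong align, ulong sz ) {
  return ( ( off + align - 1UL ) & ~( align - 1UL ) ) + sz;
}
```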
/* fd_forest_new formats an unused memory region for use as a
   forest.  shmem is a non-NULL pointer to this region in the local
   address space with the required footprint and alignment. */

void *
fd_forest_new( void * shmem, ulong ele_max, ulong seed );

/* fd_forest_join joins the caller to the forest.  forest
   points to the first byte of the memory region backing the forest
   in the caller's address space.  Returns a pointer in the local
   address space to forest on success. */

fd_forest_t *
fd_forest_join( void * forest );

/* fd_forest_leave leaves a current local join.  Returns a pointer
   to the underlying shared memory region on success and NULL on failure
   (logs details).  Reasons for failure include forest is NULL. */

void *
fd_forest_leave( fd_forest_t const * forest );

/* fd_forest_delete unformats a memory region used as a
   forest.  Assumes nobody is joined to the region.
   Returns a pointer to the underlying shared memory region or NULL if
   used obviously in error (e.g. forest is obviously not a
   forest ... logs details).  The ownership of the memory region is
   transferred to the caller. */

void *
fd_forest_delete( void * forest );

/* fd_forest_init initializes a forest.  Assumes forest
   is a valid local join and no one else is joined.  root is the
   initial root forest will use.  This is the snapshot slot if booting
   from a snapshot, or 0 if booting from genesis.

   In general, this should be called by the same process that formatted
   forest's memory, ie. the caller of fd_forest_new. */

fd_forest_t *
fd_forest_init( fd_forest_t * forest, ulong root );

/* fd_forest_fini finishes a forest.  Assumes forest is
   a valid local join and no one else is joined. */

fd_forest_t *
fd_forest_fini( fd_forest_t * forest );

499 :
500 : /* fd_forest_wksp returns the local join to the wksp backing the
501 : forest. The lifetime of the returned pointer is at least as
502 : long as the lifetime of the local join. Assumes forest is a
503 : current local join. */
504 :
505 : FD_FN_PURE static inline fd_wksp_t *
506 0 : fd_forest_wksp( fd_forest_t const * forest ) {
507 0 : return (fd_wksp_t *)( ( (ulong)forest ) - forest->wksp_gaddr );
508 0 : }
509 :
510 : /* fd_forest_{ver, ver_const} returns the local join to the version
511 : number fseq. The lifetime of the returned pointer is at least as
512 : long as the lifetime of the local join. Assumes forest is a
513 : current local join. If value is ULONG_MAX, ghost is uninitialized or
514 : invalid. Query pre- & post-read:
515 :
516 : odd: if either pre or post is odd, discard read.
517 : even: if pre == post, read is consistent. */
518 :
519 : FD_FN_PURE static inline ulong *
520 0 : fd_forest_ver( fd_forest_t * forest ) {
521 0 : return fd_wksp_laddr_fast( fd_forest_wksp( forest ), forest->ver_gaddr );
522 0 : }
523 :
524 : FD_FN_PURE static inline ulong const *
525 0 : fd_forest_ver_const( fd_forest_t const * forest ) {
526 0 : return fd_wksp_laddr_fast( fd_forest_wksp( forest ), forest->ver_gaddr );
527 0 : }
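A reader of a concurrently updated forest applies the version protocol above: snapshot the fseq before and after copying, and keep the read only if both snapshots are even and equal.  A single-threaded sketch of just the accept/retry decision (no atomics or memory fences, which the real protocol needs; read_consistent is a hypothetical helper):

```c
typedef unsigned long ulong;

/* Returns 1 if a read bracketed by version snapshots ver_pre/ver_post
   is consistent, 0 if it must be retried.  An odd snapshot means a
   writer was mid-update; a change between snapshots means a writer
   ran during the read. */
static int
read_consistent( ulong ver_pre, ulong ver_post ) {
  if( ver_pre & 1UL )       return 0;
  if( ver_pre != ver_post ) return 0;
  return 1;
}
```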
528 :
/* fd_forest_{pool, pool_const} returns a pointer in the caller's address
   space to forest's element pool. */

FD_FN_PURE static inline fd_forest_blk_t *
fd_forest_pool( fd_forest_t * forest ) {
  return fd_wksp_laddr_fast( fd_forest_wksp( forest ), forest->pool_gaddr );
}

FD_FN_PURE static inline fd_forest_blk_t const *
fd_forest_pool_const( fd_forest_t const * forest ) {
  return fd_wksp_laddr_fast( fd_forest_wksp( forest ), forest->pool_gaddr );
}

/* fd_forest_{ancestry, ancestry_const} returns a pointer in the caller's
   address space to forest's ancestry map. */

FD_FN_PURE static inline fd_forest_ancestry_t *
fd_forest_ancestry( fd_forest_t * forest ) {
  return fd_wksp_laddr_fast( fd_forest_wksp( forest ), forest->ancestry_gaddr );
}

FD_FN_PURE static inline fd_forest_ancestry_t const *
fd_forest_ancestry_const( fd_forest_t const * forest ) {
  return fd_wksp_laddr_fast( fd_forest_wksp( forest ), forest->ancestry_gaddr );
}

/* fd_forest_{frontier, frontier_const} returns a pointer in the caller's
   address space to forest's frontier map. */

FD_FN_PURE static inline fd_forest_frontier_t *
fd_forest_frontier( fd_forest_t * forest ) {
  return fd_wksp_laddr_fast( fd_forest_wksp( forest ), forest->frontier_gaddr );
}

FD_FN_PURE static inline fd_forest_frontier_t const *
fd_forest_frontier_const( fd_forest_t const * forest ) {
  return fd_wksp_laddr_fast( fd_forest_wksp( forest ), forest->frontier_gaddr );
}

/* fd_forest_{subtrees, subtrees_const} returns a pointer in the caller's
   address space to forest's subtrees map. */

FD_FN_PURE static inline fd_forest_subtrees_t *
fd_forest_subtrees( fd_forest_t * forest ) {
  return fd_wksp_laddr_fast( fd_forest_wksp( forest ), forest->subtrees_gaddr );
}

FD_FN_PURE static inline fd_forest_subtrees_t const *
fd_forest_subtrees_const( fd_forest_t const * forest ) {
  return fd_wksp_laddr_fast( fd_forest_wksp( forest ), forest->subtrees_gaddr );
}

/* fd_forest_{subtlist, subtlist_const} returns a pointer in the caller's
   address space to forest's subtlist. */

FD_FN_PURE static inline fd_forest_subtlist_t *
fd_forest_subtlist( fd_forest_t * forest ) {
  return fd_wksp_laddr_fast( fd_forest_wksp( forest ), forest->subtlist_gaddr );
}

FD_FN_PURE static inline fd_forest_subtlist_t const *
fd_forest_subtlist_const( fd_forest_t const * forest ) {
  return fd_wksp_laddr_fast( fd_forest_wksp( forest ), forest->subtlist_gaddr );
}

/* fd_forest_{orphaned, orphaned_const} returns a pointer in the caller's
   address space to forest's orphaned map. */

FD_FN_PURE static inline fd_forest_orphaned_t *
fd_forest_orphaned( fd_forest_t * forest ) {
  return fd_wksp_laddr_fast( fd_forest_wksp( forest ), forest->orphaned_gaddr );
}

FD_FN_PURE static inline fd_forest_orphaned_t const *
fd_forest_orphaned_const( fd_forest_t const * forest ) {
  return fd_wksp_laddr_fast( fd_forest_wksp( forest ), forest->orphaned_gaddr );
}

/* fd_forest_{consumed, consumed_const} returns a pointer in the caller's
   address space to forest's consumed map. */

FD_FN_PURE static inline fd_forest_consumed_t *
fd_forest_consumed( fd_forest_t * forest ) {
  return fd_wksp_laddr_fast( fd_forest_wksp( forest ), forest->consumed_gaddr );
}

FD_FN_PURE static inline fd_forest_consumed_t const *
fd_forest_consumed_const( fd_forest_t const * forest ) {
  return fd_wksp_laddr_fast( fd_forest_wksp( forest ), forest->consumed_gaddr );
}

/* fd_forest_{conslist, conslist_const} returns a pointer in the caller's
   address space to forest's consumed list. */

FD_FN_PURE static inline fd_forest_conslist_t *
fd_forest_conslist( fd_forest_t * forest ) {
  return fd_wksp_laddr_fast( fd_forest_wksp( forest ), forest->conslist_gaddr );
}

FD_FN_PURE static inline fd_forest_conslist_t const *
fd_forest_conslist_const( fd_forest_t const * forest ) {
  return fd_wksp_laddr_fast( fd_forest_wksp( forest ), forest->conslist_gaddr );
}

/* fd_forest_{conspool, conspool_const} returns a pointer in the caller's
   address space to forest's consumed pool. */

FD_FN_PURE static inline fd_forest_ref_t *
fd_forest_conspool( fd_forest_t * forest ) {
  return fd_wksp_laddr_fast( fd_forest_wksp( forest ), forest->conspool_gaddr );
}

FD_FN_PURE static inline fd_forest_ref_t const *
fd_forest_conspool_const( fd_forest_t const * forest ) {
  return fd_wksp_laddr_fast( fd_forest_wksp( forest ), forest->conspool_gaddr );
}

/* fd_forest_{requests, requests_const} returns a pointer in the caller's
   address space to forest's requests map. */

FD_FN_PURE static inline fd_forest_requests_t *
fd_forest_requests( fd_forest_t * forest ) {
  return fd_wksp_laddr_fast( fd_forest_wksp( forest ), forest->requests_gaddr );
}

FD_FN_PURE static inline fd_forest_requests_t const *
fd_forest_requests_const( fd_forest_t const * forest ) {
  return fd_wksp_laddr_fast( fd_forest_wksp( forest ), forest->requests_gaddr );
}

/* fd_forest_{reqslist, reqslist_const} returns a pointer in the caller's
   address space to forest's reqslist. */

FD_FN_PURE static inline fd_forest_reqslist_t *
fd_forest_reqslist( fd_forest_t * forest ) {
  return fd_wksp_laddr_fast( fd_forest_wksp( forest ), forest->reqslist_gaddr );
}

FD_FN_PURE static inline fd_forest_reqslist_t const *
fd_forest_reqslist_const( fd_forest_t const * forest ) {
  return fd_wksp_laddr_fast( fd_forest_wksp( forest ), forest->reqslist_gaddr );
}

/* fd_forest_{orphreqs, orphreqs_const} returns a pointer in the caller's
   address space to forest's orphreqs map. */

FD_FN_PURE static inline fd_forest_requests_t *
fd_forest_orphreqs( fd_forest_t * forest ) {
  return fd_wksp_laddr_fast( fd_forest_wksp( forest ), forest->orphreqs_gaddr );
}

FD_FN_PURE static inline fd_forest_requests_t const *
fd_forest_orphreqs_const( fd_forest_t const * forest ) {
  return fd_wksp_laddr_fast( fd_forest_wksp( forest ), forest->orphreqs_gaddr );
}

/* fd_forest_{orphlist, orphlist_const} returns a pointer in the caller's
   address space to forest's orphlist. */

FD_FN_PURE static inline fd_forest_reqslist_t *
fd_forest_orphlist( fd_forest_t * forest ) {
  return fd_wksp_laddr_fast( fd_forest_wksp( forest ), forest->orphlist_gaddr );
}

FD_FN_PURE static inline fd_forest_reqslist_t const *
fd_forest_orphlist_const( fd_forest_t const * forest ) {
  return fd_wksp_laddr_fast( fd_forest_wksp( forest ), forest->orphlist_gaddr );
}

/* fd_forest_{reqspool, reqspool_const} returns a pointer in the caller's
   address space to forest's reqspool. */

FD_FN_PURE static inline fd_forest_ref_t *
fd_forest_reqspool( fd_forest_t * forest ) {
  return fd_wksp_laddr_fast( fd_forest_wksp( forest ), forest->reqspool_gaddr );
}

FD_FN_PURE static inline fd_forest_ref_t const *
fd_forest_reqspool_const( fd_forest_t const * forest ) {
  return fd_wksp_laddr_fast( fd_forest_wksp( forest ), forest->reqspool_gaddr );
}

711 : /* fd_forest_root_slot returns forest's root slot. Assumes
712 : forest is a current local join. */
713 :
714 : FD_FN_PURE static inline ulong
715 0 : fd_forest_root_slot( fd_forest_t const * forest ) {
716 0 : if( FD_UNLIKELY( forest->root == fd_forest_pool_idx_null( fd_forest_pool_const( forest ) ) )) return ULONG_MAX; /* uninitialized */
717 0 : return fd_forest_pool_ele_const( fd_forest_pool_const( forest ), forest->root )->slot;
718 0 : }
719 :
 : /* fd_forest_query returns a pointer to the forest blk for the given
 : slot, if any. */
 :
720 : fd_forest_blk_t *
721 : fd_forest_query( fd_forest_t * forest, ulong slot );
722 :
723 : /* Operations */
724 :
725 : /* fd_forest_blk_insert inserts a new block into the forest. Assumes
726 : slot >= forest->smr and that the blk pool has a free element (if
727 : handholding is enabled, these assumptions are explicitly checked,
728 : erroring on failure). blk insert is idempotent and may be called
729 : multiple times with the same slot. Returns the inserted forest ele. */
730 :
731 : fd_forest_blk_t *
732 : fd_forest_blk_insert( fd_forest_t * forest, ulong slot, ulong parent_slot, ulong * evicted );
733 :
734 0 : #define SHRED_SRC_TURBINE 0
735 0 : #define SHRED_SRC_REPAIR 1
736 0 : #define SHRED_SRC_RECOVERED 2
737 :
738 : /* fd_forest_data_shred_insert inserts a new data shred into the forest.
739 : Assumes slot is already in forest, and should typically be called
740 : directly after fd_forest_blk_insert. Returns the forest ele
741 : corresponding to the shred's slot. */
742 :
743 : fd_forest_blk_t *
744 : fd_forest_data_shred_insert( fd_forest_t * forest,
745 : ulong slot,
746 : ulong parent_slot,
747 : uint shred_idx,
748 : uint fec_set_idx,
749 : int slot_complete,
750 : int ref_tick,
751 : int src,
752 : fd_hash_t * mr,
753 : fd_hash_t * cmr );
754 :
755 : /* TODO: does merkle validation need to happen for coding shreds as well? */
756 : fd_forest_blk_t *
757 : fd_forest_code_shred_insert( fd_forest_t * forest, ulong slot, uint shred_idx );
758 :
759 : /* fd_forest_fec_insert inserts a new fully completed FEC set into the
760 : forest. Assumes slot is already in forest, and should typically be
761 : called directly after fd_forest_blk_insert. Returns the forest ele
762 : corresponding to the FEC set's slot. */
763 :
764 : fd_forest_blk_t *
765 : fd_forest_fec_insert( fd_forest_t * forest,
766 : ulong slot,
767 : ulong parent_slot,
768 : uint last_shred_idx,
769 : uint fec_set_idx,
770 : int slot_complete,
771 : int ref_tick,
772 : fd_hash_t * mr,
773 : fd_hash_t * cmr );
774 :
775 : /* fd_forest_fec_clear clears the FEC set at the given slot and
776 : fec_set_idx.
777 : Can fec_clear break the requests frontier invariants? No:
778 :
779 : If slot n is in scope of the forest root, then the shred
780 : delivered to repair will trigger a data_shred_insert call
781 : that does nothing, as repair already has a record of that
782 : shred. Eventually the fec_completes or fec_clear msg will be
783 : delivered to repair. fec_insert will do nothing. fec_clear
784 : will remove the idxs for the shreds from the bitset and
785 : update the buffered_idx. This doesn't matter, though, because
786 : we have already moved past slot n on the requests frontier:
787 : there is no need to request those shreds again.
788 :
789 : This reasoning breaks down slightly in specific leader slot
790 : cases. See fd_forest_fec_clear's implementation for details. */
791 : void
792 : fd_forest_fec_clear( fd_forest_t * forest, ulong slot, uint fec_set_idx, uint max_shred_idx );
793 :
794 : /* fd_forest_fec_chain_verify verifies the chain of merkle roots for a given
795 : block. Returns a pointer to the first blk that does not confirm, or NULL
 : if the chain is valid. */
796 : fd_forest_blk_t *
797 : fd_forest_fec_chain_verify( fd_forest_t * forest, fd_forest_blk_t * ele, fd_hash_t const * mr );
798 :
799 : void
800 : fd_forest_confirm( fd_forest_t * forest, fd_forest_blk_t * ele, fd_hash_t const * bid );
801 :
 : /* fd_forest_merkle_last_incorrect_idx returns the first shred idx of
 : the FEC set presumed to have an incorrect merkle root: the FEC set
 : immediately preceding the lowest verified one, or the last FEC set
 : if none have yet been verified (lowest_verified_fec==UINT_MAX).
 : Returns UINT_MAX if the lowest verified FEC set is the first one,
 : in which case no earlier FEC set can be incorrect. Assumes FEC
 : sets of 32 data shreds. */
 :
802 : static inline uint
803 0 : fd_forest_merkle_last_incorrect_idx( fd_forest_blk_t * ele ) {
804 0 : ulong first_verified_fec = ele->lowest_verified_fec;
805 : /* UNLIKELY: this is only called once an incorrect FEC has been detected */
806 0 : if( FD_UNLIKELY( first_verified_fec == 0 ) ) return UINT_MAX;
807 :
808 0 : uint bad_fec_idx = first_verified_fec == UINT_MAX ? (uint)(ele->complete_idx / 32UL) /* last FEC is wrong */
809 0 : : (uint)first_verified_fec - 1U;
810 0 : return bad_fec_idx * 32U;
811 0 : }
812 :
813 : /* fd_forest_publish publishes slot as the new forest root, setting
814 : the subtree beginning from slot as the new forest tree (ie. slot
815 : and all its descendants). Prunes all eles not in slot's forest.
816 : Assumes slot is present in forest. Returns the new root. */
817 :
818 : fd_forest_blk_t const *
819 : fd_forest_publish( fd_forest_t * forest, ulong slot );
820 :
821 : /* fd_forest_highest_repaired_slot returns the highest child of a fully,
822 : contiguously repaired slot. */
823 : ulong
824 : fd_forest_highest_repaired_slot( fd_forest_t const * forest );
825 :
826 : /* fd_forest_iter_* takes either the standard iterator or the orphan
827 : iterator and returns the next shred to request. The iterator must
828 : be one of the two iterators owned by the forest.
829 :
830 : The iterator will be in an iter_done state if there are no current
831 : shreds to request.
832 :
833 : The forward forest iterator will visit each shred at most once over
834 : the lifetime of the forest, without revisiting past shreds, so it is
835 : up to the caller to track which shreds will need re-requesting. The
836 : exception to the rule is slots where the slot_complete shred is still
837 : not known - the highest window idx will be requested for that slot,
838 : and the slot will be added to the tail of the requests deque so that
839 : later we may revisit it. As a result, the children of that slot may
840 : also be revisited multiple times.
841 :
842 : Note this case is pretty rare.
843 :
844 : An iterator with a non-null ele_idx and a shred_idx of UINT_MAX
845 : signals the repair tile to request the highest_window_index for
846 : that slot.
847 :
848 : Otherwise, the iterator signals the repair tile to request a
849 : regular shred window_idx.
850 :
851 : Invariants for requests map and requests deque:
852 :
853 : There can only be one occurrence of a slot in the requests deque at
854 : any time. Any slot in the requests deque must exist in the requests
855 : map, and vice versa. Any slot in the requests map must also exist in
856 : the forest. During publish the requests map must also be pruned.
857 :
858 : If we are mid-request of a slot that gets pruned, forest will take
859 : responsibility to update the iterator to a valid slot.
860 :
861 : TODO: should this really be an iterator?? or just a _next function? */
862 :
863 : fd_forest_iter_t *
864 : fd_forest_iter_next( fd_forest_iter_t * iter, fd_forest_t * forest );
865 :
866 : int
867 : fd_forest_iter_done( fd_forest_iter_t * iter, fd_forest_t * forest );
868 :
869 : /* Misc */
870 :
871 : /* fd_forest_verify checks the forest is not obviously corrupt.
872 : Returns 0 if verify succeeds, -1 otherwise. */
873 :
874 : int
875 : fd_forest_verify( fd_forest_t const * forest );
876 :
877 : /* fd_forest_print pretty-prints a formatted forest tree, beginning
878 : from the forest root. Assumes forest is a current local join. */
894 :
895 : void
896 : fd_forest_print( fd_forest_t const * forest );
897 :
898 : FD_PROTOTYPES_END
899 :
900 : #endif /* HEADER_fd_src_discof_forest_fd_forest_h */
|