Line data Source code
1 : #ifndef HEADER_fd_src_discof_poh_fd_poh_h
2 : #define HEADER_fd_src_discof_poh_fd_poh_h
3 :
4 : /* Let's say there was a computer, the "leader" computer, that acted as
5 : a bank. Users could send it messages saying they wanted to deposit
6 : money, or transfer it to someone else.
7 :
8 : That's how, for example, Bank of America works but there are problems
9 : with it. One simple problem is: the bank can set your balance to
10 : zero if they don't like you.
11 :
12 : You could try to fix this by having the bank periodically publish the
13 : list of all account balances and transactions. If the customers add
14 : unforgeable signatures to their deposit slips and transfers, then
15 : the bank cannot zero a balance without it being obvious to everyone.
16 :
17 : There's still problems. The bank can't lie about your balance now or
18 : take your money, but it can just not accept deposits on your behalf
19 : by ignoring you.
20 :
21 : You could fix this by getting a few independent banks together, lets
22 : say Bank of America, Bank of England, and Westpac, and having them
23 : rotate who operates the leader computer periodically. If one bank
24 : ignores your deposits, you can just wait and send them to the next
25 : one.
26 :
27 : This is Solana.
28 :
29 : There's still problems of course but they are largely technical. How
30 : do the banks agree who is leader? How do you recover if a leader
31 : misbehaves? How do customers verify the transactions aren't forged?
32 : How do banks receive and publish and verify each others work quickly?
33 : These are the main technical innovations that enable Solana to work
34 : well.
35 :
36 : What about Proof of History?
37 :
38 : One particular niche problem is about the leader schedule. When the
39 : leader computer is moving from one bank to another, the new bank must
40 : wait for the old bank to say it's done and provide a final list of
41 : balances that it can start working off of. But: what if the computer
42 : at the old bank crashes and never says its done?
43 :
44 : Does the new leader just take over at some point? What if the new
45 : leader is malicious, and says the past thousand leaders crashed, and
46 : there have been no transactions for days? How do you check?
47 :
48 : This is what Proof of History solves. Each bank in the network must
49 : constantly do a lot of busywork (compute hashes), even when it is not
50 : leader.
51 :
52 : If the prior thousand leaders crashed, and no transactions happened
53 : in an hour, the new leader would have to show they did about an hour
54 : of busywork for everyone else to believe them.
55 :
56 : A better name for this is proof of skipping. If a leader is skipping
57 : slots (building off of a slot that is not the direct parent), it must
58 : prove that it waited a good amount of time to do so.
59 :
60 : It's not a perfect solution. For one thing, some banks have really
61 : fast computers and can compute a lot of busywork in a short amount of
62 : time, allowing them to skip prior slot(s) anyway. But: there is a
63 : social component that prevents validators from skipping the prior
64 : leader slot. It is easy to detect when this happens and the network
65 : could respond by ignoring their votes or stake.
66 :
67 : You could come up with other schemes: for example, the network could
68 : just use wall clock time. If a new leader publishes a block without
69 : waiting 400 milliseconds for the prior slot to complete, then there
70 : is no "proof of skipping" and the nodes ignore the slot.
71 :
72 : These schemes have a problem in that they are not deterministic
73 : across the network (different computers have different clocks), and
74 : so they will cause frequent forks which are very expensive to
75 : resolve. Even though the proof of history scheme is not perfect,
76 : it is better than any alternative which is not deterministic.
77 :
78 : With all that background, we can now describe at a high level what
79 : this PoH tile actually does,
80 :
81 : (1) Whenever any other leader in the network finishes a slot, and
82 : the slot is determined to be the best one to build off of, this
83 : tile gets "reset" onto that block, the so called "reset slot".
84 :
85 : (2) The tile is constantly doing busy work, hash(hash(hash(...))) on
86 : top of the last reset slot, even when it is not leader.
87 :
88 : (3) When the tile becomes leader, it continues hashing from where it
89 : was. Typically, the prior leader finishes their slot, so the
90 : reset slot will be the parent one, and this tile only publishes
91 : hashes for its own slot. But if prior slots were skipped, then
92 : there might be a whole chain already waiting.
93 :
94 : That's pretty much it. When we are leader, in addition to doing
95 : busywork, we publish ticks and microblocks to the shred tile. A
96 : microblock is a non-empty group of transactions whose hashes are
97 : mixed-in to the chain, while a tick is a periodic stamp of the
98 : current hash, with no transactions (nothing mixed in). We need
99 : to send both to the shred tile, as ticks are important for other
100 : validators to verify in parallel.
101 :
102 : As well, the tile should never become leader for a slot that it has
103 : published anything for, otherwise it may create a duplicate block.
104 :
105 : Some particularly common misunderstandings:
106 :
107 : - PoH is critical to security.
108 :
109 : This largely isn't true. The target hash rate of the network is
110 : so slow (1 hash per 500 nanoseconds) that a malicious leader can
111 : easily catch up if they start from an old hash, and the only
112 : practical attack prevented is the proof of skipping. Most of the
113 : long range attacks in the Solana whitepaper are not relevant.
114 :
115 : - PoH keeps passage of time.
116 :
117 : This is also not true. The way the network keeps time so it can
118 : decide who is leader is that, each leader uses their operating
119 : system clock to time 400 milliseconds and publishes their block
120 : when this timer expires.
121 :
122 : If a leader just hashed as fast as they could, they could publish
123 : a block in tens of milliseconds, and the rest of the network
124 : would happily accept it. This is why the Solana "clock" as
125 : determined by PoH is not accurate and drifts over time.
126 :
127 : - PoH prevents transaction reordering by the leader.
128 :
129 : The leader can, in theory, wait until the very end of their
130 : leader slot to publish anything at all to the network. They can,
131 : in particular, hold all received transactions for 400
132 : milliseconds and then reorder and publish some right at the end
133 : to advantage certain transactions.
134 :
135 : You might be wondering... if all the PoH chain is helping us do is
136 : prove that slots were skipped correctly, why do we need to "mix in"
137 : transactions to the hash value? Or do anything at all for slots
138 : where we don't skip the prior slot?
139 :
140 : It's a good question, and the answer is that this behavior is not
141 : necessary. An ideal implementation of PoH have no concept of ticks
142 : or mixins, and would not be part of the TPU pipeline at all.
143 : Instead, there would be a simple field "skip_proof" on the last
144 : shred we send for a slot, the hash(hash(...)) value. This field
145 : would only be filled in (and only verified by replayers) in cases
146 : where the slot actually skipped a parent.
147 :
148 : Then what is the "clock? In Solana, time is constructed as follows:
149 :
150 : HASHES
151 :
152 : The base unit of time is a hash. Hereafter, any values whose
153 : units are in hashes are called a "hashcnt" to distinguish them
154 : from actual hashed values.
155 :
156 : Agave generally defines a constant duration for each tick
157 : (see below) and then varies the number of hashcnt per tick, but
158 : as we consider the hashcnt the base unit of time, Firedancer and
159 : this PoH implementation defines everything in terms of hashcnt
160 : duration instead.
161 :
162 : In mainnet-beta, testnet, and devnet the hashcnt ticks over
163 : (increments) every 100 nanoseconds. The hashcnt rate is
164 : specified as 500 nanoseconds according to the genesis, but there
165 : are several features which increase the number of hashes per
166 : tick while keeping tick duration constant, which make the time
167 : per hashcnt lower. These features up to and including the
168 : `update_hashes_per_tick6` feature are activated on mainnet-beta,
169 : devnet, and testnet, and are described in the TICKS section
170 : below.
171 :
172 : Other chains and development environments might have a different
173 : hashcnt rate in the genesis, or they might not have activated
174 : the features which increase the rate yet, which we also support.
175 :
176 : In practice, although each validator follows a hashcnt rate of
177 : 100 nanoseconds, the overall observed hashcnt rate of the
178 : network is a little slower than once every 100 nanoseconds,
179 : mostly because there are gaps and clock synchronization issues
180 : during handoff between leaders. This is referred to as clock
181 : drift.
182 :
183 : TICKS
184 :
185 : The leader needs to periodically checkpoint the hash value
186 : associated with a given hashcnt so that they can publish it to
187 : other nodes for verification.
188 :
189 : On mainnet-beta, testnet, and devnet this occurs once every
190 : 62,500 hashcnts, or approximately once every 6.4 microseconds.
191 : This value is determined at genesis time, and according to the
192 : features below, and could be different in development
193 : environments or on other chains which we support.
194 :
195 : Due to protocol limitations, when mixing in transactions to the
196 : proof-of-history chain, it cannot occur on a tick boundary (but
197 : can occur at any other hashcnt).
198 :
199 : Ticks exist mainly so that verification can happen in parallel.
200 : A verifier computer, rather than needing to do hash(hash(...))
201 : all in sequence to verify a proof-of-history chain, can do,
202 :
203 : Core 0: hash(hash(...))
204 : Core 1: hash(hash(...))
205 : Core 2: hash(hash(...))
206 : Core 3: hash(hash(...))
207 : ...
208 :
209 : Between each pair of tick boundaries.
210 :
211 : Solana sometimes calls the current tick the "tick height",
212 : although it makes more sense to think of it as a counter from
213 : zero, it's just the number of ticks since the genesis hash.
214 :
215 : There is a set of features which increase the number of hashcnts
216 : per tick. These are all deployed on mainnet-beta, devnet, and
217 : testnet.
218 :
219 : name: update_hashes_per_tick
220 : id: 3uFHb9oKdGfgZGJK9EHaAXN4USvnQtAFC13Fh5gGFS5B
221 : hashes per tick: 12,500
222 : hashcnt duration: 500 nanos
223 :
224 : name: update_hashes_per_tick2
225 : id: EWme9uFqfy1ikK1jhJs8fM5hxWnK336QJpbscNtizkTU
226 : hashes per tick: 17,500
227 : hashcnt duration: 357.142857143 nanos
228 :
229 : name: update_hashes_per_tick3
230 : id: 8C8MCtsab5SsfammbzvYz65HHauuUYdbY2DZ4sznH6h5
231 : hashes per tick: 27,500
232 : hashcnt duration: 227.272727273 nanos
233 :
234 : name: update_hashes_per_tick4
235 : id: 8We4E7DPwF2WfAN8tRTtWQNhi98B99Qpuj7JoZ3Aikgg
236 : hashes per tick: 47,500
237 : hashcnt duration: 131.578947368 nanos
238 :
239 : name: update_hashes_per_tick5
240 : id: BsKLKAn1WM4HVhPRDsjosmqSg2J8Tq5xP2s2daDS6Ni4
241 : hashes per tick: 57,500
242 : hashcnt duration: 108.695652174 nanos
243 :
244 : name: update_hashes_per_tick6
245 : id: FKu1qYwLQSiehz644H6Si65U5ZQ2cp9GxsyFUfYcuADv
246 : hashes per tick: 62,500
247 : hashcnt duration: 100 nanos
248 :
249 : In development environments, there is a way to configure the
250 : hashcnt per tick to be "none" during genesis, for a so-called
251 : "low power" tick producer. The idea is not to spin cores during
252 : development. This is equivalent to setting the hashcnt per tick
253 : to be 1, and increasing the hashcnt duration to the desired tick
254 : duration.
255 :
256 : SLOTS
257 :
258 : Each leader needs to be leader for a fixed amount of time, which
259 : is called a slot. During a slot, a leader has an opportunity to
260 : receive transactions and produce a block for the network,
261 : although they may miss ("skip") the slot if they are offline or
262 : not behaving.
263 :
264 : In mainnet-beta, testnet, and devnet a slot is 64 ticks, or
265 : 4,000,000 hashcnts, or approximately 400 milliseconds.
266 :
267 : Due to the way the leader schedule is constructed, each leader
268 : is always given at least four (4) consecutive slots in the
269 : schedule. This means when becoming leader you will be leader
270 : for at least 4 slots, or 1.6 seconds.
271 :
272 : It is rare, although can happen that a leader gets more than 4
273 : consecutive slots (eg, 8, or 12), if they are lucky with the
274 : leader schedule generation.
275 :
276 : The number of ticks in a slot is fixed at genesis time, and
277 : could be different for development or other chains, which we
278 : support. There is nothing special about 4 leader slots in a
279 : row, and this might be changed in future, and the proof of
280 : history makes no assumptions that this is the case.
281 :
282 : EPOCHS
283 :
284 : Infrequently, the network needs to do certain housekeeping,
285 : mainly things like collecting rent and deciding on the leader
286 : schedule. The length of an epoch is fixed on mainnet-beta,
287 : devnet and testnet at 420,000 slots, or around ~2 (1.94) days.
288 : This value is fixed at genesis time, and could be different for
289 : other chains including development, which we support. Typically
290 : in development, epochs are every 8,192 slots, or around ~1 hour
291 : (54.61 minutes), although it depends on the number of ticks per
292 : slot and the target hashcnt rate of the genesis as well.
293 :
294 : In development, epochs need not be a fixed length either. There
295 : is a "warmup" option, where epochs start short and grow, which
296 : is useful for quickly warming up stake during development.
297 :
298 : The epoch is important because it is the only time the leader
299 : schedule is updated. The leader schedule is a list of which
300 : leader is leader for which slot, and is generated by a special
301 : algorithm that is deterministic and known to all nodes.
302 :
303 : The leader schedule is computed one epoch in advance, so that
304 : at slot T, we always know who will be leader up until the end
305 : of slot T+EPOCH_LENGTH. Specifically, the leader schedule for
306 : epoch N is computed during the epoch boundary crossing from
307 : N-2 to N-1. For mainnet-beta, the slots per epoch is fixed and
308 : will always be 420,000. */
309 :
310 : #include "../../disco/pack/fd_pack.h"
311 : #include "../../disco/stem/fd_stem.h"
312 : #include "../../util/fd_util_base.h"
313 : #include "../../ballet/sha256/fd_sha256.h"
314 :
315 : /* FD_POH_ALIGN is the alignment needed for a memory region to hold a
316 : fd_poh_t. It is a positive integer power of 2. */
317 0 : #define FD_POH_ALIGN (128UL)
318 :
319 0 : #define FD_POH_MAGIC (0xF17EDA2CE580A000) /* FIREDANCE POH V0 */
320 :
321 : /* The maximum number of microblocks that pack is allowed to pack into a
322 : single slot. This is not consensus critical, and pack could, if we
323 : let it, produce as many microblocks as it wants, and the slot would
324 : still be valid.
325 :
326 : We have this here instead so that PoH can estimate slot completion,
327 : and keep the hashcnt up to date as pack progresses through packing
328 : the slot. If this upper bound was not enforced, PoH could tick to
329 : the last hash of the slot and have no hashes left to mixin incoming
330 : microblocks from pack, so this upper bound is a coordination
331 : mechanism so that PoH can progress hashcnts while the slot is active,
332 : and know that pack will not need those hashcnts later to do mixins. */
333 0 : #define MAX_MICROBLOCKS_PER_SLOT (131072UL)
334 :
335 : /* When we are hashing in the background in case a prior leader skips
336 : their slot, we need to store the result of each tick hash so we can
337 : publish them when we become leader. The network requires at least
338 : one leader slot to publish in each epoch for the leader schedule to
339 : generate, so in the worst case we might need two full epochs of slots
340 : to store the hashes. (Eg, if epoch T only had a published slot in
341 : position 0 and epoch T+1 only had a published slot right at the end).
342 :
343 : There is a tighter bound: the block data limit of mainnet-beta is
344 : currently FD_PACK_MAX_DATA_PER_BLOCK, or 27,332,342 bytes per slot.
345 : At 48 bytes per tick, it is not possible to publish a slot that skips
346 : 569,424 or more prior slots. */
347 0 : #define MAX_SKIPPED_TICKS (1UL+(FD_PACK_MAX_DATA_PER_BLOCK/48UL))
348 :
349 : struct fd_poh_leader_slot_ended {
350 : int completed;
351 : ulong slot;
352 : uchar blockhash[ 32UL ];
353 : };
354 :
355 : typedef struct fd_poh_leader_slot_ended fd_poh_leader_slot_ended_t;
356 :
357 : struct fd_poh_out_private {
358 : ulong idx;
359 : fd_wksp_t * mem;
360 : ulong chunk0;
361 : ulong wmark;
362 : ulong chunk;
363 : };
364 :
365 : typedef struct fd_poh_out_private fd_poh_out_t;
366 :
367 : struct __attribute__((aligned(FD_POH_ALIGN))) fd_poh_private {
368 : int state;
369 :
370 : /* Static configuration determined at genesis creation time. See
371 : long comment above for more information. */
372 : ulong tick_duration_ns;
373 : ulong hashcnt_per_tick;
374 : ulong ticks_per_slot;
375 :
376 : /* Derived from the above configuration, but we precompute it. */
377 : double slot_duration_ns;
378 : double hashcnt_duration_ns;
379 : ulong hashcnt_per_slot;
380 :
381 : /* The maximum number of real microblocks that the pack tile is
382 : allowed to publish in each slot.
383 :
384 : While we are leader, PoH internally treats this limit as having
385 : one extra phantom "microblock" reserved for the done_packing
386 : message, so that PoH does not finish the slot before pack
387 : confirms it is done. Pack itself is configured with the
388 : un-inflated limit and never publishes more than this many real
389 : microblocks per slot. */
390 : ulong max_microblocks_per_slot;
391 :
392 : /* The block id of the completed block. */
393 : uchar completed_block_id[ 32UL ];
394 :
395 : /* The slot we were reset on (what we are building on top of). */
396 : ulong reset_slot;
397 : long reset_slot_start_ns;
398 :
399 : /* The current slot and hashcnt within that slot of the proof of
400 : history, including hashes we have been producing in the background
401 : while waiting for our next leader slot. */
402 : ulong slot;
403 : ulong hashcnt;
404 :
405 : ulong next_leader_slot;
406 : long leader_slot_start_ns;
407 :
408 : /* When we send a microblock on to the shred tile, we need to tell
409 : it how many hashes there have been since the last microblock, so
410 : this tracks the hashcnt of the last published microblock.
411 :
412 : If we are skipping slots prior to our leader slot, the last_slot
413 : will be quite old, and potentially much larger than the number of
414 : hashcnts in one slot. */
415 : ulong last_slot;
416 : ulong last_hashcnt;
417 :
418 : /* The PoH tile must never drop microblocks that get committed by the
419 : bank, so it needs to always be able to mixin a microblock hash.
420 : Mixing in requires incrementing the hashcnt, so we need to ensure
421 : at all times that there is enough hascnts left in the slot to
422 : mixin whatever future microblocks pack might produce for it.
423 :
424 : This value tracks that. At any time, max_microblocks_per_slot
425 : - microblocks_lower_bound is an upper bound on the maximum number
426 : of microblocks that might still be received in this slot. */
427 : ulong microblocks_lower_bound;
428 :
429 : uchar __attribute__((aligned(32UL))) reset_hash[ 32 ];
430 : uchar __attribute__((aligned(32UL))) hash[ 32 ];
431 :
432 : /* When we are not leader, we need to save the hashes that were
433 : produced in case the prior leader skips. If they skip, we will
434 : replay these skipped hashes into our next leader bank so that
435 : the slot hashes sysvar can be updated correctly, and also publish
436 : them to peer nodes as part of our outgoing shreds. */
437 : uchar skipped_tick_hashes[ MAX_SKIPPED_TICKS ][ 32 ];
438 :
439 : fd_sha256_t * sha256;
440 :
441 : fd_poh_out_t shred_out[ 1 ];
442 : fd_poh_out_t replay_out[ 1 ];
443 :
444 : ulong magic;
445 : };
446 :
447 : typedef struct fd_poh_private fd_poh_t;
448 :
449 : FD_PROTOTYPES_BEGIN
450 :
451 : FD_FN_CONST ulong
452 : fd_poh_align( void );
453 :
454 : FD_FN_CONST ulong
455 : fd_poh_footprint( void );
456 :
457 : void *
458 : fd_poh_new( void * shmem );
459 :
460 : fd_poh_t *
461 : fd_poh_join( void * shpoh,
462 : fd_poh_out_t * shred_out,
463 : fd_poh_out_t * replay_out );
464 :
465 : void
466 : fd_poh_reset( fd_poh_t * poh,
467 : fd_stem_context_t * stem,
468 : long timestamp,
469 : ulong hashcnt_per_tick,
470 : ulong ticks_per_slot,
471 : ulong tick_duration_ns,
472 : ulong completed_slot,
473 : uchar const * completed_blockhash,
474 : ulong next_leader_slot,
475 : ulong max_microblocks_in_slot,
476 : uchar const * completed_block_id );
477 :
478 : int
479 : fd_poh_have_leader_bank( fd_poh_t const * poh );
480 :
481 : int
482 : fd_poh_hashing_to_leader_slot( fd_poh_t const * poh );
483 :
484 : int
485 : fd_poh_must_tick( fd_poh_t const * poh );
486 :
487 : int
488 : fd_poh_must_publish_skipped_tick( fd_poh_t const * poh );
489 :
490 : void
491 : fd_poh_begin_leader( fd_poh_t * poh,
492 : ulong slot,
493 : ulong hashcnt_per_tick,
494 : ulong ticks_per_slot,
495 : ulong tick_duration_ns,
496 : ulong max_microblocks_in_slot );
497 :
498 : void
499 : fd_poh_done_packing( fd_poh_t * poh,
500 : ulong microblocks_in_slot );
501 :
502 : void
503 : fd_poh_advance( fd_poh_t * poh,
504 : fd_stem_context_t * stem,
505 : int * opt_poll_in,
506 : int * charge_busy );
507 :
508 : void
509 : fd_poh1_mixin( fd_poh_t * poh,
510 : fd_stem_context_t * stem,
511 : ulong slot,
512 : uchar const * hash,
513 : ulong txn_cnt,
514 : fd_txn_p_t const * txns );
515 :
516 : FD_PROTOTYPES_END
517 :
518 : #endif /* HEADER_fd_src_discof_poh_fd_poh_h */
|