fuzz: model chanmon mempool mining#4657
Conversation
|
👋 Thanks for assigning @jkczyz as a reviewer! |
|
@wpaulino FYI, this may intersect with the existing splice fuzzing failures you’re looking at. This PR makes splice transaction mining in |
b9a40e5 to
bbbd224
Compare
Restore cfg(splicing) to the fuzz check-cfg allow list and gate chanmon consistency splice opcodes on that cfg again. Without the cfg, those inputs stop before executing splice-specific operations.
4e6f262 to
d02194d
Compare
|
Fuzz failure is pre-existing |
0ab5882 to
ef0be5b
Compare
| // Connects this node from its tracked height to target_height, delivering | ||
| // each relevant chain callback to both ChainMonitor and ChannelManager. | ||
| fn connect_chain_range(&mut self, chain_state: &ChainState, target_height: u32) { | ||
| // Mining commands can advance the harness chain by more than one block. | ||
| // Transaction blocks must be connected explicitly so LDK learns about | ||
| // on-chain spends, while empty depth blocks still need best-block | ||
| // updates so CSV/CLTV-sensitive logic can run. This is the normal sync | ||
| // path, so both the raw ChainMonitor and the ChannelManager receive the | ||
| // callbacks and the node's tracked height advances to the target. | ||
| let mut height = self.height; | ||
| while height < target_height { | ||
| let mut next_height = height + 1; | ||
| while next_height <= target_height && chain_state.block_at(next_height).1.is_empty() { | ||
| next_height += 1; | ||
| } | ||
| if next_height > target_height { | ||
| // The rest of the range is empty. One best-block update to the | ||
| // final height is enough because LDK's Confirm API explicitly | ||
| // allows best_block_updated to skip intermediary blocks, and | ||
| // empty blocks have no transactions_confirmed calls whose chain | ||
| // order must be preserved. | ||
| height = target_height; | ||
| let (header, _) = chain_state.block_at(height); | ||
| self.monitor.best_block_updated(header, height); | ||
| self.node.best_block_updated(header, height); | ||
| break; | ||
| } | ||
| if next_height > height + 1 { | ||
| // Advance across the empty prefix before the next transaction | ||
| // block. Confirm::best_block_updated may skip intermediary | ||
| // blocks, so this compressed update still lets height-triggered | ||
| // LDK work run at the correct tip before the transaction | ||
| // confirmations are connected. | ||
| height = next_height - 1; | ||
| let (header, _) = chain_state.block_at(height); | ||
| self.monitor.best_block_updated(header, height); | ||
| self.node.best_block_updated(header, height); | ||
| } | ||
| height = next_height; | ||
| let (header, txn) = chain_state.block_at(height); | ||
| let txdata: Vec<_> = txn.iter().enumerate().map(|(i, tx)| (i + 1, tx)).collect(); | ||
| if !txdata.is_empty() { | ||
| // Transaction blocks need the explicit confirmation callback | ||
| // before the best-block update so watched spends are delivered | ||
| // in chain order before the node advances to that tip. | ||
| self.monitor.transactions_confirmed(header, &txdata, height); | ||
| self.node.transactions_confirmed(header, &txdata, height); | ||
| } | ||
| self.monitor.best_block_updated(header, height); | ||
| self.node.best_block_updated(header, height); | ||
| } | ||
| self.height = target_height; | ||
| } |
There was a problem hiding this comment.
Good improvement: the old sync_with_chain_state only notified self.node (ChannelManager), never self.monitor (ChainMonitor). This means ChannelMonitors weren't receiving transactions_confirmed / best_block_updated callbacks during normal fuzz-loop sync, which would have hidden any bugs in monitor chain-tracking logic. The new connect_chain_range fixing this is an important correctness improvement.
One behavioral change worth noting: the old code called best_block_updated for every individual block height. The new code batches consecutive empty blocks into a single best_block_updated call at the last empty height before the next tx block (or at the target). This is allowed by the Confirm trait ("May be skipped for intermediary blocks"), but it means height-triggered logic inside ChainMonitor/ChannelManager runs at fewer checkpoints than before. For a fuzzer seeking maximum coverage of height-sensitive paths (e.g., HTLC timeout detection), individual block delivery would exercise more code paths.
There was a problem hiding this comment.
You could also argue that skipping exercises different paths. I think in practice, we'll get a bit from both.
|
After a thorough re-examination of every hunk in this diff, I found no new issues beyond those already covered in my prior review pass. SummaryPrior issues (still applicable):
Verification notes from re-review:
|
ef0be5b to
4036d6a
Compare
4036d6a to
57960f1
Compare
|
When we are good with the changes, I can tack on one more commit that moves the chain/mempool code into |
| const DEFAULT_TX_CONFIRMATION_BLOCKS: u32 = 6; | ||
| // Single fuzz bytes can mine more than one block so a corpus entry does not | ||
| // need long runs of identical "mine one block" commands to reach CSV or CLTV | ||
| // boundaries. Mining is clipped below if unresolved HTLCs are near expiry. |
There was a problem hiding this comment.
What is meant by "clipped below"?
There was a problem hiding this comment.
That mining stops before anything would expire. Changed it to 'capped'
| // A mined transaction is considered deeply confirmed after this many blocks. | ||
| // This confirms the transaction in one block and then mines five empty depth | ||
| // blocks. | ||
| const DEFAULT_TX_CONFIRMATION_BLOCKS: u32 = 6; |
| events::Event::PaymentClaimed { payment_hash, .. } => { | ||
| if payments.payment_preimages.contains_key(&payment_hash) { | ||
| payments.claimed_payment_hashes.insert(payment_hash); | ||
| } | ||
| }, |
There was a problem hiding this comment.
Should this (moved from line 1943) be a separate commit?
There was a problem hiding this comment.
Hm yes, that is already living in #4635. Removed code from this PR.
| assert!( | ||
| current_tip < timeout_deadline, | ||
| "pending HTLC with expiry {} and timeout deadline {} is already unsafe at tip {}", | ||
| expiry, | ||
| timeout_deadline, | ||
| current_tip | ||
| ); |
There was a problem hiding this comment.
This fails with the following input:
printf '\x20\xdf\xdf\xb0\x0e\x10\x18\x10\x18\x10\x18\x30\xd8' | ./target/release/chanmon_consistency_target
Claude's succinct summary:
The persisted view can drift arbitrarily behind chain_state.tip because
Harness::mine_blocksnever re-checkpoints, so after a deferred reload the node builds an HTLC against a stale view with a deadline already below the tip. Fix it by either soft-capping the past-deadline branch insafe_mine_block_countto return 0, or — better — also capping mining against the lowest persisted-manager view so a future reload-then-send can't produce an immediately-unsafe HTLC.
There was a problem hiding this comment.
Thanks, this was the missing startup-sync piece from the FC branch that I forgot to cherry-pick. Reload now catches ChainMonitor up from the oldest monitor height and ChannelManager from its own best block before returning to the fuzz loop.
The repro passes now
| assert!( | ||
| self.mine_blocks(DEFAULT_TX_CONFIRMATION_BLOCKS) > 0, | ||
| "finish cannot mine pending mempool transactions without crossing an unresolved HTLC timeout deadline" | ||
| ); |
There was a problem hiding this comment.
This can be reached using:
printf '\xc9\xa6\xff\xde\xdf' | ./target/release/chanmon_consistency_target
There was a problem hiding this comment.
This one the the ones below contain splice opcodes. As that is currently not yet stable, I don't want to go into debugging that preferably 😬 Without the splicing cfg flag, those strings pass.
| @@ -2818,13 +3128,7 @@ | |||
| .funding_transaction_signed(&channel_id, &counterparty_node_id, signed_tx) | |||
| .unwrap(); | |||
There was a problem hiding this comment.
This is now reachable:
printf '\xa7\xa0\xff\xd5\xd8\xa0\xff' | ./target/release/chanmon_consistency_target
| panic!( | ||
| "It may take may iterations to settle the state, but it should not take forever" | ||
| ); |
There was a problem hiding this comment.
Able to hit this now but looks like it was preexisting, too.
printf '\xa7\xa0\xff\xd5\xd8\xa0\xff' | ./target/release/chanmon_consistency_target
| let is_quiescent_msg = msg.data.contains("already sent splice_locked, cannot RBF"); | ||
| if !msg.data.contains("Disconnecting due to timeout awaiting response") && !is_quiescent_msg | ||
| { | ||
| panic!("Unexpected disconnect case: {}", msg.data); |
There was a problem hiding this comment.
Looks like this can be hit prior to your change.
printf '\xc7\x3a\xa2\x32\x3a\xff\xff\xa7\x35\x33\x45\xff' | ./target/release/chanmon_consistency_target
We may need to expand the allowed reason.
57960f1 to
771e8a5
Compare
771e8a5 to
534bb75
Compare
|
Review comments addressed: https://github.com/lightningdevkit/rust-lightning/compare/57960f1..3b8a78e1d9c61e184e4ccb7ea9279b9f832df580
|
a2335d0 to
9458dc7
Compare
Route chanmon broadcasts through an explicit harness mempool so relay, mining, wallet updates, and chain delivery share one path. This lets splice, anchor, and claim transactions enter the mempool before mining. On restart, sync loaded monitors and managers from their own persisted best blocks so raw monitors catch up without rewinding ChannelManager state. Cap modeled mining before unresolved HTLC timeout deadlines and use the LDK anti-reorg depth for setup confirmations.
9458dc7 to
3b8a78e
Compare
This prepares
chanmon_consistencyfor force-close fuzzing by making its chain model closer to the environment LDK sees in normal operation.Force-close scenarios depend heavily on transaction timing: claims may be broadcast, replaced, confirmed, followed by additional claims, and later become spendable only after more blocks. The previous harness mostly folded transaction confirmation into sync-style actions, which made it harder to express those flows accurately and made future force-close coverage depend on shortcuts in the test harness.
The updated model gives the harness an explicit mempool and block-mining path. Broadcast transactions can be relayed into the modeled mempool, mined into harness blocks, and then replayed to both monitors and managers through chain callbacks. The harness also tracks confirmed UTXOs and wallet change so later splice, anchor, and claim transactions have a realistic view of what can be spent.
This should make upcoming force-close fuzzing changes easier to review: first establish a more faithful chain environment, then add the force-close-specific scenarios and invariants on top of it.