Frequency Testnet `fatxpool` Issue: `ready_at_internal` Timeout

by Lucas

Hey everyone! We've been diving deep into some perplexing issues on the Frequency Testnet Parachain (and even seeing similar behavior on Mainnet) related to the fatxpool and its ready_at_internal function. We're experiencing some chain forking and block formation delays, and it seems to be tied to timeouts when getting the transaction pool view. Let's break down what's happening, what we suspect, and how we're tackling it.

Understanding the Problem: Forking and Delays

So, what's the deal? The main problem we're facing is that our chain is occasionally forking like crazy, about once every 80 blocks or so. This leads to significant delays in block formation, with time to inclusion jumping from the usual 18 seconds to upwards of 84 seconds! This is a major headache, and we need to get to the bottom of it.

We've observed this issue on the Frequency Testnet Parachain, specifically on Paseo, and it seems to be cropping up on Mainnet as well. Now, Mainnet is a bit more complex due to independent collators and higher activity, and it's currently running on version 2503. However, the testnet collators are running on build 2506 (6fd693e6d9cfa46cd2acbcb41cd5b0451a62d67c) and use a fork-aware transaction pool exclusively. We're operating with a 6-second block time using async backing, and our peering status is solid – the two collators are happily peering with each other and have around 40 relay chain peers each. This rules out some of the usual suspects, like network connectivity issues.

The behavior persists regardless of transaction load. We've seen it happen with no transactions at all, which makes it even more puzzling and rules out high transaction volume as the sole cause. The consistency of the issue across varying loads points to a more fundamental problem in the system's core logic, specifically in transaction pool management: inconsistencies at that base layer ripple upward into the forking and delays we're observing.

The Prime Suspect: Transaction Pool View Timeouts

Our prime suspect in this mystery is the time it takes to get the view of the transaction pool for a specific block. We suspect that this process is timing out, leading to the forking and delays. But the big question is: why? What's causing these timeouts? This is the heart of the problem we're trying to solve.

To further understand this, let's talk about the transaction pool's role. The transaction pool is essentially a waiting room for transactions. Before a transaction can be included in a block, it hangs out in the pool, waiting its turn. When a collator is ready to produce a new block, it needs to get a consistent view of the transactions in the pool. This view helps the collator decide which transactions to include in the block and in what order. If getting this view takes too long, it throws a wrench in the entire process. The timeout acts as a safety net, preventing the collator from waiting indefinitely for the pool view. However, when this net is triggered too often, it leads to the very issues we're seeing – chain forking and block production delays.
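That "wait with a bounded timeout, then proceed anyway" pattern can be sketched in plain Rust. To be clear, this is an illustrative analogy, not the actual fatxpool API: the names ReadyView and ready_at_with_timeout are hypothetical stand-ins, and a channel plus recv_timeout stands in for the real async machinery.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Hypothetical stand-in for the pool's "ready transactions" view.
#[derive(Debug, PartialEq)]
pub struct ReadyView {
    pub tx_count: usize,
}

/// Wait for the pool view, but never longer than `timeout`.
/// On timeout, fall back to an empty view so block production can proceed,
/// mirroring the "Timeout fired ... Proceeding with production" log line.
pub fn ready_at_with_timeout(rx: mpsc::Receiver<ReadyView>, timeout: Duration) -> ReadyView {
    match rx.recv_timeout(timeout) {
        Ok(view) => view,
        Err(_) => ReadyView { tx_count: 0 },
    }
}

fn main() {
    // Case 1: the view arrives well inside the ~225 ms budget seen in the logs.
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        thread::sleep(Duration::from_millis(10));
        let _ = tx.send(ReadyView { tx_count: 3 });
    });
    assert_eq!(ready_at_with_timeout(rx, Duration::from_millis(250)).tx_count, 3);

    // Case 2: the view never arrives; the timeout fires and we build with an
    // empty view rather than stalling block production indefinitely.
    let (tx_late, rx_late) = mpsc::channel::<ReadyView>();
    let _keep_alive = tx_late; // sender stays alive but never sends
    assert_eq!(ready_at_with_timeout(rx_late, Duration::from_millis(50)).tx_count, 0);

    println!("ok");
}
```

The design trade-off is visible even in this toy: proceeding with an empty view keeps block times bounded, but a block built without the real pool view can diverge from what peers expect, which is exactly why frequent timeouts correlate with forking.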

The timeout itself is a symptom, not the root cause. It’s the system’s way of saying, “Hey, I’ve been waiting too long, something’s not right.” The challenge now is to pinpoint what that “something” is. Is it a bottleneck in the pool's data structure? Is there a process that's locking the pool for extended periods? Or perhaps there's an inefficiency in the way the pool view is being constructed. Answering these questions will help us identify the underlying cause and implement an effective solution.
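One way to start answering the lock-contention question is to instrument critical sections and warn when the pool lock is acquired and held past the proposer's budget. This is a hypothetical diagnostic sketch (timed_lock_section is our own name, not part of the pool), not code from the actual implementation:

```rust
use std::sync::Mutex;
use std::time::{Duration, Instant};

/// Run `f` under the pool lock and report how long acquiring plus holding
/// the lock took. If a maintenance task regularly exceeds the proposer's
/// budget here, ready_at-style readers will stall behind it.
pub fn timed_lock_section<T, R>(
    pool: &Mutex<T>,
    warn_after: Duration,
    f: impl FnOnce(&mut T) -> R,
) -> (R, Duration) {
    let start = Instant::now();
    let mut guard = pool.lock().unwrap();
    let out = f(&mut guard);
    let held = start.elapsed();
    if held > warn_after {
        eprintln!("pool lock held for {:?} (budget {:?})", held, warn_after);
    }
    (out, held)
}

fn main() {
    let pool = Mutex::new(vec![1u32, 2, 3]); // stand-in for pool state
    // Simulate a critical section, with the ~225 ms budget from the logs.
    let (len, held) = timed_lock_section(&pool, Duration::from_millis(225), |txs| {
        std::thread::sleep(Duration::from_millis(1)); // pretend work
        txs.len()
    });
    assert_eq!(len, 3);
    assert!(held >= Duration::from_millis(1));
    println!("ok");
}
```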

Diving into the Logs: A Key Snippet

Let's look at a key log snippet that highlights the issue:

2025-08-15 19:22:18.480  INFO tokio-runtime-worker sc_basic_authorship::basic_authorship: [Parachain] 🙌 Starting consensus session on top of parent 0xe6d60dcd9ab92569eaffa07547102220811327db79df55bf7811155b948f31fc (#5703113)
2025-08-15 19:22:18.482 DEBUG tokio-runtime-worker txpool: [Parachain] fatp::ready_at_with_timeout at=0xe6d60dcd9ab92569eaffa07547102220811327db79df55bf7811155b948f31fc timeout=224.855789ms
2025-08-15 19:22:18.482 DEBUG tokio-runtime-worker txpool: [Parachain] fatp::ready_at_internal at=0xe6d60dcd9ab92569eaffa07547102220811327db79df55bf7811155b948f31fc pending_keys=[0xe6d60dcd9ab92569eaffa07547102220811327db79df55bf7811155b948f31fc, 0x51878f33c49a75f7dfecd261a73828449b44c4ba70b7c66ed3f881586417d088]
2025-08-15 19:22:18.482 DEBUG tokio-runtime-worker txpool: [Parachain] fatp::ready_at_light at=0xe6d60dcd9ab92569eaffa07547102220811327db79df55bf7811155b948f31fc
2025-08-15 19:22:18.482 DEBUG tokio-runtime-worker txpool: [Parachain] fatp::ready_at_light -> light at=0xe6d60dcd9ab92569eaffa07547102220811327db79df55bf7811155b948f31fc best_view_hash=0x3e9cf892a5b538d9d7ba6534520fe558b03d673845a9ea953a445ec823ea0213 before_count=0 to_be_removed=2 after_count=0 duration=98.138µs
2025-08-15 19:22:18.707  WARN tokio-runtime-worker txpool: [Parachain] Timeout fired waiting for transaction pool at block. Proceeding with production. at=0xe6d60dcd9ab92569eaffa07547102220811327db79df55bf7811155b948f31fc
2025-08-15 19:22:18.707 DEBUG tokio-runtime-worker txpool: [Parachain] fatp::report_invalid len=0
2025-08-15 19:22:18.708  INFO tokio-runtime-worker sc_basic_authorship::basic_authorship: [Parachain] 🎁 Prepared block for proposing at 5703114 (227 ms) hash: 0x9e428d8e276e6acfc5d661fc9ee226e755090281379f714bc46515731655ae3d; parent_hash: 0xe6d6…31fc; end: NoMoreTransactions; extrinsics_count: 2
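As a quick sanity check, the gap between the ready_at_with_timeout call and the timeout warning can be computed directly from the timestamps in the snippet, and it lines up with the logged budget:

```rust
/// Parse the "HH:MM:SS.mmm" time-of-day portion of a log timestamp
/// into milliseconds since midnight.
fn to_millis(ts: &str) -> u64 {
    let (hms, frac) = ts.split_once('.').expect("missing fractional part");
    let parts: Vec<u64> = hms.split(':').map(|p| p.parse().unwrap()).collect();
    (parts[0] * 3600 + parts[1] * 60 + parts[2]) * 1000 + frac.parse::<u64>().unwrap()
}

fn main() {
    // Timestamps lifted from the log snippet above.
    let ready_at_called = to_millis("19:22:18.482"); // fatp::ready_at_with_timeout
    let timeout_fired = to_millis("19:22:18.707");   // "Timeout fired waiting..."
    let elapsed = timeout_fired - ready_at_called;
    // 225 ms, matching the logged budget of timeout=224.855789ms: the pool
    // never delivered a maintained view within the allotted window.
    assert_eq!(elapsed, 225);
    println!("elapsed = {elapsed} ms");
}
```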

This snippet reveals the sequence of events leading up to the timeout. The collator starts a consensus session, then calls fatp::ready_at_with_timeout with a budget of roughly 225 ms. This triggers fatp::ready_at_internal, which finds two pending keys, meaning no maintained view is ready for the parent block. The pool then falls back to fatp::ready_at_light, which completes in under 100 µs, yet the timeout still fires about 225 ms later and the collator proceeds with block production anyway, preparing a block with only 2 extrinsics (most likely just the inherents).