Transaction that is added to a node's pool may not be propagated to the pools of its peer nodes #7081
Comments
I added the node execution arguments to the issue. This setup is using the sequenced transaction pool.
Hi @nano-adhara, I see where the issue is coming from. Besu has caches to remember the txs exchanged with other peers, basically because we want to avoid resending a tx multiple times to a peer (the p2p protocol actually states that peers resending txs should be disconnected), and we also want to avoid reprocessing a tx that we have already seen. Those caches are quite basic at the moment, so they could be improved to better handle scenarios like the one you are reporting. I could take a look at them and try to remove, from these caches, valid txs that are dropped from the txpool; this needs to be done with care to avoid being exposed to attacks. Lastly, a question about your network: is it common for a valid tx to stay in the pool for so long that it could be evicted by the timer?
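To make the mechanism described above more concrete, here is a minimal sketch of a per-peer cache that remembers which tx hashes have already been exchanged with each peer, and of the kind of cleanup on eviction being discussed. This is a simplified illustration with hypothetical names, not Besu's actual `PeerTransactionTracker` implementation.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Minimal, hypothetical sketch of a per-peer "seen transactions" cache:
// it remembers which tx hashes were already exchanged with each peer so
// that they are not re-sent to, or re-processed from, that peer.
public class SeenTxCache {
  private final Map<String, Set<String>> seenByPeer = new ConcurrentHashMap<>();

  // Record that a tx hash was sent to / received from a given peer.
  public void markSeen(String peerId, String txHash) {
    seenByPeer.computeIfAbsent(peerId, p -> ConcurrentHashMap.newKeySet()).add(txHash);
  }

  // A tx already marked as seen is skipped when broadcasting to that peer.
  public boolean alreadySeen(String peerId, String txHash) {
    return seenByPeer.getOrDefault(peerId, Set.of()).contains(txHash);
  }

  // Conceptually, what the fix discussed in this issue amounts to: when a
  // valid tx is dropped/evicted from the pool, forget it for every peer so
  // that a later resubmission is propagated and accepted again.
  public void onTxDroppedFromPool(String txHash) {
    seenByPeer.values().forEach(seen -> seen.remove(txHash));
  }
}
```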
Hello @fab-10. As an answer to your last question: it does not happen that often, roughly once a month, but the real problem is that whenever it happens, the only solution is to restart the Besu clients. I'll include Fernando @chookly314 and Coenie @coeniebeyers in the topic to move the conversation forward.
Hi @nano-adhara - sorry this hasn't moved. I am wondering if this may be an issue that we could work on with @matthew1001, as it's in QBFT and the sequenced pool. Was there other context needed from the folks you tagged in your comment?
I'd be happy to look into this at some point if @fab-10 doesn't get there first. I'm a bit focused on Bonsai archive and the empty block period for the time being, so it might not be that soon before I can take a look.
No context is needed from the other folks I tagged. I just tagged them so they have all the context as well.
@matthew1001 I hope you are doing well. I was wondering if you could advise when you would be able to look into the issue above? Kind regards
Sorry for being slow on this one, but I can start working on it, and I hope to have something by next week.
Hi @fab-10, thank you very much, much appreciated! Regards
Hi @fab-10, I hope you are doing well. I was just wondering if something is still possible this week? Kind regards, Louis
Hi @louisrushin, it is still not done, and definitely not for this week.
It is on main now, so you can start trying the develop version before the next release is out.
Thank you very much!
@fab-10 after testing the latest develop (main branch) functionality that includes PR #7755, we identified that the issue is still not solved. We reviewed the code and executed the tests as described below. The issue looks like it is related more to how transactions are shared between nodes (the seenTransactions and transactionsToSend maps in the PeerTransactionTracker file). This is a detailed description of the test we did.
Setup
We used the latest main commit and started a setup with two nodes. The network is a private Besu network using the QBFT protocol and the sequenced transaction pool:
Client node configuration:
Validator node configuration:
Genesis.json file:
Client node program arguments:
Validator node program arguments:
Test
Client node pool:
Validator node pool:
NOTE: the isReceivedFromLocalSource field confirms that the transaction was shared from the client peer (the sketch after these pool listings shows how this field can be inspected over RPC).
Client node pool:
Validator node pool:
Client node pool:
Validator node pool:
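For reference, pool contents like the ones listed above can be inspected on each node with Besu's txpool_besuTransactions JSON-RPC method (the TXPOOL API has to be enabled on the node). A minimal sketch follows; the endpoint URLs are assumptions for illustration, not the addresses used in this test.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Minimal sketch: query each node's transaction pool with Besu's
// txpool_besuTransactions JSON-RPC method (requires the TXPOOL API).
// The endpoint URLs below are placeholders.
public class PoolInspector {
  public static void main(String[] args) throws Exception {
    String[] nodes = {"http://localhost:8545", "http://localhost:8546"}; // client, validator (assumed)
    String body = "{\"jsonrpc\":\"2.0\",\"method\":\"txpool_besuTransactions\",\"params\":[],\"id\":1}";
    HttpClient client = HttpClient.newHttpClient();
    for (String node : nodes) {
      HttpRequest request = HttpRequest.newBuilder(URI.create(node))
          .header("Content-Type", "application/json")
          .POST(HttpRequest.BodyPublishers.ofString(body))
          .build();
      HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
      // Each pooled tx is reported with fields such as "hash" and
      // "isReceivedFromLocalSource", which is how the pools were compared here.
      System.out.println(node + " -> " + response.body());
    }
  }
}
```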
Expected outcome
The transaction should be shared with the validator node, since the earlier copy was dropped due to time-based eviction, but the client node does not send it, so the validator node is not aware of this transaction. The transaction cannot be executed, so a nonce gap is perpetually created for that address. With the current version this problem is not solved until all nodes are restarted.
@fab-10 @matthew1001 what is the process now? This ticket is closed, but the problem is not solved, as demonstrated by @nano-adhara above. Can you give us some guidance on how we should proceed? We have a live environment with this problem which is getting a lot of attention.
@petermunnings I am reopening the ticket and will take a look at your report to reproduce and fix it in the next few days; if in the meantime you already have an idea for the solution, you can propose a PR for review.
Hi @fab-10, just following up, any progress on this?
Thanks to @nano-adhara for the detailed steps to reproduce the issue. I was able to reproduce it and submit a fix in PR #7858. One important note on an edge case: the eviction task is triggered every minute, so depending on the start time of each node, there could in the worst case be a 1-minute delay between the eviction of a tx on 2 different nodes. This means that if you try to resubmit the tx just after it has been evicted on one node, it is possible that the same tx has not yet been evicted on every node, so I suggest waiting 1 minute after the eviction before resending the tx.
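A minimal sketch of that workaround, assuming the nodes expose the TXPOOL RPC API (the endpoints and tx hash are placeholders): poll each node's pool and only resend once the tx is gone everywhere.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Sketch of the suggested workaround: because evictions on different nodes
// can be up to ~1 minute apart, wait until the tx is no longer reported by
// any node's pool before resending it. Endpoints and tx hash are placeholders.
public class ResendAfterEviction {
  static boolean stillPooled(HttpClient client, String node, String txHash) throws Exception {
    String body = "{\"jsonrpc\":\"2.0\",\"method\":\"txpool_besuTransactions\",\"params\":[],\"id\":1}";
    HttpRequest req = HttpRequest.newBuilder(URI.create(node))
        .header("Content-Type", "application/json")
        .POST(HttpRequest.BodyPublishers.ofString(body))
        .build();
    String resp = client.send(req, HttpResponse.BodyHandlers.ofString()).body();
    return resp.toLowerCase().contains(txHash.toLowerCase()); // crude check on the raw JSON
  }

  public static void main(String[] args) throws Exception {
    String[] nodes = {"http://localhost:8545", "http://localhost:8546"}; // assumed endpoints
    String txHash = "0x..."; // hash of the evicted transaction
    HttpClient client = HttpClient.newHttpClient();
    boolean pooledSomewhere = true;
    while (pooledSomewhere) {
      pooledSomewhere = false;
      for (String node : nodes) {
        pooledSomewhere |= stillPooled(client, node, txHash);
      }
      if (pooledSomewhere) Thread.sleep(10_000); // re-check every 10 seconds
    }
    // Safe to resend the transaction now (e.g. via eth_sendRawTransaction).
  }
}
```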
Let me review the branch and let you know the outcome. Thank you @fab-10
Hello @fab-10. I was able to test the scenario and now it's working 👍. I also considered the edge case you provided and was able to test it as well, and I have some doubts I'd like answered about how to proceed when we apply this Besu patch.
Great, I will merge the PR then.
At the moment there is no way to trigger the eviction manually.
Yes, it will be the same tx (same hash); otherwise, if any of the tx fields has changed, it will be a different tx, and it could replace the previous one according to the transaction replacement rules.
Yes, the validator could mine both, and the one in the node will be removed as soon as the new block is imported.
Correct: with this fix, if the tx has been evicted by all pools, then after re-sending it will be added again to every pool.
Introduction
In certain situations, a transaction that is added to a node's pool (of sequenced type) may not be propagated to the pools of its peer nodes, resulting in the transaction being present only in the receiving node's pool rather than being distributed across all nodes' pools.
Summary
We are using the following five-node setup for a private Besu network with the QBFT protocol:
Users connect to the non-validator node to send transactions, and it propagates them to the validator so that they are included in new blocks.
We start from a scenario where all the nodes have a transaction (let's call it “tx_a”) in their pools, as shown in the first image.
If the transaction is dropped from the nodes' pools for any reason, the pools of the nodes become empty, as we can see in the second image.
Then, if the transaction is sent again to the non-validator, it is added to its pool, but it is not added to the validator's pool, as we can see in the third image.
Detailed description
We have observed 2 problematic scenarios related to how the nodes handle their caches:
Both scenarios are problematic if the transaction is valid but the node dropped it from the pool, for any reason, before the transaction was mined, e.g. by exceeding the tx-pool-retention-hours limit. We were able to add that transaction again to the validator's pool only by restarting all the nodes and sending the transaction again.
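For context, the pool implementation and the retention limit are controlled by Besu configuration options. An illustrative TOML fragment follows; the values are assumptions for illustration, not the configuration used in this report.

```toml
# Illustrative fragment of a Besu TOML config (values are assumptions,
# not the configuration used in this report).
tx-pool="sequenced"            # use the sequenced transaction pool
tx-pool-retention-hours=13     # valid txs are evicted from the pool after this many hours
tx-pool-max-size=4096          # maximum number of txs kept in the pool
rpc-http-api=["ETH","NET","QBFT","TXPOOL"]  # TXPOOL enables the txpool_* inspection methods
```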
It seems to be a bug by which internal caches are not updated properly when transactions are dropped/evicted from the pool, preventing those transactions from being added to new blocks because they are neither propagated by the non-validator node nor accepted by the validator.
Versions
This issue occurs in the latest version of Besu at the time of this writing (v24.3.3). We believe that it occurs in previous versions as well.
Steps to reproduce
This is an explanation of how we were able to reproduce the bug:
1. Set up at least one non-validator node and at least one validator node with a QBFT network configuration.
3. Send a new transaction using a higher nonce than the expected one, so that a nonce gap is created and the transaction remains in the pools of both the non-validator and validator nodes (see the sketch after this list).
4. Force the transaction to be dropped from the pool (for example, by waiting for the tx-pool-retention-hours limit to expire).
5. Resend the transaction to the non-validator node, and you will be able to check that it does not appear in the validator's pool (it is neither propagated by the non-validator nor accepted by the validator, because it gets filtered out as an already seen transaction).
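A minimal sketch of step 3 using web3j, assuming a free-gas network; the endpoint URL, private key, target address and chain id are placeholders, not values from this report.

```java
import java.math.BigInteger;
import org.web3j.crypto.Credentials;
import org.web3j.crypto.RawTransaction;
import org.web3j.crypto.TransactionEncoder;
import org.web3j.protocol.Web3j;
import org.web3j.protocol.core.DefaultBlockParameterName;
import org.web3j.protocol.http.HttpService;
import org.web3j.utils.Numeric;

// Minimal sketch of step 3: send a tx whose nonce is higher than the next
// expected one, creating a nonce gap so the tx stays pending in the pools.
// Endpoint URL, private key, target address, chain id and gas values are placeholders.
public class SendNonceGapTx {
  public static void main(String[] args) throws Exception {
    Web3j web3 = Web3j.build(new HttpService("http://localhost:8545")); // non-validator node (assumed)
    Credentials credentials = Credentials.create("0x<private-key>");

    BigInteger nextNonce = web3
        .ethGetTransactionCount(credentials.getAddress(), DefaultBlockParameterName.LATEST)
        .send()
        .getTransactionCount();
    BigInteger gappedNonce = nextNonce.add(BigInteger.TEN); // deliberately skip nonces

    RawTransaction rawTx = RawTransaction.createEtherTransaction(
        gappedNonce,
        BigInteger.ZERO,                    // gas price (free-gas network assumed)
        BigInteger.valueOf(21_000),         // gas limit
        "0x000000000000000000000000000000000000dead",
        BigInteger.ZERO);                   // value

    long chainId = 1337;                    // assumed chain id from genesis.json
    byte[] signed = TransactionEncoder.signMessage(rawTx, chainId, credentials);
    String txHash = web3.ethSendRawTransaction(Numeric.toHexString(signed)).send()
        .getTransactionHash();
    System.out.println("Submitted tx with nonce gap: " + txHash);
  }
}
```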
Expected behavior
✅ = already happening
❌ = not happening
Nodes execution arguments
Nodes config
Non-validator node config
Validator node config
genesis.json