Silent failure to broadcast anchor tx, then crash #67
After a night's reflection, I've noticed a pattern: several times now I've seen issues that feel like "lightningd is confused about keeping a TCP connection in sync with a lightning channel over that connection." The user interface treats the two together ('connect' accomplishes both), but especially in error cases or across daemon restarts, the daemon seems to have trouble keeping all of its state machines in sync with each other.
There is no minimum required depth for the funds used to create a channel, so even 0-conf should work. In fact, that's how I open most of my channels. Not broadcasting is strange; do you have any indication in the logs of where it fails? Maybe we could add a JSON-RPC call to rebroadcast or extract funding transactions, so we can trigger/release them manually.
Just had the same crash after attempting to disconnect a node that was unreachable, and in state
Turns out it was a bit different. Calling
OK, I have a patch which annotates the state information so getpeers will give you more of an idea of what's happening. But: STATE_OPEN_WAIT_FOR_OPEN_WITHANCHOR means we're actually waiting for them to send the OPEN packet! You shouldn't see that state for more than 1 RTT. That's why we reverted to STATE_INIT on restart: we hadn't received anything from the peer. Though clearly we didn't do anything useful there either... The reconnect assert is another bug. I'll look at that too! Thanks!
Interesting. @rustyrussell, does this mean the reason I never broadcast an anchor is that I was waiting for the remote host to send the OPEN packet? It's interesting that this state can persist: does this imply that we got a TCP connection, sent a packet through it, and then ... silence? It seems like this ought to resolve quickly one way or another.
I don't think we have regular pings, so it could be a TCP connection getting stuck forever.
@cdecker I'll open another bug report for yours, one sec.
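For illustration only, here is a minimal C sketch of the kind of application-level liveness check the comment above says is missing. It assumes a peer that has been silent for a while gets a ping, and is dropped if nothing at all arrives within a further grace period. All names (struct peer_liveness, liveness_on_rx(), liveness_check()) and both timeout values are hypothetical, not lightningd's API. Times are passed in as plain seconds so the logic can be exercised without a real clock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical tuning values: probe after 60s of silence,
 * give up if the probe goes unanswered for another 30s. */
#define PING_AFTER_SECS 60
#define DEAD_AFTER_SECS 30

struct peer_liveness {
	uint64_t last_rx;       /* time of the last message received from the peer */
	bool ping_outstanding;  /* have we sent a ping that is still unanswered? */
	uint64_t ping_sent;     /* when that ping was sent (valid if outstanding) */
};

enum liveness_action { LIVENESS_OK, LIVENESS_SEND_PING, LIVENESS_DISCONNECT };

/* Call on every inbound message: any traffic proves the peer is alive. */
static void liveness_on_rx(struct peer_liveness *l, uint64_t now)
{
	l->last_rx = now;
	l->ping_outstanding = false;
}

/* Call periodically from a timer; tells the caller what to do next. */
static enum liveness_action liveness_check(struct peer_liveness *l, uint64_t now)
{
	assert(now >= l->last_rx); /* expects a monotonic clock */

	if (l->ping_outstanding) {
		/* Silence even after probing: treat the connection as stuck. */
		if (now - l->ping_sent >= DEAD_AFTER_SECS)
			return LIVENESS_DISCONNECT;
		return LIVENESS_OK;
	}
	if (now - l->last_rx >= PING_AFTER_SECS) {
		l->ping_outstanding = true;
		l->ping_sent = now;
		return LIVENESS_SEND_PING;
	}
	return LIVENESS_OK;
}
```

With something like this, a peer stuck in a state such as STATE_OPEN_WAIT_FOR_OPEN_WITHANCHOR would at least be disconnected and reported instead of waiting silently forever. Kernel TCP keepalives could serve a similar purpose, but an application-level ping also catches a peer whose TCP stack is alive while its daemon is wedged.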
db_forget_peer() was harmless, but we haven't been entered into the database yet anyway, and it asserted that we should have been STATE_CLOSED. Closes: #67 Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
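The commit message above suggests the fix is to skip the database step for a peer that was never written to the database. A minimal C sketch of that guard follows, assuming each peer records whether it has been persisted. The in_db flag, forget_peer() and db_forget_peer_stub() are hypothetical. The stub only mimics the STATE_CLOSED assertion the commit describes and is not the real db_forget_peer() signature.

```c
#include <assert.h>
#include <stdbool.h>

enum peer_state {
	STATE_INIT,
	STATE_OPEN_WAIT_FOR_OPEN_WITHANCHOR,
	STATE_CLOSED,
};

struct peer {
	enum peer_state state;
	bool in_db; /* hypothetical: has this peer been written to the database? */
};

static int db_deletes; /* counts stub "database" deletions */

/* Hypothetical stand-in for the database call: like the original,
 * it insists the peer has already reached STATE_CLOSED. */
static void db_forget_peer_stub(struct peer *p)
{
	assert(p->state == STATE_CLOSED);
	db_deletes++;
	p->in_db = false;
}

/* Forget a peer, touching the database only if it was ever persisted.
 * A peer still waiting for OPEN was never stored, so there is nothing
 * to delete and the STATE_CLOSED assertion must not be reached. */
static void forget_peer(struct peer *p)
{
	if (p->in_db)
		db_forget_peer_stub(p);
	/* ...free in-memory peer state here... */
}
```

The design point is that the assertion stays: a persisted peer must still be closed before it is deleted. Only the path for never-persisted peers changes.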
This time I tried 'connect'ing to the cat picture server using an input tx that had only one confirmation. I don't know whether that was the cause, but I ended up with a connection and no channel, and lightningd never said anything about broadcasting an anchor tx.
(First thing that seems like a bug: if lightningd is not opening a channel because the tx I supplied isn't deep enough yet, it would be helpful if it told me. This protocol is very fiddly, so good error reporting is key. There wasn't anything in the log either.)
At this point 'getpeers' reported:
After a couple more confirmations on the input tx, and nothing further from lightningd, I killed and restarted it. Still no anchor tx, but now 'getpeers' yielded:
So I decided to run 'connect' again just to see what I would get, yielding finally:
So I guess I ended up in an inconsistent state and then went bang. That part definitely seems like a bug!