-
Notifications
You must be signed in to change notification settings - Fork 880
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix panic in drivers/overlay/encryption.go #2462
Conversation
@dani-docker this is part 1 of the solution. |
drivers/overlay/encryption.go
Outdated
rIP := net.ParseIP(rIPs) | ||
return updateNodeKey(lIP, aIP, rIP, spis, d.keys, newIdx, priIdx, delIdx), false | ||
}) | ||
|
||
if (newKey != nil && newIdx == -1) || | ||
(primary != nil && priIdx == -1) || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If my understanding is correct, there is mismatch between driver.key and driver.secMap.node[].spi when an error happens here. The driver.key is already appended in line 466, and that is only change to the driver state.
I wonder if it is better to revert driver.key state when there is error here? That way driver.key and driver.secMap.node[[].spi are kept consistent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@suwang48404 if we revert the append
we will not be able to find the key
in the slice when we get an update to delete the key
, resulting in a similar error :)
Issue - "index out of range" panic in drivers/overlay/encryption.go:539 due to a mismatch in indices between curKeys and spis due to case where updateKeys might bail out due to an error and not update the spis Fix - Reconfigure keys when there is a key update failure Signed-off-by: Arko Dasgupta <arko.dasgupta@docker.com>
98623d5
to
4420ee9
Compare
@suwang48404 took your advice on reconfiguring keys in case of an update failure reducing the down time by 24hr (2 x 12h rotation intervals ) |
Set KeyRotationInterval to 20s
Subsequent Logs
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
PTAL @dani-docker @selansen @euanh |
@arkodg I can see how this takes care of the panic, but what about functionality? |
@dani-docker |
@arkodg My other concern, although we got rid of the error, but the encryption will fail for that node until the next rotation, ie: 12 hours, (assuming |
@arkodg as we discussed offline, I made a build to verify the fix and my concerns. Here are the logs:
First key exchange when the connection resumes:
This causes memberlist decryption errors, example:
Finally on the next key exchange after the connection resume, the newly added key became the primary and things get happy
The gossip fix should be simple, just reset the whole networkDB key ring (similar to the approach done to the driver keys) Finally, I went through the code and I don't see where we reprogram IPSec after the calling |
Had a chat with @arkodg and we decided to limit the scope of this PR to avoid the panic and eventually (after 1-2 key rotations, ie 12-14 hours) the encryption will resume correctly. In a different PR we will address the following use cases:
Approving this PR for now |
@dani-docker thanks ! |
ping @euanh @suwang48404 PTAL |
LGTM |
Issue - "index out of range" panic in drivers/overlay/encryption.go:539
due to a mismatch in indices between curKeys and spis due to
case where updateKeys might bail out due to an error and
not update the spis
Context - This issue can be hit when a worker node gets disconnected from the manager, for more than 1 key rotation interval (12h), that in itself its something to worry about, and even though the workloads get shifted to a new node, once the worker node reconnects, due to the issue mentioned above, causes dockerd to panic
Fix - Reconfigure keys when there is a key update failure
Signed-off-by: Arko Dasgupta arko.dasgupta@docker.com