-
Notifications
You must be signed in to change notification settings - Fork 32
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BUG] pot-stop, called from pot-start, might operate on the wrong jail. #202
Comments
Hi. Thanks again for reporting. I see two possible solutions:
I honestly prefer the first solution, to avoid further race conditions with post-stop and pre-start hooks. |
An additional observation related to this issue: I've seen a case where parallel cleanup of one pot destroyed an epair interface a second time, which had been re-created to be used by a new pot in the meantime - timing was unfortunate, so the new pot ended up running with its epair interface removed from it post-start. So yes, clean locking is the way forward here. |
@pizzamig I found another race in the kernel (13.0 and 13.1) Looking at the kernel sources, I find it unlikely that the code will race-free anytime soon (it looks more like a rewrite would be required). This means we need some really heavy handed locking. I created a script to trigger the race mentioned above using pot (this is similar to what happens when using nomad). We can use this to validate our locking solution: #!/bin/sh
POT="bin/pot"
CLONENUM=5
CLONES="$(jot $CLONENUM)"
if ! $POT show -p cloneme >/dev/null 2>/dev/null; then
$POT create -p cloneme -b 13.0 -t single -N public-bridge
$POT snapshot -p cloneme
fi
for i in $CLONES; do
if ! $POT show -p clone$i >/dev/null 2>/dev/null; then
$POT clone -p clone$i -P cloneme
$POT set-attr -A persistent -V no -p clone$i
$POT set-cmd -c "sleep $(jot -r 1 0.1 0.9)" -p clone$i
fi
done
sync
while true; do
for i in $CLONES; do
($POT start clone$i&
$POT stop clone$i)&
pids="$pids $!"
done
wait $pids
pids=""
done |
Thanks for reporting and for the script, it's a good way to reproduce it. |
I think the key is to clearly identify all resources belonging to one pot and then lock per pot/resource group. I can support at some point in April. |
this is how I envision the solution: first split Now, This solution should make the `/tmp/pot_stopped{_pname}' file redundant. |
I assume all of that would be per pot, right? I'm always a bit torn, as for some of these things having a daemon would make life easier (and also better in terms of performance), but losing the ability to chickly hack the scriptwork is very useful. I'll see if I have a chance to do some PoC work on this later in April. |
@pizzamig I can't believe that three months passed since - I started a review on the feature to finally push it forward. |
Describe the bug
Calls to
pot-stop <jailname>
from pot-start might operate on the wrong jail.This is part of why #200 fixed nomad problems, as a jail that was restarted, was partially dismantled by pot-stop called from pot-start. With #200 this is less of a problem, as each pot gets a truly unique name. It might be a problem with non-nomad jails though.
It seems like that this might also be the underlying reason why I perceived behavior that made me create #201.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
The text was updated successfully, but these errors were encountered: