CRIU hang on dump or restore #2032

Closed
biozit opened this issue Jan 3, 2023 · 10 comments · Fixed by #2602

@biozit

biozit commented Jan 3, 2023

I am trying to dump or restore any process (even a process that does not exist) using sudo or root access:

root@test:~#  criu dump -t 1211114 -vvvvv
(00.000005) Version: 3.17.1 (gitid 0)
(00.000026) Running on osdftest.t2.ucsd.edu Linux 5.4.0-135-generic #152-Ubuntu SMP Wed Nov 23 20:19:22 UTC 2022 x86_64
(00.000041) File /run/criu.kdat does not exist
(00.000065) sockets: Probing sock diag modules
(00.000128) sockets: Done probing
(00.027251) Pagemap is fully functional
(00.027305) Found anon-shmem device at 1
(00.027339) Hugetlb size 2 Mb is supported but cannot get dev's number
(00.027358) Hugetlb size 1024 Mb is supported but cannot get dev's number
(00.027376) Reset 46767's dirty tracking
(00.027461)  ... done
(00.027502) Dirty track supported on kernel
(00.027589) Found task size of 7ffffffff000

kernel:

root@test:~# uname -a
Linux  5.4.0-135-generic #152-Ubuntu SMP Wed Nov 23 20:19:22 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

There is no error message. It just hangs.

@avagin
Member

avagin commented Jan 3, 2023

Run it under strace and attach logs here:

$ strace -s 1024 -f -o strace.log criu dump -t XXX -vvvv 

@biozit
Author

biozit commented Jan 3, 2023

strace.log

@avagin
Member

avagin commented Jan 4, 2023

Could you try it with this patch?

diff --git a/criu/kerndat.c b/criu/kerndat.c
index 5b567e79f..a2d4e2a89 100644
--- a/criu/kerndat.c
+++ b/criu/kerndat.c
@@ -599,7 +599,7 @@ static int kerndat_loginuid(void)
 static int kerndat_iptables_has_xtlocks(void)
 {
        int fd;
-       char *argv[4] = { "sh", "-c", "iptables -w -L", NULL };
+       char *argv[4] = { "sh", "-c", "iptables -n -w -L", NULL };
 
        fd = open("/dev/null", O_RDWR);
        if (fd < 0) {

@biozit
Author

biozit commented Jan 4, 2023

Hi, now it is working! Thank you!

@biozit biozit closed this as completed Jan 5, 2023
@avagin avagin reopened this Jan 5, 2023
@avagin avagin self-assigned this Jan 5, 2023
@github-actions

github-actions bot commented Feb 5, 2023

A friendly reminder that this issue had no activity for 30 days.

@biozit
Author

biozit commented Feb 5, 2023

Please, could I help with this?

@github-actions

github-actions bot commented Mar 8, 2023

A friendly reminder that this issue had no activity for 30 days.

@hanwen-flow

I am seeing this too, but only on the first checkpoint after boot.

This is with

root@ip-172-31-7-139:~# criu --version
Version: 3.17.1
root@ip-172-31-7-139:~# uname -a
Linux ip-172-31-7-139 6.1.0-31-cloud-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.128-1 (2025-02-07) x86_64 GNU/Linux

The process isn't hung; it continues after about 40 seconds:

(00.030876) Found anon-shmem device at 1
(00.031605) Hugetlb size 2 Mb is supported but cannot get dev's number
(00.031614) Reset 834's dirty tracking
(00.031662)  ... done
(00.031687) Dirty track supported on kernel
(00.033115) Found task size of 7ffffffff000
(40.088859) Restoring netdev veth idx 10
(40.090884) Dumping netns links
(40.090911)     LD: Got link 1, type 772
(40.090914)     LD: Got link 10, type 1
(40.091813) vdso: Parsing at 7ffd545e8000 7ffd545ea000
(40.091821) vdso: PT_LOAD p_vaddr: 0

@hanwen-flow

This reproduces with the latest CRIU version, and the fix of adding '-n' to the iptables command worked.

@hanwen-flow

@adrianreber - is there anything against submitting the patch in #2032 (comment)?

avagin added a commit to avagin/criu that referenced this issue Feb 20, 2025
Resolving service names can be slow and it isn't needed here.

Fixes checkpoint-restore#2032

Signed-off-by: Andrei Vagin <avagin@google.com>
rst0git pushed a commit to rst0git/criu that referenced this issue Feb 20, 2025
When CRIU checks if the iptables command supports xtables locks, it
triggers hostname resolution, and this causes the dump/restore
commands to hang.

Fixes: checkpoint-restore#2032

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
rst0git pushed a commit that referenced this issue Feb 20, 2025
Resolving service names can be slow and it isn't needed here.

Fixes #2032

Signed-off-by: Andrei Vagin <avagin@google.com>