xtest 1013 hangs in normal world #2737
HiKey960, default config.
Aha, so this is not caused by virtualization. I noticed that IBART fails in #2370 and was looking for the cause.
@jforissier does this default config include lockdep debug? Because with that option enabled I am seeing the following in qemu-v8:
No, it doesn't.
This error does not mean that a locking problem has been detected. Here, there is not enough heap memory to allocate a buffer for lockdep to record a call stack. Unless there is a memory leak in the lockdep code, you may need to increase the heap size.
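As an aside, here is a minimal sketch of what such a recording path can look like (illustrative only, not the actual OP-TEE lockdep code; the struct, names and depth below are assumptions): the checker records a call stack on each lock acquisition, and that recording needs a heap allocation, so a small heap makes it fail without any locking problem having been found.

```c
#include <stdlib.h>
#include <string.h>

#define CALL_STACK_DEPTH 16	/* hypothetical maximum recorded depth */

struct lockdep_stack {
	size_t depth;
	void *pc[CALL_STACK_DEPTH];
};

/*
 * Record the call stack seen when a lock is acquired. A NULL return
 * only means the heap is exhausted, not that a deadlock was detected.
 */
static struct lockdep_stack *record_stack(void *const *frames, size_t depth)
{
	struct lockdep_stack *s = malloc(sizeof(*s));

	if (!s)
		return NULL;
	if (depth > CALL_STACK_DEPTH)
		depth = CALL_STACK_DEPTH;
	s->depth = depth;
	memcpy(s->pc, frames, depth * sizeof(*frames));
	return s;
}
```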
Yes, I already found that... I haven't debugged it further, but this fault happens on the 2nd or 3rd run of
Could be. This code has not been stressed a lot; I have essentially used it on single runs of xtest. Could it be that we create and destroy mutexes dynamically somewhere? IIRC I did not bother with cleaning things up when a mutex is destroyed :/
Looks like mbedtls does this all the way; at least I can see lots of such calls. I just wanted to suggest trying to enable lockdep, because it looks like a deadlock. By the way, I just encountered a similar issue with virtualization support enabled, but in my case it got locked in test 2002. Can't reproduce it on vanilla QEMU, though.
I don't think those functions do anything at all in our configuration.
Indeed, but unfortunately lockdep found nothing wrong :/ The issue might be with synchronization objects other than mutexes (condvars, for instance)...
I suspect there is a race condition when we load the TAs. Both 1013 and 2002 load TAs concurrently.
I enabled mutex debug and got the following log:
Each time a temporary bignum is allocated, it's taken from this pool. Only one thread at a time can use the pool, so it's important to release all references to the pool when the work is done in order to allow another thread to start using the pool. It would be interesting to know what's in
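To make that concrete, here is a hedged sketch of such a single-owner pool (POSIX threads purely for illustration; this is not the actual mbedtls or OP-TEE code and the names are made up): the owner thread accumulates references, and only when the count drops back to zero can another thread take the pool.

```c
#include <pthread.h>

struct temp_pool {
	pthread_mutex_t mu;
	pthread_cond_t cv;
	pthread_t owner;
	int owned;           /* 0 when nobody owns the pool */
	unsigned int refc;   /* temporaries currently held by the owner */
};

/* Take a temporary bignum from the pool, becoming (or staying) the owner. */
void pool_get(struct temp_pool *p)
{
	pthread_mutex_lock(&p->mu);
	/* Wait until the pool is free or already owned by this thread. */
	while (p->owned && !pthread_equal(p->owner, pthread_self()))
		pthread_cond_wait(&p->cv, &p->mu);
	p->owner = pthread_self();
	p->owned = 1;
	p->refc++;
	pthread_mutex_unlock(&p->mu);
}

/* Release one temporary; when the last one goes, hand the pool over. */
void pool_put(struct temp_pool *p)
{
	pthread_mutex_lock(&p->mu);
	if (--p->refc == 0) {
		p->owned = 0;                /* all references released */
		pthread_cond_signal(&p->cv); /* another thread may now use the pool */
	}
	pthread_mutex_unlock(&p->mu);
}
```

In a scheme like this, pool_put must be called exactly once per pool_get; a missing put on any path leaves refc above zero and the next thread blocks forever, which matches the kind of hang being described here.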
Here you go:
backtrace:
The problem seems to be with thread 0, which doesn't release the memory pool. The refc.val suggests that it could be a leak of a few bigints, or perhaps there's some lock order issue (thread 0 trying to take the tee_ta_mutex while owning the memory pool).
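For reference, the suspected lock-order inversion has the classic AB-BA shape sketched below (illustrative only, with stand-in locks; not the real OP-TEE code): each thread holds one resource and waits for the other, so neither makes progress.

```c
#include <pthread.h>

/* Stand-ins for the two resources involved; not the real OP-TEE objects. */
static pthread_mutex_t pool_ownership = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t tee_ta_mutex = PTHREAD_MUTEX_INITIALIZER;

static void *thread0_fn(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&pool_ownership); /* owns the memory pool ... */
	pthread_mutex_lock(&tee_ta_mutex);   /* ... then blocks waiting here */
	pthread_mutex_unlock(&tee_ta_mutex);
	pthread_mutex_unlock(&pool_ownership);
	return NULL;
}

static void *thread1_fn(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&tee_ta_mutex);   /* owns tee_ta_mutex ... */
	pthread_mutex_lock(&pool_ownership); /* ... then blocks too: deadlock */
	pthread_mutex_unlock(&pool_ownership);
	pthread_mutex_unlock(&tee_ta_mutex);
	return NULL;
}
```

Lockdep is designed to flag exactly this pattern, which is why enabling it was suggested earlier in the thread.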
I'm a bit concerned by the logic there. I suspect that there may be a race between this:
and this:
Assume that thread 0 owns the pool, and thread 1 then calls the code above. Obviously, this is not the case for this particular issue, because we have refcount == 4. But it is a possible race, right?
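A hedged sketch of the kind of check-then-act race being suggested (illustrative only; the follow-up comments conclude the real code is protected, so this is just the suspected pattern, not OP-TEE's actual logic): if the ownership check and the take are not done under the same lock, two threads can both decide the pool is free.

```c
#include <stdbool.h>
#include <pthread.h>

static pthread_t pool_owner;
static bool pool_owned;

/* BROKEN on purpose: the check and the take are not one atomic step. */
static bool pool_try_get_racy(void)
{
	if (pool_owned)          /* non-atomic read of the ownership flag */
		return false;
	/*
	 * Between the check above and the writes below another thread can
	 * pass the same check, so two threads may both believe they just
	 * took ownership of the pool.
	 */
	pool_owned = true;
	pool_owner = pthread_self();
	return true;
}
```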
You're right, there's a race.
I don't think so, there is
@jforissier, you're right, my mistake.
@jenswi-linaro, I'm reproducing this on my setup with Xen on QEMU, but you can reproduce this without Xen. You need to invoke QEMU with 1 core:
I added some more tracing and found that this deadlock is caused by test 4007:
So, this is yet another issue.
Thanks, now I'm able to reproduce it.
I can confirm that. But I'm not sure if it is related to
It's the same with
I think I've found the problem. A fix is in #2747.
@jforissier, did you try that fix on a HiKey? I can confirm that #2747 fixes the issue with xtest 2002. However, I can see that IBART still fails on xtest 1013 in #2370. But I can't reproduce that with QEMU either with
(I'm using HiKey960, not HiKey.) Yes, I tried it and could not reproduce the issue, but prior to that I also had a hard time reproducing it this morning even without #2747 :( so I cannot really be sure. I will leave xtest running in a loop tonight.
I could be wrong and this comment isn't adding much value to the discussion as such, but this might be a very old issue that we're dealing with here. We have a separate Google spreadsheet from 2016 named "
@jbech-linaro IIRC that issue was resolved in linaro-swg/gen_rootfs#23, although we've moved away from https://github.com/linaro-swg/gen_rootfs. @jforissier can probably confirm. Not sure if there's something similar in buildroot's rootfs.
Nightly test successful: no issues in 850 runs in a row (xtest with GP tests).
Closing since I'm not able to reproduce.