
xtest 1013 hangs in normal world #2737

Closed
jforissier opened this issue Jan 14, 2019 · 29 comments

@jforissier
Contributor

root@HiKey960:/ xtest 1013
Test ID: 1013
Run test suite with level=0

TEE test application started with device [(null)]
######################################################
#
# regression+gp
#
######################################################

* regression_1013 Test concurency with concurrent TA
o regression_1013.1 Using small concurrency TA
o regression_1013.1.1 Busy loop repeat 10000
D/TC:? 0 tee_ta_init_pseudo_ta_session:276 Lookup pseudo TA e13010e0-2ae1-11e5-896a-0002a5d5c51b
*D/TC:? 2 __wq_rpc:39 sleep thread 2 0x3f04e120 -1
*D/TC:? 0 load_elf:823 Lookup user TA ELF e13010e0-2ae1-11e5-896a-0002a5d5c51b (early TA)
*D/TC:? 1 __wq_rpc:39 sleep thread 1 0x3f04e120 -1
*D/TC:? 0 load_elf:823 Lookup user TA ELF e13010e0-2ae1-11e5-896a-0002a5d5c51b (Secure Storage TA)
D/TC:? 0 load_elf:823 Lookup user TA ELF e13010e0-2ae1-11e5-896a-0002a5d5c51b (REE)
D/TC:? 0 load_elf_from_store:791 ELF load address 0x40005000
D/TC:? 0 tee_ta_init_user_ta_session:1013 Processing relocations in e13010e0-2ae1-11e5-896a-0002a5d5c51b
D/TC:? 0 __wq_rpc:39 wake  thread 1 0x3f04e120 -3
D/TC:? 0 __wq_rpc:39 wake  thread 2 0x3f04e120 -3
*D/TC:? 1 tee_ta_init_pseudo_ta_session:276 Lookup pseudo TA e13010e0-2ae1-11e5-896a-0002a5d5c51b
*D/TC:? 2 __wq_rpc:39 sleep thread 2 0x3f04e120 -1
*D/TC:? 0 __wq_rpc:39 sleep thread 0 0x3f04e120 -1
*D/TC:? 1 load_elf:823 Lookup user TA ELF e13010e0-2ae1-11e5-896a-0002a5d5c51b (early TA)
D/TC:? 1 load_elf:823 Lookup user TA ELF e13010e0-2ae1-11e5-896a-0002a5d5c51b (Secure Storage TA)
D/TC:? 1 load_elf:823 Lookup user TA ELF e13010e0-2ae1-11e5-896a-0002a5d5c51b (REE)
D/TC:? 1 __wq_rpc:39 sleep thread 1 0x3f06ff40 -2

OpenOCD:
> halt
hi3660.cpu0 cluster 0 core 0 multi core
hi3660.cpu1 cluster 0 core 1 multi core
target halted in AArch64 state due to debug-request, current mode: EL1H
cpsr: 0x60000085 pc: 0xffff0000080a1318
MMU: enabled, D-Cache: enabled, I-Cache: enabled
hi3660.cpu2 cluster 0 core 2 multi core
target halted in AArch64 state due to debug-request, current mode: EL1H
cpsr: 0x60000085 pc: 0xffff0000080a1318
MMU: enabled, D-Cache: enabled, I-Cache: enabled
hi3660.cpu3 cluster 0 core 3 multi core
target halted in AArch64 state due to debug-request, current mode: EL1H
cpsr: 0x60000085 pc: 0xffff0000080a1318
MMU: enabled, D-Cache: enabled, I-Cache: enabled
hi3660.cpu4 cluster 1 core 0 multi core
target halted in AArch64 state due to debug-request, current mode: EL1H
cpsr: 0x60000085 pc: 0xffff0000080a1318
MMU: enabled, D-Cache: enabled, I-Cache: enabled
hi3660.cpu5 cluster 1 core 1 multi core
target halted in AArch64 state due to debug-request, current mode: EL1H
cpsr: 0x60000085 pc: 0xffff0000080a1318
MMU: enabled, D-Cache: enabled, I-Cache: enabled
hi3660.cpu6 cluster 1 core 2 multi core
target halted in AArch64 state due to debug-request, current mode: EL1H
cpsr: 0x60000085 pc: 0xffff0000080a1318
MMU: enabled, D-Cache: enabled, I-Cache: enabled
hi3660.cpu7 cluster 1 core 3 multi core
target halted in AArch64 state due to debug-request, current mode: EL1H
cpsr: 0x60000085 pc: 0xffff0000080a1318
MMU: enabled, D-Cache: enabled, I-Cache: enabled
target halted in AArch64 state due to debug-request, current mode: EL1H
cpsr: 0x60000085 pc: 0xffff0000080a1318
MMU: enabled, D-Cache: enabled, I-Cache: enabled
>

$ aarch64-linux-gnu-addr2line -f -e linux/vmlinux 0xffff0000080a1318
cpu_do_idle
/home/jerome/work/optee_build_hikey960/linux/arch/arm64/mm/proc.S:60
@jforissier
Contributor Author

HiKey960, default config.

@lorc
Contributor

lorc commented Jan 14, 2019

Aha, so this is not caused by virtualization. I noticed that IBART fails in #2370 and was looking for the cause.

@lorc
Contributor

lorc commented Jan 14, 2019

@jforissier does this default config include lockdep debug? Because with that option enabled I am seeing the following in qemu-v8:

E/TC:? 0 do_range:229 error: res = 0xf0100001
E/TC:? 0 do_range:229 error: res = 0xf0100001
E/TC:? 0 do_range:229 error: res = 0xf0100001
E/TC:1 0 unw_get_kernel_stack:134 Out of memory
E/TC:1 0 lockdep_lock_acquire:85 lockdep: error 0xffff000c
E/TC:1 0 Panic at core/include/kernel/lockdep.h:86 <lockdep_lock_acquire>
E/TC:1 0 Call stack:
E/TC:1 0 0x000000000e1066fc
E/TC:1 0 0x000000000e10b494
E/TC:1 0 0x000000000e104b60
E/TC:1 0 0x000000000e104900
E/TC:1 0 0x000000000e160370
E/TC:1 0 0x000000000e159ef4
E/TC:1 0 0x000000000e15a5ac
E/TC:1 0 0x000000000e13f24c
E/TC:1 0 0x000000000e13fd98
E/TC:1 0 0x000000000e1402a8
E/TC:1 0 0x000000000e140368
E/TC:1 0 0x000000000e140894
E/TC:1 0 0x000000000e13ec94
E/TC:1 0 0x000000000e14932c
E/TC:1 0 0x000000000e14c22c
E/TC:1 0 0x000000000e14496c
E/TC:1 0 0x000000000e102080

(gdb) bt
#0 __do_panic (file=file@entry=0xe10f391 "core/include/kernel/lockdep.h", line=line@entry=86, func=func@entry=0xe10f309 <func.1904> "lockdep_lock_acquire", msg=msg@entry=0x0) at core/kernel/panic.c:33
#1 0x000000000e104bdc in lockdep_lock_acquire (graph=0xe112008 , id=236004352, owned=0xe112650 ) at core/include/kernel/lockdep.h:86
#2 mutex_lock_check (m=m@entry=0xe112400 <ltc_hash_mutex>) at core/arch/arm/kernel/mutex_lockdep.c:43
#3 0x000000000e10497c in __mutex_lock (fname=0x0, lineno=-1, m=0xe112400 <ltc_hash_mutex>) at core/arch/arm/kernel/mutex.c:25
#4 mutex_lock (m=m@entry=0xe112400 <ltc_hash_mutex>) at core/arch/arm/kernel/mutex.c:233
#5 0x000000000e16031c in find_hash (name=0xe13d6f9 "sha256") at core/lib/libtomcrypt/src/misc/crypt/crypt_find_hash.c:55
#6 0x000000000e159ea0 in tee_algo_to_ltc_hashindex (algo=, ltc_hashindex=ltc_hashindex@entry=0xe1718cc) at core/lib/libtomcrypt/src/tee_ltc_provider.c:152
#7 0x000000000e15a5d0 in crypto_hash_final (ctx=0xe126150, algo=, digest=digest@entry=0xe121310 "0$\021\016", len=len@entry=32) at core/lib/libtomcrypt/src/tee_ltc_provider.c:390
#8 0x000000000e13f22c in check_digest (h=0xe12d800) at core/arch/arm/kernel/ree_fs_ta.c:197
#9 ta_read (h=0xe12d800, data=, len=) at core/arch/arm/kernel/ree_fs_ta.c:232
#10 0x000000000e140290 in copy_to (state=0xe121310, dst=0xe121360, dst_size=4294967295, dst_offs=0, offs=, len=32) at core/arch/arm/kernel/elf_load.c:59
#11 0x000000000e1402fc in alloc_and_copy_to (p=p@entry=0xe171a18, state=state@entry=0xe123db0, offs=4296, len=896) at core/arch/arm/kernel/elf_load.c:77
#12 0x000000000e140828 in elf_load_body (state=0xe123db0, vabase=1074769920) at core/arch/arm/kernel/elf_load.c:588
#13 0x000000000e13ec28 in load_elf_from_store (utc=, ta_store=0xe138678 <__scattered_array_0ta_stores>, uuid=0xe121af0) at core/arch/arm/kernel/user_ta.c:797
#14 load_elf (utc=, uuid=0xe121af0) at core/arch/arm/kernel/user_ta.c:825
#15 tee_ta_init_user_ta_session (uuid=uuid@entry=0xe12d000, s=s@entry=0xe126410) at core/arch/arm/kernel/user_ta.c:1004
#16 0x000000000e1492d8 in tee_ta_init_session (open_sessions=0xe1264c8, sess=, uuid=0xe12d000, err=0xe171c24) at core/kernel/tee_ta_manager.c:511
#17 tee_ta_open_session (err=0xe171c24, err@entry=0xe171b84, sess=0xe171c28, sess@entry=0xe171b88, open_sessions=open_sessions@entry=0xe1264c8, uuid=uuid@entry=0xe12d000, clnt_id=clnt_id@entry=0xe12ab30,
cancel_req_to=0, cancel_req_to@entry=236104496, param=param@entry=0xe12b400) at core/kernel/tee_ta_manager.c:538
#18 0x000000000e14c1d8 in syscall_open_ta_session (dest=0x0, cancel_req_to=236104496, usr_param=0x40004e08, ta_sess=0xe171c28, ret_orig=0xe12b400) at core/tee/tee_svc.c:781

@jforissier
Contributor Author

@jforissier does this default config include lockdep debug?

No, it doesn't.

Because, with that option enabled I am seeing the following in qemu-v8

...
E/TC:1 0 unw_get_kernel_stack:134 Out of memory
...

This error does not mean that a locking problem has been detected.

Here, there is not enough heap memory to allocate a buffer for lockdep to record a call stack. Unless there is a memory leak in the lockdep code, you may need to increase the heap size.

@lorc
Contributor

lorc commented Jan 14, 2019

Yes, I already found that... I didn't debug it further, but this fault happens on the second or third run of xtest. So maybe there is a memory leak somewhere in the lockdep code, because with lockdep debug disabled I don't see any problems with insufficient memory.

@jforissier
Contributor Author

So maybe there is a memory leak somewhere in the lockdep code

Could be. This code has not been stressed a lot; I have essentially used it on single runs of xtest. Could it be that we create and destroy mutexes dynamically somewhere? IIRC I did not bother with cleaning things up when a mutex is destroyed :/
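
To illustrate what I mean by "cleaning things up" (purely a hypothetical sketch, not the actual lockdep code; the structure, function names and allocation scheme below are made up for the example): a lockdep-style checker typically allocates per-lock bookkeeping the first time a lock is acquired, and if nothing frees that bookkeeping when the mutex is destroyed, every dynamic create/destroy cycle leaks one node plus its recorded call stacks.

#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-lock bookkeeping, allocated on first acquire */
struct lock_node {
	uintptr_t lock_id;		/* identifies the mutex */
	struct lock_node *next;
	/* ... dependency edges, recorded call stacks, ... */
};

static struct lock_node *lock_graph;

static struct lock_node *node_for(uintptr_t lock_id)
{
	struct lock_node *n = lock_graph;

	for (; n; n = n->next)
		if (n->lock_id == lock_id)
			return n;

	/* First acquire of this lock: allocate bookkeeping */
	n = calloc(1, sizeof(*n));
	if (n) {
		n->lock_id = lock_id;
		n->next = lock_graph;
		lock_graph = n;
	}
	return n;
}

/* The cleanup that has to run when a mutex is destroyed; without it,
 * every dynamically created mutex leaves one node behind. */
static void forget_lock(uintptr_t lock_id)
{
	struct lock_node **p = &lock_graph;

	for (; *p; p = &(*p)->next) {
		if ((*p)->lock_id == lock_id) {
			struct lock_node *n = *p;

			*p = n->next;
			free(n);
			return;
		}
	}
}

If such nodes are kept for mutexes that no longer exist, repeated xtest runs would slowly eat the heap, which would match the unw_get_kernel_stack out-of-memory error above.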

@lorc
Contributor

lorc commented Jan 14, 2019

Looks like mbedtls does this all the time. At least I can see lots of calls to mbedtls_mutex_init()/mbedtls_mutex_free(). But anyway, I'm not sure that this is related to your issue.

I just wanted to suggest trying to enable lockdep, because it looks like a deadlock.

By the way, I just encountered a similar issue with virtualization support enabled, but in my case it got locked in test 2002. Can't reproduce it on vanilla QEMU, though.

@jforissier
Contributor Author

Looks like mbedtls does this all the time. At least I can see lots of calls to mbedtls_mutex_init()/mbedtls_mutex_free().

I don't think those functions do anything at all in our configuration.

I just wanted to suggest trying to enable lockdep, because it looks like a deadlock.

Indeed, but unfortunately lockdep found nothing wrong :/ The issue might be with synchronization objects other than mutexes (condvars, for instance)...

By the way, I just encountered a similar issue with virtualization support enabled, but in my case it got locked in test 2002.

I suspect there is a race condition when we load the TAs. Both 1013 and 2002 load TAs concurrently.

@lorc
Contributor

lorc commented Jan 15, 2019

I enabled mutex debug and got the following log:

* regression_2002 Concurrent stressing TCP iSocket API tests
o regression_2002.1 Stressing with 3 threads
D/TC:? 0 tee_ta_init_pseudo_ta_session:276 Lookup pseudo TA 873bcd08-c2c3-11e6-a937-d0bf9c45c61c
D/TC:? 0 load_elf:823 Lookup user TA ELF 873bcd08-c2c3-11e6-a937-d0bf9c45c61c (Secure Storage TA)
D/TC:? 1 __wq_rpc:37 sleep thread 1 0xe15f100 -1 core/kernel/tee_ta_manager.c:494
D/TC:? 0 load_elf:823 Lookup user TA ELF 873bcd08-c2c3-11e6-a937-d0bf9c45c61c (REE)
D/TC:? 0 load_elf_from_store:791 ELF load address 0x40005000
D/TC:? 0 tee_ta_init_user_ta_session:1013 Processing relocations in 873bcd08-c2c3-11e6-a937-d0bf9c45c61c
D/TC:? 0 __wq_rpc:37 wake  thread 1 0xe15f100 -3 core/kernel/tee_ta_manager.c:520
D/TC:? 1 tee_ta_init_pseudo_ta_session:276 Lookup pseudo TA 873bcd08-c2c3-11e6-a937-d0bf9c45c61c
D/TC:? 1 load_elf:823 Lookup user TA ELF 873bcd08-c2c3-11e6-a937-d0bf9c45c61c (Secure Storage TA)
D/TC:? 0 __wq_rpc:37 sleep thread 0 0xe15f100 -1 core/kernel/tee_ta_manager.c:100
D/TC:? 1 load_elf:823 Lookup user TA ELF 873bcd08-c2c3-11e6-a937-d0bf9c45c61c (REE)
D/TC:? 1 __wq_rpc:37 sleep thread 1 0xe17ff40 -2 lib/libutils/ext/mempool.c:87

@jenswi-linaro
Contributor

Each time a temporary bignum is allocated it's taken from this pool. Only one thread at a time can use the pool, so it's important to release all references to the pool when the work is done in order to allow another thread to start using the pool. It would be interesting to know what's in pool->owner. Usually this works well; this is the first time I've seen this error.
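
To illustrate the contract (just a sketch: only mempool_alloc()/mempool_free() from the backtrace above are assumed, and the refcounting behaviour is paraphrased in the comments):

#include <mempool.h>	/* mempool_alloc(), mempool_free(), struct mempool */

/*
 * Each mempool_alloc() takes a reference on the pool for the calling
 * thread and each mempool_free() drops one; the pool is only handed
 * over to the next waiting thread once the count reaches zero.
 */
static void balanced_user(struct mempool *pool)
{
	void *a = mempool_alloc(pool, 16);
	void *b = mempool_alloc(pool, 16);

	/* ... temporary bignum arithmetic ... */

	mempool_free(pool, b);
	mempool_free(pool, a);	/* last reference: another thread may now take the pool */
}

static void leaking_user(struct mempool *pool)
{
	void *a = mempool_alloc(pool, 16);

	(void)a;	/* leaked on purpose for the example */

	/*
	 * Missing mempool_free(pool, a): the refcount never reaches zero,
	 * pool->owner keeps pointing at this thread and every other thread
	 * sleeps forever waiting for the pool, which is what the
	 * "sleep thread 1 0xe17ff40 -2" line above looks like.
	 */
}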

@lorc
Contributor

lorc commented Jan 15, 2019

Here you go:

(gdb) p *pool
$1 = {size = 43008, last_offset = 1248, data = 236327648, release_mem = 0x0, mu = {spin_lock = 0, wq = {slh_first = 0x0}, state = -1, owner_id = -1}, cv = {spin_lock = 0, m = 0x0}, refc = {val = 4}, owner = 0}

backtrace:

(gdb) bt
#0  get_pool (pool=0xe17ff20) at lib/libutils/ext/mempool.c:87
#1  mempool_alloc (pool=0xe17ff20, size=size@entry=16) at lib/libutils/ext/mempool.c:148
#2  0x000000000e100134 in init (a=0xe187968 <stack_thread+14696>) at core/lib/libtomcrypt/src/mpi_desc.c:58
#3  0x000000000e12e6e4 in ltc_init_multi (a=0xe149bd1, a@entry=0xe187968 <stack_thread+14696>) at core/lib/libtomcrypt/src/math/multi.c:52
#4  0x000000000e130ff8 in rsa_exptmod (
    in=0xe17a244 " .\245\226\201ҕvgK\355\373{\032\375x\221Eh\262\317J\230\337$\363\062\206M\027@ͯ\307\062\274\227\001\023\331jJ\257\352\301/\306]\277n\221\001g\345&\361i[\017\247\354Yh\253ws\327\177\340\274?\311\002\071⋩\255\340\215\307\b\307F<\215\263\002\b\334x\200B\376\213\233\250#G\320\332D\243\027uIi{\211\301\327\362RA\006\235\375\350\343RGf>>,\323\061\326-\277\363\360\203\365lg\237\255\017{}", inlen=256, 
    out=0xe179f90 " ", outlen=0xe187a08 <stack_thread+14856>, which=0, key=0xe187c18 <stack_thread+15384>) at core/lib/libtomcrypt/src/pk/rsa/rsa_exptmod.c:87
#5  0x000000000e1319b8 in rsa_verify_hash_ex (
    sig=sig@entry=0xe17a244 " .\245\226\201ҕvgK\355\373{\032\375x\221Eh\262\317J\230\337$\363\062\206M\027@ͯ\307\062\274\227\001\023\331jJ\257\352\301/\306]\277n\221\001g\345&\361i[\017\247\354Yh\253ws\327\177\340\274?\311\002\071⋩\255\340\215\307\b\307F<\215\263\002\b\334x\200B\376\213\233\250#G\320\332D\243\027uIi{\211\301\327\362RA\006\235\375\350\343RGf>>,\323\061\326-\277\363\360\203\365lg\237\255\017{}", 
    siglen=siglen@entry=256, 
    hash=hash@entry=0xe17a224 "/\227v?(jQ\r\377q\245_q\a\271\262[\365>Ӓ\357\236-\001\307\373\033\323 .\245\226\201ҕvgK\355\373{\032\375x\221Eh\262\317J\230\337$\363\062\206M\027@ͯ\307\062\274\227\001\023\331jJ\257\352\301/\306]\277n\221\001g\345&\361i[\017\247\354Yh\253ws\327\177\340\274?\311\002\071⋩\255\340\215\307\b\307F<\215\263\002\b\334x\200B\376\213\233\250#G\320\332D\243\027uIi{\211\301\327\362RA\006\235\375\350\343RGf>>,\323\061\326-\277\363\360\203\365lg\237\255\017{}", hashlen=hashlen@entry=32, padding=1, hash_idx=3, saltlen=saltlen@entry=18446744073709551615, 
    stat=stat@entry=0xe187c08 <stack_thread+15368>, key=key@entry=0xe187c18 <stack_thread+15384>) at core/lib/libtomcrypt/src/pk/rsa/rsa_verify_hash.c:111
#6  0x000000000e12914c in crypto_acipher_rsassa_verify (algo=1879066672, key=key@entry=0xe187c90 <stack_thread+15504>, salt_len=salt_len@entry=-1, 
    msg=msg@entry=0xe17a224 "/\227v?(jQ\r\377q\245_q\a\271\262[\365>Ӓ\357\236-\001\307\373\033\323 .\245\226\201ҕvgK\355\373{\032\375x\221Eh\262\317J\230\337$\363\062\206M\027@ͯ\307\062\274\227\001\023\331jJ\257\352\301/\306]\277n\221\001g\345&\361i[\017\247\354Yh\253ws\327\177\340\274?\311\002\071⋩\255\340\215\307\b\307F<\215\263\002\b\334x\200B\376\213\233\250#G\320\332D\243\027uIi{\211\301\327\362RA\006\235\375\350\343RGf>>,\323\061\326-\277\363\360\203\365lg\237\255\017{}", msg_len=32, 
    sig=0xe17a244 " .\245\226\201ҕvgK\355\373{\032\375x\221Eh\262\317J\230\337$\363\062\206M\027@ͯ\307\062\274\227\001\023\331jJ\257\352\301/\306]\277n\221\001g\345&\361i[\017\247\354Yh\253ws\327\177\340\274?\311\002\071⋩\255\340\215\307\b\307F<\215\263\002\b\334x\200B\376\213\233\250#G\320\332D\243\027uIi{\211\301\327\362RA\006\235\375\350\343RGf>>,\323\061\326-\277\363\360\203\365lg\237\255\017{}", sig_len=256)
    at core/lib/libtomcrypt/src/tee_ltc_provider.c:951
#7  0x000000000e118b84 in shdr_verify_signature (shdr=0xe17a210, shdr@entry=0x0) at core/crypto/signed_hdr.c:72
#8  0x000000000e104324 in ta_open (uuid=0xe175120, h=0xe187e40 <stack_thread+15936>) at core/arch/arm/kernel/ree_fs_ta.c:104
#9  0x000000000e103724 in load_elf_from_store (utc=<optimized out>, ta_store=0xe149f70 <__scattered_array_0ta_stores>, uuid=0xe175120) at core/arch/arm/kernel/user_ta.c:707
#10 load_elf (utc=<optimized out>, uuid=0xe175120) at core/arch/arm/kernel/user_ta.c:825
#11 tee_ta_init_user_ta_session (uuid=uuid@entry=0xe187f60 <stack_thread+16224>, s=s@entry=0xe17b550) at core/arch/arm/kernel/user_ta.c:1004
#12 0x000000000e1176e8 in tee_ta_init_session (open_sessions=0xe15f050 <tee_open_sessions>, sess=<synthetic pointer>, uuid=0xe187f60 <stack_thread+16224>, err=0xe187f54 <stack_thread+16212>)
    at core/kernel/tee_ta_manager.c:519
#13 tee_ta_open_session (err=err@entry=0xe187f54 <stack_thread+16212>, sess=sess@entry=0xe187f58 <stack_thread+16216>, open_sessions=open_sessions@entry=0xe15f050 <tee_open_sessions>, 
    uuid=uuid@entry=0xe187f60 <stack_thread+16224>, clnt_id=clnt_id@entry=0xe187f70 <stack_thread+16240>, cancel_req_to=cancel_req_to@entry=4294967295, param=param@entry=0xe187fa8 <stack_thread+16296>)
    at core/kernel/tee_ta_manager.c:546
#14 0x000000000e112c98 in entry_open_session (smc_args=0xe188030 <stack_thread+16432>, num_params=6, arg=0xf001000) at core/arch/arm/tee/entry_std.c:304
#15 tee_entry_std (smc_args=0xe188030 <stack_thread+16432>) at core/arch/arm/tee/entry_std.c:549
#16 0x000000000e107094 in __thread_std_smc_entry (args=0xf001000) at core/arch/arm/kernel/thread.c:626

@jenswi-linaro
Contributor

The problem seems to be with thread 0, which doesn't release the memory pool. The refc.val suggests that it could be leakage of a few bigints, or perhaps there's some lock order issue (thread 0 trying to take the tee_ta_mutex while owning the memory pool).
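
The kind of lock order issue I mean would look roughly like this (purely a hypothetical interleaving, not taken from the actual code paths; get_pool()/put_pool() stand for the internal mempool ownership helpers and tee_ta_mutex is the mutex from tee_ta_manager.c):

#include <kernel/mutex.h>
#include <mempool.h>

/* Assumed declarations, for the illustration only */
extern struct mempool *mempool_default;
extern struct mutex tee_ta_mutex;
void get_pool(struct mempool *pool);
void put_pool(struct mempool *pool);

/* Thread 0, e.g. deep inside a crypto operation */
static void thread0_path(void)
{
	get_pool(mempool_default);	/* A: becomes the pool owner */
	mutex_lock(&tee_ta_mutex);	/* B: blocks, thread 1 already holds it */
	/* ... */
	mutex_unlock(&tee_ta_mutex);
	put_pool(mempool_default);
}

/* Thread 1, e.g. opening a TA session */
static void thread1_path(void)
{
	mutex_lock(&tee_ta_mutex);	/* B: taken first */
	get_pool(mempool_default);	/* A: sleeps forever because thread 0 owns the pool */
	/* ... */
	put_pool(mempool_default);
	mutex_unlock(&tee_ta_mutex);
}

With the two threads taking the pool and tee_ta_mutex in opposite order, neither can make progress.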

@lorc
Contributor

lorc commented Jan 15, 2019

I'm a bit concerned by the logic in get_pool() and put_pool().

I suspect that there may be a race between this:

	if (refcount_inc(&pool->refc)) {
		if (pool->owner == thread_get_id())
			return;
---->
		refcount_dec(&pool->refc);
	}

and this:

---->
	if (refcount_dec(&pool->refc)) {
		mutex_lock(&pool->mu);

		pool->owner = THREAD_ID_INVALID;
		condvar_signal(&pool->cv);
...

Assume that thread 0 owns the pool. Thread 1 calls get_pool() and increments the refcount. At the same time thread 0 calls put_pool() to put the last reference. So it calls refcount_dec(), but it does not enter the if () block because thread 1 is holding the other reference. Then thread 1 decreases the refcount back to 0. Now we have a pool with an owner, but without references.

Obviously, this is not the case for this particular issue, because we have refcount == 4. But it is a possible race, right?
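
One way to close that window (just a sketch, not necessarily how it should actually be fixed: refc is simplified to a plain counter protected by mu, and the struct is reduced to the fields involved) would be to do all of the owner/refcount bookkeeping under pool->mu:

#include <kernel/mutex.h>	/* mutex_*, condvar_* */
#include <kernel/thread.h>	/* thread_get_id(), THREAD_ID_INVALID */

struct pool_owner {
	struct mutex mu;
	struct condvar cv;
	size_t refc;		/* plain counter, protected by mu */
	int owner;		/* THREAD_ID_INVALID when free */
};

static void get_pool(struct pool_owner *pool)
{
	mutex_lock(&pool->mu);

	if (pool->owner == thread_get_id()) {
		/* Nested use by the thread that already owns the pool */
		pool->refc++;
	} else {
		/* Wait for the current owner to drop its last reference */
		while (pool->owner != THREAD_ID_INVALID)
			condvar_wait(&pool->cv, &pool->mu);
		pool->owner = thread_get_id();
		pool->refc = 1;
	}

	mutex_unlock(&pool->mu);
}

static void put_pool(struct pool_owner *pool)
{
	mutex_lock(&pool->mu);

	if (--pool->refc == 0) {
		pool->owner = THREAD_ID_INVALID;
		condvar_signal(&pool->cv);
	}

	mutex_unlock(&pool->mu);
}

Since the owner check and the refcount update happen under the same mutex, no thread can observe the pair half-updated.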

@jenswi-linaro
Contributor

You're right, there's a race.

@jenswi-linaro
Contributor

There's a fix for this race at #2741

Back to the error in this issue. How is this recreated? @lorc you seem to be able to do this with QEMU v8 and you're using MPA instead of the default MPI. Anything more?

@jforissier
Contributor Author

[...] you're using MPA instead of the default MPI.

I don't think so, there is mpi_desc.c in the call stack above (#2737 (comment)).

@jenswi-linaro
Contributor

@jforissier, you're right, my mistake.

@lorc
Contributor

lorc commented Jan 16, 2019

@jenswi-linaro, I'm reproducing this on my setup with Xen on QEMU. But you can reproduce it without Xen. You need to invoke QEMU with one core:

QEMU_SMP=1 make run-only.

I added some more tracing and found that this deadlock is caused by test 4007:

o regression_4007.14 Generate DSA key 512 bits
D/TC:? 0 tee_ta_get_session:197 ----
D/TC:? 0 tee_ta_try_set_busy:101 ---- ctx = 0xe174260
D/TC:? 0 tee_ta_clear_busy:145 ---- ctx = 0xe174260
D/TC:? 0 tee_ta_put_session:168 ---- ses = 0xe1742c0
D/TC:? 0 tee_ta_get_session:197 ----
D/TC:? 0 tee_ta_try_set_busy:101 ---- ctx = 0xe174260
D/TC:? 0 put_pool:100 refcnt = 26
D/TC:? 0 put_pool:100 refcnt = 26
D/TC:? 0 put_pool:100 refcnt = 27
D/TC:? 0 put_pool:100 refcnt = 26
D/TC:? 0 put_pool:100 refcnt = 26
[....]
D/TC:? 0 put_pool:100 refcnt = 8
D/TC:? 0 put_pool:100 refcnt = 7
D/TC:? 0 put_pool:100 refcnt = 6
D/TC:? 0 put_pool:100 refcnt = 5
D/TC:? 0 tee_ta_clear_busy:145 ---- ctx = 0xe174260
D/TC:? 0 tee_ta_put_session:168 ---- ses = 0xe1742c0
D/TC:? 0 tee_ta_get_session:197 ----
D/TC:? 0 tee_ta_try_set_busy:101 ---- ctx = 0xe174260
D/TC:? 0 tee_ta_clear_busy:145 ---- ctx = 0xe174260
D/TC:? 0 tee_ta_put_session:168 ---- ses = 0xe1742c0
D/TC:? 0 tee_ta_get_session:197 ----
D/TC:? 0 tee_ta_try_set_busy:101 ---- ctx = 0xe174260
D/TC:? 0 tee_ta_clear_busy:145 ---- ctx = 0xe174260
D/TC:? 0 tee_ta_put_session:168 ---- ses = 0xe1742c0
D/TC:? 0 tee_ta_get_session:197 ----
D/TC:? 0 tee_ta_try_set_busy:101 ---- ctx = 0xe174260
D/TC:? 0 tee_ta_clear_busy:145 ---- ctx = 0xe174260
D/TC:? 0 tee_ta_put_session:168 ---- ses = 0xe1742c0
D/TC:? 0 tee_ta_get_session:197 ----
D/TC:? 0 tee_ta_try_set_busy:101 ---- ctx = 0xe174260
D/TC:? 0 tee_ta_clear_busy:145 ---- ctx = 0xe174260
D/TC:? 0 tee_ta_put_session:168 ---- ses = 0xe1742c0
D/TC:? 0 tee_ta_get_session:197 ----
D/TC:? 0 tee_ta_try_set_busy:101 ---- ctx = 0xe174260
D/TC:? 0 tee_ta_clear_busy:145 ---- ctx = 0xe174260
D/TC:? 0 tee_ta_put_session:168 ---- ses = 0xe1742c0
D/TC:? 0 tee_ta_get_session:197 ----
D/TC:? 0 tee_ta_try_set_busy:101 ---- ctx = 0xe174260
D/TC:? 0 tee_ta_clear_busy:145 ---- ctx = 0xe174260
D/TC:? 0 tee_ta_put_session:168 ---- ses = 0xe1742c0
  regression_4007.14 OK

So this is yet another issue.

@jenswi-linaro
Contributor

Thanks, now I'm able to reproduce it.

@lorc
Contributor

lorc commented Jan 16, 2019

I can confirm that
while true; do xtest 4007; xtest 2002; done
leads to a deadlock. xtest 4007 corrupts the state of the mempool and xtest 2002 acts as a canary.

But I'm not sure if it is related to xtest 1013.

@jenswi-linaro
Contributor

It's the same with xtest 1013 when I'm testing. I'm starting to suspect memory leakage in mbedtls, but I have no evidence yet.

@jenswi-linaro
Contributor

I think I've found the problem. A fix in #2747

@lorc
Contributor

lorc commented Jan 17, 2019

@jforissier, did you try that fix on a HiKey? I can confirm that #2747 fixes the issue with xtest 2002.

However, I can see that IBART still fails on xtest 1013 in #2370. But I can't reproduce that with QEMU, either with QEMU_SMP=1 or with QEMU_SMP=4.

@jforissier
Contributor Author

jforissier commented Jan 17, 2019

@jforissier, did you try that fix on a HiKey? I can confirm that #2747 fixes the issue with xtest 2002.

(I'm using HiKey960, not HiKey.) Yes, I tried it and could not reproduce the issue, but prior to that I also had a hard time reproducing it this morning even without #2747 :( so I cannot really be sure. I will leave xtest running in a loop tonight.

@jbech-linaro
Contributor

I could be wrong here and my comment isn't adding much value to the discussion as such, but this might be a very old issue that we're dealing with. We have a separate Google spreadsheet from 2016 named "HiKey 1013 crash"; I've opened it up for public viewing, see here. From what I recall we never sorted this out back then, and to be honest I had forgotten about it until I stumbled upon it a couple of days ago when cleaning up my Google Drive.

@jenswi-linaro
Contributor

0x40004000 sounds suspiciously like something close to the temporary privileged mapping used while in user mode.

@vchong
Contributor

vchong commented Jan 18, 2019

@jbech-linaro IIRC that issue was resolved in linaro-swg/gen_rootfs#23, although we've moved away from https://github.com/linaro-swg/gen_rootfs. @jforissier can probably confirm. Not sure if there's something similar in Buildroot's rootfs.

@jforissier
Contributor Author

jforissier commented Jan 18, 2019

I will leave xtest running in a loop tonight.

Nightly test successful: no issues in 850 runs in a row (xtest with GP tests).

@jforissier
Contributor Author

Closing since I'm not able to reproduce.
