xtest 1013 hangs in normal world #2737
HiKey960, default config.
Aha, so this is not caused by virtualization. I noticed that IBART fails in #2370 and was looking for the cause.
@jforissier does this default config include lockdep debug? Because with that option enabled I am seeing the following in qemu-v8:
No, it doesn't.
This error does not mean that a locking problem has been detected. Here, there is not enough heap memory to allocate a buffer for lockdep to record a call stack. Unless there is a memory leak in the lockdep code, you may need to increase the heap size.
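As an aside, here is a minimal sketch of what such a recording path can look like (illustrative only, not the actual OP-TEE lockdep code; the struct, names and depth below are assumptions): the checker records a call stack on each lock acquisition, and that recording needs a heap allocation, so a small heap makes it fail without any locking problem having been found.

```c
#include <stdlib.h>
#include <string.h>

#define CALL_STACK_DEPTH 16	/* hypothetical maximum recorded depth */

struct lockdep_stack {
	size_t depth;
	void *pc[CALL_STACK_DEPTH];
};

/*
 * Record the call stack seen when a lock is acquired. A NULL return
 * only means the heap is exhausted, not that a deadlock was detected.
 */
static struct lockdep_stack *record_stack(void *const *frames, size_t depth)
{
	struct lockdep_stack *s = malloc(sizeof(*s));

	if (!s)
		return NULL;
	if (depth > CALL_STACK_DEPTH)
		depth = CALL_STACK_DEPTH;
	s->depth = depth;
	memcpy(s->pc, frames, depth * sizeof(*frames));
	return s;
}
```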
Yes, I already found that... I haven't debugged it further, but this fault happens on the 2nd or 3rd run of
Could be. This code has not been stressed a lot; I have essentially used it on single runs of xtest. Could it be that we create and destroy mutexes dynamically somewhere? IIRC I did not bother with cleaning things up when a mutex is destroyed :/
Looks like mbedtls does this all the way; at least I can see lots of such calls. I just wanted to suggest trying to enable lockdep, because it looks like a deadlock. By the way, I just encountered a similar issue with virtualization support enabled, but in my case it got locked in test 2002. Can't reproduce it on vanilla QEMU, though.
I don't think those functions do anything at all in our configuration.
Indeed, but unfortunately lockdep found nothing wrong :/ The issue might be with synchronization objects other than mutexes (condvars, for instance)...
I suspect there is a race condition when we load the TAs. Both 1013 and 2002 load TAs concurrently.
I enabled mutex debug and got the following log:
Each time a temporary bignum is allocated, it's taken from this pool. Only one thread at a time can use the pool, so it's important to release all references to the pool when the work is done in order to allow another thread to start using the pool. It would be interesting to know what's in
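To make that concrete, here is a hedged sketch of such a single-owner pool (POSIX threads purely for illustration; this is not the actual mbedtls or OP-TEE code and the names are made up): the owner thread accumulates references, and only when the count drops back to zero can another thread take the pool.

```c
#include <pthread.h>

struct temp_pool {
	pthread_mutex_t mu;
	pthread_cond_t cv;
	pthread_t owner;
	int owned;           /* 0 when nobody owns the pool */
	unsigned int refc;   /* temporaries currently held by the owner */
};

/* Take a temporary bignum from the pool, becoming (or staying) the owner. */
void pool_get(struct temp_pool *p)
{
	pthread_mutex_lock(&p->mu);
	/* Wait until the pool is free or already owned by this thread. */
	while (p->owned && !pthread_equal(p->owner, pthread_self()))
		pthread_cond_wait(&p->cv, &p->mu);
	p->owner = pthread_self();
	p->owned = 1;
	p->refc++;
	pthread_mutex_unlock(&p->mu);
}

/* Release one temporary; when the last one goes, hand the pool over. */
void pool_put(struct temp_pool *p)
{
	pthread_mutex_lock(&p->mu);
	if (--p->refc == 0) {
		p->owned = 0;                /* all references released */
		pthread_cond_signal(&p->cv); /* another thread may now use the pool */
	}
	pthread_mutex_unlock(&p->mu);
}
```

In a scheme like this, pool_put must be called exactly once per pool_get; a missing put on any path leaves refc above zero and the next thread blocks forever, which matches the kind of hang being described here.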
Here you go:
backtrace:
The problem seems to be with thread 0, which doesn't release the memory pool. The refc.val suggests that it could be a leak of a few bigints, or perhaps there's some lock order issue (thread 0 trying to take the tee_ta_mutex while owning the memory pool).
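For reference, the suspected lock-order inversion has the classic AB-BA shape sketched below (illustrative only, with stand-in locks; not the real OP-TEE code): each thread holds one resource and waits for the other, so neither makes progress.

```c
#include <pthread.h>

/* Stand-ins for the two resources involved; not the real OP-TEE objects. */
static pthread_mutex_t pool_ownership = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t tee_ta_mutex = PTHREAD_MUTEX_INITIALIZER;

static void *thread0_fn(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&pool_ownership); /* owns the memory pool ... */
	pthread_mutex_lock(&tee_ta_mutex);   /* ... then blocks waiting here */
	pthread_mutex_unlock(&tee_ta_mutex);
	pthread_mutex_unlock(&pool_ownership);
	return NULL;
}

static void *thread1_fn(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&tee_ta_mutex);   /* owns tee_ta_mutex ... */
	pthread_mutex_lock(&pool_ownership); /* ... then blocks too: deadlock */
	pthread_mutex_unlock(&pool_ownership);
	pthread_mutex_unlock(&tee_ta_mutex);
	return NULL;
}
```

Lockdep is designed to flag exactly this pattern, which is why enabling it was suggested earlier in the thread.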
I'm a bit concerned by the logic there. I suspect that there may be a race between this:
and this:
Assume that thread 0 owns the pool, and thread 1 then calls the code above. Obviously, this is not the case for this particular issue, because we have refcount == 4. But it is a possible race, right?
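A hedged sketch of the kind of check-then-act race being suggested (illustrative only; the follow-up comments conclude the real code is protected, so this is just the suspected pattern, not OP-TEE's actual logic): if the ownership check and the take are not done under the same lock, two threads can both decide the pool is free.

```c
#include <stdbool.h>
#include <pthread.h>

static pthread_t pool_owner;
static bool pool_owned;

/* BROKEN on purpose: the check and the take are not one atomic step. */
static bool pool_try_get_racy(void)
{
	if (pool_owned)          /* non-atomic read of the ownership flag */
		return false;
	/*
	 * Between the check above and the writes below another thread can
	 * pass the same check, so two threads may both believe they just
	 * took ownership of the pool.
	 */
	pool_owned = true;
	pool_owner = pthread_self();
	return true;
}
```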
You're right, there's a race.
I don't think so, there is
@jforissier, you're right, my mistake.
@jenswi-linaro, I'm reproducing this on my setup with Xen on QEMU, but you can reproduce this without Xen. You need to invoke QEMU with 1 core:
I added some more tracing and found that this deadlock is caused by test 4007:
So, this is yet another issue.
Thanks, now I'm able to reproduce it.
I can confirm that. But I'm not sure if it is related to
It's the same with
I think I've found the problem. A fix is in #2747.
@jforissier, did you try that fix on a HiKey? I can confirm that #2747 fixes the issue with xtest 2002. However, I can see that IBART still fails on xtest 1013 in #2370. But I can't reproduce that with QEMU either with
(I'm using HiKey960, not HiKey.) Yes, I tried it and could not reproduce the issue, but prior to that I also had a hard time reproducing it this morning even without #2747 :( so I cannot really be sure. I will leave xtest running in a loop tonight.
I could be wrong and this comment isn't adding much value to the discussion as such, but this might be a very old issue that we're dealing with here. We have a separate Google spreadsheet from 2016 named "
@jbech-linaro IIRC that issue was resolved in linaro-swg/gen_rootfs#23, although we've moved away from https://github.com/linaro-swg/gen_rootfs. @jforissier can probably confirm. Not sure if there's something similar in buildroot's rootfs.
Nightly test successful: no issues in 850 runs in a row (xtest with GP tests).
Closing since I'm not able to reproduce.