Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OPTEE assertion fail in bpool when OPTEE_SMC_VM_CREATED is called a second time #7182

Closed
yuvraj1803 opened this issue Dec 16, 2024 · 10 comments

Comments

@yuvraj1803
Copy link
Contributor

I have compiled OPTEE with CFG_VIRTUALIZATION and the hypervisor sends a OPTEE_SMC_VM_CREATED when its booting up with a1 as HYP_CLNT_ID. But when a guest VM is being brought up, another request (OPTEE_SMC_VM_CREATED) is made, which brings OPTEE to a halt.

This is my OPTEE build config:
OPTEE_FLAGS ?= CFG_ASLR=n
CROSS_COMPILE=aarch64-linux-gnu-
CROSS_COMPILE_core=aarch64-linux-gnu-
CROSS_COMPILE_ta_arm64=aarch64-linux-gnu-
PLATFORM=vexpress-qemu_armv8a
O=out/arm/
CFG_USER_TA_TARGETS=ta_arm64
CFG_ARM64_core=y
CFG_TEE_CORE_LOG_LEVEL=3
CFG_CORE_ASLR=n
CFG_ARM_GICV3=y
CFG_NS_VIRTUALIZATION=y
DEBUG=1

This is the log from OPTEE's serial console:

D/TC:0 1 configure_guest_prtn_mem:216 TEE RAM: 0e755000
D/TC:0 1 configure_guest_prtn_mem:224 TA RAM: 0e87a000
D/TC:0 1 prepare_memory_map:161 New map (0e1db000):
D/TC:0 1 prepare_memory_map:164 T: TEE_RAM_RX rsz: 00001000, pa: 0e100000, va: 0e100000, sz: 0009b000 attr: 3851
D/TC:0 1 prepare_memory_map:164 T: TEE_RAM_RO rsz: 00001000, pa: 0e19b000, va: 0e19b000, sz: 00025000 attr: 3811
D/TC:0 1 prepare_memory_map:164 T: NEX_RAM_RW rsz: 00001000, pa: 0e1c0000, va: 0e1c0000, sz: 0001b000 attr: 3831
D/TC:0 1 prepare_memory_map:164 T: TEE_RAM_RW rsz: 00001000, pa: 0e755000, va: 0e1db000, sz: 00125000 attr: 3831
D/TC:0 1 prepare_memory_map:164 T: SEC_RAM_OVERALL rsz: 00001000, pa: 0e100000, va: 0e300000, sz: 00f00000 attr: 1831
D/TC:0 1 prepare_memory_map:164 T: SHM_VASPACE rsz: 00200000, pa: 00000000, va: 0f200000, sz: 02000000 attr: 0
D/TC:0 1 prepare_memory_map:164 T: RES_VASPACE rsz: 00200000, pa: 00000000, va: 11200000, sz: 00a00000 attr: 0
D/TC:0 1 prepare_memory_map:164 T: IO_SEC rsz: 00200000, pa: 08000000, va: 11c00000, sz: 01200000 attr: 831
D/TC:0 1 prepare_memory_map:164 T: NSEC_SHM rsz: 00200000, pa: 42000000, va: 12e00000, sz: 00200000 attr: 1031
D/TC:0 1 prepare_memory_map:164 T: TA_RAM rsz: 00001000, pa: 0e87a000, va: 1307a000, sz: 00325000 attr: 3831
D/TC:0 1 core_mmu_xlat_table_alloc:527 xlat tables used 1 / 8
D/TC:0 1 core_mmu_xlat_table_alloc:527 xlat tables used 2 / 8
D/TC:0 1 core_mmu_xlat_table_alloc:527 xlat tables used 3 / 8
D/TC:0 1 core_mmu_xlat_table_alloc:527 xlat tables used 4 / 8
D/TC:0 1 core_mmu_xlat_table_alloc:527 xlat tables used 5 / 8
E/TC:1 1 assertion 'poolset->freelist.ql.blink->ql.flink == &poolset->freelist' failed at lib/libutils/isoc/bget.c:1138
E/TC:1 1 Panic at core/kernel/assert.c:28 <_assert_break>
E/TC:1 1 TEE load address @ 0xe100000
E/TC:1 1 Call stack:
E/TC:1 1 0x0e10b020
E/TC:1 1 0x0e1283a4
E/TC:1 1 0x0e11f764
E/TC:1 1 0x0e1936f4
E/TC:1 1 0x0e194588
E/TC:1 1 0x0e195600
E/TC:1 1 0x0e1953d4
E/TC:1 1 0x0e1959c4
E/TC:1 1 0x0e10c03c
E/TC:1 1 0x0e10f318
E/TC:1 1 0x0e10f630
E/TC:1 1 0x0e10f3bc
E/TC:1 1 0x0e1074bc
I/TC: Halting CPU 0

@jforissier
Copy link
Contributor

Hi @yuvraj1803,

The call stack can be decoded with scripts/symbolize.py -d path/to/tee.elf, but whatever the stack, it looks like you have some memory corruption. Could it be that some some other piece of firmware is overwriting OP-TEE's memory?

@yuvraj1803
Copy link
Contributor Author

Hi @jforissier,

I was able to navigate through this from gdb. Nevertheless, here is the output of the script.

E/TC:1 1 TEE load address @ 0xe100000
E/TC:1 1 Call stack:
E/TC:1 1 0x0e10b020 print_kernel_stack at core/arch/arm/kernel/unwind_arm64.c:91
E/TC:1 1 0x0e1283a4 __do_panic at core/kernel/panic.c:75 (discriminator 32)
E/TC:1 1 0x0e11f764 _assert_break at core/kernel/assert.c:28
E/TC:1 1 0x0e1936f4 _assert_trap at lib/libutils/isoc/bget_malloc.c:19
E/TC:1 1 0x0e194588 bpool at lib/libutils/isoc/bget.c:1138 (discriminator 1)
E/TC:1 1 0x0e195600 raw_malloc_add_pool at lib/libutils/isoc/bget_malloc.c:934
E/TC:1 1 0x0e1953d4 gen_malloc_add_pool at lib/libutils/isoc/bget_malloc.c:872
E/TC:1 1 0x0e1959c4 malloc_add_pool at lib/libutils/isoc/bget_malloc.c:1009
E/TC:1 1 0x0e10c03c virt_guest_created at core/arch/arm/kernel/virtualization.c:336
E/TC:1 1 0x0e10f318 tee_entry_vm_created at core/arch/arm/tee/entry_fast.c:181
E/TC:1 1 0x0e10f630 __tee_entry_fast at core/arch/arm/tee/entry_fast.c:285
E/TC:1 1 0x0e10f3bc tee_entry_fast at core/arch/arm/tee/entry_fast.c:207
E/TC:1 1 0x0e1074bc thread_handle_fast_smc at core/arch/arm/kernel/thread_optee_smc.c:42

As I mentioned, this happens the "second" time I pass OPTEE_SMC_VM_CREATED. I am not completely sure if there is a memory corruption, given I am running TF-A and OPTEE out of the box. And moreover, nothing breaks unless I make that second SMC.

@yuvraj1803
Copy link
Contributor Author

yuvraj1803 commented Dec 16, 2024

A follow up, @jforissier, can the hypervisor make the same calls a guest can make to OPTEE? Do all OPTEE_SMC_* stand equivalent irrespective of where the request is coming from (hypervisor or guest)?

From what I was able to debug, assertion 'poolset->freelist.ql.blink->ql.flink == &poolset->freelist' fails because poolset->freelist.ql.blink is a value which is undesirable. I dug deeper.

When the first VM is initialized and makes its first STD call, control eventually flows to brel() (see backtrace below), which changes poolset->freelist.ql.blink (lib/libutils/isoc/bget.c:1006). So when the next VM is initialised and bpool() is called, this value is not the same as &poolset->freelist and hence the assertion fail.

BACKTRACE:

#0 brel (buf=0xe20dec0, poolset=0xe1de190 <malloc_ctx>, wipe=0) at lib/libutils/isoc/bget.c:1004
#1 0x000000000e1950e0 in raw_realloc (ptr=0xe20dec0, hdr_size=0, ftr_size=0, pl_size=128,
ctx=0xe1de190 <malloc_ctx>) at lib/libutils/isoc/bget_malloc.c:520
#2 0x000000000e19526c in realloc_unlocked (ctx=0xe1de190 <malloc_ctx>, ptr=0xe20dec0, size=128)
at lib/libutils/isoc/bget_malloc.c:817
#3 0x000000000e1952a8 in realloc (ptr=0xe20dec0, size=128) at lib/libutils/isoc/bget_malloc.c:825
#4 0x000000000e120ab4 in realloc_cached_node_array () at core/kernel/dt.c:619
#5 0x000000000e120b2c in add_cached_node (parent_offset=372, node_offset=416, address_cells=-1, size_cells=-1)
at core/kernel/dt.c:637
#6 0x000000000e120ce0 in add_cached_node_subtree (node_offset=372) at core/kernel/dt.c:673
#7 0x000000000e120d00 in add_cached_node_subtree (node_offset=328) at core/kernel/dt.c:678
#8 0x000000000e120d00 in add_cached_node_subtree (node_offset=284) at core/kernel/dt.c:678
#9 0x000000000e120d00 in add_cached_node_subtree (node_offset=0) at core/kernel/dt.c:678
#10 0x000000000e120e38 in init_node_cache_info (fdt=0xe1b6d00 <embedded_secure_dtb>) at core/kernel/dt.c:708
#11 0x000000000e120f54 in get_embedded_dt () at core/kernel/dt.c:738
#12 0x000000000e1206a8 in get_secure_dt () at core/kernel/dt.c:478
#13 0x000000000e123078 in probe_dt_drivers_early () at core/kernel/dt_driver.c:832
#14 0x000000000e12b3d4 in do_init_calls (type=0xe1b1520 "early_initcall",
begin=0xe1bea60 <__scattered_array_1early_initcall>, end=0xe1bea90 <__scattered_array_6finalcall>)
at core/kernel/initcall.c:20
#15 0x000000000e12b4b4 in call_early_initcalls () at core/kernel/initcall.c:46
#16 0x000000000e10abe8 in init_tee_runtime () at core/arch/arm/kernel/boot.c:902
#17 0x000000000e10c714 in virt_on_stdcall () at core/arch/arm/kernel/virtualization.c:554
#18 0x000000000e107e30 in __thread_std_smc_entry (a0=838860818, a1=1, a2=26976256, a3=0, a4=0, a5=0)
at core/arch/arm/kernel/thread_optee_smc.c:299
#19 0x000000000e103c0c in thread_std_smc_entry () at core/arch/arm/kernel/thread_optee_smc_a64.S:160

@jenswi-linaro
Copy link
Contributor

The hypervisor is supposed to filter and translate the SMCs as needed.

By the way, have you made any changes in the optee_os.git?

@yuvraj1803
Copy link
Contributor Author

Nope, no changes to OPTEE. You can have a look at my build system here: github.com/yuvraj1803/virt_tee.

(Please note, I have not committed the some changes, but anyway, optee has not been meddled with.)

@yuvraj1803
Copy link
Contributor Author

I was initially doubtful whether an SMC from the first VM (KVM host) was providing wrong arguments like incorrect physical address, etc.

But it looks like brel() gets called when the first std call is made, where OPTEE initialises the TEE runtime for that VM.

@yuvraj1803
Copy link
Contributor Author

After digging in even more, I believe the error was in my code making a yielding SMC with HYP_CLNT_ID, messing with malloc_ctx and corrupting it for it to be later copied into the VM's copy (__data_start to __data_end memcpy() in configure_guest_prtn_mem() ), leading to a panic.

Anyway, I believe the design of OPTEE is such that it fully trusts the hypervisor, maybe a tiny sanity check at yielding SMC reception should keep things looking good!

@jenswi-linaro
Copy link
Contributor

OP-TEE should not trust the hypervisor more than anything else in the normal world. It sounds like some fix is needed. Do you have a patch you can share or even make a pull request?

@yuvraj1803
Copy link
Contributor Author

Sure, @jenswi-linaro. I'll first put all this together (I'm sure it looks messy). Will send you the patch.

@jenswi-linaro
Copy link
Contributor

Thanks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants