
The 1GB VA constraint #7275

Open
markz-zhang opened this issue Feb 12, 2025 · 6 comments
@markz-zhang
Contributor

markz-zhang commented Feb 12, 2025

We hit an issue recently. Here is the configuration info:

  • CFG_LPAE_ADDR_SPACE_BITS = 38
  • CFG_WITH_PAGER=n
  • CFG_CORE_ASLR=n
  • CFG_CORE_FFA=y
  • CFG_CORE_SEL2_SPMC=y

The issue is that when OP-TEE boots and assigns VA ranges to the different memory regions, the overall VA range may span two 1GB regions.
See this debug log as an example:

VM 8001: D/TC:00    dump_mmap_table:925 type SHM_VASPACE  va 0x1fff000000..0x2000ffffff pa 0x00000000..0x01ffffff size 0x02000000 (pgdir)
VM 8001: D/TC:00    dump_mmap_table:925 type RES_VASPACE  va 0x2001200000..0x20031fffff pa 0x00000000..0x01ffffff size 0x02000000 (pgdir)
VM 8001: D/TC:00    dump_mmap_table:925 type IO_SEC       va 0x20035f4000..0x2003623fff pa 0x8189850000..0x818987ffff size 0x00030000 (smallpg)
VM 8001: D/TC:00    dump_mmap_table:925 type TA_RAM       va 0x2003624000..0x200521ffff pa 0x2005824000..0x200741ffff size 0x01bfc000 (smallpg)
VM 8001: D/TC:00    dump_mmap_table:925 type IO_SEC       va 0x200538f000..0x200540efff pa 0x80100000..0x8017ffff size 0x00080000 (smallpg)
VM 8001: D/TC:00    dump_mmap_table:925 type IO_SEC       va 0x200540f000..0x200541efff pa 0x08830000..0x0883ffff size 0x00010000 (smallpg)
VM 8001: D/TC:00    dump_mmap_table:925 type IO_SEC       va 0x200541f000..0x200541ffff pa 0x08800000..0x08800fff size 0x00001000 (smallpg)
VM 8001: D/TC:00    dump_mmap_table:925 type TEE_RAM_RO   va 0x2005420000..0x2005423fff pa 0x2005420000..0x2005423fff size 0x00004000 (smallpg)
VM 8001: D/TC:00    dump_mmap_table:925 type TEE_RAM_RX   va 0x2005424000..0x20054a3fff pa 0x2005424000..0x20054a3fff size 0x00080000 (smallpg)
VM 8001: D/TC:00    dump_mmap_table:925 type TEE_RAM_RW   va 0x20054a4000..0x2005823fff pa 0x20054a4000..0x2005823fff size 0x00380000 (smallpg)

As you can see, the VA range in the log above is 0x1fff000000 - 0x2005824000.
This causes an MMU translation fault when running xtest 1009.
Looking at the test code, the test case does two things:

  1. Send a command to TA and ask the TA to wait 2 seconds
  2. Create another thread to send a command to the same TA to cancel the wait

After some digging, we believe the root cause is that a change in the L1 table is not propagated to the other CPU cores.
OP-TEE uses a three-level page table, and the L1 (base) table is banked per CPU core.
If the entire VA range in OP-TEE fits within a single 1GB region, then all CPU cores' L1 tables are identical.
But if the VA range spans two 1GB regions, as in the example above (0x1fff000000 - 0x2005824000), creating a mapping at 0x1fff000000 requires modifying the L1 table.
So in the xtest case, when step #1 happens, a mapping for 0x1fff000000 is created in one CPU core's L1 table while the other cores are unaware of it.
When step #2 happens on a different core, the MMU translation fault occurs because that core has no mapping for 0x1fff000000.

Please correct me if anything above is wrong.
Also, if this is a known issue, please share the fix details and feel free to close this ticket.
Thanks.

@markz-zhang
Contributor Author

Pasting the MMU fault log:

VM 8001: E/TC:08 01
VM 8001: E/TC:08 01 Core data-abort at address 0x1fff00019c (translation fault)
VM 8001: E/TC:08 01  esr 0x96000005  ttbr0 0x200554c000   ttbr1 0x00000000   cidr 0x0
VM 8001: E/TC:08 01  cpu #8          cpsr 0x60000144
VM 8001: E/TC:08 01  x0  0000001fff00019c x1  0000000000000000
VM 8001: E/TC:08 01  x2  0000000000000000 x3  0000000000000000
VM 8001: E/TC:08 01  x4  0000000000000000 x5  0000000000000000
VM 8001: E/TC:08 01  x6  0000000000000000 x7  0000000000000000
VM 8001: E/TC:08 01  x8  0000000000000020 x9  0000002005567d10
VM 8001: E/TC:08 01  x10 0000000000000000 x11 0000000000000000
VM 8001: E/TC:08 01  x12 0000000000000000 x13 0000002005567c7b
VM 8001: E/TC:08 01  x14 0000000000000000 x15 0000000000000000
VM 8001: E/TC:08 01  x16 0000002005433b84 x17 0000000000000000
VM 8001: E/TC:08 01  x18 0000000000000000 x19 0000001fff000180
VM 8001: E/TC:08 01  x20 00000020054df100 x21 0000002005488922
VM 8001: E/TC:08 01  x22 0000000000000180 x23 0000000000000001
VM 8001: E/TC:08 01  x24 0000000000000180 x25 0000000000000000
VM 8001: E/TC:08 01  x26 0000000000000000 x27 0000000000000000
VM 8001: E/TC:08 01  x28 0000000000000000 x29 0000002005567ff0
VM 8001: E/TC:08 01  x30 0000002005429edc elr 0000002005429ee0
VM 8001: E/TC:08 01  sp_el0 0000002005567ff0
VM 8001: E/TC:08 01 TEE load address @ 0x2005424000
VM 8001: E/TC:08 01 Call stack:
VM 8001: E/TC:08 01  0x2005429ee0
VM 8001: E/TC:08 01 Panic 'unhandled pageable abort' at core/arch/arm/kernel/abort.c:582 <abort_handler>
VM 8001: E/TC:08 01 TEE load address @ 0x2005424000
VM 8001: E/TC:08 01 Call stack:
VM 8001: E/TC:08 01  0x200542ba74
VM 8001: E/TC:08 01  0x20054379a0
VM 8001: E/TC:08 01  0x200542ae84
VM 8001: E/TC:08 01  0x2005427834

@jenswi-linaro
Contributor

I'm surprised we haven't seen this before with ASLR enabled; looking at the code, I see how this can happen.
core_init_mmu_prtn_tee() initializes the per-CPU top translation table for the boot CPU and replicates it to the other top translation tables once. However, it doesn't add mappings for the "dynamic vaspace", that is, MEM_AREA_RES_VASPACE and MEM_AREA_SHM_VASPACE. Later, when something is mapped in, for instance, MEM_AREA_SHM_VASPACE, only the top translation table of the current CPU is updated, leaving the others unchanged even if it's a global mapping.

I think the best fix is to add entries in the top translation tables for the "dynamic vaspace" during boot so it's replicated before the other CPUs have started. We should be careful to only add translation tables needed for the per-cpu top translation tables to avoid wasting translation tables that might not be used.

Can you fix this problem or should I?

@markz-zhang
Contributor Author

markz-zhang commented Feb 12, 2025

Hi Jens, your explanation makes sense, but how do we add mappings for MEM_AREA_RES_VASPACE and MEM_AREA_SHM_VASPACE when OP-TEE boots? Unlike the other memory regions, which already have physical addresses, these two regions don't have physical addresses allocated at boot time.

@jenswi-linaro
Contributor

We'd map it with NULL entries in the lowest translation table.

@markz-zhang
Contributor Author

Oh, sounds good. Let me try to create a fix and test it in my development environment. If it works, I'll send out a patch for review. Thanks.

@jenswi-linaro
Contributor

Great, thanks!

markz-zhang added a commit to markz-zhang/optee_os that referenced this issue Feb 12, 2025
When optee boots, the initial mapping for MEM_AREA_RES_VASPACE and
MEM_AREA_SHM_VASPACE should be added into page tables and replicated to
all CPU cores too. This fixes an issue when the VA of
MEM_AREA_RES_VASPACE or MEM_AREA_SHM_VASPACE is not in a same 1GB region
with other memory regions.

Link: OP-TEE#7275
Signed-off-by: Mark Zhang <markz@nvidia.com>
markz-zhang added a commit to markz-zhang/optee_os that referenced this issue Feb 14, 2025
When optee boots, the initial mapping for MEM_AREA_RES_VASPACE and
MEM_AREA_SHM_VASPACE should be added into page tables and replicated to
all CPU cores too. This fixes an issue when the VA of
MEM_AREA_RES_VASPACE or MEM_AREA_SHM_VASPACE is not in a same 1GB region
with other memory regions.

Link: OP-TEE#7275
Signed-off-by: Mark Zhang <markz@nvidia.com>
Reviewed-by: Jens Wiklander <jens.wiklander@linaro.org>
jforissier pushed a commit that referenced this issue Feb 14, 2025
When optee boots, the initial mapping for MEM_AREA_RES_VASPACE and
MEM_AREA_SHM_VASPACE should be added into page tables and replicated to
all CPU cores too. This fixes an issue when the VA of
MEM_AREA_RES_VASPACE or MEM_AREA_SHM_VASPACE is not in a same 1GB region
with other memory regions.

Link: #7275
Signed-off-by: Mark Zhang <markz@nvidia.com>
Reviewed-by: Jens Wiklander <jens.wiklander@linaro.org>