
The 1GB VA constraint #7275

Open
markz-zhang opened this issue Feb 12, 2025 · 6 comments
@markz-zhang
Contributor

markz-zhang commented Feb 12, 2025

We hit an issue recently. Here is the configuration info:

  • CFG_LPAE_ADDR_SPACE_BITS = 38
  • CFG_WITH_PAGER=n
  • CFG_CORE_ASLR=n
  • CFG_CORE_FFA=y
  • CFG_CORE_SEL2_SPMC=y

The issue is that when OP-TEE boots and assigns VA ranges to the different memory regions, the overall VA range may span two 1GB regions.
See this debug log as an example:

VM 8001: D/TC:00    dump_mmap_table:925 type SHM_VASPACE  va 0x1fff000000..0x2000ffffff pa 0x00000000..0x01ffffff size 0x02000000 (pgdir)
VM 8001: D/TC:00    dump_mmap_table:925 type RES_VASPACE  va 0x2001200000..0x20031fffff pa 0x00000000..0x01ffffff size 0x02000000 (pgdir)
VM 8001: D/TC:00    dump_mmap_table:925 type IO_SEC       va 0x20035f4000..0x2003623fff pa 0x8189850000..0x818987ffff size 0x00030000 (smallpg)
VM 8001: D/TC:00    dump_mmap_table:925 type TA_RAM       va 0x2003624000..0x200521ffff pa 0x2005824000..0x200741ffff size 0x01bfc000 (smallpg)
VM 8001: D/TC:00    dump_mmap_table:925 type IO_SEC       va 0x200538f000..0x200540efff pa 0x80100000..0x8017ffff size 0x00080000 (smallpg)
VM 8001: D/TC:00    dump_mmap_table:925 type IO_SEC       va 0x200540f000..0x200541efff pa 0x08830000..0x0883ffff size 0x00010000 (smallpg)
VM 8001: D/TC:00    dump_mmap_table:925 type IO_SEC       va 0x200541f000..0x200541ffff pa 0x08800000..0x08800fff size 0x00001000 (smallpg)
VM 8001: D/TC:00    dump_mmap_table:925 type TEE_RAM_RO   va 0x2005420000..0x2005423fff pa 0x2005420000..0x2005423fff size 0x00004000 (smallpg)
VM 8001: D/TC:00    dump_mmap_table:925 type TEE_RAM_RX   va 0x2005424000..0x20054a3fff pa 0x2005424000..0x20054a3fff size 0x00080000 (smallpg)
VM 8001: D/TC:00    dump_mmap_table:925 type TEE_RAM_RW   va 0x20054a4000..0x2005823fff pa 0x20054a4000..0x2005823fff size 0x00380000 (smallpg)

As you can see, the VA range in the log above is 0x1fff000000 - 0x2005824000.
This causes an MMU translation fault when running xtest 1009.
Looking at the test code, the test case does two things:

  1. Send a command to TA and ask the TA to wait 2 seconds
  2. Create another thread to send a command to the same TA to cancel the wait

After some digging, we believe the root cause is that a change in the L1 table is not propagated to the other CPU cores.
OP-TEE uses a three-level page table, and the L1 (base) table is banked per CPU core.
If the entire VA range in OP-TEE fits within a single 1GB region, then all CPU cores' L1 tables are identical.
But if the VA range spans two 1GB regions, as in the example above (0x1fff000000 - 0x2005824000), creating a mapping at 0x1fff000000 requires modifying the L1 table.
So in the xtest case, when step #1 happens, a mapping for 0x1fff000000 is created in one CPU core's L1 table while the other cores are unaware of it.
When step #2 happens on a different core, the MMU translation fault occurs because that core has no mapping for 0x1fff000000.

Please correct me if anything above is wrong.
Also, if this is a known issue, please share the fix details and feel free to close this ticket.
Thanks.

@markz-zhang
Contributor Author

Pasting the MMU fault log:

VM 8001: E/TC:08 01
VM 8001: E/TC:08 01 Core data-abort at address 0x1fff00019c (translation fault)
VM 8001: E/TC:08 01  esr 0x96000005  ttbr0 0x200554c000   ttbr1 0x00000000   cidr 0x0
VM 8001: E/TC:08 01  cpu #8          cpsr 0x60000144
VM 8001: E/TC:08 01  x0  0000001fff00019c x1  0000000000000000
VM 8001: E/TC:08 01  x2  0000000000000000 x3  0000000000000000
VM 8001: E/TC:08 01  x4  0000000000000000 x5  0000000000000000
VM 8001: E/TC:08 01  x6  0000000000000000 x7  0000000000000000
VM 8001: E/TC:08 01  x8  0000000000000020 x9  0000002005567d10
VM 8001: E/TC:08 01  x10 0000000000000000 x11 0000000000000000
VM 8001: E/TC:08 01  x12 0000000000000000 x13 0000002005567c7b
VM 8001: E/TC:08 01  x14 0000000000000000 x15 0000000000000000
VM 8001: E/TC:08 01  x16 0000002005433b84 x17 0000000000000000
VM 8001: E/TC:08 01  x18 0000000000000000 x19 0000001fff000180
VM 8001: E/TC:08 01  x20 00000020054df100 x21 0000002005488922
VM 8001: E/TC:08 01  x22 0000000000000180 x23 0000000000000001
VM 8001: E/TC:08 01  x24 0000000000000180 x25 0000000000000000
VM 8001: E/TC:08 01  x26 0000000000000000 x27 0000000000000000
VM 8001: E/TC:08 01  x28 0000000000000000 x29 0000002005567ff0
VM 8001: E/TC:08 01  x30 0000002005429edc elr 0000002005429ee0
VM 8001: E/TC:08 01  sp_el0 0000002005567ff0
VM 8001: E/TC:08 01 TEE load address @ 0x2005424000
VM 8001: E/TC:08 01 Call stack:
VM 8001: E/TC:08 01  0x2005429ee0
VM 8001: E/TC:08 01 Panic 'unhandled pageable abort' at core/arch/arm/kernel/abort.c:582 <abort_handler>
VM 8001: E/TC:08 01 TEE load address @ 0x2005424000
VM 8001: E/TC:08 01 Call stack:
VM 8001: E/TC:08 01  0x200542ba74
VM 8001: E/TC:08 01  0x20054379a0
VM 8001: E/TC:08 01  0x200542ae84
VM 8001: E/TC:08 01  0x2005427834

@jenswi-linaro
Contributor

I'm surprised we haven't seen this before with ASLR enabled; looking at the code, I see how this can happen.
core_init_mmu_prtn_tee() initializes the per-CPU top translation table for the boot CPU and replicates it to the other top translation tables once. However, it doesn't add mappings for the "dynamic vaspace", that is, MEM_AREA_RES_VASPACE and MEM_AREA_SHM_VASPACE. Later, when something is mapped in, for instance, MEM_AREA_SHM_VASPACE, only the top translation table of the current CPU is updated, leaving the others unchanged even if it's a global mapping.

I think the best fix is to add entries in the top translation tables for the "dynamic vaspace" during boot so it's replicated before the other CPUs have started. We should be careful to only add translation tables needed for the per-cpu top translation tables to avoid wasting translation tables that might not be used.

Can you fix this problem or should I?

@markz-zhang
Contributor Author

markz-zhang commented Feb 12, 2025

Hi Jens, your explanation makes sense, but how do we add mappings for MEM_AREA_RES_VASPACE and MEM_AREA_SHM_VASPACE when OP-TEE boots? Unlike the other memory regions, which already have physical addresses, these two regions don't have physical addresses allocated at boot time.

@jenswi-linaro
Contributor

We'd map it with NULL entries in the lowest translation table.

@markz-zhang
Contributor Author

Oh, sounds good. Let me try to create a fix and test it in my development environment. If it works, I'll send out a patch for review. Thanks.

@jenswi-linaro
Contributor

Great, thanks!

markz-zhang added a commit to markz-zhang/optee_os that referenced this issue Feb 12, 2025
When optee boots, the initial mapping for MEM_AREA_RES_VASPACE and
MEM_AREA_SHM_VASPACE should be added into page tables and replicated to
all CPU cores too. This fixes an issue when the VA of
MEM_AREA_RES_VASPACE or MEM_AREA_SHM_VASPACE is not in a same 1GB region
with other memory regions.

Link: OP-TEE#7275
Signed-off-by: Mark Zhang <markz@nvidia.com>
markz-zhang added a commit to markz-zhang/optee_os that referenced this issue Feb 14, 2025
When optee boots, the initial mapping for MEM_AREA_RES_VASPACE and
MEM_AREA_SHM_VASPACE should be added into page tables and replicated to
all CPU cores too. This fixes an issue when the VA of
MEM_AREA_RES_VASPACE or MEM_AREA_SHM_VASPACE is not in a same 1GB region
with other memory regions.

Link: OP-TEE#7275
Signed-off-by: Mark Zhang <markz@nvidia.com>
Reviewed-by: Jens Wiklander <jens.wiklander@linaro.org>
jforissier pushed a commit that referenced this issue Feb 14, 2025
When optee boots, the initial mapping for MEM_AREA_RES_VASPACE and
MEM_AREA_SHM_VASPACE should be added into page tables and replicated to
all CPU cores too. This fixes an issue when the VA of
MEM_AREA_RES_VASPACE or MEM_AREA_SHM_VASPACE is not in a same 1GB region
with other memory regions.

Link: #7275
Signed-off-by: Mark Zhang <markz@nvidia.com>
Reviewed-by: Jens Wiklander <jens.wiklander@linaro.org>