
Commit 8ffe41b

authored Mar 15, 2021
Enable 128K virtual memory via external SPI SRAM (#6994)
Provides a transparently accessible additional block of RAM of 128K to 8MB by using an external SPI SRAM. This memory is managed using the UMM memory manager and can be used by the core as if it were internal RAM (albeit much slower to read or write).

The use case is data that is quite large but neither frequently accessed nor compute intensive. For example, the SSL buffers of 16K++ are a good fit for this, as are the contents of Strings (both to avoid main heap fragmentation and to allow Strings of >30KB).

A fully associative LRU cache is used to limit the SPI bus bottleneck, and background writeback is supported.

Uses a define in boards.txt to enable. If this value is not defined, none of the VM routines should be linked into user apps, so there should be no space penalty without it.

UMM `malloc` and `new` are modified to support internal and external heap regions. By default, everything comes from the standard heap, but a call to `ESP.setExternalHeap()` before the allocation (followed by a call to `ESP.resetHeap()`) will make the allocation come from external RAM. See the `virtualmem.ino` example for use.

If there is no external RAM installed, the `setExternalHeap` call is a no-op.

The String and BearSSL libraries have been modified to use this external RAM automatically.

Theory of Operation:

The Xtensa core generates a hardware exception (unrelated to C++ exceptions) when a load or store touches an address that is defined as invalid. The XTOS ROM routines capture the machine state and call a standard C exception handler routine (or the default one, which resets the system). We hook into this exception callback, decode the EXCVADDR (the address being accessed), and use the exception PC to read out the faulting instruction. We decode that instruction and simulate its behavior (i.e. either loading or storing some data to a register/external memory) and then return to the calling application.
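The fully associative LRU cache with writeback mentioned in the description can be pictured with the host-side sketch below. It is not the code from this commit: `BackingStore`, `LruCache`, and the 64-byte line size are assumptions made for illustration (8 lines of 64 bytes would match the "~.5KB" figure), and a plain vector stands in for the SPI SRAM so that transfers can be counted.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Assumed geometry: 8 lines of 64 bytes (about 0.5 KB of data). The line
// count comes from the description; the line size is a guess.
constexpr std::size_t kLineSize = 64;
constexpr std::size_t kLineCount = 8;

// Hypothetical stand-in for the external SPI SRAM. It counts line transfers
// so the effect of the cache is observable.
struct BackingStore {
    std::vector<uint8_t> mem;
    unsigned lineReads = 0;
    unsigned lineWrites = 0;

    explicit BackingStore(std::size_t bytes) : mem(bytes, 0) {
        assert(bytes % kLineSize == 0);
    }
    void readLine(uint32_t base, uint8_t* dst) {
        std::memcpy(dst, mem.data() + base, kLineSize);
        ++lineReads;
    }
    void writeLine(uint32_t base, const uint8_t* src) {
        std::memcpy(mem.data() + base, src, kLineSize);
        ++lineWrites;
    }
};

class LruCache {
public:
    explicit LruCache(BackingStore& store) : store_(store) {}

    uint8_t read8(uint32_t addr) { return lineFor(addr).data[addr % kLineSize]; }

    void write8(uint32_t addr, uint8_t value) {
        Line& line = lineFor(addr);
        line.data[addr % kLineSize] = value;
        line.dirty = true;  // written back lazily, on eviction or flush()
    }

    void flush() {
        for (Line& line : lines_) writeBack(line);
    }

private:
    struct Line {
        bool valid = false;
        bool dirty = false;
        uint32_t base = 0;
        uint64_t lastUse = 0;
        std::array<uint8_t, kLineSize> data{};
    };

    void writeBack(Line& line) {
        if (line.valid && line.dirty) {
            store_.writeLine(line.base, line.data.data());
            line.dirty = false;
        }
    }

    Line& lineFor(uint32_t addr) {
        assert(addr < store_.mem.size());
        const uint32_t base = static_cast<uint32_t>(addr - addr % kLineSize);
        // Fully associative: any line may hold any address, so search them all.
        for (Line& line : lines_) {
            if (line.valid && line.base == base) {
                line.lastUse = ++clock_;
                return line;
            }
        }
        // Miss: take an unused line if there is one, else the least recently used.
        Line* victim = &lines_[0];
        for (Line& line : lines_) {
            if (!line.valid) { victim = &line; break; }
            if (line.lastUse < victim->lastUse) victim = &line;
        }
        writeBack(*victim);
        store_.readLine(base, victim->data.data());
        victim->valid = true;
        victim->dirty = false;
        victim->base = base;
        victim->lastUse = ++clock_;
        return *victim;
    }

    BackingStore& store_;
    std::array<Line, kLineCount> lines_{};
    uint64_t clock_ = 0;
};
```

With this shape, repeated accesses to the same line cost one bus transfer, and a modified line only goes back over the bus when it is evicted or flushed.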
We use the hardware SPI interface to talk to an external SRAM/PSRAM, and implement a simple cache to minimize the number of times we need to go out over the (slow) SPI bus. The SPI is set up in a DIO mode which uses no more pins than normal SPI, but provides for ~2X faster transfers. SIO mode is also supported.

NOTE: This works fine for processor accesses, but cannot be used by any of the peripherals' DMA. For that, we'd need a real MMU.

Hardware Configuration (only use 3.3V compatible SRAMs!):

SPI byte-addressable SRAM/PSRAM: 23LC1024 or smaller
    CS   -> GPIO15
    SCK  -> GPIO14
    MOSI -> GPIO13
    MISO -> GPIO12
(Note these are GPIO numbers, not the Arduino Dxx pin names. Refer to your ESP8266 board schematic for the mapping of GPIO to pin.)

Higher density PSRAM (ESP-PSRAM64H/etc.) should work as well, but I'm still waiting on my chips so haven't done any testing. The biggest concern is their command set and functionality in DIO mode. If DIO mode isn't supported, then a fallback to SIO is possible.

This PR originated with code from @pvvx's esp8266web server at https://github.com/pvvx/esp8266web (licensed in the public domain) but doesn't resemble it much any more. Thanks, @pvvx!

Keep a list of the last 8 lines in RAM (~0.5KB of RAM) and use that to speed up things like memcpys and other operations where the source and destination addresses are inside VM RAM.

A custom set of SPI routines is used in the VM system for speed and code size (and because the core cannot be dependent on a library).

Because UMM manages RAM in 8-byte chunks, attempting to manage the entire 1M available space on a 1M PSRAM causes the block IDs to overflow, crashing things at some point. Limit the UMM allocation to only 256K in this case. The remaining space can be manually assigned to buffers/etc. managed by the application, not malloc()/free().
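The 256K limit follows from simple arithmetic if block IDs are 15 bits wide. That width is an assumption here (it is the value that makes 8-byte blocks come out at exactly 256K, and would correspond to a 16-bit block number with one bit used as a flag); the commit message itself only states the chunk size and the limit. The helper name `managedBytesFor` is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t kBlockBytes = 8;    // from the description: 8-byte chunks
constexpr uint32_t kBlockIdBits = 15;  // assumption, see the lead-in

// Largest region whose blocks can all be numbered with idBits-wide IDs.
constexpr uint32_t maxManagedBytes(uint32_t blockBytes, uint32_t idBits) {
    return blockBytes * (uint32_t{1} << idBits);
}

// Hypothetical helper: how much of a chip to hand to the allocator.
constexpr uint32_t managedBytesFor(uint32_t chipBytes) {
    return chipBytes < maxManagedBytes(kBlockBytes, kBlockIdBits)
               ? chipBytes
               : maxManagedBytes(kBlockBytes, kBlockIdBits);
}

// 8 bytes x 2^15 blocks = 262144 bytes = 256 KiB.
static_assert(maxManagedBytes(kBlockBytes, kBlockIdBits) == 256u * 1024u,
              "8 B x 2^15 = 256 KiB");
```

Under that assumption a 128K part fits entirely (16384 blocks), while a 1M part would need 131072 block IDs, four times more than 15 bits can number, so only the first 256K is given to the allocator.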
1 parent c720c0d commit 8ffe41b

12 files changed, +760 -77 lines changed
 

‎boards.txt

+140
Large diffs are not rendered by default.

‎cores/esp8266/Esp.cpp

+6 -22
@@ -29,7 +29,6 @@
 
 #include "coredecls.h"
 #include "umm_malloc/umm_malloc.h"
-// #include "core_esp8266_vm.h"
 #include <pgmspace.h>
 #include "reboot_uart_dwnld.h"
 
@@ -984,22 +983,11 @@ String EspClass::getSketchMD5()
     return result;
 }
 
-void EspClass::enableVM()
-{
-#ifdef UMM_HEAP_EXTERNAL
-    if (!vmEnabled)
-        install_vm_exception_handler();
-    vmEnabled = true;
-#endif
-}
-
 void EspClass::setExternalHeap()
 {
 #ifdef UMM_HEAP_EXTERNAL
-    if (vmEnabled) {
-        if (!umm_push_heap(UMM_HEAP_EXTERNAL)) {
-            panic();
-        }
+    if (!umm_push_heap(UMM_HEAP_EXTERNAL)) {
+        panic();
     }
 #endif
 }
@@ -1016,10 +1004,8 @@ void EspClass::setIramHeap()
 void EspClass::setDramHeap()
 {
 #if defined(UMM_HEAP_EXTERNAL) && !defined(UMM_HEAP_IRAM)
-    if (vmEnabled) {
-        if (!umm_push_heap(UMM_HEAP_DRAM)) {
-            panic();
-        }
+    if (!umm_push_heap(UMM_HEAP_DRAM)) {
+        panic();
     }
 #elif defined(UMM_HEAP_IRAM)
     if (!umm_push_heap(UMM_HEAP_DRAM)) {
@@ -1031,10 +1017,8 @@ void EspClass::setDramHeap()
 void EspClass::resetHeap()
 {
 #if defined(UMM_HEAP_EXTERNAL) && !defined(UMM_HEAP_IRAM)
-    if (vmEnabled) {
-        if (!umm_pop_heap()) {
-            panic();
-        }
+    if (!umm_pop_heap()) {
+        panic();
     }
 #elif defined(UMM_HEAP_IRAM)
     if (!umm_pop_heap()) {
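
The hunks above pair every heap selection with a later `resetHeap()`, and both sides panic when the push or pop fails. A minimal host-side sketch of that push/pop discipline follows. `HeapStack` and `ScopedHeap` are hypothetical names, the depth limit of 4 is arbitrary, and this is only a model of the selection state, not UMM's implementation.

```cpp
#include <cassert>
#include <cstddef>

enum class Heap { Dram, External };

// Hypothetical model of the selection state behind a push/pop heap API.
class HeapStack {
public:
    // Remember the current selection and switch to h. Fails when full.
    bool push(Heap h) {
        if (depth_ == kMaxDepth) return false;
        saved_[depth_++] = current_;
        current_ = h;
        return true;
    }
    // Restore the selection saved by the matching push. Fails when empty.
    bool pop() {
        if (depth_ == 0) return false;
        current_ = saved_[--depth_];
        return true;
    }
    Heap current() const { return current_; }

private:
    static constexpr std::size_t kMaxDepth = 4;  // arbitrary for this sketch
    Heap saved_[kMaxDepth] = {};
    std::size_t depth_ = 0;
    Heap current_ = Heap::Dram;
};

// Scope guard so that a selection cannot be left without its matching pop.
class ScopedHeap {
public:
    ScopedHeap(HeapStack& stack, Heap h) : stack_(stack), pushed_(stack.push(h)) {}
    ~ScopedHeap() {
        if (pushed_) {
            const bool popped = stack_.pop();
            assert(popped);
            (void)popped;
        }
    }
    ScopedHeap(const ScopedHeap&) = delete;
    ScopedHeap& operator=(const ScopedHeap&) = delete;
    bool ok() const { return pushed_; }

private:
    HeapStack& stack_;
    bool pushed_;
};
```

Because selections nest, code that temporarily switches to DRAM inside an external-heap region returns to the external heap, not to the default, when it is done.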

‎cores/esp8266/Esp.h

-10
@@ -221,13 +221,6 @@ class EspClass {
 #else
         uint32_t getCycleCount();
 #endif // !defined(CORE_MOCK)
-        /**
-         * @brief Installs VM exception handler to support External memory (Experimental)
-         *
-         * @param none
-         * @return none
-         */
-        void enableVM();
         /**
          * @brief Push current Heap selection and set Heap selection to DRAM.
          *
@@ -258,9 +251,6 @@ class EspClass {
          */
         void resetHeap();
     private:
-#ifdef UMM_HEAP_EXTERNAL
-        bool vmEnabled = false;
-#endif
         /**
          * @brief Replaces @a byteCount bytes of a 4 byte block on flash
          *

‎cores/esp8266/core_esp8266_main.cpp

+6
@@ -37,6 +37,7 @@ extern "C" {
 #include "flash_quirks.h"
 #include <umm_malloc/umm_malloc.h>
 #include <core_esp8266_non32xfer.h>
+#include "core_esp8266_vm.h"
 
 
 #define LOOP_TASK_PRIORITY 1
@@ -348,9 +349,14 @@ extern "C" void user_init(void) {
 
     cont_init(g_pcont);
 
+#if defined(UMM_HEAP_EXTERNAL)
+    install_vm_exception_handler();
+#endif
+
 #if defined(NON32XFER_HANDLER) || defined(MMU_IRAM_HEAP)
     install_non32xfer_exception_handler();
 #endif
+
 #if defined(MMU_IRAM_HEAP)
     umm_init_iram();
 #endif

‎cores/esp8266/core_esp8266_non32xfer.cpp

+2 -43
@@ -64,51 +64,10 @@ static
 IRAM_ATTR void non32xfer_exception_handler(struct __exception_frame *ef, int cause)
 {
   do {
-    /*
-      In adapting the public domain version, a crash would come or go away with
-      the slightest unrelated changes elsewhere in the function. Observed that
-      register a15 was used for epc1, then clobbered by `rsr.` I now believe a
-      "&" on the output register would have resolved the problem.
-
-      However, I have refactored the Extended ASM to reduce and consolidate
-      register usage and corrected the issue.
-
-      The positioning of the Extended ASM block (as early as possible in the
-      compiled function) is in part controlled by the immediate need for
-      output variable `insn`. This placement aids in getting excvaddr read as
-      early as possible.
-    */
     uint32_t insn, excvaddr;
-#if 1
-    {
-      uint32_t tmp;
-      __asm__ (
-        "rsr.excvaddr %[vaddr]\n\t" /* Read faulting address as early as possible */
-        "movi.n %[tmp], ~3\n\t" /* prepare a mask for the EPC */
-        "and %[tmp], %[tmp], %[epc]\n\t" /* apply mask for 32-bit aligned base */
-        "ssa8l %[epc]\n\t" /* set up shift register for src op */
-        "l32i %[insn], %[tmp], 0\n\t" /* load part 1 */
-        "l32i %[tmp], %[tmp], 4\n\t" /* load part 2 */
-        "src %[insn], %[tmp], %[insn]\n\t" /* right shift to get faulting instruction */
-        : [vaddr]"=&r"(excvaddr), [insn]"=&r"(insn), [tmp]"=&r"(tmp)
-        : [epc]"r"(ef->epc) :);
-    }
 
-#else
-    {
-      __asm__ __volatile__ ("rsr.excvaddr %0;" : "=r"(excvaddr):: "memory");
-      /*
-        "C" reference code for the ASM to document intent.
-        May also prove useful when issolating possible issues with Extended ASM,
-        optimizations, new compilers, etc.
-      */
-      uint32_t epc = ef->epc;
-      uint32_t *pWord = (uint32_t *)(epc & ~3);
-      uint64_t big_word = ((uint64_t)pWord[1] << 32) | pWord[0];
-      uint32_t pos = (epc & 3) * 8;
-      insn = (uint32_t)(big_word >>= pos);
-    }
-#endif
+    /* Extract instruction and faulting data address */
+    __EXCEPTION_HANDLER_PREAMBLE(ef, excvaddr, insn);
 
     uint32_t what = insn & LOAD_MASK;
     uint32_t valmask = 0;
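
The handler goes on to classify `insn` and emulate the access, as the commit message describes ("decode that instruction and simulate its behavior"). The host-side sketch below illustrates that step under stated assumptions: the opcode constants follow the Xtensa load/store encodings as I understand them (op0 in bits 0-3, the sub-opcode in bits 12-15, the data register in bits 4-7) and should be checked against the ISA manual; `Cpu`, `emulateAccess`, and the use of a plain byte vector in place of the external RAM are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed encodings (verify against the Xtensa ISA reference before reuse).
constexpr uint32_t kOpMask = 0x00f00fu;  // op0 field and sub-opcode field
constexpr uint32_t kL8UI   = 0x000002u;
constexpr uint32_t kL16UI  = 0x001002u;
constexpr uint32_t kL32I   = 0x002002u;
constexpr uint32_t kS8I    = 0x004002u;
constexpr uint32_t kS16I   = 0x005002u;
constexpr uint32_t kS32I   = 0x006002u;
constexpr uint32_t kL16SI  = 0x009002u;

// Hypothetical saved machine state: 16 address registers and the faulting PC.
struct Cpu {
    uint32_t a[16] = {};
    uint32_t pc = 0;
};

// Emulate one faulting load or store against `ram` at byte `offset`.
// Returns false if the instruction is not recognized or is out of range.
bool emulateAccess(uint32_t insn, uint32_t offset, Cpu& cpu, std::vector<uint8_t>& ram) {
    const unsigned reg = (insn >> 4) & 0xfu;  // register that receives or supplies data
    std::size_t width = 0;
    bool isStore = false;
    bool signExtend = false;
    switch (insn & kOpMask) {
        case kL8UI:  width = 1; break;
        case kL16UI: width = 2; break;
        case kL16SI: width = 2; signExtend = true; break;
        case kL32I:  width = 4; break;
        case kS8I:   width = 1; isStore = true; break;
        case kS16I:  width = 2; isStore = true; break;
        case kS32I:  width = 4; isStore = true; break;
        default: return false;
    }
    if (offset > ram.size() || ram.size() - offset < width) return false;

    if (isStore) {
        for (std::size_t i = 0; i < width; ++i) {
            ram[offset + i] = static_cast<uint8_t>(cpu.a[reg] >> (8 * i));  // little-endian
        }
    } else {
        uint32_t value = 0;
        for (std::size_t i = 0; i < width; ++i) {
            value |= static_cast<uint32_t>(ram[offset + i]) << (8 * i);
        }
        if (signExtend && (value & 0x8000u)) value |= 0xffff0000u;
        cpu.a[reg] = value;
    }
    cpu.pc += 3;  // step past the 24-bit instruction so it is not re-executed
    return true;
}
```

In the real handler the offset would be derived from EXCVADDR, and the data would travel through the cache and SPI routines instead of a vector.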

‎cores/esp8266/core_esp8266_non32xfer.h

+48
@@ -7,6 +7,54 @@ extern "C" {
 
 extern void install_non32xfer_exception_handler();
 
+
+/*
+ In adapting the public domain version, a crash would come or go away with
+ the slightest unrelated changes elsewhere in the function. Observed that
+ register a15 was used for epc1, then clobbered by `rsr.` I now believe a
+ "&" on the output register would have resolved the problem.
+
+ However, I have refactored the Extended ASM to reduce and consolidate
+ register usage and corrected the issue.
+
+ The positioning of the Extended ASM block (as early as possible in the
+ compiled function) is in part controlled by the immediate need for
+ output variable `insn`. This placement aids in getting excvaddr read as
+ early as possible.
+*/
+
+#if 0
+{
+  __asm__ __volatile__ ("rsr.excvaddr %0;" : "=r"(excvaddr):: "memory");
+  /*
+    "C" reference code for the ASM to document intent.
+    May also prove useful when issolating possible issues with Extended ASM,
+    optimizations, new compilers, etc.
+  */
+  uint32_t epc = ef->epc;
+  uint32_t *pWord = (uint32_t *)(epc & ~3);
+  uint64_t big_word = ((uint64_t)pWord[1] << 32) | pWord[0];
+  uint32_t pos = (epc & 3) * 8;
+  insn = (uint32_t)(big_word >>= pos);
+}
+#endif
+
+#define __EXCEPTION_HANDLER_PREAMBLE(ef, excvaddr, insn) \
+  { \
+    uint32_t tmp; \
+    __asm__ ( \
+      "rsr.excvaddr %[vaddr]\n\t" /* Read faulting address as early as possible */ \
+      "movi.n %[tmp], ~3\n\t" /* prepare a mask for the EPC */ \
+      "and %[tmp], %[tmp], %[epc]\n\t" /* apply mask for 32-bit aligned base */ \
+      "ssa8l %[epc]\n\t" /* set up shift register for src op */ \
+      "l32i %[insn], %[tmp], 0\n\t" /* load part 1 */ \
+      "l32i %[tmp], %[tmp], 4\n\t" /* load part 2 */ \
+      "src %[insn], %[tmp], %[insn]\n\t" /* right shift to get faulting instruction */ \
+      : [vaddr]"=&r"(excvaddr), [insn]"=&r"(insn), [tmp]"=&r"(tmp) \
+      : [epc]"r"(ef->epc) :); \
+  }
+
+
 #ifdef __cplusplus
 }
 #endif
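
The "C" reference code in the header reads two aligned words and shifts, because instructions may start at any byte while the code memory is read in aligned 32-bit words. A portable version of the same idea that can be tried on a host is sketched here; `fetchInsn` is a hypothetical name, `epc` is treated as a byte offset into a word array, and little-endian byte order is assumed, as in the reference code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Return the 32 bits that start at byte offset `epc` inside `words`, using
// only aligned 32-bit reads. The caller must provide one word of slack after
// the word that contains `epc`.
uint32_t fetchInsn(const uint32_t* words, std::size_t wordCount, uint32_t epc) {
    const std::size_t index = epc / 4;  // aligned word containing epc
    assert(index + 1 < wordCount);
    const uint64_t pair =
        (static_cast<uint64_t>(words[index + 1]) << 32) | words[index];
    const unsigned shift = (epc & 3u) * 8u;  // byte position within that word
    return static_cast<uint32_t>(pair >> shift);
}
```

The low 24 bits of the result hold a standard-width instruction, so a caller interested only in that can mask with `0xffffff`.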
