Unverified commit 8ffe41b7, authored by Earle F. Philhower, III, committed by GitHub

Enable 128K virtual memory via external SPI SRAM (#6994)

Provides a transparently accessible additional block of RAM of 128K to
8MB by using an external SPI SRAM.  This memory is managed using the UMM
memory manager and can be used by the core as if it were internal RAM
(albeit much slower to read or write).

The use case would be for things which are quite large but not
particularly frequently used or compute intensive.  For example, the SSL
buffers of 16K++ are a good fit for this, as are the contents of Strings
(both to avoid main heap fragmentation as well as allowing Strings of
>30KB).

A fully associative LRU cache is used to limit the SPI bus bottleneck,
and background writeback is supported.
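
As a rough illustration of the caching idea above, the following host-side sketch keeps cache lines in a linked list ordered from most to least recently used. A hit moves the line to the head and a miss recycles the tail. The names `MruCache`, `Line`, `kWays` and `kLineBytes` are hypothetical and chosen for this example only. Writeback and the SPI transfers are left out, so this is a simplified model rather than the code in this commit.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical host-side model of an MRU-ordered cache; not the core's code.
constexpr int kWays = 4;              // number of cache lines
constexpr uint32_t kLineBytes = 64;   // 16 words of 4 bytes per line
static_assert(kWays >= 2, "the list walk below assumes at least two lines");

struct Line {
    int32_t addr = -1;     // line-aligned address, -1 means "invalid"
    Line* next = nullptr;  // next less-recently-used line
};

struct MruCache {
    std::array<Line, kWays> lines;
    Line* head;            // most recently used line
    int misses = 0;

    MruCache() {
        for (int i = 0; i < kWays; ++i)
            lines[i].next = (i + 1 < kWays) ? &lines[i + 1] : nullptr;
        head = &lines[0];
    }

    // Returns true on a hit. In both cases the touched line ends up at the head.
    bool touch(uint32_t byteAddr) {
        const int32_t addr = static_cast<int32_t>(byteAddr & ~(kLineBytes - 1));
        if (head->addr == addr) return true;      // fast path: already MRU
        Line* prev = head;
        Line* cur = head->next;
        while (true) {
            if (cur->addr == addr) {              // hit further down the list
                prev->next = cur->next;           // unlink
                cur->next = head;                 // move to front
                head = cur;
                return true;
            }
            if (!cur->next) break;                // cur is the tail, i.e. the LRU
            prev = cur;
            cur = cur->next;
        }
        ++misses;                                 // miss: recycle the LRU line
        prev->next = nullptr;
        cur->next = head;
        cur->addr = addr;                         // a real cache would refill here
        head = cur;
        return false;
    }
};
```

With four lines, touching five distinct 64-byte regions evicts the one that was used least recently, while repeated accesses within one region stay on the fast path.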

Uses a define in boards.txt to enable.  If this value is not defined,
then none of the VM routines should be linked into user apps, so there
should be no space penalty without it.

UMM `malloc` and `new` are modified to support internal and external
heap regions.  By default, everything comes from the standard heap, but
a call to `ESP.setExternalHeap()` before the allocation (followed by a
call to `ESP.resetHeap()`) will make the allocation come from external
RAM.  See the `virtualmem.ino` example for use.
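
Because every `ESP.setExternalHeap()` needs a matching `ESP.resetHeap()`, one possible way to avoid forgetting the second call is a small scope guard. The sketch below is an assumption, not part of this commit: `MockEsp` is a hypothetical stand-in for the core's `ESP` object so the example builds on a host, and `ExternalHeapScope` is an illustrative helper name.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the core's ESP object (host build only).
enum class Heap { Dram, External };

struct MockEsp {
    std::vector<Heap> stack;  // pushed heap selections, innermost last
    Heap current() const { return stack.empty() ? Heap::Dram : stack.back(); }
    void setExternalHeap() { stack.push_back(Heap::External); }
    void resetHeap() { assert(!stack.empty()); stack.pop_back(); }
};

// Hypothetical RAII helper: selects the external heap for the lifetime of
// the object and restores the previous selection when it goes out of scope.
class ExternalHeapScope {
public:
    explicit ExternalHeapScope(MockEsp& esp) : esp_(esp) { esp_.setExternalHeap(); }
    ~ExternalHeapScope() { esp_.resetHeap(); }
    ExternalHeapScope(const ExternalHeapScope&) = delete;
    ExternalHeapScope& operator=(const ExternalHeapScope&) = delete;
private:
    MockEsp& esp_;
};
```

On a device the same pattern would wrap the real `ESP.setExternalHeap()` and `ESP.resetHeap()` calls around the allocation that should land in external RAM.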

If there is no external RAM installed, the `setExternalHeap` call is a
no-op.

The String and BearSSL libraries have been modified to use this external
RAM automatically.

Theory of Operation:

The Xtensa core generates a hardware exception (unrelated to C++
exceptions) when an address that's defined as invalid for load or store
is accessed.
The XTOS ROM routines capture the machine state and call a standard C
exception handler routine (or the default one which resets the system).

We hook into this exception callback and decode the EXCVADDR (the
address being accessed) and use the exception PC to read out the
faulting instruction. We decode that instruction and simulate its
behavior (i.e. either loading or storing some data to a
register/external memory) and then return to the calling application.
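
The fetch-and-decode step described above can be sketched on a host as follows. The mask and match constants are copied from `core_esp8266_vm.cpp` in this commit. The `Access` struct and the `fetchInsn` and `decode` helpers are hypothetical names for this example, and memory is modelled as an array of 32-bit words with `epc` as a byte offset into it. The real handler uses slightly different bit tests, so treat this as an illustration of the idea only.

```cpp
#include <cassert>
#include <cstdint>

// Constants copied from core_esp8266_vm.cpp in this commit.
constexpr uint32_t SHORT_MASK   = 0x000008u;
constexpr uint32_t LOAD_MASK    = 0x00f00fu;  // same bits as STORE_MASK
constexpr uint32_t L8UI_MATCH   = 0x000002u;
constexpr uint32_t L16UI_MATCH  = 0x001002u;
constexpr uint32_t L16SI_MATCH  = 0x009002u;
constexpr uint32_t L32I_MATCH   = 0x002002u;
constexpr uint32_t L32IN_MATCH  = 0x000008u;
constexpr uint32_t L32R_MATCH   = 0x000001u;
constexpr uint32_t S8I_MATCH    = 0x004002u;
constexpr uint32_t S16I_MATCH   = 0x005002u;
constexpr uint32_t S32I_MATCH   = 0x006002u;
constexpr uint32_t S32IN_MATCH  = 0x000009u;

// Hypothetical result type for this sketch.
struct Access {
    bool store = false;       // true for S8I/S16I/S32I/S32I.N
    bool signExtend = false;  // true for L16SI
    int bits = 0;             // 8, 16 or 32
    int reg = 0;              // register number from the t field
    int length = 0;           // instruction length in bytes (2 or 3)
};

// Instructions are 2 or 3 bytes long and may straddle a word boundary, so
// read two aligned words and shift the wanted bytes down to bit 0.
uint32_t fetchInsn(const uint32_t* mem, uint32_t epc) {
    const uint32_t* word = mem + (epc >> 2);
    const uint64_t both = (static_cast<uint64_t>(word[1]) << 32) | word[0];
    return static_cast<uint32_t>(both >> ((epc & 3) * 8));
}

// Returns false if the instruction is not one of the loads/stores above.
bool decode(uint32_t insn, Access& a) {
    a = Access{};
    a.length = (insn & SHORT_MASK) ? 2 : 3;
    a.reg = static_cast<int>((insn >> 4) & 0xf);
    const uint32_t op0 = insn & 0xf;
    if (op0 == L32IN_MATCH) { a.bits = 32; return true; }
    if (op0 == S32IN_MATCH) { a.bits = 32; a.store = true; return true; }
    if (op0 == L32R_MATCH)  { a.bits = 32; return true; }
    switch (insn & LOAD_MASK) {
        case L8UI_MATCH:  a.bits = 8;  return true;
        case L16UI_MATCH: a.bits = 16; return true;
        case L16SI_MATCH: a.bits = 16; a.signExtend = true; return true;
        case L32I_MATCH:  a.bits = 32; return true;
        case S8I_MATCH:   a.bits = 8;  a.store = true; return true;
        case S16I_MATCH:  a.bits = 16; a.store = true; return true;
        case S32I_MATCH:  a.bits = 32; a.store = true; return true;
        default:          return false;
    }
}
```

After decoding, the handler performs the access on behalf of the program and advances the saved PC by `length` bytes so execution resumes at the following instruction.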

We use the hardware SPI interface to talk to an external SRAM/PSRAM,
and implement a simple cache to minimize the amount of times we need
to go out over the (slow) SPI bus. The SPI is set up in a DIO mode
which uses no more pins than normal SPI, but provides for ~2X faster
transfers.  SIO mode is also supported.
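
To make the SPI side more concrete: the code in this commit sends the SRAM command and the address together as one 32-bit "address" word, using `0x03` for reads and `0x02` for writes, and maps the faulting address into the chip with a `0x1ffff` mask. The sketch below reproduces that arithmetic on a host. The helper names `spiWord` and `vmOffset` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Command bytes and mask as used in core_esp8266_vm.cpp in this commit.
constexpr uint32_t kCmdRead  = 0x03;
constexpr uint32_t kCmdWrite = 0x02;
constexpr uint32_t kVmMask   = 0x1ffff;  // 128 KiB window

// Hypothetical helper: offset inside the SRAM for a faulting virtual address.
constexpr uint32_t vmOffset(uint32_t excvaddr) { return excvaddr & kVmMask; }

// Hypothetical helper: command in the top byte, 24-bit address below it.
constexpr uint32_t spiWord(uint32_t cmd, uint32_t addr) {
    return (cmd << 24) | (addr & 0x00ffffffu);
}

static_assert(spiWord(kCmdRead, vmOffset(0x10000040u)) == 0x03000040u,
              "read of offset 0x40");
static_assert(spiWord(kCmdWrite, vmOffset(0x1001fffcu)) == 0x0201fffcu,
              "write near the top of the 128 KiB window");
```

Packing the command into the address phase is what lets the whole word go out in dual mode, since, per the source comments, the hardware's separate command phase cannot.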

NOTE: This works fine for processor accesses, but cannot be used by
any of the peripherals' DMA. For that, we'd need a real MMU.

Hardware Configuration (only use 3.3V compatible SRAMs!):

  SPI byte-addressable SRAM/PSRAM: 23LC1024 or smaller
    CS   -> GPIO15
    SCK  -> GPIO14
    MOSI -> GPIO13
    MISO -> GPIO12
 (note these are GPIO numbers, not the Arduino Dxx pin names.  Refer
  to your ESP8266 board schematic for the mapping of GPIO to pin.)

Higher density PSRAM (ESP-PSRAM64H/etc.) should work as well, but
I'm still waiting on my chips so haven't done any testing.  Biggest
concern is their command set and functionality in DIO mode.  If DIO
mode isn't supported, then a fallback to SIO is possible.

This PR originated with code from @pvvx's esp8266web server at
https://github.com/pvvx/esp8266web (licensed in the public domain)
but doesn't resemble it much any more.  Thanks, @pvvx!

Keep a list of the last 8 lines in RAM (~.5KB of RAM) and use that to
speed up things like memcpys and other operations where the source and
destination addresses are inside VM RAM.

A custom set of SPI routines is used in the VM system for speed and code
size (and because the core cannot be dependent on a library).

Because UMM manages RAM in 8 byte chunks, attempting to manage the
entire 1M available space on a 1M PSRAM causes the block IDs to
overflow, crashing things at some point.  Limit the UMM allocation to
only 256K in this case.  The remaining space can manually be assigned to
buffers/etc. managed by the application, not malloc()/free().
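
The 256K figure can be checked with a little arithmetic. The sketch below assumes that UMM stores block IDs in 15 bits (a 16-bit index with one bit reserved as a flag). That detail is not stated in this commit message, so treat it as an assumption. The 8-byte block size comes from the text above. All names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t kBlockBytes = 8;                 // from the commit message
constexpr uint32_t kIndexBits  = 15;                // ASSUMPTION: usable block ID bits
constexpr uint32_t kMaxBlocks  = 1u << kIndexBits;  // IDs 0 .. 32767

// Largest region a single heap could describe under the assumption above.
constexpr uint32_t maxManagedBytes() { return kMaxBlocks * kBlockBytes; }

// True if a region of `bytes` can be covered without block IDs overflowing.
constexpr bool fitsInOneHeap(uint32_t bytes) {
    return bytes / kBlockBytes <= kMaxBlocks;
}

static_assert(maxManagedBytes() == 256u * 1024u, "32768 blocks * 8 bytes = 256 KiB");
static_assert(fitsInOneHeap(128u * 1024u), "a 128K part fits");
static_assert(!fitsInOneHeap(1024u * 1024u), "1M would overflow the block IDs");
```

Under that assumption, anything above the first 256K of a larger PSRAM has to be handed out by the application itself, as the paragraph above notes.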
Parent c720c0d9
@@ -29,7 +29,6 @@
 #include "coredecls.h"
 #include "umm_malloc/umm_malloc.h"
-// #include "core_esp8266_vm.h"
 #include <pgmspace.h>
 #include "reboot_uart_dwnld.h"
@@ -984,22 +983,11 @@ String EspClass::getSketchMD5()
     return result;
 }
-void EspClass::enableVM()
-{
-#ifdef UMM_HEAP_EXTERNAL
-    if (!vmEnabled)
-        install_vm_exception_handler();
-    vmEnabled = true;
-#endif
-}
 void EspClass::setExternalHeap()
 {
 #ifdef UMM_HEAP_EXTERNAL
-    if (vmEnabled) {
-        if (!umm_push_heap(UMM_HEAP_EXTERNAL)) {
-            panic();
-        }
-    }
+    if (!umm_push_heap(UMM_HEAP_EXTERNAL)) {
+        panic();
+    }
 #endif
 }
@@ -1016,10 +1004,8 @@ void EspClass::setIramHeap()
 void EspClass::setDramHeap()
 {
 #if defined(UMM_HEAP_EXTERNAL) && !defined(UMM_HEAP_IRAM)
-    if (vmEnabled) {
-        if (!umm_push_heap(UMM_HEAP_DRAM)) {
-            panic();
-        }
-    }
+    if (!umm_push_heap(UMM_HEAP_DRAM)) {
+        panic();
+    }
 #elif defined(UMM_HEAP_IRAM)
     if (!umm_push_heap(UMM_HEAP_DRAM)) {
@@ -1031,10 +1017,8 @@ void EspClass::setDramHeap()
 void EspClass::resetHeap()
 {
 #if defined(UMM_HEAP_EXTERNAL) && !defined(UMM_HEAP_IRAM)
-    if (vmEnabled) {
-        if (!umm_pop_heap()) {
-            panic();
-        }
-    }
+    if (!umm_pop_heap()) {
+        panic();
+    }
 #elif defined(UMM_HEAP_IRAM)
     if (!umm_pop_heap()) {
......
@@ -221,13 +221,6 @@ class EspClass {
 #else
         uint32_t getCycleCount();
 #endif // !defined(CORE_MOCK)
-        /**
-         * @brief Installs VM exception handler to support External memory (Experimental)
-         *
-         * @param none
-         * @return none
-         */
-        void enableVM();
         /**
          * @brief Push current Heap selection and set Heap selection to DRAM.
          *
@@ -258,9 +251,6 @@ class EspClass {
          */
         void resetHeap();
     private:
-#ifdef UMM_HEAP_EXTERNAL
-        bool vmEnabled = false;
-#endif
         /**
          * @brief Replaces @a byteCount bytes of a 4 byte block on flash
          *
......
@@ -37,6 +37,7 @@ extern "C" {
 #include "flash_quirks.h"
 #include <umm_malloc/umm_malloc.h>
 #include <core_esp8266_non32xfer.h>
+#include "core_esp8266_vm.h"
 #define LOOP_TASK_PRIORITY 1
@@ -348,9 +349,14 @@ extern "C" void user_init(void) {
     cont_init(g_pcont);
+#if defined(UMM_HEAP_EXTERNAL)
+    install_vm_exception_handler();
+#endif
 #if defined(NON32XFER_HANDLER) || defined(MMU_IRAM_HEAP)
     install_non32xfer_exception_handler();
 #endif
 #if defined(MMU_IRAM_HEAP)
     umm_init_iram();
 #endif
......
@@ -64,51 +64,10 @@ static
 IRAM_ATTR void non32xfer_exception_handler(struct __exception_frame *ef, int cause)
 {
   do {
-    /*
-      In adapting the public domain version, a crash would come or go away with
-      the slightest unrelated changes elsewhere in the function. Observed that
-      register a15 was used for epc1, then clobbered by `rsr.` I now believe a
-      "&" on the output register would have resolved the problem.
-      However, I have refactored the Extended ASM to reduce and consolidate
-      register usage and corrected the issue.
-      The positioning of the Extended ASM block (as early as possible in the
-      compiled function) is in part controlled by the immediate need for
-      output variable `insn`. This placement aids in getting excvaddr read as
-      early as possible.
-    */
     uint32_t insn, excvaddr;
-#if 1
-    {
-      uint32_t tmp;
-      __asm__ (
-        "rsr.excvaddr %[vaddr]\n\t" /* Read faulting address as early as possible */
-        "movi.n %[tmp], ~3\n\t" /* prepare a mask for the EPC */
-        "and %[tmp], %[tmp], %[epc]\n\t" /* apply mask for 32-bit aligned base */
-        "ssa8l %[epc]\n\t" /* set up shift register for src op */
-        "l32i %[insn], %[tmp], 0\n\t" /* load part 1 */
-        "l32i %[tmp], %[tmp], 4\n\t" /* load part 2 */
-        "src %[insn], %[tmp], %[insn]\n\t" /* right shift to get faulting instruction */
-        : [vaddr]"=&r"(excvaddr), [insn]"=&r"(insn), [tmp]"=&r"(tmp)
-        : [epc]"r"(ef->epc) :);
-    }
-#else
-    {
-      __asm__ __volatile__ ("rsr.excvaddr %0;" : "=r"(excvaddr):: "memory");
-      /*
-        "C" reference code for the ASM to document intent.
-        May also prove useful when isolating possible issues with Extended ASM,
-        optimizations, new compilers, etc.
-      */
-      uint32_t epc = ef->epc;
-      uint32_t *pWord = (uint32_t *)(epc & ~3);
-      uint64_t big_word = ((uint64_t)pWord[1] << 32) | pWord[0];
-      uint32_t pos = (epc & 3) * 8;
-      insn = (uint32_t)(big_word >>= pos);
-    }
-#endif
+    /* Extract instruction and faulting data address */
+    __EXCEPTION_HANDLER_PREAMBLE(ef, excvaddr, insn);
     uint32_t what = insn & LOAD_MASK;
     uint32_t valmask = 0;
......
@@ -7,6 +7,54 @@ extern "C" {
 extern void install_non32xfer_exception_handler();
+/*
+  In adapting the public domain version, a crash would come or go away with
+  the slightest unrelated changes elsewhere in the function. Observed that
+  register a15 was used for epc1, then clobbered by `rsr.` I now believe a
+  "&" on the output register would have resolved the problem.
+  However, I have refactored the Extended ASM to reduce and consolidate
+  register usage and corrected the issue.
+  The positioning of the Extended ASM block (as early as possible in the
+  compiled function) is in part controlled by the immediate need for
+  output variable `insn`. This placement aids in getting excvaddr read as
+  early as possible.
+*/
+#if 0
+{
+  __asm__ __volatile__ ("rsr.excvaddr %0;" : "=r"(excvaddr):: "memory");
+  /*
+    "C" reference code for the ASM to document intent.
+    May also prove useful when isolating possible issues with Extended ASM,
+    optimizations, new compilers, etc.
+  */
+  uint32_t epc = ef->epc;
+  uint32_t *pWord = (uint32_t *)(epc & ~3);
+  uint64_t big_word = ((uint64_t)pWord[1] << 32) | pWord[0];
+  uint32_t pos = (epc & 3) * 8;
+  insn = (uint32_t)(big_word >>= pos);
+}
+#endif
+#define __EXCEPTION_HANDLER_PREAMBLE(ef, excvaddr, insn) \
+  { \
+    uint32_t tmp; \
+    __asm__ ( \
+      "rsr.excvaddr %[vaddr]\n\t" /* Read faulting address as early as possible */ \
+      "movi.n %[tmp], ~3\n\t" /* prepare a mask for the EPC */ \
+      "and %[tmp], %[tmp], %[epc]\n\t" /* apply mask for 32-bit aligned base */ \
+      "ssa8l %[epc]\n\t" /* set up shift register for src op */ \
+      "l32i %[insn], %[tmp], 0\n\t" /* load part 1 */ \
+      "l32i %[tmp], %[tmp], 4\n\t" /* load part 2 */ \
+      "src %[insn], %[tmp], %[insn]\n\t" /* right shift to get faulting instruction */ \
+      : [vaddr]"=&r"(excvaddr), [insn]"=&r"(insn), [tmp]"=&r"(tmp) \
+      : [epc]"r"(ef->epc) :); \
+  }
 #ifdef __cplusplus
 }
 #endif
......
/*
core_esp8266_vm - Implements logic to enable external SRAM/PSRAM to be used
as if it were on-chip memory by code.
Copyright (c) 2020 Earle F. Philhower, III All rights reserved.
This library is free software; you can redistribute it and/or
modify it under the terms of the GNU Lesser General Public
License as published by the Free Software Foundation; either
version 2.1 of the License, or (at your option) any later version.
This library is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
Lesser General Public License for more details.
You should have received a copy of the GNU Lesser General Public
License along with this library; if not, write to the Free Software
Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
The original exception handler idea was taken from @pvvx's public domain
misaligned-flash-read exception handler, available here:
https://github.com/pvvx/esp8266web/blob/master/app/sdklib/system/app_main.c
Theory of Operation:
The Xtensa core generates a hardware exception (unrelated to C++ exceptions)
when an address that's defined as invalid for load or store is accessed. The XTOS ROM
routines capture the machine state and call a standard C exception handler
routine (or the default one which resets the system).
We hook into this exception callback and decode the EXCVADDR (the address
being accessed) and use the exception PC to read out the faulting
instruction. We decode that instruction and simulate its behavior
(i.e. either loading or storing some data to a register/external memory)
and then return to the calling application.
We use the hardware SPI interface to talk to an external SRAM/PSRAM, and
implement a simple cache to minimize the amount of times we actually need
to go out over the (slow) SPI bus. The SPI is set up in a DIO mode which
uses no more pins than normal SPI, but provides for ~2X faster transfers.
NOTE: This works fine for processor accesses, but cannot be used by any
of the peripherals' DMA. For that, we'd need a real MMU.
Hardware Configuration (make sure you have 3.3V compatible SRAMs):
* SPI interfaced byte-addressable SRAM/PSRAM: 23LC1024 or smaller
CS -> GPIO15
SCK -> GPIO14
MOSI -> GPIO13
MISO -> GPIO12
(note these are GPIO numbers, not the Arduino Dxx ones. Refer to your
ESP8266 board schematic for the mapping of GPIO to pin.)
* Higher density PSRAM (ESP-PSRAM64H/etc.) works as well, but may be too
large to effectively use with UMM. Only 256K is available via malloc,
but addresses above 256K do work and can be used for fixed buffers.
*/
#ifdef MMU_EXTERNAL_HEAP
#include <Arduino.h>
#include <esp8266_undocumented.h>
#include "esp8266_peri.h"
#include "core_esp8266_vm.h"
#include "core_esp8266_non32xfer.h"
#include "umm_malloc/umm_malloc.h"
extern "C" {
#define SHORT_MASK 0x000008u
#define LOAD_MASK 0x00f00fu
#define L8UI_MATCH 0x000002u
#define L16UI_MATCH 0x001002u
#define L16SI_MATCH 0x009002u
#define L16_MASK 0x001000u
#define SIGNED_MASK 0x008000u
#define L32IN_MATCH 0x000008u
#define L32I_MATCH 0x002002u
#define L32R_MATCH 0x000001u
#define L32_MASK 0x002009u
#define STORE_MASK 0x00f00fu
#define S8I_MATCH 0x004002u
#define S16I_MATCH 0x005002u
#define S16_MASK 0x001000u
#define S32I_MATCH 0x006002u
#define S32IN_MATCH 0x000009u
#define S32_MASK 0x002001u
#define EXCCAUSE_LOAD_PROHIBITED 28 // Cache Attribute does not allow Load
#define EXCCAUSE_STORE_PROHIBITED 29 // Cache Attribute does not allow Store
#define EXCCAUSE_STORE_MASK 1 // Fast way of deciding if it's a ld or s that faulted
// MINI SPI implementation inlined to have max performance and minimum code
// bloat. Can't include a library (SPI) in the core, anyway.
// Place in a struct so hopefully compiler will generate smaller, base+offset
// based code to access it
typedef struct {
volatile uint32_t spi_cmd; // The SPI can change this behind our backs, so volatile!
uint32_t spi_addr;
uint32_t spi_ctrl;
uint32_t spi_ctrl1; // undocumented? Not shown in the reg map
uint32_t spi_rd_status;
uint32_t spi_ctrl2;
uint32_t spi_clock;
uint32_t spi_user;
uint32_t spi_user1;
uint32_t spi_user2;
uint32_t spi_wr_status;
uint32_t spi_pin;
uint32_t spi_slave;
uint32_t spi_slave1;
uint32_t spi_slave2;
uint32_t spi_slave3;
uint32_t spi_w[16]; // NOTE: You need a memory barrier before reading these after a read xaction
uint32_t spi_ext3;
} spi_regs;
// The standard HSPI bus pins are used
constexpr uint8_t cs = 15;
constexpr uint8_t miso = 12;
constexpr uint8_t mosi = 13;
constexpr uint8_t sck = 14;
#define DECLARE_SPI1 spi_regs *spi1 = (spi_regs*)&SPI1CMD
typedef enum { spi_5mhz = 0x001c1001, spi_10mhz = 0x000c1001, spi_20mhz = 0x00041001, spi_30mhz = 0x00002001, spi_40mhz = 0x00001001 } spi_clocking;
typedef enum { sio = 0, dio = 1 } iotype;
#if MMU_EXTERNAL_HEAP > 128
constexpr uint32_t spi_clkval = spi_40mhz;
constexpr iotype hspi_mode = sio;
#else
constexpr uint32_t spi_clkval = spi_20mhz;
constexpr iotype hspi_mode = dio;
#endif
constexpr int read_delay = (hspi_mode == dio) ? 4-1 : 0;
constexpr int cache_ways = 4; // N-way, fully associative cache
constexpr int cache_words = 16; // Must be 16 words or smaller to fit in SPI buffer
static struct cache_line {
int32_t addr; // Address, lower bits masked off
int dirty; // Needs writeback
struct cache_line *next; // We'll keep linked list in MRU order
union {
uint32_t w[cache_words];
uint16_t s[cache_words * 2];
uint8_t b[cache_words * 4];
};
} __vm_cache_line[cache_ways];
static struct cache_line *__vm_cache; // Always points to MRU (hence the line being read/written)
constexpr int addrmask = ~(sizeof(__vm_cache[0].w)-1); // Helper to mask off bits present in cache entry
static void spi_init(spi_regs *spi1)
{
pinMode(sck, SPECIAL);
pinMode(miso, SPECIAL);
pinMode(mosi, SPECIAL);
pinMode(cs, SPECIAL);
spi1->spi_cmd = 0;
GPMUX &= ~(1 << 9);
spi1->spi_clock = spi_clkval;
spi1->spi_ctrl = 0 ; // MSB first + plain SPI mode
spi1->spi_ctrl1 = 0; // undocumented, clear for safety?
spi1->spi_ctrl2 = 0; // No add'l delays on signals
spi1->spi_user2 = 0; // No insn or insn_bits to set
}
// Note: GCC optimization -O2 and -O3 tried and returned *slower* code than the default
// The SPI hardware cannot make the "command" portion dual or quad, only the addr and data
// So using the command portion of the cycle will not work. Concatenate the address
// and command into a single 32-bit chunk "address" which will be sent across both bits.
inline ICACHE_RAM_ATTR void spi_writetransaction(spi_regs *spi1, int addr, int addr_bits, int dummy_bits, int data_bits, iotype dual)
{
// Ensure no writes are still ongoing
while (spi1->spi_cmd & SPIBUSY) { /* busywait */ }
spi1->spi_addr = addr;
spi1->spi_user = (addr_bits? SPIUADDR : 0) | (dummy_bits ? SPIUDUMMY : 0) | (data_bits ? SPIUMOSI : 0) | (dual ? SPIUFWDIO : 0);
spi1->spi_user1 = (addr_bits << 26) | (data_bits << 17) | dummy_bits;
// No need to set spi_user2, insn field never used
__asm ( "" ::: "memory" );
spi1->spi_cmd = SPIBUSY;
// The write may continue on in the background, letting core do useful work instead of waiting, unless we're in cacheless mode
if (cache_ways == 0) {
while (spi1->spi_cmd & SPIBUSY) { /* busywait */ }
}
}
inline ICACHE_RAM_ATTR uint32_t spi_readtransaction(spi_regs *spi1, int addr, int addr_bits, int dummy_bits, int data_bits, iotype dual)
{
// Ensure no writes are still ongoing
while (spi1->spi_cmd & SPIBUSY) { /* busywait */ }
spi1->spi_addr = addr;
spi1->spi_user = (addr_bits? SPIUADDR : 0) | (dummy_bits ? SPIUDUMMY : 0) | SPIUMISO | (dual ? SPIUFWDIO : 0);
spi1->spi_user1 = (addr_bits << 26) | (data_bits << 8) | dummy_bits;
// No need to set spi_user2, insn field never used
__asm ( "" ::: "memory" );
spi1->spi_cmd = SPIBUSY;
while (spi1->spi_cmd & SPIBUSY) { /* busywait */ }
__asm ( "" ::: "memory" );
return spi1->spi_w[0];
}
static inline ICACHE_RAM_ATTR void cache_flushrefill(spi_regs *spi1, int addr)
{
addr &= addrmask;
struct cache_line *way = __vm_cache;
if (__vm_cache->addr == addr) return; // Fast case, it already is the MRU
struct cache_line *last = way;
way = way->next;
for (auto i = 1; i < cache_ways; i++) {
if (way->addr == addr) {
last->next = way->next;
way->next = __vm_cache;
__vm_cache = way;
return;
} else {
last = way;
way = way->next;
}
}
// At this point we know the line is not in the cache and way points to the LRU.
// We allow reads to go before writes since the write can happen in the background.
// We need to keep the data to be written back since it will be overwritten with read data
uint32_t wb[cache_words];
if (last->dirty) {
memcpy(wb, last->w, sizeof(last->w));
}
// Update MRU info, list
last->next = __vm_cache;
__vm_cache = last;
// Do the actual read
spi_readtransaction(spi1, (0x03 << 24) | addr, 32-1, read_delay, sizeof(last->w) * 8 - 1, hspi_mode);
memcpy(last->w, spi1->spi_w, sizeof(last->w));
// We fire a background writeback now, if needed
if (last->dirty) {
memcpy(spi1->spi_w, wb, sizeof(wb));
spi_writetransaction(spi1, (0x02 << 24) | last->addr, 32-1, 0, sizeof(last->w) * 8 - 1, hspi_mode);
last->dirty = 0;
}
// Update the addr at this point since we no longer need the old one
last->addr = addr;
}
static inline ICACHE_RAM_ATTR void spi_ramwrite(spi_regs *spi1, int addr, int data_bits, uint32_t val)
{
if (cache_ways == 0) {
spi1->spi_w[0] = val;
spi_writetransaction(spi1, (0x02<<24) | addr, 32-1, 0, data_bits, hspi_mode);
} else {
cache_flushrefill(spi1, addr);
__vm_cache->dirty = 1;
addr -= __vm_cache->addr;
switch (data_bits) {
case 31: __vm_cache->w[addr >> 2] = val; break;
case 7: __vm_cache->b[addr] = val; break;
default: __vm_cache->s[addr >> 1] = val; break;
}
}
}
static inline ICACHE_RAM_ATTR uint32_t spi_ramread(spi_regs *spi1, int addr, int data_bits)
{
if (cache_ways == 0) {
spi1->spi_w[0] = 0;
return spi_readtransaction(spi1, (0x03 << 24) | addr, 32-1, read_delay, data_bits, hspi_mode);
} else {
cache_flushrefill(spi1, addr);
addr -= __vm_cache->addr;
switch (data_bits) {
case 31: return __vm_cache->w[addr >> 2];
case 7: return __vm_cache->b[addr];
default: return __vm_cache->s[addr >> 1];
}
}
}
static void (*__old_handler)(struct __exception_frame *ef, int cause);
static ICACHE_RAM_ATTR void loadstore_exception_handler(struct __exception_frame *ef, int cause)
{
uint32_t excvaddr;
uint32_t insn;
/* Extract instruction and faulting data address */
__EXCEPTION_HANDLER_PREAMBLE(ef, excvaddr, insn);
// Check that we're really accessing VM and not some other illegal range
if ((excvaddr >> 28) != 1) {
// Reinstall the old handler, and retry the instruction to keep us out of the stack dump
_xtos_set_exception_handler(EXCCAUSE_LOAD_PROHIBITED, __old_handler);
_xtos_set_exception_handler(EXCCAUSE_STORE_PROHIBITED, __old_handler);
return;
}
DECLARE_SPI1;
ef->epc += (insn & SHORT_MASK) ? 2 : 3; // resume at following instruction
int regno = (insn & 0x0000f0u) >> 4;
if (regno != 0) --regno; // account for skipped a1 in exception_frame
if (cause & EXCCAUSE_STORE_MASK) {
uint32_t val = ef->a_reg[regno];
uint32_t what = insn & STORE_MASK;
if (what == S8I_MATCH) {
spi_ramwrite(spi1, excvaddr & 0x1ffff, 8-1, val);
} else if (what == S16I_MATCH) {
spi_ramwrite(spi1, excvaddr & 0x1ffff, 16-1, val);
} else {
spi_ramwrite(spi1, excvaddr & 0x1ffff, 32-1, val);
}
} else {
if (insn & L32_MASK) {
ef->a_reg[regno] = spi_ramread(spi1, excvaddr & 0x1ffff, 32-1);
} else if (insn & L16_MASK) {
ef->a_reg[regno] = spi_ramread(spi1, excvaddr & 0x1ffff, 16-1);
if ((insn & SIGNED_MASK ) && (ef->a_reg[regno] & 0x8000))
ef->a_reg[regno] |= 0xffff0000;
} else {
ef->a_reg[regno] = spi_ramread(spi1, excvaddr & 0x1ffff, 8-1);
}
}
}
void install_vm_exception_handler()
{
__old_handler = _xtos_set_exception_handler(EXCCAUSE_LOAD_PROHIBITED, loadstore_exception_handler);
_xtos_set_exception_handler(EXCCAUSE_STORE_PROHIBITED, loadstore_exception_handler);
DECLARE_SPI1;
// Manually reset chip from DIO to SIO mode (HW SPI has issues with <8 bits/clocks total output)
digitalWrite(cs, HIGH);
digitalWrite(mosi, HIGH);
digitalWrite(miso, HIGH);
digitalWrite(sck, LOW);
pinMode(cs, OUTPUT);
pinMode(miso, OUTPUT);
pinMode(mosi, OUTPUT);
pinMode(sck, OUTPUT);
digitalWrite(cs, LOW);
for (int i = 0; i < 4; i++) {
digitalWrite(sck, HIGH);
digitalWrite(sck, LOW);
}
digitalWrite(cs, HIGH);
// Set up the SPI regs
spi_init(spi1);
// Enable streaming read/write mode
spi1->spi_w[0] = 0x40;
spi_writetransaction(spi1, 0x01<<24, 8-1, 0, 8-1, sio);
if (hspi_mode == dio) {
// Ramp up to DIO mode
spi_writetransaction(spi1, 0x3b<<24, 8-1, 0, 0, sio);
spi1->spi_ctrl |= SPICDIO | SPICFASTRD;
}
// Bring cache structures to baseline
if (cache_ways > 0) {
for (auto i = 0; i < cache_ways; i++) {
__vm_cache_line[i].addr = -1; // Invalid, bits set in lower region so will never match
__vm_cache_line[i].next = &__vm_cache_line[i+1];
}
__vm_cache = &__vm_cache_line[0];
__vm_cache_line[cache_ways - 1].next = NULL;
}
// Hook into memory manager
umm_init_vm( (void *)0x10000000, MMU_EXTERNAL_HEAP * 1024);
}
};
#endif
#ifdef __cplusplus
extern "C" {
#endif
extern void install_vm_exception_handler();
#ifdef __cplusplus
};
#endif
@@ -40,7 +40,7 @@
  *
  */
-#if defined(NON32XFER_HANDLER) || defined(MMU_IRAM_HEAP) || defined(NEW_EXC_C_WRAPPER)
+#if defined(NON32XFER_HANDLER) || defined(MMU_IRAM_HEAP) || defined(NEW_EXC_C_WRAPPER) || defined(MMU_EXTERNAL_HEAP)
 /*
  * The original module source code came from:
......
@@ -42,7 +42,11 @@ extern "C" {
 #undef UMM_HEAP_IRAM
 #endif
-// #define UMM_HEAP_EXTERNAL
+#if defined(MMU_EXTERNAL_HEAP)
+#define UMM_HEAP_EXTERNAL
+#else
+#undef UMM_HEAP_EXTERNAL
+#endif
 /*
  * Assign IDs to active Heaps and tally. DRAM is always active.
......
uint32_t cyclesToRead1Kx32(unsigned int *x, uint32_t *res) {
uint32_t b = ESP.getCycleCount();
uint32_t sum = 0;
for (int i = 0; i < 1024; i++) {
sum += *(x++);
}
*res = sum;
return ESP.getCycleCount() - b;
}
uint32_t cyclesToWrite1Kx32(unsigned int *x) {
uint32_t b = ESP.getCycleCount();
uint32_t sum = 0;
for (int i = 0; i < 1024; i++) {
sum += i;
*(x++) = sum;
}
return ESP.getCycleCount() - b;
}
uint32_t cyclesToRead1Kx16(unsigned short *x, uint32_t *res) {
uint32_t b = ESP.getCycleCount();
uint32_t sum = 0;
for (int i = 0; i < 1024; i++) {
sum += *(x++);
}
*res = sum;
return ESP.getCycleCount() - b;
}
uint32_t cyclesToWrite1Kx16(unsigned short *x) {
uint32_t b = ESP.getCycleCount();
uint32_t sum = 0;
for (int i = 0; i < 1024; i++) {
sum += i;
*(x++) = sum;
}
return ESP.getCycleCount() - b;
}
uint32_t cyclesToRead1Kx8(unsigned char*x, uint32_t *res) {
uint32_t b = ESP.getCycleCount();
uint32_t sum = 0;
for (int i = 0; i < 1024; i++) {
sum += *(x++);
}
*res = sum;
return ESP.getCycleCount() - b;
}
uint32_t cyclesToWrite1Kx8(unsigned char*x) {
uint32_t b = ESP.getCycleCount();
uint32_t sum = 0;
for (int i = 0; i < 1024; i++) {
sum += i;
*(x++) = sum;
}
return ESP.getCycleCount() - b;
}
void setup() {
Serial.begin(115200);
Serial.printf("\n");
// Enabling VM does not change malloc to use the external region. It will continue to
// use the normal RAM until we request otherwise.
uint32_t *mem = (uint32_t *)malloc(1024 * sizeof(uint32_t));
Serial.printf("Internal buffer: Address %p, free %d\n", mem, ESP.getFreeHeap());
// Now request from the VM heap
ESP.setExternalHeap();
uint32_t *vm = (uint32_t *)malloc(1024 * sizeof(uint32_t));
Serial.printf("External buffer: Address %p, free %d\n", vm, ESP.getFreeHeap());
// Make sure we go back to the internal heap for other allocations. Don't forget to ESP.resetHeap()!
ESP.resetHeap();
uint32_t res;
uint32_t t;
t = cyclesToWrite1Kx32(vm);
Serial.printf("Virtual Memory Write: %d cycles for 4K\n", t);
t = cyclesToWrite1Kx32(mem);
Serial.printf("Physical Memory Write: %d cycles for 4K\n", t);
t = cyclesToRead1Kx32(vm, &res);
Serial.printf("Virtual Memory Read: %d cycles for 4K (sum %08x)\n", t, res);
t = cyclesToRead1Kx32(mem, &res);
Serial.printf("Physical Memory Read: %d cycles for 4K (sum %08x)\n", t, res);
t = cyclesToWrite1Kx16((uint16_t*)vm);
Serial.printf("Virtual Memory Write: %d cycles for 2K by 16\n", t);
t = cyclesToWrite1Kx16((uint16_t*)mem);
Serial.printf("Physical Memory Write: %d cycles for 2K by 16\n", t);
t = cyclesToRead1Kx16((uint16_t*)vm, &res);
Serial.printf("Virtual Memory Read: %d cycles for 2K by 16 (sum %08x)\n", t, res);
t = cyclesToRead1Kx16((uint16_t*)mem, &res);
Serial.printf("Physical Memory Read: %d cycles for 2K by 16 (sum %08x)\n", t, res);
t = cyclesToWrite1Kx8((uint8_t*)vm);
Serial.printf("Virtual Memory Write: %d cycles for 1K by 8\n", t);
t = cyclesToWrite1Kx8((uint8_t*)mem);
Serial.printf("Physical Memory Write: %d cycles for 1K by 8\n", t);
t = cyclesToRead1Kx8((uint8_t*)vm, &res);
Serial.printf("Virtual Memory Read: %d cycles for 1K by 8 (sum %08x)\n", t, res);
t = cyclesToRead1Kx8((uint8_t*)mem, &res);
Serial.printf("Physical Memory Read: %d cycles for 1K by 8 (sum %08x)\n", t, res);
// Let's use external heap to make a big ole' String
ESP.setExternalHeap();
String s = "";
for (int i = 0; i < 100; i++) {
s += i;
s += ' ';
}
ESP.resetHeap();
Serial.printf("Internal free: %d\n", ESP.getFreeHeap());
ESP.setExternalHeap();
Serial.printf("External free: %d\n", ESP.getFreeHeap());
ESP.resetHeap();
Serial.printf("String: %s\n", s.c_str());
// Note that free/realloc will all use the heap specified when the pointer was created.
// No need to change heaps to delete an object, only to create it.
free(vm);
free(mem);
Serial.printf("Internal free: %d\n", ESP.getFreeHeap());
ESP.setExternalHeap();
Serial.printf("External free: %d\n", ESP.getFreeHeap());
ESP.resetHeap();
}
void loop() {
}
@@ -1233,6 +1233,10 @@ macros = {
         ( '.menu.mmu.4816H.build.mmuflags', '-DMMU_IRAM_SIZE=0xC000 -DMMU_ICACHE_SIZE=0x4000 -DMMU_IRAM_HEAP' ),
         ( '.menu.mmu.3216', '16KB cache + 32KB IRAM + 16KB 2nd Heap (not shared)' ),
         ( '.menu.mmu.3216.build.mmuflags', '-DMMU_IRAM_SIZE=0x8000 -DMMU_ICACHE_SIZE=0x4000 -DMMU_SEC_HEAP=0x40108000 -DMMU_SEC_HEAP_SIZE=0x4000' ),
+        ( '.menu.mmu.ext128k', '128K External 23LC1024' ),
+        ( '.menu.mmu.ext128k.build.mmuflags', '-DMMU_EXTERNAL_HEAP=128 -DMMU_IRAM_SIZE=0x8000 -DMMU_ICACHE_SIZE=0x8000' ),
+        ( '.menu.mmu.ext1024k', '1M External 64 MBit PSRAM' ),
+        ( '.menu.mmu.ext1024k.build.mmuflags', '-DMMU_EXTERNAL_HEAP=256 -DMMU_IRAM_SIZE=0x8000 -DMMU_ICACHE_SIZE=0x8000' ),
         ]),
     ######################## Non 32-bit load/store exception handler
......