Daisy boot loader and ITCMRAM

Does the Daisy bootloader support loading code into ITCMRAM from QSPI?

Why would you want that? Do you realize that ITCM is only 64 KB, which is just half of what you can store in internal flash?

I only need to put my CPU-intensive DSP code into ITCMRAM. Executing from tightly coupled memory is substantially faster than executing from flash, particularly compared with executing from QSPI flash.

I was mistaken in thinking this was a boot loader function. It is handled by the linker with a little help from the startup code. Below is how I got it working, in case others find this useful.

This is my current memory usage. You can see that about 5.6K of my code is now running in ITCMRAM.

Memory region         Used Size  Region Size  %age Used
           FLASH:          0 GB       128 KB      0.00%
         DTCMRAM:      105696 B       128 KB     80.64%
            SRAM:       59088 B       512 KB     11.27%
      RAM_D2_DMA:        8256 B        32 KB     25.20%
          RAM_D2:          0 GB       256 KB      0.00%
          RAM_D3:          0 GB        64 KB      0.00%
         ITCMRAM:        5616 B        64 KB      8.57%
           SDRAM:          0 GB        64 MB      0.00%
       QSPIFLASH:      130200 B      7936 KB      1.60%

First step is to add the following to the .lds file. This tells the linker to create a code segment in ITCMRAM.

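	/* No NOLOAD here: the section contents must be stored in flash so the
	   startup code can copy them. AT > QSPIFLASH assumes the application
	   is linked to run from QSPI flash, as in the memory usage above. */
	.itcmram_text :
	{
		. = ALIGN(4);
		_sitcmram_text = .;
		PROVIDE(__itcmram_text_start__ = _sitcmram_text);

		*(.itcmram_text)
		*(.itcmram_text*)

		. = ALIGN(4);
		_eitcmram_text = .;
		PROVIDE(__itcmram_text_end__ = _eitcmram_text);
	} > ITCMRAM AT > QSPIFLASH

	/* Load address (in flash) of the ITCM code, used as the copy source. */
	_sitext = LOADADDR(.itcmram_text);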

Second step is to declare externs for the segment addresses and add a loop to ResetHandler() in startup_stm32h750xx.c to copy the code from flash to ITCMRAM.

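extern void *_sitext, *_sitcmram_text, *_eitcmram_text;

	/* Copy the ITCM code from its load address in flash to ITCMRAM.
	   pSource and pDest are assumed to be the pointer locals that the
	   existing .data copy loop in ResetHandler() already uses. */
	for (pSource = &_sitext, pDest = &_sitcmram_text;
	     pDest != &_eitcmram_text; pSource++, pDest++)
		*pDest = *pSource;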
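
For anyone who wants to try the copy logic off-target first, here is a minimal sketch of the same word-by-word loop as a plain C function. The name copy_words is hypothetical and not part of libDaisy. On the real target the three pointers would come from the linker symbols declared above.

```c
#include <stdint.h>

/* Hypothetical helper, not part of libDaisy: copies 32-bit words from src
   into [dst, dst_end), the same way the startup loop copies code from its
   flash load address into ITCMRAM. Assumes both ends of the destination
   range are 4-byte aligned, which the ALIGN(4) statements in the linker
   script are meant to guarantee. */
static void copy_words(const uint32_t *src, uint32_t *dst, const uint32_t *dst_end)
{
    while (dst != dst_end)
        *dst++ = *src++;
}
```

Comparing with != only terminates if the range is a whole number of words, which is why the alignment in the linker script matters.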

CPU-intensive functions are placed in ITCMRAM by preceding the function definition with this attribute:

#define ITCM_MEM_SECTION __attribute__((section(".itcmram_text")))
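
As a rough usage sketch, the attribute goes in front of the function definition. The dot_product function is a hypothetical stand-in for a real DSP inner loop, and the empty fallback macro for non-ARM builds is an assumption added so the snippet also compiles on a PC.

```c
#include <stddef.h>

/* On the ARM target, place the function in the .itcmram_text section.
   The empty host fallback is an assumption, not part of the original setup. */
#if defined(__arm__)
#define ITCM_MEM_SECTION __attribute__((section(".itcmram_text")))
#else
#define ITCM_MEM_SECTION
#endif

/* Hypothetical DSP inner loop: multiply-accumulate over two buffers. */
ITCM_MEM_SECTION float dot_product(const float *a, const float *b, size_t n)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++)
        acc += a[i] * b[i];
    return acc;
}
```

Only the functions tagged this way move to ITCMRAM. Anything they call stays wherever the linker would normally put it, so hot helper functions may need the attribute as well.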

It would be good to see these changes incorporated in the libdaisy code base.


It won’t always be substantially faster, because any CPU intensive code will normally run from the MCU cache. But there should be some improvements and it’s certainly better than not utilizing that memory section at all.

I would also suggest excluding the first 4 bytes from the ITCM section, because its address is 0x0 and a pointer to that location is indistinguishable from a null pointer. This occasionally creates very interesting bugs.


True, if your inner loop fits in the cache there may not be much of a win. In my case the inner loop runs an FFT. Good point about address 0. It is probably a good idea to plant an infinite loop at address 0.


I just found an interesting and relevant application note from ST on this topic:
AN4891, "STM32H72x, STM32H73x, and single-core STM32H74x/75x system architecture and performance"

It applies to the STM32H750IBK6 used in Daisy and even gives an FFT application as an example!