Executable binary sizes

I’m getting to the point in one of my projects where I’m starting to get close to the 128KB limit of the Flash memory. I can shave a bit off by optimising for size (-Os), but I was surprised how large the binary was. I took a look at the simplest Daisy project, Blink, and the binary size is 59KB, which is pretty huge for code that just blinks an LED.

Memory region         Used Size  Region Size  %age Used
           FLASH:       58964 B       128 KB     44.99%
         DTCMRAM:           0 B       128 KB      0.00%
            SRAM:       14224 B       512 KB      2.71%
          RAM_D2:         16 KB       288 KB      5.56%
          RAM_D3:           0 B        64 KB      0.00%
         ITCMRAM:           0 B        64 KB      0.00%
           SDRAM:           0 B        64 MB      0.00%
       QSPIFLASH:           0 B         8 MB      0.00%
arm-none-eabi-objcopy -O ihex build/Blink.elf build/Blink.hex
arm-none-eabi-objcopy -O binary -S build/Blink.elf build/Blink.bin
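
As a quick sanity check on the figures above, the percentage column can be reproduced by hand, which also shows how much headroom is left. This is only a sketch: `region_usage` is a hypothetical helper name, and it assumes the linker's "KB" means 1024 bytes (the 44.99% figure is consistent with that).

```python
def region_usage(used_bytes, region_kib):
    """Return (percent used, bytes free) for a memory region sized in KiB.

    Assumes 1 KB in the linker report means 1024 bytes.
    """
    region_bytes = region_kib * 1024
    percent = 100.0 * used_bytes / region_bytes
    return round(percent, 2), region_bytes - used_bytes


# FLASH line from the Blink build: 58964 B used out of 128 KB.
print(region_usage(58964, 128))  # (44.99, 72108)
```

So Blink alone leaves roughly 70 KB of internal flash for application code.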

I imported the .elf into an online .elf viewer and can see symbols for lots of DaisyLib that I imagine isn’t used. I was assuming the linker would dead-strip most of this. Has anyone here spent much time looking at reducing binary size beyond just setting -Os?

Hi @Cutlasses!

Great question! This was brought up during today’s meeting actually, and we would love to provide more documentation on optimizing flash size. And (if I understood correctly) there will be a way to choose in the makefile which parts of the library you don’t need (SPI, for example) to save space in your project.

For the time being, -Os is the best approach. And if you need more space, then Daisy Bootloader is the way to go.

Thank you so much for bringing this up :slight_smile:

Same thing for me… I think I am at 99.56% of the Flash :frowning:. And yes, I have used -Os for my own code and also for the DaisyLib part. I agree that the Daisy library takes up a lot of space in the Flash.
Maybe Electrosmith should consider using another STM32 with more flash for a new project, or maybe a derivative of the current chip.

Any more news here? I feel your suggestion @Takumi_Ogata may be too coarse for my needs. Rather than broadly ruling out areas of the library (which I use widely), I would want the linker to include only the functions which are referenced. I thought Link Time Optimisation would do this. I’ve tried enabling it but I’m having issues because it’s not compatible with the assembly generation options in the core DaisyLib makefile and I’m a bit of a makefile noob. Has anyone got LTO working and did it help with executable size?

EDIT - with some hacking of the DaisySP core makefile I was able to get Blink to compile with LTO, but I did not see the executable size change (still a whopping 58k), so I assume the dead stripping isn’t happening correctly. This is what I’m adding to the Blink makefile

# Enable LTO
CFLAGS += -ffunction-sections -fdata-sections -flto
LDFLAGS += -Wl,--gc-sections -flto

I then had to remove all uses of -Wa and assembler-with-cpp from the core makefiles

There are a few low-effort strategies that can be used with libDaisy projects, both for reducing program size and for increasing how large a program can be executed.

As @Takumi_Ogata mentioned, using the -Os flag both in libDaisy and in your main project can help reduce flash size without requiring any additional work. So that’s always a quick starting point if you’re trying to make a little bit more room.

Bootloader Application:

Beyond that, an application can be rebuilt to run on the Daisy bootloader. This is all bundled into the libDaisy framework so that it can be easily used with any project using the Makefile build system with libDaisy.

# Anywhere above the "include" statement at the bottom of the file:
APP_TYPE=BOOT_SRAM

# optionally provide your own linker script
# LDSCRIPT=sram_linker.ld

With this added, you can recompile your program, and it will be linked for execution from a larger region of memory.

You can flash the bootloader to your Daisy via USB DFU with
make program-boot
and then you can program your app to the Daisy as normal with:
make program-dfu

The SRAM applications will have similar (and sometimes even better) performance than the same application running from internal flash.
Debugging via JTAG is still possible, though a little less convenient.

Note: there are a few quirks in the default linker script that can make DMA and SD card functionality problematic. This is easily resolved with a custom linker script, and we’re considering swapping the default linker script for bootloader apps so that usage is more consistent with the internal flash version.

libDaisy feature-stripping:

If, for some reason, using the bootloader with your application is not possible, there are some small modifications to the library source code that can remove a fair amount of code.

One of the biggest sources of program-size bloat is due to the global initialization for the dma-compatible peripherals (spi, i2c, uart). These being in the project contribute about 20kB of program memory, whether you’re using the peripherals or not.

If you’re not using some or all of these peripherals, you can comment out the corresponding function for the peripherals you don’t need here in the system initialization. Once you recompile libDaisy and your application, you should see those reductions take effect. Just keep in mind that attempts to use the DMA for those peripherals without that call will have unexpected behavior.

If your application isn’t using audio (a much less common scenario), you can trim another ~11kB by commenting out the ConfigureAudio line in the Daisy Seed Initialization (don’t forget to recompile libDaisy after making changes to see them take effect).

Hope that helps!

Thanks for the tips, @shensley! For others experiencing similar problems, one tool I’ve found quite useful is bloaty

On top of what @shensley suggested, another not-quite-insignificant source of code size in libDaisy is the set of sizable conditional blocks and switch statements in the InternalCallback function in audio.cpp, related to the chosen bit depth and the interleaved vs non-interleaved callback options.

If you are only ever using one bit depth and interleaving strategy in your application firmware and not switching them dynamically at runtime (as is exceedingly common), and you don’t mind modifying libDaisy, you can strip/comment out all of the conditional code for the audio/callback formats you’re not using and save a bit of program space in your binary. Same for stripping everything related to SAI2 if you’re not using an external codec.

I had to do this for a project that needed every last byte, but was also using almost every peripheral - it wasn’t a huge savings but it was also not insignificant.

The standard GCC toolchain provides a helpful tool for analyzing which functions are contributing most to code size. Here’s a magic invocation you can use:

arm-none-eabi-nm -t d -C -l -S --size-sort build/<your_project_name>.elf | grep -Ei " (t) "

Basically this spits out a list of the largest individual symbols in the .elf file and their sizes, sorted in ascending order, and then pipes it into grep to filter it only to those in the “t/T” (.text) section, i.e. symbols representing code or const data.

It’s not perfect, since many symbols contain inlined code from other places rather than every individual function being named, but this is how I originally identified the biggest stuff in libDaisy that could be stripped down (aside from code for peripherals that are just not used in the application firmware).
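
If the per-symbol list is too noisy, one option is to total the sizes per source file. Below is a minimal sketch, not a polished tool: `sizes_by_file` is a hypothetical helper, and it assumes the output format of the nm invocation above (without the grep), i.e. whitespace-separated value, size, type and name, followed by a tab and `file:line` when debug info is available, with sizes printed in decimal because of `-t d`.

```python
import sys
from collections import defaultdict


def sizes_by_file(nm_lines):
    """Sum the sizes of code symbols (type t/T) per source file.

    Expects lines shaped like the output of
    `arm-none-eabi-nm -t d -C -l -S --size-sort <file>.elf`.
    Returns (source_file, total_bytes) pairs, largest first.
    """
    totals = defaultdict(int)
    for line in nm_lines:
        # With -l, nm appends "<TAB>file:line" when it can find debug info.
        left, _, location = line.rstrip("\n").partition("\t")
        # value, size, type, then the (possibly space-containing) demangled name
        fields = left.split(None, 3)
        if len(fields) < 4 or fields[2] not in ("t", "T"):
            continue
        source = location.rsplit(":", 1)[0] if location else "(unknown)"
        totals[source] += int(fields[1], 10)
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file names): save the nm output to a text file
    # and pass its path as the first argument.
    with open(sys.argv[1]) as nm_output:
        for source, size in sizes_by_file(nm_output):
            print(f"{size:8d}  {source}")
```

The same caveat applies as above: inlined code is attributed to whichever symbol it ended up in, so treat the per-file totals as a rough guide rather than an exact breakdown.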
