CMSIS DSP library support (arm_math.h)

infrasonicaudio · April 7, 2024, 5:45pm

The main source of program memory usage with the CMSIS DSP FFT functions is the constant lookup table (LUT) data used for the various FFT sizes. I believe older versions of CMSIS DSP link the LUTs for all supported sizes into program data, whereas newer versions let you specify which ones to compile and include via preprocessor definitions.

The latest main branch of libDaisy was recently updated with a newer CMSIS DSP version and if you’re up-to-date with the main branch, something like this will work in your makefile (adapted from one of my own projects).

# Config Options
MY_FFT_SIZE = 1024

# Definitions for FFT
C_DEFS += -DARM_DSP_CONFIG_TABLES -DARM_FFT_ALLOW_TABLES

ifeq ($(MY_FFT_SIZE), 512)
C_DEFS += -DARM_TABLE_TWIDDLECOEF_F32_256 -DARM_TABLE_BITREVIDX_FLT_256 -DARM_TABLE_TWIDDLECOEF_RFFT_F32_512
endif
ifeq ($(MY_FFT_SIZE), 1024)
C_DEFS += -DARM_TABLE_TWIDDLECOEF_F32_512 -DARM_TABLE_BITREVIDX_FLT_512 -DARM_TABLE_TWIDDLECOEF_RFFT_F32_1024
endif
ifeq ($(MY_FFT_SIZE), 2048)
C_DEFS += -DARM_TABLE_TWIDDLECOEF_F32_1024 -DARM_TABLE_BITREVIDX_FLT_1024 -DARM_TABLE_TWIDDLECOEF_RFFT_F32_2048
endif
ifeq ($(MY_FFT_SIZE), 4096)
C_DEFS += -DARM_TABLE_TWIDDLECOEF_F32_2048 -DARM_TABLE_BITREVIDX_FLT_2048 -DARM_TABLE_TWIDDLECOEF_RFFT_F32_4096
endif

# ... rest of makefile ...

It’s a little bit clumsy to do it like this, but it works: whatever MY_FFT_SIZE is set to determines which LUTs are actually compiled and linked. There’s also a Python configuration tool included with CMSIS-DSP that seems to automate some of this, but I’ve not tried to integrate it with the libDaisy Makefile build process.

Of course, with larger FFT sizes the LUT data still might not fit into program memory internal to the STM32, so you may need to use the Daisy Bootloader instead.

Shabtronic · April 7, 2024, 6:59pm

Hey, as an offshoot to CMSIS: have you tested using

arm_rfft_fast_f32(&fftInstance,FFTInTemp,FFTOut,0) ;

I’ve got a really slick spectrum display running @60fps 4096 size - it’s fantastic on a 320x170x16bit 2" display. But I noticed that arm_rfft_fast_f32 trashes the input array! Can’t see anything in the docs about that.

Wondering if anyone else has this issue?

thx

infrasonicaudio · April 7, 2024, 8:02pm

This is documented behavior (see “Real FFT Functions” in the CMSIS-DSP docs):

[in] p points to input buffer (Source buffer is modified by this function.)

If you need to use the input buffer for something else after running the fft, make a copy of it first.
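A minimal sketch of that copy step, written in plain C so it builds without CMSIS-DSP. `destructive_transform` is a hypothetical stand-in that clobbers its input the way arm_rfft_fast_f32 is documented to; it is not an FFT. `transform_preserving_input` and the buffer names are likewise made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for arm_rfft_fast_f32: produces a placeholder
   "result" and overwrites the input buffer. It is NOT a real FFT. */
static void destructive_transform(float *in, float *out, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        out[i] = in[i] * 2.0f; /* placeholder output */
        in[i] = 0.0f;          /* input gets clobbered */
    }
}

/* Copy `samples` into `scratch` and transform the copy, so the caller's
   original samples survive the call. All buffers hold n floats. */
static void transform_preserving_input(const float *samples, float *scratch,
                                       float *out, size_t n)
{
    assert(samples != scratch); /* the copy must live in a separate buffer */
    memcpy(scratch, samples, n * sizeof(float));
    destructive_transform(scratch, out, n);
}
```

In a real project the stand-in call would presumably be replaced by arm_rfft_fast_f32(&fftInstance, scratch, out, 0), with scratch and out sized to the FFT length.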

Shabtronic · April 7, 2024, 8:14pm

thx - the docs I read didn’t mention the src getting mangled (or maybe I misinterpreted it in a coding frenzy!)
Guess it’s some radix/butterfly shenanigans doing that under the hood!
I apply a window every frame and my input data is a fifo pipe - hence “FFTInTemp”. I recreate the input every time. Works great and snappy - 4096 size = 1-2ms (shows 1ms at gettick granularity).

mrahc626 · April 9, 2024, 10:15am

This is exactly what I’ve been looking for lately! I started my FFT project attempting to use CMSIS but then diverted to shy_fft after struggling with the LUT issue.