I have been troubleshooting a phase vocoder based pitch shift effect for nearly 6 months. Today I finally made a breakthrough, and I would like to share what I have learned about the stmlib FFT header, shy_fft.h! Good info on using this FFT is few and far between, so I figured this must be helpful to someone out there.
There are definitely some things I don’t know, so perhaps people could share other valuable information they may have in the replies. I won’t be going into any of the DSP theory behind the Fast Fourier Transform here, but will rather focus purely on using the ShyFFT class.
Using shy_fft in your Daisy project
Simply copy shy_fft.h from DaisyExamples/stmlib/fft/ into your project folder and #include it in your C++ file.
Now you can instantiate the ShyFFT template class as follows:
#define FFT_SIZE 1024
ShyFFT<float, FFT_SIZE, RotationPhasor> i_fft;
The template class needs the following parameters:
- Data type (default is float)
- The FFT length
- The Phasor
The Phasor parameter is the first thing I don’t know a whole lot about. I understand it affects how the trigonometric data used in the FFT calculation is derived, but I can’t really comment on the pros and cons of each option. The options are RotationPhasor and LutPhasor. The definitions of these phasor classes can also be found in the shy_fft.h header.
FFT Analysis
FFT analysis is performed on an array of audio samples as follows:
float input_buffer[FFT_SIZE];
float fft_buffer[FFT_SIZE];
i_fft.Direct(input_buffer, fft_buffer);
Here, a fast Fourier transform is performed on the input buffer. The phase and magnitude information for each frequency bin is stored in fft_buffer (more on how that looks later).
FFT Synthesis
FFT synthesis is performed on the processed fft buffer as follows:
float output_buffer[FFT_SIZE];
float fft_buffer[FFT_SIZE];
i_fft.Inverse(fft_buffer, output_buffer);
Here, an inverse fast Fourier transform is performed on the FFT buffer, converting it back to a buffer of audio samples.
Arrangement of FFT Buffer Phase Bins !! (The big issue)
The FFT of a length-N signal is always a length-N complex signal. When the input is real (i.e. our audio buffer), that complex output has conjugate symmetry: the upper half of the bins mirrors the lower half, so all the information lives in roughly the first N / 2 bins. Real FFT functions return those N / 2 complex numbers as an array of N real numbers, but there isn’t a standard convention for how those numbers should be arranged.
Here is the issue that kept me scratching my head for so long. I had read somewhere that these N/2 complex values were stored in the fft output buffer as follows:
{real[0], real[1], real[2], …, real[N / 2 - 1], imag[0], imag[1], imag[2], …, imag[N / 2 - 1]}.
After a whole bunch of careful testing with a controlled source, what I discovered is that the actual arrangement is the following:
{imag[0], imag[1], imag[2], …, imag[N / 2 - 1], real[0], real[1], real[2], …, real[N / 2 - 1]}.
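For anyone who wants to repeat that kind of test, here is a minimal sketch of a controlled source. It fills a buffer with a cosine that sits exactly on bin k with a known phase offset, and works out the expected real and imaginary parts of that bin with a direct (slow) DFT sum. MakeTestTone and ReferenceBin are names I made up for this post, they are not part of shy_fft.h. The reference uses the usual exp(-j*2*pi*k*n/N) sign convention, which is an assumption and may not match what ShyFFT uses internally.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical helpers for this post, not part of shy_fft.h.

// Fill a buffer with cos(2*pi*k*n/N + phase): a tone centred exactly on bin k.
std::vector<float> MakeTestTone(std::size_t n_samples, std::size_t k, float phase) {
    assert(n_samples > 0 && k < n_samples);
    const double two_pi = 6.283185307179586;
    std::vector<float> x(n_samples);
    for (std::size_t n = 0; n < n_samples; ++n) {
        x[n] = static_cast<float>(std::cos(two_pi * k * n / n_samples + phase));
    }
    return x;
}

// Direct DFT sum for a single bin, using the exp(-j*2*pi*k*n/N) convention
// (an assumption: ShyFFT may use the opposite sign).
std::complex<double> ReferenceBin(const std::vector<float>& x, std::size_t k) {
    assert(!x.empty());
    const double two_pi = 6.283185307179586;
    std::complex<double> acc(0.0, 0.0);
    for (std::size_t n = 0; n < x.size(); ++n) {
        const double angle = -two_pi * k * n / x.size();
        acc += static_cast<double>(x[n]) *
               std::complex<double>(std::cos(angle), std::sin(angle));
    }
    return acc;
}
```

With a phase of 0 the reference puts all of the energy of bin k in the real part, so after passing the same tone to Direct() you can check which of fft_buffer[k] and fft_buffer[k + FFT_SIZE/2] lights up. Repeating the test with a non-zero phase shows which slot carries the imaginary part and with what sign.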
The imaginary component of the different complex frequency bin data comes first. This is then followed by the real components.
Therefore, the magnitude and phase of the different frequency bins can be calculated as follows:
float phase[FFT_SIZE/2];
float magnitude[FFT_SIZE/2];
for (int i = 0; i < FFT_SIZE/2; i++) {
    magnitude[i] = sqrt(fft_buffer[i + FFT_SIZE/2] * fft_buffer[i + FFT_SIZE/2] + fft_buffer[i] * fft_buffer[i]);
    phase[i] = atan2_approx(fft_buffer[i], fft_buffer[i + FFT_SIZE/2]);
}
My pitch shifter’s main issue was solved by swapping the two arguments in the atan2 function and by correcting how I stored the processed fft_buffer prior to synthesis.
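Putting that layout into code, here is a minimal sketch of the two conversions a phase vocoder needs: unpacking the buffer into magnitude and phase, and packing the processed magnitude and phase back into the same imaginary-first layout before calling Inverse(). The function names are my own (hypothetical), and std::atan2 and std::hypot stand in for any approximate versions, so treat this as a reference to check a faster implementation against rather than something tuned for the audio callback.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical helpers, not part of shy_fft.h. They assume the layout
// described in this post: imaginary parts in the first half of the
// buffer, real parts in the second half.

// Unpack an FFT buffer of length fft_size into fft_size/2 magnitudes and phases.
void RectToPolar(const float* fft_buffer, std::size_t fft_size,
                 float* magnitude, float* phase) {
    assert(fft_size % 2 == 0);
    const std::size_t half = fft_size / 2;
    for (std::size_t i = 0; i < half; ++i) {
        const float imag = fft_buffer[i];
        const float real = fft_buffer[i + half];
        magnitude[i] = std::hypot(real, imag);
        phase[i] = std::atan2(imag, real);  // atan2(y, x): imaginary part first
    }
}

// Pack magnitudes and phases back into the same imaginary-first layout.
void PolarToRect(const float* magnitude, const float* phase,
                 std::size_t fft_size, float* fft_buffer) {
    assert(fft_size % 2 == 0);
    const std::size_t half = fft_size / 2;
    for (std::size_t i = 0; i < half; ++i) {
        fft_buffer[i] = magnitude[i] * std::sin(phase[i]);         // imaginary
        fft_buffer[i + half] = magnitude[i] * std::cos(phase[i]);  // real
    }
}
```

Calling RectToPolar and then PolarToRect with no processing in between should hand back the original buffer, which makes a handy sanity check. Bin 0 may deserve separate care: many packed real-FFT formats store the DC and Nyquist values in the two slots of bin 0, and I haven’t verified what ShyFFT does there.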
Weird Scaling of the output buffer
The final thing to mention is that after calling the inverse FFT it is necessary to divide the output audio buffer by FFT_SIZE in order to get unity input/output volume with no frequency processing of the fft_buffer. This puzzled me at first, but it is common with FFT libraries: many leave both the forward and inverse transforms unnormalized, so a round trip through the two multiplies the signal by the FFT length, and it looks like ShyFFT does the same.
Scaling seems to be a theme with stmlib functions, as I found it necessary to divide the stmlib atan2 function output by 10,000 as well when testing with that. Weird!
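To see where a factor of FFT_SIZE can come from, here is a small sketch that uses a plain, slow DFT rather than ShyFFT itself. Neither the forward nor the inverse sum below is normalized, and a round trip through both returns the input multiplied by N. I’m assuming ShyFFT leaves its transforms unnormalized in the same way, which fits the behaviour I measured but is not something I have confirmed from the header. NaiveDft, NaiveInverseDft and ScaleBuffer are made-up names for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical illustration code, not part of shy_fft.h.
using Spectrum = std::vector<std::complex<double>>;

// Unnormalized forward DFT: X[k] = sum_n x[n] * exp(-j*2*pi*k*n/N).
Spectrum NaiveDft(const std::vector<double>& x) {
    const double two_pi = 6.283185307179586;
    const std::size_t n_samples = x.size();
    Spectrum X(n_samples);
    for (std::size_t k = 0; k < n_samples; ++k) {
        for (std::size_t n = 0; n < n_samples; ++n) {
            X[k] += x[n] * std::polar(1.0, -two_pi * k * n / n_samples);
        }
    }
    return X;
}

// Unnormalized inverse DFT: y[n] = Re(sum_k X[k] * exp(+j*2*pi*k*n/N)).
std::vector<double> NaiveInverseDft(const Spectrum& X) {
    const double two_pi = 6.283185307179586;
    const std::size_t n_samples = X.size();
    std::vector<double> y(n_samples, 0.0);
    for (std::size_t n = 0; n < n_samples; ++n) {
        std::complex<double> acc(0.0, 0.0);
        for (std::size_t k = 0; k < n_samples; ++k) {
            acc += X[k] * std::polar(1.0, two_pi * k * n / n_samples);
        }
        y[n] = acc.real();
    }
    return y;
}

// Multiply every sample by 1/N to undo the round-trip gain.
void ScaleBuffer(std::vector<double>& buffer) {
    assert(!buffer.empty());
    const double scale = 1.0 / static_cast<double>(buffer.size());
    for (double& sample : buffer) {
        sample *= scale;
    }
}
```

In the Daisy code the equivalent of ScaleBuffer is a loop over output_buffer that multiplies each sample by 1.0f / FFT_SIZE after Inverse() returns.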
Hope this proves helpful to someone!