STFT Normalization

I’m developing an STFT processor for the Daisy with the CMSIS real FFT functions (using the pfft~ object as an example). I’ll be parsing the exported code, replacing the cmsisFFTIN function with the actual FFT routine, and shuffling the signal processing routine around to take care of single-sample/block processing. I have a semi-working STFT processor; the issue is that the gain increases as the overlap increases. I believe I need to normalize the signal (because of the additions in the overlap process?). I’ve read about peak and RMS normalization, but I’m still unsure how best to normalize a real-time signal. Thank you!

void processBlock(float32_t *pSrc, float32_t *pDst)
{
  // incoming signal
  for (auto i = 0; i < BLOCK_SIZE; i++)
  {
    currBlock[i] = *(pSrc++);
  }

  // join previous and current blocks
  arm_copy_f32(prevBlock, &twoBlocks[0], BLOCK_SIZE);
  arm_copy_f32(currBlock, &twoBlocks[BLOCK_SIZE], BLOCK_SIZE);

  // copy currBlock to prevBlock
  arm_copy_f32(currBlock, prevBlock, BLOCK_SIZE);

  // clear the output accumulator
  arm_fill_f32(0.0f, outputBuffer, 2 * BLOCK_SIZE - (BLOCK_SIZE / OVERLAP));

  for (auto i = 0; i < OVERLAP; i++)
  {
    float32_t tmp[BLOCK_SIZE];
    float32_t res[BLOCK_SIZE];
    float32_t cmplxOut[BLOCK_SIZE];

    arm_copy_f32(&twoBlocks[i * HOP], tmp, BLOCK_SIZE);

    // win_hanning
    arm_mult_f32(tmp, window, tmp, BLOCK_SIZE);

    // fft
    arm_rfft_fast_f32(&S, tmp, cmplxOut, 0);

    // process

    // ifft
    arm_rfft_fast_f32(&S, cmplxOut, res, 1);

    // win_hanning
    arm_mult_f32(res, window, res, BLOCK_SIZE);

    // add to output buffer
    arm_add_f32(res, &outputBuffer[i * HOP], &outputBuffer[i * HOP], BLOCK_SIZE);
  }

  // add previous overlap
  arm_add_f32(olap, outputBuffer, outputBuffer, OB_LEN - BLOCK_SIZE);

  // store new overlap
  arm_copy_f32(&outputBuffer[BLOCK_SIZE], olap, OB_LEN - BLOCK_SIZE);

  // output BLOCK_SIZE samples (size of a float, not of a pointer)
  memcpy(pDst, outputBuffer, sizeof(float32_t) * BLOCK_SIZE);
}


I’ve done something similar before. My code was written to normalize windows of different shapes, but in your case the input and output windows are the same.

There are two problems that need solving. First, the overlap increases the signal level; that’s the obvious one. Second, applying the overlapped windows modulates the amplitude, so you don’t get perfect reconstruction and a ripple is added to your output signal instead. The important part is that you don’t normalize the signal itself, but the window that you apply. And since it can (and should) be precomputed, this normalization costs nothing extra at run time.
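To put rough numbers on both effects, here is a short derivation. It assumes a periodic Hann window of length N (BLOCK_SIZE in the question) applied both before the FFT and after the inverse FFT, with R overlapping frames (OVERLAP) and hop H = N/R. The thread doesn't show how `window` is filled, so the Hann formula is an assumption.

```latex
\begin{align*}
w[n] &= \tfrac{1}{2}\left(1 - \cos\frac{2\pi n}{N}\right), \qquad H = \frac{N}{R} \\
G[n] &= \sum_{k=0}^{R-1} w^2[n + kH], \qquad 0 \le n < H \\
w^2[n] &= \tfrac{3}{8} - \tfrac{1}{2}\cos\frac{2\pi n}{N} + \tfrac{1}{8}\cos\frac{4\pi n}{N} \\
R \ge 3:\quad G[n] &= \tfrac{3R}{8} \\
R = 2:\quad G[n] &= \tfrac{3}{4} + \tfrac{1}{4}\cos\frac{4\pi n}{N} \in \left[\tfrac{1}{2},\, 1\right]
\end{align*}
```

Under that assumption, an overlap of 4 gives a constant gain of 1.5 (about +3.5 dB) and an overlap of 8 gives 3 (about +9.5 dB), which is the level increase. An overlap of 2 leaves the cosine term in place, which is the ripple.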

        // Must start at zero; block_size is assumed to be a compile-time constant
        float norm[block_size] = {};

        // Accumulate shifted multiplications of both windows as norm
        for (size_t i = 0; i < window_size; i++) {
            norm[i % block_size] += in_win[i] * out_win[i];
        }

        // Divide output window by the norm for every block, compensating amplitude changes
        for (size_t i = 0; i < overlap; i++) {
            for (size_t j = 0; j < block_size; j++) {
                out_win[i * block_size + j] /= norm[j];
            }
        }

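In the code above, block_size * overlap == window_size, so block_size is the hop between consecutive windows (HOP in your code, with window_size playing the role of your BLOCK_SIZE). In other words, the norm array repeats for every block, since it is the sum of the overlapping window products.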
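If you want to check the idea off-target before touching the Daisy code, here is a minimal desktop sketch of the same normalization. The function names (`make_hann`, `overlap_gain`, `normalize_output_window`) are made up for this example, `std::vector` stands in for the fixed arrays, and the periodic Hann window is an assumption, since the thread doesn't show how `window` is filled.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Periodic Hann window (assumed shape; swap in your own window if it differs).
std::vector<float> make_hann(std::size_t window_size) {
    const double two_pi = 6.283185307179586;
    std::vector<float> w(window_size);
    for (std::size_t n = 0; n < window_size; n++) {
        w[n] = static_cast<float>(0.5 * (1.0 - std::cos(two_pi * n / window_size)));
    }
    return w;
}

// Overlap-add gain at each of the `hop` output positions: the sum of
// in_win * out_win over every window that covers that position.
std::vector<float> overlap_gain(const std::vector<float>& in_win,
                                const std::vector<float>& out_win,
                                std::size_t hop) {
    assert(hop > 0 && in_win.size() == out_win.size() && in_win.size() % hop == 0);
    std::vector<float> gain(hop, 0.0f);
    for (std::size_t i = 0; i < in_win.size(); i++) {
        gain[i % hop] += in_win[i] * out_win[i];
    }
    return gain;
}

// Divide the output window by that gain so the products sum to 1 everywhere.
// Meant to run once at start-up; the per-block processing stays unchanged.
void normalize_output_window(const std::vector<float>& in_win,
                             std::vector<float>& out_win,
                             std::size_t hop) {
    const std::vector<float> norm = overlap_gain(in_win, out_win, hop);
    for (std::size_t i = 0; i < out_win.size(); i++) {
        assert(norm[i % hop] > 0.0f);  // fails for overlap == 1 with a Hann window
        out_win[i] /= norm[i % hop];
    }
}
```

With a 1024-point window and a hop of 256, `overlap_gain` should report 1.5 at every position before normalization and 1 afterwards. With a hop of 512 the gain before normalization swings between 0.5 and 1, and the normalized output window flattens that ripple as well. On the Daisy you would keep the unmodified window for the analysis multiply and use the normalized copy only for the multiply after the inverse FFT.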

This was very helpful! Thank you
