Fastest way to access large lookup tables

el-duderino · June 23, 2022, 1:16pm

Hey everyone,

I am currently working on a Leslie-style rotary effect implemented on the Daisy Seed.
It’s driven by multiple delay lines that use precomputed delay times, which change over the course of a full rotation of the speaker.
I stored those delay times as arrays in separate header files like this:

extern const float delaytime_direct [] = {
...
};

and access them in my callback function. This works quite well; the only problem is that there is not enough flash memory on the Daisy. I tried storing them in the QSPI memory by changing the array declaration to

extern const float DSY_QSPI_BSS delaytime_direct [] {
...
};

I am still accessing this array in the callback like this.

//set current delaytime based on position index

        currentDelayd = delaytime_direct[rotation];

//advance position index by one sample and restart after full rotation

        rotation += 1;

        if (rotation >= 7385) {
            rotation = 0;
        }

This seems to be too slow though as the sound becomes choppy when I do that.

I am kind of a noob when it comes to coding so my question would be:
Am I handling the QSPI memory correctly? I saw in the example code that arrays get assigned to a specific address on the QSPI, but I didn’t fully understand how that works.
If it’s actually too slow, what would be the best way to buffer the values inside the flash memory?

Maybe there is a better approach after all, so I’m thankful for any hints in the right direction.

brbrr · June 23, 2022, 2:41pm

QSPI is slow, since it uses SPI to access the data. You can try DSY_SDRAM_BSS, which should be faster, although it is not the best solution. I guess the best way to solve the performance drawback would be to load bigger chunks of data at once, basically switching from per-sample processing to processing blocks of samples at once.
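
A minimal sketch of the block-wise idea, assuming the table is a plain float array that is read sequentially and wraps at its end. `CopyTableBlock` is a hypothetical helper written for illustration, not a libDaisy function:

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper (not part of libDaisy): copies `count` consecutive
// table entries starting at index `start` into `dst`, wrapping around to
// the beginning of the table when its end is reached.
// Returns the index at which the next block should start.
inline std::size_t CopyTableBlock(const float* table,
                                  std::size_t  table_len,
                                  std::size_t  start,
                                  float*       dst,
                                  std::size_t  count)
{
    std::size_t pos  = start % table_len;
    std::size_t done = 0;
    while(done < count)
    {
        // Longest contiguous run available before the table wraps.
        std::size_t run = table_len - pos;
        if(run > count - done)
            run = count - done;
        std::memcpy(dst + done, table + pos, run * sizeof(float));
        done += run;
        pos = (pos + run) % table_len;
    }
    return pos;
}
```

In the audio callback this would be called once per block (with `count` equal to the number of frames, i.e. `size / 2` for an interleaved stereo buffer), and the per-sample loop would then read from the small local buffer instead of the external memory.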

I’m not an expert on external data access. Maybe @shensley can suggest a better way to solve this.

antisvin · June 24, 2022, 6:39am

Your problem is not access time, which is very fast for all practical purposes even with QSPI, thanks to the cache on this MCU. It sounds like you’re using delay lines in your patch and don’t interpolate anything. This causes discontinuities from reading at different places in the delay buffer.

el-duderino · June 24, 2022, 10:26am

What exactly do you mean by interpolating?
To my understanding the interpolation for fractional delay is done inside the delay line function already.
My first guess about access time was because the effect is running fine when the data is stored on the flash memory with the rest of the program, but as soon as I move it to qspi I get a stuttering sound.

I tried it with DSY_SDRAM_BSS, but that gives me a noisy signal and the desired tremolo effect doesn’t seem to work after all… Here is the main part of my code, maybe there are some flaws that I’m not skilled enough to see.

// Declare a DelayLine of MAX_DELAY number of floats.
static DelayLine<float, MAX_DELAY> deld, del1, del2, del3;

static void AudioCallback(AudioHandle::InterleavingInputBuffer  in,
                          AudioHandle::InterleavingOutputBuffer out,
                          size_t                                size)
{

    
    for(size_t i = 0; i < size; i += 2)
    {

        //input variable
        in1 = in[i+1];


        //fonepole(currentDelay, currentDelay, .00007f);


        //set current delaytime based on position index
        currentDelayd = delaytime_direct[rotation];
        //currentDelay1 = delaytime_1[rotation];
        //currentDelay2 = delaytime_2[rotation];
        //currentDelay3 = delaytime_3[rotation];

        deld.SetDelay(currentDelayd);
        //del1.SetDelay(currentDelay1);
        //del2.SetDelay(currentDelay2);
        //del3.SetDelay(currentDelay3);


        //advance position index by one sample and restart after full rotation
        rotation += 1;

        if (rotation >= 7385) {
            rotation = 0;
        }


        // Write to the delay
        deld.Write(in1);
        //del1.Write(in1);
        //del2.Write(in1);
        //del3.Write(in1);


        // Read from delay line
        deld_out = deld.Read();
        //del1_out = del1.Read();
        //del2_out = del2.Read();
        //del3_out = del3.Read();
        // Calculate output and feedback
        //sig_out  = (deld_out + del1_out + del2_out)/3;
        sig_out = deld_out;

       


        // Output
        out[LEFT]  = sig_out;
        out[RIGHT] = sig_out;
    }
}

int main(void)
{
    // initialize seed hardware and daisysp modules
    float sample_rate;

    hw.Configure();
    hw.Init();
    hw.SetAudioBlockSize(4);


    
    sample_rate = hw.AudioSampleRate();

    deld.Init();
    //del1.Init();
    //del2.Init();
    //del3.Init();



    // start callback
    hw.StartAudio(AudioCallback);


    while(1) {}
}

antisvin · June 24, 2022, 11:53am

Fractional delay lines just give you the ability to use a real number for the delay length instead of an integer. That won’t save you from discontinuities if you change the delay time too quickly. So adding an LPF helps here; another approach is to crossfade between values based on the old and new position.
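
As a rough illustration of the low-pass approach, here is a one-pole smoother applied to the delay time. `SmoothTowards` is a hypothetical name used only for this sketch; it has the same shape as the commented-out `fonepole` call in the code above, and the coefficient is an assumption that would need tuning by ear:

```cpp
// Hypothetical helper: one-pole low-pass step. Moves `state` a fraction
// `coeff` (0 < coeff <= 1) of the remaining distance towards `target`.
// Smaller coefficients give slower, smoother changes.
inline void SmoothTowards(float& state, float target, float coeff)
{
    state += coeff * (target - state);
}
```

Per sample, the table value would be the target and the smoothed state would be passed to `SetDelay`. Note that the state and the target have to be different variables: passing the same variable for both leaves the value unchanged.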

If you say it only happens with QSPI, maybe your guess was correct, i.e. the table is too large, so constantly reading from it pollutes the data cache and you get too many cache misses.

Is it too expensive to compute those delay lengths in real time without a LUT? Rotation itself is a fairly cheap operation; it’s just a matter of multiplying two complex numbers (one of them is the rotation vector that is just a tangent that you might precompute, but the table would be much smaller).
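
One possible reading of this suggestion, as a sketch only: keep the source position as a unit complex number and rotate it by a precomputed per-sample step, then convert the distance to the listener into a delay in samples. The geometry (source on a circle around the origin, listener on the x axis), the struct name and all parameters are assumptions for illustration, not taken from the paper or the project:

```cpp
#include <cmath>
#include <complex>

// Hypothetical model: a source rotating on a circle of radius `radius_m`
// around the origin, with a fixed listener at (listener_m, 0).
struct RotatingSource
{
    std::complex<float> pos;  // current direction of the source (unit length)
    std::complex<float> step; // rotation applied once per sample

    void Init(float rotations_per_second, float sample_rate)
    {
        const float two_pi = 6.28318530718f;
        const float w      = two_pi * rotations_per_second / sample_rate;
        pos  = std::complex<float>(1.0f, 0.0f);
        step = std::complex<float>(std::cos(w), std::sin(w));
    }

    // Returns the propagation delay in samples for the current position,
    // then advances the rotation by one sample.
    float Process(float radius_m, float listener_m, float sample_rate)
    {
        const float speed_of_sound = 343.0f; // m/s, assumed room temperature
        const float dx   = listener_m - radius_m * pos.real();
        const float dy   = -radius_m * pos.imag();
        const float dist = std::sqrt(dx * dx + dy * dy);
        pos *= step;          // one complex multiply per sample
        pos /= std::abs(pos); // renormalize so rounding errors don't accumulate
        return dist / speed_of_sound * sample_rate;
    }
};
```

The renormalization costs a second square root per sample; doing it only once per block would likely be enough, and whether the whole computation is cheaper than a table lookup would have to be measured on the hardware.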

el-duderino · June 25, 2022, 10:02am

Oh got it, thanks. Using fewer values and crossfading between them is another approach I wanted to try out. I first wanted to optimize storing and accessing the data, so I thought it might be a good idea to start with large tables, as it’s always easy to go smaller.
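
A sketch of what reading a smaller table with linear interpolation between neighbouring entries could look like. `LookupInterpolated` and the phase convention (one full rotation mapped to the range 0 to 1) are assumptions made for this example:

```cpp
#include <cstddef>

// Hypothetical helper: reads a cyclic table at a fractional position.
// `phase` is the position within one rotation, in the range [0, 1).
// The value is linearly interpolated between the two nearest entries,
// wrapping from the last entry back to the first.
inline float LookupInterpolated(const float* table,
                                std::size_t  len,
                                float        phase)
{
    const float       pos  = phase * static_cast<float>(len);
    const std::size_t i0   = static_cast<std::size_t>(pos) % len;
    const std::size_t i1   = (i0 + 1) % len;
    const float       frac = pos - static_cast<float>(static_cast<std::size_t>(pos));
    return table[i0] + frac * (table[i1] - table[i0]);
}
```

With this, the table size becomes independent of the rotation speed: the phase increment per sample sets the speed, and a table of a few hundred entries may already be smooth enough, though that would need to be checked by listening.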

I haven’t tried computing the values in realtime on the board yet. They represent the time sound needs to travel from the rotating sound source to a fixed listener position. The other three delay lines represent image sound sources created by reflections on the cabinet walls. That’s just for the horn movement though, as the bass baffle is moving with a slightly different speed it needs to be simulated separately. Ideally there is going to be a stereo output as well. That’s a lot of data to be calculated in realtime, although it might be worth testing against using a LUT.

I know that I will run into multiple bottlenecks along the way, so my goal is to optimize the process as much as possible and then find a good balance between sound/realism and computational efficiency for the least amount of latency.

Here is a link to the paper this project is based on, in case anybody is interested.

jaradical · June 25, 2022, 5:18pm

Implementing the rotating speaker effect described in this paper is on my to do list:

I’ve built the Hilbert transform part and it runs nicely on the Daisy, HMU if you’re interested.

Cheers

el-duderino · June 27, 2022, 11:47am

Looks interesting! I’ve been using the Hilbert transform for analysis purposes only so far. I can imagine it’s an efficient method for frequency modulation, but my goal is to mimic the original speaker design as closely as possible. With my method I can implement real-life dimensions in the software directly. It would be interesting to see a comparison of computational efficiency for frequency shifting with fractional delay vs. Hilbert transform, though.

I’ll probably open a separate rotary speaker effect thread after I’ve made some more progress.

antisvin · June 27, 2022, 7:48pm

It’s not something that you would use for FM.

The analytic signal from a Hilbert transform (the real signal is converted into a complex one by adding an imaginary dimension) is used as a building block for frequency shifters. Another interesting application is in envelope followers and compressors. There’s a popular page with coefficients and some info about the HT; another set of coefficients is commonly reused from the PD and Csound sources. It’s fairly cheap computationally, i.e. you can run it on a 72 MHz MCU alongside a few more effects.

Fractional delay keeps your signal real and it’s just an interpolated read from a delay line. It’s used for pitch shifters, simulating Doppler effect and as a building block for many other things.
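
To make "interpolated read" concrete, here is a self-contained sketch of a delay line with a linearly interpolated read. It is a standalone illustration under the stated assumptions, not the DaisySP `DelayLine` implementation, and the class name is made up for this example:

```cpp
#include <cstddef>

// Hypothetical minimal fractional delay line with linear interpolation.
// A delay of 0 returns the most recently written sample.
// Valid delays are 0 <= delay <= N - 2.
template <std::size_t N>
class SimpleFractionalDelay
{
  public:
    void Write(float x)
    {
        buf_[write_] = x;
        write_       = (write_ + 1) % N;
    }

    float Read(float delay) const
    {
        const std::size_t whole  = static_cast<std::size_t>(delay);
        const float       frac   = delay - static_cast<float>(whole);
        const std::size_t newest = (write_ + N - 1) % N;
        // Two neighbouring samples around the requested position.
        const float a = buf_[(newest + N - whole) % N];
        const float b = buf_[(newest + N - whole - 1) % N];
        return a + frac * (b - a);
    }

  private:
    float       buf_[N] = {};
    std::size_t write_  = 0;
};
```

Changing `delay` a little on every sample moves the read position relative to the write position, which is what produces the pitch change of the Doppler effect.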

el-duderino · June 28, 2022, 10:45am

Thanks for clarifying, I will check that out.

Btw, I fixed the problem by running my program from SRAM with the Daisy bootloader. All four delay lines run smoothly now and I have plenty of space left for further implementations.