Wave table storage

brbrr · October 8, 2021, 11:04am

I decided to implement a wavetable oscillator to work around the performance limitations of the DaisySP Oscillator.

The design is pretty straightforward (btw, it’s based on sndkit’s implementation: The Oscillator), but I’m not sure how to properly store the tables. I don’t have much space left in FLASH, so I’d love to store them somewhere else.

I see a few options:

1. In FLASH. Pretty straightforward: all the buffers are statically allocated in FLASH.
2. In SDRAM. Seems like the best option to my knowledge. I can easily allocate buffers in SDRAM, but it seems it is not possible to initialize them at compile time (code like WaveTable DSY_SDRAM_BSS Sine{.size = 1024, .data = {....}}; won’t actually initialize the data with the passed values). So the only option is to generate the table contents dynamically right after startup.
3. In QSPI FLASH. I have never used it, so I have no idea how it works or whether its behavior differs from SDRAM.
4. Something I missed?

Currently, I have implemented option 2 with dynamic table generation. I see a ~2x performance increase compared to the DaisySP sine oscillator, so I think it’s good enough (for now). Also, I know that SDRAM is rather slow, and @antisvin recommended using SRAM for anything LUT-related. The problem here is that I don’t really know how to explicitly load a buffer into SRAM.
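For readers following along, here is a minimal sketch of what option 2 could look like: a table placed with DSY_SDRAM_BSS, filled once at startup, and read with linear interpolation. This is not the code from the post. The names sine_table, FillSineTable and TableOsc are made up for illustration, and the empty fallback definition of DSY_SDRAM_BSS is only there so the sketch builds off-target.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// libDaisy defines DSY_SDRAM_BSS; this empty fallback is an assumption that
// only lets the sketch compile on a desktop machine.
#ifndef DSY_SDRAM_BSS
#define DSY_SDRAM_BSS
#endif

constexpr std::size_t kTableSize = 1024;  // size mentioned in the post
static_assert((kTableSize & (kTableSize - 1)) == 0, "size must be a power of two");

// One extra guard sample so interpolation can always read index + 1.
float DSY_SDRAM_BSS sine_table[kTableSize + 1];

// A BSS section carries no initial data, so the table is filled at startup.
void FillSineTable() {
    const double two_pi = 6.283185307179586;
    for (std::size_t i = 0; i <= kTableSize; ++i) {
        sine_table[i] = static_cast<float>(
            std::sin(two_pi * static_cast<double>(i) / kTableSize));
    }
}

// Hypothetical oscillator: phase accumulator plus linear interpolation.
// Assumes 0 <= hz < sample_rate.
class TableOsc {
  public:
    void Init(float sample_rate) {
        assert(sample_rate > 0.0f);
        sample_rate_ = sample_rate;
        phase_ = 0.0f;
        SetFreq(440.0f);
    }

    void SetFreq(float hz) { inc_ = hz / sample_rate_; }

    float Process() {
        const float pos = phase_ * static_cast<float>(kTableSize);
        const float base = std::floor(pos);
        const float frac = pos - base;
        // Mask guards against pos rounding up to exactly kTableSize.
        const std::size_t idx = static_cast<std::size_t>(base) & (kTableSize - 1);
        const float out =
            sine_table[idx] + frac * (sine_table[idx + 1] - sine_table[idx]);
        phase_ += inc_;
        if (phase_ >= 1.0f) phase_ -= 1.0f;
        return out;
    }

  private:
    float sample_rate_ = 48000.0f;
    float phase_ = 0.0f;  // normalized phase in [0, 1)
    float inc_ = 0.0f;    // phase increment per sample
};
```

Call FillSineTable() once before the audio callback starts, then call Process() once per sample. Swapping the sine for another single-cycle waveform only changes the fill function.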

How do you do that? What are the recommendations/best practices when working with lookup tables?

antisvin · October 8, 2021, 11:55am

At compile time you instantiate data that will be stored in flash. SDRAM is volatile, so you can’t store anything there until the program runs. I think this is fairly obvious.

There have been several topics here about this peripheral. They also added higher-level QSPI storage to libDaisy recently: Feature - Persistent Storage by stephenhensley · Pull Request #396 · electro-smith/libDaisy · GitHub

The current implementation has no flash wear leveling, so it is not a good fit if you need frequent writes. It’s fine for storing LUTs, but obviously you have to add some solution for generating them initially.

You can allocate an empty buffer if you define a section in the linker script and add a GCC attribute that places the buffer in that section. Those 2 changes are sufficient and that’s how DSY_SDRAM_BSS works.
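As a rough illustration of those two changes, the sketch below declares a buffer with a GCC section attribute and shows a matching linker-script fragment in a comment. The section name .lut_sram_bss, the macro LUT_SRAM_BSS, and the SRAM region name are placeholders, not names taken from libDaisy; check the linker script of your own project for the real memory regions.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical section name. It has to match an output section added to the
// linker script, for example (GNU ld syntax, inside SECTIONS):
//
//   .lut_sram_bss (NOLOAD) :
//   {
//       . = ALIGN(4);
//       *(.lut_sram_bss)
//       . = ALIGN(4);
//   } > SRAM
//
// where SRAM stands for whichever MEMORY region you choose.
#if defined(__GNUC__) && defined(__arm__)
#define LUT_SRAM_BSS __attribute__((section(".lut_sram_bss")))
#else
// Expands to nothing on desktop builds so the sketch still compiles.
#define LUT_SRAM_BSS
#endif

constexpr std::size_t kLutSize = 256;

// Uninitialized buffer placed in the custom section on ARM targets.
float LUT_SRAM_BSS lut[kLutSize];

// A NOLOAD section gets no startup initialization, so fill it before use.
// The ramp here is just placeholder content.
void InitLut() {
    for (std::size_t i = 0; i < kLutSize; ++i) {
        lut[i] = static_cast<float>(i) / static_cast<float>(kLutSize);
    }
}

// Bounds-checked read, mainly to make the usage explicit.
float LookUp(std::size_t i) {
    assert(i < kLutSize);
    return lut[i];
}
```

Because the section is NOLOAD, the buffer contents are undefined until InitLut() has run, which matches the behavior described for DSY_SDRAM_BSS above.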

If you want actual data from flash to be copied to a custom section, you’d also have to modify startup to do that. But it makes little sense on Daisy as its internal flash is so small.

You could use CMSIS function libDaisy/Drivers/CMSIS/DSP/Source/FastMathFunctions/arm_sin_f32.c at master · electro-smith/libDaisy · GitHub , but it actually uses a LUT too.

Also, it’s possible to generate a pair of sine/cosine functions as complex numbers without using sin/cos or precomputing their LUTs. But you still need tangent for oscillator’s phase increment in such case. Here’s a paper with some details: https://vicanek.de/articles/QuadOsc.pdf
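To make the idea concrete, here is a sketch of a recursive quadrature oscillator of that kind. The three-step update and the coefficients k1 = tan(w/2), k2 = sin(w) are my reading of the linked paper, so treat the details as an assumption and check them against the PDF; the class and method names are invented for this example. The update itself can be verified by hand: starting from (cos θ, sin θ) it yields (cos(θ + w), sin(θ + w)).

```cpp
#include <cassert>
#include <cmath>

// Sketch of a recursive quadrature oscillator: u_ tracks the cosine,
// v_ tracks the sine. Only tan() is needed when the frequency changes.
class QuadOsc {
  public:
    void Init(float sample_rate) {
        assert(sample_rate > 0.0f);
        sample_rate_ = sample_rate;
        u_ = 1.0f;
        v_ = 0.0f;
        SetFreq(440.0f);
    }

    void SetFreq(float hz) {
        const float w = 6.2831853f * hz / sample_rate_;  // radians per sample
        k1_ = std::tan(0.5f * w);
        k2_ = 2.0f * k1_ / (1.0f + k1_ * k1_);  // equals sin(w)
    }

    // Writes the current cosine/sine pair, then advances by one sample.
    void Process(float* cos_out, float* sin_out) {
        *cos_out = u_;
        *sin_out = v_;
        const float t = u_ - k1_ * v_;
        v_ += k2_ * t;
        u_ = t - k1_ * v_;
    }

  private:
    float sample_rate_ = 48000.0f;
    float u_ = 1.0f;
    float v_ = 0.0f;
    float k1_ = 0.0f;
    float k2_ = 0.0f;
};
```

Each step is a product of shears, so the update preserves area regardless of rounding in k1 and k2, which is one way to see why the magnitude does not drift the way a naive rotation does.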

donstavely · October 8, 2021, 4:56pm

You may need to use lookup tables or complex math to do band-limiting of arbitrary or discontinuous waveforms (like square or sawtooth waveforms) in order to avoid aliasing. But none of this is necessary for sines.

All you need is a state-variable oscillator. This is just two multiply-accumulates on the two state variables (integrators in the analog world). It runs like this:

    sin_SV += freq_coef * cos_SV;
    cos_SV += -freq_coef * sin_SV;

Where you initialize the state variables and pre-compute the coefficient:

    sin_SV = 0.0;
    cos_SV = 1.0;
    freq_coef = 2 * PI * freq_Hz / sample_rate;

This will run forever, creating perfect sine waves to full precision.

I am generating 96 double-precision sines simultaneously on my clonewheel, and the whole project consumes less than half of the Seed processor horsepower.
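For anyone who wants to try the two-line update above in isolation, here is a small sketch that wraps it in a class. The class name SvSineOsc and the Magnitude() helper are made up for this example; the update and the initial values are the ones given in the post.

```cpp
#include <cassert>
#include <cmath>

// Sketch of the state-variable sine oscillator at a fixed frequency.
class SvSineOsc {
  public:
    void Init(double freq_hz, double sample_rate) {
        assert(sample_rate > 0.0);
        sin_sv_ = 0.0;
        cos_sv_ = 1.0;
        freq_coef_ = 2.0 * 3.141592653589793 * freq_hz / sample_rate;
    }

    // Returns the current sine sample and advances both state variables.
    double Process() {
        const double out = sin_sv_;
        sin_sv_ += freq_coef_ * cos_sv_;
        cos_sv_ -= freq_coef_ * sin_sv_;  // uses the already updated sin_sv_
        return out;
    }

    // Length of the (sin, cos) vector; useful for watching amplitude drift.
    double Magnitude() const {
        return std::sqrt(sin_sv_ * sin_sv_ + cos_sv_ * cos_sv_);
    }

  private:
    double sin_sv_ = 0.0;
    double cos_sv_ = 1.0;
    double freq_coef_ = 0.0;
};
```

Two details worth knowing when experimenting: the second line must use the updated sin_SV, otherwise the amplitude grows on every step, and with the small-angle coefficient the output comes out slightly higher than freq_Hz (using 2 * sin(PI * freq_Hz / sample_rate) as the coefficient removes that offset).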

antisvin · October 9, 2021, 7:23am

I would say it’s a phasor rotating in the complex plane, but there is more than one way we can describe this strange beast called “elephant”. There’s a more detailed explanation here

In practice an oscillator using exactly this approach would explode or collapse to zero due to accumulated numeric error. To avoid that, you’d have to normalize the magnitude or reset it to zero periodically. The paper mentioned in my previous post describes an approach that is stable without correcting the magnitude, even if computed in single precision and with FM.
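One way the "normalize magnitude periodically" fix could look is sketched below. Everything here is illustrative: the Phasor struct, the helper names, the renormalization period, and the small deterministic noise generator that stands in for a jittery frequency control are all assumptions of this example, not code from the thread.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

struct Phasor {
    double s = 0.0;  // sine state
    double c = 1.0;  // cosine state
};

// One step of the iterative update with a per-sample coefficient.
inline void Step(Phasor& p, double coef) {
    p.s += coef * p.c;
    p.c -= coef * p.s;
}

// Scale the state back onto the unit circle.
inline void Renormalize(Phasor& p) {
    const double mag = std::sqrt(p.s * p.s + p.c * p.c);
    assert(mag > 0.0);
    p.s /= mag;
    p.c /= mag;
}

// Runs the oscillator with a noisy frequency around 1 kHz and returns the
// largest deviation of the magnitude from 1.0. A renorm_period of 0 disables
// the correction so both cases can be compared.
double WorstDeviation(long samples, int renorm_period) {
    Phasor p;
    std::uint32_t state = 12345u;  // simple LCG as a stand-in for ADC noise
    double worst = 0.0;
    for (long n = 0; n < samples; ++n) {
        state = state * 1664525u + 1013904223u;
        const double noise = (state >> 8) / 16777216.0;  // in [0, 1)
        const double freq_hz = 990.0 + 20.0 * noise;
        Step(p, 2.0 * 3.141592653589793 * freq_hz / 48000.0);
        if (renorm_period > 0 && (n + 1) % renorm_period == 0) {
            Renormalize(p);
        }
        const double dev = std::fabs(std::sqrt(p.s * p.s + p.c * p.c) - 1.0);
        if (dev > worst) worst = dev;
    }
    return worst;
}
```

Comparing WorstDeviation(n, 64) against WorstDeviation(n, 0) for a large n gives a feel for how much the correction matters. The square root costs cycles, which is the trade-off against a design that is stable without correction.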

donstavely · October 9, 2021, 5:40pm

Very interesting article, @antisvin. As I have learned many times over on this forum, the more I think I know about a subject the more there is to know out there. It invariably takes things to another level!

A couple of interesting experiences on this exact issue:

I coded a monophonic FM guitar synth on the FV-1 using SV oscillators several years ago. I thought it might stay stable, given that the FV-1 uses fixed-point math. But indeed when the oscillator frequency was modulated, its amplitude changed. I had to add code to reset the states on zero-crossings. It worked fine, but it ate valuable cycles on that limited processor.
The SV oscillators on my Daisy clonewheel project are stable, even though they are floating point. I have let my program run for days without re-initializing with no issues. I suspect that the guys who designed the IEEE floating point specification were careful to minimize the accumulation of rounding errors. The frequencies are fixed in this case, and I would not be surprised if they blew up if I did modulate them.

All this just reinforces my belief that some form of iterative sine oscillator will be more accurate and will run faster than an interpolated lookup table implementation.

antisvin · October 9, 2021, 7:57pm

There’s no magic bullet in using doubles, it only increases the amount of time for the error to become noticeable. At the cost of decreased performance.

I think I’ve figured out why you haven’t seen the problem in the second case. It’s working well on constant or relatively stable frequency (i.e. if controlled by MIDI). When you modulate frequency and especially if you have some noise (from ADC), it quickly falls apart.

Example code for experiments (simulating clean/noisy frequency modulation):

    #include <math.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main() {
        double sin_SV = 0.0;
        double cos_SV = 1.0;

        double sample_rate = 48000.0;

        // Largest deviation from 1.0 magnitude seen so far
        double err = 0.0;
        double freq_Hz = 1000.0;
        while (1) {
            // Increase frequency by a constant (#if 1) or by a random amount
            // in the 0..0.001 Hz range (#if 0). The random option will
            // obliterate the oscillator; disabling both keeps it stable.
    #if 1
            freq_Hz += 0.001;
    #else
            freq_Hz += (double)rand() / RAND_MAX / 1000;
    #endif

            if (freq_Hz > 10000)
                freq_Hz -= 9990;

            double freq_coef = 2 * M_PI * freq_Hz / sample_rate;
            // Vector magnitude (sqrt rather than sqrtf keeps double precision)
            double mag = sqrt(sin_SV * sin_SV + cos_SV * cos_SV);

            // Print if a new max deviation is reached
            // (fabs, because abs is the integer version in C)
            if (fabs(mag - 1.0) > err) {
                err = fabs(mag - 1.0);
                printf("Err = %f%% mag = %f\n", err * 100, mag);
            }

            sin_SV += freq_coef * cos_SV;
            cos_SV += -freq_coef * sin_SV;
        }
    }

From my experience, an oscillator using the CMSIS sine function (with a LUT) is about as fast as an iterative oscillator. If you need both sine and cosine outputs, an iterative oscillator makes more sense (as long as you use a stable design!), otherwise I wouldn’t bother. As for double precision, I’m not using it on microcontrollers.

brbrr · October 29, 2021, 9:16am

Wow, that’s lots of good stuff! Thanks!

I used sines just as an example, what I really want is an efficient oscillator with the ability to switch between waveforms. I am definitely interested in band-limiting as well. Also, the idea of morphing between waveforms sounds good. With LUTs it is simple enough, not sure about the iterative approach.
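A sketch of what morphing with LUTs could look like: read two single-cycle tables at the same phase and crossfade between the results. The function names and the morph parameter (0 selects the first table, 1 the second) are assumptions of this example, and it does nothing about band-limiting.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Reads a single-cycle table at a normalized phase in [0, 1) using linear
// interpolation, wrapping around at the end of the table.
inline float ReadTable(const float* table, std::size_t size, float phase) {
    assert(table != nullptr && size > 0);
    assert(phase >= 0.0f && phase < 1.0f);
    const float pos = phase * static_cast<float>(size);
    const float base = std::floor(pos);
    const float frac = pos - base;
    // The modulo also covers pos rounding up to exactly size.
    const std::size_t i0 = static_cast<std::size_t>(base) % size;
    const std::size_t i1 = (i0 + 1) % size;
    return table[i0] + frac * (table[i1] - table[i0]);
}

// Crossfades between two tables of equal size.
// morph = 0 returns table a, morph = 1 returns table b.
inline float MorphTables(const float* a, const float* b, std::size_t size,
                         float phase, float morph) {
    const float ya = ReadTable(a, size, phase);
    const float yb = ReadTable(b, size, phase);
    return ya + morph * (yb - ya);
}
```

With more than two waveforms, the same function can be applied to the two tables on either side of the morph position, using the fractional part of the position as morph.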

Regarding the wavetable sizes, what is the appropriate size for the oscillator table? I used 1024 samples, but I wonder if it is enough, or should I make it smaller.

antisvin · October 29, 2021, 10:50am

There’s no universal answer to that as it depends on several factors:

interpolation algorithm used
waveform complexity
typical playback frequency

1k sounds ok as a starting point. If you don’t have a decent solution for bandlimiting, you might want to use shorter tables for oscillators to avoid aliasing at high frequencies. Or just stick to 256 samples for compatibility with WaveEdit export.
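To put a number on the first factor, the sketch below measures the worst-case error of linear interpolation on a sine table of a given size. It is only a measurement helper with made-up names, it runs on a desktop machine rather than on the Daisy, and it says nothing about aliasing, which depends on the waveform and the playback frequency.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Returns the largest absolute difference between a linearly interpolated
// sine table of the given size and the exact sine, probed at evenly spaced
// fractional positions inside every table cell.
double MaxLinearInterpError(std::size_t size, std::size_t probes_per_cell = 8) {
    assert(size >= 2 && probes_per_cell >= 1);
    const double two_pi = 6.283185307179586;

    // One extra entry so the last cell can interpolate toward the wrap point.
    std::vector<double> table(size + 1);
    for (std::size_t i = 0; i <= size; ++i) {
        table[i] = std::sin(two_pi * static_cast<double>(i) / size);
    }

    double max_err = 0.0;
    for (std::size_t i = 0; i < size; ++i) {
        for (std::size_t k = 0; k < probes_per_cell; ++k) {
            const double frac = static_cast<double>(k) / probes_per_cell;
            const double approx = table[i] + frac * (table[i + 1] - table[i]);
            const double exact = std::sin(two_pi * (i + frac) / size);
            const double err = std::fabs(approx - exact);
            if (err > max_err) max_err = err;
        }
    }
    return max_err;
}
```

For a sine the result should come out near 7.5e-5 for 256 points and near 4.7e-6 for 1024 points, since the error of linear interpolation shrinks with the square of the table size. Waveforms with sharper features need more points or a better interpolator for the same error.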

brbrr · August 31, 2022, 8:34am

Yes, I’ve been using it for quite some time. I went ahead and submitted a PR to DaisySP: Add wavetable oscillator by brbrr · Pull Request #176 · electro-smith/DaisySP · GitHub

antisvin · August 31, 2022, 6:19pm

Hey Yaroslav,
That’s a very nice PR. Not sure if that’s useful, but from a quick look I would suggest considering converting the struct Tables into a template class (or template struct which is a rare beast, but still valid C++). The idea would be to provide FFT implementation as a template parameter instead of making it hardcoded in struct methods. That would be specifically useful to use CMSIS based implementation on ARM MCUs.
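A rough sketch of that suggestion, to show the shape of the idea rather than the PR's actual code: the transform is passed as a template parameter, so a naive DFT can be used on a desktop and a CMSIS-backed one on ARM. BandlimitedTables and NaiveDft are hypothetical names, the Forward/Inverse interface is invented for this example, and the PR's struct Tables is not reproduced here.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical transform policy: a naive O(n^2) DFT, for illustration only.
// A CMSIS-based policy would expose the same two static functions.
struct NaiveDft {
    using Spectrum = std::vector<std::complex<double>>;

    static Spectrum Forward(const std::vector<float>& x) {
        const std::size_t n = x.size();
        Spectrum out(n);
        for (std::size_t k = 0; k < n; ++k) {
            std::complex<double> sum(0.0, 0.0);
            for (std::size_t t = 0; t < n; ++t) {
                const double angle = -6.283185307179586 *
                                     static_cast<double>(k * t) / static_cast<double>(n);
                sum += static_cast<double>(x[t]) * std::polar(1.0, angle);
            }
            out[k] = sum;
        }
        return out;
    }

    static std::vector<float> Inverse(const Spectrum& s) {
        const std::size_t n = s.size();
        std::vector<float> x(n);
        for (std::size_t t = 0; t < n; ++t) {
            std::complex<double> sum(0.0, 0.0);
            for (std::size_t k = 0; k < n; ++k) {
                const double angle = 6.283185307179586 *
                                     static_cast<double>(k * t) / static_cast<double>(n);
                sum += s[k] * std::polar(1.0, angle);
            }
            x[t] = static_cast<float>(sum.real() / static_cast<double>(n));
        }
        return x;
    }
};

// Table builder that takes the transform as a template parameter.
template <typename Transform>
struct BandlimitedTables {
    // Returns a copy of one waveform cycle with all harmonics above
    // max_harmonic removed.
    static std::vector<float> Limit(const std::vector<float>& cycle,
                                    std::size_t max_harmonic) {
        assert(!cycle.empty());
        auto spectrum = Transform::Forward(cycle);
        const std::size_t n = spectrum.size();
        for (std::size_t k = 0; k < n; ++k) {
            // Bins above n/2 mirror the lower harmonics.
            const std::size_t harmonic = (k <= n / 2) ? k : n - k;
            if (harmonic > max_harmonic) spectrum[k] = 0.0;
        }
        return Transform::Inverse(spectrum);
    }
};
```

Usage would look like BandlimitedTables<NaiveDft>::Limit(cycle, 16), and switching to another transform only changes the template argument.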