Can I bring the Daisy to its knees?

I’m in the final throes of building a synth with a 16-step sequencer (using a Daisy Seed for sound generation, of course!) and struggling to get all the processor hardware into my custom-built ceramic case. I’d like to get an opinion on whether it’s feasible to try replacing much of my current processor network with the Daisy.

Currently, I’m using 3 Arduino Nanos to process input controls (36 rotary encoders, 14 potentiometers, 22 3-way switches - most of the hardware attached to 7 GPIO boards communicating via I2C), feeding into a reduced-footprint Arduino Mega which runs the sequencer, controls the step LEDs, and generates MIDI for the Daisy, which is running a duophonic sound generator. The Mega also has an SD board attached for patch storage/retrieval, and communicates with a NodeMCU running a Blynk controller to allow parameter display/update on an iPhone.

What I’d really like is to replace the 6 processors with 2 - the Daisy and the NodeMCU. I/O pins are not an issue: I’d connect the daisy-chained GPIO boards via a single I2C port, have a serial connection to the NodeMCU, an SPI connection to the SD board, and two digital pins for the LEDs.

I’m concerned about processor load, however. There’s no way I could do all this processing on an AVR board (hence 4 of them), but the STM32 in the Daisy is much more powerful. Is it worth trying to cram all this functionality onto the Daisy, or am I delusional? Thanks in advance for any insight!

@Elby, I am just now at a point in my learning curve where I can give you some insight.

As background, I have been fascinated with Hammond organs for years. I built a digital/analog “clonewheel” organ that used one Atmel processor per note of polyphony (I did six), and one more for keyboard and note processing. The processors were 20MHz single cycle 8/16-bit machines, but cost less than a buck. The six “tone generator” chips each output nine simultaneous sine waves as PDM (Pulse Density Modulation) bit streams. These note harmonics were filtered and summed externally with the other tone chips and scaled with “drawbar” pots. Very much hand-crafted assembly language programming, counting every cycle. It worked well but still wound up on the shelf.

Now the contrast: I am in the process of recreating a clonewheel organ on my Daisy as a processing-power benchmark. I generate 96 simultaneous sine waves in semitone steps, accurate to Hammond’s tonewheel gear ratios. I process incoming MIDI notes with 16-way polyphony, switch in their nine proper harmonic tones, and apply the drawbar settings to generate the output, all at the 48kHz sample rate. A little processor-utilization blinking-LED routine (described elsewhere on the forum) shows that I am using about one quarter of the available horsepower so far!
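In rough outline, the structure is something like the following - a simplified, untested sketch with made-up names, using equal-tempered wheel frequencies instead of the true gear ratios:

#include <math.h>

static const int   kNumWheels   = 96;
static const int   kNumDrawbars = 9;
static const float kSampleRate  = 48000.0f;

// Semitone offsets of the nine drawbars relative to the 8' fundamental
// (16', 5-1/3', 8', 4', 2-2/3', 2', 1-3/5', 1-1/3', 1').
static const int kDrawbarOffsets[kNumDrawbars]
    = {-12, 7, 0, 12, 19, 24, 28, 31, 36};

static float phase[kNumWheels]; // phase accumulators, 0..1
static float inc[kNumWheels];   // per-sample phase increments
static float sine[kNumWheels];  // current output of every wheel

void InitWheels(void)
{
    for(int w = 0; w < kNumWheels; w++)
    {
        // equal-tempered approximation; wheel 0 = C1 (~32.7Hz)
        float freq = 32.7f * powf(2.0f, w / 12.0f);
        inc[w]   = freq / kSampleRate;
        phase[w] = 0.0f;
    }
}

// Runs once per sample, inside the audio callback: advance all 96 wheels.
void TickWheels(void)
{
    for(int w = 0; w < kNumWheels; w++)
    {
        phase[w] += inc[w];
        if(phase[w] >= 1.0f)
            phase[w] -= 1.0f;
        sine[w] = sinf(phase[w] * 6.2831853f);
    }
}

// Sum the nine harmonics of one held note, scaled by the drawbar levels.
float NoteOutput(int wheel, const float drawbars[kNumDrawbars])
{
    float out = 0.0f;
    for(int d = 0; d < kNumDrawbars; d++)
    {
        int w = wheel + kDrawbarOffsets[d];
        // crude octave foldback at both ends of the wheel range
        while(w < 0)           w += 12;
        while(w >= kNumWheels) w -= 12;
        out += drawbars[d] * sine[w];
    }
    return out;
}

Each of the (up to 16) active notes calls NoteOutput() with its wheel index, and the results are summed into the output sample.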

As an added bonus, that functionality took a few days and a few dozen lines of code, once I got the damn toolchain set up and running right. The multi-micro software took weeks to code and fine-tune. Not to mention the time and effort to build the hardware.

So, the short answer is yes it is worth it, and no you are not delusional. :+1:


Nice. Once I finish my VA+FM synth, a clonewheel is next on my list, among a few others. Though I was thinking of going the “full polyphony” route, where all 96 oscillators are always running, with simulated busbars, leakage, etc. - after all, percussion is supposed to be monophonic, etc. :grinning:


Thanks very much for the detailed answer and encouragement, Don. Your clonewheel sounds awesome! I’m starting to do some porting and experimenting. I’ll be sure to let folks know how it goes :grinning:

Well, I’m about two weeks into this porting effort, and it’s certainly been an adventure! As others have noted on the forum, the lack of documentation and annotated examples, particularly for I/O and internal functions (such as timing and logging), makes the learning curve particularly painful. But, I think that within a couple of days I’ll either have a functional prototype (still missing a bunch of functionality such as LED strings, SD card, Blynk) or know it’s time to let this go.

One thing that’s become clear is that my naive program structure is not going to fly:

Loop {
    // Process I/O (Mux boards, switches, encoders, etc)
    // Update synth parameters (osc freqs, filter params, etc)
    // Compute audio samples
}

The I/O processing takes much too long - I’ll need to compute audio samples more often. I could intersperse calls to the “Compute audio samples” routine amid more atomic-level I/O routines in the main loop, but I would love a more principled approach. Other folks must have this problem; how do you solve it? Thanks for any feedback/pointers!

The key is to do only the things that need to happen at the sample rate (default 48kHz) inside the callback loop. Stuff like pots, switches, LEDs, and setting parameters for LFOs, filters, etc. can happen at the much slower callback rate (default 1kHz). All of the examples that process audio have this same structure:

void AudioCallback(float *in, float *out, size_t size)
{
    // At sample rate:
    for(size_t i = 0; i < size; i += 2)
    {
        // sum the interleaved stereo input down to mono
        float sig = in[i] + in[i + 1];
        // process audio rate signals
        out[i + 1] = out[i] = sig;
    }
    // At callback rate:
    ReadControls();
    SetSettings();
    // etc.
}

int main(void)
{
    HardwareInit();
    StartStuff();
    StartAudio(AudioCallback);
    for( ; ; ) { }
}

I found it easiest as a noob to start from a simple example program that I could understand, and then modify and build on it. I got all the way to a functional clonewheel project this way.


Hi!
Thx for all those explanations. Really useful!
May I ask what the difference would be (in terms of performance and best practice) between your implementation and something like:

void AudioCallback(float *in, float *out, size_t size)
{
  // At sample rate:
  for(size_t i = 0; i < size; i += 2)
  {
    // sum the interleaved stereo input down to mono
    float sig = in[i] + in[i + 1];
    // process audio rate signals
    out[i + 1] = out[i] = sig;
  }
  // At callback rate:
  doSomethingElse();
}

int main(void)
{
  HardwareInit();
  StartStuff();
  StartAudio(AudioCallback);
  for( ; ; ) {
    ReadControls();
    SetSettings();
    // etc.
  }
}

Doing controls and settings in the main forever loop just means that you will be doing them continuously whenever you are not in the callback function. It should work fine, but reading pots and switches and setting LEDs a thousand times a second is plenty. So I guess it is more of a style thing.
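If the continuous spinning bothers you, you could also throttle the slow work to roughly the callback rate. An untested sketch - GetNowMs() here is a stand-in for whatever millisecond tick your setup provides:

int main(void)
{
    HardwareInit();
    StartStuff();
    StartAudio(AudioCallback);
    uint32_t last = GetNowMs(); // hypothetical millisecond tick
    for( ; ; )
    {
        uint32_t now = GetNowMs();
        if(now != last) // run the slow work at most once per millisecond
        {
            last = now;
            ReadControls();
            SetSettings();
        }
    }
}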

If you do the UI control polling from the main loop, that doesn’t decrease your overall CPU load (you’re still doing the same processing steps after all).

BUT it makes your audio callback more predictable and effectively increases the amount of audio processing you can do without crackling.

E.g. you may have a couple of conditional statements in your audio code that look like this:

if (newNoteShouldBeTriggered) 
{ 
    restartEnvelope();
    // more stuff here
}

These blocks will likely only be called once every 1000 audio blocks. The same kind of conditionals will also be in your UI control polling code.

When you’re close to the CPU limit, many notes hitting at the same time AND some UI activity may result in your audio callback exceeding the available processing time. In a case like this, shifting your UI control code to the main loop makes the audio callback a little more predictable in terms of the time it takes to complete, simply because the additional UI control code doesn’t have to be completed within the audio block.

Effectively that means that you can spread the processing load of your UI control code so that it fills the gaps between two adjacent audio callbacks.


Many thx @donstavely & @TheSlowGrowth for the explanations :+1:

I would like to know a bit more about the difference between those possibilities: the tasks’ priorities, the time available in each “loop”, and the scheduling strategy between tasks.

Does the Daisy work a little like FreeRTOS, or does it have another way of managing the scheduling of tasks?

Is there documentation available somewhere?

In an RTOS, you would have actual threads, and the processor switches between them to realise multitasking. On the Daisy, there are no threads, but you can still do multitasking with interrupts.

The main() function is a little bit like an “idle thread”, because it is executed whenever nothing more urgent needs to be done. Then there are multiple interrupt sources that can trigger an interrupt service routine (ISR) via an interrupt request (IRQ). These ISRs interrupt the main function, do their thing, and return when they’re done. ISRs typically serve a peripheral in the chip, e.g. outputting data to a serial bus or reading the result of an A/D conversion. ISRs can be nested, meaning that they have priorities and one ISR can interrupt another one.
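On an STM32 those priorities live in the NVIC. Purely as an illustration (the actual IRQ names depend on which peripherals are in use, and libDaisy configures all of this for you):

// lower number = higher preemption priority
HAL_NVIC_SetPriority(DMA1_Stream0_IRQn, 0, 0); // e.g. the audio DMA: most urgent
HAL_NVIC_SetPriority(USART1_IRQn, 2, 0);       // e.g. a MIDI UART: can wait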

On the Daisy, most of the processing load lies in the calculation of the audio samples. This is done in the AudioCallback, which is an ISR - specifically, the ISR that’s triggered when the DMA needs more samples to write to the audio codec. The task of calculating the audio samples must be done within the time it takes the DMA to write one block of audio samples to the codec. If the AudioCallback doesn’t finish within this time frame, the DMA won’t have data to write to the codec and your audio will stutter and glitch out.
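To put a number on it, with the defaults quoted earlier (48kHz sample rate, 1kHz callback rate):

block_size  = sample_rate / callback_rate = 48000 / 1000 = 48 samples
time_budget = block_size / sample_rate    = 48 / 48000   = 1ms per callback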

In theory, nothing prevents you from doing everything in the AudioCallback - from calculating audio samples to scanning UI controls, to writing files to an SD card. If you can ensure that you’re able to complete all the things before the DMA runs out of samples to write to the codec, then you’re fine doing all that in the AudioCallback.
But in practice, most of these things take more or less time to complete depending on the circumstances. SD card access is a particularly bad case because it may block for a long time while the SD card commits data to its memory. You wouldn’t want that to block the delivery of fresh audio samples to the codec. Conversely, if writing to the SD card takes 10ms longer, you’d never notice - it’s not real-time critical.

That’s why you should consider how real-time critical your tasks really are, and how much the time to complete them varies.

IMO, calculating audio samples is the ONLY thing that should actually happen in the AudioCallback, simply because it’s the ONLY thing that may not be delayed. All other things (processing user input, reading/writing files, updating LEDs, etc.) can wait when the system is under higher load than usual. These things should be done from the main() function, where they can be interrupted at any time. Effectively, the main function fills the gaps between your AudioCallback and other ISRs. That’s how you give priority to the things that have a real-time constraint.
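For the SD card in particular, that could mean setting a request flag when a save is triggered and servicing it from main(). A rough, untested sketch with assumed names (WritePatchToSdCard() is hypothetical):

volatile bool savePatchRequested = false; // set by your MIDI/UI handling code

int main(void)
{
    HardwareInit();
    StartAudio(AudioCallback);
    for( ; ; )
    {
        if(savePatchRequested)
        {
            savePatchRequested = false;
            WritePatchToSdCard(); // may block for many ms - harmless here
        }
    }
}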

There are situations where you can still do non-audio things in the AudioCallback, e.g. writing or reading a GPIO pin. That is because such a task is always very fast to complete and doesn’t impact the real-time capability of the AudioCallback much. You can see that the Daisy platform code (petal, patch, field, …) scans its UI inputs in the AudioCallback, for example. It’s not a super clean design, but in this case the effect is negligible, and it makes things a little easier to program for beginners.


Thanks for the excellent explanation, @TheSlowGrowth. It makes good sense to keep only the true sample rate processing in the callback. Slower, event-driven things like MIDI processing can be in the main loop.

Rather than spinning there reading pots and switches and updating LEDs continuously, maybe we should set a flag at the end of the callback, and then test and clear it in the main loop. Then these updates will happen once per callback, without being in the callback function itself. Does this make sense?
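Something like this untested sketch, reusing the assumed helper names from the examples above (the flag is volatile so main() is guaranteed to see the ISR’s writes):

volatile bool controlsDue = false;

void AudioCallback(float *in, float *out, size_t size)
{
    // ... audio-rate processing only ...
    controlsDue = true; // request one control scan per audio block
}

int main(void)
{
    HardwareInit();
    StartAudio(AudioCallback);
    for( ; ; )
    {
        if(controlsDue)
        {
            controlsDue = false; // clear, then do the slow work
            ReadControls();
            SetSettings();
        }
    }
}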


Many thx for all the explanations @TheSlowGrowth !