Programming audio DSP with multiple cores

I have been moving DaisySP to “ordinary” computers (see my Openframeworks “port”: DStudio - a music studio for openFrameworks/rtAudio - v1.1 released). I have also ported it to rtAudio: DaisySP and RtAudio

When running on an SBC (Rock Pi 4 in my case), I am maxing out the one core it runs on.

I got a Rock 5B as a gift, so I want to run DaisySP on it. :slight_smile: But this time I would like to use as many cores as possible.

Does anyone here have experience in programming audio in multi-core applications, or could point me to some good reads?

Thanks in advance!

This document contains a set of “design patterns” for real-time systems, particularly for computer music systems. We see these patterns often because the problems that they solve come up again and again. Hopefully, these patterns will serve as more than just a set of canned solutions. It is perhaps even more important to understand the underlying problems, which often have subtle aspects and ramifications. By describing these patterns, we have tried to capture the problems, solutions, and a way of thinking about real-time systems design. We welcome your comments and questions.

Mostly user level information, but occasionally dips into programming and OS-multicore issues in Debian Linux:
Some good pointers:

There has been some discussion of multi-core issues on the Zynthian forum, although Zynthian mostly avoids the issue by leaving it up to the ‘engines’, i.e. the synths running in Zynthian.

When running on the Rock Pi 4 will DaisySP be running on the ‘bare metal’ like Daisy or will it be under an OS?

I have found an Application Note from NXP titled “Embedded Multicore: An Introduction” useful for defining terms and concepts. I don’t seem to be able to put a link here, but a Google search on the title finds it.

Thanks a lot @tunagenes!

Interesting reads, although not that multicore-oriented.

Yeah, I can understand that solution.

On the Rock Pi 4 it is running under Debian. I will probably do the same on the Rock 5. Although I would like to run bare metal, I think this is beyond me right now. The only SBC I know of that has documentation on bare metal programming is the Raspberry Pi.

That search phrase revealed a lot of useful texts, thanks!

The Rock 5 has Quad A76 + Quad A55. If I only focus on the A76s, I can possibly divide my tasks into 4 parts, but I need a mechanism for summing the results of the (possible) threads, which are audio data. Or is there some other way of designing the algorithm? My scenario is running a lot (10+) of synthesizers (C++ objects with oscillators, EGs, filters, LFOs etc).

There are multicore-friendly use cases that should require little effort to run in parallel - polyphony and multi-channel audio processing.

For situations where you just have sequential DSP code that can’t be split into independent blocks, you might have better results from using a single core for audio and writing vectorized code for Neon on that processor. Unfortunately DaisySP is not a good candidate for something like this, as it has very little block-based processing code that is easy to vectorize.
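To illustrate what “block-based code that is easy to vectorize” can look like, here is a minimal sketch. `MixBlock` is a hypothetical helper, not part of DaisySP, and `__restrict` is a compiler extension (GCC, Clang, MSVC) rather than standard C++. Whether the loop is actually compiled to Neon instructions depends on the compiler and flags, so treat this as an assumption to verify, not a guarantee.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical block-based mix stage: out[i] = a[i] * gain_a + b[i] * gain_b.
// The loop has no branches and carries no state from one sample to the next,
// which is the shape of code an optimizing compiler can usually auto-vectorize.
// __restrict (a compiler extension) promises that the buffers do not overlap.
void MixBlock(const float* __restrict a, const float* __restrict b,
              float* __restrict out, std::size_t n,
              float gain_a, float gain_b) {
    assert(a != nullptr && b != nullptr && out != nullptr);
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = a[i] * gain_a + b[i] * gain_b;
    }
}
```

By contrast, code that produces one sample per call, with state updated inside every call, gives the compiler much less to work with, because each sample depends on the one before it.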

Yes, I can understand this. But even if you have a “simple case”, where for example I run X synths on one core, and Y synths on the other, how do I combine/mix them; because I do not know the timing etc…

Seems like this is some uncharted territory, so maybe I will just come up with something on my own! :smiley:

Anyway, thank you @antisvin! :slight_smile:

I Google’d ‘multithreaded audio processing’ and got this as the top result - ideas of some things to consider.

The questions you ask are not unique to audio - effectively this is just “how do I run things in parallel and merge their results”. This is not uncharted territory, but it can easily become a minefield.

For audio code that likely means basically splitting your DSP code into a graph with separate nodes and scheduling them between different threads. Feedback can be a problem here, plus you need to buffer data passed between graph nodes if you’re using buffer based processing. And then consider effects of this on CPU caches, dynamic CPU scaling and diminishing returns from adding more threads.
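For the simplest case in this thread (independent synths with no feedback between them), the audio block itself can serve as the synchronization unit: every worker renders the same block into its own private buffer, and the buffers are summed only after all workers have finished. The following is a minimal sketch under that assumption. `Synth` and `RenderBlockParallel` are hypothetical names, not DaisySP or RtAudio API. It creates threads and allocates buffers on every block purely for brevity; a real-time implementation would more likely keep persistent worker threads and preallocated buffers.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for a synth object: anything that can add one block
// of samples into a buffer. Here it emits a constant so results are checkable.
struct Synth {
    float level;
    void Render(float* out, std::size_t n) {
        for (std::size_t i = 0; i < n; ++i) out[i] += level;
    }
};

// Render one audio block. Each group of synths is rendered by its own thread
// into a private buffer, so no locking is needed while rendering.
void RenderBlockParallel(std::vector<std::vector<Synth>>& groups,
                         float* out, std::size_t n) {
    assert(out != nullptr);
    std::vector<std::vector<float>> partial(groups.size(),
                                            std::vector<float>(n, 0.0f));
    std::vector<std::thread> workers;
    for (std::size_t g = 0; g < groups.size(); ++g) {
        workers.emplace_back([&groups, &partial, g, n] {
            for (Synth& s : groups[g]) s.Render(partial[g].data(), n);
        });
    }
    // Joining is the sync point: after it, every partial buffer is complete.
    for (std::thread& w : workers) w.join();

    // Mix: sum the private buffers sample by sample into the output block.
    for (std::size_t i = 0; i < n; ++i) {
        float sum = 0.0f;
        for (const std::vector<float>& p : partial) sum += p[i];
        out[i] = sum;
    }
}
```

Because all workers process the same block before anything is mixed, the relative timing of the threads inside a block does not matter; what matters is that the slowest worker finishes before the audio callback's deadline.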

Maybe it’s worth researching some discussions on the VCV Rack forums from around the time its engine was made MT-aware. At the very least you can find some benchmarks that confirm Amdahl’s law. There was also an interesting proposal with some alternative approaches to MT.
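For reference, Amdahl's law gives the best-case speedup $S(n)$ on $n$ cores when a fraction $p$ of the work can run in parallel and the rest stays serial:

$$S(n) = \frac{1}{(1 - p) + \frac{p}{n}}$$

As a worked example with assumed numbers (not measurements): if $p = 0.9$ and $n = 4$, then $S(4) = 1 / (0.1 + 0.225) = 1 / 0.325 \approx 3.08$, and even with unlimited cores the speedup cannot exceed $1 / (1 - p) = 10$. Any serial step, such as the final mix or the audio I/O, counts toward the $1 - p$ part.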

Thank you all for your answers, I have gotten a lot of good ideas!