SDRAM memory management for multifx

kenjib · March 13, 2023, 4:21am

Hello everyone,

I was wondering if anyone with more experience had a recommendation for memory management for me. I am making a synth/multi-fx unit. Probably the closest comparison would be something like the Empress Zoia, though I’m not so proud as to assume it will be anywhere near as ambitious or cool as that. Heh – but something kind of modular with both FX and synth components that can make lots of ambient weirdness for both internal synth sounds and guitar/mic inputs.

So one of the key hurdles I am currently facing is SDRAM management. SRAM runs out really quickly, and even something like the DaisySP library’s reverb algorithm might need to go to SDRAM when running a full synth alongside. All audio buffers will need to go there too – so anything delay related.

I’m doing a few weird things with delay though, so there might possibly be multiple delay buffers running concurrently. For example, there could be a heavily modulated “FX” type of delay running into a more standard multi-tap delay, and all of that could be feeding into a looper buffer. That’s just an example but shows how three different delays could be running at once. I have some ideas for multiple simultaneous looper buffers too, and that could get pretty SDRAM intensive. Since I want it to be modular, at compile time I do not know how many delays will be running at a given time, nor of what type, but all of them probably need the buffers stored in SDRAM.

That brings us to the lack of dynamic memory allocation for SDRAM. I have read an earlier thread where people seemed pretty united in feeling like dynamic allocation is bad for this type of programming, and it’s not available anyway, so now I need to come up with a flexible memory system to manage this kind of modular setup. Some options:

Statically allocate a really large chunk of SDRAM as a general delay buffer, and write my own code to manage allocating chunks of this to various algorithms as needed. There is a lot of potential for some really ugly errors, issues with available memory fragmentation etc. Ultimately just having malloc/free available for SDRAM in the first place would probably be a lot better than this option. This seems like a total headache.
Statically allocate separate buffers for various effects and only allow each effect to be used one-at-a-time. The drawback here is I am wasting a lot of memory on effects that aren’t being used and I might eventually hit limits on what type and how many algorithms I can add to the unit as the pre-allocated memory uses up all of the available SDRAM. Something like a stereo looper could eat up a lot by itself, for example. This seems both wasteful and limiting.
Statically allocate a number of delay buffers of fixed sizes, and “loan” them out to various effects on request. Maybe there would be three different sizes: small for stuff like chorus, detune and slapbacks; medium for typical delay type fx, and a larger one (or ones) for looper or Frippertronic type applications. So there would be three arrays storing N of each of these buffers, and if someone wants to load a certain effect, it will only succeed if one of the required buffers is available. This certainly simplifies memory management but it still wastes available memory on allocated buffers that aren’t being used, creating artificial ceilings on how many fx of a given type are available. In all it might be the most workable compromise though.

Anyways, I wasn’t sure if there were some people with experience with this kind of thing who might have some good solutions or advice for handling this situation. Maybe there is a better way to do it that I haven’t thought of?

Many thanks in advance for any advice…

-Kenji

Takumi_Ogata · March 14, 2023, 6:06pm

Hey Kenji!

An Empress Zoia-inspired Daisy multi-fx unit sounds fun!

Creating a resource/guide on SDRAM optimization (and buffer in general) is something I’ve been wanting to explore deeper and have available for sure.

In the meantime, I can bring this up to the team and let you know if they have tips!

kenjib · March 14, 2023, 10:45pm

Thank you. Yes, please let me know if you get any feedback from the team. I have a simple buffer management system that I think is working now, but having the sizes and numbers pre-allocated is a little bit wasteful.

miminashi · March 16, 2023, 4:51am

You should find or implement a simple allocator along the lines of malloc and free. There’s a pretty good treatment of the subject in the Bryant and O’Hallaron textbook, Computer Systems: A Programmer’s Perspective. Implementing malloc is a common lab in undergrad computer systems courses.

I have libMemory in my browser history, but I haven’t used it myself. It appears to be a full-blown malloc/free implementation, which is probably overkill for your application. But it could be a good place to start.

Writing a high-performance generalized allocator is definitely a bit tricky, but writing a simple one for a specific use case like yours is not as bad as you might think. If you can find a copy of that textbook it might give you some ideas. I’m sure there are many other resources online.

Most allocators do clever things to improve performance and reduce overhead, but there is no reason to get fancy for a special-purpose allocator. Just do the simplest thing that works for your application. Performance is not likely to be an issue. Two important things to watch out for are memory leaks and memory fragmentation, which can easily exhaust your memory after a few alloc/de-alloc cycles.

miminashi · March 16, 2023, 5:02am

Ah, I had my response queued up in the editor since yesterday, I didn’t see your message about your buffer management system. It sounds like you’re on the right track. If you have any questions about how you might generalize your system a little bit to handle different sizes, don’t hesitate to ask.

kenjib · March 17, 2023, 10:53pm

Thanks! That is really helpful. Maybe I will reconsider a more generalized malloc/free implementation. I guess I will see if I run into a brick wall with my current, simpler, implementation. If that happens I will get more complicated. Currently I have the following pre-allocated in SDRAM:

60 short buffers 0.2 seconds long. These are for things like granular synthesis, choruses, flangers, slapbacks, etc.
12 medium buffers 3.0 seconds long. These are for your more typical types of delay lines.
6 long buffers 20.0 seconds long. These are for looper/frippertronic type effects.

You need two of these for each stereo line. This is using a little more than half of the available memory so there should be room to increase the numbers a little more when I find out what I am running out of most often. There is also still room for stuff like the DaisySP reverb.
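As a rough sanity check on that budget, the arithmetic can be sketched as below. The 48 kHz sample rate and 4-byte float samples are assumptions (the post states neither), and the constant names are made up for illustration. Under those assumptions the three pools come to about 32.3 million bytes, roughly 48% of the 64MB SDRAM, which is in the same ballpark as the "about half" figure; a different sample rate or format would shift it.

```cpp
#include <cstddef>

// Assumptions (not stated in the post): 48 kHz sample rate, 4-byte float samples.
constexpr std::size_t kSampleRate     = 48000;
constexpr std::size_t kBytesPerSample = 4;

// Bytes needed for `count` mono buffers, each `tenths` tenths of a second long.
constexpr std::size_t PoolBytes(std::size_t count, std::size_t tenths)
{
    return count * (kSampleRate * tenths / 10) * kBytesPerSample;
}

constexpr std::size_t kShortBytes  = PoolBytes(60, 2);   // 60 x 0.2 s
constexpr std::size_t kMediumBytes = PoolBytes(12, 30);  // 12 x 3.0 s
constexpr std::size_t kLongBytes   = PoolBytes(6, 200);  // 6 x 20.0 s
constexpr std::size_t kTotalBytes  = kShortBytes + kMediumBytes + kLongBytes;
constexpr std::size_t kSdramBytes  = 64u * 1024u * 1024u; // 64MB SDRAM

// Fails at compile time if the pools would not fit.
static_assert(kTotalBytes < kSdramBytes, "buffer pools must fit in SDRAM");
```

Most of the budget goes to the six long buffers, so their count and length are the numbers to tune first if memory gets tight.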

It works on a check-out, check-in system. Effects try to request the buffers they need from a BufferManager singleton when they load. If there aren’t available buffers of the type needed then they get a NULL response, which I can use to inform the user that there aren’t enough resources to add that effect to their effect chain.

In the effect’s destructor it needs to make sure to release any buffers it is using via the BufferManager so that they can be re-used. BufferManager is a really minimal/simple set of code that just tracks what’s in use and handles check-in and check-out.
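A minimal sketch of what such a check-out/check-in pool might look like follows. It is not the actual BufferManager: `BufferPool`, `BufferLease`, and their methods are hypothetical names, and one pool instance would exist per buffer size. On the hardware the sample storage would sit in SDRAM (for example in a static array marked with libDaisy's `DSY_SDRAM_BSS` attribute); ordinary memory is used here so the sketch runs anywhere.

```cpp
#include <cstddef>

// Hypothetical fixed-size pool: kCount buffers of kSamples floats each.
template <std::size_t kSamples, std::size_t kCount>
class BufferPool
{
  public:
    // Returns a free buffer, or nullptr if every buffer is checked out.
    float* CheckOut()
    {
        for(std::size_t i = 0; i < kCount; i++)
        {
            if(!in_use_[i])
            {
                in_use_[i] = true;
                return storage_[i];
            }
        }
        return nullptr;
    }

    // Returns a buffer to the pool; pointers the pool does not own are ignored.
    void CheckIn(const float* buf)
    {
        for(std::size_t i = 0; i < kCount; i++)
        {
            if(storage_[i] == buf)
            {
                in_use_[i] = false;
                return;
            }
        }
    }

    // Number of buffers currently available for check-out.
    std::size_t Available() const
    {
        std::size_t n = 0;
        for(bool used : in_use_)
            n += used ? 0 : 1;
        return n;
    }

  private:
    float storage_[kCount][kSamples];
    bool  in_use_[kCount] = {};
};

// Hypothetical RAII helper: checks a buffer out on construction and back in
// on destruction, so an effect cannot forget to release it.
template <typename Pool>
class BufferLease
{
  public:
    explicit BufferLease(Pool& pool) : pool_(pool), buf_(pool.CheckOut()) {}
    ~BufferLease()
    {
        if(buf_ != nullptr)
            pool_.CheckIn(buf_);
    }
    BufferLease(const BufferLease&)            = delete;
    BufferLease& operator=(const BufferLease&) = delete;

    float* get() const { return buf_; } // nullptr means the pool was exhausted

  private:
    Pool&  pool_;
    float* buf_;
};
```

An effect that holds a `BufferLease` as a member releases its buffer automatically when the effect is destroyed, which removes the need for a hand-written release call in every destructor.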

Does that seem somewhat reasonable?

miminashi · March 19, 2023, 6:18pm

I wouldn’t necessarily pursue a more generalized allocator unless you find that your current approach is a real limitation. Given the memory and processing constraints of the Daisy, the way you have partitioned the memory might be more than sufficient for your application.

In my application, I want to support loading about one hundred audio files into SDRAM. The files range from a few kilobytes to a few megabytes, so I need to allow somewhat arbitrary allocation sizes to make good use of the SDRAM.

I use a struct to represent individual allocations:

struct MemAllocation {
    int32_t id;
    int32_t size;
    int32_t offset;
};

I maintain three lists to keep track of the memory allocated by my application:

Free List
Used List
Unused List

There is an array of MemAllocation structs. The lists are simple linked lists that hold the allocation IDs, which are used to index into the array. The IDs are moved between the lists as memory is allocated and de-allocated, and at any time each ID is in exactly one of the lists. I maintain the Free List in a sorted order, according to the offsets, to enable coalescing.

Initially, there is one entry in the Free List that represents all of the SDRAM, so { .id=0, .size=64MB, .offset=0 }; the remaining entries are all in the Unused List.

When my application requests a block of SDRAM, I search the Free List for an allocation that is at least as large as the requested size. If the allocation that is found is larger than the requested size, then it’s necessary to split the allocation. So if my application allocates 1MB from the initial 64MB entry, I end up with two MemAllocation structs that look something like this:

    MemAllocation allocA = { .id=0, .size=63MB, .offset=1MB }; // This allocation has been split
    MemAllocation allocB = { .id=1, .size=1MB,  .offset=0   }; // This is the allocation that has been carved off

    // Move the ID for the new allocation to the Used List,
    // and return the address for the allocation to the application.
    addToUsedList( allocB.id );
    return SDRAM_BASE_ADDR + allocB.offset;

When my application needs to release a block of SDRAM, I search the Used List for the allocation that matches the address passed in from my app. The ID for the matching allocation is added back to the Free List in sorted order.

Whenever I add an allocation ID back to the Free List, I look at the neighboring allocations in the list. If two allocations are contiguous, then I merge them into a single allocation and return one of the IDs to the Unused List.

    // These allocations can be merged because:
    //     (allocA.offset + allocA.size) == allocB.offset
    MemAllocation allocA = { .id=17, .size=2MB, .offset=10MB };
    MemAllocation allocB = { .id=40, .size=5MB, .offset=12MB };

    // allocB absorbs allocA: afterwards it covers 7MB starting at offset 10MB.
    allocB.offset  = allocA.offset;
    allocB.size   += allocA.size;

    moveFromFreeToUnusedList( allocA.id );

That’s pretty much all there is to my allocator. Obviously a lot of details are missing, but hopefully it makes sense and will give you some ideas for your own application.
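To fill in some of those details, here is a compact sketch of the same idea. It is not the allocator described above: instead of three linked lists of IDs, each table entry carries a state tag and the code does linear scans, which is simpler but does the same bookkeeping (split on allocate, coalesce on free). `SdramAllocator`, `State`, and `kMaxAllocs` are hypothetical names, and the functions return offsets rather than real addresses.

```cpp
#include <cstdint>

enum class State { Unused, Free, Used };

struct MemAllocation
{
    int32_t size   = 0;
    int32_t offset = 0;
    State   state  = State::Unused; // stands in for list membership
};

class SdramAllocator
{
  public:
    static constexpr int kMaxAllocs = 128; // hypothetical table size

    // Initially one free entry covers the whole region.
    explicit SdramAllocator(int32_t total_size)
    {
        table_[0].size  = total_size;
        table_[0].state = State::Free;
    }

    // First fit. Returns the block's offset, or -1 if nothing fits.
    // A real version would return the SDRAM base address plus this offset.
    int32_t Alloc(int32_t size)
    {
        if(size <= 0)
            return -1;
        for(int i = 0; i < kMaxAllocs; i++)
        {
            MemAllocation& f = table_[i];
            if(f.state != State::Free || f.size < size)
                continue;
            if(f.size == size) // exact fit, no split needed
            {
                f.state = State::Used;
                return f.offset;
            }
            const int u = Find(State::Unused);
            if(u < 0)
                continue; // no spare entry for a split; keep looking for an exact fit
            // Carve the request off the front of the free block.
            table_[u].size   = size;
            table_[u].offset = f.offset;
            table_[u].state  = State::Used;
            f.offset += size;
            f.size -= size;
            return table_[u].offset;
        }
        return -1;
    }

    // Releases the block starting at `offset` and merges it with any free
    // neighbours. Returns false if no used block starts at that offset.
    bool Free(int32_t offset)
    {
        int a = -1;
        for(int i = 0; i < kMaxAllocs && a < 0; i++)
            if(table_[i].state == State::Used && table_[i].offset == offset)
                a = i;
        if(a < 0)
            return false;
        MemAllocation& blk = table_[a];
        blk.state          = State::Free;
        for(int i = 0; i < kMaxAllocs; i++)
        {
            MemAllocation& n = table_[i];
            if(i == a || n.state != State::Free)
                continue;
            if(n.offset + n.size == blk.offset) // n ends where blk starts
            {
                blk.offset = n.offset;
                blk.size += n.size;
                n.state = State::Unused;
            }
            else if(blk.offset + blk.size == n.offset) // n starts where blk ends
            {
                blk.size += n.size;
                n.state = State::Unused;
            }
        }
        return true;
    }

  private:
    // Index of the first entry in state `s`, or -1 if there is none.
    int Find(State s) const
    {
        for(int i = 0; i < kMaxAllocs; i++)
            if(table_[i].state == s)
                return i;
        return -1;
    }

    MemAllocation table_[kMaxAllocs];
};
```

Because free blocks are merged on every release, a freed block has at most one free neighbour on each side, so a single pass over the table is enough to coalesce. The linear scans are fine for a table of this size, since allocation happens when loading files or effects rather than in the audio callback.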

kenjib · March 23, 2023, 6:32am

That’s really helpful. Thank you so much. I will stash that away for later in case my buffers start running out. Wouldn’t it be easier if the Daisy supported malloc/free in SDRAM space? Seems like it would just make things simpler and save people from having to write similar code. Anyways, thanks again. I really like that model. It’s easy to grasp and keep track of.