Audio corruption with large DMA buffers for i2c

I’m getting audio corruption in my firmware when I declare large dma buffers (using the Daisy macros) and use them for i2c. The audio output sounds broken (like the data is being overwritten). I’ve recreated this in some test code, and although the audio sounds different, it still sounds very broken. This code is just sending arbitrary data to and from an RPi Pico.

In the test example, if I comment out the code which does the i2c, then pass-through audio sounds correct. EDIT- Also, just removing DMA_BUFFER_MEM_SECTION, so the data is allocate normally, the problem goes away and the output audio is correct.

The strange thing is, in my main firmware the audio is corrupted just by defining the buffer as DMA_BUFFER_MEM_SECTION, I don’t even have to do any DMA over i2c for the problem to exhibit itself! Which suggests maybe the linker has overlapped memory somewhere?

Is it valid to create dma buffers of this size? The actual i2c transactions work, and the correct data is sent, it just screws up the audio. Any help much appreciated as I’m completely stumped!

#include <cstdint>

#include "daisy_seed.h"

daisy::DaisySeed hw;

////////////////////////////

static void i2c_dma_callback(void *data, daisy::I2CHandle::Result result);

class i2c_bus
{
  public:
    struct DmaResult
    {
        volatile bool m_complete = false;
        volatile bool m_success  = false;
    };

    i2c_bus()
    {
        daisy::I2CHandle::Config i2c_config;
        i2c_config.periph         = daisy::I2CHandle::Config::Peripheral::I2C_1;
        i2c_config.pin_config.scl = daisy::seed::D11;
        i2c_config.pin_config.sda = daisy::seed::D12;
        i2c_config.mode           = daisy::I2CHandle::Config::Mode::I2C_MASTER;
        i2c_config.speed          = daisy::I2CHandle::Config::Speed::I2C_1MHZ;

        m_i2c.Init(i2c_config);
    }

    bool read_data(uint8_t i2c_address, uint8_t *data, uint16_t size, uint32_t timeout)
    {
        const daisy::I2CHandle::Result res = m_i2c.ReceiveBlocking(i2c_address, data, size, timeout);

        return res == daisy::I2CHandle::Result::OK;
    }

    bool read_data_dma(uint8_t i2c_address, uint8_t *data, uint16_t size, uint32_t timeout, volatile DmaResult &dma_result)
    {
        const daisy::I2CHandle::Result res =
            m_i2c.ReceiveDma(i2c_address, data, size, i2c_dma_callback, (void *)(&dma_result));

        return res == daisy::I2CHandle::Result::OK;
    }

    bool write_data(uint8_t i2c_address, uint8_t *data, uint16_t size, uint32_t timeout)
    {
        const daisy::I2CHandle::Result res = m_i2c.TransmitBlocking(i2c_address, data, size, timeout);

        return res == daisy::I2CHandle::Result::OK;
    }

    bool write_data_dma(uint8_t i2c_address, uint8_t *data, uint16_t size, uint32_t timeout, volatile DmaResult &dma_result)
    {
        const daisy::I2CHandle::Result res =
            m_i2c.TransmitDma(i2c_address, data, size, i2c_dma_callback, (void *)(&dma_result));

        return res == daisy::I2CHandle::Result::OK;
    }

  private:
    daisy::I2CHandle m_i2c;
};

static void i2c_dma_callback(void *data, daisy::I2CHandle::Result result)
{
    i2c_bus::DmaResult *dma_result = reinterpret_cast<i2c_bus::DmaResult *>(data);
    dma_result->m_complete         = true;
    dma_result->m_success          = result == daisy::I2CHandle::Result::OK;
}

////////////////////////////

constexpr uint8_t i2c_address = 16;

constexpr uint32_t other_data_size = 32 * 1024;
DSY_SDRAM_BSS uint8_t other_data[other_data_size];
constexpr uint32_t dma_data_block_size = 96 * 512;
DMA_BUFFER_MEM_SECTION uint8_t dma_data_block[dma_data_block_size];

constexpr uint8_t request_data_command = 16;
constexpr uint8_t send_data_command    = 17;

bool test_receive(i2c_bus &i2c, uint32_t timeout)
{
    uint8_t command = uint8_t(request_data_command);
    bool success    = i2c.write_data(i2c_address, &command, sizeof(command), timeout);

    if (!success)
    {
        daisy::DaisySeed::PrintLine("Failed receive command");
        return false;
    }

    volatile i2c_bus::DmaResult dma_result_receive;
    success = i2c.read_data_dma(i2c_address, dma_data_block, dma_data_block_size, timeout, dma_result_receive);

    if (success == false)
    {
        daisy::DaisySeed::PrintLine("Failed receive data");
        return false;
    }
    else
    {
        while (dma_result_receive.m_complete == false)
        {
        }
    }

    daisy::DaisySeed::PrintLine("test_receive() success!");
    return true;
}

bool test_send(i2c_bus &i2c, uint32_t timeout)
{
    uint8_t command = uint8_t(send_data_command);
    bool success    = i2c.write_data(i2c_address, &command, sizeof(command), timeout);

    if (success == false)
    {
        daisy::DaisySeed::PrintLine("Failed send command");
        return false;
    }

    volatile i2c_bus::DmaResult dma_result_send;
    success = i2c.write_data_dma(i2c_address, dma_data_block, dma_data_block_size, timeout, dma_result_send);

    if (success == false)
    {
        daisy::DaisySeed::PrintLine("Failed send data");
        return false;
    }
    else
    {
        volatile bool *done = &dma_result_send.m_complete;
        while (*done == false)
        {
        }
    }

    daisy::DaisySeed::PrintLine("test_send() success!");

    return true;
}

void audio_callback(const float *in, float *out, size_t size)
{
    for (size_t s = 0; s < size; ++s)
    {
        out[s] = in[s];
    }
}

int main(void)
{
    hw.Init();

    hw.SetAudioBlockSize(48); // number of samples handled per callback
    hw.SetAudioSampleRate(daisy::SaiHandle::Config::SampleRate::SAI_48KHZ);

    hw.StartLog(true);

    i2c_bus i2c;
    const uint32_t timeout = 1000;

    hw.StartAudio(audio_callback);

    while (1)
    {
        while (test_send(i2c, timeout) == false)
        {
        }
        // hw.DelayMs(100);
        while (test_receive(i2c, timeout) == false)
        {
        }
    }
}

Probably not the problem (?), but your audio callback is only doing one channel.

1 Like

I probably should have said I’m only using the 1st input/output (left channel), but thanks for looking!

Looking at the map file (from my main firmware, not the example code) I can see the data is right next to the daisy audio data (and the daisy audio data is in a different location when I don’t attempt to put the memory in DMA memory. This does seem rather suspicious, could this be the issue? Do I need to pack my data size so that the Daisy audio memory sits on a particular memory boundary? It is already a multiple of 512 (512*96), so packing seems unlikely. As I said before, the corruption can be heard even if I don’t actually use the DMA, just organising the memory in this way seems to cause it. Maybe @shensley would know this?

With DMA memory block

*(.sram1_bss)
 .sram1_bss     0x0000000030000000     0xc000 build/Archive.o
                0x0000000030000000                Gloop::wav_data_block
 .sram1_bss     0x000000003000c000     0x4000 ../libdaisy//build/libdaisy.a(audio.o)
 .sram1_bss     0x0000000030010000      0x140 ../libdaisy//build/libdaisy.a(adc.o)
 *(.sram1_bss*)

Without DMA memory block the lib daisy audio block is located where my DMA buffer would be.

 *(.sram1_bss)
 .sram1_bss     0x0000000030000000     0x4000 ../libdaisy//build/libdaisy.a(audio.o)
 .sram1_bss     0x0000000030004000      0x140 ../libdaisy//build/libdaisy.a(adc.o)
 *(.sram1_bss*)

I think what’s happening is you’re exceeding the 32kB of D2 RAM that libDaisy reserves and configures as cacheless using the MPU.

The range 0x30000000 - 0x30008000 is the 32k of D2 RAM that’s specifically set aside for DMA usage and configured to be non-cacheable in system.cpp:

    // Configure RAM D2 (SRAM1) as non cacheable
    MPU_InitStruct.Enable           = MPU_REGION_ENABLE;
    MPU_InitStruct.BaseAddress      = 0x30000000;
    MPU_InitStruct.Size             = MPU_REGION_SIZE_32KB;
    MPU_InitStruct.AccessPermission = MPU_REGION_FULL_ACCESS;
    MPU_InitStruct.IsBufferable     = MPU_ACCESS_NOT_BUFFERABLE;
    MPU_InitStruct.IsCacheable      = MPU_ACCESS_NOT_CACHEABLE;
    MPU_InitStruct.IsShareable      = MPU_ACCESS_SHAREABLE;
    MPU_InitStruct.Number           = MPU_REGION_NUMBER0;
    MPU_InitStruct.TypeExtField     = MPU_TEX_LEVEL1;
    MPU_InitStruct.SubRegionDisable = 0x00;
    MPU_InitStruct.DisableExec      = MPU_INSTRUCTION_ACCESS_ENABLE;
    HAL_MPU_ConfigRegion(&MPU_InitStruct);

Whereas from your map file it’s clear that with I2C buffers allocated to D2 RAM, the audio buffers now extend beyond this range.

In the default non-internal-flash (bootloader) linker scripts this should trigger a linker error since the segment is explicitly defined for 32kB but in the internal flash script it’s just one larger D2 RAM chunk and the MPU config is the only thing setting the first 32k as non-cacheable.

To fix it so you can use large I2C DMA buffers, you’ll have to modify libDaisy one way or another:

  1. Change the audio DMA buffer size in audio.cpp, they’re pretty big by default (1024 samples interleaved x 2 channel pairs). These buffers only really need to be as big as your callback block size x 2 for interleaving, so e.g. if you’re using a 48 sample block size kAudioMaxBufferSize could be changed to 96.
  2. Change the MPU config and/or linker script to reserve more of D2 RAM as cacheless for DMA

Those are probably the simplest workarounds. There’s a lot more that can be trimmed out of audio/sai code if you’re only ever using one configuration in your firmware, but I think either of these should do the trick!

1 Like

Thanks very much for your detailed response! I did wonder if this might be the case, but from the linker output I was led to believe there was 288KB of D2 RAM and it looked like I had lots of D2 RAM to spare?

           FLASH:      112436 B       128 KB     85.78%
         DTCMRAM:          0 GB       128 KB      0.00%
            SRAM:      491180 B       512 KB     93.69%
          RAM_D2:       16704 B       288 KB      5.66%
          RAM_D3:          0 GB        64 KB      0.00%
     BACKUP_SRAM:          12 B         4 KB      0.29%
         ITCMRAM:          0 GB        64 KB      0.00%
           SDRAM:      22532 KB        64 MB     34.38%
       QSPIFLASH:          8 KB         8 MB      0.10%

Is that number not accurate?

Yes that number is accurate but libDaisy only configures the first 32kB of that chunk for DMA. Caching must be disabled for any DMA buffers, and that’s what the code I shared from system.cpp is doing - but again, only for the first 32kB of memory starting at 0x3000000.

Also what I meant by this:

In the default non-internal-flash (bootloader) linker scripts this should trigger a linker error since the segment is explicitly defined for 32kB but in the internal flash script it’s just one larger D2 RAM chunk and the MPU config is the only thing setting the first 32k as non-cacheable.

The default internal flash linker script doesn’t treat the first 32kB as a separate memory segment from the rest of RAM D2. But the other linker scripts for Daisy Bootloader apps (SRAM or QSPI) does declare it as a separate segment.

So if you were compiling for a bootloader app, you would actually get a linker error letting you know you’ve exceeded the 32kB segment for cacheless DMA buffers, but unfortunately not for an internal flash app. That particular quirk could easily be solved with a PR to libDaisy to match the behavior of the bootloader app linker scripts.

But regardless - you are still faced with the problem of using more than 32kB of DMA buffer memory, so you need to either shrink down the libDaisy internal audio DMA buffers or change the MPU configuration like I mentioned.

1 Like

By the way I was referring to this in your previous message to indicate you’ve gone beyond 32kB:

Here you can clearly see that the 0x4000 (=16kB) of memory used for the libDaisy audio DMA buffers will go beyond the 32kB cacheless DMA memory boundary - as evidenced by the next chunk (for the ADC) starting at an address past 0x30008000

1 Like

Thanks for the explanation! As I’m not using D2 memory for anything else, is there any reason I shouldn’t make the entirety of the D2 cache-disabled?

If you’re not using it for anything else then I suppose there’s no reason it can’t be fully cache-disabled. But keep in mind the MPU configuration API requires you to use powers-of-two for the size. So for now it’s 32kB, you could do 64kB, 128kB, or 256kB with just a one line code change in libDaisy. But to get 288kB you’d need to make two calls to the configure region method in the HAL with two structs (one with one base + size 256kB, another with the adjusted base + size 32kB)

1 Like

I’ve tried this and it works, thanks so much!

Is there another macro to prompt the linker place data in D2 (in the Daisy environment, not through HAL) other than DMA_BUFFER_MEM_SECTION? If not I wonder what the reason is not to default all of D2 to ‘no-cache’? It would make it less likely for others to have the same issue I had, and if there’s no other way in libDaisy to use D2 memory that would seem sensible? Obviously if you want to use HAL directly to access D2, then it’s the programmers responsibility to make sure things are setup correctly.

1 Like

Glad it worked! There is no macro for the rest of D2 RAM but you could make one by declaring and defining a separate segment in the linker script corresponding to the remaining cache-enabled D2, and adding a macro that calls out its name with __attribute__((section(".d2_cache"))) or whatever you end up naming it.

You can check out the default libDaisy linker scripts and section macros as a guide.

1 Like

Yeah, my point was, as the Daisy toolchain doesn’t provide such a macro, it would make sense to me for the entire of D2 memory to be configured as cache-less by default. That’s a suggestion for the Daisy team though.

Thanks again!