FatFs errors on SD card when audio is enabled

Hi everyone. Just getting started with Daisy, so bear with me. I have a Pod and another standalone Seed with an SD card hooked up to the SDIO lines. I’ve been having SD card read issues on both, so I wrote some smaller code to measure performance and troubleshoot the problem.

I wrote the initial performance test to be as simple as possible, and it ran with no FatFs errors. Then I added a simple blank audio callback and the errors reproduced immediately.

I thought it might have something to do with the ffconf.h timeout config:
#define _FS_TIMEOUT 1000 /**< Timeout period in unit of time ticks */

If this is in the same 200 MHz tick units as the daisy timer, then that would only be 5 us. I tried changing to 100000 (500 us if using 200 MHz tick), recompiling daisy lib, then my example project, but still got the error when audio was enabled.

Side note on the SD card SDIO performance: I’m hoping we see some improvements when libdaisy moves to DMA and the clock speed increases.

Hopefully pasting this code block works. This can be compiled by replacing SDMMC.cpp in the SDMMC Seed example. The new code waits for the user to open the USB CDC console by requiring a button press before starting. On the Pod, that is the first button. On the Seed, you can ground pin 27.

It would be great if someone could try reproducing this. I’ve tried multiple Daisys and SD cards. Any help appreciated!

  
// SD card performance test
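#include <stdarg.h>  // va_list, va_start, va_end
#include <stdint.h>  // uint8_t, uint32_t, uint64_t
#include <stdio.h>   // vsnprintf
#include <string.h>  // strlen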
#include "daisy_pod.h"
#include "fatfs.h"

#define TEST_FILE_NAME "sdtest.txt"
#define BUFFER_SIZE (64 * 1024)
#define TEST_PATTERN(index) ((index % 16) + '0')
#define TICKS_TO_US 200.0
#define AUDIO_BLOCK_SIZE 48

using namespace daisy;

static DaisySeed hw;
static Switch button;
static SdmmcHandler sd;

uint8_t testBuffer[BUFFER_SIZE];

#define DBG_BUFFER_SIZE 4096
char dbgBuf[DBG_BUFFER_SIZE];
char *dbgPtr = dbgBuf;

void dbgPrintf(const char *fmt, ...) {
    if ((dbgPtr - dbgBuf) >= (int)sizeof(dbgBuf)) {
        return;
    }

    va_list valist;
    va_start(valist, fmt);
    dbgPtr += vsnprintf(dbgPtr, &dbgBuf[sizeof(dbgBuf)] - dbgPtr, fmt, valist);
    va_end(valist);
}

void dbgDump() {
    dbgBuf[sizeof(dbgBuf) - 1] = 0;
    hw.usb_handle.TransmitInternal((uint8_t *)dbgBuf, strlen(dbgBuf));
    dbgPtr = dbgBuf;
    dsy_system_delay(10);
}

static volatile bool waitingForUsbEnter = false;

void waitForUser() {
    // TODO - USB CDC Rx does not seem to work 
#if 0
    waitingForUsbEnter = true;
    while (waitingForUsbEnter) {
        dsy_system_delay(5);
        if (dbgPtr != dbgBuf) {
            dbgDump();
        }
    }
#else
    while (true) {
        button.Debounce();
        if (button.RisingEdge())
            break;
        dsy_system_delay(1);
    }
#endif
}

void UsbCallback(uint8_t *buf, uint32_t *len) {
    dbgPrintf("usb cb\n\r");

    if (!buf || !len) {
        return;
    }

    dbgPrintf("usb %ld %c\n\r", *len, buf[0]);

    for (size_t i = 0; i < *len; i++) {
        if (buf[i] == '\n') {
            waitingForUsbEnter = false;
        }
        if (buf[i] == 'a') {
            waitingForUsbEnter = false;
        }
    }
}

void AudioCallback(float *in, float *out, size_t size) {
    // Blank the audio
    for(size_t i = 0; i < size; i += 2) {
        out[i] = out[i + 1] = s162f(0) * 0.5f;
    }
}

void fillPattern(uint8_t *buffer, size_t length) {
    for (size_t i = 0; i < length; i++) {
        buffer[i] = TEST_PATTERN(i);
    }
}

bool testWrite(const char *testname, uint8_t *buffer, size_t length, size_t filelength)
{
    bool status = true;
    size_t byteswritten = 0;
    uint32_t start, end;
    size_t numwrites = filelength / length;
    uint64_t ticks = 0;

    // Open test file on the SD Card
    FRESULT result = f_open(&SDFile, TEST_FILE_NAME, (FA_CREATE_ALWAYS) | (FA_WRITE));

    if (result != FR_OK) {
        dbgPrintf("%s: f_open error: %d\n\r", testname, result);
        status = false;
    } else {
        for (size_t i = 0; i < numwrites; i++) {
            byteswritten = 0;
            start = dsy_tim_get_tick();
            result = f_write(&SDFile, buffer, length, &byteswritten);
            end = dsy_tim_get_tick();
            ticks += end - start;

            if ((result != FR_OK) || (byteswritten != length)) {
                dbgPrintf("%s: f_write error: %d, bytes written = %ld, loop %d\n\r",
                          testname, result, byteswritten, i);
                status = false;
                break;
            }
        }

        f_close(&SDFile);
    }

    if (status) {
        double timeMicroseconds = (double)ticks / TICKS_TO_US;
        double bytesPerS = (double)filelength * 1000000.0 / timeMicroseconds;
        dbgPrintf("%s: Passed: %10ld us, %10ld bytes/s\n\r", testname, (uint32_t)timeMicroseconds, (uint32_t)bytesPerS);
    } else {
        dbgPrintf("%s: Failed\n\r", testname);
    }

    dbgDump();

    return status;
}

bool testRead(const char *testname, uint8_t *buffer, size_t length, size_t filelength) {
    bool status = true;
    size_t bytesread = 0;
    uint32_t start, end;
    size_t numreads = filelength / length;
    uint64_t ticks = 0;

    // Open the test file on the SD Card
    FRESULT result = f_open(&SDFile, TEST_FILE_NAME, (FA_OPEN_EXISTING | FA_READ));

    if (result != FR_OK) {
        dbgPrintf("%s: f_open error: %d\n\r", testname, result);
        status = false;
    } else {
        for (size_t i = 0; i < numreads; i++) {
            bytesread = 0;
            start = dsy_tim_get_tick();
            result = f_read(&SDFile, buffer, length, &bytesread);
            end = dsy_tim_get_tick();
            ticks += end - start;

            if ((result != FR_OK) || (bytesread != length)) {
                dbgPrintf("%s: f_read error: %d, bytes read = %ld, loop %d\n\r",
                          testname, result, bytesread, i);
                status = false;
                break;
            }
        }

        f_close(&SDFile);
    }

    if (status) {
        double timeMicroseconds = (double)ticks / TICKS_TO_US;
        double bytesPerS = (double)filelength * 1000000.0 / timeMicroseconds;
        dbgPrintf("%s: Passed: %10ld us, %10ld bytes/s\n\r", testname, (uint32_t)timeMicroseconds, (uint32_t)bytesPerS);
    } else {
        dbgPrintf("%s: Failed\n\r", testname);
    }

    dbgDump();

    return status;
}

int main(void) {
    // Init hardware
    hw.Configure();
    hw.Init();
    dsy_tim_start();

    // Set button to pin 27, to be updated at a 1kHz  samplerate
    // This is the first button on pod, or hit pin 27 on seed to ground
    button.Init(hw.GetPin(27),1000);

    // Init USB serial
    hw.usb_handle.Init(UsbHandle::FS_INTERNAL);
    hw.usb_handle.SetReceiveCallback(UsbCallback, UsbHandle::FS_INTERNAL);

    // Wait for a button or pin to be grounded while user opens USB console
    waitForUser();

    // Init SD Card
    sd.Init();

    // Links libdaisy i/o to fatfs driver.
    dsy_fatfs_init();

    bool fileCreated = false;
    bool audioEnabled = false;
    bool mounted = false;
    FRESULT result;

    dbgPrintf("Running tests without audio running\n\r\n\r");
    dbgDump();

    while (true) {
        // Mount SD Card
        if ((!mounted) && ((result = f_mount(&SDFatFS, SDPath, 1)) != FR_OK)) {
            dbgPrintf("Error: Couldn't mount SD card, f_mount error: %d\n\r", result);
            dbgDump();
        } else {
            // Don't try to mount again
            mounted = true;

            // Fill buffer before write tests with known pattern
            fillPattern(testBuffer, sizeof(testBuffer));

            static const int filesize = 32 * 1024 * 1024;

            // Don't need to run this if already created
#if 1
            if (!fileCreated) {
                fileCreated = true;
                testWrite("Write 32 MB, 512 chunk", testBuffer, 512, filesize);
                // Don't want to stress writing the card yet
#if 0
                testWrite("Write 32 MB,  1K chunk", testBuffer, 1024, filesize);
                testWrite("Write 32 MB,  2K chunk", testBuffer, 2 * 1024, filesize);
                testWrite("Write 32 MB,  4K chunk", testBuffer, 4 * 1024, filesize);
                testWrite("Write 32 MB,  8K chunk", testBuffer, 8 * 1024, filesize);
                testWrite("Write 32 MB, 16K chunk", testBuffer, 16 * 1024, filesize);
#endif
            }
#endif

            // Test a couple of single chunk reads to time
            testRead( "Read    512, 512 chunk", testBuffer, 512, 512);
            testRead( "Read    512, 512 chunk", testBuffer, 512, 512);
            testRead( "Read    512, 512 chunk", testBuffer, 512, 512);
            testRead( "Read  32 MB, 512 chunk", testBuffer, 512, filesize);
            testRead( "Read  32 MB,  1K chunk", testBuffer, 1024, filesize);
            testRead( "Read  32 MB,  2K chunk", testBuffer, 2 * 1024, filesize);
            testRead( "Read  32 MB,  4K chunk", testBuffer, 4 * 1024, filesize);
            testRead( "Read  32 MB,  8K chunk", testBuffer, 8 * 1024, filesize);
            testRead( "Read  32 MB, 16K chunk", testBuffer, 16 * 1024, filesize);
            testRead( "Read  32 MB, 32K chunk", testBuffer, 32 * 1024, filesize);
        }


        // 2nd time around, enable audio
        if (!audioEnabled) {
            audioEnabled = true;

            dbgPrintf("\n\rRunning tests with audio running\n\r\n\r");
            dbgDump();

            // Init Audio
            hw.SetAudioBlockSize(AUDIO_BLOCK_SIZE);
            hw.StartAudio(AudioCallback);
        } else {
            waitForUser();
        }
    }
}
  

Forgot to post the performance numbers and error that happens. Also, I found it interesting that the first read of a 512 byte block is relatively fast, while opening the same file again and reading the same 512 bytes a 2nd and 3rd time is very slow. Maybe just some timing thing, but seems to happen every time.


Running tests without audio running

Write 32 MB, 512 chunk: Passed:   75688028 us,     443325 bytes/s
Read    512, 512 chunk: Passed:        396 us,    1290517 bytes/s
Read    512, 512 chunk: Passed:       2294 us,     223144 bytes/s
Read    512, 512 chunk: Passed:       2294 us,     223173 bytes/s
Read  32 MB, 512 chunk: Passed:   21002816 us,    1597615 bytes/s
Read  32 MB,  1K chunk: Passed:   17008792 us,    1972769 bytes/s
Read  32 MB,  2K chunk: Passed:   10975566 us,    3057193 bytes/s
Read  32 MB,  4K chunk: Passed:    8016941 us,    4185440 bytes/s
Read  32 MB,  8K chunk: Passed:    6638796 us,    5054294 bytes/s
Read  32 MB, 16K chunk: Passed:    5884563 us,    5702110 bytes/s
Read  32 MB, 32K chunk: Passed:    5884699 us,    5701978 bytes/s

Running tests with audio running

Read    512, 512 chunk: Passed:        397 us,    1286723 bytes/s
Read    512, 512 chunk: Passed:       2296 us,     222987 bytes/s
Read    512, 512 chunk: Passed:       2297 us,     222823 bytes/s
Read  32 MB, 512 chunk: f_read error: 1, bytes read = 0, loop 6
Read  32 MB, 512 chunk: Failed

That timeout is in OS ticks, 1 tick == 1 ms. I’m not very familiar with FatFS, but you could try opening all necessary files at patch start to see if just reading would have the same problems.

I’ve tried just reading with no writes and not opening/closing multiple times. Same issues with FatFs errors if audio processing is enabled. Light audio processing seems to be OK, but anything slightly strenuous is not.

Regardless, I need to be able to open different files and read them as fast as possible.

Here is the test output from a different card. The first one above was a SanDisk Ultra 32 GB class 10 A1 card. This one is a SanDisk Ultra 16 GB class 10 A1 card:

Running tests without audio running

Write 32 MB, 512 chunk: Passed:   65047853 us,     515842 bytes/s
Read    512, 512 chunk: Passed:        258 us,    1983650 bytes/s
Read    512, 512 chunk: Passed:       2162 us,     236755 bytes/s
Read    512, 512 chunk: Passed:       2163 us,     236610 bytes/s
Read  32 MB, 512 chunk: Passed:   17200876 us,    1950739 bytes/s
Read  32 MB,  1K chunk: Passed:   14201729 us,    2362700 bytes/s
Read  32 MB,  2K chunk: Passed:    9696041 us,    3460631 bytes/s
Read  32 MB,  4K chunk: Passed:    7339643 us,    4571670 bytes/s
Read  32 MB,  8K chunk: Passed:    6247965 us,    5370457 bytes/s
Read  32 MB, 16K chunk: Passed:    5720537 us,    5865608 bytes/s
Read  32 MB, 32K chunk: Passed:    5456578 us,    6149354 bytes/s

Running tests with audio running

Read    512, 512 chunk: Passed:        259 us,    1971126 bytes/s
Read    512, 512 chunk: Passed:       2164 us,     236501 bytes/s
Read    512, 512 chunk: Passed:       2164 us,     236562 bytes/s
Read  32 MB, 512 chunk: f_read error: 1, bytes read = 0, loop 7
Read  32 MB, 512 chunk: Failed

I’ve just noticed that you run your tests from the main loop. That code only executes when audio is not being processed, which is why it takes a huge performance drop once the audio code is running. You could try moving it into the AudioCallback, which is what Daisy uses for its sampler demos. But you won’t be able to run long tests in that case: everything must finish while one audio block is being processed (within 1 ms with the default buffer size).

I’m running the SD card reads from the main loop on purpose. FatFs is not appropriate for running inside the audio callback at interrupt level, as it blocks and takes a variable amount of time.

The Daisy sampler demos also run FatFs in the main loop, if you look closely.

There is no huge performance drop in my example when audio is started. If you look at the numbers, the simple blank audio callback in the example code I posted adds only about 1 us per 1 ms callback period.

Is anybody able to try the example code I posted? @shensley, any ideas on why I get the FatFs errors when audio callbacks are enabled?

Indeed, the demos only stream audio from a prefetched buffer in the audio callback. But file opening is still done in the callback. Not sure if that makes a difference, but it would at least limit file opens to no more than one per audio buffer. It sounds like this should be irrelevant, since you say that not reopening files made no difference.

Another difference from the Daisy sample code is that it’s not constantly reading at full speed; it makes burst reads that average 48k samples / second. This could be the reason why some kind of issue between FatFs and audio callbacks would go unnoticed there.

A temporary solution to this would be to actually process your audio in the main loop, but set a flag and copy the buffer in the AudioCallback. This may still result in underruns depending on overall usage. I have a working example as part of another project that overhauls the SamplePlayer class and adds DMA to the SD card interface.

This was working well switching between several files, using 16kB read sizes and a 48 sample audio block size.

As soon as I have some extra time I’ll add those changes to libdaisy :smile:

As a pseudocode example of the code I described above:

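static volatile bool audio_copy_flag; // set in the callback, cleared in main
static float buffer[size_of_buffer];

void AudioCallback(float *in, float *out, size_t size)
{
    // Copy the block prepared by the main loop to the output,
    // then ask the main loop for the next one.
    memcpy(out, buffer, size * sizeof(out[0]));
    audio_copy_flag = true;
}
int main(void)
{
    // ...init
    for (;;)
    {
        sdplayer.Prepare(); // blocking SD card reads stay in the main loop
        if (audio_copy_flag)
        {
            audio_copy_flag = false;
            ProcessTheAudio(); // fills buffer for the next callback
        }
    }
}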

I need the SD card reads to happen in the main loop and the audio processing to happen in the AudioCallback.

In the example I posted, I’m actually not linking these at all. Just need FatFs to not error out for that test. The AudioCallback is about as simple as it can get.

@shensley, I’d be happy to try out your lower level DMA / SD changes if I could get a pre-release copy, even if that isn’t cleaned up yet. I don’t need the WavPlayer changes.

Had the same issue when writing a simple WAV player for Patch, so I load all samples to memory before starting audio (Sample Player for Daisy Patch)
(this is not using the WavPlayer.cpp code, but only the SD card reading)

@shensley provided me with the DMA version of the SD card drivers. I tested my example with those with varying clock speed. Good news is that I didn’t get any errors when combining with audio! Writing doesn’t work yet with the DMA version, but that is fine for me.

I’m not sure what the clock divisions are because I don’t have all the Cube tools installed. I couldn’t quite work out the math, but based on these:
// hsd1.Init.ClockDiv = 168; // 476.2kHz works
hsd1.Init.ClockDiv = 6; // 12MHz works

Maybe a ClockDiv of 2 is ~27.5 MHz?

Anyway, here are the numbers I got from the test code.

With ClockDiv of 6 (12 MHz):

Read 512, 512 chunk: Passed: 396 us, 1291201 bytes/s
Read 512, 512 chunk: Passed: 2296 us, 222904 bytes/s
Read 512, 512 chunk: Passed: 2296 us, 222947 bytes/s
Read 32 MB, 512 chunk: Passed: 18839712 us, 1781048 bytes/s
Read 32 MB, 1K chunk: Passed: 16599831 us, 2021371 bytes/s
Read 32 MB, 2K chunk: Passed: 10843470 us, 3094436 bytes/s
Read 32 MB, 4K chunk: Passed: 7936368 us, 4227932 bytes/s
Read 32 MB, 8K chunk: Passed: 6542593 us, 5128613 bytes/s
Read 32 MB, 16K chunk: Passed: 5869833 us, 5716419 bytes/s
Read 32 MB, 32K chunk: Passed: 5869850 us, 5716403 bytes/s

With ClockDiv of 2:

Read 512, 512 chunk: Passed: 344 us, 1488328 bytes/s
Read 512, 512 chunk: Passed: 2223 us, 230302 bytes/s
Read 512, 512 chunk: Passed: 2222 us, 230356 bytes/s
Read 32 MB, 512 chunk: Passed: 15394295 us, 2179666 bytes/s
Read 32 MB, 1K chunk: Passed: 13070389 us, 2567209 bytes/s
Read 32 MB, 2K chunk: Passed: 7388220 us, 4541612 bytes/s
Read 32 MB, 4K chunk: Passed: 4586053 us, 7316624 bytes/s
Read 32 MB, 8K chunk: Passed: 3105287 us, 10805580 bytes/s
Read 32 MB, 16K chunk: Passed: 2402714 us, 13965220 bytes/s
Read 32 MB, 32K chunk: Passed: 2402260 us, 13967858 bytes/s

Cool to see the profiling values, I didn’t record them when I was getting this up and running (was in quite a rush that day).

As far as I know, FatFs is optimized for 32K chunks, which is why you have the best speeds there. It is interesting, though, that at the lower clock speed 16K was a bit quicker.

Once I wrap up the few projects I’m in the middle of, I’ll be able to get writing working and get this properly merged into libdaisy.

@shensley
Do you mind sharing the DMA version of the SD card drivers? I’m using my own WAV reader, and I’m having the same issue of not being able to load files from the SD card once the audio is started. Also, can we have a method to stop the audio loop?

patch.StopAudio() and perhaps patch.StopAdc() ?

Of course! I was hoping I would have had time to get back to the SDMMC, but I’ve been very busy.

Here is the file I shared with @hexcode. You can just replace the file of the same name in libdaisy with it. Important to note that it’s only been tested for reading.


Also, regarding the Stop functions: absolutely, I plan on having an AudioHandle::Stop() function in my rework of the audio that’s in progress. I’ll add the stop function to the dev board API files as well. I will be able to jump back on that next week, and hopefully wrap it up in a day or two.

Definitely a good idea to add a stop function for the ADCs, too.


Thanks for sharing. I tried it on my wavplayer and granular synth for Daisy, but it hangs (unless I’m just being impatient). I need to check why later.

Interesting. It’s still very much untested which is why it hasn’t been added to libdaisy yet. I’ll be curious to see what you find.

In the project I added it for, I think I ran into some issues as well, but I was able to get stable playback/streaming with a block size of 48 samples, and 32kB reads from the SD card.

On the hardware I was working on (not daisy itself) I was able to set the SDMMC clock div setting all the way down to 1, and still have everything work reliably.

That said, I really wasn’t doing much other than playback speed adjustment and file selection. I’m not positive, but trying to write to the SD card might result in a total stall with this file… but I don’t think I tried to write either (it’s been a few months since I was working on it).

I wonder if you’re doing a lot of file seeking due to the granular. I’ve noticed f_lseek can be quite slow.

I’m loading WAVs directly into SDRAM (loading say 24MB at a time), is that supported?

Or do I need to load in regular memory first and then copy/assemble the file into SDRAM myself?

Oh that’s interesting, I hadn’t tried that as there wasn’t external SDRAM on the hardware I was using.

Hypothetically, it should work, though there could be some issues with how the cache is configured for the SDRAM.

The SDMMC1 peripheral has its own internal DMA on the STM32H7, and I haven’t read through all of the details on it yet.

However, based on this chart from the reference manual there is a connection between the SDMMC1 and the FMC so it should be possible to copy memory directly from the SDMMC to the external SDRAM:

The libdaisy SDMMC/FatFS implementation has a pretty substantial update incoming (I’ll be merging within the next hour or so).

And for anyone that wants to look at it: the PR

Details

The SDMMC Handle can now be configured for 1-bit, or 4-bit buswidth, and with configurable speed settings ranging from 400kHz up to an over-spec’d 100MHz.

Transfers are now done with the SDMMC1 peripheral’s dedicated IDMA bus. The current diskio setup is a slightly modified version of one of the example projects from an ST disco board. So there is probably still some room for improvement there, but the I/O functions are now interruptible and do not suffer the stability issues that were seen before while audio was running.

I’ve also added the beginnings of a few utility classes: one for recording/writing WAV files (WavWriter) and one for preloading files into RAM (WaveTableLoader), which is mostly for wavetable purposes but can be (ab)used for larger files.

Changes

There are two “breaking” changes that will affect anyone’s projects that currently use the SD Card in any capacity:

  1. The project Makefile no longer needs any special attention when using the SD Card, and the Makefiles within these projects have been reduced to their simple forms. The necessary files are now included via the core/Makefile in libdaisy, but a custom FATFS implementation can be swapped out for the libdaisy one by using FATFS_DIR in the Makefile, though this has not been extensively tested.
  2. The daisy::SdmmcHandler class now has a configuration structure for setting bus-width and speed. There is a “Defaults” function available for easy adaptation.

Code change for new support is as follows:

// previous:
sdhandler.Init();
// new:
SdmmcHandler::Config sd_cfg;
sd_cfg.Defaults(); // 4-bit, 50MHz
sdhandler.Init(sd_cfg);

If anyone has any other issues or instabilities feel free to reach out or create github issues.


This sounds great! Is there an example project in the git somewhere?

By the way, I really love the examples. By keeping them as simple as possible, you have succeeded in keeping them 100% transparent: one look and you know what’s going on. Very cool!
