No-OS, Non-Blocking LittleFS

In another post I mentioned that I switched from FATFS to LittleFS (LFS) on an STM32. This led me to write a file explorer for LittleFS on Windows.

https://bluscape.blog/2019/10/01/littlefs-explorer-lfse-for-windows/

Prior to switching to LittleFS, I was in the process of converting a FATFS library to a No Operating System (No-OS), Non-blocking library using Protothreads. I also started a post on this and might still complete it.

The reason why I converted the FATFS driver to No-OS, Non-blocking was that I’m not willing to block the program execution while the file system (FS) is waiting for disk reads and writes to complete.

LFS had the same problem. There is no feedback mechanism to allow the program to resume while the disk access is in progress. But the problem is not only with the FS drivers; the disk IO drivers provided by ST are blocking too. If you look, for example, at the ST SDIO SD card driver in DMA mode, the DMA is interrupt driven, but the driver still blocks and polls for results in a while loop.

Examples of the blocking ST SDIO SD driver

// From stm324x7i_eval_sdio_sd.c
SD_Error SD_WaitReadOperation(void)
{
  ...
  
  // Wait for the transfer to complete (blocking and polling)
  while ((DMAEndOfTransfer == 0x00)&&(TransferEnd == 0)&&
  (TransferError == SD_OK) && (timeout > 0))
  {
      timeout--;
  }

  ...

  // Wait for the read to become inactive (blocking and polling)
  while(((SDIO->STA & SDIO_FLAG_RXACT)) && (timeout > 0))
  {
      timeout--;  
  }

  ...
}

// From fatfs_drv.c
DRESULT disk_read(BYTE drv, BYTE *buff, DWORD sector, BYTE count)
{
  ...

  // Wait for the read operation to complete (blocking and polling)
  sdstatus =  SD_WaitReadOperation();

  // Wait until the transfer is OK (blocking and polling)
  while(SD_GetStatus() != SD_TRANSFER_OK);

 ...
}
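
One way to get rid of such a wait loop is to split it into a "start" step and a "poll" step that checks the flag once and returns. The following is only a minimal sketch of that idea, not ST's driver API: the names NbSd_ReadStart_V, NbSd_ReadPoll_E, NbSdRead_S and NbSdPoll_E are hypothetical, and the TransferEnd_B flag stands in for whatever the DMA/SDIO interrupt would set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result codes for a polled (non-blocking) wait. */
typedef enum { NBSD_POLL_BUSY, NBSD_POLL_DONE, NBSD_POLL_TIMEOUT } NbSdPoll_E;

/* Hypothetical per-transfer state that the caller keeps between polls. */
typedef struct
{
  volatile bool TransferEnd_B; /* would be set by the DMA/SDIO interrupt */
  uint32_t PollsLeft_U32;      /* timeout budget, counted in polls */
} NbSdRead_S;

/* Arm the state just before the transfer is started. */
static void NbSd_ReadStart_V(NbSdRead_S *Read_PS, uint32_t TimeoutPolls_U32)
{
  assert(Read_PS != 0);
  Read_PS->TransferEnd_B = false;
  Read_PS->PollsLeft_U32 = TimeoutPolls_U32;
}

/* One poll step: looks at the flag once and returns immediately,
   instead of spinning in a while loop until the flag changes. */
static NbSdPoll_E NbSd_ReadPoll_E(NbSdRead_S *Read_PS)
{
  assert(Read_PS != 0);

  if (Read_PS->TransferEnd_B)
  {
    return NBSD_POLL_DONE;
  }

  if (Read_PS->PollsLeft_U32 == 0)
  {
    return NBSD_POLL_TIMEOUT;
  }

  Read_PS->PollsLeft_U32--;
  return NBSD_POLL_BUSY;
}
```

The caller would invoke the poll function once per pass of its main loop and carry on with other work for as long as it reports busy.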

There are several ways to overcome this. One of them is to use an OS with context switching, but I’m not a fan of context switching: it consumes precious RAM, and there are a couple of other reasons too.

It was a massive task to convert FATFS to non-blocking, and then I still had to do the testing. It felt like I was investing too much time into something with uncertain results. I had no idea how much more was involved in getting it stable without first getting to grips with the driver code and gaining a reasonable understanding of the FATFS specification. So I started looking at alternative file systems for embedded use: something simpler and more modern. I had a look at several file systems and then came across LFS.

You can find a list of file systems here: https://en.wikipedia.org/wiki/List_of_file_systems

Just to give you an idea, I counted the lines of code using LocMetrics (http://www.locmetrics.com/) searching only in C source and header files. With the reduced line count it is much easier to find problems.

                FATFS R0.14    LFS 2.1
Lines of code   23648 lines    6138 lines

LFS seemed modern (more modern than FATFS) and robust, with a much smaller code base than FATFS. The power-loss resilience and dynamic wear leveling of LFS were also very attractive.

So I converted the standard LFS library to a non-blocking library using Protothreads. But this required that I convert the SDIO SD driver to non-blocking too.
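
For readers who have not used Protothreads: the trick is that a function remembers where it stopped and jumps back to that point on the next call, so it can "wait" by simply returning. The sketch below shows only that mechanism with simplified stand-in macros. The MINI_ prefix, MiniPt_S, Reader_Thread and DiskReady_B are names I made up for illustration; they are not the Protothreads library's own definitions and not code from the converted LFS.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-ins for the Protothreads macros (MINI_ prefix marks
   them as illustrative, not the library's definitions). */
typedef struct { uint16_t Line_U16; } MiniPt_S;

enum { MINI_WAITING = 0, MINI_ENDED = 3 };

#define MINI_INIT(pt)   ((pt)->Line_U16 = 0)
#define MINI_BEGIN(pt)  switch ((pt)->Line_U16) { case 0:
#define MINI_WAIT_UNTIL(pt, cond)                 \
  do {                                            \
    (pt)->Line_U16 = __LINE__; case __LINE__:     \
    if (!(cond)) { return MINI_WAITING; }         \
  } while (0)
#define MINI_END(pt)    } MINI_INIT(pt); return MINI_ENDED

static volatile int DiskReady_B = 0; /* would be set by an interrupt */
static int Step_S = 0;               /* records how far the thread got */

/* Hypothetical worker: starts something, waits, then finishes. */
static int Reader_Thread(MiniPt_S *Pt_PS)
{
  MINI_BEGIN(Pt_PS);

  Step_S = 1;                          /* e.g. start a disk read */
  MINI_WAIT_UNTIL(Pt_PS, DiskReady_B); /* returns while flag is clear */

  Step_S = 2;                          /* e.g. consume the data */

  MINI_END(Pt_PS);
}

/* Usage: the caller decides when the thread gets another turn. */
static void Demo_V(void)
{
  MiniPt_S Pt_S;

  MINI_INIT(&Pt_S);
  assert(Reader_Thread(&Pt_S) == MINI_WAITING); /* parked at the wait */
  DiskReady_B = 1;                              /* "interrupt" fires */
  assert(Reader_Thread(&Pt_S) == MINI_ENDED);   /* resumes and ends */
}
```

Because the function returns while waiting, local variables that must survive a wait have to be static, which is why the test routines in this post declare almost everything static.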

The numbers: Standard LFS vs. Non-Blocking LFS

To compare the “performance” of the standard and non-blocking LFS libraries, I wrote two test routines that continuously read a 10k (10240 bytes) file from an SD card using LFS. Each read operation reads a 512 byte block from the SD card. The first routine uses the standard version of the LFS library, whereas the second uses the converted, non-blocking version. Both routines mount the FS, open a file, read to the end of the file and then rewind the file. The read and rewind operations are repeated. I change the state of an IO pin to measure the timing of the following items:

  • Time per read iteration (Non-blocking version only).
  • Time to read a 512 byte block.
  • Time to read the 10k file.

The position, in code, where the IO pin is changed was moved to allow the measurement of the respective items. There is a 10ms delay between each file read. The IO pin is set high during read operations and set low once the read operation is completed.

Following is the LFS configuration and both the Standard and Non-Blocking LFS test routines.

LFS Configuration (for both routines)

const struct lfs_config LFSConfig_S =
{
  .read = DISK_Read_S,
  .prog = DISK_Write_S,
  .erase = DISK_Erase_S,
  .sync = DISK_Sync_S,
  .read_size = 512,
  .prog_size = 512,
  .block_size = 512,
  .block_count = 7580712,
  .block_cycles = 1000,
  .cache_size = 512,
  .lookahead_size = 512,
  .name_max = 255,
  .file_max = 2147483647,
  .attr_max = 1022,
  .read_buffer = connectivity_fs_ReadBuffer,
  .prog_buffer = connectivity_fs_ProgBuffer,
  .lookahead_buffer = connectivity_fs_LookAheadBuffer,
};
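
As a sanity check on the geometry above: 7580712 blocks of 512 bytes is 3881324544 bytes, or about 3.88 GB. The sketch below computes that and checks the size relationships that, as I read the comments in the upstream lfs.h, the library expects (cache size a multiple of the read and program sizes and a factor of the block size, lookahead size a multiple of 8). Geometry_S and both function names are hypothetical stand-ins so that the example compiles without lfs.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the geometry fields of struct lfs_config, so that the
   sketch compiles without lfs.h. */
typedef struct
{
  uint32_t ReadSize_U32;
  uint32_t ProgSize_U32;
  uint32_t BlockSize_U32;
  uint32_t BlockCount_U32;
  uint32_t CacheSize_U32;
  uint32_t LookaheadSize_U32;
} Geometry_S;

/* Total addressable bytes; 64-bit so that large cards cannot overflow. */
static uint64_t Geometry_Capacity_U64(const Geometry_S *G_PS)
{
  assert(G_PS != 0);
  return (uint64_t)G_PS->BlockSize_U32 * G_PS->BlockCount_U32;
}

/* Checks the size relationships described in the lead-in. */
static bool Geometry_Valid_B(const Geometry_S *G_PS)
{
  assert(G_PS != 0);

  /* Guard the modulo operations below against division by zero. */
  if (G_PS->ReadSize_U32 == 0 || G_PS->ProgSize_U32 == 0 ||
      G_PS->CacheSize_U32 == 0)
  {
    return false;
  }

  return (G_PS->CacheSize_U32 % G_PS->ReadSize_U32 == 0) &&
         (G_PS->CacheSize_U32 % G_PS->ProgSize_U32 == 0) &&
         (G_PS->BlockSize_U32 % G_PS->CacheSize_U32 == 0) &&
         (G_PS->LookaheadSize_U32 > 0) &&
         (G_PS->LookaheadSize_U32 % 8 == 0);
}
```

With every size set to 512, as in the configuration above, all the checks pass trivially. The check becomes more useful when the cache or lookahead buffer is shrunk to save RAM.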

Standard LFS Test Routine

static PT_THREAD(TestThread_V(struct pt* Thread_PS))
{
#define LFS_EOF 0
#define IO_PIN_BLOCK
//#define IO_PIN_FILE
  static enum lfs_error FileResult_E;
  static lfs_ssize_t BytesRead_S;
  static uint8_t ReadBuffer_AU8[512];
  static lfs_file_t LFSFileHandle_S;
  static lfs_t LFSHandle_S;
  static struct pt LocalThread_S;
  static TTIMER_S LocalTimer_S;

  // Begin the thread
  PT_BEGIN(Thread_PS);

  // Try and mount the disk
  FileResult_E = lfs_mount(&LFSHandle_S, &LFSConfig_S);

  // If there was an error
  if (FileResult_E < LFS_ERR_OK)
  {
    // Exit the thread
    PT_EXIT(Thread_PS);
  }

  // Try and open the file
  FileResult_E = lfs_file_open(&LFSHandle_S, &LFSFileHandle_S, 
                               "TestFile.jpg", LFS_O_RDONLY);

  // If there was an error
  if (FileResult_E < LFS_ERR_OK)
  {
    // Exit the thread
    PT_EXIT(Thread_PS);
  }

  // Loop forever
  while (1)
  {
    // Set the pin (for oscilloscope measurement)
    GPIO_SetBits(Measurement_Port, Measurement_Pin);

    // Try and read from the file
    BytesRead_S = lfs_file_read(&LFSHandle_S, &LFSFileHandle_S, 
                                ReadBuffer_AU8, 512);

#ifdef IO_PIN_BLOCK
    // Clear the pin (for oscilloscope measurement)
    GPIO_ResetBits(Measurement_Port, Measurement_Pin);
#endif

    // If there was an error
    if (BytesRead_S < LFS_ERR_OK)
    {
      // Exit the thread
      PT_EXIT(Thread_PS);
    }

    // If we reached the end of the file
    if (BytesRead_S == LFS_EOF)
    {

#ifdef IO_PIN_FILE
      // Clear the pin (for oscilloscope measurement)
      GPIO_ResetBits(Measurement_Port, Measurement_Pin);
#endif

      // Try and rewind the file
      FileResult_E = lfs_file_rewind(&LFSHandle_S, &LFSFileHandle_S);

      // If there was an error
      if (FileResult_E < LFS_ERR_OK)
      {
        // Exit the thread
        PT_EXIT(Thread_PS);
      }

      // Wait a little
      TIMER_Start_V(&LocalTimer_S, TIMER_10_MS);
      PT_WAIT_UNTIL(Thread_PS, TIMER_Expired_B(&LocalTimer_S));
    }
  }

  // End the thread
  PT_END(Thread_PS);
}

Non-Blocking LFS Test Routine

static PT_THREAD(TestThread_V(struct pt* Thread_PS))
{
#define LFS_EOF 0
#define IO_PIN_ITERATION
//#define IO_PIN_BLOCK
//#define IO_PIN_FILE
  static enum lfs_error FileResult_E;
  static lfs_ssize_t BytesRead_S;
  static uint8_t ReadBuffer_AU8[512];
  static lfs_file_t LFSFileHandle_S;
  static lfs_t LFSHandle_S;
  static struct pt LocalThread_S;
  TPT_Results_E ThreadResult_E;
  static TTIMER_S LocalTimer_S;

  // Begin the thread
  PT_BEGIN(Thread_PS);

  // Lock the mutex
  PT_SEM_WAIT(Thread_PS, &LFS_Mutex_S);

  // Try and mount the disk
  PT_SPAWN(Thread_PS, &LocalThread_S, 
           lfs_mount(&LocalThread_S, &LFSHandle_S, &LFSConfig_S, 
                     (int*)&FileResult_E));

  // Unlock the mutex
  PT_SEM_SIGNAL(Thread_PS, &LFS_Mutex_S);

  // If there was an error
  if (FileResult_E < LFS_ERR_OK)
  {
    // Exit the thread
    PT_EXIT(Thread_PS);
  }

  // Lock the mutex
  PT_SEM_WAIT(Thread_PS, &LFS_Mutex_S);

  // Try and open the file
  PT_SPAWN(Thread_PS, &LocalThread_S, 
           lfs_file_open(&LocalThread_S, &LFSHandle_S, &LFSFileHandle_S, 
                         "TestFile.jpg", LFS_O_RDONLY, 
                         (int*)&FileResult_E));

  // Unlock the mutex
  PT_SEM_SIGNAL(Thread_PS, &LFS_Mutex_S);

  // If there was an error
  if (FileResult_E < LFS_ERR_OK)
  {
    // Exit the thread
    PT_EXIT(Thread_PS);
  }

  // Loop forever
  while (1)
  {
    // Lock the mutex
    PT_SEM_WAIT(Thread_PS, &LFS_Mutex_S);

    // Initialise the thread
    PT_INIT(&LocalThread_S);

    // Loop until block read is complete
    while (1)
    {
      // Set the pin (for oscilloscope measurement)
      GPIO_SetBits(Measurement_Port, Measurement_Pin);

      // Try and read from the file
      ThreadResult_E = lfs_file_read(&LocalThread_S, &LFSHandle_S, 
                                     &LFSFileHandle_S, ReadBuffer_AU8, 
                                     512, &BytesRead_S);

#ifdef IO_PIN_ITERATION
      // Clear the pin (for oscilloscope measurement)
      GPIO_ResetBits(Measurement_Port, Measurement_Pin);
#endif

      // If the thread completed
      if (ThreadResult_E >= PT_EXITED)
      {
        break;
      }
    }

#ifdef IO_PIN_BLOCK
    // Clear the pin (for oscilloscope measurement)
    GPIO_ResetBits(Measurement_Port, Measurement_Pin);
#endif

    // Unlock the mutex
    PT_SEM_SIGNAL(Thread_PS, &LFS_Mutex_S);

    // If there was an error
    if (BytesRead_S < LFS_ERR_OK)
    {
      // Exit the thread
      PT_EXIT(Thread_PS);
    }

    // If we reached the end of the file
    if (BytesRead_S == LFS_EOF)
    {

#ifdef IO_PIN_FILE
      // Clear the pin (for oscilloscope measurement)
      GPIO_ResetBits(Measurement_Port, Measurement_Pin);
#endif

      // Lock the mutex
      PT_SEM_WAIT(Thread_PS, &LFS_Mutex_S);

      // Try and rewind the file
      PT_SPAWN(Thread_PS, &LocalThread_S, 
               lfs_file_rewind(&LocalThread_S, 
                               &LFSHandle_S, &LFSFileHandle_S, 
                               (int*)&FileResult_E));

      // Unlock the mutex
      PT_SEM_SIGNAL(Thread_PS, &LFS_Mutex_S);

      // If there was an error
      if (FileResult_E < LFS_ERR_OK)
      {
        // Exit the thread
        PT_EXIT(Thread_PS);
      }

      // Wait a little
      TIMER_Start_V(&LocalTimer_S, TIMER_10_MS);
      PT_WAIT_UNTIL(Thread_PS, TIMER_Expired_B(&LocalTimer_S));
    }
  }

  // End the thread
  PT_END(Thread_PS);
}

Results

The time for each item was measured on an IO pin using an oscilloscope. The standard library does not have a time per iteration since it blocks until the read is complete. The time to read a block varied slightly, by a couple of microseconds, over several measurements; the block read values in the table below are taken only from the oscilloscope measurements in this post. In general, though, the time to read a block was the same for the standard and non-blocking routines. The non-blocking version took slightly longer (less than 1ms longer) to read the entire file.

                                Standard LFS     Non-Blocking LFS
Time per iteration              N.A.             3.463us
Number of iterations            N.A.             101 (min), 1398 (max), 377 (avg)
Time to read a 512 byte block   591.8us (min)    591.8us (min)
                                6.29ms (max)     6.34ms (max)
Time to read the 10k file       42.64ms          43.43ms

But what is really of interest here is the time per iteration. If you are like me and do not want to use an OS with context switching, you can reduce the time the program is blocked per read call from roughly 592us (worst case 6.29ms) down to about 3.5us. That is a saving of roughly 588us per call (worst case about 6.29ms). That is a hell of a lot of processing time. Valuable processing time that I can spend doing other things. So instead of blocking the program execution for 592us (worst case 6.29ms), we are now blocking it for only 3.5us at a time.
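
The arithmetic behind those figures can be written down as two small helpers. This is only a worked example using the values from the results table; the function names are hypothetical. With the table values, the minimum case gives 591.8us - 3.463us = 588.337us handed back to the rest of the program per call, and the whole-file overhead of the non-blocking version is (43.43 - 42.64) / 42.64, roughly 1.9%.

```c
#include <assert.h>

/* Time handed back to the rest of the program per read call, in
   microseconds: the blocking read time minus one non-blocking step. */
static double SavedPerCall_us(double BlockingRead_us, double Iteration_us)
{
  assert(BlockingRead_us >= Iteration_us);
  return BlockingRead_us - Iteration_us;
}

/* Extra total read time of the non-blocking version, as a percentage
   of the standard version's total read time. */
static double OverheadPercent(double Standard_ms, double NonBlocking_ms)
{
  assert(Standard_ms > 0.0);
  return (NonBlocking_ms - Standard_ms) * 100.0 / Standard_ms;
}
```

So the price for getting roughly 588us back per call is a total file read that takes about 2% longer.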

Oscilloscope measurements

Standard LFS

Standard: Minimum read time per block (591.8us).
Standard: Maximum read time per block (6.29ms).
Standard: Total block reads for a 10k file (20 x 512 byte blocks).
Standard: Total read time for a 10k file (42.64ms).

Non-blocking LFS

Non-Blocking: Typical read time per iteration (3.463us).
Non-Blocking: Total iterations for a 512 byte block read.
Non-Blocking: Minimum read time per block (591.8us).
Non-Blocking: Maximum read time per block (6.348ms).
Non-Blocking: Total block reads for a 10k file (20 x 512 byte blocks).
Non-Blocking: Total read time for a 10k file (43.43ms).

Notes on the Non-Blocking LFS Test Routine

The non-blocking routine was slightly modified to allow measurement between read iterations, but it will produce the same time per iteration when implemented in the normal manner. The only difference between the modified and normal non-blocking routines is that the normal routine spawns the child thread using “PT_SPAWN” when reading the file, instead of manually checking the thread result. The normal implementation of the non-blocking routine will look like this:

Non-Blocking LFS Routine (normal implementation)

static PT_THREAD(TestThread_V(struct pt* Thread_PS))
{
#define LFS_EOF 0
  static enum lfs_error FileResult_E;
  static lfs_ssize_t BytesRead_S;
  static uint8_t ReadBuffer_AU8[512];
  static lfs_file_t LFSFileHandle_S;
  static lfs_t LFSHandle_S;
  static struct pt LocalThread_S;
  TPT_Results_E ThreadResult_E;
  static TTIMER_S LocalTimer_S;

  // Begin the thread
  PT_BEGIN(Thread_PS);

  // Lock the mutex
  PT_SEM_WAIT(Thread_PS, &LFS_Mutex_S);

  // Try and mount the disk
  PT_SPAWN(Thread_PS, &LocalThread_S, 
           lfs_mount(&LocalThread_S, &LFSHandle_S, &LFSConfig_S,  
                     (int*)&FileResult_E));

  // Unlock the mutex
  PT_SEM_SIGNAL(Thread_PS, &LFS_Mutex_S);

  // If there was an error
  if (FileResult_E < LFS_ERR_OK)
  {
    // Exit the thread
    PT_EXIT(Thread_PS);
  }

  // Lock the mutex
  PT_SEM_WAIT(Thread_PS, &LFS_Mutex_S);

  // Try and open the file
  PT_SPAWN(Thread_PS, &LocalThread_S, 
           lfs_file_open(&LocalThread_S, &LFSHandle_S, &LFSFileHandle_S, 
                         "TestFile.jpg", LFS_O_RDONLY, 
                         (int*)&FileResult_E));

  // Unlock the mutex
  PT_SEM_SIGNAL(Thread_PS, &LFS_Mutex_S);

  // If there was an error
  if (FileResult_E < LFS_ERR_OK)
  {
    // Exit the thread
    PT_EXIT(Thread_PS);
  }

  // Loop forever
  while (1)
  {
    // Lock the mutex
    PT_SEM_WAIT(Thread_PS, &LFS_Mutex_S);

    // Set the pin (for oscilloscope measurement)
    GPIO_SetBits(LED6_RED_GPIO_Port, LED6_RED_Pin);

    // Try and read from the file
    PT_SPAWN(Thread_PS, &LocalThread_S, 
             lfs_file_read(&LocalThread_S, &LFSHandle_S, &LFSFileHandle_S, 
                           ReadBuffer_AU8, 512, &BytesRead_S));

    // Clear the pin (for oscilloscope measurement)
    GPIO_ResetBits(LED6_RED_GPIO_Port, LED6_RED_Pin);

    // Unlock the mutex
    PT_SEM_SIGNAL(Thread_PS, &LFS_Mutex_S);

    // If there was an error
    if (BytesRead_S < LFS_ERR_OK)
    {
      // Exit the thread
      PT_EXIT(Thread_PS);
    }

    // If we reached the end of the file
    if (BytesRead_S == LFS_EOF)
    {
      // Lock the mutex
      PT_SEM_WAIT(Thread_PS, &LFS_Mutex_S);

      // Try and rewind the file
      PT_SPAWN(Thread_PS, &LocalThread_S, 
               lfs_file_rewind(&LocalThread_S, &LFSHandle_S, 
                               &LFSFileHandle_S, (int*)&FileResult_E));

      // Unlock the mutex
      PT_SEM_SIGNAL(Thread_PS, &LFS_Mutex_S);

      // If there was an error
      if (FileResult_E < LFS_ERR_OK)
      {
        // Exit the thread
        PT_EXIT(Thread_PS);
      }

      // Wait a little
      TIMER_Start_V(&LocalTimer_S, TIMER_10_MS);
      PT_WAIT_UNTIL(Thread_PS, TIMER_Expired_B(&LocalTimer_S));
    }
  }

  // End the thread
  PT_END(Thread_PS);
}
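
To make the difference between spawning and manually checking the thread result a bit more concrete: spawning a child amounts to resetting the child and then returning to the caller, on every pass, until the child reports that it has exited or ended. The sketch below illustrates this with simplified stand-in macros. Everything prefixed MINI_, as well as Child_Read, Parent_Thread and the three-poll "read", is hypothetical and only mimics the behaviour; it is not the Protothreads source or the converted LFS code.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint16_t Line_U16; } MiniPt_S;

enum { MINI_WAITING = 0, MINI_EXITED = 2, MINI_ENDED = 3 };

#define MINI_INIT(pt)   ((pt)->Line_U16 = 0)
#define MINI_BEGIN(pt)  switch ((pt)->Line_U16) { case 0:
#define MINI_WAIT_UNTIL(pt, cond)                 \
  do {                                            \
    (pt)->Line_U16 = __LINE__; case __LINE__:     \
    if (!(cond)) { return MINI_WAITING; }         \
  } while (0)
#define MINI_END(pt)    } MINI_INIT(pt); return MINI_ENDED

/* Spawn: reset the child, then keep returning to the caller until the
   child call reports that it has exited or ended. */
#define MINI_SPAWN(pt, child, call)                    \
  do {                                                 \
    MINI_INIT(child);                                  \
    MINI_WAIT_UNTIL(pt, (call) >= MINI_EXITED);        \
  } while (0)

static int PollsLeft_S;
static int LastBytesRead_S;

/* Hypothetical child: a "read" that completes on its third call. */
static int Child_Read(MiniPt_S *Pt_PS, int *Result_PS)
{
  MINI_BEGIN(Pt_PS);

  PollsLeft_S = 3;
  MINI_WAIT_UNTIL(Pt_PS, --PollsLeft_S == 0);
  *Result_PS = 512;

  MINI_END(Pt_PS);
}

/* Parent: one line replaces the manual loop over the thread result. */
static int Parent_Thread(MiniPt_S *Pt_PS)
{
  static MiniPt_S Child_S;
  static int BytesRead_S;

  MINI_BEGIN(Pt_PS);

  MINI_SPAWN(Pt_PS, &Child_S, Child_Read(&Child_S, &BytesRead_S));
  LastBytesRead_S = BytesRead_S;

  MINI_END(Pt_PS);
}

/* Usage: a main loop would do other work between the calls. */
static void Demo_V(void)
{
  MiniPt_S Parent_S;
  int Calls_S = 1;

  MINI_INIT(&Parent_S);
  while (Parent_Thread(&Parent_S) < MINI_EXITED)
  {
    Calls_S++;
  }

  assert(Calls_S == 3);
  assert(LastBytesRead_S == 512);
}
```

In the measurement version the inner while (1) loop calls the child directly and checks the result itself, which is what made it possible to toggle the IO pin around each individual iteration.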

Memory Usage

Both test routines were compiled with the following parameters:

  • IDE: STM32CubeIDE v1.1.0
  • Compiler: gcc version 7.3.1 20180622 (release) [ARM/embedded-7-branch revision 261907] (GNU Tools for STM32 7-2018-q2-update.20190328-1800)
  • Compiler Settings: -mcpu=cortex-m4 -std=gnu11 -g3 -c -Os -ffunction-sections -fdata-sections -Wall -fstack-usage --specs=nano.specs -mfpu=fpv4-sp-d16 -mfloat-abi=hard -mthumb
  • Linker Settings: -mcpu=cortex-m4 --specs=nosys.specs -Wl,-Map="${ProjName}.map" -Wl,--gc-sections -static --specs=nano.specs -mfpu=fpv4-sp-d16 -mfloat-abi=hard -mthumb -Wl,--start-group -lc -lm -Wl,--end-group

I extracted the memory usage (FLASH and RAM) from the MAP and ELF files using a tool called MapViewer ( https://github.com/govind-mukundan/MapViewer ). The memory usage below is for the FS driver only.

                     Standard LFS    Non-Blocking LFS    Difference
FLASH (text)         11776 bytes     21940 bytes         10164 bytes
RAM (data and bss)   1769 bytes      3357 bytes          1588 bytes

Conclusion

  • Was this exercise worthwhile? For my application, yes. I need the processor bandwidth to do other things while accessing the disk through a file system.
  • Is it worth the cost of extra memory usage? I do not like using more memory than necessary, and the additional memory usage is a bit higher than I would have liked, but I can always use a slightly larger device if needed. For the gain in processor bandwidth, I would say yes.
  • I would like to see what the memory implications are when using the standard LFS library with an OS, but only for the sake of comparison. So I might do a test at some point and amend this post.

Let me know what you think.
