In another post I mentioned that I switched from FATFS to LittleFS (LFS) on an STM32, which led me to write a LittleFS file explorer for Windows:
https://bluscape.blog/2019/10/01/littlefs-explorer-lfse-for-windows/
Prior to switching to LittleFS, I was in the process of converting the FATFS library to a No Operating System (No-OS), non-blocking library using Protothreads. I also started a post on that effort and might still complete it.
I converted the FATFS driver to No-OS, non-blocking because I’m not willing to block program execution while the file system (FS) waits for disk reads and writes to complete.
LFS has the same problem: there is no feedback mechanism that allows the program to resume while a disk access is in progress. And the problem is not limited to the FS drivers; the disk IO drivers provided by ST are blocking too. Take the ST SDIO SD card driver in DMA mode, for example: the DMA itself is interrupt driven, but the driver blocks, polling for results in a while loop.
Examples from the blocking ST SDIO SD driver:
// From stm324x7i_eval_sdio_sd.c
SD_Error SD_WaitReadOperation(void)
{
...
// Wait for the transfer to complete (blocking and polling)
while ((DMAEndOfTransfer == 0x00)&&(TransferEnd == 0)&&
(TransferError == SD_OK) && (timeout > 0))
{
timeout--;
}
...
// Wait for the read to become inactive (blocking and polling)
while(((SDIO->STA & SDIO_FLAG_RXACT)) && (timeout > 0))
{
timeout--;
}
...
}
// From fatfs_drv.c
DRESULT disk_read(BYTE drv, BYTE *buff, DWORD sector, BYTE count)
{
...
// Wait for the read operation to complete (blocking and polling)
sdstatus = SD_WaitReadOperation();
// Wait until the transfer is OK (blocking and polling)
while(SD_GetStatus() != SD_TRANSFER_OK);
...
}
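For comparison, here is a minimal sketch of how such a wait can be restructured as a protothread, the approach used throughout this post, so that other threads run while the DMA completes. The function name, the timer usage and the TIMER_100_MS timeout are my own illustration (the flags are the driver's globals, set from its interrupt handlers); the real conversion touches many more functions.
// Hypothetical protothread rework of SD_WaitReadOperation(): instead of
// spinning in a while loop, the thread yields until the flags set by the
// DMA/SDIO interrupt handlers indicate completion, or a timeout expires.
static PT_THREAD(SD_WaitReadOperation_PT(struct pt* Thread_PS, SD_Error* Result_PE))
{
    static TTIMER_S Timeout_S;
    PT_BEGIN(Thread_PS);
    // TIMER_100_MS is an assumed timeout value; pick one to suit the card
    TIMER_Start_V(&Timeout_S, TIMER_100_MS);
    // Other application threads run between evaluations of this condition
    PT_WAIT_UNTIL(Thread_PS, (DMAEndOfTransfer != 0x00) || (TransferEnd != 0) ||
                             (TransferError != SD_OK) ||
                             TIMER_Expired_B(&Timeout_S));
    *Result_PE = TransferError;
    PT_END(Thread_PS);
}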
There are several ways to overcome this blocking. One is to use an OS with context switching, but I’m not a fan of context switching: every task needs its own stack, which again consumes precious RAM, among other drawbacks.
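Protothreads, by contrast, are stackless: every "thread" is an ordinary function driven from one main loop, so no per-task stacks need to be reserved. A minimal sketch of such a cooperative main loop, with illustrative thread names:
// Minimal cooperative main loop: each protothread runs to its next
// blocking point and returns; no stacks are swapped, no per-task RAM.
#include "pt.h"

static struct pt FSThread_S;
static struct pt OtherThread_S; // hypothetical second task

int main(void)
{
    PT_INIT(&FSThread_S);
    PT_INIT(&OtherThread_S);
    while (1)
    {
        TestThread_V(&FSThread_S);     // e.g. a test routine from this post
        OtherThread_V(&OtherThread_S); // hypothetical; runs between FS polls
    }
}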
Converting FATFS to non-blocking was a massive task, and testing still lay ahead. It felt like I was investing too much time in something with uncertain results, with no idea how much work remained to make it stable; getting there would mean getting to grips with the driver code and gaining a reasonable understanding of the FATFS specification. So I started looking at alternative file systems for embedded use, something simpler and more modern. I looked at several file systems and then came across LFS.
You can find a list of file systems here: https://en.wikipedia.org/wiki/List_of_file_systems
Just to give you an idea, I counted the lines of code using LocMetrics (http://www.locmetrics.com/), counting only C source and header files. With the reduced line count it is much easier to find problems.
| | FATFS R0.14 | LFS 2.1 |
| --- | --- | --- |
| Lines of code | 23648 lines | 6138 lines |
LFS seemed more modern than FATFS and robust, and its code base is much smaller. The power-loss resilience and dynamic wear leveling of LFS were also very attractive.
So I converted the standard LFS library to a non-blocking library using Protothreads. This in turn required converting the SDIO SD driver to non-blocking too.
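To give an idea of what the conversion does to the API, compare the standard lfs_file_read() signature with the protothread form used in the test routines below. The converted signature is inferred from those routines, not from any official LFS API:
// Standard (blocking) LFS API:
lfs_ssize_t lfs_file_read(lfs_t *lfs, lfs_file_t *file,
                          void *buffer, lfs_size_t size);

// Converted protothread form: the call becomes a protothread that is
// re-invoked until it completes, and it reports its byte count (or
// negative error code) through an out-parameter.
PT_THREAD(lfs_file_read(struct pt *pt, lfs_t *lfs, lfs_file_t *file,
                        void *buffer, lfs_size_t size, lfs_ssize_t *bytesread));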
The numbers: Standard LFS vs. Non-Blocking LFS
To compare the “performance” of the standard and non-blocking LFS libraries, I wrote two test routines that continuously read a 10k (10240 byte) file from an SD card using LFS. Each read operation reads a 512 byte block from the SD card. The first routine uses the standard version of the LFS library; the second uses the converted, non-blocking version. Both routines mount the FS, open a file, read to the end of the file and then rewind it; the read and rewind operations are repeated. I toggle an IO pin to measure the timing of the following items:
- Time per read iteration (Non-blocking version only).
- Time to read a 512 byte block.
- Time to read the 10k file.
The position in the code where the IO pin is toggled was moved to allow each of the respective items to be measured. There is a 10ms delay between file reads. The IO pin is set high during a read operation and set low once the read operation completes.
Below are the LFS configuration and both the standard and non-blocking LFS test routines.
LFS Configuration (for both routines)
const struct lfs_config LFSConfig_S =
{
    // Block device operations (disk driver callbacks)
    .read = DISK_Read_S,
    .prog = DISK_Write_S,
    .erase = DISK_Erase_S,
    .sync = DISK_Sync_S,
    // Block device geometry: 512 byte blocks to match the SD card sectors
    .read_size = 512,
    .prog_size = 512,
    .block_size = 512,
    .block_count = 7580712,
    .block_cycles = 1000, // erase cycles before wear leveling relocates a block
    .cache_size = 512,
    .lookahead_size = 512,
    // Upper limits (LFS defaults)
    .name_max = 255,
    .file_max = 2147483647,
    .attr_max = 1022,
    // Statically allocated buffers (no malloc)
    .read_buffer = connectivity_fs_ReadBuffer,
    .prog_buffer = connectivity_fs_ProgBuffer,
    .lookahead_buffer = connectivity_fs_LookAheadBuffer,
};
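The three buffers referenced by the configuration are not shown in the listing. Assuming static allocation (there is no heap use here), they would be declared along these lines, sized to match cache_size and lookahead_size:
// Static LFS cache buffers (sizes must match the configuration above)
static uint8_t connectivity_fs_ReadBuffer[512];      // cache_size bytes
static uint8_t connectivity_fs_ProgBuffer[512];      // cache_size bytes
static uint8_t connectivity_fs_LookAheadBuffer[512]; // lookahead_size bytes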
Standard LFS Test Routine
static PT_THREAD(TestThread_V(struct pt* Thread_PS))
{
#define LFS_EOF 0
#define IO_PIN_BLOCK
//#define IO_PIN_FILE
static enum lfs_error FileResult_E;
static lfs_ssize_t BytesRead_S;
static uint8_t ReadBuffer_AU8[512];
static lfs_file_t LFSFileHandle_S;
static lfs_t LFSHandle_S;
static TTIMER_S LocalTimer_S;
// Begin the thread
PT_BEGIN(Thread_PS);
// Try and mount the disk
FileResult_E = lfs_mount(&LFSHandle_S, &LFSConfig_S);
// If there was an error
if (FileResult_E < LFS_ERR_OK)
{
// Exit the thread
PT_EXIT(Thread_PS);
}
// Try and open the file
FileResult_E = lfs_file_open(&LFSHandle_S, &LFSFileHandle_S,
"TestFile.jpg", LFS_O_RDONLY);
// If there was an error
if (FileResult_E < LFS_ERR_OK)
{
// Exit the thread
PT_EXIT(Thread_PS);
}
// Loop forever
while (1)
{
// Set the pin (for oscilloscope measurement)
GPIO_SetBits(Measurement_Port, Measurement_Pin);
// Try and read from the file
BytesRead_S = lfs_file_read(&LFSHandle_S, &LFSFileHandle_S,
ReadBuffer_AU8, 512);
#ifdef IO_PIN_BLOCK
// Clear the pin (for oscilloscope measurement)
GPIO_ResetBits(Measurement_Port, Measurement_Pin);
#endif
// If there was an error
if (BytesRead_S < LFS_ERR_OK)
{
// Exit the thread
PT_EXIT(Thread_PS);
}
// If we reached the end of the file
if (BytesRead_S == LFS_EOF)
{
#ifdef IO_PIN_FILE
// Clear the pin (for oscilloscope measurement)
GPIO_ResetBits(Measurement_Port, Measurement_Pin);
#endif
// Try and rewind the file
FileResult_E = lfs_file_rewind(&LFSHandle_S, &LFSFileHandle_S);
// If there was an error
if (FileResult_E < LFS_ERR_OK)
{
// Exit the thread
PT_EXIT(Thread_PS);
}
// Wait a little
TIMER_Start_V(&LocalTimer_S, TIMER_10_MS);
PT_WAIT_UNTIL(Thread_PS, TIMER_Expired_B(&LocalTimer_S));
}
}
// End the thread
PT_END(Thread_PS);
}
Non-Blocking LFS Test Routine
static PT_THREAD(TestThread_V(struct pt* Thread_PS))
{
#define LFS_EOF 0
#define IO_PIN_ITERATION
//#define IO_PIN_BLOCK
//#define IO_PIN_FILE
static enum lfs_error FileResult_E;
static lfs_ssize_t BytesRead_S;
static uint8_t ReadBuffer_AU8[512];
static lfs_file_t LFSFileHandle_S;
static lfs_t LFSHandle_S;
static struct pt LocalThread_S;
TPT_Results_E ThreadResult_E;
static TTIMER_S LocalTimer_S;
// Begin the thread
PT_BEGIN(Thread_PS);
// Lock the mutex
PT_SEM_WAIT(Thread_PS, &LFS_Mutex_S);
// Try and mount the disk
PT_SPAWN(Thread_PS, &LocalThread_S,
lfs_mount(&LocalThread_S, &LFSHandle_S, &LFSConfig_S,
(int*)&FileResult_E));
// Unlock the mutex
PT_SEM_SIGNAL(Thread_PS, &LFS_Mutex_S);
// If there was an error
if (FileResult_E < LFS_ERR_OK)
{
// Exit the thread
PT_EXIT(Thread_PS);
}
// Lock the mutex
PT_SEM_WAIT(Thread_PS, &LFS_Mutex_S);
// Try and open the file
PT_SPAWN(Thread_PS, &LocalThread_S,
lfs_file_open(&LocalThread_S, &LFSHandle_S, &LFSFileHandle_S,
"TestFile.jpg", LFS_O_RDONLY,
(int*)&FileResult_E));
// Unlock the mutex
PT_SEM_SIGNAL(Thread_PS, &LFS_Mutex_S);
// If there was an error
if (FileResult_E < LFS_ERR_OK)
{
// Exit the thread
PT_EXIT(Thread_PS);
}
// Loop forever
while (1)
{
// Lock the mutex
PT_SEM_WAIT(Thread_PS, &LFS_Mutex_S);
// Initialise the thread
PT_INIT(&LocalThread_S);
// Loop until block read is complete
while (1)
{
// Set the pin (for oscilloscope measurement)
GPIO_SetBits(Measurement_Port, Measurement_Pin);
// Try and read from the file
ThreadResult_E = lfs_file_read(&LocalThread_S, &LFSHandle_S,
&LFSFileHandle_S, ReadBuffer_AU8,
512, &BytesRead_S);
#ifdef IO_PIN_ITERATION
// Clear the pin (for oscilloscope measurement)
GPIO_ResetBits(Measurement_Port, Measurement_Pin);
#endif
// If the thread completed
if (ThreadResult_E >= PT_EXITED)
{
break;
}
}
#ifdef IO_PIN_BLOCK
// Clear the pin (for oscilloscope measurement)
GPIO_ResetBits(Measurement_Port, Measurement_Pin);
#endif
// Unlock the mutex
PT_SEM_SIGNAL(Thread_PS, &LFS_Mutex_S);
// If there was an error
if (BytesRead_S < LFS_ERR_OK)
{
// Exit the thread
PT_EXIT(Thread_PS);
}
// If we reached the end of the file
if (BytesRead_S == LFS_EOF)
{
#ifdef IO_PIN_FILE
// Clear the pin (for oscilloscope measurement)
GPIO_ResetBits(Measurement_Port, Measurement_Pin);
#endif
// Lock the mutex
PT_SEM_WAIT(Thread_PS, &LFS_Mutex_S);
// Try and rewind the file
PT_SPAWN(Thread_PS, &LocalThread_S,
lfs_file_rewind(&LocalThread_S,
&LFSHandle_S, &LFSFileHandle_S,
(int*)&FileResult_E));
// Unlock the mutex
PT_SEM_SIGNAL(Thread_PS, &LFS_Mutex_S);
// If there was an error
if (FileResult_E < LFS_ERR_OK)
{
// Exit the thread
PT_EXIT(Thread_PS);
}
// Wait a little
TIMER_Start_V(&LocalTimer_S, TIMER_10_MS);
PT_WAIT_UNTIL(Thread_PS, TIMER_Expired_B(&LocalTimer_S));
}
}
// End the thread
PT_END(Thread_PS);
}
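A note on LFS_Mutex_S in the routine above: it is a protothread semaphore used as a mutex, since the converted library's internal state must only be driven by one thread at a time. Assuming the standard pt-sem.h API, it is declared and initialised once, with a count of one, before the threads run:
#include "pt-sem.h"

// Binary semaphore guarding the shared LFS state; a count of 1 lets the
// first PT_SEM_WAIT() pass immediately.
static struct pt_sem LFS_Mutex_S;

// In start-up code, before the scheduler loop:
PT_SEM_INIT(&LFS_Mutex_S, 1);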
Results
The time for each item was measured on an IO pin using an oscilloscope. The standard library has no time per iteration, since it blocks until the read is complete. The time to read a block varied slightly, by a couple of microseconds, across several measurements; the block-read values in the table below are taken only from the oscilloscope measurements in this post. In general, though, the time to read a block was the same for the standard and non-blocking routines. The non-blocking version took slightly longer (less than 1ms) to read the entire file.
| | Standard LFS | Non-Blocking LFS |
| --- | --- | --- |
| Time per iteration | N.A. | 3.463us |
| Number of iterations | N.A. | 101 (min), 1398 (max), 377 (avg) |
| Time to read a 512 byte block | 591.8us (min), 6.29ms (max) | 591.8us (min), 6.34ms (max) |
| Time to read the 10k file | 42.64ms | 43.43ms |
But what is really of interest here is the time per iteration. If you are like me and do not want to use an OS with context switching, you can reduce the time the program is blocked per read from roughly 591.8us (worst case 6.29ms) down to 3.5us, a saving of about 588us (worst case, nearly the full 6.29ms). That is a hell of a lot of processing time; valuable processing time that I can spend doing other things. So instead of blocking program execution for 591.8us (worst case 6.29ms) per block read, we are now blocking it for only 3.5us per iteration.
Oscilloscope measurements
Standard LFS
[Oscilloscope captures for the standard LFS routine]
Non-blocking LFS
[Oscilloscope captures for the non-blocking LFS routine]
Notes on the Non-Blocking LFS Test Routine
The non-blocking routine above was slightly modified to allow measurement between read iterations, but it produces the same time per iteration as a normal implementation. The only difference between the modified and normal non-blocking routines is that the normal one spawns the child thread with “PT_SPAWN” when reading the file, instead of manually checking the thread result (a sketch of what PT_SPAWN expands to follows the listing). The normal implementation of the non-blocking routine looks like this:
Non-Blocking LFS Routine (normal implementation)
static PT_THREAD(TestThread_V(struct pt* Thread_PS))
{
#define LFS_EOF 0
static enum lfs_error FileResult_E;
static lfs_ssize_t BytesRead_S;
static uint8_t ReadBuffer_AU8[512];
static lfs_file_t LFSFileHandle_S;
static lfs_t LFSHandle_S;
static struct pt LocalThread_S;
static TTIMER_S LocalTimer_S;
// Begin the thread
PT_BEGIN(Thread_PS);
// Lock the mutex
PT_SEM_WAIT(Thread_PS, &LFS_Mutex_S);
// Try and mount the disk
PT_SPAWN(Thread_PS, &LocalThread_S,
lfs_mount(&LocalThread_S, &LFSHandle_S, &LFSConfig_S,
(int*)&FileResult_E));
// Unlock the mutex
PT_SEM_SIGNAL(Thread_PS, &LFS_Mutex_S);
// If there was an error
if (FileResult_E < LFS_ERR_OK)
{
// Exit the thread
PT_EXIT(Thread_PS);
}
// Lock the mutex
PT_SEM_WAIT(Thread_PS, &LFS_Mutex_S);
// Try and open the file
PT_SPAWN(Thread_PS, &LocalThread_S,
lfs_file_open(&LocalThread_S, &LFSHandle_S, &LFSFileHandle_S,
"TestFile.jpg", LFS_O_RDONLY,
(int*)&FileResult_E));
// Unlock the mutex
PT_SEM_SIGNAL(Thread_PS, &LFS_Mutex_S);
// If there was an error
if (FileResult_E < LFS_ERR_OK)
{
// Exit the thread
PT_EXIT(Thread_PS);
}
// Loop forever
while (1)
{
// Lock the mutex
PT_SEM_WAIT(Thread_PS, &LFS_Mutex_S);
// Set the pin (for oscilloscope measurement)
GPIO_SetBits(LED6_RED_GPIO_Port, LED6_RED_Pin);
// Try and read from the file
PT_SPAWN(Thread_PS, &LocalThread_S,
lfs_file_read(&LocalThread_S, &LFSHandle_S, &LFSFileHandle_S,
ReadBuffer_AU8, 512, &BytesRead_S));
// Clear the pin (for oscilloscope measurement)
GPIO_ResetBits(LED6_RED_GPIO_Port, LED6_RED_Pin);
// Unlock the mutex
PT_SEM_SIGNAL(Thread_PS, &LFS_Mutex_S);
// If there was an error
if (BytesRead_S < LFS_ERR_OK)
{
// Exit the thread
PT_EXIT(Thread_PS);
}
// If we reached the end of the file
if (BytesRead_S == LFS_EOF)
{
// Lock the mutex
PT_SEM_WAIT(Thread_PS, &LFS_Mutex_S);
// Try and rewind the file
PT_SPAWN(Thread_PS, &LocalThread_S,
lfs_file_rewind(&LocalThread_S, &LFSHandle_S,
&LFSFileHandle_S, (int*)&FileResult_E));
// Unlock the mutex
PT_SEM_SIGNAL(Thread_PS, &LFS_Mutex_S);
// If there was an error
if (FileResult_E < LFS_ERR_OK)
{
// Exit the thread
PT_EXIT(Thread_PS);
}
// Wait a little
TIMER_Start_V(&LocalTimer_S, TIMER_10_MS);
PT_WAIT_UNTIL(Thread_PS, TIMER_Expired_B(&LocalTimer_S));
}
}
// End the thread
PT_END(Thread_PS);
}
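For reference, PT_SPAWN() in the standard Protothreads distribution expands to roughly the following, which is why the measurement routine's manual loop (breaking once the child returns PT_EXITED or higher) is behaviourally equivalent:
// From pt.h (paraphrased): initialise the child, then keep scheduling it,
// yielding the parent in between, until the child exits or ends.
#define PT_SPAWN(pt, child, thread)   \
  do {                                \
    PT_INIT((child));                 \
    PT_WAIT_THREAD((pt), (thread));   \
  } while (0)

// PT_WAIT_THREAD() waits while PT_SCHEDULE(thread) is true, and
// PT_SCHEDULE(f) is simply ((f) < PT_EXITED).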
Memory Usage
Both test routines were compiled with the following parameters:
- IDE: STM32CubeIDE v1.1.0
- Compiler: gcc version 7.3.1 20180622 (release) [ARM/embedded-7-branch revision 261907] (GNU Tools for STM32 7-2018-q2-update.20190328-1800)
- Compiler Settings: -mcpu=cortex-m4 -std=gnu11 -g3 -c -Os -ffunction-sections -fdata-sections -Wall -fstack-usage --specs=nano.specs -mfpu=fpv4-sp-d16 -mfloat-abi=hard -mthumb
- Linker Settings: -mcpu=cortex-m4 --specs=nosys.specs -Wl,-Map="${ProjName}.map" -Wl,--gc-sections -static --specs=nano.specs -mfpu=fpv4-sp-d16 -mfloat-abi=hard -mthumb -Wl,--start-group -lc -lm -Wl,--end-group
I extracted the memory usage (FLASH and RAM) from the MAP and ELF files using a tool called MapViewer (https://github.com/govind-mukundan/MapViewer). The memory usage below is for the FS driver only.
| | Standard LFS | Non-Blocking LFS | Difference |
| --- | --- | --- | --- |
| FLASH (text) | 11776 bytes | 21940 bytes | 10164 bytes |
| RAM (data and bss) | 1769 bytes | 3357 bytes | 1588 bytes |
Conclusion
- Was this exercise worthwhile? For my application, yes. I need the processor bandwidth to do other things while accessing the disk through a file system.
- Is it worth the extra memory usage? I do not like using more memory than necessary, and the additional usage is a bit higher than I would have liked, but I can always use a slightly larger device if needed. For the gain in processor bandwidth, I would say yes.
- I would still like to see the memory implications of using the standard LFS library with an OS, if only for comparison. I might run that test at some point and amend this post.
Let me know what you think.