MISC-TN-017: Persistent storage and read-write file systems

From DAVE Developer's Wiki
Revision as of 08:04, 18 January 2022 by U0001 (Embedded Linux systems with eMMC or SD cards)



History

Version   Date           Notes
1.0.0     January 2021   First public release

Introduction

In many cases, embedded systems that are based on Application Processors such as the NXP i.MX6 make use of read/write file systems. In turn, these file systems use non-volatile flash technologies integrated into several different devices (NOR flashes, raw NAND flashes, eMMC's, etc.).

By nature, these components are subject to several issues that need to be handled properly. Otherwise, their reliability and/or lifetime can be negatively affected.

This Technical Note deals with the use of read/write file systems in combination with such memories, and provides some real-world examples as well.

Embedded Linux systems with NOR flashes or raw NAND flashes

Some of the following examples refer to embedded Linux systems making use of NOR flashes or raw NAND flashes. Such systems are commonly managed by MTD/UBI subsystems and, on top of them, UBIFS to manage files.

Therefore, before diving into these examples, we suggest taking a look at our Memory Technology Device (MTD) article where these subsystems are explained in more detail.

Embedded Linux systems with eMMC or SD cards

Another typical use case refers to eMMC's and SD cards. As explained here, these components are FTL devices, where FTL stands for Flash Translation Layer. This layer emulates a block device on top of flash hardware. Therefore, these storage devices are used in tandem with file systems such as ext4 and FAT32. Besides a raw NAND flash memory, eMMC's and SD cards integrate a microcontroller implementing the FTL and other important tasks as detailed in the rest of the document. All things considered, eMMC's and SD cards appear therefore to the host as managed-NAND block devices.

The sections related to eMMC-based use cases are the result of a joint effort between Western Digital (which purchased SanDisk in 2016), Lauterbach Italy, and DAVE Embedded Systems.




Parts of such sections are retrieved from the White Paper TRACE32 log method for analysing accesses to an eMMC device by Lauterbach, which is freely available for download here.

Wear-out

One of the most important factors to take into account is wear-out. Simply put, this is a degradation of the memory device due to repeated erasing/writing cycles (also known as program/erase or P/E cycles), resulting in a limited lifetime.

In order to mitigate this phenomenon, erasing and writing operations have to be distributed uniformly all over the memory. Please note that this process, known as wear leveling, can be either implemented in the host (in the case of a raw NAND memory, for example) or in the memory device itself (for instance, in the case of eMMC's).

Even though wear-out is properly managed, it is unavoidable when writing operations are performed. That being said, how can the lifetime of such a device be estimated in practice? Manufacturers provide the number of guaranteed P/E cycles. For more details about this number, please refer to the specifications of your device, which detail the test conditions this number refers to. Once the guaranteed P/E cycles are known and assuming a proper wear-leveling algorithm is in place, the expected lifetime can be determined as follows.

First of all, the Total Bytes Written (TBW) has to be calculated:

TBW = [capacity * P/E cycles] / WAF

where WAF is the Write Amplification Factor, i.e. the ratio between the amount of data actually written to the memory and the amount of data the host asked to write. Amplification occurs because non-volatile flash memories are organized as an array of sectors that can be individually erased or written, and the erase sectors and write sectors often differ in size. That is why, in the case of NAND flashes for instance, they are named differently (blocks and pages, respectively). WAF varies widely depending on the workload. If it is not known for the application under discussion, it can be measured experimentally (see the following example for more details).

Once the TBW is calculated, the expected lifetime can be estimated with this equation:

LT = TBW / D

where D is the amount of data written in the unit of time of interest (month, year, etc.).

Example: embedded Linux system equipped with a raw NAND flash memory and UBIFS file system

This example shows how to estimate the lifetime of a raw NAND flash memory used in an embedded Linux system making use of the UBIFS file system. Specifically, the memory p/n is W29N08GVSIAA by Winbond, which is a 1-bit ECC Single-Level Cell (SLC) component. In this case, the wear leveling algorithm is implemented at the Linux kernel level.

According to the datasheet:

  • erase block size is 128KiB
  • the number of P/E cycles is 100000
  • the capacity is 1 GiByte (8 Gibit).

For the sake of simplicity, it is assumed that the file system makes use of the entire memory. Otherwise, only the capacity of the partition of interest has to be taken into account. The WAF is assumed to be 5. This means that for each byte written by the user-space applications and daemons, five bytes are actually written to the memory.

TBW = (1 GiByte * 100000) / 5 = 20000 GiByte ~ 19.5 TiByte 

Assuming that the user-space software writes 650 GiB every year, the expected lifetime is

LT = 20000 / 650 = 30.8 years
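
As a sanity check, the two formulas can be evaluated with a short shell sketch. The helper names tbw_gib and lifetime_years are hypothetical and introduced here only for illustration; awk performs the floating-point arithmetic. With the figures of this example (1 GiByte, 100000 P/E cycles, WAF of 5, 650 GiB written per year) it prints a TBW of 20000 GiB and a lifetime of 30.8 years.

```shell
#!/bin/sh
# Sketch only: helper names are illustrative, not part of any existing tool.

# tbw_gib CAPACITY_GIB PE_CYCLES WAF -> Total Bytes Written, in GiB
tbw_gib() {
    LC_ALL=C awk -v cap="$1" -v pe="$2" -v waf="$3" \
        'BEGIN { printf "%.0f\n", cap * pe / waf }'
}

# lifetime_years TBW_GIB GIB_WRITTEN_PER_YEAR -> expected lifetime, in years
lifetime_years() {
    LC_ALL=C awk -v tbw="$1" -v d="$2" \
        'BEGIN { printf "%.1f\n", tbw / d }'
}

# Figures taken from the example above
tbw=$(tbw_gib 1 100000 5)
echo "TBW = ${tbw} GiB"
echo "LT  = $(lifetime_years "$tbw" 650) years"
```

If the file system uses only one partition, the capacity of that partition should be passed as the first argument of tbw_gib instead of the capacity of the whole device.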

Experimental measurement of actual written data

In many cases, WAF is unknown and cannot be estimated either. As stated previously, the system integrator can nonetheless determine the lifetime expectancy by adopting an experimental approach. The following procedure describes how to determine the actual written data for the system used in this example.

The main indicator of how much data has been written to a NAND device is the number of blocks that have been erased, assuming that a block is erased only if:

  • it has already been written (even if not completely)
  • it needs to be written again (this is not completely true, because UBI has a background task that erases dirty LEBs while the system is idle).

Assuming that TEC is the sum of the PEB erase counters and DAYS is the number of days the test has been run, the estimated amount of written data per year can be computed as:

D = (TEC * PEBsize) * (365 / DAYS)

This already includes the WAF and, thus, the lifetime in years can be estimated as:

LF = [capacity * P/E cycles] / D

In the same case as above, if TEC grows by 30000 per day, we have

LF = (1GiB * 100k) / ((30k * 128KiB) * (365 / 1)) ~ 74.8 years
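
The computation above can be reproduced with the following shell sketch. The function names written_per_year and lifetime_from_erases are hypothetical, and the input values are the ones used in this example (TEC of 30000 measured over one day, 128 KiB erase blocks, 1 GiB capacity, 100000 P/E cycles). For these figures the result is approximately 74.8 years.

```shell
#!/bin/sh
# Sketch only: helper names are illustrative.

# written_per_year TEC PEB_SIZE_BYTES DAYS -> D, bytes written per year
written_per_year() {
    LC_ALL=C awk -v tec="$1" -v peb="$2" -v days="$3" \
        'BEGIN { printf "%.0f\n", tec * peb * 365 / days }'
}

# lifetime_from_erases CAPACITY_BYTES PE_CYCLES D_BYTES_PER_YEAR -> years
lifetime_from_erases() {
    LC_ALL=C awk -v cap="$1" -v pe="$2" -v d="$3" \
        'BEGIN { printf "%.1f\n", cap * pe / d }'
}

cap=$((1024 * 1024 * 1024))   # 1 GiB
peb=$((128 * 1024))           # 128 KiB erase block
d=$(written_per_year 30000 "$peb" 1)
echo "D  = ${d} bytes/year"
echo "LF = $(lifetime_from_erases "$cap" 100000 "$d") years"
```

A longer test run (a larger DAYS value) generally gives a more representative estimate, because short runs may not capture the typical workload of the product.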

Example: embedded Linux system equipped with SanDisk SDINBDG4-8G-XI1 eMMC and ext4 file system

Lauterbach-eMMC-schema.png

As stated previously, eMMC's and SD cards are block devices. As such, they are operated in tandem with file systems that have been developed for hard disks and solid-state drives. ext4 is one of them and one of the most popular in the Linux world.

From the system integrator's perspective, eMMC's and SD cards are easier to use than raw NAND's because they hide most of the complexity regarding the management of the underlying memory. Nonetheless, the architecture of these devices can make it difficult to retrieve data regarding the actual usage of the memory. There are some techniques available, however, to address this issue when working with an embedded Linux platform. This section illustrates the following ones:

  • Logging the accesses to the storage device: The idea of this approach is to log all the accesses triggered by the host and isolate the write operations in order to determine the actual amount of data written onto the device. Two different methods are compared. The first one makes use of a hardware-based trace tool while the other exploits a software tracer, namely the Linux kernel's Function Tracer (ftrace).
  • Exploiting the storage device's built-in advanced functionalities.

These approaches are illustrated in more detail in the rest of the section with the help of actual test results. These tests were run on the Evaluation Kit of the Mito8M SoM featuring a SanDisk SDINBDG4-8G-XI1 eMMC operated with an ext4 file system.
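
Before setting up either tracing method, a much coarser figure can be obtained from the block layer statistics that the Linux kernel exposes in sysfs. This is not one of the two methods compared in this section; it is a simpler alternative sketched here only to give a first order of magnitude. It counts the data the host sent to the device, so it does not include any write amplification happening inside the eMMC. The device name mmcblk0 and the function name host_bytes_written are assumptions.

```shell
#!/bin/sh
# Sketch only: coarse host-side counter, not the trace-based methods
# described in this section.

# host_bytes_written STAT_FILE -> bytes written by the host since boot.
# Field 7 of /sys/block/<dev>/stat is the number of written sectors,
# always expressed in 512-byte units.
host_bytes_written() {
    LC_ALL=C awk '{ printf "%.0f\n", $7 * 512 }' "$1"
}

stat_file=/sys/block/mmcblk0/stat   # device name is an assumption
if [ -r "$stat_file" ]; then
    echo "Host wrote $(host_bytes_written "$stat_file") bytes since boot"
fi
```

Sampling this value at the beginning and at the end of a test run gives the amount of data written by the host during the run, which can be used as D in the lifetime formula provided that a WAF is assumed.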

Logging the accesses

As is known, the specific architecture of a managed-NAND device can be extremely sensitive to certain read and write access sequences performed by the host processor under the direction of the application software, especially if these are frequently iterated.

A classic software-based recording method (log) of these accesses requires the implementation of additional code that captures information and saves it securely. The information can be saved on another permanent storage device, for example, an external USB drive. This software method is intrusive: in addition to the overhead of monitoring the eMMC accesses, further overhead is added in order to save the data.

Besides a software-based approach, this example also shows a different method of capturing and saving such information through the use of a hardware-based trace tool. This can be done with minimal intrusion on the software and, in some cases, almost zero. This tool captures the program and data trace transmitted by the cores of a system-on-chip (SoC) through a dedicated trace port and records it to its own dedicated memory. To do that, advanced hardware functionalities of modern SoC's are exploited.

Arm CoreSight™

Lauterbach TRACE32 development tools

TRACE32-based eMMC access log solution

Implementation example for Linux OS

Comparison with the software method ftrace

Conclusion

References

Appendix 1: source code example

Appendix 2: time details

Appendix 3: TRACE32 tools configuration for Arm Cortex-A/R architectures

Device's built-in advanced functionalities

TBD

Power failures

Even though modern file systems are usually tolerant w.r.t. power failures (*), in general, sudden power cuts should be avoided. The system should always be turned off cleanly. As this is not always possible, several techniques can be put in place to mitigate the effects of a power failure. For instance, see this section of the carrier board design guidelines.

(*) Roughly speaking, this means that these file systems are able to keep their consistency across such events. They cannot avoid data loss if a power failure occurs in the middle of a write operation, however. For this reason, further countermeasures, such as data redundancy and/or the use of error-detecting/correcting codes, should be implemented at the application level for particularly important data. At the hardware level, DAVE Embedded Systems products usually leverage the "write protect" feature of flash memories in order to prevent erase/program operations during power transitions.
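
To make the idea of application-level redundancy and error detection more concrete, here is a minimal shell sketch. It is only one possible approach, not a method prescribed by this Technical Note: each record is stored in two copies, each copy has a sidecar file containing the output of the POSIX cksum utility (a CRC, i.e. an error-detecting code), and files are written under a temporary name and then renamed. The helper names save_record and load_record and the file naming scheme are hypothetical.

```shell
#!/bin/sh
# Sketch only: helper names and file layout are illustrative.

# save_record FILE DATA: store DATA in two copies, each with a CRC sidecar.
save_record() {
    for copy in "$1.a" "$1.b"; do
        printf '%s\n' "$2" > "$copy.tmp" || return 1
        cksum < "$copy.tmp" > "$copy.sum.tmp" || return 1
        sync    # flush the new files before they replace the old ones
        mv "$copy.tmp" "$copy" && mv "$copy.sum.tmp" "$copy.sum" || return 1
    done
    sync
}

# load_record FILE: print the first copy whose CRC matches its sidecar.
load_record() {
    for copy in "$1.a" "$1.b"; do
        [ -f "$copy" ] && [ -f "$copy.sum" ] || continue
        if [ "$(cksum < "$copy")" = "$(cat "$copy.sum")" ]; then
            cat "$copy"
            return 0
        fi
    done
    return 1    # no valid copy found
}
```

If power is lost while the first copy is being updated, its CRC no longer matches and load_record falls back to the second copy, which still holds the previous consistent value. Note that sync flushes all mounted file systems, so a production implementation would more likely flush only the files involved.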

Example: embedded Linux system equipped with a raw NAND flash memory and UBIFS file system over UBI partition

Even though both UBI and UBIFS are designed with power-cut tolerance in mind, without support from additional hardware (e.g. supercap, battery power supply, and so on) some data might be lost and some unexpected effects may occur when the system is not shut down cleanly.

E.g.:

Additional failures, such as UBIFS being mounted read-only at boot time, usually do not depend only on power cuts but are symptoms of major failures (buggy MTD device driver, storage device hardware failure, device wear-out, major EMI and so on).

When designing an application to be as safe as possible w.r.t. power cuts, please also take care of:

Example: embedded Linux system equipped with SanDisk SDINBDG4-8G-XI1 eMMC and ext4 file system

TBD

Memory health monitoring

Although implementing a mechanism for monitoring the health of flash memories is not strictly required, it is recommended. Think about it as a sort of life insurance to cope with unpredictable events that might occur during the life of the product. As a result of an in-the-field software upgrade, for example, new features could be added, leading to an increase in the rate of data written onto the flash memories. Consequently, the lifetime expectancy calculated when the product was designed is not valid anymore. In such a case, a properly designed monitoring system would alert the maintenance personnel, who could take measures before it is too late (see for instance the case of eMMC's used in Tesla cars). The following section details an example of such a system.

Example: embedded Linux system equipped with a raw NAND flash memory and UBIFS file system

There are two main indicators of NAND device health:

  • current ECC corrected errors
  • block erase counter.

We will focus on the latter because it is easy to extract and gives a good lifetime expectation of the device.

UBI puts its header at the beginning of each NAND physical erase block (PEB) and here, among the other fields, the user can find the erase counter (EC). By comparing the sum of the EC of all PEBs with the nominal expected maximum erase count, the user can estimate the usage of the whole NAND device.

To read the EC directly from the PEBs at runtime, the user can rely on the ubidumpec tool: this is not yet merged in the mtd-utils package, but is provided as an RFC on the linux-mtd mailing list (it is also provided by default on most DAVE Embedded Linux development kits).

The expected remaining life of a UBI partition, in percentage, can be calculated with a simple formula:

RL = (((MaxEC * nr_blocks) - sum(EC)) / (MaxEC * nr_blocks)) * 100

Where:

  • MaxEC is the maximum erase count supported by the raw NAND
  • nr_blocks is the number of PEBs contained in this partition

E.g. in the case of a "standard" SLC NAND, which usually has a 100k maximum erase count, this can be implemented as a simple bash pipe between ubidumpec and awk:

ubidumpec /dev/ubi0 | awk -v MAXEC=100000 '{ s+=$1; n=n+1} END {print s, n*MAXEC, (((n*MAXEC)-s)/(n*MAXEC))*100 }'

This command prints:

  • sum of EC (in this /dev/ubi0 partition)
  • total number of erase/program cycles allowed by this partition
  • expected lifetime left to be used (in percentage).

Running on a (nearly) 1GiB partition on a brand new SLC NAND flash gives:

root@axel:~# ubinfo /dev/ubi0
ubi0
Volumes count:                           3
Logical eraseblock size:                 126976 bytes, 124.0 KiB
Total amount of logical eraseblocks:     8112 (1030029312 bytes, 982.3 MiB)
Amount of available logical eraseblocks: 0 (0 bytes)
Maximum count of volumes                 128
Count of bad physical eraseblocks:       0
Count of reserved physical eraseblocks:  160
Current maximum erase counter value:     2
Minimum input/output unit size:          2048 bytes
Character device major/minor:            248:0
Present volumes:                         0, 1, 2

root@axel:~# ubidumpec /dev/ubi0 | awk -v MAXEC=100000 '{ s+=$1; n=n+1} END {print s, n*MAXEC, (((n*MAXEC)-s)/(n*MAXEC))*100 }'
8161 811200000 99.999

As a confirmation of this data, the maximum EC of a given UBI partition can be read directly from sysfs:

root@axel:~# cat /sys/class/ubi/ubi0/max_ec 
2
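
Since max_ec is a single value available from sysfs, it also lends itself to a lightweight periodic check. The following is a minimal sketch of such a check, not the monitoring system of any DAVE product: the function name check_max_ec, the 100000-cycle limit and the 80% warning threshold are assumptions to be adapted to the actual device. Because it looks only at the most worn block, it is a conservative indicator compared with the sum of all the erase counters.

```shell
#!/bin/sh
# Sketch only: function name and thresholds are illustrative.

# check_max_ec SYSFS_FILE MAXEC WARN_PERCENT
# Returns 0 while the highest erase counter is below WARN_PERCENT of MAXEC,
# 1 when the threshold is reached, 2 if the counter cannot be read.
check_max_ec() {
    ec=$(cat "$1") || return 2
    limit=$(( $2 * $3 / 100 ))
    if [ "$ec" -ge "$limit" ]; then
        echo "WARNING: max erase counter $ec >= $limit"
        return 1
    fi
    echo "OK: max erase counter $ec < $limit"
}

ubi_ec=/sys/class/ubi/ubi0/max_ec
if [ -r "$ubi_ec" ]; then
    check_max_ec "$ubi_ec" 100000 80 || echo "maintenance should be scheduled"
fi
```

Such a script could be run periodically (for instance from cron) and its non-zero exit status used to raise an alert toward the maintenance personnel.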

Example: embedded Linux system equipped with SanDisk SDINBDG4-8G-XI1 eMMC and ext4 file system

TBD