MISC-TN-017: Persistent storage and read-write file systems

From DAVE Developer's Wiki
Revision as of 13:45, 20 January 2021 by U0002 (talk | contribs) (Example: embedded Linux system equipped with a raw NAND flash memory and UBIFS file system)


History[edit | edit source]

Version Date Notes
1.0.0 January 2021 First public release

Introduction[edit | edit source]

In many cases, embedded systems based on application processors such as the NXP i.MX6 make use of read/write file systems. In turn, these file systems use non-volatile flash technologies integrated into several different devices (NOR flashes, raw NAND flashes, eMMCs, etc.).

By nature, these components are subject to several issues that need to be handled properly; otherwise, their reliability and/or lifetime can be negatively affected.

This Technical Note deals with the use of read/write file systems in combination with such memories.

Wear-out[edit | edit source]

One of the most important factors to take into account is wear-out. Simply put, this is a degradation of the memory device due to repeated erase/write cycles — also known as program/erase (P/E) cycles — resulting in a limited lifetime.

In order to mitigate this phenomenon, erase and write operations have to be distributed uniformly over the whole memory. Please note that this process, known as wear leveling, can be implemented either in the host (in the case of a raw NAND memory, for example) or in the memory device itself (for instance, in the case of eMMCs).

Even when wear-out is properly managed, it is unavoidable as long as write operations are performed. That being said, how can the lifetime of such a device be estimated in practice? Manufacturers provide the number of guaranteed P/E cycles; for the test conditions this number refers to, please see the specifications of your device. Once the guaranteed P/E cycles are known, and assuming a proper wear-leveling algorithm is in place, the expected lifetime can be determined as follows.

First of all, the Total Bytes Written (TBW) has to be calculated:

TBW = [capacity * P/E cycles] / WAF

where WAF is the Write Amplification Factor. The WAF accounts for the actual amount of data written to the memory when performing write operations. This is due to the fact that non-volatile flash memories are organized as arrays of sectors that can be individually erased or written. Often, the sizes of erase sectors and write sectors differ; that is why, in the case of NAND flashes for instance, they are named differently (blocks and pages, respectively). The WAF varies widely depending on the workload. If it is not known for the application under discussion, it can also be measured experimentally (see the following example for more details).

Once the TBW is calculated, the expected lifetime can be estimated with this equation:

LT = TBW / D

where D is the amount of data written in the unit of time of interest (month, year, etc.).

Example: embedded Linux system equipped with a raw NAND flash memory and UBIFS file system[edit | edit source]

This example shows how to estimate the lifetime of a raw NAND flash memory used in an embedded Linux system making use of the UBIFS file system. Specifically, the memory p/n is W29N08GVSIAA by Winbond, which is a 1-bit ECC Single-Level Cell (SLC) component. In this case, the wear leveling algorithm is implemented at the Linux kernel level.

According to the datasheet, the number of P/E cycles is 100000. The capacity is 1 GByte (8 Gbit). For the sake of simplicity, it is assumed that the file system makes use of the entire memory. Otherwise, only the capacity of the partition of interest has to be taken into account. Regarding the WAF, it is assumed to be 5. This means that for each byte written by the user-space applications and daemons, five bytes are actually saved onto the memory.

TBW = (1 GByte * 100000) / 5 = 20000 GByte ~ 19.5 TByte 

Assuming that the user-space software writes 650 gigabytes every year, the expected lifetime is

LT = 20000 / 650 = 30.8 years
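The figures above can be checked with a quick shell calculation (awk is used for the arithmetic; the TByte conversion assumes 1 TByte = 1024 GByte, matching the approximation above):

```shell
#!/bin/sh
# Lifetime estimation for the example above:
# capacity = 1 GByte, 100000 P/E cycles, WAF = 5, 650 GByte written per year.
awk 'BEGIN {
    capacity  = 1        # GByte
    pe_cycles = 100000   # guaranteed P/E cycles
    waf       = 5        # assumed Write Amplification Factor
    d         = 650      # GByte written per year

    tbw = (capacity * pe_cycles) / waf   # Total Bytes Written, in GByte
    lt  = tbw / d                        # expected lifetime, in years

    printf "TBW = %d GByte (~%.1f TByte)\n", tbw, tbw / 1024
    printf "LT  = %.1f years\n", lt
}'
# prints:
# TBW = 20000 GByte (~19.5 TByte)
# LT  = 30.8 years
```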

Experimental measurement of the actual written data[edit | edit source]

In many cases, the WAF is unknown and cannot be estimated either. As stated previously, though, the system integrator can still determine the life expectancy by adopting an experimental approach. The following procedure describes how to determine the actual written data for the system used in this example.

TBD
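Pending the full procedure, the underlying idea can be sketched as follows: take two snapshots of the total erase count of the UBI partition some time apart (for instance with the ubidumpec tool described later in this Technical Note), convert the delta into physically written bytes (number of erase operations times the PEB size), and divide by the amount of data written by user space over the same period. All figures in the sketch below are hypothetical placeholders, not measurements:

```shell
#!/bin/sh
# Hypothetical WAF measurement sketch: every number below is an example
# placeholder, not a real measurement.
#
# sum(EC) is sampled at the beginning and at the end of the observation
# window; the PEB size comes from the NAND datasheet; app_mb is the amount
# of data written by user space over the same window.
awk 'BEGIN {
    ec_start = 120000       # total erase count at t0 (hypothetical)
    ec_end   = 121600       # total erase count at t1 (hypothetical)
    peb_size = 128 * 1024   # PEB size in bytes (128 KiB, typical for SLC NAND)
    app_mb   = 40           # MByte written by user space between t0 and t1

    # MByte physically written = erase operations * PEB size
    phys_mb = (ec_end - ec_start) * peb_size / (1024 * 1024)
    printf "WAF = %.1f\n", phys_mb / app_mb
}'
# prints: WAF = 5.0
```

With these example numbers, 1600 erase operations on 128-KiB blocks correspond to 200 MByte physically written against 40 MByte written by the applications, i.e. a WAF of 5, consistent with the value assumed in the lifetime example above.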

Thus, the expected lifetime is

LT = x years

Power failures[edit | edit source]

Even though modern file systems are usually tolerant of power failures (*), sudden power cuts should generally be avoided: the system should always be turned off cleanly. As this is not always possible, several techniques can be put in place to mitigate the effects of a power failure. For instance, see this section of the carrier board design guidelines.

That being said, TBD

(*) Roughly speaking, this means that these file systems are able to keep their consistency across such events. They cannot avoid data loss if a power failure occurs in the middle of a write operation, however. For this reason, further countermeasures, such as data redundancy and/or the use of error-detecting/correcting codes, should be implemented at the application level for particularly important data. At the hardware level, DAVE Embedded Systems products usually leverage the "write protect" feature of flash memories in order to prevent erase/program operations during power transitions.
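As an illustration of an application-level countermeasure, a common pattern for power-failure-safe updates of an important file is "write to a temporary file, flush it to storage, then atomically rename it over the original": a reader then sees either the old or the new content, never a half-written file. This is a generic POSIX sketch with example paths and content, not a description of a specific DAVE Embedded Systems product:

```shell
#!/bin/sh
# Power-failure-safe update of an important file.
# Generic sketch: the path and content below are examples.
set -e

target="/tmp/settings.conf"
tmp="${target}.tmp"

# 1. Write the new content to a temporary file on the same file system.
printf 'threshold=42\n' > "$tmp"

# 2. Flush the data to the storage device before making it visible.
#    (sync(1) is coarse; a real application would rather fsync() the file.)
sync

# 3. Atomically replace the old file: rename() is atomic on POSIX file
#    systems, so a power cut leaves either the old or the new file in place.
mv "$tmp" "$target"
```

Note that UBIFS uses write-back caching, so the synchronization step is essential: without an explicit fsync()/sync, recently written data is not guaranteed to have reached the flash.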

Memory health monitoring[edit | edit source]

Although implementing a mechanism for monitoring the health of flash memories is not strictly required, it is recommended. Think of it as a sort of life insurance against unpredictable events that might occur during the life of the product. As a result of an on-the-field software upgrade, for example, new features could be added, increasing the rate of data written to the flash memories. Consequently, the life expectancy calculated when the product was designed would no longer be valid. In such a case, a properly designed monitoring system would alert maintenance personnel, who could take measures before it is too late (see, for instance, the case of the eMMCs used in Tesla cars). The following section details an example of such a system.

Example: embedded Linux system equipped with a raw NAND flash memory and UBIFS file system[edit | edit source]

On embedded Linux systems, raw NAND memories are commonly managed by the MTD/UBI subsystems, with UBIFS on top of them to manage files.

These subsystems are explained in more detail in the Memory Technology Device (MTD) article; here, we will focus on health monitoring.

There are two main indicators of NAND device health:

  • the current number of ECC-corrected errors
  • the block erase counters

We will focus on the latter because it is easy to extract and gives a good estimate of the life expectancy of the device.

UBI puts its header on top of each NAND physical erase block (PEB); among the other fields of this header, the user can find the erase counter (EC). By comparing the sum of the ECs of all PEBs with the nominal expected maximum erase count, the user can estimate the wear of the whole NAND device.

To read the ECs directly from the PEBs at runtime, the user can rely on the ubidumpec tool: this is not yet merged into the mtd-utils package, but it is provided as an RFC on the linux-mtd mailing list (it is also provided by default on most DAVE Embedded Systems Linux development kits).

The expected remaining life of a UBI partition, expressed as a percentage, can be calculated with a simple formula:

RL = ((N * MaxEC - sum(EC)) / (N * MaxEC)) * 100

where MaxEC is the maximum erase count supported by the raw NAND device and N is the number of PEBs in the partition. For example, in the case of a "standard" SLC NAND, which usually has a maximum erase count of 100k, the formula can be implemented as a simple bash pipe between ubidumpec and awk:

ubidumpec /dev/ubi0 | awk -v MAXEC=100000 '{ s+=$1; n=n+1} END {print s, n*MAXEC, (((n*MAXEC)-s)/(n*MAXEC))*100 }'

This command prints three values:

  • the sum of the ECs of all the PEBs in the /dev/ubi0 partition
  • the total maximum erase count of the partition (number of PEBs * MaxEC)
  • the expected remaining life, in percent