
MISC-TN-017: Persistent storage and read-write file systems

1,380 bytes added, 09:13, 21 January 2021
U0002 moved page MISC-TN-017: Persistent storage and read/write file systems to MISC-TN-017: Persistent storage and read-write file systems without leaving a redirect: remove / from title (which is used in mediawiki for subpages
In many cases, embedded systems based on Application Processors such as the NXP i.MX6 make use of read/write file systems. In turn, these file systems use non-volatile flash technologies integrated into several different devices (NOR flashes, raw NAND flashes, eMMCs, etc.).
By nature, these components are subject to several issues that need to be handled properly. If they are not, their reliability and/or lifetime can be negatively affected.
This Technical Note deals with the use of read/write file systems in combination with such memories, providing some real-world examples as well.

=== Embedded Linux systems with NOR flashes or raw NAND flashes ===

Some of the following examples refer to embedded Linux systems making use of NOR flashes or raw NAND flashes. Such systems are commonly managed by the MTD/UBI subsystems and, on top of them, UBIFS to manage files. Therefore, before diving into these examples, we suggest taking a look at our [[Memory Tecnology Device (MTD)]] article, where these subsystems are explained in more detail.
This example shows how to estimate the lifetime of a raw NAND flash memory used in an embedded Linux system making use of the UBIFS file system. Specifically, the memory p/n is W29N08GVSIAA by Winbond, which is a 1-bit ECC Single-Level Cell (SLC) component. In this case, the wear leveling algorithm is implemented at the Linux kernel level.
According to the datasheet:
* the erase block size is 128 KiB
* the number of P/E cycles is 100000
* the capacity is 1 GiByte (8 Gibit).

For the sake of simplicity, it is assumed that the file system makes use of the entire memory. Otherwise, only the capacity of the partition of interest has to be taken into account. Regarding the WAF, it is assumed to be 5. This means that for each byte written by the user-space applications and daemons, five bytes are actually saved onto the memory.

TBW = (1 GiByte * 100000) / 5 = 20000 GiByte ~ 19.5 TiByte

Assuming that the user-space software writes 650 GiB every year, the expected lifetime is
LT = 20000 / 650 = 30.8 years
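
The arithmetic above can be reproduced with a few lines of Python. This is only a minimal sketch: the function names <code>tbw_gib</code> and <code>lifetime_years</code> are hypothetical helpers introduced here for illustration, and the inputs are the datasheet figures and the assumed WAF quoted above.

```python
def tbw_gib(capacity_gib, pe_cycles, waf):
    """Total data (GiB) user space can write before the rated P/E cycles are used up."""
    return capacity_gib * pe_cycles / waf


def lifetime_years(tbw, written_gib_per_year):
    """Expected lifetime in years for a given yearly user-space write volume."""
    return tbw / written_gib_per_year


if __name__ == "__main__":
    # Figures from the example: 1 GiB capacity, 100000 P/E cycles, assumed WAF of 5.
    tbw = tbw_gib(capacity_gib=1, pe_cycles=100_000, waf=5)
    print(f"TBW = {tbw:.0f} GiB (about {tbw / 1024:.1f} TiB)")
    print(f"LT  = {lifetime_years(tbw, 650):.1f} years")
```

Changing the WAF or the yearly write volume shows how strongly the estimate depends on these two assumptions.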
In many cases, WAF is unknown and cannot be estimated either. As stated previously, the system integrator can nevertheless determine the lifetime expectancy by adopting an experimental approach. The following procedure describes how to determine the '''actual''' written data for the system used in this example.
The main indicator of ''how much data has been written'' for NAND devices is ''how many blocks have been erased'', assuming that a block is erased only if it:
* has already been written (even if not completely)
* needs to be written again (this is not completely true, because UBI has a background task that erases dirty LEBs while the system is idle).

Assuming that <code>TEC</code> is the ''sum of PEB Erase Counters'' and <code>DAYS</code> is the number of days the test has been run, the ''estimated amount of written data per year'' can be computed as:

D = (TEC * PEBsize) * (365 / DAYS)

This already includes WAF and, thus, we can estimate the lifetime, in years, as:
<syntaxhighlight lang="text">
LT = (capacity * P/E cycles) / D
</syntaxhighlight>
In the same case as above, if we have 30000 TEC/day, we have
<syntaxhighlight lang="text">
LT = (1GiB * 100k) / ((30k * 128KiB) * (365 / 1)) ~ 74.8 years
</syntaxhighlight>
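
The experimental estimate can be scripted in the same way. The following is a minimal sketch, assuming the total erase count has already been collected over the test period; the function names are hypothetical and all sizes are expressed in bytes.

```python
KIB = 1024
GIB = 1024 ** 3


def written_bytes_per_year(tec, peb_size, days):
    """D: data physically written per year, extrapolated from a test lasting `days` days."""
    return tec * peb_size * 365 / days


def lifetime_years_from_erases(capacity, pe_cycles, tec, peb_size, days):
    """Lifetime in years: (capacity * P/E cycles) / D. WAF is already included in D."""
    return capacity * pe_cycles / written_bytes_per_year(tec, peb_size, days)


if __name__ == "__main__":
    # Same case as above: 1 GiB device, 100k P/E cycles, 128 KiB PEBs, 30000 erases in 1 day.
    years = lifetime_years_from_erases(1 * GIB, 100_000, tec=30_000, peb_size=128 * KIB, days=1)
    print(f"estimated lifetime: {years:.1f} years")
```

A longer test period generally gives a more representative value of <code>TEC</code>, since short runs may miss periodic write bursts.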
==Power failures==
Even though modern file systems are usually tolerant w.r.t. power failures (*), in general, sudden power cuts should be avoided. The system should always be turned off cleanly. As this is not always possible, several techniques can be put in place to mitigate the effects of a power failure. For instance, see [[Carrier_board_design_guidelines_(SOM)#Sudden_power_off_management|this section of the carrier board design guidelines]].
That being said, TBD
(*) Roughly speaking, this means that these file systems are able to keep their consistency across such events. They cannot avoid data loss if a power failure occurs in the middle of a write operation, however. For this reason, further countermeasures, such as data redundancy and/or the use of error-detecting/correcting codes, should be implemented at the application level for particularly important data. At the hardware level, DAVE Embedded Systems products usually leverage the "write protect" feature of flash memories in order to prevent erase/program operations during power transitions.
=== Example: embedded Linux system equipped with a raw NAND flash memory and UBIFS file system over UBI partition ===
Even though both UBI and UBIFS are designed with power-cut tolerance in mind, without support from additional hardware (e.g. supercap, battery power supply, and so on) some data might be lost and some ''weird'' effects may occur when the system is not shut down cleanly. Typical examples are:
* zero-file length corruption
* trailing zeros on files
Additional failures, like UBIFS being mounted read-only at boot time, usually do not depend only on power cuts but are symptoms of major failures (buggy MTD device driver, storage device hardware failure, device wear-out, major EMI and so on).
When designing an application to be as safe as possible w.r.t. power cuts, please also take into account:
* information on how to change a file atomically
* notes about UBIFS write-back support
* UBIFS write-buffer.
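
As an illustration of the first point, a common way to change a file atomically is to write the new content to a temporary file in the same directory, flush it to the storage device, and then rename it over the original. The following Python sketch shows this general pattern; the function name <code>atomic_write</code> and the <code>.tmp</code> suffix are arbitrary choices made for this example, not part of any UBIFS tool.

```python
import os


def atomic_write(path, data):
    """Replace the content of `path` with `data` (bytes) using write + fsync + rename."""
    dir_name = os.path.dirname(os.path.abspath(path))
    tmp_path = path + ".tmp"  # must live on the same file system as `path`

    with open(tmp_path, "wb") as f:
        f.write(data)
        f.flush()              # push Python's buffer to the kernel
        os.fsync(f.fileno())   # ask the kernel to write the data to the storage device

    # rename() is atomic on POSIX file systems: readers see either the old or the new file.
    os.replace(tmp_path, path)

    # Sync the directory as well, so that the rename itself is persisted.
    dir_fd = os.open(dir_name, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

After a power cut, the file is then expected to contain either the complete old content or the complete new content, at the cost of an extra synchronous write for every update.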
== Memory health monitoring ==
=== Example: embedded Linux system equipped with a raw NAND flash memory and UBIFS file system ===
On embedded Linux systems, raw NAND flashes are commonly managed by the MTD/UBI subsystems and, on top of them, UBIFS to manage files.
These subsystems are explained in more detail in the [[Memory Tecnology Device (MTD)]] article; here we will focus on health monitoring.
There are two main indicators of NAND device health:
* current ECC corrected errors
* block erase counter.

We will focus on the latter because it is easy to extract and gives a good lifetime expectation of the device.
UBI puts its header on top of each NAND physical erase block ('''PEB''') and here, among the other fields, the user can find the erase counter ('''EC'''). By comparing the sum of the ECs of all PEBs with the nominal expected maximum erase count, the user can estimate the usage of the whole NAND device.
To read the EC directly from the PEBs at runtime, the user can rely on the <code>ubidumpec</code> tool: this is not yet merged in the mtd-utils package, but is provided as an RFC on the linux-mtd mailing list (it is also provided by default on most DAVE Embedded Linux development kits).
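The expected remaining life of a UBI partition, as a percentage, can be calculated with a simple formula:<syntaxhighlight lang="text">
RL = ((MaxEC * nr_blocks) - sum(EC)) / (MaxEC * nr_blocks) * 100
</syntaxhighlight>
where <code>sum(EC)</code> is the sum of the erase counters of all the PEBs of the partition and: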
* <code>MaxEC</code> is the maximum erase count supported by raw NAND
* <code>nr_blocks</code> is the count of PEB that are contained on this partition
E.g. in the case of a "standard" SLC NAND, which usually has a 100k maximum erase count, this can be implemented as a simple bash pipe between <code>ubidumpec</code> and <code>awk</code>:<syntaxhighlight lang="bash">
ubidumpec /dev/ubi0 | awk -v MAXEC=100000 '{ s+=$1; n=n+1} END {print s, n*MAXEC, (((n*MAXEC)-s)/(n*MAXEC))*100 }'
</syntaxhighlight>This command prints:
* sum of EC (in this <code>/dev/ubi0</code> partition)
* total number of erase/program cycles allowed by this partition
* remaining expected lifetime (as a percentage).
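
The same computation can be done in Python, e.g. when the result has to be logged or sent to a monitoring service. This is a minimal sketch under stated assumptions: the function name <code>remaining_life</code> is hypothetical, and the input is taken to be one erase counter per line on standard input, which is what the <code>awk</code> command above implies by summing <code>$1</code>.

```python
import sys


def remaining_life(erase_counters, max_ec=100_000):
    """Return (sum of EC, total allowed erase cycles, remaining life in percent)."""
    counters = list(erase_counters)
    if not counters:
        raise ValueError("no erase counters given")
    total_ec = sum(counters)
    allowed = len(counters) * max_ec  # nr_blocks * MaxEC
    return total_ec, allowed, (allowed - total_ec) / allowed * 100


if __name__ == "__main__":
    # Assumed usage: ubidumpec /dev/ubi0 | python3 remaining_life.py
    ecs = [int(line.split()[0]) for line in sys.stdin if line.strip()]
    total_ec, allowed, percent = remaining_life(ecs)
    print(total_ec, allowed, f"{percent:.2f}")
```

The default <code>max_ec</code> matches the 100k figure used for SLC NAND above; it should be replaced with the value from the datasheet of the actual device.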
Running the <code>ubidumpec</code> pipe above on a (nearly) 1 GiB partition of a brand new SLC NAND flash gives:<syntaxhighlight lang="text">
root@axel:~# ubinfo /dev/ubi0
</syntaxhighlight>
As a confirmation of this data, the maximum EC of a given UBI partition can be read directly from <code>sysfs</code>:<syntaxhighlight lang="text">
root@axel:~# cat /sys/class/ubi/ubi0/max_ec
</syntaxhighlight>
