MISC-TN-017: Persistent storage and read-write file systems

6,839 bytes added, 14:17, 16 June 2022


* "Example: embedded Linux system equipped with SanDisk SDINBDG4-8G-XI1 eMMC and <code>ext4</code> file system"

|-

|{{oldid|15868|2.0.1}}

|January 2022

|Minor changes

|-

|{{oldid|16652|3.0.0}}

|May 2022

|Added detailed analysis of e.MMC accesses (SanDisk SDINBDG4-8G-XI1)

|-

|3.1.0

|June 2022

|Added video of technical presentation by Lauterbach Italy

|}

Another typical use case involves eMMCs and SD cards. As explained [http://www.linux-mtd.infradead.org/doc/ubifs.html#L_raw_vs_ftl here], these components are FTL devices, where FTL stands for ''Flash Translation Layer''. This layer ''emulates a block device on top of flash hardware''. Therefore, these storage devices are used in tandem with file systems such as [https://en.wikipedia.org/wiki/Ext4 ext4] and [https://en.wikipedia.org/wiki/File_Allocation_Table#FAT32 FAT32]. In addition to raw NAND flash memory, eMMCs and SD cards integrate a microcontroller that implements the FTL and performs other important tasks, as detailed in the rest of the document. All things considered, eMMCs and SD cards therefore appear to the host as managed-NAND block devices.

Regardless of the file system used, e.MMC devices provide some functionalities conceived to monitor their health while operating. As these functionalities are defined by [https://www.jedec.org/sites/default/files/docs/JESD84-B51.pdf JEDEC standards], all the vendors implement them. In practice, e.MMC devices integrate registers that provide specific information about their health status. These registers can be accessed with the <code>mmc-utils</code>, which are documented [https://www.kernel.org/doc/html/latest/driver-api/mmc/mmc-tools.html here]. Interestingly, the JEDEC standard also defines a set of registers (<code>VENDOR_PROPRIETARY_HEALTH_REPORT</code>) that vendors are free to use to provide further, fine-grained information about the device's health status. Engineers and system integrators are supposed to contact the e.MMC manufacturer to get the tools required to access such registers.

The sections related to eMMC-based use cases are the result of a joint effort between [https://www.westerndigital.com/ Western Digital] (which purchased [https://en.wikipedia.org/wiki/SanDisk SanDisk] in 2016), [https://www.lauterbach.it Lauterbach Italy], and DAVE Embedded Systems. Parts of these sections are taken from the white paper ''TRACE32 log method for analysing accesses to an eMMC device'' by Lauterbach, which is freely available for download [https://www.lauterbach.com/publications/trace32_log_method_for_analysing_accesses_to_an_emmc_device.pdf here].
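As an illustration of how the standard health registers mentioned above could be consumed by software, the following is a minimal Python sketch. It assumes that the textual output of <code>mmc extcsd read</code> has already been captured and that its lines look like the dump shown later in this document. The value-to-meaning mapping reflects our reading of JESD84-B51 and should be checked against the standard and the device datasheet. The function names (<code>parse_extcsd</code>, <code>decode_life_time</code>, <code>health_report</code>) are hypothetical and are not part of <code>mmc-utils</code>.

```python
import re

# Assumed meaning of DEVICE_LIFE_TIME_EST_TYP_A/B (our reading of JESD84-B51):
# 0x01..0x0A -> 10% steps of estimated life time used, 0x0B -> exceeded.
def decode_life_time(value):
    if 0x01 <= value <= 0x0A:
        return f"{(value - 1) * 10}-{value * 10}% of estimated life time used"
    if value == 0x0B:
        return "estimated life time exceeded"
    return "not defined / reserved"

# Assumed meaning of PRE_EOL_INFO (same source, to be verified).
PRE_EOL = {0x01: "normal", 0x02: "warning", 0x03: "urgent"}

# Matches lines such as:
# eMMC Life Time Estimation A [EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_A]: 0x01
LINE_RE = re.compile(r"\[EXT_CSD_(\w+)\]:\s*(0x[0-9a-fA-F]+)")

def parse_extcsd(text):
    """Return {register_name: integer_value} for every matching line."""
    return {name: int(value, 16) for name, value in LINE_RE.findall(text)}

def health_report(text):
    """Translate the wear-related registers into human-readable strings."""
    report = {}
    for name, value in parse_extcsd(text).items():
        if name.startswith("DEVICE_LIFE_TIME_EST_TYP_"):
            report[name] = decode_life_time(value)
        elif name == "PRE_EOL_INFO":
            report[name] = PRE_EOL.get(value, "not defined / reserved")
    return report

# Sample text copied from the register dump reported in this document.
SAMPLE = """\
eMMC Life Time Estimation A [EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_A]: 0x01
eMMC Life Time Estimation B [EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_B]: 0x01
eMMC Pre EOL information [EXT_CSD_PRE_EOL_INFO]: 0x01
"""

for register, meaning in health_report(SAMPLE).items():
    print(f"{register}: {meaning}")
```

A user-space monitoring application could run such a parser periodically on the output of <code>mmc extcsd read</code> and raise an alert when one of the values crosses a project-specific threshold.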

[[File:Lauterbach-logo.png|center|thumb|308x308px]]

LT = 20000 / 650 = 30.8 years

===Experimental measurement of actual written data===

In many cases, the WAF is unknown and cannot be estimated either. As stated previously, however, the system integrator can still determine the lifetime expectancy experimentally. The following procedure describes how to determine the '''actual''' written data for the system used in this example.

== Example: embedded Linux system equipped with SanDisk SDINBDG4-8G-XI1 eMMC and <code>ext4</code> file system ==

=== Introduction ===

As stated previously, eMMCs and SD cards are block devices. As such, they are operated in tandem with file systems that were developed for hard disks and solid-state drives. [https://en.wikipedia.org/wiki/Ext4 <code>ext4</code>] is one of them, and one of the most popular in the Linux world. [[File:Lauterbach-eMMC-schema.png|thumb|240px]]

This implementation along with the software-based method was tested on the following use case:

* A read/write workload was issued to the <code>mmc0</code> device using the [https://github.com/stressapptest/stressapptest stressapptest] application (<code>stressapptest -s 20 -f /mnt/mmc0/file1 -f /mnt/mmc0/file2</code>), resulting in the creation of two files of 8 MByte each (8388608 bytes, as shown in the listing)

-rw-r--r-- 1 root root 8388608 Dec 3 16:30 file1

cat /sys/kernel/debug/tracing/trace_pipe > /home/root/prove/ftrace.txt

</pre>

===== Verification =====

To verify the implementation of the TRACE32-based method, a specific test was run. In essence, the testbed was configured to run TRACE32 and <code>ftrace</code> tracing (more details in the following section) simultaneously on the same workload. The logs produced by the two methods were then compared to ensure that they match.

==== Results and comparison with the software-based method (<code>ftrace</code>) ====

[[File:Lauterbach-eMMC-2-3.png|center|thumb|821x821px|With ftrace, no TRACE32 instrumentation]]

==== Analysis of the logs and conclusions ====

No matter how the accesses to the e.MMC are traced, once the logs are available they can be processed thoroughly to produce reports that are very useful for analyzing how the host actually operates the device. The following are some of these reports, from a test conducted on an e.MMC partition (<code>mmcblk0p1</code>) formatted with the <code>ext4</code> file system:
<pre class="board-terminal">
root@imx8mqevk:~# mkfs.ext4 /dev/mmcblk0p1
</pre>

Note that this formatting results in a 4-kByte <code>ext4</code> block size:
<pre class="board-terminal">
root@imx8mqevk:~# dumpe2fs -x /dev/mmcblk0p1
dumpe2fs 1.43.5 (04-Aug-2017)
Filesystem volume name: <none>
...
Block size: 4096
...
</pre>

The [https://ext4.wiki.kernel.org/index.php/Ext4_Disk_Layout#Blocks ext4 block] '''must not be confused with the e.MMC blocks''', which are 512 bytes as per JEDEC specifications and are addressed according to the [https://en.wikipedia.org/wiki/Logical_block_addressing LBA] scheme.

The analyzed workload is the result of a combination of different tools performing read and write accesses (<code>/mnt/mmc0</code> is the mount point of the partition being tested):
<pre class="board-terminal">
root@imx8mqevk:~# stressapptest -s 20 -f /mnt/mmc0/file1 -f /mnt/mmc0/file2
root@imx8mqevk:~# find / -name "*" > /mnt/mmc0/find_results.txt
root@imx8mqevk:~# dd if=/dev/urandom of=/mnt/mmc0/dummyfile.bin bs=4k count=25000
root@imx8mqevk:~# rm /mnt/mmc0/dummyfile.bin
root@imx8mqevk:~# dd if=/dev/zero of=/mnt/mmc0/dummyfile.bin bs=4k count=25000
root@imx8mqevk:~# sync
</pre>

The following chart shows the e.MMC accesses over time during the execution of the workload, along with other measurements such as read/write throughput.

[[File:MISC-TN-017-eMMC-chart1.png|center|thumb|800x800px|e.MMC accesses over time]]

It is also possible to derive the latency of the operations.

[[File:MISC-TN-017-eMMC-chart3-latency.png|center|thumb|800x800px|Latency]]

Another extremely useful graphical depiction is the chunk size distribution. For instance, this information is often used to understand how efficient the user application is at optimizing write operations in order to maximize the e.MMC lifetime. The pie chart on the left refers to the read operations, while the other one refers to the write operations.

[[File:MISC-TN-017-eMMC-chart2-chunk-size.png|center|thumb|800x800px|Chunk size distribution]]

To interpret the result, one needs to take into account how the workload was implemented. In the example under discussion, the workload basically makes use of two applications: <code>[https://man7.org/linux/man-pages/man1/dd.1.html dd]</code> and <code>stressapptest</code>. <code>dd</code> was told to use 4-kByte data chunks (<code>bs=4k</code>). <code>stressapptest</code> uses 512-byte chunks instead, because the <code>--write-block-size</code> parameter was not used (for more details, please refer to the [https://github.com/stressapptest/stressapptest/blob/e6c56d20c0fd16b07130d6e628d0dd6dcf1fe162/src/worker.cc#L2615 source code]). As a result, one would expect the majority of accesses to be 512 bytes and 4 kByte. The charts clearly show that this is not the case: most of the accesses are 512 kByte instead. This is a blatant example of how the algorithms of the file system and the kernel block driver can alter the accesses issued at application level for optimization purposes.
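To make the chunk size distribution discussed above more concrete, the following Python sketch shows one way such a report could be computed from a list of traced accesses. The record format, namely tuples of operation and number of 512-byte e.MMC blocks, and the sample values are hypothetical. They do not reproduce the TRACE32 or <code>ftrace</code> log formats, nor the data measured in the test.

```python
from collections import Counter

EMMC_BLOCK_SIZE = 512  # bytes per e.MMC block (LBA unit), as stated in the text

def chunk_size_distribution(accesses):
    """Compute the share of accesses per chunk size.

    accesses: iterable of (operation, block_count) tuples, where operation
    is 'R' or 'W' and block_count is the number of 512-byte blocks moved
    by a single access (hypothetical record format).
    Returns {operation: {chunk_size_in_bytes: share_of_accesses}}.
    """
    counts = {}
    for operation, blocks in accesses:
        counts.setdefault(operation, Counter())[blocks * EMMC_BLOCK_SIZE] += 1
    return {
        operation: {size: n / sum(c.values()) for size, n in sorted(c.items())}
        for operation, c in counts.items()
    }

# Hypothetical log excerpt: mostly 1024-block (512-kByte) accesses.
log = [("W", 1024), ("W", 1024), ("W", 8), ("R", 8), ("R", 1024), ("W", 1024)]

for operation, shares in chunk_size_distribution(log).items():
    for size, share in shares.items():
        print(f"{operation}: {size} bytes -> {share:.0%}")
```

For reference, each of the two <code>dd</code> invocations of the workload writes 25000 chunks of 4096 bytes, that is 102,400,000 bytes at application level. How many e.MMC accesses this translates into depends on how the file system and the block layer merge the requests.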

==== Appendix 1: source code example ====

</syntaxhighlight>

==== Appendix 2: Video ====

Technical Note presentation by Lauterbach (Language: Italian; Subtitles: English and Italian)

{{#ev:youtube|YDWAGy2QnA0|600|center|Persistent storage and read-write file systems|frame}}

=== Device's built-in advanced functionalities ===

e.MMC devices feature advanced functionalities that are useful for monitoring wear-out and, in general, the health of the device. For more details, please see [[#Example: embedded Linux system equipped with an e.MMC|this section]].

===Power failures===

Even though modern file systems are usually tolerant w.r.t. power failures (*), in general, sudden power cuts should be avoided. The system should always be turned off cleanly. As this is not always possible, several techniques can be put in place to mitigate the effects of a power failure. For instance, see [[Carrier_board_design_guidelines_(SOM)#Sudden_power_off_management|this section of the carrier board design guidelines]].

* [http://www.linux-mtd.infradead.org/doc/ubifs.html#L_writebuffer UBIFS write-buffer].

== Memory health monitoring ==

Although implementing a mechanism for monitoring the health of flash memories is not strictly required, it is recommended. Think of it as a sort of life insurance against unpredictable events that might occur during the life of the product. For example, an in-field software upgrade could add new features that increase the rate at which data are written to the flash memories. Consequently, the lifetime expectancy calculated when the product was designed is no longer valid. In such a case, a properly designed monitoring system would alert the maintenance personnel, who could take measures before it is too late (see for instance the case of the eMMCs used in [https://electrek.co/2020/11/09/tesla-emmc-failure-touchscreen-offers-extended-warranty/ Tesla cars]). The following section details an example of such a system.

== Example: embedded Linux system equipped with a raw NAND flash memory and UBIFS file system ==

There are two main indicators of NAND device health:

* current ECC corrected errors

* block erase counter.

</syntaxhighlight>

*

As a confirmation of this data, the maximum EC of a given UBI partition can be read directly from <code>sysfs</code>:<syntaxhighlight lang="text">

root@axel:~# cat /sys/class/ubi/ubi0/max_ec

2

</syntaxhighlight>
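The <code>sysfs</code> attribute shown above can also be read programmatically. The following Python sketch reads <code>max_ec</code> and compares it with the rated number of program/erase cycles of the NAND device. The function names, the <code>rated_pe_cycles</code> parameter (to be taken from the NAND datasheet), and the 80% warning threshold are assumptions introduced for illustration only.

```python
from pathlib import Path

def read_max_ec(ubi_device="ubi0", sysfs_root="/sys/class/ubi"):
    """Read the maximum erase counter of a UBI device from sysfs."""
    return int((Path(sysfs_root) / ubi_device / "max_ec").read_text().strip())

def wear_ratio(max_ec, rated_pe_cycles):
    """Fraction of the rated program/erase cycles consumed by the most worn block.

    rated_pe_cycles is an assumption: it must come from the NAND datasheet.
    """
    if rated_pe_cycles <= 0:
        raise ValueError("rated_pe_cycles must be positive")
    return max_ec / rated_pe_cycles

def is_worn(max_ec, rated_pe_cycles, warn_ratio=0.8):
    """Return True when the wear ratio reaches the (hypothetical) warning threshold."""
    return wear_ratio(max_ec, rated_pe_cycles) >= warn_ratio

if __name__ == "__main__":
    try:
        ec = read_max_ec("ubi0")
    except OSError:
        print("UBI device not found on this machine")
    else:
        # 100000 is only a placeholder for the datasheet value.
        print(f"max_ec={ec}, worn={is_worn(ec, rated_pe_cycles=100000)}")
```

A periodic job could call <code>is_worn</code> and notify the maintenance personnel when it returns <code>True</code>.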

== Example: embedded Linux system equipped with an e.MMC ==

As explained in [[#Embedded Linux systems with eMMC or SD cards|this section]], e.MMC devices provide specific functionalities for monitoring the device's health. In practice, these components expose registers that make health-related information available to the host. The following is a dump of the registers describing the wear-out status of the device, namely <code>DEVICE_LIFE_TIME_EST_TYP_A</code>, <code>DEVICE_LIFE_TIME_EST_TYP_B</code>, and <code>PRE_EOL_INFO</code>:
<pre class="board-terminal">
root@desk-mx8mp:~# mmc extcsd read /dev/mmcblk2 | grep LIFE
eMMC Life Time Estimation A [EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_A]: 0x01
eMMC Life Time Estimation B [EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_B]: 0x01
root@desk-mx8mp:~# mmc extcsd read /dev/mmcblk2 | grep EOL
eMMC Pre EOL information [EXT_CSD_PRE_EOL_INFO]: 0x01
</pre>

This dump refers to the same testbed described here.

Some manufacturers also use additional proprietary registers to provide information about the amount of data that has actually been written to the device. If available, this number makes it possible to calculate the WAF, provided that the amount of data written by the applications of the test workload is known too.

The health status registers can be exploited to implement a monitoring mechanism as well. For example, a user-space application can periodically poll the status of the device and take action if the wear-out exceeds predefined thresholds.

Last but not least, it is worth remembering that advanced proprietary off-line tools may also be available for health monitoring. For instance, Western Digital provides such tools for its devices. For more information, please contact our [mailto:sales@dave.eu Sales Department].

== References ==

* Western Digital Corporation, ''[https://link.westerndigital.com/content/dam/customer-portal/en_us/external/public/cps/p/White_Paper_Design_Considerations_v1.0.pdf Design Considerations for Embedded Products]'', 2018

* Western Digital Corporation, ''[https://documents.westerndigital.com/content/dam/doc-library/en_us/assets/public/western-digital/collateral/white-paper/white-paper-automotive-workload-analysis.pdf Automotive Workload Analysis]'', September 2021

U0009

dave_user, Administrators

5,141

edits
