MISC-TN-017: Persistent storage and read-write file systems

6,928 bytes added, 14:17, 16 June 2022


[[Category:MISC-AN-TN]]

|First public release

|-

|{{oldid|15846|2.0.0}}

|January 2022

|Added the sections
* "Embedded Linux systems with eMMC or SD cards"
* "Example: embedded Linux system equipped with SanDisk SDINBDG4-8G-XI1 eMMC and <code>ext4</code> file system"
|-
|{{oldid|15868|2.0.1}}
|January 2022
|Minor changes
|-
|{{oldid|16652|3.0.0}}
|May 2022
|Added detailed analysis of e.MMC accesses (SanDisk SDINBDG4-8G-XI1)
|-
|3.1.0
|June 2022
|Added video of technical presentation by Lauterbach Italy

|}

Another typical use case refers to eMMC's and SD cards. As explained [http://www.linux-mtd.infradead.org/doc/ubifs.html#L_raw_vs_ftl here], these components are FTL devices, where FTL stands for ''Flash Translation Layer''. This layer ''emulates a block device on top of flash hardware''. Therefore, these storage devices are used in tandem with file systems such as [https://en.wikipedia.org/wiki/Ext4 ext4] and [https://en.wikipedia.org/wiki/File_Allocation_Table#FAT32 FAT32]. Besides a raw NAND flash memory, eMMC's and SD cards integrate a microcontroller implementing the FTL and other important tasks as detailed in the rest of the document. All things considered, eMMC's and SD cards appear therefore to the host as managed-NAND block devices.

Regardless of the file system used, e.MMC devices provide some functionalities conceived to monitor their health while operating. As these functionalities are defined by [https://www.jedec.org/sites/default/files/docs/JESD84-B51.pdf JEDEC standards], all the vendors implement them. In practice, e.MMC's integrate some registers providing specific information about the health status. These registers can be accessed with the <code>mmc-utils</code>, which are documented [https://www.kernel.org/doc/html/latest/driver-api/mmc/mmc-tools.html here]. Interestingly, the JEDEC standard also defines a set of registers (<code>VENDOR_PROPRIETARY_HEALTH_REPORT</code>) that vendors are free to use for providing further, fine-grained information about the device's health status. Engineers and system integrators are supposed to contact the e.MMC manufacturer to get the required tools for accessing such registers.

The sections related to eMMC-based use cases are the result of a joint effort between [https://www.westerndigital.com/ Western Digital] (which purchased [https://en.wikipedia.org/wiki/SanDisk SanDisk] in 2016), [https://www.lauterbach.it Lauterbach Italy], and DAVE Embedded Systems. Parts of such sections are retrieved from the White Paper ''TRACE32 log method for analysing accesses to an eMMC device'' by Lauterbach, which is freely available for download [https://www.lauterbach.com/publications/trace32_log_method_for_analysing_accesses_to_an_emmc_device.pdf here].

[[File:Lauterbach-logo.png|center|thumb|308x308px]]

[[File:WesterDigital-logo.png|center|thumb|180x180px]]


=Wear-out=

LT = 20000 / 650 = 30.8 years
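
The division above can be reproduced with a few lines of Python. This is only a minimal sketch: the function name <code>lifetime_years</code> and its parameter names are illustrative, and the two quantities are assumed to be expressed in consistent units (a total write budget and the amount of data written per year).

```python
def lifetime_years(write_budget: float, written_per_year: float) -> float:
    """Return the expected lifetime in years.

    Both arguments are assumed to use the same unit of data
    (hypothetical parameter names, not taken from the original text).
    """
    if written_per_year <= 0:
        raise ValueError("written_per_year must be positive")
    return write_budget / written_per_year


if __name__ == "__main__":
    # Same figures as in the formula above.
    print(f"LT = {lifetime_years(20000, 650):.1f} years")
```

Keeping the calculation in a function makes it easy to re-run the estimate whenever the measured yearly write volume changes.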

===Experimental measurement of actual written data===

In many cases, the WAF is unknown and cannot be estimated either. As stated previously, however, the system integrator can still determine the lifetime expectancy by adopting an experimental approach. The following procedure describes how to determine the '''actual''' written data for the system used in this example.

== Example: embedded Linux system equipped with SanDisk SDINBDG4-8G-XI1 eMMC and <code>ext4</code> file system ==

=== Introduction ===
As stated previously, eMMC's and SD cards are block devices. As such, they are operated in tandem with file systems that have been developed for hard disks and solid-state drives. [https://en.wikipedia.org/wiki/Ext4 <code>ext4</code>] is one of them and one of the most popular in the Linux world.
[[File:Lauterbach-eMMC-schema.png|thumb|240px]]

From system integrators' perspective, eMMC's and SD cards are easier to use than raw NAND's because they hide most of the complexity regarding the management of the underlying memory. On the other hand, the architecture of these devices could make it difficult to retrieve data regarding the actual usage of the memory. There are some techniques available, however, to address this issue when working with an embedded Linux platform. This section will illustrate the following ones:
* Logging the accesses to the storage device: The idea of this approach is to log all the accesses triggered by the host and isolate the write operations in order to determine the actual amount of data written onto the device. Two different methods are compared. The first one makes use of a hardware-based tracing tool while the other exploits a software tracer, namely the Linux kernel's Function Tracer (aka <code>ftrace</code>).

* Exploiting the storage device's built-in advanced functionalities.

These approaches are illustrated in more detail in the rest of the document with the help of specific tests conducted on a real target.

==== Testbed ====

Specifically, these tests were run on the [[MITO_8M_SOM/MITO_8M_Evaluation_Kit|Evaluation Kit]] of the [[MITO_8M_SOM|Mito8M SoM]] running Yocto Linux and featuring a SanDisk SDINBDG4-8G-XI1 eMMC operated with an <code>ext4</code> file system. It is worth remembering that the same testbed was used for this [[MITO8M-AN-001:_Advanced_multicore_debugging,_tracing,_and_energy_profiling_with_Lauterbach_TRACE32|Application Note]] as well.

The evaluation kit consists of three boards: the SoM, the SBCX carrier board, and an adapter board. This setup provides off-chip trace via a parallel trace port or a PCIe interface. The SoM is equipped with the NXP i.MX8M SoC, which in turn is based on the Quad-Core ARM® Cortex-A53 CPU. The SoC features two Ultra Secured Digital Host Controllers (uSDHC) supporting SD/SDIO/MMC cards and devices. For the purpose of the tests under discussion, the uSDHC ports were used as depicted in the following image.

[[File:MISC-TN-017-eMMC-uSD-interfacing.png|center|thumb|400x400px|eMMC and microSD card interfacing]]

The microSD card connected to uSDHC1 was used for the bootloader, the Linux kernel, and the root file system. The eMMC device connected to uSDHC2 was used for the main workload to be analyzed. The Linux kernel version is 4.14.98.

===Logging the accesses===

This becomes important when considering the life of the raw-NAND memory inside the eMMC, which has a finite number of Program/Erase cycles. See the example below:

[[File:Lauterbach-eMMC-WAF-example.png|center|thumb|500x500px]]

To estimate the WAF for any particular eMMC device, and hence its expected lifetime in your application, you can capture a log file of the device activity.
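
As a rough illustration of how a captured log could feed a WAF estimate, the following Python sketch sums the bytes written by the host and divides the amount physically written to the NAND by that figure. Several things here are assumptions made for the example only: the log is represented as a list of <code>(operation, block_count)</code> tuples, blocks are taken to be 512 bytes, and the amount of data written to the NAND is a made-up number that, in practice, would have to come from vendor-specific tools or registers.

```python
from typing import Iterable, Tuple

EMMC_BLOCK_SIZE = 512  # bytes per e.MMC block (assumed for this sketch)


def host_bytes_written(records: Iterable[Tuple[str, int]]) -> int:
    """Sum the bytes of all write accesses.

    Each record is a hypothetical (operation, block_count) tuple;
    this is not the actual TRACE32 or ftrace log format.
    """
    return sum(blocks * EMMC_BLOCK_SIZE for op, blocks in records if op == "write")


def write_amplification_factor(nand_bytes: int, host_bytes: int) -> float:
    """WAF = data written to the NAND / data written by the host."""
    if host_bytes <= 0:
        raise ValueError("host_bytes must be positive")
    return nand_bytes / host_bytes


if __name__ == "__main__":
    log = [("write", 1024), ("read", 8), ("write", 8), ("write", 1024)]
    host = host_bytes_written(log)
    nand = 1_579_008  # made-up value standing in for a vendor-provided figure
    print(f"host writes: {host} bytes, WAF: {write_amplification_factor(nand, host):.2f}")
```

Only the write records contribute to the host total; read accesses are ignored because they do not consume Program/Erase cycles.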

Below is an example of how the TRACE32-based log method can be applied to a Linux system. The solution is based on light instrumentation of the <code>mmc_start_request()</code> and <code>mmc_request_done()</code> functions defined in the Linux <code>drivers/mmc/core/core.c</code> source code file. Relevant eMMC device accesses are captured through the instrumentation code and they are written to a static data structure making them immediately traceable if data trace is available in the SoC. If data tracing is not possible, the instrumentation code writes the data to the Arm®/Cortex® Context ID register.

The solution was successfully tested on the aforementioned embedded platform. The instrumentation code is provided in [[#Appendix 1: source code example|Appendix 1]]. The zero initialization of the <code>T32_mmc</code> structure is guaranteed by Linux, since this variable is allocated in the <code>bss</code> section. The instrumentation is normally disabled but can be enabled by writing the value "1" into the <code>enable</code> field of the <code>T32_mmc</code> structure. The identifier of the eMMC device to be traced must be written into the <code>dev</code> field. Both of these operations can be performed from a TRACE32 script with the following commands:

NOTE: ‘OR Cycle task’ is optional.

This implementation along with the software-based method was tested on the following use case:
* A read/write workload to the <code>mmc0</code> device was issued by using the [https://github.com/stressapptest/stressapptest stressapptest] application (<code>stressapptest -s 20 -f /mnt/mmc0/file1 -f /mnt/mmc0/file2</code>), resulting in the creation of two files, 16 MByte each

-rw-r--r-- 1 root root 8388608 Dec 3 16:30 file1

cat /sys/kernel/debug/tracing/trace_pipe > /home/root/prove/ftrace.txt

</pre>

===== Verification =====

To verify the implementation of the TRACE32-based method, a specific test was run. In essence, the testbed was configured to run TRACE32 and <code>ftrace</code> tracing simultaneously on the same workload (more details are provided in the following section). The logs produced by the two methods were then compared to ensure they match.

==== Results and comparison with the software-based method (<code>ftrace</code>) ====

The analysis of the average duration of the functions involved in eMMC accesses highlights the greater overhead introduced by <code>ftrace</code>. Additional, detailed charts are shown in the following section. They show that using <code>ftrace</code> also involves a greater dispersion of the execution times compared to both the kernel without <code>ftrace</code> and the kernel instrumented only with the code for TRACE32. In particular, the functions <code>mmc_start_request()</code> and <code>mmc_request_done()</code> have a constant execution time of a few microseconds without <code>ftrace</code>, and show a highly variable execution time with <code>ftrace</code>, with maximum times of up to 279 µs and 285 µs respectively.

In conclusion, the hardware method based on TRACE32 provides the same log data as recorded by <code>ftrace</code> but with minimal changes to the kernel (a few lines in a file) and a tiny time penalty. It also does not use any additional memory (in terms of RAM and file system storage) and allows for extremely long measurement times.

The following table summarizes the advantages and disadvantages of the two considered solutions: TRACE32 vs <code>ftrace</code>.

* For each eMMC operation, <code>ftrace</code> saves roughly 876 bytes of log information.

|}
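
To get a feeling for what the per-operation figure mentioned above means in terms of storage, the following Python sketch scales it to a number of traced operations. The function name and the operation count used in the example are illustrative assumptions; only the 876-byte figure comes from the text.

```python
FTRACE_BYTES_PER_OPERATION = 876  # approximate figure reported above


def ftrace_log_size_bytes(operations: int) -> int:
    """Estimate the ftrace log size for a given number of eMMC operations."""
    if operations < 0:
        raise ValueError("operations must not be negative")
    return operations * FTRACE_BYTES_PER_OPERATION


if __name__ == "__main__":
    ops = 1_000_000  # hypothetical number of traced operations
    size = ftrace_log_size_bytes(ops)
    print(f"{ops} operations -> about {size / 2**20:.0f} MiB of log data")
```

Since the log grows linearly with the number of accesses, long measurements with <code>ftrace</code> need a correspondingly large amount of RAM or file system storage.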

===== Detailed time analysis =====

====== <code>mmc_start_request</code> ======
[[File:Lauterbach-eMMC-1-1.png|center|thumb|724x724px|No ftrace, no TRACE32 instrumentation]]
[[File:Lauterbach-eMMC-1-2.png|center|thumb|725x725px|No ftrace, with TRACE32 instrumentation]]
[[File:Lauterbach-eMMC-1-3.png|center|thumb|725x725px|With ftrace, no TRACE32 instrumentation]]

====== <code>mmc_request_done</code> ======
[[File:Lauterbach-eMMC-2-1.png|center|thumb|821x821px|No ftrace, no TRACE32 instrumentation]]
[[File:Lauterbach-eMMC-2-2.png|center|thumb|819x819px|No ftrace, with TRACE32 instrumentation]]
[[File:Lauterbach-eMMC-2-3.png|center|thumb|821x821px|With ftrace, no TRACE32 instrumentation]]

==== Analysis of the logs and conclusions ====
No matter how the accesses to the e.MMC are traced, once the logs are available they can be processed thoroughly to produce reports that are very useful to analyze how the host actually operates the device. The following are some such reports from a test conducted on an e.MMC partition (<code>mmcblk0p1</code>) formatted with the <code>ext4</code> file system:
<pre class="board-terminal">
root@imx8mqevk:~# mkfs.ext4 /dev/mmcblk0p1
</pre>
Please note that this formatting results in an <code>ext4</code> block size of 4 kByte:
<pre class="board-terminal">
root@imx8mqevk:~# dumpe2fs -x /dev/mmcblk0p1
dumpe2fs 1.43.5 (04-Aug-2017)
Filesystem volume name: <none>
...
Block size: 4096
...
</pre>
The [https://ext4.wiki.kernel.org/index.php/Ext4_Disk_Layout#Blocks ext4 block] '''must not be confused with the e.MMC blocks''', which are 512 bytes as per JEDEC specifications and are addressed according to the [https://en.wikipedia.org/wiki/Logical_block_addressing LBA] scheme.

The analyzed workload is the result of a combination of different tools performing read and write accesses (<code>/mnt/mmc0</code> is the mount point of the partition being tested):
<pre class="board-terminal">
root@imx8mqevk:~# stressapptest -s 20 -f /mnt/mmc0/file1 -f /mnt/mmc0/file2
root@imx8mqevk:~# find / -name "*" > /mnt/mmc0/find_results.txt
root@imx8mqevk:~# dd if=/dev/urandom of=/mnt/mmc0/dummyfile.bin bs=4k count=25000
root@imx8mqevk:~# rm /mnt/mmc0/dummyfile.bin
root@imx8mqevk:~# dd if=/dev/zero of=/mnt/mmc0/dummyfile.bin bs=4k count=25000
root@imx8mqevk:~# sync
</pre>
The following chart shows the e.MMC accesses over time during the execution of the workload along with other measurements such as read/write throughput.
[[File:MISC-TN-017-eMMC-chart1.png|center|thumb|800x800px|e.MMC accesses over time]]
It is also possible to extrapolate the latency of the operations.
[[File:MISC-TN-017-eMMC-chart3-latency.png|center|thumb|800x800px|Latency]]
Another extremely useful graphical depiction is the chunk size distribution. For instance, this information is often used to understand how efficient the user application is when it comes to optimizing the write operations for maximizing the e.MMC lifetime. The pie on the left refers to the read operations, while the other one refers to the write operations.
[[File:MISC-TN-017-eMMC-chart2-chunk-size.png|center|thumb|800x800px|Chunk size distribution]]
To interpret the result, one needs to take into account how the workload was implemented. In the example under discussion, the workload basically makes use of two applications: <code>[https://man7.org/linux/man-pages/man1/dd.1.html dd]</code> and <code>stressapptest</code>. <code>dd</code> was specified to use 4-kByte data chunks (<code>bs=4k</code>). <code>stressapptest</code> uses 512-byte chunks instead because the <code>--write-block-size</code> parameter was not used (for more details please refer to the [https://github.com/stressapptest/stressapptest/blob/e6c56d20c0fd16b07130d6e628d0dd6dcf1fe162/src/worker.cc#L2615 source code]). As a result, one would expect that the majority of accesses are 512 bytes and 4 kByte. The charts clearly show that this is not the case. Most of the accesses are 512 kByte instead. This is a blatant example of how the algorithms of the file systems and the kernel block driver can alter the accesses issued at application level for optimization purposes.
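
A chunk size distribution like the one discussed above can be derived from the access log with very little code. The Python sketch below is only an illustration under stated assumptions: the log is modeled as a list of hypothetical <code>(operation, block_count)</code> tuples rather than the real TRACE32 or <code>ftrace</code> format, e.MMC blocks are taken to be 512 bytes, and the sample records are invented.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

EMMC_BLOCK_SIZE = 512  # bytes per e.MMC block (assumed for this sketch)


def chunk_size_distribution(
    records: Iterable[Tuple[str, int]], operation: str
) -> Dict[int, float]:
    """Return {chunk size in bytes: percentage of accesses} for one operation.

    Each record is a hypothetical (operation, block_count) tuple.
    """
    sizes = Counter(
        blocks * EMMC_BLOCK_SIZE for op, blocks in records if op == operation
    )
    total = sum(sizes.values())
    if total == 0:
        return {}
    return {size: 100.0 * count / total for size, count in sorted(sizes.items())}


if __name__ == "__main__":
    # Invented sample: three 512-kByte writes, one 4-kByte write, one 4-kByte read.
    log = [("write", 1024), ("write", 1024), ("write", 1024), ("write", 8), ("read", 8)]
    for op in ("read", "write"):
        print(op, chunk_size_distribution(log, op))
```

Computing the distribution separately for reads and writes mirrors the two pie charts: only the write side matters for wear-out, while the read side helps to understand the access pattern as a whole.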

==== Appendix 1: source code example ====

</syntaxhighlight>

==== Appendix 2: Video ====
Technical Note presentation by Lauterbach (Language: Italian; Subtitles: English and Italian)
{{#ev:youtube|YDWAGy2QnA0|600|center|Persistent storage and read-write file systems|frame}}

===Device's built-in advanced functionalities===
e.MMC's feature advanced functionalities that are useful for monitoring wear-out and, in general, the health of the device. For more details, please see [[#Example: embedded Linux system equipped with an e.MMC|this section]].

=Power failures=

* [http://www.linux-mtd.infradead.org/doc/ubifs.html#L_writeback notes about UBIFS write-writeback support]

* [http://www.linux-mtd.infradead.org/doc/ubifs.html#L_writebuffer UBIFS write-buffer].


= Memory health monitoring =

== Example: embedded Linux system equipped with a raw NAND flash memory and UBIFS file system ==

There are two main indicators of NAND device health:

* current ECC corrected errors

* block erase counter.

</syntaxhighlight>

*

As a confirmation of this data, the maximum EC of a given UBI partition can be read directly from <code>sysfs</code>:
<syntaxhighlight lang="text">

root@axel:~# cat /sys/class/ubi/ubi0/max_ec

2

</syntaxhighlight>
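
The same value can also be read programmatically, for instance from a monitoring application. The following Python sketch is a minimal example: the function name and its parameters are illustrative, and the <code>sysfs_root</code> argument exists only so that the function can be exercised on a machine without UBI devices.

```python
from pathlib import Path


def read_max_erase_counter(
    ubi_device: str = "ubi0", sysfs_root: str = "/sys/class/ubi"
) -> int:
    """Read the maximum erase counter (max_ec) of a UBI device from sysfs."""
    max_ec_path = Path(sysfs_root) / ubi_device / "max_ec"
    return int(max_ec_path.read_text().strip())
```

On the target, calling <code>read_max_erase_counter("ubi0")</code> reads the same attribute queried by the <code>cat</code> command shown above.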

== Example: embedded Linux system equipped with an e.MMC ==
As explained in [[#Embedded Linux systems with eMMC or SD cards|this section]], e.MMC's provide specific functionalities for monitoring the device's health. In practice, these components expose some registers that make health-related information available to the host. The following is a dump of the registers regarding the wear-out status of the device, namely <code>DEVICE_LIFE_TIME_EST_TYP_A</code>, <code>DEVICE_LIFE_TIME_EST_TYP_B</code>, and <code>PRE_EOL_INFO</code>:
<pre class="board-terminal">
root@desk-mx8mp:~# mmc extcsd read /dev/mmcblk2 | grep LIFE
eMMC Life Time Estimation A [EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_A]: 0x01
eMMC Life Time Estimation B [EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_B]: 0x01
root@desk-mx8mp:~# mmc extcsd read /dev/mmcblk2 | grep EOL
eMMC Pre EOL information [EXT_CSD_PRE_EOL_INFO]: 0x01
</pre>
This dump refers to the same testbed described here.

Some manufacturers also use additional proprietary registers to provide information about the amount of data that has actually been written onto the device. If available, this number makes it possible to calculate the WAF, provided that the amount of data written by the applications of the test workload is known too.

The health status registers can be exploited to implement a monitoring mechanism as well. For example, a user-space application can periodically poll the status of the device and take action accordingly if the wear-out exceeds predefined thresholds.

Last but not least, it is worth remembering that advanced proprietary off-line tools may also be available for health monitoring. For instance, Western Digital provides such tools for its devices. For more information, please contact our [mailto:sales@dave.eu Sales Department].
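
The polling mechanism mentioned above could be sketched as follows in Python. This is not a reference implementation: the function names and the threshold values are assumptions chosen for illustration, and the parser only relies on the output format visible in the dump above. The sketch also assumes the usual JEDEC encoding, in which higher register values indicate more wear (for the life time estimation registers, each step is understood to correspond to roughly 10% of the estimated life; for <code>PRE_EOL_INFO</code>, 0x01 is understood to mean normal operation). Please check the specification and the device datasheet before relying on these meanings.

```python
import re
from typing import Dict

# Matches e.g. "[EXT_CSD_PRE_EOL_INFO]: 0x01" in the output of "mmc extcsd read".
REGISTER_LINE = re.compile(r"\[EXT_CSD_(\w+)\]:\s*(0x[0-9a-fA-F]+)")


def parse_health_registers(extcsd_output: str) -> Dict[str, int]:
    """Extract register names and values from mmc-utils text output."""
    return {name: int(value, 16) for name, value in REGISTER_LINE.findall(extcsd_output)}


def wear_out_exceeded(
    registers: Dict[str, int],
    life_time_threshold: int = 0x08,  # hypothetical threshold
    pre_eol_threshold: int = 0x02,    # hypothetical threshold
) -> bool:
    """Return True if any wear-out indicator reaches its threshold."""
    life_time = max(
        registers.get("DEVICE_LIFE_TIME_EST_TYP_A", 0),
        registers.get("DEVICE_LIFE_TIME_EST_TYP_B", 0),
    )
    pre_eol = registers.get("PRE_EOL_INFO", 0)
    return life_time >= life_time_threshold or pre_eol >= pre_eol_threshold


if __name__ == "__main__":
    # Text copied from the dump shown above.
    sample = (
        "eMMC Life Time Estimation A [EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_A]: 0x01\n"
        "eMMC Life Time Estimation B [EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_B]: 0x01\n"
        "eMMC Pre EOL information [EXT_CSD_PRE_EOL_INFO]: 0x01\n"
    )
    registers = parse_health_registers(sample)
    print(registers, "-> action needed:", wear_out_exceeded(registers))
```

In a real application, the text passed to <code>parse_health_registers</code> would come from running <code>mmc extcsd read</code> on the target at regular intervals, and the action taken when a threshold is reached (logging, raising an alarm, reducing the write workload) depends on the product requirements.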

= References =

