
MISC-TN-017: Persistent storage and read-write file systems

7,274 bytes added, 14:17, 16 June 2022


* "Example: embedded Linux system equipped with SanDisk SDINBDG4-8G-XI1 eMMC and <code>ext4</code> file system"

|-

|{{oldid|15868|2.0.1}}

|January 2022

|Minor changes

|-

|{{oldid|16652|3.0.0}}

|May 2022

|Added detailed analysis of e.MMC accesses (SanDisk SDINBDG4-8G-XI1)

|-

|3.1.0

|June 2022

|Added video of technical presentation by Lauterbach Italy

|}

Another typical use case involves eMMC's and SD cards. As explained [http://www.linux-mtd.infradead.org/doc/ubifs.html#L_raw_vs_ftl here], these components are FTL devices, where FTL stands for ''Flash Translation Layer''. This layer ''emulates a block device on top of flash hardware''. Therefore, these storage devices are used in tandem with file systems such as [https://en.wikipedia.org/wiki/Ext4 ext4] and [https://en.wikipedia.org/wiki/File_Allocation_Table#FAT32 FAT32]. In addition to raw NAND flash memory, eMMC's and SD cards integrate a microcontroller that implements the FTL and performs other important tasks, as detailed in the rest of the document. All things considered, eMMC's and SD cards therefore appear to the host as managed-NAND block devices.

Regardless of the file system used, e.MMC devices provide some functionalities conceived to monitor their health while operating. As these functionalities are defined by [https://www.jedec.org/sites/default/files/docs/JESD84-B51.pdf JEDEC standards], all the vendors implement them. In practice, e.MMC's integrate some registers providing specific information about the health status. These registers can be accessed with <code>mmc-utils</code>, which is documented [https://www.kernel.org/doc/html/latest/driver-api/mmc/mmc-tools.html here]. Interestingly, the JEDEC standard also defines a set of registers (<code>VENDOR_PROPRIETARY_HEALTH_REPORT</code>) that vendors are free to use for providing further, fine-grained information about the device's health status. Engineers and system integrators are supposed to contact the e.MMC manufacturer to get the tools required for accessing such registers.

The sections related to eMMC-based use cases are the result of a joint effort between [https://www.westerndigital.com/ Western Digital] (which purchased [https://en.wikipedia.org/wiki/SanDisk SanDisk] in 2016), [https://www.lauterbach.it Lauterbach Italy], and DAVE Embedded Systems. Parts of these sections are taken from the White Paper ''TRACE32 log method for analysing accesses to an eMMC device'' by Lauterbach, which is freely available for download [https://www.lauterbach.com/publications/trace32_log_method_for_analysing_accesses_to_an_emmc_device.pdf here].

[[File:Lauterbach-logo.png|center|thumb|308x308px]]

LT = 20000 / 650 = 30.8 years

===Experimental measurement of actual written data===

In many cases, the WAF is unknown and cannot be estimated either. As stated previously, the system integrator can nevertheless determine the life expectancy by adopting an experimental approach. The following procedure describes how to determine the '''actual''' written data for the system used in this example.

== Example: embedded Linux system equipped with SanDisk SDINBDG4-8G-XI1 eMMC and <code>ext4</code> file system ==

=== Introduction ===

As stated previously, eMMC's and SD cards are block devices. As such, they are operated in tandem with file systems that have been developed for hard disks and solid-state drives. [https://en.wikipedia.org/wiki/Ext4 <code>ext4</code>] is one of them and one of the most popular in the Linux world. [[File:Lauterbach-eMMC-schema.png|thumb|240px]]

Below is an example of how the TRACE32-based log method can be applied to a Linux system. The solution is based on light instrumentation of the <code>mmc_start_request()</code> and <code>mmc_request_done()</code> functions defined in the Linux <code>drivers/mmc/core/core.c</code> source code file. Relevant eMMC device accesses are captured through the instrumentation code and they are written to a static data structure making them immediately traceable if data trace is available in the SoC. If data tracing is not possible, the instrumentation code writes the data to the Arm®/Cortex® Context ID register.

The solution was successfully tested on the aforementioned embedded platform. The instrumentation code is provided in [[#Appendix 1: source code example|Appendix 1]]. The zero initialization of the <code>T32_mmc</code> structure is guaranteed by Linux, since this variable is allocated in the <code>bss</code> section. The instrumentation is normally disabled but can be enabled by writing the value "1" into the <code>enable</code> field of the <code>T32_mmc</code> structure. The identifier of the eMMC device to be traced must be written into the <code>dev</code> field. Both of these operations can be performed from a TRACE32 script with the following commands:

NOTE: ‘OR Cycle task’ is optional.

This implementation, along with the software-based method, was tested on the following use case:
* A read/write workload on the <code>mmc0</code> device was issued by using the [https://github.com/stressapptest/stressapptest stressapptest] application (<code>stressapptest -s 20 -f /mnt/mmc0/file1 -f /mnt/mmc0/file2</code>), resulting in the creation of two files, 16 MByte each

-rw-r--r-- 1 root root 8388608 Dec 3 16:30 file1

cat /sys/kernel/debug/tracing/trace_pipe > /home/root/prove/ftrace.txt

</pre>

===== Verification =====

To verify the implementation of the TRACE32-based method, a specific test was run. In essence, the testbed was configured to run TRACE32 and <code>ftrace</code> tracing simultaneously on the same workload (more details in the following section). The logs produced by the two methods were then compared to ensure they match.

==== Results and comparison with the software-based method (<code>ftrace</code>) ====

The analysis of the average duration of the functions involved in eMMC accesses highlights the greater overhead introduced by <code>ftrace</code>. Additional, detailed charts are shown in the following section. They show that using <code>ftrace</code> also results in a greater dispersion of the execution times compared with both the kernel without <code>ftrace</code> and the kernel instrumented only with the code for TRACE32. In particular, the functions <code>mmc_start_request()</code> and <code>mmc_request_done()</code> have a constant execution time of a few microseconds without <code>ftrace</code>, and a highly variable execution time with <code>ftrace</code>, with maxima of up to 279 µs and 285 µs respectively.

In conclusion, the hardware method based on TRACE32 provides the same log data as recorded by <code>ftrace</code> but with minimal changes to the kernel (a few lines in a file) and a tiny time penalty. It also does not use any additional memory (in terms of RAM and file system storage) and allows for extremely long measurement times.

The following table summarizes the advantages and disadvantages of the two considered solutions: TRACE32 vs <code>ftrace</code>.

* For each eMMC operation, <code>ftrace</code> saves roughly 876 bytes of log information.

|}
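As a rough illustration of the storage cost implied by the per-operation figure above, the following sketch estimates the size of the <code>ftrace</code> log for a given number of eMMC operations. The 876-byte figure comes from this note; the function name and the operation count used in the example are hypothetical and chosen for illustration only.

<syntaxhighlight lang="python">
# Approximate amount of log data saved by ftrace per eMMC operation,
# as reported in this technical note.
FTRACE_BYTES_PER_OP = 876


def ftrace_log_size_bytes(num_operations: int) -> int:
    """Estimate the ftrace log size for the given number of eMMC operations."""
    if num_operations < 0:
        raise ValueError("num_operations must be non-negative")
    return num_operations * FTRACE_BYTES_PER_OP


if __name__ == "__main__":
    ops = 1_000_000  # hypothetical workload size, not a measured value
    size = ftrace_log_size_bytes(ops)
    print(f"{ops} operations -> about {size / 1e6:.0f} MB of ftrace log")
</syntaxhighlight>

An estimate of this kind can help decide whether a long measurement campaign fits in the available RAM or file system storage when the software-based method is used.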

===== Detailed time analysis =====

====== <code>mmc_start_request</code> ======

[[File:Lauterbach-eMMC-1-1.png|center|thumb|724x724px|No ftrace, no TRACE32 instrumentation]]

[[File:Lauterbach-eMMC-1-2.png|center|thumb|725x725px|No ftrace, with TRACE32 instrumentation]]

[[File:Lauterbach-eMMC-1-3.png|center|thumb|725x725px|With ftrace, no TRACE32 instrumentation]]

====== <code>mmc_request_done</code> ======

[[File:Lauterbach-eMMC-2-1.png|center|thumb|821x821px|No ftrace, no TRACE32 instrumentation]]

[[File:Lauterbach-eMMC-2-2.png|center|thumb|819x819px|No ftrace, with TRACE32 instrumentation]]

[[File:Lauterbach-eMMC-2-3.png|center|thumb|821x821px|With ftrace, no TRACE32 instrumentation]]

==== Analysis of the logs and conclusions ====

No matter how the accesses to the e.MMC are traced, once the logs are available they can be processed thoroughly to produce reports that are very useful to analyze how the host actually operates the device.

The following are some of these reports, taken from a test conducted on an e.MMC partition (<code>mmcblk0p1</code>) formatted with the <code>ext4</code> file system:

<pre class="board-terminal">
root@imx8mqevk:~# mkfs.ext4 /dev/mmcblk0p1
</pre>

Please note that this formatting results in an <code>ext4</code> 4-kByte block size:

<pre class="board-terminal">
root@imx8mqevk:~# dumpe2fs -x /dev/mmcblk0p1
dumpe2fs 1.43.5 (04-Aug-2017)
Filesystem volume name: <none>
...
Block size: 4096
...
</pre>

The [https://ext4.wiki.kernel.org/index.php/Ext4_Disk_Layout#Blocks ext4 block] '''must not be confused with the e.MMC blocks''', which are 512 bytes as per JEDEC specifications and are addressed according to the [https://en.wikipedia.org/wiki/Logical_block_addressing LBA] scheme.
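To make the distinction concrete, the following minimal sketch maps an <code>ext4</code> block number to the range of 512-byte e.MMC blocks (LBAs) that back it. The two block sizes are the ones stated above; the function name and the <code>partition_start_lba</code> parameter are assumptions introduced for illustration, and the sketch ignores any remapping performed inside the device by the FTL.

<syntaxhighlight lang="python">
EMMC_BLOCK_SIZE = 512   # bytes per e.MMC block (LBA unit), as stated above
EXT4_BLOCK_SIZE = 4096  # bytes per ext4 block, as reported by dumpe2fs


def ext4_block_to_lba_range(ext4_block: int, partition_start_lba: int = 0) -> range:
    """Return the LBAs covered by one ext4 block.

    partition_start_lba is a hypothetical parameter: the LBA at which the
    partition begins on the device (0 means "relative to the partition").
    """
    lbas_per_ext4_block = EXT4_BLOCK_SIZE // EMMC_BLOCK_SIZE  # 8 with the sizes above
    first = partition_start_lba + ext4_block * lbas_per_ext4_block
    return range(first, first + lbas_per_ext4_block)


if __name__ == "__main__":
    lbas = ext4_block_to_lba_range(2)
    print(f"ext4 block 2 spans LBAs {lbas.start}..{lbas.stop - 1}")
</syntaxhighlight>

With these sizes, every <code>ext4</code> block corresponds to eight consecutive e.MMC blocks, which is worth keeping in mind when reading access logs expressed in LBAs.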

The analyzed workload is the result of a combination of different tools performing read and write accesses (<code>/mnt/mmc0</code> is the mount point of the partition being tested):

<pre class="board-terminal">
root@imx8mqevk:~# stressapptest -s 20 -f /mnt/mmc0/file1 -f /mnt/mmc0/file2
root@imx8mqevk:~# find / -name "*" > /mnt/mmc0/find_results.txt
root@imx8mqevk:~# dd if=/dev/urandom of=/mnt/mmc0/dummyfile.bin bs=4k count=25000
root@imx8mqevk:~# rm /mnt/mmc0/dummyfile.bin
root@imx8mqevk:~# dd if=/dev/zero of=/mnt/mmc0/dummyfile.bin bs=4k count=25000
root@imx8mqevk:~# sync
</pre>
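For reference, the amount of file data requested by each <code>dd</code> invocation of the workload can be worked out as follows. This is a minimal sketch that assumes <code>bs=4k</code> means 4096 bytes; it only accounts for the payload requested at application level, not for file system metadata or journal traffic. The function name is hypothetical.

<syntaxhighlight lang="python">
EMMC_BLOCK_SIZE = 512  # bytes per e.MMC block


def dd_payload_bytes(bs: int, count: int) -> int:
    """Bytes of file data requested by 'dd bs=<bs> count=<count>'."""
    return bs * count


if __name__ == "__main__":
    payload = dd_payload_bytes(bs=4 * 1024, count=25000)
    print(f"payload: {payload} bytes ({payload / 2**20:.2f} MiB)")
    print(f"equivalent e.MMC blocks: {payload // EMMC_BLOCK_SIZE}")
</syntaxhighlight>

Comparing this application-level figure with the volume actually observed in the access logs gives a first idea of the overhead added by the file system.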

The following chart shows the e.MMC accesses over time during the execution of the workload along with other measurements such as read/write throughput.

[[File:MISC-TN-017-eMMC-chart1.png|center|thumb|800x800px|e.MMC accesses over time]]

It is also possible to extrapolate the latency of the operations.

[[File:MISC-TN-017-eMMC-chart3-latency.png|center|thumb|800x800px|Latency]]

Another extremely useful graphical depiction is the chunk size distribution. For instance, this information is often used to understand how efficient the user application is at optimizing write operations to maximize the e.MMC lifetime. The pie chart on the left refers to the read operations, while the other one refers to the write operations.

[[File:MISC-TN-017-eMMC-chart2-chunk-size.png|center|thumb|800x800px|Chunk size distribution]]

To interpret the result, one needs to take into account how the workload was implemented. In the example under discussion, the workload basically makes use of two applications: <code>[https://man7.org/linux/man-pages/man1/dd.1.html dd]</code> and <code>stressapptest</code>. <code>dd</code> was configured to use 4-kByte data chunks (<code>bs=4k</code>). <code>stressapptest</code> uses 512-byte chunks instead because the <code>--write-block-size</code> parameter was not used (for more details please refer to the [https://github.com/stressapptest/stressapptest/blob/e6c56d20c0fd16b07130d6e628d0dd6dcf1fe162/src/worker.cc#L2615 source code]). As a result, one would expect the majority of accesses to be 512 bytes and 4 kByte in size. The charts clearly show that this is not the case: most of the accesses are 512 kB instead. This is a blatant example of how the algorithms of the file systems and the kernel block driver can alter the accesses issued at application level for optimization purposes.
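A chunk size distribution of this kind can be derived from the access logs with a few lines of code. The following is a minimal sketch, not the tool used to produce the charts above: it assumes that each log record has already been reduced to a hypothetical <code>(operation, block_count)</code> pair, where the operation is <code>"R"</code> or <code>"W"</code> and the block count is expressed in 512-byte e.MMC blocks.

<syntaxhighlight lang="python">
from collections import Counter, defaultdict

EMMC_BLOCK_SIZE = 512  # bytes per e.MMC block


def chunk_size_distribution(accesses):
    """Count accesses per chunk size (in bytes), separately for each operation.

    accesses: iterable of (operation, block_count) pairs; this record layout
    is an assumption made for the sake of the example.
    """
    dist = defaultdict(Counter)
    for operation, block_count in accesses:
        dist[operation][block_count * EMMC_BLOCK_SIZE] += 1
    return dist


def as_percentages(counter):
    """Convert absolute counts into percentages of the total."""
    total = sum(counter.values())
    if total == 0:
        return {}
    return {size: 100.0 * n / total for size, n in counter.items()}


if __name__ == "__main__":
    # Made-up sample records, for illustration only.
    sample = [("W", 1024), ("W", 1024), ("W", 8), ("R", 8)]
    for op, counter in chunk_size_distribution(sample).items():
        print(op, as_percentages(counter))
</syntaxhighlight>

Keeping read and write operations in separate counters mirrors the two pie charts, so that write-side merging performed by the file system and the block layer is not masked by read traffic.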

==== Appendix 1: source code example ====

}

</syntaxhighlight>

==== Appendix 2: Video ====

Technical Note presentation by Lauterbach (Language: Italian; Subtitles: English and Italian)

{{#ev:youtube|YDWAGy2QnA0|600|center|Persistent storage and read-write file systems|frame}}

=== Device's built-in advanced functionalities ===

e.MMC's feature advanced functionalities that are useful for monitoring wear-out and, in general, the health of the device. For more details, please see [[#Example: embedded Linux system equipped with an e.MMC|this section]].
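As a minimal sketch of how these functionalities might be used from user space, the following code parses the text printed by <code>mmc extcsd read</code> (from <code>mmc-utils</code>) and extracts the wear-out registers. The line format matched by the regular expressions is the one shown in the dump reported in the section referenced above; the function names and the threshold values are hypothetical and must be adapted to the product requirements. The sketch also assumes that higher register values denote more wear, which is how the JEDEC registers are commonly described.

<syntaxhighlight lang="python">
import re

LIFE_TIME_RE = re.compile(
    r"\[EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_([AB])\]:\s*(0x[0-9a-fA-F]+)"
)
PRE_EOL_RE = re.compile(r"\[EXT_CSD_PRE_EOL_INFO\]:\s*(0x[0-9a-fA-F]+)")


def parse_health(extcsd_text: str) -> dict:
    """Extract the wear-out registers from the output of 'mmc extcsd read'."""
    health = {}
    for typ, value in LIFE_TIME_RE.findall(extcsd_text):
        health[f"life_time_est_typ_{typ.lower()}"] = int(value, 16)
    match = PRE_EOL_RE.search(extcsd_text)
    if match:
        health["pre_eol_info"] = int(match.group(1), 16)
    return health


def needs_attention(health: dict, life_threshold: int = 0x08,
                    eol_threshold: int = 0x02) -> bool:
    """Return True if any register reaches its (hypothetical) threshold."""
    life = max(health.get("life_time_est_typ_a", 0),
               health.get("life_time_est_typ_b", 0))
    return life >= life_threshold or health.get("pre_eol_info", 0) >= eol_threshold
</syntaxhighlight>

A monitoring daemon could obtain the input text with, for example, <code>subprocess.run(["mmc", "extcsd", "read", "/dev/mmcblk2"], capture_output=True, text=True).stdout</code>, call these functions periodically, and raise an alarm when <code>needs_attention()</code> returns <code>True</code>.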

=Power failures=

== Example: embedded Linux system equipped with a raw NAND flash memory and UBIFS file system ==

There are two main indicators of NAND device health:

* current ECC corrected errors

* block erase counter.

</syntaxhighlight>


As confirmation of this data, the maximum EC of a given UBI partition can be read directly from <code>sysfs</code>:<syntaxhighlight lang="text">

root@axel:~# cat /sys/class/ubi/ubi0/max_ec

2

</syntaxhighlight>
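The <code>sysfs</code> read shown above can be automated along these lines. This is a minimal sketch: the <code>max_ec</code> path is the one used in the command above, while the function names and the <code>rated_pe_cycles</code> parameter (the rated number of program/erase cycles, to be taken from the NAND datasheet) are assumptions introduced for illustration.

<syntaxhighlight lang="python">
from pathlib import Path


def parse_max_ec(text: str) -> int:
    """Convert the content of the max_ec sysfs attribute into an integer."""
    return int(text.strip())


def read_max_ec(ubi_device: str = "ubi0", sysfs_root: str = "/sys/class/ubi") -> int:
    """Read the maximum erase counter of the given UBI device from sysfs."""
    return parse_max_ec((Path(sysfs_root) / ubi_device / "max_ec").read_text())


def wear_ratio(max_ec: int, rated_pe_cycles: int) -> float:
    """Fraction of the rated program/erase cycles consumed by the most worn block.

    rated_pe_cycles is a hypothetical input taken from the device datasheet.
    """
    if rated_pe_cycles <= 0:
        raise ValueError("rated_pe_cycles must be positive")
    return max_ec / rated_pe_cycles
</syntaxhighlight>

On the target, calling <code>read_max_ec("ubi0")</code> periodically and logging the result gives a simple wear trend over time for the partition.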

== Example: embedded Linux system equipped with an e.MMC ==

As explained in [[#Embedded Linux systems with eMMC or SD cards|this section]], e.MMC's provide specific functionalities for monitoring the device's health. In practice, these components expose some registers that make health-related information available to the host. The following is a dump of the registers regarding the wear-out status of the device, namely <code>DEVICE_LIFE_TIME_EST_TYP_A</code>, <code>DEVICE_LIFE_TIME_EST_TYP_B</code>, and <code>PRE_EOL_INFO</code>:

<pre class="board-terminal">
root@desk-mx8mp:~# mmc extcsd read /dev/mmcblk2 | grep LIFE
eMMC Life Time Estimation A [EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_A]: 0x01
eMMC Life Time Estimation B [EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_B]: 0x01
root@desk-mx8mp:~# mmc extcsd read /dev/mmcblk2 | grep EOL
eMMC Pre EOL information [EXT_CSD_PRE_EOL_INFO]: 0x01
</pre>

This dump refers to the same testbed described here.

Some manufacturers also use additional proprietary registers to provide information about the amount of data that has actually been written onto the device. If available, this number makes it possible to calculate the WAF, provided that the amount of data written by the applications of the test workload is known too.

The health status registers can be exploited to implement a monitoring mechanism as well. For example, a user-space application can periodically poll the status of the device and take action if the wear-out exceeds predefined thresholds.

Last but not least, it is worth remembering that advanced proprietary off-line tools may also be available for health monitoring. For instance, Western Digital provides such tools for its devices. For more information, please contact our [mailto:sales@dave.eu Sales Department].

= References =

* Western Digital Corporation, ''[https://link.westerndigital.com/content/dam/customer-portal/en_us/external/public/cps/p/White_Paper_Design_Considerations_v1.0.pdf Design Considerations for Embedded Products]'', 2018

* Western Digital Corporation, ''[https://documents.westerndigital.com/content/dam/doc-library/en_us/assets/public/western-digital/collateral/white-paper/white-paper-automotive-workload-analysis.pdf Automotive Workload Analysis]'', September 2021
