=Changes=

{{InfoBoxTop}}
{{AppliesToMito8M}}
{{AppliesTo MITO 8M TN}}
{{InfoBoxBottom}}
[[Category:MISC-AN-TN]]
{| class="wikitable"
!Version
!Date
!Notes
|-
|{{oldid|12641|1.0.0}}
|January 2021
|First public release
|-
|{{oldid|15846|2.0.0}}
|January 2022
|Added the sections
* "Embedded Linux systems with eMMC or SD cards"
* "Example: embedded Linux system equipped with SanDisk SDINBDG4-8G-XI1 eMMC and <code>ext4</code> file system"
|-
|{{oldid|15868|2.0.1}}
|January 2022
|Minor changes
|-
|{{oldid|16652|3.0.0}}
|May 2022
|Added detailed analysis of e.MMC accesses (SanDisk SDINBDG4-8G-XI1)
|-
|3.1.0
|June 2022
|Added video of technical presentation by Lauterbach Italy
|}
 
=Introduction=
In many cases, embedded systems that are based on Application Processors such as the NXP i.MX6 make use of read/write file systems. In turn, these file systems use non-volatile flash technologies integrated into several different devices (NOR flashes, [https://www.micron.com/products/nand-flash/choosing-the-right-nand raw NAND flashes], eMMC's, etc.).
Another typical use case refers to eMMC's and SD cards. As explained [http://www.linux-mtd.infradead.org/doc/ubifs.html#L_raw_vs_ftl here], these components are FTL devices, where FTL stands for ''Flash Translation Layer''. This layer ''emulates a block device on top of flash hardware''. Therefore, these storage devices are used in tandem with file systems such as [https://en.wikipedia.org/wiki/Ext4 ext4] and [https://en.wikipedia.org/wiki/File_Allocation_Table#FAT32 FAT32]. Besides raw NAND flash memory, eMMC's and SD cards integrate a microcontroller implementing the FTL and other important tasks, as detailed in the rest of the document. All things considered, eMMC's and SD cards therefore appear to the host as managed-NAND block devices.
Regardless of the file system used, e.MMC devices provide some functionalities conceived to monitor their health while operating. As these functionalities are defined by [https://www.jedec.org/sites/default/files/docs/JESD84-B51.pdf JEDEC standards], all the vendors implement them. In practice, e.MMC's integrate some registers providing specific information about the health status. These registers can be accessed with <code>mmc-utils</code>, which are documented [https://www.kernel.org/doc/html/latest/driver-api/mmc/mmc-tools.html here]. Interestingly, the JEDEC standard also defines a set of registers (<code>VENDOR_PROPRIETARY_HEALTH_REPORT</code>) that vendors are free to use for providing further, fine-grained information about the device's health status. Engineers and system integrators are supposed to contact the e.MMC manufacturer to get the required tools for accessing such registers.
The sections related to eMMC-based use cases are the result of a joint effort between [https://www.westerndigital.com/ Western Digital] (which purchased [https://en.wikipedia.org/wiki/SanDisk SanDisk] in 2016), [https://www.lauterbach.it Lauterbach Italy], and DAVE Embedded Systems. Parts of such sections are retrieved from the White Paper ''TRACE32 log method for analysing accesses to an eMMC device'' by Lauterbach, which is freely available for download [https://www.lauterbach.com/publications/trace32_log_method_for_analysing_accesses_to_an_emmc_device.pdf here].
[[File:Lauterbach-logo.png|center|thumb|308x308px]]
[[File:WesterDigital-logo.png|center|thumb|180x180px]]
 
 
=Wear-out=
LT = 20000 / 650 = 30.8 years
===Experimental measurement of '''actual''' written data===
In many cases, WAF is unknown and cannot be estimated either. As stated previously, though, the system integrator can determine the lifetime expectancy by adopting an experimental approach. The following procedure describes how to determine the '''actual''' written data for the system used in this example.
== Example: embedded Linux system equipped with SanDisk SDINBDG4-8G-XI1 eMMC and <code>ext4</code> file system ==
=== Introduction ===
[[File:Lauterbach-eMMC-schema.png|thumb|481x481px]]
As stated previously, eMMC's and SD cards are block devices. As such, they are operated in tandem with file systems that were originally developed for hard disks and solid-state drives. [https://en.wikipedia.org/wiki/Ext4 <code>ext4</code>] is one of them and one of the most popular in the Linux world.
From the system integrator's perspective, eMMC's and SD cards are easier to use than raw NAND's because they hide most of the complexity of managing the underlying memory. On the other hand, the architecture of these devices can make it difficult to retrieve data regarding the actual usage of the memory. There are some techniques available, however, to address this issue when working with an embedded Linux platform. This section will illustrate the following ones:
* Logging the accesses to the storage device: the idea of this approach is to log all the accesses triggered by the host and isolate the write operations in order to determine the actual amount of data written onto the device. Two different methods are compared. The first one makes use of a hardware-based tracing tool, while the other exploits a software tracer, namely the Linux kernel's Function Tracer (aka <code>ftrace</code>).
* Exploiting the storage device's built-in advanced functionalities.
These approaches are illustrated in more detail in the rest of the document with the help of specific tests conducted on a real target.

==== Testbed ====
These tests were run on the [[MITO_8M_SOM/MITO_8M_Evaluation_Kit|Evaluation Kit]] of the [[MITO_8M_SOM|Mito8M SoM]] running Yocto Linux and featuring a SanDisk SDINBDG4-8G-XI1 eMMC operated with an <code>ext4</code> file system. It is worth remembering that the same testbed was used for this [[MITO8M-AN-001:_Advanced_multicore_debugging,_tracing,_and_energy_profiling_with_Lauterbach_TRACE32|Application Note]] as well. The evaluation kit consists of three boards: the SoM, the SBCX carrier board, and an adapter board. This setup provides off-chip trace via a parallel trace port or a PCIe interface. The SoM is equipped with the NXP i.MX8M SoC, which in turn is based on the quad-core Arm® Cortex®-A53 CPU. The SoC features two Ultra Secured Digital Host Controllers (uSDHC) supporting SD/SDIO/MMC cards and devices. For the purpose of the tests under discussion, the uSDHC ports were used as depicted in the following image.

[[File:MISC-TN-017-eMMC-uSD-interfacing.png|center|thumb|400x400px|eMMC and microSD card interfacing]]

The microSD card connected to uSDHC1 was used for the bootloader, the Linux kernel, and the root file system. The eMMC device connected to uSDHC2 was used for the main workload to be analyzed. The Linux kernel version is 4.14.98.
===Logging the accesses===
A classic software-based recording method (log) of these accesses requires the implementation of additional code that captures the information and saves it reliably. The information can be saved on another permanent storage device, for example, an external USB drive. This software method is intrusive: in addition to the overhead of monitoring the eMMC accesses, further overhead is added in order to save the data.
Besides a traditional software-based approach, this example also shows a different method of capturing and saving such information: the use of a hardware-based trace tool. This can be done with minimal, and in some cases almost zero, intrusion on the software. This tool captures the program and data trace transmitted by the cores of a system-on-chip (SoC) through a dedicated trace port and records it to its own dedicated memory. To do that, advanced hardware functionalities of modern SoC's are exploited.
====Arm CoreSight™====
Many embedded microprocessors and microcontrollers are able to trace information related to the program execution flow. This allows the sequence of instructions executed by the program to be reconstructed and examined in great detail. In some configurations it is also possible to record the data related to the read and/or write cycles performed by the program.
 
[https://developer.arm.com/ip-products/system-ip/coresight-debug-and-trace CoreSight]™ is the name of the on-chip debug and trace technology provided by Arm®. CoreSight™ is not a single predefined logic block; like a construction kit, it provides many different components. This allows the SoC designer to define the debug and trace resources that they want to provide. Program flow (and sometimes data flow) information is output through a resource called ETM (Embedded Trace Macrocell). The ETM trace information flow can be stored internally (on-chip trace) or exported outside of the SoC (off-chip trace). Arm® provides several ways of exporting a trace flow: through a parallel trace port (TPIU, Trace Port Interface Unit), a serial trace port (HSSTP, High-Speed Serial Trace Port), or a PCIe interface.
 
When data trace is not available, Arm® provides the Context ID register. This is often used by an Operating System (OS) to indicate that a task switch has occurred. This is done by code in the OS kernel writing the task identifier to this register. In a multicore Arm®/Cortex® SoC, each core implements this register.
==== Lauterbach TRACE32 development tools ====
Lauterbach's TRACE32 development tools enable hardware-based debug and trace of a wide range of embedded microprocessors and microcontrollers and support debug technologies such as JTAG or SWD, as well as trace technologies such as NEXUS or ETM.
 
The TRACE32 tools support all Arm® CoreSight™ configurations. A TRACE32 development tool for debug and trace is typically comprised of these units:
* a universal PowerDebug module connected to the host computer via USB3 or Ethernet;
* a debugger (debug cable) for the specific architecture of the microprocessor or microcontroller under debug;
* for the off-chip trace, a universal PowerTrace II or PowerTrace III module providing 4GB or 8GB memory, complemented by a parallel or serial pre-processor to access the trace data;
* or a dedicated PowerTrace Serial module for serial or PCIe trace data.
==== TRACE32-based eMMC access log solution ====
In all operating systems or device drivers that manage an eMMC memory device, some functions are provided for device access which incorporate the eMMC JEDEC standard commands. Long-term monitoring of the execution of these commands and their parameters is the best way to collect the data necessary for the access analysis. After accessing the eMMC device, a function or a code point is usually available where the eMMC command is completed. Monitoring this code point allows the detection of additional information, such as the execution time of the command.
 
The TRACE32 trace tool can sample the code points where eMMC accesses start and finish. By adding a tiny amount of instrumentation to your source code, you can also trace device access data. In cases where data trace is not available, the instrumentation code writes the access data to the ContextID register, allowing both types of system to be adapted to use this technique.
 
The following data is traced in the TRACE32-based log solution:
* at the beginning of eMMC access: eMMC device id, command executed and related flags, access address, number of accessed memory blocks and their size;
* at the end of the eMMC access: eMMC device id, command executed, result code and other return codes;
* access duration.
A possible example of access monitoring is shown below, as it appears in the trace views available in TRACE32:
 
<pre class="workstation-terminal">
2| ptrace  \\vmlinux\core_core\mmc_start_request  24.228827980s
2| info                                           24.228828005s         31636D6D
2| info                                           24.228828030s         00000019
2| info                                           24.228828055s         01620910
2| info                                           24.228828080s         000000B5
2| info                                           24.228828105s         00000200
2| info                                           24.228828130s         00000010
0| ptrace  \\vmlinux\core_core\mmc_request_done   24.231239610s
0| info                                           24.231241385s         31636D6D
0| info                                           24.231241410s         00000019
0| info                                           24.231241435s         00000000
0| info                                           24.231308085s         00000900
0| info                                           24.231308210s         00000000
</pre>
 
This is, typically, a few trace records for each eMMC access. Stress tests have verified that logging an eMMC access (functions <code>mmc_start_request()</code> and <code>mmc_request_done()</code> with related data) requires about 416 bytes of trace records in the PowerTrace memory, and that these accesses occur on average every 4 ms.
 
This corresponds to approximately 1 GB / 416 bytes ≈ 2.5 million eMMC logs, or approximately 10,000 seconds (2h45min) of recording, for each gigabyte of trace storage. The PowerTrace family therefore provides either 10 million eMMC logs (11h) with a 4GB PowerTrace or 20 million (22h) with an 8GB module.
By extending the trace duration with trace streaming, the limit becomes the size of the computer hard-disk/SSD or the TRACE32 limit which is 1 Tera-frame, i.e., 2.5 billion eMMC logs (over 100 days!).
 
The trace data can be filtered and saved on disk, and then converted into a more suitable format for analysis using a TRACE32 script (PRACTICE script), Python script, or an external conversion program.
For example, the trace shown above can be converted into the format shown below, which is more suitable for importing into specific eMMC analysis tools:
<pre class="workstation-terminal">
24.228827980 mmc_start_req_cmd: host=mmc1 CMD25 arg=01620910 flags=000000B5 blksz=00000200 blks=00000010
24.231239610 mmc_request_done: host=mmc1 CMD25 err=00000000 resp1=00000900 resp2=00000000
</pre>
These tools perform a complete analysis of the application accesses to the eMMC device, in terms of addresses accessed, frequency, and access methods.
The end goal is to calculate the Write Amplification Factor (WAF) seen by the eMMC (or by any other managed-NAND block device). WAF is defined as the ratio between the physical data written onto the NAND and the data written by the host. When the host writes logical sectors of the eMMC, the internal eMMC controller erases and re-programs physical pages of the NAND device. This can cause a management overhead. Large sequential writes aligned to physical page boundaries typically result in minimal overhead and optimal NAND write activity (WAF ≈ 1). Small chunks of random writes can result in a much higher overhead (WAF >> 1). This becomes important when considering the life of the raw-NAND memory inside the eMMC, which has a finite number of Program/Erase cycles. See the example below:

[[File:Lauterbach-eMMC-WAF-example.png|center|thumb|500x500px]]

To estimate the WAF for any particular eMMC device, and hence its expected lifetime in your application, you can capture the log file of the activity. Once a log is obtained, it is recommended to contact your eMMC vendor to get more information about the log analysis tools required for analyzing the specific eMMC product.

==== Implementation example for GNU/Linux o.s. ====
Below is an example of how the TRACE32-based log method can be applied to a Linux system. The solution is based on light instrumentation of the <code>mmc_start_request()</code> and <code>mmc_request_done()</code> functions defined in the Linux <code>drivers/mmc/core/core.c</code> source file. Relevant eMMC device accesses are captured through the instrumentation code and written to a static data structure, making them immediately traceable if data trace is available in the SoC. If data tracing is not possible, the instrumentation code writes the data to the Arm®/Cortex® Context ID register. The solution was successfully tested on the aforementioned embedded platform.
The instrumentation code is provided in [[#Appendix 1: source code example|Appendix 1]]. The zero initialization of the <code>T32_mmc</code> structure is guaranteed by Linux, since this variable is allocated in the <code>bss</code> section. The instrumentation is normally disabled but can be enabled by writing the value "1" into the <code>enable</code> field of the <code>T32_mmc</code> structure. The identifier of the eMMC device to be traced must be written into the <code>dev</code> field. Both of these operations can be performed from a TRACE32 script with the following commands:

<pre class="workstation-terminal">
Var.set T32_mmc.enable = 1
Var.set T32_mmc.dev = 0x30636D6D // e.g.: "mmc0" in reverse ASCII order
</pre>

The <code>infoBit</code> field can be written as follows:

<pre class="workstation-terminal">
Var.set T32_mmc.infoBit = 0x80000000
</pre>

This allows the user and the tools to distinguish the data written to the Context ID register by the instrumentation code from the values written by Linux for task switches. In this case, the range of values must also be reserved so that they are not interpreted as task switch identifiers. The command to do this is shown below:

<pre class="workstation-terminal">
ETM.ReserveContextID 0x80000000--0xffffffff
</pre>

It is important to note that the Linux kernel must be compiled for debugging (see the Training Linux Debugging manual at [https://www.lauterbach.com/manual.html]). The TRACE32 debugger also offers extensions for many different operating systems, known as an "OS awareness". These add OS-specific features to the TRACE32 debugger, such as the display of OS resources (tasks, queues, semaphores, ...) or support for MMU management in the OS. In TRACE32, the ability to trace tasks and executed code is based on task switch information in the trace flow. The command <code>ETM.ReserveContextID</code> allows simultaneous use of the Linux OS awareness support and the instrumentation for eMMC access analysis.
To reduce the amount of trace information generated by the target and to allow long-term trace via streaming, filters can be applied to isolate just the instrumentation code and its writes to the Context ID register. For example:

<pre class="workstation-terminal">
Break.RESet
Break.Set mmc_request_done /Program /TraceON
Break.Set mmc_request_done\94 /Program /TraceOFF
Break.Set mmc_start_request /Program /TraceON
Break.Set mmc_start_request\38 /Program /TraceOFF
</pre>

where the filters marked as /TraceOFF are mapped to program addresses immediately after the instrumentation. To ensure the task switch data generated by the OS is included in the filtered trace flow, add an additional filter to the <code>__switch_to()</code> function (<code>arch/arm64/kernel/process.c</code>) where it calls the static inline <code>contextidr_thread_switch()</code> function:

<pre class="workstation-terminal">
Break.Set __switch_to+0x74 /Program /TraceON
Break.Set __switch_to+0x80 /Program /TraceOFF
</pre>

The trace flow recorded by TRACE32 can be arranged into a view suitable for exporting by post-processing with the command:

<pre class="workstation-terminal">
Trace.FindAll , Address address.offset(mmc_start_request) OR Address address.offset(mmc_request_done) OR Cycle info OR Cycle task /List run cycle symbol %TimeFixed TIme.Zero data
</pre>

NOTE: 'OR Cycle task' is optional.
This implementation, along with the software-based method, was tested on the following use case:
* A read/write workload to the <code>mmc0</code> device was issued by using the [https://github.com/stressapptest/stressapptest stressapptest] application (<code>stressapptest -s 20 -f /mnt/mmc0/file1 -f /mnt/mmc0/file2</code>), resulting in the creation of two files, 8 MByte each:
<pre class="workstation-terminal">
-rw-r--r-- 1 root root 8388608 Dec 3 16:30 file1
-rw-r--r-- 1 root root 8388608 Dec 3 16:30 file2
</pre>
* To set up <code>ftrace</code>, the following commands were run (please note that the <code>ftrace</code> pipe is purposely saved to a file on a different memory device, i.e. <code>mmc1</code>):
<pre class="workstation-terminal">
echo 1 > /sys/kernel/debug/tracing/tracing_on
echo 1 > /sys/kernel/debug/tracing/events/mmc/enable
echo 20000 > /sys/kernel/debug/tracing/buffer_size_kb ; 20MB buffer size
echo > /sys/kernel/debug/tracing/trace
cat /sys/kernel/debug/tracing/trace_pipe > /home/root/prove/ftrace.txt
</pre>

===== Verification =====
To verify the implementation of the TRACE32-based method, a specific test was run. In essence, the testbed was configured to run TRACE32 and <code>ftrace</code> tracing (more details in the following section) simultaneously for analyzing the same workload. The logs produced by the two methods were then compared to ensure they match.

==== Results and comparison with the software-based method (<code>ftrace</code>) ====
In Linux, eMMC access log solutions based on purely software methods are already available. The <code>ftrace</code> framework provides this capability, as well as being able to log many other events. The term <code>ftrace</code> stands for "function tracer": basically, it allows you to examine and record the execution flow of kernel functions. The dynamic tracing mode of <code>ftrace</code> is implemented through dynamic probes injected into the code, which allow the definition of the code to be traced at runtime.
When tracing is enabled, all the collected data is stored by <code>ftrace</code> in a circular memory buffer. The framework exposes a virtual filesystem called <code>tracefs</code> (usually mounted in <code>/sys/kernel/tracing</code>), which is used to configure <code>ftrace</code> and collect the trace data. All management is done with simple operations on the files in this directory.

Comparative tests performed on the DAVE Embedded Systems "MITO 8M Evaluation Kit" target showed that the <code>ftrace</code> impact compared to the TRACE32-based log solution is considerably higher in several respects. This is understandable, considering that <code>ftrace</code> is a general-purpose trace framework designed to trace many possible events, while the instrumentation required for the TRACE32 log method is specific and limited to the pertinent functions. Moreover, <code>ftrace</code> requires some buffering (ring buffer) and saving data to the target's persistent memory, while the solution based on TRACE32 uses off-chip trace to save the data externally in real time. The following tables show a comparison between <code>ftrace</code> and the TRACE32 solution.

{| class="wikitable"
|+Table 1: Instrumentation size
! rowspan="2" |Setup
! colspan="2" |vmlinux code size
! colspan="2" |vmlinux data size
! colspan="2" |vmlinux source files
! rowspan="2" |instrumentation code size (*)
! rowspan="2" |instrumentation data size (*)
|-
!Absolute [MByte]
!Increment w.r.t. the baseline [%]
!Absolute [MByte]
!Increment w.r.t. the baseline setup [%]
!#
!Increment w.r.t. the baseline setup
|-
|Baseline (no instrumentation)
|12.79
|n/a
|10.78
|n/a
|4640
|n/a
|n/a
|n/a
|-
|TRACE32
|12.79
|0
|10.78
|0
|4640
|0 (41 source code lines in the mmc driver)
|372 bytes
|64 bytes
|-
|<code>ftrace</code>
|14.78
|15.6
|11.77
|9
|5476
|836
|1.99 MByte
|0.99 MByte + ?? MByte ring buffer (**)
|}
(*) <code>ftrace</code> instrumentation applies to the whole Linux kernel.
TRACE32 instrumentation applies to the functions <code>mmc_start_request()</code> and <code>mmc_request_done()</code> only.

(**) The actual size of the <code>ftrace</code> ring buffer can be configured at runtime but is typically in the 10–100 MByte range.

In the ftrace-based solution, an increase in kernel size of approximately 15% (code) and 9% (data) is observed compared to the kernel without ftrace. During the execution of ftrace it is also necessary to reserve additional memory for the ring buffer. The number of source files used in building the kernel increases by 18% when the ftrace framework is included. The weight of the instrumentation required by TRACE32, on the other hand, is practically negligible both in terms of code and data.

{| class="wikitable"
|+Table 2: Instrumentation overhead
! rowspan="3" |Measuring points (*)
! colspan="5" |Average duration [us]
|-
! rowspan="2" |No ftrace, no TRACE32 instr. (baseline)
! colspan="2" |No ftrace, with TRACE32 instr.
! colspan="2" |ftrace enabled, no TRACE32 instr.
|-
!Absolute
!Increment w.r.t. the baseline
!Absolute
!Increment w.r.t. the baseline
|-
|<code>mmc_start_request</code>
|6.950
|8.108
|1.158
|36.875
|29.925
|-
|<code>mmc_request_done</code>
|0.770
|1.364
|0.594
|63.031
|62.261
|}
(*) Measuring points are the parts of the functions where the instrumentation is added.

The analysis of the functions' average duration during eMMC accesses highlights the greater overhead imposed by <code>ftrace</code>. Additional, detailed charts are shown in the following section. They show that using <code>ftrace</code> also involves a greater dispersion of the execution times compared to both the kernel without <code>ftrace</code> and the kernel instrumented only with the code for TRACE32.
In particular, the functions <code>mmc_start_request()</code> and <code>mmc_request_done()</code> have an almost constant execution time of a few microseconds without <code>ftrace</code>, and show a highly variable execution time with <code>ftrace</code>, with maximum times of up to 279 us and 285 us respectively.

In conclusion, the hardware method based on TRACE32 provides the same log data as recorded by <code>ftrace</code> but with minimal changes to the kernel (a few lines in a single file) and a tiny time penalty. It also does not use any additional memory (in terms of RAM and file system storage) and allows for extremely long measurement times. The following table summarizes the advantages and disadvantages of the two considered solutions, TRACE32 vs <code>ftrace</code>.

{| class="wikitable"
|+Table 3: Pros and cons
!Method
!Pros
!Cons
|-
|TRACE32
|
* Light kernel instrumentation
* No additional memory required
* Long-term analysis (a few hours up to over 100 days)
* Can be ported to other OS's / eMMC device drivers
|
* Hardware-based solution: requires a debug and trace tool plus an off-chip-trace capable processor and target
|-
|<code>ftrace</code>
|
* Software-based solution
|
* Available for the Linux kernel only
* Heavy kernel instrumentation
* Time intrusion in eMMC operations
* Kernel program and data size increase
* 10–100 MB of RAM required for the ring buffer
* Additional storage device needed to save the ring buffer
* For each eMMC operation, <code>ftrace</code> saves roughly 876 bytes of log information
|}

===== Detailed time analysis =====

====== <code>mmc_start_request</code> ======
[[File:Lauterbach-eMMC-1-1.png|center|thumb|724x724px|No ftrace, no TRACE32 instrumentation]]
[[File:Lauterbach-eMMC-1-2.png|center|thumb|725x725px|No ftrace, with TRACE32 instrumentation]]
[[File:Lauterbach-eMMC-1-3.png|center|thumb|725x725px|With ftrace, no TRACE32 instrumentation]]

====== <code>mmc_request_done</code> ======
[[File:Lauterbach-eMMC-2-1.png|center|thumb|821x821px|No ftrace, no TRACE32 instrumentation]]
[[File:Lauterbach-eMMC-2-2.png|center|thumb|819x819px|No ftrace, with TRACE32 instrumentation]]
[[File:Lauterbach-eMMC-2-3.png|center|thumb|821x821px|With ftrace, no TRACE32 instrumentation]]

==== Analysis of the logs and conclusions ====
No matter how the accesses to the e.MMC are traced, once the logs are available they can be processed thoroughly to produce reports that are very useful to analyze how the host actually operates the device. The following are some such reports from a test conducted on an e.MMC partition (<code>mmcblk0p1</code>) formatted with the <code>ext4</code> file system:

<pre class="board-terminal">
root@imx8mqevk:~# mkfs.ext4 /dev/mmcblk0p1
</pre>

Please note that this formatting results in an <code>ext4</code> 4-kByte block size:

<pre class="board-terminal">
root@imx8mqevk:~# dumpe2fs -x /dev/mmcblk0p1
dumpe2fs 1.43.5 (04-Aug-2017)
Filesystem volume name:   <none>
...
Block size:               4096
...
</pre>

The [https://ext4.wiki.kernel.org/index.php/Ext4_Disk_Layout#Blocks ext4 block] '''must not be confused with the e.MMC blocks''', which are 512 bytes as per JEDEC specifications and are addressed according to the [https://en.wikipedia.org/wiki/Logical_block_addressing LBA] scheme.

The analyzed workload is the result of a combination of different tools performing read and write accesses (<code>/mnt/mmc0</code> is the mount point of the partition being tested):

<pre class="board-terminal">
root@imx8mqevk:~# stressapptest -s 20 -f /mnt/mmc0/file1 -f /mnt/mmc0/file2
root@imx8mqevk:~# find / -name "*" > /mnt/mmc0/find_results.txt
root@imx8mqevk:~# dd if=/dev/urandom of=/mnt/mmc0/dummyfile.bin bs=4k count=25000
root@imx8mqevk:~# rm /mnt/mmc0/dummyfile.bin
root@imx8mqevk:~# dd if=/dev/zero of=/mnt/mmc0/dummyfile.bin bs=4k count=25000
root@imx8mqevk:~# sync
</pre>

The following chart shows the e.MMC accesses over time during the execution of the workload, along with other measurements such as the read/write throughput.
[[File:MISC-TN-017-eMMC-chart1.png|center|thumb|800x800px|e.MMC accesses over time]]

It is also possible to extrapolate the latency of the operations.

[[File:MISC-TN-017-eMMC-chart3-latency.png|center|thumb|800x800px|Latency]]

Another extremely useful graphical depiction is the chunk size distribution. For instance, this information is often used to understand how efficient the user application is when it comes to optimizing the write operations for maximizing the e.MMC lifetime. The pie chart on the left refers to the read operations, while the other one refers to the write operations.

[[File:MISC-TN-017-eMMC-chart2-chunk-size.png|center|thumb|800x800px|Chunk size distribution]]

To interpret the result, one needs to take into account how the workload was implemented. In the example under discussion, the workload basically makes use of two applications: <code>[https://man7.org/linux/man-pages/man1/dd.1.html dd]</code> and <code>stressapptest</code>. <code>dd</code> was instructed to use 4-kByte data chunks (<code>bs=4k</code>). <code>stressapptest</code> uses 512-byte chunks instead, because the <code>--write-block-size</code> parameter was not used (for more details please refer to the [https://github.com/stressapptest/stressapptest/blob/e6c56d20c0fd16b07130d6e628d0dd6dcf1fe162/src/worker.cc#L2615 source code]). As a result, one would expect the majority of accesses to be 512 bytes and 4 kByte in size. The charts clearly show that this is not the case: most of the accesses are 512 kByte instead. This is a blatant example of how the algorithms of the file system and of the kernel block driver can alter the accesses issued at the application level for optimization purposes.
==== Appendix 1: source code example ====
<syntaxhighlight lang="C">
static struct T32_mmc_struct {
unsigned int enable;
unsigned int infoBit;
unsigned int dev;
unsigned int * pHost;
unsigned int cmd;
unsigned int arg;
unsigned int flags;
unsigned int blksz;
unsigned int blocks;
unsigned int err;
unsigned int resp0;
unsigned int resp1;
unsigned int resp2;
unsigned int resp3;
} T32_mmc;
 
int mmc_start_request(struct mmc_host *host, struct mmc_request *mrq)
{
	int err;

	mmc_retune_hold(host);

	if (mmc_card_removed(host->card))
		return -ENOMEDIUM;

	mmc_mrq_pr_debug(host, mrq, false);

	WARN_ON(!host->claimed);

	if (T32_mmc.enable) {
		T32_mmc.pHost = (unsigned int *)mmc_hostname(host);
		if ((*T32_mmc.pHost) == T32_mmc.dev) {
			if (mrq->cmd) {
				write_sysreg((*T32_mmc.pHost)|T32_mmc.infoBit,
					     contextidr_el1);
				isb();
				T32_mmc.cmd = (mrq->cmd->opcode)|T32_mmc.infoBit;
				write_sysreg(T32_mmc.cmd, contextidr_el1);
				isb();
				T32_mmc.arg = (mrq->cmd->arg)|T32_mmc.infoBit;
				write_sysreg(T32_mmc.arg, contextidr_el1);
				isb();
				T32_mmc.flags = (mrq->cmd->flags)|T32_mmc.infoBit;
				write_sysreg(T32_mmc.flags, contextidr_el1);
				isb();
			}

			if (mrq->data) {
				T32_mmc.blksz = (mrq->data->blksz)|T32_mmc.infoBit;
				write_sysreg(T32_mmc.blksz, contextidr_el1);
				isb();
				T32_mmc.blocks = (mrq->data->blocks)|T32_mmc.infoBit;
				write_sysreg(T32_mmc.blocks, contextidr_el1);
				isb();
			}
		}
	}

	err = mmc_mrq_prep(host, mrq);
	if (err)
		return err;
	...
}

void mmc_request_done(struct mmc_host *host, struct mmc_request *mrq)
{
	struct mmc_command *cmd = mrq->cmd;
	int err = cmd->error;
	...
	if (!err || !cmd->retries || mmc_card_removed(host->card)) {
		mmc_should_fail_request(host, mrq);

		if (!host->ongoing_mrq)
			led_trigger_event(host->led, LED_OFF);

		if (mrq->sbc) {
			pr_debug("%s: req done <CMD%u>: %d: %08x %08x %08x %08x\n",
				 mmc_hostname(host), mrq->sbc->opcode,
				 mrq->sbc->error,
				 mrq->sbc->resp[0], mrq->sbc->resp[1],
				 mrq->sbc->resp[2], mrq->sbc->resp[3]);
		}

		pr_debug("%s: req done (CMD%u): %d: %08x %08x %08x %08x\n",
			 mmc_hostname(host), cmd->opcode, err,
			 cmd->resp[0], cmd->resp[1],
			 cmd->resp[2], cmd->resp[3]);

		if (mrq->data) {
			pr_debug("%s: %d bytes transferred: %d\n",
				 mmc_hostname(host),
				 mrq->data->bytes_xfered, mrq->data->error);
		}

		if (mrq->stop) {
			pr_debug("%s: (CMD%u): %d: %08x %08x %08x %08x\n",
				 mmc_hostname(host), mrq->stop->opcode,
				 mrq->stop->error,
				 mrq->stop->resp[0], mrq->stop->resp[1],
				 mrq->stop->resp[2], mrq->stop->resp[3]);
		}

		if (T32_mmc.enable) {
			T32_mmc.pHost = (unsigned int *)mmc_hostname(host);
			if ((*T32_mmc.pHost) == T32_mmc.dev) {
				write_sysreg((*T32_mmc.pHost)|T32_mmc.infoBit,
					     contextidr_el1);
				isb();
				T32_mmc.cmd = (cmd->opcode)|T32_mmc.infoBit;
				write_sysreg(T32_mmc.cmd, contextidr_el1);
				isb();
				T32_mmc.err = (err)|T32_mmc.infoBit;
				write_sysreg(T32_mmc.err, contextidr_el1);
				isb();
				T32_mmc.resp0 = (cmd->resp[0])|T32_mmc.infoBit;
				write_sysreg(T32_mmc.resp0, contextidr_el1);
				isb();
			}
		}
	}

	/*
	 * Request starter must handle retries - see
	 * mmc_wait_for_req_done().
	 */
	if (mrq->done)
		mrq->done(mrq);
}
</syntaxhighlight>

==== Appendix 2: Video ====
Technical Note presentation by Lauterbach (Language: Italian; Subtitles: English and Italian)

{{#ev:youtube|YDWAGy2QnA0|600|center|Persistent storage and read-write file systems|frame}}

===Device's built-in advanced functionalities===
e.MMC's feature advanced functionalities that are useful for monitoring wear-out and, in general, the health of the device. For more details, please see [[#Example: embedded Linux system equipped with an e.MMC|this section]].
=Power failures=
* [http://www.linux-mtd.infradead.org/doc/ubifs.html#L_writeback notes about UBIFS writeback support]
* [http://www.linux-mtd.infradead.org/doc/ubifs.html#L_writebuffer UBIFS write-buffer].
 
== Example: embedded Linux system equipped with SanDisk SDINBDG4-8G-XI1 eMMC and <code>ext4</code> file system ==
TBD
= Memory health monitoring =
== Example: embedded Linux system equipped with a raw NAND flash memory and UBIFS file system ==
There are two main indicators of NAND device health:
* current ECC corrected errors
* block erase counter.
As a confirmation of this data, the maximum erase counter (EC) of a given UBI partition can be read directly from <code>sysfs</code>:<syntaxhighlight lang="text">
root@axel:~# cat /sys/class/ubi/ubi0/max_ec
2
</syntaxhighlight>
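The two indicators above can be turned into a rough wear estimate with a few lines of shell. This is only a sketch: the erase counter is hard-coded with the value read above so the logic can run anywhere, and the endurance figure is a typical SLC NAND order of magnitude, not a datasheet value.

```shell
#!/bin/sh
# Sketch: estimate worst-case block wear from the UBI maximum erase counter.
# On a real target the value would come from sysfs:
#   max_ec=$(cat /sys/class/ubi/ubi0/max_ec)
# The current number of corrected ECC errors is available as well, e.g.:
#   cat /sys/class/mtd/mtd0/ecc_stats/corrected
max_ec=2

# Typical SLC NAND endurance is on the order of 100000 erase cycles per
# block; the exact figure is vendor-specific (see the datasheet).
endurance=100000

used=$(( max_ec * 100 / endurance ))
echo "worst-case block wear: ${used}% (max EC ${max_ec} / ${endurance} cycles)"
if [ "$max_ec" -ge $(( endurance * 9 / 10 )) ]; then
    echo "WARNING: flash close to its rated endurance"
fi
```

With the values measured here (<code>max_ec</code> = 2), the wear is negligible, as expected for a fresh device.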
== Example: embedded Linux system equipped with an e.MMC ==
As explained in [[#Embedded Linux systems with eMMC or SD cards|this section]], e.MMC's provide specific functionalities for monitoring the device's health. In practice, these components expose some registers that make health-related information available to the host.

Following is a dump of such registers regarding the wear-out status of the device, namely <code>DEVICE_LIFE_TIME_EST_TYP_A</code>, <code>DEVICE_LIFE_TIME_EST_TYP_B</code>, and <code>PRE_EOL_INFO</code>:
<pre class="board-terminal">
root@desk-mx8mp:~# mmc extcsd read /dev/mmcblk2 | grep LIFE
eMMC Life Time Estimation A [EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_A]: 0x01
eMMC Life Time Estimation B [EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_B]: 0x01
root@desk-mx8mp:~# mmc extcsd read /dev/mmcblk2 | grep EOL
eMMC Pre EOL information [EXT_CSD_PRE_EOL_INFO]: 0x01
</pre>
This dump refers to the same testbed described here. Some manufacturers also use additional proprietary registers to report the amount of data actually written onto the device. If available, this number allows calculating the Write Amplification Factor (WAF), given that the amount of data written by the applications of the test workload is known too.

The health status registers can be exploited to implement a monitoring mechanism as well. For example, a user-space application can periodically poll the status of the device and take actions accordingly if the wear-out exceeds predefined thresholds.

Last but not least, it is worth remembering that advanced proprietary off-line tools may also be available for health monitoring. For instance, Western Digital provides such tools for its devices. For more information, please contact our [mailto:sales@dave.eu Sales Department].
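Such a monitoring check can be sketched in a few lines of shell. Here the <code>mmc extcsd</code> report is hard-coded with the dump shown above so the parsing logic can run anywhere; on a real target it would come from <code>mmc extcsd read /dev/mmcblk2</code>.

```shell
#!/bin/sh
# Sketch of a user-space e.MMC wear-out check (sample report hard-coded).
EXTCSD='eMMC Life Time Estimation A [EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_A]: 0x01
eMMC Life Time Estimation B [EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_B]: 0x01
eMMC Pre EOL information [EXT_CSD_PRE_EOL_INFO]: 0x01'

# Per JESD84-B51, each life-time estimation step is 10% of the estimated
# device life: 0x01 = 0-10% used, 0x02 = 10-20% used, and so on.
life_a=$(printf '%s\n' "$EXTCSD" | sed -n 's/.*TYP_A\]: //p')
pct=$(( life_a * 10 ))                  # upper bound of life used, in %

# PRE_EOL_INFO: 0x01 = normal, 0x02 = warning, 0x03 = urgent
eol=$(printf '%s\n' "$EXTCSD" | sed -n 's/.*PRE_EOL_INFO\]: //p')

echo "life used (type A, upper bound): ${pct}%"
if [ $(( eol )) -ge 2 ]; then
    echo "WARNING: device approaching end of life"
fi
```

A real implementation would run such a check periodically (e.g. from a cron job or a systemd timer) and raise an alarm when the thresholds are exceeded.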
=References=
* Western Digital Corporation, ''[https://link.westerndigital.com/content/dam/customer-portal/en_us/external/public/cps/p/White_Paper_Design_Considerations_v1.0.pdf Design Considerations for Embedded Products]'', 2018
* Western Digital Corporation, ''[https://documents.westerndigital.com/content/dam/doc-library/en_us/assets/public/western-digital/collateral/white-paper/white-paper-automotive-workload-analysis.pdf Automotive Workload Analysis]'', September 2021

=Credits=
==Lauterbach Italian branch office==
Lauterbach SRL<br>
Via Caldera 21<br>
20153 Milan (Italy)

Tel. +39 02 45490282<br>
email [mailto:info_it@lauterbach.it info_it@lauterbach.it]<br>
Web [https://www.lauterbach.com www.lauterbach.it]