Difference between revisions of "MISC-TN-009: Characterizing the RAM bandwidth of Mito8M SoM"

From DAVE Developer's Wiki
Jump to: navigation, search
(Created page with "{{InfoBoxTop}} {{AppliesToMito8M}} {{InfoBoxBottom}} {{WarningMessage|text=This technical note was validated against specific versions of hardware and software. What is descri...")
(No difference)

Revision as of 11:07, 15 January 2020

Info Box
TBD.png Applies to Mito8M
Warning-icon.png This technical note was validated against specific versions of hardware and software. What is described here may not work with other versions. Warning-icon.png

History[edit | edit source]

Version Date Notes
1.0.0 January 2020 First public release

Introduction[edit | edit source]

Mito8M is the first DAVE Embedded Systems' product based on a core implementing the ARMv8-A architecture. Traditionally, ARM cores that are based on 32-bit ARMv7-A architecture exhibit a limited RAM bandwidth even if they are coupled with 64-bit witdh SDRAM banks. When dealing with computationally heavy tasks, this factor may turn out to be a severe bottleneck limiting the overall performance.

Beside an intrinsic increased computational power, ARMv8-A-based SoC's are expected to improve significantly RAM bandwidth as well. This technical note (TN for short) illustrates several benchmarking tests that were run on Mito8M SoM, which is built upon NXP i.MX8M Quad.

Testbed general configuration[edit | edit source]

This section illustrates the configuration settings common to all the tests performed.

SoC and SDRAM bank organization[edit | edit source]

SoC SoC NXP i.MX8M Quad
ARM frequency




Bus witdth


Theoretical bandiwidth





Software configuration[edit | edit source]

  • Linux kernel: 4.14.98
  • Architecture: aarch64
  • Governor: userspace @ 800 MHz
root@Mito8M:~# echo userspace > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
root@Mito8M:~# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
root@Mito8M:~# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq


armbian@Mito8M:~/devel/lmbench$ gcc -v
Using built-in specs.
Target: aarch64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Debian 8.3.0-6' --with-bugurl=file:///usr/share/doc/gcc-8/README.Bugs --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++ --prefix=/usr --with-gcc-major-version-only --program-suffix=-8 --program-prefix=aarch64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-bootstrap --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-libquadmath --disable-libquadmath-support --enable-plugin --enable-default-pie --with-system-zlib --disable-libphobos --enable-multiarch --enable-fix-cortex-a53-843419 --disable-werror --enable-checking=release --build=aarch64-linux-gnu --host=aarch64-linux-gnu --target=aarch64-linux-gnu
Thread model: posix
gcc version 8.3.0 (Debian 8.3.0-6)

Results[edit | edit source]

This section details the results that were achieved by the different benchmarks

General configuration[edit | edit source]

Testbed #1[edit | edit source]

ARM frequency




Bus witdth



Detailed testing procedures[edit | edit source]

This sections details how the benchmarks were configured and run on the testbed.

STREAM[edit | edit source]

Building[edit | edit source]

git clone https://github.com/jeffhammond/STREAM.git
 1 armbian@Mito8M:~/devel/STREAM$ cat Makefile 
 2 CC = gcc
 3 CFLAGS = -O2 -fopenmp
 5 FC = gfortran-4.9
 6 FFLAGS = -O2 -fopenmp
 8 all: stream_c.exe
10 stream_f.exe: stream.f mysecond.o
11         $(CC) $(CFLAGS) -c mysecond.c
12         $(FC) $(FFLAGS) -c stream.f
13         $(FC) $(FFLAGS) stream.o mysecond.o -o stream_f.exe
15 stream_c.exe: stream.c
16         $(CC) $(CFLAGS) stream.c -o stream_c.exe
18 clean:
19         rm -f stream_f.exe stream_c.exe *.o
21 # an example of a more complex build line for the Intel icc compiler
22 stream.icc: stream.c
23         icc -O3 -xCORE-AVX2 -ffreestanding -qopenmp -DSTREAM_ARRAY_SIZE=80000000 -DNTIMES=20 stream.c -o stream.omp.AVX2.80M.20x.icc

Running[edit | edit source]

armbian@Mito8M:~/devel/STREAM$ ./stream_c.exe 
STREAM version $Revision: 5.10 $
This system uses 8 bytes per array element.
Array size = 10000000 (elements), Offset = 0 (elements)
Memory per array = 76.3 MiB (= 0.1 GiB).
Total memory required = 228.9 MiB (= 0.2 GiB).
Each kernel will be executed 10 times.
 The *best* time for each kernel (excluding the first iteration)
 will be used to compute the reported bandwidth.
Number of Threads requested = 4
Number of Threads counted = 4
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 46427 microseconds.
   (= 46427 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:            6770.5     0.024010     0.023632     0.025117
Scale:           6093.2     0.027474     0.026259     0.029142
Add:             5263.5     0.046008     0.045597     0.046230
Triad:           4820.0     0.050297     0.049793     0.050723
Solution Validates: avg error less than 1.000000e-13 on all three arrays

Useful links[edit | edit source]