MISC-TN-008: Running Debian Buster (armbian) on Mito8M

From DAVE Developer's Wiki

Revision as of 11:05, 15 January 2020

Info Box
Applies to MITO 8M
Warning: This technical note was validated against specific versions of hardware and software. What is described here may not work with other versions.


History

Version Date Notes
1.0.0 January 2020 First public release

Introduction

Mito8M is the first DAVE Embedded Systems' product based on a core implementing the ARMv8-A architecture. Traditionally, ARM cores based on the 32-bit ARMv7-A architecture exhibit limited RAM bandwidth, even when they are coupled with 64-bit-wide SDRAM banks. When dealing with computationally heavy tasks, this factor may turn out to be a severe bottleneck that limits overall performance.

Besides an intrinsic increase in computational power, ARMv8-A-based SoCs are expected to significantly improve RAM bandwidth as well. This technical note (TN for short) illustrates several benchmarking tests that were run on the Mito8M SoM, which is built upon the NXP i.MX8M Quad.

Testbed general configuration

This section illustrates the configuration settings common to all the tests performed.

SoC and SDRAM bank organization

                                       Mito8M
SoC     SoC                            NXP i.MX8M Quad
        ARM frequency [MHz]            800
SDRAM   Type                           LPDDR4
        Frequency [MHz]                1600
        Bus width [bit]                32
        Theoretical bandwidth [Gb/s]   102.4
        Size [MB]                      3072
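
The theoretical bandwidth figure can be cross-checked against the other SDRAM rows. The following is a minimal sketch, assuming (the table does not state this explicitly) that the 1600 MHz value is the LPDDR4 clock and that data is transferred on both clock edges. The function name is illustrative only.

```python
def theoretical_bandwidth_gbit_s(clock_mhz: float, bus_width_bits: int,
                                 transfers_per_clock: int = 2) -> float:
    """Peak bandwidth in Gb/s (10^9 bit/s) for a DDR-style memory bus."""
    # Assumption: double data rate, i.e. two transfers per clock cycle.
    mega_transfers_per_s = clock_mhz * transfers_per_clock
    # MT/s * bits per transfer = Mb/s; divide by 1000 to get Gb/s.
    return mega_transfers_per_s * bus_width_bits / 1000.0


peak_gbit_s = theoretical_bandwidth_gbit_s(1600, 32)
peak_gbyte_s = peak_gbit_s / 8  # 8 bits per byte
print(f"{peak_gbit_s:.1f} Gb/s = {peak_gbyte_s:.1f} GB/s")
# prints "102.4 Gb/s = 12.8 GB/s"
```

Under these assumptions the result matches the 102.4 Gb/s listed in the table, which corresponds to 12.8 GB/s when expressed in bytes.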

Software configuration

  • Linux kernel: 4.14.98
  • Architecture: aarch64
  • Governor: userspace @ 800 MHz
root@Mito8M:~# echo userspace > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
root@Mito8M:~# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
userspace
root@Mito8M:~# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq
800000
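
The commands above inspect cpu0 only. As a hedged sketch, the same sysfs files can be read for every core to confirm that the governor and frequency are uniform before a benchmark run. The helper names `read_cpufreq` and `all_cores_pinned` are hypothetical and not part of any tool used in this note; the code assumes each core exposes a `cpufreq` directory with the `scaling_governor` and `scaling_cur_freq` files shown above.

```python
from pathlib import Path

# Standard sysfs location, as used in the shell commands above.
CPU_ROOT = Path("/sys/devices/system/cpu")


def read_cpufreq(cpu_root: Path = CPU_ROOT) -> dict:
    """Return {cpu_name: (governor, current_frequency_mhz)} for each core exposing cpufreq."""
    settings = {}
    for cpufreq_dir in sorted(cpu_root.glob("cpu[0-9]*/cpufreq")):
        governor = (cpufreq_dir / "scaling_governor").read_text().strip()
        # scaling_cur_freq is reported in kHz.
        cur_khz = int((cpufreq_dir / "scaling_cur_freq").read_text())
        settings[cpufreq_dir.parent.name] = (governor, cur_khz / 1000)
    return settings


def all_cores_pinned(settings: dict, governor: str = "userspace",
                     mhz: float = 800.0) -> bool:
    """True if at least one core was found and every core matches the expected setup."""
    return bool(settings) and all(v == (governor, mhz) for v in settings.values())
```

Calling `all_cores_pinned(read_cpufreq())` on the target would be expected to return `True` for the configuration described in this section, provided all cores share the settings shown for cpu0.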

GCC

armbian@Mito8M:~/devel/lmbench$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/aarch64-linux-gnu/8/lto-wrapper
Target: aarch64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Debian 8.3.0-6' --with-bugurl=file:///usr/share/doc/gcc-8/README.Bugs --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++ --prefix=/usr --with-gcc-major-version-only --program-suffix=-8 --program-prefix=aarch64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-bootstrap --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-libquadmath --disable-libquadmath-support --enable-plugin --enable-default-pie --with-system-zlib --disable-libphobos --enable-multiarch --enable-fix-cortex-a53-843419 --disable-werror --enable-checking=release --build=aarch64-linux-gnu --host=aarch64-linux-gnu --target=aarch64-linux-gnu
Thread model: posix
gcc version 8.3.0 (Debian 8.3.0-6)


Results

This section details the results achieved by the different benchmarks.

General configuration

Testbed #1

                              Mito8M
ARM frequency [MHz]           792
SDRAM frequency [MHz]         1600
SDRAM bus width [bit]         32

Detailed testing procedures

This section details how the benchmarks were configured and run on the testbed.

STREAM

Building

git clone https://github.com/jeffhammond/STREAM.git
cd STREAM
make
armbian@Mito8M:~/devel/STREAM$ cat Makefile
CC = gcc
CFLAGS = -O2 -fopenmp

FC = gfortran-4.9
FFLAGS = -O2 -fopenmp

all: stream_c.exe

stream_f.exe: stream.f mysecond.o
	$(CC) $(CFLAGS) -c mysecond.c
	$(FC) $(FFLAGS) -c stream.f
	$(FC) $(FFLAGS) stream.o mysecond.o -o stream_f.exe

stream_c.exe: stream.c
	$(CC) $(CFLAGS) stream.c -o stream_c.exe

clean:
	rm -f stream_f.exe stream_c.exe *.o

# an example of a more complex build line for the Intel icc compiler
stream.icc: stream.c
	icc -O3 -xCORE-AVX2 -ffreestanding -qopenmp -DSTREAM_ARRAY_SIZE=80000000 -DNTIMES=20 stream.c -o stream.omp.AVX2.80M.20x.icc

Running

armbian@Mito8M:~/devel/STREAM$ ./stream_c.exe 
-------------------------------------------------------------
STREAM version $Revision: 5.10 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------
Array size = 10000000 (elements), Offset = 0 (elements)
Memory per array = 76.3 MiB (= 0.1 GiB).
Total memory required = 228.9 MiB (= 0.2 GiB).
Each kernel will be executed 10 times.
 The *best* time for each kernel (excluding the first iteration)
 will be used to compute the reported bandwidth.
-------------------------------------------------------------
Number of Threads requested = 4
Number of Threads counted = 4
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 46427 microseconds.
   (= 46427 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:            6770.5     0.024010     0.023632     0.025117
Scale:           6093.2     0.027474     0.026259     0.029142
Add:             5263.5     0.046008     0.045597     0.046230
Triad:           4820.0     0.050297     0.049793     0.050723
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------
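
The "Best Rate" column can be reproduced from the "Min time" column, which helps when comparing the measured figures with the theoretical peak. The sketch below is illustrative only. It relies on STREAM's byte-counting convention (Copy and Scale touch two arrays per iteration, Add and Triad touch three, and 1 MB is 10^6 bytes), and it assumes the 102.4 Gb/s theoretical bandwidth from the testbed table as the reference peak. The constant and function names are made up for this example.

```python
ARRAY_ELEMENTS = 10_000_000   # "Array size" reported by the run above
BYTES_PER_ELEMENT = 8         # "This system uses 8 bytes per array element"

# Arrays read or written per iteration by each kernel, as counted by STREAM.
ARRAYS_TOUCHED = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}

# "Min time" column from the run above, in seconds.
MIN_TIME_S = {"Copy": 0.023632, "Scale": 0.026259,
              "Add": 0.045597, "Triad": 0.049793}

# Assumption: theoretical peak of 102.4 Gb/s, converted to MB/s (10^6 bytes/s).
PEAK_MB_S = 102.4e9 / 8 / 1e6


def best_rate_mb_s(kernel: str) -> float:
    """Bytes moved per iteration divided by the best (minimum) time, in MB/s."""
    bytes_moved = ARRAYS_TOUCHED[kernel] * BYTES_PER_ELEMENT * ARRAY_ELEMENTS
    return bytes_moved / MIN_TIME_S[kernel] / 1e6


for kernel in MIN_TIME_S:
    rate = best_rate_mb_s(kernel)
    print(f"{kernel:6s} {rate:8.1f} MB/s  {100 * rate / PEAK_MB_S:5.1f}% of peak")
```

Because the printed times are rounded to the microsecond, the recomputed rates agree with the reported ones only to within rounding. Against the assumed peak, Copy reaches roughly 53% and Triad roughly 38% of the theoretical bandwidth.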

Useful links