MISC-TN-008: Running Debian Buster (armbian) on Mito8M

5,765 bytes removed, 14:34, 20 January 2020
Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid
</pre>
 
==Testbed general configuration==
This section illustrates the configuration settings common to all the tests performed.
 
====SoC and SDRAM bank organization====
{| class="wikitable"
|+
!
!
!Mito8M
!
|-
| rowspan="2" |SoC
|SoC
|NXP i.MX8M Quad
|
|-
|ARM frequency
[MHz]
|800
|
|-
| rowspan="5" |SDRAM
|Type
|LPDDR4
|
|-
|Frequency
[MHz]
|1600
|
|-
|Bus width
[bit]
|32
|
|-
|Theoretical bandwidth
[Gb/s]
|102.4
|
|-
|Size
[MB]
|3072
|
|}
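
The theoretical bandwidth in the table can be cross-checked from the SDRAM frequency and bus width. The following is a minimal sketch, assuming that LPDDR4 performs two transfers per clock cycle (double data rate); the function name <code>peak_mbit_per_s</code> is purely illustrative.

```shell
# Peak SDRAM bandwidth in Mb/s from clock frequency (MHz) and bus width (bit).
# Assumption: two transfers per clock cycle (double data rate).
peak_mbit_per_s() {
    echo $(( $1 * 2 * $2 ))
}

peak_mbit=$(peak_mbit_per_s 1600 32)
echo "Peak: ${peak_mbit} Mb/s"           # 102400 Mb/s, i.e. 102.4 Gb/s
echo "Peak: $(( peak_mbit / 8 )) MB/s"   # 12800 MB/s, i.e. 12.8 GB/s
```

The value in MB/s is the one to compare against figures reported in bytes per second, such as the STREAM rates.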
 
====Software configuration====
 
* Linux kernel: 4.14.98
* Architecture: aarch64
* Governor: userspace @ 800 MHz
<pre class="board-terminal">
root@Mito8M:~# echo userspace > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
root@Mito8M:~# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
userspace
root@Mito8M:~# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq
800000
</pre>
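
The commands above only address <code>cpu0</code>. Whether the other cores share the same cpufreq policy depends on the kernel configuration, so it may be worth checking all of them before a run. The sketch below is one possible way to do that; the function name <code>show_cpufreq</code> and its optional argument (an alternative sysfs root, handy for dry runs) are assumptions, not part of the original procedure.

```shell
# Print "<cpu> <governor> <current frequency in kHz>" for every CPU
# that exposes a cpufreq directory.
# The optional first argument overrides the sysfs root (hypothetical helper).
show_cpufreq() {
    root=${1:-/sys/devices/system/cpu}
    for dir in "$root"/cpu[0-9]*/cpufreq; do
        [ -d "$dir" ] || continue          # skip when the glob matches nothing
        cpu=${dir%/cpufreq}
        echo "${cpu##*/} $(cat "$dir/scaling_governor") $(cat "$dir/scaling_cur_freq")"
    done
}

show_cpufreq
```

With the userspace governor, the target frequency is normally requested by writing a value in kHz to <code>scaling_setspeed</code> in the same directory; the transcript above does not show that step, so treat it as an assumption about how 800 MHz was selected.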
 
* Compiler: GCC 8.3.0
<pre class="board-terminal">
armbian@Mito8M:~/devel/lmbench$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/aarch64-linux-gnu/8/lto-wrapper
Target: aarch64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Debian 8.3.0-6' --with-bugurl=file:///usr/share/doc/gcc-8/README.Bugs --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++ --prefix=/usr --with-gcc-major-version-only --program-suffix=-8 --program-prefix=aarch64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-bootstrap --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-libquadmath --disable-libquadmath-support --enable-plugin --enable-default-pie --with-system-zlib --disable-libphobos --enable-multiarch --enable-fix-cortex-a53-843419 --disable-werror --enable-checking=release --build=aarch64-linux-gnu --host=aarch64-linux-gnu --target=aarch64-linux-gnu
Thread model: posix
gcc version 8.3.0 (Debian 8.3.0-6)
</pre>
 
 
==Results==
This section details the results achieved by the different benchmarks.
 
===General configuration===
 
===Testbed #1===
 
{| class="wikitable"
|+
!
!
!Mito8M
!
|-
|
|ARM frequency
[MHz]
|792
|
|-
|
|Frequency
[MHz]
|1600
|
|-
|
|Bus width
[bit]
|32
|
|}
 
==Detailed testing procedures==
This section details how the benchmarks were configured and run on the testbed.
===STREAM===
 
====Building====
<pre class="board-terminal">
git clone https://github.com/jeffhammond/STREAM.git
cd STREAM
make
</pre>
 
<syntaxhighlight lang="makefile" line="line">
armbian@Mito8M:~/devel/STREAM$ cat Makefile
CC = gcc
CFLAGS = -O2 -fopenmp
 
FC = gfortran-4.9
FFLAGS = -O2 -fopenmp
 
all: stream_c.exe
 
stream_f.exe: stream.f mysecond.o
	$(CC) $(CFLAGS) -c mysecond.c
	$(FC) $(FFLAGS) -c stream.f
	$(FC) $(FFLAGS) stream.o mysecond.o -o stream_f.exe

stream_c.exe: stream.c
	$(CC) $(CFLAGS) stream.c -o stream_c.exe

clean:
	rm -f stream_f.exe stream_c.exe *.o

# an example of a more complex build line for the Intel icc compiler
stream.icc: stream.c
	icc -O3 -xCORE-AVX2 -ffreestanding -qopenmp -DSTREAM_ARRAY_SIZE=80000000 -DNTIMES=20 stream.c -o stream.omp.AVX2.80M.20x.icc
</syntaxhighlight>
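
The array size is set at compile time through <code>-DSTREAM_ARRAY_SIZE</code>, as in the icc example line of the Makefile. When choosing a different size, the resulting memory footprint can be estimated beforehand. The sketch below assumes 8-byte elements and three arrays, which matches what STREAM prints at startup; the function name <code>stream_footprint_mib</code> is illustrative.

```shell
# Memory per array and total memory (MiB) for a given STREAM_ARRAY_SIZE.
# Assumptions: 8 bytes per element, three arrays.
stream_footprint_mib() {
    awk -v n="$1" 'BEGIN {
        per = n * 8 / 1048576
        printf "%.1f %.1f\n", per, 3 * per
    }'
}

stream_footprint_mib 10000000   # prints "76.3 228.9"
```

These figures agree with the "Memory per array" and "Total memory required" lines printed by the run in the Running section.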
 
====Running====
<pre class="board-terminal">
armbian@Mito8M:~/devel/STREAM$ ./stream_c.exe
-------------------------------------------------------------
STREAM version $Revision: 5.10 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------
Array size = 10000000 (elements), Offset = 0 (elements)
Memory per array = 76.3 MiB (= 0.1 GiB).
Total memory required = 228.9 MiB (= 0.2 GiB).
Each kernel will be executed 10 times.
The *best* time for each kernel (excluding the first iteration)
will be used to compute the reported bandwidth.
-------------------------------------------------------------
Number of Threads requested = 4
Number of Threads counted = 4
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 46427 microseconds.
(= 46427 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function Best Rate MB/s Avg time Min time Max time
Copy: 6770.5 0.024010 0.023632 0.025117
Scale: 6093.2 0.027474 0.026259 0.029142
Add: 5263.5 0.046008 0.045597 0.046230
Triad: 4820.0 0.050297 0.049793 0.050723
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------
</pre>
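
To put these rates in context, they can be expressed as a fraction of the theoretical SDRAM bandwidth. The sketch below assumes a peak of 12800 MB/s, that is the 102.4 Gb/s listed in the testbed table divided by 8; the function name <code>stream_efficiency</code> is illustrative and the input lines are copied from the run above.

```shell
# Read STREAM output on stdin and print each kernel's best rate
# as a percentage of the peak bandwidth (MB/s) given as first argument.
stream_efficiency() {
    awk -v peak="$1" '/^(Copy|Scale|Add|Triad):/ {
        printf "%s %.1f%%\n", $1, 100 * $2 / peak
    }'
}

stream_efficiency 12800 <<'EOF'
Copy: 6770.5 0.024010 0.023632 0.025117
Scale: 6093.2 0.027474 0.026259 0.029142
Add: 5263.5 0.046008 0.045597 0.046230
Triad: 4820.0 0.050297 0.049793 0.050723
EOF
```

On the board, the same function can be fed directly with <code>./stream_c.exe | stream_efficiency 12800</code>. For the run above it reports roughly 53% for Copy and 38% for Triad.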
 
==Useful links==
*[https://www.cs.virginia.edu/stream/ STREAM benchmark]
*[http://lmbench.sourceforge.net/ LM Bench benchmark]
*[https://panthema.net/2013/pmbw/ pmbw benchmark]
*Joshua Wyatt Smith and Andrew Hamilton, [http://inspirehep.net/record/1424637/files/1719033_626-630.pdf Parallel benchmarks for ARM processors in the high energy context]
*T Wrigley, G Harmsen and B Mellado, [http://inspirehep.net/record/1424631/files/1719033_275-280.pdf Memory performance of ARM processors and its relevance to High Energy Physics]
*G. T. Wrigley, R. G. Reed, B. Mellado, [http://inspirehep.net/record/1424637/files/1719033_626-630.pdf Memory benchmarking characterisation of ARM-based SoCs]