Difference between revisions of "SBCX-TN-006: Characterizing the RAM bandwidth of Axel Lite SoM"

From DAVE Developer's Wiki
(Running the tests)
(39 intermediate revisions by one other user not shown)
Line 1: Line 1:
 
{{InfoBoxTop}}
 
{{InfoBoxTop}}
{{AppliesToSBCX}}
 
 
{{AppliesToAxel}}
 
{{AppliesToAxel}}
 +
{{AppliesToAxelEsatta}}
 
{{AppliesToAxelLite}}
 
{{AppliesToAxelLite}}
{{AppliesToAxelEsatta}}
+
{{AppliesToAXEL Lite TN}}
 +
{{AppliesToSBCX}}
 
{{InfoBoxBottom}}
 
{{InfoBoxBottom}}
 
{{WarningMessage|text=This technical note was validated against specific versions of hardware and software. What is described here may not work with other versions.}}
 
{{WarningMessage|text=This technical note was validated against specific versions of hardware and software. What is described here may not work with other versions.}}
Line 19: Line 20:
 
|}
 
|}
 
==Introduction==
 
==Introduction==
Mito8M is the first DAVE Embedded Systems' system-on-module (SoM) based on a core implementing the [https://en.wikipedia.org/wiki/ARM_architecture#64/32-bit_architecture ARMv8-A] architecture. Traditionally, ARM cores based on the 32-bit [https://en.wikipedia.org/wiki/ARM_architecture#AArch32 ARMv7-A] architecture exhibit a limited RAM bandwidth even when they are coupled with 64-bit-wide SDRAM banks. When dealing with computationally heavy tasks, this factor may turn out to be a severe bottleneck bounding the overall performance.
+
When dealing with computationally heavy tasks, the RAM bandwidth may be a severe bottleneck bounding the overall performance. This is especially true for the SoCs used in embedded systems. For this reason, characterizing the RAM bandwidth is useful when dealing with such demanding applications.
  
Besides an intrinsically higher computational power than their predecessors, ARMv8-A-based SoCs are also expected to improve RAM bandwidth significantly. This technical note (TN for short) illustrates several benchmarking tests that were run on the Mito8M SoM to characterize this bandwidth. It is worth remembering that Mito8M is built upon the [https://www.nxp.com/products/processors-and-microcontrollers/arm-processors/i.mx-applications-processors/i.mx-8-processors/i.mx-8m-family-armcortex-a53-cortex-m4-audio-voice-video:i.MX8M i.MX8M processor by NXP].
+
This technical note (TN for short) illustrates several benchmarking tests that were run on the Axel Lite SoM to characterize its RAM bandwidth. This SoM is built upon the [https://www.nxp.com/products/processors-and-microcontrollers/arm-processors/i.mx-applications-processors/i.mx-6-processors/i.mx-6quad-processors-high-performance-3d-graphics-hd-video-arm-cortex-a9-core:i.MX6Q i.MX6Q/D/DL/S family of processors by NXP].
  
 
==Testbed general configuration==
 
==Testbed general configuration==
This section illustrates the configuration settings common to all the tests performed. The testbed is the same as the one described in [[MISC-TN-008:_Running_Debian_Buster_(armbian)_on_Mito8M|this TN]].
+
This section illustrates the configuration settings common to all the tests that were performed. The testbed is the same as the one described in [[SBCX-TN-004:_Running_Armbian_Buster_(Debian_10)|this TN]]. It therefore consists of an Axel Lite SoM and an [[:Category:SBC-AXEL|SBCX carrier board]].
  
 
===SoC and SDRAM bank===
 
===SoC and SDRAM bank===
The SoC model is i.MX8M Quad:
+
The SoC model is i.MX6Q:
 
<pre class="board-terminal">
 
<pre class="board-terminal">
armbian@Mito8M:~/devel/lmbench/tmp$ lscpu
+
armbian@sbcx:~/devel/stream/lmbench$ lscpu  
Architecture:        aarch64
+
Architecture:        armv7l
 
Byte Order:          Little Endian
 
Byte Order:          Little Endian
 
CPU(s):              4
 
CPU(s):              4
Line 37: Line 38:
 
Core(s) per socket:  4
 
Core(s) per socket:  4
 
Socket(s):          1
 
Socket(s):          1
NUMA node(s):        1
 
 
Vendor ID:          ARM
 
Vendor ID:          ARM
Model:              4
+
Model:              10
Model name:          Cortex-A53
+
Model name:          Cortex-A9
Stepping:            r0p4
+
Stepping:            r2p10
CPU max MHz:        1300.0000
+
CPU max MHz:        996.0000
CPU min MHz:        800.0000
+
CPU min MHz:        396.0000
BogoMIPS:            16.66
+
BogoMIPS:            7.54
L1d cache:          unknown size
+
Flags:              half thumb fastmult vfp edsp neon vfpv3 tls vfpd32
L1i cache:          unknown size
 
L2 cache:            unknown size
 
NUMA node0 CPU(s):  0-3
 
Flags:              fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid
 
 
</pre>
 
</pre>
  
This processor is capable of running either at 800 MHz or 1.3 GHz. All the tests were conducted at 800 MHz.
+
This processor is capable of running at different speeds. All the tests were conducted at 996 MHz.
  
 
The following table details the characteristics of the SDRAM bank connected to the SoC.
 
The following table details the characteristics of the SDRAM bank connected to the SoC.
Line 58: Line 54:
 
{| class="wikitable"
 
{| class="wikitable"
 
|+
 
|+
!
+
SoC and SDRAM bank configuration
!
+
! rowspan="2" |Subsystem
! colspan="2" |Platform
+
! rowspan="2" |Feature
 +
!Platform
 
|-
 
|-
!
+
!Axel Lite
!
 
!Mito8M
 
!AxelLite
 
 
|-
 
|-
| rowspan="2" |SoC
+
| rowspan="5" |SoC
 
|SoC
 
|SoC
|NXP i.MX8M Quad
+
|NXP i.MX6Q
|
 
 
|-
 
|-
|ARM frequency
+
|ARM core frequency
 
[MHz]
 
[MHz]
|800 or 1300
+
|996
|
+
|-
 +
|L1 cache (D)
 +
[kB]
 +
|32
 +
|-
 +
|L1 cache (I)
 +
[kB]
 +
|32
 +
|-
 +
|L2 cache
 +
[MB]
 +
|1
 
|-
 
|-
 
| rowspan="6" |SDRAM
 
| rowspan="6" |SDRAM
 
|Type
 
|Type
|LPDDR4
+
|DDR3
|
 
 
|-
 
|-
 
|Frequency
 
|Frequency
 
[MHz]
 
[MHz]
|1600
+
|533 (*)
|
 
 
|-
 
|-
 
|Bus width
 
|Bus width
 
[bit]
 
[bit]
|32
+
|64
|
 
 
|-
 
|-
 
|Theoretical bandwidth
 
|Theoretical bandwidth
 
[Gb/s]
 
[Gb/s]
|102.4
+
|68.2
|
 
 
|-
 
|-
 
|Theoretical bandwidth
 
|Theoretical bandwidth
 
[GB/s]
 
[GB/s]
|12.8
+
|7.9
|
 
 
|-
 
|-
 
|Size
 
|Size
 
[MB]
 
[MB]
|3072
+
|2048
|
 
 
|}
 
|}
 +
 +
 +
(*) It is worth remembering that the i.MX6DualLite/Solo can achieve better memory bandwidth results even though its SDRAM bus frequency is lower (400 MHz). This is due to an erratum of the ARM PL310 L2 cache controller: the bug is not present in the i.MX6DualLite/Solo SoCs, which integrate a newer version of the controller.
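
The theoretical bandwidth figures in the table can be cross-checked with a few lines of Python. This is only a sketch: the function name and the assumption of two transfers per clock cycle (double data rate) are ours, not taken from the TN. Under this assumption, the 7.9 figure reported for Axel Lite matches the byte rate expressed in binary units (GiB/s), whereas the decimal value is about 8.5 GB/s.

```python
def theoretical_bandwidth(clock_mhz, bus_width_bits, transfers_per_clock=2):
    """Peak SDRAM bandwidth for a given bus clock and bus width.

    transfers_per_clock=2 models a double-data-rate interface
    (assumption made for this sketch).
    """
    bits_per_second = clock_mhz * 1e6 * transfers_per_clock * bus_width_bits
    bytes_per_second = bits_per_second / 8
    return {
        "Gb/s": bits_per_second / 1e9,      # decimal gigabits per second
        "GB/s": bytes_per_second / 1e9,     # decimal gigabytes per second
        "GiB/s": bytes_per_second / 2**30,  # binary gigabytes per second
    }


# Axel Lite: DDR3 at 533 MHz on a 64-bit bus
axel_lite = theoretical_bandwidth(533, 64)
# Mito8M: LPDDR4 at 1600 MHz on a 32-bit bus
mito8m = theoretical_bandwidth(1600, 32)

for name, bw in (("Axel Lite", axel_lite), ("Mito8M", mito8m)):
    print(f"{name}: {bw['Gb/s']:.1f} Gb/s, "
          f"{bw['GB/s']:.2f} GB/s, {bw['GiB/s']:.2f} GiB/s")
```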
  
 
===Software configuration===
 
===Software configuration===
  
* Linux kernel: 4.14.98
+
* Linux kernel: 4.9.11
 
*Root file system: Debian GNU/Linux 10 (buster)
 
*Root file system: Debian GNU/Linux 10 (buster)
* Architecture: aarch64
+
* Architecture: armv7l
* Governor: userspace @ 800 MHz
+
* Governor: userspace @ 996 MHz
 
<pre class="board-terminal">
 
<pre class="board-terminal">
root@Mito8M:~# echo userspace > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
+
armbian@sbcx:~/devel/stream/lmbench$ cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
root@Mito8M:~# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
 
 
userspace
 
userspace
root@Mito8M:~# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq
+
armbian@sbcx:~/devel/stream/lmbench$ cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq
800000
+
996000
 
</pre>
 
</pre>
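
When the tests are repeated several times, the check above can be scripted. The following is a minimal sketch, assuming the same sysfs files shown in the terminal session; the function names <code>is_clock_pinned</code> and <code>read_cpufreq</code> are hypothetical helpers introduced here for illustration.

```python
from pathlib import Path

# sysfs directory used in the terminal session above (cpu0 only)
CPUFREQ_DIR = Path("/sys/devices/system/cpu/cpu0/cpufreq")


def is_clock_pinned(governor_text, cur_freq_text, expected_khz):
    """Return True if the governor is 'userspace' and the current
    frequency (reported by sysfs in kHz) equals expected_khz."""
    governor = governor_text.strip()
    cur_khz = int(cur_freq_text.strip())
    return governor == "userspace" and cur_khz == expected_khz


def read_cpufreq(cpufreq_dir=CPUFREQ_DIR):
    """Read the raw contents of scaling_governor and scaling_cur_freq."""
    governor_text = (cpufreq_dir / "scaling_governor").read_text()
    cur_freq_text = (cpufreq_dir / "scaling_cur_freq").read_text()
    return governor_text, cur_freq_text
```

On the target, <code>is_clock_pinned(*read_cpufreq(), expected_khz=996000)</code> should return <code>True</code> before any benchmark is started.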
  
  
As some benchmarks were built natively on the platform under test itself, the version of the GCC compiler is indicated as well:
+
Some benchmarks were built natively on the platform under test. For completeness, the version of the GCC compiler is therefore reported as well:
 
<pre class="board-terminal">
 
<pre class="board-terminal">
armbian@Mito8M:~/devel/lmbench$ gcc -v
+
armbian@sbcx:~/devel/stream/lmbench$ gcc -v
 
Using built-in specs.
 
Using built-in specs.
 
COLLECT_GCC=gcc
 
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/aarch64-linux-gnu/8/lto-wrapper
+
COLLECT_LTO_WRAPPER=/usr/lib/gcc/arm-linux-gnueabihf/8/lto-wrapper
Target: aarch64-linux-gnu
+
Target: arm-linux-gnueabihf
Configured with: ../src/configure -v --with-pkgversion='Debian 8.3.0-6' --with-bugurl=file:///usr/share/doc/gcc-8/README.Bugs --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++ --prefix=/usr --with-gcc-major-version-only --program-suffix=-8 --program-prefix=aarch64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-bootstrap --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-libquadmath --disable-libquadmath-support --enable-plugin --enable-default-pie --with-system-zlib --disable-libphobos --enable-multiarch --enable-fix-cortex-a53-843419 --disable-werror --enable-checking=release --build=aarch64-linux-gnu --host=aarch64-linux-gnu --target=aarch64-linux-gnu
+
Configured with: ../src/configure -v --with-pkgversion='Debian 8.3.0-6' --with-bugurl=file:///usr/share/doc/gcc-8/README.Bugs --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++ --prefix=/usr --with-gcc-major-version-only --program-suffix=-8 --program-prefix=arm-linux-gnueabihf- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-bootstrap --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-libitm --disable-libquadmath --disable-libquadmath-support --enable-plugin --enable-default-pie --with-system-zlib --with-target-system-zlib --enable-objc-gc=auto --enable-multiarch --disable-sjlj-exceptions --with-arch=armv7-a --with-fpu=vfpv3-d16 --with-float=hard --with-mode=thumb --disable-werror --enable-checking=release --build=arm-linux-gnueabihf --host=arm-linux-gnueabihf --target=arm-linux-gnueabihf
 
Thread model: posix
 
Thread model: posix
gcc version 8.3.0 (Debian 8.3.0-6)
+
gcc version 8.3.0 (Debian 8.3.0-6)  
 
</pre>
 
</pre>
===Benchmarks===
 
====STREAM====
 
TBD
 
====LMbench====
 
TBD
 
====pmbw====
 
TBD
 
  
 
==Overall results==
 
==Overall results==
This section details the results that were achieved by the different benchmarks.
+
This section illustrates the overall results achieved by the benchmarks.
  
 
===STREAM===
 
===STREAM===
Line 149: Line 142:
 
{| class="wikitable"
 
{| class="wikitable"
 
|+
 
|+
!
+
Overall results
!
+
!Function
!Mito8M
+
!Best rate
!
+
[MB/s]
 +
!Efficiency
 +
 
 +
[%]
 
|-
 
|-
|
+
|Copy
|ARM frequency
+
|1139.8
[MHz]
+
|14.0
|792
+
|-
|
+
|Scale
 +
|1124.8
 +
|13.8
 
|-
 
|-
|
+
|Add
|Frequency
+
|1185.1
[MHz]
+
|14.6
|1600
 
|
 
 
|-
 
|-
|
+
|Triad
|Bus width
+
|1214.4
[bit]
+
|14.9
|32
 
|
 
 
|}
 
|}
 +
 +
As expected, the efficiency is relatively low. Generally, 32-bit ARM architectures are known to deliver mediocre performance when it comes to memory bandwidth.
 +
 +
Please see [https://www.cs.virginia.edu/stream/ this page] for more details about the STREAM benchmark.
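
The efficiency column can be reproduced as the ratio between the measured best rate and the theoretical peak bandwidth. The sketch below assumes that the peak is computed as 533 MHz x 2 transfers per clock x 64 bit and expressed in binary megabytes per second; the TN does not state the formula, so this is an assumption that happens to match the published figures after rounding.

```python
# Theoretical peak in MiB/s (assumed formula, see the text above)
PEAK_MIB_S = 533e6 * 2 * 64 / 8 / 2**20

# Best rates [MB/s] taken from the STREAM results table
STREAM_BEST_RATE = {
    "Copy": 1139.8,
    "Scale": 1124.8,
    "Add": 1185.1,
    "Triad": 1214.4,
}


def efficiency_percent(rate, peak=PEAK_MIB_S):
    """Measured rate as a percentage of the theoretical peak."""
    return 100.0 * rate / peak


stream_efficiency = {
    name: round(efficiency_percent(rate), 1)
    for name, rate in STREAM_BEST_RATE.items()
}

for name, eff in stream_efficiency.items():
    print(f"{name:5s} {STREAM_BEST_RATE[name]:7.1f} MB/s  {eff:4.1f} %")
```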
 +
 
===LMbench===
 
===LMbench===
TBD
+
As far as memory bandwidth is concerned, LMbench provides many results organized in different categories. For the sake of simplicity, the following tables detail just a couple of categories. The full results are available for download [http://mirror.dave.eu/axel/SBCX-TN-006/lmbench-axellite-i.MX6Q-996MHz.txt here].
 +
 
 +
{| class="wikitable"
 +
|+Memory read bandwidth
 +
!Buffer size
 +
!Bandwidth
 +
[MB/s]
 +
|-
 +
|512B
 +
|2861
 +
|-
 +
|1kB
 +
|3047
 +
|-
 +
|2kB
 +
|3065
 +
|-
 +
|4kB
 +
|3077
 +
|-
 +
|8kB
 +
|3081
 +
|-
 +
|16kB
 +
|3072
 +
|-
 +
|32kB
 +
|1309
 +
|-
 +
|64kB
 +
|902
 +
|-
 +
|128kB
 +
|787
 +
|-
 +
|256kB
 +
|775
 +
|-
 +
|512kB
 +
|749
 +
|-
 +
|1MB
 +
|687
 +
|-
 +
|2MB
 +
|642
 +
|-
 +
|4MB
 +
|629
 +
|-
 +
|8MB
 +
|630
 +
|-
 +
|16MB
 +
|632
 +
|-
 +
|32MB
 +
|631
 +
|-
 +
|64MB
 +
|632
 +
|-
 +
|128MB
 +
|633
 +
|-
 +
|256MB
 +
|634
 +
|-
 +
|512MB
 +
|634
 +
|-
 +
|1GB
 +
|633
 +
|}
 +
 
 +
{| class="wikitable"
 +
|+Memory write bandwidth
 +
!Buffer size
 +
!Bandwidth
 +
[MB/s]
 +
|-
 +
|512B
 +
|3724
 +
|-
 +
|1kB
 +
|3848
 +
|-
 +
|2kB
 +
|3902
 +
|-
 +
|4kB
 +
|3940
 +
|-
 +
|8kB
 +
|3958
 +
|-
 +
|16kB
 +
|3957
 +
|-
 +
|32kB
 +
|3964
 +
|-
 +
|64kB
 +
|3967
 +
|-
 +
|128kB
 +
|3967
 +
|-
 +
|256kB
 +
|3956
 +
|-
 +
|512kB
 +
|3947
 +
|-
 +
|1MB
 +
|2097
 +
|-
 +
|2MB
 +
|2154
 +
|-
 +
|4MB
 +
|2114
 +
|-
 +
|8MB
 +
|2082
 +
|-
 +
|16MB
 +
|2084
 +
|-
 +
|32MB
 +
|2085
 +
|-
 +
|64MB
 +
|2093
 +
|-
 +
|128MB
 +
|2086
 +
|-
 +
|256MB
 +
|2089
 +
|-
 +
|512MB
 +
|2087
 +
|-
 +
|1GB
 +
|2088
 +
|}
 +
 
 +
The most interesting results are those that refer to buffer sizes exceeding 1 MB, which is the size of the L2 cache. Approximately, the read bandwidth is 630 MB/s (7.8% efficiency), while the write bandwidth is 2080 MB/s (25.7% efficiency). These numbers are significantly different from those provided by STREAM. This confirms once again that such results strongly depend on the implementation of the test used to determine the bandwidth.
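
The steady-state figures can be extracted from the two tables by keeping only the buffers larger than the L2 cache. The following sketch uses the median as the summary statistic, which is our choice for illustration and not necessarily how the approximate values quoted above were obtained.

```python
from statistics import median

L2_SIZE_KB = 1024  # 1 MB L2 cache, as reported in the SoC table

# (buffer size [kB], bandwidth [MB/s]) pairs copied from the tables above
READ_BW = [
    (0.5, 2861), (1, 3047), (2, 3065), (4, 3077), (8, 3081), (16, 3072),
    (32, 1309), (64, 902), (128, 787), (256, 775), (512, 749), (1024, 687),
    (2048, 642), (4096, 629), (8192, 630), (16384, 632), (32768, 631),
    (65536, 632), (131072, 633), (262144, 634), (524288, 634),
    (1048576, 633),
]
WRITE_BW = [
    (0.5, 3724), (1, 3848), (2, 3902), (4, 3940), (8, 3958), (16, 3957),
    (32, 3964), (64, 3967), (128, 3967), (256, 3956), (512, 3947),
    (1024, 2097), (2048, 2154), (4096, 2114), (8192, 2082), (16384, 2084),
    (32768, 2085), (65536, 2093), (131072, 2086), (262144, 2089),
    (524288, 2087), (1048576, 2088),
]


def ram_bandwidth(samples, cache_kb=L2_SIZE_KB):
    """Median bandwidth over the buffers that do not fit in the cache."""
    beyond_cache = [bw for size_kb, bw in samples if size_kb > cache_kb]
    return median(beyond_cache)


read_ram = ram_bandwidth(READ_BW)
write_ram = ram_bandwidth(WRITE_BW)
print(f"read: {read_ram} MB/s, write: {write_ram} MB/s")
```

Both values are in line with the approximate figures given above.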
 +
 
 +
For more information regarding LMbench, please see [http://lmbench.sourceforge.net/ this page].
 +
 
 
===pmbw===
 
===pmbw===
TBD
+
As defined by the author, <code>pmbw</code> is "a set of assembler routines to measure the parallel memory (cache and RAM) bandwidth of modern multi-core machines." It performs a large number of tests. Conveniently, it comes with a handy tool that plots the results, which are stored in a text file, as a series of charts.
 +
 
 +
The complete results and the charts are available at the following links:
 +
*http://mirror.dave.eu/axel/SBCX-TN-006/pmbw-stats-AxelLite-i.MX6Q-996MHz.txt
 +
*http://mirror.dave.eu/axel/SBCX-TN-006/pmbw-plots-AxelLite-i.MX6Q-996MHz.pdf
 +
 
 +
Generally speaking, the charts exhibit significant drops in performance when the array size approaches the L1 and L2 cache sizes.
 +
 
 +
For more details about <code>pmbw</code>, please refer to [https://panthema.net/2013/pmbw/ this page].
  
 
==Useful links==
 
==Useful links==
*[https://www.cs.virginia.edu/stream/ STREAM benchmark]
 
*[http://lmbench.sourceforge.net/ LM Bench benchmark]
 
*[https://panthema.net/2013/pmbw/ pmbw benchmark]
 
 
*Joshua Wyatt Smith and Andrew Hamilton, [http://inspirehep.net/record/1424637/files/1719033_626-630.pdf Parallel benchmarks for ARM processors in the high energy context]
 
*Joshua Wyatt Smith and Andrew Hamilton, [http://inspirehep.net/record/1424637/files/1719033_626-630.pdf Parallel benchmarks for ARM processors in the high energy context]
 
*T Wrigley, G Harmsen and B Mellado, [http://inspirehep.net/record/1424631/files/1719033_275-280.pdf Memory performance of ARM processors and its relevance to High Energy Physics]
 
*T Wrigley, G Harmsen and B Mellado, [http://inspirehep.net/record/1424631/files/1719033_275-280.pdf Memory performance of ARM processors and its relevance to High Energy Physics]
Line 190: Line 345:
  
 
====Building====
 
====Building====
 +
To build STREAM:
 +
* clone its git repository
 +
*modify the <code>Makefile</code> as shown below
 +
*issue the <code>make</code> command.
 +
 
<pre class="board-terminal">
 
<pre class="board-terminal">
 
git clone https://github.com/jeffhammond/STREAM.git
 
git clone https://github.com/jeffhammond/STREAM.git
Line 260: Line 420:
  
 
===LMbench===
 
===LMbench===
 
+
To run this benchmark, the native prebuilt package provided by Debian Buster was used.<pre class="board-terminal" mw-collapsible="" mw-collapsed"="">
====Running the tests====
+
armbian@sbcx:~/devel/stream/STREAM$ sudo lmbench-run  
=====ARM core clock = 800 MHz=====
 
<pre class="board-terminal" mw-collapsible="" mw-collapsed"="">
 
armbian@Mito8M:~/devel/lmbench$ sudo lmbench-run
 
[sudo] password for armbian:
 
/usr/lib/lmbench/scripts/gnu-os: unable to guess system type
 
 
 
This script, last modified 2004-08-18, has failed to recognize
 
the operating system you are using. It is advised that you
 
download the most up to date version of the config scripts from
 
 
 
    ftp://ftp.gnu.org/pub/gnu/config/
 
 
 
If the version you run (/usr/lib/lmbench/scripts/gnu-os) is already up to date, please
 
send the following data and any information you think might be
 
pertinent to <config-patches@gnu.org> in order to provide the needed
 
information to handle your system.
 
 
 
config.guess timestamp = 2004-08-18
 
 
 
uname -m = aarch64
 
uname -r = 4.14.98-g4c94e1dbaec2
 
uname -s = Linux
 
uname -v = #1 SMP PREEMPT Mon Sep 30 14:46:22 CEST 2019
 
 
 
/usr/bin/uname -p =
 
/bin/uname -X    =
 
 
 
hostinfo              =
 
/bin/universe          =
 
/usr/bin/arch -k      =
 
/bin/arch              =
 
/usr/bin/oslevel      =
 
/usr/convex/getsysinfo =
 
 
 
UNAME_MACHINE = aarch64
 
UNAME_RELEASE = 4.14.98-g4c94e1dbaec2
 
UNAME_SYSTEM  = Linux
 
UNAME_VERSION = #1 SMP PREEMPT Mon Sep 30 14:46:22 CEST 2019
 
 
=====================================================================
 
=====================================================================
  
Line 365: Line 487:
  
 
Hang on, we are calculating your loop overhead.
 
Hang on, we are calculating your loop overhead.
OK, it looks like your benchmark loop costs 0.00000136 usecs.
+
OK, it looks like your benchmark loop costs 0.00000055 usecs.
  
 
=====================================================================
 
=====================================================================
Line 376: Line 498:
 
take somewhat longer to run the benchmark.
 
take somewhat longer to run the benchmark.
  
MB [default 2097]: 1024
+
MB [default 1186]: 1024
 
Checking to see if you have 1024 MB; please wait for a moment...
 
Checking to see if you have 1024 MB; please wait for a moment...
 
1024MB OK
 
1024MB OK
Line 382: Line 504:
 
1024MB OK
 
1024MB OK
 
Hang on, we are calculating your cache line size.
 
Hang on, we are calculating your cache line size.
OK, it looks like your cache line is 64 bytes.
+
OK, it looks like your cache line is 32 bytes.
  
 
=====================================================================
 
=====================================================================
Line 457: Line 579:
 
I think your CPU mhz is  
 
I think your CPU mhz is  
  
         798 MHz, 1.2531 nanosec clock
+
         1992 MHz, 0.5020 nanosec clock
  
 
but I am frequently wrong.  If that is the wrong Mhz, type in your
 
but I am frequently wrong.  If that is the wrong Mhz, type in your
Line 468: Line 590:
 
1.8GHz P4 may be reported as a 3592MHz processor.
 
1.8GHz P4 may be reported as a 3592MHz processor.
  
Processor mhz [default 798 MHz, 1.2531 nanosec clock]:  
+
Processor mhz [default 1992 MHz, 0.5020 nanosec clock]: 996
 
=====================================================================
 
=====================================================================
  
Line 477: Line 599:
 
system.
 
system.
  
FSDIR [default /var/tmp/lmbench]: /tmp/lmbench
+
FSDIR [default /var/tmp/lmbench]: /home/armbian/devel/stream/lmbench
 
=====================================================================
 
=====================================================================
  
Line 504: Line 626:
 
Send mail to majordomo@bitmover.com to join the list.
 
Send mail to majordomo@bitmover.com to join the list.
  
/usr/lib/lmbench/scripts/gnu-os: unable to guess system type
+
Using config in CONFIG.sbcx
 
+
Thu Jan 24 09:27:32 CET 2020
This script, last modified 2004-08-18, has failed to recognize
 
the operating system you are using. It is advised that you
 
download the most up to date version of the config scripts from
 
 
 
    ftp://ftp.gnu.org/pub/gnu/config/
 
 
 
If the version you run (/usr/lib/lmbench/scripts/gnu-os) is already up to date, please
 
send the following data and any information you think might be
 
pertinent to <config-patches@gnu.org> in order to provide the needed
 
information to handle your system.
 
 
 
config.guess timestamp = 2004-08-18
 
 
 
uname -m = aarch64
 
uname -r = 4.14.98-g4c94e1dbaec2
 
uname -s = Linux
 
uname -v = #1 SMP PREEMPT Mon Sep 30 14:46:22 CEST 2019
 
 
 
/usr/bin/uname -p =
 
/bin/uname -X    =
 
 
 
hostinfo              =
 
/bin/universe          =
 
/usr/bin/arch -k      =
 
/bin/arch              =
 
/usr/bin/oslevel      =
 
/usr/convex/getsysinfo =
 
 
 
UNAME_MACHINE = aarch64
 
UNAME_RELEASE = 4.14.98-g4c94e1dbaec2
 
UNAME_SYSTEM  = Linux
 
UNAME_VERSION = #1 SMP PREEMPT Mon Sep 30 14:46:22 CEST 2019
 
Using config in CONFIG.Mito8M
 
Wed Jan 15 10:56:54 CET 2020
 
 
Latency measurements
 
Latency measurements
Wed Jan 15 10:57:29 CET 2020
+
Thu Jan 24 09:28:38 CET 2020
 
Local networking
 
Local networking
Wed Jan 15 10:58:36 CET 2020
+
Fri Jan 24 09:29:38 CET 2020
 
Bandwidth measurements
 
Bandwidth measurements
Wed Jan 15 11:03:02 CET 2020
+
Fri Jan 24 09:41:01 CET 2020
 
Calculating context switch overhead
 
Calculating context switch overhead
Wed Jan 15 11:03:09 CET 2020
+
Fri Jan 24 09:41:06 CET 2020
 
Calculating effective TLB size
 
Calculating effective TLB size
Wed Jan 15 11:03:10 CET 2020
+
Fri Jan 24 09:41:08 CET 2020
 
Calculating memory load parallelism
 
Calculating memory load parallelism
Wed Jan 15 11:14:34 CET 2020
+
Fri Jan 24 09:53:02 CET 2020
 
McCalpin's STREAM benchmark
 
McCalpin's STREAM benchmark
Wed Jan 15 11:15:30 CET 2020
+
Fri Jan 24 09:54:55 CET 2020
 
Calculating memory load latency
 
Calculating memory load latency
Wed Jan 15 11:35:54 CET 2020
+
Fri Jan 24 10:28:12 CET 2020
 
Benchmark run finished....
 
Benchmark run finished....
 
Remember you can find the results of the benchmark  
 
Remember you can find the results of the benchmark  
 
under /var/lib/lmbench/results
 
under /var/lib/lmbench/results
 
</pre>
 
</pre>
=====ARM core clock = 1300 MHz=====
 
 
====Results====
 
TBD: insert link to downloadable file
 
  
 +
===pmbw===
 +
====Building====
 +
Building pmbw is straightforward. Please click on ''Expand'' to show the box that illustrates the procedure.
 
<pre class="board-terminal mw-collapsible mw-collapsed">
 
<pre class="board-terminal mw-collapsible mw-collapsed">
 +
armbian@sbcx:~/devel/pmbw$ git clone https://github.com/bingmann/pmbw.git
 +
Cloning into 'pmbw'...
 +
remote: Enumerating objects: 15, done.
 +
remote: Counting objects: 100% (15/15), done.
 +
remote: Compressing objects: 100% (15/15), done.
 +
remote: Total 386 (delta 1), reused 3 (delta 0), pack-reused 371
 +
Receiving objects: 100% (386/386), 369.04 KiB | 1.23 MiB/s, done.
 +
Resolving deltas: 100% (232/232), done.
 +
armbian@sbcx:~/devel/pmbw$ cd pmbw/
 +
armbian@sbcx:~/devel/pmbw/pmbw$ ./configure && make
 +
checking build system type... armv7l-unknown-linux-gnueabihf
 +
checking host system type... armv7l-unknown-linux-gnueabihf
 +
checking target system type... armv7l-unknown-linux-gnueabihf
 +
checking for a BSD-compatible install... /usr/bin/install -c
 +
checking whether build environment is sane... yes
 +
checking for a thread-safe mkdir -p... /bin/mkdir -p
 +
checking for gawk... no
 +
checking for mawk... mawk
 +
checking whether make sets $(MAKE)... yes
 +
checking whether make supports nested variables... yes
 +
checking whether to enable maintainer-specific portions of Makefiles... no
 +
checking building for Windows... no
 +
checking for g++... g++
 +
checking whether the C++ compiler works... yes
 +
checking for C++ compiler default output file name... a.out
 +
checking for suffix of executables...
 +
checking whether we are cross compiling... no
 +
checking for suffix of object files... o
 +
checking whether we are using the GNU C++ compiler... yes
 +
checking whether g++ accepts -g... yes
 +
checking whether make supports the include directive... yes (GNU style)
 +
checking dependency style of g++... gcc3
 +
checking whether g++ supports -march=x86-64... no
 +
checking for pthread_mutex_init in -lpthread... yes
 +
checking for clock_gettime in -lrt... yes
 +
checking for posix_memalign in -lc... yes
 +
checking that generated files are newer than configure... done
 +
configure: creating ./config.status
 +
config.status: creating Makefile
 +
config.status: executing depfiles commands
 +
g++ -DPACKAGE_NAME=\"pmbw\" -DPACKAGE_TARNAME=\"pmbw\" -DPACKAGE_VERSION=\"0.6.3\" -DPACKAGE_STRING=\"pmbw\ 0.6.3\" -DPACKAGE_BUGREPORT=\"\" -DPACKAGE_URL=\"\" -DPACKAGE=\"pmbw\" -DVERSION=\"0.6.3\" -DON_WINDOWS=false -DHAVE_POSIX_MEMALIGN=1 -I.    -W -Wall -g -O2 -MT pmbw.o -MD -MP -MF .deps/pmbw.Tpo -c -o pmbw.o pmbw.cc
 +
mv -f .deps/pmbw.Tpo .deps/pmbw.Po
 +
g++ -W -Wall -g -O2  -o pmbw pmbw.o  -lpthread -lrt
 +
g++ -DPACKAGE_NAME=\"pmbw\" -DPACKAGE_TARNAME=\"pmbw\" -DPACKAGE_VERSION=\"0.6.3\" -DPACKAGE_STRING=\"pmbw\ 0.6.3\" -DPACKAGE_BUGREPORT=\"\" -DPACKAGE_URL=\"\" -DPACKAGE=\"pmbw\" -DVERSION=\"0.6.3\" -DON_WINDOWS=false -DHAVE_POSIX_MEMALIGN=1 -I.    -W -Wall -g -O2 -MT stats2gnuplot.o -MD -MP -MF .deps/stats2gnuplot.Tpo -c -o stats2gnuplot.o stats2gnuplot.cc
 +
In file included from /usr/include/c++/8/vector:69,
 +
                from stats2gnuplot.cc:34:
 +
/usr/include/c++/8/bits/vector.tcc: In member function 'void std::vector<_Tp, _Alloc>::_M_realloc_insert(std::vector<_Tp, _Alloc>::iterator, _Args&& ...) [with _Args = {const Result&}; _Tp = Result; _Alloc = std::allocator<Result>]':
 +
/usr/include/c++/8/bits/vector.tcc:413:7: note: parameter passing for argument of type 'std::vector<Result>::iterator' {aka '__gnu_cxx::__normal_iterator<Result*, std::vector<Result> >'} changed in GCC 7.1
 +
      vector<_Tp, _Alloc>::
 +
      ^~~~~~~~~~~~~~~~~~
 +
In file included from /usr/include/c++/8/vector:64,
 +
                from stats2gnuplot.cc:34:
 +
/usr/include/c++/8/bits/stl_vector.h: In function 'bool process_line(const string&)':
 +
/usr/include/c++/8/bits/stl_vector.h:1085:4: note: parameter passing for argument of type '__gnu_cxx::__normal_iterator<Result*, std::vector<Result> >' changed in GCC 7.1
 +
    _M_realloc_insert(end(), __x);
 +
    ^~~~~~~~~~~~~~~~
 +
In file included from /usr/include/c++/8/algorithm:62,
 +
                from stats2gnuplot.cc:33:
 +
/usr/include/c++/8/bits/stl_algo.h: In function 'void std::__unguarded_linear_insert(_RandomAccessIterator, _Compare) [with _RandomAccessIterator = __gnu_cxx::__normal_iterator<Result*, std::vector<Result> >; _Compare = __gnu_cxx::__ops::_Val_less_iter]':
 +
/usr/include/c++/8/bits/stl_algo.h:1821:5: note: parameter passing for argument of type '__gnu_cxx::__normal_iterator<Result*, std::vector<Result> >' changed in GCC 7.1
 +
    __unguarded_linear_insert(_RandomAccessIterator __last,
 +
    ^~~~~~~~~~~~~~~~~~~~~~~~
 +
/usr/include/c++/8/bits/stl_algo.h: In function 'void std::__insertion_sort(_RandomAccessIterator, _RandomAccessIterator, _Compare) [with _RandomAccessIterator = __gnu_cxx::__normal_iterator<Result*, std::vector<Result> >; _Compare = __gnu_cxx::__ops::_Iter_less_iter]':
 +
/usr/include/c++/8/bits/stl_algo.h:1840:5: note: parameter passing for argument of type '__gnu_cxx::__normal_iterator<Result*, std::vector<Result> >' changed in GCC 7.1
 +
    __insertion_sort(_RandomAccessIterator __first,
 +
    ^~~~~~~~~~~~~~~
 +
/usr/include/c++/8/bits/stl_algo.h:1840:5: note: parameter passing for argument of type '__gnu_cxx::__normal_iterator<Result*, std::vector<Result> >' changed in GCC 7.1
 +
In file included from /usr/include/c++/8/bits/stl_algo.h:61,
 +
                from /usr/include/c++/8/algorithm:62,
 +
                from stats2gnuplot.cc:33:
 +
/usr/include/c++/8/bits/stl_heap.h: In function 'void std::__adjust_heap(_RandomAccessIterator, _Distance, _Distance, _Tp, _Compare) [with _RandomAccessIterator = __gnu_cxx::__normal_iterator<Result*, std::vector<Result> >; _Distance = int; _Tp = Result; _Compare = __gnu_cxx::__ops::_Iter_less_iter]':
 +
/usr/include/c++/8/bits/stl_heap.h:214:5: note: parameter passing for argument of type '__gnu_cxx::__normal_iterator<Result*, std::vector<Result> >' changed in GCC 7.1
 +
    __adjust_heap(_RandomAccessIterator __first, _Distance __holeIndex,
 +
    ^~~~~~~~~~~~
 +
/usr/include/c++/8/bits/stl_heap.h: In function 'void std::__make_heap(_RandomAccessIterator, _RandomAccessIterator, _Compare&) [with _RandomAccessIterator = __gnu_cxx::__normal_iterator<Result*, std::vector<Result> >; _Compare = __gnu_cxx::__ops::_Iter_less_iter]':
 +
/usr/include/c++/8/bits/stl_heap.h:326:5: note: parameter passing for argument of type '__gnu_cxx::__normal_iterator<Result*, std::vector<Result> >' changed in GCC 7.1
 +
    __make_heap(_RandomAccessIterator __first, _RandomAccessIterator __last,
 +
    ^~~~~~~~~~
 +
/usr/include/c++/8/bits/stl_heap.h:326:5: note: parameter passing for argument of type '__gnu_cxx::__normal_iterator<Result*, std::vector<Result> >' changed in GCC 7.1
 +
/usr/include/c++/8/bits/stl_heap.h: In function 'void std::__pop_heap(_RandomAccessIterator, _RandomAccessIterator, _RandomAccessIterator, _Compare&) [with _RandomAccessIterator = __gnu_cxx::__normal_iterator<Result*, std::vector<Result> >; _Compare = __gnu_cxx::__ops::_Iter_less_iter]':
 +
/usr/include/c++/8/bits/stl_heap.h:243:5: note: parameter passing for argument of type '__gnu_cxx::__normal_iterator<Result*, std::vector<Result> >' changed in GCC 7.1
 +
    __pop_heap(_RandomAccessIterator __first, _RandomAccessIterator __last,
 +
    ^~~~~~~~~
 +
/usr/include/c++/8/bits/stl_heap.h:243:5: note: parameter passing for argument of type '__gnu_cxx::__normal_iterator<Result*, std::vector<Result> >' changed in GCC 7.1
 +
/usr/include/c++/8/bits/stl_heap.h:243:5: note: parameter passing for argument of type '__gnu_cxx::__normal_iterator<Result*, std::vector<Result> >' changed in GCC 7.1
 +
In file included from /usr/include/c++/8/algorithm:62,
 +
                from stats2gnuplot.cc:33:
 +
/usr/include/c++/8/bits/stl_algo.h: In function 'void std::__introsort_loop(_RandomAccessIterator, _RandomAccessIterator, _Size, _Compare) [with _RandomAccessIterator = __gnu_cxx::__normal_iterator<Result*, std::vector<Result> >; _Size = int; _Compare = __gnu_cxx::__ops::_Iter_less_iter]':
 +
/usr/include/c++/8/bits/stl_algo.h:1940:5: note: parameter passing for argument of type '__gnu_cxx::__normal_iterator<Result*, std::vector<Result> >' changed in GCC 7.1
 +
    __introsort_loop(_RandomAccessIterator __first,
 +
    ^~~~~~~~~~~~~~~
 +
/usr/include/c++/8/bits/stl_algo.h:1940:5: note: parameter passing for argument of type '__gnu_cxx::__normal_iterator<Result*, std::vector<Result> >' changed in GCC 7.1
 +
In file included from /usr/include/c++/8/bits/stl_algo.h:61,
 +
                from /usr/include/c++/8/algorithm:62,
 +
                from stats2gnuplot.cc:33:
 +
/usr/include/c++/8/bits/stl_heap.h:408:19: note: parameter passing for argument of type '__gnu_cxx::__normal_iterator<Result*, std::vector<Result> >' changed in GCC 7.1
 +
    std::__pop_heap(__first, __last, __last, __comp);
 +
    09:46, 24 January 2020 (UTC)09:46, 24 January 2020 (UTC)09:46, 24 January 2020 (UTC)^09:46, 24 January 2020 (UTC)09:46, 24 January 2020 (UTC)09:46, 24 January 2020 (UTC)09:46, 24 January 2020 (UTC)09:46, 24 January 2020 (UTC)09:46, 24 January 2020 (UTC)~~
 +
In file included from /usr/include/c++/8/algorithm:62,
 +
                from stats2gnuplot.cc:33:
 +
/usr/include/c++/8/bits/stl_algo.h:1954:25: note: parameter passing for argument of type '__gnu_cxx::__normal_iterator<Result*, std::vector<Result> >' changed in GCC 7.1
 +
    std::__introsort_loop(__cut, __last, __depth_limit, __comp);
 +
    09:46, 24 January 2020 (UTC)09:46, 24 January 2020 (UTC)09:46, 24 January 2020 (UTC)09:46, 24 January 2020 (UTC)~^09:46, 24 January 2020 (UTC)09:46, 24 January 2020 (UTC)09:46, 24 January 2020 (UTC)09:46, 24 January 2020 (UTC)09:46, 24 January 2020 (UTC)09:46, 24 January 2020 (UTC)09:46, 24 January 2020 (UTC)~~
 +
/usr/include/c++/8/bits/stl_algo.h:1672:23: note: parameter passing for argument of type '__gnu_cxx::__normal_iterator<Result*, std::vector<Result> >' changed in GCC 7.1
 +
      std::__make_heap(__first, __middle, __comp);
 +
      09:46, 24 January 2020 (UTC)09:46, 24 January 2020 (UTC)09:46, 24 January 2020 (UTC)~^09:46, 24 January 2020 (UTC)09:46, 24 January 2020 (UTC)09:46, 24 January 2020 (UTC)09:46, 24 January 2020 (UTC)09:46, 24 January 2020 (UTC)~
 +
/usr/include/c++/8/bits/stl_algo.h: In function 'int main(int, char**)':
 +
/usr/include/c++/8/bits/stl_algo.h:1968:25: note: parameter passing for argument of type '__gnu_cxx::__normal_iterator<Result*, std::vector<Result> >' changed in GCC 7.1
 +
    std::__introsort_loop(__first, __last,
 +
    09:46, 24 January 2020 (UTC)09:46, 24 January 2020 (UTC)09:46, 24 January 2020 (UTC)09:46, 24 January 2020 (UTC)~^09:46, 24 January 2020 (UTC)09:46, 24 January 2020 (UTC)09:46, 24 January 2020 (UTC)~
 +
    std::__lg(__last - __first) * 2,
 +
    09:46, 24 January 2020 (UTC)09:46, 24 January 2020 (UTC)09:46, 24 January 2020 (UTC)09:46, 24 January 2020 (UTC)09:46, 24 January 2020 (UTC)09:46, 24 January 2020 (UTC)~~
 +
    __comp);
 +
    09:46, 24 January 2020 (UTC)~~             
 +
/usr/include/c++/8/bits/stl_algo.h:1885:25: note: parameter passing for argument of type '__gnu_cxx::__normal_iterator<Result*, std::vector<Result> >' changed in GCC 7.1
 +
    std::__insertion_sort(__first, __first + int(_S_threshold), __comp);
 +
    09:46, 24 January 2020 (UTC)09:46, 24 January 2020 (UTC)09:46, 24 January 2020 (UTC)09:46, 24 January 2020 (UTC)~^09:46, 24 January 2020 (UTC)09:46, 24 January 2020 (UTC)09:46, 24 January 2020 (UTC)09:46, 24 January 2020 (UTC)09:46, 24 January 2020 (UTC)09:46, 24 January 2020 (UTC)09:46, 24 January 2020 (UTC)09:46, 24 January 2020 (UTC)09:46, 24 January 2020 (UTC)
 +
/usr/include/c++/8/bits/stl_algo.h:1890:23: note: parameter passing for argument of type '__gnu_cxx::__normal_iterator<Result*, std::vector<Result> >' changed in GCC 7.1
 +
  std::__insertion_sort(__first, __last, __comp);
 +
  09:46, 24 January 2020 (UTC)09:46, 24 January 2020 (UTC)09:46, 24 January 2020 (UTC)09:46, 24 January 2020 (UTC)~^09:46, 24 January 2020 (UTC)09:46, 24 January 2020 (UTC)09:46, 24 January 2020 (UTC)09:46, 24 January 2020 (UTC)[[User:U0001|U0001]] ([[User talk:U0001|talk]]) 09:46, 24 January 2020 (UTC)
 +
mv -f .deps/stats2gnuplot.Tpo .deps/stats2gnuplot.Po
 +
g++ -W -Wall -g -O2  -o stats2gnuplot stats2gnuplot.o  -lpthread -lrt
 
</pre>
 
</pre>
  

Revision as of 14:25, 8 October 2020

Info Box
Applies to Axel Ultra
Applies to AXEL ESATTA
Applies to Axel Lite
Applies to SBC AXEL
Warning: This technical note was validated against specific versions of hardware and software. What is described here may not work with other versions.


History

Version   Date           Notes
1.0.0     January 2020   First public release

Introduction

When dealing with computationally heavy tasks, RAM bandwidth may become a severe bottleneck that bounds the overall performance. This is especially true for the SoCs used in embedded systems. Characterizing the RAM bandwidth is therefore useful when designing such demanding applications.

This technical note (TN for short) illustrates several benchmarking tests that were run on the Axel Lite SoM to characterize its RAM bandwidth. This SoM is built upon the i.MX6Q/D/DL/S family of processors by NXP.

Testbed general configuration

This section illustrates the configuration settings common to all the tests that were performed. The testbed is the same as the one described in this TN: it consists of an Axel Lite SoM and an SBCX carrier board.

SoC and SDRAM bank

The SoC model is i.MX6Q:

armbian@sbcx:~/devel/stream/lmbench$ lscpu 
Architecture:        armv7l
Byte Order:          Little Endian
CPU(s):              4
On-line CPU(s) list: 0-3
Thread(s) per core:  1
Core(s) per socket:  4
Socket(s):           1
Vendor ID:           ARM
Model:               10
Model name:          Cortex-A9
Stepping:            r2p10
CPU max MHz:         996.0000
CPU min MHz:         396.0000
BogoMIPS:            7.54
Flags:               half thumb fastmult vfp edsp neon vfpv3 tls vfpd32

This processor is capable of running at different speeds. All the tests were conducted at 996 MHz.

The following table details the characteristics of the SDRAM bank connected to the SoC.

SoC and SDRAM bank configuration

Subsystem   Feature                        Platform: Axel Lite
SoC         SoC                            NXP i.MX6Q
SoC         ARM core frequency [MHz]       996
SoC         L1 cache (D) [kB]              32
SoC         L1 cache (I) [kB]              32
SoC         L2 cache [MB]                  1
SDRAM       Type                           DDR3
SDRAM       Frequency [MHz]                533 (*)
SDRAM       Bus width [bit]                64
SDRAM       Theoretical bandwidth [Gb/s]   68.2
SDRAM       Theoretical bandwidth [GB/s]   7.9
SDRAM       Size [MB]                      2048


(*) It is worth remembering that the i.MX6DualLite/Solo could achieve better memory bandwidth results even though their SDRAM bus frequency is lower (400 MHz). This is due to an erratum of the ARM PL310 L2 cache controller. The bug is not present in the i.MX6DualLite/Solo SoCs, which integrate a newer version of the controller.
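
The table does not spell out how the theoretical bandwidth figures were obtained. The following is a minimal sketch, assuming that DDR3 performs two transfers per clock cycle and that the whole 64-bit bus is used on every transfer. The function name and its parameters are illustrative only. Under these assumptions the 68.2 Gb/s figure is reproduced directly, while the 7.9 figure matches only when the byte rate is divided by 2^30 (binary gigabytes); this last point is an inference from the numbers, not something stated in the TN.

```python
def theoretical_bandwidth(clock_mhz, bus_width_bits, transfers_per_clock=2):
    """Return the peak SDRAM bandwidth as (Gb/s, GB/s, GiB/s).

    transfers_per_clock=2 is the double-data-rate assumption.
    """
    bits_per_s = clock_mhz * 1e6 * transfers_per_clock * bus_width_bits
    bytes_per_s = bits_per_s / 8
    return bits_per_s / 1e9, bytes_per_s / 1e9, bytes_per_s / 2**30


# Values taken from the table: 533 MHz clock, 64-bit bus.
gbit, gbyte, gibyte = theoretical_bandwidth(533, 64)
print(f"{gbit:.1f} Gb/s, {gbyte:.1f} GB/s (decimal), {gibyte:.1f} GiB/s (binary)")
# prints "68.2 Gb/s, 8.5 GB/s (decimal), 7.9 GiB/s (binary)"
```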

Software configuration

  • Linux kernel: 4.9.11
  • Root file system: Debian GNU/Linux 10 (buster)
  • Architecture: armv7l
  • Governor: userspace @ 996 MHz
armbian@sbcx:~/devel/stream/lmbench$ cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
userspace
armbian@sbcx:~/devel/stream/lmbench$ cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq
996000
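
The two checks above can also be scripted, which is handy when a benchmark is repeated several times. The following is a sketch only: the helper names `read_cpufreq` and `is_pinned` are hypothetical, and the code merely reads the same two sysfs files shown above. The `sysfs_root` parameter is an assumption introduced so that the helpers can be exercised against a copy of the directory tree.

```python
from pathlib import Path

# Default sysfs location of the per-CPU cpufreq entries on Linux.
SYSFS_CPU_ROOT = "/sys/devices/system/cpu"


def read_cpufreq(cpu=0, sysfs_root=SYSFS_CPU_ROOT):
    """Return (governor, current frequency in kHz) for one CPU."""
    base = Path(sysfs_root) / f"cpu{cpu}" / "cpufreq"
    governor = (base / "scaling_governor").read_text().strip()
    cur_khz = int((base / "scaling_cur_freq").read_text().strip())
    return governor, cur_khz


def is_pinned(cpu, expected_khz, sysfs_root=SYSFS_CPU_ROOT):
    """True if the CPU uses the userspace governor at the expected frequency."""
    governor, cur_khz = read_cpufreq(cpu, sysfs_root)
    return governor == "userspace" and cur_khz == expected_khz
```

Given the outputs shown above, `is_pinned(0, 996000)` would be expected to return `True` on the testbed. The TN only shows the values for cpu0; the other cores can be checked by passing a different `cpu` index.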


Some benchmarks were built natively on the platform under test. For the sake of completeness, the version of the GCC compiler is reported as well:

armbian@sbcx:~/devel/stream/lmbench$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/arm-linux-gnueabihf/8/lto-wrapper
Target: arm-linux-gnueabihf
Configured with: ../src/configure -v --with-pkgversion='Debian 8.3.0-6' --with-bugurl=file:///usr/share/doc/gcc-8/README.Bugs --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++ --prefix=/usr --with-gcc-major-version-only --program-suffix=-8 --program-prefix=arm-linux-gnueabihf- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-bootstrap --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-libitm --disable-libquadmath --disable-libquadmath-support --enable-plugin --enable-default-pie --with-system-zlib --with-target-system-zlib --enable-objc-gc=auto --enable-multiarch --disable-sjlj-exceptions --with-arch=armv7-a --with-fpu=vfpv3-d16 --with-float=hard --with-mode=thumb --disable-werror --enable-checking=release --build=arm-linux-gnueabihf --host=arm-linux-gnueabihf --target=arm-linux-gnueabihf
Thread model: posix
gcc version 8.3.0 (Debian 8.3.0-6) 

Overall results

This section illustrates the overall results achieved by the benchmarks.

STREAM

Overall results

Function   Best rate [MB/s]   Efficiency [%]
Copy       1139.8             14.0
Scale      1124.8             13.8
Add        1185.1             14.6
Triad      1214.4             14.9

As expected, the efficiency is relatively low. Generally, 32-bit ARM architectures are known to deliver mediocre memory bandwidth.
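
The TN does not state how the efficiency column was computed. The sketch below assumes that efficiency is the best rate divided by the theoretical SDRAM bandwidth, with the latter derived from the 533 MHz clock and the 64-bit bus (two transfers per clock) and expressed in units of 2^20 bytes per second. This choice of denominator is an inference: it reproduces the published percentages, whereas a decimal denominator (8528 MB/s) would give slightly lower values, about 13.4% for Copy.

```python
# Assumed peak: 533 MHz x 2 transfers/clock x 64 bit / 8, in 2**20 bytes per second.
PEAK_MIB_S = 533e6 * 2 * 64 / 8 / 2**20

# Best rates reported by STREAM on the testbed [MB/s].
STREAM_BEST_RATE = {"Copy": 1139.8, "Scale": 1124.8, "Add": 1185.1, "Triad": 1214.4}


def efficiency(best_rate, peak=PEAK_MIB_S):
    """Measured bandwidth as a percentage of the assumed theoretical peak."""
    return 100.0 * best_rate / peak


for kernel, rate in STREAM_BEST_RATE.items():
    print(f"{kernel:<6} {rate:7.1f} MB/s  {efficiency(rate):4.1f} %")
```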

Please see this page for more details about the STREAM benchmark.

LMbench

As far as memory bandwidth is concerned, LMbench provides many results organized in different categories. For the sake of simplicity, the following tables detail just a couple of them. The full results are available for download here.

Memory read bandwidth

Buffer size   Bandwidth [MB/s]
512B 2861
1kB 3047
2kB 3065
4kB 3077
8kB 3081
16kB 3072
32kB 1309
64kB 902
128kB 787
256kB 775
512kB 749
1MB 687
2MB 642
4MB 629
8MB 630
16MB 632
32MB 631
64MB 632
128MB 633
256MB 634
512MB 634
1GB 633

Memory write bandwidth

Buffer size   Bandwidth [MB/s]
512B 3724
1kB 3848
2kB 3902
4kB 3940
8kB 3958
16kB 3957
32kB 3964
64kB 3967
128kB 3967
256kB 3956
512kB 3947
1MB 2097
2MB 2154
4MB 2114
8MB 2082
16MB 2084
32MB 2085
64MB 2093
128MB 2086
256MB 2089
512MB 2087
1GB 2088

The most interesting results are those that refer to buffer sizes exceeding 1 MB, which is the size of the L2 cache. Approximately, the read bandwidth is 630 MB/s (7.8% efficiency), while the write bandwidth is 2080 MB/s (25.7% efficiency). These numbers differ significantly from the ones provided by STREAM. This confirms once again that such results are strongly dependent on the implementation of the test used to determine the bandwidth.
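
The approximate figures quoted above can be cross-checked by averaging the table rows whose buffer size exceeds the L2 cache size. The sketch below is one possible way to do so and is not the procedure used by the author: the helper names are hypothetical, only the rows from 512kB upward are copied from the tables, and the peak bandwidth is the same assumed value (533 MHz, 64-bit bus, two transfers per clock, units of 2^20 bytes) used to interpret the efficiency figures. The averages come out close to, but not exactly equal to, the rounded values given in the text.

```python
import re
from statistics import mean

L2_CACHE_BYTES = 1024**2                   # 1 MB L2 cache, from the SoC table
PEAK_MIB_S = 533e6 * 2 * 64 / 8 / 2**20    # assumed theoretical peak
UNITS = {"B": 1, "kB": 1024, "MB": 1024**2, "GB": 1024**3}

# Rows copied from the LMbench tables [MB/s].
READ = {"512kB": 749, "1MB": 687, "2MB": 642, "4MB": 629, "8MB": 630,
        "16MB": 632, "32MB": 631, "64MB": 632, "128MB": 633, "256MB": 634,
        "512MB": 634, "1GB": 633}
WRITE = {"512kB": 3947, "1MB": 2097, "2MB": 2154, "4MB": 2114, "8MB": 2082,
         "16MB": 2084, "32MB": 2085, "64MB": 2093, "128MB": 2086, "256MB": 2089,
         "512MB": 2087, "1GB": 2088}


def size_to_bytes(label):
    """Convert a buffer size label such as '512kB' to bytes."""
    number, unit = re.fullmatch(r"(\d+)(B|kB|MB|GB)", label).groups()
    return int(number) * UNITS[unit]


def plateau_mean(table, threshold_bytes=L2_CACHE_BYTES):
    """Average bandwidth over the buffer sizes larger than the threshold."""
    return mean(bw for label, bw in table.items()
                if size_to_bytes(label) > threshold_bytes)


for name, table in (("read", READ), ("write", WRITE)):
    bw = plateau_mean(table)
    print(f"{name}: {bw:.1f} MB/s, {100 * bw / PEAK_MIB_S:.1f} % of the assumed peak")
```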

For more information regarding LMbench, please see this page.

pmbw

As defined by the author, pmbw is "a set of assembler routines to measure the parallel memory (cache and RAM) bandwidth of modern multi-core machines." It performs a myriad of tests. Luckily, it comes with a handy tool that plots the results, which are stored in a text file, in a series of charts.

The complete results and the charts are available at the following links:

Generally speaking, the charts exhibit significant drops in performance when the array size is around the L1 and the L2 cache sizes (32 kB and 1 MB, respectively).

For more details about pmbw, please refer to this page.

Useful links

Appendix A: Detailed testing procedures

This section details how the benchmarks were configured and run on the testbed.

STREAM

Building

To build STREAM:

  • clone its git repository
  • modify the Makefile as shown below
  • issue the make command.
git clone https://github.com/jeffhammond/STREAM.git
make
armbian@Mito8M:~/devel/STREAM$ cat Makefile 
CC = gcc
CFLAGS = -O2 -fopenmp

FC = gfortran-4.9
FFLAGS = -O2 -fopenmp

all: stream_c.exe

stream_f.exe: stream.f mysecond.o
	$(CC) $(CFLAGS) -c mysecond.c
	$(FC) $(FFLAGS) -c stream.f
	$(FC) $(FFLAGS) stream.o mysecond.o -o stream_f.exe

stream_c.exe: stream.c
	$(CC) $(CFLAGS) stream.c -o stream_c.exe

clean:
	rm -f stream_f.exe stream_c.exe *.o

# an example of a more complex build line for the Intel icc compiler
stream.icc: stream.c
	icc -O3 -xCORE-AVX2 -ffreestanding -qopenmp -DSTREAM_ARRAY_SIZE=80000000 -DNTIMES=20 stream.c -o stream.omp.AVX2.80M.20x.icc

Running the tests

armbian@sbcx:~/devel/stream/STREAM$ ./stream_c.exe 
-------------------------------------------------------------
STREAM version $Revision: 5.10 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------
Array size = 10000000 (elements), Offset = 0 (elements)
Memory per array = 76.3 MiB (= 0.1 GiB).
Total memory required = 228.9 MiB (= 0.2 GiB).
Each kernel will be executed 10 times.
 The *best* time for each kernel (excluding the first iteration)
 will be used to compute the reported bandwidth.
-------------------------------------------------------------
Number of Threads requested = 4
Number of Threads counted = 4
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 153104 microseconds.
   (= 153104 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:            1139.8     0.141910     0.140376     0.144318
Scale:           1124.8     0.143720     0.142245     0.144615
Add:             1185.1     0.204792     0.202517     0.206718
Triad:           1214.4     0.201321     0.197631     0.202673
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------
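
The relationship between the "Min time" and "Best Rate" columns of the log above can be verified with a few lines of arithmetic. The sketch below relies on STREAM's byte-counting convention, in which Copy and Scale touch two arrays per iteration while Add and Triad touch three, and on the array size and element size printed in the log. The variable and function names are illustrative.

```python
ARRAY_ELEMENTS = 10_000_000   # "Array size" from the log
BYTES_PER_ELEMENT = 8         # "This system uses 8 bytes per array element."

# Number of arrays read or written by each kernel (STREAM convention).
ARRAYS_TOUCHED = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}

# "Min time" column from the log [s].
MIN_TIME_S = {"Copy": 0.140376, "Scale": 0.142245, "Add": 0.202517, "Triad": 0.197631}


def best_rate_mb_s(kernel):
    """Bytes moved by one kernel iteration divided by its shortest run time."""
    bytes_moved = ARRAYS_TOUCHED[kernel] * ARRAY_ELEMENTS * BYTES_PER_ELEMENT
    return bytes_moved / MIN_TIME_S[kernel] / 1e6


# Memory footprint of one array, matching the 76.3 MiB reported in the log.
print(f"Memory per array: {ARRAY_ELEMENTS * BYTES_PER_ELEMENT / 2**20:.1f} MiB")
for kernel in ARRAYS_TOUCHED:
    print(f"{kernel:<6} {best_rate_mb_s(kernel):7.1f} MB/s")
```

The computed rates agree with the "Best Rate MB/s" column to within rounding, which also shows that STREAM counts one MB as 10^6 bytes.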

LMbench

To run this benchmark, the native prebuilt package provided by Debian Buster was used.

armbian@sbcx:~/devel/stream/STREAM$ sudo lmbench-run 
=====================================================================

                L M B E N C H   C ON F I G U R A T I O N
                ----------------------------------------

You need to configure some parameters to lmbench.  Once you have configured
these parameters, you may do multiple runs by saying

        "make rerun"

in the src subdirectory.

NOTICE: please do not have any other activity on the system if you can
help it.  Things like the second hand on your xclock or X perfmeters
are not so good when benchmarking.  In fact, X is not so good when
benchmarking.

=====================================================================

If you are running on an MP machine and you want to try running
multiple copies of lmbench in parallel, you can specify how many here.

Using this option will make the benchmark run 100x slower (sorry).

NOTE:  WARNING! This feature is experimental and many results are 
        known to be incorrect or random!

MULTIPLE COPIES [default 1]: 
=====================================================================

Options to control job placement
1) Allow scheduler to place jobs
2) Assign each benchmark process with any attendent child processes
   to its own processor
3) Assign each benchmark process with any attendent child processes
   to its own processor, except that it will be as far as possible
   from other processes
4) Assign each benchmark and attendent processes to their own
   processors
5) Assign each benchmark and attendent processes to their own
   processors, except that they will be as far as possible from
   each other and other processes
6) Custom placement: you assign each benchmark process with attendent
   child processes to processors
7) Custom placement: you assign each benchmark and attendent
   processes to processors

Note: some benchmarks, such as bw_pipe, create attendent child
processes for each benchmark process.  For example, bw_pipe
needs a second process to send data down the pipe to be read
by the benchmark process.  If you have three copies of the
benchmark process running, then you actually have six processes;
three attendent child processes sending data down the pipes and 
three benchmark processes reading data and doing the measurements.

Job placement selection [default 1]: 
=====================================================================

Hang on, we are calculating your timing granularity.
OK, it looks like you can time stuff down to 5000 usec resolution.

Hang on, we are calculating your timing overhead.
OK, it looks like your gettimeofday() costs 0 usecs.

Hang on, we are calculating your loop overhead.
OK, it looks like your benchmark loop costs 0.00000055 usecs.

=====================================================================

Several benchmarks operate on a range of memory.  This memory should be
sized such that it is at least 4 times as big as the external cache[s]
on your system.   It should be no more than 80% of your physical memory.

The bigger the range, the more accurate the results, but larger sizes
take somewhat longer to run the benchmark.

MB [default 1186]: 1024
Checking to see if you have 1024 MB; please wait for a moment...
1024MB OK
1024MB OK
1024MB OK
Hang on, we are calculating your cache line size.
OK, it looks like your cache line is 32 bytes.

=====================================================================

lmbench measures a wide variety of system performance, and the full suite
of benchmarks can take a long time on some platforms.  Consequently, we
offer the capability to run only predefined subsets of benchmarks, one
for operating system specific benchmarks and one for hardware specific
benchmarks.  We also offer the option of running only selected benchmarks
which is useful during operating system development.

Please remember that if you intend to publish the results you either need
to do a full run or one of the predefined OS or hardware subsets.

SUBSET (ALL|HARWARE|OS|DEVELOPMENT) [default all]: 
=====================================================================

This benchmark measures, by default, memory latency for a number of
different strides.  That can take a long time and is most useful if you
are trying to figure out your cache line size or if your cache line size
is greater than 128 bytes.

If you are planning on sending in these results, please don't do a fast
run.

Answering yes means that we measure memory latency with a 128 byte stride.  

FASTMEM [default no]: 
=====================================================================

This benchmark measures, by default, file system latency.  That can
take a long time on systems with old style file systems (i.e., UFS,
FFS, etc.).  Linux' ext2fs and Sun's tmpfs are fast enough that this
test is not painful.

If you are planning on sending in these results, please don't do a fast
run.

If you want to skip the file system latency tests, answer "yes" below.

SLOWFS [default no]: yes
=====================================================================

This benchmark can measure disk zone bandwidths and seek times.  These can
be turned into whizzy graphs that pretty much tell you everything you might
need to know about the performance of your disk.  

This takes a while and requires read access to a disk drive.  
Write is not measured, see disk.c to see how if you want to do so.

If you want to skip the disk tests, hit return below.

If you want to include disk tests, then specify the path to the disk
device, such as /dev/sda.  For each disk that is readable, you'll be
prompted for a one line description of the drive, i.e., 

        Iomega IDE ZIP
or
        HP C3725S 2GB on 10MB/sec NCR SCSI bus

DISKS [default none]: 
=====================================================================

If you are running on an idle network and there are other, identically
configured systems, on the same wire (no gateway between you and them),
and you have rsh access to them, then you should run the network part
of the benchmarks to them.  Please specify any such systems as a space
separated list such as: ether-host fddi-host hippi-host.

REMOTE [default none]: 
=====================================================================

Calculating mhz, please wait for a moment...
I think your CPU mhz is 

        1992 MHz, 0.5020 nanosec clock

but I am frequently wrong.  If that is the wrong Mhz, type in your
best guess as to your processor speed.  It doesn't have to be exact,
but if you know it is around 800, say 800.  

Please note that some processors, such as the P4, have a core which
is double-clocked, so on those processors the reported clock speed
will be roughly double the advertised clock rate.  For example, a
1.8GHz P4 may be reported as a 3592MHz processor.

Processor mhz [default 1992 MHz, 0.5020 nanosec clock]: 996
=====================================================================

We need a place to store a 1024 Mbyte file as well as create and delete a
large number of small files.  We default to /var/tmp.  If /var/tmp is a
memory resident file system (i.e., tmpfs), pick a different place.
Please specify a directory that has enough space and is a local file
system.

FSDIR [default /var/tmp/lmbench]: /home/armbian/devel/stream/lmbench
=====================================================================

lmbench outputs status information as it runs various benchmarks.
By default this output is sent to /dev/tty, but you may redirect
it to any file you wish (such as /dev/null...).

Status output file [default /dev/tty]: 
=====================================================================

There is a database of benchmark results that is shipped with new
releases of lmbench.  Your results can be included in the database
if you wish.  The more results the better, especially if they include
remote networking.  If your results are interesting, i.e., for a new
fast box, they may be made available on the lmbench web page, which is

        http://www.bitmover.com/lmbench

Mail results [default yes]: no
OK, no results mailed.
=====================================================================

Confguration done, thanks.

There is a mailing list for discussing lmbench hosted at BitMover. 
Send mail to majordomo@bitmover.com to join the list.

Using config in CONFIG.sbcx
Thu Jan 24 09:27:32 CET 2020
Latency measurements
Thu Jan 24 09:28:38 CET 2020
Local networking
Fri Jan 24 09:29:38 CET 2020
Bandwidth measurements
Fri Jan 24 09:41:01 CET 2020
Calculating context switch overhead
Fri Jan 24 09:41:06 CET 2020
Calculating effective TLB size
Fri Jan 24 09:41:08 CET 2020
Calculating memory load parallelism
Fri Jan 24 09:53:02 CET 2020
McCalpin's STREAM benchmark
Fri Jan 24 09:54:55 CET 2020
Calculating memory load latency
Fri Jan 24 10:28:12 CET 2020
Benchmark run finished....
Remember you can find the results of the benchmark 
under /var/lib/lmbench/results

pmbw

Building

Building pmbw is straightforward. The procedure is shown below.

armbian@sbcx:~/devel/pmbw$ git clone https://github.com/bingmann/pmbw.git
Cloning into 'pmbw'...
remote: Enumerating objects: 15, done.
remote: Counting objects: 100% (15/15), done.
remote: Compressing objects: 100% (15/15), done.
remote: Total 386 (delta 1), reused 3 (delta 0), pack-reused 371
Receiving objects: 100% (386/386), 369.04 KiB | 1.23 MiB/s, done.
Resolving deltas: 100% (232/232), done.
armbian@sbcx:~/devel/pmbw$ cd pmbw/
armbian@sbcx:~/devel/pmbw/pmbw$ ./configure && make
checking build system type... armv7l-unknown-linux-gnueabihf
checking host system type... armv7l-unknown-linux-gnueabihf
checking target system type... armv7l-unknown-linux-gnueabihf
checking for a BSD-compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
checking for a thread-safe mkdir -p... /bin/mkdir -p
checking for gawk... no
checking for mawk... mawk
checking whether make sets $(MAKE)... yes
checking whether make supports nested variables... yes
checking whether to enable maintainer-specific portions of Makefiles... no
checking building for Windows... no
checking for g++... g++
checking whether the C++ compiler works... yes
checking for C++ compiler default output file name... a.out
checking for suffix of executables... 
checking whether we are cross compiling... no
checking for suffix of object files... o
checking whether we are using the GNU C++ compiler... yes
checking whether g++ accepts -g... yes
checking whether make supports the include directive... yes (GNU style)
checking dependency style of g++... gcc3
checking whether g++ supports -march=x86-64... no
checking for pthread_mutex_init in -lpthread... yes
checking for clock_gettime in -lrt... yes
checking for posix_memalign in -lc... yes
checking that generated files are newer than configure... done
configure: creating ./config.status
config.status: creating Makefile
config.status: executing depfiles commands
g++ -DPACKAGE_NAME=\"pmbw\" -DPACKAGE_TARNAME=\"pmbw\" -DPACKAGE_VERSION=\"0.6.3\" -DPACKAGE_STRING=\"pmbw\ 0.6.3\" -DPACKAGE_BUGREPORT=\"\" -DPACKAGE_URL=\"\" -DPACKAGE=\"pmbw\" -DVERSION=\"0.6.3\" -DON_WINDOWS=false -DHAVE_POSIX_MEMALIGN=1 -I.    -W -Wall -g -O2 -MT pmbw.o -MD -MP -MF .deps/pmbw.Tpo -c -o pmbw.o pmbw.cc
mv -f .deps/pmbw.Tpo .deps/pmbw.Po
g++ -W -Wall -g -O2   -o pmbw pmbw.o  -lpthread -lrt
g++ -DPACKAGE_NAME=\"pmbw\" -DPACKAGE_TARNAME=\"pmbw\" -DPACKAGE_VERSION=\"0.6.3\" -DPACKAGE_STRING=\"pmbw\ 0.6.3\" -DPACKAGE_BUGREPORT=\"\" -DPACKAGE_URL=\"\" -DPACKAGE=\"pmbw\" -DVERSION=\"0.6.3\" -DON_WINDOWS=false -DHAVE_POSIX_MEMALIGN=1 -I.    -W -Wall -g -O2 -MT stats2gnuplot.o -MD -MP -MF .deps/stats2gnuplot.Tpo -c -o stats2gnuplot.o stats2gnuplot.cc
In file included from /usr/include/c++/8/vector:69,
                 from stats2gnuplot.cc:34:
/usr/include/c++/8/bits/vector.tcc: In member function 'void std::vector<_Tp, _Alloc>::_M_realloc_insert(std::vector<_Tp, _Alloc>::iterator, _Args&& ...) [with _Args = {const Result&}; _Tp = Result; _Alloc = std::allocator<Result>]':
/usr/include/c++/8/bits/vector.tcc:413:7: note: parameter passing for argument of type 'std::vector<Result>::iterator' {aka '__gnu_cxx::__normal_iterator<Result*, std::vector<Result> >'} changed in GCC 7.1
       vector<_Tp, _Alloc>::
       ^~~~~~~~~~~~~~~~~~
In file included from /usr/include/c++/8/vector:64,
                 from stats2gnuplot.cc:34:
/usr/include/c++/8/bits/stl_vector.h: In function 'bool process_line(const string&)':
/usr/include/c++/8/bits/stl_vector.h:1085:4: note: parameter passing for argument of type '__gnu_cxx::__normal_iterator<Result*, std::vector<Result> >' changed in GCC 7.1
    _M_realloc_insert(end(), __x);
    ^~~~~~~~~~~~~~~~
In file included from /usr/include/c++/8/algorithm:62,
                 from stats2gnuplot.cc:33:
/usr/include/c++/8/bits/stl_algo.h: In function 'void std::__unguarded_linear_insert(_RandomAccessIterator, _Compare) [with _RandomAccessIterator = __gnu_cxx::__normal_iterator<Result*, std::vector<Result> >; _Compare = __gnu_cxx::__ops::_Val_less_iter]':
/usr/include/c++/8/bits/stl_algo.h:1821:5: note: parameter passing for argument of type '__gnu_cxx::__normal_iterator<Result*, std::vector<Result> >' changed in GCC 7.1
     __unguarded_linear_insert(_RandomAccessIterator __last,
     ^~~~~~~~~~~~~~~~~~~~~~~~
/usr/include/c++/8/bits/stl_algo.h: In function 'void std::__insertion_sort(_RandomAccessIterator, _RandomAccessIterator, _Compare) [with _RandomAccessIterator = __gnu_cxx::__normal_iterator<Result*, std::vector<Result> >; _Compare = __gnu_cxx::__ops::_Iter_less_iter]':
/usr/include/c++/8/bits/stl_algo.h:1840:5: note: parameter passing for argument of type '__gnu_cxx::__normal_iterator<Result*, std::vector<Result> >' changed in GCC 7.1
     __insertion_sort(_RandomAccessIterator __first,
     ^~~~~~~~~~~~~~~
/usr/include/c++/8/bits/stl_algo.h:1840:5: note: parameter passing for argument of type '__gnu_cxx::__normal_iterator<Result*, std::vector<Result> >' changed in GCC 7.1
In file included from /usr/include/c++/8/bits/stl_algo.h:61,
                 from /usr/include/c++/8/algorithm:62,
                 from stats2gnuplot.cc:33:
/usr/include/c++/8/bits/stl_heap.h: In function 'void std::__adjust_heap(_RandomAccessIterator, _Distance, _Distance, _Tp, _Compare) [with _RandomAccessIterator = __gnu_cxx::__normal_iterator<Result*, std::vector<Result> >; _Distance = int; _Tp = Result; _Compare = __gnu_cxx::__ops::_Iter_less_iter]':
/usr/include/c++/8/bits/stl_heap.h:214:5: note: parameter passing for argument of type '__gnu_cxx::__normal_iterator<Result*, std::vector<Result> >' changed in GCC 7.1
     __adjust_heap(_RandomAccessIterator __first, _Distance __holeIndex,
     ^~~~~~~~~~~~
/usr/include/c++/8/bits/stl_heap.h: In function 'void std::__make_heap(_RandomAccessIterator, _RandomAccessIterator, _Compare&) [with _RandomAccessIterator = __gnu_cxx::__normal_iterator<Result*, std::vector<Result> >; _Compare = __gnu_cxx::__ops::_Iter_less_iter]':
/usr/include/c++/8/bits/stl_heap.h:326:5: note: parameter passing for argument of type '__gnu_cxx::__normal_iterator<Result*, std::vector<Result> >' changed in GCC 7.1
     __make_heap(_RandomAccessIterator __first, _RandomAccessIterator __last,
     ^~~~~~~~~~
/usr/include/c++/8/bits/stl_heap.h:326:5: note: parameter passing for argument of type '__gnu_cxx::__normal_iterator<Result*, std::vector<Result> >' changed in GCC 7.1
/usr/include/c++/8/bits/stl_heap.h: In function 'void std::__pop_heap(_RandomAccessIterator, _RandomAccessIterator, _RandomAccessIterator, _Compare&) [with _RandomAccessIterator = __gnu_cxx::__normal_iterator<Result*, std::vector<Result> >; _Compare = __gnu_cxx::__ops::_Iter_less_iter]':
/usr/include/c++/8/bits/stl_heap.h:243:5: note: parameter passing for argument of type '__gnu_cxx::__normal_iterator<Result*, std::vector<Result> >' changed in GCC 7.1
     __pop_heap(_RandomAccessIterator __first, _RandomAccessIterator __last,
     ^~~~~~~~~
/usr/include/c++/8/bits/stl_heap.h:243:5: note: parameter passing for argument of type '__gnu_cxx::__normal_iterator<Result*, std::vector<Result> >' changed in GCC 7.1
/usr/include/c++/8/bits/stl_heap.h:243:5: note: parameter passing for argument of type '__gnu_cxx::__normal_iterator<Result*, std::vector<Result> >' changed in GCC 7.1
In file included from /usr/include/c++/8/algorithm:62,
                 from stats2gnuplot.cc:33:
/usr/include/c++/8/bits/stl_algo.h: In function 'void std::__introsort_loop(_RandomAccessIterator, _RandomAccessIterator, _Size, _Compare) [with _RandomAccessIterator = __gnu_cxx::__normal_iterator<Result*, std::vector<Result> >; _Size = int; _Compare = __gnu_cxx::__ops::_Iter_less_iter]':
/usr/include/c++/8/bits/stl_algo.h:1940:5: note: parameter passing for argument of type '__gnu_cxx::__normal_iterator<Result*, std::vector<Result> >' changed in GCC 7.1
     __introsort_loop(_RandomAccessIterator __first,
     ^~~~~~~~~~~~~~~
/usr/include/c++/8/bits/stl_algo.h:1940:5: note: parameter passing for argument of type '__gnu_cxx::__normal_iterator<Result*, std::vector<Result> >' changed in GCC 7.1
In file included from /usr/include/c++/8/bits/stl_algo.h:61,
                 from /usr/include/c++/8/algorithm:62,
                 from stats2gnuplot.cc:33:
/usr/include/c++/8/bits/stl_heap.h:408:19: note: parameter passing for argument of type '__gnu_cxx::__normal_iterator<Result*, std::vector<Result> >' changed in GCC 7.1
    std::__pop_heap(__first, __last, __last, __comp);
    09:46, 24 January 2020 (UTC)09:46, 24 January 2020 (UTC)09:46, 24 January 2020 (UTC)^09:46, 24 January 2020 (UTC)09:46, 24 January 2020 (UTC)09:46, 24 January 2020 (UTC)09:46, 24 January 2020 (UTC)09:46, 24 January 2020 (UTC)09:46, 24 January 2020 (UTC)~~
In file included from /usr/include/c++/8/algorithm:62,
                 from stats2gnuplot.cc:33:
/usr/include/c++/8/bits/stl_algo.h:1954:25: note: parameter passing for argument of type '__gnu_cxx::__normal_iterator<Result*, std::vector<Result> >' changed in GCC 7.1
    std::__introsort_loop(__cut, __last, __depth_limit, __comp);
    09:46, 24 January 2020 (UTC)09:46, 24 January 2020 (UTC)09:46, 24 January 2020 (UTC)09:46, 24 January 2020 (UTC)~^09:46, 24 January 2020 (UTC)09:46, 24 January 2020 (UTC)09:46, 24 January 2020 (UTC)09:46, 24 January 2020 (UTC)09:46, 24 January 2020 (UTC)09:46, 24 January 2020 (UTC)09:46, 24 January 2020 (UTC)~~
/usr/include/c++/8/bits/stl_algo.h:1672:23: note: parameter passing for argument of type '__gnu_cxx::__normal_iterator<Result*, std::vector<Result> >' changed in GCC 7.1
       std::__make_heap(__first, __middle, __comp);
       09:46, 24 January 2020 (UTC)09:46, 24 January 2020 (UTC)09:46, 24 January 2020 (UTC)~^09:46, 24 January 2020 (UTC)09:46, 24 January 2020 (UTC)09:46, 24 January 2020 (UTC)09:46, 24 January 2020 (UTC)09:46, 24 January 2020 (UTC)~
/usr/include/c++/8/bits/stl_algo.h: In function 'int main(int, char**)':
/usr/include/c++/8/bits/stl_algo.h:1968:25: note: parameter passing for argument of type '__gnu_cxx::__normal_iterator<Result*, std::vector<Result> >' changed in GCC 7.1
    std::__introsort_loop(__first, __last,
    09:46, 24 January 2020 (UTC)09:46, 24 January 2020 (UTC)09:46, 24 January 2020 (UTC)09:46, 24 January 2020 (UTC)~^09:46, 24 January 2020 (UTC)09:46, 24 January 2020 (UTC)09:46, 24 January 2020 (UTC)~
     std::__lg(__last - __first) * 2,
     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
     __comp);
     ~~~~~~~
/usr/include/c++/8/bits/stl_algo.h:1885:25: note: parameter passing for argument of type '__gnu_cxx::__normal_iterator<Result*, std::vector<Result> >' changed in GCC 7.1
    std::__insertion_sort(__first, __first + int(_S_threshold), __comp);
    09:46, 24 January 2020 (UTC)09:46, 24 January 2020 (UTC)09:46, 24 January 2020 (UTC)09:46, 24 January 2020 (UTC)~^09:46, 24 January 2020 (UTC)09:46, 24 January 2020 (UTC)09:46, 24 January 2020 (UTC)09:46, 24 January 2020 (UTC)09:46, 24 January 2020 (UTC)09:46, 24 January 2020 (UTC)09:46, 24 January 2020 (UTC)09:46, 24 January 2020 (UTC)09:46, 24 January 2020 (UTC)
/usr/include/c++/8/bits/stl_algo.h:1890:23: note: parameter passing for argument of type '__gnu_cxx::__normal_iterator<Result*, std::vector<Result> >' changed in GCC 7.1
  std::__insertion_sort(__first, __last, __comp);
  09:46, 24 January 2020 (UTC)09:46, 24 January 2020 (UTC)09:46, 24 January 2020 (UTC)09:46, 24 January 2020 (UTC)~^09:46, 24 January 2020 (UTC)09:46, 24 January 2020 (UTC)09:46, 24 January 2020 (UTC)09:46, 24 January 2020 (UTC)[[User:U0001|U0001]] ([[User talk:U0001|talk]]) 09:46, 24 January 2020 (UTC)
mv -f .deps/stats2gnuplot.Tpo .deps/stats2gnuplot.Po
g++ -W -Wall -g -O2   -o stats2gnuplot stats2gnuplot.o  -lpthread -lrt

Running the tests

The benchmark was run as follows:

armbian@sbcx:~/devel/pmbw/pmbw$ sudo nice -n -2 ./pmbw -S 0
Running benchmarks with no upper array size limit.
Detected 1695 MiB physical RAM and 4 CPUs. 

Allocating 1024 MiB for testing.
Running nthreads=1 factor=1073741824 areasize=1024 thrsize=1024 testsize=1024 repeats=1048576 testvol=1073741824 testaccess=268435456
run time = 0.424801 -> rerunning test with repeat factor=3791455289
Running nthreads=1 factor=3791455289 areasize=1024 thrsize=1024 testsize=1024 repeats=3702594 testvol=3791456256 testaccess=947864064
...
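
The figures in the log excerpt above can be cross-checked with a little arithmetic. The following Python sketch is only an illustration: the 1.5 s target run time and the 4 bytes per access are assumptions inferred from the two log lines shown, not values taken from the pmbw documentation, and the function names are hypothetical.

```python
# Cross-check of the numbers printed by pmbw in the excerpt above.
# TARGET_SECONDS and BYTES_PER_ACCESS are assumptions inferred from the log,
# not documented pmbw constants.
TARGET_SECONDS = 1.5
BYTES_PER_ACCESS = 4


def derive_fields(factor, testsize):
    """Return (repeats, testvol, testaccess) for a given factor and testsize."""
    repeats = -(-factor // testsize)          # integer ceiling division
    testvol = repeats * testsize              # bytes touched by the test
    testaccess = testvol // BYTES_PER_ACCESS  # number of memory accesses
    return repeats, testvol, testaccess


def rescaled_factor(factor, run_time, target=TARGET_SECONDS):
    """Scale the factor so that the next run lasts roughly `target` seconds."""
    return factor * target / run_time


# First run in the log: factor=1073741824, testsize=1024
print(derive_fields(1073741824, 1024))  # (1048576, 1073741824, 268435456)

# Second run in the log: factor=3791455289, testsize=1024
print(derive_fields(3791455289, 1024))  # (3702594, 3791456256, 947864064)

# Factor suggested after a 0.424801 s run; close to the logged 3791455289,
# the small difference being due to the rounded run time in the log.
print(rescaled_factor(1073741824, 0.424801))
```

Under these assumptions, the derived values match the `repeats`, `testvol` and `testaccess` fields of both logged runs.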


To generate the charts plotting the results, the following command was issued:

./stats2gnuplot stats.txt | gnuplot
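
For a quick look at the results without generating the charts, the statistics file can also be inspected with a short script. The sketch below assumes that each result in stats.txt is a line starting with `RESULT` followed by tab-separated `key=value` pairs, including `funcname` and `bandwidth`; this layout is an assumption and should be checked against the actual file. The sample lines and their values are synthetic and serve only to exercise the code.

```python
# Minimal sketch: report the highest bandwidth per test function.
# Assumed line layout (to be verified against the real stats.txt):
#   RESULT<TAB>key=value<TAB>key=value...


def parse_result_line(line):
    """Return a dict of the key=value fields, or None for non-RESULT lines."""
    if not line.startswith("RESULT\t"):
        return None
    fields = {}
    for token in line.rstrip("\n").split("\t")[1:]:
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


def peak_bandwidth(lines):
    """Map each funcname to the largest bandwidth value seen for it."""
    best = {}
    for line in lines:
        record = parse_result_line(line)
        if record is None or "bandwidth" not in record:
            continue
        name = record.get("funcname", "?")
        value = float(record["bandwidth"])
        best[name] = max(value, best.get(name, 0.0))
    return best


# Synthetic sample lines (hypothetical values, not measured results).
SAMPLE = [
    "RESULT\tfuncname=ScanRead32PtrSimpleLoop\tareasize=1024\tbandwidth=1000.0\n",
    "RESULT\tfuncname=ScanRead32PtrSimpleLoop\tareasize=2048\tbandwidth=2500.0\n",
    "some other line\n",
]

for name, value in sorted(peak_bandwidth(SAMPLE).items()):
    print(name, value)
```

To process real measurements, replace `SAMPLE` with the lines read from the file, for example `open("stats.txt")`.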