{{InfoBoxTop}}
{{AppliesToSBCX}}
{{AppliesToAxel}}
{{AppliesToAxelEsatta}}
{{AppliesToAxelLite}}
{{AppliesToAXEL Lite TN}}
{{InfoBoxBottom}}
{{WarningMessage|text=This technical note was validated against specific versions of hardware and software. What is described here may not work with other versions.}}
{| class="wikitable"
!Axel Lite
|-
| rowspan="25" |SoC
|SoC
|NXP i.MX6Q
|-
|ARM core frequency
[MHz]
|996
|-
|L1 cache (D)
[kB]
|32
|-
|L1 cache (I)
[kB]
|32
|-
|L2 cache
[MB]
|1
|-
| rowspan="6" |SDRAM
|Frequency
[MHz]
|533(*)
|-
|Bus width
|2048
|}
 
 
(*) It is worth remembering that the i.MX6DualLite/Solo can achieve better results in terms of memory bandwidth, even though its SDRAM bus frequency is lower (400 MHz). This is due to an erratum of the ARM PL310 L2 cache controller. This bug is not present in the i.MX6DualLite/Solo SoCs, which integrate a newer version of the controller.
===Software configuration===
Some benchmarks were built natively on the platform under test. For the sake of completeness, the version of the GCC compiler is indicated as well:
<pre class="board-terminal">
armbian@sbcx:~/devel/stream/lmbench$ gcc -v
gcc version 8.3.0 (Debian 8.3.0-6)
</pre>
 
===Benchmarks===
====STREAM====
TBD
====LMbench====
TBD
====pmbw====
TBD
==Overall results==
This section illustrates the overall results that were achieved by the different benchmarks.
===STREAM===
{| class="wikitable"
|+Overall results
!Function
!Best rate [MB/s]
!Efficiency [%]
|-
|Copy
|1139.8
|14.0
|-
|Scale
|1124.8
|13.8
|-
|Add
|1185.1
|14.6
|-
|Triad
|1214.4
|14.9
|}
 
As expected, the efficiency is relatively low. Generally, 32-bit ARM architectures are known to have mediocre performance when it comes to memory bandwidth.
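The efficiency figures in the STREAM table can be reproduced with a short calculation. The following Python sketch rests on two assumptions that are not stated in this note: the SDRAM data bus is 64 bits wide with two transfers per clock (DDR), and the theoretical peak is expressed in units of 2^20 bytes per second. With these assumptions the computed percentages match the table. The function and constant names are illustrative only.

```python
# Assumptions (not stated in this note): 64-bit DDR data bus,
# theoretical peak converted to units of 2**20 bytes per second.
SDRAM_CLOCK_HZ = 533e6       # SDRAM frequency from the hardware table
BUS_WIDTH_BYTES = 8          # assumed 64-bit data bus
TRANSFERS_PER_CLOCK = 2      # assumed double data rate


def peak_bandwidth():
    """Theoretical peak bandwidth in units of 2**20 bytes per second."""
    return SDRAM_CLOCK_HZ * TRANSFERS_PER_CLOCK * BUS_WIDTH_BYTES / 2**20


def efficiency(rate):
    """Measured rate as a percentage of the theoretical peak."""
    return 100.0 * rate / peak_bandwidth()


# Best rates reported by STREAM in the table above
stream_rates = {"Copy": 1139.8, "Scale": 1124.8, "Add": 1185.1, "Triad": 1214.4}

if __name__ == "__main__":
    for name, rate in stream_rates.items():
        print(f"{name:5s} {rate:7.1f}  {efficiency(rate):4.1f} %")
```

If the bus width or the unit convention differs on a given board, only the constants at the top need to change.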
 
Please see [https://www.cs.virginia.edu/stream/ this page] for more details about the STREAM benchmark.
 
===LMbench===
Regarding memory bandwidth, LMbench provides many results organized in different categories. For the sake of simplicity, the following tables detail just a couple of categories. The full results are available for download [http://mirror.dave.eu/axel/SBCX-TN-006/lmbench-axellite-i.MX6Q-996MHz.txt here].

{| class="wikitable"
|+Memory read bandwidth
!Buffer size
!Bandwidth [MB/s]
|-
|512B
|2861
|-
|1kB
|3047
|-
|2kB
|3065
|-
|4kB
|3077
|-
|8kB
|3081
|-
|16kB
|3072
|-
|32kB
|1309
|-
|64kB
|902
|-
|128kB
|787
|-
|256kB
|775
|-
|512kB
|749
|-
|1MB
|687
|-
|2MB
|642
|-
|4MB
|629
|-
|8MB
|630
|-
|16MB
|632
|-
|32MB
|631
|-
|64MB
|632
|-
|128MB
|633
|-
|256MB
|634
|-
|512MB
|634
|-
|1GB
|633
|}

{| class="wikitable"
|+Memory write bandwidth
!Buffer size
!Bandwidth [MB/s]
|-
|512B
|3724
|-
|1kB
|3848
|-
|2kB
|3902
|-
|4kB
|3940
|-
|8kB
|3958
|-
|16kB
|3957
|-
|32kB
|3964
|-
|64kB
|3967
|-
|128kB
|3967
|-
|256kB
|3956
|-
|512kB
|3947
|-
|1MB
|2097
|-
|2MB
|2154
|-
|4MB
|2114
|-
|8MB
|2082
|-
|16MB
|2084
|-
|32MB
|2085
|-
|64MB
|2093
|-
|128MB
|2086
|-
|256MB
|2089
|-
|512MB
|2087
|-
|1GB
|2088
|}

The most interesting results are those for buffer sizes exceeding 1 MB, which is the size of the L2 cache. Approximately, read bandwidth is 630 MB/s (7.8% efficiency), while write bandwidth is 2080 MB/s (25.7% efficiency). These numbers differ significantly from the ones provided by STREAM. This confirms once again that such results depend strongly on the implementation of the test used to determine the bandwidth.

For more information regarding LMbench, please see [http://lmbench.sourceforge.net/ this page].
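As a rough illustration of how the LMbench figures relate to the cache hierarchy, the sketch below takes the read and write bandwidth values reported above, averages them over the buffers larger than the 1 MB L2 cache, and locates the buffer size at which the bandwidth falls most sharply. The helper names are hypothetical and the analysis is only one possible way to summarize the data.

```python
# Bandwidth figures [MB/s] copied from the LMbench results above.
# Buffer sizes double from 512 B up to 1 GB (22 entries).
SIZES = [512 * 2**i for i in range(22)]
READ = [2861, 3047, 3065, 3077, 3081, 3072, 1309, 902, 787, 775, 749,
        687, 642, 629, 630, 632, 631, 632, 633, 634, 634, 633]
WRITE = [3724, 3848, 3902, 3940, 3958, 3957, 3964, 3967, 3967, 3956, 3947,
         2097, 2154, 2114, 2082, 2084, 2085, 2093, 2086, 2089, 2087, 2088]

L2_SIZE = 1024**2  # 1 MB L2 cache, as listed in the hardware table


def mean_beyond(sizes, values, threshold):
    """Average bandwidth over buffers strictly larger than threshold bytes."""
    picked = [v for s, v in zip(sizes, values) if s > threshold]
    return sum(picked) / len(picked)


def steepest_drop(sizes, values):
    """Buffer size at which bandwidth falls most relative to the previous size."""
    ratios = [(values[i] / values[i - 1], sizes[i]) for i in range(1, len(values))]
    return min(ratios)[1]


if __name__ == "__main__":
    print("read  mean beyond L2:", mean_beyond(SIZES, READ, L2_SIZE))
    print("write mean beyond L2:", mean_beyond(SIZES, WRITE, L2_SIZE))
    print("read  steepest drop at", steepest_drop(SIZES, READ), "bytes")
    print("write steepest drop at", steepest_drop(SIZES, WRITE), "bytes")
```

With these figures, the sharpest read drop occurs at 32 kB, which matches the L1 data cache size, and the sharpest write drop occurs at 1 MB, which matches the L2 cache size.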
===pmbw===
As defined by the author, <code>pmbw</code> is "a set of assembler routines to measure the parallel memory (cache and RAM) bandwidth of modern multi-core machines." It performs a large number of tests. Conveniently, it comes with a handy tool that plots the results, which are stored in a text file, as a series of charts.

The complete results and the charts are available at the following links:
*http://mirror.dave.eu/axel/SBCX-TN-006/pmbw-stats-AxelLite-i.MX6Q-996MHz.txt
*http://mirror.dave.eu/axel/SBCX-TN-006/pmbw-plots-AxelLite-i.MX6Q-996MHz.pdf

Generally speaking, the charts exhibit significant drops in performance when the array size is around the L1 and the L2 cache sizes.

For more details about <code>pmbw</code>, please refer to [https://panthema.net/2013/pmbw/ this page].
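For readers who prefer to inspect the raw statistics file rather than the charts, the following Python sketch shows one way to extract the highest bandwidth recorded for each test routine. It assumes that each result is a single line starting with <code>RESULT</code>, followed by tab-separated <code>key=value</code> pairs, and that the keys <code>funcname</code> and <code>bandwidth</code> are present. These details are assumptions and should be checked against the actual file before use. The function names are hypothetical.

```python
def parse_result_line(line):
    """Parse one assumed 'RESULT<TAB>key=value<TAB>...' line into a dict.

    Returns None for lines that do not start with the RESULT marker.
    """
    fields = line.rstrip("\n").split("\t")
    if fields[0] != "RESULT":
        return None
    return dict(f.split("=", 1) for f in fields[1:] if "=" in f)


def peak_by_function(lines):
    """Highest 'bandwidth' value seen for each 'funcname' (assumed key names)."""
    best = {}
    for line in lines:
        record = parse_result_line(line)
        if record is None:
            continue
        name = record["funcname"]
        bandwidth = float(record["bandwidth"])
        best[name] = max(best.get(name, 0.0), bandwidth)
    return best


if __name__ == "__main__":
    import sys
    # Usage: python peak.py stats.txt
    with open(sys.argv[1]) as stats:
        for name, bandwidth in sorted(peak_by_function(stats).items()):
            print(f"{name}: {bandwidth:.0f}")
```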
==Useful links==
*[https://www.cs.virginia.edu/stream/ STREAM benchmark]
*[http://lmbench.sourceforge.net/ LM Bench benchmark]
*[https://panthema.net/2013/pmbw/ pmbw benchmark]
*Joshua Wyatt Smith and Andrew Hamilton, [http://inspirehep.net/record/1424637/files/1719033_626-630.pdf Parallel benchmarks for ARM processors in the high energy context]
*T Wrigley, G Harmsen and B Mellado, [http://inspirehep.net/record/1424631/files/1719033_275-280.pdf Memory performance of ARM processors and its relevance to High Energy Physics]
====Building====
To build STREAM:
*clone its Git repository
*modify the <code>Makefile</code> as shown below
*issue the <code>make</code> command.
 
<pre class="board-terminal">
git clone https://github.com/jeffhammond/STREAM.git
</pre>
===LMbench===
====Running the tests====
To run this benchmark, the native prebuilt package provided by Debian Buster was used.
<pre class="board-terminal mw-collapsible mw-collapsed">
armbian@sbcx:~/devel/stream/STREAM$ sudo lmbench-run
=====================================================================
Remember you can find the results of the benchmark
under /var/lib/lmbench/results
</pre>
 
====Results====
TBD: add a link to the downloadable file
 
<pre class="board-terminal mw-collapsible mw-collapsed">
</pre>
===pmbw===
====Building====
Building pmbw is straightforward. Please click on ''Expand'' to show the box that illustrates the procedure.
<pre class="board-terminal mw-collapsible mw-collapsed">
armbian@sbcx:~/devel/pmbw$ git clone https://github.com/bingmann/pmbw.git
Cloning into 'pmbw'...
mv -f .deps/stats2gnuplot.Tpo .deps/stats2gnuplot.Po
g++ -W -Wall -g -O2 -o stats2gnuplot stats2gnuplot.o -lpthread -lrt
</pre>
====Running the tests====
The benchmark was run as follows:
<pre class="board-terminal">
armbian@sbcx:~/devel/pmbw/pmbw$ sudo nice -n -2 ./pmbw -S 0
Running benchmarks with no upper array size limit.
Detected 1695 MiB physical RAM and 4 CPUs.

Allocating 1024 MiB for testing.
Running nthreads=1 factor=1073741824 areasize=1024 thrsize=1024 testsize=1024 repeats=1048576 testvol=1073741824 testaccess=268435456
run time = 0.424801 -> rerunning test with repeat factor=3791455289
Running nthreads=1 factor=3791455289 areasize=1024 thrsize=1024 testsize=1024 repeats=3702594 testvol=3791456256 testaccess=947864064
...
</pre>

To generate the charts plotting the results, the following command was issued:
<pre class="board-terminal">
./stats2gnuplot stats.txt | gnuplot
</pre>
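The numbers printed in the pmbw log excerpt are related by simple arithmetic. The sketch below reproduces them under relationships inferred from the two logged lines only, not taken from the pmbw documentation: the repeat count is the factor divided by the test size, rounded up; the test volume is the test size multiplied by the repeat count; and the number of accesses is the test volume divided by an access width of 4 bytes. The function name is hypothetical.

```python
def derived_fields(factor, testsize, access_bytes=4):
    """Recompute repeats, testvol and testaccess from factor and testsize.

    The formulas and the 4-byte access width are inferred from the log
    excerpt, not from the pmbw documentation.
    """
    repeats = -(-factor // testsize)        # ceiling division on integers
    testvol = testsize * repeats            # total bytes touched
    testaccess = testvol // access_bytes    # number of memory accesses
    return repeats, testvol, testaccess


if __name__ == "__main__":
    for factor in (1073741824, 3791455289):
        print(factor, derived_fields(factor, 1024))
```

Both logged lines are consistent with these formulas, which suggests that the first routine in the run accesses memory in 32-bit words.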