{{InfoBoxTop}}
{{AppliesToMachineLearning}}
{{AppliesTo Machine Learning TN}}
{{InfoBoxBottom}}
 
[[File:TBD.png|thumb|center|200px|Work in progress]]
__FORCETOC__
==History==
{| class="wikitable" style="margin: auto;"
!Version
!Date
!Notes
|-
|1.0.0
|September 2020
|First public release
|-
|1.1.0
|November 2020
|Added application written in Python (version 2B)
|}
==Introduction==
This Technical Note (TN for short) belongs to the series introduced [[ML-TN-001_-_AI_at_the_edge:_comparison_of_different_embedded_platforms_-_Part_1|here]].
In particular, it illustrates the execution of different versions of an inference application (fruit classifier) that makes use of the model described in [[ML-TN-001_-_AI_at_the_edge:_comparison_of_different_embedded_platforms_-_Part_1#Reference_application_.231:_fruit_classifier|this section]], when executed on the [https://www.nxp.com/products/processors-and-microcontrollers/arm-processors/i-mx-applications-processors/i-mx-8-processors/i-mx-8m-plus-arm-cortex-a53-machine-learning-vision-multimedia-and-industrial-iot:IMX8MPLUS NXP i.MX8M Plus EVK] board. In addition, this document compares the results achieved to the ones produced by the i.MX8M-powered [[:Category:Mito8M|Mito8M SoM]] detailed in the [[ML-TN-001_-_AI_at_the_edge:_comparison_of_different_embedded_platforms_-_Part_1#Articles_in_this_series|previous articles of this series]].
Specifically, the following versions of the application were tested:
* Version 1: This version is the same as the one described in [[ML-TN-001 - AI at the edge: comparison of different embedded platforms - Part 2|this article]]. As such, inference is implemented in software and is applied to images retrieved from files.
* Version 2A: This version is functionally equivalent to version 1, but it leverages the Neural Processing Unit (NPU) to hardware-accelerate the inference.
* Version 2B: This is a Python alternative to version 2A.
* Version 3: This is like version 2A, but the inference is applied to the frames captured live from an image sensor.
=== Testbed ===
The kernel and the root file system of the tested platform were built with the L5.4.24_2.1.0 release of the Yocto Board Support Package (BSP) for the i.MX 8 family of devices. They were built with support for [https://www.nxp.com/design/software/development-software/eiq-ml-development-environment:EIQ eIQ]: "a collection of software and development tools for NXP microprocessors and microcontrollers to do inference of neural network models on embedded systems".
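As a reference, the typical workflow for producing such an image is sketched below. The manifest URL, branch, and image name follow NXP's conventions for this release, but they are assumptions to be verified against the official i.MX Yocto Project User's Guide.

```shell
# Sketch: fetch and build the L5.4.24_2.1.0 Yocto BSP with eIQ support
# for the i.MX8M Plus EVK (names to be double-checked against NXP docs).
repo init -u https://source.codeaurora.org/external/imx/imx-manifest \
          -b imx-linux-zeus -m imx-5.4.24-2.1.0.xml
repo sync
DISTRO=fsl-imx-xwayland MACHINE=imx8mpevk source imx-setup-release.sh -b build
bitbake imx-image-full    # full image, including the eIQ ML packages
```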
The following table details the relevant specs of the testbed.
{| class="wikitable" style="margin: auto;"
|'''SDRAM memory frequency (LPDDR4)'''
'''[MHz]'''
|2000
|-
|'''[https://www.kernel.org/doc/Documentation/cpu-freq/governors.txt Governor]'''
|
|}
== Model deployment and inference applications ==
=== Version 1 (C++) ===
The C++ application previously used and described [https://wiki.dave.eu/index.php/ML-TN-001_-_AI_at_the_edge:_comparison_of_different_embedded_platforms_-_Part_2#Model_deployment_and_inference_application here] was adapted to work with the new NXP Linux BSP release. It now uses OpenCV 4.2.0 to pre-process the input image and TensorFlow Lite (TFL) 2.1 as the inference engine. It still supports all the three TFL models previously tested on the [[:Category:Mito8M|Mito8M SoM]]:
* 32-bit floating-point model;
* half-quantized model (post-training 8-bit quantization of the weights only);
* fully-quantized model (TensorFlow v1 quantization-aware training and 8-bit quantization of the weights and activations).
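Regardless of the model variant, the application's post-processing step is conceptually the same: sort the output scores and print the top matches, as in the dumps shown later. A minimal Python sketch follows; the helper name, labels, and scores are illustrative and not taken from the real application.

```python
# Sketch: map a model's output scores to the top-k labels, as the
# classifier applications do after inference (illustrative data).

def top_k_labels(scores, labels, k=3):
    """Return the k (score, label) pairs with the highest scores."""
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    return ranked[:k]

labels = ["Red Apple", "Banana", "Hand", "Orange", "Pear", "Kiwi"]
scores = [0.92, 7.4e-18, 2.5e-18, 0.05, 0.02, 0.01]
for score, label in top_k_labels(scores, labels):
    print(f"{score:g} {label}")
```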
=== Version 2A (C++) ===
The version 1 application was then modified to accelerate the inference using the NPU (ML module) of the [https://www.nxp.com/products/processors-and-microcontrollers/arm-processors/i-mx-applications-processors/i-mx-8-processors/i-mx-8m-plus-arm-cortex-a53-machine-learning-vision-multimedia-and-industrial-iot:IMX8MPLUS i.MX8M Plus] SoC. This is possible because "''the TensorFlow Lite library uses the Android NN API driver implementation from the GPU/ML module driver for running inference using the GPU/ML module''".
Neither the floating-point nor the half-quantized model works with the NPU, however. Moreover, "''the GPU/ML module driver does not support per-channel quantization yet. Therefore post-training quantization of models with TensorFlow v2 cannot be used if the model is supposed to run on the GPU/ML module (inference on CPU does not have this limitation). TensorFlow v1 quantization-aware training and model conversion is recommended in this case''". Therefore, only the fully-quantized model was tested with this version of the application.
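For reference, per-tensor affine quantization maps each real value to an 8-bit integer through a single scale and zero-point shared by the whole tensor (as opposed to per-channel quantization, where each output channel has its own parameters). A minimal sketch of the arithmetic, with illustrative parameter values:

```python
# Sketch of per-tensor 8-bit affine quantization, the scheme used by the
# fully-quantized model (scale and zero-point below are illustrative).

def quantize(x, scale, zero_point):
    """Map a real value to uint8: q = round(x / scale) + zero_point."""
    q = round(x / scale) + zero_point
    return max(0, min(255, q))          # clamp to the uint8 range

def dequantize(q, scale, zero_point):
    """Recover the (approximate) real value from its uint8 encoding."""
    return (q - zero_point) * scale

scale, zero_point = 0.02, 128           # example per-tensor parameters
q = quantize(0.5, scale, zero_point)    # encodes 0.5 as an integer
x = dequantize(q, scale, zero_point)    # decodes back to ~0.5
```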
=== Version 2B (Python) ===
The version 2A application was then ported to Python. The Python version is functionally equivalent to version 2A, which is written in C++; the goal of version 2B is to make a comparison in terms of performance between the two implementations. Generally, Python has the advantage of being easier to work with, but at the cost of being slower to execute. However, in this case, '''regarding the inference computation''', the performance is '''pretty much the same between the two versions'''. This is because the Python APIs act only as a wrapper to the core TensorFlow library, which is written in C++ (and other "fast" languages). As detailed [[#Results comparison|in this section]], the overall time is significantly different because it takes into account the pre/post-processing computations as well. These computations don't leverage the NPU accelerator and thus are more affected by the slower Python code. Nevertheless, when the model used is much more complex, as usually occurs in real-world cases, this overhead could still be tolerable because it might become negligible in comparison. In conclusion, the use of Python should not be discarded a priori because of performance concerns. Depending on the specific use case, it can be a valid option to consider.
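The measurement strategy used throughout this TN (timing the warm-up inference separately from the steady-state ones, and the pre/post-processing apart from the inference itself) can be sketched as follows. The stage functions here are placeholders standing in for the real OpenCV/TensorFlow Lite calls.

```python
import time

def time_stages(preprocess, infer, postprocess, n_runs=5):
    """Time each stage separately; the first inference (warm-up) is
    reported on its own because, on the NPU, it includes the one-time
    model graph initialization."""
    data = preprocess()
    t0 = time.monotonic()
    result = infer(data)                      # warm-up run
    warmup = time.monotonic() - t0
    t0 = time.monotonic()
    for _ in range(n_runs):                   # steady-state runs
        result = infer(data)
    steady = (time.monotonic() - t0) / n_runs
    t0 = time.monotonic()
    out = postprocess(result)                 # post-processing, timed apart
    post = time.monotonic() - t0
    return warmup, steady, post, out

# Placeholder stages standing in for the real pipeline.
warmup, steady, post, out = time_stages(
    preprocess=lambda: [0.0] * 10,
    infer=lambda d: [sum(d)],
    postprocess=lambda r: r[0],
)
```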
=== Version 3 (C++) ===
A new C++ application was written to apply the inference to the frames captured from the image sensor ([https://cdn.sparkfun.com/datasheets/Sensors/LightImaging/OV5640_datasheet.pdf OV5640]) of a [https://www.nxp.com/part/MINISASTOCSI#/ camera module], instead of images retrieved from files. This version uses OpenCV 4.2.0 to control the camera and to pre-process the frames. Note that with this camera module, the frame rate is capped at 30 fps. Like version 2A, inference runs on the NPU, so only the fully-quantized model was tested.
== Running the applications ==
* All the files required to run the tests (the executable, the image files, etc.) are stored on a [https://www.jamescoyle.net/how-to/943-create-a-ram-disk-in-linux tmpfs RAM disk] in order to make the file system/storage medium overhead negligible.
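A RAM disk like the one used for the tests can be set up as in the following sketch; the size and the paths are illustrative.

```shell
# Sketch: create a tmpfs RAM disk and copy the test files onto it, so
# that storage-medium overhead does not affect the measurements.
mkdir -p /mnt/ramdisk
mount -t tmpfs -o size=256M tmpfs /mnt/ramdisk
cp -r image_classifier_eIQ_plus /mnt/ramdisk/
```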
=== <big>Version 1</big> (no NPU acceleration) ===
The following sections detail the execution of the first version of the classifier on the embedded platform. The number of threads was also tweaked in order to test different configurations. During the execution, the well-known <code>[https://en.wikipedia.org/wiki/Htop htop]</code> utility was used to monitor the system. This tool is very convenient to get some useful information such as core allocation, processor load, and number of running threads.
==== <big>Floating-point model</big> ====
The following dump refers to the execution of the application when using the floating-point model.
<pre class="board-terminal">
root@imx8mpevk:/mnt/ramdisk/image_classifier_eIQ_plus# build/image_classifier_cv 2 my_converted_model.tflite labels.txt testdata/red-apple1.jpg
</pre>
The following screenshots show the system status while executing the application with different values of the thread parameter.
[[File:ML-TN-001 4 float default.png|thumb|center|600px|Thread parameter unspecified]]

[[File:ML-TN-001 4 float 1thread.png|thumb|center|600px|Thread parameter set to 1]]

[[File:ML-TN-001 4 float 2threads.png|thumb|center|600px|Thread parameter set to 2]]

==== <big>Half-quantized model</big> ====
The following dump refers to the execution of the application in combination with the half-quantized model.
<pre class="board-terminal">
root@imx8mpevk:/mnt/ramdisk/image_classifier_eIQ_plus# build/image_classifier_cv 2 my_fruits_model_1.12_quant.tflite labels.txt testdata/red-apple1.jpg
7.44711e-18 Banana
2.47029e-18 Hand
</pre>
The following screenshot shows the system status while executing the application. In this case, the thread parameter was unspecified.

[[File:ML-TN-001 4 weightsquant default.png|thumb|center|600px|Thread parameter unspecified]]

==== <big>Fully-quantized model</big> ====
The following dump refers to the execution of the application when using the fully-quantized model.
<pre class="board-terminal">
root@imx8mpevk:/mnt/ramdisk/image_classifier_eIQ_plus# build/image_classifier_cv 3 my_fruits_model_qatlegacy.tflite labels.txt testdata/red-apple1.jpg
</pre>
The following screenshots show the system status while executing the application with different values of the thread parameter.
[[File:ML-TN-001 4 fullquant default.png|thumb|center|600px|Thread parameter unspecified]]

[[File:ML-TN-001 4 fullquant 4threads.png|thumb|center|600px|Thread parameter set to 4]]

=== <big>Version 2A</big> (C++) ===
The execution of version 2A of the classifier on the embedded platform is detailed below. During the execution, <code>htop</code> was used to monitor the system. Note that "''the first execution of model inference using the NN API always takes many times longer, because of model graph initialization needed by the GPU/ML module''", as stated by the NXP documentation. Therefore, the time needed for the first inference (warm-up) is measured separately.
<pre class="board-terminal">
root@imx8mpevk:/mnt/ramdisk/image_classifier_eIQ_plus# build/image_classifier_cv 3 my_fruits_model_qatlegacy.tflite labels.txt testdata/red-apple1.jpg
INFO: Created TensorFlow Lite delegate for NNAPI.
Top results:
1 Red Apple
</pre>
The following screenshot shows the system status while executing the application.
[[File:ML-TN-001 4acceleration.png|thumb|center|600px]]
It is worth remembering that, when using the NPU accelerator, it is not possible to select the number of threads.

==== <big>Profiling model execution on NPU</big> ====
For the sake of completeness, the eIQ profiler log is provided in the following box. According to the NXP documentation, "''The log captures detailed information of the execution clock cycles and DDR data transmission in each layer''". Note that the time needed for inference is longer than usual because of the profiler overhead.
<pre class="board-terminal">
root@imx8mpevk:/mnt/ramdisk/image_classifier_eIQ_plus# build/image_classifier_cv 3 my_fruits_model_qatlegacy.tflite labels.txt testdata/red-apple1.jpg
INFO: Created TensorFlow Lite delegate for NNAPI.
#productname=VIPNano-D+I, pid=0x9f
Created VX Thread: 0xa3ee5fb0
Applied NNAPI delegate
prev_ptrs = 0xffffa369c040
Can't support one shaderCoreCount!
---------------------------Begin VerifyTiling -------------------------
AXI-SRAM = 0 Bytes VIP-SRAM = 260096 Bytes SWTILING_PHASE_FEATURES[1, 1, 1]
 0 TP [( 3 224 224 1, 150528, 0x0xaaaab1874580(0x0xaaaab1874580, 0x(nil)) -> 224 224 3 1, 150528, 0x0xaaaab187db10(0x0xaaaab187db10, 0x(nil))) k(0 0 0, 0) pad(0 0) pool(0 0, 1 1)] C[ 1]
 1 NN [( 224 224 3 1, 150528, 0x0xaaaab187db10(0x0xaaaab187db10, 0x(nil)) -> 111 111 32 1, 394272, 0x0xaaaab1881a90(0x0xaaaab1881a90, 0x(nil))) k(3 3 3, 1152) pad(0 0) pool(2 2, 2 2)] P[ 0] C[ 2]
 2 NN [( 111 111 32 1, 394272, 0x0xaaaab1881a90(0x0xaaaab1881a90, 0x(nil)) -> 109 109 32 1, 380192, 0x0xaaaab1884270(0x0xaaaab1884270, 0x(nil))) k(3 3 32, 9984) pad(0 0) pool(0 0, 1 1)] P[ 1] C[ 3]
 3 TP [( 109 109 32 1, 380192, 0x0xaaaab1884270(0x0xaaaab1884270, 0x(nil)) -> 54 54 32 1, 93312, 0x0xaaaab1887410(0x0xaaaab1887410, 0x(nil))) k(0 0 0, 0) pad(0 0) pool(2 2, 2 2)] P[ 2] C[ 4]
 4 NN [( 54 54 32 1, 93312, 0x0xaaaab1887410(0x0xaaaab1887410, 0x(nil)) -> 26 26 64 1, 43264,
0x0xaaaab188cd90(0x0xaaaab188cd90, 0x(nil))) k(3 3 32, 19968) pad(0 0) pool(2 2, 2 2)] P[ 3] C[ 5] 5 NN [( 26 26 64 1, 43264, 0x0xaaaab188cd90(0x0xaaaab188cd90, 0x(nil)) -> 12 12 128 1, 18432, 0x0xaaaab1892710(0x0xaaaab1892710, 0x(nil))) k(3 3 64, 79616) pad(0 0) pool(2 2, 2 2)] P[ 4] C[ 6] 6 TP [( 12 12 128 1, 18432, 0x0xaaaab1892710(0x0xaaaab1892710, 0x(nil)) -> 128 12 12 1, 18432, 0x0xaaaab1894ef0(0x0xaaaab1894ef0, 0x(nil))) k(0 0 0, 0) pad(0 0) pool(0 0, 1 1)] P[ 5] C[ 7] 7 TP [(18432 1 1 1, 18432, 0x0xaaaab1894ef0(0x0xaaaab1894ef0, 0x(nil)) -> 256 1 1 1, 256, 0x0xaaaab18965b0(0x0xaaaab18965b0, 0x(nil))) k(0 0 0, 0) pad(0 0) pool(0 0, 1 1)] P[ 6] C[ 8] 8 TP [( 256 1 1 1, 256, 0x0xaaaab18965b0(0x0xaaaab18965b0, 0x(nil)) -> 6 1 1 1, 6, 0x0xaaaab1897c10(0x0xaaaab1897c10, 0x(nil))) k(0 0 0, 0) pad(0 0) pool(0 0, 1 1)] P[ 7] C[ 9] 9 SH [( 6 1 1 1, 6, 0x0xaaaab1897c10(0x0xaaaab1897c10, 0x(nil)) -> 6 1 1 1, 6, 0x0xaaaab187a200(0x0xaaaab187a200, 0x(nil))) k(0 0 0, 0) pad(0 0) pool(0 0, 1 1)] P[ 8] Detected Segments AB_VS (0 - 1) TL_VS (1 - 2) AB_VS (3 - 8) ======================== Block [0 - 2] ============================== 0 TP DD -> VS [( 150528, 150528), IC( 0), KC( 0)] 1 NN VS -> VS [( 150528, 96000), IC( 0), KC( 1408)] 2 NN VS -> DD [( 96000, 0), IC( 0), KC( 11648)] ------------------------------------------------------------------ Segment AB (0 - 0) ------------------------------------------------------------------ Segment Tiling (1 - 2) [VS 24( 0, 24)(224) ->VS 11( 0, 11)( 27) P( 0) F(1)] [VS 11( 0, 11)( 27) ->DD 9( 0, 9)( 0) P( 0) F(0)] [VS 52( 22, 74)(224) ->VS 25( 11, 36)( 27) P( 0) F(1)] [VS 27( 9, 36)( 27) ->DD 25( 9, 34)( 0) P( 0) F(0)] [VS 52( 72,124)(224) ->VS 25( 36, 61)( 27) P( 0) F(1)] [VS 27( 34, 61)( 27) ->DD 25( 34, 59)( 0) P( 0) F(0)] [VS 52(122,174)(224) ->VS 25( 61, 86)( 27) P( 0) F(1)] [VS 27( 59, 86)( 27) ->DD 25( 59, 84)( 0) P( 0) F(0)] [VS 52(172,224)(224) ->VS 25( 86,111)( 27) P( 0) F(1)] [VS 27( 84,111)( 27) ->DD 25( 84,109)( 0) P( 0) 
F(1)] AXISRAM: Estimate used 0 0.000000% VIPSRAM: Estimate used 107040 41.154037% M = 25 AXISRAM: Peak used 0 0.000000% VIPSRAM: Peak used 259584 99.803146% ======================== Block [0 - 2] SUCCEED ========================= ======================== Block [3 - 8] ============================== 3 TP DD -> VS [( 0, 93312), IC( 0), KC( 0)] 4 NN VS -> VS [( 93312, 43264), IC( 0), KC( 20608)] 5 NN VS -> VS [( 43264, 18432), IC( 0), KC( 79744)] 6 TP VS -> VS [( 18432, 18432), IC( 0), KC( 0)] 7 TP VS -> VS [( 18432, 256), IC( 0), KC( 0)] 8 TP VS -> DD [( 256, 0), IC( 0), KC( 0)] ------------------------------------------------------------------ Segment AB (3 - 8) AXISRAM: Peak used 0 0.000000% VIPSRAM: Peak used 157184 60.433071% ======================== Block [3 - 8] SUCCEED ========================= F(1) F(0) F(1) F(0) F(1) F(0) F(1) F(0) F(1) F(1) id IN [ x y w h ] OUT [ x y w h ] (tx, ty, kpc) (ic, kc, kc/ks, ks/eks, kernel_type) 0 TP DD 0x(nil) [ 0 0 3 224] -> VS 0x0x400800 [ 0 0 224 224] ( 0, 0, 0) ( 0, 0, 0.000000%, 0.000000%, NONE) 1 NN VS 0x0x400800 [ 0 0 224 24] -> VS 0x0x425400 [ 0 0 111 11] ( 32, 2, 6) ( 0, 1408, 100.000000%, 122.222221%, DD) 2 NN VS 0x0x425400 [ 0 0 111 11] -> DD 0x(nil) [ 0 0 109 9] ( 55, 2, 6) ( 0, 11648, 100.000000%, 116.666664%, DD) 1 NN VS 0x0x401b40 [ 0 22 224 52] -> VS 0x0x4258c5 [ 0 11 111 25] ( 56, 2, 6) ( 0, 1408, 100.000000%, 122.222221%, DD) 2 NN VS 0x0x4257e7 [ 0 9 111 27] -> DD 0x0x3d5 [ 0 9 109 25] ( 55, 2, 6) ( 0, 11648, 100.000000%, 116.666664%, DD) 1 NN VS 0x0x404700 [ 0 72 224 52] -> VS 0x0x42639c [ 0 36 111 25] ( 56, 2, 6) ( 0, 1408, 100.000000%, 122.222221%, DD) 2 NN VS 0x0x4262be [ 0 34 111 27] -> DD 0x0xe7a [ 0 34 109 25] ( 55, 2, 6) ( 0, 11648, 100.000000%, 116.666664%, DD) 1 NN VS 0x0x4072c0 [ 0 122 224 52] -> VS 0x0x426e73 [ 0 61 111 25] ( 56, 2, 6) ( 0, 1408, 100.000000%, 122.222221%, DD) 2 NN VS 0x0x426d95 [ 0 59 111 27] -> DD 0x0x191f [ 0 59 109 25] ( 55, 2, 6) ( 0, 11648, 100.000000%, 116.666664%, DD) 1 NN 
VS 0x0x409e80 [ 0 172 224 52] -> VS 0x0x42794a [ 0 86 111 25] ( 56, 2, 6) ( 0, 1408, 100.000000%, 122.222221%, DD) 2 NN VS 0x0x42786c [ 0 84 111 27] -> DD 0x0x23c4 [ 0 84 109 25] ( 55, 2, 6) ( 0, 11648, 100.000000%, 116.666664%, DD) 3 TP DD 0x(nil) [ 0 0 109 109] -> VS 0x0x400800 [ 0 0 54 54] ( 0, 0, 0) ( 0, 0, 0.000000%, 0.000000%, NONE) 4 NN VS 0x0x400800 [ 0 0 54 54] -> VS 0x0x41c500 [ 0 0 26 26] ( 52, 6, 4) ( 0, 20608, 100.000000%, 103.205132%, DD) 5 NN VS 0x0x41c500 [ 0 0 26 26] -> VS 0x0x400800 [ 0 0 12 12] ( 24, 16, 5) ( 0, 79744, 100.000000%, 100.160774%, DD) 6 TP VS 0x0x400800 [ 0 0 12 12] -> VS 0x0x422600 [ 0 0 128 12] ( 0, 0, 0) ( 0, 0, 0.000000%, 0.000000%, NONE) 7 TP VS 0x0x422600 [ 0 0 18432 1] -> VS 0x0x400800 [ 0 0 256 1] ( 0, 0, 0) ( 0, 0, 0.000000%, 0.000000%, NONE) 8 TP VS 0x0x400800 [ 0 0 256 1] -> DD 0x(nil) [ 0 0 6 1] ( 0, 0, 0) ( 0, 0, 0.000000%, 0.000000%, NONE) 9 SH DD 0x(nil) [ 0 0 0 0] -> DD 0x(nil) [ 0 0 0 0] ( 0, 0, 0) ( 0, 0, 0.000000%, 0.000000%, NONE) PreLoadWeightBiases = 0 nan% ---------------------------End VerifyTiling ------------------------- ArchModelVersion: ARCHCTS@230121 SWTilingVersion: ARCHCTS@230121 ProfileMode: 0 NumNNCores:6 NumNNCoresInt8: 6 NumNNCoresInt16: 6 NumNNCoresFloat16: 0 NumTPCores: 3 NumTPLiteCores: 0 MadPerCore: 64 VIP7Version: 1 InBuffDepth: 9 AccumBufferDepth: 32 DPAmount: 3 XYDPX: 0 XYDPY: 0 ZDP: 3 AXISRAMSize: 0 VIPSRAMSize: 262144 L2CacheWidth: 32 USCCacheSize: 8 BrickMode: 0 SWTiling: 1 SmallBatchEnable: 0 SWTilingPhase1: 1 TPWithFCLayer: 1 TPCircularBufferSupport: 1 KERNEL_HEADER_NOT_CACHED_FIX: 0 NNFCNonPruneAccel: 0 Conv1x1HalfPerformance: 0 DDRLatency: 0 CacheLineModeDisabled: 0 PER_3D_TILE_BUBBLE_FIX: 1 SWConv1x1To1x2: 0 TP_LOCALIZATION_REORDER_DISABLED_Fix: 1 USCCacheControllers: 1 AsyncCopyPerfFix: 1 ZDP3NoCompressFix: 1 ZXDP3KernelReadConflictFix: 1 CoefDecodePerf: 2 VectorPrune: 1 EnableCacheDataFromSRAM: 1 IMAGE_PARTIAL_CACHE_FIX: 0 DDRReadBandWidthLimit: 3.80 DDRWriteBandWidthLimit: 3.80 
DDRTotalBandWidthLimit: 3.80 AXISRAMReadBandWidthLimit: 16.00 AXISRAMWriteBandWidthLimit: 16.00 AXISRAMTotalBandWidthLimit: 16.00 AXIBusReadBandWidthLimit: 16.00 AXIBusWriteBandWidthLimit: 16.00 AXIBusTotalBandWidthLimit: 32.00 HANDLE_ABBUFFER: 1 HANDLE_SUBIMAGE: 1 HANDLE_BRANCH: 1 FreqInMHZ: 1000 AxiClockFreqInMHZ: 1000 OutstandingTransfer: 64 InternalWriteBWLimit: 16.00 LanesPerConv: 64 MaxTileSize: 64 AxiSramSlowedDownByAddr: 1 SLOW_NN_REQ_ARBITRATION_FIX: 0 FLOAT_XYDP_X: 1 FLOAT_XYDP_Y: 1 FLOAT_ZDP: 1 SINGLE_PORT_ACC_BUFFER: 1 MAX_ZRL_BIT_WIDTH: 8 MAX_SOC_OUT_STANDING_NUMBER: 32 SWTilingPhase3: 1 AXI_SRAM_ONLY_SW_TILING: 0 VIP_CORE_COUNT: 1 DEPTH_WISE_SUPPORT: 1 NN_WRITE_WITHOUT_USC: 0 EQUIVALENT_VIP_SRAM_WIDTH_IN_BYTE: 32 IMAGE_NOT_PACKED_IN_SRAM: 0 NN_COEF_COMPRESSION_ENHANCEMENT: 1 TP_COMPRESSION_ENHANCEMENT: 1 COEF_DELTA_CORD_OVER_FLOW_ZRL_8BIT_FIX: 1 NumShaderCores: 1 KERNEL_PER_CORE_LESS_THAN_THIRD_COEF_BUFF_DEPTH_FIX: 0 LOW_EFFICIENCY_OF_ID_WRITE_IMGBUF_FIX: 0 DR_JD_Diff_For_Cacheline_Mode_FIX: 1 CONVOUT_FIFO_DEPTH_FIX: 1 =========================== **********Show Perf******** =========================== layer_id:0 layer_name:TensorTranspose operation_id:0 operation_name:VXNNE_OPERATOR_TENSOR_TRANS operation_target:VXNNE_OPERATION_TARGET_TP abs_op_id:0 upstream_layer_num:0 upstream_opertaion_num:0 downstream_layer_num:1 downstream_opertaion_num:1 0) downstream_operation_id:0 downstream_operation_name:VXNNE_OPERATOR_CONVOLUTION (downstream_layer_id:4 downstream_layer_name:ConvolutionReluPoolingLayer2) InImageX: 3 InImageY: 224 InImageZ: 224 OutImageX: 224 (sub: 224) OutImageY: 224 (sub: 224) OutImageZ: 3 (sub: 3) KernelX: 1 KernelY: 1 KernelZ: 224 PoolingSize: 1 PoolingStride: 1 InputDataSize: 8 OutputDataSize: 8 FP16: 0 archModel_kernelSize: 0 kernelSize: 0 SrcBuf: DDR DstBuf: VIP_SRAM KernelBuf: DDR KernelCacheMode=VXNNE_SRAM_CACHE_MODE_NONE ImageCacheMode=VXNNE_SRAM_CACHE_MODE_NONE xOffset: 0, yOffset: 0 kernelDDRReadBW: 0 InImageDDrReadBW: 150528 
ReadBW: 150656 WriteBW: 0 CycleCount: 77927 =========================== **********Show Perf******** =========================== layer_id:4 layer_name:ConvolutionReluPoolingLayer2 operation_id:0 operation_name:VXNNE_OPERATOR_CONVOLUTION operation_target:VXNNE_OPERATION_TARGET_NN abs_op_id:1 upstream_layer_num:1 upstream_opertaion_num:1 0) upstream_operation_id:0 uptream_operation_name:VXNNE_OPERATOR_TENSOR_TRANS (upstream_layer_id:0 upstream_layer_name:TensorTranspose) downstream_layer_num:1 downstream_opertaion_num:1 0) downstream_operation_id:0 downstream_operation_name:VXNNE_OPERATOR_CONVOLUTION (downstream_layer_id:5 downstream_layer_name:ConvolutionReluPoolingLayer2) NumUsedNNCores: 6 ConvOutFIFODepth: 168 OrigInImageX: 224 OrigInImageY: 224 OrigInImageZ: 3 NNOutImageX: 222 (sub: 222) NNOutImageY: 22 (sub: 22) NNOutImageZ: 32 (sub: 32) FinalOutImageX: 111 FinalOutImageY: 111 FinalOutImageZ: 32 KernelX: 3 KernelY: 3 KernelZ: 3 PoolingSize: 2 PoolingStride: 2 InputDataSize: 8 OutputDataSize: 8 FP16: 0 archModel_kernelSize: 1352 kernelSize: 1408 SrcBuf: VIP_SRAM DstBuf: VIP_SRAM KernelBuf: DDR KernelCacheMode=VXNNE_SRAM_CACHE_MODE_FULL_CACHE ImageCacheMode=VXNNE_SRAM_CACHE_MODE_NONE xOffset: 0, yOffset: 0 coefNonZeroRatio: 1.000000000000000 coefCompression: 1.354838709677419 imageCompression: 1.000000000000000 imageNonZeroRatio: 0.300000000000000 coefNonZeroRatio__llu: 4607182418800017408 coefCompression_llu: 4608780470280697261 imageCompression_llu: 4607182418800017408 imageNonZeroRatio_llu: 4599075939470750515 OutImageTileXSize: 32 OutImageTileYSize: 2 KernelsPerCore: 6 kernelDDRReadBW: 6099 InImageDDrReadBW: 0 ReadBW: 6227 WriteBW: 0 CycleCount: 12213 =========================== **********Show Perf******** =========================== layer_id:5 layer_name:ConvolutionReluPoolingLayer2 operation_id:0 operation_name:VXNNE_OPERATOR_CONVOLUTION operation_target:VXNNE_OPERATION_TARGET_NN abs_op_id:2 upstream_layer_num:1 upstream_opertaion_num:1 0) 
upstream_operation_id:0 uptream_operation_name:VXNNE_OPERATOR_CONVOLUTION (upstream_layer_id:4 upstream_layer_name:ConvolutionReluPoolingLayer2) downstream_layer_num:1 downstream_opertaion_num:1 0) downstream_operation_id:0 downstream_operation_name:VXNNE_OPERATOR_POOLING (downstream_layer_id:1 downstream_layer_name:PoolingLayer2) NumUsedNNCores: 6 ConvOutFIFODepth: 168 OrigInImageX: 111 OrigInImageY: 111 OrigInImageZ: 32 NNOutImageX: 109 (sub: 109) NNOutImageY: 9 (sub: 9) NNOutImageZ: 32 (sub: 32) FinalOutImageX: 109 FinalOutImageY: 109 FinalOutImageZ: 32 KernelX: 3 KernelY: 3 KernelZ: 32 PoolingSize: 1 PoolingStride: 1 InputDataSize: 8 OutputDataSize: 8 FP16: 0 archModel_kernelSize: 11113 kernelSize: 11648 SrcBuf: VIP_SRAM DstBuf: DDR KernelBuf: DDR KernelCacheMode=VXNNE_SRAM_CACHE_MODE_FULL_CACHE ImageCacheMode=VXNNE_SRAM_CACHE_MODE_NONE xOffset: 0, yOffset: 0 coefNonZeroRatio: 1.000000000000000 coefCompression: 0.965753424657534 imageCompression: 1.000000000000000 imageNonZeroRatio: 0.300000000000000 coefNonZeroRatio__llu: 4607182418800017408 coefCompression_llu: 4606873953072115319 imageCompression_llu: 4607182418800017408 imageNonZeroRatio_llu: 4599075939470750515 OutImageTileXSize: 55 OutImageTileYSize: 2 KernelsPerCore: 6 kernelDDRReadBW: 9540 InImageDDrReadBW: 0 ReadBW: 9668 WriteBW: 37746 CycleCount: 14667 =========================== **********Show Perf******** =========================== layer_id:4 layer_name:ConvolutionReluPoolingLayer2 operation_id:0 operation_name:VXNNE_OPERATOR_CONVOLUTION operation_target:VXNNE_OPERATION_TARGET_NN abs_op_id:1 upstream_layer_num:1 upstream_opertaion_num:1 0) upstream_operation_id:0 uptream_operation_name:VXNNE_OPERATOR_TENSOR_TRANS (upstream_layer_id:0 upstream_layer_name:TensorTranspose) downstream_layer_num:1 downstream_opertaion_num:1 0) downstream_operation_id:0 downstream_operation_name:VXNNE_OPERATOR_CONVOLUTION (downstream_layer_id:5 downstream_layer_name:ConvolutionReluPoolingLayer2) NumUsedNNCores: 6 
ConvOutFIFODepth: 168 OrigInImageX: 224 OrigInImageY: 224 OrigInImageZ: 3 NNOutImageX: 222 (sub: 222) NNOutImageY: 50 (sub: 50) NNOutImageZ: 32 (sub: 32) FinalOutImageX: 111 FinalOutImageY: 111 FinalOutImageZ: 32 KernelX: 3 KernelY: 3 KernelZ: 3 PoolingSize: 2 PoolingStride: 2 InputDataSize: 8 OutputDataSize: 8 FP16: 0 archModel_kernelSize: 1352 kernelSize: 1408 SrcBuf: VIP_SRAM DstBuf: VIP_SRAM KernelBuf: VIP_SRAM KernelCacheMode=VXNNE_SRAM_CACHE_MODE_STREAM_CACHE ImageCacheMode=VXNNE_SRAM_CACHE_MODE_NONE xOffset: 0, yOffset: 0 coefNonZeroRatio: 1.000000000000000 coefCompression: 1.354838709677419 imageCompression: 1.000000000000000 imageNonZeroRatio: 0.300000000000000 coefNonZeroRatio__llu: 4607182418800017408 coefCompression_llu: 4608780470280697261 imageCompression_llu: 4607182418800017408 imageNonZeroRatio_llu: 4599075939470750515 OutImageTileXSize: 56 OutImageTileYSize: 2 KernelsPerCore: 6 kernelDDRReadBW: 7571 InImageDDrReadBW: 0 ReadBW: 7699 WriteBW: 0 CycleCount: 24949 =========================== **********Show Perf******** =========================== layer_id:5 layer_name:ConvolutionReluPoolingLayer2 operation_id:0 operation_name:VXNNE_OPERATOR_CONVOLUTION operation_target:VXNNE_OPERATION_TARGET_NN abs_op_id:2 upstream_layer_num:1 upstream_opertaion_num:1 0) upstream_operation_id:0 uptream_operation_name:VXNNE_OPERATOR_CONVOLUTION (upstream_layer_id:4 upstream_layer_name:ConvolutionReluPoolingLayer2) downstream_layer_num:1 downstream_opertaion_num:1 0) downstream_operation_id:0 downstream_operation_name:VXNNE_OPERATOR_POOLING (downstream_layer_id:1 downstream_layer_name:PoolingLayer2) NumUsedNNCores: 6 ConvOutFIFODepth: 168 OrigInImageX: 111 OrigInImageY: 111 OrigInImageZ: 32 NNOutImageX: 109 (sub: 109) NNOutImageY: 25 (sub: 25) NNOutImageZ: 32 (sub: 32) FinalOutImageX: 109 FinalOutImageY: 109 FinalOutImageZ: 32 KernelX: 3 KernelY: 3 KernelZ: 32 PoolingSize: 1 PoolingStride: 1 InputDataSize: 8 OutputDataSize: 8 FP16: 0 archModel_kernelSize: 11113 
kernelSize: 11648 SrcBuf: VIP_SRAM DstBuf: DDR KernelBuf: VIP_SRAM KernelCacheMode=VXNNE_SRAM_CACHE_MODE_STREAM_CACHE ImageCacheMode=VXNNE_SRAM_CACHE_MODE_NONE xOffset: 0, yOffset: 0 coefNonZeroRatio: 1.000000000000000 coefCompression: 0.965753424657534 imageCompression: 1.000000000000000 imageNonZeroRatio: 0.300000000000000 coefNonZeroRatio__llu: 4607182418800017408 coefCompression_llu: 4606873953072115319 imageCompression_llu: 4607182418800017408 imageNonZeroRatio_llu: 4599075939470750515 OutImageTileXSize: 55 OutImageTileYSize: 2 KernelsPerCore: 6 kernelDDRReadBW: 10564 InImageDDrReadBW: 0 ReadBW: 10692 WriteBW: 104716 CycleCount: 32561 =========================== **********Show Perf******** =========================== layer_id:4 layer_name:ConvolutionReluPoolingLayer2 operation_id:0 operation_name:VXNNE_OPERATOR_CONVOLUTION operation_target:VXNNE_OPERATION_TARGET_NN abs_op_id:1 upstream_layer_num:1 upstream_opertaion_num:1 0) upstream_operation_id:0 uptream_operation_name:VXNNE_OPERATOR_TENSOR_TRANS (upstream_layer_id:0 upstream_layer_name:TensorTranspose) downstream_layer_num:1 downstream_opertaion_num:1 0) downstream_operation_id:0 downstream_operation_name:VXNNE_OPERATOR_CONVOLUTION (downstream_layer_id:5 downstream_layer_name:ConvolutionReluPoolingLayer2) NumUsedNNCores: 6 ConvOutFIFODepth: 168 OrigInImageX: 224 OrigInImageY: 224 OrigInImageZ: 3 NNOutImageX: 222 (sub: 222) NNOutImageY: 50 (sub: 50) NNOutImageZ: 32 (sub: 32) FinalOutImageX: 111 FinalOutImageY: 111 FinalOutImageZ: 32 KernelX: 3 KernelY: 3 KernelZ: 3 PoolingSize: 2 PoolingStride: 2 InputDataSize: 8 OutputDataSize: 8 FP16: 0 archModel_kernelSize: 1352 kernelSize: 1408 SrcBuf: VIP_SRAM DstBuf: VIP_SRAM KernelBuf: VIP_SRAM KernelCacheMode=VXNNE_SRAM_CACHE_MODE_STREAM_CACHE ImageCacheMode=VXNNE_SRAM_CACHE_MODE_NONE xOffset: 0, yOffset: 0 coefNonZeroRatio: 1.000000000000000 coefCompression: 1.354838709677419 imageCompression: 1.000000000000000 imageNonZeroRatio: 0.300000000000000 
coefNonZeroRatio__llu: 4607182418800017408 coefCompression_llu: 4608780470280697261 imageCompression_llu: 4607182418800017408 imageNonZeroRatio_llu: 4599075939470750515 OutImageTileXSize: 56 OutImageTileYSize: 2 KernelsPerCore: 6 kernelDDRReadBW: 7571 InImageDDrReadBW: 0 ReadBW: 7699 WriteBW: 0 CycleCount: 24949 =========================== **********Show Perf******** =========================== layer_id:5 layer_name:ConvolutionReluPoolingLayer2 operation_id:0 operation_name:VXNNE_OPERATOR_CONVOLUTION operation_target:VXNNE_OPERATION_TARGET_NN abs_op_id:2 upstream_layer_num:1 upstream_opertaion_num:1 0) upstream_operation_id:0 uptream_operation_name:VXNNE_OPERATOR_CONVOLUTION (upstream_layer_id:4 upstream_layer_name:ConvolutionReluPoolingLayer2) downstream_layer_num:1 downstream_opertaion_num:1 0) downstream_operation_id:0 downstream_operation_name:VXNNE_OPERATOR_POOLING (downstream_layer_id:1 downstream_layer_name:PoolingLayer2) NumUsedNNCores: 6 ConvOutFIFODepth: 168 OrigInImageX: 111 OrigInImageY: 111 OrigInImageZ: 32 NNOutImageX: 109 (sub: 109) NNOutImageY: 25 (sub: 25) NNOutImageZ: 32 (sub: 32) FinalOutImageX: 109 FinalOutImageY: 109 FinalOutImageZ: 32 KernelX: 3 KernelY: 3 KernelZ: 32 PoolingSize: 1 PoolingStride: 1 InputDataSize: 8 OutputDataSize: 8 FP16: 0 archModel_kernelSize: 11113 kernelSize: 11648 SrcBuf: VIP_SRAM DstBuf: DDR KernelBuf: VIP_SRAM KernelCacheMode=VXNNE_SRAM_CACHE_MODE_STREAM_CACHE ImageCacheMode=VXNNE_SRAM_CACHE_MODE_NONE xOffset: 0, yOffset: 0 coefNonZeroRatio: 1.000000000000000 coefCompression: 0.965753424657534 imageCompression: 1.000000000000000 imageNonZeroRatio: 0.300000000000000 coefNonZeroRatio__llu: 4607182418800017408 coefCompression_llu: 4606873953072115319 imageCompression_llu: 4607182418800017408 imageNonZeroRatio_llu: 4599075939470750515 OutImageTileXSize: 55 OutImageTileYSize: 2 KernelsPerCore: 6 kernelDDRReadBW: 10564 InImageDDrReadBW: 0 ReadBW: 10692 WriteBW: 104716 CycleCount: 32561 =========================== 
**********Show Perf******** =========================== layer_id:4 layer_name:ConvolutionReluPoolingLayer2 operation_id:0 operation_name:VXNNE_OPERATOR_CONVOLUTION operation_target:VXNNE_OPERATION_TARGET_NN abs_op_id:1 upstream_layer_num:1 upstream_opertaion_num:1 0) upstream_operation_id:0 uptream_operation_name:VXNNE_OPERATOR_TENSOR_TRANS (upstream_layer_id:0 upstream_layer_name:TensorTranspose) downstream_layer_num:1 downstream_opertaion_num:1 0) downstream_operation_id:0 downstream_operation_name:VXNNE_OPERATOR_CONVOLUTION (downstream_layer_id:5 downstream_layer_name:ConvolutionReluPoolingLayer2) NumUsedNNCores: 6 ConvOutFIFODepth: 168 OrigInImageX: 224 OrigInImageY: 224 OrigInImageZ: 3 NNOutImageX: 222 (sub: 222) NNOutImageY: 50 (sub: 50) NNOutImageZ: 32 (sub: 32) FinalOutImageX: 111 FinalOutImageY: 111 FinalOutImageZ: 32 KernelX: 3 KernelY: 3 KernelZ: 3 PoolingSize: 2 PoolingStride: 2 InputDataSize: 8 OutputDataSize: 8 FP16: 0 archModel_kernelSize: 1352 kernelSize: 1408 SrcBuf: VIP_SRAM DstBuf: VIP_SRAM KernelBuf: VIP_SRAM KernelCacheMode=VXNNE_SRAM_CACHE_MODE_STREAM_CACHE ImageCacheMode=VXNNE_SRAM_CACHE_MODE_NONE xOffset: 0, yOffset: 0 coefNonZeroRatio: 1.000000000000000 coefCompression: 1.354838709677419 imageCompression: 1.000000000000000 imageNonZeroRatio: 0.300000000000000 coefNonZeroRatio__llu: 4607182418800017408 coefCompression_llu: 4608780470280697261 imageCompression_llu: 4607182418800017408 imageNonZeroRatio_llu: 4599075939470750515 OutImageTileXSize: 56 OutImageTileYSize: 2 KernelsPerCore: 6 kernelDDRReadBW: 7571 InImageDDrReadBW: 0 ReadBW: 7699 WriteBW: 0 CycleCount: 24949 =========================== **********Show Perf******** =========================== layer_id:5 layer_name:ConvolutionReluPoolingLayer2 operation_id:0 operation_name:VXNNE_OPERATOR_CONVOLUTION operation_target:VXNNE_OPERATION_TARGET_NN abs_op_id:2 upstream_layer_num:1 upstream_opertaion_num:1 0) upstream_operation_id:0 uptream_operation_name:VXNNE_OPERATOR_CONVOLUTION 
(upstream_layer_id:4 upstream_layer_name:ConvolutionReluPoolingLayer2) downstream_layer_num:1 downstream_opertaion_num:1 0) downstream_operation_id:0 downstream_operation_name:VXNNE_OPERATOR_POOLING (downstream_layer_id:1 downstream_layer_name:PoolingLayer2) NumUsedNNCores: 6 ConvOutFIFODepth: 168 OrigInImageX: 111 OrigInImageY: 111 OrigInImageZ: 32 NNOutImageX: 109 (sub: 109) NNOutImageY: 25 (sub: 25) NNOutImageZ: 32 (sub: 32) FinalOutImageX: 109 FinalOutImageY: 109 FinalOutImageZ: 32 KernelX: 3 KernelY: 3 KernelZ: 32 PoolingSize: 1 PoolingStride: 1 InputDataSize: 8 OutputDataSize: 8 FP16: 0 archModel_kernelSize: 11113 kernelSize: 11648 SrcBuf: VIP_SRAM DstBuf: DDR KernelBuf: VIP_SRAM KernelCacheMode=VXNNE_SRAM_CACHE_MODE_STREAM_CACHE ImageCacheMode=VXNNE_SRAM_CACHE_MODE_NONE xOffset: 0, yOffset: 0 coefNonZeroRatio: 1.000000000000000 coefCompression: 0.965753424657534 imageCompression: 1.000000000000000 imageNonZeroRatio: 0.300000000000000 coefNonZeroRatio__llu: 4607182418800017408 coefCompression_llu: 4606873953072115319 imageCompression_llu: 4607182418800017408 imageNonZeroRatio_llu: 4599075939470750515 OutImageTileXSize: 55 OutImageTileYSize: 2 KernelsPerCore: 6 kernelDDRReadBW: 10564 InImageDDrReadBW: 0 ReadBW: 10692 WriteBW: 104716 CycleCount: 32561 =========================== **********Show Perf******** =========================== layer_id:4 layer_name:ConvolutionReluPoolingLayer2 operation_id:0 operation_name:VXNNE_OPERATOR_CONVOLUTION operation_target:VXNNE_OPERATION_TARGET_NN abs_op_id:1 upstream_layer_num:1 upstream_opertaion_num:1 0) upstream_operation_id:0 uptream_operation_name:VXNNE_OPERATOR_TENSOR_TRANS (upstream_layer_id:0 upstream_layer_name:TensorTranspose) downstream_layer_num:1 downstream_opertaion_num:1 0) downstream_operation_id:0 downstream_operation_name:VXNNE_OPERATOR_CONVOLUTION (downstream_layer_id:5 downstream_layer_name:ConvolutionReluPoolingLayer2) NumUsedNNCores: 6 ConvOutFIFODepth: 168 OrigInImageX: 224 OrigInImageY: 224 
OrigInImageZ: 3 NNOutImageX: 222 (sub: 222) NNOutImageY: 50 (sub: 50) NNOutImageZ: 32 (sub: 32) FinalOutImageX: 111 FinalOutImageY: 111 FinalOutImageZ: 32 KernelX: 3 KernelY: 3 KernelZ: 3 PoolingSize: 2 PoolingStride: 2 InputDataSize: 8 OutputDataSize: 8 FP16: 0 archModel_kernelSize: 1352 kernelSize: 1408 SrcBuf: VIP_SRAM DstBuf: VIP_SRAM KernelBuf: VIP_SRAM KernelCacheMode=VXNNE_SRAM_CACHE_MODE_STREAM_CACHE ImageCacheMode=VXNNE_SRAM_CACHE_MODE_NONE xOffset: 0, yOffset: 0 coefNonZeroRatio: 1.000000000000000 coefCompression: 1.354838709677419 imageCompression: 1.000000000000000 imageNonZeroRatio: 0.300000000000000 coefNonZeroRatio__llu: 4607182418800017408 coefCompression_llu: 4608780470280697261 imageCompression_llu: 4607182418800017408 imageNonZeroRatio_llu: 4599075939470750515 OutImageTileXSize: 56 OutImageTileYSize: 2 KernelsPerCore: 6 kernelDDRReadBW: 7571 InImageDDrReadBW: 0 ReadBW: 7699 WriteBW: 0 CycleCount: 24949 =========================== **********Show Perf******** =========================== layer_id:5 layer_name:ConvolutionReluPoolingLayer2 operation_id:0 operation_name:VXNNE_OPERATOR_CONVOLUTION operation_target:VXNNE_OPERATION_TARGET_NN abs_op_id:2 upstream_layer_num:1 upstream_opertaion_num:1 0) upstream_operation_id:0 uptream_operation_name:VXNNE_OPERATOR_CONVOLUTION (upstream_layer_id:4 upstream_layer_name:ConvolutionReluPoolingLayer2) downstream_layer_num:1 downstream_opertaion_num:1 0) downstream_operation_id:0 downstream_operation_name:VXNNE_OPERATOR_POOLING (downstream_layer_id:1 downstream_layer_name:PoolingLayer2) NumUsedNNCores: 6 ConvOutFIFODepth: 168 OrigInImageX: 111 OrigInImageY: 111 OrigInImageZ: 32 NNOutImageX: 109 (sub: 109) NNOutImageY: 25 (sub: 25) NNOutImageZ: 32 (sub: 32) FinalOutImageX: 109 FinalOutImageY: 109 FinalOutImageZ: 32 KernelX: 3 KernelY: 3 KernelZ: 32 PoolingSize: 1 PoolingStride: 1 InputDataSize: 8 OutputDataSize: 8 FP16: 0 archModel_kernelSize: 11113 kernelSize: 11648 SrcBuf: VIP_SRAM DstBuf: DDR KernelBuf: VIP_SRAM 
KernelCacheMode=VXNNE_SRAM_CACHE_MODE_STREAM_CACHE ImageCacheMode=VXNNE_SRAM_CACHE_MODE_NONE xOffset: 0, yOffset: 0 coefNonZeroRatio: 1.000000000000000 coefCompression: 0.965753424657534 imageCompression: 1.000000000000000 imageNonZeroRatio: 0.300000000000000 coefNonZeroRatio__llu: 4607182418800017408 coefCompression_llu: 4606873953072115319 imageCompression_llu: 4607182418800017408 imageNonZeroRatio_llu: 4599075939470750515 OutImageTileXSize: 55 OutImageTileYSize: 2 KernelsPerCore: 6 kernelDDRReadBW: 10564 InImageDDrReadBW: 0 ReadBW: 10692 WriteBW: 104716 CycleCount: 32561 =========================== **********Show Perf******** =========================== layer_id:1 layer_name:PoolingLayer2 operation_id:0 operation_name:VXNNE_OPERATOR_POOLING operation_target:VXNNE_OPERATION_TARGET_TP abs_op_id:3 upstream_layer_num:1 upstream_opertaion_num:1 0) upstream_operation_id:0 uptream_operation_name:VXNNE_OPERATOR_CONVOLUTION (upstream_layer_id:5 upstream_layer_name:ConvolutionReluPoolingLayer2) downstream_layer_num:1 downstream_opertaion_num:1 0) downstream_operation_id:0 downstream_operation_name:VXNNE_OPERATOR_CONVOLUTION (downstream_layer_id:6 downstream_layer_name:ConvolutionReluPoolingLayer2) InImageX: 109 (sub: 109) InImageY: 109 (sub: 109) InImageZ: 32 (sub: 32) OutImageX: 54 OutImageY: 54 OutImageZ: 32 KernelX: 1 KernelY: 1 KernelZ: 32 PoolingSize: 2 PoolingStride: 2 InputDataSize: 8 OutputDataSize: 8 FP16: 0 archModel_kernelSize: 0 kernelSize: 0 SrcBuf: DDR DstBuf: VIP_SRAM KernelBuf: DDR KernelCacheMode=VXNNE_SRAM_CACHE_MODE_NONE ImageCacheMode=VXNNE_SRAM_CACHE_MODE_NONE xOffset: 0, yOffset: 0 kernelDDRReadBW: 0 InImageDDrReadBW: 380192 ReadBW: 380320 WriteBW: 0 CycleCount: 129138 =========================== **********Show Perf******** =========================== layer_id:6 layer_name:ConvolutionReluPoolingLayer2 operation_id:0 operation_name:VXNNE_OPERATOR_CONVOLUTION operation_target:VXNNE_OPERATION_TARGET_NN abs_op_id:4 upstream_layer_num:1 
upstream_opertaion_num:1 0) upstream_operation_id:0 uptream_operation_name:VXNNE_OPERATOR_POOLING (upstream_layer_id:1 upstream_layer_name:PoolingLayer2) downstream_layer_num:1 downstream_opertaion_num:1 0) downstream_operation_id:0 downstream_operation_name:VXNNE_OPERATOR_CONVOLUTION (downstream_layer_id:7 downstream_layer_name:ConvolutionReluPoolingLayer2) NumUsedNNCores: 6 ConvOutFIFODepth: 168 OrigInImageX: 54 OrigInImageY: 54 OrigInImageZ: 32 NNOutImageX: 52 (sub: 52) NNOutImageY: 52 (sub: 52) NNOutImageZ: 64 (sub: 64) FinalOutImageX: 26 FinalOutImageY: 26 FinalOutImageZ: 64 KernelX: 3 KernelY: 3 KernelZ: 32 PoolingSize: 2 PoolingStride: 2 InputDataSize: 8 OutputDataSize: 8 FP16: 0 archModel_kernelSize: 19841 kernelSize: 20608 SrcBuf: VIP_SRAM DstBuf: VIP_SRAM KernelBuf: DDR KernelCacheMode=VXNNE_SRAM_CACHE_MODE_FULL_CACHE ImageCacheMode=VXNNE_SRAM_CACHE_MODE_NONE xOffset: 0, yOffset: 0 coefNonZeroRatio: 1.000000000000000 coefCompression: 0.934931506849315 imageCompression: 1.000000000000000 imageNonZeroRatio: 0.300000000000000 coefNonZeroRatio__llu: 4607182418800017408 coefCompression_llu: 4606596333917003439 imageCompression_llu: 4607182418800017408 imageNonZeroRatio_llu: 4599075939470750515 OutImageTileXSize: 52 OutImageTileYSize: 6 KernelsPerCore: 4 kernelDDRReadBW: 17809 InImageDDrReadBW: 0 ReadBW: 17937 WriteBW: 0 CycleCount: 47726 =========================== **********Show Perf******** =========================== layer_id:7 layer_name:ConvolutionReluPoolingLayer2 operation_id:0 operation_name:VXNNE_OPERATOR_CONVOLUTION operation_target:VXNNE_OPERATION_TARGET_NN abs_op_id:5 upstream_layer_num:1 upstream_opertaion_num:1 0) upstream_operation_id:0 uptream_operation_name:VXNNE_OPERATOR_CONVOLUTION (upstream_layer_id:6 upstream_layer_name:ConvolutionReluPoolingLayer2) downstream_layer_num:1 downstream_opertaion_num:1 0) downstream_operation_id:0 downstream_operation_name:VXNNE_OPERATOR_TENSOR_TRANS (downstream_layer_id:2 
downstream_layer_name:TensorTranspose) NumUsedNNCores: 6 ConvOutFIFODepth: 168 OrigInImageX: 26 OrigInImageY: 26 OrigInImageZ: 64 NNOutImageX: 24 (sub: 24) NNOutImageY: 24 (sub: 24) NNOutImageZ: 128 (sub: 128) FinalOutImageX: 12 FinalOutImageY: 12 FinalOutImageZ: 128 KernelX: 3 KernelY: 3 KernelZ: 64 PoolingSize: 2 PoolingStride: 2 InputDataSize: 8 OutputDataSize: 8 FP16: 0 archModel_kernelSize: 76726 kernelSize: 79744 SrcBuf: VIP_SRAM DstBuf: VIP_SRAM KernelBuf: DDR KernelCacheMode=VXNNE_SRAM_CACHE_MODE_FULL_CACHE ImageCacheMode=VXNNE_SRAM_CACHE_MODE_NONE xOffset: 0, yOffset: 0 coefNonZeroRatio: 0.999959309895833 coefCompression: 0.897413793103448 imageCompression: 1.000000000000000 imageNonZeroRatio: 0.300000000000000 coefNonZeroRatio__llu: 4607182052296141483 coefCompression_llu: 4606258404393712082 imageCompression_llu: 4607182418800017408 imageNonZeroRatio_llu: 4599075939470750515 OutImageTileXSize: 24 OutImageTileYSize: 16 KernelsPerCore: 5 kernelDDRReadBW: 66293 InImageDDrReadBW: 0 ReadBW: 66421 WriteBW: 0 CycleCount: 40241 =========================== **********Show Perf******** =========================== layer_id:2 layer_name:TensorTranspose operation_id:0 operation_name:VXNNE_OPERATOR_TENSOR_TRANS operation_target:VXNNE_OPERATION_TARGET_TP abs_op_id:6 upstream_layer_num:1 upstream_opertaion_num:1 0) upstream_operation_id:0 uptream_operation_name:VXNNE_OPERATOR_CONVOLUTION (upstream_layer_id:7 upstream_layer_name:ConvolutionReluPoolingLayer2) downstream_layer_num:1 downstream_opertaion_num:1 0) downstream_operation_id:0 downstream_operation_name:VXNNE_OPERATOR_FULLYCONNECTED (downstream_layer_id:8 downstream_layer_name:FullyConnectedReluLayer) InImageX: 12 InImageY: 12 InImageZ: 128 OutImageX: 128 (sub: 128) OutImageY: 12 (sub: 12) OutImageZ: 12 (sub: 12) KernelX: 1 KernelY: 1 KernelZ: 128 PoolingSize: 1 PoolingStride: 1 InputDataSize: 8 OutputDataSize: 8 FP16: 0 archModel_kernelSize: 0 kernelSize: 0 SrcBuf: VIP_SRAM DstBuf: VIP_SRAM KernelBuf: DDR 
KernelCacheMode=VXNNE_SRAM_CACHE_MODE_NONE ImageCacheMode=VXNNE_SRAM_CACHE_MODE_NONE xOffset: 0, yOffset: 0 kernelDDRReadBW: 0 InImageDDrReadBW: 0 ReadBW: 128 WriteBW: 0 CycleCount: 11879 =========================== **********Show Perf******** =========================== layer_id:8 layer_name:FullyConnectedReluLayer operation_id:0 operation_name:VXNNE_OPERATOR_FULLYCONNECTED operation_target:VXNNE_OPERATION_TARGET_TP abs_op_id:7 upstream_layer_num:1 upstream_opertaion_num:1 0) upstream_operation_id:0 uptream_operation_name:VXNNE_OPERATOR_TENSOR_TRANS (upstream_layer_id:2 upstream_layer_name:TensorTranspose) downstream_layer_num:1 downstream_opertaion_num:1 0) downstream_operation_id:0 downstream_operation_name:VXNNE_OPERATOR_FULLYCONNECTED (downstream_layer_id:9 downstream_layer_name:FullyConnectedReluLayer) InImageX: 1 InImageY: 1 InImageZ: 18432 OutImageX: 1 (sub: 1) OutImageY: 1 (sub: 1) OutImageZ: 256 (sub: 256) KernelX: 1 KernelY: 1 KernelZ: 18432 PoolingSize: 1 PoolingStride: 1 InputDataSize: 8 OutputDataSize: 8 FP16: 0 archModel_kernelSize: 7078638 kernelSize: 0 SrcBuf: VIP_SRAM DstBuf: VIP_SRAM KernelBuf: DDR KernelCacheMode=VXNNE_SRAM_CACHE_MODE_NONE ImageCacheMode=VXNNE_SRAM_CACHE_MODE_NONE xOffset: 0, yOffset: 0 coefNonZeroRatio: 0.972156100802951 coefCompression: 1.493328270774571 imageCompression: 1.000000000000000 imageNonZeroRatio: 0.300000000000000 coefNonZeroRatio__llu: 4606931623251920668 coefCompression_llu: 4609404171816449099 imageCompression_llu: 4607182418800017408 imageNonZeroRatio_llu: 4599075939470750515 kernelDDRReadBW: 2113922 InImageDDrReadBW: 0 ReadBW: 2114050 WriteBW: 0 CycleCount: 558736 =========================== **********Show Perf******** =========================== layer_id:9 layer_name:FullyConnectedReluLayer operation_id:0 operation_name:VXNNE_OPERATOR_FULLYCONNECTED operation_target:VXNNE_OPERATION_TARGET_TP abs_op_id:8 upstream_layer_num:1 upstream_opertaion_num:1 0) upstream_operation_id:0 
uptream_operation_name:VXNNE_OPERATOR_FULLYCONNECTED (upstream_layer_id:8 upstream_layer_name:FullyConnectedReluLayer) downstream_layer_num:1 downstream_opertaion_num:1 0) downstream_operation_id:0 downstream_operation_name:VXNNE_OPERATOR_SOFTMAX (downstream_layer_id:3 downstream_layer_name:Softmax2Layer) InImageX: 1 InImageY: 1 InImageZ: 256 OutImageX: 1 (sub: 1) OutImageY: 1 (sub: 1) OutImageZ: 6 (sub: 6) KernelX: 1 KernelY: 1 KernelZ: 256 PoolingSize: 1 PoolingStride: 1 InputDataSize: 8 OutputDataSize: 8 FP16: 0 archModel_kernelSize: 0 kernelSize: 0 SrcBuf: VIP_SRAM DstBuf: DDR KernelBuf: DDR KernelCacheMode=VXNNE_SRAM_CACHE_MODE_NONE ImageCacheMode=VXNNE_SRAM_CACHE_MODE_NONE xOffset: 0, yOffset: 0 coefNonZeroRatio: 0.994791666666667 coefCompression: 32.615384615384613 imageCompression: 1.000000000000000 imageNonZeroRatio: 0.300000000000000 coefNonZeroRatio__llu: 4607135506303898965 coefCompression_llu: 4629787024622011628 imageCompression_llu: 4607182418800017408 imageNonZeroRatio_llu: 4599075939470750515 kernelDDRReadBW: 15029 InImageDDrReadBW: 0 ReadBW: 15157 WriteBW: 6 CycleCount: 6397 =========================== **********Show Perf******** =========================== layer_id:3 layer_name:Softmax2Layer operation_id:0 operation_name:VXNNE_OPERATOR_SOFTMAX operation_target:VXNNE_OPERATION_TARGET_SH abs_op_id:9 upstream_layer_num:1 upstream_opertaion_num:1 0) upstream_operation_id:0 uptream_operation_name:VXNNE_OPERATOR_FULLYCONNECTED (upstream_layer_id:9 upstream_layer_name:FullyConnectedReluLayer) downstream_layer_num:0 downstream_opertaion_num:0 prev_ptrs = 0xffffa369c040 Warning: swapHandel, CMD changed NN/TP: pre_physical:0x1FE2C040, new_physical:0x1FE2C040 layer id: 0 layer name:TensorTranspose operation[0]:VXNNE_OPERATOR_TENSOR_TRANS target:VXNNE_OPERATION_TARGET_TP. execution time: 290 us layer id: 4 layer name:ConvolutionReluPoolingLayer2 operation[0]:VXNNE_OPERATOR_CONVOLUTION target:VXNNE_OPERATION_TARGET_NN. 
execution time: 77 us layer id: 5 layer name:ConvolutionReluPoolingLayer2 operation[0]:VXNNE_OPERATOR_CONVOLUTION target:VXNNE_OPERATION_TARGET_NN. execution time: 63 us layer id: 4 layer name:ConvolutionReluPoolingLayer2 operation[0]:VXNNE_OPERATOR_CONVOLUTION target:VXNNE_OPERATION_TARGET_NN. execution time: 80 us layer id: 5 layer name:ConvolutionReluPoolingLayer2 operation[0]:VXNNE_OPERATOR_CONVOLUTION target:VXNNE_OPERATION_TARGET_NN. execution time: 80 us layer id: 4 layer name:ConvolutionReluPoolingLayer2 operation[0]:VXNNE_OPERATOR_CONVOLUTION target:VXNNE_OPERATION_TARGET_NN. execution time: 80 us layer id: 5 layer name:ConvolutionReluPoolingLayer2 operation[0]:VXNNE_OPERATOR_CONVOLUTION target:VXNNE_OPERATION_TARGET_NN. execution time: 74 us layer id: 4 layer name:ConvolutionReluPoolingLayer2 operation[0]:VXNNE_OPERATOR_CONVOLUTION target:VXNNE_OPERATION_TARGET_NN. execution time: 76 us layer id: 5 layer name:ConvolutionReluPoolingLayer2 operation[0]:VXNNE_OPERATOR_CONVOLUTION target:VXNNE_OPERATION_TARGET_NN. execution time: 84 us layer id: 4 layer name:ConvolutionReluPoolingLayer2 operation[0]:VXNNE_OPERATOR_CONVOLUTION target:VXNNE_OPERATION_TARGET_NN. execution time: 76 us layer id: 5 layer name:ConvolutionReluPoolingLayer2 operation[0]:VXNNE_OPERATOR_CONVOLUTION target:VXNNE_OPERATION_TARGET_NN. execution time: 73 us layer id: 1 layer name:PoolingLayer2 operation[0]:VXNNE_OPERATOR_POOLING target:VXNNE_OPERATION_TARGET_TP. execution time: 209 us layer id: 6 layer name:ConvolutionReluPoolingLayer2 operation[0]:VXNNE_OPERATOR_CONVOLUTION target:VXNNE_OPERATION_TARGET_NN. execution time: 140 us layer id: 7 layer name:ConvolutionReluPoolingLayer2 operation[0]:VXNNE_OPERATOR_CONVOLUTION target:VXNNE_OPERATION_TARGET_NN. execution time: 102 us layer id: 2 layer name:TensorTranspose operation[0]:VXNNE_OPERATOR_TENSOR_TRANS target:VXNNE_OPERATION_TARGET_TP. 
execution time: 101 us layer id: 8 layer name:FullyConnectedReluLayer operation[0]:VXNNE_OPERATOR_FULLYCONNECTED target:VXNNE_OPERATION_TARGET_TP. execution time: 469 us layer id: 9 layer name:FullyConnectedReluLayer operation[0]:VXNNE_OPERATOR_FULLYCONNECTED target:VXNNE_OPERATION_TARGET_TP. execution time: 54 us layer id: 3 layer name:Softmax2Layer operation[0]:VXNNE_OPERATOR_SOFTMAX target:VXNNE_OPERATION_TARGET_SH. execution time: 187 us '''Warmup time: 3602.98 ms''' '''Original image size: 600x600x3''' '''Cropped image size: 600x600x3''' '''Resized image size: 224x224x3''' '''Input tensor index: 14''' '''Input tensor name: conv2d_input''' '''Selected order of channels: RGB''' '''Selected pixel values range: NA''' '''Filling time: 0.195005 ms''' prev_ptrs = 0xffffa369c040 Warning: swapHandel, CMD changed NN/TP: pre_physical:0x1FE2C040, new_physical:0x1FE2C040 layer id: 0 layer name:TensorTranspose operation[0]:VXNNE_OPERATOR_TENSOR_TRANS target:VXNNE_OPERATION_TARGET_TP. execution time: 286 us layer id: 4 layer name:ConvolutionReluPoolingLayer2 operation[0]:VXNNE_OPERATOR_CONVOLUTION target:VXNNE_OPERATION_TARGET_NN. execution time: 77 us layer id: 5 layer name:ConvolutionReluPoolingLayer2 operation[0]:VXNNE_OPERATOR_CONVOLUTION target:VXNNE_OPERATION_TARGET_NN. execution time: 59 us layer id: 4 layer name:ConvolutionReluPoolingLayer2 operation[0]:VXNNE_OPERATOR_CONVOLUTION target:VXNNE_OPERATION_TARGET_NN. execution time: 78 us layer id: 5 layer name:ConvolutionReluPoolingLayer2 operation[0]:VXNNE_OPERATOR_CONVOLUTION target:VXNNE_OPERATION_TARGET_NN. execution time: 74 us layer id: 4 layer name:ConvolutionReluPoolingLayer2 operation[0]:VXNNE_OPERATOR_CONVOLUTION target:VXNNE_OPERATION_TARGET_NN. execution time: 81 us layer id: 5 layer name:ConvolutionReluPoolingLayer2 operation[0]:VXNNE_OPERATOR_CONVOLUTION target:VXNNE_OPERATION_TARGET_NN. 
execution time: 72 us layer id: 4 layer name:ConvolutionReluPoolingLayer2 operation[0]:VXNNE_OPERATOR_CONVOLUTION target:VXNNE_OPERATION_TARGET_NN. execution time: 74 us layer id: 5 layer name:ConvolutionReluPoolingLayer2 operation[0]:VXNNE_OPERATOR_CONVOLUTION target:VXNNE_OPERATION_TARGET_NN. execution time: 73 us layer id: 4 layer name:ConvolutionReluPoolingLayer2 operation[0]:VXNNE_OPERATOR_CONVOLUTION target:VXNNE_OPERATION_TARGET_NN. execution time: 74 us layer id: 5 layer name:ConvolutionReluPoolingLayer2 operation[0]:VXNNE_OPERATOR_CONVOLUTION target:VXNNE_OPERATION_TARGET_NN. execution time: 88 us layer id: 1 layer name:PoolingLayer2 operation[0]:VXNNE_OPERATOR_POOLING target:VXNNE_OPERATION_TARGET_TP. execution time: 200 us layer id: 6 layer name:ConvolutionReluPoolingLayer2 operation[0]:VXNNE_OPERATOR_CONVOLUTION target:VXNNE_OPERATION_TARGET_NN. execution time: 105 us layer id: 7 layer name:ConvolutionReluPoolingLayer2 operation[0]:VXNNE_OPERATOR_CONVOLUTION target:VXNNE_OPERATION_TARGET_NN. execution time: 88 us layer id: 2 layer name:TensorTranspose operation[0]:VXNNE_OPERATOR_TENSOR_TRANS target:VXNNE_OPERATION_TARGET_TP. execution time: 82 us layer id: 8 layer name:FullyConnectedReluLayer operation[0]:VXNNE_OPERATOR_FULLYCONNECTED target:VXNNE_OPERATION_TARGET_TP. execution time: 154 us layer id: 9 layer name:FullyConnectedReluLayer operation[0]:VXNNE_OPERATOR_FULLYCONNECTED target:VXNNE_OPERATION_TARGET_TP. execution time: 48 us layer id: 3 layer name:Softmax2Layer operation[0]:VXNNE_OPERATOR_SOFTMAX target:VXNNE_OPERATION_TARGET_SH. execution time: 131 us '''Inference time 1: 2.49207 ms''' prev_ptrs = 0xffffa369c040 Warning: swapHandel, CMD changed NN/TP: pre_physical:0x1FE2C040, new_physical:0x1FE2C040 layer id: 0 layer name:TensorTranspose operation[0]:VXNNE_OPERATOR_TENSOR_TRANS target:VXNNE_OPERATION_TARGET_TP. 
execution time: 240 us layer id: 4 layer name:ConvolutionReluPoolingLayer2 operation[0]:VXNNE_OPERATOR_CONVOLUTION target:VXNNE_OPERATION_TARGET_NN. execution time: 74 us layer id: 5 layer name:ConvolutionReluPoolingLayer2 operation[0]:VXNNE_OPERATOR_CONVOLUTION target:VXNNE_OPERATION_TARGET_NN. execution time: 57 us layer id: 4 layer name:ConvolutionReluPoolingLayer2 operation[0]:VXNNE_OPERATOR_CONVOLUTION target:VXNNE_OPERATION_TARGET_NN. execution time: 87 us layer id: 5 layer name:ConvolutionReluPoolingLayer2 operation[0]:VXNNE_OPERATOR_CONVOLUTION target:VXNNE_OPERATION_TARGET_NN. execution time: 81 us layer id: 4 layer name:ConvolutionReluPoolingLayer2 operation[0]:VXNNE_OPERATOR_CONVOLUTION target:VXNNE_OPERATION_TARGET_NN. execution time: 80 us layer id: 5 layer name:ConvolutionReluPoolingLayer2 operation[0]:VXNNE_OPERATOR_CONVOLUTION target:VXNNE_OPERATION_TARGET_NN. execution time: 78 us layer id: 4 layer name:ConvolutionReluPoolingLayer2 operation[0]:VXNNE_OPERATOR_CONVOLUTION target:VXNNE_OPERATION_TARGET_NN. execution time: 81 us layer id: 5 layer name:ConvolutionReluPoolingLayer2 operation[0]:VXNNE_OPERATOR_CONVOLUTION target:VXNNE_OPERATION_TARGET_NN. execution time: 86 us layer id: 4 layer name:ConvolutionReluPoolingLayer2 operation[0]:VXNNE_OPERATOR_CONVOLUTION target:VXNNE_OPERATION_TARGET_NN. execution time: 77 us layer id: 5 layer name:ConvolutionReluPoolingLayer2 operation[0]:VXNNE_OPERATOR_CONVOLUTION target:VXNNE_OPERATION_TARGET_NN. execution time: 73 us layer id: 1 layer name:PoolingLayer2 operation[0]:VXNNE_OPERATOR_POOLING target:VXNNE_OPERATION_TARGET_TP. execution time: 209 us layer id: 6 layer name:ConvolutionReluPoolingLayer2 operation[0]:VXNNE_OPERATOR_CONVOLUTION target:VXNNE_OPERATION_TARGET_NN. execution time: 108 us layer id: 7 layer name:ConvolutionReluPoolingLayer2 operation[0]:VXNNE_OPERATOR_CONVOLUTION target:VXNNE_OPERATION_TARGET_NN. 
execution time: 90 us layer id: 2 layer name:TensorTranspose operation[0]:VXNNE_OPERATOR_TENSOR_TRANS target:VXNNE_OPERATION_TARGET_TP. execution time: 84 us layer id: 8 layer name:FullyConnectedReluLayer operation[0]:VXNNE_OPERATOR_FULLYCONNECTED target:VXNNE_OPERATION_TARGET_TP. execution time: 157 us layer id: 9 layer name:FullyConnectedReluLayer operation[0]:VXNNE_OPERATOR_FULLYCONNECTED target:VXNNE_OPERATION_TARGET_TP. execution time: 48 us layer id: 3 layer name:Softmax2Layer operation[0]:VXNNE_OPERATOR_SOFTMAX target:VXNNE_OPERATION_TARGET_SH. execution time: 136 us '''Inference time 2: 2.47457 ms''' prev_ptrs = 0xffffa369c040 Warning: swapHandel, CMD changed NN/TP: pre_physical:0x1FE2C040, new_physical:0x1FE2C040 layer id: 0 layer name:TensorTranspose operation[0]:VXNNE_OPERATOR_TENSOR_TRANS target:VXNNE_OPERATION_TARGET_TP. execution time: 254 us layer id: 4 layer name:ConvolutionReluPoolingLayer2 operation[0]:VXNNE_OPERATOR_CONVOLUTION target:VXNNE_OPERATION_TARGET_NN. execution time: 69 us layer id: 5 layer name:ConvolutionReluPoolingLayer2 operation[0]:VXNNE_OPERATOR_CONVOLUTION target:VXNNE_OPERATION_TARGET_NN. execution time: 60 us layer id: 4 layer name:ConvolutionReluPoolingLayer2 operation[0]:VXNNE_OPERATOR_CONVOLUTION target:VXNNE_OPERATION_TARGET_NN. execution time: 82 us layer id: 5 layer name:ConvolutionReluPoolingLayer2 operation[0]:VXNNE_OPERATOR_CONVOLUTION target:VXNNE_OPERATION_TARGET_NN. execution time: 77 us layer id: 4 layer name:ConvolutionReluPoolingLayer2 operation[0]:VXNNE_OPERATOR_CONVOLUTION target:VXNNE_OPERATION_TARGET_NN. execution time: 77 us layer id: 5 layer name:ConvolutionReluPoolingLayer2 operation[0]:VXNNE_OPERATOR_CONVOLUTION target:VXNNE_OPERATION_TARGET_NN. execution time: 73 us layer id: 4 layer name:ConvolutionReluPoolingLayer2 operation[0]:VXNNE_OPERATOR_CONVOLUTION target:VXNNE_OPERATION_TARGET_NN. 
execution time: 76 us layer id: 5 layer name:ConvolutionReluPoolingLayer2 operation[0]:VXNNE_OPERATOR_CONVOLUTION target:VXNNE_OPERATION_TARGET_NN. execution time: 73 us layer id: 4 layer name:ConvolutionReluPoolingLayer2 operation[0]:VXNNE_OPERATOR_CONVOLUTION target:VXNNE_OPERATION_TARGET_NN. execution time: 76 us layer id: 5 layer name:ConvolutionReluPoolingLayer2 operation[0]:VXNNE_OPERATOR_CONVOLUTION target:VXNNE_OPERATION_TARGET_NN. execution time: 73 us layer id: 1 layer name:PoolingLayer2 operation[0]:VXNNE_OPERATOR_POOLING target:VXNNE_OPERATION_TARGET_TP. execution time: 210 us layer id: 6 layer name:ConvolutionReluPoolingLayer2 operation[0]:VXNNE_OPERATOR_CONVOLUTION target:VXNNE_OPERATION_TARGET_NN. execution time: 107 us layer id: 7 layer name:ConvolutionReluPoolingLayer2 operation[0]:VXNNE_OPERATOR_CONVOLUTION target:VXNNE_OPERATION_TARGET_NN. execution time: 89 us layer id: 2 layer name:TensorTranspose operation[0]:VXNNE_OPERATOR_TENSOR_TRANS target:VXNNE_OPERATION_TARGET_TP. execution time: 83 us layer id: 8 layer name:FullyConnectedReluLayer operation[0]:VXNNE_OPERATOR_FULLYCONNECTED target:VXNNE_OPERATION_TARGET_TP. execution time: 155 us layer id: 9 layer name:FullyConnectedReluLayer operation[0]:VXNNE_OPERATOR_FULLYCONNECTED target:VXNNE_OPERATION_TARGET_TP. execution time: 185 us layer id: 3 layer name:Softmax2Layer operation[0]:VXNNE_OPERATOR_SOFTMAX target:VXNNE_OPERATION_TARGET_SH. execution time: 151 us '''Inference time 3: 2.61483 ms''' '''Average inference time: 2.52716 ms''' '''Total prediction time: 2.72216 ms''' '''Output tensor index: 5''' '''Output tensor name: activation_5/Softmax''' '''Top results:''' '''1 Red Apple''' prev_ptrs = 0xffffa369c040 Exit VX Thread: 0xa3ee5fb0
AXISRAM: Peak used 0 0.000000%
VIPSRAM: Peak used 259584 99.803146%
======================== Block [0 - 2] SUCCEED ======================
=========================== Block [3 - 8] ===========================
 3 TP DD -> VS [( 0, 93312), IC( 0), KC( 0)]
 4 NN VS -> VS [( 93312, 43264), IC( 0), KC( 20608)]
 5 NN VS -> VS [( 43264, 18432), IC( 0), KC( 79744)]
 6 TP VS -> VS [( 18432, 18432), IC( 0), KC( 0)]
 7 TP VS -> VS [( 18432, 256), IC( 0), KC( 0)]
 8 TP VS -> DD [( 256, 0), IC( 0), KC( 0)]
------------------------------------------------------------------
Segment AB (3 - 8)
AXISRAM: Peak used 0 0.000000%
VIPSRAM: Peak used 157184 60.433071%
======================== Block [3 - 8] SUCCEED =========================
id IN [ x y w h ] OUT [ x y w h ] (tx, ty, kpc) (ic, kc, kc/ks, ks/eks, kernel_type)
 0 TP DD 0x(nil) [ 0 0 3 224] -> VS 0x0x400800 [ 0 0 224 224] ( 0, 0, 0) ( 0, 0, 0.000000%, 0.000000%, NONE)
 1 NN VS 0x0x400800 [ 0 0 224 24] -> VS 0x0x425400 [ 0 0 111 11] ( 32, 2, 6) ( 0, 1408, 100.000000%, 122.222221%, DD)
 2 NN VS 0x0x425400 [ 0 0 111 11] -> DD 0x(nil) [ 0 0 109 9] ( 55, 2, 6) ( 0, 11648, 100.000000%, 116.666664%, DD)
 1 NN VS 0x0x401b40 [ 0 22 224 52] -> VS 0x0x4258c5 [ 0 11 111 25] ( 56, 2, 6) ( 0, 1408, 100.000000%, 122.222221%, DD)
 2 NN VS 0x0x4257e7 [ 0 9 111 27] -> DD 0x0x3d5 [ 0 9 109 25] ( 55, 2, 6) ( 0, 11648, 100.000000%, 116.666664%, DD)
 1 NN VS 0x0x404700 [ 0 72 224 52] -> VS 0x0x42639c [ 0 36 111 25] ( 56, 2, 6) ( 0, 1408, 100.000000%, 122.222221%, DD)
 2 NN VS 0x0x4262be [ 0 34 111 27] -> DD 0x0xe7a [ 0 34 109 25] ( 55, 2, 6) ( 0, 11648, 100.000000%, 116.666664%, DD)
 1 NN VS 0x0x4072c0 [ 0 122 224 52] -> VS 0x0x426e73 [ 0 61 111 25] ( 56, 2, 6) ( 0, 1408, 100.000000%, 122.222221%, DD)
 2 NN VS 0x0x426d95 [ 0 59 111 27] -> DD 0x0x191f [ 0 59 109 25] ( 55, 2, 6) ( 0, 11648, 100.000000%, 116.666664%, DD)
 1 NN VS 0x0x409e80 [ 0 172 224 52] -> VS 0x0x42794a [ 0 86 111 25] ( 56, 2, 6) ( 0, 1408, 100.000000%, 122.222221%, DD)
 2 NN VS 0x0x42786c [ 0 84 111 27] -> DD 0x0x23c4 [ 0 84 109 25] ( 55, 2, 6) ( 0, 11648, 100.000000%, 116.666664%, DD)
 3 TP DD 0x(nil) [ 0 0 109 109] -> VS 0x0x400800 [ 0 0 54 54] ( 0, 0, 0) ( 0, 0, 0.000000%, 0.000000%, NONE)
 4 NN VS 0x0x400800 [ 0 0 54 54] -> VS 0x0x41c500 [ 0 0 26 26] ( 52, 6, 4) ( 0, 20608, 100.000000%, 103.205132%, DD)
 5 NN VS 0x0x41c500 [ 0 0 26 26] -> VS 0x0x400800 [ 0 0 12 12] ( 24, 16, 5) ( 0, 79744, 100.000000%, 100.160774%, DD)
 6 TP VS 0x0x400800 [ 0 0 12 12] -> VS 0x0x422600 [ 0 0 128 12] ( 0, 0, 0) ( 0, 0, 0.000000%, 0.000000%, NONE)
 7 TP VS 0x0x422600 [ 0 0 18432 1] -> VS 0x0x400800 [ 0 0 256 1] ( 0, 0, 0) ( 0, 0, 0.000000%, 0.000000%, NONE)
 8 TP VS 0x0x400800 [ 0 0 256 1] -> DD 0x(nil) [ 0 0 6 1] ( 0, 0, 0) ( 0, 0, 0.000000%, 0.000000%, NONE)
 9 SH DD 0x(nil) [ 0 0 0 0] -> DD 0x(nil) [ 0 0 0 0] ( 0, 0, 0) ( 0, 0, 0.000000%, 0.000000%, NONE)
PreLoadWeightBiases = 0 nan%
---------------------------End VerifyTiling -------------------------
ArchModelVersion: ARCHCTS@230121
SWTilingVersion: ARCHCTS@230121
ProfileMode: 0
NumNNCores:6
NumNNCoresInt8: 6
NumNNCoresInt16: 6
NumNNCoresFloat16: 0
NumTPCores: 3
NumTPLiteCores: 0
MadPerCore: 64
VIP7Version: 1
InBuffDepth: 9
AccumBufferDepth: 32
DPAmount: 3
XYDPX: 0
XYDPY: 0
ZDP: 3
AXISRAMSize: 0
VIPSRAMSize: 262144
L2CacheWidth: 32
USCCacheSize: 8
BrickMode: 0
SWTiling: 1
SmallBatchEnable: 0
SWTilingPhase1: 1
TPWithFCLayer: 1
TPCircularBufferSupport: 1
KERNEL_HEADER_NOT_CACHED_FIX: 0
NNFCNonPruneAccel: 0
Conv1x1HalfPerformance: 0
DDRLatency: 0
CacheLineModeDisabled: 0
PER_3D_TILE_BUBBLE_FIX: 1
SWConv1x1To1x2: 0
TP_LOCALIZATION_REORDER_DISABLED_Fix: 1
USCCacheControllers: 1
AsyncCopyPerfFix: 1
ZDP3NoCompressFix: 1
ZXDP3KernelReadConflictFix: 1
CoefDecodePerf: 2
VectorPrune: 1
EnableCacheDataFromSRAM: 1
IMAGE_PARTIAL_CACHE_FIX: 0
DDRReadBandWidthLimit: 3.80
DDRWriteBandWidthLimit: 3.80
DDRTotalBandWidthLimit: 3.80
AXISRAMReadBandWidthLimit: 16.00
AXISRAMWriteBandWidthLimit: 16.00
AXISRAMTotalBandWidthLimit: 16.00
AXIBusReadBandWidthLimit: 16.00
AXIBusWriteBandWidthLimit: 16.00
AXIBusTotalBandWidthLimit: 32.00
HANDLE_ABBUFFER: 1
HANDLE_SUBIMAGE: 1
HANDLE_BRANCH: 1
FreqInMHZ: 1000
AxiClockFreqInMHZ: 1000
OutstandingTransfer: 64
InternalWriteBWLimit: 16.00

=== Version 2B ===
The execution of version 2B of the classifier on the embedded platform is detailed below. As before, <code>htop</code> was used to monitor the system.

<pre class="board-terminal">root@imx8mpevk:/home/mathias/devel/image_classifier_eIQ_plus# python3 image_classifier.py -m my_fruits_model_qatlegacy.tflite -l labels.txt -i testdata/red-apple1.jpg
INFO: Created TensorFlow Lite delegate for NNAPI.
Applied NNAPI delegate.
Warm-up time: 3474.22 ms
Original image size: (600, 600)
Cropped image size: (600, 600)
Resized image size: (224, 224)
Filling time: 0.72 ms
Inference time 1: 1.44 ms
Inference time 2: 1.38 ms
Inference time 3: 1.39 ms
Average inference time: 1.40 ms
Total prediction time: 2.12 ms
Results:
1.000 Red Apple
0.000 Orange
0.000 Hand
</pre>

Note that the inference time is close to the C++ one, but the filling time (i.e. the time needed to fill the input tensor with the image) is longer. This is because Python does not allow the kind of low-level pointer operations available in C++.

The following screenshot shows the system status while executing the application.

[[File:ML-TN-001 4 acceleration python.png|center|thumb|600x600px]]

=== Version 3 ===
The following image shows the execution of the third version of the classifier on the embedded platform. The image sensor is pointed at a red apple, which is correctly classified with 98% confidence. Note that with this camera the frame rate is capped at 30 fps, but it could be much higher, as the inference on the NPU takes only a few milliseconds, as shown previously.
LanesPerConv: 64
MaxTileSize: 64
AxiSramSlowedDownByAddr: 1
SLOW_NN_REQ_ARBITRATION_FIX: 0
FLOAT_XYDP_X[[File: 1FLOAT_XYDP_Y: 1FLOAT_ZDP: 1SINGLE_PORT_ACC_BUFFER: 1MAX_ZRL_BIT_WIDTH: 8MAX_SOC_OUT_STANDING_NUMBER: 32ML-TN-001 4 camera photo.jpg|thumb|center|600px|Version 3 of the application running on the i.MX8 Plus EVK]]
SWTilingPhase3: 1
AXI_SRAM_ONLY_SW_TILING: 0
VIP_CORE_COUNT: 1
DEPTH_WISE_SUPPORT: 1
NN_WRITE_WITHOUT_USC: 0
EQUIVALENT_VIP_SRAM_WIDTH_IN_BYTE: 32
IMAGE_NOT_PACKED_IN_SRAM: 0
NN_COEF_COMPRESSION_ENHANCEMENT: 1
TP_COMPRESSION_ENHANCEMENT: 1
COEF_DELTA_CORD_OVER_FLOW_ZRL_8BIT_FIX: 1
NumShaderCores: 1
KERNEL_PER_CORE_LESS_THAN_THIRD_COEF_BUFF_DEPTH_FIX: 0
LOW_EFFICIENCY_OF_ID_WRITE_IMGBUF_FIX: 0
DR_JD_Diff_For_Cacheline_Mode_FIX: 1
CONVOUT_FIFO_DEPTH_FIX: 1
===========================**********Show Perf********===========================layer_id:0 layer_name:TensorTransposeoperation_id:0 operation_name:VXNNE_OPERATOR_TENSOR_TRANS operation_target:VXNNE_OPERATION_TARGET_TPabs_op_id:0upstream_layer_num:0 upstream_opertaion_num:0downstream_layer_num:1 downstream_opertaion_num:10) downstream_operation_id:0 downstream_operation_name:VXNNE_OPERATOR_CONVOLUTION (downstream_layer_id:4 downstream_layer_name:ConvolutionReluPoolingLayer2)InImageX: 3InImageY: 224InImageZ: 224OutImageX: 224 (sub: 224)OutImageY: 224 (sub: 224)OutImageZ: 3 (sub: 3)KernelX: 1KernelY: 1KernelZ: 224PoolingSize: 1PoolingStride: 1InputDataSize: 8OutputDataSize: 8FP16: 0archModel_kernelSize: 0kernelSize: 0SrcBuf: DDRDstBuf: VIP_SRAMKernelBuf: DDRKernelCacheMode=VXNNE_SRAM_CACHE_MODE_NONEImageCacheMode=VXNNE_SRAM_CACHE_MODE_NONExOffset: 0During the execution, yOffset: 0<code>htop</code> was used to monitor the system. The following screenshot shows the system status while executing the application.
kernelDDRReadBW: 0
InImageDDrReadBW: 150528
ReadBW: 150656
WriteBW: 0
CycleCount: 77927
[[File:ML-TN-001 4 camera htop.png|thumb|center|600px|<code>htop</code> screenshot during the execution of the classifier version 3]]
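The capture-and-classify loop at the core of version 3 can be sketched as follows. This is a minimal, pure-Python sketch: the frame source and the classifier are illustrative stand-ins (the real application reads frames from the camera pipeline and runs the fully-quantized model on the NPU through TensorFlow Lite), so only the control flow and the throughput measurement are meaningful.

```python
import time

def capture_frames(n):
    """Stand-in for the image sensor (illustrative only)."""
    for _ in range(n):
        time.sleep(0.001)            # placeholder for the sensor's frame interval (1/30 s in the real setup)
        yield [0] * (224 * 224 * 3)  # dummy RGB frame, already resized to the model input

def classify(frame):
    """Stand-in for the NPU-accelerated inference (a few milliseconds per frame in practice)."""
    return "red apple", 0.98         # dummy label and confidence

# Main loop: grab a frame, classify it, keep a running throughput estimate
processed = 0
start = time.perf_counter()
for frame in capture_frames(10):
    label, confidence = classify(frame)
    processed += 1
elapsed = time.perf_counter() - start
print(f"{processed} frames in {elapsed * 1000:.1f} ms ({processed / elapsed:.1f} fps)")
```

With a fast accelerator, the loop above is bounded by the frame interval of the sensor rather than by the inference itself, which is the behaviour described in the text.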
== Results ==

=== Version 1 ===
The following table lists the prediction times for a single image, depending on the model and the threads parameter.

{| class="wikitable" style="margin: auto;"
|+Prediction times
!Model
!Threads parameter
!Prediction time [ms]
|-
| rowspan="3" |'''Floating-point'''
|unspecified
|89
|-
|1
|160
|-
|2
|130
|-
|'''Half-quantized'''
|unspecified
|180
|-
| rowspan="2" |'''Fully-quantized'''
|unspecified
|85
|-
|4
|29
|}

The prediction time '''takes into account the inference time and the time needed to fill the input tensor with the image'''. Furthermore, the inference time is averaged over several inferences.

The same tests were repeated using a network file system (NFS) over an Ethernet connection, too. No significant variations in the prediction times were observed.
In conclusion, to maximize the performance in terms of execution time, the model has to be fully-quantized and the number of threads has to be specified explicitly.
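The measurement procedure described above (fill the input tensor, run the inference, average over several runs) can be sketched as follows. The <code>FakeInterpreter</code> class is a purely illustrative stand-in for the TensorFlow Lite interpreter, so the printed figure is not comparable with the table; only the timing methodology is shown.

```python
import time

class FakeInterpreter:
    """Stand-in for the TensorFlow Lite interpreter (illustrative only)."""
    def set_tensor(self, index, data):
        self.data = list(data)        # emulate copying the image into the input tensor
    def invoke(self):
        self.result = sum(self.data)  # emulate running the network

def predict_time_ms(interpreter, image, runs=10):
    # Prediction time = filling time + inference time, averaged over several runs
    start = time.perf_counter()
    for _ in range(runs):
        interpreter.set_tensor(0, image)  # filling
        interpreter.invoke()              # inference
    return (time.perf_counter() - start) * 1000 / runs

image = [0] * (224 * 224 * 3)  # flattened 224x224 RGB image
t = predict_time_ms(FakeInterpreter(), image)
print(f"average prediction time: {t:.3f} ms")
```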
=== Version 2A and 3 ===
In this case, only the fully-quantized model could be tested, and the threads parameter has no effect.

{| class="wikitable" style="margin: auto;"
|+Prediction times
!Model
!Prediction time [ms]
|-
|'''Fully-quantized'''
|1
|}

=== Version 2B ===
{| class="wikitable" style="margin: auto;"
|+Prediction times
!Model
!Prediction time [ms]
|-
|'''Fully-quantized'''
|2.1
|}
== Results comparison ==
The following table compares the results achieved to the ones measured on the [[ML-TN-001 - AI at the edge: comparison of different embedded platforms - Part 2|i.MX8M-based Mito8M SoM]].

{| class="wikitable" style="margin: auto;"
|+Prediction times
!Platform
!BSP
!TensorFlow Lite
!ARM cores (# / Type / Max freq. [GHz])
!Acceleration
!Model
!Threads
!Prediction time [ms]
!Notes
|-
| rowspan="6" |'''NXP i.MX8M-based Mito8M SoM'''
| rowspan="6" |L4.14.98_2.0
| rowspan="6" |1.12
| rowspan="6" |4 / Cortex-A53 / 1.3
| rowspan="6" |no
| rowspan="3" |Floating-point
|unspecified (4)
|220
|
|-
|1
|220
|
|-
|2
|390
|
|-
|Half-quantized
|unspecified (4)
|330
|
|-
| rowspan="2" |Fully-quantized
|unspecified (4)
|200
|
|-
|4
|84
|
|-
| rowspan="8" |'''NXP i.MX8M Plus EVK'''
| rowspan="8" |L5.4.24_2.1.0
| rowspan="8" |2.1.0
| rowspan="8" |4 / Cortex-A53 / 1.8
| rowspan="6" |no (version 1)
| rowspan="3" |Floating-point
|unspecified (4)
|89
|
|-
|1
|160
|
|-
|2
|130
|
|-
|Half-quantized
|unspecified (4)
|180
|
|-
| rowspan="2" |Fully-quantized
|unspecified (4)
|85
|
|-
|4
|29
|Interestingly, this time is significantly smaller than the one measured on the i.MX8M (84 ms). Probably, this is due to improvements at the TFL inference engine level, besides the increased maximum ARM frequency.
|-
|NPU (version 2A: C++)
|Fully-quantized
|
|1
|
|-
|NPU (version 2B: Python)
|Fully-quantized
|
|2.1
|
|}
operation[0]:VXNNE_OPERATOR_FULLYCONNECTED target:VXNNE_OPERATION_TARGET_TP.execution time: 48 uslayer id: 3 layer name:Softmax2Layer operation[0]:VXNNE_OPERATOR_SOFTMAX target:VXNNE_OPERATION_TARGET_SH.execution time: 131 us|NAInference time |1: 2.49207 msprev_ptrs = 0xffffa369c040 Warning: swapHandel, CMD changed  NN/TP: pre_physical:0x1FE2C040, new_physical:0x1FE2C040 layer id: 0 layer name:TensorTranspose operation[0]:VXNNE_OPERATOR_TENSOR_TRANS target:VXNNE_OPERATION_TARGET_TP.execution time: 240 uslayer id: 4 layer name:ConvolutionReluPoolingLayer2 operation[0]:VXNNE_OPERATOR_CONVOLUTION target:VXNNE_OPERATION_TARGET_NN.execution time: 74 uslayer id: 5 layer name:ConvolutionReluPoolingLayer2 operation[0]:VXNNE_OPERATOR_CONVOLUTION target:VXNNE_OPERATION_TARGET_NN.execution time: 57 uslayer id: 4 layer name:ConvolutionReluPoolingLayer2 operation[0]:VXNNE_OPERATOR_CONVOLUTION target:VXNNE_OPERATION_TARGET_NN.execution time: 87 uslayer id: 5 layer name:ConvolutionReluPoolingLayer2 operation[0]:VXNNE_OPERATOR_CONVOLUTION target:VXNNE_OPERATION_TARGET_NN.execution time: 81 us|layer id: 4 layer name:ConvolutionReluPoolingLayer2 operation[0]:VXNNE_OPERATOR_CONVOLUTION target:VXNNE_OPERATION_TARGET_NN.|-execution time: 80 us|NPUlayer id(version 2B: 5 layer name:ConvolutionReluPoolingLayer2 operation[0]:VXNNE_OPERATOR_CONVOLUTION target:VXNNE_OPERATION_TARGET_NN.Python)execution time: 78 us|Fully-quantizedlayer id: 4 layer name:ConvolutionReluPoolingLayer2 operation[0]:VXNNE_OPERATOR_CONVOLUTION target:VXNNE_OPERATION_TARGET_NN.execution time: 81 us|NAlayer id: 5 layer name:ConvolutionReluPoolingLayer2 operation[0]:VXNNE_OPERATOR_CONVOLUTION target:VXNNE_OPERATION_TARGET_NN.execution time: 86 uslayer id: 4 layer name:ConvolutionReluPoolingLayer2 operation[0]:VXNNE_OPERATOR_CONVOLUTION target:VXNNE_OPERATION_TARGET_NN.execution time: 77 uslayer id: 5 layer name:ConvolutionReluPoolingLayer2 operation[0]:VXNNE_OPERATOR_CONVOLUTION 
target:VXNNE_OPERATION_TARGET_NN.execution time: 73 uslayer id: 1 layer name:PoolingLayer2 operation[0]:VXNNE_OPERATOR_POOLING target:VXNNE_OPERATION_TARGET_TP.execution time: 209 uslayer id: 6 layer name:ConvolutionReluPoolingLayer2 operation[0]:VXNNE_OPERATOR_CONVOLUTION target:VXNNE_OPERATION_TARGET_NN.execution time: 108 uslayer id: 7 layer name:ConvolutionReluPoolingLayer2 operation[0]:VXNNE_OPERATOR_CONVOLUTION target:VXNNE_OPERATION_TARGET_NN.execution time: 90 uslayer id: 2 layer name:TensorTranspose operation[0]:VXNNE_OPERATOR_TENSOR_TRANS target:VXNNE_OPERATION_TARGET_TP.execution time: 84 uslayer id: 8 layer name:FullyConnectedReluLayer operation[0]:VXNNE_OPERATOR_FULLYCONNECTED target:VXNNE_OPERATION_TARGET_TP.execution time: 157 uslayer id: 9 layer name:FullyConnectedReluLayer operation[0]:VXNNE_OPERATOR_FULLYCONNECTED target:VXNNE_OPERATION_TARGET_TP.execution time: 48 uslayer id: 3 layer name:Softmax2Layer operation[0]:VXNNE_OPERATOR_SOFTMAX target:VXNNE_OPERATION_TARGET_SH.execution time: 136 usInference time 2: |2.47457 msprev_ptrs = 0xffffa369c040 Warning: swapHandel, CMD changed  NN/TP: pre_physical:0x1FE2C040, new_physical:0x1FE2C040 layer id: 0 layer name:TensorTranspose operation[0]:VXNNE_OPERATOR_TENSOR_TRANS target:VXNNE_OPERATION_TARGET_TP.execution time: 254 uslayer id: 4 layer name:ConvolutionReluPoolingLayer2 operation[0]:VXNNE_OPERATOR_CONVOLUTION target:VXNNE_OPERATION_TARGET_NN.execution time: 69 uslayer id: 5 layer name:ConvolutionReluPoolingLayer2 operation[0]:VXNNE_OPERATOR_CONVOLUTION target:VXNNE_OPERATION_TARGET_NN.execution time: 60 uslayer id: 4 layer name:ConvolutionReluPoolingLayer2 operation[0]:VXNNE_OPERATOR_CONVOLUTION target:VXNNE_OPERATION_TARGET_NN.execution time: 82 uslayer id: 5 layer name:ConvolutionReluPoolingLayer2 operation[0]:VXNNE_OPERATOR_CONVOLUTION target:VXNNE_OPERATION_TARGET_NN.execution time: 77 uslayer id: 4 layer name:ConvolutionReluPoolingLayer2 operation[0]:VXNNE_OPERATOR_CONVOLUTION 
target:VXNNE_OPERATION_TARGET_NN.execution time: 77 uslayer id: 5 layer name:ConvolutionReluPoolingLayer2 operation[0]:VXNNE_OPERATOR_CONVOLUTION target:VXNNE_OPERATION_TARGET_NN.execution time: 73 uslayer id: 4 layer name:ConvolutionReluPoolingLayer2 operation[0]:VXNNE_OPERATOR_CONVOLUTION target:VXNNE_OPERATION_TARGET_NN.execution time: 76 uslayer id: 5 layer name:ConvolutionReluPoolingLayer2 operation[0]:VXNNE_OPERATOR_CONVOLUTION target:VXNNE_OPERATION_TARGET_NN.execution time: 73 uslayer id: 4 layer name:ConvolutionReluPoolingLayer2 operation[0]:VXNNE_OPERATOR_CONVOLUTION target:VXNNE_OPERATION_TARGET_NN.execution time: 76 uslayer id: 5 layer name:ConvolutionReluPoolingLayer2 operation[0]:VXNNE_OPERATOR_CONVOLUTION target:VXNNE_OPERATION_TARGET_NN.execution time: 73 uslayer id: 1 layer name:PoolingLayer2 operation[0]:VXNNE_OPERATOR_POOLING target:VXNNE_OPERATION_TARGET_TP.execution time: 210 uslayer id: 6 layer name:ConvolutionReluPoolingLayer2 operation|See also section [0]:VXNNE_OPERATOR_CONVOLUTION target:VXNNE_OPERATION_TARGET_NN.execution time: 107 uslayer id: 7 layer name:ConvolutionReluPoolingLayer2 operation[0]:VXNNE_OPERATOR_CONVOLUTION target:VXNNE_OPERATION_TARGET_NN.execution time: 89 uslayer id: 2 layer name:TensorTranspose operation[0]:VXNNE_OPERATOR_TENSOR_TRANS target:VXNNE_OPERATION_TARGET_TP.execution time: 83 uslayer id: 8 layer name:FullyConnectedReluLayer operation[0]:VXNNE_OPERATOR_FULLYCONNECTED target:VXNNE_OPERATION_TARGET_TP.execution time: 155 uslayer id: 9 layer name:FullyConnectedReluLayer operation[0#Version 2B (Python)|''Version 2B (Python)'']:VXNNE_OPERATOR_FULLYCONNECTED target:VXNNE_OPERATION_TARGET_TP.execution time: 185 uslayer id: 3 layer name:Softmax2Layer operation[0]:VXNNE_OPERATOR_SOFTMAX target:VXNNE_OPERATION_TARGET_SH.execution time: 151 usInference time 3: 2.61483 msAverage inference time: 2.52716 msTotal prediction time: 2.72216 msOutput tensor index: 5Output tensor name: activation_5/SoftmaxTop results: 1 Red 
Appleprev_ptrs = 0xffffa369c040Exit VX Thread: 0xa3ee5fb0</pre>{ $ export CNN_PERF=1 NN_EXT_SHOW_PERF=1 VIV_VX_DEBUG_LEVEL=1 VIV_VX_PROFILE=1 $ build/image_classifier_cv ... > viv_test_app_profile.log 2>&1 |=== <big>Version 3</big> === == Results ==
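When analyzing a capture such as <code>viv_test_app_profile.log</code>, it can be handy to aggregate the per-layer figures programmatically, for instance to see how much time is spent on each execution target (NN, TP, SH). The following Python sketch is illustrative only: the record layout is assumed from the log excerpt above, and the function and variable names are not part of the application.

```python
import re
from collections import defaultdict

# Assumed record layout, taken from the CNN_PERF log excerpt above:
#   layer id: <n> layer name:<name> ... target:VXNNE_OPERATION_TARGET_<T>.
#   execution time:                 <t> us
RECORD_RE = re.compile(
    r"layer id:\s*(\d+)\s+layer name:(\S+).*?"
    r"target:(VXNNE_OPERATION_TARGET_\w+)\.\s*"
    r"execution time:\s*(\d+)\s*us",
    re.S,
)

def summarize(log_text):
    """Return the total execution time (us) per execution target."""
    totals = defaultdict(int)
    for _, _, target, time_us in RECORD_RE.findall(log_text):
        # Keep only the short target suffix, e.g. "NN", "TP", "SH".
        totals[target.rsplit("_", 1)[-1]] += int(time_us)
    return dict(totals)

# Small sample in the assumed format, for demonstration purposes.
sample = """\
layer id: 0 layer name:TensorTranspose operation[0]:VXNNE_OPERATOR_TENSOR_TRANS target:VXNNE_OPERATION_TARGET_TP.
execution time:                 290 us
layer id: 4 layer name:ConvolutionReluPoolingLayer2 operation[0]:VXNNE_OPERATOR_CONVOLUTION target:VXNNE_OPERATION_TARGET_NN.
execution time:                 77 us
layer id: 3 layer name:Softmax2Layer operation[0]:VXNNE_OPERATOR_SOFTMAX target:VXNNE_OPERATION_TARGET_SH.
execution time:                 187 us
"""

print(summarize(sample))  # → {'TP': 290, 'NN': 77, 'SH': 187}
```

In a real session, the same function can be applied to the whole file, e.g. <code>summarize(open("viv_test_app_profile.log").read())</code>, keeping in mind that warmup and steady-state runs are then summed together.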