==== <big>Floating-point model</big> ====
<pre class="board-terminal">
root@imx8mpevk:/mnt/ramdisk/image_classifier_eIQ_plus# build/image_classifier_cv 2 my_converted_model.tflite labels.txt testdata/red-apple1.jpg
Number of threads: undefined
Warmup time: 92.4871 ms
Original image size: 600x600x3
Cropped image size: 600x600x3
Resized image size: 224x224x3
Input tensor index: 1
Input tensor name: conv2d_8_input
Selected order of channels: RGB
Selected pixel values range: 0-1
Filling time: 0.923276 ms
Inference time 1: 88.2438 ms
Inference time 2: 89.3992 ms
Inference time 3: 86.3731 ms
Average inference time: 88.0054 ms
Total prediction time: 88.9287 ms
Output tensor index: 0
Output tensor name: Identity
Top results:
1 Red Apple
1.13485e-10 Orange
5.58774e-18 Avocado
7.49401e-20 Hand
1.40373e-22 Banana
</pre>
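The summary figures in the log above appear to be related in a simple way: the average inference time matches the mean of the three timed inferences, and the total prediction time matches the filling time plus that average. This relationship is inferred from the printed numbers only, not from the application source, and the helper name <code>summarize</code> is hypothetical. A minimal Python sketch that recomputes both figures:

```python
def summarize(filling_ms, inference_ms):
    """Return (average inference time, total prediction time) in milliseconds."""
    average = sum(inference_ms) / len(inference_ms)
    # Assumption: total prediction time = input filling time + average inference time.
    return average, filling_ms + average


# Values copied from the floating-point run above.
average, total = summarize(0.923276, [88.2438, 89.3992, 86.3731])
print(f"Average inference time: {average:.4f} ms")
print(f"Total prediction time: {total:.3f} ms")
```

The same check can be repeated with the figures of the other runs on this page; small differences in the last digit are expected because the log prints rounded values.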
==== <big>Half-quantized model</big> ====
<pre class="board-terminal">
root@imx8mpevk:/mnt/ramdisk/image_classifier_eIQ_plus# build/image_classifier_cv 2 my_fruits_model_1.12_quant.tflite labels.txt testdata/red-apple1.jpg
Number of threads: undefined
Warmup time: 180.551 ms
Original image size: 600x600x3
Cropped image size: 600x600x3
Resized image size: 224x224x3
Input tensor index: 12
Input tensor name: conv2d_input
Selected order of channels: RGB
Selected pixel values range: 0-1
Filling time: 0.811773 ms
Inference time 1: 176.78 ms
Inference time 2: 184.297 ms
Inference time 3: 176.743 ms
Average inference time: 179.273 ms
Total prediction time: 180.085 ms
Output tensor index: 18
Output tensor name: dense_1/Softmax
Top results:
1 Red Apple
1.53349e-07 Orange
1.67772e-15 Avocado
7.44711e-18 Banana
2.47029e-18 Hand
</pre>The following screenshot shows the system status while executing the application. In this case, the thread parameter was unspecified.
==== <big>Fully-quantized model</big> ====
<pre class="board-terminal">
root@imx8mpevk:/mnt/ramdisk/image_classifier_eIQ_plus# build/image_classifier_cv 3 my_fruits_model_qatlegacy.tflite labels.txt testdata/red-apple1.jpg
Number of threads: undefined
Warmup time: 88.5131 ms
Original image size: 600x600x3
Cropped image size: 600x600x3
Resized image size: 224x224x3
Input tensor index: 14
Input tensor name: conv2d_input
Selected order of channels: RGB
Selected pixel values range: NA
Filling time: 0.290634 ms
Inference time 1: 84.8542 ms
Inference time 2: 85.1227 ms
Inference time 3: 84.8016 ms
Average inference time: 84.9262 ms
Total prediction time: 85.2168 ms
Output tensor index: 5
Output tensor name: activation_5/Softmax
Top results:
1 Red Apple
</pre>
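The three runs above can be compared side by side. The following Python sketch takes the average inference times printed in the logs and expresses each one relative to the floating-point model. It is only an indicative comparison: the runs were launched with different first arguments (2 and 3), whose meaning is not stated in this section, and the dictionary keys are labels chosen here for illustration.

```python
# Average inference times (ms) copied from the three logs above.
average_ms = {
    "floating-point": 88.0054,
    "half-quantized": 179.273,
    "fully-quantized": 84.9262,
}

baseline = average_ms["floating-point"]
# Ratio > 1 means slower than the floating-point model, < 1 means faster.
relative = {name: ms / baseline for name, ms in average_ms.items()}

for name, ratio in relative.items():
    print(f"{name:16s} {average_ms[name]:8.3f} ms  {ratio:.2f}x the floating-point time")
```

With these figures, the half-quantized model takes roughly twice as long as the floating-point one, while the fully-quantized model is slightly faster.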
=== <big>Version 2</big> ===
The execution of the second version of the classifier on the embedded platform is detailed below. During the execution, <code>htop</code> was used to monitor the system. Note that "the first execution of model inference using the NN API always takes many times longer, because of model graph initialization needed by the GPU/ML module". Therefore, the time needed for the first inference (warm-up) is measured separately from the subsequent inferences.<pre class="board-terminal">
root@imx8mpevk:/mnt/ramdisk/image_classifier_eIQ_plus# build/image_classifier_cv 3 my_fruits_model_qatlegacy.tflite labels.txt testdata/red-apple1.jpg
INFO: Created TensorFlow Lite delegate for NNAPI.
Applied NNAPI delegate
Warmup time: 3529.8 ms
Original image size: 600x600x3
Cropped image size: 600x600x3
Resized image size: 224x224x3
Input tensor index: 14
Input tensor name: conv2d_input
Selected order of channels: RGB
Selected pixel values range: NA
Filling time: 0.215756 ms
Inference time 1: 1.33429 ms
Inference time 2: 1.31204 ms
Inference time 3: 1.26541 ms
Average inference time: 1.30391 ms
Total prediction time: 1.51967 ms
Output tensor index: 5
Output tensor name: activation_5/Softmax
Top results:
1 Red Apple
</pre>The following screenshot shows the system status while executing the application.
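In the log above, the warm-up run takes 3529.8 ms while the following inferences average 1.30391 ms, which is why the two are reported separately. The Python sketch below illustrates that measurement pattern in general terms. It is not the code of <code>image_classifier_cv</code>: the names <code>time_inference</code> and <code>fake_inference</code> are hypothetical, and the stand-in workload only sleeps to mimic a slow first call.

```python
import time


def time_inference(run_inference, repetitions=3):
    """Time the first (warm-up) call separately from the following calls."""
    start = time.perf_counter()
    run_inference()  # first call: includes any one-off initialization
    warmup_ms = (time.perf_counter() - start) * 1000.0

    inference_ms = []
    for _ in range(repetitions):
        start = time.perf_counter()
        run_inference()
        inference_ms.append((time.perf_counter() - start) * 1000.0)

    average_ms = sum(inference_ms) / len(inference_ms)
    return warmup_ms, inference_ms, average_ms


# Stand-in workload (hypothetical): the first call sleeps longer to mimic initialization.
calls = {"count": 0}


def fake_inference():
    calls["count"] += 1
    time.sleep(0.05 if calls["count"] == 1 else 0.001)


warmup_ms, inference_ms, average_ms = time_inference(fake_inference)
print(f"Warmup time: {warmup_ms:.3f} ms")
print(f"Average inference time: {average_ms:.3f} ms")
```

Keeping the warm-up out of the average prevents the one-off initialization cost from distorting the steady-state figure.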
==== <big>Profiling model execution on NPU</big> ====
The following block shows the profiler log. "The log captures detailed information of the execution clock cycles and DDR data transmission in each layer". Note that inference takes longer than usual here because of the profiler overhead.<pre class="board-terminal">
root@imx8mpevk:/mnt/ramdisk/image_classifier_eIQ_plus# build/image_classifier_cv 3 my_fruits_model_qatlegacy.tflite labels.txt testdata/red-apple1.jpg
INFO: Created TensorFlow Lite delegate for NNAPI.
#productname=VIPNano-D+I, pid=0x9f
Created VX Thread: 0xa3ee5fb0
Applied NNAPI delegate
prev_ptrs = 0xffffa369c040
Can't support one shaderCoreCount!
---------------------------Begin VerifyTiling -------------------------
AXI-SRAM = 0 Bytes VIP-SRAM = 260096 Bytes SWTILING_PHASE_FEATURES[1, 1, 1]
0 TP [( 3 224 224 1, 150528, 0x0xaaaab1874580(0x0xaaaab1874580, 0x(nil)) -> 224 224 3 1, 150528, 0x0xaaaab187db10(0x0xaaaab187db10, 0x(nil))) k(0 0 0, 0) pad(0 0) pool(0 0, 1 1)] C[ 1]
1 NN [( 224 224 3 1, 150528, 0x0xaaaab187db10(0x0xaaaab187db10, 0x(nil)) -> 111 111 32 1, 394272, 0x0xaaaab1881a90(0x0xaaaab1881a90, 0x(nil))) k(3 3 3, 1152) pad(0 0) pool(2 2, 2 2)] P[ 0] C[ 2]
2 NN [( 111 111 32 1, 394272, 0x0xaaaab1881a90(0x0xaaaab1881a90, 0x(nil)) -> 109 109 32 1, 380192, 0x0xaaaab1884270(0x0xaaaab1884270, 0x(nil))) k(3 3 32, 9984) pad(0 0) pool(0 0, 1 1)] P[ 1] C[ 3]
3 TP [( 109 109 32 1, 380192, 0x0xaaaab1884270(0x0xaaaab1884270, 0x(nil)) -> 54 54 32 1, 93312, 0x0xaaaab1887410(0x0xaaaab1887410, 0x(nil))) k(0 0 0, 0) pad(0 0) pool(2 2, 2 2)] P[ 2] C[ 4]
4 NN [( 54 54 32 1, 93312, 0x0xaaaab1887410(0x0xaaaab1887410, 0x(nil)) -> 26 26 64 1, 43264, 0x0xaaaab188cd90(0x0xaaaab188cd90, 0x(nil))) k(3 3 32, 19968) pad(0 0) pool(2 2, 2 2)] P[ 3] C[ 5]
5 NN [( 26 26 64 1, 43264, 0x0xaaaab188cd90(0x0xaaaab188cd90, 0x(nil)) -> 12 12 128 1, 18432, 0x0xaaaab1892710(0x0xaaaab1892710, 0x(nil))) k(3 3 64, 79616) pad(0 0) pool(2 2, 2 2)] P[ 4] C[ 6]
6 TP [( 12 12 128 1, 18432, 0x0xaaaab1892710(0x0xaaaab1892710, 0x(nil)) -> 128 12 12 1, 18432, 0x0xaaaab1894ef0(0x0xaaaab1894ef0, 0x(nil))) k(0 0 0, 0) pad(0 0) pool(0 0, 1 1)] P[ 5] C[ 7]
7 TP [(18432 1 1 1, 18432, 0x0xaaaab1894ef0(0x0xaaaab1894ef0, 0x(nil)) -> 256 1 1 1, 256, 0x0xaaaab18965b0(0x0xaaaab18965b0, 0x(nil))) k(0 0 0, 0) pad(0 0) pool(0 0, 1 1)] P[ 6] C[ 8]
8 TP [( 256 1 1 1, 256, 0x0xaaaab18965b0(0x0xaaaab18965b0, 0x(nil)) -> 6 1 1 1, 6, 0x0xaaaab1897c10(0x0xaaaab1897c10, 0x(nil))) k(0 0 0, 0) pad(0 0) pool(0 0, 1 1)] P[ 7] C[ 9]
9 SH [( 6 1 1 1, 6, 0x0xaaaab1897c10(0x0xaaaab1897c10, 0x(nil)) -> 6 1 1 1, 6, 0x0xaaaab187a200(0x0xaaaab187a200, 0x(nil))) k(0 0 0, 0) pad(0 0) pool(0 0, 1 1)] P[ 8]
Detected Segments
AB_VS (0 - 1)
TL_VS (1 - 2)
AB_VS (3 - 8)
======================== Block [0 - 2] ==============================
0 TP DD -> VS [( 150528, 150528), IC( 0), KC( 0)]
1 NN VS -> VS [( 150528, 96000), IC( 0), KC( 1408)]
2 NN VS -> DD [( 96000, 0), IC( 0), KC( 11648)]
------------------------------------------------------------------
Segment AB (0 - 0)
------------------------------------------------------------------
Segment Tiling (1 - 2)
[VS 24( 0, 24)(224) ->VS 11( 0, 11)( 27) P( 0) F(1)] [VS 11( 0, 11)( 27) ->DD 9( 0, 9)( 0) P( 0) F(0)]
[VS 52( 22, 74)(224) ->VS 25( 11, 36)( 27) P( 0) F(1)] [VS 27( 9, 36)( 27) ->DD 25( 9, 34)( 0) P( 0) F(0)]
[VS 52( 72,124)(224) ->VS 25( 36, 61)( 27) P( 0) F(1)] [VS 27( 34, 61)( 27) ->DD 25( 34, 59)( 0) P( 0) F(0)]
[VS 52(122,174)(224) ->VS 25( 61, 86)( 27) P( 0) F(1)] [VS 27( 59, 86)( 27) ->DD 25( 59, 84)( 0) P( 0) F(0)]
[VS 52(172,224)(224) ->VS 25( 86,111)( 27) P( 0) F(1)] [VS 27( 84,111)( 27) ->DD 25( 84,109)( 0) P( 0) F(1)]
AXISRAM: Estimate used 0 0.000000% VIPSRAM: Estimate used 107040 41.154037% M = 25
AXISRAM: Peak used 0 0.000000% VIPSRAM: Peak used 259584 99.803146%
======================== Block [0 - 2] SUCCEED =========================
======================== Block [3 - 8] ==============================
3 TP DD -> VS [( 0, 93312), IC( 0), KC( 0)]
4 NN VS -> VS [( 93312, 43264), IC( 0), KC( 20608)]
5 NN VS -> VS [( 43264, 18432), IC( 0), KC( 79744)]
6 TP VS -> VS [( 18432, 18432), IC( 0), KC( 0)]
7 TP VS -> VS [( 18432, 256), IC( 0), KC( 0)]
8 TP VS -> DD [( 256, 0), IC( 0), KC( 0)]
------------------------------------------------------------------
Segment AB (3 - 8)
AXISRAM: Peak used 0 0.000000% VIPSRAM: Peak used 157184 60.433071%
======================== Block [3 - 8] SUCCEED =========================
F(1) F(0) F(1) F(0) F(1) F(0) F(1) F(0) F(1) F(1)
id IN [ x y w h ] OUT [ x y w h ] (tx, ty, kpc) (ic, kc, kc/ks, ks/eks, kernel_type)
0 TP DD 0x(nil) [ 0 0 3 224] -> VS 0x0x400800 [ 0 0 224 224] ( 0, 0, 0) ( 0, 0, 0.000000%, 0.000000%, NONE)
1 NN VS 0x0x400800 [ 0 0 224 24] -> VS 0x0x425400 [ 0 0 111 11] ( 32, 2, 6) ( 0, 1408, 100.000000%, 122.222221%, DD)
2 NN VS 0x0x425400 [ 0 0 111 11] -> DD 0x(nil) [ 0 0 109 9] ( 55, 2, 6) ( 0, 11648, 100.000000%, 116.666664%, DD)
1 NN VS 0x0x401b40 [ 0 22 224 52] -> VS 0x0x4258c5 [ 0 11 111 25] ( 56, 2, 6) ( 0, 1408, 100.000000%, 122.222221%, DD)
2 NN VS 0x0x4257e7 [ 0 9 111 27] -> DD 0x0x3d5 [ 0 9 109 25] ( 55, 2, 6) ( 0, 11648, 100.000000%, 116.666664%, DD)
1 NN VS 0x0x404700 [ 0 72 224 52] -> VS 0x0x42639c [ 0 36 111 25] ( 56, 2, 6) ( 0, 1408, 100.000000%, 122.222221%, DD)
2 NN VS 0x0x4262be [ 0 34 111 27] -> DD 0x0xe7a [ 0 34 109 25] ( 55, 2, 6) ( 0, 11648, 100.000000%, 116.666664%, DD)
1 NN VS 0x0x4072c0 [ 0 122 224 52] -> VS 0x0x426e73 [ 0 61 111 25] ( 56, 2, 6) ( 0, 1408, 100.000000%, 122.222221%, DD)
2 NN VS 0x0x426d95 [ 0 59 111 27] -> DD 0x0x191f [ 0 59 109 25] ( 55, 2, 6) ( 0, 11648, 100.000000%, 116.666664%, DD)
1 NN VS 0x0x409e80 [ 0 172 224 52] -> VS 0x0x42794a [ 0 86 111 25] ( 56, 2, 6) ( 0, 1408, 100.000000%, 122.222221%, DD)
2 NN VS 0x0x42786c [ 0 84 111 27] -> DD 0x0x23c4 [ 0 84 109 25] ( 55, 2, 6) ( 0, 11648, 100.000000%, 116.666664%, DD)
3 TP DD 0x(nil) [ 0 0 109 109] -> VS 0x0x400800 [ 0 0 54 54] ( 0, 0, 0) ( 0, 0, 0.000000%, 0.000000%, NONE)
4 NN VS 0x0x400800 [ 0 0 54 54] -> VS 0x0x41c500 [ 0 0 26 26] ( 52, 6, 4) ( 0, 20608, 100.000000%, 103.205132%, DD)
5 NN VS 0x0x41c500 [ 0 0 26 26] -> VS 0x0x400800 [ 0 0 12 12] ( 24, 16, 5) ( 0, 79744, 100.000000%, 100.160774%, DD)
6 TP VS 0x0x400800 [ 0 0 12 12] -> VS 0x0x422600 [ 0 0 128 12] ( 0, 0, 0) ( 0, 0, 0.000000%, 0.000000%, NONE)
7 TP VS 0x0x422600 [ 0 0 18432 1] -> VS 0x0x400800 [ 0 0 256 1] ( 0, 0, 0) ( 0, 0, 0.000000%, 0.000000%, NONE)
8 TP VS 0x0x400800 [ 0 0 256 1] -> DD 0x(nil) [ 0 0 6 1] ( 0, 0, 0) ( 0, 0, 0.000000%, 0.000000%, NONE)
9 SH DD 0x(nil) [ 0 0 0 0] -> DD 0x(nil) [ 0 0 0 0] ( 0, 0, 0) ( 0, 0, 0.000000%, 0.000000%, NONE)
PreLoadWeightBiases = 0 nan%
---------------------------End VerifyTiling -------------------------
ArchModelVersion: ARCHCTS@230121
SWTilingVersion: ARCHCTS@230121
ProfileMode: 0
NumNNCores:6
NumNNCoresInt8: 6
NumNNCoresInt16: 6
NumNNCoresFloat16: 0
NumTPCores: 3
NumTPLiteCores: 0
MadPerCore: 64
VIP7Version: 1
InBuffDepth: 9
AccumBufferDepth: 32
DPAmount: 3
XYDPX: 0
XYDPY: 0
ZDP: 3
AXISRAMSize: 0
VIPSRAMSize: 262144
L2CacheWidth: 32
USCCacheSize: 8
BrickMode: 0
SWTiling: 1
SmallBatchEnable: 0
SWTilingPhase1: 1
TPWithFCLayer: 1
TPCircularBufferSupport: 1
KERNEL_HEADER_NOT_CACHED_FIX: 0
NNFCNonPruneAccel: 0
Conv1x1HalfPerformance: 0
DDRLatency: 0
CacheLineModeDisabled: 0
PER_3D_TILE_BUBBLE_FIX: 1
SWConv1x1To1x2: 0
TP_LOCALIZATION_REORDER_DISABLED_Fix: 1
USCCacheControllers: 1
AsyncCopyPerfFix: 1
ZDP3NoCompressFix: 1
ZXDP3KernelReadConflictFix: 1
CoefDecodePerf: 2
VectorPrune: 1
EnableCacheDataFromSRAM: 1
IMAGE_PARTIAL_CACHE_FIX: 0
DDRReadBandWidthLimit: 3.80
DDRWriteBandWidthLimit: 3.80
DDRTotalBandWidthLimit: 3.80
AXISRAMReadBandWidthLimit: 16.00
AXISRAMWriteBandWidthLimit: 16.00
AXISRAMTotalBandWidthLimit: 16.00
AXIBusReadBandWidthLimit: 16.00
AXIBusWriteBandWidthLimit: 16.00
AXIBusTotalBandWidthLimit: 32.00

HANDLE_ABBUFFER: 1
HANDLE_SUBIMAGE: 1
HANDLE_BRANCH: 1

FreqInMHZ: 1000
AxiClockFreqInMHZ: 1000
OutstandingTransfer: 64
InternalWriteBWLimit: 16.00

LanesPerConv: 64
MaxTileSize: 64
AxiSramSlowedDownByAddr: 1
SLOW_NN_REQ_ARBITRATION_FIX: 0

FLOAT_XYDP_X: 1
FLOAT_XYDP_Y: 1
FLOAT_ZDP: 1
SINGLE_PORT_ACC_BUFFER: 1
MAX_ZRL_BIT_WIDTH: 8
MAX_SOC_OUT_STANDING_NUMBER: 32

SWTilingPhase3: 1
AXI_SRAM_ONLY_SW_TILING: 0
VIP_CORE_COUNT: 1
DEPTH_WISE_SUPPORT: 1
NN_WRITE_WITHOUT_USC: 0
EQUIVALENT_VIP_SRAM_WIDTH_IN_BYTE: 32
IMAGE_NOT_PACKED_IN_SRAM: 0
NN_COEF_COMPRESSION_ENHANCEMENT: 1
TP_COMPRESSION_ENHANCEMENT: 1
COEF_DELTA_CORD_OVER_FLOW_ZRL_8BIT_FIX: 1
NumShaderCores: 1
KERNEL_PER_CORE_LESS_THAN_THIRD_COEF_BUFF_DEPTH_FIX: 0
LOW_EFFICIENCY_OF_ID_WRITE_IMGBUF_FIX: 0
DR_JD_Diff_For_Cacheline_Mode_FIX: 1
CONVOUT_FIFO_DEPTH_FIX: 1

===========================
**********Show Perf********
===========================
layer_id:0 layer_name:TensorTranspose
operation_id:0 operation_name:VXNNE_OPERATOR_TENSOR_TRANS operation_target:VXNNE_OPERATION_TARGET_TP
abs_op_id:0
upstream_layer_num:0 upstream_opertaion_num:0
downstream_layer_num:1 downstream_opertaion_num:1
0) downstream_operation_id:0 downstream_operation_name:VXNNE_OPERATOR_CONVOLUTION (downstream_layer_id:4 downstream_layer_name:ConvolutionReluPoolingLayer2)
InImageX: 3
InImageY: 224
InImageZ: 224
OutImageX: 224 (sub: 224)
OutImageY: 224 (sub: 224)
OutImageZ: 3 (sub: 3)
KernelX: 1
KernelY: 1
KernelZ: 224
PoolingSize: 1
PoolingStride: 1
InputDataSize: 8
OutputDataSize: 8
FP16: 0
archModel_kernelSize: 0
kernelSize: 0
SrcBuf: DDR
DstBuf: VIP_SRAM
KernelBuf: DDR
KernelCacheMode=VXNNE_SRAM_CACHE_MODE_NONE
ImageCacheMode=VXNNE_SRAM_CACHE_MODE_NONE
xOffset: 0, yOffset: 0

kernelDDRReadBW: 0
InImageDDrReadBW: 150528
ReadBW: 150656
WriteBW: 0
CycleCount: 77927

===========================
**********Show Perf********
===========================
layer_id:4 layer_name:ConvolutionReluPoolingLayer2
operation_id:0 operation_name:VXNNE_OPERATOR_CONVOLUTION operation_target:VXNNE_OPERATION_TARGET_NN
abs_op_id:1
upstream_layer_num:1 upstream_opertaion_num:1
0) upstream_operation_id:0 uptream_operation_name:VXNNE_OPERATOR_TENSOR_TRANS (upstream_layer_id:0 upstream_layer_name:TensorTranspose)
downstream_layer_num:1 downstream_opertaion_num:1
0) downstream_operation_id:0 downstream_operation_name:VXNNE_OPERATOR_CONVOLUTION (downstream_layer_id:5 downstream_layer_name:ConvolutionReluPoolingLayer2)
NumUsedNNCores: 6
ConvOutFIFODepth: 168

OrigInImageX: 224
OrigInImageY: 224
OrigInImageZ: 3
NNOutImageX: 222 (sub: 222)
NNOutImageY: 22 (sub: 22)
NNOutImageZ: 32 (sub: 32)
FinalOutImageX: 111
FinalOutImageY: 111
FinalOutImageZ: 32
KernelX: 3
KernelY: 3
KernelZ: 3
PoolingSize: 2
PoolingStride: 2
InputDataSize: 8
OutputDataSize: 8
FP16: 0
archModel_kernelSize: 1352
kernelSize: 1408
SrcBuf: VIP_SRAM
DstBuf: VIP_SRAM
KernelBuf: DDR
KernelCacheMode=VXNNE_SRAM_CACHE_MODE_FULL_CACHE
ImageCacheMode=VXNNE_SRAM_CACHE_MODE_NONE
xOffset: 0, yOffset: 0
coefNonZeroRatio: 1.000000000000000
coefCompression: 1.354838709677419
imageCompression: 1.000000000000000
imageNonZeroRatio: 0.300000000000000

coefNonZeroRatio__llu: 4607182418800017408
coefCompression_llu: 4608780470280697261
imageCompression_llu: 4607182418800017408
imageNonZeroRatio_llu: 4599075939470750515

OutImageTileXSize: 32
OutImageTileYSize: 2
KernelsPerCore: 6

kernelDDRReadBW: 6099
InImageDDrReadBW: 0
ReadBW: 6227
WriteBW: 0
CycleCount: 12213

===========================
**********Show Perf********
===========================
layer_id:5 layer_name:ConvolutionReluPoolingLayer2
operation_id:0 operation_name:VXNNE_OPERATOR_CONVOLUTION operation_target:VXNNE_OPERATION_TARGET_NN
abs_op_id:2
upstream_layer_num:1 upstream_opertaion_num:1
0) upstream_operation_id:0 uptream_operation_name:VXNNE_OPERATOR_CONVOLUTION (upstream_layer_id:4 upstream_layer_name:ConvolutionReluPoolingLayer2)
downstream_layer_num:1 downstream_opertaion_num:1
0) downstream_operation_id:0 downstream_operation_name:VXNNE_OPERATOR_POOLING (downstream_layer_id:1 downstream_layer_name:PoolingLayer2)
NumUsedNNCores: 6
ConvOutFIFODepth: 168

OrigInImageX: 111
OrigInImageY: 111
OrigInImageZ: 32
NNOutImageX: 109 (sub: 109)
NNOutImageY: 9 (sub: 9)
NNOutImageZ: 32 (sub: 32)
FinalOutImageX: 109
FinalOutImageY: 109
FinalOutImageZ: 32
KernelX: 3
KernelY: 3
KernelZ: 32
PoolingSize: 1
PoolingStride: 1
InputDataSize: 8
OutputDataSize: 8
FP16: 0
archModel_kernelSize: 11113
kernelSize: 11648
SrcBuf: VIP_SRAM
DstBuf: DDR
KernelBuf: DDR
KernelCacheMode=VXNNE_SRAM_CACHE_MODE_FULL_CACHE
ImageCacheMode=VXNNE_SRAM_CACHE_MODE_NONE
xOffset: 0, yOffset: 0
coefNonZeroRatio: 1.000000000000000
coefCompression: 0.965753424657534
imageCompression: 1.000000000000000
imageNonZeroRatio: 0.300000000000000

coefNonZeroRatio__llu: 4607182418800017408
coefCompression_llu: 4606873953072115319
imageCompression_llu: 4607182418800017408
imageNonZeroRatio_llu: 4599075939470750515

OutImageTileXSize: 55
OutImageTileYSize: 2
KernelsPerCore: 6

kernelDDRReadBW: 9540
InImageDDrReadBW: 0
ReadBW: 9668
WriteBW: 37746
CycleCount: 14667

===========================
**********Show Perf********
===========================
layer_id:4 layer_name:ConvolutionReluPoolingLayer2
operation_id:0 operation_name:VXNNE_OPERATOR_CONVOLUTION operation_target:VXNNE_OPERATION_TARGET_NN
abs_op_id:1
upstream_layer_num:1 upstream_opertaion_num:1
0) upstream_operation_id:0 uptream_operation_name:VXNNE_OPERATOR_TENSOR_TRANS (upstream_layer_id:0 upstream_layer_name:TensorTranspose)
downstream_layer_num:1 downstream_opertaion_num:1
0) downstream_operation_id:0 downstream_operation_name:VXNNE_OPERATOR_CONVOLUTION (downstream_layer_id:5 downstream_layer_name:ConvolutionReluPoolingLayer2)
NumUsedNNCores: 6
ConvOutFIFODepth: 168

OrigInImageX: 224
OrigInImageY: 224
OrigInImageZ: 3
NNOutImageX: 222 (sub: 222)
NNOutImageY: 50 (sub: 50)
NNOutImageZ: 32 (sub: 32)
FinalOutImageX: 111
FinalOutImageY: 111
FinalOutImageZ: 32
KernelX: 3
KernelY: 3
KernelZ: 3
PoolingSize: 2
PoolingStride: 2
InputDataSize: 8
OutputDataSize: 8
FP16: 0
archModel_kernelSize: 1352
kernelSize: 1408
SrcBuf: VIP_SRAM
DstBuf: VIP_SRAM
KernelBuf: VIP_SRAM
KernelCacheMode=VXNNE_SRAM_CACHE_MODE_STREAM_CACHE
ImageCacheMode=VXNNE_SRAM_CACHE_MODE_NONE
xOffset: 0, yOffset: 0
coefNonZeroRatio: 1.000000000000000
coefCompression: 1.354838709677419
imageCompression: 1.000000000000000
imageNonZeroRatio: 0.300000000000000

coefNonZeroRatio__llu: 4607182418800017408
coefCompression_llu: 4608780470280697261
imageCompression_llu: 4607182418800017408
imageNonZeroRatio_llu: 4599075939470750515

OutImageTileXSize: 56
OutImageTileYSize: 2
KernelsPerCore: 6

kernelDDRReadBW: 7571
InImageDDrReadBW: 0
ReadBW: 7699
WriteBW: 0
CycleCount: 24949

===========================
**********Show Perf********
===========================
layer_id:5 layer_name:ConvolutionReluPoolingLayer2
operation_id:0 operation_name:VXNNE_OPERATOR_CONVOLUTION operation_target:VXNNE_OPERATION_TARGET_NN
abs_op_id:2
upstream_layer_num:1 upstream_opertaion_num:1
0) upstream_operation_id:0 uptream_operation_name:VXNNE_OPERATOR_CONVOLUTION (upstream_layer_id:4 upstream_layer_name:ConvolutionReluPoolingLayer2)
downstream_layer_num:1 downstream_opertaion_num:1
0) downstream_operation_id:0 downstream_operation_name:VXNNE_OPERATOR_POOLING (downstream_layer_id:1 downstream_layer_name:PoolingLayer2)
NumUsedNNCores: 6
ConvOutFIFODepth: 168

OrigInImageX: 111
OrigInImageY: 111
OrigInImageZ: 32
NNOutImageX: 109 (sub: 109)
NNOutImageY: 25 (sub: 25)
NNOutImageZ: 32 (sub: 32)
FinalOutImageX: 109
FinalOutImageY: 109
FinalOutImageZ: 32
KernelX: 3
KernelY: 3
KernelZ: 32
PoolingSize: 1
PoolingStride: 1
InputDataSize: 8
OutputDataSize: 8
FP16: 0
archModel_kernelSize: 11113
kernelSize: 11648
SrcBuf: VIP_SRAM
DstBuf: DDR
KernelBuf: VIP_SRAM
KernelCacheMode=VXNNE_SRAM_CACHE_MODE_STREAM_CACHE
ImageCacheMode=VXNNE_SRAM_CACHE_MODE_NONE
xOffset: 0, yOffset: 0
coefNonZeroRatio: 1.000000000000000
coefCompression: 0.965753424657534
imageCompression: 1.000000000000000
imageNonZeroRatio: 0.300000000000000

coefNonZeroRatio__llu: 4607182418800017408
coefCompression_llu: 4606873953072115319
imageCompression_llu: 4607182418800017408
imageNonZeroRatio_llu: 4599075939470750515

OutImageTileXSize: 55
OutImageTileYSize: 2
KernelsPerCore: 6

kernelDDRReadBW: 10564
InImageDDrReadBW: 0
ReadBW: 10692
WriteBW: 104716
CycleCount: 32561

===========================
**********Show Perf********
===========================
layer_id:4 layer_name:ConvolutionReluPoolingLayer2
operation_id:0 operation_name:VXNNE_OPERATOR_CONVOLUTION operation_target:VXNNE_OPERATION_TARGET_NN
abs_op_id:1
upstream_layer_num:1 upstream_opertaion_num:1
0) upstream_operation_id:0 uptream_operation_name:VXNNE_OPERATOR_TENSOR_TRANS (upstream_layer_id:0 upstream_layer_name:TensorTranspose)
downstream_layer_num:1 downstream_opertaion_num:1
0) downstream_operation_id:0 downstream_operation_name:VXNNE_OPERATOR_CONVOLUTION (downstream_layer_id:5 downstream_layer_name:ConvolutionReluPoolingLayer2)
NumUsedNNCores: 6
ConvOutFIFODepth: 168

OrigInImageX: 224
OrigInImageY: 224
OrigInImageZ: 3
NNOutImageX: 222 (sub: 222)
NNOutImageY: 50 (sub: 50)
NNOutImageZ: 32 (sub: 32)
FinalOutImageX: 111
FinalOutImageY: 111
FinalOutImageZ: 32
KernelX: 3
KernelY: 3
KernelZ: 3
PoolingSize: 2
PoolingStride: 2
InputDataSize: 8
OutputDataSize: 8
FP16: 0
archModel_kernelSize: 1352
kernelSize: 1408
SrcBuf: VIP_SRAM
DstBuf: VIP_SRAM
KernelBuf: VIP_SRAM
KernelCacheMode=VXNNE_SRAM_CACHE_MODE_STREAM_CACHE
ImageCacheMode=VXNNE_SRAM_CACHE_MODE_NONE
xOffset: 0, yOffset: 0
coefNonZeroRatio: 1.000000000000000
coefCompression: 1.354838709677419
imageCompression: 1.000000000000000
imageNonZeroRatio: 0.300000000000000

coefNonZeroRatio__llu: 4607182418800017408
coefCompression_llu: 4608780470280697261
imageCompression_llu: 4607182418800017408
imageNonZeroRatio_llu: 4599075939470750515

OutImageTileXSize: 56
OutImageTileYSize: 2
KernelsPerCore: 6

kernelDDRReadBW: 7571
InImageDDrReadBW: 0
ReadBW: 7699
WriteBW: 0
CycleCount: 24949

===========================
**********Show Perf********
===========================
layer_id:5 layer_name:ConvolutionReluPoolingLayer2
operation_id:0 operation_name:VXNNE_OPERATOR_CONVOLUTION operation_target:VXNNE_OPERATION_TARGET_NN
abs_op_id:2
upstream_layer_num:1 upstream_opertaion_num:1
0) upstream_operation_id:0 uptream_operation_name:VXNNE_OPERATOR_CONVOLUTION (upstream_layer_id:4 upstream_layer_name:ConvolutionReluPoolingLayer2)
downstream_layer_num:1 downstream_opertaion_num:1
0) downstream_operation_id:0 downstream_operation_name:VXNNE_OPERATOR_POOLING (downstream_layer_id:1 downstream_layer_name:PoolingLayer2)
NumUsedNNCores: 6
ConvOutFIFODepth: 168

OrigInImageX: 111
OrigInImageY: 111
OrigInImageZ: 32
NNOutImageX: 109 (sub: 109)
NNOutImageY: 25 (sub: 25)
NNOutImageZ: 32 (sub: 32)
FinalOutImageX: 109
FinalOutImageY: 109
FinalOutImageZ: 32
KernelX: 3
KernelY: 3
KernelZ: 32
PoolingSize: 1
PoolingStride: 1
InputDataSize: 8
OutputDataSize: 8
FP16: 0
archModel_kernelSize: 11113
kernelSize: 11648
SrcBuf: VIP_SRAM
DstBuf: DDR
KernelBuf: VIP_SRAM
KernelCacheMode=VXNNE_SRAM_CACHE_MODE_STREAM_CACHE
ImageCacheMode=VXNNE_SRAM_CACHE_MODE_NONE
xOffset: 0, yOffset: 0
coefNonZeroRatio: 1.000000000000000
coefCompression: 0.965753424657534
imageCompression: 1.000000000000000
imageNonZeroRatio: 0.300000000000000

coefNonZeroRatio__llu: 4607182418800017408
coefCompression_llu: 4606873953072115319
imageCompression_llu: 4607182418800017408
imageNonZeroRatio_llu: 4599075939470750515

OutImageTileXSize: 55
OutImageTileYSize: 2
KernelsPerCore: 6

kernelDDRReadBW: 10564
InImageDDrReadBW: 0
ReadBW: 10692
WriteBW: 104716
CycleCount: 32561

===========================
**********Show Perf********
===========================
layer_id:4 layer_name:ConvolutionReluPoolingLayer2
operation_id:0 operation_name:VXNNE_OPERATOR_CONVOLUTION operation_target:VXNNE_OPERATION_TARGET_NN
abs_op_id:1
upstream_layer_num:1 upstream_opertaion_num:1
0) upstream_operation_id:0 uptream_operation_name:VXNNE_OPERATOR_TENSOR_TRANS (upstream_layer_id:0 upstream_layer_name:TensorTranspose)
downstream_layer_num:1 downstream_opertaion_num:1
0) downstream_operation_id:0 downstream_operation_name:VXNNE_OPERATOR_CONVOLUTION (downstream_layer_id:5 downstream_layer_name:ConvolutionReluPoolingLayer2)
NumUsedNNCores: 6
ConvOutFIFODepth: 168

OrigInImageX: 224
OrigInImageY: 224
OrigInImageZ: 3
NNOutImageX: 222 (sub: 222)
NNOutImageY: 50 (sub: 50)
NNOutImageZ: 32 (sub: 32)
FinalOutImageX: 111
FinalOutImageY: 111
FinalOutImageZ: 32
KernelX: 3
KernelY: 3
KernelZ: 3
PoolingSize: 2
PoolingStride: 2
InputDataSize: 8
OutputDataSize: 8
FP16: 0
archModel_kernelSize: 1352
kernelSize: 1408
SrcBuf: VIP_SRAM
DstBuf: VIP_SRAM
KernelBuf: VIP_SRAM
KernelCacheMode=VXNNE_SRAM_CACHE_MODE_STREAM_CACHE
ImageCacheMode=VXNNE_SRAM_CACHE_MODE_NONE
xOffset: 0, yOffset: 0
coefNonZeroRatio: 1.000000000000000
coefCompression: 1.354838709677419
imageCompression: 1.000000000000000
imageNonZeroRatio: 0.300000000000000

coefNonZeroRatio__llu: 4607182418800017408
coefCompression_llu: 4608780470280697261
imageCompression_llu: 4607182418800017408
imageNonZeroRatio_llu: 4599075939470750515

OutImageTileXSize: 56
OutImageTileYSize: 2
KernelsPerCore: 6

kernelDDRReadBW: 7571
InImageDDrReadBW: 0
ReadBW: 7699
WriteBW: 0
CycleCount: 24949

===========================
**********Show Perf********
===========================
layer_id:5 layer_name:ConvolutionReluPoolingLayer2
operation_id:0 operation_name:VXNNE_OPERATOR_CONVOLUTION operation_target:VXNNE_OPERATION_TARGET_NN
abs_op_id:2
upstream_layer_num:1 upstream_opertaion_num:1
0) upstream_operation_id:0 uptream_operation_name:VXNNE_OPERATOR_CONVOLUTION (upstream_layer_id:4 upstream_layer_name:ConvolutionReluPoolingLayer2)
downstream_layer_num:1 downstream_opertaion_num:1
0) downstream_operation_id:0 downstream_operation_name:VXNNE_OPERATOR_POOLING (downstream_layer_id:1 downstream_layer_name:PoolingLayer2)
NumUsedNNCores: 6
ConvOutFIFODepth: 168

OrigInImageX: 111
OrigInImageY: 111
OrigInImageZ: 32
NNOutImageX: 109 (sub: 109)
NNOutImageY: 25 (sub: 25)
NNOutImageZ: 32 (sub: 32)
FinalOutImageX: 109
FinalOutImageY: 109
FinalOutImageZ: 32
KernelX: 3
KernelY: 3
KernelZ: 32
PoolingSize: 1
PoolingStride: 1
InputDataSize: 8
OutputDataSize: 8
FP16: 0
archModel_kernelSize: 11113
kernelSize: 11648
SrcBuf: VIP_SRAM
DstBuf: DDR
KernelBuf: VIP_SRAM
KernelCacheMode=VXNNE_SRAM_CACHE_MODE_STREAM_CACHE
ImageCacheMode=VXNNE_SRAM_CACHE_MODE_NONE
xOffset: 0, yOffset: 0
coefNonZeroRatio: 1.000000000000000
coefCompression: 0.965753424657534
imageCompression: 1.000000000000000
imageNonZeroRatio: 0.300000000000000

coefNonZeroRatio__llu: 4607182418800017408
coefCompression_llu: 4606873953072115319
imageCompression_llu: 4607182418800017408
imageNonZeroRatio_llu: 4599075939470750515

OutImageTileXSize: 55
OutImageTileYSize: 2
KernelsPerCore: 6

kernelDDRReadBW: 10564
InImageDDrReadBW: 0
ReadBW: 10692
WriteBW: 104716
CycleCount: 32561

===========================
**********Show Perf********
===========================
layer_id:4 layer_name:ConvolutionReluPoolingLayer2
operation_id:0 operation_name:VXNNE_OPERATOR_CONVOLUTION operation_target:VXNNE_OPERATION_TARGET_NN
abs_op_id:1
upstream_layer_num:1 upstream_opertaion_num:1
0) upstream_operation_id:0 uptream_operation_name:VXNNE_OPERATOR_TENSOR_TRANS (upstream_layer_id:0 upstream_layer_name:TensorTranspose)
downstream_layer_num:1 downstream_opertaion_num:1
0) downstream_operation_id:0 downstream_operation_name:VXNNE_OPERATOR_CONVOLUTION (downstream_layer_id:5 downstream_layer_name:ConvolutionReluPoolingLayer2)
NumUsedNNCores: 6
ConvOutFIFODepth: 168

OrigInImageX: 224
OrigInImageY: 224
OrigInImageZ: 3
NNOutImageX: 222 (sub: 222)
NNOutImageY: 50 (sub: 50)
NNOutImageZ: 32 (sub: 32)
FinalOutImageX: 111
FinalOutImageY: 111
FinalOutImageZ: 32
KernelX: 3
KernelY: 3
KernelZ: 3
PoolingSize: 2
PoolingStride: 2
InputDataSize: 8
OutputDataSize: 8
FP16: 0
archModel_kernelSize: 1352
kernelSize: 1408
SrcBuf: VIP_SRAM
DstBuf: VIP_SRAM
KernelBuf: VIP_SRAM
KernelCacheMode=VXNNE_SRAM_CACHE_MODE_STREAM_CACHE
ImageCacheMode=VXNNE_SRAM_CACHE_MODE_NONE
xOffset: 0, yOffset: 0
coefNonZeroRatio: 1.000000000000000
coefCompression: 1.354838709677419
imageCompression: 1.000000000000000
imageNonZeroRatio: 0.300000000000000

coefNonZeroRatio__llu: 4607182418800017408
coefCompression_llu: 4608780470280697261
imageCompression_llu: 4607182418800017408
imageNonZeroRatio_llu: 4599075939470750515

OutImageTileXSize: 56
OutImageTileYSize: 2
KernelsPerCore: 6

kernelDDRReadBW: 7571
InImageDDrReadBW: 0
ReadBW: 7699
WriteBW: 0
CycleCount: 24949

===========================
**********Show Perf********
===========================
layer_id:5 layer_name:ConvolutionReluPoolingLayer2
operation_id:0 operation_name:VXNNE_OPERATOR_CONVOLUTION operation_target:VXNNE_OPERATION_TARGET_NN
abs_op_id:2
upstream_layer_num:1 upstream_opertaion_num:1
0) upstream_operation_id:0 uptream_operation_name:VXNNE_OPERATOR_CONVOLUTION (upstream_layer_id:4 upstream_layer_name:ConvolutionReluPoolingLayer2)
downstream_layer_num:1 downstream_opertaion_num:1
0) downstream_operation_id:0 downstream_operation_name:VXNNE_OPERATOR_POOLING (downstream_layer_id:1 downstream_layer_name:PoolingLayer2)
NumUsedNNCores: 6
ConvOutFIFODepth: 168

OrigInImageX: 111
OrigInImageY: 111
OrigInImageZ: 32
NNOutImageX: 109 (sub: 109)
NNOutImageY: 25 (sub: 25)
NNOutImageZ: 32 (sub: 32)
FinalOutImageX: 109
FinalOutImageY: 109
FinalOutImageZ: 32
KernelX: 3
KernelY: 3
KernelZ: 32
PoolingSize: 1
PoolingStride: 1
InputDataSize: 8
OutputDataSize: 8
FP16: 0
archModel_kernelSize: 11113
kernelSize: 11648
SrcBuf: VIP_SRAM
DstBuf: DDR
KernelBuf: VIP_SRAM
KernelCacheMode=VXNNE_SRAM_CACHE_MODE_STREAM_CACHE
ImageCacheMode=VXNNE_SRAM_CACHE_MODE_NONE
xOffset: 0, yOffset: 0
coefNonZeroRatio: 1.000000000000000
coefCompression: 0.965753424657534
imageCompression: 1.000000000000000
imageNonZeroRatio: 0.300000000000000

coefNonZeroRatio__llu: 4607182418800017408
coefCompression_llu: 4606873953072115319
imageCompression_llu: 4607182418800017408
imageNonZeroRatio_llu: 4599075939470750515

OutImageTileXSize: 55
OutImageTileYSize: 2
KernelsPerCore: 6

kernelDDRReadBW: 10564
InImageDDrReadBW: 0
ReadBW: 10692
WriteBW: 104716
CycleCount: 32561

===========================
**********Show Perf********
===========================
layer_id:1 layer_name:PoolingLayer2
operation_id:0 operation_name:VXNNE_OPERATOR_POOLING operation_target:VXNNE_OPERATION_TARGET_TP
abs_op_id:3
upstream_layer_num:1 upstream_opertaion_num:1
0) upstream_operation_id:0 uptream_operation_name:VXNNE_OPERATOR_CONVOLUTION (upstream_layer_id:5 upstream_layer_name:ConvolutionReluPoolingLayer2)
downstream_layer_num:1 downstream_opertaion_num:1
0) downstream_operation_id:0 downstream_operation_name:VXNNE_OPERATOR_CONVOLUTION (downstream_layer_id:6 downstream_layer_name:ConvolutionReluPoolingLayer2)
InImageX: 109 (sub: 109)
InImageY: 109 (sub: 109)
InImageZ: 32 (sub: 32)
OutImageX: 54
OutImageY: 54
OutImageZ: 32
KernelX: 1
KernelY: 1
KernelZ: 32
PoolingSize: 2
PoolingStride: 2
InputDataSize: 8
OutputDataSize: 8
FP16: 0
archModel_kernelSize: 0
kernelSize: 0
SrcBuf: DDR
DstBuf: VIP_SRAM
KernelBuf: DDR
KernelCacheMode=VXNNE_SRAM_CACHE_MODE_NONE
ImageCacheMode=VXNNE_SRAM_CACHE_MODE_NONE
xOffset: 0, yOffset: 0

kernelDDRReadBW: 0
InImageDDrReadBW: 380192
ReadBW: 380320
WriteBW: 0
CycleCount: 129138

===========================
**********Show Perf********
===========================
layer_id:6 layer_name:ConvolutionReluPoolingLayer2
operation_id:0 operation_name:VXNNE_OPERATOR_CONVOLUTION operation_target:VXNNE_OPERATION_TARGET_NN
abs_op_id:4
upstream_layer_num:1 upstream_opertaion_num:1
0) upstream_operation_id:0 uptream_operation_name:VXNNE_OPERATOR_POOLING (upstream_layer_id:1 upstream_layer_name:PoolingLayer2)
downstream_layer_num:1 downstream_opertaion_num:1
0) downstream_operation_id:0 downstream_operation_name:VXNNE_OPERATOR_CONVOLUTION (downstream_layer_id:7 downstream_layer_name:ConvolutionReluPoolingLayer2)
NumUsedNNCores: 6
ConvOutFIFODepth: 168

OrigInImageX: 54
OrigInImageY: 54
OrigInImageZ: 32
NNOutImageX: 52 (sub: 52)
NNOutImageY: 52 (sub: 52)
NNOutImageZ: 64 (sub: 64)
FinalOutImageX: 26
FinalOutImageY: 26
FinalOutImageZ: 64
KernelX: 3
KernelY: 3
KernelZ: 32
PoolingSize: 2
PoolingStride: 2
InputDataSize: 8
OutputDataSize: 8
FP16: 0
archModel_kernelSize: 19841
kernelSize: 20608
SrcBuf: VIP_SRAM
DstBuf: VIP_SRAM
KernelBuf: DDR
KernelCacheMode=VXNNE_SRAM_CACHE_MODE_FULL_CACHE
ImageCacheMode=VXNNE_SRAM_CACHE_MODE_NONE
xOffset: 0, yOffset: 0
coefNonZeroRatio: 1.000000000000000
coefCompression: 0.934931506849315
imageCompression: 1.000000000000000
imageNonZeroRatio: 0.300000000000000

coefNonZeroRatio__llu: 4607182418800017408
coefCompression_llu: 4606596333917003439
imageCompression_llu: 4607182418800017408
imageNonZeroRatio_llu: 4599075939470750515

OutImageTileXSize: 52
OutImageTileYSize: 6
KernelsPerCore: 4

kernelDDRReadBW: 17809
InImageDDrReadBW: 0
ReadBW: 17937
WriteBW: 0
CycleCount: 47726

===========================
**********Show Perf********
===========================
layer_id:7 layer_name:ConvolutionReluPoolingLayer2
operation_id:0 operation_name:VXNNE_OPERATOR_CONVOLUTION operation_target:VXNNE_OPERATION_TARGET_NN
abs_op_id:5
upstream_layer_num:1 upstream_opertaion_num:1
0) upstream_operation_id:0 uptream_operation_name:VXNNE_OPERATOR_CONVOLUTION (upstream_layer_id:6 upstream_layer_name:ConvolutionReluPoolingLayer2)
downstream_layer_num:1 downstream_opertaion_num:1
0) downstream_operation_id:0 downstream_operation_name:VXNNE_OPERATOR_TENSOR_TRANS (downstream_layer_id:2 downstream_layer_name:TensorTranspose)
NumUsedNNCores: 6
ConvOutFIFODepth: 168

OrigInImageX: 26
OrigInImageY: 26
OrigInImageZ: 64
NNOutImageX: 24 (sub: 24)
NNOutImageY: 24 (sub: 24)
NNOutImageZ: 128 (sub: 128)
FinalOutImageX: 12
FinalOutImageY: 12
FinalOutImageZ: 128
KernelX: 3
KernelY: 3
KernelZ: 64
PoolingSize: 2
PoolingStride: 2
InputDataSize: 8
OutputDataSize: 8
FP16: 0
archModel_kernelSize: 76726
kernelSize: 79744
SrcBuf: VIP_SRAM
DstBuf: VIP_SRAM
KernelBuf: DDR
KernelCacheMode=VXNNE_SRAM_CACHE_MODE_FULL_CACHE
ImageCacheMode=VXNNE_SRAM_CACHE_MODE_NONE
xOffset: 0, yOffset: 0
coefNonZeroRatio: 0.999959309895833
coefCompression: 0.897413793103448
imageCompression: 1.000000000000000
imageNonZeroRatio: 0.300000000000000

coefNonZeroRatio__llu: 4607182052296141483
coefCompression_llu: 4606258404393712082
imageCompression_llu: 4607182418800017408
imageNonZeroRatio_llu: 4599075939470750515

OutImageTileXSize: 24
OutImageTileYSize: 16
KernelsPerCore: 5

kernelDDRReadBW: 66293
InImageDDrReadBW: 0
ReadBW: 66421
WriteBW: 0
CycleCount: 40241

===========================
**********Show Perf********
===========================
layer_id:2 layer_name:TensorTranspose
operation_id:0 operation_name:VXNNE_OPERATOR_TENSOR_TRANS operation_target:VXNNE_OPERATION_TARGET_TP
abs_op_id:6
upstream_layer_num:1 upstream_opertaion_num:1
0) upstream_operation_id:0 uptream_operation_name:VXNNE_OPERATOR_CONVOLUTION (upstream_layer_id:7 upstream_layer_name:ConvolutionReluPoolingLayer2)
downstream_layer_num:1 downstream_opertaion_num:1
0) downstream_operation_id:0 downstream_operation_name:VXNNE_OPERATOR_FULLYCONNECTED (downstream_layer_id:8 downstream_layer_name:FullyConnectedReluLayer)
InImageX: 12
InImageY: 12
InImageZ: 128
OutImageX: 128 (sub: 128)
OutImageY: 12 (sub: 12)
OutImageZ: 12 (sub: 12)
KernelX: 1
KernelY: 1
KernelZ: 128
PoolingSize: 1
PoolingStride: 1
InputDataSize: 8
OutputDataSize: 8
FP16: 0
archModel_kernelSize: 0
kernelSize: 0
SrcBuf: VIP_SRAM
DstBuf: VIP_SRAM
KernelBuf: DDR
KernelCacheMode=VXNNE_SRAM_CACHE_MODE_NONE
ImageCacheMode=VXNNE_SRAM_CACHE_MODE_NONE
xOffset: 0, yOffset: 0

kernelDDRReadBW: 0
InImageDDrReadBW: 0
ReadBW: 128
WriteBW: 0
CycleCount: 11879

===========================
**********Show Perf********
===========================
layer_id:8 layer_name:FullyConnectedReluLayer
operation_id:0 operation_name:VXNNE_OPERATOR_FULLYCONNECTED operation_target:VXNNE_OPERATION_TARGET_TP
abs_op_id:7
upstream_layer_num:1 upstream_opertaion_num:1
0) upstream_operation_id:0 uptream_operation_name:VXNNE_OPERATOR_TENSOR_TRANS (upstream_layer_id:2 upstream_layer_name:TensorTranspose)
downstream_layer_num:1 downstream_opertaion_num:1
0) downstream_operation_id:0 downstream_operation_name:VXNNE_OPERATOR_FULLYCONNECTED (downstream_layer_id:9 downstream_layer_name:FullyConnectedReluLayer)
InImageX: 1
InImageY: 1
InImageZ: 18432
OutImageX: 1 (sub: 1)
OutImageY: 1 (sub: 1)
OutImageZ: 256 (sub: 256)
KernelX: 1
KernelY: 1
KernelZ: 18432
PoolingSize: 1
PoolingStride: 1
InputDataSize: 8
OutputDataSize: 8
FP16: 0
archModel_kernelSize: 7078638
kernelSize: 0
SrcBuf: VIP_SRAM
DstBuf: VIP_SRAM
KernelBuf: DDR
KernelCacheMode=VXNNE_SRAM_CACHE_MODE_NONE
ImageCacheMode=VXNNE_SRAM_CACHE_MODE_NONE
xOffset: 0, yOffset: 0
coefNonZeroRatio: 0.972156100802951
coefCompression: 1.493328270774571
imageCompression: 1.000000000000000
imageNonZeroRatio: 0.300000000000000

coefNonZeroRatio__llu: 4606931623251920668
coefCompression_llu: 4609404171816449099
imageCompression_llu: 4607182418800017408
imageNonZeroRatio_llu: 4599075939470750515

kernelDDRReadBW: 2113922
InImageDDrReadBW: 0
ReadBW: 2114050
WriteBW: 0
CycleCount: 558736

===========================
**********Show Perf********
===========================
layer_id:9 layer_name:FullyConnectedReluLayer
operation_id:0 operation_name:VXNNE_OPERATOR_FULLYCONNECTED operation_target:VXNNE_OPERATION_TARGET_TP
abs_op_id:8
upstream_layer_num:1 upstream_opertaion_num:1
0) upstream_operation_id:0 uptream_operation_name:VXNNE_OPERATOR_FULLYCONNECTED (upstream_layer_id:8 upstream_layer_name:FullyConnectedReluLayer)
downstream_layer_num:1 downstream_opertaion_num:1
0) downstream_operation_id:0 downstream_operation_name:VXNNE_OPERATOR_SOFTMAX (downstream_layer_id:3 downstream_layer_name:Softmax2Layer)
InImageX: 1
InImageY: 1
InImageZ: 256
OutImageX: 1 (sub: 1)
OutImageY: 1 (sub: 1)
OutImageZ: 6 (sub: 6)
KernelX: 1
KernelY: 1
KernelZ: 256
PoolingSize: 1
PoolingStride: 1
InputDataSize: 8
OutputDataSize: 8
FP16: 0
archModel_kernelSize: 0
kernelSize: 0
SrcBuf: VIP_SRAM
DstBuf: DDR
KernelBuf: DDR
KernelCacheMode=VXNNE_SRAM_CACHE_MODE_NONE
ImageCacheMode=VXNNE_SRAM_CACHE_MODE_NONE
xOffset: 0, yOffset: 0
coefNonZeroRatio: 0.994791666666667
coefCompression: 32.615384615384613
imageCompression: 1.000000000000000
imageNonZeroRatio: 0.300000000000000

coefNonZeroRatio__llu: 4607135506303898965
coefCompression_llu: 4629787024622011628
imageCompression_llu: 4607182418800017408
imageNonZeroRatio_llu: 4599075939470750515

kernelDDRReadBW: 15029
InImageDDrReadBW: 0
ReadBW: 15157
WriteBW: 6
CycleCount: 6397

===========================
**********Show 
Perf********===========================layer_id:3 layer_name:Softmax2Layeroperation_id:0 operation_name:VXNNE_OPERATOR_SOFTMAX operation_target:VXNNE_OPERATION_TARGET_SHabs_op_id:9upstream_layer_num:1 upstream_opertaion_num:10) upstream_operation_id:0 uptream_operation_name:VXNNE_OPERATOR_FULLYCONNECTED (upstream_layer_id:9 upstream_layer_name:FullyConnectedReluLayer)downstream_layer_num:0 downstream_opertaion_num:0prev_ptrs = 0xffffa369c040 Warning: swapHandel, CMD changed  NN/TP: pre_physical:0x1FE2C040, new_physical:0x1FE2C040 layer id: 0 layer name:TensorTranspose operation[0]:VXNNE_OPERATOR_TENSOR_TRANS target:VXNNE_OPERATION_TARGET_TP.execution time: 290 uslayer id: 4 layer name:ConvolutionReluPoolingLayer2 operation[0]:VXNNE_OPERATOR_CONVOLUTION target:VXNNE_OPERATION_TARGET_NN.execution time: 77 uslayer id: 5 layer name:ConvolutionReluPoolingLayer2 operation[0]:VXNNE_OPERATOR_CONVOLUTION target:VXNNE_OPERATION_TARGET_NN.execution time: 63 uslayer id: 4 layer name:ConvolutionReluPoolingLayer2 operation[0]:VXNNE_OPERATOR_CONVOLUTION target:VXNNE_OPERATION_TARGET_NN.execution time: 80 uslayer id: 5 layer name:ConvolutionReluPoolingLayer2 operation[0]:VXNNE_OPERATOR_CONVOLUTION target:VXNNE_OPERATION_TARGET_NN.execution time: 80 uslayer id: 4 layer name:ConvolutionReluPoolingLayer2 operation[0]:VXNNE_OPERATOR_CONVOLUTION target:VXNNE_OPERATION_TARGET_NN.execution time: 80 uslayer id: 5 layer name:ConvolutionReluPoolingLayer2 operation[0]:VXNNE_OPERATOR_CONVOLUTION target:VXNNE_OPERATION_TARGET_NN.execution time: 74 uslayer id: 4 layer name:ConvolutionReluPoolingLayer2 operation[0]:VXNNE_OPERATOR_CONVOLUTION target:VXNNE_OPERATION_TARGET_NN.execution time: 76 uslayer id: 5 layer name:ConvolutionReluPoolingLayer2 operation[0]:VXNNE_OPERATOR_CONVOLUTION target:VXNNE_OPERATION_TARGET_NN.execution time: 84 uslayer id: 4 layer name:ConvolutionReluPoolingLayer2 operation[0]:VXNNE_OPERATOR_CONVOLUTION target:VXNNE_OPERATION_TARGET_NN.execution time: 76 uslayer id: 5 
layer name:ConvolutionReluPoolingLayer2 operation[0]:VXNNE_OPERATOR_CONVOLUTION target:VXNNE_OPERATION_TARGET_NN.execution time: 73 uslayer id: 1 layer name:PoolingLayer2 operation[0]:VXNNE_OPERATOR_POOLING target:VXNNE_OPERATION_TARGET_TP.execution time: 209 uslayer id: 6 layer name:ConvolutionReluPoolingLayer2 operation[0]:VXNNE_OPERATOR_CONVOLUTION target:VXNNE_OPERATION_TARGET_NN.execution time: 140 uslayer id: 7 layer name:ConvolutionReluPoolingLayer2 operation[0]:VXNNE_OPERATOR_CONVOLUTION target:VXNNE_OPERATION_TARGET_NN.execution time: 102 uslayer id: 2 layer name:TensorTranspose operation[0]:VXNNE_OPERATOR_TENSOR_TRANS target:VXNNE_OPERATION_TARGET_TP.execution time: 101 uslayer id: 8 layer name:FullyConnectedReluLayer operation[0]:VXNNE_OPERATOR_FULLYCONNECTED target:VXNNE_OPERATION_TARGET_TP.execution time: 469 uslayer id: 9 layer name:FullyConnectedReluLayer operation[0]:VXNNE_OPERATOR_FULLYCONNECTED target:VXNNE_OPERATION_TARGET_TP.execution time: 54 uslayer id: 3 layer name:Softmax2Layer operation[0]:VXNNE_OPERATOR_SOFTMAX target:VXNNE_OPERATION_TARGET_SH.execution time: 187 usWarmup time: 3602.98 msOriginal image size: 600x600x3Cropped image size: 600x600x3Resized image size: 224x224x3Input tensor index: 14Input tensor name: conv2d_inputSelected order of channels: RGBSelected pixel values range: NAFilling time: 0.195005 msprev_ptrs = 0xffffa369c040 Warning: swapHandel, CMD changed  NN/TP: pre_physical:0x1FE2C040, new_physical:0x1FE2C040 layer id: 0 layer name:TensorTranspose operation[0]:VXNNE_OPERATOR_TENSOR_TRANS target:VXNNE_OPERATION_TARGET_TP.execution time: 286 uslayer id: 4 layer name:ConvolutionReluPoolingLayer2 operation[0]:VXNNE_OPERATOR_CONVOLUTION target:VXNNE_OPERATION_TARGET_NN.execution time: 77 uslayer id: 5 layer name:ConvolutionReluPoolingLayer2 operation[0]:VXNNE_OPERATOR_CONVOLUTION target:VXNNE_OPERATION_TARGET_NN.execution time: 59 uslayer id: 4 layer name:ConvolutionReluPoolingLayer2 operation[0]:VXNNE_OPERATOR_CONVOLUTION 
target:VXNNE_OPERATION_TARGET_NN.execution time: 78 uslayer id: 5 layer name:ConvolutionReluPoolingLayer2 operation[0]:VXNNE_OPERATOR_CONVOLUTION target:VXNNE_OPERATION_TARGET_NN.execution time: 74 uslayer id: 4 layer name:ConvolutionReluPoolingLayer2 operation[0]:VXNNE_OPERATOR_CONVOLUTION target:VXNNE_OPERATION_TARGET_NN.execution time: 81 uslayer id: 5 layer name:ConvolutionReluPoolingLayer2 operation[0]:VXNNE_OPERATOR_CONVOLUTION target:VXNNE_OPERATION_TARGET_NN.execution time: 72 uslayer id: 4 layer name:ConvolutionReluPoolingLayer2 operation[0]:VXNNE_OPERATOR_CONVOLUTION target:VXNNE_OPERATION_TARGET_NN.execution time: 74 uslayer id: 5 layer name:ConvolutionReluPoolingLayer2 operation[0]:VXNNE_OPERATOR_CONVOLUTION target:VXNNE_OPERATION_TARGET_NN.execution time: 73 uslayer id: 4 layer name:ConvolutionReluPoolingLayer2 operation[0]:VXNNE_OPERATOR_CONVOLUTION target:VXNNE_OPERATION_TARGET_NN.execution time: 74 uslayer id: 5 layer name:ConvolutionReluPoolingLayer2 operation[0]:VXNNE_OPERATOR_CONVOLUTION target:VXNNE_OPERATION_TARGET_NN.execution time: 88 uslayer id: 1 layer name:PoolingLayer2 operation[0]:VXNNE_OPERATOR_POOLING target:VXNNE_OPERATION_TARGET_TP.execution time: 200 uslayer id: 6 layer name:ConvolutionReluPoolingLayer2 operation[0]:VXNNE_OPERATOR_CONVOLUTION target:VXNNE_OPERATION_TARGET_NN.execution time: 105 uslayer id: 7 layer name:ConvolutionReluPoolingLayer2 operation[0]:VXNNE_OPERATOR_CONVOLUTION target:VXNNE_OPERATION_TARGET_NN.execution time: 88 uslayer id: 2 layer name:TensorTranspose operation[0]:VXNNE_OPERATOR_TENSOR_TRANS target:VXNNE_OPERATION_TARGET_TP.execution time: 82 uslayer id: 8 layer name:FullyConnectedReluLayer operation[0]:VXNNE_OPERATOR_FULLYCONNECTED target:VXNNE_OPERATION_TARGET_TP.execution time: 154 uslayer id: 9 layer name:FullyConnectedReluLayer operation[0]:VXNNE_OPERATOR_FULLYCONNECTED target:VXNNE_OPERATION_TARGET_TP.execution time: 48 uslayer id: 3 layer name:Softmax2Layer operation[0]:VXNNE_OPERATOR_SOFTMAX 
target:VXNNE_OPERATION_TARGET_SH.execution time: 131 usInference time 1: 2.49207 msprev_ptrs = 0xffffa369c040 Warning: swapHandel, CMD changed  NN/TP: pre_physical:0x1FE2C040, new_physical:0x1FE2C040 layer id: 0 layer name:TensorTranspose operation[0]:VXNNE_OPERATOR_TENSOR_TRANS target:VXNNE_OPERATION_TARGET_TP.execution time: 240 uslayer id: 4 layer name:ConvolutionReluPoolingLayer2 operation[0]:VXNNE_OPERATOR_CONVOLUTION target:VXNNE_OPERATION_TARGET_NN.execution time: 74 uslayer id: 5 layer name:ConvolutionReluPoolingLayer2 operation[0]:VXNNE_OPERATOR_CONVOLUTION target:VXNNE_OPERATION_TARGET_NN.execution time: 57 uslayer id: 4 layer name:ConvolutionReluPoolingLayer2 operation[0]:VXNNE_OPERATOR_CONVOLUTION target:VXNNE_OPERATION_TARGET_NN.execution time: 87 uslayer id: 5 layer name:ConvolutionReluPoolingLayer2 operation[0]:VXNNE_OPERATOR_CONVOLUTION target:VXNNE_OPERATION_TARGET_NN.execution time: 81 uslayer id: 4 layer name:ConvolutionReluPoolingLayer2 operation[0]:VXNNE_OPERATOR_CONVOLUTION target:VXNNE_OPERATION_TARGET_NN.execution time: 80 uslayer id: 5 layer name:ConvolutionReluPoolingLayer2 operation[0]:VXNNE_OPERATOR_CONVOLUTION target:VXNNE_OPERATION_TARGET_NN.execution time: 78 uslayer id: 4 layer name:ConvolutionReluPoolingLayer2 operation[0]:VXNNE_OPERATOR_CONVOLUTION target:VXNNE_OPERATION_TARGET_NN.execution time: 81 uslayer id: 5 layer name:ConvolutionReluPoolingLayer2 operation[0]:VXNNE_OPERATOR_CONVOLUTION target:VXNNE_OPERATION_TARGET_NN.execution time: 86 uslayer id: 4 layer name:ConvolutionReluPoolingLayer2 operation[0]:VXNNE_OPERATOR_CONVOLUTION target:VXNNE_OPERATION_TARGET_NN.execution time: 77 uslayer id: 5 layer name:ConvolutionReluPoolingLayer2 operation[0]:VXNNE_OPERATOR_CONVOLUTION target:VXNNE_OPERATION_TARGET_NN.execution time: 73 uslayer id: 1 layer name:PoolingLayer2 operation[0]:VXNNE_OPERATOR_POOLING target:VXNNE_OPERATION_TARGET_TP.execution time: 209 uslayer id: 6 layer name:ConvolutionReluPoolingLayer2 
operation[0]:VXNNE_OPERATOR_CONVOLUTION target:VXNNE_OPERATION_TARGET_NN.execution time: 108 uslayer id: 7 layer name:ConvolutionReluPoolingLayer2 operation[0]:VXNNE_OPERATOR_CONVOLUTION target:VXNNE_OPERATION_TARGET_NN.execution time: 90 uslayer id: 2 layer name:TensorTranspose operation[0]:VXNNE_OPERATOR_TENSOR_TRANS target:VXNNE_OPERATION_TARGET_TP.execution time: 84 uslayer id: 8 layer name:FullyConnectedReluLayer operation[0]:VXNNE_OPERATOR_FULLYCONNECTED target:VXNNE_OPERATION_TARGET_TP.execution time: 157 uslayer id: 9 layer name:FullyConnectedReluLayer operation[0]:VXNNE_OPERATOR_FULLYCONNECTED target:VXNNE_OPERATION_TARGET_TP.execution time: 48 uslayer id: 3 layer name:Softmax2Layer operation[0]:VXNNE_OPERATOR_SOFTMAX target:VXNNE_OPERATION_TARGET_SH.execution time: 136 usInference time 2: 2.47457 msprev_ptrs = 0xffffa369c040 Warning: swapHandel, CMD changed  NN/TP: pre_physical:0x1FE2C040, new_physical:0x1FE2C040 layer id: 0 layer name:TensorTranspose operation[0]:VXNNE_OPERATOR_TENSOR_TRANS target:VXNNE_OPERATION_TARGET_TP.execution time: 254 uslayer id: 4 layer name:ConvolutionReluPoolingLayer2 operation[0]:VXNNE_OPERATOR_CONVOLUTION target:VXNNE_OPERATION_TARGET_NN.execution time: 69 uslayer id: 5 layer name:ConvolutionReluPoolingLayer2 operation[0]:VXNNE_OPERATOR_CONVOLUTION target:VXNNE_OPERATION_TARGET_NN.execution time: 60 uslayer id: 4 layer name:ConvolutionReluPoolingLayer2 operation[0]:VXNNE_OPERATOR_CONVOLUTION target:VXNNE_OPERATION_TARGET_NN.execution time: 82 uslayer id: 5 layer name:ConvolutionReluPoolingLayer2 operation[0]:VXNNE_OPERATOR_CONVOLUTION target:VXNNE_OPERATION_TARGET_NN.execution time: 77 uslayer id: 4 layer name:ConvolutionReluPoolingLayer2 operation[0]:VXNNE_OPERATOR_CONVOLUTION target:VXNNE_OPERATION_TARGET_NN.execution time: 77 uslayer id: 5 layer name:ConvolutionReluPoolingLayer2 operation[0]:VXNNE_OPERATOR_CONVOLUTION target:VXNNE_OPERATION_TARGET_NN.execution time: 73 uslayer id: 4 layer name:ConvolutionReluPoolingLayer2 
operation[0]:VXNNE_OPERATOR_CONVOLUTION target:VXNNE_OPERATION_TARGET_NN.execution time: 76 uslayer id: 5 layer name:ConvolutionReluPoolingLayer2 operation[0]:VXNNE_OPERATOR_CONVOLUTION target:VXNNE_OPERATION_TARGET_NN.execution time: 73 uslayer id: 4 layer name:ConvolutionReluPoolingLayer2 operation[0]:VXNNE_OPERATOR_CONVOLUTION target:VXNNE_OPERATION_TARGET_NN.execution time: 76 uslayer id: 5 layer name:ConvolutionReluPoolingLayer2 operation[0]:VXNNE_OPERATOR_CONVOLUTION target:VXNNE_OPERATION_TARGET_NN.execution time: 73 uslayer id: 1 layer name:PoolingLayer2 operation[0]:VXNNE_OPERATOR_POOLING target:VXNNE_OPERATION_TARGET_TP.execution time: 210 uslayer id: 6 layer name:ConvolutionReluPoolingLayer2 operation[0]:VXNNE_OPERATOR_CONVOLUTION target:VXNNE_OPERATION_TARGET_NN.execution time: 107 uslayer id: 7 layer name:ConvolutionReluPoolingLayer2 operation[0]:VXNNE_OPERATOR_CONVOLUTION target:VXNNE_OPERATION_TARGET_NN.execution time: 89 uslayer id: 2 layer name:TensorTranspose operation[0]:VXNNE_OPERATOR_TENSOR_TRANS target:VXNNE_OPERATION_TARGET_TP.execution time: 83 uslayer id: 8 layer name:FullyConnectedReluLayer operation[0]:VXNNE_OPERATOR_FULLYCONNECTED target:VXNNE_OPERATION_TARGET_TP.execution time: 155 uslayer id: 9 layer name:FullyConnectedReluLayer operation[0]:VXNNE_OPERATOR_FULLYCONNECTED target:VXNNE_OPERATION_TARGET_TP.execution time: 185 uslayer id: 3 layer name:Softmax2Layer operation[0]:VXNNE_OPERATOR_SOFTMAX target:VXNNE_OPERATION_TARGET_SH.execution time: 151 usInference time 3: 2.61483 msAverage inference time: 2.52716 msTotal prediction time: 2.72216 msOutput tensor index: 5Output tensor name: activation_5/SoftmaxTop results: 1 Red Appleprev_ptrs = 0xffffa369c040Exit VX Thread: 0xa3ee5fb0</pre>{
$ export CNN_PERF=1 NN_EXT_SHOW_PERF=1 VIV_VX_DEBUG_LEVEL=1 VIV_VX_PROFILE=1
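The log above contains one <code>layer id: … execution time: … us</code> entry per executed operation, repeated for the warmup run and for each inference. Reading these by eye is tedious, so the following is a minimal Python sketch for totalling them per layer. It is not part of the original application: the <code>summarize</code> helper, the <code>ENTRY</code> pattern and the <code>SAMPLE</code> string are hypothetical names introduced here, and the pattern assumes the entry layout shown in the log above (including the missing line breaks).

```python
import re
from collections import defaultdict

# Assumed entry layout, copied from the log above:
#   layer id: 4 layer name:ConvolutionReluPoolingLayer2
#   operation[0]:VXNNE_OPERATOR_CONVOLUTION
#   target:VXNNE_OPERATION_TARGET_NN.execution time: 77 us
ENTRY = re.compile(
    r"layer id:\s*(\d+)\s+layer name:\s*(\S+)\s+operation\[\d+\]:\s*(\S+)\s+"
    r"target:\s*(\w+)\.?\s*execution time:\s*(\d+)\s*us"
)


def summarize(log_text):
    """Return {(layer id, layer name, target): [entry count, total microseconds]}."""
    totals = defaultdict(lambda: [0, 0])
    for layer_id, name, _operation, target, usec in ENTRY.findall(log_text):
        entry = totals[(int(layer_id), name, target)]
        entry[0] += 1
        entry[1] += int(usec)
    return dict(totals)


# First four entries of the warmup run, pasted without line breaks as in the log.
SAMPLE = (
    "layer id: 0 layer name:TensorTranspose operation[0]:VXNNE_OPERATOR_TENSOR_TRANS "
    "target:VXNNE_OPERATION_TARGET_TP.execution time: 290 us"
    "layer id: 4 layer name:ConvolutionReluPoolingLayer2 operation[0]:VXNNE_OPERATOR_CONVOLUTION "
    "target:VXNNE_OPERATION_TARGET_NN.execution time: 77 us"
    "layer id: 5 layer name:ConvolutionReluPoolingLayer2 operation[0]:VXNNE_OPERATOR_CONVOLUTION "
    "target:VXNNE_OPERATION_TARGET_NN.execution time: 63 us"
    "layer id: 4 layer name:ConvolutionReluPoolingLayer2 operation[0]:VXNNE_OPERATOR_CONVOLUTION "
    "target:VXNNE_OPERATION_TARGET_NN.execution time: 80 us"
)

summary = summarize(SAMPLE)
for (layer_id, name, target), (count, total_us) in sorted(summary.items()):
    print(f"layer {layer_id:2d} {name:32s} {target:28s} entries={count} total={total_us} us")
```

To use it on a real run, the captured console output can be read from a file and passed to <code>summarize</code>. Since the log holds several runs back to back, the totals cover all of them unless the text is first split at the <code>Inference time</code> lines.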