DAVE Developer's Wiki β

==== <big>Profiling model execution on NPU</big> ====
The following block shows the profiler log. "The log captures detailed information of the execution clock cycles and DDR data transmission in each layer." Note that inference takes longer than usual because of the overhead added by the profiler.
<pre class="board-terminal">root@imx8mpevk:/mnt/ramdisk/image_classifier_eIQ_plus# build/image_classifier_cv 3 my_fruits_model_qatlegacy.tflite labels.txt testdata/red-apple1.jpg
INFO: Created TensorFlow Lite delegate for NNAPI.
#productname=VIPNano-D+I, pid=0x9f
Created VX Thread: 0xa3ee5fb0
Applied NNAPI delegate
prev_ptrs = 0xffffa369c040
Can't support one shaderCoreCount!
---------------------------Begin VerifyTiling -------------------------
AXI-SRAM = 0 Bytes VIP-SRAM = 260096 Bytes SWTILING_PHASE_FEATURES[1, 1, 1]
0 TP [( 3 224 224 1, 150528, 0x0xaaaab1874580(0x0xaaaab1874580, 0x(nil)) -> 224 224 3 1, 150528, 0x0xaaaab187db10(0x0xaaaab187db10, 0x(nil))) k(0 0 0, 0) pad(0 0) pool(0 0, 1 1)] C[ 1]
1 NN [( 224 224 3 1, 150528, 0x0xaaaab187db10(0x0xaaaab187db10, 0x(nil)) -> 111 111 32 1, 394272, 0x0xaaaab1881a90(0x0xaaaab1881a90, 0x(nil))) k(3 3 3, 1152) pad(0 0) pool(2 2, 2 2)] P[ 0] C[ 2]
2 NN [( 111 111 32 1, 394272, 0x0xaaaab1881a90(0x0xaaaab1881a90, 0x(nil)) -> 109 109 32 1, 380192, 0x0xaaaab1884270(0x0xaaaab1884270, 0x(nil))) k(3 3 32, 9984) pad(0 0) pool(0 0, 1 1)] P[ 1] C[ 3]
3 TP [( 109 109 32 1, 380192, 0x0xaaaab1884270(0x0xaaaab1884270, 0x(nil)) -> 54 54 32 1, 93312, 0x0xaaaab1887410(0x0xaaaab1887410, 0x(nil))) k(0 0 0, 0) pad(0 0) pool(2 2, 2 2)] P[ 2] C[ 4]
4 NN [( 54 54 32 1, 93312, 0x0xaaaab1887410(0x0xaaaab1887410, 0x(nil)) -> 26 26 64 1, 43264, 0x0xaaaab188cd90(0x0xaaaab188cd90, 0x(nil))) k(3 3 32, 19968) pad(0 0) pool(2 2, 2 2)] P[ 3] C[ 5]
5 NN [( 26 26 64 1, 43264, 0x0xaaaab188cd90(0x0xaaaab188cd90, 0x(nil)) -> 12 12 128 1, 18432, 0x0xaaaab1892710(0x0xaaaab1892710, 0x(nil))) k(3 3 64, 79616) pad(0 0) pool(2 2, 2 2)] P[ 4] C[ 6]
6 TP [( 12 12 128 1, 18432, 0x0xaaaab1892710(0x0xaaaab1892710, 0x(nil)) -> 128 12 12 1, 18432, 0x0xaaaab1894ef0(0x0xaaaab1894ef0, 0x(nil))) k(0 0 0, 0) pad(0 0) pool(0 0, 1 1)] P[ 5] C[ 7]
7 TP [(18432 1 1 1, 18432, 0x0xaaaab1894ef0(0x0xaaaab1894ef0, 0x(nil)) -> 256 1 1 1, 256, 0x0xaaaab18965b0(0x0xaaaab18965b0, 0x(nil))) k(0 0 0, 0) pad(0 0) pool(0 0, 1 1)] P[ 6] C[ 8]
8 TP [( 256 1 1 1, 256, 0x0xaaaab18965b0(0x0xaaaab18965b0, 0x(nil)) -> 6 1 1 1, 6, 0x0xaaaab1897c10(0x0xaaaab1897c10, 0x(nil))) k(0 0 0, 0) pad(0 0) pool(0 0, 1 1)] P[ 7] C[ 9]
9 SH [( 6 1 1 1, 6, 0x0xaaaab1897c10(0x0xaaaab1897c10, 0x(nil)) -> 6 1 1 1, 6, 0x0xaaaab187a200(0x0xaaaab187a200, 0x(nil))) k(0 0 0, 0) pad(0 0) pool(0 0, 1 1)] P[ 8]
Detected Segments
AB_VS (0 - 1)
TL_VS (1 - 2)
AB_VS (3 - 8)
======================== Block [0 - 2] ==============================
0 TP DD -> VS [( 150528, 150528), IC( 0), KC( 0)]
1 NN VS -> VS [( 150528, 96000), IC( 0), KC( 1408)]
2 NN VS -> DD [( 96000, 0), IC( 0), KC( 11648)]
------------------------------------------------------------------
Segment AB (0 - 0)
------------------------
$ export CNN_PERF=1 NN_EXT_SHOW_PERF=1 VIV_VX_DEBUG_LEVEL=1 VIV_VX_PROFILE=1
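The following Python script is a minimal sketch for reading the VerifyTiling table in a log like the one above. It parses the per-layer lines and produces a short summary. It rests on several assumptions:

* `PROFILING_VARS`, `run_profiled`, `parse_verify_tiling` and `summarize` are hypothetical helpers. They are not part of eIQ, TensorFlow Lite or the NPU driver.
* The regular expression was inferred only from the sample log on this page, so other driver versions may print a different format.
* The reading of the engine tags (NN, TP, SH) and of the `k(...)` values is an assumption.

One property can be checked directly against the log. The size printed after each shape equals the product of the four dimensions, for example 111 × 111 × 32 × 1 = 394272.

```python
import os
import re
import subprocess
from collections import Counter
from dataclasses import dataclass
from math import prod

# Hypothetical wrapper data: the variables from the export command above.
PROFILING_VARS = {
    "CNN_PERF": "1",
    "NN_EXT_SHOW_PERF": "1",
    "VIV_VX_DEBUG_LEVEL": "1",
    "VIV_VX_PROFILE": "1",
}

# Pattern inferred only from the sample log on this page (an assumption);
# other driver versions may format the VerifyTiling lines differently.
LAYER_RE = re.compile(
    r"^\s*(?P<idx>\d+)\s+(?P<op>[A-Z]{2})\s+\[\(\s*"
    r"(?P<in_dims>\d+(?:\s+\d+){3}),\s*(?P<in_size>\d+),"
    r".*?->\s*(?P<out_dims>\d+(?:\s+\d+){3}),\s*(?P<out_size>\d+),"
    r".*?k\((?P<kernel>\d+\s+\d+\s+\d+),\s*(?P<k_size>\d+)\)"
)


@dataclass
class Layer:
    idx: int
    op: str          # engine tag as printed (NN, TP or SH)
    in_dims: tuple   # four input dimensions, in the order the driver prints them
    in_size: int     # equals the product of in_dims in the sample log
    out_dims: tuple
    out_size: int
    kernel: tuple    # k(x y z, ...) triple; (0, 0, 0) for layers without weights
    k_size: int      # second number in k(...); its meaning is not verified


def run_profiled(cmd):
    """Run cmd with the profiling variables set; return stdout and stderr combined."""
    env = {**os.environ, **PROFILING_VARS}
    result = subprocess.run(cmd, env=env, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, text=True, check=False)
    return result.stdout


def parse_verify_tiling(log_text):
    """Return one Layer per VerifyTiling layer line; every other line is skipped."""
    layers = []
    for line in log_text.splitlines():
        m = LAYER_RE.match(line)
        if m is None:
            continue
        layers.append(Layer(
            idx=int(m["idx"]),
            op=m["op"],
            in_dims=tuple(int(v) for v in m["in_dims"].split()),
            in_size=int(m["in_size"]),
            out_dims=tuple(int(v) for v in m["out_dims"].split()),
            out_size=int(m["out_size"]),
            kernel=tuple(int(v) for v in m["kernel"].split()),
            k_size=int(m["k_size"]),
        ))
    return layers


def summarize(layers):
    """Count layers per engine tag, find the largest output, check the sizes."""
    per_op = Counter(layer.op for layer in layers)
    largest = max(layers, key=lambda layer: layer.out_size)
    consistent = all(layer.out_size == prod(layer.out_dims) for layer in layers)
    return per_op, largest, consistent


# A few lines copied from the log above, plus one line of the "Block"
# table, which the pattern deliberately does not match.
SAMPLE = """\
0 TP [( 3 224 224 1, 150528, 0x0xaaaab1874580(0x0xaaaab1874580, 0x(nil)) -> 224 224 3 1, 150528, 0x0xaaaab187db10(0x0xaaaab187db10, 0x(nil))) k(0 0 0, 0) pad(0 0) pool(0 0, 1 1)] C[ 1]
1 NN [( 224 224 3 1, 150528, 0x0xaaaab187db10(0x0xaaaab187db10, 0x(nil)) -> 111 111 32 1, 394272, 0x0xaaaab1881a90(0x0xaaaab1881a90, 0x(nil))) k(3 3 3, 1152) pad(0 0) pool(2 2, 2 2)] P[ 0] C[ 2]
7 TP [(18432 1 1 1, 18432, 0x0xaaaab1894ef0(0x0xaaaab1894ef0, 0x(nil)) -> 256 1 1 1, 256, 0x0xaaaab18965b0(0x0xaaaab18965b0, 0x(nil))) k(0 0 0, 0) pad(0 0) pool(0 0, 1 1)] P[ 6] C[ 8]
9 SH [( 6 1 1 1, 6, 0x0xaaaab1897c10(0x0xaaaab1897c10, 0x(nil)) -> 6 1 1 1, 6, 0x0xaaaab187a200(0x0xaaaab187a200, 0x(nil))) k(0 0 0, 0) pad(0 0) pool(0 0, 1 1)] P[ 8]
0 TP DD -> VS [( 150528, 150528), IC( 0), KC( 0)]
"""

if __name__ == "__main__":
    per_op, largest, consistent = summarize(parse_verify_tiling(SAMPLE))
    print(dict(per_op))  # {'TP': 2, 'NN': 1, 'SH': 1}
    print(largest.idx, largest.out_dims, largest.out_size)  # 1 (111, 111, 32, 1) 394272
    print(consistent)  # True
```

On the board, you could capture a fresh log with `run_profiled(["build/image_classifier_cv", "3", "my_fruits_model_qatlegacy.tflite", "labels.txt", "testdata/red-apple1.jpg"])` and pass the result to `parse_verify_tiling`. This wrapper has not been tested on the target. Passing the variables through `env` has the same effect as the `export` command, but only for that one child process.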