DAVE Developer's Wiki β

</pre>
The developed application is profiled several times, each time with a different number of threads. For each trace, the DPU throughput is reported together with additional information on the latency of the DPUs and the utilization of both CPU and DPU cores. The inference is repeated 10 times on the same image. Note that the latency of DPU_0 is higher than that of DPU_1.
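The measurement described above can be sketched as follows. This is not the profiled application: `infer` is a hypothetical placeholder for one complete inference on the same image, and computing throughput as total frames divided by wall-clock time is an assumption about the method, not a description of the Vitis AI Profiler internals.

```python
import threading
import time
from typing import Callable


def measure_throughput(infer: Callable[[], None], num_threads: int, runs_per_thread: int) -> float:
    """Return frames per second when `num_threads` workers each call `infer` `runs_per_thread` times.

    `infer` is a hypothetical stand-in for one inference iteration on the same
    image (pre-processing, DPU run, post-processing); it is not a VART API call.
    """
    def worker() -> None:
        for _ in range(runs_per_thread):
            infer()

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.perf_counter() - start
    total_frames = num_threads * runs_per_thread
    return total_frames / elapsed


if __name__ == "__main__":
    def fake_infer() -> None:
        # Assumed placeholder workload: sleep about 4 ms to mimic one inference.
        time.sleep(0.004)

    # Same thread counts as the traces in this section, 10 runs per thread.
    for n in (1, 2, 4, 6):
        print(f"{n} thread(s): {measure_throughput(fake_infer, n, 10):.1f} fps")
```

With a sleep-based placeholder the threads overlap freely, so the sketch only illustrates the bookkeeping; real scaling depends on how the CPU-side work and the two DPU cores are shared.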
In the figure below, the VART-based application uses 1 thread; the trace shows a stable throughput of about <code>245.50 fps</code>.
[[File:Vaiprofiler1 thread 10 runs.png|thumb|center|800px|Profiling VART-based application, 1 thread only]]
{| class="wikitable" style="margin: auto;"
|+Trace information
|-
! Item
! Value
|- style="font-weight:bold;"
| DPU_1 Latency
| style="font-weight:normal;" |
|-
| custom_cnn_0
| 1514.05 us
|- style="font-weight:bold;"
| Utilization
| style="font-weight:normal;" |
|-
| CPU-00
| 15.90 %
|-
| CPU-01
| 23.74 %
|-
| CPU-02
| 1.12 %
|-
| CPU-03
| 1.15 %
|-
| DPU-01
| 18.72 %
|}
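The single-thread figures can be related with a short calculation. Assuming the single thread processes frames strictly one after another (no overlap between CPU-side work and the DPU task, which is an assumption), the frame period is the inverse of the throughput, and whatever is not DPU latency is time spent outside the DPU task.

```python
# Values copied from the 1-thread trace.
THROUGHPUT_FPS = 245.50      # measured throughput
DPU1_LATENCY_US = 1514.05    # custom_cnn_0 latency on DPU_1

frame_period_us = 1e6 / THROUGHPUT_FPS           # time per frame in microseconds
dpu_share = DPU1_LATENCY_US / frame_period_us    # fraction of the frame spent in the DPU task
non_dpu_us = frame_period_us - DPU1_LATENCY_US   # remainder, assumed to be CPU-side work

print(f"frame period: {frame_period_us:.1f} us")
print(f"DPU share: {dpu_share:.1%}")
print(f"non-DPU time: {non_dpu_us:.1f} us")
```

Under that assumption, roughly 37% of each frame period is DPU latency, which is why adding threads that overlap the remaining work can raise throughput.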
In the figure below, the VART-based application uses 2 threads; the trace shows a stable throughput of about <code>442.40 fps</code>.
[[File:Vaiprofiler 2 threads 10 runs.png|thumb|center|800px|Profiling VART-based application, 2 threads]]
{| class="wikitable" style="margin: auto;"
|+Trace information
|-
! Item
! Value
|- style="font-weight:bold;"
| DPU_0 Latency
| style="font-weight:normal;" |
|-
| custom_cnn_0
| 2085.12 us
|- style="font-weight:bold;"
| DPU_1 Latency
| style="font-weight:normal;" |
|-
| custom_cnn_0
| 1648.66 us
|- style="font-weight:bold;"
| Utilization
| style="font-weight:normal;" |
|-
| CPU-00
| 2.84 %
|-
| CPU-01
| 10.56 %
|-
| CPU-02
| 30.00 %
|-
| CPU-03
| 19.14 %
|-
| DPU-00
| 19.02 %
|-
| DPU-01
| 13.24 %
|}

In the figure below, the VART-based application uses 4 threads; the trace shows a stable throughput of about <code>818.20 fps</code>.

[[File:Vaiprofiler 4 threads 10 runs.png|thumb|center|800px|Profiling VART-based application, 4 threads]]

{| class="wikitable" style="margin: auto;"
|+Trace information
|-
! Item
! Value
|- style="font-weight:bold;"
| DPU_0 Latency
| style="font-weight:normal;" |
|-
| custom_cnn_0
| 2111.89 us
|- style="font-weight:bold;"
| DPU_1 Latency
| style="font-weight:normal;" |
|-
| custom_cnn_0
| 1679.56 us
|- style="font-weight:bold;"
| Utilization
| style="font-weight:normal;" |
|-
| CPU-00
| 20.05 %
|-
| CPU-01
| 18.56 %
|-
| CPU-02
| 19.26 %
|-
| CPU-03
| 22.21 %
|-
| DPU-00
| 23.95 %
|-
| DPU-01
| 16.96 %
|}

In the figure below, the VART-based application uses 6 threads; the trace shows a stable throughput of about <code>830.30 fps</code>.

[[File:Vaiprofiler 6 threads 10 runs.png|thumb|center|800px|Profiling VART-based application, 6 threads]]

{| class="wikitable" style="margin: auto;"
|+Trace information
|-
! Item
! Value
|- style="font-weight:bold;"
| DPU_0 Latency
| style="font-weight:normal;" |
|-
| custom_cnn_0
| 2305.08 us
|- style="font-weight:bold;"
| DPU_1 Latency
| style="font-weight:normal;" |
|-
| custom_cnn_0
| 1856.95 us
|- style="font-weight:bold;"
| Utilization
| style="font-weight:normal;" |
|-
| CPU-00
| 20.36 %
|-
| CPU-01
| 19.88 %
|-
| CPU-02
| 22.71 %
|-
| CPU-03
| 19.21 %
|-
| DPU-00
| 22.87 %
|-
| DPU-01
| 20.84 %
|}
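The thread-scaling results can be summarized with a small calculation on the reported numbers: speedup relative to the 1-thread run, parallel efficiency (speedup divided by thread count), and the DPU_0/DPU_1 latency ratio. This is only arithmetic on the values in the traces; the efficiency metric is a conventional choice, not one used by the profiler.

```python
# Throughput (fps) per thread count, copied from the traces.
throughput_fps = {1: 245.50, 2: 442.40, 4: 818.20, 6: 830.30}

# custom_cnn_0 latency (us) per thread count: (DPU_0, DPU_1).
dpu_latency_us = {
    2: (2085.12, 1648.66),
    4: (2111.89, 1679.56),
    6: (2305.08, 1856.95),
}

baseline = throughput_fps[1]
for threads, fps in throughput_fps.items():
    speedup = fps / baseline
    efficiency = speedup / threads  # 1.0 would mean perfectly linear scaling
    print(f"{threads} thread(s): {fps:7.2f} fps  speedup {speedup:.2f}x  efficiency {efficiency:.0%}")

for threads, (dpu0, dpu1) in dpu_latency_us.items():
    print(f"{threads} thread(s): DPU_0/DPU_1 latency ratio {dpu0 / dpu1:.2f}")
```

The speedup grows to about 3.33x at 4 threads but only to about 3.38x at 6 threads, so the gain flattens beyond 4 threads; the latency ratio stays around 1.24 to 1.26, quantifying the observation that DPU_0 is slower than DPU_1.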