[[File:Xilinx DSight.png|thumb|center|1000px|DSight visual performance analysis]]
===VART-based application===
As stated previously, this version of the application is functionally equivalent to the DNNDK-based one, but it makes use of the newer [https://github.com/Xilinx/Vitis-AI/blob/master/VART/README.md Vitis AI Runtime (VART) API].
The following dump shows the output of the application when processing the image file <code>red_apple_1.jpg</code>.
====Profiling with Vitis AI Profiler====
Vitis AI Profiler is a powerful, application-level tool that helps to optimize the whole AI application. Its main purpose is to help detect bottlenecks by profiling the pre-processing and post-processing functions together with the running status of the DPU kernels.
The tool has two components: <code>vaitrace</code>, which runs on the target device and is responsible for data collection, and <code>vaiprofiler</code>, which runs on a PC or local server and is responsible for the analysis and visualization of the collected data.
The developed application was profiled several times, each time with a different number of threads. For every profiling trace, the DPU throughput is reported along with additional information about the latency of the DPUs and the usage of both CPU and DPU cores. The inference is repeated 10 times on the same image. Note that the latency of DPU_0 is higher than the latency of DPU_1.
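The measurement scheme described above (N threads, each repeating the inference, throughput reported in frames per second) can be sketched roughly as follows. This is only an illustration, not the code of the application: <code>measure_throughput</code> and <code>fake_infer</code> are hypothetical names, <code>fake_infer</code> is a sleep-based stand-in that does not call the VART API, and running 10 inferences per thread is an assumption about how the 10 repetitions are distributed.

```python
import threading
import time


def measure_throughput(infer, n_threads, runs_per_thread):
    """Run `infer` from several threads and return (frames, seconds, fps)."""
    counts = [0] * n_threads  # one slot per thread, so no lock is needed

    def worker(idx):
        for _ in range(runs_per_thread):
            infer()
            counts[idx] += 1

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_threads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.perf_counter() - start
    frames = sum(counts)
    return frames, elapsed, frames / elapsed


def fake_infer():
    # Hypothetical stand-in for one inference call; it is NOT the VART API.
    time.sleep(0.001)


if __name__ == "__main__":
    for n in (1, 2, 4, 6):
        frames, elapsed, fps = measure_throughput(fake_infer, n, 10)
        print(f"{n} thread(s): {frames} frames in {elapsed:.4f} s -> {fps:.1f} fps")
```

In a real measurement the stand-in would be replaced by the actual inference call, and the figures reported by the profiler remain the reference, since they also account for DPU latency and core usage.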
=====One thread=====
In the figure below, the VART-based application uses 1 thread. The trace shows that the throughput is stable at around '''245.50''' fps. The throughput is similar to, but slightly lower than, the one achieved by the DNNDK-based application. This is probably because the VART APIs carry a slightly larger overhead.
[[File:Vaiprofiler 1 thread 10 runs.png|thumb|center|800px|Profiling VART based application, 1 thread only]]
As expected, only one of the two DPU cores is actually leveraged.
=====Two threads=====
In the figure below, the VART-based application uses 2 threads. The trace shows that the throughput is stable at around '''442.40''' fps.
[[File:Vaiprofiler 2 threads 10 runs.png|thumb|center|800px|Profiling VART based application, 2 threads]]
As expected, the profiling information indicates that both DPUs are used. To a first approximation, the throughput is doubled with respect to the single-thread application, consistent with the fact that the DPU cores work in parallel and the CPU cores are not saturated.
=====Four threads=====
In the figure below, the VART-based application uses 4 threads. The trace shows that the throughput is stable at around '''818.20''' fps.
[[File:Vaiprofiler 4 threads 10 runs.png|thumb|center|800px|Profiling VART based application, 4 threads]]
Interestingly, using four threads (i.e. as many threads as there are CPU cores) increases the throughput further by a factor of almost 2 while keeping the occupation of the DPU cores low.
=====Six threads=====
In the figure below, the VART-based application uses 6 threads. The trace shows that the throughput is stable at around '''830.30''' fps.
[[File:Vaiprofiler 6 threads 10 runs.png|thumb|center|800px|Profiling VART based application, 6 threads]]
| 20.84 %
|}
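The scaling behaviour reported in the subsections above can be checked with a few lines of arithmetic. The sketch below is only illustrative: it takes the quoted throughput figures, reads the one-thread value as 245.50 fps (an assumption), and computes the speedup over one thread, the per-thread efficiency, and the gain over the previous configuration. The function name <code>scaling_table</code> is hypothetical.

```python
# Throughput figures (fps) quoted in the profiling subsections.
# The one-thread value is read as 245.50 fps, which is an assumption.
throughput_fps = {1: 245.50, 2: 442.40, 4: 818.20, 6: 830.30}


def scaling_table(measurements):
    """Return rows of (threads, fps, speedup, efficiency, step_gain)."""
    base_threads = min(measurements)
    base = measurements[base_threads]
    rows = []
    prev = None
    for threads in sorted(measurements):
        fps = measurements[threads]
        speedup = fps / base                            # relative to the smallest run
        efficiency = speedup / (threads / base_threads)  # 1.0 means perfect scaling
        step_gain = fps / prev if prev is not None else 1.0
        rows.append((threads, fps, speedup, efficiency, step_gain))
        prev = fps
    return rows


if __name__ == "__main__":
    for threads, fps, speedup, eff, gain in scaling_table(throughput_fps):
        print(f"{threads} threads: {fps:7.2f} fps, speedup {speedup:.2f}x, "
              f"efficiency {eff:.2f}, gain over previous {gain:.2f}x")
```

With these figures, going from 1 to 2 threads gives a gain of about 1.80x, from 2 to 4 threads about 1.85x, and from 4 to 6 threads only about 1.01x, which matches the observation that adding threads beyond the number of CPU cores brings almost no benefit.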
 
 
==Results==
 
It is possible to notice that the latency of the DPU_0 is higher than the latency of the DPU_1.