|For more details, please refer to the following sections.
|}
 
The target was configured to leverage the hardware acceleration provided by the [https://www.xilinx.com/products/intellectual-property/dpu.html Xilinx Deep Learning Processor Unit (DPU)], an IP core instantiated in the Programmable Logic (PL), as depicted in the following block diagram.
[[File:Vaiprofiler 1 thread 10 runs.png|thumb|center|800px|Profiling VART based application, 1 thread only]]
 
{| class="wikitable" style="margin: auto;"
As expected, only one of the two DPU cores is actually leveraged.
=====Two threads=====
In the figure below, the VART-based application uses 2 threads. The trace shows that the throughput is stable at around '''442''' fps.
[[File:Vaiprofiler 2 threads 10 runs.png|thumb|center|800px|Profiling VART based application, 2 threads]]
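
The relationship between the number of threads and the measured throughput can be illustrated with a minimal Python sketch. It does not use the VART API: <code>fake_inference</code> is a hypothetical stand-in for a single inference call, and <code>measure_throughput</code> is an illustrative helper that is not part of the original application. The sketch only shows how a frames-per-second figure can be derived from the total number of processed frames and the elapsed wall-clock time.

```python
import threading
import time


def fake_inference(frame):
    """Hypothetical stand-in for one inference call (assumption: ~1 ms each)."""
    time.sleep(0.001)
    return frame


def measure_throughput(num_threads, frames_per_thread, infer=fake_inference):
    """Call `infer` from `num_threads` workers and return frames per second."""

    def worker():
        # Each worker processes its own share of frames sequentially.
        for i in range(frames_per_thread):
            infer(i)

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]

    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.perf_counter() - start

    # Throughput = total frames processed / wall-clock time.
    return (num_threads * frames_per_thread) / elapsed


if __name__ == "__main__":
    # Same thread counts as the profiling runs on this page.
    for n in (1, 2, 4, 6):
        fps = measure_throughput(n, frames_per_thread=50)
        print(f"{n} thread(s): {fps:.1f} fps")
```

With a real accelerator, the throughput is expected to stop scaling once all the available DPU cores are kept busy. The stand-in above has no such limit, so its numbers are not representative of the target.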
 
{| class="wikitable" style="margin: auto;"
[[File:Vaiprofiler 4 threads 10 runs.png|thumb|center|800px|Profiling VART based application, 4 threads]]
 
{| class="wikitable" style="margin: auto;"
[[File:Vaiprofiler 6 threads 10 runs.png|thumb|center|800px|Profiling VART based application, 6 threads]]
 
{| class="wikitable" style="margin: auto;"
|}
==Results==
 
The following table summarizes the throughput achieved for all the tests.
{| class="wikitable"
|+
!API
!Number of threads
!Throughput [fps]
|-
|DNNDK
|1
|
|-
| rowspan="4" |VART
|1
|
|-
|2
|
|-
|4
|
|-
|6
|
|}
Note that the latency of DPU_0 is higher than that of DPU_1.