Changes

ML-TN-001 - AI at the edge: comparison of different embedded platforms - Part 3

203 bytes added, 15:39, 14 October 2020

→‎Results

|For more details, please refer to the following sections.

|}

The target was configured to leverage the hardware acceleration provided by the [https://www.xilinx.com/products/intellectual-property/dpu.html Xilinx Deep Learning Processor Unit (DPU)], an IP instantiated in the Programmable Logic (PL), as depicted in the following block diagram.

[[File:Vaiprofiler 1 thread 10 runs.png|thumb|center|800px|Profiling VART based application, 1 thread only]]

{| class="wikitable" style="margin: auto;"

As expected, only one of the two DPU cores is actually leveraged.

=====Two threads=====

In the figure below, the VART-based application uses 2 threads. The trace shows that the throughput is stable, around '''442''' fps.

[[File:Vaiprofiler 2 threads 10 runs.png|thumb|center|800px|Profiling VART based application, 2 threads]]

{| class="wikitable" style="margin: auto;"

[[File:Vaiprofiler 4 threads 10 runs.png|thumb|center|800px|Profiling VART based application, 4 threads]]

{| class="wikitable" style="margin: auto;"

[[File:Vaiprofiler 6 threads 10 runs.png|thumb|center|800px|Profiling VART based application, 6 threads]]

{| class="wikitable" style="margin: auto;"

|}

==Results==

The following table summarizes the throughput achieved in all the tests.

{| class="wikitable"
|+
!API
!Number of threads
!Throughput [fps]
|-
|DNNDK
|1
|
|-
| rowspan="4" |VART
|1
|
|-
|2
|
|-
|4
|
|-
|6
|
|}
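As an illustration of how a throughput figure in frames per second can be obtained for a given number of threads, the following is a minimal sketch and not the code used for these tests. The names <code>measure_throughput</code> and <code>fake_infer</code> are hypothetical, and <code>fake_infer</code> is only a stand-in that sleeps instead of submitting a job to the DPU. The sketch assumes that every thread processes the same number of frames and that throughput is the total number of frames divided by the wall-clock time.

```python
import threading
import time


def measure_throughput(infer, num_threads, frames_per_thread):
    """Call `infer` frames_per_thread times on each thread and return overall fps."""
    # The extra party is the main thread, so all workers start together.
    barrier = threading.Barrier(num_threads + 1)

    def worker():
        barrier.wait()
        for _ in range(frames_per_thread):
            infer()

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()

    barrier.wait()  # release the workers
    start = time.perf_counter()
    for t in threads:
        t.join()
    elapsed = time.perf_counter() - start

    total_frames = num_threads * frames_per_thread
    return total_frames / elapsed


def fake_infer():
    # Hypothetical stand-in for one inference job (assumption: about 2 ms per frame).
    time.sleep(0.002)


if __name__ == "__main__":
    for n in (1, 2, 4, 6):
        fps = measure_throughput(fake_infer, n, 50)
        print(f"{n} thread(s): {fps:.0f} fps")
```

In a real application the stand-in would be replaced by the call that runs one frame through the accelerator. The numbers printed by this sketch depend only on the simulated delay and say nothing about the DPU.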

Note that the latency of DPU_0 is higher than that of DPU_1.

U0001

