ML-TN-001 - AI at the edge: comparison of different embedded platforms - Part 3

864 bytes added, 13:26, 5 January 2021

|For more details, please refer to the following sections.

|}

The target was configured in order to leverage the hardware acceleration provided by the [https://www.xilinx.com/products/intellectual-property/dpu.html Xilinx Deep Learning Processor Unit (DPU)], which is an IP instantiated in the Programmable Logic (PL) as depicted in the following block diagram.

==Building the application==

The starting point for the application is the model described [[ML-TN-001_-_AI_at_the_edge:_comparison_of_different_embedded_platforms_-_Part_1#Reference_application_.231:_fruit_classifier|here]]. Incidentally, the '''same''' model structure was used as the starting point for [[ML-TN-001_-_AI_at_the_edge:_comparison_of_different_embedded_platforms_-_Part_2|this other test]] as well(*). This makes the comparison of the two tests straightforward, even though they were run on SoCs that differ significantly from an architectural standpoint.

(*) The two models share the same structure but, as they are trained independently, their weights differ.

===Training the model===

Model training is performed with the help of the Docker container provided by Vitis AI.

In order to have reproducible and reliable results, some measures were taken:

* The inference was repeated several times and the average execution time was computed

* All the files required to run the test (the executable, the image files, etc.) are stored on a [https://www.jamescoyle.net/how-to/943-create-a-ram-disk-in-linux tmpfs RAM disk] in order to make the file system/storage medium overhead negligible.
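The first measure can be illustrated with a minimal C++ sketch. It is not the code used for the tests: <code>time_runs</code> and <code>mean_ms</code> are hypothetical helper names, and the <code>inference</code> callable is an assumed stand-in for whatever call actually runs the model.

```cpp
#include <chrono>
#include <cstddef>
#include <functional>
#include <numeric>
#include <stdexcept>
#include <vector>

// Runs `inference` `runs` times and returns the duration of each run in milliseconds.
// `inference` is a hypothetical callable standing in for the real inference call.
std::vector<double> time_runs(const std::function<void()>& inference, std::size_t runs) {
    using clock = std::chrono::steady_clock;  // monotonic clock, immune to wall-clock changes
    std::vector<double> samples_ms;
    samples_ms.reserve(runs);
    for (std::size_t i = 0; i < runs; ++i) {
        const auto start = clock::now();
        inference();
        const auto stop = clock::now();
        samples_ms.push_back(
            std::chrono::duration<double, std::milli>(stop - start).count());
    }
    return samples_ms;
}

// Arithmetic mean of the samples; throws if no samples were collected.
double mean_ms(const std::vector<double>& samples_ms) {
    if (samples_ms.empty()) {
        throw std::invalid_argument("no samples");
    }
    const double sum = std::accumulate(samples_ms.begin(), samples_ms.end(), 0.0);
    return sum / static_cast<double>(samples_ms.size());
}
```

A call such as <code>mean_ms(time_runs(run_inference, 100))</code>, where <code>run_inference</code> is again a placeholder, would return the average time per inference in milliseconds. Keeping the individual samples, rather than only a running sum, also makes it possible to inspect the spread between runs.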

Two new C++ applications were developed for the trained, optimized, and compiled neural network model as illustrated in the steps above:

</pre>

Within the scope of this TN, the most relevant time is ''[DPU tot time]'', which indicates the time spent executing the inference (about 3.7 ms). This corresponds to a throughput of about 271 fps.
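As a rough check, and assuming inferences run back to back on a single thread, the throughput is the reciprocal of the time per inference. With the rounded value of 3.7 ms:

<math>\text{throughput} \approx \frac{1000\ \text{ms/s}}{3.7\ \text{ms}} \approx 270\ \text{fps}</math>

The quoted figure of about 271 fps is consistent with a slightly shorter unrounded time (1000 / 271 ≈ 3.69 ms).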

====Fine-grained profiling using DNNDK low-level API====

[[File:Vaiprofiler 1 thread 10 runs.png|thumb|center|800px|Profiling VART-based application, 1 thread only]]