* All the files required to run the test (the executable, the image files, etc.) are stored on a tmpfs RAM disk, so that the file system/storage medium overhead is negligible.
Two new C++ applications were developed for the neural network model trained, optimized, and compiled as illustrated in the steps above. The first application uses the legacy DNNDK low-level APIs to load the DPU kernel, create the DPU task, and prepare the input/output tensors for inference. Depending on the DPU mode chosen when compiling the kernel (normal or profile), two profiling strategies are available: coarse-grained profiling, which shows the execution time of the main tasks executed on the CPU and on the DPU, and fine-grained profiling, which shows detailed information about every node of the model, such as its workload, memory occupation, and runtime. The second application, instead, is a multi-threaded application that uses the VART high-level APIs to retrieve the computational subgraph from the DPU kernel and to perform the inference. In this case, the entire workload can be split across multiple concurrent threads, assigning a batch of images to each one. Both applications use the OpenCV library to crop and resize the input images so that they match the model's input tensor shape, and to display the inference results (i.e. the probability of each class) for each image.
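The following sketches give a rough idea of the structure of the two applications. They are not the actual source code of the applications described above: the kernel, node, and file names used here (<code>miniResNet</code>, <code>input_node</code>, <code>output_node</code>, <code>test.jpg</code>, <code>model.xmodel</code>) are hypothetical placeholders, the input size is an assumption, and pre/post-processing is reduced to a minimum. The first sketch follows the DNNDK low-level flow (load the kernel, create the task, run the inference); passing <code>T_MODE_PROF</code> instead of <code>T_MODE_NORMAL</code> to <code>dpuCreateTask</code> is what enables the fine-grained profiling, assuming the kernel was compiled in profile mode.

<syntaxhighlight lang="cpp">
// Minimal DNNDK low-level sketch (hypothetical names, single image).
#include <dnndk/dnndk.h>
#include <opencv2/opencv.hpp>
#include <iostream>
#include <vector>

int main() {
    dpuOpen();                                            // attach to the DPU driver
    DPUKernel* kernel = dpuLoadKernel("miniResNet");      // load the compiled DPU kernel
    DPUTask* task = dpuCreateTask(kernel, T_MODE_NORMAL); // T_MODE_PROF for fine-grained profiling

    // Crop/resize the input with OpenCV to match the model's input tensor shape.
    cv::Mat img = cv::imread("test.jpg");
    cv::resize(img, img, cv::Size(224, 224));             // input shape is an assumption
    dpuSetInputImage2(task, "input_node", img);

    dpuRunTask(task);                                     // run the inference on the DPU

    // Read back the output tensor: one score per class (softmax omitted here).
    int ch = dpuGetOutputTensorChannel(task, "output_node");
    std::vector<float> scores(ch);
    dpuGetOutputTensorInHWCFP32(task, "output_node", scores.data(), ch);
    for (int i = 0; i < ch; ++i)
        std::cout << "class " << i << ": " << scores[i] << "\n";

    dpuDestroyTask(task);
    dpuDestroyKernel(kernel);
    dpuClose();
    return 0;
}
</syntaxhighlight>

The second sketch outlines the multi-threaded VART flow: the DPU subgraph is retrieved from the compiled <code>.xmodel</code> file and each thread creates its own runner for its batch of images. The preparation of the input/output <code>TensorBuffer</code> objects is omitted for brevity.

<syntaxhighlight lang="cpp">
// Minimal multi-threaded VART sketch (TensorBuffer setup omitted).
#include <vart/runner.hpp>
#include <xir/graph/graph.hpp>
#include <string>
#include <thread>
#include <vector>

// Collect the DPU subgraphs of the compiled model.
static std::vector<const xir::Subgraph*> get_dpu_subgraphs(xir::Graph* graph) {
    std::vector<const xir::Subgraph*> dpu;
    for (auto* s : graph->get_root_subgraph()->children_topological_sort()) {
        if (s->has_attr("device") && s->get_attr<std::string>("device") == "DPU")
            dpu.push_back(s);
    }
    return dpu;
}

int main() {
    auto graph = xir::Graph::deserialize("model.xmodel"); // file name is hypothetical
    auto subgraphs = get_dpu_subgraphs(graph.get());

    const int num_threads = 4;                            // arbitrary choice
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&subgraphs, t]() {
            (void)t; // index of this thread's image batch
            // Each thread owns its runner and processes its own batch of images.
            auto runner = vart::Runner::create_runner(subgraphs[0], "run");
            // ... fill the input TensorBuffers with this thread's images, then:
            // auto job = runner->execute_async(inputs, outputs);
            // runner->wait(job.first, -1);
        });
    }
    for (auto& w : workers) w.join();
    return 0;
}
</syntaxhighlight>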
Before illustrating the inference results obtained by running the C++ applications, it is interesting to check some information about the DPU and the DPU kernel ELF file. This can be done with the DExplorer and DDump tools.
===DExplorer===