=== Version 2 ===
The version 1 application was then modified to accelerate the inference using the NPU (ML module) of the i.MX8M Plus SoC. This is possible because "the TensorFlow Lite library uses the Android NN API driver implementation from the GPU/ML module driver for running inference using the GPU/ML module".
Neither the floating-point nor the half-quantized model works on the NPU (ML module). Moreover, "the GPU/ML module driver does not support per-channel quantization yet. Therefore post-training quantization of models with TensorFlow v2 cannot be used if the model is supposed to run on the GPU/ML module (inference on CPU does not have this limitation). TensorFlow v1 quantization-aware training and model conversion is recommended in this case".
So, only the fully-quantized model was tested with the version 2 application.
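For reference, delegating a TensorFlow Lite graph to the NN API from C++ looks roughly like the following sketch. This is not the actual application code: the model filename is hypothetical and error handling is reduced to a minimum.
<syntaxhighlight lang="cpp">
#include <memory>
#include "tensorflow/lite/interpreter.h"
#include "tensorflow/lite/kernels/register.h"
#include "tensorflow/lite/model.h"
#include "tensorflow/lite/delegates/nnapi/nnapi_delegate.h"

int main() {
  // Load the fully-quantized model (the filename is just an example).
  auto model = tflite::FlatBufferModel::BuildFromFile("mobilenet_quant.tflite");
  tflite::ops::builtin::BuiltinOpResolver resolver;
  std::unique_ptr<tflite::Interpreter> interpreter;
  tflite::InterpreterBuilder(*model, resolver)(&interpreter);

  // Route the supported operations to the NN API; on the i.MX8M Plus the
  // NN API driver dispatches them to the GPU/ML module.
  tflite::StatefulNnApiDelegate delegate;
  if (interpreter->ModifyGraphWithDelegate(&delegate) != kTfLiteOk)
    return 1;  // the graph could not be delegated

  interpreter->AllocateTensors();
  // ... fill interpreter->typed_input_tensor<uint8_t>(0) with image data ...
  interpreter->Invoke();
  return 0;
}
</syntaxhighlight>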
=== Version 3 ===
A new C++ application was written to apply the inference to the frames captured from an image sensor ([https://cdn.sparkfun.com/datasheets/Sensors/LightImaging/OV5640_datasheet.pdf OV5640]) instead of images retrieved from files. Like version 2, inference runs on the NPU, so only the fully-quantized model was tested with the version 3 application. Note that in this case, the frame rate is capped.
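The article does not detail how the frames are captured; the following sketch assumes OpenCV's <code>VideoCapture</code> on top of V4L2 (the device index, capture resolution, and the 224x224 RGB model input are all assumptions) just to illustrate the capture-and-infer loop.
<syntaxhighlight lang="cpp">
#include <cstring>
#include <opencv2/opencv.hpp>
#include "tensorflow/lite/interpreter.h"

// Grab frames from the camera and run each one through the interpreter,
// which is assumed to be built and delegated as in the Version 2 sketch.
void RunOnCamera(tflite::Interpreter* interpreter) {
  cv::VideoCapture cap(0, cv::CAP_V4L2);  // OV5640 assumed at /dev/video0
  cv::Mat frame, input;
  while (cap.read(frame)) {
    // Adapt the frame to the model input geometry (224x224 RGB assumed).
    cv::resize(frame, input, cv::Size(224, 224));
    cv::cvtColor(input, input, cv::COLOR_BGR2RGB);
    std::memcpy(interpreter->typed_input_tensor<uint8_t>(0),
                input.data, input.total() * input.elemSize());
    interpreter->Invoke();
    // ... read interpreter->typed_output_tensor<uint8_t>(0) and display ...
  }
}
</syntaxhighlight>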
== Running the applications ==
* All the files required to run the test (the executable, the image files, etc.) are stored on a [https://www.jamescoyle.net/how-to/943-create-a-ram-disk-in-linux tmpfs RAM disk] in order to make the file system/storage medium overhead negligible.
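For instance, a RAM disk can be created as follows (mount point and size are arbitrary):
<pre class="board-terminal">
# mkdir -p /mnt/ramdisk
# mount -t tmpfs -o size=512m tmpfs /mnt/ramdisk
</pre>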
=== <big>Version 1 </big> ===
The following sections detail the execution of the first version of the classifier on the embedded platform. The number of threads was also tweaked in order to test different configurations. During the execution, the well-known <code>[https://en.wikipedia.org/wiki/Htop htop]</code> utility was used to monitor the system. This tool is very convenient for getting useful information such as core allocation, processor load, and the number of running threads.
=== Floating-point model ===
<pre class="board-terminal">
TBD
</pre>
==== Tweaking the number of threads ====
The following screenshots show the system status while executing the application with different values of the thread parameter.

[IMAGE "Thread parameter unspecified"]

[IMAGE "Thread parameter set to 1"]

[IMAGE "Thread parameter set to 2"]
=== Half-quantized model ===
<pre class="board-terminal">
TBD
</pre>
The following screenshot shows the system status while executing the application. In this case, the thread parameter was unspecified.

[IMAGE "Thread parameter unspecified"]
=== Fully-quantized model ===
<pre class="board-terminal">
TBD
</pre>
==== Tweaking the number of threads ====
The following screenshots show the system status while executing the application with different values of the thread parameter.
 
[IMAGE "Thread parameter unspecified"]
 
[IMAGE "Thread parameter set to 4"]
=== <big>Version 2 </big> ===
"The first execution of model inference using the NN API always takes many times longer, because of model graph initialization needed by the GPU/ML module"
=== <big>Version 3 </big> ===
== Results ==