Specifically, the following versions of the application were tested:
* Version 1: This version is the same described in [[ML-TN-001 - AI at the edge: comparison of different embedded platforms - Part 2|this article]]. As such, inference is implemented in software and is applied to images retrieved from files.
* Version 2: This version is functionally equivalent to version 1, but it leverages the Neural Processing Unit (NPU) to hardware-accelerate the inference.
* Version 3: This is like version 2, but the inference is applied to the frames captured live from an image sensor.
=== Testbed ===
The kernel and the root file system of the tested platform were built with the L5.4.24_2.1.0 release of the Yocto Board Support Package (BSP) for the i.MX 8 family of devices. They were built with support for [https://www.nxp.com/design/software/development-software/eiq-ml-development-environment:EIQ eIQ]: "a collection of software and development tools for NXP microprocessors and microcontrollers to do inference of neural network models on embedded systems".
The following table details the relevant specs of the testbed.
{| class="wikitable" style="margin: auto;"
== Model deployment and inference applications ==
=== Version 1 ===
The C++ application previously used and described [https://wiki.dave.eu/index.php/ML-TN-001_-_AI_at_the_edge:_comparison_of_different_embedded_platforms_-_Part_2#Model_deployment_and_inference_application here] was adapted to work with the new NXP Linux BSP release. It now uses OpenCV 4.2.0 to pre-process the input image and TensorFlow Lite (TFL) 2.1 as the inference engine. It still supports all three TFL models previously tested on the [https://wiki.dave.eu/index.php?title=Category:Mito8M&action=edit&redlink=1 Mito8M SoM] (see the sketch after the list):
* 32-bit floating-point model;
* half-quantized model (post-training 8-bit quantization of the weights only);
* fully-quantized model (TensorFlow v1 quantization-aware training and 8-bit quantization of the weights and activations).
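The core processing pipeline of this application can be summarized with the following minimal sketch, which assumes the floating-point model, a 224x224 RGB input tensor, and the file names used later in this article; the real application also parses command-line parameters and handles the quantized models:

<pre>
#include <cstring>
#include <memory>
#include <opencv2/opencv.hpp>
#include "tensorflow/lite/interpreter.h"
#include "tensorflow/lite/kernels/register.h"
#include "tensorflow/lite/model.h"

int main() {
    // Load the TFL model and build the interpreter
    auto model = tflite::FlatBufferModel::BuildFromFile("my_converted_model.tflite");
    tflite::ops::builtin::BuiltinOpResolver resolver;
    std::unique_ptr<tflite::Interpreter> interpreter;
    tflite::InterpreterBuilder(*model, resolver)(&interpreter);
    interpreter->SetNumThreads(2);   // the thread count is a command-line parameter in the real application
    interpreter->AllocateTensors();

    // Pre-process the input image with OpenCV: color conversion, resize, normalization
    cv::Mat img = cv::imread("testdata/red-apple1.jpg");
    cv::cvtColor(img, img, cv::COLOR_BGR2RGB);
    cv::resize(img, img, cv::Size(224, 224));
    img.convertTo(img, CV_32FC3, 1.0 / 255.0);

    // Copy the pixels into the input tensor and run the inference
    std::memcpy(interpreter->typed_input_tensor<float>(0), img.data,
                img.total() * img.elemSize());
    interpreter->Invoke();

    // Retrieve the classification scores (one per label)
    const float* scores = interpreter->typed_output_tensor<float>(0);
    (void)scores;   // sort and print the top-ranked labels here
    return 0;
}
</pre>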
=== Version 2 ===
The version 1 application was then modified to accelerate the inference using the NPU (ML module) of the [https://www.nxp.com/products/processors-and-microcontrollers/arm-processors/i-mx-applications-processors/i-mx-8-processors/i-mx-8m-plus-arm-cortex-a53-machine-learning-vision-multimedia-and-industrial-iot:IMX8MPLUS i.MX 8M Plus] SoC. This is possible because "''the TensorFlow Lite library uses the Android NN API driver implementation from the GPU/ML module driver for running inference using the GPU/ML module''".
Neither the floating-point nor the half-quantized models work with the NPU, however. Moreover, "''the GPU/ML module driver does not support per-channel quantization yet. Therefore post-training quantization of models with TensorFlow v2 cannot be used if the model is supposed to run on the GPU/ML module (inference on CPU does not have this limitation). TensorFlow v1 quantization-aware training and model conversion is recommended in this case''".
Therefore, only the fully-quantized model was tested with the version 2 application.
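As a rough illustration of how this delegation can be requested through the TFL 2.1 C++ API, the application can attach the NNAPI delegate to the interpreter, so that the delegated operations are executed through the GPU/ML module driver. The snippet below is only a sketch of this step under that assumption, not the exact code of the application:

<pre>
#include "tensorflow/lite/delegates/nnapi/nnapi_delegate.h"

// ... build the interpreter as in version 1, then delegate the graph
// to the Android NN API, which maps it onto the GPU/ML module driver
// of the i.MX 8M Plus; unsupported operations fall back to the CPU.
tflite::StatefulNnApiDelegate nnapi_delegate;
if (interpreter->ModifyGraphWithDelegate(&nnapi_delegate) != kTfLiteOk) {
    // handle the failure, e.g. by falling back to CPU-only execution
}
interpreter->AllocateTensors();
interpreter->Invoke();
</pre>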
=== Version 3 ===
A new C++ application was written to apply the inference to the frames captured from the image sensor ([https://cdn.sparkfun.com/datasheets/Sensors/LightImaging/OV5640_datasheet.pdf OV5640]) of a [https://www.nxp.com/part/MINISASTOCSI#/ camera module], instead of images retrieved from files. This version uses OpenCV 4.2.0 to control the camera and to pre-process the frames. Like version 2, inference runs on the NPU, so only the fully-quantized model was tested with the version 3 application.
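A simplified capture loop might look like the following sketch, assuming the camera is exposed as a standard V4L2 device that OpenCV can open directly (the device index and resolution are arbitrary examples); pre-processing and inference are the same as in version 2:

<pre>
#include <opencv2/opencv.hpp>

// Open the OV5640 camera; the device index depends on the actual setup
cv::VideoCapture cap(0);
cap.set(cv::CAP_PROP_FRAME_WIDTH, 640);
cap.set(cv::CAP_PROP_FRAME_HEIGHT, 480);

cv::Mat frame;
while (cap.read(frame)) {
    // Pre-process the frame (color conversion, resize, quantization)
    // and feed it to the NPU-accelerated interpreter as in version 2
}
</pre>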
== Running the applications ==
* All the files required to run the test—the executable, the image files, etc.—are stored on a [https://www.jamescoyle.net/how-to/943-create-a-ram-disk-in-linux tmpfs RAM disk] in order to make the file system/storage medium overhead negligible (see the example below).
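For example, such a RAM disk can be set up and populated as follows; the mount point matches the one visible in the command prompts below, while the size and the copied directory are arbitrary illustrations:

<pre class="board-terminal">
root@imx8mpevk:~# mkdir -p /mnt/ramdisk
root@imx8mpevk:~# mount -t tmpfs -o size=256M tmpfs /mnt/ramdisk
root@imx8mpevk:~# cp -r image_classifier_eIQ_plus /mnt/ramdisk/
</pre>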
=== <big>Version 1</big> ===
The following sections detail the execution of the first version of the classifier on the embedded platform. The number of threads was also tweaked in order to test different configurations. During the execution, the well-known <code>[https://en.wikipedia.org/wiki/Htop htop]</code> utility was used to monitor the system. This tool is very convenient for retrieving useful information such as core allocation, processor load, and the number of running threads.
==== <big>Floating-point model</big> ====
<pre class="board-terminal">
root@imx8mpevk:/mnt/ramdisk/image_classifier_eIQ_plus# build/image_classifier_cv 2 my_converted_model.tflite labels.txt testdata/red-apple1.jpg
</pre>
[[File:ML-TN-001 4 float 2threads.png|thumb|center|600px|Thread parameter set to 2]]
==== <big>Half-quantized model</big> ====
<pre class="board-terminal">
root@imx8mpevk:/mnt/ramdisk/image_classifier_eIQ_plus# build/image_classifier_cv 2 my_fruits_model_1.12_quant.tflite labels.txt testdata/red-apple1.jpg
2.47029e-18 Hand
</pre>
 
 
The following screenshot shows the system status while executing the application. In this case, the thread parameter was unspecified.
 
[[File:ML-TN-001 4 weightsquant default.png|thumb|center|600px|Thread parameter unspecified]]