Specifically, the following versions of the application were tested:
* Version 1: This version is the same described in [[ML-TN-001 - AI at the edge: comparison of different embedded platforms - Part 2|this article]]. As such, inference is implemented in software and is applied to images retrieved from files.
* Version 2: This version is functionally equivalent to version 1, but it leverages the Neural Processing Unit (NPU) to hardware-accelerate the inference.
* Version 3: This is like version 2, but the inference is applied to the frames captured live from an image sensor.
=== Testbed ===
The kernel and the root file system of the tested platform were built with the L5.4.24_2.1.0 release of the Yocto Board Support Package (BSP) for the i.MX 8 family of devices. They were built with support for [https://www.nxp.com/design/software/development-software/eiq-ml-development-environment:EIQ eIQ]: "a collection of software and development tools for NXP microprocessors and microcontrollers to do inference of neural network models on embedded systems".
The following table details the relevant specs of the testbed.
{| class="wikitable" style="margin: auto;"
== Model deployment and inference applications ==
=== Version 1 ===
The C++ application previously used and described [https://wiki.dave.eu/index.php/ML-TN-001_-_AI_at_the_edge:_comparison_of_different_embedded_platforms_-_Part_2#Model_deployment_and_inference_application here] was adapted to work with the new NXP Linux BSP release. It now uses OpenCV 4.2.0 to pre-process the input image and TensorFlow Lite (TFL) 2.1 as the inference engine. It still supports all three TFL models previously tested on the [https://wiki.dave.eu/index.php?title=Category:Mito8M&action=edit&redlink=1 Mito8M SoM] (see the sketch after the list):
* 32-bit floating-point model;
* half-quantized model (post-training 8-bit quantization of the weights only);
* fully-quantized model (TensorFlow v1 quantization-aware training and 8-bit quantization of the weights and activations).
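The core processing pipeline of this application can be summarized with the following minimal sketch, which assumes the floating-point model, a 224x224 RGB input tensor, and the file names used later in this article; the real application also parses command-line parameters and handles the quantized models:

<pre>
#include <cstring>
#include <memory>
#include <opencv2/opencv.hpp>
#include "tensorflow/lite/interpreter.h"
#include "tensorflow/lite/kernels/register.h"
#include "tensorflow/lite/model.h"

int main() {
    // Load the TFL model and build the interpreter
    auto model = tflite::FlatBufferModel::BuildFromFile("my_converted_model.tflite");
    tflite::ops::builtin::BuiltinOpResolver resolver;
    std::unique_ptr<tflite::Interpreter> interpreter;
    tflite::InterpreterBuilder(*model, resolver)(&interpreter);
    interpreter->SetNumThreads(2);   // the thread count is a command-line parameter in the real application
    interpreter->AllocateTensors();

    // Pre-process the input image with OpenCV: color conversion, resize, normalization
    cv::Mat img = cv::imread("testdata/red-apple1.jpg");
    cv::cvtColor(img, img, cv::COLOR_BGR2RGB);
    cv::resize(img, img, cv::Size(224, 224));
    img.convertTo(img, CV_32FC3, 1.0 / 255.0);

    // Copy the pixels into the input tensor and run the inference
    std::memcpy(interpreter->typed_input_tensor<float>(0), img.data,
                img.total() * img.elemSize());
    interpreter->Invoke();

    // Retrieve the classification scores (one per label)
    const float* scores = interpreter->typed_output_tensor<float>(0);
    (void)scores;   // sort and print the top-ranked labels here
    return 0;
}
</pre>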
=== Version 2 ===
The version 1 application was then modified to accelerate the inference using the NPU (ML module) of the [https://www.nxp.com/products/processors-and-microcontrollers/arm-processors/i-mx-applications-processors/i-mx-8-processors/i-mx-8m-plus-arm-cortex-a53-machine-learning-vision-multimedia-and-industrial-iot:IMX8MPLUS i.MX 8M Plus] SoC. This is possible because "''the TensorFlow Lite library uses the Android NN API driver implementation from the GPU/ML module driver for running inference using the GPU/ML module''".
Neither the floating-point nor the half-quantized models work with the NPU, however. Moreover, "''the GPU/ML module driver does not support per-channel quantization yet. Therefore post-training quantization of models with TensorFlow v2 cannot be used if the model is supposed to run on the GPU/ML module (inference on CPU does not have this limitation). TensorFlow v1 quantization-aware training and model conversion is recommended in this case''".
Therefore, only the fully-quantized model was tested with the version 2 application.
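As a rough illustration of how this delegation can be requested through the TFL 2.1 C++ API, the application can attach the NNAPI delegate to the interpreter, so that the delegated operations are executed through the GPU/ML module driver. The snippet below is only a sketch of this step under that assumption, not the exact code of the application:

<pre>
#include "tensorflow/lite/delegates/nnapi/nnapi_delegate.h"

// ... build the interpreter as in version 1, then delegate the graph
// to the Android NN API, which maps it onto the GPU/ML module driver
// of the i.MX 8M Plus; unsupported operations fall back to the CPU.
tflite::StatefulNnApiDelegate nnapi_delegate;
if (interpreter->ModifyGraphWithDelegate(&nnapi_delegate) != kTfLiteOk) {
    // handle the failure, e.g. by falling back to CPU-only execution
}
interpreter->AllocateTensors();
interpreter->Invoke();
</pre>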
=== Version 3 ===
A new C++ application was written to apply the inference to the frames captured from the image sensor ([https://cdn.sparkfun.com/datasheets/Sensors/LightImaging/OV5640_datasheet.pdf OV5640]) of a [https://www.nxp.com/part/MINISASTOCSI#/ camera module], instead of images retrieved from files. This version uses OpenCV 4.2.0 to control the camera and to pre-process the frames. Like version 2, inference runs on the NPU, so only the fully-quantized model was tested with the version 3 application.
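A simplified capture loop might look like the following sketch, assuming the camera is exposed as a standard V4L2 device that OpenCV can open directly (the device index and resolution are arbitrary examples); pre-processing and inference are the same as in version 2:

<pre>
#include <opencv2/opencv.hpp>

// Open the OV5640 camera; the device index depends on the actual setup
cv::VideoCapture cap(0);
cap.set(cv::CAP_PROP_FRAME_WIDTH, 640);
cap.set(cv::CAP_PROP_FRAME_HEIGHT, 480);

cv::Mat frame;
while (cap.read(frame)) {
    // Pre-process the frame (color conversion, resize, quantization)
    // and feed it to the NPU-accelerated interpreter as in version 2
}
</pre>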
== Running the applications ==
* All the files required to run the test—the executable, the image files, etc.—are stored on a [https://www.jamescoyle.net/how-to/943-create-a-ram-disk-in-linux tmpfs RAM disk] in order to make the file system/storage medium overhead negligible (see the example below).
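For example, such a RAM disk can be set up and populated as follows; the mount point matches the one visible in the command prompts below, while the size and the copied directory are arbitrary illustrations:

<pre class="board-terminal">
root@imx8mpevk:~# mkdir -p /mnt/ramdisk
root@imx8mpevk:~# mount -t tmpfs -o size=256M tmpfs /mnt/ramdisk
root@imx8mpevk:~# cp -r image_classifier_eIQ_plus /mnt/ramdisk/
</pre>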
=== <big>Version 1</big> ===
The following sections detail the execution of the first version of the classifier on the embedded platform. The number of threads was also tweaked in order to test different configurations. During the execution, the well-known <code>[https://en.wikipedia.org/wiki/Htop htop]</code> utility was used to monitor the system. This tool is very convenient for retrieving useful information such as core allocation, processor load, and the number of running threads.
==== <big>Floating-point model</big> ====
<pre class="board-terminal">
root@imx8mpevk:/mnt/ramdisk/image_classifier_eIQ_plus# build/image_classifier_cv 2 my_converted_model.tflite labels.txt testdata/red-apple1.jpg
</pre>
[[File:ML-TN-001 4 float 2threads.png|thumb|center|600px|Thread parameter set to 2]]
==== <big>Half-quantized model</big> ====
<pre class="board-terminal">
root@imx8mpevk:/mnt/ramdisk/image_classifier_eIQ_plus# build/image_classifier_cv 2 my_fruits_model_1.12_quant.tflite labels.txt testdata/red-apple1.jpg
2.47029e-18 Hand
</pre>
 
 
The following screenshot shows the system status while executing the application. In this case, the thread parameter was unspecified.
 
[[File:ML-TN-001 4 weightsquant default.png|thumb|center|600px|Thread parameter unspecified]]