
ML-TN-001 - AI at the edge: comparison of different embedded platforms - Part 2

7,980 bytes added, 13:25, 5 January 2021



__FORCETOC__

|-

|1.0.0

|October 2020

|First public release

|}

==Introduction==

This Technical Note (TN for short) belongs to the series introduced [[ML-TN-001_-_AI_at_the_edge:_comparison_of_different_embedded_platforms_-_Part_1|here]].

Specifically, it illustrates the execution of [[ML-TN-001_-_AI_at_the_edge:_comparison_of_different_embedded_platforms_-_Part_1#Reference_application_.231:_fruit_classifier|this inference application (fruit classifier)]] on the [[:Category:Mito8M|Mito8M SoM]], a system-on-module based on the NXP [https://www.nxp.com/products/processors-and-microcontrollers/arm-processors/i-mx-applications-processors/i-mx-8-processors/i-mx-8m-family-armcortex-a53-cortex-m4-audio-voice-video:i.MX8M i.MX8M SoC].

==Test bed==

The kernel and the root file system of the tested platform were built with the L4.14.98_2.0.0 release of the Yocto Board Support Package for the i.MX 8 family of devices, with support for [https://www.nxp.com/design/software/development-software/eiq-ml-development-environment:EIQ eIQ]: "a collection of software and development tools for NXP microprocessors and microcontrollers to do inference of neural network models on embedded systems".

The following table details the relevant specs of the test bed.

{| class="wikitable" style="margin: auto;"
|-
|'''NXP Linux BSP release'''
|L4.14.98_2.0.0
|-
|'''Inference engine'''
|TensorFlow Lite 1.12
|-
|'''Maximum ARM core frequency [MHz]'''
|1300
|-
|'''SDRAM memory frequency (LPDDR4) [MHz]'''
|1600
|-
|'''[https://www.kernel.org/doc/Documentation/cpu-freq/governors.txt Governor]'''
|ondemand
|}

==Model deployment and inference application==

To run the model on the target, a new C++ application was written. The application was first debugged on a host PC and then migrated to the edge device, where it was built natively; the root file system for eIQ, in fact, also provides the native C++ compiler. The application uses OpenCV 4.0.1 to pre-process the input image and TensorFlow Lite (TFL) 1.12 as the inference engine. The model, originally created and trained with Keras of TensorFlow (TF) 1.15, was therefore converted into the TFL format. Then, the same model was recreated and retrained with Keras of TF 1.12. This made it possible to convert it into TFL with post-training quantization of the weights, without compatibility issues with the target inference engine version. After that, the model was also recreated and retrained with the quantization-aware training of TF 1.15, so that a fully quantized model was obtained after conversion. In the end, three converted models were obtained: a regular 32-bit floating-point one, an 8-bit half-quantized one (only the weights are quantized, not the activations), and a fully quantized one.
The following images show the graphs of the models before conversion (click to enlarge):

{| class="wikitable" style="margin: auto;"
|+
!Originally created model (Keras of TF 1.15)
!Recreated model (Keras of TF 1.12)
!Quantization-aware trained model (TF 1.15)
|-
|[[File:ML - Keras1.15 fruitsmodel.png|none|thumb|1000x1000px]]
|[[File:ML - Keras1.12 fruitsmodel.png|none|thumb|1000x1000px]]
|[[File:ML - TF1.15QAT fruitsmodel.png|none|thumb|1000x1000px]]
|}

The following images show the graphs of the models after conversion (click to enlarge):

{| class="wikitable" style="margin: auto;"
|+
!Floating-point model (TFL)
!Half-quantized model (TFL)
!Fully quantized model (TFL)
|-
|[[File:ML - TFL float fruitsmodel.png|none|thumb|1000x1000px]]
|[[File:ML - TFL halfquant fruitsmodel.png|none|thumb|1000x1000px]]
|[[File:ML - TFL QAT fruitsmodel.png|none|thumb|1000x1000px]]
|}

==Running the application==

In order to have reproducible and reliable results, some measures were taken:

* The inference was repeated several times and the average execution time was computed.

* All the files required to run the test (the executable, the image files, etc.) are stored on a tmpfs RAM disk in order to make the file system/storage medium overhead negligible.

The following sections detail the execution of the classifier on the embedded platform. The [https://www.tensorflow.org/lite/performance/best_practices#tweak_the_number_of_threads number of threads] was also tweaked in order to test different configurations. During the execution, the well-known [https://en.wikipedia.org/wiki/Htop <code>htop</code>] utility was used to monitor the system. This tool conveniently shows useful information such as core allocation, processor load, and number of running threads.

=== Floating-point model ===

<pre>
root@imx8qmmek:~/devel/image_classifier_eIQ# ./image_classifier_cv 2 my_converted_model.tflite labels.txt testdata/red-apple1.jpg
Number of threads: undefined
Warmup time: 233.403 ms
Original image size: 600x600x3
Cropped image size: 600x600x3
Resized image size: 224x224x3
Input tensor index: 1
Input tensor name: conv2d_8_input
Selected order of channels: RGB
Selected pixel values range: 0-1
Filling time: 1.06354 ms
Inference time 1: 219.723 ms
Inference time 2: 220.512 ms
Inference time 3: 221.897 ms
Average inference time: 220.711 ms
Total prediction time: 221.774 ms
Output tensor index: 0
Output tensor name: Identity
</pre>

DAVE Developer's Wiki β
