{| class="wikitable"
! Version !! Date !! Notes
|-
|1.1.0
|November 2020
|Added application written in Python (version 2B)
|}
== Model deployment and inference applications ==
=== Version 1 (C++) ===
The C++ application previously used and described [https://wiki.dave.eu/index.php/ML-TN-001_-_AI_at_the_edge:_comparison_of_different_embedded_platforms_-_Part_2#Model_deployment_and_inference_application here] was adapted to work with the new NXP Linux BSP release. It now uses OpenCV 4.2.0 to pre-process the input image and TensorFlow Lite (TFL) 2.1 as the inference engine. It still supports all three TFL models previously tested on the [[:Category:Mito8M|Mito8M SoM]]:
* 32-bit floating-point model
* half-quantized model (8-bit quantization of the weights only)
* fully-quantized model (TensorFlow v1 quantization-aware training and 8-bit quantization of the weights and activations).
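For reference, the following is a minimal sketch of how an application of this kind typically loads a TFL model, pre-processes an image with OpenCV, and runs inference with the TFL C++ API. File names, input size, and normalization are illustrative assumptions and do not reproduce the actual application code.
<pre>
#include <cstdio>
#include <memory>
#include <opencv2/opencv.hpp>
#include "tensorflow/lite/interpreter.h"
#include "tensorflow/lite/kernels/register.h"
#include "tensorflow/lite/model.h"

int main() {
    // Load the model from file (path is an example).
    auto model = tflite::FlatBufferModel::BuildFromFile("mobilenet_v1.tflite");

    // Build the interpreter with the built-in operator resolver.
    tflite::ops::builtin::BuiltinOpResolver resolver;
    std::unique_ptr<tflite::Interpreter> interpreter;
    tflite::InterpreterBuilder(*model, resolver)(&interpreter);
    interpreter->AllocateTensors();

    // Pre-process the input image with OpenCV: BGR->RGB and resize
    // to the model input size (224x224 is an assumption).
    cv::Mat img = cv::imread("input.jpg");
    cv::cvtColor(img, img, cv::COLOR_BGR2RGB);
    cv::resize(img, img, cv::Size(224, 224));

    // Copy pixels into the input tensor. A floating-point model is
    // assumed here; quantized models use a uint8 input tensor instead.
    float* input = interpreter->typed_input_tensor<float>(0);
    for (size_t i = 0; i < img.total() * img.channels(); ++i)
        input[i] = img.data[i] / 255.0f;

    // Run inference and read back the output scores.
    interpreter->Invoke();
    float* scores = interpreter->typed_output_tensor<float>(0);
    std::printf("score[0] = %f\n", scores[0]);
    return 0;
}
</pre>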
=== Version 2A (C++) ===
The version 1 application was then modified to accelerate the inference using the NPU (ML module) of the [https://www.nxp.com/products/processors-and-microcontrollers/arm-processors/i-mx-applications-processors/i-mx-8-processors/i-mx-8m-plus-arm-cortex-a53-machine-learning-vision-multimedia-and-industrial-iot:IMX8MPLUS i.MX8M Plus] SoC. This is possible because ''the TensorFlow Lite library uses the Android NN API driver implementation from the GPU/ML module driver for running inference using the GPU/ML module''.
However, neither the floating-point nor the half-quantized model works with the NPU. Moreover, ''the GPU/ML module driver does not support per-channel quantization yet. Therefore post-training quantization of models with TensorFlow v2 cannot be used if the model is supposed to run on the GPU/ML module (inference on CPU does not have this limitation). TensorFlow v1 quantization-aware training and model conversion is recommended in this case''. Consequently, only the fully-quantized model was tested with this version of the application.
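The sketch below shows one way the Android NN API delegate can be attached to a TFL interpreter (built as in the previous example) so that supported graphs are dispatched to the NPU; it is an assumption about how the delegation is enabled, not an excerpt from the actual application.
<pre>
#include "tensorflow/lite/delegates/nnapi/nnapi_delegate.h"
#include "tensorflow/lite/interpreter.h"

// Attach the NN API delegate to an already-built interpreter. Nodes that
// the NPU driver cannot handle fall back to the CPU.
bool EnableNpu(tflite::Interpreter* interpreter) {
    // The delegate must outlive the interpreter, hence the static lifetime.
    static tflite::StatefulNnApiDelegate nnapi_delegate;
    if (interpreter->ModifyGraphWithDelegate(&nnapi_delegate) != kTfLiteOk)
        return false;  // Delegation failed: the whole graph stays on the CPU.
    return interpreter->AllocateTensors() == kTfLiteOk;
}
</pre>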
=== Version 2B (Python) ===
The version 2A application was then ported to Python. This Python version is functionally equivalent to version 2A, which is written in C++. The goal of version 2B is to compare its performance against version 2A. Generally, Python has the advantage of being easier to work with, at the cost of being slower to execute. However, in this case, '''regarding the inference computation''', the performance is '''pretty much the same between the two versions'''. This is because the Python API acts only as a wrapper around the core TensorFlow library, which is written in C++ (and other "fast" languages). As detailed in section TBD, the overall time is significantly different because it also takes the pre/post-processing computations into account. These computations do not leverage the NPU accelerator and are therefore more affected by the slower Python code. Nevertheless, if the model used is much more complex, as usually occurs in real-world cases, this overhead could still be tolerable because it might become negligible. In conclusion, the use of Python should not be discarded a priori because of performance concerns: depending on the specific use case, it can be a valid option to consider.
=== Version 3 (C++) ===
A new C++ application was written to apply inference to the frames captured from the image sensor ([https://cdn.sparkfun.com/datasheets/Sensors/LightImaging/OV5640_datasheet.pdf OV5640]) of a [https://www.nxp.com/part/MINISASTOCSI#/ camera module], instead of to images retrieved from files. This version uses OpenCV 4.2.0 to control the camera and to pre-process the frames. Like version 2, inference runs on the NPU, so only the fully-quantized model was tested.
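A minimal sketch of the capture-and-infer loop is shown below, assuming the camera is exposed as a standard V4L2 device that OpenCV can open; the device index, input size, and the interpreter setup (delegated to the NPU as in the previous examples) are assumptions.
<pre>
#include <cstdint>
#include <cstring>
#include <opencv2/opencv.hpp>
#include "tensorflow/lite/interpreter.h"

// Grab frames from the camera and run inference on each one.
void RunOnCamera(tflite::Interpreter* interpreter) {
    cv::VideoCapture cap(0);              // device index is an assumption
    if (!cap.isOpened()) return;

    cv::Mat frame;
    while (cap.read(frame)) {
        // Pre-process exactly as for still images: BGR->RGB and resize
        // to the model input size (224x224 is an assumption).
        cv::cvtColor(frame, frame, cv::COLOR_BGR2RGB);
        cv::resize(frame, frame, cv::Size(224, 224));

        // Fully-quantized model: the input tensor is uint8, so the pixel
        // data can be copied without scaling.
        uint8_t* input = interpreter->typed_input_tensor<uint8_t>(0);
        std::memcpy(input, frame.data, frame.total() * frame.channels());

        interpreter->Invoke();
        uint8_t* scores = interpreter->typed_output_tensor<uint8_t>(0);
        // ... post-process 'scores' (e.g. pick the top-scoring class) ...
        (void)scores;
    }
}
</pre>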