* fully-quantized model (TensorFlow v1 quantization-aware training and 8-bit quantization of the weights and activations).
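For reference, TensorFlow Lite's fully-quantized models use an affine mapping between real values and 8-bit integers: <code>real_value = (quantized_value - zero_point) * scale</code>. The sketch below is plain Python with illustrative scale and zero-point values, not parameters taken from this application's model:

```python
# Sketch of the affine (asymmetric) 8-bit quantization scheme used by
# TensorFlow Lite. The scale/zero_point values below are illustrative.

def quantize(x, scale, zero_point, qmin=0, qmax=255):
    """Map a float to an unsigned 8-bit value."""
    q = round(x / scale) + zero_point
    return max(qmin, min(qmax, q))

def dequantize(q, scale, zero_point):
    """Map an 8-bit value back to an approximate float."""
    return (q - zero_point) * scale

# Example: activations in [-1.0, 1.0] mapped onto [0, 255].
scale = 2.0 / 255.0
zero_point = 128

q = quantize(0.5, scale, zero_point)   # round(0.5 / scale) + 128 = 192
x = dequantize(q, scale, zero_point)   # ~0.502, within one scale step of 0.5
```

The round-trip error is bounded by the scale, which is why the ranges chosen during quantization-aware training directly determine the accuracy of the quantized model.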
=== Version 2A ===
The version 1 application was then modified to accelerate the inference using the NPU (ML module) of the [https://www.nxp.com/products/processors-and-microcontrollers/arm-processors/i-mx-applications-processors/i-mx-8-processors/i-mx-8m-plus-arm-cortex-a53-machine-learning-vision-multimedia-and-industrial-iot:IMX8MPLUS i.MX8M Plus] SoC. This is possible because ''the TensorFlow Lite library uses the Android NN API driver implementation from the GPU/ML module driver for running inference using the GPU/ML module''.
Neither the floating-point nor the half-quantized model works with the NPU, however. Moreover, ''the GPU/ML module driver does not support per-channel quantization yet. Therefore post-training quantization of models with TensorFlow v2 cannot be used if the model is supposed to run on the GPU/ML module (inference on CPU does not have this limitation). TensorFlow v1 quantization-aware training and model conversion is recommended in this case''. Consequently, only the fully-quantized model was tested with this version of the application.
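To see why the per-channel limitation matters: per-tensor quantization uses a single scale for the whole weight tensor, whereas per-channel quantization picks one scale per output channel, which preserves accuracy when channel magnitudes differ widely. A minimal pure-Python sketch with made-up weights:

```python
# Per-tensor vs. per-channel symmetric int8 quantization of a weight
# matrix (rows = output channels). Weights are made up for illustration.

def quant_dequant(row, scale):
    """Symmetric int8 quantize/dequantize round-trip with one scale."""
    out = []
    for x in row:
        q = max(-127, min(127, round(x / scale)))
        out.append(q * scale)
    return out

weights = [
    [0.02, -0.01, 0.015],   # channel with small-magnitude weights
    [1.5, -2.0, 1.0],       # channel with large-magnitude weights
]

# Per-tensor: one scale derived from the global maximum magnitude.
global_scale = max(abs(x) for row in weights for x in row) / 127
per_tensor = [quant_dequant(row, global_scale) for row in weights]

# Per-channel: one scale per output channel (row).
per_channel = [
    quant_dequant(row, max(abs(x) for x in row) / 127)
    for row in weights
]

def row_err(orig, approx):
    """Maximum absolute round-trip error within one channel."""
    return max(abs(a - b) for a, b in zip(orig, approx))
```

With a shared scale the small-magnitude channel is quantized very coarsely, while per-channel scales keep its error tiny; this is the accuracy that is given up when the GPU/ML module driver forces per-tensor quantization.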
 
=== Version 2B ===
The version 2A application was then ported to Python. This Python version is functionally equivalent to the C++ one.
 
Generally, Python has the advantage of being easier to work with, at the cost of slower execution. In this case, however, the performance of the two versions is nearly identical, because the Python API acts only as a thin wrapper around the core TensorFlow library, which is written in C++ (and other fast languages).
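A minimal sketch of how such a Python port typically drives TensorFlow Lite, assuming the <code>tflite_runtime</code> package is installed; the model path and input data are placeholders, and this is not the application's actual code. All of the heavy lifting happens inside <code>invoke()</code>, in the C++ core (and, on the i.MX8M Plus, in the GPU/ML module), which is why the Python wrapper adds negligible overhead:

```python
# Sketch of single-image TFLite inference from Python (not the
# application's actual code; model path and input are placeholders).

def run_inference(model_path, input_data):
    """Run one inference pass and return the first output tensor."""
    # Imported lazily so the sketch stays importable without the package.
    from tflite_runtime.interpreter import Interpreter

    interpreter = Interpreter(model_path=model_path)
    interpreter.allocate_tensors()

    input_details = interpreter.get_input_details()
    output_details = interpreter.get_output_details()

    # input_data must match the dtype/shape in input_details[0]
    # (uint8 for the fully-quantized model).
    interpreter.set_tensor(input_details[0]["index"], input_data)
    interpreter.invoke()
    return interpreter.get_tensor(output_details[0]["index"])
```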
=== Version 3 ===