ML-TN-001 - AI at the edge: comparison of different embedded platforms - Part 1
History
|Version||Date||Notes|
|1.0.0||September 2020||First public release|
|1.1.0||November 2020||Added new articles in the series|
Introduction
Thanks to steady technological progress, Artificial Intelligence (AI), and Machine Learning (ML) in particular, is now spreading to low-power, resource-constrained devices as well. In a typical Industrial IoT scenario, this means that edge devices can run complex inference algorithms that used to run only on cloud platforms.
This Technical Note (TN for short) is the first in a series illustrating how machine learning-based test applications are deployed, and how they perform, on different embedded platforms that are suitable for building such intelligent edge devices.
The idea is to develop one or more reference applications with well-known frameworks and libraries, and to run them on these platforms in order to compare performance, resource utilization, development flow, and so on.
The following sections describe these applications in more detail. Each article in this series explores one specific platform or use case in depth.
Reference application #1: fruit classifier
This application implements a classifier like the one described here. There is one notable difference with respect to the linked article, however: in this case, the model was created from scratch.
Model creation
The model was created and trained using Keras, a high-level API of TensorFlow.
The following block shows its architecture:
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv2d (Conv2D)              (None, 222, 222, 32)      896
_________________________________________________________________
activation (Activation)      (None, 222, 222, 32)      0
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 111, 111, 32)      0
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 109, 109, 32)      9248
_________________________________________________________________
activation_1 (Activation)    (None, 109, 109, 32)      0
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 54, 54, 32)        0
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 52, 52, 64)        18496
_________________________________________________________________
activation_2 (Activation)    (None, 52, 52, 64)        0
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 26, 26, 64)        0
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 24, 24, 128)       73856
_________________________________________________________________
activation_3 (Activation)    (None, 24, 24, 128)       0
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 12, 12, 128)       0
_________________________________________________________________
flatten (Flatten)            (None, 18432)             0
_________________________________________________________________
dense (Dense)                (None, 256)               4718848
_________________________________________________________________
activation_4 (Activation)    (None, 256)               0
_________________________________________________________________
dropout (Dropout)            (None, 256)               0
_________________________________________________________________
dense_1 (Dense)              (None, 6)                 1542
_________________________________________________________________
activation_5 (Activation)    (None, 6)                 0
=================================================================
Total params: 4,822,886
Trainable params: 4,822,886
Non-trainable params: 0
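As a minimal sketch, the summary above can be reproduced with the Keras Sequential API. The 224x224x3 input shape, the 3x3 kernels and the 2x2 pooling windows are inferred from the output shapes and parameter counts. The ReLU/softmax activations and the 0.5 dropout rate are assumptions, because the summary does not report them. The helper `expected_param_count` (a name introduced here for illustration) redoes the parameter arithmetic in plain Python, so the total can be checked without installing TensorFlow.

```python
CONV_FILTERS = (32, 32, 64, 128)  # filters of the four Conv2D layers in the summary


def build_model(num_classes=6, input_shape=(224, 224, 3), dropout_rate=0.5):
    """Build a Keras model with the layer sequence listed in the summary.

    Assumptions (not stated in the summary): ReLU and softmax activations,
    dropout rate of 0.5.
    """
    # Imported lazily so that the arithmetic helper below works without TensorFlow.
    from tensorflow import keras
    from tensorflow.keras import layers

    model = keras.Sequential(name="sequential")
    model.add(keras.Input(shape=input_shape))
    for filters in CONV_FILTERS:
        model.add(layers.Conv2D(filters, (3, 3)))        # "valid" padding, stride 1
        model.add(layers.Activation("relu"))             # assumed activation
        model.add(layers.MaxPooling2D(pool_size=(2, 2)))
    model.add(layers.Flatten())
    model.add(layers.Dense(256))
    model.add(layers.Activation("relu"))                 # assumed activation
    model.add(layers.Dropout(dropout_rate))              # assumed rate
    model.add(layers.Dense(num_classes))
    model.add(layers.Activation("softmax"))              # assumed activation
    return model


def expected_param_count(num_classes=6, input_shape=(224, 224, 3)):
    """Recompute the total number of parameters by hand."""
    h, w, c = input_shape
    total = 0
    for filters in CONV_FILTERS:
        total += 3 * 3 * c * filters + filters           # kernel weights + biases
        # A 3x3 "valid" convolution removes 2 pixels, then 2x2 pooling halves the size.
        h, w, c = (h - 2) // 2, (w - 2) // 2, filters
    flat = h * w * c                                     # size of the Flatten output
    total += flat * 256 + 256                            # first Dense layer
    total += 256 * num_classes + num_classes             # output Dense layer
    return total


print(expected_param_count())
```

Calling `build_model().summary()` should print a table with the same shapes and parameter counts as above, and `expected_param_count()` returns 4822886, which matches the reported total.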
The training was done in the cloud using an AWS EC2 server set up ad hoc.
The dataset was created by collecting 240 images of 6 different fruits. 75% of the images were used for training (training dataset) and the rest were used for test/validation purposes (test dataset, validation dataset). Of course, training the model with more images would likely have led to better accuracy, but it would not have changed the inference time, which depends on the model architecture rather than on the amount of training data. As the primary goal of the applications built upon this model is to benchmark different platforms, this is acceptable. It would not be acceptable in a real-world application.
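The 75/25 split can be sketched as follows. The example assumes that the 240 images are evenly distributed over the 6 classes (40 images each), which the text does not state. The file names, the class names and the `stratified_split` helper are hypothetical.

```python
import random


def stratified_split(samples, train_fraction=0.75, seed=0):
    """Split (path, label) pairs so that every class keeps the same ratio."""
    by_label = {}
    for path, label in samples:
        by_label.setdefault(label, []).append((path, label))

    rng = random.Random(seed)  # fixed seed makes the split reproducible
    train, test = [], []
    for label in sorted(by_label):
        items = by_label[label]
        rng.shuffle(items)
        cut = round(len(items) * train_fraction)
        train.extend(items[:cut])
        test.extend(items[cut:])
    return train, test


# Hypothetical dataset layout: 6 classes with 40 images each (240 in total).
classes = ["class_%d" % i for i in range(6)]
samples = [("%s/img_%02d.jpg" % (c, i), c) for c in classes for i in range(40)]

train_set, test_set = stratified_split(samples)
print(len(train_set), len(test_set))  # 180 60
```

Under these assumptions, 180 images end up in the training dataset and 60 in the test/validation dataset, with 30 and 10 images per class respectively.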
Several measures were taken to counter the strong tendency to overfit caused by the small number of images. For instance, new images were synthesized from the existing ones to simulate a larger dataset (data augmentation), as shown below:
[Figure: examples of images synthesized through data augmentation]
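The transformations actually used for this model are not listed here. As a minimal sketch of the idea, the following NumPy code synthesizes variants of an image with a random horizontal flip and a random wrap-around shift. Both transformations, the `augment` helper and the `max_shift` parameter are assumptions chosen for illustration. Keras also provides its own augmentation tools that serve the same purpose.

```python
import numpy as np


def augment(image, rng, max_shift=0.1):
    """Return a randomly flipped and shifted copy of an HxWxC image array.

    The choice of transformations is an assumption made for illustration.
    """
    out = image
    if rng.random() < 0.5:
        out = out[:, ::-1, :]                  # horizontal flip
    h, w = out.shape[:2]
    max_dy, max_dx = int(max_shift * h), int(max_shift * w)
    dy = int(rng.integers(-max_dy, max_dy + 1))
    dx = int(rng.integers(-max_dx, max_dx + 1))
    # Cyclic shift: pixels pushed out on one side re-enter on the opposite side.
    return np.roll(out, shift=(dy, dx), axis=(0, 1))


rng = np.random.default_rng(0)
# Random pixels used as a stand-in for a real 224x224 RGB photo.
image = rng.integers(0, 256, size=(224, 224, 3), dtype=np.uint8)
batch = [augment(image, rng) for _ in range(4)]
print(len(batch), batch[0].shape)
```

Each call yields a new array with the same shape and data type as the input, so the synthesized images can be fed to the model exactly like the original ones.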
The following plots show the training history:
[Figure: training history plots]
Articles in this series
The other articles in this series are:
- Part 2: testing application #1 on Mito8M SoM (NXP i.MX8M)
- Part 3: testing application #1 on Xilinx Zynq UltraScale+ MPSoC ZCU104 Evaluation Kit
- Part 4: testing application #1 on NXP i.MX8M Plus EVK
- Part 5: comparing NXP i.MX8M Plus NPU and Google Coral TPU
- Part 6: testing application #1 on Xilinx Zynq UltraScale+ MPSoC ZCU104 Evaluation Kit with PyTorch