{{InfoBoxTop}}
{{AppliesToMachineLearning}}
{{AppliesTo Machine Learning TN}}
{{AppliesToMito8M}}
{{AppliesTo MITO 8M TN}}
{{InfoBoxBottom}}
==Introduction==
This Technical Note (TN for short) belongs to the series introduced [[ML-TN-001_-_AI_at_the_edge:_comparison_of_different_embedded_platforms_-_Part_1|here]].
Specifically, it illustrates how to run an inference application (a fruit classifier) that makes use of the model described in [[ML-TN-001_-_AI_at_the_edge:_comparison_of_different_embedded_platforms_-_Part_1#Reference_application_.231:_fruit_classifier|this section]] on the [[:Category:Mito8M|Mito8M SoM]], a system-on-module based on the NXP [https://www.nxp.com/products/processors-and-microcontrollers/arm-processors/i-mx-applications-processors/i-mx-8-processors/i-mx-8m-family-armcortex-a53-cortex-m4-audio-voice-video:i.MX8M i.MX8M SoC].
=== Test bed ===
The kernel and the root file system of the tested platform were built with the L4.14.98_2.0.0 release of the Yocto Board Support Package (BSP) for the i.MX 8 family of devices. They were built with support for [https://www.nxp.com/design/software/development-software/eiq-ml-development-environment:EIQ eIQ]: "a collection of software and development tools for NXP microprocessors and microcontrollers to do inference of neural network models on embedded systems".
The following table details the relevant specs of the test bed.

{| class="wikitable" style="margin: auto;"
|-
|'''NXP Linux BSP release'''
|L4.14.98_2.0.0
|-
|'''Inference engine'''
|TensorFlow Lite 1.12
|-
|'''Maximum ARM cores frequency'''
'''[MHz]'''
|1300
|}
 
The following images show the graphs of the models after conversion (click to enlarge):

[[File:ML - TF1.15QAT fruitsmodel.png|none|thumb|1000x1000px]]
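As a reference for how such a conversion can be performed, here is a minimal sketch of a TensorFlow 1.15 quantization-aware conversion. It is not the exact script used for this TN: the file names, the input tensor statistics, and the use of the Keras front-end are assumptions.

<pre>
# Minimal sketch (TF 1.15): convert a model trained with
# quantization-aware training to a fully-quantized TFLite file.
# "fruits_model.h5" and the (mean, std_dev) stats are hypothetical.
import tensorflow as tf  # TensorFlow 1.15

converter = tf.lite.TFLiteConverter.from_keras_model_file("fruits_model.h5")
# The graph already contains fake-quantization nodes, so a uint8
# inference graph can be requested directly.
converter.inference_type = tf.uint8
input_name = converter.get_input_arrays()[0]
# (mean, std_dev) pair mapping the [0, 255] pixel range onto the
# float range seen during training.
converter.quantized_input_stats = {input_name: (127.5, 127.5)}

tflite_model = converter.convert()
with open("fruits_model_quant.tflite", "wb") as f:
    f.write(tflite_model)
</pre>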
When executed, the application prints the confidence score computed for each class, for instance:
<pre>
2.47029e-18 Hand
</pre>
 
The following screenshot shows the system status while executing the application. In this case, the thread parameter was unspecified.
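The structure of such an application is outlined in the following sketch. It is not the original code: it assumes a recent TensorFlow release whose <code>Interpreter</code> constructor accepts the <code>num_threads</code> parameter, and the file names and class list are hypothetical.

<pre>
# Minimal sketch of a TFLite fruit classifier: load the converted
# model, classify one image, and print the per-class confidences.
import numpy as np
import tensorflow as tf
from PIL import Image

LABELS = ["Apple", "Banana", "Hand"]  # hypothetical class list

interpreter = tf.lite.Interpreter(model_path="fruits_model_quant.tflite",
                                  num_threads=4)  # the thread parameter
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

# Resize the test image to the input shape expected by the model.
height, width = inp["shape"][1], inp["shape"][2]
image = np.expand_dims(
    np.asarray(Image.open("test.jpg").resize((width, height)),
               dtype=inp["dtype"]), 0)

interpreter.set_tensor(inp["index"], image)
interpreter.invoke()
scores = interpreter.get_tensor(out["index"])[0]

# Print the confidence computed for each class, best match first.
for score, label in sorted(zip(scores, LABELS), reverse=True):
    print(score, label)
</pre>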
== Results ==
The following table lists the prediction times for a single image depending on the model and the thread parameter.
 
{| class="wikitable" style="margin: auto;"
|-
!Thread parameter
!Prediction time
[ms]
!Notes
|-
|unspecified
|200
|Four threads are created besides the main process (supposedly, this number is set according to the number of physical cores available). Nevertheless, they seem to be constantly in sleep state.
|-
|4
|8084
|Interestingly, 7 actual processes are created besides the main one. Four of them, however, seem to be constantly in sleep state.
|}
 
The prediction time '''takes into account the time needed to fill the input tensor with the image'''. Furthermore, it is averaged over several predictions.
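The measurement just described can be reproduced with a few lines of code. The following is a sketch under the same assumptions as the previous one (recent TensorFlow, hypothetical file names); note that filling the input tensor happens inside the timed region.

<pre>
# Sketch of the timing methodology: the timed region includes filling
# the input tensor with the image, and the result is averaged over
# several predictions.
import time
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="fruits_model_quant.tflite",
                                  num_threads=4)
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
image = np.zeros(inp["shape"], dtype=inp["dtype"])  # stand-in image

N = 20
start = time.monotonic()
for _ in range(N):
    interpreter.set_tensor(inp["index"], image)  # tensor fill is timed
    interpreter.invoke()
elapsed = time.monotonic() - start
print("average prediction time: %.1f ms" % (1000.0 * elapsed / N))
</pre>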
The same tests were also repeated using a network file system (NFS) over an Ethernet connection. No significant variations in the prediction times were observed.
 
In conclusion, to maximize performance in terms of execution time, the model has to be fully quantized and the number of threads has to be specified explicitly.