= History =
{| class="wikitable" border="1"
!Version ID#
!Date
!Notes
|-
|1.0.0 ({{oldid|16244|16244}})
|28/02/2022
|First public release
|-
|{{oldid|17342|17342}}
|17/01/2023
|Update testbed information
|-
|}
=Introduction=
This Technical Note (TN) describes a demo application used to show the combination of an inference algorithm, namely [https://en.wikipedia.org/wiki/Keyword_spotting keyword spotting], and an asymmetric multiprocessing (AMP) scheme on a heterogeneous architecture. This use case can serve as the basis for more complex applications that have to carry out the following tasks:
* Real-time data acquisition from sensors
* Execution of a computationally expensive inference algorithm on the collected data.
This scenario is quite common in the realm of "AI at the edge" but, generally, cannot be addressed with a microcontroller-based solution because it would take too long to run the inference algorithm. On the other hand, a classic embedded processor running a complex operating system such as Linux might not be suited either, because it is unable to handle tasks with tight real-time constraints properly.
In such cases, the power and the flexibility of the NXP i.MX8M Plus can be of much help, as this SoC features a heterogeneous architecture — an ARM Cortex-A53 complex combined with an ARM Cortex-M7 core — and a Neural Processing Unit (NPU).
* The Cortex-A53 complex — running Yocto Linux — is devoted to the inference algorithm by leveraging the NPU hardware acceleration.
* The Cortex-M7 core takes care of data acquisition.
Several documents dealing with AMP configurations were published in the past [1]. Most of them refer to homogeneous architectures, which pose some well-known limitations when it comes to implementing asymmetric multiprocessing schemes. By and large, the architecture of the i.MX8M Plus allows such limitations to be overcome.
 
[1]
* [[BELK-AN-001: Asymmetric Multiprocessing (AMP) on Bora – Linux FreeRTOS]]
* [[BELK-TN-001: Real-timeness, system integrity and TrustZone® technology on AMP configuration]]
* [[BELK-AN-007: Asymmetric Multiprocessing (AMP) on Bora/BoraX with OpenAMP]]
* [[BELK-AN-002: Trace on the Bora AMP (Linux + FreeRTOS) system]]
* [[BXELK-TN-002: Non-intrusive continuous multi-gigabit transceivers link monitoring]]
* [[XELK-AN-001: Asymmetric Multiprocessing (AMP) on Axel – Linux + FreeRTOS]]
* [[MISC-TN-003: Asymmetric multiprocessing on NXP i.MX6SoloX]]
=Testbed=
The testbed is illustrated in the following picture. Basically, it consists of an [[ORCA_SBC|Orca Single Board Computer]] with a USB-to-audio adapter for easy audio playback: [[File:ORCA_SBC-USB-audio-adapter.png | 500px | thumb|center| ORCA SBC with the USB audio adapter]]
More information on how to connect and use the USB audio adapter is available in the following [[DESK-MX8-AN-0002:_Adding_an_audio_interface_with_an_USB_bluetooth_audio_transmitter | Application Note]].
=Implementation=
As stated previously, the inference algorithm is keyword spotting. The data being processed are thus audio samples retrieved by the Cortex-M7 and sent to the Cortex-A53 complex.
From a software perspective, we identify two different domains:
* D1, the Linux domain running on the Cortex-A53 complex, which hosts the inference application
* D2, the FreeRTOS domain running on the Cortex-M7 core, which hosts the data acquisition firmware.
The reserved SDRAM buffers used to store the audio samples are protected at the Linux device tree level to prevent the D1 domain from accessing them directly. It is worth remembering that it is also possible to make use of a stronger, hardware-based protection mechanism by exploiting the i.MX8M Plus Resource Domain Controller (RDC).
The inference application (IAPP) running in D1 uses a simple sysfs-based interface to interact with the firmware running in D2. As such, the overall operation of the software is as follows (a minimal sketch of this interaction is shown right after the list):
* IAPP triggers the "acquisition" of audio samples by writing to a specific sysfs pseudo file
* The Cortex-M7 firmware (MFW):
** adds some noise to the samples
** stores the resulting buffer in the shared memory
** signals IAPP that the buffer is ready
* IAPP runs the inference to spot the pronounced word.
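The following Python snippet is a minimal sketch of this interaction, shown for illustration only: the sysfs pseudo files (<code>/sys/kernel/kws_demo/...</code>) and the device node exposing the shared buffer are hypothetical names, not the actual interface implemented by the demo.
<syntaxhighlight lang="python">
# Minimal sketch of the IAPP control loop; the sysfs pseudo files and the
# shared-buffer device node below are hypothetical names used for illustration.
import time

TRIGGER = "/sys/kernel/kws_demo/trigger"   # hypothetical: written to start an acquisition
READY = "/sys/kernel/kws_demo/ready"       # hypothetical: reads "1" when the buffer is ready
BUFFER = "/dev/kws_samples"                # hypothetical: exposes the shared SDRAM buffer

def acquire_samples():
    # 1. Ask the Cortex-M7 firmware (MFW) to acquire a new batch of audio samples
    with open(TRIGGER, "w") as f:
        f.write("1")
    # 2. Poll until MFW signals that the shared buffer has been filled
    while True:
        with open(READY) as f:
            if f.read().strip() == "1":
                break
        time.sleep(0.01)
    # 3. Read the raw audio samples from the shared memory buffer
    with open(BUFFER, "rb") as f:
        return f.read()

samples = acquire_samples()
# 'samples' is then fed to the inference step (see the next section)
</syntaxhighlight>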
The firmware running in D2 is implemented as a [https://www.nxp.com/design/software/development-software/mcuxpresso-software-and-tools-/mcuxpresso-software-development-kit-sdk:MCUXpresso-SDK FreeRTOS] application. The use of a real-time operating system, combined with the intrinsic characteristics of the Cortex-M7 in terms of [https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/beginner-guide-on-interrupt-latency-and-interrupt-latency-of-the-arm-cortex-m-processors interrupt latency], makes this core extremely suitable for applications with tight real-time constraints. Nevertheless, nothing prevents choosing a bare-metal coding style instead.
== Additional notes regarding the inference application ==
IAPP runs the inference on a convolutional neural network (CNN) trained to spot a predefined set of keywords. The CNN was trained with TensorFlow on several audio spectrograms on the basis of [https://www.tensorflow.org/tutorials/audio/simple_audio this] tutorial. It was then deployed in TensorFlow Lite format. Before running the inference, IAPP pre-processes the audio samples to match the format used to train the CNN. Most of the libraries needed by IAPP and other deep learning applications are provided by the [https://www.nxp.com/design/software/development-software/eiq-ml-development-environment:EIQ NXP eIQ] software ecosystem.
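As a reference, the following Python sketch shows how such an inference step can be implemented with the TensorFlow Lite runtime. The model file name, the label list and order, the audio format (16-bit mono PCM), and the spectrogram parameters are assumptions made for illustration; they do not necessarily match the actual IAPP code.
<syntaxhighlight lang="python">
# Minimal sketch of the inference step (model name, labels, audio format and
# spectrogram parameters are assumptions for illustration purposes).
import numpy as np
from tflite_runtime.interpreter import Interpreter   # pip install tflite-runtime

MODEL = "kws_model.tflite"   # hypothetical TensorFlow Lite model file
# Keywords of the mini Speech Commands dataset used by the TensorFlow tutorial;
# the actual label order depends on how the model was trained.
LABELS = ["down", "go", "left", "no", "right", "stop", "up", "yes"]

def spectrogram(pcm, frame_len=255, step=128):
    """Magnitude STFT, mimicking the pre-processing of the simple_audio tutorial."""
    frames = [pcm[i:i + frame_len] for i in range(0, len(pcm) - frame_len, step)]
    return np.abs(np.fft.rfft(np.stack(frames), axis=-1)).astype(np.float32)

def classify(raw_bytes):
    # Convert 16-bit mono PCM samples to float32 in [-1, 1]
    pcm = np.frombuffer(raw_bytes, dtype=np.int16).astype(np.float32) / 32768.0
    spec = spectrogram(pcm)[np.newaxis, ..., np.newaxis]  # add batch and channel dims

    interpreter = Interpreter(model_path=MODEL)
    inp = interpreter.get_input_details()[0]
    out = interpreter.get_output_details()[0]
    # Resize the input tensor in case the model expects a different spectrogram shape
    interpreter.resize_tensor_input(inp["index"], spec.shape)
    interpreter.allocate_tensors()
    interpreter.set_tensor(inp["index"], spec)
    interpreter.invoke()
    scores = interpreter.get_tensor(out["index"])[0]
    return LABELS[int(np.argmax(scores))]

# Example usage, with 'samples' as returned by the acquisition sketch above:
# print(classify(samples))
</syntaxhighlight>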
==Boot sequence==
This demo was arranged in order to execute the following boot sequence:
* The RPMsg link between D1 and D2 is established
* IAPP starts.
Please note that the Cortex-M7 firmware does not necessarily have to be started before the Linux kernel. It is also possible to start MFW from Linux user space.
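For instance, a minimal Python sketch of starting MFW from Linux user space through the kernel remoteproc interface could look like the following; the remoteproc instance number and the firmware file name are assumptions made for illustration.
<syntaxhighlight lang="python">
# Minimal sketch: start the Cortex-M7 firmware from Linux user space via remoteproc.
# The instance (remoteproc0) and the firmware name are assumptions for illustration.
RPROC = "/sys/class/remoteproc/remoteproc0"   # assumed remoteproc instance of the Cortex-M7
FIRMWARE = "kws_mfw.elf"                      # hypothetical firmware image under /lib/firmware

with open(f"{RPROC}/firmware", "w") as f:
    f.write(FIRMWARE)                         # select the firmware image to load
with open(f"{RPROC}/state", "w") as f:
    f.write("start")                          # write "stop" to halt the core again
</syntaxhighlight>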
=Testing=
Watch our dedicated video, where it is possible to see this demo in action:
{{#ev:youtube|LiLrhqqUNDw|600|center|Keyword Spotting and Asymmetric Multiprocessing on Orca SBC|frame}}