Revision as of 11:02, 9 December 2021

HOME	SOMs	SBCs	ToloMEO Embedded Assistant	GET A QUOTE	ONLINE HELPDESK
	Roadmap		IoT Services			ML/AI services	Embedded Design Services

Info Box

Applies to Machine Learning

Applies to Machine Learning Technical Notes

Applies to ORCA Technical Notes

Applies to ORCA SBC Technical Notes

History

Version	Date	Notes
1.0.0	December 2021	First public release

Introduction

This Technical Note (TN) describes a demo application that shows how to combine an inference algorithm, namely keyword spotting, with an asymmetric multiprocessing (AMP) scheme on a heterogeneous architecture. This use case can serve as the basis for more complex applications that have to carry out the following tasks:

Acquiring data from sensors in real time
Executing a computationally expensive inference algorithm on the collected data.

This scenario is quite common in the realm of "AI at the edge" but, generally, cannot be addressed with a microcontroller-based solution, because running the inference algorithm would take too long. On the other hand, a classic embedded processor running a complex operating system such as Linux might not be suitable either, because it may be unable to properly handle tasks with tight real-time constraints.

In such cases, the power and flexibility of the NXP i.MX8M Plus can be very helpful, as this SoC features a heterogeneous architecture (an ARM Cortex-A53 complex combined with an ARM Cortex-M7 core) and a Neural Processing Unit (NPU).

The idea is to exploit the i.MX8M Plus architecture to implement an AMP configuration in which:

The Cortex-A53 complex, running Yocto Linux, is devoted to the inference algorithm and leverages the NPU for hardware acceleration
The Cortex-M7 core takes care of data acquisition.

Several documents dealing with AMP configurations were published in the past [1]. Most of them refer to homogeneous architectures, which pose some well-known limitations when it comes to implementing asymmetric multiprocessing schemes. By and large, the heterogeneous structure of the i.MX8M Plus makes it possible to overcome such limitations.

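[1]
BELK-AN-001: Asymmetric Multiprocessing (AMP) on Bora – Linux FreeRTOS
BELK-TN-001: Real-timeness, system integrity and TrustZone® technology on AMP configuration
BELK-AN-007: Asymmetric Multiprocessing (AMP) on Bora/BoraX with OpenAMP
BELK-AN-002: Trace on the Bora AMP (Linux + FreeRTOS) system
BXELK-TN-002: Non-intrusive continuous multi-gigabit transceivers link monitoring
XELK-AN-001: Asymmetric Multiprocessing (AMP) on Axel – Linux + FreeRTOS
MISC-TN-003: Asymmetric multiprocessing on NXP i.MX6SoloX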

Testbed

The testbed is illustrated in the following picture. Basically, it consists of an Orca Single Board Computer

Implementation

As stated previously, the inference algorithm is keyword spotting. The data being processed are thus audio samples, which are retrieved by the Cortex-M7 and sent to the Cortex-A53 complex.

From a software perspective, we identify two different domains (see also the following picture):

D1, which refers to the Yocto Linux world running on the Cortex-A53 complex
D2, which refers to the firmware running on the Cortex-M7 core.

D1 and D2 communicate through the RPMsg protocol. On the Cortex-M7 side, the RPMsg Lite implementation by NXP is used. The interface between D1 and D2 also includes a shared memory buffer, which is used to exchange the audio samples between the domains, while synchronization messages are exchanged over RPMsg channels.

For the sake of simplicity, the audio samples are not captured by the Cortex-M7 with a real microphone. Instead, they are retrieved from prefilled memory buffers that are inaccessible to the Cortex-A53 cores. For the purposes of this discussion, this simplification is negligible, as the communication mechanisms between the domains are not affected at all. Likewise, the inference algorithm could probably be executed by the powerful Cortex-M7 core itself. Again, this does not matter, because the aim of this TN is to show an architectural solution that can be tailored to address more challenging, real-world use cases.

The reserved SDRAM buffers used to store the audio samples are protected at the Linux device tree level to prevent the D1 domain from accessing them directly. It is worth remembering that a stronger, hardware-based protection mechanism is also available by exploiting the i.MX8M Plus Resource Domain Controller (RDC).

The inference application (IAPP) running in D1 uses a simple sysfs-based interface to interact with the firmware running in D2. Overall, the software operates as follows:

IAPP triggers the "acquisition" of audio samples by writing to a specific sysfs pseudo-file
The Cortex-M7 firmware (MFW)
- randomly selects one of the prefilled audio buffers
- adds some noise to the samples
- stores the resulting buffer in the shared memory
- signals to IAPP that the buffer is ready
IAPP runs the inference to spot the pronounced word.

The firmware running in D2 is implemented as a FreeRTOS application. The use of a real-time operating system, combined with the intrinsic characteristics of the Cortex-M7 in terms of interrupt latency, makes this core extremely suitable for applications with tight real-time constraints. Nevertheless, nothing prevents choosing a bare-metal coding style instead.

Additional notes regarding the inference application

TBD

Boot sequence

This demo is arranged to execute the following boot sequence:

U-Boot starts and populates the audio sample buffers by retrieving WAV files via TFTP
U-Boot initializes the Cortex-M7 core and starts MFW
MFW waits for the RPMsg link with the D1 domain to be established
U-Boot starts the Linux kernel, which then takes control of the Cortex-A53 complex
The RPMsg link between D1 and D2 is established
IAPP starts.

Please note that the Cortex-M7 firmware does not necessarily have to be started before the Linux kernel. MFW can also be started from Linux user space.

Testing


ML-TN-006 — Keyword Spotting and Asymmetric Multiprocessing on Orca SBC


