XELK-AN-001: Asymmetric Multiprocessing (AMP) on Axel – Linux + FreeRTOS

From DAVE Developer's Wiki
Applies to Axel Ultra
Applies to Axel Lite

History

Version  Date        XELK version  Notes
0.9.2    Feb 2015    2.0.0         Update for XELK 2.0.0 release
0.9.3    April 2015  2.0.0         Minor fixes
1.0.0    April 2018  2.1.0         wiki version

Introduction

This application note describes how to build the software components required to set up an asymmetric multiprocessing (AMP) configuration that runs Linux on the first Cortex®-A9 core and FreeRTOS on the second Cortex®-A9 core of the Freescale i.MX6 SoC. The latencystat demo is an RPMsg-based application that demonstrates inter-processor communication and synchronization between the two operating systems.

Asymmetric Multiprocessing

Thanks to recent technological improvements, multicore processors are becoming popular in the embedded world too. These architectures allow the implementation of processing schemes that were not feasible with traditional single-core CPUs. Among these, one of the most interesting is asymmetric multiprocessing (AMP). This configuration addresses several requirements that embedded system developers struggle to meet on single-core systems.

This application note describes in detail the implementation of a Linux/FreeRTOS asymmetric multiprocessing configuration on the DAVE Embedded Systems AXEL LITE SoM. This configuration is a typical example of how to leverage the flexibility of AMP to combine, on a single piece of silicon, the versatility of Linux for general-purpose computation, connectivity and HMI with the determinism of an RTOS to satisfy real-time constraints. Since all AXEL family products are based on Freescale i.MX6 processors, what is described here applies to AXEL ULTRA too.

AMP allows a multicore system to run multiple operating systems (OS) simultaneously and independently of each other. In other words, each core has its own private memory space, which contains the OS and the applications that run on that core. In addition, there can be some shared memory space that is used for inter-core communication. This is in contrast to Symmetric Multiprocessing (SMP), in which one OS runs on multiple cores using a common shared memory space.

Thanks to AMP, developers can use open-source Linux and FreeRTOS operating systems and the RPMsg Inter Processor Communication (IPC) framework to quickly implement applications that need to deliver deterministic, real-time responsiveness for markets such as automotive, industrial and others with similar requirements, while preserving the openness of Linux.

The following picture depicts the structure of the system.

AMP configuration

Core #0:

  • takes care of the boot process
  • once Linux gets control of the processor, initializes core #1 in SMP mode
  • stops core #1 and switches it to AMP mode
  • loads the binary image of FreeRTOS, which is then executed by core #1

Inter-core communication

Inter-core communication is based on the RPMsg framework. Adopting a standardized, mainlined protocol dramatically improves the portability and maintainability of the application code. On the Linux side, this framework is supported by kernel 3.10.17_GA, released by Freescale along with the L3.10.17_1.0.0_IMX6QDLS_BUNDLE BSP, upon which XELK 2.0.0 is based. The picture below shows how the system memory is partitioned. The portion of memory used to create a shared area between the two cores (the ring buffers) is allocated inside the 256 MByte region used by FreeRTOS.

Memory layout

XELK platform

AXEL Embedded Linux Kit (XELK for short) provides all the components required to set up the development environment for:

  • building the second stage bootloader (U-Boot)
  • building and running Linux operating system on AXEL-based systems
  • building Linux applications that will run on the target

DAVE Embedded Systems provides all the customization required (in particular at the bootloader and Linux kernel levels) to enable customers to use the standard i.MX6 development tools for building all the firmware/software components that will run on the target system.

Please refer to the XELK Quick Start Guide for further details on XELK.

N.B.: this application note has been tested using XELK 2.0.0.

AMP on AXEL

The following sections describe how to build the software components required to set up the AMP configuration, with Linux running on the first Cortex®-A9 core and FreeRTOS on the second Cortex®-A9 core. The latencystat demo is an RPMsg-based application that demonstrates inter-processor communication and synchronization between the two operating systems.

Prerequisites

  • AXEL Embedded Linux Kit (please refer to XELK Quick Start Guide for further details) version 2.0.0
  • access to AXEL git repositories (see below)
  • an arm-none-eabi- toolchain for building FreeRTOS

Software components git repositories

The software components for this application note are provided as git repositories, so the user can immediately get access to the source trees and keep these components in sync and up to date with DAVE Embedded Systems repositories.

Assuming that the local repositories have not been created yet, clone the remote AMP git repositories:

  • cloning the linux git repository:
dvdk@dvdk:~/xelk$ git clone git@git.dave.eu:dave/axel/linux-2.6-imx.git
  • cloning the FreeRTOS repository:
dvdk@dvdk:~/xelk$ git clone git@git.dave.eu:dave/axel/freertos.git
  • cloning the latencystat repository:
dvdk@dvdk:~/xelk$ git clone git@git.dave.eu:dave/axel/latencystat.git
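
The three clone steps above can also be scripted. The following is a minimal sketch, not part of XELK: the function names and the `DRY_RUN` variable are hypothetical, and only the repository URLs come from the commands above.

```shell
#!/bin/sh
# Hypothetical helper: clone the three AMP repositories listed above.
# Set DRY_RUN=1 to print the git commands instead of running them.
GIT_BASE="git@git.dave.eu:dave/axel"
REPOS="linux-2.6-imx freertos latencystat"

# Build the remote URL for one repository name.
repo_url() {
    echo "$GIT_BASE/$1.git"
}

# Clone every repository into the current directory, skipping
# those that already have a local working copy.
clone_all() {
    for repo in $REPOS; do
        if [ -d "$repo/.git" ]; then
            echo "skip $repo (already cloned)"
        elif [ "${DRY_RUN:-0}" = 1 ]; then
            echo "git clone $(repo_url "$repo")"
        else
            git clone "$(repo_url "$repo")" || return 1
        fi
    done
}
```

Running `DRY_RUN=1 clone_all` from `~/xelk` prints the commands that would be executed, which is a quick way to check the URLs before cloning.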

Building the software components

The following paragraphs describe how to build the software components required for this application. Please note that:

  • the standard Linux infrastructure will be used to load the firmware for the second core: Linux will start in SMP mode, running on both cores; then CPU1 will be shut down and the FreeRTOS firmware will be loaded and run

  • the GPIO7_13 pin (JP10.9 of the AXEL EVB-LITE) is toggled during the ISR
  • the EIM_D19 signal (available on the R5 resistor on the AXEL EVB-LITE) is the output of the EPIT1 (which is the interrupt source)

The FreeRTOS application programs the EPIT1, which uses a 66 MHz time base (prescaler=1), and sets the compare value to 0xFFFFFFFF - 0x000F4240, that is, 1,000,000 ticks below the start value. The timer counts down from 0xFFFFFFFF and, when it reaches the compare value, it triggers the ISR and toggles the EIM_D19 pin.

The ISR counts the number of ticks elapsed after the compare value and saves that information so that it can be returned to the Linux application for the histogram calculation.
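
The timer arithmetic can be checked with a few lines of shell. This is only a sketch: the constants are those quoted in the text, while the function names and the sampled counter value passed to `epit_latency_ticks` are hypothetical and are not taken from the FreeRTOS sources.

```shell
#!/bin/sh
# EPIT1 figures quoted in the text (assumed here: 66 MHz time base, prescaler = 1).
EPIT_CLK_HZ=66000000
EPIT_START=$((0xFFFFFFFF))         # countdown start value
EPIT_PERIOD_TICKS=$((0x000F4240))  # distance between start and compare value

# Compare value: start value minus the period, printed as 32-bit hex.
epit_compare_value() {
    printf '0x%08X\n' $((EPIT_START - EPIT_PERIOD_TICKS))
}

# Time from timer start to the compare event, in microseconds
# (integer division, so the result is truncated).
epit_period_us() {
    echo $((EPIT_PERIOD_TICKS * 1000000 / EPIT_CLK_HZ))
}

# Ticks elapsed past the compare value for a down-counter reading.
# $1 = counter value sampled in the ISR (hypothetical input).
epit_latency_ticks() {
    echo $(( (EPIT_START - EPIT_PERIOD_TICKS) - $1 ))
}
```

For example, `epit_compare_value` prints `0xFFF0BDBF`, and `epit_latency_ticks $((0xFFF0BD9B))` prints `36`, because the counter has run 36 ticks past the compare value.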

Build the Linux kernel

Set up the environment by sourcing the i.MX6 toolchain script:

dvdk@dvdk:~/xelk$ source env.sh 

Enter the linux-2.6-imx directory and check out the axel-feat-2.1.0-amp branch:

dvdk@dvdk:~/xelk/linux-2.6-imx$ git checkout axel-feat-2.1.0-amp

Select the Axel board configuration:

dvdk@dvdk:~/xelk/linux-2.6-imx$ make imx_v7_axel_defconfig

To run this example, the Linux kernel must be rebuilt. With imx_v7_axel_defconfig selected as the configuration, enter the following command line, which changes the default kernel load address and builds the kernel image, the device tree image and the kernel modules:

dvdk@dvdk:~/xelk/linux-2.6-imx$ make UIMAGE_LOADADDR=0x18008000 imx6q-xelk-l.dtb uImage modules

The file arch/arm/boot/uImage is the binary image of the kernel that must be used to boot the system, together with the file arch/arm/boot/dts/imx6q-xelk-l.dtb, which is the binary image of the device tree with the XELK hardware configuration.
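
Before copying anything to the target, it may be worth confirming that both boot images were actually produced. The following sketch uses a hypothetical function name; the two paths are the ones named above.

```shell
#!/bin/sh
# Hypothetical helper: check that the kernel and device tree images exist
# and are not empty. $1 = root of the kernel tree (defaults to ".").
check_boot_images() {
    tree=${1:-.}
    status=0
    for f in arch/arm/boot/uImage arch/arm/boot/dts/imx6q-xelk-l.dtb; do
        if [ -s "$tree/$f" ]; then
            echo "ok       $f"
        else
            echo "missing  $f"
            status=1
        fi
    done
    return $status
}
```

Called as `check_boot_images ~/xelk/linux-2.6-imx`, it returns a non-zero status if either file is absent, so it can be chained with `&&` before a copy step.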

The following kernel modules, resulting from the kernel build procedure, must be copied from the building directory to the root file system (usually into /lib/modules/<kernel version>/kernel, but any other directory can be used):

 LD [M] drivers/remoteproc/remoteproc.ko
 LD [M] drivers/remoteproc/mx6_remoteproc.ko
 LD [M] drivers/rpmsg/rpmsg_freertos_statistic.ko
 LD [M] drivers/rpmsg/virtio_rpmsg_bus.ko
 LD [M] drivers/virtio/virtio.ko
 LD [M] drivers/virtio/virtio_ring.ko
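
The copy step can be sketched as a small function. The module paths are those listed above; the function name and both arguments (kernel build directory and destination directory inside the root file system) are assumptions to adapt to the local setup.

```shell
#!/bin/sh
# Modules produced by the kernel build, relative to the kernel tree.
AMP_MODULES="
drivers/remoteproc/remoteproc.ko
drivers/remoteproc/mx6_remoteproc.ko
drivers/rpmsg/rpmsg_freertos_statistic.ko
drivers/rpmsg/virtio_rpmsg_bus.ko
drivers/virtio/virtio.ko
drivers/virtio/virtio_ring.ko
"

# Hypothetical helper: copy the AMP modules into one flat directory.
# $1 = kernel build directory, $2 = destination directory (created if needed).
install_amp_modules() {
    src=$1
    dest=$2
    mkdir -p "$dest" || return 1
    for m in $AMP_MODULES; do
        cp "$src/$m" "$dest/" || return 1
    done
}
```

Copying into a single flat directory keeps the later `insmod` commands simple, since each module can then be loaded by file name from that directory.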

Build FreeRTOS

FreeRTOS must be built with a bare-metal (arm-none-eabi-) toolchain:

dvdk@dvdk:~/xelk$ cd freertos
dvdk@dvdk:~/xelk/freertos$ export PATH=<path to arm-eabi toolchain>/bin:$PATH
dvdk@dvdk:~/xelk/freertos$ export ARCH=arm
dvdk@dvdk:~/xelk/freertos$ export CROSS_COMPILE=arm-none-eabi-

and then run make to build the RTOS binary image:

dvdk@dvdk:~/xelk/freertos$ make
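
If make fails immediately, a common cause is that the cross compiler is not on PATH. The following sketch checks for it; the function name is hypothetical, and it only assumes that the toolchain provides a `gcc` binary carrying the `CROSS_COMPILE` prefix.

```shell
#!/bin/sh
# Hypothetical helper: verify that <prefix>gcc can be found on PATH.
# $1 = toolchain prefix (defaults to $CROSS_COMPILE, then arm-none-eabi-).
check_toolchain() {
    prefix=${1:-${CROSS_COMPILE:-arm-none-eabi-}}
    if command -v "${prefix}gcc" >/dev/null 2>&1; then
        echo "found $(command -v "${prefix}gcc")"
    else
        echo "${prefix}gcc not found on PATH" >&2
        return 1
    fi
}
```

Running `check_toolchain` after the export lines above prints the full path of the compiler that make will pick up.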

Building latencystat

Enter the latencystat directory and build with make:

dvdk@dvdk:~/xelk$ cd latencystat
dvdk@dvdk:~/xelk/latencystat$ make


Running the AMP on the target

As stated before, this example shows a sophisticated approach that allows for:

  • using a standardized communication channel between the two cores
  • exploiting a standardized mechanism to load the firmware of second core

The example performs IRQ latency measurements on the FreeRTOS side by using a hardware timer. These measurements are collected by the counterpart application running on the Linux side and shown on the console.

Move the device tree binary load address: the default, 0x18000000, conflicts with the AMP kernel load address and must be moved to a higher address. For example, change the U-Boot environment as follows:

=> setenv fdtaddr 0x1c000000

Once all the components are built, please boot the system and launch the following commands:

echo "now running with $(cat /proc/cpuinfo | grep processor | wc -l) processors"

echo "start remote proc AMP"

insmod virtio.ko
insmod virtio_ring.ko
insmod virtio_rpmsg_bus.ko
insmod remoteproc.ko
insmod mx6_remoteproc.ko
insmod rpmsg_freertos_statistic.ko

echo "everything done"
echo "now Linux is running with $(cat /proc/cpuinfo | grep processor | wc -l) processors"
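
The same sequence can be wrapped in a script that stops at the first module that fails to load. This is a sketch under stated assumptions: the function names and the `INSMOD` override (useful for a dry run with `INSMOD=echo`) are hypothetical, while the module order is the one used above.

```shell
#!/bin/sh
# Load order taken from the insmod commands above.
MODULES="virtio virtio_ring virtio_rpmsg_bus remoteproc mx6_remoteproc rpmsg_freertos_statistic"

# Count the CPUs currently seen by Linux.
# $1 = cpuinfo file (defaults to /proc/cpuinfo).
count_cpus() {
    grep -c '^processor' "${1:-/proc/cpuinfo}"
}

# Load the AMP modules in order, stopping at the first failure.
# $1 = directory containing the .ko files (defaults to ".").
load_amp_modules() {
    dir=${1:-.}
    for m in $MODULES; do
        if ! ${INSMOD:-insmod} "$dir/$m.ko"; then
            echo "failed to load $m" >&2
            return 1
        fi
    done
}
```

On the target, calling `count_cpus` before and after `load_amp_modules` shows whether the number of cores managed by Linux has dropped, which is what the two echo lines above report.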

Then run the latencystat application as shown below. The typical output will look like this:

root@axel:~# ./latencystat -b
Linux FreeRTOS AMP Demo.
 0: Command 0 ACKed
 1: Command 1 ACKed
Waiting for samples...
 2: Command 2 ACKed
 3: Command 3 ACKed
 4: Command 4 ACKed
-----------------------------------------------------------
Histogram Bucket Values:
 Bucket 323 ns (36 ticks) had 38 frequency
 Bucket 341 ns (38 ticks) had 299 frequency
 Bucket 512 ns (57 ticks) had 1 frequency
 Bucket 746 ns (83 ticks) had 1 frequency
-----------------------------------------------------------
Histogram Data:
 min: 323 ns (36 ticks)
 avg: 332 ns (37 ticks)
 max: 746 ns (83 ticks)
 out of range: 0
 total samples: 339
-----------------------------------------------------------

This application is extremely useful for evaluating how the CPU load on the first core affects IRQ latency. If the latency does not satisfy the real-time requirements, it may be necessary to adjust the arbitration priorities of the processor's interconnect subsystem.
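
When comparing runs taken under different CPU loads, it can help to recompute summary figures from saved output. The following is a minimal sketch, assuming the latencystat output was captured to a log file and that the bucket lines keep the shape shown in the sample above; the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: summarize the "Bucket" lines of a saved latencystat log.
# Expected line shape (from the sample output):
#   Bucket 323 ns (36 ticks) had 38 frequency
# Reads the files given as arguments, or standard input if none are given.
summarize_buckets() {
    awk '
        $1 == "Bucket" {
            ticks = $4
            sub(/^\(/, "", ticks)        # "(36" -> "36"
            count = $7
            total += count
            weighted += ticks * count
            if (ticks + 0 > max) max = ticks + 0
        }
        END {
            if (total > 0)
                printf "samples=%d avg_ticks=%d max_ticks=%d\n",
                       total, int(weighted / total), max
        }
    ' "$@"
}
```

Fed with the four bucket lines of the sample output, it prints `samples=339 avg_ticks=37 max_ticks=83`, which matches the totals reported by latencystat itself.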