BELK-AN-001: Asymmetric Multiprocessing (AMP) on Bora – Linux FreeRTOS

From DAVE Developer's Wiki
Revision as of 13:48, 15 September 2015 by U0001 (talk | contribs) (Introduction)

Applies to: Bora, BORA Xpress

History

Version  Date            BELK version  Notes
1.0.0    November 2013   1.1.0         First release
1.0.1    November 2013   1.1.0         Added UART0 pinout information; minor fixes
1.1.0    November 2013   1.1.0         Added support for RPMsg example
1.5.0    December 2013   1.1.0         Added chapter related to Lauterbach debugger
1.5.1    January 2014    1.1.0         Minor fixes
1.6.0    April 2014      2.0.0         Minor fixes; updated for BELK 2.0.0 release
1.7.0    September 2015  3.0.0         Added support for BoraX

Introduction

This application note describes how to build the software components required to set up an asymmetric multiprocessing (AMP) configuration that runs Linux on the first Cortex-A9 core and FreeRTOS on the second Cortex-A9 core of the Zynq SoC.

Asymmetric Multiprocessing (AMP) allows a multiprocessor/multicore system to run multiple Operating Systems (OS) that are independent of each other. In other words, each CPU has its own private memory space, which contains the OS and the applications that are to run on that CPU. In addition, there can be some shared memory space that is used for multiprocessor communication. This is contrasted with Symmetric Multiprocessing (SMP), in which one OS runs on multiple CPUs using a public shared memory space. Thanks to AMP, developers can use open-source Linux and FreeRTOS operating systems and the RPMsg Inter Processor Communication (IPC) framework between the Zynq's two high-performance ARM® Cortex™-A9 processors to quickly implement applications that need to deliver deterministic, real-time responsiveness for markets such as automotive, industrial and others with similar requirements. For further information, please refer to this link.

Two examples are provided. The first one, HelloWorld, shows basic functionality, while the second, an RPMsg-based application, uses more sophisticated techniques to handle inter-processor communication and synchronization. The latter configuration is based on the RPMsg mechanism as described in Xilinx document UG978 (v2013.04, April 22, 2013).

A PDF version of this Application Note can be downloaded here.

AMP on Bora

The following sections detail how to build the software components required to set up the AMP configuration that runs Linux on the first Cortex-A9 core and FreeRTOS on the second Cortex-A9 core. The prerequisites are:

Building the software components

Vivado project

  • Log into the development host
  • Assuming that a local repository has not been created, clone the remote Bora git repository (the -b option checks out the bora branch automatically):
git clone git@git.dave.eu:dave/bora/bora.git -b bora
  • Enter the git directory
  • Switch to bora branch (not required if this is already the current branch):
git checkout bora

Set project directory variable:

export PROJ_DIR=$(pwd)/../bora-build-YYYYMMDD-nobk

Configure Vivado settings (1):

. /opt/Xilinx/Vivado/2013.3/settings64.sh

Launch Vivado with build_project script (2):

vivado -mode tcl -source build_project.tcl -notrace -tclargs "-bitstream"

(1) On a 32-bit system, the Vivado settings are configured with the following command: . /opt/Xilinx/Vivado/2013.3/settings32.sh

(2) Passing the -tclargs "-bitstream" parameters allows for automatic building of the FPGA bitstream.

FSBL

Once the Vivado project build is complete, export the hardware configuration and start the SDK to build the FSBL. From the SDK GUI:

  • Create a new application project, as shown in the picture below:
AN-BELK-001 01.jpg
  • Configure the application settings as shown in the pictures below:
AN-BELK-001 02.jpg
AN-BELK-001 03.jpg
  • Click Finish to launch the FSBL build process
  • Create the binary from the FSBL ELF by choosing one of the following options:
    • manually launch the command: arm-xilinx-eabi-objcopy -v -O binary $PROJ_DIR/bora.sdk/SDK/SDK_Export/bora_FSBL/Debug/bora_FSBL.elf $PROJ_DIR/bora.sdk/SDK/SDK_Export/bora_FSBL/Debug/bora_FSBL.bin
    • configure automatic binary generation on project build: in Project Explorer, right-click on the bora_FSBL project, select C/C++ Build Settings and add the command arm-xilinx-eabi-objcopy -v -O binary ${ProjName}.elf ${ProjName}.bin to the Post-build steps
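The manual objcopy step can also be scripted. The following is a minimal Python sketch, not part of BELK: the helper names objcopy_command and elf_to_bin are hypothetical, and the only assumption is that arm-xilinx-eabi-objcopy is on the PATH after the SDK settings script has been sourced. It builds the same command line shown above for any ELF file and runs it only when elf_to_bin is called.

```python
import subprocess
from pathlib import Path

# Toolchain binary named in this application note.
OBJCOPY = "arm-xilinx-eabi-objcopy"


def objcopy_command(elf_path):
    """Return the argv list that converts an ELF file to a raw binary.

    The output file sits next to the ELF, with a .bin suffix.
    (Hypothetical helper, for illustration only.)
    """
    elf = Path(elf_path)
    return [OBJCOPY, "-v", "-O", "binary", str(elf), str(elf.with_suffix(".bin"))]


def elf_to_bin(elf_path):
    """Run objcopy; raises CalledProcessError if the tool reports a failure."""
    subprocess.run(objcopy_command(elf_path), check=True)


if __name__ == "__main__":
    # Only print the command here; call elf_to_bin() to actually run it.
    print(" ".join(objcopy_command("bora_FSBL.elf")))
```

The same helper applies unchanged to the FreeRTOS application ELF files built later in this note.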

N.B. When the Vivado project is modified, the FPGA bitstream binary must be regenerated with the following command:

python fpga-bit-to-bin.py --flip $PROJ_DIR/bora.runs/bora_run_impl/bora_design_wrapper.bit $PROJ_DIR/bora.runs/bora_run_impl/bora_design_wrapper.bin

FreeRTOS applications

The following sections describe the steps required to configure and build both the HelloWorld and the RPMsg-based examples.

Importing the FreeRTOS repository into the SDK

  • Assuming that a local repository has not been created, clone the remote FreeRTOS git repository:
git clone git@git.dave.eu:dave/bora/freertos.git
  • Enter the git directory
  • Switch to freertos-AMP branch:
git checkout freertos-AMP
  • In the SDK GUI, import the new repository via Xilinx Tools->Repositories:
AN-BELK-001 04.jpg
  • Click New... to add a new repository under Local or Global Repositories, and select the freeRTOS repository directory:
AN-BELK-001 05.jpg
  • Click Rescan Repositories, then Apply and OK
  • At the end of the procedure, applications based on the FreeRTOS operating system can be built

Building Example #1: HelloWorld application

The first example shows basic AMP functionality. On the FreeRTOS side, UART0 is used to implement a simple console. This port is routed via EMIO signals to the pin-strip connector JP17 of the BoraEVB. Since these signals are driven by FPGA Bank #34, the pins use 3.3V levels; an RS232 transceiver or a USB/UART bridge is therefore needed to connect the console to a PC. The signals are available on JP17 as follows:

  • JP17.4 – UART0_TX
  • JP17.6 – UART0_RX

Please follow the steps listed below to build a HelloWorld application that prints a message on UART0 (via EMIO) on FreeRTOS running on Bora core #2.

  • From the SDK GUI, create a new application project:
AN-BELK-001 01.jpg
  • Configure the application settings as shown in the pictures and table below:
AN-BELK-001 07.jpg
AN-BELK-001 08.jpg
    • Project name: helloworld_freeRTOS
    • Hardware Platform: hw_platform_0
    • Processor: ps7_cortexa9_1
    • OS Platform: freertos_zynq
    • Language: C
    • Board Support Package: Create New
    • Type: FreeRTOS Hello World AMP template
  • Click Finish to launch the application build process
  • Create the binary from the application ELF by choosing one of the following options:
    • manually launch the command: arm-xilinx-eabi-objcopy -v -O binary $PROJ_DIR/bora.sdk/SDK/SDK_Export/helloworld_freeRTOS/Debug/helloworld_freeRTOS.elf $PROJ_DIR/bora.sdk/SDK/SDK_Export/helloworld_freeRTOS/Debug/helloworld_freeRTOS.bin
    • configure automatic binary generation on project build: in Project Explorer, right-click on the helloworld_freeRTOS project, select C/C++ Build Settings and add the command arm-xilinx-eabi-objcopy -v -O binary ${ProjName}.elf ${ProjName}.bin to the Post-build steps.

Building Example #2: RPMsg-based application

The procedure for building this application is similar to the one used for the HelloWorld application. The only difference is that the FreeRTOS Latency AMP template must be selected. In this case please note that:

  • the standard Linux infrastructure will be used to load the firmware for the second core
  • Linux will start in SMP mode, running on both cores; then CPU1 will be shut down and the FreeRTOS firmware will be loaded and run.

This example application uses the TTC1 timer to measure IRQ latencies as described in Xilinx UG978. In addition, GPIO0 (pin JP21.16 on BoraEVB) is toggled every time the ISR is invoked.

Once the build process is complete, the executable file in .elf format is generated (we suggest naming it freertos). Creating the .bin file is not required. The project settings are:

  • Project name: RPMsg_freeRTOS
  • Hardware Platform: hw_platform_0
  • Processor: ps7_cortexa9_1
  • OS Platform: freertos_zynq
  • Language: C
  • Board Support Package: Create New
  • Type: FreeRTOS Latency AMP

To run this example, the Linux kernel (1) must be rebuilt too (2). First, copy the FreeRTOS executable in .elf format (freertos) into the firmware directory of the Linux kernel tree (3). Then configure the kernel using bora_amp_defconfig as the configuration file and enter the following command, which changes the default load address of the kernel and builds both the kernel image and the modules:

bash# make UIMAGE_LOADADDR=0x10008000 uImage modules
[...]
  OBJCOPY arch/arm/boot/zImage
  Kernel: arch/arm/boot/zImage is ready
  UIMAGE  arch/arm/boot/uImage
Image Name:   Linux-3.9.0-bora-1.1.0-xilinx-00
Created:      Thu Nov 21 15:55:07 2013
Image Type:   ARM Linux Kernel Image (uncompressed)
Data Size:    3217192 Bytes = 3141.79 kB = 3.07 MB
Load Address: 10008000
Entry Point:  10008000
  Image arch/arm/boot/uImage is ready

The file arch/arm/boot/uImage is the binary image of the kernel that must be used to boot the system. The following kernel modules, resulting from the kernel build procedure, must be copied from the building directory to the root file system (usually into /lib/modules/<kernel version>/kernel, but any other directory can be used):

  LD [M]  drivers/remoteproc/remoteproc.ko
  LD [M]  drivers/remoteproc/zynq_remoteproc.ko
  LD [M]  drivers/rpmsg/rpmsg_freertos_statistic.ko
  LD [M]  drivers/rpmsg/virtio_rpmsg_bus.ko
  LD [M]  drivers/virtio/virtio.ko
  LD [M]  drivers/virtio/virtio_ring.ko
  LD [M]  net/rpmsg/rpmsg_proto.ko

For further details on kernel modules, please refer to this link.

(1) The kernel branch must be bora.

(2) It is assumed that the development environment is already set up as described in BELK Quick Start Guide.

(3) The name of the binary file copied into the firmware directory must be freertos.

Linux Device Tree

The Flattened Device Tree (FDT) is a data structure for describing the hardware in a system (for further information, please refer to http://elinux.org/Device_Tree). Both Example #1 and Example #2 require some modifications to the standard Bora device tree (to initialize the UART0 port and to properly initialize the RPMsg infrastructure, respectively). Please use the kernel branch bora, which already includes the aforementioned patches (for further details, please refer to the arch/arm/boot/dts/bora.dts file and the commit descriptions in the Linux git repository). For detailed instructions on how to build the Linux kernel and the Device Tree, please refer to the BELK Quick Start Guide TBD.

Running the demo applications

Example #1: HelloWorld application

This section describes how to run the FreeRTOS HelloWorld example application on BORA using AMP (Linux + FreeRTOS). Please follow the steps listed below:

  • Place all the binary files into the host tftp directory:
    • Kernel (1): uImage
    • Device Tree: bora.dtb
    • First stage bootloader: bora_FSBL.bin
    • FPGA bitstream: bora_design_wrapper.bin
    • FreeRTOS application: helloworld_freeRTOS.bin
  • Start the Bora system
  • From the U-Boot shell, update the FSBL with the following commands:
run load_fsbl
run update_fsbl
  • Reset the board to reboot with the new FSBL
  • Add the following U-Boot environment variables (2):
setenv addcons 'setenv bootargs ${bootargs} console=${console},115200n8 cma=16M debug maxcpus=${nr_cpus}'
setenv addmem 'setenv bootargs ${bootargs} mem=$(kernel_mem)'
setenv kernel_mem 1008M
setenv nr_cpus 1
setenv net_nfs 'run program_fpga; run load_freertos; run loadk nfsargs addip addcons addmem; bootm ${loadaddr_kern} - ${loadaddr_ftd}'
setenv load_freertos 'tftp ${freertos_addr} ${freertos_file};mw.l 0xFFFFFFF0 ${freertos_addr}'
setenv freertos_addr 0x3F000000
setenv freertos_file bora/BELK/helloworld_freeRTOS.bin
setenv fpga_file BELK/bora_design_wrapper.bin

Boot the system running the following command:

run net_nfs

(1) The kernel must be built with the UIMAGE_LOADADDR 0x8000 option. Please refer to section 3.4.3 of the BELK Quick Start Guide.

(2) program_fpga: loads the FPGA binary from TFTP and programs the bitstream

load_freertos: loads the FreeRTOS application binary from TFTP and writes the application start address for core #2

mem=${kernel_mem}: sets the maximum kernel memory (1008M = 1024M - 16M)

maxcpus=${nr_cpus}: limits the number of cores used by Linux to 1
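To see how these variables combine, the sketch below mimics U-Boot variable substitution in Python and checks the memory arithmetic from note (2). It is an illustration only: the expand helper is hypothetical, the value of console (ttyPS0) is an assumed placeholder because that variable is defined elsewhere in the environment, and the 1024 MiB total is taken from the "1008M = 1024M - 16M" remark, with RAM assumed to start at address 0.

```python
import re

MIB = 1024 * 1024

# Variables set in the steps above; "console" is an assumed placeholder value.
env = {
    "console": "ttyPS0",
    "kernel_mem": "1008M",
    "nr_cpus": "1",
    "freertos_addr": "0x3F000000",
}

# Matches both ${name} and $(name), the two forms used in this note.
_VAR = re.compile(r"\$\{(\w+)\}|\$\((\w+)\)")


def expand(text, variables):
    """Replace ${name} and $(name) with their values (unknown names become empty)."""
    def lookup(match):
        name = match.group(1) or match.group(2)
        return variables.get(name, "")
    return _VAR.sub(lookup, text)


# Fragments appended to bootargs by addcons and addmem.
addcons = "console=${console},115200n8 cma=16M debug maxcpus=${nr_cpus}"
addmem = "mem=$(kernel_mem)"
bootargs = expand(addcons, env) + " " + expand(addmem, env)
print(bootargs)

# Linux is limited to the first 1008 MiB; FreeRTOS is loaded right above it.
linux_top = 1008 * MIB
freertos_base = int(env["freertos_addr"], 16)
freertos_room_mib = (1024 * MIB - freertos_base) // MIB
print(hex(linux_top), freertos_base == linux_top, freertos_room_mib)
```

The last line confirms that freertos_addr (0x3F000000) coincides with the end of the memory left to Linux, leaving the top 16 MiB for the FreeRTOS image.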

Example #2: RPMsg-based application

As stated before, this example shows a more sophisticated approach that allows for:

  • using a standardized communication channel between the two cores
  • exploiting a standardized mechanism to load the firmware of second core.

The example performs IRQ latency measurements on the FreeRTOS side by using a hardware timer. These measurements are collected by the counterpart application running on the Linux side and shown on the console. Please follow the steps listed below:

  • Place all the binary files into the host tftp directory:
    • Kernel: uImage
    • Device Tree: bora.dtb
    • First stage bootloader: bora_FSBL.bin
    • FPGA bitstream: bora_design_wrapper.bin
    • FreeRTOS application: freertos
  • Start the Bora system
  • From the U-Boot shell, update the FSBL with the following commands:
run load_fsbl
run update_fsbl
  • Reset the board to reboot with the new FSBL
  • Add the following U-Boot environment variables (1) (2):
setenv addcons 'setenv bootargs ${bootargs} console=${console},115200n8 cma=16M debug'
setenv addmem 'setenv bootargs ${bootargs} mem=$(kernel_mem)'
setenv kernel_mem 496M
setenv net_nfs 'run program_fpga; run loadk nfsargs addip addcons addmem; bootm ${loadaddr_kern} - ${loadaddr_ftd}'
setenv freertos_addr 0x3F000000
setenv fpga_file BELK/bora_design_wrapper.bin
  • Boot the system running the following command:

run net_nfs

When booting, the Linux kernel will print out the following message to indicate it has been relocated to address 0x10000000:

[    0.000000] Machine: Xilinx Zynq Platform, model: Bora
[    0.000000] Change memory bank to 10000000-2fffffff
[    0.000000] cma: CMA: reserved 16 MiB at 2f000000
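The figures in this log can be cross-checked with a few lines of Python. The sketch below is only arithmetic on the addresses printed above and on the UIMAGE_LOADADDR value used for the kernel build; the variable names are illustrative, and no claim is made about how the kernel derives these values internally.

```python
MIB = 1024 * 1024

bank_start = 0x10000000   # from "Change memory bank to 10000000-2fffffff"
bank_end = 0x2FFFFFFF
cma_base = 0x2F000000     # from "cma: CMA: reserved 16 MiB at 2f000000"
load_addr = 0x10008000    # UIMAGE_LOADADDR passed to make

bank_size_mib = (bank_end - bank_start + 1) // MIB   # size of the relocated bank
cma_size_mib = (bank_end + 1 - cma_base) // MIB      # CMA region at the top of the bank
kernel_offset = load_addr - bank_start               # kernel offset inside the bank

print(bank_size_mib, cma_size_mib, hex(kernel_offset))
```

The relocated bank spans 512 MiB, the CMA region occupies its top 16 MiB, and the kernel image is loaded 0x8000 bytes above the start of the bank.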

To start the example, please enter the following commands on Linux side to load the required modules:

insmod  drivers/virtio/virtio.ko
insmod  drivers/virtio/virtio_ring.ko
insmod  drivers/rpmsg/virtio_rpmsg_bus.ko
insmod  net/rpmsg/rpmsg_proto.ko
insmod  drivers/remoteproc/remoteproc.ko
insmod  drivers/remoteproc/zynq_remoteproc.ko
insmod  drivers/rpmsg/rpmsg_freertos_statistic.ko
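When the modules have been copied into the root file system, the relative paths above need a prefix. The following Python sketch only generates the command lines, in the same order as listed above; the insmod_commands helper is hypothetical and the kernel version in the example path is a placeholder, not a value taken from a real target.

```python
import posixpath

# Modules in the load order used in this application note.
MODULES = [
    "drivers/virtio/virtio.ko",
    "drivers/virtio/virtio_ring.ko",
    "drivers/rpmsg/virtio_rpmsg_bus.ko",
    "net/rpmsg/rpmsg_proto.ko",
    "drivers/remoteproc/remoteproc.ko",
    "drivers/remoteproc/zynq_remoteproc.ko",
    "drivers/rpmsg/rpmsg_freertos_statistic.ko",
]


def insmod_commands(module_root):
    """Return one insmod command per module, keeping the order above."""
    return ["insmod " + posixpath.join(module_root, name) for name in MODULES]


if __name__ == "__main__":
    # "<kernel version>" is a placeholder, as in the text of this note.
    for command in insmod_commands("/lib/modules/<kernel version>/kernel"):
        print(command)
```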

The Linux kernel will print these messages, indicating that communication between the two cores has been established:

[ 17.966158] NET: Registered protocol family 41
[ 18.036698] CPU1: shutdown
[ 18.045287] remoteproc0: 0.remoteproc-test is available
[ 18.050522] remoteproc0: Note: remoteproc is still under development and considered experimental.
[ 18.059554] remoteproc0: THE BINARY FORMAT IS NOT YET FINALIZED, and backward compatibility isn't yet guaranteed.
[ 18.077341] remoteproc0: powering up 0.remoteproc-test
[ 18.082668] remoteproc0: Booting fw image freertos, size 2357682
[ 18.103607] remoteproc0: remote processor 0.remoteproc-test is now up
[ 18.113339] virtio_rpmsg_bus virtio0: rpmsg host is online
[ 18.118795] remoteproc0: registered virtio0 (type 7)
[ 18.124417] virtio_rpmsg_bus virtio0: creating channel rpmsg-timer-statistic addr 0x50
[ 18.151586] rpmsg_freertos_statistic rpmsg0: new channel: 0x400 -> 0x50!

Then run the latencystat application as shown below. The typical output will look like this:

root@bora:~# ./latencystat -b
Linux FreeRTOS AMP Demo.
   0: Command 0 ACKed
   1: Command 1 ACKed
Waiting for samples...
   2: Command 2 ACKed
   3: Command 3 ACKed
   4: Command 4 ACKed
-----------------------------------------------------------
Histogram Bucket Values:
        Bucket 323 ns (36 ticks) had 38 frequency
        Bucket 341 ns (38 ticks) had 299 frequency
        Bucket 512 ns (57 ticks) had 1 frequency
        Bucket 746 ns (83 ticks) had 1 frequency
-----------------------------------------------------------
Histogram Data:
        min: 323 ns (36 ticks)
        avg: 332 ns (37 ticks)
        max: 746 ns (83 ticks)
        out of range: 0
        total samples: 339
-----------------------------------------------------------
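The summary lines can be reproduced from the bucket list. The following Python sketch is an illustration, not the latencystat source: the tick frequency of 111111115 Hz is an assumption inferred from the ns/tick pairs in the sample output above (it is not stated in this note), and the truncating integer arithmetic is likewise assumed because it matches the printed values.

```python
# Assumed timer input clock in Hz, inferred from the sample output (not from the source).
TICK_HZ = 111_111_115

# (ticks, number of samples) pairs copied from the histogram above.
buckets = [(36, 38), (38, 299), (57, 1), (83, 1)]


def ticks_to_ns(ticks):
    """Convert timer ticks to nanoseconds, truncating toward zero."""
    return ticks * 1_000_000_000 // TICK_HZ


total = sum(count for _, count in buckets)
min_ticks = min(ticks for ticks, _ in buckets)
max_ticks = max(ticks for ticks, _ in buckets)
# Sample-weighted mean, truncated to whole ticks.
avg_ticks = sum(ticks * count for ticks, count in buckets) // total

for label, ticks in (("min", min_ticks), ("avg", avg_ticks), ("max", max_ticks)):
    print(f"{label}: {ticks_to_ns(ticks)} ns ({ticks} ticks)")
print(f"total samples: {total}")
```

Under these assumptions the script yields the same min, avg, max and total sample count as the output above, which is a quick sanity check when comparing runs taken under different CPU loads.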

This application is useful for evaluating how the CPU load on the first core affects IRQ latency. If the latency does not satisfy real-time requirements, it may be necessary to adjust the arbitration priorities of the processor's interconnect subsystem. For further details, please refer to the Interconnect chapter of the Zynq Technical Reference Manual.

N.B. Prior to launching the latencystat application, make sure that the governor is set to performance with the following command:

echo performance > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
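If this check is to be automated, for example in a test script on the target, the same sysfs file can be read back after writing it. The sketch below is a hypothetical Python helper: it assumes that a Python interpreter is available on the root file system (which this note does not state) and that the script runs as root; the function name set_governor is illustrative.

```python
from pathlib import Path

# sysfs file used by the command above (cpu0 is the core running Linux).
GOVERNOR_FILE = Path("/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor")


def set_governor(name, governor_file=GOVERNOR_FILE):
    """Write the governor name and return the value read back from the file."""
    governor_file = Path(governor_file)
    governor_file.write_text(name + "\n")
    return governor_file.read_text().strip()


if __name__ == "__main__":
    active = set_governor("performance")
    print("active governor:", active)
```

Reading the value back makes it visible when the kernel rejects the request, for instance because the requested governor is not built in.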

(1) program_fpga: Loads FPGA binary from TFTP and programs the bitstream

load_freertos: Loads freertos application binary from TFTP and writes application start address for core #2

(2) Please note that, using the RPMsg mechanism, it's not required to set the maxcpus=${nr_cpus} variable.

Advanced debugging techniques for AMP Linux+FreeRTOS configuration

Introduction

When working with complex real-time configurations such as AMP Linux+FreeRTOS, debugging requirements increase dramatically. This chapter – written in collaboration with Lauterbach SRL – shows how these issues can be tackled with Lauterbach TRACE32 ® debugger (1). The following picture shows the BoraEVB connected to Lauterbach PowerDebug Interface/USB3 via J18 connector. By default, the board is configured to chain Xilinx PL TAP and ARM DAP (please refer to chapter JTAG and DAP Subsystem of Zynq Technical Reference Manual for more details).

BoraEVB connected to Lauterbach PowerDebug Interface/USB3

The following sections describe in detail how to configure TRACE32® debugger to support debug of Linux running on the first Zynq core, and FreeRTOS, running on the second Zynq core.

(1) The techniques described in this chapter apply to the Example #1: HelloWorld FreeRTOS application (please refer to section TBD).

Prerequisites

  • LA-3500 Power Debug USB3 or LA-7705 Power Debug Ethernet or LA-7699 PowerDebug II
  • LA-7843 JTAG Debugger for Cortex-A/-R
  • LA-7960X License for Multicore Debugging
  • TRACE32 PowerView for ARM (Release: Feb 2013, Software Version: R.2013.02.000045901)
  • Optional: LA-7970X Trace License for ARM (Debug Cable)

For a general introduction to the debug features provided by TRACE32 tools, please refer to:

  • “Debugger Basics – Training” manual (training_debugger.pdf)
  • “Training HLL Debugging” manual (training_hll.pdf)

TRACE32 configuration

In an AMP configuration, each core runs its own code, fixed at compile time. Each CPU interoperates with the other processing units, exchanging data through dedicated channels (for example, shared memory buffers or peripheral units). Lauterbach supports these architectures with separate TRACE32 instances, each one connected to a single core, in a “core view” configuration where the debug focus is on a single processor. However, as the cores do not work independently but perform the application task together and in parallel, it is possible to start and stop all the cores simultaneously. This is the only way to test the interaction between the cores and to monitor and control the entire application. Moreover, as each core runs a separate part of the application, the majority of the symbol and debug information is assigned exclusively to the corresponding core. In the following paragraphs, the basic TRACE32 multicore configuration for a single device is introduced. For more details, please refer to “ICD Debugger User's Guide” (debugger_user.pdf).

Multicore configuration

Lauterbach provides reference scripts for configuring the TRACE32 application. The first GUI must be started manually and must register itself to share a common JTAG handler with the other TRACE32 applications. This is done by setting the option CORE= in the configuration file (default file name: config.t32).

; within config file of first core
PBI=
USB
CORE=1

or:

; within config file of first core
PBI=
NET
NODE=<IP_address>
PACKLEN=1024
CORE=1

Note, however, that the setting that defines which core is actually addressed is applied later on.

Multicore synchronization

To use the start/stop synchronization between different core debuggers, the INTERCOM port settings are necessary. This is done by assigning predefined port numbers to each TRACE32 application in the configuration file (option PORT=).

; within config file of first core
IC=NETASSIST
PORT=20001

; within config file of second core
IC=NETASSIST
PORT=20002

Startup scripts

Instead of a dedicated configuration file for each TRACE32 instance, it is possible to use just one generic template file for all cores. The particular settings are passed as parameters. This is shown in the reference script:

amp_start_core0.bat

which refers to the configuration file:

amp_config.t32

The batch file automatically starts this startup script:

amp_demo.cmm

After booting the first TRACE32 GUI, the second GUI will be started automatically by the startup script. See the reference script:

amp_demo_start_core1.cmm

For more details about the PRACTICE batch language, please refer to:

  • “Training PRACTICE” manual (training_practice.pdf)
  • “PRACTICE Script Language User's Guide” (practice_user.pdf)
  • “PRACTICE Script Language Reference Guide” (practice_ref.pdf)

PRACTICE macros for multiple TRACE32

The startup script, started automatically by the first TRACE32 application, is able to configure the whole debug system, providing PRACTICE commands both to the current instance of the TRACE32 application and to the second instance. It is also possible to deliver the same PRACTICE command to both instances with a single command line. The command redirection is possible using the INTERCOM feature. Typically some PRACTICE macros can be defined for this purpose.

&core0=""  ;only to improve readability
&core1="intercom localhost:&intercomport_core1"
&both="GOSUB intercom_both "

where:

intercom_both:
 LOCAL &param
 ENTRY %Line &param
 &core0 &param
 &core1 &param
RETURN

In this way, all CPU-specific configuration commands can be performed in the same way for each TRACE32 application, or distinguishing between different configurations. For example:

&both SYStem.RESet

or:

&core0 SYStem.CPU ZYNQ-7000CORE0
&core1 SYStem.CPU ZYNQ-7000CORE1

The SYnch command

The synchronization between different TRACE32 applications is done by the SYnch command group, which serves the following purposes:

  • to establish a start/stop synchronization between the cores controlled by different TRACE32 instances;
  • to allow concurrent assembler single steps between the cores controlled by different TRACE32 instances;
  • to allow synchronous system mode changes between the cores controlled by different TRACE32 instances.

Related documents and additional resources

Document Location
DAVE Embedded Systems Developers Wiki http://wiki.dave.eu/index.php/Main_Page
Zynq-7000 Technical Reference Manual http://www.xilinx.com/support/documentation/user_guides/ug585-Zynq-7000-TRM.pdf
Bora main page on DAVE Embedded Systems Developers Wiki http://wiki.dave.eu/index.php/Category:Bora
Bora Hardware Manual http://www.dave.eu/sites/default/files/files/bora-hm.pdf
BoraEVB page on DAVE Embedded Systems Developers Wiki http://wiki.dave.eu/index.php/BoraEVB
Vivado Design Suite User Guide: Embedded Processor Hardware Design http://www.xilinx.com/support/documentation/sw_manuals/xilinx2013_2/ug898-vivado-embedded-design.pdf
Zynq-7000 All Programmable SoC: Concepts, Tools, and Techniques (CTT) http://www.xilinx.com/support/documentation/sw_manuals/xilinx14_6/ug873-zynq-ctt.pdf
Zynq-7000 All Programmable SoC Software Developers Guide http://www.xilinx.com/support/documentation/user_guides/ug821-zynq-7000-swdev.pdf
BELK Quick Start Guide Provided with BELK
Xilinx UG978 (v2013.04) April 22, 2013 http://www.xilinx.com/support/documentation/sw_manuals/petalinux2013_04/ug978-petalinux-zynq-amp.pdf
Multi-OS Support (AMP & Hypervisor) http://www.wiki.xilinx.com/Multi-OS+Support+%28AMP+%26+Hypervisor%29