{{Applies To Bora}}
{{Applies To BoraX}}
{{AppliesToBORA_TN}}
{{AppliesToBORA_Xpress_TN}}
{{InfoBoxBottom}}
|[[Bora_Embedded_Linux_Kit_(BELK)#BELK_software_components|3.0.0]]
|Internal draft
|-
|1.0.0
|November 2015
|[[Bora_Embedded_Linux_Kit_(BELK)#BELK_software_components|2.2.0, 3.0.0]]
|First public release
|-
|}
[[AN-BELK-001:_Asymmetric_Multiprocessing_(AMP)_on_Bora_–_Linux_FreeRTOS|Traditional AMP]]<ref name="AN-BELK-001"></ref> configuration satisfies [[#REQ1|REQ1]] through [[#REQ4|REQ4]]. [[#REQ5|REQ5]] through [[#REQ8|REQ8]], however, are not satisfied. As for integrity, for example, an application with ''root'' privileges could access memory regions that are supposed to be accessed exclusively by code executed in W1. This may lead to unpredictable behavior and to potentially catastrophic consequences. This is where TrustZone technology comes to help: it establishes a sort of barrier between the two worlds and prevents W2 code from making unauthorized accesses to certain regions of the processor's address space.
 
 
 
TBD: add an analysis of the solutions based on a large microprocessor plus a microcontroller and of the big.LITTLE ones? They are discussed in Daniel's thesis.
==TrustZone-based approach==
This section describes in detail the solution that has been implemented by DAVE Embedded Systems to overcome the limitations of the basic AMP configuration and to satisfy all the requirements listed in section [[#Limitations of traditional configurations|''Limitations of traditional configurations'']].
===Overview===
 
The major difference with respect to the traditional AMP configuration is the use of a software monitor, specifically a customized version of TOPPERS SafeG
<ref name="TOPPERS SafeG home (en)">''TOPPERS SafeG home page (English)'', https://www.toppers.jp/en/safeg.html</ref>
<ref name="TOPPERS SafeG home (jp)">''TOPPERS SafeG home page (Japanese)'', https://www.toppers.jp/safeg.html</ref>
<ref name="TOPPERS SafeG">[http://www.wiki.xilinx.com/Multi-OS+Support+%28AMP+%26+Hypervisor%29#Asymmetric%20Multi%20Processing%20%28AMP%29%20Configurations-Open%20Source%20or%20Freely%20Available%20Solutions-TOPPERS%20SafeG%20%28Nagoya%20University%29 ''TOPPERS SafeG (Nagoya University)'']</ref>.
 
[[File:Safeg-arch-english.png|thumb|center|400px|Nagoya University TOPPERS SafeG architecture]]
 
As shown in the picture, the monitor can be viewed as a software layer that lies between Trust/Non-trust worlds and underlying hardware. The monitor is responsible for:
* enabling and initializing TrustZone in order to protect memory regions that must not be accessible by the Non-secure world
* setting up the data structures and exception handlers needed for context switch and Secure Monitor Call (SMC)
* starting the trusted OS.
Later, once the trusted OS is ready, it issues a specific SMC that performs the context switch and starts the non-trusted OS.

About the operating systems, Linux has been chosen for the Non-trust world, while [http://www.freertos.org FreeRTOS] has been selected for the Trust world. At the time of this design, the Linux/FreeRTOS combination has proven to be the most appealing for the majority of applications that this solution addresses. Nevertheless, different combinations are possible{{efn|For example TOPPERS project makes use of [http://www.toppers.jp/en/index.html different RTOSes].}}.

About the multi-processing scheme, the two Zynq cores are statically assigned to the two worlds (core0 to Linux, core1 to FreeRTOS). This allows to:
* simplify the whole system implementation
* reduce RTOS latency (because a ''non-trusted to trusted'' context switch is never needed).

From the memory point of view:
* the main memory is statically partitioned (by the monitor) into three sections:
** a non-trusted private area (protected from trusted access at MMU level only)
** a trusted private area (protected from non-trusted access at TrustZone level)
** a shared memory area, marked as non-trusted.

These choices lead to the configuration depicted [[#L2 cache usage|here]].
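The access rules implied by this static partitioning can be summarized with a small lookup table. The following is only a minimal sketch in C: the base addresses, the sizes (a hypothetical 1 GiB SDRAM) and all identifiers are assumptions made for illustration and are not taken from the actual monitor code. It models the policy, not the hardware mechanism (MMU or TrustZone) that enforces it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* W1 = trusted world (FreeRTOS), W2 = non-trusted world (Linux). */
typedef enum { WORLD_TRUSTED, WORLD_NON_TRUSTED } world_t;

typedef enum { AREA_NT_PRIVATE, AREA_T_PRIVATE, AREA_SHARED } area_t;

typedef struct {
    area_t   kind;
    uint32_t base; /* hypothetical physical base address */
    uint32_t size; /* hypothetical size in bytes */
} mem_area_t;

/* Hypothetical layout: the real addresses are not given in this document. */
static const mem_area_t mem_map[] = {
    { AREA_NT_PRIVATE, 0x00000000u, 0x3E000000u }, /* Linux private area  */
    { AREA_T_PRIVATE,  0x3E000000u, 0x01000000u }, /* FreeRTOS private    */
    { AREA_SHARED,     0x3F000000u, 0x01000000u }, /* shared, non-trusted */
};

/* Return the area containing addr, or NULL if addr is outside the map. */
const mem_area_t *find_area(uint32_t addr)
{
    for (size_t i = 0; i < sizeof mem_map / sizeof mem_map[0]; i++) {
        if (addr >= mem_map[i].base && addr - mem_map[i].base < mem_map[i].size)
            return &mem_map[i];
    }
    return NULL;
}

/* Policy check: may code running in world 'who' access address 'addr'? */
bool access_allowed(world_t who, uint32_t addr)
{
    assert(who == WORLD_TRUSTED || who == WORLD_NON_TRUSTED);
    const mem_area_t *area = find_area(addr);
    if (area == NULL)
        return false;
    switch (area->kind) {
    case AREA_SHARED:     return true;                     /* both worlds      */
    case AREA_T_PRIVATE:  return who == WORLD_TRUSTED;     /* TrustZone level  */
    case AREA_NT_PRIVATE: return who == WORLD_NON_TRUSTED; /* MMU level only   */
    }
    return false;
}
```

With this hypothetical layout, <code>access_allowed(WORLD_NON_TRUSTED, 0x3E000000u)</code> is false, while the same address is accessible from the trusted world.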
===System memory partitioning===
===Boot process===
The boot process consists of several stages that are detailed in the following list.
# reset signal is deasserted and core #0's Program Counter is set to reset vector address
# The first piece of code executed by the processor is BootROM. Depending on bootstrap configuration pins, First Stage Boot Loader (FSBL in the rest of the document) image is retrieved from a specific non-volatile memory by BootROM and stored into on-chip memory (OCM).
# monitor code
#* initializes TrustZone subsystem
#* enables both cores, setting up all the data structures required by TrustZone
#* gives trusted code the control of the machine.
# FreeRTOS kernel is initialized and real-time tasks are started. <u>Under the control of the tasks running on top of the RTOS kernel</u>, the non-trusted (NT for short) code is started{{efn|This is done via a Secure Monitor Call (referred to as SMC in the rest of the document) that is handled by the monitor.}}.
===Inter-world communication===
Even if it is technically possible to implement a system where W1 and W2 are completely isolated, the majority of real applications need a communication mechanism between the two worlds{{efn|From a different point of view, this is a sort of break in the barrier between the two worlds. As such it poses non-trivial issues in terms of integrity and timeliness in the Trust world. This matter will be covered in more detail in following sections.}}.
Several options are available to implement such a mechanism, each with different pros and cons. An exhaustive comparison of all of them is beyond the scope of this paper. Nevertheless some aspects will be briefly discussed and some useful links will be provided to study this notable subject in more depth.
# acceptance into the mainline Linux kernel
# possibility to customize the implementation in order to control the degree of isolation between the two worlds.
The first criterion guarantees future maintainability on the Linux side. Generally speaking, it is expected that the GPOS needs relatively frequent maintenance activities in order to add new functionalities or fix vulnerabilities and bugs. In contrast, the RTOS side is typically more stable and easier to maintain. Therefore, selecting a communication mechanism that is included in the mainline Linux kernel is preferable from the point of view of overall system maintainability.
The second criterion is very important because, as discussed in following sections, it is absolutely crucial that the communication channel - <u>which directly affects the degree of isolation between W1 and W2</u> - is flexible enough to be adjusted to application-specific requirements as needed.
Different solutions have been evaluated, including but not limited to
''OP-TEE''<ref name="OP-TEE">''Open Portable Trusted Execution Environment'', https://www.linaro.org/blog/core-dump/op-tee-open-source-security-mass-market/</ref>,
''dualoscom''<ref name="Sangorrin's thesis">Daniel Sangorrin Lopez, ''Advanced integration techniques for highly reliable dual-os embedded systems'', Nagoya University (Japan), 27th July 2012, http://ir.nul.nagoya-u.ac.jp/jspui/bitstream/2237/16907/1/k9888.pdf</ref>,
''RPMsg''<ref name="RPMsg-TI">http://omappedia.org/wiki/Category:RPMsg</ref>, <ref name="RPMsg-linux">https://lwn.net/Articles/464391/</ref>
and ''OpenAMP''<ref name="OpenAMP">https://github.com/OpenAMP/open-amp</ref>. The choice has fallen on ''RPMsg'', which has been considered the best compromise among the available options. It should be recalled that this choice is reversible, in the sense that if application-specific requirements cannot be met by ''RPMsg'', <u>it can be replaced by a different communication scheme</u>.
===L2 cache management===
 
As specified by [[#REQ8|REQ8]], L2 cache must be able to be enabled (at least) on W2 side. As stated in
<ref name="XAPP1078"> John McDougall, ''XAPP1078 (v1.0) Simple AMP Running Linux and Bare-Metal System on Both Zynq SoC Processors'', 14th February 2013</ref> and
<ref name="XAPP1079">John McDougall, ''XAPP1079 (v1.0.1) Simple AMP: Bare-Metal System Running on Both Cortex-A9 Processors'', 24th January 2014</ref>,
L2 cache - in contrast to L1 - is a shared resource in the Zynq implementation. As such, in case both cores have to use it, specific techniques have to be implemented to handle it properly in a dual-OS AMP configuration.

In principle the approach that has been adopted allows different L2 cache management strategies to be implemented. The actual configuration that has been used to conduct the tests described in [[#Characterization and performance tests|this section]] assigns the whole L2 cache to the W2 world, as depicted in the following picture.

<span id="L2 cache usage"></span>[[File:Bora-wp001 01 v2-L2cache.png|thumb|center|400px|L2 cache usage]]

This configuration has been chosen for several reasons:
* Linux performance is not affected when accessing its private memory
* FreeRTOS determinism is guaranteed, because it uses only non-shared resources (excluding the SoC L3 bus)
* access to shared memory is uncached or only L1 cached (the latter forces the use of SCU SMP mode).

For the sake of completeness, it is recalled that it is also possible to enable L2 cache in W1 too, without breaking [[#REQ5|REQ5]], because the ARM PL310 L2 cache controller supports the TrustZone technology and does not allow the non-trusted OS (W2) to access trusted OS (W1) cached data. However, enabling L2 cache for W1 may improve its computational performance but, in general, reduces real-time determinism as well. In case L2 cache unavailability on the W1 side is unacceptable, on-chip memory (OCM) can help to mitigate this issue.
====Leveraging OCM====
The use of L1 and L2 is strictly related to the performance of the ARM cores. There is also another precious resource that can increase core performance without affecting determinism, which is the OCM.
In SoCs, OCM is usually tightly coupled to the ARM core, providing access performance that is comparable to L2 cache. This allows it to be leveraged to compensate for the unavailability of L2 cache.

The most common scenario is to:
* restrict OCM access to W1 only (again, this can be done with TrustZone)
* move the most ''latency sensitive'' code - usually vectors and ISRs - inside its memory ranges
* prevent this memory range from being L1-cached (because this usually does not increase the performance significantly but may waste precious L1 cache lines).
==Characterization and performance tests==
Some basic tests have been conducted to characterize the system configured as described above. The figure the tests focus on is the interrupt latency in the W1 realm. This value has been measured under different system load conditions to verify if and how the non-real-time world may influence the real-time world.

About Linux side, two load conditions have been considered:
* idle
* Google stressapptest (SAT for short)<ref name="SAT">https://code.google.com/p/stressapptest/</ref> running to stress SDRAM memory and SD I/O.

About RTOS side:
* idle
* memory intensive task; two subcases, in turn, have been considered to evaluate the impact of the L2 cache unavailability. The RTOS memory task accesses an array in main memory of two different sizes:
** the smaller is half of L1 size (16KiB)
** the larger is 4 times the L1 size (128KiB)

The main task on the RTOS side is the ''latencystat'' demo provided with [[AN-BELK-001:_Asymmetric_Multiprocessing_(AMP)_on_Bora_–_Linux_FreeRTOS|AN-BELK-001]]<ref name="AN-BELK-001"></ref>, which works as follows:
* the PS TTC timer is programmed as free-running, triggering an interrupt on overflow
* inside the overflow ISR the TTC counter is read: this counter reports the number of ticks elapsed between the event (overflow) and the handler itself, in other words the interrupt latency
* after a while the TTC is reprogrammed and the interrupt is enabled again, to trigger another event
* those ''latency counters'' are collected into an array
* the Linux-side application, by default after 10 seconds, stops the RTOS task, which sends the array data over RPMsg
* the Linux-side application collects the data and displays the minimum, maximum and average latency measured

The following table summarizes the test results (all timings are given in ''ns'').

{| class="wikitable"
|-
! rowspan="2"| Latency !! Linux idle !! colspan="3" | Linux SAT
|-
! RTOS idle !! RTOS idle !! RTOS 16k !! RTOS 128k
|-
| min || align="right"| 287 || align="right"| 287 || align="right"| 287 || align="right"| 1268
|-
| avg || align="right"| 287 || align="right"| 296 || align="right"| 305 || align="right"| 2024
|-
| max || align="right"| 548 || align="right"| 539 || align="right"| 575 || align="right"| 3050
|}
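The post-processing step that turns the collected ''latency counters'' into minimum, maximum and average figures can be sketched as follows. This is a minimal illustration in C, not the code of the ''latencystat'' demo: the function names, the structure and the <code>timer_hz</code> parameter (the TTC input clock frequency, which is not stated in this document) are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint64_t min_ns;
    uint64_t max_ns;
    double   avg_ns;
} latency_stats_t;

/* Convert timer ticks to nanoseconds for a timer clocked at timer_hz. */
uint64_t ticks_to_ns(uint32_t ticks, uint32_t timer_hz)
{
    assert(timer_hz != 0);
    return ((uint64_t)ticks * 1000000000ull) / timer_hz;
}

/* Summarize n latency samples, each expressed in timer ticks. */
latency_stats_t latency_summarize(const uint32_t *ticks, size_t n,
                                  uint32_t timer_hz)
{
    assert(ticks != NULL && n > 0);

    uint32_t min = ticks[0];
    uint32_t max = ticks[0];
    uint64_t sum = 0;

    for (size_t i = 0; i < n; i++) {
        if (ticks[i] < min) min = ticks[i];
        if (ticks[i] > max) max = ticks[i];
        sum += ticks[i];
    }

    latency_stats_t stats;
    stats.min_ns = ticks_to_ns(min, timer_hz);
    stats.max_ns = ticks_to_ns(max, timer_hz);
    /* The average is computed in floating point to keep fractional ticks. */
    stats.avg_ns = ((double)sum / (double)n) * 1e9 / (double)timer_hz;
    return stats;
}
```

For example, with a hypothetical 100 MHz timer clock (10 ns per tick), the samples 30, 29 and 55 ticks give a minimum of 290 ns, a maximum of 550 ns and an average of 380 ns.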
==Conclusions and future work==
The following conclusions can be drawn from the test results:
* The real-time behavior of the W1 realm is preserved in any condition, since Linux activity on CPU/memory/SD has virtually no influence on RTOS latency.
* Moderate RTOS activity has no impact on latency.
* As expected, in case intensive memory activity is performed on the RTOS side, data/instruction cache misses increase significantly, resulting in higher latency.
===Future work===
Future work will first focus on an additional feature that has not been included in the requirement list but that is undoubtedly useful in several applications. We are referring to the possibility of performing a complete reboot of the GPOS under the control of the RTOS, while the RTOS keeps operating normally. For instance this can be exploited when the RTOS needs to work as a software watchdog for W2 activity: in case no activity is detected for a certain period of time, the GPOS can be shut down and rebooted.
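The timeout logic of such a software watchdog could look like the following minimal sketch in C. It is purely illustrative: this feature is not implemented in the solution described here, and all names, the millisecond time base and the way the GPOS signals its activity are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* State of a hypothetical RTOS-side watchdog supervising the GPOS (W2). */
typedef struct {
    uint32_t last_heartbeat_ms; /* time of the last sign of W2 activity */
    uint32_t timeout_ms;        /* maximum allowed silence              */
} gpos_watchdog_t;

void gpos_watchdog_init(gpos_watchdog_t *wd, uint32_t now_ms, uint32_t timeout_ms)
{
    assert(wd != NULL && timeout_ms > 0);
    wd->last_heartbeat_ms = now_ms;
    wd->timeout_ms = timeout_ms;
}

/* To be called whenever activity from W2 is detected. */
void gpos_watchdog_kick(gpos_watchdog_t *wd, uint32_t now_ms)
{
    assert(wd != NULL);
    wd->last_heartbeat_ms = now_ms;
}

/* True when W2 has been silent for at least timeout_ms.
 * Unsigned subtraction keeps the result correct across counter wrap-around. */
bool gpos_watchdog_expired(const gpos_watchdog_t *wd, uint32_t now_ms)
{
    assert(wd != NULL);
    return (uint32_t)(now_ms - wd->last_heartbeat_ms) >= wd->timeout_ms;
}
```

A periodic RTOS task would call <code>gpos_watchdog_expired()</code> and, when it returns true, trigger the GPOS reboot through whatever mechanism the monitor provides.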
Another aspect that should be investigated in more depth refers to the effects of the communication between W1 and W2 on the IRQ latency and the integrity of the real-time world. This matter is strictly related to the degree of isolation between the two worlds. In this work a strong-isolation approach has been adopted, meaning that
* no data is exchanged during the execution of the IRQ latency measurement
* it has been implicitly assumed that data sent from W2 to W1 cannot compromise the integrity of the trust domain.
These assumptions may not hold in real applications; however, specific techniques can be implemented to manage these situations (see for example <ref name="Sangorrin's thesis"></ref> and <ref name="PreventingInterruptOverload">J. Regehr, U. Duongsaa, ''Preventing Interrupt Overload'', 2nd May 2005, http://www.cs.utah.edu/~regehr/papers/lctes05/regehr-lctes05.pdf</ref>).
-----
{{notelist}}
 
==References==
{{reflist}}