==== Embedded environment ====
These platforms are both based on system-on-chips (SoCs) that are expressly designed to address industrial applications. The NXP i.MX8M Plus and the Xilinx Zynq UltraScale+ are indeed components that fit very well the demanding and challenging requirements our customers must meet in order to build successful products. For instance, these SoCs not only provide the computational resources to implement complex architectures, but also have wide operating temperature ranges and long-term availability, as they belong to specific longevity programs issued by the respective manufacturers.
At its core, the ZCU104 integrates an array of processing elements including a quad-core ARM Cortex-A53 Application Processing Unit (APU)<ref>Microprocessor that combines both traditional CPU and GPU cores onto a single chip.</ref>, which is based on the ARM64 architecture, and a dual-core ARM Cortex-R5 Real-Time Processing Unit (RPU)<ref>Dedicated hardware component or processor designed to execute tasks or operations with strict timing constraints. RPUs are commonly employed in systems that require immediate and predictable responses, such as embedded systems, robotics, and real-time control applications.</ref>. This processing power allows for efficient and parallel execution of complex [[ML-TN-001_-_AI_at_the_edge:_comparison_of_different_embedded_platforms_-_Part_1|ML inference algorithms]], making it an ideal choice for applications that demand real-time processing capabilities. It also features a Mali-400 MP2 Graphics Processing Unit, 16nm FinFET+ Programmable Logic, and 2 GB of DDR4 RAM. The peculiarity of this SoC that distinguishes it from competitors’ products is that it integrates a Field Programmable Gate Array (FPGA)<ref>Re-configurable hardware device that allows users to implement custom digital circuits and functions by programming its internal logic gates and interconnections.</ref>, which is tightly coupled to the ARM processors. The ZCU104 also boasts an array of high-speed interfaces, such as Gigabit Ethernet, USB 3.0, and DisplayPort, enabling seamless connectivity with external devices and peripherals.

On the other side, at the heart of the SBC ORCA lies the NXP i.MX8M Plus SoC, featuring a quad-core Arm Cortex-A53 CPU, which is based on the ARM64 architecture, and a powerful Neural Processing Unit (NPU). The inclusion of the NPU enhances the platform’s ability to accelerate ML workloads, providing significant speed-up and power efficiency for [[ML-TN-001_-_AI_at_the_edge:_comparison_of_different_embedded_platforms_-_Part_1|neural network-based inference algorithms]]. The SBC ORCA is equipped with ample memory resources, including 6 GB of LPDDR4 RAM, to accommodate large datasets and complex ML models. It also offers a variety of high-speed interfaces, such as Gigabit Ethernet, USB, and HDMI, which enable seamless connectivity with external devices and peripherals.

The research environment deployed on the two embedded devices was meticulously constructed around Python’s virtual environments (<code>python-venv</code>): version 3.10.6 was used on the Xilinx Zynq UltraScale+ MPSoC ZCU104 and version 3.9.1 on the DAVE Embedded Systems SBC ORCA, ensuring precise version control and compatibility tailored to each device’s capabilities. To ensure uniformity and maintainability across different environments, the same <code>requirements.txt</code> file employed in the VM environment was integrated into the Python virtual environments of both embedded devices. This unified approach guarantees that each virtual environment on the embedded devices closely mirrors the environment within the VM, thereby enhancing reproducibility and streamlining research tasks and experiments across diverse hardware platforms. Even in this case, the role of the server was performed by the notebook machine presented in subsection 2.2.1.
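As a minimal sketch of how such a per-device environment can be recreated (the <code>~/fl-venv</code> location is a hypothetical example, while the shared <code>requirements.txt</code> is the file mentioned above), the following Python snippet creates the virtual environment with the interpreter shipped on each board and installs the pinned dependencies:

<syntaxhighlight lang="python">
# Sketch only: recreate a per-device virtual environment and install the
# shared requirements.txt used in the VM environment.
import subprocess
import sys
from pathlib import Path

VENV_DIR = Path.home() / "fl-venv"        # hypothetical location on the board
REQUIREMENTS = Path("requirements.txt")   # same file as in the VM environment

# Create the virtual environment with the board's Python interpreter
# (3.10.6 on the ZCU104, 3.9.1 on the SBC ORCA).
subprocess.run([sys.executable, "-m", "venv", str(VENV_DIR)], check=True)

# Install the pinned dependencies inside the virtual environment.
pip = VENV_DIR / "bin" / "pip"
subprocess.run([str(pip), "install", "-r", str(REQUIREMENTS)], check=True)
</syntaxhighlight>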
The following table briefly summarises the specifications of the test bed machines used for the "embedded environment".
{| class="wikitable"
! Machine !! Component !! Name / Type !! Version / Qty
|-
| rowspan="7" | '''Host (PC)'''
| Operating system
| GNU/Linux Ubuntu
| 22.04
|-
| ML frameworks
| PyTorch
| 1.13.1
|-
| rowspan="2" | FL frameworks
| Flower
| 1.4.0
|-
| NVFlare
| 2.3.0
|-
| Hardware
| CPU: Intel i7 12700H
| 6+8 cores
|-
| Middleware
| python-venv
| 3.10.6
|-
| Architecture
| x86-64 (AMD64)
| -
|-
| rowspan="8" | '''ZCU104'''
| Operating system
| Xilinx Ubuntu
| 22.04
|-
| ML frameworks
| PyTorch
| 1.13.1
|-
| rowspan="2" | FL frameworks
| Flower
| 1.4.0
|-
| NVFlare
| 2.3.0
|-
| rowspan="2" | Hardware
| CPU: ARM Cortex-A53 @ 1.5 GHz
| 4 cores
|-
| RAM: DDR4
| 2 GB
|-
| Middleware
| python-venv
| 3.10.6
|-
| Architecture
| ARM64 (aarch64)
| -
|-
| rowspan="8" | '''SBC ORCA'''
| Operating system
| GNU/Linux Armbian
| 23.02
|-
| ML frameworks
| PyTorch
| 1.13.1
|-
| rowspan="2" | FL frameworks
| Flower
| 1.4.0
|-
| NVFlare
| 2.3.0
|-
| rowspan="2" | Hardware
| CPU: ARM Cortex-A53 @ 1.6 GHz
| 4 cores
|-
| RAM: LPDDR4
| 6 GB
|-
| Middleware
| python-venv
| 3.9.1
|-
| Architecture
| ARM64 (aarch64)
| -
|}
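As a quick, hypothetical sanity check (not part of the original setup), the following snippet prints the architecture, interpreter version, and installed package versions so they can be compared against the table above; <code>torch</code>, <code>flwr</code>, and <code>nvflare</code> are the PyPI distribution names of PyTorch, Flower, and NVFlare.

<syntaxhighlight lang="python">
# Print the values to compare against the test bed table above.
import platform
from importlib.metadata import version

print("Architecture:", platform.machine())         # x86_64 on the host, aarch64 on ZCU104 / SBC ORCA
print("Python      :", platform.python_version())  # 3.10.6 (host, ZCU104) or 3.9.1 (SBC ORCA)
for pkg in ("torch", "flwr", "nvflare"):            # expected: 1.13.1, 1.4.0, 2.3.0
    print(f"{pkg:12}:", version(pkg))
</syntaxhighlight>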
==== ML framework ====