====== Cloud environment ======
The model architecture chosen for the cloud environment prioritizes simplicity and compatibility with the resource-constrained CPUs of embedded devices and virtual machines. This decision was motivated by the need for a lightweight and efficient model that can be easily deployed and executed on the various devices participating in the FL system. For that reason, the architecture chosen is SimpleCNN, a model taken from the "Training a classifier" tutorial available on the official PyTorch website. This selection fits the requirement for a lightweight model that can still deliver satisfactory results.
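For reference, a sketch of the network in question, following the architecture shown in the PyTorch "Training a classifier" tutorial (the tutorial names the class <code>Net</code>; <code>SimpleCNN</code> is used here to match the text):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleCNN(nn.Module):
    """Small CNN for 32x32 RGB inputs (CIFAR-10-style), as in the
    PyTorch "Training a classifier" tutorial."""

    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)   # 3 input channels -> 6 feature maps
        self.pool = nn.MaxPool2d(2, 2)    # halves spatial resolution
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, num_classes)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = torch.flatten(x, 1)           # flatten all dims except batch
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return self.fc3(x)
```

With two 5x5 convolutions and three small fully connected layers, the model stays well within the memory and compute budget of an embedded CPU.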
====== Local environment ======
{| class="wikitable"
! Framework !! # clients !! # rounds !! # epochs !! Step !! Model !! Device
|-
| Flower
|}
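The parameters in the table (number of clients, rounds, and local epochs) are the knobs that drive the federated averaging loop the framework runs. A framework-agnostic sketch of that loop, with purely illustrative values rather than the ones used in the experiments:

```python
import random

# Illustrative values; the real experiment configuration is in the table above.
NUM_CLIENTS, NUM_ROUNDS, NUM_EPOCHS = 2, 3, 1

def local_train(weights, epochs):
    """Stand-in for on-device training: nudge each weight slightly."""
    for _ in range(epochs):
        weights = [w + random.uniform(-0.01, 0.01) for w in weights]
    return weights

global_model = [0.0] * 4  # toy weight vector standing in for SimpleCNN

for _ in range(NUM_ROUNDS):
    # Each client trains locally starting from the current global model...
    updates = [local_train(list(global_model), NUM_EPOCHS)
               for _ in range(NUM_CLIENTS)]
    # ...and the server averages the returned weights (FedAvg).
    global_model = [sum(ws) / NUM_CLIENTS for ws in zip(*updates)]
```

In Flower, the same roles are filled by a client subclass that implements local training and a server-side strategy that performs the aggregation.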
In order to have two clients, the Xilinx Zynq UltraScale+ MPSoC ZCU104 Evaluation Kit was also used. Since the two boards have similar CPUs, the execution time for the two embedded devices is the same, so only the DAVE Embedded Systems SBC ORCA device was considered when measuring how the execution time varies with the number of available cores. The model architecture used is the same SimpleCNN already presented in section 4.1.5. For a better understanding, the execution time analysed was divided into two parts: the training time and the total time. To get a more detailed view, the execution times for the various configurations used are summarised in Figure 4.30 and Figure 4.31:

[[File:Flower-NVFlare-execution-time-1.png|center|thumb|716x716px|Total execution time.]]
[[File:Flower-NVFlare-execution-time-2.png|center|thumb|702x702px|Training execution time.]]
[[File:Flower-NVFlare-execution-time-3.png|center|thumb|707x707px|CPU utilization: single-core vs quad-core.]]

Although the execution times of the two frameworks are very different due to their different infrastructures, the 4-core and 1-core execution times are comparable. In fact, for both frameworks, in terms of both training and overall execution times, reducing the cores from 4 to 1 resulted in a performance loss ranging from 18% to a maximum of 22%. It can therefore be concluded that the degree of parallelism of both frameworks is not very high. This is also reflected in the fact that during execution the device did not utilize all 4 cores at 100% but only at about 50%, compared to 100% utilization when the device was set to 1 core. From the histograms, it can be seen that the training times of the two frameworks are comparable, with Flower being slightly faster and thus making more efficient use of computing power. On the other hand, there is a clear difference between the two as far as the total running time is concerned.

This discrepancy in execution speed could be attributed to various factors, including differences in algorithmic optimizations, parallel processing efficiency, network communication strategies, and underlying architectural design. In fact, Flower has a much simpler architecture than NVFlare, resulting in a total execution time about three times lower. This agility is particularly valuable in scenarios where real-time decision-making or rapid response to changing data is crucial. Moreover, in resource-limited environments, such as the one used in this work with embedded devices, conservation of computing power may be essential. Flower's efficiency in this regard makes it a more suitable choice for applications where hardware resources are limited.
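The claim that the degree of parallelism is low can be made concrete with Amdahl's law. Inverting it against the reported 18–22% loss when going from 4 cores to 1 gives the fraction of the workload that actually ran in parallel (a back-of-the-envelope estimate derived from the reported numbers, not a separate measurement):

```python
# Amdahl's law: speedup(n) = 1 / ((1 - p) + p / n), where p is the
# parallel fraction of the workload and n the number of cores.
def parallel_fraction(speedup: float, cores: int) -> float:
    """Invert Amdahl's law to recover the parallel fraction p."""
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / cores)

for loss in (0.18, 0.22):
    # A loss of `loss` when dropping from 4 cores to 1 means the 4-core
    # run was 1 / (1 - loss) times faster than the single-core run.
    speedup = 1.0 / (1.0 - loss)
    p = parallel_fraction(speedup, 4)
    print(f"loss {loss:.0%}: 4-core speedup {speedup:.2f}x, "
          f"parallel fraction ~{p:.0%}")
```

The reported range corresponds to a parallel fraction of roughly 24–29%, which is consistent with the observed ~50% utilization of the 4 cores.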
= Applying NVFlare to a real-world case =