== Advanced system design ==
To perform more advanced testing, the same problem described in the chapter "Flower vs NVFlare: an in-depth comparison" was leveraged; however, the test bed was tuned to increase the complexity of the use case, as detailed in the rest of this chapter. The design settings of this advanced FL system remain consistent with those used in the previous comparison between NVFlare and Flower, referred to as the "Local Environment", except for the changes described below. The same desktop machine was used, equipped with an NVidia RTX 3080 Ti GPU. The ML framework, PyTorch, remained consistent, as did the data preprocessing steps of dataset selection, dataset splitting, and data augmentation. However, a significant change was introduced regarding data heterogeneity. The model configuration and client-side settings also remained unchanged. Minor adjustments were made to the metrics taken into consideration, focusing exclusively on two: local training loss and server validation accuracy. On the server side, the configuration was modified: while maintaining a count of four clients, the number of communication rounds was raised to 20.
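For quick reference, the settings above can be summarized in a plain Python dictionary. This is only an illustrative recap of the parameters listed in this section; the key names are not actual NVFlare or Flower configuration fields.

```python
# Illustrative summary of the advanced test-bed settings described above.
# Key names are invented for readability, not real framework config fields.
ADVANCED_TESTBED = {
    "hardware": "desktop with NVidia RTX 3080 Ti GPU",
    "ml_framework": "PyTorch",
    "num_clients": 4,
    "num_rounds": 20,  # raised from the previous test bed
    "metrics": ["local training loss", "server validation accuracy"],
    "data_heterogeneity": "Dirichlet split, alpha in {0.1, 1.0}",
}
```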
== FL algorithms and centralized simulation ==
== Data heterogeneity ==
In this advanced project, an additional feature was incorporated, involving the integration of classes that perform dataset splitting among the designated clients, which in this instance were four in number. In addition to dividing the dataset into four subsets, the possibility of choosing the level of heterogeneity of the data was added by applying the Dirichlet sampling strategy. Thus, it was possible to dynamically adjust the degree of data heterogeneity for each client, and to customize the level of data heterogeneity across all clients simultaneously. In the context of FL, data heterogeneity can be defined as follows:
* Low data heterogeneity: the data across different clients is quite similar, or homogeneous, with little variation among the data held by different clients. This leads to nearly balanced classes among clients, i.e. each class has a similar number of samples on every client.
* High data heterogeneity: there is significant diversity in the data across different clients or nodes. Every subset assigned to a client contains unbalanced classes, i.e. some classes may be over-represented on some clients, while others may be under-represented.
In order to have a clear comparison within the experiments, the upper and lower extremes of the α factor affecting heterogeneity were considered, i.e. 0.1 and 1.0.
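The Dirichlet-based splitting described above can be sketched as follows. This is a minimal NumPy sketch of the general technique, not the project's actual splitting classes: for each class, per-client proportions are drawn from a Dirichlet distribution with concentration α, so a small α (e.g. 0.1) yields highly unbalanced classes per client, while a larger α (e.g. 1.0) yields a more uniform split.

```python
import numpy as np

def dirichlet_split(labels, num_clients=4, alpha=0.1, seed=0):
    """Partition sample indices among clients with Dirichlet-controlled
    label heterogeneity: low alpha -> unbalanced classes per client,
    high alpha -> nearly balanced classes per client."""
    rng = np.random.default_rng(seed)
    client_indices = [[] for _ in range(num_clients)]
    for cls in np.unique(labels):
        cls_idx = np.flatnonzero(labels == cls)
        rng.shuffle(cls_idx)
        # Draw this class's per-client proportions from Dir(alpha, ..., alpha)
        proportions = rng.dirichlet(alpha * np.ones(num_clients))
        # Turn proportions into split points over this class's samples
        cuts = (np.cumsum(proportions)[:-1] * len(cls_idx)).astype(int)
        for client, part in enumerate(np.split(cls_idx, cuts)):
            client_indices[client].extend(part.tolist())
    return client_indices

# Toy dataset: 10 classes with 100 samples each
labels = np.repeat(np.arange(10), 100)
splits = dirichlet_split(labels, num_clients=4, alpha=0.1)
assert sum(len(s) for s in splits) == len(labels)  # every sample assigned once
```

With α = 0.1 most clients end up holding only a few dominant classes, whereas α = 1.0 spreads each class far more evenly across the four subsets.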
== Results analysis ==
A series of seven experiments were conducted. The first experiment involved a centralised simulation, while the remaining six experiments focused on testing three different algorithms: FedAvg, FedProx and Scaffold. Specifically, each algorithm was tested twice: the first time with α = 0.1 and the second time with α = 1.0. The following figure represents the local <code>training_loss</code> obtained by running the quoted experiments. As can be seen, the loss of the centralized simulation cannot keep up with the other experiments, which reach slightly lower loss values. This shows the effectiveness of the FL algorithms compared to a classical ML approach. The same behavior observed with the previous test bed can also be noticed: at the beginning of each round the loss jumps higher compared to the last epoch of the previous round, and then decreases over the later epochs. Another important thing to note is that experiments with an α value of 0.1 perform worse than their counterparts with an α value of 1.0. This becomes even more evident in the following chart, which illustrates the server <code>validation_accuracy</code>. This is due to the fact that there are more unbalanced classes within each client's dataset, which leads models trained on classes with less data to have difficulty generalizing correctly. Models are more inclined to predict dominant classes, reducing accuracy on less represented classes. The poor representation of some classes makes it difficult for models to learn from them, leading to lower overall accuracy. Looking at the individual algorithms in more detail, a very similar behaviour can be seen between the FedAvg and FedProx algorithms, which have very similar results in terms of both local <code>training_loss</code> and server <code>validation_accuracy</code>.
This is mainly due to the fact that the two algorithms are very similar to each other, apart from a proximal term μ in the case of FedProx, which improves the convergence rate. The Scaffold algorithm, on the other hand, has a totally different implementation from its predecessors: it allows dynamic adjustment of the aggregation weight of each client's update based on its historical performance, and thus achieves better performance, especially with unbalanced classes (α = 0.1). This can easily be seen in the server <code>validation_accuracy</code> graph.
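The proximal term that distinguishes FedProx from FedAvg can be written as a penalty added to each client's local objective. The sketch below, with NumPy arrays standing in for model weights, is a hedged illustration of the general idea (the function name and dict-of-arrays representation are assumptions, not the project's code): a client minimizes its task loss plus (μ/2)·‖w − w_global‖², which discourages local drift away from the global model received at the start of the round.

```python
import numpy as np

def fedprox_loss(task_loss, local_weights, global_weights, mu=0.01):
    """FedProx local objective: the ordinary task loss plus a proximal
    term (mu/2) * ||w - w_global||^2 penalizing drift from the global
    model of the current round. Setting mu = 0 recovers FedAvg."""
    prox = sum(np.sum((local_weights[k] - global_weights[k]) ** 2)
               for k in global_weights)
    return task_loss + 0.5 * mu * prox

# Toy example: one weight tensor that has drifted from the global model
loss = fedprox_loss(task_loss=2.0,
                    local_weights={"w": np.array([1.0, 1.0])},
                    global_weights={"w": np.zeros(2)},
                    mu=0.1)
# -> 2.1, i.e. 2.0 + 0.5 * 0.1 * (1^2 + 1^2)
```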
The successful execution of this more complex use case on NVFlare, involving multiple tested algorithms and diverse data heterogeneity, further underscores the framework's robust capabilities and suitability for a wide range of scenarios. This result confirms the versatility of NVFlare as an FL framework, making it a reliable choice for real-world scenarios requiring the management of heterogeneous and complex data.
= Conclusions and future work =