* Comparing the results to identify the best framework.
* In-depth investigation of the best framework.
A detailed dissertation describing the work that led to this Technical Note is available here: TBD [1].
== Choosing Federated learning frameworks ==
=== System design ===
This section describes the architectural aspects of the entire system used to test Flower and NVFlare.
==== Testing environments ====
Within the FL infrastructure, multiple parties are involved to create a collaborative environment for training ML models. These parties, including data providers (clients) and a model aggregator (server), play essential roles in the FL process. Data providers contribute their locally-held data, while the model aggregator facilitates the consolidation of the individual model updates from the different parties. The interactions between these parties enable the training of robust, privacy-preserving models, making FL an effective approach for decentralized data scenarios.
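As an illustration of the aggregation role described above, the following minimal sketch shows a federated-averaging (FedAvg) step implemented in plain PyTorch. The model, function name, and client sample counts are purely illustrative and are not taken from Flower or NVFlare; both frameworks provide their own aggregation strategies.
<syntaxhighlight lang="python">
import torch
import torch.nn as nn

# Toy model shared by server and clients (illustrative only).
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(10, 2)

    def forward(self, x):
        return self.fc(x)

def aggregate_fedavg(client_state_dicts, client_sizes):
    """Weighted average of client parameters (FedAvg).

    client_state_dicts: list of state_dicts returned by the clients.
    client_sizes: number of local training samples per client, used as weights.
    """
    total = sum(client_sizes)
    global_state = {}
    for key in client_state_dicts[0]:
        global_state[key] = sum(
            sd[key] * (n / total) for sd, n in zip(client_state_dicts, client_sizes)
        )
    return global_state

# Example: the server aggregates updates from four simulated clients.
clients = [TinyNet() for _ in range(4)]
new_global = aggregate_fedavg([c.state_dict() for c in clients], [100, 250, 80, 120])

server_model = TinyNet()
server_model.load_state_dict(new_global)
</syntaxhighlight>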
Two testing environments were used for testing the frameworks: the first one is denoted as ''local'', while the other is called ''cloud''.
* Local environment: The local parties consist of a single desktop computer that acts both as the server and as four separate clients. This configuration mimics a decentralized environment where the desktop computer takes on the roles of multiple participants, simulating the interactions and data contributions of distinct entities. As the server, it coordinates and manages the FL process, while functioning as the individual clients allows it to provide diverse data contributions for training. This localized approach allows for the use of Docker as the development environment, leveraging a desktop computer equipped with an NVIDIA RTX 3080 Ti GPU to enhance performance. The computing power provided by the NVIDIA GPU allows the use of a more complex model and the simulation of four clients on the same machine that also acts as the server. Being self-contained in a single host, the local environment is convenient for testing, especially when the focus is on functional verification.
* Cloud environment: In this case, the cloud parties consist of two embedded devices or virtual machines acting as clients, and a notebook serving as the server. This configuration facilitates a distributed learning approach, enabling the clients to process their data locally while contributing to the model’s training. The server coordinates the learning process and aggregates the updates from the clients to improve the global model. This setup ensures a decentralized and privacy-preserving approach to ML, as the data remains on the clients’ devices, and only the model updates are shared during the training process. Leveraging embedded devices as clients enables the inclusion of resource-constrained devices in the FL ecosystem, making the framework more versatile and applicable to a wide range of scenarios. The notebook acting as the server provides a centralized point of coordination and ensures smooth communication and collaboration between the clients, making the FL process efficient and effective in leveraging distributed resources for improved model performance. Of course, this environment is more complicated to set up, but it better simulates real-world configurations; a minimal sketch of how these roles map to server and client processes is shown after this list.
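The sketch below illustrates, under the assumption that Flower (flwr) is used, how the cloud environment could map to a server process on the notebook and a client process on each embedded device or virtual machine. The exact API may differ between Flower releases, and the addresses, port, round count, and the DummyClient class are placeholders rather than the actual implementation used in the tests.
<syntaxhighlight lang="python">
import flwr as fl

# --- On the notebook acting as the server ---------------------------------
def run_server():
    fl.server.start_server(
        server_address="0.0.0.0:8080",                # listen on all interfaces
        config=fl.server.ServerConfig(num_rounds=3),  # number of FL rounds
    )

# --- On each embedded device / VM acting as a client ----------------------
class DummyClient(fl.client.NumPyClient):
    """Minimal client returning unchanged parameters (no real training)."""

    def get_parameters(self, config):
        return []  # a real client would return its model weights as NumPy arrays

    def fit(self, parameters, config):
        # A real client would train locally here and return the updated weights.
        return parameters, 0, {}

    def evaluate(self, parameters, config):
        # A real client would evaluate the global model on its local test split.
        return 0.0, 0, {}

def run_client():
    fl.client.start_numpy_client(
        server_address="192.0.2.10:8080",  # placeholder server IP
        client=DummyClient(),
    )
</syntaxhighlight>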
==== ML framework ====
Another crucial factor in designing the testing setup is the ML framework to be used. To this end, PyTorch [47] was selected as the primary ML framework. The flexibility of PyTorch allows for the implementation of complex models and easy customization to meet specific project requirements. Also, the availability of pre-trained models and a vast collection of built-in functions expedites the development process and enables focus on the core aspects of the project [21]. Another pivotal factor is PyTorch’s ability to leverage GPUs for hardware acceleration, which is crucial for training models on distributed data in FL environments. Its integration with CUDA and optimization for GPU computing make it a pragmatic choice for applications requiring high performance. Lastly, PyTorch was chosen for its adaptability within the existing development environment, including its compatibility with Docker and '''embedded devices based on the ARM64 (AArch64) architecture'''; this interoperability has facilitated the integration of the framework into the research and development environment.
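The following minimal, hypothetical snippet shows why PyTorch’s device abstraction is convenient in this heterogeneous setup: the same code selects the CUDA backend on the GPU-equipped workstation and falls back to the CPU on ARM64 boards without a usable GPU. The small CNN is illustrative only and is not the model used in the tests.
<syntaxhighlight lang="python">
import torch
import torch.nn as nn

# Select CUDA on the GPU-equipped workstation, CPU on ARM64 boards without
# a usable GPU; the same training code then runs unchanged on both.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Small illustrative CNN (not the actual model used in the tests).
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(16, 10),
).to(device)

x = torch.randn(8, 3, 32, 32, device=device)  # dummy input batch
logits = model(x)
print(device, logits.shape)  # e.g. "cuda torch.Size([8, 10])" on the workstation
</syntaxhighlight>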
==== Data Preprocessing ====
The step of "Data Preprocessing" [51] holds significant importance in en- suring an important step for ensuring the success and effectiveness of the entire process. This crucial phase involves the choice of the dataset and the transformation and preparation of data before it is used for training the ML model on distributed devices. The "Data Preprocessing" stage plays a vital role too in harmonising harmonizing the data collected from different parties, which might have varying data distributionsand formats. By applying standardized preprocessing techniques across the data from multiple clients, the potential bias and inconsistencies arising from diverse data sources can be mitigated, leading to a more accurate and robust global model.
and formats. By applying standardised Data preprocessing techniques across the data from multiple clientsstep includes dataset selection, dataset splitting, the potential bias and inconsistencies arising from diverse data sources can be mitigatedaugmentation. For more details about these operations, leading to a more accurate and robust global model. The next subsections will take a closer look at the steps taken please refer to complete this important aspect of the project[1]TBD
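The following sketch outlines what such a preprocessing pipeline could look like with torchvision: a shared augmentation and normalization transform, dataset selection, and a split into one shard per client. CIFAR-10, the shard count, and the seed are placeholders; the actual dataset and splitting strategy are those described in the detailed dissertation [1].
<syntaxhighlight lang="python">
import torch
from torch.utils.data import random_split, DataLoader
from torchvision import datasets, transforms

# Shared preprocessing/augmentation pipeline applied identically on every
# client to keep the locally-held data consistent across parties.
train_tf = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomCrop(32, padding=4),
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
])

# CIFAR-10 is used here only as a placeholder dataset.
full_train = datasets.CIFAR10(root="./data", train=True, download=True,
                              transform=train_tf)

# Split the dataset into one shard per simulated client.
num_clients = 4
shard_size = len(full_train) // num_clients
lengths = [shard_size] * num_clients
lengths[-1] += len(full_train) - sum(lengths)  # absorb any remainder
shards = random_split(full_train, lengths,
                      generator=torch.Generator().manual_seed(42))

loaders = [DataLoader(s, batch_size=32, shuffle=True) for s in shards]
</syntaxhighlight>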
==== Model configuration ====
* unsupervised learning?
**https://arxiv.org/pdf/1805.03911.pdf
== References ==
* [1]