ML-TN-007 — AI at the edge: exploring Federated Learning solutions

From DAVE Developer's Wiki
Revision as of 14:27, 27 September 2023 by U0001 (talk | contribs) (Choosing Federated learning frameworks)

Applies to: Machine Learning


History[edit | edit source]

Version Date Notes
1.0.0 August 2023 First public release

Introduction[edit | edit source]

According to Wikipedia, Federated Learning (FL) is defined as a machine learning technique that trains an algorithm via multiple independent sessions, each using its own dataset. This approach stands in contrast to traditional centralized machine learning techniques where local datasets are merged into one training session, as well as to approaches that assume that local data samples are identically distributed.

Federated learning enables multiple actors to build a common, robust machine learning model without sharing data, thus addressing critical issues such as data privacy, data security, data access rights and access to heterogeneous data. Its applications engage industries including defense, telecommunications, Internet of Things, and pharmaceuticals. A major open question is when/whether federated learning is preferable to pooled data learning. Another open question concerns the trustworthiness of the devices and the impact of malicious actors on the learned model.
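The idea can be sketched with a toy, pure-Python simulation of Federated Averaging (FedAvg), the canonical FL aggregation algorithm: each simulated client trains a one-parameter linear model on its own private data, and only the model weight is shared with the server. All data and numbers below are made up for illustration.

```python
import random

random.seed(0)

# Toy illustration of Federated Averaging (FedAvg): each client trains on
# its own private data; only the model parameter (here a single weight w
# of the model y = w * x) ever leaves the client.

TRUE_W = 3.0

def make_client_data(n):
    # Each client holds a private dataset that never leaves the client.
    xs = [random.uniform(-1, 1) for _ in range(n)]
    ys = [TRUE_W * x + random.gauss(0, 0.1) for x in xs]
    return xs, ys

def local_train(w, xs, ys, lr=0.1, epochs=5):
    # Plain SGD on the local least-squares loss.
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            grad = 2 * (w * x - y) * x
            w -= lr * grad
    return w

clients = [make_client_data(n) for n in (20, 50, 80)]  # unequal dataset sizes

w_global = 0.0
for _ in range(10):  # federated rounds
    local_ws, sizes = [], []
    for xs, ys in clients:
        local_ws.append(local_train(w_global, xs, ys))
        sizes.append(len(xs))
    # FedAvg: average the local models, weighted by local dataset size.
    w_global = sum(w * n for w, n in zip(local_ws, sizes)) / sum(sizes)

print(f"learned w = {w_global:.2f} (true w = {TRUE_W})")
```

The server learns a model close to the one it would have obtained on the pooled data, yet it only ever sees model weights, never the clients' raw samples.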

In principle, FL can be an extremely useful technique for addressing critical issues of industrial IoT (IIoT) applications. As such, it matches perfectly DAVE Embedded Systems' IIoT platform, ToloMEO. This Technical Note (TN) illustrates how DAVE Embedded Systems explored, tested, and characterized some of the most promising open-source FL frameworks available to date. One of these frameworks might equip ToloMEO-compliant products in the future, allowing our customers to implement federated learning systems easily. From the machine learning point of view, therefore, we investigated whether the typical embedded architectures used today for industrial applications are suited to act not only as inference platforms — we already dealt with this issue here — but as training platforms as well.

In brief, the work consisted of the following steps:

  • Selecting the FL frameworks to test.
  • Testing the selected frameworks.
  • Comparing the results to isolate the best framework.
  • Investigating the best framework in depth.

A detailed dissertation of the work that led to this Technical Note is available here TBD.

Choosing Federated learning frameworks[edit | edit source]

Criteria and initial, long list[edit | edit source]

For selecting the frameworks, several factors were taken into account:

  • ML frameworks flexibility: The adaptability of the framework to manage different ML frameworks.
  • Licensing: It is mandatory that the framework has an open-source, permissive license to cope with the typical requirements of real-world use cases.
  • Repository rating and releases: Rating in a repository is important for a FL framework as it indicates a high level of community interest and support, potentially leading to more contributions and improvements. Meanwhile, the first and latest releases indicate respectively the maturity and the support of the framework and whether it is released or still in a beta version.
  • Documentation and tutorials: The provided documentation with related tutorials has to be complete and well-made.
  • Readiness for commercial usage: The readiness of the framework to be developed in a real-world scenario. In order to establish the readiness, it was checked the version of the framework and the license.

According to the previous criteria, an initial list including the most promising FL frameworks was compiled. It comprised the following eight products: Flower, NVFlare, TFF (TensorFlow Federated), PySyft, FATE, FedML, OpenFL, and IBMFL.

The limitation to only eight FL frameworks arises from the evolving nature of the field. As a relatively recent and rapidly evolving technique, FL continues to witness the emergence of new frameworks, each with its own features and capabilities. In this context, the choice to focus on these frameworks reflects an attempt to capture the current state of the art and to analyze the most prominent and well-established options available. This selection aims to offer valuable insights into the frameworks that are currently considered among the best choices in the evolving FL landscape.

In the next sections, the aforementioned factors are treated individually, explaining why some frameworks were discarded and others retained.

ML frameworks flexibility[edit | edit source]

Flexibility in ML frameworks is crucial when choosing a FL framework, as it allows adapting the system to diverse use cases and data distributions. A flexible framework can support various ML algorithms, models, and data types, accommodating the specific requirements of different scenarios. This adaptability enhances the framework's applicability in real-world deployments, ensuring it can effectively handle the heterogeneity and dynamic nature of distributed data sources across clients.

It is worth mentioning that most of the frameworks discussed earlier, including NVFlare, FATE, Flower, PySyft, IBMFL, OpenFL, and FedML, are designed to be agnostic to the underlying ML framework used. This agnosticism provides users with the flexibility to employ various ML libraries, such as TensorFlow, PyTorch, and scikit-learn, among others, based on their preferences and project requirements.

However, it is important to highlight that one exception to this trend is TFF. Unlike the other frameworks, TFF is specifically tailored for the TensorFlow ecosystem. While it harnesses the powerful capabilities of TensorFlow, it inherently limits the use of other ML libraries in the FL context. As a result, users opting for TFF should be aware of this framework's dependency on TensorFlow for their FL endeavors.

For that reason, TFF was discarded from the potential frameworks to be considered for comparison.
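This agnosticism is easy to picture: what the FL server exchanges with clients is a framework-neutral array of numbers, not a library-specific model object. The minimal sketch below illustrates the principle only; `TinyModel`, `to_wire`, and `from_wire` are hypothetical names, not part of any of the frameworks' real APIs.

```python
import json
from typing import List

# Sketch of why ML-framework agnosticism works: the FL layer moves model
# parameters around as plain numeric arrays, so any ML library can take
# part as long as its weights can be exported to, and restored from, that
# neutral form. TinyModel is a stand-in for a model from an arbitrary
# library; to_wire/from_wire are hypothetical serialization helpers.

Params = List[List[float]]  # framework-neutral representation of the weights

class TinyModel:
    """Stand-in for a model coming from an arbitrary ML library."""
    def __init__(self) -> None:
        self.weights: Params = [[0.0, 0.0], [0.0]]

    def get_weights(self) -> Params:
        return [layer[:] for layer in self.weights]

    def set_weights(self, params: Params) -> None:
        self.weights = [layer[:] for layer in params]

def to_wire(params: Params) -> bytes:
    # What actually crosses the network: neutral bytes, no library objects.
    return json.dumps(params).encode()

def from_wire(blob: bytes) -> Params:
    return json.loads(blob.decode())

model = TinyModel()
model.set_weights([[1.0, 2.0], [3.0]])
restored = from_wire(to_wire(model.get_weights()))
print(restored)
```

Because the wire format carries no trace of the library that produced the weights, swapping TensorFlow for PyTorch on a client requires no change on the server side.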

Licensing[edit | edit source]

The choice of a suitable license is of paramount importance for any FL framework [19]. A well-crafted license provides a legal foundation that governs the usage, distribution, and modification of the framework's source code and associated components.

A permissive license, like the MIT License or Apache License, allows users to use, modify, and distribute the framework with relatively few restrictions. This encourages widespread adoption, fosters innovation, and facilitates contributions from a broader community of developers and researchers. The permissiveness of these licenses empowers users to incorporate the framework into their projects, even if they have proprietary components.

On the other hand, copyleft licenses, like the GNU GPL, require derived works to be distributed under the same terms, ensuring that any modifications or extensions to the framework remain open-source. While this may be more restrictive, it encourages a collaborative ecosystem where improvements are shared back with the community.

A clear and well-defined license also provides legal protection to both developers and users, helping to mitigate potential legal risks and disputes. It ensures that contributors have granted appropriate rights to their work and helps maintain a healthy and sustainable development environment.

Most of the frameworks previously described are released under the Apache-2.0 license, with one exception: IBMFL. It is under an unspecified license that makes the framework unsuitable for commercial use. For that reason, IBMFL was discarded from the comparison too.

Repository stars and releases[edit | edit source]

Stars in a GitHub repository are important because they serve as a measure of popularity and community interest in the project. When a repository receives more stars, it indicates that more developers and users find the project valuable and relevant. This can lead to several benefits:

  • Visibility: Repositories with more stars are likely to appear higher in GitHub search results, making it easier for others to discover and use the project.
  • Credibility: High-starred repositories are often perceived as more trustworthy and reliable, as they are vetted and endorsed by a larger user base.
  • Contributions: Popular repositories tend to attract more contributions from developers, leading to a more active and vibrant community around the project.
  • Feedback: Projects with many stars are more likely to receive feedback, bug reports, and feature requests, helping the developers improve the software.
  • Maintenance: Higher star counts can also stimulate the maintainers to keep the project updated and actively supported.

Other important aspects, related to the stars obtained by a framework, are its first and latest releases. They make it possible to see the maturity of the framework and how often it is updated, and thus the support behind it. Obviously, a framework that was born earlier than others is much more likely to have more stars. With this in mind, at the time of writing, the ranking in terms of stars received, together with the date of each framework's first release, is as follows:

  1. PySyft: 8.9k stars (Jan 19, 2020)
  2. FATE: 5.1k stars (Feb 18, 2019)
  3. FedML: 3.1k stars (Apr 30, 2022)
  4. Flower: 2.8k stars (Nov 11, 2020)
  5. TFF: 2.1k stars (Feb 20, 2019)
  6. OpenFL: 567 stars (Feb 1, 2021)
  7. IBMFL: 438 stars (Aug 28, 2020)
  8. NVFlare: 413 stars (Nov 23, 2021)

These characteristics, although they certainly have a bearing on the choice of frameworks, were not enough to discard any of the selected frameworks.

Documentation and tutorials[edit | edit source]

High-quality documentation and well-crafted tutorials are essential considerations when selecting a FL framework, for several reasons:

  • Accessibility and ease of use: Comprehensive documentation allows users to understand the framework's functionalities, APIs, and usage quickly. It enables developers, researchers, and practitioners to get started with the framework efficiently, reducing the learning curve.
  • Accelerated development: Well-structured tutorials and examples demonstrate how to use the framework to build practical FL systems. They provide step-by-step guidance on setting up experiments, running code, and interpreting results. This expedites the development process and encourages experimentation with different configurations.
  • Error prevention: Clear documentation and good examples help users avoid common mistakes and errors during implementation. They provide troubleshooting tips and address frequently asked questions, reducing frustration and increasing user satisfaction.
  • Reliability and robustness: A well-documented framework indicates that its developers have invested time in organising their code and explaining its functionalities. This attention to detail suggests a more reliable and stable framework.

Regarding these aspects, several frameworks still do not have good documentation and tutorials. Among them are PySyft, OpenFL, and FedML. PySyft is still under construction, as the official repository says; for that reason its documentation is often out of date and incomplete. OpenFL, on its side, has very meager documentation and only a few tutorials, which do not explore many ML frameworks or many scenarios. The FedML framework also has, like PySyft, incomplete documentation, because the project started very recently and is still under development. Finally, the FATE framework has complete and well-made documentation but very few tutorials and, because of its complex architecture, evaluating it would have taken too much time. For these reasons, these four frameworks were discarded from the comparison.

Readiness for commercial usage[edit | edit source]

In the context of FL, the significance of a framework being ready for commercial use cannot be overstated. As businesses increasingly recognise the value of decentralised ML solutions, the demand for robust and production-ready frameworks has intensified.

A FL framework geared for commercial use offers several crucial advantages. Firstly, it provides a stable and scalable foundation to deploy large-scale FL systems across diverse devices and platforms. This ensures that businesses can seamlessly integrate the framework into their existing infrastructure, minimising disruption and optimising efficiency.

Moreover, a commercially viable framework emphasises security and privacy measures, a non-negotiable aspect when dealing with sensitive data across distributed environments. Advanced encryption techniques, secure communication protocols, and differential privacy methods guarantee that user data remains safeguarded, mitigating potential risks of data breaches or unauthorised access.
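As a concrete illustration of one such privacy measure, the sketch below shows clipping and noising of a client update before it leaves the device, in the spirit of differentially private FL (e.g. DP-FedAvg). The clipping bound and noise scale are illustrative rather than tuned values, and `privatize_update` is a hypothetical helper, not part of any of the frameworks discussed here.

```python
import random

random.seed(1)

# Hedged sketch of differentially private update release: (1) clip the
# update's L2 norm so that no single client can dominate the aggregate,
# (2) add Gaussian noise calibrated to that clipping bound. Constants
# are illustrative only.

def privatize_update(update, clip=1.0, noise_std=0.1):
    # 1) Bound the client's influence by clipping the L2 norm.
    norm = sum(u * u for u in update) ** 0.5
    scale = min(1.0, clip / norm) if norm > 0 else 1.0
    clipped = [u * scale for u in update]
    # 2) Add noise proportional to the clipping bound.
    return [u + random.gauss(0, noise_std * clip) for u in clipped]

raw = [0.8, -2.4, 0.3]       # hypothetical local model delta
safe = privatize_update(raw)  # what the server would actually receive
print(safe)
```

With the clipping in place, the added noise masks any individual sample's contribution while the server-side average over many clients remains usable.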

Of the frameworks covered, only a few are ready to be used commercially. Among them are Flower, NVFlare, FATE, OpenFL, and TFF. On the other hand, some frameworks are not yet ready. Among the latter are FedML, PySyft, and IBMFL. The first two are in fact still under development and not yet ready for commercial-level use, while the third is under a license that does not allow the framework to be used for commercial-level applications. As explained in the previous sub-sections, these frameworks were already discarded from the comparison.

Final choice[edit | edit source]

At the beginning of this section, a total of eight frameworks were considered. Each framework was assessed based on various aspects and, after an in-depth analysis, six frameworks were deemed unsuitable because they did not meet some of the requirements.

The requirements that were met and not met by each framework are summarised in Table 3.1:

TBD

The two remaining frameworks are Flower and NVFlare. They demonstrated the potential to address the research objectives effectively and were well aligned with the specific requirements of the FL project.

In the next sections, these two selected frameworks will be rigorously compared, examining their capabilities in handling diverse ML models, supporting various communication protocols, and accommodating heterogeneous client configurations. The comparison will delve into the frameworks' performance, ease of integration, and potential for real-world deployment.

By focusing on these two frameworks, this work aims to provide a detailed evaluation that can serve as a valuable resource for practitioners and researchers seeking to implement FL in a variety of scenarios. The selected frameworks will undergo comprehensive testing and analysis, enabling the subsequent sections to present an informed and insightful comparison, shedding light on their respective strengths and limitations.

Frameworks Comparison[edit | edit source]

Testing the selected frameworks[edit | edit source]

Flower[edit | edit source]

Flower is an open-source FL framework, released under the Apache-2.0 license, that is agnostic to the underlying ML library.

Flower running on SBC ORCA

# of cores   htop screenshot                 Log
1            Flower 1-core htop MX8M+.png    Flower log 1-core MX8M+.png
4            Flower 4-cpu htop MX8M+.png     Flower log 4-core MX8M+.png

NVFlare[edit | edit source]

NVFlare (NVIDIA FLARE) is NVIDIA's open-source FL framework, also released under the Apache-2.0 license.

TBD

Comparing test results[edit | edit source]

TBD

Deep investigation of NVFlare[edit | edit source]

TBD

Conclusions and future work[edit | edit source]

TBD

One important issue that has not been addressed yet is the labeling of new samples. In other words, it was implicitly assumed that new samples collected by a device are somehow labeled prior to being used for training. This is a strong assumption, because it implies that a mechanism for labeling new samples is available in the field.
