FLBench: A Comprehensive Experimental Evaluation of Federated Learning Frameworks

NAISS 2023/7-12



Principal Investigator:

Sadi Abed Alhaleem Mohammad Alawadi


Högskolan i Halmstad

Start Date:


End Date:


Primary Classification:

10201: Computer Sciences

Secondary Classification:

10202: Information Systems (Social aspects to be 50804)




The advent of distributed Machine Learning (ML) has enabled sophisticated analytics at the network's edge. This decentralized, large-scale ML architecture is known as Federated Learning (FL). FL aims to enable multiple actors to build a common and robust ML model over their local datasets. Furthermore, a new wave of FL frameworks has emerged to address data privacy, security, access rights, and access to heterogeneous data. However, the variety of these frameworks calls for an experimental evaluation of their performance. Therefore, this project aims to evaluate and analyze the popular federated learning frameworks extensively (a similar comparison paper can be found here: ).

The main contributions of this project are:

1. Evaluating the following open-source federated learning frameworks:
• Paddle Federated Learning Framework
• PySyft Framework / PyGrid
• Flower
• TensorFlow FL
• FEDn
• Intel FL
• FATE
• FedML
• TiFL
• OpenFL
• NVFlare

2. Benchmarking suite (experiment design), i.e., the network architecture (the ML model, e.g., LSTM, CNN), the datasets used, and the benchmark tool/framework. Datasets: MNIST, CIFAR-10 and CIFAR-100, IMDB, and, for IoT data, CASA activity recognition.

3. Theoretically comparing the federated algorithms the frameworks support (FedAvg, FedProx, etc.), cross-device and cross-silo settings, and horizontal and vertical federated learning. The comparison also covers open-source status, diversified computing paradigms (standalone simulation, distributed computing, on-device training), ML framework heterogeneity (PyTorch, TensorFlow, MXNet, etc.), development programming language, and each framework's timeline.

4. Comparison criteria: performance (tasks per unit time, i.e., throughput), resource consumption (CPU, memory, GPU), convergence, deployment effort, flexibility, accuracy, and scalability.
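
To make two of the terms above concrete, the following is a minimal sketch, not taken from any of the listed frameworks, of the FedAvg aggregation step (a weighted average of client parameters by local dataset size) together with a simple wall-clock throughput measurement. The function names `fedavg` and `throughput`, their parameters, and the use of flat Python lists for model parameters are assumptions made for illustration only; real frameworks operate on tensors and add communication, client selection, and security layers.

```python
import time
from typing import Callable, List, Sequence


def fedavg(client_weights: Sequence[Sequence[float]],
           client_sizes: Sequence[int]) -> List[float]:
    """Average client parameter vectors, weighted by local dataset size.

    Hypothetical helper: parameters are flat lists of floats for simplicity.
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need exactly one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    dim = len(client_weights[0])
    averaged = [0.0] * dim
    for weights, n in zip(client_weights, client_sizes):
        if len(weights) != dim:
            raise ValueError("all clients must share one parameter shape")
        for i, w in enumerate(weights):
            # Each client contributes in proportion to its share of the data.
            averaged[i] += w * n / total
    return averaged


def throughput(task: Callable[[], object], repeats: int = 100) -> float:
    """Return completed tasks per second of wall-clock time (hypothetical helper)."""
    start = time.perf_counter()
    for _ in range(repeats):
        task()
    elapsed = time.perf_counter() - start
    return repeats / elapsed if elapsed > 0 else float("inf")


if __name__ == "__main__":
    # Two illustrative clients holding 1 and 3 samples respectively.
    weights = [[1.0, 2.0], [3.0, 4.0]]
    sizes = [1, 3]
    print("aggregated parameters:", fedavg(weights, sizes))
    rate = throughput(lambda: fedavg(weights, sizes), repeats=1000)
    print(f"aggregation rounds per second: {rate:.0f}")
```

In an actual benchmark the timed task would be a full training round in the framework under test, and resource consumption (CPU, memory, GPU) would be sampled alongside the timing so that the criteria are measured on the same run.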