Privacy-preserving Federated Learning Model For Email Spam Detection

This project uses federated learning to classify emails as spam or not spam while preserving user privacy. First, several machine learning and deep learning models were trained on a publicly available dataset, and the two best-performing models (SVM and GRU) were then applied in a federated environment. Finally, homomorphic encryption was added to provide a higher level of data confidentiality.


Preface

Over the past few years, machine learning has revolutionized fields such as computer vision, natural language processing, speech recognition, and email spam filtering. Much of this success rests on collecting vast amounts of data, often in privacy-invasive ways. Federated Learning is a new subfield of machine learning that allows models to be trained without collecting the data itself. Instead of sharing data, users collaboratively train a model by sending only weight updates to a server. In this project, the technique is applied to a highly relevant domain: email spam filtering.
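The training loop described above can be illustrated with a minimal sketch of one federated round. This is not the project's code: the logistic-regression model stands in for the SVM and GRU models, the size-weighted averaging rule is assumed, and the names `local_update`, `federated_average`, and `federated_round` as well as the toy feature layout (a bias term plus a count of suspicious words) are hypothetical.

```python
import math


def local_update(weights, samples, lr=0.1, epochs=5):
    """Train a logistic-regression model on one client's private samples.

    Only the resulting weights leave the client; the samples never do.
    """
    w = list(weights)
    for _ in range(epochs):
        for features, label in samples:
            z = sum(wi * xi for wi, xi in zip(w, features))
            pred = 1.0 / (1.0 + math.exp(-z))
            # Per-sample gradient step on the log loss.
            for i, xi in enumerate(features):
                w[i] -= lr * (pred - label) * xi
    return w


def federated_average(client_weights, client_sizes):
    """Server side: average client weights, weighted by local dataset size."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]


def federated_round(global_weights, clients):
    """One round: every client trains locally, the server averages the results."""
    updates = [local_update(global_weights, data) for data in clients]
    sizes = [len(data) for data in clients]
    return federated_average(updates, sizes)


# Hypothetical toy data: features are [bias, suspicious_word_count], label 1 = spam.
client_a = [([1.0, 3.0], 1), ([1.0, 0.0], 0)]
client_b = [([1.0, 2.0], 1), ([1.0, 0.0], 0), ([1.0, 1.0], 1)]

global_weights = [0.0, 0.0]
for _ in range(10):
    global_weights = federated_round(global_weights, [client_a, client_b])
```

The server in this sketch sees only the weight vectors returned by `local_update`, never the emails themselves, which is the property the paragraph above describes.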

Project Objectives

The primary objective of our work is to analyze the performance of existing machine learning algorithms in the domain of email spam filtering. We evaluate several neural network techniques, namely the simple Recurrent Neural Network (RNN), Long Short-Term Memory (LSTM), and Gated Recurrent Unit (GRU), along with traditional machine learning algorithms such as Logistic Regression, Decision Tree Classifier, Naive Bayes (NB), Support Vector Machine (SVM) classifier, and Random Forest Classifier. From these, we choose the algorithms that yield the best results in email spam detection and apply them in a federated environment. The main objectives of our project work are summarised as follows:

  1. Applying several machine learning algorithms for the classification of emails as ham or spam.
  2. Providing a comparative analysis of the performance of these approaches.
  3. Applying the two approaches with the best results in a federated environment.
  4. Adding homomorphic encryption to the federated environment to ensure an even higher level of data security.
  5. Comparing the performance of machine learning algorithms when used with and without Federated Learning.
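
To make objective 4 more concrete, the sketch below shows how an additively homomorphic scheme lets a server sum encrypted weight updates without decrypting them. The source does not name the scheme the project uses, so Paillier is an assumption here, as are the fixed-point `SCALE`, the helper names, and the setup in which clients share the key pair while the server sees only ciphertexts. The key is built from two small Mersenne primes and is far too short to be secure; it is for illustration only.

```python
import math
import random

# Toy Paillier key from two known Mersenne primes (insecure, illustration only).
P, Q = 2**31 - 1, 2**61 - 1
N = P * Q
N_SQ = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(p - 1, q - 1)
MU = pow(LAM, -1, N)  # modular inverse; valid because the generator is g = n + 1
SCALE = 10**6  # assumed fixed-point scale for encoding float weights


def encrypt(value):
    """Encrypt one float weight as a Paillier ciphertext."""
    m = round(value * SCALE) % N  # negative values wrap around modulo N
    while True:
        r = random.randrange(1, N)
        if math.gcd(r, N) == 1:
            break
    # With g = n + 1, g^m mod n^2 simplifies to 1 + m*n.
    return (1 + m * N) % N_SQ * pow(r, N, N_SQ) % N_SQ


def add_encrypted(c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts."""
    return c1 * c2 % N_SQ


def decrypt(c):
    """Recover the float value (or sum of values) from a ciphertext."""
    m = (pow(c, LAM, N_SQ) - 1) // N * MU % N
    if m > N // 2:  # map the upper half of the range back to negatives
        m -= N
    return m / SCALE


# Three clients encrypt their (hypothetical) weight updates before upload.
updates = [[0.25, -0.5], [0.75, 0.1], [-0.4, 0.7]]
encrypted = [[encrypt(w) for w in update] for update in updates]

# Server: combine ciphertexts coordinate by coordinate, without the private key.
aggregate = encrypted[0]
for enc_update in encrypted[1:]:
    aggregate = [add_encrypted(a, b) for a, b in zip(aggregate, enc_update)]

# Key holder: decrypt the sum and divide by the number of clients.
average = [decrypt(c) / len(updates) for c in aggregate]
```

In this arrangement the server only ever handles ciphertexts, so individual weight updates stay hidden from it, and only the aggregated result is decrypted.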