Code underlying: Privacy-Preserving Membership Queries for Federated Anomaly Detection

DOI:10.4121/4e1739c5-f743-47cc-aa01-df52481e3fb3.v1
The DOI displayed above is for this specific version of this dataset, which is currently the latest. Newer versions may be published in the future. For a link that will always point to the latest version, please use
DOI: 10.4121/4e1739c5-f743-47cc-aa01-df52481e3fb3
Datacite citation style:
Vos, Jelle; Pentyala, Sikha; Golob, Steven; Maia, Ricardo José Menezes; Kelley, Dean et al. (2025): Code underlying: Privacy-Preserving Membership Queries for Federated Anomaly Detection. Version 1. 4TU.ResearchData. software. https://doi.org/10.4121/4e1739c5-f743-47cc-aa01-df52481e3fb3.v1

Software

Privacy-Preserving Feature Extraction for Detection of Anomalous Financial Transactions


------------------------------------------------------------------------


This repository holds the code written by the PPMLHuskies team for the second-place solution in the PETs Prize Challenge, Track A.


Description


The task is to predict the probability that a transaction is anomalous, given a synthetic database of international transactions and several synthetic databases of bank account information. We provide two solutions. The first, our centralized approach, found in `solution_centralized.py`, uses the transactions database (PNS) and the banking database with no privacy protections. The second, which provides the robust privacy guarantees outlined in our report, follows a federated architecture and is found in `solution_federated.py` and `model.py`. In this approach, the PNS data resides in one client, the banking data is partitioned across the other clients, and an aggregator handles all communication between clients. We have built in privacy protections so that the clients and the aggregator learn minimal information about each other while communicating to detect anomalous transactions in PNS.
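The client/aggregator topology described above can be sketched as a hub that relays every message, so that no two clients ever address each other directly. This is a minimal illustration under stated assumptions, not the repository's code: the `Client` and `Aggregator` classes, the client names and the per-round log are hypothetical, and the byte payloads stand in for the ciphertexts the real protocol would exchange.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Client:
    """Hypothetical party in the federation: the PNS client or a bank client."""
    name: str
    inbox: list = field(default_factory=list)


class Aggregator:
    """Hypothetical hub that relays every message between registered clients."""

    def __init__(self) -> None:
        self.clients: dict[str, Client] = {}
        # Per-round record of (sender, recipient) pairs, kept only for illustration.
        self.log: dict[int, list[tuple[str, str]]] = defaultdict(list)

    def register(self, client: Client) -> None:
        self.clients[client.name] = client

    def relay(self, rnd: int, sender: str, recipient: str, payload: bytes) -> None:
        # In a privacy-preserving setting the payload would be ciphertext
        # that the aggregator forwards without being able to read it.
        if sender not in self.clients or recipient not in self.clients:
            raise KeyError("both parties must be registered with the aggregator")
        self.log[rnd].append((sender, recipient))
        self.clients[recipient].inbox.append((sender, payload))


agg = Aggregator()
for name in ["pns", "bank_0", "bank_1"]:
    agg.register(Client(name))

# Round 0: PNS sends an (opaque) query to each bank via the aggregator.
for bank in ["bank_0", "bank_1"]:
    agg.relay(0, "pns", bank, b"<encrypted query>")
# Round 1: each bank answers PNS, again only through the aggregator.
for bank in ["bank_0", "bank_1"]:
    agg.relay(1, bank, "pns", b"<encrypted answer>")

print(len(agg.clients["pns"].inbox))  # 2
```

Routing everything through one hub keeps the clients from needing to know each other's addresses; the trade-off is that the aggregator sees the communication pattern, which is why payloads must be encrypted end to end.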


The way we conduct training and inference is fundamentally the same in the centralized and federated architectures (apart from the privacy protections in the latter). Several new features are engineered from the given PNS data, and a model is trained on those features. Then, during inference, we check whether the attributes of a PNS transaction match the banking data and whether the associated account in the banking data is flagged. If any of these checks fails, the transaction receives a value of 1; otherwise it receives 0. Finally, we take the maximum of the probability inferred by the PNS model and the result of the banking data validation, and use it as our final prediction of the probability that the transaction is anomalous.
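The inference rule above (a 0/1 bank check combined with the model probability by taking the maximum) can be sketched in plaintext, as in the centralized setting. This is a hypothetical illustration: the `Account` record, the transaction field names, checking both the sender and receiver accounts, and treating an account missing from every bank as a mismatch are all assumptions, not the challenge schema or the repository's code.

```python
from dataclasses import dataclass


@dataclass
class Account:
    # Hypothetical bank-side record; the real fields come from the challenge data.
    name: str
    street: str
    flagged: bool


def bank_validation(txn: dict, banks: list[dict[str, Account]]) -> int:
    """Return 1 if any checked attribute is amiss or the account is flagged, else 0."""
    for side in ("sender", "receiver"):  # assumption: both accounts are checked
        acct_id = txn[f"{side}_account"]
        # Each bank holds one partition; look the account up in whichever bank owns it.
        record = next((bank[acct_id] for bank in banks if acct_id in bank), None)
        if record is None or record.flagged:
            return 1
        if (record.name, record.street) != (txn[f"{side}_name"], txn[f"{side}_street"]):
            return 1
    return 0


def final_score(model_prob: float, txn: dict, banks: list[dict[str, Account]]) -> float:
    # Final prediction: maximum of the PNS model probability and the 0/1 bank check.
    return float(max(model_prob, bank_validation(txn, banks)))


banks = [
    {"A1": Account("Ann", "1 Main St", flagged=False)},
    {"B7": Account("Bob", "2 Oak Ave", flagged=False),
     "C3": Account("Cy", "9 Elm Rd", flagged=True)},
]
txn = {
    "sender_account": "A1", "sender_name": "Ann", "sender_street": "1 Main St",
    "receiver_account": "B7", "receiver_name": "Bob", "receiver_street": "2 Oak Ave",
}
flagged_txn = {**txn, "receiver_account": "C3", "receiver_name": "Cy",
               "receiver_street": "9 Elm Rd"}

print(final_score(0.12, txn, banks))          # 0.12 (all attributes match)
print(final_score(0.12, flagged_txn, banks))  # 1.0 (receiver account is flagged)
```

Taking the maximum means a failed bank check always forces a score of 1, while clean transactions fall back to the model's probability.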


The difference between the federated and centralized logic is that, in the federated setup, where the banking data is split into one or more partitions held by different clients, the PNS client performs feature extraction by engaging in a cryptographic protocol based on homomorphic encryption with the bank clients, routed through the aggregator. To preserve privacy, and to ensure that PNS learns nothing from the banks beyond the set membership of a select few features, this protocol is carried out over r = 7 + n rounds, where n is the number of bank clients.
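As one illustration of how a homomorphic-encryption membership query can work, the sketch below uses Paillier encryption with polynomial evaluation. This is a textbook technique and not necessarily the protocol in this repository; in particular, it performs a single query-and-answer exchange and does not reproduce the r = 7 + n round structure. PNS holds the private key and a queried value x; each bank encodes its set as a polynomial whose roots are its members, evaluates it homomorphically at x, and masks the result with a random factor, so PNS learns only whether x is a root. The toy key sizes, the function names, and the assumption that identifiers are already mapped to small integers are all hypothetical.

```python
import math
import secrets

# Toy Paillier keys from two hard-coded Mersenne primes (2**31 - 1, 2**61 - 1).
# Far too small for real use; chosen only to keep the sketch self-contained.
P, Q = 2_147_483_647, 2_305_843_009_213_693_951
N, N2 = P * Q, (P * Q) ** 2
LAM = math.lcm(P - 1, Q - 1)
MU = pow(LAM, -1, N)  # with generator g = N + 1, mu = lambda^-1 mod N


def rand_unit() -> int:
    """Random r in [1, N) with gcd(r, N) = 1."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            return r


def encrypt(m: int) -> int:
    # (N + 1)^m = 1 + m*N (mod N^2), blinded by r^N
    return (1 + (m % N) * N) * pow(rand_unit(), N, N2) % N2


def decrypt(c: int) -> int:
    return (pow(c, LAM, N2) - 1) // N * MU % N


def add(c1: int, c2: int) -> int:
    return c1 * c2 % N2  # Enc(a) * Enc(b) = Enc(a + b)


def scale(c: int, k: int) -> int:
    return pow(c, k % N, N2)  # Enc(a)^k = Enc(k * a)


def poly_coeffs(roots: list[int]) -> list[int]:
    """Coefficients (lowest degree first) of prod(x - s) over s in roots."""
    coeffs = [1]
    for s in roots:
        nxt = [0] * (len(coeffs) + 1)
        for i, c in enumerate(coeffs):
            nxt[i] -= s * c
            nxt[i + 1] += c
        coeffs = nxt
    return coeffs


def pns_query(x: int, max_degree: int) -> list[int]:
    # PNS (the key holder) sends Enc(x^0), ..., Enc(x^d).
    return [encrypt(pow(x, i, N)) for i in range(max_degree + 1)]


def bank_answer(enc_powers: list[int], bank_set: list[int]) -> int:
    # The bank evaluates Enc(P(x)) with its plaintext coefficients, then masks it
    # with a random unit so a non-member decrypts to a random-looking value.
    coeffs = poly_coeffs(bank_set)
    if len(coeffs) > len(enc_powers):
        raise ValueError("query degree is smaller than the bank's set size")
    acc = encrypt(0)
    for c, e in zip(coeffs, enc_powers):
        acc = add(acc, scale(e, c))
    return scale(acc, rand_unit())


def is_member(x: int, banks: list[list[int]], max_degree: int) -> bool:
    query = pns_query(x, max_degree)
    return any(decrypt(bank_answer(query, s)) == 0 for s in banks)


banks = [[101, 205, 333], [478, 512]]
print(is_member(205, banks, max_degree=3))  # True
print(is_member(206, banks, max_degree=3))  # False
```

The random mask matters: without it, PNS would decrypt the exact value P(x) for non-members, which leaks information about the bank's set. Correctness here relies on |P(x)| staying below N, which holds for the small illustrative values used.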

History

  • 2025-03-06 first online, published, posted

Publisher

4TU.ResearchData

Format

Python/.py, Rust/.rs

Organizations

TU Delft, Faculty of Electrical Engineering, Mathematics and Computer Science, Department of Intelligent Systems
University of Washington Tacoma, School of Engineering and Technology
University of Brasilia, Department of Computer Science

To access the source code, use the following command:

git clone https://data.4tu.nl/v3/datasets/2851fb4d-9c4f-498d-9ade-446399e45e08.git "PETsChallenge"

Or download the latest commit as a ZIP.