Code underlying: Privacy-Preserving Membership Queries for Federated Anomaly Detection
DOI: 10.4121/4e1739c5-f743-47cc-aa01-df52481e3fb3
Software
Licence: Apache-2.0
Privacy-Preserving Feature Extraction for Detection of
Anomalous Financial Transactions
------------------------------------------------------------------------
This repository holds the code written by the PPMLHuskies for the 2nd Place solution in the PETs Prize Challenge, Track A.
Description
The task is to predict probabilities for anomalous transactions, from a
synthetic database of international transactions, and several synthetic
databases of banking account information. We provide two solutions. One
solution, our centralized approach, found in `solution_centralized.py`,
uses the transactions database (PNS) and the banking database with no
privacy protections. The second solution, which provides the robust privacy
guarantees outlined in our report, follows a federated architecture,
found in `solution_federated.py` and `model.py`. In this approach, PNS
data resides in one client, banking data is divided across the other
clients, and an aggregator handles all communication between
clients. We have built in privacy protections so that clients and the
aggregator learn minimal information about each other, while engaging in
communication to detect anomalous transactions in PNS.
The way in which we conduct training and inference in both the
centralized and the federated architectures is fundamentally the same
(other than the privacy protections in the latter). Several new features
are engineered from the given PNS data. Then a model is trained on those
features from PNS. Next, during inference, a check is made to determine
whether attributes from a PNS transaction match the banking data, and
whether the associated account in the banking data is flagged. If any
attribute fails to match, or the account is flagged, the check yields a
value of 1, and 0 otherwise. Lastly, we take the maximum of the
probability inferred by the PNS model and the result of the banking data
validation, and use it as our final prediction for the probability that
the transaction is anomalous.
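The combination step described above can be sketched as follows. This is a minimal illustration, not the code in `solution_centralized.py`: the field names (`account`, `name`, `flagged`), the dictionary layout, and the function names `bank_check` and `final_score` are hypothetical, and treating an account missing from the banking data as a failed check is an assumption.

```python
# Illustrative sketch only. The field names and data layout are
# hypothetical and do not come from the repository.

def bank_check(transaction, bank_accounts):
    """Return 1 if the transaction fails validation against banking data, else 0."""
    account = bank_accounts.get(transaction["account"])
    if account is None:
        return 1  # assumption: an unknown account counts as a failed check
    if account["name"] != transaction["name"]:
        return 1  # an attribute does not match the banking data
    if account["flagged"]:
        return 1  # the associated account is flagged
    return 0


def final_score(model_probability, check_result):
    """Final prediction: maximum of the model probability and the 0/1 check."""
    return max(float(model_probability), float(check_result))


accounts = {
    "A1": {"name": "Alice", "flagged": False},
    "A2": {"name": "Bob", "flagged": True},
}
tx = {"account": "A2", "name": "Bob"}
score = final_score(0.2, bank_check(tx, accounts))  # flagged account gives 1.0
```

Because the check result is either 0 or 1, a failed check always pushes the final score to 1, while a passed check leaves the model probability unchanged.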
The difference between the federated and centralized logic is that in
the federated setup, where the banking data is split into one or more
partitions across clients, the PNS client engages in a cryptographic
protocol based on homomorphic encryption with the banking clients,
routed through the aggregator, to perform feature extraction. To ensure
privacy, and so that PNS learns nothing from the banks beyond the set
membership of a select few features, this protocol is carried out over
r rounds, where r = 7 + n and n is the number of bank clients.
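As a quick worked example of the round count, the relation r = 7 + n can be evaluated directly. The helper name `protocol_rounds` is hypothetical and is not part of the repository; it only restates the formula given above.

```python
def protocol_rounds(num_bank_clients):
    """Number of protocol rounds r = 7 + n for n bank clients."""
    if num_bank_clients < 1:
        raise ValueError("the federated setup needs at least one bank client")
    return 7 + num_bank_clients


# With three bank clients the protocol runs for 7 + 3 = 10 rounds.
rounds = protocol_rounds(3)
```

The round count grows linearly with the number of bank clients, so each additional bank partition adds one round.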
History
- 2025-03-06 first online, published, posted
Publisher
4TU.ResearchData
Format
Python/.py, Rust/.rs
Associated peer-reviewed publication
Privacy-Preserving Membership Queries for Federated Anomaly Detection
Code hosting project url
https://github.com/steveng9/PETsChallenge
Organizations
TU Delft, Faculty of Electrical Engineering, Mathematics and Computer Science, Department of Intelligent Systems
University of Washington Tacoma, School of Engineering and Technology
University of Brasilia, Department of Computer Science
To access the source code, use the following command:
git clone https://data.4tu.nl/v3/datasets/2851fb4d-9c4f-498d-9ade-446399e45e08.git "PETsChallenge"