TY - DATA
T1 - Code underlying: Privacy-Preserving Membership Queries for Federated Anomaly Detection
PY - 2025/03/06
AU - Jelle Vos
AU - Sikha Pentyala
AU - Steven Golob
AU - Ricardo José Menezes Maia
AU - Dean Kelley
AU - Zekeriya Erkin
AU - Martine De Cock
AU - Anderson Nascimento
UR -
DO - 10.4121/4e1739c5-f743-47cc-aa01-df52481e3fb3.v1
KW - privacy enhancing technologies
KW - anomaly detection
KW - federated learning
KW - private membership queries
KW - secure computation
KW - cryptography
KW - elliptic curves
N2 -

Privacy-Preserving Feature Extraction for Detection of Anomalous Financial Transactions
------------------------------------------------------------------------

This repository holds the code written by the PPMLHuskies for the 2nd Place solution in the PETs Prize Challenge, Track A.

Description

The task is to predict the probability that each transaction is
anomalous, given a synthetic database of international transactions and
several synthetic databases of banking account information. We provide
two solutions. The first, our centralized approach in
`solution_centralized.py`, uses the transactions database (PNS) and the
banking database with no privacy protections. The second, which provides
the robust privacy guarantees outlined in our report, follows a
federated architecture and is found in `solution_federated.py` and
`model.py`. In this approach, PNS data resides in one client, banking
data is divided across the other clients, and an aggregator handles all
communication between clients. We have built in privacy protections so
that the clients and the aggregator learn minimal information about each
other while communicating to detect anomalous transactions in PNS.
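The star-shaped communication pattern described above can be illustrated with a minimal sketch. This is not the code in `solution_federated.py`; the `Aggregator` class, its methods, and the client names are hypothetical and only show the topology, in which clients never talk to each other directly and the aggregator forwards payloads it treats as opaque bytes.

```python
from collections import defaultdict, deque


class Aggregator:
    """Hypothetical relay: forwards opaque payloads between clients."""

    def __init__(self):
        # One FIFO inbox per recipient (assumed design, for illustration).
        self._inboxes = defaultdict(deque)

    def send(self, sender: str, recipient: str, payload: bytes) -> None:
        # The aggregator stores the payload without inspecting it.
        self._inboxes[recipient].append((sender, payload))

    def receive(self, recipient: str) -> list:
        # Drain and return every pending (sender, payload) pair.
        inbox = self._inboxes[recipient]
        return [inbox.popleft() for _ in range(len(inbox))]


aggregator = Aggregator()
aggregator.send("pns", "bank_1", b"<encrypted query>")
pending = aggregator.receive("bank_1")
```

In a real deployment the payloads would be ciphertexts, so the relay learns little beyond who is talking to whom.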

Training and inference are fundamentally the same in the centralized and
federated architectures, apart from the privacy protections in the
latter. First, several new features are engineered from the given PNS
data, and a model is trained on those features. Next, during inference,
a check determines whether attributes of a PNS transaction match the
banking data and whether the associated account in the banking data is
flagged. If any of these attributes is amiss, the transaction is given a
value of 1, and 0 otherwise. Lastly, we take the maximum of the
probability inferred by the PNS model and the result of the banking data
validation, and use it as our final prediction of the probability that
the transaction is anomalous.
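The inference step above can be sketched in plaintext, as in the centralized setting. This is an illustration under assumptions, not the repository's code: the field names (`account`, `name`, `flagged`), the helper names, and the choice to treat an account missing from the banking data as a mismatch are all hypothetical.

```python
import numpy as np


def bank_validation_flags(transactions, accounts):
    """Return 1.0 for each transaction whose bank details are amiss, else 0.0.

    `transactions` is a list of dicts with hypothetical keys "account" and
    "name"; `accounts` maps an account id to {"name": str, "flagged": bool}.
    """
    flags = []
    for tx in transactions:
        record = accounts.get(tx["account"])
        amiss = (
            record is None                    # assumption: unknown account counts as a mismatch
            or record["name"] != tx["name"]   # attribute does not match the banking data
            or record["flagged"]              # account is flagged in the banking data
        )
        flags.append(1.0 if amiss else 0.0)
    return np.array(flags)


def final_scores(model_probs, flags):
    """Element-wise maximum of the PNS model probability and the bank flag."""
    return np.maximum(np.asarray(model_probs, dtype=float), flags)


accounts = {
    "A1": {"name": "ALICE", "flagged": False},
    "A2": {"name": "BOB", "flagged": True},
}
transactions = [
    {"account": "A1", "name": "ALICE"},  # consistent, not flagged
    {"account": "A2", "name": "BOB"},    # consistent, but flagged
]
scores = final_scores([0.2, 0.1], bank_validation_flags(transactions, accounts))
```

Because the flag is either 0 or 1, taking the maximum leaves the model's probability untouched for consistent transactions and forces the score to 1 whenever the bank check fails.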

The difference between the federated and centralized logic is that in
the federated setup, where the banking data is split into one or more
partitions across clients, the PNS client engages in a cryptographic
protocol based on homomorphic encryption with the banking clients,
routed through the aggregator, to perform feature extraction. To ensure
privacy, so that PNS learns nothing from the banks beyond the set
membership of a select few features, this protocol is carried out over
several rounds: r = 7 + n, where n is the number of bank clients.
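As a small worked example of the round count, the formula r = 7 + n can be written as a helper. The function name is hypothetical, and rejecting n < 1 is an assumption based on the statement that the banking data has one or more partitions.

```python
def protocol_rounds(num_bank_clients: int) -> int:
    """Number of rounds r = 7 + n for n bank clients."""
    if num_bank_clients < 1:
        # Assumption: the federated setup needs at least one bank client.
        raise ValueError("expected at least one bank client")
    return 7 + num_bank_clients


rounds_single_bank = protocol_rounds(1)  # 8
rounds_five_banks = protocol_rounds(5)   # 12
```

The round count grows linearly with the number of bank clients, with a fixed overhead of seven rounds.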

ER -