Membership Inference Attack
Description:
This was a term project for AI 539 and my first deep dive into machine learning security research. I replicated the black-box membership inference attack from Shokri et al. (2017), which determines whether a specific data point was used to train a model using only the model's confidence scores, with no access to its weights or training data. The pipeline runs end-to-end in four stages: train a target neural network on CIFAR-10, train 50 shadow models in parallel on an HPC SLURM cluster, train one attack multilayer perceptron per class on the shadow models' outputs, and then evaluate how accurately the attack exposes the target model's training set. The project gave me hands-on experience with machine learning privacy vulnerabilities, distributed GPU training, and empirical validation of published research results.
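To make the shadow-to-attack step concrete, here is a minimal NumPy sketch of how shadow-model outputs could be grouped into per-class attack training sets. It is an illustration under assumptions, not the project's actual code: the function name `build_attack_sets`, the tuple layout of `shadow_records`, and the synthetic confidence vectors are all hypothetical, and the real pipeline would feed in softmax outputs from trained shadow models.

```python
import numpy as np

def build_attack_sets(shadow_records, num_classes):
    """Group shadow-model outputs into one attack training set per class.

    shadow_records: iterable of (conf_in, y_in, conf_out, y_out) tuples, where
    conf_* are (n, num_classes) confidence arrays and y_* are the true labels.
    "in" records were in that shadow model's training set, "out" were held out.
    Returns {class: (X, m)} with m = 1 for members and 0 for non-members.
    """
    features = {c: [] for c in range(num_classes)}
    membership = {c: [] for c in range(num_classes)}
    for conf_in, y_in, conf_out, y_out in shadow_records:
        for conf, y, is_member in ((conf_in, y_in, 1), (conf_out, y_out, 0)):
            for c in range(num_classes):
                rows = conf[y == c]  # confidence vectors whose true label is c
                features[c].append(rows)
                membership[c].append(np.full(len(rows), is_member))
    return {c: (np.concatenate(features[c]), np.concatenate(membership[c]))
            for c in range(num_classes)}

# Synthetic stand-in for shadow outputs (hypothetical data, 3 shadow models).
rng = np.random.default_rng(0)

def fake_confidences(n, k):
    p = rng.random((n, k))
    return p / p.sum(axis=1, keepdims=True)  # rows sum to 1, like softmax output

records = [
    (fake_confidences(20, 10), rng.integers(0, 10, 20),
     fake_confidences(20, 10), rng.integers(0, 10, 20))
    for _ in range(3)
]
attack_sets = build_attack_sets(records, num_classes=10)
```

Each `(X, m)` pair would then serve as the training data for that class's attack classifier, which learns to predict membership `m` from a confidence vector in `X`.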
My Contributions:
Beyond contributing to the core replication, I extended the project by investigating a key limitation in our setup: the shared attack models were trained entirely on shadow models with training sets of size n=2,500, so they may not accurately reflect the confidence distributions of target models trained on more data, which have smaller generalization gaps. My fix was to train a dedicated set of attack models for each target size, using shadow models trained on the same number of examples as the target. This produced attack classifiers tuned to the specific overfitting behavior of each target, improving precision by up to 7% over the baseline approach.
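The size-matching idea can be sketched roughly as follows. This is a hedged illustration, not the project's code: the helper names `sample_shadow_split` and `attack_precision` are hypothetical, the pool size of 50,000 assumes the standard CIFAR-10 training set, and every target size other than 2,500 is an illustrative value rather than one taken from the project.

```python
import numpy as np

def sample_shadow_split(pool_size, n, rng):
    """Draw disjoint member / non-member index sets, each of size n."""
    if 2 * n > pool_size:
        raise ValueError("pool too small for two disjoint splits of size n")
    idx = rng.permutation(pool_size)
    return idx[:n], idx[n:2 * n]

def attack_precision(pred_member, true_member):
    """Fraction of records flagged as members that really are members."""
    pred_member = np.asarray(pred_member, dtype=bool)
    true_member = np.asarray(true_member, dtype=bool)
    flagged = pred_member.sum()
    if flagged == 0:
        return 0.0
    return float((pred_member & true_member).sum() / flagged)

rng = np.random.default_rng(0)
# Only 2,500 comes from the write-up; the other sizes are illustrative.
target_sizes = [2500, 5000, 10000]
# One shadow split per target size, so each set of attack models sees shadow
# models trained on the same number of examples as its target.
splits = {n: sample_shadow_split(50_000, n, rng) for n in target_sizes}

example_precision = attack_precision([1, 1, 0, 1], [1, 0, 1, 1])
```

In a full run, each entry in `splits` would be resampled once per shadow model, and `attack_precision` would be computed on the target model's true member and non-member records for both the shared and the size-matched attack models.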
Abstract:
A replication of the landmark 2017 membership inference attack against machine learning models. By training shadow models that mimic a target neural network, an attack classifier learns to distinguish members from non-members using only confidence vectors.