Membership Inference Attack
A replication of the landmark 2017 membership inference attack against machine learning models. Shadow models trained to mimic a target neural network supply labeled examples, from which an attack classifier learns to distinguish training-set members from non-members using only the models' output confidence vectors.
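The shadow-model idea can be sketched in a few lines of NumPy. This is a toy illustration under stated assumptions, not the project's code: the synthetic task, the logistic-regression models standing in for neural networks, the helper names (sample, train, true_label_confidence), and every hyperparameter are hypothetical. A single learned threshold on the true-label confidence also stands in for the attack classifier.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM, N = 50, 20  # more features than samples, so each model overfits its training set

def sample(n):
    """Draw n points from a hypothetical noisy binary task."""
    X = rng.normal(size=(n, DIM))
    y = (X[:, 0] + rng.normal(size=n) > 0).astype(float)
    return X, y

def train(X, y, lr=0.5, steps=500):
    """Fit logistic regression by plain gradient descent (stand-in for a neural network)."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w -= lr * X.T @ (p - y) / len(y)
    return w

def true_label_confidence(w, X, y):
    """Confidence the model assigns to each point's true label."""
    p = 1.0 / (1.0 + np.exp(-X @ w))
    return np.where(y == 1, p, 1.0 - p)

# 1. Shadow models: the attacker knows which points each one was trained on,
#    so every confidence value comes with a member / non-member label.
feats, member = [], []
for _ in range(10):
    X_in, y_in = sample(N)    # shadow training set (members)
    X_out, y_out = sample(N)  # held-out points (non-members)
    w_shadow = train(X_in, y_in)
    feats += [true_label_confidence(w_shadow, X_in, y_in),
              true_label_confidence(w_shadow, X_out, y_out)]
    member += [np.ones(N, dtype=bool), np.zeros(N, dtype=bool)]
feats, member = np.concatenate(feats), np.concatenate(member)

# 2. Attack "classifier": the confidence threshold with the best shadow accuracy.
candidates = np.sort(feats)
shadow_acc = [np.mean((feats >= t) == member) for t in candidates]
threshold = candidates[int(np.argmax(shadow_acc))]

# 3. Apply the attack to a target model whose training set the attacker never saw.
X_tr, y_tr = sample(N)
X_te, y_te = sample(N)
w_target = train(X_tr, y_tr)
conf = np.concatenate([true_label_confidence(w_target, X_tr, y_tr),
                       true_label_confidence(w_target, X_te, y_te)])
truth = np.concatenate([np.ones(N, dtype=bool), np.zeros(N, dtype=bool)])
attack_acc = float(np.mean((conf >= threshold) == truth))
print(f"threshold={threshold:.3f}  attack accuracy={attack_acc:.2f}")
```

The sketch works only because the toy models overfit: members receive near-certain confidence while non-members do not. A fuller attack would feed the whole confidence vector, along with the class label, to a trained attack model.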
Projected Gradient Descent Attack
An implementation and analysis of PGD adversarial attacks on image classifiers. By iteratively perturbing inputs within a bounded ε-ball, the attack reliably fools undefended models. Adversarial training on these PGD-generated examples yields models that are robust to the attack at only a small cost in clean accuracy.
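The iterative perturbation can be sketched as follows. This is a minimal NumPy illustration, not the project's implementation. It assumes an L-infinity ε-ball, pixel values in [0, 1], and a hypothetical logistic-regression "classifier" whose input gradient has a closed form. The names pgd_linf, loss, and loss_grad and the values of eps, alpha, and steps are illustrative choices.

```python
import numpy as np

def pgd_linf(x, y, grad_fn, eps=0.1, alpha=0.02, steps=20, seed=0):
    """L-infinity PGD: signed-gradient ascent on the loss, projected back into the eps-ball."""
    rng = np.random.default_rng(seed)
    # Random start inside the eps-ball, kept within the valid pixel range.
    x_adv = np.clip(x + rng.uniform(-eps, eps, size=x.shape), 0.0, 1.0)
    for _ in range(steps):
        x_adv = x_adv + alpha * np.sign(grad_fn(x_adv, y))  # ascent step
        x_adv = np.clip(x_adv, x - eps, x + eps)            # project onto the eps-ball
        x_adv = np.clip(x_adv, 0.0, 1.0)                    # stay a valid image
    return x_adv

# Hypothetical stand-in for an image classifier: logistic regression on 64 "pixels".
rng = np.random.default_rng(1)
w, b = rng.normal(size=64), 0.0

def loss(x, y):
    """Binary cross-entropy written in a numerically stable form."""
    z = x @ w + b
    return np.logaddexp(0.0, z) - y * z

def loss_grad(x, y):
    """Gradient of the loss with respect to the input x."""
    p = 1.0 / (1.0 + np.exp(-(x @ w + b)))
    return (p - y) * w

x_clean = np.full(64, 0.5)
y_true = 1.0
x_adv = pgd_linf(x_clean, y_true, loss_grad, eps=0.1)
print("max perturbation:", np.max(np.abs(x_adv - x_clean)))
print("loss clean -> adversarial:", loss(x_clean, y_true), "->", loss(x_adv, y_true))
```

For a real network, grad_fn would come from automatic differentiation. Adversarial training would then call the same routine inside the training loop and fit the model on x_adv instead of x_clean.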