Projected Gradient Descent
Description:
This was my first hands-on project in adversarial machine learning. I implemented the PGD (projected gradient descent) white-box attack from Madry et al. against LeNet on MNIST and ResNet18 on CIFAR-10, then systematically analyzed how attack hyperparameters, training augmentations, and regularization techniques affect a model's vulnerability. Finally, I defended both models with adversarial training, which reduced the gap between clean and adversarial accuracy from roughly 45–65 percentage points to about 2–14 points at little cost to clean accuracy (MNIST stayed at about 99.5%, and CIFAR-10 dropped by about 1.6 points). The project gave me practical experience with crafting adversarial examples, evaluating robustness, and weighing the tradeoffs between model utility and security.
Abstract:
An implementation and analysis of PGD adversarial attacks on image classifiers. By iteratively perturbing inputs while keeping them inside a bounded ε-ball around the original image, the attack reliably fools undefended models. Training on these attack-generated examples yields models that are robust to PGD while sacrificing very little clean accuracy.
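The iterative perturb-and-project loop described above can be sketched in a few lines. This is a minimal illustration, not the project's actual code: it assumes an L∞ ε-ball, pixel values in [0, 1], and a random start, and it swaps the neural networks for a tiny logistic-regression model so the input gradient can be written by hand. With LeNet or ResNet18 that gradient would presumably come from an autograd framework instead. The names `pgd_attack`, `eps`, `alpha`, and `steps` are illustrative.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(x, w, b):
    """Hard label (0 or 1) from a logistic-regression stand-in model."""
    return 1 if sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= 0.5 else 0


def loss_grad_wrt_input(x, y, w, b):
    """Gradient of the binary cross-entropy loss with respect to the input x.

    For a logistic model, dL/dz = p - y and dz/dx = w, so dL/dx = (p - y) * w.
    """
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return [(p - y) * wi for wi in w]


def sign(v):
    return (v > 0) - (v < 0)


def pgd_attack(x, y, w, b, eps=0.3, alpha=0.05, steps=20, seed=0):
    """Assumed L-infinity PGD: signed gradient ascent on the loss, then projection."""
    rng = random.Random(seed)
    # Random start inside the eps-ball, clipped to the valid pixel range.
    x_adv = [min(1.0, max(0.0, xi + rng.uniform(-eps, eps))) for xi in x]
    for _ in range(steps):
        grad = loss_grad_wrt_input(x_adv, y, w, b)
        # Ascent step: move each coordinate by alpha in the loss-increasing direction.
        x_adv = [xa + alpha * sign(g) for xa, g in zip(x_adv, grad)]
        # Projection: clip back into the eps-ball around the original input.
        x_adv = [min(xi + eps, max(xi - eps, xa)) for xa, xi in zip(x_adv, x)]
        # Keep the result a valid image.
        x_adv = [min(1.0, max(0.0, xa)) for xa in x_adv]
    return x_adv


# Hypothetical three-"pixel" input and hand-picked weights, for illustration only.
w, b = [2.0, -3.0, 1.5], -0.2
x, y = [0.8, 0.2, 0.6], 1
x_adv = pgd_attack(x, y, w, b)
print("clean prediction:", predict(x, w, b))
print("adversarial prediction:", predict(x_adv, w, b))
```

Adversarial training reuses the same routine inside the training loop: each batch is replaced (or augmented) with its PGD-perturbed version before the usual weight update.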

Undefended             LeNet - MNIST   ResNet18 - CIFAR-10
Clean Accuracy         99.41%          88.31%
Adversarial Accuracy   54.78%          23.37%
Accuracy Difference    44.63%          64.94%


Adversarially Trained  LeNet - MNIST   ResNet18 - CIFAR-10
Clean Accuracy         99.50%          86.75%
Adversarial Accuracy   97.43%          72.97%
Accuracy Difference    2.07%           13.78%
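The "Accuracy Difference" rows are clean accuracy minus adversarial accuracy, expressed in percentage points. A short sketch that recomputes them from the reported figures; the helper name `accuracy_gap` and the dictionary layout are illustrative, not taken from the project:

```python
def accuracy_gap(clean_acc, adv_acc):
    """Clean accuracy minus adversarial accuracy, in percentage points."""
    return round(clean_acc - adv_acc, 2)


# (clean accuracy, adversarial accuracy) as reported in the tables above.
results = {
    ("LeNet - MNIST", "undefended"): (99.41, 54.78),
    ("ResNet18 - CIFAR-10", "undefended"): (88.31, 23.37),
    ("LeNet - MNIST", "adversarially trained"): (99.50, 97.43),
    ("ResNet18 - CIFAR-10", "adversarially trained"): (86.75, 72.97),
}

for (model, regime), (clean, adv) in results.items():
    gap = accuracy_gap(clean, adv)
    print(f"{model:20s} {regime:22s} gap = {gap:.2f} points")
```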