Adam Bobich

Machine Learning / Projected Gradient Descent Attack

Projected Gradient Descent

Description:

This was my first hands-on project in adversarial machine learning. I implemented the white-box PGD (projected gradient descent) attack from Madry et al. against LeNet on MNIST and ResNet18 on CIFAR-10, then systematically analyzed how attack hyperparameters, training augmentations, and regularization techniques affect a model's vulnerability. Finally, I defended both models with adversarial training, shrinking the gap between clean and adversarial accuracy from roughly 45–65 percentage points to about 2–14 points. The cost in clean accuracy was small: ResNet18 lost about 1.6 points, and LeNet's clean accuracy rose slightly. The project gave me practical experience with crafting adversarial examples, evaluating robustness, and weighing the tradeoffs between model utility and security.
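
For reference, the attack and the defense described above are commonly written as below, following the formulation in Madry et al. The notation is an assumption made here for illustration: L is the training loss, θ the model parameters, α the step size, Π the projection onto the ε-ball, and the ball is taken in the L∞ norm, which this write-up does not state explicitly.

```latex
% PGD step: move along the sign of the input gradient, then project back into the eps-ball
x^{t+1} = \Pi_{\mathcal{B}_\epsilon(x)}\left( x^{t} + \alpha \, \operatorname{sign}\left( \nabla_{x} L(\theta, x^{t}, y) \right) \right),
\qquad
\mathcal{B}_\epsilon(x) = \{\, x' : \| x' - x \|_\infty \le \epsilon \,\}

% Adversarial training: minimize the worst-case loss inside the eps-ball
\min_{\theta} \; \mathbb{E}_{(x, y) \sim \mathcal{D}} \left[ \max_{\| \delta \|_\infty \le \epsilon} L(\theta, x + \delta, y) \right]
```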

Abstract:

An implementation and analysis of PGD adversarial attacks on image classifiers. By iteratively perturbing each input while keeping the perturbation inside a bounded ε-ball around the original, the attack reliably fools undefended models. Training on these attack-generated examples yields models that are robust to PGD with little loss in clean accuracy.
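
The iterative perturb-and-project loop can be illustrated with a minimal, dependency-free sketch. This is not the project's implementation, which attacked LeNet and ResNet18. It swaps in a toy logistic-regression "classifier" so the input gradient can be written by hand. The function names, the L∞ ball, the [0, 1] input range, and the values of `epsilon`, `alpha`, and `steps` are all assumptions chosen for illustration.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def logit(w, b, x):
    """Linear score of the toy model (hypothetical stand-in for a network)."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def predict(w, b, x):
    """Predicted class: 1 if the score is non-negative, else 0."""
    return 1 if logit(w, b, x) >= 0 else 0


def loss_and_input_grad(w, b, x, y):
    """Binary cross-entropy and its gradient with respect to the input x."""
    p = sigmoid(logit(w, b, x))
    tiny = 1e-12  # avoids log(0)
    loss = -(y * math.log(p + tiny) + (1 - y) * math.log(1 - p + tiny))
    grad = [(p - y) * wi for wi in w]  # dL/dx = (p - y) * w for this model
    return loss, grad


def pgd_attack(w, b, x, y, epsilon=0.3, alpha=0.05, steps=20, seed=0):
    """L-infinity PGD sketch: random start, signed gradient ascent, projection."""
    rng = random.Random(seed)
    # Random start inside the eps-ball, clipped to the assumed valid range [0, 1].
    x_adv = [min(1.0, max(0.0, xi + rng.uniform(-epsilon, epsilon))) for xi in x]
    for _ in range(steps):
        _, grad = loss_and_input_grad(w, b, x_adv, y)
        # Ascend the loss along the sign of the input gradient.
        stepped = [xa + alpha * ((g > 0) - (g < 0)) for xa, g in zip(x_adv, grad)]
        # Project back into the eps-ball around the ORIGINAL x, then into [0, 1].
        x_adv = [
            min(1.0, max(0.0, min(xi + epsilon, max(xi - epsilon, s))))
            for s, xi in zip(stepped, x)
        ]
    return x_adv


# Toy usage with made-up weights and a made-up input.
w, b = [2.0, -3.0, 1.0, 0.5], -0.5
x, y = [0.8, 0.1, 0.6, 0.5], 1
x_adv = pgd_attack(w, b, x, y)
print("clean prediction:", predict(w, b, x))
print("adversarial prediction:", predict(w, b, x_adv))
print("max perturbation:", max(abs(a - c) for a, c in zip(x_adv, x)))
```

Projecting onto the ball around the original input after every step, rather than once at the end, is what keeps the final perturbation within ε no matter how many steps are taken. In a real attack on a neural network the hand-written gradient would be replaced by automatic differentiation.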

Undefended	LeNet - MNIST	ResNet18 - CIFAR-10
Clean Accuracy	99.41%	88.31%
Adversarial Accuracy	54.78%	23.37%
Accuracy Difference	44.63%	64.94%

Adversarially Trained	LeNet - MNIST	ResNet18 - CIFAR-10
Clean Accuracy	99.50%	86.75%
Adversarial Accuracy	97.43%	72.97%
Accuracy Difference	2.07%	13.78%