Robustifying Models Against Adversarial Attacks by Langevin Dynamics
Adversarial attacks on deep learning models have compromised their performance considerably.
As remedies, many defense methods have been proposed, which, however, have
been broken by newer attack strategies.
In the midst of this arms race, the problem of robustness
against adversarial attacks remains unsolved even on the toy MNIST dataset.
This paper proposes a novel, simple yet effective defense strategy,
where adversarial samples are relaxed onto the underlying manifold
of the (unknown) target class distribution. Specifically, given an
off-manifold adversarial sample, our algorithm drives the sample
towards high-density regions of the data-generating distribution of the
target class using the Metropolis-adjusted Langevin algorithm (MALA), with
the perceptual boundary taken into account. Although the motivation is
similar to projection methods, e.g., Defense-GAN, our method, called
MALA for defense (MALADE), is equipped with significant
obfuscation: the projection is distributed broadly, and therefore no
whitebox attack can accurately align the input
so that MALADE moves it to a targeted untrained spot where the model
predicts a wrong label. In our experiments, MALADE exhibited
robust performance against various elaborate attack strategies.
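
For readers unfamiliar with MALA, the sampler named in the abstract can be sketched as follows. This is a generic textbook MALA on a toy 2-D Gaussian, not the speakers' MALADE: the function name `mala`, the step size, and the toy target are assumptions made for illustration, and the perceptual boundary and the estimate of the data density gradient that MALADE relies on are omitted.

```python
import numpy as np

def mala(log_density, grad_log_density, x0, step=0.1, n_steps=1000, rng=None):
    """Generic Metropolis-adjusted Langevin sampler (illustrative sketch only)."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float)
    accepted = 0

    def log_q(x_to, x_from):
        # Log proposal density (up to a constant) of moving x_from -> x_to.
        mean = x_from + step * grad_log_density(x_from)
        return -np.sum((x_to - mean) ** 2) / (4.0 * step)

    for _ in range(n_steps):
        # Langevin proposal: gradient drift toward high density plus Gaussian noise.
        noise = rng.standard_normal(x.shape)
        proposal = x + step * grad_log_density(x) + np.sqrt(2.0 * step) * noise
        # Metropolis-Hastings correction for the asymmetric proposal.
        log_alpha = (log_density(proposal) + log_q(x, proposal)
                     - log_density(x) - log_q(proposal, x))
        if np.log(rng.uniform()) < log_alpha:
            x = proposal
            accepted += 1
    return x, accepted / n_steps

# Toy target (assumption): a standard Gaussian centred at the origin,
# standing in for the high-density region of a class distribution.
mu = np.zeros(2)
log_p = lambda x: -0.5 * np.sum((x - mu) ** 2)
grad_log_p = lambda x: -(x - mu)

# Start far from the mode, mimicking an off-manifold input.
x_final, acc = mala(log_p, grad_log_p, x0=[5.0, 5.0], step=0.1,
                    n_steps=500, rng=np.random.default_rng(0))
```

The noise term is what spreads the final position over a region rather than a single point, which is the property the abstract credits for making the relaxation hard for an attacker to target.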
Date: April 3, 2019 (Wed) 15:00 - 16:00