Design and Evaluation of GAN-based Models for Adversarial Training Robustness in Deep Learning

Thesis Title: Design and Evaluation of GAN-Based Models for Robustness of Adversarial Training in Deep Learning. I further certify that I am the sole source of the creative works and/or inventive know-how described in this thesis.

Introduction

Motivation

Research Scope

Contributions

Thesis Outline

Summary

Background and Related Work

Background

Generative Adversarial Network
Adversarial Attacks on Image Classifiers
Gradient Descent
Fast Gradient Method Adversarial Attacks
Iterative Gradient Descent Methods

Therefore, all parameters of the model can be optimized using the gradient descent algorithm. Fast gradient method attacks are a family of gradient-based attack algorithms that can target convolutional classification models.
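As an illustration of the single-step fast gradient idea, the following is a minimal sketch rather than the implementation used in this thesis. It assumes a logistic regression model as a stand-in for the convolutional classifier, and the weights, input, label, and perturbation size are hypothetical toy values.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(w, b, x):
    """Probability of class 1 under a toy logistic regression model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def loss_gradient_wrt_input(w, b, x, y):
    """Gradient of the binary cross-entropy loss with respect to the input x."""
    p = predict_proba(w, b, x)
    # For logistic regression, dL/dx_i = (p - y) * w_i.
    return [(p - y) * wi for wi in w]

def sign(v):
    return (v > 0) - (v < 0)

def fgsm(w, b, x, y, eps):
    """Single step: shift every input component by eps along the sign of the loss gradient."""
    grad = loss_gradient_wrt_input(w, b, x, y)
    return [xi + eps * sign(gi) for xi, gi in zip(x, grad)]

# Hypothetical toy values (assumptions for illustration only).
w, b = [2.0, -1.0], 0.0
x, y = [0.5, 0.2], 1
x_adv = fgsm(w, b, x, y, eps=0.3)

p_clean = predict_proba(w, b, x)
p_adv = predict_proba(w, b, x_adv)
print("clean probability of the true class:", p_clean)
print("adversarial probability of the true class:", p_adv)
```

In this toy setting a single signed step is enough to push the sample across the decision boundary; for a deep classifier the input gradient would instead be obtained by backpropagation.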

Figure 2.1: Relationship between machine learning, deep learning, and GAN.

Related Work

Gradient-Based Single-Step Algorithm
Gradient-Based Multi-Step Algorithm
Generative Models
Ensemble Models
Limitations

More advanced losses and divergences can also be used for GAN models (i.e., Jensen-Shannon divergence) [88].

Table 2.1: Overview of adversarial training.

Summary

Proposed GAN-Based Architecture

Basic Formulations

Input Formulations of the Generator

Classifier Architecture

L∞ GAN

L∞ Constraint Function
L∞ Generator Design

L2 GAN

L2 Constraint Function
L2 Generator Design

Summary

Implementation

Libraries and Tools

TensorFlow 2
Adversarial Robustness Toolbox

Implementation Details

Input Implementation
Classifier Parameters
L∞ GAN

L∞ GAN Output Constraint Function
L∞ GAN Generator Parameters

L2 GAN

L2 GAN Output Constraint Function
L2 GAN Generator Parameters

Training Methodology Implementation
Attack Algorithms

Datasets

CIFAR 10
MNIST
Data Preprocessing

Summary

Evaluation Results

Evaluation Metrics

The assessment covers multi-class scenarios; therefore, accuracy is used as the metric for all performance evaluations of the trained classifiers. The trained classifiers are then attacked by the evaluation attack algorithms, and accuracy is computed on the adversarial samples as well as on clean data.

For ease of description, the accuracy of the classifier under adversarial attacks is defined as robust accuracy. The accuracy of the classifier on the original samples from the test data set is defined as pure accuracy. In subsequent sections, these two terms are used to describe the performance of the classifier in different scenarios.
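The two metrics can be sketched as follows. This is a minimal illustration under stated assumptions: the threshold classifier, the shift attack, and the sample values are hypothetical and stand in for the trained classifiers and the evaluation attack algorithms.

```python
def accuracy(predict, samples, labels):
    """Fraction of samples whose predicted class matches the label."""
    correct = sum(1 for x, y in zip(samples, labels) if predict(x) == y)
    return correct / len(labels)

def evaluate(predict, attack, samples, labels):
    """Return (pure accuracy, robust accuracy) for one classifier and one attack."""
    pure = accuracy(predict, samples, labels)
    adversarial = [attack(x, y) for x, y in zip(samples, labels)]
    robust = accuracy(predict, adversarial, labels)
    return pure, robust

# Hypothetical 1-D threshold classifier (assumption for illustration only).
def predict(x):
    return 1 if x > 0.0 else 0

# Hypothetical attack: push each sample toward the wrong side of the boundary.
def shift_attack(x, y, eps=0.5):
    return x - eps if y == 1 else x + eps

samples = [-2.0, -0.3, 0.2, 1.5]
labels = [0, 0, 1, 1]
pure_acc, robust_acc = evaluate(predict, shift_attack, samples, labels)
print("pure accuracy:", pure_acc)
print("robust accuracy:", robust_acc)
```

Both metrics use the same labels; only the inputs differ, which is what makes the gap between them a direct measure of the accuracy lost under attack.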

These attack algorithms are used as threat models to evaluate the robustness of the proposed model formulations.

L∞ GAN

Input Comparison
Width Parameter Comparison
Training Epoch Comparison
Low Dimension Image Evaluation (MNIST)

This suggests that, without any protection, the classifier cannot withstand even small adversarial perturbations. This indicates two possible scenarios, namely that the gradient landscape of the classifier becomes either simpler or more complex. In this case, the classifier is more robust against perturbation sizes less than or equal to 8/255.

The sharp increase in the difference indicates that the worst-case gradient direction of the classifier diverges further as the perturbation grows. It suggests that a generator incorporating a prescribed gradient can provide an improved estimate of adversarial patterns and that a classifier trained against them can achieve higher robust accuracy. This improvement in robustness over FGSM indicates that increasing the number of training epochs improves the generalization of the classifier to single-step gradient perturbations.

This also suggests that the gradient landscape of the classifier may become increasingly challenging to estimate.

Figure 5.1: Generator input formulations comparison results.

L2 GAN

L2 Robustness
Robustness Transferability

When the PGD adversarial attack is applied under the L2 norm constraint, the algorithm may suffer from gradient masking, in which gradient descent can only find a locally optimal perturbation vector. The proposed GAN applies a perturbation vector of fixed norm size in the L2 norm space; this can mitigate the gradient masking issue and enable better data augmentation than traditional gradient descent. The side-by-side comparison between the GAN-trained classifier and PGD L2 AT suggests that the GAN may be more effective in this setting.
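The difference between a fixed-norm perturbation and the projection step of a gradient-based attack can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, and the exact constraint function of the proposed L2 GAN is not reproduced here.

```python
import math

def l2_norm(v):
    return math.sqrt(sum(c * c for c in v))

def fix_l2_norm(v, size, tiny=1e-12):
    """Rescale v so that its L2 norm equals size, keeping its direction.

    A zero vector stays zero because there is no direction to keep.
    """
    scale = size / max(l2_norm(v), tiny)
    return [c * scale for c in v]

def project_l2_ball(v, size):
    """PGD-style projection: shrink v only when its norm exceeds size."""
    n = l2_norm(v)
    if n <= size:
        return list(v)
    return [c * size / n for c in v]

# Hypothetical raw perturbations (assumptions for illustration only).
large = [3.0, 4.0]      # L2 norm 5.0
small = [0.03, 0.04]    # L2 norm 0.05

fixed_large = fix_l2_norm(large, 1.0)
fixed_small = fix_l2_norm(small, 1.0)
projected_small = project_l2_ball(small, 1.0)
print(fixed_large, fixed_small, projected_small)
```

With a fixed norm, even a very small raw output is stretched to the full perturbation size, whereas the projection leaves it small; this is one way to read why a fixed-norm output can keep the augmentation strong when gradients are weak.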

The results of this experiment mainly compare the GAN L2 model with the conventional adversarial training model and suggest that the GAN L2 model can be more beneficial for training under the L2 constraint. This section presents the results of the L∞ robustness tests using the same classifier from the previous section. The results indicate that there is some transferability of robustness between the different constraints.

Under some perturbation sizes, the L2 GAN classifier appears to exhibit better robustness against the L∞ PGD attack; however, the scalar used during training for L2 GAN is 64/255, which is significantly larger than L∞ GAN's 16/255.

Figure 5.5: Training results under L2 constraint against L2 PGD attack.

Summary of Results

Regarding the accuracies, the classifier shows lower clean accuracy than the baseline, indicating a trade-off between robustness and accuracy. The proposed model cannot completely resolve generalization across the adversarial distributional shifts. However, the model helps to improve overall adversarial robustness and provides an option for applications in which robustness is required and is more important than clean accuracy.

The results show that the generator formulations, parameters, and training epochs can affect the training performance of the GAN and affect both the pure accuracy and the robust accuracy. The augmentation constraint can be flexible in providing overall robustness against gradient attacks, but the size of the norm must be considered along with the constraint.

Figure 5.7: Comparison of the accuracies between all formulations.

Visualization

Additional results are fed into the classifier to generate graphs e), f), g), and h), respectively. From the graphs in Figure 5.8, it is evident that the classifier behaves more robustly when the image z) is perturbed by the vector a). The activation of the correct class in the layer before SoftMax keeps a wide enough margin before the activation of an incorrect class exceeds it.

However, if the image z) is perturbed by vectors b), c) and d), the shape of the activation of the correct class becomes more positive-sloping. Points where an incorrect class activation becomes larger than the correct class are labeled with different colored lines; the orange line labels the middle where no vector is added; the blue line represents the overshoot point caused by the FGSM vector; and the yellow line represents the overshoot point caused by the PGD vector. By observation, the overshoot point caused by the vector produced by the generator is between the points of FGSM and PGD.
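The notion of an overshoot point can be sketched as a scan along a perturbation direction that stops at the first scale where an incorrect class activation exceeds the correct one. This is a minimal sketch under stated assumptions: the linear two-class model, the input, and the two directions are hypothetical and are not the vectors shown in Figure 5.8.

```python
def linear_logits(weights, biases, x):
    """Pre-SoftMax activations of a toy linear classifier."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]

def overshoot_point(logit_fn, x, direction, true_class, max_scale=1.0, steps=100):
    """Smallest sampled scale at which an incorrect class activation exceeds
    the correct class activation, or None if this never happens in range."""
    for k in range(steps + 1):
        scale = max_scale * k / steps
        perturbed = [xi + scale * di for xi, di in zip(x, direction)]
        z = logit_fn(perturbed)
        if any(z[c] > z[true_class] for c in range(len(z)) if c != true_class):
            return scale
    return None

# Hypothetical toy model and sample (assumptions for illustration only).
weights, biases = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]
x, true_class = [1.0, 0.0], 0

def logit_fn(v):
    return linear_logits(weights, biases, v)

strong_direction = [-1.0, 1.0]   # lowers the correct class and raises the other
weak_direction = [0.0, 1.0]      # only raises the incorrect class

strong_point = overshoot_point(logit_fn, x, strong_direction, true_class, max_scale=2.0)
weak_point = overshoot_point(logit_fn, x, weak_direction, true_class, max_scale=2.0)
print("overshoot (strong direction):", strong_point)
print("overshoot (weak direction):", weak_point)
```

A smaller overshoot scale corresponds to a more effective perturbation direction, which is the sense in which different perturbation vectors can be ranked against one another.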

This suggests that the generator captured the adversarial perturbation direction successfully; however, the captured direction is not as effective as the one found by the PGD algorithm.

Discussion

The L2 constraint experiments demonstrate the effectiveness of the GAN in estimating and improving adversarial robustness under the L2 constraint. This suggests that the proposed GAN can alleviate the gradient masking problem of L2 gradient-based adversarial training. The trade-off suggests that these classifiers overfit the samples produced by the generator and that it is challenging for the classifier to generalize to both the generated and the clean sample distributions.

By controlling the number of generator parameters, this overfitting can be reduced, but the trade-off cannot be fully mitigated. This generator overfitting effect reduces augmentation performance and causes the robustness of the classifier to decrease as the number of training epochs increases. It suggests that the adversarial noise captured by the generator is indeed different from that produced by gradient-based algorithms.

The GAN is still limited in its ability to find the worst-case adversarial pattern of the classifier.

Threats to Validity

The improvement in model robustness depends on the random initialization of the models' parameters. To address this, multiple models are trained within a single experiment, and the best one is used for evaluation; therefore, the reported results may reflect the upper limit of GAN training. The number of PGD iterations used in most evaluations is 100, to balance effective evaluation against computation time.

The results show that the performance of the model did not vary much between 100 and 1000 iterations. The thesis mainly uses 32×32×3 images (RGB color channels) for evaluation; however, other image dimensions can affect the training results in terms of the generalization and robustness of the classifier. The vulnerabilities of a machine learning model can also include data poisoning and other types of attacks.

There are other cases where a classifier may also provide poor generalization, such as the natural skew of the data distribution.

Summary

Conclusion

Compared to the PGD algorithm, the adversarial samples produced by the generator are visually significantly different and are less effective than the PGD adversarial samples. This thesis proposed a GAN architecture to perform low-complexity adversarial training for protecting against adversarial samples from gradient-based attacks. The significance of the work includes proposing various GAN formulations and implementations to address the adversarial sample problems arising from L∞ and L2 constrained gradient attacks and suggesting the optimal GAN formulation for defending against these attacks.

Further evaluation also reveals the relationship between training epoch and model performance and demonstrates the transferability of robustness across constraints. Additional visualization is also provided to illustrate the differences between algorithm-generated adversarial samples and the GAN estimates. The thesis provides an extended solution for adversarial training and introduces GANs into this application domain.

Compared to other GAN solutions, the proposed architecture is implemented with more advanced formulations and under different constraints, which as a result provides more insight into the optimal GAN for adversarial training.

Future Work

The chapter also describes the realization of the model components and the implementation of the learning logic.

The generator learns to generate synthetic data that maximizes the loss of the classifier D. These include the input I, the constraint function N, the architecture of the generator G, and the classifier D. In addition, the input vector I and the architecture of the classifier are shared by both types.

The generator must learn based on the current gradient of the classifier and the given input I. The input format for the image sample and the gradient vector is the dimension of the image (e.g., the output will be an image with 32 × 32 pixels and three color channels). The training process of the proposed GAN architecture involves adversarially training the generator and the classifier.

However, this time the generator parameters are updated according to the gradient of the classifier loss. The proposed GAN does not change the implementation of the training period and the training step. This experiment examines the effect of generator formulation on the adversarial results of classifier training.
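The alternating update can be sketched as follows. This is a toy illustration and not the TensorFlow implementation of this thesis: it assumes a logistic regression classifier in place of D, an element-wise tanh generator in place of G, the classifier's input gradient as the generator input I, and hypothetical data, learning rates, and perturbation size.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def bce(p, y):
    """Binary cross-entropy loss for one sample."""
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))

def classify(w, b, x):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def generate(theta, grad_in, eps):
    """Toy generator G: maps the input-gradient vector to a perturbation bounded by eps."""
    return [eps * math.tanh(t * g) for t, g in zip(theta, grad_in)]

def train(data, epochs=50, eps=0.2, lr_d=0.5, lr_g=0.5):
    w, b = [0.1, -0.1], 0.0   # classifier D parameters (hypothetical start)
    theta = [0.0, 0.0]        # generator G parameters
    for _ in range(epochs):
        for x, y in data:
            # Generator input I: gradient of the classifier loss at the clean sample.
            p_clean = classify(w, b, x)
            grad_in = [(p_clean - y) * wi for wi in w]
            delta = generate(theta, grad_in, eps)
            x_adv = [xi + di for xi, di in zip(x, delta)]
            p_adv = classify(w, b, x_adv)

            # Generator step: gradient ASCENT on the classifier loss.
            for i in range(len(theta)):
                dloss_dx = (p_adv - y) * w[i]
                ddelta_dtheta = eps * (1.0 - math.tanh(theta[i] * grad_in[i]) ** 2) * grad_in[i]
                theta[i] += lr_g * dloss_dx * ddelta_dtheta

            # Classifier step: gradient DESCENT on the loss of the perturbed sample.
            for i in range(len(w)):
                w[i] -= lr_d * (p_adv - y) * x_adv[i]
            b -= lr_d * (p_adv - y)
    return w, b, theta

# Hypothetical two-class toy data (assumption for illustration only).
data = [([1.0, 0.0], 1), ([0.0, 1.0], 0), ([0.8, 0.1], 1), ([0.2, 0.9], 0)]
w, b, theta = train(data)

for x, y in data:
    p_clean = classify(w, b, x)
    grad_in = [(p_clean - y) * wi for wi in w]
    delta = generate(theta, grad_in, 0.2)
    x_adv = [xi + di for xi, di in zip(x, delta)]
    print("clean loss:", bce(p_clean, y), "perturbed loss:", bce(classify(w, b, x_adv), y))
```

The two updates differ only in sign and in which parameters they touch: the generator moves its parameters up the classifier loss, while the classifier moves its own parameters down the loss on the perturbed sample.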

In this comparison, the L∞ GAN and L2 GAN trained classifiers show similar performance against the L∞ attack.