The Carlini & Wagner attack is currently one of the best known algorithms to generate adversarial examples.

The CW attack algorithm is a very typical adversarial attack, which utilizes two separate losses:

- An adversarial loss to make the generated image actually adversarial, i.e., capable of fooling image classifiers.
- An image distance loss to constrain the quality of the adversarial examples, so that the perturbation is not too obvious to the naked eye.

This paradigm allows the CW attack and its variants to be combined with many other image quality metrics, such as PSNR or SSIM (see image-quality-assessment).

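When adversarial examples were first discovered in 2013, the optimization problem to craft adversarial examples was formulated as:

$$
\begin{aligned}
\text{minimize} \quad & D(x, x + \delta) \\
\text{such that} \quad & C(x + \delta) = t && \text{(Constraint 1)} \\
& x + \delta \in [0, 1]^n && \text{(Constraint 2)}
\end{aligned}
$$

Where:

- $x$ is the input image, $\delta$ is the perturbation, $n$ is the dimension of the image and $t$ is the target class.
- Function $D$ serves as the distance metric between the adversarial and the real image, and function $C$ is the classifier function.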

The traditional way to solve such an optimization problem is to define an objective function and perform gradient descent on it, which guides us toward an optimal point of the function. However, the formulation above is difficult to solve this way because the constraint $C(x + \delta) = t$ is highly non-linear.

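In CW, we express Constraint 1 in a different form, as an objective function $f$ chosen such that $C(x + \delta) = t$ if and only if $f(x + \delta) \le 0$.

Conceptually, the objective function tells us how close we are getting to being classified as $t$: a positive value means the example is not yet classified as the target, and the value drops to zero or below once it is.

Where convenient, the adversarial example $x + \delta$ is written as $x'$.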

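In the original paper, seven different objective functions are assessed, and the best among them is given by:

$$f(x') = \max\left(\max_{i \neq t} Z(x')_i - Z(x')_t,\; -\kappa\right)$$

Where:

- $Z(x')$ is the logit vector (the raw, unnormalized scores of the model for each class, before the softmax turns them into probabilities) when the input is an adversarial example $x' = x + \delta$.
- $\max_{i \neq t} Z(x')_i$ is the largest logit among the non-target classes, and $Z(x')_t$ is the logit of the target class (which represents how confident the model is in classifying the adversarial example as the target).
- So, $\max_{i \neq t} Z(x')_i - Z(x')_t$ is the difference between what the model thinks the current image most probably is and what we want it to think.

The above term is essentially the difference of two logit values, so when we specify another term, $-\kappa$, and take the maximum of the two, we set a lower bound on the loss. The parameter $\kappa \ge 0$ therefore controls how confidently we want the adversarial example to be classified as $t$.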
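To make the objective concrete, here is a minimal sketch of $f(x') = \max(\max_{i \neq t} Z(x')_i - Z(x')_t, -\kappa)$ for a single logit vector. The function name `cw_objective` and the plain-list representation of the logits are assumptions made for illustration; a real attack would compute this on framework tensors.

```python
def cw_objective(logits, target, kappa=0.0):
    """f(x') = max(max_{i != t} Z(x')_i - Z(x')_t, -kappa) for one logit vector."""
    # Largest logit among all classes except the target.
    best_other = max(z for i, z in enumerate(logits) if i != target)
    # The margin is non-positive once the target logit is at least as
    # large as every other logit; -kappa is the lower bound on the loss.
    return max(best_other - logits[target], -kappa)


# Not yet adversarial: class 0 still beats the target class 2.
print(cw_objective([3.0, 1.0, 2.0], target=2))             # 1.0
# Adversarial with a margin of 4, clipped at -kappa = -1.
print(cw_objective([1.0, 0.5, 5.0], target=2, kappa=1.0))  # -1.0
```

A larger `kappa` keeps the loss decreasing for longer after the example has crossed the decision boundary, which is what pushes the result toward a higher-confidence misclassification.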

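We then reformulate the original optimization problem by moving the difficult constraint (Constraint 1) into the function being minimized:

$$
\begin{aligned}
\text{minimize} \quad & D(x, x + \delta) + c \cdot f(x + \delta) \\
\text{such that} \quad & x + \delta \in [0, 1]^n
\end{aligned}
$$

Here we introduce a constant $c > 0$ that weights the adversarial loss $f$ against the distance loss $D$.

The best constant $c$, according to the original paper, is the smallest one for which the resulting solution still satisfies $f(x + \delta) \le 0$; the paper finds it with a modified binary search.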

In my own experiments, I found that the best constant often lies between 1 and 2.
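The combined loss and the search for the constant can be sketched as follows. This is only an illustration under stated assumptions: the distance is taken to be the squared L2 distance, `attack_succeeds` is a hypothetical callable that runs the attack for a given constant and reports whether the result is adversarial, and success is assumed to be monotonic in the constant (if it works for one value, it works for every larger one).

```python
def squared_l2(x, x_adv):
    """Squared Euclidean distance between two flat pixel lists."""
    return sum((a - b) ** 2 for a, b in zip(x, x_adv))


def cw_loss(x, x_adv, f_value, c):
    """Distance loss plus c times the adversarial objective f(x_adv)."""
    return squared_l2(x, x_adv) + c * f_value


def search_c(attack_succeeds, c_low=0.0, c_high=10.0, steps=10):
    """Bisect for the smallest constant for which the attack succeeds.

    attack_succeeds is a hypothetical callable c -> bool. Returns None if
    no tested constant succeeded.
    """
    best = None
    for _ in range(steps):
        c = (c_low + c_high) / 2.0
        if attack_succeeds(c):
            best, c_high = c, c   # success: try a smaller constant
        else:
            c_low = c             # failure: a larger constant is needed
    return best


# Stand-in for a real attack that starts working at c = 1.5.
print(search_c(lambda c: c >= 1.5))
```

Each bisection step costs one full run of the attack, so the number of steps is a trade-off between runtime and how tightly the constant is pinned down.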

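After formulating our final loss function, we are presented with this final constraint:

$$x + \delta \in [0, 1]^n$$

A constraint of this form is known as a "box constraint", which means that every component has a lower bound and an upper bound. To handle it, we apply a method called "change of variable": instead of optimizing over $\delta$ directly, we optimize over a new variable $w$, with $\delta$ given by:

$$\delta = \frac{1}{2}\left(\tanh(w) + 1\right) - x$$

Where $\tanh$ is the hyperbolic tangent function. Since $-1 \le \tanh(w) \le 1$, it follows that $0 \le x + \delta \le 1$ for any value of $w$, so the box constraint is always satisfied.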
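A small sketch of this change of variable, using only the standard library. The helper names `to_image` and `to_w` are made up for illustration, and the clipping constant `eps` is an assumption added so that the inverse map stays finite for pixels that are exactly 0 or 1.

```python
import math


def to_image(w):
    """Map unconstrained values w to pixel values (tanh(w) + 1) / 2 in [0, 1]."""
    return [0.5 * (math.tanh(wi) + 1.0) for wi in w]


def to_w(x, eps=1e-6):
    """Inverse map: find w such that to_image(w) is (approximately) x."""
    # Clip away from 0 and 1, where atanh would be infinite.
    return [math.atanh(2.0 * min(max(xi, eps), 1.0 - eps) - 1.0) for xi in x]


# Any real-valued w lands inside the box, however extreme.
print(to_image([-100.0, 0.0, 100.0]))
# Starting the optimization at to_w(x) means starting at the clean image.
print(to_image(to_w([0.2, 0.9])))
```

Because the mapping is smooth, gradients flow through it, and an unconstrained optimizer can be applied to `w` without any projection or clipping step.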

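Therefore, our final optimization problem, with $x + \delta = \frac{1}{2}(\tanh(w) + 1)$, is:

$$\text{minimize} \quad D\left(x, \tfrac{1}{2}(\tanh(w) + 1)\right) + c \cdot f\left(\tfrac{1}{2}(\tanh(w) + 1)\right)$$

For the $L_2$ attack in the original paper, $D$ is the squared Euclidean distance $\left\lVert \tfrac{1}{2}(\tanh(w) + 1) - x \right\rVert_2^2$.

The CW attack is the solution to this optimization problem (optimized over $w$); the original paper solves it with the Adam optimizer.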
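Putting the pieces together, the following toy sketch minimizes the squared L2 distance plus $c \cdot f$ over $w$. It is not the paper's implementation: the classifier is a hypothetical linear model `Z(x) = W x + b` so that the gradient can be written by hand, plain gradient descent is swapped in for Adam, and the constant is fixed rather than searched for. All names and default values are assumptions.

```python
import math


def logits(x, W, b):
    """Toy linear classifier Z(x) = W x + b (a stand-in for a real network)."""
    return [sum(wij * xj for wij, xj in zip(row, x)) + bi for row, bi in zip(W, b)]


def predict(x, W, b):
    z = logits(x, W, b)
    return max(range(len(z)), key=lambda i: z[i])


def cw_l2_toy(x, target, W, b, c=1.0, kappa=0.05, lr=0.1, steps=200):
    """Minimize ||x' - x||^2 + c * f(x') over w, with x' = (tanh(w) + 1) / 2."""
    # Start at w such that x' equals x (pixels are clipped away from 0 and 1).
    w = [math.atanh(2.0 * min(max(xi, 1e-6), 1.0 - 1e-6) - 1.0) for xi in x]
    best, best_dist = None, float("inf")
    for _ in range(steps):
        x_adv = [0.5 * (math.tanh(wi) + 1.0) for wi in w]
        z = logits(x_adv, W, b)
        # Strongest non-target class j and the margin Z_j - Z_t.
        j = max((i for i in range(len(z)) if i != target), key=lambda i: z[i])
        margin = z[j] - z[target]
        dist = sum((a - o) ** 2 for a, o in zip(x_adv, x))
        # Remember the closest iterate that is classified as the target.
        if margin < 0 and dist < best_dist:
            best, best_dist = x_adv, dist
        # f only contributes a gradient while the margin is above -kappa.
        active = 1.0 if margin > -kappa else 0.0
        for k in range(len(w)):
            grad_x = 2.0 * (x_adv[k] - x[k]) + c * active * (W[j][k] - W[target][k])
            # Chain rule through x' = (tanh(w) + 1) / 2.
            w[k] -= lr * grad_x * 0.5 * (1.0 - math.tanh(w[k]) ** 2)
    return best


W = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
b = [0.0, 0.0, 1.0]
x = [0.2, 0.8]
x_adv = cw_l2_toy(x, target=0, W=W, b=b)
print(predict(x, W, b), predict(x_adv, W, b), x_adv)
```

Keeping the closest successful iterate, rather than the last one, guards against the final step landing slightly on the wrong side of the decision boundary. The function returns `None` when no iterate was classified as the target, which would be the signal to retry with a larger `c`.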
