Fooling an image classifier – a review, from adversarial images to one-pixel attacks.
I wrote this in early 2018 & for some reason it sat in the draft folder on an iPad I rarely use – rediscovered it in 2020. It’s a little dated but I still think it’s relevant and useful. So I’m going to publish it.
Generally, it’s helpful to understand how a system fails in order to gain insight into its workings and to build a better system. There is a growing body of published work describing how image classifiers can be coerced into misclassification. We’ll review some of these methods here.
Obviously, the easiest way is to show your classifier an image so far out of left field that it has no chance of getting it right. This was how a VGG-16 trained on ImageNet classified a chest x-ray. You can say, well, maybe the lungs look like wings… but nope, not really. Since there are no radiographs in the ImageNet database, you would never expect it to get this right on OOS (out-of-sample) data.
Zeiler and Fergus proposed a method of image perturbation in 2013 to understand convolutional networks through deconvolution. They used occlusion sensitivity, where “the probability of the correct class drops significantly when the object is occluded.”1
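To make the idea concrete, here is a minimal sketch of occlusion sensitivity, assuming a hypothetical predict_probs function that returns class probabilities for an (H, W, 3) image array; the patch size, stride, and gray fill value are my own illustrative choices, not the paper’s.

```python
import numpy as np

def occlusion_sensitivity(image, true_class, predict_probs, patch=16, stride=8):
    """Slide a gray patch over the image and record the true-class probability."""
    h, w, _ = image.shape
    rows = (h - patch) // stride + 1
    cols = (w - patch) // stride + 1
    heatmap = np.zeros((rows, cols))
    for i, y in enumerate(range(0, h - patch + 1, stride)):
        for j, x in enumerate(range(0, w - patch + 1, stride)):
            occluded = image.copy()
            occluded[y:y + patch, x:x + patch] = 128   # gray occluder
            heatmap[i, j] = predict_probs(occluded)[true_class]
    return heatmap  # low values mark the regions the classifier depends on
```

Regions where the heatmap value collapses are the parts of the image the network is actually “looking at” – exactly the signal Zeiler and Fergus used to interpret their models.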
Szegedy et al. used L-BFGS in 2014 to estimate a minimum-distortion perturbation that, when applied to an existing image, caused it to be misclassified. Notably, the same perturbations were effective across different early neural networks and were visually hard to distinguish from the originals, suggesting a universality to these adversarial examples.2 L-BFGS is not commonly used for this today, largely because it is computationally expensive compared with faster gradient-based methods.
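For illustration, here is a rough sketch of the underlying optimization – find a small perturbation r that pushes an image toward a chosen target class – solved with plain gradient steps (Adam) rather than L-BFGS for simplicity. The PyTorch model, the weighting constant c, and the step counts are my own assumptions, not Szegedy et al.’s settings.

```python
import torch
import torch.nn.functional as F

def adversarial_perturbation(model, x, target_class, c=0.1, steps=200, lr=0.01):
    """Find a small r so that model(x + r) predicts target_class."""
    r = torch.zeros_like(x, requires_grad=True)
    optimizer = torch.optim.Adam([r], lr=lr)
    target = torch.tensor([target_class])
    for _ in range(steps):
        optimizer.zero_grad()
        adv = (x + r).clamp(0, 1)                        # keep pixels in range
        loss = c * r.norm() + F.cross_entropy(model(adv), target)
        loss.backward()
        optimizer.step()
    return (x + r).detach().clamp(0, 1)
```

In practice the perturbation this kind of procedure produces is typically imperceptible to a human, which is what made the original result so striking.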
To those experimenting in deep learning and artificial intelligence, training time far exceeds inference time: training can take hours to days, while evaluation takes seconds. Inference is quick, almost intuitive.
I believe that OODA (observe-orient-decide-act) loops are Kahneman’s Type 2 slow thinking, while OA (observe-act) loops are his Type 1 fast thinking. Narrow AI inference is a Type 1 OA loop. An AI version of Type 2 slow thinking doesn’t yet exist.*
And like humans, Narrow AI can be fooled.
If you haven’t seen the Chihuahua vs. blueberry muffin clickbait picture, consider yourself sheltered. Claims that narrow AI can’t tell the difference are largely, but not entirely, bogus. While narrow AI is generally faster than people, and potentially more accurate, it can still make errors. But so can people. In general, classification errors can be reduced by creating a more powerful, or ‘deeper’, network. I think collectively we have yet to decide how much error to tolerate in our AIs. If we are willing to tolerate a 5% error rate in humans, are we willing to tolerate the same in our AIs, or do we expect 97.5% accuracy? Or 99%? Or 99.9%?
It gets a little more complicated than that, because the achievable error rate may be inherently limited by factors in the training process: sample size, class imbalance, data augmentation, class prevalence and incidence, and the training algorithm.
The single-pixel attack is a bit more interesting. While images like the ones above probably won’t pass careful human scrutiny, frankly adversarial images that are unrecognizable to humans can still be confidently misinterpreted by a classifier.
Selecting and perturbing a single pixel is much more subtle, and probably could escape human scrutiny. Jiawei Su et al. address this in their “One Pixel Attack” paper, where modifying one pixel in an image had a 66% to 73% chance of changing that image’s classification. Changing more than one pixel raised the success rate further. The paper used older, shallower networks like VGG-16 and Network-in-Network; newer models such as DenseNets and ResNets might be harder to fool. This type of “attack” represents a real-world situation where the OA loop fails to account for unexpected new (or perturbed) information and gets it wrong.
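For the curious, here is a bare-bones sketch of a one-pixel attack in the spirit of Su et al., who search over a pixel’s position and color with differential evolution. The predict_probs function, the 32×32 image size, and the optimizer settings are placeholders of my own, not the paper’s exact configuration.

```python
import numpy as np
from scipy.optimize import differential_evolution

def one_pixel_attack(image, true_class, predict_probs, size=32):
    """Search for one (x, y, r, g, b) change that lowers the true-class probability."""
    def apply(candidate, img):
        x, y, r, g, b = candidate
        out = img.copy()
        out[int(y), int(x)] = [int(r), int(g), int(b)]
        return out

    def objective(candidate):
        # Minimizing the true-class probability encourages misclassification.
        return predict_probs(apply(candidate, image))[true_class]

    bounds = [(0, size - 1), (0, size - 1), (0, 255), (0, 255), (0, 255)]
    result = differential_evolution(objective, bounds, maxiter=30, popsize=20, seed=0)
    return apply(result.x, image)
```

The attack succeeds when the returned image is assigned a different label than the original, despite differing in only a single pixel.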
I am not aware of definitive solutions to these problems.
Author’s Addendum: Since the writing of this blog post, there has been considerable research into “hardening” against adversarial attacks. It’s not perfect, but nothing of this nature will ever be. The following comments should be read in the context of the time they were written – early 2018.
– the obvious images that fool the classifier can probably be dealt with by ensembling more traditional forms of computer vision analysis such as HOG features or SVMs. For a one-pixel attack, perhaps widening the network and increasing the number of training samples, by either data augmentation or adversarially generated examples, might make the network more robust (turns out I was right on this one – GANs are useful for hardening). This probably falls into the “too soon to tell” category.
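As a concrete example of the “adversarially generated examples” idea, here is a rough sketch of adversarial training: each batch is augmented with gradient-sign-perturbed copies of itself before the weight update. The PyTorch model, data loader, epsilon, and optimizer are assumptions for illustration, not a recipe from any particular paper.

```python
import torch
import torch.nn.functional as F

def fgsm_perturb(model, x, y, epsilon=0.03):
    """Nudge each pixel in the direction that increases the loss."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    return (x + epsilon * x.grad.sign()).clamp(0, 1).detach()

def adversarial_training_epoch(model, loader, optimizer):
    model.train()
    for x, y in loader:
        x_adv = fgsm_perturb(model, x, y)      # adversarial copies of the batch
        inputs = torch.cat([x, x_adv])         # train on clean + adversarial
        labels = torch.cat([y, y])
        optimizer.zero_grad()                  # clear grads left over from the perturbation step
        loss = F.cross_entropy(model(inputs), labels)
        loss.backward()
        optimizer.step()
```

The trade-off is extra compute per batch and, often, a small drop in clean-image accuracy in exchange for robustness.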
There has been a great deal of interest and emphasis placed lately on understanding black-box models. I’ve written about some of these techniques in other posts. Some investigators feel this is less relevant; however, by understanding how models fail, they can be strengthened. I’ve also written about this, but from a management standpoint: there is a trade-off between accuracy at speed, robustness, and serendipity. I think the same principle applies to our AIs as well. By understanding the frailty of speedy accuracy versus redundancies that come at the expense of cost, speed, and sometimes accuracy, we can build systems and processes that not only work but are less likely to fail in unexpected and spectacular ways.
Let’s acknowledge the likelihood of failure of narrow AI where it is most likely to fail, and design our healthcare systems and processes around that as we begin to incorporate AI into our practice and management. If we do that, we will truly get inside the OODA loop of our opponent – disease – and eradicate it before it even has a chance. What a world to live in, where the only thing disease can say is, “I never saw it coming.”
*I believe OODA loops have mathematical analogues. The OODA loop is inherently Bayesian – next actions are iteratively decided by prior probabilities. Iterative deep learning constructs include LSTMs, RNNs (recurrent neural networks), and of course Generative Adversarial Networks (GANs). There have been attempts not only to use Bayesian learning for hyperparameter optimization but also to combine it with RL (reinforcement learning) and GANs. And since this post was written, Bayesian Neural Networks (BNNs) have been developed, though it remains controversial whether they work. Only time will tell if any of this brings us closer to the vaunted AGI (Artificial General Intelligence)**.
**While I don’t think we will solve the AGI question in my lifetime, I wouldn’t be surprised if complex combinations of these methods, along with ones not yet invented, bring us close to top human expert performance in a narrow AI. But I also suspect that once we start coding creativity and resilience into these algorithms, we will take a hit in accuracy as we approach less narrow forms of AI. We are already seeing this with outside validation of algorithms. We will ultimately solve for the best performance of these systems, and while it may eventually exceed human ability, there will likely always be some error present. And that margin of error is where future medicine will advance.
© 2018, © 2020
There’s also now some work showing that humans can anticipate these labels! Including for the images you described as “unrecognizable”:
https://www.nature.com/articles/s41467-019-08931-6