Fun with Grad-CAM maps
After the news that Meltdown and Spectre were going to affect the speed of both my CPU and, now, my GPU, I was feeling a bit down. I've also changed my mind on overclocking: if we're going to take such a hit in performance from the fixes, I'm not concerned about burning out these chips before a working updated processor is released. Before I bought more hardware I needed to take my mind off things, so I decided to cheer myself up with this.
The CheXNet paper prompted me to learn a bit more about CAM maps. I've written more about this paper, and the underlying Wang ChestX-ray14 dataset released by the NIH, which you can read about here, on our sister site, n2value.com/blog. The implementation I'm following is that of Selvaraju RR, Cogswell M, Das A, et al. Here is their GitHub site. Derived from the final ConvNet layer, these maps are useful for understanding which pixels activate the class that will be selected by the subsequent FC layers.
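For reference, the core of the Grad-CAM recipe from the Selvaraju et al. paper is just two steps: global-average-pool the gradient of the class score $y^c$ with respect to each feature map $A^k$ of the final conv layer to get a weight per channel, then take a ReLU of the weighted sum of the feature maps:

```latex
\alpha_k^c = \frac{1}{Z} \sum_i \sum_j \frac{\partial y^c}{\partial A_{ij}^k},
\qquad
L^c_{\text{Grad-CAM}} = \mathrm{ReLU}\!\left( \sum_k \alpha_k^c A^k \right)
```

The ReLU keeps only features with a positive influence on the class of interest, which is why the maps highlight class-relevant regions rather than everything the network looked at.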
As a quick-and-dirty approach I used a VGG-16 pretrained on ImageNet, and pulled a few Creative Commons-licensed images off the web (thanks, Freefoto.com!). These are common items that already have classes defined in ImageNet: a truck, a boat, lions, and a Pekingese. The base images and superimposed CAM activation maps are displayed below:
Here, the CAM map picks up mostly on the bow/prow of the boat, and to a lesser extent the stern, body, and rigging. Seems reasonable.
And here, the CAM map activates over the elongated trailer of the tractor trailer. Fairly specific, so reasonable again.
I found this interesting – the face of the lion is of course selected, but less so its prominent snout and much more so its distinctive eyes. Very telling.
This image was borrowed from Wikipedia (so cute)! It is a Pekingese, for those of you who don't recognize the breed — I've had two, and my first resembled this one. Here, the eyes activate less than the distinctive pushed-in snout of the Pekingese, which makes complete sense.
So, in general, the CAM maps lend themselves reasonably well to pre-trained classes on a well-trained convolutional network. I surmise that there may be operations that would make these activation maps more exact. For medical imaging, without a medical ImageNet, applying CAMs might not be as useful as it is here. But since we're more imaging-subject agnostic on this site, I'll leave it at that.