Testing our build!
So, it's fun and all to spend the next two months talking about the specifics of the build, but I've gotten impatient. And if you're following along, you might like some actual proof that the build works, so here is a Jupyter notebook with some runnable code.
I'm using a smaller, modified VGGNet on the CIFAR-10 dataset: 60,000 32×32 colour images in 10 classes, with 6,000 images per class (50,000 training images and 10,000 test images).
For expediency, we’ll run this in Keras.
For reproducibility:
- CUDA 8.0
- cuDNN 5.1
- TensorFlow 1.2.1
- Keras 2.0.8
- Jupyter 1.0
This modified VGG has the following structure:
Input (32×32×3)
Conv-32 3×3 → Conv-64 3×3 → MaxPool-64
Conv-128 3×3 → MaxPool-128
Conv-128 3×3 → MaxPool-128
FC 1024
FC 10 (output)
With dropout added after the first two MaxPools and the first FC layer.
For testing purposes, we will modify only the dropout values applied after the first two MaxPool layers and the minibatch size.
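For reference, a minimal sketch of this model in Keras 2.x might look like the following. The specific dropout defaults, `padding` choice, activations, and optimizer here are illustrative assumptions, not necessarily the notebook's exact settings:

```python
# A minimal sketch of the modified VGG described above (Keras 2.x API).
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout

def build_model(dropout1=0.25, dropout2=0.25, fc_dropout=0.5):
    model = Sequential()
    # Block 1: Conv-32 3x3 -> Conv-64 3x3 -> MaxPool (+ dropout)
    model.add(Conv2D(32, (3, 3), padding='same', activation='relu',
                     input_shape=(32, 32, 3)))
    model.add(Conv2D(64, (3, 3), padding='same', activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(dropout1))
    # Block 2: Conv-128 3x3 -> MaxPool (+ dropout)
    model.add(Conv2D(128, (3, 3), padding='same', activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(dropout2))
    # Block 3: Conv-128 3x3 -> MaxPool
    model.add(Conv2D(128, (3, 3), padding='same', activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    # Classifier: FC 1024 (+ dropout) -> FC 10
    model.add(Flatten())
    model.add(Dense(1024, activation='relu'))
    model.add(Dropout(fc_dropout))
    model.add(Dense(10, activation='softmax'))
    model.compile(loss='categorical_crossentropy',
                  optimizer='adam', metrics=['accuracy'])
    return model
```

This keeps the two test variables (the first two dropout rates) as arguments, so runs with different values only change the call to `build_model()`.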
| Layer | Pixel L | Pixel W | Depth | Size (KB) | Params | Batch-128 (MB) | Params ×128 | Batch-1024 (MB) | Params ×1024 |
|---|---|---|---|---|---|---|---|---|---|
| Input | 32 | 32 | 3 | 3 | 1 | 0.375 | 128 | 3 | 1024 |
| Conv-32, 3×3 | 32 | 32 | 32 | 32 | 864 | 4 | 110592 | 32 | 884736 |
| Conv-64, 3×3 | 32 | 32 | 64 | 64 | 18432 | 8 | 2359296 | 64 | 18874368 |
| MaxPool-64 | 16 | 16 | 64 | 16 | 36864 | 2 | 4718592 | 16 | 37748736 |
| Conv-128, 3×3 | 16 | 16 | 128 | 32 | 73728 | 4 | 9437184 | 32 | 75497472 |
| MaxPool-128 | 8 | 8 | 128 | 8 | 147456 | 1 | 18874368 | 8 | 150994944 |
| Conv-128, 3×3 | 8 | 8 | 128 | 8 | 147456 | 1 | 18874368 | 8 | 150994944 |
| MaxPool-128 | 4 | 4 | 128 | 2 | 147456 | 0.25 | 18874368 | 2 | 150994944 |
| FC Dense 1024 | 1 | 1 | 1024 | 1 | 1179648 | 0.125 | 150994944 | 1 | 1207959552 |
| FC Dense 10 | 1 | 1 | 10 | 0 | 92160 | 0 | 11796480 | 0 | 94371840 |
| Totals | | | | 166 | 1844065 | 20.75 | 236040320 | 166 | 1888322560 |
The preceding spreadsheet estimates the memory requirements and parameter counts involved in the FORWARD PASS of the convolutional network. Length, width, and depth should be self-explanatory (colour images = RGB = 3 channels). Size in KB is given for a single image as it passes through each layer (minibatch of 1), and memory in MB is given for batch sizes of 128 and 1024 for comparison.
The memory requirements are easily handled by our 11GB GPU, given the small starting size of the images and the limited number of layers in the convolutional network. It should be obvious, though, that with larger images the memory requirements increase quickly, and batch size needs to be chosen so as not to exceed available GPU memory.
Note that the largest actual memory use occurs early on, in the first convolutional block, while the greatest parameter count is in the first fully-connected layer (FC 1024).
For this example, using a 1024 batch size, the sum of memory used in the network is 166MB for the forward pass. We also need to account for the backward pass, estimating at least 1× and possibly up to 2× the forward pass: 166 × 3 = 498MB, or about 0.5GB.
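As a sanity check on that arithmetic, here is a small Python sketch that reproduces the spreadsheet's activation-memory totals. It assumes, as the table appears to, roughly one byte per activation value:

```python
# Per-layer activation shapes from the table above: (name, width, height, depth).
layer_shapes = [
    ('input',       32, 32, 3),
    ('conv-32',     32, 32, 32),
    ('conv-64',     32, 32, 64),
    ('maxpool-64',  16, 16, 64),
    ('conv-128',    16, 16, 128),
    ('maxpool-128',  8,  8, 128),
    ('conv-128',     8,  8, 128),
    ('maxpool-128',  4,  4, 128),
    ('fc-1024',      1,  1, 1024),
    ('fc-10',        1,  1, 10),
]

def forward_mb(batch_size, bytes_per_value=1):
    # Sum activation values over all layers, scale by batch size,
    # and convert bytes to MB.
    total_values = sum(w * h * d for _, w, h, d in layer_shapes)
    return total_values * bytes_per_value * batch_size / 1024.0 / 1024.0

fwd = forward_mb(1024)
print('forward pass:        %.0f MB' % fwd)        # ~166 MB
print('with backward (3x):  %.0f MB' % (fwd * 3))  # ~498 MB
```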
So, it works.
Best results achieved over about 60 runs were an accuracy of 0.806 and a loss of 0.567, with a run taking about 400 seconds (6 minutes 40 seconds).
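If you want to try a run without the notebook, a minimal driver along these lines should work. It assumes the `build_model()` sketch from earlier; the epoch count and batch size here are placeholders, not necessarily the settings behind the numbers above:

```python
# Load CIFAR-10, normalize pixel values, one-hot encode the labels,
# then train and evaluate the model.
from keras.datasets import cifar10
from keras.utils import to_categorical

(x_train, y_train), (x_test, y_test) = cifar10.load_data()
x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0
y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)

model = build_model(dropout1=0.25, dropout2=0.25)
model.fit(x_train, y_train, batch_size=128, epochs=25,
          validation_data=(x_test, y_test))

loss, acc = model.evaluate(x_test, y_test)
print('test loss: %.3f, test accuracy: %.3f' % (loss, acc))
```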