Testing our build!

So, it’s fun and all to spend the next two months talking about the specifics of the build, but I’ve gotten impatient. If you are following along, you might also like some proof that the build actually works, so here is a Jupyter notebook with some runnable code.

I’m using a smaller, modified VGGNet on the CIFAR-10 dataset: 60,000 32×32 colour images in 10 classes, with 6,000 images per class, split into 50,000 training images and 10,000 test images.

For expediency, we’ll run this in Keras.
For reproducibility: CUDA 8.0, cuDNN 5.1, TensorFlow 1.2.1, Keras 2.0.8, Jupyter 1.0.

This modified VGG has the following structure:

Input Layer (32x32x3)

Conv-32 3×3 –> Conv-64 3×3 –> Maxpool-64

Conv-128 3×3 –> Maxpool-128

Conv-128 3×3 –> Maxpool-128

FC 1024

FC 10 (Output)

Dropout is added after the first two Maxpool layers and after the first FC layer.
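To make the layer list concrete, here is a small plain-Python sketch that traces the output shape through each layer. It assumes a few things the list doesn’t spell out: the 3×3 convolutions use "same" padding, each Maxpool is 2×2 with stride 2 (which matches the 32 → 16 → 8 → 4 sizes in the table below), and there is a flatten step before FC 1024, as Keras would need. The layer labels are just names for this sketch, not Keras calls.

```python
# Sketch: trace each layer's output shape through the modified VGG.
# Assumptions (not stated in the post): "same" padding on the 3x3 convs,
# 2x2 max pooling with stride 2, and a flatten step before the first FC layer.

ARCHITECTURE = [
    ("conv", 32),
    ("conv", 64),
    ("pool", None),
    ("dropout", None),   # after Maxpool 1
    ("conv", 128),
    ("pool", None),
    ("dropout", None),   # after Maxpool 2
    ("conv", 128),
    ("pool", None),
    ("flatten", None),   # assumed, needed before a dense layer
    ("dense", 1024),
    ("dropout", None),   # after the first FC layer
    ("dense", 10),
]


def trace_shapes(layers, input_shape=(32, 32, 3)):
    """Return a list of (layer_kind, output_shape) pairs."""
    shape = input_shape
    trace = []
    for kind, units in layers:
        if kind == "conv":
            h, w, _ = shape
            shape = (h, w, units)          # "same" padding keeps H and W
        elif kind == "pool":
            h, w, c = shape
            shape = (h // 2, w // 2, c)    # 2x2 pool, stride 2 halves H and W
        elif kind == "flatten":
            h, w, c = shape
            shape = (h * w * c,)
        elif kind == "dense":
            shape = (units,)
        # dropout leaves the shape unchanged
        trace.append((kind, shape))
    return trace


for kind, shape in trace_shapes(ARCHITECTURE):
    print(f"{kind:8s} {shape}")
```

Under those assumptions the last Maxpool leaves a 4×4×128 block, so FC 1024 receives a flattened vector of 2,048 values.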

For testing, we will vary only two things: the dropout values applied after the first two Maxpool layers, and the minibatch size.
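One way to organise runs like these is a simple grid over the settings being varied. This is a sketch only: the dropout rates and batch sizes below are hypothetical placeholders, not the values used for the runs reported at the end of this post, and the two Maxpool dropouts are treated as independent settings. The dropout after FC 1024 stays fixed and isn’t part of the grid.

```python
# Sketch of a hyperparameter grid over the settings varied in these tests.
# The values below are HYPOTHETICAL examples, not the ones actually used.
from itertools import product

DROPOUT_RATES = [0.1, 0.25, 0.4]       # hypothetical dropout after Maxpool 1 / 2
BATCH_SIZES = [128, 256, 512, 1024]    # hypothetical minibatch sizes


def build_grid(dropout_rates, batch_sizes):
    """Return one config dict per (pool1 dropout, pool2 dropout, batch) combo."""
    return [
        {"pool1_dropout": d1, "pool2_dropout": d2, "batch_size": b}
        for d1, d2, b in product(dropout_rates, dropout_rates, batch_sizes)
    ]


grid = build_grid(DROPOUT_RATES, BATCH_SIZES)
print(len(grid), "configurations")
for cfg in grid[:3]:
    print(cfg)
```

Each dict would then be passed to whatever function builds and trains the model in the notebook.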

Layer | Pixel_L | Pixel_W | Layers | Size in Kb | Params | Batch-128 Mb | Params-128 | Batch-1024 Mb | Params-1024
INPUT | 32 | 32 | 3 | 3 | 1 | 0.375 | 128 | 3 | 1024
Conv-32, 3×3 | 32 | 32 | 32 | 32 | 864 | 4 | 110592 | 32 | 884736
Conv-64, 3×3 | 32 | 32 | 64 | 64 | 18432 | 8 | 2359296 | 64 | 18874368
Maxpool-64 | 16 | 16 | 64 | 16 | 36864 | 2 | 4718592 | 16 | 37748736
Conv-128, 3×3 | 16 | 16 | 128 | 32 | 73728 | 4 | 9437184 | 32 | 75497472
Maxpool-128 | 8 | 8 | 128 | 8 | 147456 | 1 | 18874368 | 8 | 150994944
Conv-128, 3×3 | 8 | 8 | 128 | 8 | 147456 | 1 | 18874368 | 8 | 150994944
Maxpool-128 | 4 | 4 | 128 | 2 | 147456 | 0.25 | 18874368 | 2 | 150994944
FC Dense 1024 | 1 | 1 | 1024 | 1 | 1179648 | 0.125 | 150994944 | 1 | 1207959552
FC Dense 10 | 1 | 1 | 10 | 0 | 92160 | 0 | 11796480 | 0 | 94371840
Totals | | | | 166 | 1844065 | 20.75 | 236040320 | 166 | 1888322560

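The table above is a spreadsheet estimating the memory requirements and parameter counts for the FORWARD PASS of the network. Pixel_L, Pixel_W and Layers are the length, width and depth of each layer’s output (a colour image is RGB, so the input has 3 layers). Size in Kb is for a single image (a minibatch of 1), and the Batch-128 and Batch-1024 columns give the memory in Mb for minibatches of 128 and 1024 for comparison; Params-128 and Params-1024 are the parameter counts multiplied by the batch size. These sizes work out to one byte per value (32×32×3 = 3,072 values = 3 Kb); Keras uses 32-bit floats by default, so the real footprint is roughly four times larger.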
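If you want to redo the spreadsheet for a different input size or batch, the activation columns are easy to recompute from the layer shapes. This sketch follows the table’s convention that 1 "Kb" is 1,024 values (one byte per value); the `bytes_per_value` parameter is my own addition for scaling to float32 storage.

```python
# Recompute the table's activation-size columns from each layer's output shape.
# Convention (as in the spreadsheet): 1 Kb = 1,024 values, i.e. one byte per
# value. Pass bytes_per_value=4 to estimate float32 storage instead.

LAYER_SHAPES = [
    # (name, length, width, depth) of each layer's output
    ("INPUT", 32, 32, 3),
    ("Conv-32, 3x3", 32, 32, 32),
    ("Conv-64, 3x3", 32, 32, 64),
    ("Maxpool-64", 16, 16, 64),
    ("Conv-128, 3x3", 16, 16, 128),
    ("Maxpool-128", 8, 8, 128),
    ("Conv-128, 3x3", 8, 8, 128),
    ("Maxpool-128", 4, 4, 128),
    ("FC Dense 1024", 1, 1, 1024),
    ("FC Dense 10", 1, 1, 10),
]


def activation_kb(length, width, depth, bytes_per_value=1):
    """Size of one image's activations at this layer, in Kb."""
    return length * width * depth * bytes_per_value / 1024


def forward_memory_mb(shapes, batch_size, bytes_per_value=1):
    """Total forward-pass activation memory for a minibatch, in Mb."""
    per_image_kb = sum(activation_kb(l, w, d, bytes_per_value)
                       for _, l, w, d in shapes)
    return per_image_kb * batch_size / 1024


for name, l, w, d in LAYER_SHAPES:
    print(f"{name:15s} {activation_kb(l, w, d):8.3f} Kb")
for batch in (1, 128, 1024):
    print(f"batch {batch:4d}: {forward_memory_mb(LAYER_SHAPES, batch):8.3f} Mb "
          f"(float32: {forward_memory_mb(LAYER_SHAPES, batch, 4):8.3f} Mb)")
```

The totals come out at about 166 Kb per image, 20.75 Mb for a batch of 128 and 166 Mb for a batch of 1024, matching the table (the tiny FC Dense 10 output rounds to 0 there). At float32, the 1024 batch is about 664 Mb.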

The memory requirements are easily handled by our 11GB GPU, given the small size of the input images and the limited number of layers in the network. With larger images, however, memory requirements grow quickly, and the batch size has to be chosen so as not to exceed the available GPU memory.
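As a rough illustration of choosing a batch size to fit the card, here is a sketch that divides a memory budget by the per-image activation cost. Every default is an assumption for illustration, not a measurement: 4-byte float32 activations, forward plus backward at about 3× the forward activations, and 20% of the card held back for weights, optimizer state and framework overhead. Real frameworks add workspace memory of their own, so treat the result as an upper bound.

```python
# Rough sketch: largest batch whose activations fit in a GPU memory budget.
# All defaults are illustrative ASSUMPTIONS, not measurements:
#   - float32 activations (4 bytes per value)
#   - forward + backward ~ 3x the forward activations (backward up to 2x)
#   - 20% of GPU memory reserved for weights, optimizer state and overhead

def max_batch_size(values_per_image, gpu_gb, bytes_per_value=4,
                   pass_multiplier=3, reserved_fraction=0.2):
    usable_bytes = gpu_gb * 1024**3 * (1 - reserved_fraction)
    bytes_per_image = values_per_image * bytes_per_value * pass_multiplier
    return int(usable_bytes // bytes_per_image)


# Activation values per 32x32 image, summed over the layers in the table.
CIFAR_VALUES = 169_994
print(max_batch_size(CIFAR_VALUES, gpu_gb=11))
# Same layer stack on 224x224 inputs: roughly (224 / 32) ** 2 = 49x the values.
print(max_batch_size(CIFAR_VALUES * 49, gpu_gb=11))
```

For CIFAR-10 the estimate is several thousand images per batch, but the same stack on 224×224 images drops below 128, which is the "increases quickly" effect in numbers.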

Note that the largest activation memory use occurs early on, in the first convolutional block (Conv-64). The largest parameter count, however, is in the first fully connected layer (FC Dense 1024).
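Both claims can be checked directly against the table. This snippet copies the per-image Size in Kb and Params columns (the spreadsheet’s own counts) and picks out the maximum of each.

```python
# Which layer dominates activation memory, and which dominates parameters?
# Rows are (layer, Size in Kb, Params) copied from the table above.
ROWS = [
    ("INPUT", 3, 1),
    ("Conv-32, 3x3", 32, 864),
    ("Conv-64, 3x3", 64, 18432),
    ("Maxpool-64", 16, 36864),
    ("Conv-128, 3x3", 32, 73728),
    ("Maxpool-128", 8, 147456),
    ("Conv-128, 3x3", 8, 147456),
    ("Maxpool-128", 2, 147456),
    ("FC Dense 1024", 1, 1179648),
    ("FC Dense 10", 0, 92160),
]

biggest_activation = max(ROWS, key=lambda row: row[1])
biggest_params = max(ROWS, key=lambda row: row[2])
total_params = sum(row[2] for row in ROWS)

print("Largest activation:", biggest_activation[0], f"({biggest_activation[1]} Kb)")
print("Most parameters:", biggest_params[0],
      f"({biggest_params[2] / total_params:.0%} of {total_params})")
```

It picks out Conv-64 for activations and FC Dense 1024 for parameters, the latter holding about 64% of the table’s total.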

For this example, with a batch size of 1024, the memory summed across the network is 166Mb for the forward pass. We also need to account for the backward pass, which we estimate at between 1x and 2x the forward pass. Taking the upper end, 166 × 3 = 498Mb, or about 0.5GB (about 2GB at 4 bytes per value).
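The same back-of-the-envelope estimate as a tiny function: forward activations plus a backward pass costing 1× to 2× as much. The `bytes_per_value` factor is an addition of mine for converting the spreadsheet’s one-byte-per-value figures to float32.

```python
# Back-of-the-envelope training memory: forward activations plus a backward
# pass estimated at 1x to 2x the forward pass. bytes_per_value scales the
# spreadsheet's one-byte-per-value figures (use 4 for float32).

def training_memory_mb(forward_mb, backward_multiplier, bytes_per_value=1):
    return forward_mb * (1 + backward_multiplier) * bytes_per_value


FORWARD_MB = 166  # Totals row of the table, batch size 1024

for mult in (1, 2):
    mb = training_memory_mb(FORWARD_MB, mult)
    mb32 = training_memory_mb(FORWARD_MB, mult, bytes_per_value=4)
    print(f"backward {mult}x: {mb} Mb ({mb / 1024:.2f} GB), float32: {mb32} Mb")
```

That gives a range of 332 to 498 Mb, or up to 1,992 Mb at float32, comfortably inside an 11GB card.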

So, it works.

The best result over about 60 runs was an accuracy of 0.806 and a loss of 0.567, with the run taking about 400 seconds (a little under 7 minutes).