Calculating Output Dimensions in a CNN for Convolution and Pooling Layers with Keras
This article outlines how the dimensions of an input image change as it passes through the convolutional and pooling layers of a Convolutional Neural Network (CNN). As a bonus, it also covers how to calculate the number of parameters. The article assumes that you are familiar with the fundamentals of Keras and CNNs.
If you are new to CNNs, I recommend the following sources:
- A great intro video on how CNNs work: http://brohrer.github.io/how_convolutional_neural_networks_work.html
- Deep Learning with Python, the book by Keras creator François Chollet: https://www.manning.com/books/deep-learning-with-python
Calculating the output when an image passes through a Convolutional layer:-
NOTE:- All matrices are square, i.e., the image width = height.
Parameters that influence the output shape:-
- The input dimensions of the image: I (I x I)
- The size of the filter/kernel: F (F x F)
- Strides: S (integer)
- Padding: P (integer)
- Depth/number of feature maps/activation maps: D (integer)
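Convolution output dimension = O x O x D, where O = [(I - F + 2 * P) / S] + 1 (Formula 1)
NOTE:- The "x" above does not stand for multiplication; it separates the width, the height, and the depth (the number of activation maps). When the division is not exact, the result inside the square brackets is rounded down.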
Let us take a look at an example with a Python snippet:-
- An input image I with dimensions (32x32x3): 32 pixels wide and 32 pixels high, with 3 channels (I = 32)
- A filter size of 3x3 (F = 3)
- A stride of 1 (S = 1)
- No zero padding (P = 0), and
- A depth of 5 feature maps (D = 5)
The output size per side is [(32 - 3 + 2 * 0) / 1] + 1 = 30, and the depth is 5, so the output dimensions are (30x30x5).
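The arithmetic of Formula 1 can be checked without Keras. Below is a minimal sketch in plain Python; the function names `conv_output_size` and `conv_output_shape` are made up for this illustration, and the integer division is an assumption about how to round when the division is not exact (rounding down).

```python
def conv_output_size(i, f, s=1, p=0):
    """Output width (= height) of a square input, following Formula 1.

    i: input size, f: filter size, s: stride, p: padding on each side.
    Integer division rounds down when (i - f + 2p) is not divisible by s.
    """
    return (i - f + 2 * p) // s + 1


def conv_output_shape(i, f, d, s=1, p=0):
    """Full output shape (height, width, depth) for d filters."""
    o = conv_output_size(i, f, s, p)
    return (o, o, d)


# The example above: I = 32, F = 3, S = 1, P = 0, D = 5
print(conv_output_shape(32, 3, 5))  # (30, 30, 5)
```

Changing `p` to 1 in this example would give a 32x32 output, since one pixel of padding on each side makes up for the two pixels lost to a 3x3 filter.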
Keras code snippet for the above example:
from tensorflow import keras

model = keras.models.Sequential()
# In the layer below:
#   D = 5 (first argument, the number of filters)
#   F = 3 (kernel_size)
#   S = (1, 1), the default stride
#   P = 0, because the default padding is 'valid' (no padding)
model.add(keras.layers.Conv2D(5, kernel_size=3, activation='relu', input_shape=(32, 32, 3)))
model.summary()

Calculating the number of parameters: the kernel size is (3x3) and each kernel spans the 3 input channels (RGB), each filter has one bias term, and there are 5 filters.
Parameters = (F x F x number of input channels + 1 bias term) x D
In our example, Parameters = (3 * 3 * 3 + 1) * 5 = 140
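The same count can be written as a small helper. This is a sketch only; `conv2d_param_count` is a hypothetical name, and the `use_bias` flag is added here just to show where the "+ 1" comes from.

```python
def conv2d_param_count(f, in_channels, d, use_bias=True):
    """Trainable parameters of a convolutional layer with d square filters.

    Each filter holds f * f * in_channels weights, plus one bias if enabled.
    """
    weights_per_filter = f * f * in_channels
    bias_per_filter = 1 if use_bias else 0
    return (weights_per_filter + bias_per_filter) * d


# The example above: 3x3 kernel, 3 input channels, 5 filters
print(conv2d_param_count(3, 3, 5))  # 140
```

Note that the count depends on the number of input channels and filters, not on the width or height of the image.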
Calculating the output when an image passes through a Pooling (Max) layer:-
For a pooling layer, the settings that matter are the filter/kernel size (F) and the strides (S). The formula below assumes no padding, which is the Keras default.
Pooling output dimension = O x O x D, where O = [(I - F) / S] + 1 (Formula 2)
Note: the depth D is the same as in the previous layer (i.e., the depth dimension remains unchanged; in our case D = 5).
- Let F = 2 (2x2 window)
- Stride, S = 2
- Depth, D = 5 (depth from the previous layer)
In our example we have
Output size per side = [(30 - 2) / 2] + 1 = 15, so the output dimensions are (15x15x5)
# strides defaults to pool_size, so the stride is (2, 2) here
model.add(keras.layers.MaxPooling2D(pool_size=(2, 2)))
model.summary()

Note:- Since the pooling operation is a fixed function, it introduces no additional parameters.
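The pooling calculation can be sketched the same way in plain Python. The name `pool_output_shape` is made up for this illustration; defaulting the stride to the window size mirrors the behaviour of MaxPooling2D when strides is not given, and no padding is assumed.

```python
def pool_output_shape(i, f, d, s=None):
    """Output shape (height, width, depth) of a pooling layer, per Formula 2.

    i: input size, f: pooling window size, d: depth of the previous layer,
    s: stride (falls back to the window size f when omitted).
    """
    if s is None:
        s = f
    o = (i - f) // s + 1  # rounds down if the window does not fit evenly
    return (o, o, d)      # depth passes through unchanged


# The example above: 30x30x5 input, 2x2 window, stride 2
print(pool_output_shape(30, 2, 5, s=2))  # (15, 15, 5)
```

There is nothing to count for parameters here: the function only needs the window size and the stride, which are settings rather than learned weights.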

Formula 1 and Formula 2 apply unchanged as more layers are stacked and the depth grows.
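To see how the two formulas chain together, here is a rough sketch that traces shapes and parameter counts through a stack of layers. The layer description format and the function name `trace_shapes` are invented for this example, and the second convolution (10 filters) and second pooling layer are hypothetical additions that are not part of the model above.

```python
def trace_shapes(input_shape, layers):
    """Return [(output_shape, param_count), ...], one entry per layer.

    `layers` uses a made-up format for this sketch:
      {"type": "conv", "f": 3, "d": 5, "s": 1, "p": 0}
      {"type": "pool", "f": 2, "s": 2}
    """
    size, _, depth = input_shape  # square input: width == height
    results = []
    for layer in layers:
        f = layer["f"]
        if layer["type"] == "conv":
            s = layer.get("s", 1)
            p = layer.get("p", 0)
            # Parameters depend on the depth coming INTO the layer
            params = (f * f * depth + 1) * layer["d"]
            size = (size - f + 2 * p) // s + 1   # Formula 1
            depth = layer["d"]
        elif layer["type"] == "pool":
            s = layer.get("s", f)
            params = 0                            # pooling learns nothing
            size = (size - f) // s + 1            # Formula 2
        else:
            raise ValueError(f"unknown layer type: {layer['type']}")
        results.append(((size, size, depth), params))
    return results


layers = [
    {"type": "conv", "f": 3, "d": 5},    # the article's Conv2D layer
    {"type": "pool", "f": 2, "s": 2},    # the article's MaxPooling2D layer
    {"type": "conv", "f": 3, "d": 10},   # hypothetical extra layer
    {"type": "pool", "f": 2, "s": 2},    # hypothetical extra layer
]
for shape, params in trace_shapes((32, 32, 3), layers):
    print(shape, params)
# (30, 30, 5) 140
# (15, 15, 5) 0
# (13, 13, 10) 460
# (6, 6, 10) 0
```

The second convolution has 460 parameters because its kernels span the 5 feature maps produced by the first layer: (3 * 3 * 5 + 1) * 10 = 460.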
Thank you for reading this article, please let me know what your thoughts are down below in comments.
REFERENCES:-
- Great Notes on CNN from Stanford https://cs231n.github.io/convolutional-networks/
- Detailed coverage of convolution arithmetic https://arxiv.org/pdf/1603.07285.pdf
- A great theoretical book for Deep Learning https://www.deeplearningbook.org/