Padding Strategies and Calculations in Convolutional Neural Networks
1. Background
In Convolutional Neural Networks (CNNs), convolutional layers extract features by sliding filters (kernels) over the input. However, convolution gradually reduces the spatial dimensions of feature maps (e.g., with stride 1, an input of size \(n \times n\) and a kernel of size \(k \times k\) produce an output of size \((n-k+1) \times (n-k+1)\)). Furthermore, edge pixels participate in far fewer computations than central pixels, leading to information loss at the borders.
Padding is introduced precisely to address these two issues:
- To preserve feature map dimensions (avoiding excessive network shrinkage).
- To increase the utilization of edge pixels.
2. Common Padding Strategies
(1) Valid Padding
- No padding is applied (\(p = 0\)); the kernel slides only within the bounds of the input feature map.
- Output size formula:
\[ \text{Output Size} = \left\lfloor \frac{n - k}{s} \right\rfloor + 1 \]
where \(s\) is the stride. (For \(s = 1\) this reduces to \(n - k + 1\).)
- Example: Input size \(5 \times 5\), kernel size \(3 \times 3\), stride=1, output size is \(3 \times 3\).
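A quick way to verify this formula is to run a convolution with padding=0 and inspect the output shape. Below is a minimal PyTorch sketch (the tensor and layer sizes are illustrative); the stride=2 case shows why the floor is needed:

import torch
import torch.nn as nn

x = torch.randn(1, 1, 5, 5)  # batch=1, channels=1, 5x5 input

# Valid padding, stride=1: floor((5 - 3) / 1) + 1 = 3
conv = nn.Conv2d(1, 1, kernel_size=3, stride=1, padding=0)
print(conv(x).shape)  # torch.Size([1, 1, 3, 3])

# Valid padding, stride=2: floor((5 - 3) / 2) + 1 = 2
conv_s2 = nn.Conv2d(1, 1, kernel_size=3, stride=2, padding=0)
print(conv_s2(x).shape)  # torch.Size([1, 1, 2, 2])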
(2) Same Padding
- Objective: To make the output size equal to the input size (when stride=1).
- Padding amount calculation:
Let input size be \(n\), kernel size be \(k\), stride be \(s\), and padding amount be \(p\). The output size is:
\[ \text{Output Size} = \left\lfloor \frac{n + 2p - k}{s} \right\rfloor + 1 \]
Setting the output size equal to the input size \(n\) (when stride=1) and solving the equation:
\[ n = \frac{n + 2p - k}{1} + 1 \implies p = \frac{k-1}{2} \]
- Requires the kernel size to be odd (e.g., 3, 5, 7) to allow symmetric padding. For example, when \(k=3\), \(p=1\).
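As a sanity check, the relationship \(p = (k-1)/2\) can be verified for several odd kernel sizes; a minimal PyTorch sketch (the \(5 \times 5\) input is illustrative):

import torch
import torch.nn as nn

x = torch.randn(1, 1, 5, 5)
for k in (3, 5, 7):
    p = (k - 1) // 2  # same-padding amount for an odd kernel
    conv = nn.Conv2d(1, 1, kernel_size=k, stride=1, padding=p)
    print(k, conv(x).shape)  # output stays [1, 1, 5, 5] for every k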
(3) Full Padding
- Objective: To produce an output at every position where the kernel overlaps the input by at least one pixel, so that every input pixel, including those at the corners, participates in the same number of computations.
- Padding amount \(p = k-1\), resulting in an output of size \(n + k - 1\), larger than the input.
- Less commonly used in practice, more frequent in signal processing.
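For reference, full padding can be obtained in PyTorch by setting padding = k - 1 explicitly; a minimal sketch (sizes illustrative):

import torch
import torch.nn as nn

x = torch.randn(1, 1, 5, 5)
conv_full = nn.Conv2d(1, 1, kernel_size=3, stride=1, padding=2)  # p = k - 1
print(conv_full(x).shape)  # torch.Size([1, 1, 7, 7]): n + k - 1 = 5 + 3 - 1

(Note that Conv2d computes cross-correlation rather than the flipped-kernel convolution used in signal processing; the output sizes are the same either way.)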
3. Specific Padding Operations
Taking Same Padding as an example (stride=1, kernel size \(3 \times 3\)):
- Calculate padding amount: \(p = (3-1)/2 = 1\).
- Add a border of zeros around the input feature map (one row/column if \(p=1\), two if \(p=2\), etc.).
- The kernel slides over the padded map, producing an output of the same size as the original input.
Example:
- Input: \(5 \times 5\) matrix, kernel size \(3 \times 3\), stride=1.
- After padding: Add one row/column of zeros on the top, bottom, left, and right, resulting in a \(7 \times 7\) matrix.
- Convolution output: \((7-3+1) \times (7-3+1) = 5 \times 5\), matching the original input size.
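These steps can be made explicit with torch.nn.functional.pad, which reproduces the \(5 \times 5 \to 7 \times 7 \to 5 \times 5\) pipeline by hand; a minimal sketch with a random kernel:

import torch
import torch.nn.functional as F

x = torch.randn(1, 1, 5, 5)
x_padded = F.pad(x, (1, 1, 1, 1))  # one column/row of zeros: left, right, top, bottom
print(x_padded.shape)  # torch.Size([1, 1, 7, 7])

kernel = torch.randn(1, 1, 3, 3)
y = F.conv2d(x_padded, kernel, stride=1)  # valid convolution over the padded map
print(y.shape)  # torch.Size([1, 1, 5, 5]) -- matches the original input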
4. Code Implementation (Python Example)
In deep learning frameworks, padding is typically specified via a parameter:
import torch.nn as nn

# Define a convolutional layer with Same Padding (kernel_size=3, stride=1, padding=1)
conv_layer = nn.Conv2d(
    in_channels=3,
    out_channels=64,
    kernel_size=3,
    stride=1,
    padding=1,  # directly set the padding amount
)
The framework automatically handles dimension calculations and zero-padding operations.
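A quick shape check on the layer above (the batch and image sizes here are illustrative) confirms that the spatial resolution is preserved:

import torch

x = torch.randn(8, 3, 32, 32)  # batch of 8 RGB images, 32x32 each
y = conv_layer(x)               # conv_layer as defined above
print(y.shape)                  # torch.Size([8, 64, 32, 32]) -- height and width unchanged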
5. Summary and Extensions
- Choosing a Padding Strategy:
- Same Padding is commonly used when spatial resolution must be preserved (e.g., in image segmentation, generative models).
- Valid Padding can be used when feature extraction is prioritized over spatial size (e.g., in the final layers of classification networks).
- Padding Variants: Reflection Padding, Replication Padding, etc., often used for edge smoothing in image processing (see the sketch below).
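In PyTorch these variants are available either as standalone layers or through Conv2d's padding_mode argument; a minimal sketch (sizes illustrative):

import torch
import torch.nn as nn

x = torch.randn(1, 3, 5, 5)

# Reflection padding as a standalone layer (mirrors the border pixels)
reflect = nn.ReflectionPad2d(1)
print(reflect(x).shape)  # torch.Size([1, 3, 7, 7])

# Or applied inside the convolution via padding_mode
conv = nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1,
                 padding_mode='reflect')  # also 'replicate', 'circular'
print(conv(x).shape)  # torch.Size([1, 64, 5, 5])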
Through padding strategies, convolutional neural networks achieve greater flexibility in balancing feature map size control and information preservation.