CNN

Convolution

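2D Convolutional Kernel (origin at the center)

$$g(x, y) = \sum_{i=-k}^{k} \sum_{j=-k}^{k} w(i, j)\, f(x - i, y - j)$$

  • $g(x, y)$ is the pixel value of the output image;
  • $f(x - i, y - j)$ is the pixel value of the input image, i.e. a pixel in the neighborhood of position $(x, y)$;
  • $w(i, j)$ is the weight of the convolution kernel;
  • $k$ is the radius of the convolution kernel, so the kernel size is $(2k + 1) \times (2k + 1)$.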
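The weighted sum over a centered kernel can be sketched directly in code. This is a minimal illustration, not an efficient implementation: it assumes a square kernel of odd size (radius k, size (2k+1)×(2k+1)), treats pixels outside the image as zero, and uses the flipped-index (true convolution) convention. The function name `conv2d_same` and the sample arrays are made up for this example.

```python
def conv2d_same(image, kernel):
    """Direct 2D convolution with a centered (2k+1) x (2k+1) kernel.

    Pixels outside the image are treated as zero (an assumption of this
    sketch), so the output has the same size as the input.
    """
    k = len(kernel) // 2                      # kernel radius
    height, width = len(image), len(image[0])
    output = [[0] * width for _ in range(height)]
    for x in range(height):                   # x indexes rows here
        for y in range(width):                # y indexes columns
            total = 0
            for i in range(-k, k + 1):
                for j in range(-k, k + 1):
                    xi, yj = x - i, y - j     # flipped indexing: true convolution
                    if 0 <= xi < height and 0 <= yj < width:
                        # kernel[i + k][j + k] stores the weight for offset (i, j)
                        total += kernel[i + k][j + k] * image[xi][yj]
            output[x][y] = total
    return output


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]

# Weight 1 at the origin only: the output equals the input.
identity = [[0, 0, 0],
            [0, 1, 0],
            [0, 0, 0]]

# Weight 1 at offset (0, +1): every pixel takes the value of its left neighbor.
shift = [[0, 0, 0],
         [0, 0, 1],
         [0, 0, 0]]

same_as_input = conv2d_same(image, identity)
shifted = conv2d_same(image, shift)   # [[0, 1, 2], [0, 4, 5], [0, 7, 8]]
```

Replacing `x - i, y - j` with `x + i, y + j` turns the same loop into cross-correlation, which is what most deep learning libraries compute under the name "convolution".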

Convolution Result

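  • Height: $H_{out} = \lfloor (H_{in} + 2P - K_h) / S \rfloor + 1$

  • Width: $W_{out} = \lfloor (W_{in} + 2P - K_w) / S \rfloor + 1$

Here $H_{in}$ and $W_{in}$ are the input height and width, $K_h$ and $K_w$ the kernel height and width, $P$ the padding added on each side, and $S$ the stride.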
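The output height and width follow the standard size formula, floor((input + 2·padding − kernel) / stride) + 1, applied to each dimension separately. The helper below is a small sketch of that formula; the name `conv_output_size` and the sample numbers are illustrative assumptions, not values from these notes.

```python
def conv_output_size(input_size, kernel_size, stride=1, padding=0):
    """Output length along one spatial dimension of a convolution."""
    return (input_size + 2 * padding - kernel_size) // stride + 1


# Hypothetical sizes, chosen only to exercise the formula.
no_padding = conv_output_size(32, 5)                        # 28
same_size = conv_output_size(32, 5, stride=1, padding=2)    # 32
strided = conv_output_size(224, 7, stride=2, padding=3)     # 112
```

Call it once with the height and once with the width when the input or the kernel is not square.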

Example

If the input data is an image of size , using filters of size for convolution, with a stride of and padding of , what will be the final size of the output?

So, the output size is . Through the convolution operation, the feature map can retain the same length and width as the original input.

Padding

Preserve Spatial Dimensions (Same Padding)

  • Without padding, every convolution layer shrinks its output: at stride 1, a kernel of size K removes K − 1 pixels from each spatial dimension. Padding preserves the original input size by compensating for this loss of border pixels.

For example:

  • Input size =   
  • Kernel size =   
  • Without padding: Output size = 
  • With “same” padding: Output size =   
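To make the "valid" versus "same" comparison concrete, here is a minimal sketch of zero padding. It assumes stride 1 and an odd kernel size, in which case "same" padding is (K − 1) / 2 on each side. The helper names `zero_pad` and `same_padding` and the 2×2 sample image are hypothetical.

```python
def same_padding(kernel_size):
    """Padding per side that keeps the size unchanged (odd kernel, stride 1)."""
    return (kernel_size - 1) // 2


def zero_pad(image, p):
    """Surround a 2D list with a border of p zeros on every side."""
    width = len(image[0])
    blank = [0] * (width + 2 * p)
    padded = [blank[:] for _ in range(p)]              # top border
    for row in image:
        padded.append([0] * p + list(row) + [0] * p)   # left and right borders
    padded.extend(blank[:] for _ in range(p))          # bottom border
    return padded


image = [[1, 2],
         [3, 4]]
p = same_padding(3)          # 1 for a 3 x 3 kernel
padded = zero_pad(image, p)  # 4 x 4: the 2 x 2 image with a ring of zeros
```

A 3×3 kernel slid over `padded` without further padding fits in 2×2 positions, which matches the original image size.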

Prevent Information Loss at Borders

Without padding, pixels at the edges and corners of an image are covered by far fewer kernel positions than pixels in the center (a corner pixel is covered only once), so they contribute less to the output. Padding lets the kernel be centered on border pixels as well, so more of their information is retained.

Enable Deeper Networks

By maintaining consistent feature map sizes throughout the layers, padding allows for the design of deeper networks without rapidly shrinking the feature map size.
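How quickly an unpadded feature map shrinks can be seen by tracking its size across a stack of layers. The sketch below assumes stride 1 and the same kernel size in every layer; the function name and the numbers (a size-32 input, 3×3 kernels, five layers) are illustrative assumptions.

```python
def sizes_through_layers(input_size, kernel_size, padding, num_layers):
    """Feature map size after each of num_layers stride-1 convolutions."""
    sizes = [input_size]
    for _ in range(num_layers):
        sizes.append(sizes[-1] + 2 * padding - kernel_size + 1)
    return sizes


without_padding = sizes_through_layers(32, 3, padding=0, num_layers=5)
with_padding = sizes_through_layers(32, 3, padding=1, num_layers=5)
# without_padding: [32, 30, 28, 26, 24, 22]
# with_padding:    [32, 32, 32, 32, 32, 32]
```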

Control Output Size

Padding can be adjusted to produce desired output dimensions for specific applications (e.g., “same” padding or “valid” padding).

Improve Symmetry for Feature Extraction

Padding ensures that the kernel interacts symmetrically with all regions of the input image, which improves the extraction of features near edges.

Sampling in 2D

Downsampling

Assume we have a matrix A with size :

with size
with size
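As an illustration only, the sketch below downsamples a hypothetical 4×4 matrix by a factor of 2, keeping every second row and column. It also shows one way to express the same operation as a matrix product D·A·Dᵀ, where D is a 2×4 selection matrix; whether this matches the matrices intended in these notes is an assumption. All names and values are made up for the example.

```python
def downsample(matrix, factor=2):
    """Keep every `factor`-th row and column, starting from index 0."""
    return [row[::factor] for row in matrix[::factor]]


def selection_matrix(n, factor=2):
    """Rows 0, factor, 2*factor, ... of the n x n identity matrix."""
    return [[1 if col == row else 0 for col in range(n)]
            for row in range(0, n, factor)]


def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(column) for column in zip(*m)]


# Hypothetical 4 x 4 input.
A = [[1, 2, 3, 4],
     [5, 6, 7, 8],
     [9, 10, 11, 12],
     [13, 14, 15, 16]]

D = selection_matrix(4)                            # size 2 x 4
by_slicing = downsample(A)                         # size 2 x 2
by_matrices = matmul(matmul(D, A), transpose(D))   # same 2 x 2 result
```

In practice the input is usually smoothed (for example with a Gaussian filter) before samples are dropped, to limit aliasing.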

Upsampling

Assume we have a matrix with size :

with size
with size
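Two common ways to upsample by a factor of 2 are sketched below on a hypothetical 2×2 matrix: inserting zeros between samples, and repeating each sample (nearest neighbor). Which variant these notes intend is not recoverable, so both are shown as assumptions; the function names are made up for this example.

```python
def upsample_zeros(matrix, factor=2):
    """Place each sample at (r*factor, c*factor) and fill the rest with zeros."""
    rows, cols = len(matrix), len(matrix[0])
    out = [[0] * (cols * factor) for _ in range(rows * factor)]
    for r in range(rows):
        for c in range(cols):
            out[r * factor][c * factor] = matrix[r][c]
    return out


def upsample_nearest(matrix, factor=2):
    """Repeat every sample `factor` times along both rows and columns."""
    out = []
    for row in matrix:
        wide = [value for value in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


small = [[1, 2],
         [3, 4]]
zero_filled = upsample_zeros(small)    # 4 x 4 with zeros between samples
replicated = upsample_nearest(small)   # 4 x 4 with 2 x 2 constant blocks
```

Zero insertion is normally followed by an interpolation filter that fills in the gaps.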

Laplacian in 2D

Assume we have an 8×8 matrix (image) B:

The Laplacian filter in 2D (for edge detection) is commonly represented as:

If we use the matrix to represent the Laplacian, the matrix is:

And the result is:
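As a separate illustration (not the matrix B from these notes), the sketch below applies the widely used 4-neighbor Laplacian kernel to a hypothetical 8×8 image containing a vertical step edge. It assumes zeros outside the image; other variants of the kernel exist, such as the 8-neighbor form or the sign-flipped form.

```python
# 4-neighbor Laplacian kernel (one common choice).
LAPLACIAN = [[0, 1, 0],
             [1, -4, 1],
             [0, 1, 0]]


def apply_laplacian(image):
    """Filter a 2D list with LAPLACIAN, treating outside pixels as zero."""
    height, width = len(image), len(image[0])
    result = [[0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            total = 0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < height and 0 <= cc < width:
                        total += LAPLACIAN[dr + 1][dc + 1] * image[rr][cc]
            result[r][c] = total
    return result


# Hypothetical test image: left half 0, right half 10 (a vertical edge).
step = [[0] * 4 + [10] * 4 for _ in range(8)]
edges = apply_laplacian(step)
# Interior row 3: flat regions give 0, and the edge gives a +10 / -10 pair
# at columns 3 and 4 (a zero crossing).
```

Because the kernel is symmetric, convolution and correlation give the same result here. Values in the outermost rows and columns are also affected by the zero-boundary assumption.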

Question

Part I: Convolution and image filtering

1. Comparing different filters?

2. Comparing different scales / sizes of a filter?

3. Separability property of a filter / convolution?

4. Convolution and correlation? How

5. How to construct a kernel approximating a 1st or 2nd derivative?

6. What is the Fourier Transform? What is it used for? How is it calculated in 1D? In 2D? Pad if necessary.

7. Convolution in the image domain is equivalent to multiplication in the frequency domain. Why? How can this be verified?

8. How to obtain an image pyramid (Gaussian, Laplacian, steerable)? How is each calculated? What are their uses and applications?