ARTIFICIAL INTELLIGENCE (6) – Deep learning (4) Pooling In Convolutional Neural Networks

Convolutional Neural Networks (CNNs) are a core technology in modern computer vision, enabling tasks such as image classification, object detection, medical imaging, and autonomous driving. Beyond convolutional layers, pooling layers are also essential, helping networks learn and recognize patterns more efficiently.

Pooling acts as a form of data reduction. After convolutions generate detailed feature maps, pooling downsamples them into a more compact representation. This allows the model to retain the most relevant patterns—such as edges, textures, or shapes—while discarding less important details. As a result, training becomes more efficient, overfitting is reduced, and the network gains robustness to small shifts or scale changes in the input image.

The Pooling Process: Step-by-Step

Imagine you have a 4×4 input image (a grid of 16 numbers). We will use a 2×2 filter with a stride of 2, meaning the window jumps 2 pixels at a time so consecutive windows never overlap. In pooling, the stride is the «step»: the number of pixels the window (filter) moves each time it slides across the image.

Here is a visual breakdown of why a stride of 2 means the window «never overlaps»:

1. The Concept of Stride

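Imagine you are reading a book through a small window that shows two words at a time.

Stride 1: You slide the window forward one word at a time, so each word is seen twice.
Stride 2: You slide the window forward two words at a time, so each word is seen once.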

In Pooling, the window (the 2×2 square) «jumps» a specific distance before it stops to look at the next set of pixels.

2. Why it «Never Overlaps»

If your window is 2 pixels wide and your Stride is 2, the movement looks like this:

Step 1: The window covers Pixels 1 and 2.
The Jump: It moves 2 pixels to the right.
Step 2: The window now covers Pixels 3 and 4.

Because the «Jump» (2) is the same size as the «Window» (2), the second window starts exactly where the first one ended. They do not share any pixels.

3. Visual Comparison (Example)

Imagine a row of 4 pixels: [ A ] [ B ] [ C ] [ D ]

With Stride 1 (Overlapping):

First window sees: [ A ] [ B ]
Moves 1 pixel…
Second window sees: [ B ] [ C ]

Result: Pixel B was processed twice! They overlap.

With Stride 2 (Non-overlapping):

First window sees: [ A ] [ B ]
Moves 2 pixels…
Second window sees: [ C ] [ D ]

Result: No pixel is seen twice. They are perfectly separate.
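The comparison above can be reproduced in a few lines of Python. This is only a minimal sketch: the helper name `sliding_windows` is made up for illustration, and it lists every window in the row, so with stride 1 it also shows the third window [ C ] [ D ] that the walkthrough stops short of.

```python
def sliding_windows(items, size, stride):
    """Return every window of `size` items, moving `stride` items per step."""
    last_start = len(items) - size
    return [items[i:i + size] for i in range(0, last_start + 1, stride)]


row = ["A", "B", "C", "D"]

overlapping = sliding_windows(row, size=2, stride=1)
separate = sliding_windows(row, size=2, stride=2)

print(overlapping)  # [['A', 'B'], ['B', 'C'], ['C', 'D']]
print(separate)     # [['A', 'B'], ['C', 'D']]
```

With stride 1, B and C each appear in two windows; with stride 2, every pixel appears in exactly one.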

Why is this important for your study?

Size Reduction: When the Stride is 2, you are essentially cutting the width and height of the image in half. A $4 \times 4$ image becomes a $2 \times 2$ image.
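Efficiency: Because the windows do not overlap, the computer processes fewer windows (4 instead of 9 on a $4 \times 4$ image with a 2×2 filter), which reduces the computation in this layer and in every layer that follows.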

2×2 Filter with a Stride of 2

	Col 1	Col 2	Col 3	Col 4
Row 1	10	20	1	5
Row 2	30	40	2	8
Row 3	5	10	50	60
Row 4	2	4	70	80

2. Max Pooling (Extracting the «Peak»)

The filter looks at each 2×2 block and picks the highest number.

Top-Left Block: {10, 20, 30, 40} $\to$ Max is 40.
Top-Right Block: {1, 5, 2, 8} $\to$ Max is 8.
Bottom-Left Block: {5, 10, 2, 4} $\to$ Max is 10.
Bottom-Right Block: {50, 60, 70, 80} $\to$ Max is 80.

The Result (Output):

| 40 | 8 |
| 10 | 80 |

Why? It keeps the most prominent features (like a bright pixel or a sharp edge).
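To check the four maxima by machine, here is a minimal NumPy sketch. It assumes the 4×4 grid from the table above and uses a reshape trick that only works when the stride equals the filter size and the image divides evenly into blocks; the variable names are illustrative.

```python
import numpy as np

image = np.array([
    [10, 20, 1, 5],
    [30, 40, 2, 8],
    [5, 10, 50, 60],
    [2, 4, 70, 80],
])

# Split the 4x4 grid into 2x2 blocks.
# Axes after reshape: (block_row, row_in_block, block_col, col_in_block)
blocks = image.reshape(2, 2, 2, 2)

# Take the maximum inside each block (over the two "in_block" axes).
max_pooled = blocks.max(axis=(1, 3))

print(max_pooled.tolist())  # [[40, 8], [10, 80]]
```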

3. Average Pooling (Calculating the «Smoothness»)

The filter looks at the same 2×2 blocks but calculates the mean (sum divided by 4).

Top-Left Block: $(10 + 20 + 30 + 40) /4 = 100/4 = 25$
Top-Right Block: $(1 + 5 + 2 + 8) /4 = 16/4 = 4$
Bottom-Left Block: $(5 + 10 + 2 + 4) /4 = 21/4 = 5.25$
Bottom-Right Block: $(50 + 60 + 70 + 80) /4 = 260/4 = 65$

The Result (Output):

| 25 | 4 |
| 5.25 | 65 |

Why? It provides a smoother summary of the area.
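The same check works for average pooling. Again this is just a sketch under the same assumptions (the 4×4 grid from the table, a 2×2 filter, a stride of 2), with the mean replacing the maximum.

```python
import numpy as np

image = np.array([
    [10, 20, 1, 5],
    [30, 40, 2, 8],
    [5, 10, 50, 60],
    [2, 4, 70, 80],
])

# Same block layout as for max pooling:
# (block_row, row_in_block, block_col, col_in_block)
blocks = image.reshape(2, 2, 2, 2)

# Average the four values inside each 2x2 block.
avg_pooled = blocks.mean(axis=(1, 3))

print(avg_pooled.tolist())  # [[25.0, 4.0], [5.25, 65.0]]
```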

If stride = 1, the filter moves 1 pixel at a time.
If stride = 2, it moves 2 pixels at a time.
If stride = 3, it moves 3 pixels at a time.

Strides in a sliding window operation using a kernel/filter of size (2, 2)

Why Does Stride Change the Output Size?

When the filter moves with bigger steps, it looks at fewer positions in the image.

That means:

Stride = 1 → Output image is almost the same size
Stride = 2 → Output image is about half the size
Stride = 3 → Output image is about one third the size

So when stride > 1, the output becomes smaller.
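The «almost the same», «about half» and «about one third» figures follow from a simple count of window positions. The sketch below assumes no padding and that incomplete windows at the border are dropped; the function name `pooled_size` and the 12-pixel input are illustrative choices.

```python
def pooled_size(input_size, kernel, stride):
    """Number of window positions along one dimension (no padding)."""
    return (input_size - kernel) // stride + 1


# A 12-pixel-wide image with a 2-pixel-wide filter.
sizes = {stride: pooled_size(12, kernel=2, stride=stride) for stride in (1, 2, 3)}
print(sizes)  # {1: 11, 2: 6, 3: 4}

# The 4x4 example with a 2x2 filter and a stride of 2 gives a 2x2 output.
print(pooled_size(4, kernel=2, stride=2))  # 2
```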

What About Pooling?

In pooling layers:

The stride is usually the same as the filter size by default.

For example:

If the filter size is (2 × 2)
The stride is usually 2

This means the filter moves without overlapping.

Types of Pooling

Max Pooling → Keeps the biggest value in each small region
Average Pooling → Takes the average value in each small region

The max pooling operation

The average pooling operation

Why Do We Use Pooling?

To Reduce Size

Pooling reduces the width and height of the image.
This is called down-sampling.

Example:

A 400×400 image
After pooling several times
It could become 25×25
But it still keeps the main shape and structure.
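As a quick sanity check on those numbers: assuming each pooling step uses a 2×2 filter with a stride of 2 (the text does not say which settings produce 25×25), four steps are enough to go from 400 to 25 pixels per side.

```python
size = 400
sizes = [size]

# Apply 2x2 pooling with stride 2 until the side length reaches 25.
while size > 25:
    size = (size - 2) // 2 + 1
    sizes.append(size)

print(sizes)  # [400, 200, 100, 50, 25]

# Fraction of the original pixels that remain after the last step.
remaining = (sizes[-1] ** 2) / (sizes[0] ** 2)
print(remaining)  # 0.00390625
```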

To Make the Model Faster

Smaller images = less data
Less data = faster training and less computation

This helps the neural network work efficiently.

To Make the Model More Robust

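Pooling helps the network recognize objects even if they move slightly in the image.

For example:

A car in one position
The same car shifted a few pixels to the side

The network can still recognize it as a car, because small shifts often leave the pooled values unchanged.

This is called translation invariance. A single pooling layer only provides it for small shifts; tolerance to larger shifts (a car on the left versus a car on the right) builds up across several stacked convolution and pooling layers.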
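A minimal sketch of the small-shift case, using a single bright pixel in place of a car. The helper `max_pool_2x2` and the toy 4×4 images are assumptions made for illustration. The pooled output stays the same while the pixel moves inside one 2×2 window, and changes once it crosses into another window.

```python
import numpy as np


def max_pool_2x2(x):
    """2x2 max pooling with stride 2 (height and width must be even)."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))


a = np.zeros((4, 4), dtype=int)
a[0, 0] = 9  # bright pixel in the top-left window

b = np.zeros((4, 4), dtype=int)
b[1, 1] = 9  # shifted by one pixel, still inside the same window

c = np.zeros((4, 4), dtype=int)
c[2, 2] = 9  # shifted further, now inside a different window

pooled_a = max_pool_2x2(a)
pooled_b = max_pool_2x2(b)
pooled_c = max_pool_2x2(c)

print(np.array_equal(pooled_a, pooled_b))  # True
print(np.array_equal(pooled_a, pooled_c))  # False
```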

Max Pooling Function

In this section, we will use manually written pooling functions to visualize the pooling process and better understand what actually goes on. Two functions are provided, one for max pooling and the other for average pooling. Using the functions, we will attempt to pool the image.

The max pooling function replicates the max pooling process. Using it, let's max pool the reference image using a (2, 2) kernel.

max_pool('image.jpg', 2, visualize=True)

The effects of using a larger (3, 3) kernel are seen below. As expected, the width and height of the reference image shrink to 1/3 of their previous values at every iteration. By the third iteration, a pixelated (16, 16) down-sampled representation is produced (roughly a 0.1% summary, since each side has shrunk by a factor of 27 and only about 1/729 of the original pixels remain). Although pixelated, the overall idea of the image is still somewhat maintained.

visualize_pooling('image.jpg', 3, kernel=3)
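The calls above take an image file. To show only the underlying arithmetic, here is a minimal sketch of a manually written pooling routine that works on a NumPy array instead. The name `pool2d`, its parameters, and the synthetic 432×432 input are all assumptions made for illustration (432 is chosen so that three iterations with a kernel of 3 land on 16×16; the real size of the reference image is not stated), and this is not the code behind `max_pool` or `visualize_pooling`.

```python
import numpy as np


def pool2d(image, kernel, mode="max"):
    """Pool a 2D array with a square window; the stride equals the kernel size."""
    reducer = np.max if mode == "max" else np.mean
    out_h = image.shape[0] // kernel
    out_w = image.shape[1] // kernel
    out = np.empty((out_h, out_w), dtype=float)
    for i in range(out_h):
        for j in range(out_w):
            # Cut out one non-overlapping window and reduce it to one value.
            window = image[i * kernel:(i + 1) * kernel,
                           j * kernel:(j + 1) * kernel]
            out[i, j] = reducer(window)
    return out


# Synthetic grayscale "image" standing in for the reference picture.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(432, 432))

shapes = [image.shape]
current = image
for _ in range(3):
    current = pool2d(current, kernel=3, mode="max")
    shapes.append(current.shape)

print(shapes)  # [(432, 432), (144, 144), (48, 48), (16, 16)]

# Share of the original pixels left after three iterations (1/729).
fraction = current.size / image.size
print(round(fraction * 100, 2))  # 0.14
```

Passing `mode="mean"` to the same sketch gives average pooling with identical output shapes.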

Average pooling function

Applying Max Pooling and Average Pooling in Architecture and Construction

Pooling techniques such as max pooling and average pooling can be very useful in the architecture and construction sector, especially when using computer vision systems for inspection, monitoring, and analysis.

Applications with Large-Scale Image Data

When working with massive image datasets, pooling becomes especially valuable because it reduces dimensionality while preserving meaningful patterns.

Digital Twins and BIM Integration

In large infrastructure projects, thousands of images are generated from:

Site scans
Drones
3D reconstruction workflows

Pooling helps compress visual information before integrating it into digital twins or BIM-based monitoring systems, enabling scalable updates without overwhelming computational resources.

Material and Surface Classification

In large construction databases, pooling enables efficient classification of:

Concrete types
Façade materials
Roofing systems
Pavement conditions…

Reducing resolution while keeping dominant features improves scalability when processing millions of images.

Key Insight

Whenever image datasets become large, whether spatially (high resolution), temporally (continuous monitoring), or geographically (urban scale), pooling becomes an important tool for keeping computation manageable.

It enables models to scale from single-building analysis to city-level intelligence without prohibitive computational cost.


@Yolanda Muriel Attribution-NonCommercial-NoDerivs 3.0 Unported (CC BY-NC-ND 3.0)

Published by Yolanda MURIEL
