ARTIFICIAL INTELLIGENCE (12) – Deep learning (10) Why Only the First Layer of a CNN Can Be Visualised as an RGB Image

Convolutional Neural Networks (CNNs) are built from many stacked layers, each transforming its input into a more abstract representation. We often try to “visualise” what a CNN has learned, but only the filters of the very first convolutional layer can be displayed directly as RGB images.

The First Layer Has Filters With Depth = 3 (Matching RGB Channels)

A standard colour image has shape:

Height × Width × 3

The final dimension, 3, corresponds to the three colour channels:

R = red
G = green
B = blue

A convolutional filter in the first layer must have the same depth as the input it processes.
So if the input has 3 channels, the filter also has depth 3:

Filter shape (example): 7 × 7 × 3

This 3‑channel structure is directly compatible with how a computer displays colour images.
Each filter can therefore be mapped to an image (once its weights, which may be negative, are rescaled to a displayable range) where:

One slice = red intensity
One slice = green intensity
One slice = blue intensity

The first layer is like looking at the world with three coloured glasses.
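
As a rough illustration of the mapping described above, the sketch below rescales a 7 × 7 × 3 filter into the [0, 1] range so that its three slices can be read as red, green and blue intensities. The random weights stand in for a trained filter, and `filter_to_rgb` is a hypothetical helper name, not part of any library. Min-max rescaling is only one possible choice, assumed here because raw weights can be negative.

```python
import numpy as np

def filter_to_rgb(kernel):
    """Min-max rescale a (height, width, 3) kernel to [0, 1] for display as RGB."""
    kernel = np.asarray(kernel, dtype=np.float64)
    if kernel.ndim != 3 or kernel.shape[-1] != 3:
        raise ValueError("expected a kernel of shape (height, width, 3)")
    lo, hi = kernel.min(), kernel.max()
    if hi == lo:
        # A constant kernel carries no contrast; show it as mid-grey.
        return np.full(kernel.shape, 0.5)
    return (kernel - lo) / (hi - lo)

rng = np.random.default_rng(0)
first_layer_filter = rng.normal(size=(7, 7, 3))  # assumption: stand-in for learned weights
rgb = filter_to_rgb(first_layer_filter)
print(rgb.shape, rgb.min(), rgb.max())
```

The returned array has the float RGB layout that image viewers such as matplotlib's `imshow` accept, so it could be displayed as a small colour picture. The same helper rejects a 3 × 3 × 64 kernel, because there is no third axis of length 3 to interpret as colour.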

Deeper Layers Do Not Operate on RGB Anymore

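Once the first layer has processed the image, the network no longer works with R/G/B values directly.
Instead, every layer outputs a set of feature maps, each responding to a specific learned pattern. From the first layer to the deepest ones, these patterns range from simple to abstract: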

vertical edges
horizontal edges
corners
textures
colour‑invariant shapes
patterns that correspond to parts of objects
highly abstract concepts

If the first layer has, for example, 64 filters, then it produces an output of:

Height × Width × 64

This means the next layer receives 64 channels, not 3.

A filter in a deeper layer might therefore have a shape such as:

3 × 3 × 64

5 × 5 × 128

There is no natural way to map 64 or 128 channels to the 3 RGB channels.
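
The shape bookkeeping above can be checked with a small sketch. It uses a deliberately naive sliding-window implementation (cross-correlation, which is what deep learning libraries call convolution) with random weights; `conv2d_valid` is a hypothetical helper, and the 32 × 32 input size is an assumption. Because it uses no padding, the spatial size shrinks slightly, whereas the Height × Width × 64 output described in the text assumes padding that preserves the spatial size.

```python
import numpy as np

def conv2d_valid(x, filters):
    """Naive stride-1 convolution without padding.

    x:       (H, W, C_in)
    filters: (N, kH, kW, C_in), one 3D block per filter
    returns: (H - kH + 1, W - kW + 1, N)
    """
    h, w, c_in = x.shape
    n, kh, kw, depth = filters.shape
    if depth != c_in:
        raise ValueError(f"filter depth {depth} does not match input channels {c_in}")
    out = np.zeros((h - kh + 1, w - kw + 1, n))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = x[i:i + kh, j:j + kw, :]
            # Each filter sums over height, width and ALL input channels.
            out[i, j, :] = np.tensordot(filters, patch, axes=([1, 2, 3], [0, 1, 2]))
    return out

rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))            # RGB input
layer1 = rng.normal(size=(64, 7, 7, 3))    # 64 filters of shape 7 x 7 x 3
layer2 = rng.normal(size=(128, 3, 3, 64))  # 128 filters of shape 3 x 3 x 64

maps1 = conv2d_valid(image, layer1)
maps2 = conv2d_valid(maps1, layer2)
print(maps1.shape, maps2.shape)
```

Feeding the 64-channel output back into the 3-channel first-layer filters raises an error, which is the same depth-matching constraint stated earlier for the first layer.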

Imagine you mix together 64 crayons at once.
You can’t draw a simple picture from that — it becomes something abstract.
Deeper layers “see” the world using many more channels than our eyes can understand.

Channel Explosion Makes RGB Visualisation Impossible

As the network goes deeper:

The number of channels grows (64 → 128 → 256 → 512…).
Each filter corresponds to a 3D block with depth = number of input channels.
The depth becomes far larger than 3.
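
To make the growth concrete, the following sketch lists the shape of a single filter in each layer for the channel progression mentioned above. The 3 × 3 kernel size and the helper name `filter_shapes` are assumptions chosen for illustration.

```python
def filter_shapes(channel_widths, kernel_size=3):
    """Shape (kH, kW, depth) of one filter in each layer.

    channel_widths[0] is the input depth (3 for RGB); every later entry is
    the number of filters, and hence output channels, of a layer.
    """
    return [(kernel_size, kernel_size, in_channels)
            for in_channels in channel_widths[:-1]]

widths = [3, 64, 128, 256, 512]
shapes = filter_shapes(widths)
for layer, shape in enumerate(shapes, start=1):
    note = "displayable as RGB" if shape[2] == 3 else "no direct RGB reading"
    print(f"layer {layer}: filter shape {shape}, {note}")
```

Only the first entry has depth 3; every later filter has as many slices as the previous layer has filters.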

If you wanted to “convert” 128 channels into RGB:

Which channels map to Red?
Which map to Green?
Which to Blue?
How to combine them?
How to keep the interpretation consistent?

There is no principled answer.
Any mapping would be arbitrary, misleading, and meaningless.
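
A short sketch can show why such a mapping is arbitrary. It takes one 3 × 3 × 128 filter with random stand-in weights and squeezes it to three channels in four equally defensible ways: two different picks of three channels, and two different random linear projections. The helper names are hypothetical, and neither strategy is a recommended visualisation method.

```python
import numpy as np

rng = np.random.default_rng(1)
deep_filter = rng.normal(size=(3, 3, 128))  # assumption: stand-in for a learned deep filter

def pick_three(kernel, channels):
    """Keep three input channels and call them R, G and B."""
    return kernel[:, :, list(channels)]

def random_projection(kernel, seed):
    """Mix all channels into three with a random (depth x 3) matrix."""
    proj = np.random.default_rng(seed).normal(size=(kernel.shape[-1], 3))
    return kernel @ proj

view_a = pick_three(deep_filter, (0, 1, 2))
view_b = pick_three(deep_filter, (5, 17, 99))
view_c = random_projection(deep_filter, seed=0)
view_d = random_projection(deep_filter, seed=1)

# Same filter, same output shape, different "pictures".
print(view_a.shape, np.allclose(view_a, view_b), np.allclose(view_c, view_d))
```

All four results have shape 3 × 3 × 3 and could be drawn as colour images, yet they disagree with one another, and nothing in the filter itself says which one to prefer.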

You can only make a coloured drawing if you have three crayons.
If someone gives you 128 crayons at once, you don’t know how to use them to draw one picture.

Deep Filters Represent Concepts, Not Colours

Deeper layers represent:

combinations of edges and textures
object parts
shapes
semantic concepts
class‑specific structures
patterns created by combinations of earlier features

These concepts do not exist in RGB space.
They are abstract mathematical features extracted from the original image.

A deep filter may behave like:

“respond strongly when it sees an eye‑shaped pattern”
“activate for a circular texture”
“detect a high‑level part of a specific object class”

These are not images — they are instructions for detecting patterns.

The early layers look at colours.
Later layers look at ideas, like “eye shape”, “fur texture”, or “circle pattern”.
Ideas cannot be drawn with red/green/blue colours.

Instead of Showing Filters, We Show Their Effects

Because the raw filter weights in deep layers cannot be converted to RGB, researchers use alternative visualisation methods:

Feature maps

Show which regions of the image activate a given filter.

Activation maximisation

Generate an artificial image that maximally triggers a neuron.

Guided backpropagation

Highlight pixels that contribute most to an activation.

Class activation maps (CAM / Grad‑CAM)

Visualise which parts of the image the network uses to make a prediction.

These methods reveal what a filter is “looking for,” even if we cannot display the filter itself as a picture.
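
As a minimal sketch of the feature-map idea, the code below slides a hand-made vertical-edge filter over a synthetic image whose left half is black and whose right half is white, then applies a ReLU. The image, the filter weights and the helper name `feature_map` are all assumptions made for illustration; a real network would use learned weights.

```python
import numpy as np

def feature_map(image, kernel):
    """Response of one filter at every position (stride 1, no padding), then ReLU."""
    kh, kw, _ = kernel.shape
    h, w, _ = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw, :] * kernel)
    return np.maximum(out, 0.0)

# Synthetic 8 x 8 RGB image: dark on the left, bright from column 4 onwards.
image = np.zeros((8, 8, 3))
image[:, 4:, :] = 1.0

# Hand-made filter: -1 in the left column, 0 in the middle, +1 on the right,
# repeated for every row and every colour channel.
kernel = np.tile(np.array([-1.0, 0.0, 1.0]).reshape(1, 3, 1), (3, 1, 3))

fmap = feature_map(image, kernel)
print(fmap.shape)
print(fmap[0])
```

The map is zero everywhere except in the two columns where the 3-pixel-wide window straddles the dark-to-bright boundary. The map shows where the filter fires without displaying the filter itself.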

Meaningful visualisation of deeper layers requires activation‑based techniques, not raw kernel inspection.
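
Activation maximisation can be sketched on a toy two-layer model. Everything here is an assumption made for illustration: the weights `W1` and `w2` are random rather than learned, the "image" is a single 5 × 5 RGB patch, the gradient is written out by hand instead of using an autodiff library, and the input is kept at unit norm so the activation cannot grow without bound. The key point is that the optimisation runs on the RGB input, so the result can be viewed as a picture even though the deeper unit's own weights cannot.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(16, 5 * 5 * 3))  # 16 hypothetical first-layer units on a 5x5 RGB patch
w2 = rng.normal(size=16)               # one hypothetical deeper unit reading those 16 outputs

def activation(x):
    """Deeper unit's response: w2 . relu(W1 x)."""
    return float(w2 @ np.maximum(W1 @ x, 0.0))

def input_gradient(x):
    """Gradient of the activation with respect to the input patch."""
    pre = W1 @ x
    return W1.T @ (w2 * (pre > 0))

def maximise(steps=300, lr=0.01):
    """Projected gradient ascent on the input; returns the best patch seen."""
    x = rng.normal(size=W1.shape[1])
    x /= np.linalg.norm(x)
    start = activation(x)
    best_x, best_act = x, start
    for _ in range(steps):
        x = x + lr * input_gradient(x)  # step uphill
        x /= np.linalg.norm(x)          # project back to unit norm
        act = activation(x)
        if act > best_act:
            best_x, best_act = x, act
    return best_x.reshape(5, 5, 3), start, best_act

patch, start_act, best_act = maximise()
print(patch.shape, start_act, best_act)
```

In a real network the gradient would be backpropagated through every layer between the chosen neuron and the input, and practical implementations usually add regularisation to keep the generated image natural-looking.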

Additional Resources

“How to Visualize Filters and Feature Maps in Convolutional Neural Networks”

Author: Jason Brownlee
Site: Machine Learning Mastery

Visualizing representations of Outputs/Activations of each CNN layer

CS231n – Convolutional Neural Networks (Stanford)


@Yolanda Muriel Attribution-NonCommercial-NoDerivs 3.0 Unported (CC BY-NC-ND 3.0)

Published by Yolanda MURIEL
