Understanding Noise Scheduling and Noise Shapes in Diffusion Models

Diffusion models, particularly Denoising Diffusion Probabilistic Models (DDPMs), have become a powerful framework for generative modeling. Two key components that strongly influence their behavior are the noise schedule and the shape of the noise applied during the forward diffusion process. This article explains both.

The Role of the Noise Scheduler

In the forward diffusion process, noise is gradually added to an image over multiple timesteps. The noise schedule (or beta scheduler) determines how much noise is added at each step.

Two common schedulers are:

  • Linear scheduler
  • Cosine scheduler

Linear Scheduler

The linear scheduler increases the per-step noise variance (beta) linearly from a small starting value to a larger final value. In practice this means:

  • Noise grows steadily from step to step.
  • The original image structure is lost relatively quickly.
  • Later timesteps contain mostly random noise with very little meaningful signal.
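As a rough illustration, a linear schedule can be sketched in plain Python. The endpoints 1e-4 and 0.02 and the 1000 steps are values commonly used with DDPMs, not something this article specifies, and the function names are hypothetical. The helper cumulative_signal tracks the running product of (1 - beta), which indicates how much of the original image is still present at each step.

```python
def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances that grow linearly from beta_start to beta_end.

    The default endpoints are assumed (commonly used DDPM values).
    """
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_signal(betas):
    """Running product of (1 - beta_t), often written alpha_bar_t.

    A value near 1 means the image is almost intact; a value near 0
    means the sample is almost pure noise.
    """
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


betas = linear_beta_schedule(1000)
alpha_bars = cumulative_signal(betas)
print(f"first beta: {betas[0]:.6f}, last beta: {betas[-1]:.6f}")
print(f"signal left at the final step: {alpha_bars[-1]:.2e}")
```

With these assumed settings the running product falls well below 0.001 by the final step, so the last timesteps are almost entirely noise.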

Cosine Scheduler

The cosine scheduler follows a nonlinear schedule based on a cosine function. Its key properties are:

  • Noise is added more gradually in the early and middle timesteps.
  • The original image information is preserved for longer.
  • More timesteps retain useful structure.
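Here is a minimal sketch of one common cosine schedule, assuming the formulation in which the remaining signal (alpha_bar) is defined directly by a squared cosine and the per-step betas are derived from it. The offset s = 0.008 and the clipping value 0.999 are commonly used defaults, not values taken from this article, and the function names are hypothetical.

```python
import math


def cosine_alpha_bars(num_steps, s=0.008):
    """Remaining signal alpha_bar_t for t = 0..num_steps.

    alpha_bar_t = f(t) / f(0), where
    f(t) = cos(((t / num_steps + s) / (1 + s)) * pi / 2) ** 2.
    The small offset s (assumed default) keeps the first betas from
    being vanishingly small.
    """
    def f(t):
        return math.cos(((t / num_steps + s) / (1 + s)) * math.pi / 2) ** 2

    return [f(t) / f(0) for t in range(num_steps + 1)]


def betas_from_alpha_bars(alpha_bars, max_beta=0.999):
    """Recover per-step betas: beta_t = 1 - alpha_bar_t / alpha_bar_{t-1}.

    Values are clipped at max_beta (assumed default) because the ratio
    approaches 1 near the final step.
    """
    return [
        min(1.0 - alpha_bars[t] / alpha_bars[t - 1], max_beta)
        for t in range(1, len(alpha_bars))
    ]


alpha_bars = cosine_alpha_bars(1000)
betas = betas_from_alpha_bars(alpha_bars)
print(f"signal left at the midpoint: {alpha_bars[500]:.3f}")
```

Under these assumptions roughly half of the signal is still present at the midpoint of the schedule, which matches the idea that image information is preserved for longer.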

Why the Cosine Scheduler Is Preferred

The cosine scheduler is generally preferred because:

  • It retains information from the original image over more timesteps.
  • This gives the model more meaningful training examples.
  • It avoids spending training steps on inputs that are almost pure noise, which carry little information.

If too many timesteps contain almost no signal (as can happen with the linear schedule), the model spends those steps on inputs that are nearly indistinguishable from pure noise. Such steps contribute little to learning.

In contrast, the cosine scheduler:

  • Keeps some of the original image's structure present until later in the process.
  • Leads to better training efficiency and improved results.
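One way to make this comparison concrete is to count how many timesteps are left with almost no signal under each schedule. The sketch below is only illustrative: the 1000 steps, the schedule parameters, and especially the 0.01 cutoff for "almost pure noise" are assumptions chosen for the example, and all names are hypothetical.

```python
import math

T = 1000          # number of diffusion steps (assumed)
THRESHOLD = 0.01  # hypothetical cutoff: below this, a step is "almost pure noise"


def linear_alpha_bars(num_steps, beta_start=1e-4, beta_end=0.02):
    """Remaining signal per step for a linear beta schedule (assumed endpoints)."""
    alpha_bars, running = [], 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / (num_steps - 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def cosine_alpha_bars(num_steps, s=0.008):
    """Remaining signal per step for a cosine schedule (assumed offset s)."""
    def f(t):
        return math.cos(((t / num_steps + s) / (1 + s)) * math.pi / 2) ** 2

    return [f(t) / f(0) for t in range(1, num_steps + 1)]


def count_low_signal(alpha_bars, threshold=THRESHOLD):
    """Number of steps whose remaining signal is below the threshold."""
    return sum(1 for a in alpha_bars if a < threshold)


linear_low = count_low_signal(linear_alpha_bars(T))
cosine_low = count_low_signal(cosine_alpha_bars(T))
print(f"linear: {linear_low} of {T} steps below {THRESHOLD}")
print(f"cosine: {cosine_low} of {T} steps below {THRESHOLD}")
```

With these settings the linear schedule leaves several times more steps below the cutoff than the cosine schedule. The exact counts depend on the chosen threshold and parameters.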

In short: the cosine scheduler is preferred because it retains information from the original image for more timesteps.

Noise in the Forward Diffusion Process

Another important aspect is the noise that is added to images during diffusion.

At each timestep, random Gaussian noise is generated and added to the image, progressively corrupting it.
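For readers who prefer a formula, the step described above is usually written as follows in the standard DDPM formulation. The symbols are not defined in this article and are introduced here as assumptions: x_0 is the original image, x_t the noised image at step t, beta_t the per-step noise variance set by the scheduler, and epsilon the Gaussian noise.

```latex
% Standard DDPM forward process (notation assumed, not taken from the article)
\begin{aligned}
q(x_t \mid x_{t-1}) &= \mathcal{N}\!\left(x_t;\ \sqrt{1-\beta_t}\,x_{t-1},\ \beta_t I\right) \\
\bar{\alpha}_t &= \prod_{s=1}^{t} (1-\beta_s) \\
x_t &= \sqrt{\bar{\alpha}_t}\,x_0 + \sqrt{1-\bar{\alpha}_t}\,\epsilon,
\qquad \epsilon \sim \mathcal{N}(0, I)
\end{aligned}
```

The last line shows that the image is scaled down as the noise is scaled up, and that any timestep can be sampled directly from the original image without iterating through the earlier steps.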

What is the shape of this noise? The noise has the same shape as the input image.

Why? Noise is applied element-wise, so every value in the image tensor (each pixel in each channel) gets its own noise sample. Therefore:

  • If the image has shape:
    (batch_size, channels, height, width)

  • Then the noise must also have shape:
    (batch_size, channels, height, width)

In PyTorch, this is typically implemented as:

import torch
noise = torch.randn_like(x)  # Gaussian noise with the same shape as x

Where:

  • x is the input image tensor
  • randn_like draws standard Gaussian noise with the same shape (and the same dtype and device) as x
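To see why matching shapes matter in practice, here is a small sketch of a noising step in PyTorch. The function name q_sample, the linear schedule values, and the 4 x 3 x 32 x 32 stand-in batch are all assumptions made for illustration. The noise matches the image shape exactly, while the per-sample schedule coefficients are reshaped to (batch_size, 1, 1, 1) so that they broadcast across channels, height, and width.

```python
import torch


def q_sample(x0, t, alpha_bars):
    """Noise a batch x0 to the timesteps in t (hypothetical helper).

    x0:         image batch of shape (B, C, H, W)
    t:          integer timesteps of shape (B,), one per image
    alpha_bars: remaining-signal schedule of shape (T,)
    """
    noise = torch.randn_like(x0)              # same shape as x0: (B, C, H, W)
    a_bar = alpha_bars[t].view(-1, 1, 1, 1)   # (B,) -> (B, 1, 1, 1) for broadcasting
    x_t = a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * noise
    return x_t, noise


# Assumed linear schedule with commonly used DDPM values.
betas = torch.linspace(1e-4, 0.02, 1000)
alpha_bars = torch.cumprod(1.0 - betas, dim=0)

x0 = torch.rand(4, 3, 32, 32) * 2 - 1   # stand-in image batch scaled to [-1, 1]
t = torch.randint(0, 1000, (4,))        # a random timestep for each image
x_t, noise = q_sample(x0, t, alpha_bars)
print(x_t.shape, noise.shape)
```

Returning the noise alongside the noised image is a common design choice, since the noise is what the model is usually trained to predict.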

Key Takeaways

Cosine scheduler > Linear scheduler

  • Better information retention.
  • More effective learning.

Noise shape = Image shape

  • Required for pixel-wise corruption.
  • Ensures correct diffusion behavior.

Final Insight

The success of diffusion models depends heavily on how noise is introduced. A well-designed scheduler like the cosine schedule and correctly structured noise tensors ensure that the model learns meaningful transformations rather than trivial mappings. These design choices directly impact the efficiency and quality of generated outputs.

Visualization of how we suppress noise

© Image. https://edge-preserving-diffusion.mpi-inf.mpg.de/ 

 

Bonus.

Write down your ideas.

Creative Commons License @Yolanda Muriel Attribution-NonCommercial-NoDerivs 3.0 Unported (CC BY-NC-ND 3.0)
