ARTIFICIAL INTELLIGENCE (52) – Computer vision (7) – Understanding Variational Autoencoders: Learning to Generate, Not Just Reconstruct

Variational Autoencoders (VAEs) represent a powerful class of generative models that go beyond traditional neural networks designed purely for reconstruction or classification tasks.

From Autoencoders to Variational Thinking

A standard autoencoder learns to map an input to a compressed representation (latent space) and reconstruct it back. While effective at minimizing reconstruction error, this approach has a major limitation: the latent space is not structured in a way that allows meaningful generation of new samples.

Traditional autoencoders comprise two primary components: an encoder and a decoder

 

The encoder maps input data to a probabilistic distribution in the latent space while the decoder reconstructs data from this latent representation.

© Image. https://spotintelligence.com/2023/12/27/variational-autoencoders-vae/

The key idea introduced in this lab is to replace deterministic encoding with probabilistic encoding. Instead of mapping an input image to a single point, the encoder maps it to a distribution in the latent space. This subtle change transforms the model into a generative one.

Learning Distributions in the Latent Space

In a VAE, the encoder does not output a single vector, but rather the parameters of a probability distribution—specifically, the mean and the variance of a Gaussian distribution. This allows each input to be represented as a region in the latent space rather than a fixed point.

To make training possible, the lab introduces the reparameterization trick, which enables sampling from this distribution while still allowing gradients to flow through the network. This step is essential because it connects probabilistic modeling with standard backpropagation.

The Role of the Decoder

Once a latent sample is obtained, the decoder takes it as input and reconstructs the original data. However, the decoder is not just reproducing inputs; it is learning a mapping from the latent distribution to the data distribution. This is what enables the model to generate new samples by simply sampling from the latent space.

Balancing Reconstruction and Regularization

One of the most important lessons of the lab is that training a VAE involves optimizing a composite loss function with two competing objectives:

  1. Reconstruction loss: Ensures that the output resembles the input.
  2. Regularization term (KL divergence): Encourages the learned latent distributions to be close to a standard normal distribution.

This balance is crucial. Without the regularization term, the model behaves like a traditional autoencoder. Without the reconstruction term, it would fail to learn meaningful mappings between inputs and outputs.

Why Structure Matters in Latent Space

A major takeaway from the lab is that the structure of the latent space determines the generative capabilities of the model. By enforcing a smooth and continuous latent space, VAEs enable:

  • Interpolation between data points
  • Generation of new, realistic samples
  • Exploration of meaningful variations in the data

The interpolation experiment in the lab illustrates this idea beautifully: intermediate points between two latent representations produce gradual transformations in the decoded outputs.

Architecture and Design Choices

The lab also highlights practical aspects of building VAEs:

  • Convolutional layers are used to capture spatial structure in image data
  • Fully connected layers map high-level features to distribution parameters
  • The encoder and decoder must be carefully designed to ensure compatibility between representations

These design choices demonstrate how modern neural architectures can be adapted for probabilistic modeling.

Beyond Reconstruction: Generative Power

Ultimately, the message of the lab is that VAEs are not just about compressing and reconstructing data—they are about learning a representation of the data distribution itself. This allows the model to:

  • Generate entirely new samples
  • Understand variations within the data
  • Provide a continuous and interpretable latent space

Conclusion

By learning distributions instead of fixed representations, VAEs unlock powerful generative capabilities and provide a foundation for more advanced models in deep generative modeling.

The key insight is simple but profound:

To generate meaningful data, a model must first learn a meaningful representation of uncertainty.

 

Bonus

 

Licencia Creative Commons@Yolanda Muriel Attribution-NonCommercial-NoDerivs 3.0 Unported (CC BY-NC-ND 3.0)

 

Deja un comentario