Non-Learning-Based Methods

These are the same techniques we have seen for other algorithms; the only difference is that we now apply them to neural networks.

(Unsupervised) Learning-Based Models

Type 1: Convert Original data to required data

1.1. AutoEncoder (AE)

Below is one of the most famous autoencoder architectures, called U-Net.

U-Net

If you create a very deep network, you can improve its performance by using the tricks discussed earlier here.
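To make the structure concrete, here is a minimal PyTorch sketch of a U-Net-style autoencoder. It is not the full U-Net from the figure; the channel sizes and depth are illustrative assumptions, but it shows the three essential pieces: an encoder, a compressed bottleneck, and a decoder with a skip connection.

```python
import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    """Minimal U-Net-style autoencoder: encode, compress, decode,
    with one skip connection from encoder to decoder.
    Channel sizes are illustrative, not from the article."""
    def __init__(self, in_ch=3):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU())
        self.down = nn.Conv2d(32, 64, 3, stride=2, padding=1)          # downsample
        self.up = nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1)   # upsample
        self.dec = nn.Conv2d(64, in_ch, 3, padding=1)  # sees upsampled + skip channels

    def forward(self, x):
        s = self.enc(x)                            # features kept for the skip
        z = torch.relu(self.down(s))               # bottleneck (compressed code)
        u = torch.relu(self.up(z))
        return self.dec(torch.cat([u, s], dim=1))  # skip connection via concat

# autoencoder training signal: reconstruct the input (MSE loss)
model = TinyUNet()
x = torch.randn(8, 3, 32, 32)                      # dummy image batch
loss = nn.functional.mse_loss(model(x), x)
loss.backward()
```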

1.2. Variational AutoEncoder (VAE)

If you create a very deep network, you can improve its performance by using the tricks discussed earlier here.
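A minimal sketch of the VAE idea, assuming toy fully connected networks (all dimensions below are illustrative): the encoder predicts a mean and log-variance, a latent is sampled with the reparameterisation trick, and training minimises reconstruction error plus a KL term pulling the latents towards a unit Gaussian.

```python
import torch
import torch.nn as nn

class TinyVAE(nn.Module):
    """Minimal VAE: encoder outputs [mu, log_var], we sample a latent
    with the reparameterisation trick, and the decoder reconstructs."""
    def __init__(self, x_dim=784, z_dim=16):
        super().__init__()
        self.enc = nn.Linear(x_dim, 2 * z_dim)   # -> [mu, log_var]
        self.dec = nn.Linear(z_dim, x_dim)

    def forward(self, x):
        mu, log_var = self.enc(x).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * log_var)  # reparameterise
        return self.dec(z), mu, log_var

model = TinyVAE()
x = torch.rand(8, 784)
recon, mu, log_var = model(x)
# ELBO = reconstruction term + KL divergence to the unit Gaussian prior
recon_loss = nn.functional.mse_loss(recon, x, reduction="sum")
kl = -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp())
(recon_loss + kl).backward()
```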

1.3. Vector-Quantized Variational AutoEncoder (VQ-VAE)

If you create a very deep network, you can improve its performance by using the tricks discussed earlier here.
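The piece that distinguishes a VQ-VAE is the quantisation layer, sketched below (the codebook size, dimensions, and the 0.25 commitment weight are illustrative assumptions): every encoder output is snapped to its nearest codebook vector, and a straight-through estimator lets gradients flow through the non-differentiable nearest-neighbour lookup.

```python
import torch
import torch.nn as nn

class VectorQuantizer(nn.Module):
    """Sketch of the VQ layer at the heart of a VQ-VAE: each encoder
    output vector is replaced by its nearest codebook entry."""
    def __init__(self, num_codes=512, dim=64, beta=0.25):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, dim)
        self.beta = beta

    def forward(self, z_e):                    # z_e: (batch, dim) encoder outputs
        d = torch.cdist(z_e, self.codebook.weight)   # distances to every code
        idx = d.argmin(dim=-1)                       # nearest code index
        z_q = self.codebook(idx)
        # codebook + commitment losses from the VQ-VAE paper
        vq_loss = ((z_q - z_e.detach()) ** 2).mean() \
                + self.beta * ((z_e - z_q.detach()) ** 2).mean()
        # straight-through: forward uses z_q, backward copies grads to z_e
        z_q = z_e + (z_q - z_e).detach()
        return z_q, vq_loss

vq = VectorQuantizer()
z_e = torch.randn(8, 64, requires_grad=True)   # stand-in for encoder outputs
z_q, vq_loss = vq(z_e)
vq_loss.backward()   # updates the codebook and (via the beta term) the encoder
```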

1.4. Restricted Boltzmann Machines (RBMs)

I will not go deep into this model because it is outdated and no longer used; autoencoders and all the models discussed above perform way better than RBMs! You can think of an RBM as an undirected version of an autoencoder, as shown below.

Now, just as we can add multiple hidden layers in autoencoders, we can also stack multiple hidden layers in RBMs, which gives a Deep Boltzmann Machine (DBM), as shown below.

If you add directions to the above DBM architecture, you arrive at the famous Deep Belief Network architecture!

To learn more about RBMs you can read this article or this one too.
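For completeness, here is a tiny sketch of how an RBM is trained with one step of contrastive divergence (CD-1). The binary (Bernoulli) units and all sizes are illustrative assumptions; note how the same weight matrix is used in both directions, which is the "undirected" part.

```python
import torch

# Minimal RBM with one step of contrastive divergence (CD-1).
n_vis, n_hid = 784, 128
W = torch.randn(n_vis, n_hid) * 0.01
b_v, b_h = torch.zeros(n_vis), torch.zeros(n_hid)

def sample_h(v):                          # P(h = 1 | v)
    p = torch.sigmoid(v @ W + b_h)
    return p, torch.bernoulli(p)

def sample_v(h):                          # P(v = 1 | h), same weights transposed
    p = torch.sigmoid(h @ W.T + b_v)
    return p, torch.bernoulli(p)

v0 = torch.bernoulli(torch.rand(8, n_vis))   # dummy binary batch
ph0, h0 = sample_h(v0)
pv1, v1 = sample_v(h0)                       # one Gibbs step: v -> h -> v
ph1, _ = sample_h(v1)

lr = 0.1
# CD-1 update: push up the data statistics, push down the model statistics
W += lr * (v0.T @ ph0 - v1.T @ ph1) / v0.shape[0]
```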

Type 2: Convert Noise to required data via Single-Stage Decoding

2.1. Generative Adversarial Networks (GANs)

If you create a very deep network, you can improve its performance by using the tricks discussed earlier here.
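A minimal sketch of one GAN training step, assuming toy fully connected networks (all architectures, dimensions, and hyperparameters below are illustrative): the discriminator learns to separate real samples from generated ones, and the generator learns to fool it.

```python
import torch
import torch.nn as nn

z_dim, x_dim = 64, 784
G = nn.Sequential(nn.Linear(z_dim, 256), nn.ReLU(), nn.Linear(256, x_dim), nn.Tanh())
D = nn.Sequential(nn.Linear(x_dim, 256), nn.LeakyReLU(0.2), nn.Linear(256, 1))
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

real = torch.rand(8, x_dim) * 2 - 1           # dummy "real" batch in [-1, 1]
ones, zeros = torch.ones(8, 1), torch.zeros(8, 1)

# discriminator step: classify real -> 1, fake -> 0
fake = G(torch.randn(8, z_dim))
d_loss = bce(D(real), ones) + bce(D(fake.detach()), zeros)
opt_d.zero_grad()
d_loss.backward()
opt_d.step()

# generator step: fool D into predicting 1 on fakes
g_loss = bce(D(fake), ones)
opt_g.zero_grad()
g_loss.backward()       # only opt_g steps, so only G is updated here
opt_g.step()
```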

Type 3: Convert Original data to required data (Improved)

3.1. AE-GAN

Our aim here is to integrate the AE model and the GAN model, so we will use an autoencoder as the generator while the discriminator remains as it is.

Instead of using a vanilla autoencoder architecture as shown above, you can also use fancier autoencoder architectures like U-Net. If you create a very deep network, you can improve its performance by using the tricks discussed earlier here.
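A rough sketch of the AE-GAN generator objective with toy networks (dimensions and the 0.1 adversarial weight are illustrative assumptions): the autoencoder is trained to reconstruct the input and to fool the discriminator at the same time.

```python
import torch
import torch.nn as nn

x_dim = 784
ae = nn.Sequential(                      # generator = autoencoder
    nn.Linear(x_dim, 64), nn.ReLU(),     # encoder
    nn.Linear(64, x_dim), nn.Sigmoid(),  # decoder
)
D = nn.Sequential(nn.Linear(x_dim, 128), nn.LeakyReLU(0.2), nn.Linear(128, 1))
bce = nn.BCEWithLogitsLoss()

x = torch.rand(8, x_dim)
x_hat = ae(x)
# generator objective: reconstruct the input AND fool the discriminator
recon_loss = nn.functional.mse_loss(x_hat, x)
adv_loss = bce(D(x_hat), torch.ones(8, 1))
g_loss = recon_loss + 0.1 * adv_loss     # 0.1 = assumed weighting
g_loss.backward()                        # in a full loop, only the generator's
                                         # optimiser would step on this loss
```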

3.2. VAE-GAN

Same as above, except that here the generator will be a VAE model.

If you create a very deep network, you can improve its performance by using the tricks discussed earlier here.
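The same idea as a sketch, now with a VAE generator (again, all networks and weightings below are illustrative assumptions): the generator's loss becomes the VAE objective (reconstruction + KL) plus the adversarial term.

```python
import torch
import torch.nn as nn

x_dim, z_dim = 784, 16
enc = nn.Linear(x_dim, 2 * z_dim)        # -> [mu, log_var]
dec = nn.Sequential(nn.Linear(z_dim, x_dim), nn.Sigmoid())
D = nn.Sequential(nn.Linear(x_dim, 128), nn.LeakyReLU(0.2), nn.Linear(128, 1))
bce = nn.BCEWithLogitsLoss()

x = torch.rand(8, x_dim)
mu, log_var = enc(x).chunk(2, dim=-1)
z = mu + torch.randn_like(mu) * torch.exp(0.5 * log_var)   # reparameterise
recon = dec(z)
kl = -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp())
# VAE-GAN generator loss: ELBO terms + adversarial term (0.1 = assumed weight)
g_loss = nn.functional.mse_loss(recon, x, reduction="sum") + kl \
       + 0.1 * bce(D(recon), torch.ones(8, 1))
g_loss.backward()
```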

3.3. VQ-VAE-GAN / VQ-GAN

Same as above, except that here the generator will be a VQ-VAE model.

If you create a very deep network, you can improve its performance by using the tricks discussed earlier here.

Type 4: Convert Noise to required data via Multi-Stage Decoding

4.1. Flow-Based Models

This class of models is outdated and no longer used, so I don't have much knowledge about it!

4.2. Denoising Diffusion Probabilistic Model (DDPM)

If you create a very deep network, you can improve its performance by using the tricks discussed earlier here.

In addition to these tricks, there is one more called Latent DDPM. Instead of running diffusion in the original data space, it is run in a compressed latent space: working with latents gives comparable results while making training and sampling far cheaper, because the diffusion network operates on much smaller tensors, as shown in the diagram below.

Here we have used an AE to map into the latent space, but we could also use a VAE or a VQ-VAE instead.

Hence, in Latent DDPM the forward diffusion process is applied to the compressed latents: the noise is added to those latents, not to the original data, and so the noise predictor is trained to predict noise in the latent space, not the original space!
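Putting that together, here is a sketch of one Latent-DDPM training step. The encoder stands in for a pre-trained autoencoder, the noise predictor is a toy MLP, and the linear beta schedule and all dimensions are illustrative assumptions; the point is that both the closed-form forward noising and the noise-prediction loss operate on latents.

```python
import torch
import torch.nn as nn

x_dim, z_dim, T = 784, 64, 1000
encoder = nn.Linear(x_dim, z_dim)        # stand-in for a trained AE encoder
noise_pred = nn.Sequential(nn.Linear(z_dim + 1, 256), nn.ReLU(),
                           nn.Linear(256, z_dim))

betas = torch.linspace(1e-4, 0.02, T)            # assumed linear schedule
alpha_bar = torch.cumprod(1.0 - betas, dim=0)    # cumulative product of (1 - beta)

x = torch.rand(8, x_dim)
with torch.no_grad():
    z0 = encoder(x)                              # diffuse the latents, not x

t = torch.randint(0, T, (8,))                    # random timestep per sample
eps = torch.randn_like(z0)
ab = alpha_bar[t].unsqueeze(-1)
zt = ab.sqrt() * z0 + (1 - ab).sqrt() * eps      # closed-form q(z_t | z_0)

# condition the predictor on (normalised) t and regress the true noise
t_in = t.float().unsqueeze(-1) / T
loss = nn.functional.mse_loss(noise_pred(torch.cat([zt, t_in], dim=-1)), eps)
loss.backward()
```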

4.3. Denoising Diffusion Implicit Model (DDIM)

If you create a very deep network, you can improve its performance by using the tricks discussed earlier here.

In addition to these tricks, there is one more called Latent DDIM. Just like Latent DDPM, diffusion is run in the latent space rather than the original data space: the results are comparable while the process becomes far cheaper.
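For reference, here is a sketch of the deterministic DDIM update (the eta = 0 case) that makes step-skipping possible; the schedule and shapes are illustrative, and eps_hat stands in for the trained noise predictor's output.

```python
import torch

def ddim_step(zt, eps_hat, alpha_bar, t, s):
    """One deterministic DDIM update from step t to an earlier step s:
    estimate z_0 from the noise prediction, then jump straight to step s
    without adding fresh noise."""
    ab_t, ab_s = alpha_bar[t], alpha_bar[s]
    z0_hat = (zt - (1 - ab_t).sqrt() * eps_hat) / ab_t.sqrt()   # predict z_0
    return ab_s.sqrt() * z0_hat + (1 - ab_s).sqrt() * eps_hat   # move to step s

betas = torch.linspace(1e-4, 0.02, 1000)
alpha_bar = torch.cumprod(1.0 - betas, dim=0)

zt = torch.randn(8, 64)          # start from pure noise at t = 999
eps_hat = torch.randn(8, 64)     # stand-in for the noise predictor's output
z_prev = ddim_step(zt, eps_hat, alpha_bar, t=999, s=749)   # skip 250 steps at once
```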
