Image Data Augmentation Techniques

Applied GenAI

Sarvesh Khetan
7 min read · Jun 27, 2024

1. Generating Image Data Irrespective of Class

1.a. Non-Learning-Based Methods :

1.b. (Unsupervised) Learning Based Method :

We discussed all these methods earlier for regression with tabular cross-sectional data; now we will simply apply them to image data.

1.b.1. AutoEncoders (AE) :

  • AE with FFNN
  • AE with CNN (works better for image data)

1.b.2. Variational AutoEncoders (VAE) :

  • VAE with FFNN
  • VAE with CNN (works better for images)

1.b.3. Vector-Quantized Variational Autoencoders (VQ-VAE) :

  • VQ-VAE with FFNN
  • VQ-VAE with CNN (works better for images)

1.b.4. Generative Adversarial Networks (GANs) :

  • GAN with FFNN
  • GAN with CNN (works better for images)

1.b.5. AE-GAN :

  • AE-GAN with FFNN
  • AE-GAN with CNN (works better for images)

1.b.6. VAE-GAN :

  • VAE-GAN with FFNN
  • VAE-GAN with CNN (works better for images)

1.b.7. VQ-GAN :

  • VQ-GAN with FFNN
  • VQ-GAN with CNN (works better for images)

1.b.8. Denoising Diffusion Probabilistic Models (DDPMs) :

  • DDPM with FFNN
  • DDPM with CNN (works better for images)
  • Latent DDPM with FFNN
  • Latent DDPM with CNN (works better for images)

1.b.9. Denoising Diffusion Implicit Models (DDIMs) :

  • DDIM with FFNN
  • DDIM with CNN (works better for images)
  • Latent DDIM with FFNN
  • Latent DDIM with CNN (works better for images)

Note : In all the above networks, instead of training an entirely new encoder from scratch, you can replace the encoder with the image embedding from a good foundation model like ResNet / EfficientNet, add the decoder on top, and then just finetune this encoder.

2. Generating Class Specific Image Data

We discussed all these methods earlier for classification with tabular cross-sectional data; now we will simply apply them to image data.

Non-Learning-Based Methods

1. Flips : Horizontal / Vertical

from torchvision import transforms

transform = transforms.Compose([
    transforms.Resize(size=(64, 64)),        # resize the images to 64x64
    transforms.RandomHorizontalFlip(p=0.5),  # flip horizontally with probability 0.5, i.e. 50% of the time
    transforms.ToTensor()                    # convert to a torch.Tensor, scaling pixel values from [0, 255] to [0.0, 1.0]
])

2. Random Crops

Randomly sample a section of the original image, then resize this section to the original image size.

3. Change Color

a. Apply PCA to all [R, G, B] pixel values in the training set
b. Then sample a ‘colour offset’ along the principal component directions, scaled by the corresponding eigenvalues
c. Add this offset to all the pixels of the training images
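Steps a-c above can be sketched in numpy as follows (this is the AlexNet-style "fancy PCA" colour augmentation; the noise scale `alpha_std` is an illustrative hyperparameter):

```python
import numpy as np

def fancy_pca(images, alpha_std=0.1, rng=None):
    """PCA colour augmentation sketch.

    images: float array of shape (N, H, W, 3) with values in [0, 1].
    """
    rng = np.random.default_rng() if rng is None else rng
    pixels = images.reshape(-1, 3)                  # (a) all RGB pixels
    pixels = pixels - pixels.mean(axis=0)
    cov = np.cov(pixels, rowvar=False)              # 3x3 colour covariance
    eigvals, eigvecs = np.linalg.eigh(cov)          # principal components
    alphas = rng.normal(0.0, alpha_std, size=3)     # (b) sample the offsets
    offset = eigvecs @ (alphas * eigvals)           # one shift per colour channel
    return np.clip(images + offset, 0.0, 1.0)       # (c) add to every pixel
```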

4. Translation

Translation just involves moving the image along the X or Y direction (or both). In the following example, we assume that the image has a black background beyond its boundary, so the translated image is padded with black. This augmentation is very useful because most objects can appear almost anywhere in the image, which forces your convolutional neural network to look everywhere.

5. Rotation

6. Gaussian Noise

7. …….. you can find many more types of transformations in the official PyTorch documentation!

AutoEncoders (AE)

AutoEncoders (AE) with FFNN

Implementation of the above architecture can be found below…
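A minimal sketch of a fully connected (FFNN) autoencoder, assuming 28x28 grayscale inputs; the layer widths and latent size are illustrative, not the article's exact architecture:

```python
import torch
import torch.nn as nn

class FFNNAutoencoder(nn.Module):
    def __init__(self, latent_dim=32):
        super().__init__()
        # Encoder: flatten the image and compress it down to the latent space.
        self.encoder = nn.Sequential(
            nn.Flatten(),
            nn.Linear(28 * 28, 256), nn.ReLU(),
            nn.Linear(256, latent_dim),
        )
        # Decoder: expand the latent back up to the original image shape.
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(),
            nn.Linear(256, 28 * 28), nn.Sigmoid(),
            nn.Unflatten(1, (1, 28, 28)),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

# Training minimizes a plain reconstruction loss:
model = FFNNAutoencoder()
x = torch.rand(8, 1, 28, 28)
loss = nn.functional.mse_loss(model(x), x)
```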

Instead, you can also use a U-Net style autoencoder architecture here, and if you want to train a large network you can use the techniques discussed earlier here.

AutoEncoders (AE) with CNN

Now, we know that in an autoencoder we first reduce the dimensions from the original space to the latent space and then increase the dimensions from the latent space back to the original space. But convolution, by nature, can only reduce (or preserve) spatial dimensions, so how do we increase the dimensions of the images from the latent space?? Researchers came up with the following techniques….

Method 1 : Non-Learning-Based Upsampling Techniques (e.g. nearest-neighbour or bilinear interpolation)

Method 2 : Learning-Based Upsampling Techniques (works better), i.e. Transposed Convolution

Hence now the CNN based Autoencoder architecture looks something like this …..

Implementation of the above architecture can be found below…
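Here is a minimal sketch of such a CNN autoencoder: strided convolutions downsample to the latent space, and transposed convolutions (Method 2 above) upsample back. Channel counts and depth are illustrative assumptions:

```python
import torch
import torch.nn as nn

class ConvAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        # Encoder: strided convolutions halve the spatial size at each step.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),   # 64 -> 32
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),  # 32 -> 16
        )
        # Decoder: transposed convolutions double the spatial size back.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),    # 16 -> 32
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid(),  # 32 -> 64
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))
```

A kernel size of 4 with stride 2 and padding 1 exactly doubles (or halves) the spatial resolution, which is why this combination appears so often in convolutional decoders.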

Instead, you can also use a U-Net style autoencoder architecture here, and if you want to train a large network you can use the techniques discussed earlier here. Below is the implementation of a U-Net style autoencoder architecture with residual connections…

Variational AutoEncoders (VAE)

Variational AutoEncoders (VAE) with FFNN

Implementation of the above architecture can be found below…
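A minimal FFNN VAE sketch, again assuming 28x28 grayscale inputs with illustrative layer sizes: the encoder outputs a mean and log-variance, a latent is sampled with the reparameterization trick, and the loss adds a KL-divergence term to the reconstruction loss:

```python
import torch
import torch.nn as nn

class VAE(nn.Module):
    def __init__(self, latent_dim=16):
        super().__init__()
        self.enc = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 256), nn.ReLU())
        self.mu = nn.Linear(256, latent_dim)       # mean of q(z|x)
        self.logvar = nn.Linear(256, latent_dim)   # log-variance of q(z|x)
        self.dec = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(),
            nn.Linear(256, 28 * 28), nn.Sigmoid(),
            nn.Unflatten(1, (1, 28, 28)),
        )

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        # Reparameterization trick: z = mu + sigma * eps keeps sampling differentiable.
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        return self.dec(z), mu, logvar

def vae_loss(recon, x, mu, logvar):
    rec = nn.functional.mse_loss(recon, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return rec + kl
```

To generate new images after training, sample z from a standard normal and run only the decoder.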

If you want to train a large network then you can use the techniques discussed earlier here.

Variational AutoEncoders (VAE) with CNN

Implementation of the above architecture can be found below…

If you want to train a large network then you can use the techniques discussed earlier here.

Vector Quantized — Variational AutoEncoders (VQ-VAE)

VQ-VAE with CNN

Implementation of the above architecture can be found below…
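The distinctive piece of a VQ-VAE is the vector-quantization step, sketched below: each encoder output vector is snapped to its nearest codebook entry, and the straight-through estimator lets gradients flow through the non-differentiable lookup. Codebook size and dimension are illustrative:

```python
import torch
import torch.nn as nn

class VectorQuantizer(nn.Module):
    def __init__(self, num_codes=512, dim=64):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, dim)  # learnable discrete codes

    def forward(self, z):                                     # z: (B, dim, H, W)
        B, C, H, W = z.shape
        flat = z.permute(0, 2, 3, 1).reshape(-1, C)           # (B*H*W, dim)
        dists = torch.cdist(flat, self.codebook.weight)       # distance to every code
        idx = dists.argmin(dim=1)                             # nearest-code indices
        q = self.codebook(idx).view(B, H, W, C).permute(0, 3, 1, 2)
        # Straight-through estimator: forward pass uses q, backward pass
        # copies gradients to z as if quantization were the identity.
        q = z + (q - z).detach()
        return q, idx
```

In the full model this sits between the CNN encoder and decoder, and the training loss adds codebook and commitment terms on top of the reconstruction loss.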

If you want to train a large network then you can use the techniques discussed earlier here.

Generative Adversarial Networks (GANs)

GAN with FFNN

Implementation of the above architecture can be found below…
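A minimal FFNN GAN sketch with illustrative sizes: the generator maps noise to a flat image, the discriminator scores real vs. fake, and one training step of each loss is shown:

```python
import torch
import torch.nn as nn

# Generator: noise vector -> flat 28x28 image in [-1, 1].
G = nn.Sequential(nn.Linear(64, 256), nn.ReLU(), nn.Linear(256, 28 * 28), nn.Tanh())
# Discriminator: flat image -> one real/fake logit.
D = nn.Sequential(nn.Linear(28 * 28, 256), nn.LeakyReLU(0.2), nn.Linear(256, 1))
bce = nn.BCEWithLogitsLoss()

real = torch.rand(16, 28 * 28)
noise = torch.randn(16, 64)
fake = G(noise)

# Discriminator step: push real toward 1 and fake toward 0
# (fake is detached so this step does not update the generator).
d_loss = bce(D(real), torch.ones(16, 1)) + bce(D(fake.detach()), torch.zeros(16, 1))
# Generator step: fool the discriminator into predicting 1 on fakes.
g_loss = bce(D(fake), torch.ones(16, 1))
```

In a real training loop these two losses are backpropagated alternately, each with its own optimizer.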

If you want to train a large network then you can use the techniques discussed earlier here.

GAN with CNN (also called Deep-Conv-GAN / DC-GAN)

Implementation of the above architecture can be found below…
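A DC-GAN-style generator sketch: transposed convolutions with batch normalization progressively upsample a noise vector into a 3x64x64 image. The channel sizes follow the usual DC-GAN pattern but are illustrative:

```python
import torch
import torch.nn as nn

generator = nn.Sequential(
    # Project the 100-dim noise (as a 1x1 "image") up to a 4x4 feature map.
    nn.ConvTranspose2d(100, 256, 4, stride=1, padding=0), nn.BatchNorm2d(256), nn.ReLU(),  # 1 -> 4
    nn.ConvTranspose2d(256, 128, 4, stride=2, padding=1), nn.BatchNorm2d(128), nn.ReLU(),  # 4 -> 8
    nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.BatchNorm2d(64), nn.ReLU(),    # 8 -> 16
    nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.BatchNorm2d(32), nn.ReLU(),     # 16 -> 32
    nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Tanh(),                          # 32 -> 64
)

z = torch.randn(2, 100, 1, 1)  # noise reshaped to a 1x1 spatial tensor
img = generator(z)             # (2, 3, 64, 64), values in [-1, 1]
```

The discriminator mirrors this with strided `Conv2d` layers and `LeakyReLU`, and the training losses are the same as in the FFNN GAN.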

If you want to train a large network then you can use the techniques discussed earlier here.

VQ-GAN

VQ-GAN with CNN

Implementation of the above architecture can be found below…
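Conceptually, a VQ-GAN adds a patch discriminator and an adversarial loss on top of the VQ-VAE's reconstruction loss so the decoder produces sharper images. The sketch below shows only that extra piece; the discriminator architecture and the 0.1 loss weight are illustrative assumptions, and a real VQ-GAN also uses a perceptual loss:

```python
import torch
import torch.nn as nn

# Patch discriminator: outputs one real/fake logit per spatial patch
# rather than a single logit for the whole image.
patch_disc = nn.Sequential(
    nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),   # 64 -> 32
    nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2), # 32 -> 16
    nn.Conv2d(128, 1, 3, padding=1),                               # logit map
)

real = torch.rand(2, 3, 64, 64)
recon = torch.rand(2, 3, 64, 64)  # stand-in for a VQ-VAE reconstruction
bce = nn.BCEWithLogitsLoss()

rec_loss = nn.functional.mse_loss(recon, real)
logits = patch_disc(recon)
adv_loss = bce(logits, torch.ones_like(logits))  # decoder tries to look "real"
total = rec_loss + 0.1 * adv_loss                # illustrative adversarial weight
```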

If you want to train a large network then you can use the techniques discussed earlier here.

Denoising Diffusion Probabilistic Models (DDPMs)

DDPM with CNN


Implementation of the above architecture can be found below…
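The forward (noising) process and training target of a DDPM can be sketched in closed form. The linear beta schedule below matches the common DDPM setup; the noise-predicting U-Net itself is omitted and only indicated in the final comment:

```python
import torch

T = 1000                                        # number of diffusion steps
betas = torch.linspace(1e-4, 0.02, T)           # linear noise schedule
alphas_bar = torch.cumprod(1.0 - betas, dim=0)  # cumulative signal fraction

def q_sample(x0, t, noise):
    """Closed-form forward process:
    x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps
    """
    a = alphas_bar[t].view(-1, 1, 1, 1)
    return a.sqrt() * x0 + (1.0 - a).sqrt() * noise

x0 = torch.rand(4, 3, 32, 32)         # a batch of clean images
t = torch.randint(0, T, (4,))         # a random timestep per image
eps = torch.randn_like(x0)
x_t = q_sample(x0, t, eps)            # noisy images at step t
# Training loss (noise-prediction network omitted):
#   mse_loss(model(x_t, t), eps)
```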

Similarly you can use a U-Net style architecture here also and if you want to train a large network then you can use the techniques discussed earlier here.


Sarvesh Khetan

A deep learning enthusiast and a Master's student at the University of Maryland, College Park.