Image Data Augmentation Techniques

Applied GenAI

Sarvesh Khetan
7 min read · Jun 27, 2024

1. Generating Image Data Irrespective of Class

1.a. Non-Learning-Based Methods :

1.b. (Unsupervised) Learning Based Method :

We discussed all these methods earlier for regression with tabular cross-sectional data; now we will simply apply them to image data.

1.b.1. AutoEncoders (AE) :

  • AE with FFNN
  • AE with CNN (works better for image data)

1.b.2. Variational AutoEncoders (VAE) :

  • VAE with FFNN
  • VAE with CNN (works better for images)

1.b.3. Vector-Quantized Variational Autoencoders (VQ-VAE) :

  • VQ-VAE with FFNN
  • VQ-VAE with CNN (works better for images)

1.b.4. Generative Adversarial Networks (GANs) :

  • GAN with FFNN
  • GAN with CNN (works better for images)

1.b.5. AE-GAN :

  • AE-GAN with FFNN
  • AE-GAN with CNN (works better for images)

1.b.6. VAE-GAN :

  • VAE-GAN with FFNN
  • VAE-GAN with CNN (works better for images)

1.b.7. VQ-GAN :

  • VQ-GAN with FFNN
  • VQ-GAN with CNN (works better for images)

1.b.8. Denoising Diffusion Probabilistic Models (DDPMs) :

  • DDPM with FFNN
  • DDPM with CNN (works better for images)
  • Latent DDPM with FFNN
  • Latent DDPM with CNN (works better for images)

1.b.9. Denoising Diffusion Implicit Models (DDIMs) :

  • DDIM with FFNN
  • DDIM with CNN (works better for images)
  • Latent DDIM with FFNN
  • Latent DDIM with CNN (works better for images)

Note : In all the above networks, instead of training an entirely new encoder from scratch, you can replace the encoder with the image embedding from a good foundation model like ResNet / EfficientNet, add the decoder on top, and then just finetune this encoder.

2. Generating Class Specific Image Data

We discussed all these methods earlier for classification with tabular cross-sectional data; now we will simply apply them to image data.

Non-Learning-Based Methods

1. Flips : Horizontal / Vertical

from torchvision import transforms

transform = transforms.Compose([
    transforms.Resize(size=(64, 64)),        # resize the images to 64x64
    transforms.RandomHorizontalFlip(p=0.5),  # flip horizontally with probability 0.5, i.e. 50% of the time
    transforms.ToTensor()                    # convert to a torch.Tensor, scaling pixel values from [0, 255] to [0.0, 1.0]
])

2. Random Crops

Randomly sample a section of the original image, then resize this section to the original image size.

3. Change Color

a. Apply PCA to all [R, G, B] pixel values in the training set
b. Then sample a ‘colour offset’ along the principal component directions, scaled by the corresponding eigenvalues
c. Add this offset to all the pixels of the training images
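Steps a-c above can be sketched in numpy as follows (this is the AlexNet-style "fancy PCA" colour augmentation; the noise scale `alpha_std` is an illustrative hyperparameter):

```python
import numpy as np

def fancy_pca(images, alpha_std=0.1, rng=None):
    """PCA colour augmentation sketch.

    images: float array of shape (N, H, W, 3) with values in [0, 1].
    """
    rng = np.random.default_rng() if rng is None else rng
    pixels = images.reshape(-1, 3)                  # (a) all RGB pixels
    pixels = pixels - pixels.mean(axis=0)
    cov = np.cov(pixels, rowvar=False)              # 3x3 colour covariance
    eigvals, eigvecs = np.linalg.eigh(cov)          # principal components
    alphas = rng.normal(0.0, alpha_std, size=3)     # (b) sample the offsets
    offset = eigvecs @ (alphas * eigvals)           # one shift per colour channel
    return np.clip(images + offset, 0.0, 1.0)       # (c) add to every pixel
```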

4. Translation

Translation just involves moving the image along the X or Y direction (or both). In the following example, we assume that the image has a black background beyond its boundary, so the translated image is padded with black. This augmentation is very useful because most objects can appear almost anywhere in the image, which forces your convolutional neural network to look everywhere.

5. Rotation

6. Gaussian Noise

7. …….. you can find many more types of transformations in the official PyTorch documentation!

AutoEncoders (AE)

AutoEncoders (AE) with FFNN

Implementation of the above architecture can be found below…
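A minimal sketch of a fully connected (FFNN) autoencoder, assuming 28x28 grayscale inputs; the layer widths and latent size are illustrative, not the article's exact architecture:

```python
import torch
import torch.nn as nn

class FFNNAutoencoder(nn.Module):
    def __init__(self, latent_dim=32):
        super().__init__()
        # Encoder: flatten the image and compress it down to the latent space.
        self.encoder = nn.Sequential(
            nn.Flatten(),
            nn.Linear(28 * 28, 256), nn.ReLU(),
            nn.Linear(256, latent_dim),
        )
        # Decoder: expand the latent back up to the original image shape.
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(),
            nn.Linear(256, 28 * 28), nn.Sigmoid(),
            nn.Unflatten(1, (1, 28, 28)),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

# Training minimizes a plain reconstruction loss:
model = FFNNAutoencoder()
x = torch.rand(8, 1, 28, 28)
loss = nn.functional.mse_loss(model(x), x)
```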

Instead, you can also use a U-Net style autoencoder architecture here, and if you want to train a large network you can use the techniques discussed earlier here.

AutoEncoders (AE) with CNN

Now, we know that in an autoencoder we first reduce the dimensions from the original space to the latent space and then increase the dimensions from the latent space back to the original space. But convolution, by nature, can only reduce (or preserve) spatial dimensions, so how do we increase the dimensions of the images from the latent space?? Researchers came up with the following techniques….

Method 1 : Non-Learning-Based Upsampling Techniques (e.g. nearest-neighbour or bilinear interpolation)

Method 2 : Learning-Based Upsampling Techniques (works better), i.e. Transposed Convolution

Hence now the CNN based Autoencoder architecture looks something like this …..

Implementation of the above architecture can be found below…
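Here is a minimal sketch of such a CNN autoencoder: strided convolutions downsample to the latent space, and transposed convolutions (Method 2 above) upsample back. Channel counts and depth are illustrative assumptions:

```python
import torch
import torch.nn as nn

class ConvAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        # Encoder: strided convolutions halve the spatial size at each step.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),   # 64 -> 32
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),  # 32 -> 16
        )
        # Decoder: transposed convolutions double the spatial size back.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),    # 16 -> 32
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid(),  # 32 -> 64
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))
```

A kernel size of 4 with stride 2 and padding 1 exactly doubles (or halves) the spatial resolution, which is why this combination appears so often in convolutional decoders.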

Instead, you can also use a U-Net style autoencoder architecture here, and if you want to train a large network you can use the techniques discussed earlier here. Below is the implementation of a U-Net style autoencoder architecture with residual connections…

Variational AutoEncoders (VAE)

Variational AutoEncoders (VAE) with FFNN

Implementation of the above architecture can be found below…
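A minimal FFNN VAE sketch, again assuming 28x28 grayscale inputs with illustrative layer sizes: the encoder outputs a mean and log-variance, a latent is sampled with the reparameterization trick, and the loss adds a KL-divergence term to the reconstruction loss:

```python
import torch
import torch.nn as nn

class VAE(nn.Module):
    def __init__(self, latent_dim=16):
        super().__init__()
        self.enc = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 256), nn.ReLU())
        self.mu = nn.Linear(256, latent_dim)       # mean of q(z|x)
        self.logvar = nn.Linear(256, latent_dim)   # log-variance of q(z|x)
        self.dec = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(),
            nn.Linear(256, 28 * 28), nn.Sigmoid(),
            nn.Unflatten(1, (1, 28, 28)),
        )

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        # Reparameterization trick: z = mu + sigma * eps keeps sampling differentiable.
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        return self.dec(z), mu, logvar

def vae_loss(recon, x, mu, logvar):
    rec = nn.functional.mse_loss(recon, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return rec + kl
```

To generate new images after training, sample z from a standard normal and run only the decoder.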

If you want to train a large network then you can use the techniques discussed earlier here.

Variational AutoEncoders (VAE) with CNN

Implementation of the above architecture can be found below…

If you want to train a large network then you can use the techniques discussed earlier here.

Vector Quantized — Variational AutoEncoders (VQ-VAE)

VQ-VAE with CNN

Implementation of the above architecture can be found below…
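The distinctive piece of a VQ-VAE is the vector-quantization step, sketched below: each encoder output vector is snapped to its nearest codebook entry, and the straight-through estimator lets gradients flow through the non-differentiable lookup. Codebook size and dimension are illustrative:

```python
import torch
import torch.nn as nn

class VectorQuantizer(nn.Module):
    def __init__(self, num_codes=512, dim=64):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, dim)  # learnable discrete codes

    def forward(self, z):                                     # z: (B, dim, H, W)
        B, C, H, W = z.shape
        flat = z.permute(0, 2, 3, 1).reshape(-1, C)           # (B*H*W, dim)
        dists = torch.cdist(flat, self.codebook.weight)       # distance to every code
        idx = dists.argmin(dim=1)                             # nearest-code indices
        q = self.codebook(idx).view(B, H, W, C).permute(0, 3, 1, 2)
        # Straight-through estimator: forward pass uses q, backward pass
        # copies gradients to z as if quantization were the identity.
        q = z + (q - z).detach()
        return q, idx
```

In the full model this sits between the CNN encoder and decoder, and the training loss adds codebook and commitment terms on top of the reconstruction loss.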

If you want to train a large network then you can use the techniques discussed earlier here.

Generative Adversarial Networks (GANs)

GAN with FFNN

Implementation of the above architecture can be found below…
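A minimal FFNN GAN sketch with illustrative sizes: the generator maps noise to a flat image, the discriminator scores real vs. fake, and one training step of each loss is shown:

```python
import torch
import torch.nn as nn

# Generator: noise vector -> flat 28x28 image in [-1, 1].
G = nn.Sequential(nn.Linear(64, 256), nn.ReLU(), nn.Linear(256, 28 * 28), nn.Tanh())
# Discriminator: flat image -> one real/fake logit.
D = nn.Sequential(nn.Linear(28 * 28, 256), nn.LeakyReLU(0.2), nn.Linear(256, 1))
bce = nn.BCEWithLogitsLoss()

real = torch.rand(16, 28 * 28)
noise = torch.randn(16, 64)
fake = G(noise)

# Discriminator step: push real toward 1 and fake toward 0
# (fake is detached so this step does not update the generator).
d_loss = bce(D(real), torch.ones(16, 1)) + bce(D(fake.detach()), torch.zeros(16, 1))
# Generator step: fool the discriminator into predicting 1 on fakes.
g_loss = bce(D(fake), torch.ones(16, 1))
```

In a real training loop these two losses are backpropagated alternately, each with its own optimizer.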

If you want to train a large network then you can use the techniques discussed earlier here.

GAN with CNN (also called Deep-Conv-GAN / DC-GAN)

Implementation of the above architecture can be found below…
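A DC-GAN-style generator sketch: transposed convolutions with batch normalization progressively upsample a noise vector into a 3x64x64 image. The channel sizes follow the usual DC-GAN pattern but are illustrative:

```python
import torch
import torch.nn as nn

generator = nn.Sequential(
    # Project the 100-dim noise (as a 1x1 "image") up to a 4x4 feature map.
    nn.ConvTranspose2d(100, 256, 4, stride=1, padding=0), nn.BatchNorm2d(256), nn.ReLU(),  # 1 -> 4
    nn.ConvTranspose2d(256, 128, 4, stride=2, padding=1), nn.BatchNorm2d(128), nn.ReLU(),  # 4 -> 8
    nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.BatchNorm2d(64), nn.ReLU(),    # 8 -> 16
    nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.BatchNorm2d(32), nn.ReLU(),     # 16 -> 32
    nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Tanh(),                          # 32 -> 64
)

z = torch.randn(2, 100, 1, 1)  # noise reshaped to a 1x1 spatial tensor
img = generator(z)             # (2, 3, 64, 64), values in [-1, 1]
```

The discriminator mirrors this with strided `Conv2d` layers and `LeakyReLU`, and the training losses are the same as in the FFNN GAN.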

If you want to train a large network then you can use the techniques discussed earlier here.

VQ-GAN

VQ-GAN with CNN

Implementation of the above architecture can be found below…
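Conceptually, a VQ-GAN adds a patch discriminator and an adversarial loss on top of the VQ-VAE's reconstruction loss so the decoder produces sharper images. The sketch below shows only that extra piece; the discriminator architecture and the 0.1 loss weight are illustrative assumptions, and a real VQ-GAN also uses a perceptual loss:

```python
import torch
import torch.nn as nn

# Patch discriminator: outputs one real/fake logit per spatial patch
# rather than a single logit for the whole image.
patch_disc = nn.Sequential(
    nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),   # 64 -> 32
    nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2), # 32 -> 16
    nn.Conv2d(128, 1, 3, padding=1),                               # logit map
)

real = torch.rand(2, 3, 64, 64)
recon = torch.rand(2, 3, 64, 64)  # stand-in for a VQ-VAE reconstruction
bce = nn.BCEWithLogitsLoss()

rec_loss = nn.functional.mse_loss(recon, real)
logits = patch_disc(recon)
adv_loss = bce(logits, torch.ones_like(logits))  # decoder tries to look "real"
total = rec_loss + 0.1 * adv_loss                # illustrative adversarial weight
```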

If you want to train a large network then you can use the techniques discussed earlier here.

Denoising Diffusion Probabilistic Models (DDPMs)

DDPM with CNN


Implementation of the above architecture can be found below…
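The forward (noising) process and training target of a DDPM can be sketched in closed form. The linear beta schedule below matches the common DDPM setup; the noise-predicting U-Net itself is omitted and only indicated in the final comment:

```python
import torch

T = 1000                                        # number of diffusion steps
betas = torch.linspace(1e-4, 0.02, T)           # linear noise schedule
alphas_bar = torch.cumprod(1.0 - betas, dim=0)  # cumulative signal fraction

def q_sample(x0, t, noise):
    """Closed-form forward process:
    x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps
    """
    a = alphas_bar[t].view(-1, 1, 1, 1)
    return a.sqrt() * x0 + (1.0 - a).sqrt() * noise

x0 = torch.rand(4, 3, 32, 32)         # a batch of clean images
t = torch.randint(0, T, (4,))         # a random timestep per image
eps = torch.randn_like(x0)
x_t = q_sample(x0, t, eps)            # noisy images at step t
# Training loss (noise-prediction network omitted):
#   mse_loss(model(x_t, t), eps)
```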

Similarly you can use a U-Net style architecture here also and if you want to train a large network then you can use the techniques discussed earlier here.


Sarvesh Khetan

A deep learning enthusiast and a Master's student at the University of Maryland, College Park.