Non Modelling Methods for Image Representation Learning
Table of Contents
- Method 1 : Image Flattening Representation
a. For GrayScale Images
b. For RGB Images
- Method 2 : Kernel Based Representation
a. Motivation
b. For GrayScale Images
c. For RGB Images
d. Drawbacks
Method 1 : Image Flattened Representations
For GrayScale Images
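As a minimal sketch of flattening a grayscale image, assuming the image is stored as a 2D NumPy array (the 3x3 pixel values below are made up for illustration), the rows are simply concatenated into one vector:

```python
import numpy as np

# Hypothetical 3x3 grayscale image; each entry is one pixel intensity.
gray = np.array([[0, 50, 100],
                 [150, 200, 250],
                 [25, 75, 125]], dtype=np.uint8)

# Row-major flattening: the rows are laid out one after another.
flat_gray = gray.flatten()

print(flat_gray.shape)  # one value per pixel: height * width entries
```

The resulting vector has height * width entries, and nothing in it records which pixels were vertical neighbours.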
For RGB Images
Above I have shown the average of all three channels, but you can instead take the sum of all three channels.
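The two options (average or sum over channels) can be sketched as follows, assuming the image is a NumPy array of shape (height, width, 3). The 2x2 pixel values are invented for illustration:

```python
import numpy as np

# Hypothetical 2x2 RGB image with shape (height, width, channels).
rgb = np.array([[[255, 0, 0], [0, 255, 0]],
                [[0, 0, 255], [30, 60, 90]]], dtype=np.float64)

# Collapse the channel axis first, then flatten the 2D result.
mean_flat = rgb.mean(axis=-1).flatten()  # average of R, G, B per pixel
sum_flat = rgb.sum(axis=-1).flatten()    # sum of R, G, B per pixel

print(mean_flat)
print(sum_flat)
```

Either way, pure red, pure green and pure blue pixels collapse to the same number, so colour information is lost along with the spatial layout.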
Method 2: Kernel / Filter Based Representations
Motivation
The drawback of the flattening method is that it discards the spatial information in the image, and spatial information is crucial. To solve this problem, researchers took inspiration from NLP, where the embedding of the current word is built from both the word itself and its surrounding context. Similarly, to incorporate spatial information in images, we use the current pixel together with its surrounding (neighbouring) pixels when creating the embedding of the current pixel.
For GrayScale Images
Hence the Image Flattened Representation is nothing but the special case of a single kernel of size 1*1 (with weight 1) whose output is flattened.
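A minimal sketch of the idea, assuming no padding and a stride of 1. The helper name `apply_kernel`, the 3x3 averaging kernel and the 4x4 image are illustrative choices, not taken from the original figures. The kernel is slid over the image without flipping (cross-correlation), which is the convention most deep learning libraries use:

```python
import numpy as np

def apply_kernel(image, kernel):
    """Slide `kernel` over a 2D `image` (no padding, stride 1)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Current pixel plus its neighbours, weighted by the kernel.
            patch = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(patch * kernel)
    return out

# Hypothetical 4x4 grayscale image with values 0..15.
gray = np.arange(16, dtype=np.float64).reshape(4, 4)

# A 3x3 averaging kernel mixes each pixel with its 8 neighbours.
mean_kernel = np.full((3, 3), 1 / 9)
neighbourhood = apply_kernel(gray, mean_kernel).flatten()

# A 1x1 kernel with weight 1 looks at no neighbours at all,
# so its flattened output equals plain image flattening.
identity_1x1 = np.array([[1.0]])
same_as_flat = apply_kernel(gray, identity_1x1).flatten()

print(neighbourhood)
print(np.array_equal(same_as_flat, gray.flatten()))
```

Each entry of `neighbourhood` summarises a 3x3 region rather than a single pixel, which is how the local spatial context enters the representation.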
For RGB Images
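One common convention for colour images, sketched below, is to give the kernel one slice per channel and sum the products over height, width and channels, so each position still yields a single number. This is an assumption about the setup and not necessarily the scheme the original illustration used. The helper name `apply_kernel_rgb` and the input values are hypothetical:

```python
import numpy as np

def apply_kernel_rgb(image, kernel):
    """Slide a (kh, kw, 3) kernel over a (H, W, 3) image (no padding, stride 1)."""
    kh, kw, _ = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Patch covers the neighbourhood across all three channels.
            patch = image[i:i + kh, j:j + kw, :]
            out[i, j] = np.sum(patch * kernel)
    return out

# Hypothetical 4x4 RGB image with values 0..47.
rgb = np.arange(48, dtype=np.float64).reshape(4, 4, 3)

# Averaging kernel spanning a 3x3 neighbourhood and all 3 channels.
kernel = np.full((3, 3, 3), 1 / 27)
embedding = apply_kernel_rgb(rgb, kernel).flatten()

print(embedding)
```

An alternative under the same assumptions would be to apply a 2D kernel to each channel separately and then average or sum the three response maps.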
Drawbacks
This is a manual approach and hence requires domain expertise to decide which kernel to use for a specific use case. For example, to detect whether a person is sleeping in class, a vital feature is the distance between the eyelids, so you would need to design a kernel that can measure it.
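To illustrate what hand-designing a kernel looks like, here is a sketch using a Sobel-style horizontal-edge kernel. This is a swapped-in textbook example, not a solution to the eyelid problem above. The toy image and the helper name `apply_kernel` are assumptions made for illustration:

```python
import numpy as np

def apply_kernel(image, kernel):
    """Slide `kernel` over a 2D `image` (no padding, stride 1)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Hand-picked weights: respond when rows below are brighter than rows above.
horizontal_edge = np.array([[-1.0, -2.0, -1.0],
                            [0.0, 0.0, 0.0],
                            [1.0, 2.0, 1.0]])

# Toy image: dark top half, bright bottom half (one horizontal edge).
edge_image = np.array([[0.0, 0.0, 0.0, 0.0],
                       [0.0, 0.0, 0.0, 0.0],
                       [1.0, 1.0, 1.0, 1.0],
                       [1.0, 1.0, 1.0, 1.0]])

# Toy image with no edges at all.
flat_image = np.ones((4, 4))

edge_response = apply_kernel(edge_image, horizontal_edge)
flat_response = apply_kernel(flat_image, horizontal_edge)

print(edge_response)
print(flat_response)
```

The weights had to be chosen by someone who already knew that horizontal edges matter for the task. A different feature would need a different kernel designed by hand.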