Non Modelling Methods for Image Representation Learning

Sarvesh Khetan
3 min read · Nov 14, 2024

Method 1: Image Flattened Representations

For GrayScale Images
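
As a minimal sketch of the idea, flattening a grayscale image means concatenating its rows into one long vector. The 3x3 image and the helper name `flatten_grayscale` below are made up for illustration, and plain Python lists are used only to keep the example dependency-free.

```python
# Hypothetical 3x3 grayscale image; each entry is a pixel intensity (0-255).
image = [
    [0, 50, 100],
    [150, 200, 250],
    [25, 75, 125],
]


def flatten_grayscale(img):
    """Concatenate the rows of a 2-D image into one 1-D feature vector."""
    return [pixel for row in img for pixel in row]


vector = flatten_grayscale(image)
print(vector)  # [0, 50, 100, 150, 200, 250, 25, 75, 125]
```

The resulting vector has height * width entries (9 here), and nothing in it records which pixels were vertical neighbours.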

For RGB Images

The example above averages the three channels of each pixel before flattening; you can instead take the sum of the three channels.
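
A rough sketch of both variants, assuming each pixel is stored as an (R, G, B) tuple. The function name `flatten_rgb`, its `reduce` parameter, and the sample pixel values are hypothetical choices for this example.

```python
def flatten_rgb(img, reduce="mean"):
    """Collapse each (R, G, B) pixel to one number, then flatten row by row."""
    if reduce not in ("mean", "sum"):
        raise ValueError("reduce must be 'mean' or 'sum'")
    vector = []
    for row in img:
        for r, g, b in row:
            total = r + g + b
            vector.append(total / 3 if reduce == "mean" else total)
    return vector


# Hypothetical 2x2 RGB image.
rgb_image = [
    [(30, 60, 90), (0, 0, 0)],
    [(255, 255, 255), (10, 20, 60)],
]

mean_vector = flatten_rgb(rgb_image, "mean")  # [60.0, 0.0, 255.0, 30.0]
sum_vector = flatten_rgb(rgb_image, "sum")    # [180, 0, 765, 90]
```

The two variants differ only by a constant factor of 3, so they carry the same information; the average simply stays in the original 0-255 intensity range.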

Method 2: Kernel / Filter Based Representations

Motivation

The drawback of the above method is that flattening discards the spatial information in the image, and spatial information is crucial. To solve this problem, researchers took inspiration from NLP, where the embedding of a word is built from both the word itself and its surrounding context. Similarly, to capture spatial information in images, we use the current pixel together with its surrounding / neighbouring pixels when creating the embedding of the current pixel.

For GrayScale Images

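Hence the image flattened representation is nothing but the special case of a single kernel of size 1*1 (with weight 1), which looks only at the current pixel and none of its neighbours.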
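
The sliding-window computation can be sketched as follows, assuming no padding and a stride of 1. The helper `apply_kernel`, the 2x2 box kernel, and the 3x3 image are illustrative choices, not a reference implementation.

```python
def apply_kernel(img, kernel):
    """Slide `kernel` over `img` (no padding, stride 1).

    Each output value is the weighted sum of one pixel neighbourhood.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(img) - kh + 1
    out_w = len(img[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = 0
            for di in range(kh):
                for dj in range(kw):
                    acc += kernel[di][dj] * img[i + di][j + dj]
            row.append(acc)
        output.append(row)
    return output


image = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]

# A 2x2 kernel mixes every pixel with its right, lower and diagonal neighbour.
box = [[1, 1], [1, 1]]
features = apply_kernel(image, box)    # [[12, 16], [24, 28]]

# A 1x1 kernel with weight 1 sees only the current pixel: the image is unchanged.
identity = apply_kernel(image, [[1]])  # [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
```

Flattening `identity` row by row gives exactly the Method 1 vector, which is why the flattened representation can be viewed as the 1*1 kernel case.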

For RGB Images
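
One common convention, assumed here, is to give the kernel one weight per channel at every position and to sum the weighted values over the whole window and all three channels, producing a single number per location. The helper `apply_kernel_rgb` and the sample data are hypothetical.

```python
def apply_kernel_rgb(img, kernel):
    """Slide a multi-channel kernel over an RGB image (no padding, stride 1).

    img:    H x W grid of (R, G, B) tuples
    kernel: kh x kw grid of (wR, wG, wB) weight tuples
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(img) - kh + 1
    out_w = len(img[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = 0
            for di in range(kh):
                for dj in range(kw):
                    pixel = img[i + di][j + dj]
                    weights = kernel[di][dj]
                    # Sum over the three colour channels of this pixel.
                    acc += sum(w * p for w, p in zip(weights, pixel))
            row.append(acc)
        output.append(row)
    return output


# Hypothetical 2x3 RGB image.
rgb_image = [
    [(1, 0, 0), (0, 1, 0), (0, 0, 1)],
    [(1, 1, 1), (2, 2, 2), (3, 3, 3)],
]

ones_2x2 = [[(1, 1, 1), (1, 1, 1)], [(1, 1, 1), (1, 1, 1)]]
features = apply_kernel_rgb(rgb_image, ones_2x2)        # [[11, 17]]

# A 1x1 kernel with weights (1, 1, 1) reduces to the channel sum from Method 1.
channel_sum = apply_kernel_rgb(rgb_image, [[(1, 1, 1)]])  # [[1, 1, 1], [3, 6, 9]]
```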

Drawbacks

This is a manual approach, so it needs domain expertise to decide which kernel to use for a specific use case. For example, if you want to detect whether a person is sleeping in class, a vital feature is the distance between the eyelids, so you would need to design a kernel that can measure it.
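
To give a feel for what hand-designing a kernel means, here is a small sketch using the Sobel kernel, a classic hand-crafted filter that responds to vertical edges. It only illustrates the manual approach; an eyelid-distance detector would need something far more specialised. The helper `apply_kernel` and the toy image are assumptions made for this example, and the kernel is applied without flipping (cross-correlation).

```python
def apply_kernel(img, kernel):
    """Slide `kernel` over `img` (no padding, stride 1), without flipping it."""
    kh, kw = len(kernel), len(kernel[0])
    return [
        [
            sum(
                kernel[di][dj] * img[i + di][j + dj]
                for di in range(kh)
                for dj in range(kw)
            )
            for j in range(len(img[0]) - kw + 1)
        ]
        for i in range(len(img) - kh + 1)
    ]


# Hand-designed Sobel kernel: negative weights on the left, positive on the right.
sobel_x = [
    [-1, 0, 1],
    [-2, 0, 2],
    [-1, 0, 1],
]

# Toy 3x5 image: dark on the left, bright on the right (a vertical edge).
image = [
    [0, 0, 0, 10, 10],
    [0, 0, 0, 10, 10],
    [0, 0, 0, 10, 10],
]

edges = apply_kernel(image, sobel_x)
print(edges)  # [[0, 40, 40]]
```

The response is 0 over the flat dark region and large where the window straddles the edge. Every weight in `sobel_x` was chosen by a person for this one purpose, which is exactly the drawback described above.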


Written by Sarvesh Khetan

A deep learning enthusiast and a Masters Student at University of Maryland, College Park.
