
Recurrent Neural Networks (RNNs) for Sequence Classification

Sarvesh Khetan · Published in Level Up Coding · Jan 6, 2025

Table of Contents:

- Single Layer Architecture

  1. RNN Architecture
  2. Learning in RNN
  3. How RNN solves issues in FFNN
  4. Issues with RNN
  5. Solutions to RNN Issues

- Stacked Layer Architecture

Single Layer Architecture

RNN Architecture

Assume you have a dataset X with N features and M datapoints; the RNN architecture would then look something like this:

L (hyperparameter) denotes the number of hidden neurons, which must be the same for all the FFNNs, else they won't connect to each other. K denotes the number of output neurons, which depends on the number of classes you need to classify.

Note that in the above architecture the weights W, U, and V are the same across all the timestamps. This makes sense: in an FFNN the same weights are applied to every input datapoint, so when you unfold the FFNN to get an RNN, the weights should remain the same. Researchers call this weight sharing.

Hence a single-layer RNN is nothing but multiple single-layer FFNNs connected in a sequence through hidden states.

Since we are doing multiclass classification, we add a softmax at the end.
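To make the architecture concrete, below is a minimal NumPy sketch of this forward pass. It is a reading of the figures above, not the author's code: the toy sizes, the tanh activation, and the random data are illustrative assumptions, while U, W, V, b, and B follow the notation used in the derivatives later in the article.

```python
import numpy as np

rng = np.random.default_rng(0)

N, L, K, T = 4, 8, 3, 5          # features, hidden neurons (L), classes (K), timestamps

# Shared weights: the same U, W, V, b, B are reused at every timestamp.
U = rng.normal(0, 0.1, (L, N))   # input  -> hidden
W = rng.normal(0, 0.1, (L, L))   # hidden -> hidden (recurrent)
V = rng.normal(0, 0.1, (K, L))   # hidden -> output
b = np.zeros(L)                  # hidden bias
B = np.zeros(K)                  # output bias

def softmax(z):
    z = z - z.max()              # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def forward(x_seq):
    """x_seq: (T, N) sequence of feature vectors -> (K,) class probabilities."""
    h = np.zeros(L)                          # initial hidden state h_0
    for x_t in x_seq:
        h = np.tanh(U @ x_t + W @ h + b)     # same weights at every timestamp
    return softmax(V @ h + B)                # classify from the last hidden state

x = rng.normal(size=(T, N))                  # one toy sequence
print(forward(x))                            # K probabilities summing to 1
```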

In many papers you will find researchers using the following shorthand notation for RNNs (generalized for timestamp t).
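Since the shorthand itself appears as an image, here is a plausible rendering in the same notation (with tanh as the assumed activation f):

```latex
h_t = f\left(U x_t + W h_{t-1} + b\right), \qquad \hat{y} = \mathrm{softmax}\left(V h_T + B\right)
```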

The following video shows an animation of how an RNN performs its computation internally.

RNN Animation

Learning in RNN

Since we are using the RNN to solve a classification task, we can use the cross-entropy loss to train the network, as shown below.
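The loss equation is shown as an image; for a one-hot target y and predicted probabilities ŷ over K classes, the standard cross-entropy form it refers to is:

```latex
E = -\sum_{k=1}^{K} y_k \log \hat{y}_k
```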

This optimization can be solved using any optimizer, i.e. gradient descent / Adam / AdaGrad / … (stochastic or mini-batch versions). Below, let's try solving it using gradient descent.
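Concretely, gradient descent repeatedly nudges every parameter against its gradient, with the learning rate η as a hyperparameter:

```latex
\theta \leftarrow \theta - \eta \, \frac{\partial E}{\partial \theta}, \qquad \theta \in \{U, W, V, B, b\}
```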

To calculate these derivatives we will take the help of the computational graph for this RNN architecture, shown below.

Computational Graph for RNN

Calculating dE / dB

Calculating dE / dV

Calculating dE / dW

Calculating dE / dU

Calculating dE / db
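The derivations above are images, but the structural point worth keeping (a sketch, assuming the tanh recurrence) is that dE / dW collects a contribution from every timestamp, each flowing backward through a product of hidden-state Jacobians; this is the backpropagation-through-time (BPTT) pattern, and dE / dU and dE / db share it:

```latex
\frac{\partial E}{\partial W} = \sum_{t=1}^{T} \frac{\partial E}{\partial h_T} \left( \prod_{j=t+1}^{T} \frac{\partial h_j}{\partial h_{j-1}} \right) \frac{\partial h_t}{\partial W}
```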

How does an RNN solve the drawbacks of FFNNs?

We saw earlier the issues with FFNNs that led researchers to invent RNNs, but did RNNs actually solve those problems? YES, RNNs solve both of the FFNN problems, as highlighted below.

  1. In an RNN we consider the entire sequential information to make the prediction, because each hidden unit passes a hidden state containing all the previous sequential information (i.e. about X1, X2, …, Xi-1) to the next hidden unit in the hidden layer, thus solving the problem.
  2. Since the weight matrices are shared in an RNN, RNNs can also take variable-size inputs during inference, thus solving the variable-size input problem too (see the short sketch after this list).
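Continuing the NumPy sketch from earlier (reusing its forward(), rng, and N), nothing in the forward pass depends on the sequence length T, so the very same weights handle sequences of different lengths:

```python
# Reuses forward(), rng, and N from the NumPy sketch earlier in the article.
short_seq = rng.normal(size=(3, N))    # 3 timestamps
long_seq  = rng.normal(size=(11, N))   # 11 timestamps
print(forward(short_seq))              # both calls work unchanged and
print(forward(long_seq))               # return K class probabilities each
```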

Issues with RNNs

RNNs have two problems, namely vanishing gradients and exploding gradients. To understand these, refer to the dE / dW equation.
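To see why, note that the dE / dW expression contains a product of Jacobians across timestamps. Assuming the tanh recurrence, each factor is:

```latex
\frac{\partial h_j}{\partial h_{j-1}} = \mathrm{diag}\left(1 - h_j \odot h_j\right) W
```

If these factors consistently have norm below 1, the product shrinks exponentially with the distance between timestamps and the gradient vanishes; if consistently above 1, it grows exponentially and the gradient explodes.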

Solutions to RNN Issues

These issues can be solved in the following manner:

  • Vanishing gradients: this can be solved by introducing skip connections / direct connections (highway connections specifically) into the network.
  • Exploding gradients: this can be solved using the gradient clipping technique (see the sketch after this list).
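As a hedged illustration of the second idea, here is a minimal global-norm gradient-clipping sketch in NumPy; the threshold of 5.0 is an arbitrary assumption, and frameworks ship the same idea built in (e.g. torch.nn.utils.clip_grad_norm_ in PyTorch):

```python
import numpy as np

def clip_gradients(grads, max_norm=5.0):
    """Rescale gradient arrays so their combined L2 norm is at most max_norm."""
    total_norm = np.sqrt(sum(np.sum(g * g) for g in grads))
    if total_norm > max_norm:
        grads = [g * (max_norm / total_norm) for g in grads]
    return grads

# Usage: clip all the RNN gradients together before the update step, e.g.
# dU, dW, dV, db, dB = clip_gradients([dU, dW, dV, db, dB])
```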

Stacked Layer Architecture

Stacked RNNs
Mathematical Details in Stacked RNN
Shorthand Notation
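The stacked-layer equations appear as images above; a plausible rendering (assuming the same tanh recurrence, with separate weights per layer ℓ) is that each layer reads the hidden states of the layer below as its inputs, and the softmax classifies from the top layer's last hidden state:

```latex
h_t^{(1)} = \tanh\left(U^{(1)} x_t + W^{(1)} h_{t-1}^{(1)} + b^{(1)}\right), \qquad h_t^{(\ell)} = \tanh\left(U^{(\ell)} h_t^{(\ell-1)} + W^{(\ell)} h_{t-1}^{(\ell)} + b^{(\ell)}\right) \;\; (\ell \ge 2)
```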


Written by Sarvesh Khetan

A deep learning enthusiast and a Master's student at the University of Maryland, College Park.
