- I have created a set of animations for the Deep Learning (DL) and LLM lectures taught by Prof. Mitesh Khapra at IIT Madras.
- I have uploaded the notebook that I used to create these animations, or you can directly open it in Colab here.
- That notebook has an implementation of the gradient descent algorithm. You need to modify the update rule for each optimization algorithm, as sketched below.
- The objective is to get an intuitive idea of the differences between the optimization algorithms using contrived examples.
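As a minimal sketch of what that change looks like (the toy data, function names, and hyperparameters below are my own assumptions, not necessarily the notebook's actual code), here is a contrived single-sigmoid-neuron setup where switching from vanilla gradient descent to momentum only means changing the update rule:

```python
import numpy as np

# Contrived example: fit a single sigmoid neuron to two points.
# All names and values here are illustrative assumptions.
X, Y = np.array([0.5, 2.5]), np.array([0.2, 0.9])

def f(x, w, b):
    return 1.0 / (1.0 + np.exp(-(w * x + b)))  # sigmoid neuron

def grad(w, b):
    # Gradients of the squared error summed over the toy dataset.
    dw, db = 0.0, 0.0
    for x, y in zip(X, Y):
        fx = f(x, w, b)
        dw += (fx - y) * fx * (1 - fx) * x
        db += (fx - y) * fx * (1 - fx)
    return dw, db

def do_gradient_descent(epochs=1000, lr=1.0):
    w, b = -2.0, -2.0
    for _ in range(epochs):
        dw, db = grad(w, b)
        # Vanilla update rule: this is the part to swap out for other optimizers.
        w, b = w - lr * dw, b - lr * db
    return w, b

def do_momentum_gd(epochs=1000, lr=1.0, beta=0.9):
    w, b, vw, vb = -2.0, -2.0, 0.0, 0.0
    for _ in range(epochs):
        dw, db = grad(w, b)
        # Momentum update: accumulate a decaying history of past gradients.
        vw, vb = beta * vw + dw, beta * vb + db
        w, b = w - lr * vw, b - lr * vb
    return w, b
```

The same pattern extends to NAG, Adagrad, RMSProp, and Adam: keep the loss and gradient computation as is and change only the update step.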
- You can find all the animations used in the lectures in .mp4 format in the Animations directory.
- Here are a few samples:
- Looking at the distribution of activation values always gives us a lot of insight.
- This led to the development of normalization techniques and block-wise quantization strategies (like in DeepSeek-V3).
- Here is an example of visualizing the histogram of activation values in a simple three-layer neural network.
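A rough sketch of how such a histogram can be captured (this is not the repo's notebook code; the layer sizes, tanh activations, and hook names are assumptions): forward hooks record each layer's output for a batch of inputs, and the stored activations are then plotted as histograms.

```python
import torch
import torch.nn as nn
import matplotlib.pyplot as plt

torch.manual_seed(0)

# A toy three-hidden-layer network; sizes and tanh activations are assumptions.
model = nn.Sequential(
    nn.Linear(64, 128), nn.Tanh(),
    nn.Linear(128, 128), nn.Tanh(),
    nn.Linear(128, 128), nn.Tanh(),
    nn.Linear(128, 10),
)

activations = {}

def save_activation(name):
    def hook(module, inputs, output):
        # Keep a detached, flattened copy of this layer's output.
        activations[name] = output.detach().flatten()
    return hook

# Register a forward hook on each activation layer.
layer_num = 0
for layer in model:
    if isinstance(layer, nn.Tanh):
        layer_num += 1
        layer.register_forward_hook(save_activation(f"layer {layer_num}"))

x = torch.randn(512, 64)   # a batch of random inputs
model(x)                   # the forward pass fills `activations`

# One histogram per captured layer.
fig, axes = plt.subplots(1, len(activations), figsize=(12, 3), sharey=True)
for ax, (name, values) in zip(axes, activations.items()):
    ax.hist(values.numpy(), bins=50)
    ax.set_title(name)
    ax.set_xlabel("activation value")
plt.tight_layout()
plt.show()
```

Re-running this with different initializations or activation functions shows how quickly these distributions shift, which is the kind of insight the bullets above refer to.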
- Can we train an LLM using FP8?
- Look at the range of the gradients as training progresses.
- The values likely fall outside the dynamic range of FP8.
- Come up with tricks, like block-wise scaling as used in DeepSeek-V3 (a rough sketch is included below).
- Here is the animation of the change in gradient values of the embedding vector corresponding to the word "the".
- Use backward hooks to capture these values.
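A minimal sketch of that idea, with a made-up toy vocabulary, model, and training loop (none of this is the repo's actual code): a hook registered on the embedding weight fires during the backward pass, so the gradient row for "the" can be recorded at every step and its range compared against what FP8 can represent. A module-level register_full_backward_hook would work similarly.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy vocabulary and model: these names and sizes are illustrative only.
vocab = {"the": 0, "cat": 1, "sat": 2, "mat": 3}
the_id = vocab["the"]

embedding = nn.Embedding(len(vocab), 16)
head = nn.Linear(16, len(vocab))
opt = torch.optim.SGD(list(embedding.parameters()) + list(head.parameters()), lr=0.1)

the_grad_history = []  # one entry per step; animate these to see the change over training

def capture_the_grad(grad):
    # Tensor hook: called with the full gradient of embedding.weight during backward.
    # Keep only the row for "the" and report its magnitude range, e.g. to compare
    # against the dynamic range of an FP8 format.
    row = grad[the_id].detach().clone()
    the_grad_history.append(row)
    print(f"step {len(the_grad_history)}: |grad| range for 'the' = "
          f"{row.abs().min().item():.2e} .. {row.abs().max().item():.2e}")
    return grad

embedding.weight.register_hook(capture_the_grad)

# A tiny next-token prediction task so "the" actually receives gradients.
tokens  = torch.tensor([[vocab["the"], vocab["cat"], vocab["sat"]]])
targets = torch.tensor([[vocab["cat"], vocab["sat"], vocab["mat"]]])

for _ in range(5):
    logits = head(embedding(tokens))                      # (1, 3, vocab)
    loss = nn.functional.cross_entropy(
        logits.view(-1, len(vocab)), targets.view(-1))
    opt.zero_grad()
    loss.backward()                                        # triggers the hook
    opt.step()
```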
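And a rough sketch of the block-wise scaling trick mentioned above (the block size of 128 and the E4M3 format are assumptions, and the cast needs a PyTorch build with float8 dtypes, roughly 2.1+): each block is rescaled so that its largest magnitude fits the FP8 range before casting, and the per-block scale is kept around for dequantization.

```python
import torch

E4M3_MAX = 448.0   # largest finite value representable in FP8 E4M3

def blockwise_fp8(x: torch.Tensor, block: int = 128):
    # Split the flattened tensor into fixed-size blocks, compute one scale per
    # block so the block's max magnitude maps to E4M3_MAX, then cast to FP8.
    flat = x.flatten().float()
    pad = (-flat.numel()) % block
    flat = torch.cat([flat, flat.new_zeros(pad)]).view(-1, block)
    scale = flat.abs().amax(dim=1, keepdim=True) / E4M3_MAX
    scale = scale.clamp(min=1e-12)
    q = (flat / scale).to(torch.float8_e4m3fn)   # requires float8 support in PyTorch
    return q, scale

def dequantize(q, scale, shape):
    # Undo the per-block scaling and restore the original shape.
    flat = (q.float() * scale).flatten()
    return flat[: torch.Size(shape).numel()].view(shape)

# Tiny gradient-like values such as these round to zero if cast to FP8 directly,
# but survive block-wise scaling with only quantization-level error.
x = torch.randn(4, 300) * 1e-4
q, s = blockwise_fp8(x)
x_hat = dequantize(q, s, x.shape)
print("max abs error with block scaling:", (x - x_hat).abs().max().item())
print("fraction zeroed without scaling:",
      (x.to(torch.float8_e4m3fn).float() == 0).float().mean().item())
```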