Hi, I'm Vachan! I like Deep Learning and Systems Programming.
- Implemented Neural Networks (forward and backward propagation), BatchNorm, LayerNorm, and Dropout from scratch using only basic tensor methods (a minimal sketch follows this list)
- Neural Networks => nn.ipynb
- Batch-Normalization and Layer-Normalization: Why When Where & How? => batchnorm.ipynb, layernorm.ipynb
- Dropout: Why When Where & How? => dropout.ipynb, dropout_scale.ipynb
- Adam and AdamW
- Model Distillation => distillation.ipynb
- Mixture-Of-Experts (MoE) Layers
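
As a flavour of the "basic tensor methods only" implementations listed above, here is a minimal NumPy sketch of LayerNorm and inverted Dropout (not the notebook code itself; names, shapes, and hyperparameters are illustrative):

```python
import numpy as np

def layernorm(x, gamma, beta, eps=1e-5):
    """LayerNorm forward: normalize each sample over its feature dimension."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)  # zero mean, unit variance per sample
    return gamma * x_hat + beta              # learnable scale and shift

def dropout(x, p=0.1, training=True, rng=None):
    """Inverted dropout: zero activations with probability p and rescale by 1/(1-p),
    so the expected activation is unchanged and no rescaling is needed at test time."""
    if not training or p == 0.0:
        return x
    rng = rng or np.random.default_rng()
    mask = (rng.random(x.shape) >= p).astype(x.dtype)
    return x * mask / (1.0 - p)

# toy usage: a batch of 4 samples with 8 features each
x = np.random.default_rng(1).normal(size=(4, 8))
y = dropout(layernorm(x, gamma=np.ones(8), beta=np.zeros(8)), p=0.2)
print(y.shape)  # (4, 8)
```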
```mermaid
graph TD;
Transformers -->|Text| GPT;
Transformers -->|Images| Vision_Transformers["Vision Transformers"];
Transformers -->|Audio| MAGNeT["MAGNeT"];
Transformers -->|Video| Video_Vision_Transformers["Video Vision Transformers"];
Transformers -->|Diffusion| Diffusion_Transformers["Diffusion Transformers"];
GPT --> Multi_Modal_Transformers["Multi-Modal Transformers"];
Vision_Transformers --> Multi_Modal_Transformers;
MAGNeT --> Multi_Modal_Transformers;
Video_Vision_Transformers --> Multi_Modal_Transformers;
Diffusion_Transformers --> Multi_Modal_Transformers;
Multi_Modal_Transformers --> LLMs["Large Language Models (LLMs)"];
RLHF["Reinforcement Learning from Human Feedback (RLHF)"] --> LLMs;
Reinforcement_Learning --> RLHF;
LLMs --> Reasoning_LLMs["Reasoning LLMs"];
Reinforcement_Learning --> Reasoning_LLMs;
```
- GPT written in JAX, trained on the Tiny Shakespeare dataset (1.1 MB of text) and scaled up on the TinyStories dataset (~2 GB of text)
Model-Params | d_model | n_heads | maximum_context_length | num_layers | vocab_size | Estimated Validation Loss on TinyStories |
---|---|---|---|---|---|---|
280K | 64 | 8 | 512 | 5 | 512 | 1.33 |
15M | 288 | 6 | 256 | 6 | 32000 | 1.19 |
45M | 512 | 8 | 1024 | 8 | 32000 | TODO |
110M | 768 | 12 | 2048 | 12 | 32000 | TODO |
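
As a rough sanity check on the table above: a decoder-only Transformer has about 12·num_layers·d_model² weights in its blocks plus a vocab_size·d_model embedding table. The snippet below is my own back-of-the-envelope estimate (not the repo's code, and it ignores positional embeddings, biases, and norms), but it lands close to the listed sizes:

```python
def approx_gpt_params(d_model, num_layers, vocab_size):
    """Rough decoder-only Transformer size: attention (~4*d^2) + 4x-wide MLP (~8*d^2)
    per layer, plus a token-embedding table (assumed tied with the output head)."""
    return num_layers * 12 * d_model ** 2 + vocab_size * d_model

for name, d_model, num_layers, vocab_size in [
    ("280K", 64, 5, 512), ("15M", 288, 6, 32000),
    ("45M", 512, 8, 32000), ("110M", 768, 12, 32000),
]:
    print(name, f"~{approx_gpt_params(d_model, num_layers, vocab_size) / 1e6:.1f}M")
# 280K -> ~0.3M, 15M -> ~15.2M, 45M -> ~41.5M, 110M -> ~109.5M
```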
- Model: 15M | Prompt: "Once upon a time," | Sampling Technique: Greedy sampling
Once upon a time, there was a little girl named Lily. She loved to play with her toys and eat yummy food. One day, she found a big, round thing in her room. It was a microscope. Lily was very curious about it. Lily wanted to see what was inside the microscope. She tried to open it, but it was very hard. She tried and tried, but she could not open it. Lily felt sad and wanted to find a way to open the microscope. Then, Lily had an idea. She asked her mom for help. Her mom showed her how to open the microscope. Lily was so happy! She looked through the microscope and saw many tiny things. She was so excited to see the tiny things. Lily and her mom had a fun day together.
- Prompt: "Once upon a time, in a big forest, there was a fearful little dog named Spot" | Sampling Technique: Greedy sampling
Once upon a time, in a big forest, there was a fearful little dog named Spot. Spot was scared of many things. One day, Spot saw a big tree with a hole in it. He thought, "I want to see what is inside the hole." Spot went to the tree and looked inside the hole. He saw a little bird with a hurt wing. Spot said, "I will help you, little bird." He used his paw to gently lift the bird out of the hole. The bird was very happy and said, "Thank you, Spot!" Spot and the bird became good friends. They played together in the forest every day. Spot learned that it is good to help others, even if they are scared of something. And they lived happily ever after.
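
Both samples above use greedy sampling; the loop below is a minimal sketch of that decoding strategy (`logits_fn`, `prompt_tokens`, and `eos_id` are placeholders, not the repo's actual interfaces):

```python
import numpy as np

def greedy_generate(logits_fn, prompt_tokens, max_new_tokens=256, eos_id=None):
    """Greedy decoding: at every step, append the single highest-probability token.
    `logits_fn(tokens)` is assumed to return next-token logits of shape (vocab_size,)."""
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        next_id = int(np.argmax(logits_fn(tokens)))  # argmax = temperature-0 sampling
        tokens.append(next_id)
        if eos_id is not None and next_id == eos_id:
            break
    return tokens
```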
- CelebA
- More Generated Images <== see more model-generated images here
- Training-insights
- MNIST-experiment
- Diffusion-Transformers Paper Summary
- Some generated images:
Algorithms | Environment (Name & Goal) | Environment GIF | Plots |
---|---|---|---|
Policy Iteration | Frozen Lake: The player makes moves until they reach the goal or fall in a hole. The lake is slippery (unless disabled), so the player may sometimes move perpendicular to the intended direction. | (GIFs) | - |
Value Iteration | Taxi-v3: The taxi starts at a random location within the grid. The passenger starts at one of the designated pick-up locations and has a randomly assigned destination (one of the four designated locations). | (GIFs) | - |
Monte Carlo Exploring Starts | Blackjack-v1: a card game where the goal is to beat the dealer by obtaining cards that sum closer to 21 (without going over 21) than the dealer's cards. | (GIF) | (plots) |
Sarsa | CliffWalking-v0: Reach the goal without falling off the cliff. | (GIF) | (plot) |
Q-learning | CliffWalking-v0: Reach the goal without falling off the cliff. | (GIF) | (plot) |
Expected Sarsa | CliffWalking-v0: Reach the goal without falling off the cliff. | (GIF) | (plot) |
Double Q-learning | CliffWalking-v0: Reach the goal without falling off the cliff. | (GIF) | (plot) |
n-step Bootstrapping (TODO) | - | - | - |
Dyna-Q | ShortcutMazeEnv (custom-made env): Reach the goal while dodging obstacles. | (GIF) | (plot) |
Prioritized Sweeping | ShortcutMazeEnv (custom-made env): Reach the goal while dodging obstacles. | (GIF) | (plots) |
Monte-Carlo Policy-Gradient | CartPole-v1: The goal is to balance the pole by applying forces in the left and right direction on the cart. | (GIF) | (plot) |
REINFORCE with Baseline | CartPole-v1: The goal is to balance the pole by applying forces in the left and right direction on the cart. | (GIF) | - |
One-Step Actor-Critic | CartPole-v1: The goal is to balance the pole by applying forces in the left and right direction on the cart. | (GIF) | (plot) |
Policy Gradient on Continuous Actions (TODO) | - | - | - |
On-policy Control with Approximation (TODO) | - | - | - |
Off-policy Methods with Approximation (TODO) | - | - | - |
Eligibility Traces (TODO) | - | - | - |
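
For concreteness, here is a compact Q-learning loop on CliffWalking-v0 using the standard Gymnasium API (the hyperparameters are illustrative, not the ones behind the plots in the table):

```python
import numpy as np
import gymnasium as gym

env = gym.make("CliffWalking-v0")
Q = np.zeros((env.observation_space.n, env.action_space.n))
alpha, gamma, epsilon = 0.5, 0.99, 0.1  # illustrative hyperparameters

for episode in range(500):
    state, _ = env.reset()
    done = False
    while not done:
        # epsilon-greedy behaviour policy
        if np.random.rand() < epsilon:
            action = env.action_space.sample()
        else:
            action = int(np.argmax(Q[state]))
        next_state, reward, terminated, truncated, _ = env.step(action)
        # off-policy TD target: bootstrap from the greedy (max) action in the next state
        target = reward + gamma * np.max(Q[next_state]) * (not terminated)
        Q[state, action] += alpha * (target - Q[state, action])
        state = next_state
        done = terminated or truncated
```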
Year | Paper | Environment (Name & Goal) | Environment GIF | Plots |
---|---|---|---|---|
2013 | Playing Atari with Deep Reinforcement Learning | ALE/Pong-v5: You control the right paddle and compete against the left paddle controlled by the computer. Each side tries to deflect the ball away from its own goal and into the opponent's goal. | (GIF) | (plot) |
2014 | Deep Deterministic Policy Gradient (DDPG) | Pendulum-v1: The pendulum starts in a random position, and the goal is to apply torque on the free end to swing it into an upright position, with its center of gravity right above the fixed point. | (GIF) | (plot) |
2015, 2016 | Deep Reinforcement Learning with Double Q-Learning + Prioritized Experience Replay | - | - | - |
2017 | Proximal Policy Optimization (PPO) | LunarLander-v3: A classic rocket trajectory optimization problem. According to Pontryagin's maximum principle, it is optimal to fire the engine at full throttle or turn it off. | (GIF) | (plot) |
2018 | Soft Actor-Critic (SAC) | InvertedDoublePendulum-v5: The cart can be pushed left or right, and the goal is to balance the second pole on top of the first pole, which is in turn on top of the cart, by applying continuous forces to the cart. | Constant Alpha: (GIFs) | Constant Alpha: (plots) |
2017 | Mastering the Game of Go without Human Knowledge | Go: Win against a self-played adversary. | - | - |
2017 | AlphaZero | Chess: Beat traditional engines. | - | - |
2020 | Mastering Atari, Go, Chess and Shogi with a Learned Model | Multiple Environments (Planning with Models) | - | - |
20xx | AlphaFold | Protein Folding: Predict protein structures. | - | - |
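
The PPO row above relies on the clipped surrogate objective from the 2017 paper; the snippet below is a minimal NumPy rendering of that loss (log-probabilities, advantages, and the clip range are placeholder inputs, not values from these experiments):

```python
import numpy as np

def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """Clipped surrogate objective from 'Proximal Policy Optimization Algorithms' (2017).
    Returns the loss to minimize (negative of the clipped policy objective)."""
    ratio = np.exp(logp_new - logp_old)                      # pi_new(a|s) / pi_old(a|s)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -np.mean(np.minimum(unclipped, clipped))          # pessimistic (lower) bound

# toy usage with made-up numbers
logp_old = np.array([-1.0, -0.7, -2.0])
logp_new = np.array([-0.9, -0.8, -1.5])
adv = np.array([1.0, -0.5, 2.0])
print(ppo_clip_loss(logp_new, logp_old, adv))
```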