Code for the paper "An Information Theory of Compute-Optimal Size Scaling, Emergence, and Plateaus in Language Models," presented at the Workshop on Machine Learning and Compression, NeurIPS 2024.
TLDR: We present a simple, unified graph framework that explains compute-optimal size scaling, emergent capabilities, and performance plateauing in language models, using tools from iterative decoding in information theory and from random network theory.
Abstract: Recent empirical studies show three phenomena with increasing size of language models: compute-optimal size scaling, emergent capabilities, and performance plateauing. We present a simple unified mathematical framework to explain all of these language model scaling phenomena, building on recent skill-text bipartite graph frameworks for semantic learning. Modeling the learning of concepts from texts as an iterative process yields an analogy to iterative decoding of low-density parity-check (LDPC) codes in information theory. Thence, drawing on finite-size scaling characterizations of LDPC decoding, we derive the compute-optimal size scaling (Chinchilla rule) for language models. Further, using tools from random network theory, we provide a simple explanation for both the emergence of complex skills and the plateauing of performance as the size of language models scales; notably, the framework exhibits multiple plateaus.
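For intuition, below is a minimal, self-contained Python sketch of a single-level skill-text peeling process of the kind the abstract describes: skills play the role of erased symbols, texts play the role of parity checks, and a skill is learned once some text connects it to otherwise-known skills. This is an illustrative toy, not the code in the notebook; the graph parameters (2000 skills, 3 skills per text, a 5% seed of initially known skills) are assumed values chosen for the demo, not values from the paper.

import numpy as np

rng = np.random.default_rng(0)

def learned_fraction(n_skills=2000, n_texts=2000, skills_per_text=3, seed_frac=0.05):
    """Fraction of skills learned after iterative (peeling-style) learning.

    Each text links to `skills_per_text` random skills; a small seed fraction
    of skills starts out known. In each pass, any text whose linked skills are
    all known except one teaches that remaining skill -- the same update rule
    as the peeling decoder for LDPC codes over the erasure channel.
    """
    texts = [rng.choice(n_skills, size=skills_per_text, replace=False)
             for _ in range(n_texts)]
    known = np.zeros(n_skills, dtype=bool)
    seed = rng.choice(n_skills, size=int(seed_frac * n_skills), replace=False)
    known[seed] = True

    changed = True
    while changed:
        changed = False
        for skill_ids in texts:
            unknown = skill_ids[~known[skill_ids]]
            if unknown.size == 1:          # all-but-one skill known: peel it
                known[unknown[0]] = True
                changed = True
    return known.mean()

# Sweep the text-to-skill ratio: the learned fraction stays near the seed
# level, then rises sharply past a threshold and saturates near one.
for ratio in (0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 4.0):
    frac = learned_fraction(n_texts=int(ratio * 2000))
    print(f"texts/skills = {ratio:.1f}  ->  learned skill fraction = {frac:.3f}")

Sweeping the text-to-skill ratio shows a single-level analogue of emergence followed by a plateau; see the notebook for the full analysis, including the multiple-plateau behavior described in the abstract.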
Notebook: info_theory_size_scaling_plateaus.ipynb
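Assuming a standard Jupyter installation with the usual scientific Python stack, the notebook can be opened with, e.g.:

jupyter notebook info_theory_size_scaling_plateaus.ipynb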
Citation:
@inproceedings{nayak2024information,
  title={An Information Theory of Compute-Optimal Size Scaling, Emergence, and Plateaus in Language Models},
  author={Nayak, Anuj K. and Varshney, Lav R.},
  booktitle={Workshop on Machine Learning and Compression, NeurIPS 2024},
  year={2024}
}