An information-theoretic explanation for several empirical phenomena in language models
language-models ldpc-codes emergence chinchilla plateaus size-scaling skill-text training-compute-scaling
Updated Feb 18, 2025 - Jupyter Notebook