Grokking

Author: Chris Endemann

Published: July 26, 2024

About this resource

The verb “to grok” was coined by Robert A. Heinlein in his 1961 science fiction novel “Stranger in a Strange Land,” where it meant to understand something so thoroughly that it becomes a part of oneself. In machine learning, “grokking” refers to the phenomenon in which a model, after extensive training, suddenly shifts from merely memorizing its training data to a deeper representation of the task that generalizes effectively to new, unseen data. Observing this phenomenon requires training for far more iterations than would normally be expected for a model to reach acceptable performance. Throughout this prolonged period, generalization shows no improvement, which makes the eventual transition to grokking both surprising and significant.
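To make the setup concrete, below is a minimal sketch of the kind of experiment in which grokking has been observed: a small network trained on a tiny algorithmic dataset (here, modular addition) for far more steps than training accuracy alone would justify. The model architecture, hyperparameters, and train/validation split are illustrative assumptions rather than the paper's exact configuration, and whether or when the validation jump appears is sensitive to choices such as weight decay and the training fraction.

```python
# Sketch of a grokking-style experiment: a small model on a small
# algorithmic dataset, trained long past training-set saturation.
# Details (MLP instead of a transformer, specific hyperparameters)
# are assumptions for illustration, not Power et al.'s exact setup.
import torch
import torch.nn as nn

torch.manual_seed(0)
P = 97  # modulus for the toy task: predict (a + b) mod P

# Enumerate all (a, b) pairs and their labels, then split train/val.
pairs = torch.cartesian_prod(torch.arange(P), torch.arange(P))
labels = (pairs[:, 0] + pairs[:, 1]) % P
perm = torch.randperm(len(pairs))
n_train = int(0.4 * len(pairs))  # train on a small fraction of the table
train_idx, val_idx = perm[:n_train], perm[n_train:]

# A small MLP over concatenated operand embeddings.
embed = nn.Embedding(P, 64)
model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, P))
params = list(embed.parameters()) + list(model.parameters())
# Weight decay is reported to be important for triggering grokking.
opt = torch.optim.AdamW(params, lr=1e-3, weight_decay=1.0)
loss_fn = nn.CrossEntropyLoss()

def accuracy(idx):
    with torch.no_grad():
        x = embed(pairs[idx]).flatten(1)  # (N, 2, 64) -> (N, 128)
        return (model(x).argmax(-1) == labels[idx]).float().mean().item()

# Train far past the point where training accuracy saturates; validation
# accuracy can stay near chance for a long time before jumping upward.
for step in range(50_000):
    batch = train_idx[torch.randint(len(train_idx), (512,))]
    x = embed(pairs[batch]).flatten(1)
    loss = loss_fn(model(x), labels[batch])
    opt.zero_grad()
    loss.backward()
    opt.step()
    if step % 1000 == 0:
        print(f"step {step}: train acc {accuracy(train_idx):.2f}, "
              f"val acc {accuracy(val_idx):.2f}")
```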

The grokking phenomenon was investigated in a notable paper from OpenAI (Power et al., 2021). The video below explains OpenAI's findings on this transition to generalization, highlighting the dramatic increase in training iterations involved and discussing the implications for developing more robust and reliable machine learning models.

Grokking: Generalization beyond overfitting on small algorithmic datasets (paper explained)

Questions?

If you have any lingering questions about this resource, please feel free to post them to the Nexus Q&A on GitHub. We will improve the materials on this website as additional questions come in.