Part 6/15:
A rich part of the discussion revolves around in-context learning, the models' ability to adapt to new patterns within a session. Karpathy sees it as pattern completion within a token window, emerging spontaneously from pretraining on vast amounts of patterned data. He weighs whether in-context learning relies on an internal gradient-descent-like loop, citing papers demonstrating that models can mimic linear regression or run other simple algorithms over their context.
He suggests that in-context learning might amount to a tiny gradient descent process running inside the layers, an internal algorithmic adaptation that blurs the line between pattern recognition and learning. The larger question: why does in-context learning seem to produce more generalizable, intelligent behavior than static pretraining does?
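To make the "gradient descent inside the layers" idea concrete, here is a minimal numerical sketch (not from the conversation itself) of the construction used in papers in this line, e.g. von Oswald et al. (2023): one gradient descent step on in-context linear regression produces exactly the same prediction as a single softmax-free (linear) self-attention readout. The toy data, variable names, and learning rate are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 4, 16                      # feature dimension, number of in-context examples
w_true = rng.normal(size=d)       # hidden linear rule the "context" encodes
X = rng.normal(size=(n, d))       # in-context inputs
y = X @ w_true                    # in-context targets
x_q = rng.normal(size=d)          # query point to predict

eta = 0.1                         # learning rate of the "inner" gradient step

# (a) One explicit gradient descent step on L(w) = 0.5 * sum_i (w . x_i - y_i)^2,
#     starting from w0 = 0, then predict on the query.
w0 = np.zeros(d)
grad = X.T @ (X @ w0 - y)         # gradient at w0 (= -X^T y here)
w1 = w0 - eta * grad              # = eta * X^T y
pred_gd = w1 @ x_q

# (b) The same prediction written as linear attention: the query x_q attends to
#     keys x_i with values eta * y_i, with no softmax.
scores = X @ x_q                  # dot-product "attention scores"
pred_attn = scores @ (eta * y)    # weighted sum of values

print(pred_gd, pred_attn)         # identical up to floating-point error
assert np.allclose(pred_gd, pred_attn)
```

The equivalence is what gives the claim its force: an attention layer does not need an explicit optimizer to behave as if it had taken a learning step on the examples sitting in its context.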