Part 7/8:
The principle of superposition suggests that, rather than dedicating one neuron to each fact, a model can store many features along overlapping directions in a high-dimensional space. Because that space admits far more "nearly perpendicular" directions than it has dimensions, LLMs can accommodate significantly more information than a one-feature-per-neuron view would predict.
This dimensional flexibility may partly explain why these models scale so well: the number of nearly perpendicular directions that fit in a space grows exponentially with its dimension. So even a modest increase in a model's dimensionality can multiply the number of distinct concepts it can encode, and a tenfold increase yields a vastly larger conceptual capacity rather than merely ten times as much.
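This is easy to check numerically. The sketch below (an illustrative experiment, not from the original text) samples far more random unit vectors than the space has dimensions and measures the largest pairwise cosine similarity; in high dimensions, random directions turn out to be nearly perpendicular to one another.

```python
import numpy as np

rng = np.random.default_rng(0)

def max_abs_cosine(n_vectors: int, dim: int) -> float:
    """Largest |cosine similarity| among random unit vectors.

    For random directions in dimension d, pairwise cosines concentrate
    around 0 with spread ~1/sqrt(d), so high-dimensional spaces can hold
    many more nearly perpendicular directions than they have axes.
    """
    v = rng.normal(size=(n_vectors, dim))
    v /= np.linalg.norm(v, axis=1, keepdims=True)  # project to unit sphere
    cos = v @ v.T                                  # all pairwise cosines
    np.fill_diagonal(cos, 0.0)                     # ignore self-similarity
    return float(np.abs(cos).max())

# Five times as many vectors as dimensions, yet every pair stays
# close to perpendicular (max |cosine| well under 0.3 for dim=1000).
print(max_abs_cosine(5_000, 1_000))
```

Raising `dim` shrinks the worst-case overlap further, which is the geometric intuition behind superposition: extra dimensions buy disproportionately many new "slots" for features.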