Part 8/15:
He envisions architectures with large, sparse, long-term memory modules (for example, sparse attention mechanisms) that extend effective context and mirror human memory processes, such as consolidation during sleep or periods of reflection.
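To make the idea concrete, here is a minimal sketch of one common form of sparse attention, where each query attends only to its top-k highest-scoring slots in a long "memory" axis. The function name, top-k rule, and tensor shapes are illustrative assumptions, not a specific architecture Karpathy described.

```python
# Illustrative top-k sparse attention over a long memory (single head).
# Names and the top-k selection rule are assumptions for this sketch.
import torch
import torch.nn.functional as F

def topk_sparse_attention(q, k, v, k_top=32):
    """Attend only to the k_top highest-scoring memory slots per query.

    q: (seq_q, d) queries
    k: (seq_k, d) keys   -- seq_k can be a very long "memory" axis
    v: (seq_k, d) values
    """
    d = q.shape[-1]
    scores = q @ k.T / d**0.5                       # (seq_q, seq_k) full scores
    k_top = min(k_top, scores.shape[-1])
    top_vals, top_idx = scores.topk(k_top, dim=-1)  # keep only the strongest links
    weights = F.softmax(top_vals, dim=-1)           # normalize over the sparse set
    selected_v = v[top_idx]                         # (seq_q, k_top, d)
    return (weights.unsqueeze(-1) * selected_v).sum(dim=1)

# Example: 8 queries attending over a 10,000-slot "long-term memory"
q = torch.randn(8, 64)
memory_k = torch.randn(10_000, 64)
memory_v = torch.randn(10_000, 64)
out = topk_sparse_attention(q, memory_k, memory_v, k_top=32)
print(out.shape)  # torch.Size([8, 64])
```

Note that this toy version still computes all scores before discarding most of them; practical sparse-attention implementations avoid materializing the full score matrix, which is what makes very long memories affordable.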
The Path Toward Smaller, More Efficient Models
While today's AI models often carry hundreds of billions to trillions of parameters, Karpathy believes the core of intelligence could be captured by surprisingly small models, perhaps a billion parameters or fewer. He points to the dramatic gains of the past decade that have come from scaling models down, fine-tuning them, and improving data quality rather than merely increasing size.
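One common route to capable small models is knowledge distillation, in which a compact "student" is trained to match a larger "teacher". The sketch below illustrates that general recipe; the technique, hyperparameters, and toy data are my assumptions, not a method described in the interview.

```python
# Illustrative knowledge-distillation loss: a small student matches the
# teacher's softened output distribution while still fitting hard labels.
# Temperatures, weights, and the toy logits are placeholder assumptions.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Blend soft-target matching (teacher) with the usual hard-label loss."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# Toy example: 10-way classification with random logits standing in for models.
student_logits = torch.randn(4, 10, requires_grad=True)
teacher_logits = torch.randn(4, 10)
labels = torch.randint(0, 10, (4,))
loss = distillation_loss(student_logits, teacher_logits, labels)
loss.backward()
print(float(loss))
```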