What is Spatial Artificial Intelligence? Exploring Fei-Fei Li's Vision
Fei-Fei Li, a renowned computer scientist often referred to as the "Godmother of AI," has recently raised hundreds of millions of dollars to build a new AI company focused on spatial intelligence. Her work centers on the idea that for AI to truly understand the real world, it must not only process language but also perceive, reason, and interact within a 3D environment.
In a recent interview with the venture capital firm a16z, Li explained her vision for the future of AI, emphasizing that language alone is insufficient for building models of the world. This conversation reflects a shift in the artificial intelligence community, one that moves beyond text and into visual and spatial understanding, a field she has been passionate about for years.
The Importance of Spatial Intelligence
Spatial intelligence refers to AI’s ability to understand the physical world in terms of 3D space and time: how objects and events exist and interact in these dimensions. Fei-Fei Li believes that to make AI systems more useful and intelligent, we must train them not just to interpret language but also to perceive and interact with the real world. This is the foundation of her new company, World Labs, which aims to lead advancements in AI’s ability to navigate and act within the physical world.
Fei-Fei Li's Contributions and Impact
Fei-Fei Li's journey in AI is illustrious. Her most famous contribution is the creation of the ImageNet dataset in 2009, a massive collection of labeled images that transformed computer vision and deep learning. It enabled machines to "see" the world by training on millions of images. This breakthrough helped revolutionize fields like autonomous vehicles, medical imaging, and AI-driven object recognition.
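For readers who want a concrete picture of what "training on millions of labeled images" looks like in practice, here is a minimal sketch in PyTorch. It assumes an ImageNet-style folder of labeled images; the path, model choice, and hyperparameters are illustrative assumptions, not the original ImageNet training recipe.

```python
# A minimal sketch of ImageNet-style supervised training with PyTorch.
# The dataset path and hyperparameters are illustrative assumptions.
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms

# ImageNet-style data: one sub-folder per class, millions of labeled images.
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
])
train_set = datasets.ImageFolder("/data/imagenet/train", transform=preprocess)  # hypothetical path
train_loader = DataLoader(train_set, batch_size=256, shuffle=True, num_workers=8)

model = models.resnet18(num_classes=len(train_set.classes))
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

# One pass over the data: the model learns to map raw pixels to labels.
model.train()
for images, labels in train_loader:
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
```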
Beyond ImageNet, Li has held academic positions at prestigious institutions like Princeton and Stanford. She recently took leave from her academic role to launch World Labs, which focuses on advancing AI through spatial intelligence.
Why Now? The Perfect Time for Spatial AI
Li notes that we are in a "Cambrian explosion" of AI possibilities, where text, images, and videos are converging into new AI applications. Much of the early promise of AI has been realized, but we are only beginning to tap its full potential by bringing these capabilities into real-world contexts. More powerful computing, driven by advances in GPUs, together with increasingly sophisticated AI algorithms, has set the stage for spatial intelligence to take off.
Her co-founders, Ben Mildenhall and Christoph Lassner, have been at the forefront of related advances such as NeRF (Neural Radiance Fields), a technique that reconstructs 3D scenes from collections of 2D images and has pushed computer vision forward.
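To make the NeRF idea concrete, the sketch below captures its core mechanics: a small neural network maps a 3D point and viewing direction to a color and a density, and a pixel is rendered by compositing samples along a camera ray. This is a simplified illustration of the published technique, not World Labs' code; the network size and sampling scheme are arbitrary choices, and real NeRFs add positional encoding, hierarchical sampling, and per-scene optimization against photographs.

```python
# Toy NeRF: a neural field maps a 3-D point (plus view direction) to an RGB
# color and a density; a pixel is rendered by compositing samples along a ray.
import torch
from torch import nn

class TinyNeRF(nn.Module):
    def __init__(self, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(6, hidden), nn.ReLU(),   # input: (x, y, z) + view direction
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),              # output: (r, g, b, density)
        )

    def forward(self, xyz, view_dir):
        out = self.net(torch.cat([xyz, view_dir], dim=-1))
        rgb = torch.sigmoid(out[..., :3])      # colors in [0, 1]
        sigma = torch.relu(out[..., 3])        # non-negative density
        return rgb, sigma

def render_ray(model, origin, direction, num_samples=64, near=0.1, far=4.0):
    """Volume-render one camera ray by alpha-compositing samples front to back."""
    t = torch.linspace(near, far, num_samples)
    points = origin + t[:, None] * direction              # (num_samples, 3)
    dirs = direction.expand(num_samples, 3)
    rgb, sigma = model(points, dirs)
    delta = t[1] - t[0]                                    # uniform sample spacing
    alpha = 1.0 - torch.exp(-sigma * delta)                # opacity of each sample
    # Transmittance: how much light survives to reach each sample.
    trans = torch.cumprod(torch.cat([torch.ones(1), 1.0 - alpha + 1e-10])[:-1], dim=0)
    weights = alpha * trans                                # contribution per sample
    return (weights[:, None] * rgb).sum(dim=0)             # final pixel color

pixel = render_ray(TinyNeRF(), torch.zeros(3), torch.tensor([0.0, 0.0, 1.0]))
```

In an actual NeRF pipeline, the network weights are optimized so that rendered rays match the pixels of posed photographs of a scene; here the untrained network simply illustrates the rendering path.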
Spatial Intelligence vs. Language Models
Unlike large language models (LLMs), which operate by predicting sequences of tokens in a one-dimensional stream of text, spatial AI is fundamentally three-dimensional. According to Li, language is a generated signal, one that doesn't naturally exist in the physical world, and it is a lossy way to convey information about that world. The physical world, by contrast, follows the laws of physics and has inherent structures that AI needs to learn and model.
Spatial intelligence will allow AI to not only "see" objects in 3D but also understand their relationships and interactions in real time. This enables richer, more complex applications in augmented reality (AR), virtual reality (VR), robotics, and more.
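As a rough illustration of that difference, the toy snippet below contrasts the one-dimensional token stream an LLM consumes with a three-dimensional scene description in which spatial questions, such as which object is nearest to a robot, reduce to simple geometry. The objects and coordinates are invented for illustration.

```python
# A toy contrast between the 1-D signal an LLM consumes and a 3-D scene a
# spatial model reasons over. All ids, objects, and coordinates are made up.
import numpy as np

# Language: a one-dimensional sequence of token ids. The only native
# structure is order; everything about the world must be inferred from it.
token_ids = np.array([17, 942, 8, 3051, 67])   # shape: (sequence_length,)

# A physical scene: named objects with positions in a 3-D world frame.
# Geometry (distance, adjacency, support) is explicit rather than implied.
objects = ["table", "mug", "robot"]
positions = np.array([
    [0.0, 0.0, 0.00],   # table
    [0.1, 0.0, 0.75],   # mug resting on the table
    [2.5, 1.0, 0.00],   # robot base across the room
])                       # shape: (num_objects, 3), in meters

# Spatial questions become geometric computations: which object is
# closest to the robot, i.e. which should it reach for first?
robot = positions[objects.index("robot")]
others = [i for i, name in enumerate(objects) if name != "robot"]
distances = np.linalg.norm(positions[others] - robot, axis=1)
print("closest to robot:", objects[others[int(np.argmin(distances))]])
```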
The Future of AI: From Gaming to Augmented Reality
One exciting potential application of spatial intelligence is world generation: the creation of fully interactive 3D worlds. This could revolutionize gaming, where building virtual environments today requires immense effort and financial investment, as well as industries far beyond it. In the future, Li envisions AI dynamically generating richly detailed 3D spaces for uses ranging from education to entertainment.
The technology also has the potential to reshape how we interact with the world. For example, AR and VR devices, such as Apple's Vision Pro, could blend the physical and digital worlds, allowing users to manipulate virtual objects within real-world environments. This blending could replace the need for physical screens and devices, transforming the way we work and interact with technology.
AI and Robotics: Bringing Spatial Intelligence to Life
Perhaps most significantly, spatial AI will be crucial in the development of intelligent robots capable of functioning autonomously in real-world environments. Tesla's autonomous driving data, collected from millions of miles driven, is an example of how spatial intelligence is already being applied to teach machines to navigate physical spaces. This technology could eventually extend beyond cars to robots like Tesla's Optimus, enabling them to perform complex tasks in a variety of settings.
As Fei-Fei Li and her team continue to push the boundaries of spatial intelligence, the implications for the future of AI are vast. Whether in robotics, AR/VR, or entirely new forms of digital experiences, this next frontier could bring us closer to a world where AI doesn't just understand language but fully comprehends the spatial and physical dimensions of reality.
This shift from flat, 2D AI systems to richly contextualized, spatially aware models represents the next major leap forward in artificial intelligence—and it's only just beginning.