
a16z Podcast · December 5, 2025

What Comes After ChatGPT? The Mother of ImageNet Predicts The Future

Highlights from the Episode

Justin Johnson · AI pioneer and World Labs co-founder
00:00:26 - 00:01:03
Marble: A generative model for 3D worlds
Marble is a generative model of 3D worlds. It takes inputs like text or images and generates a 3D world that matches those inputs. While Marble is a world model building towards spatial intelligence, it was also designed to be immediately useful. We're seeing emerging use cases in gaming, VFX, and film. Marble offers interesting capabilities today as a product, while also laying the foundation for the grander world models to come.
Justin Johnson · AI pioneer and World Labs co-founder
00:04:42 - 00:05:31
Scaling compute for visual and spatial data
A key factor is the increased availability of data and computing power. The history of deep learning, in essence, is the history of scaling up compute. For example, AlexNet necessitated a shift from CPUs to GPUs. Since AlexNet, we've seen a thousandfold increase in performance per card. Today, it's common to train models not just on one GPU, but on hundreds, thousands, or even tens of thousands. The computing power we can leverage for a single model is now a million times greater than at the start of my PhD. Language models have shown significant progress recently. However, as we move towards visual, spatial, and world data, the processing demands increase substantially. This new, abundant computing power will be crucial for handling these demands.
Justin Johnson · AI pioneer and World Labs co-founder
00:10:17 - 00:11:24
Academia's evolving role in AI research
The role of academia, especially in AI, has shifted significantly in the last decade. This isn't negative; it's due to technological growth. Five or ten years ago, state-of-the-art models could be trained in a lab with just a couple of GPUs. However, because this technology was so successful and scaled up, you can no longer train such models with limited resources. This is a positive development, indicating the technology worked. Consequently, expectations for academics must shift. Our focus shouldn't be on training the biggest models, but rather on exploring novel and unconventional ideas, most of which may not succeed. There's much to be done in this area. I'm concerned that too many academics are overly focused on trying to mimic large-scale model training or treating academia as vocational training for industry labs. There's immense potential for innovation in new algorithms, architectures, and systems, even for individuals.
Justin Johnson · AI pioneer and World Labs co-founder
00:13:38 - 00:14:14
Hardware scaling limits and future breakthroughs
Yes and no. Even going from Hopper to Blackwell, performance per watt stays roughly flat; the gains come mainly from more transistors, larger chips, and higher power draw. So we're already running into a scaling limit in performance per watt. I believe there's an opportunity for innovation here. I don't know precisely what it looks like, and it's not something achievable in a three-month startup cycle. But if you dedicate a couple of years to it, you might achieve significant breakthroughs. This kind of long-range research is perfectly suited for academia.
Justin Johnson · AI pioneer and World Labs co-founder
00:27:07 - 00:27:51
AI vs. Human Intelligence: A different understanding
Understanding these models presents a unique challenge. Their intelligence differs significantly from human intelligence. Humans believe they understand things because they can introspect their own thought processes, and we assume others' thought processes are similar, inferring their internal mental states from their behavior. That is how we come to feel we understand both ourselves and each other. These models, however, represent an alien form of intelligence. They exhibit fascinating behaviors, but their internal cognition or self-reflection, if it exists at all, is entirely unlike our own.
Justin Johnson · AI pioneer and World Labs co-founder
00:42:03 - 00:42:30
Emergent use cases for Marble's horizontal technology
I was joking around, posting a video on Slack about using Marble to plan your next kitchen remodel. It actually works great for this already. You just take two images of your kitchen, reconstruct it in Marble, and then use the editing features to visualize changes to countertops, floors, or cabinets. We didn't build anything specific for this use case, but because it's a powerful horizontal technology, these emergent use cases naturally arise from the model.
Fei-Fei Li · AI pioneer, Stanford professor, creator of ImageNet
00:43:26 - 00:45:47
Defining spatial intelligence and its complementarity to language
AI, as a field, is largely inspired by human intelligence. We are the most intelligent animals known in the universe, and human intelligence is multifaceted. The psychologist Howard Gardner coined the term "multiple intelligences" in the 1980s to describe this, encompassing linguistic, spatial, logical, and emotional intelligence. I view spatial intelligence as complementary to linguistic intelligence, not in opposition to some vague "traditional" notion of intelligence. Spatial intelligence is the ability to reason, understand, move, and interact in space. Consider the deduction of the structure of DNA: much of it involved spatially reasoning about molecules and chemical bonds in 3D to conjecture a double helix. That ability, demonstrated by Francis Crick and James Watson, is difficult to reduce to pure language. Even daily tasks, like grasping a mug, are deeply spatial. Seeing the mug, its context, and my hand, geometrically matching my hand to the mug, and then touching the correct points is all profoundly spatial. I can use language to describe it, but language alone cannot enable you to pick up a mug.
Justin Johnson · AI pioneer and World Labs co-founder
00:57:16 - 00:57:46
Transformers as models of sets, not just sequences
There's a bit of technical confusion here, but Transformers have already resolved it. Transformers are not models of sequences; they are natively models of sets, which is very powerful. Transformers evolved from earlier architectures based on recurrent neural networks (RNNs), and RNNs do have a built-in architectural bias towards modeling one-dimensional sequences. Transformers, however, are simply models of sets, and those sets can be one-dimensional sequences or other structures entirely.
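To make the set-versus-sequence point concrete, here is a minimal, illustrative sketch (not code from the episode or from World Labs), assuming PyTorch: plain self-attention with no positional encodings is permutation-equivariant, so shuffling the input tokens just shuffles the output the same way.

# Minimal sketch: self-attention without positional encodings
# is permutation-equivariant, i.e. it treats its input as a set
# of tokens rather than a one-dimensional sequence.
import torch
import torch.nn as nn

torch.manual_seed(0)
attn = nn.MultiheadAttention(embed_dim=16, num_heads=4, batch_first=True)
attn.eval()  # disable dropout

x = torch.randn(1, 5, 16)   # one batch, five tokens, 16-dim embeddings
perm = torch.randperm(5)    # a random reordering of the tokens

with torch.no_grad():
    out, _ = attn(x, x, x)             # attend over the original order
    x_p = x[:, perm]
    out_p, _ = attn(x_p, x_p, x_p)     # attend over the shuffled tokens

# Shuffling the inputs only shuffles the outputs the same way:
# token order carried no information.
print(torch.allclose(out[:, perm], out_p, atol=1e-5))  # True

This is why the same attention block can consume text tokens, image patches, or points in a 3D scene: any sequence structure comes from the positional encodings added to the data, not from the architecture itself.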
