GEOMETRIC DEEP LEARNING BLUEPRINT
Audio Brief
This episode explores Geometric Deep Learning, a framework unifying architectures like CNNs, GNNs, and Transformers through the mathematics of symmetry and geometry.
There are four key takeaways from this discussion. Geometric Deep Learning offers a unified framework for the major deep learning architectures, grounded in symmetry. Incorporating geometric priors improves model generalization and data efficiency, which is crucial for high-dimensional problems. The "Hardware Lottery" significantly shapes an algorithm's practical success, favoring methods that map well onto existing GPU architectures. Finally, there is a fundamental trade-off between explicitly modeling a problem's structure and learning it from large datasets.
Geometric Deep Learning provides a principled view, showing how CNNs, GNNs, and Transformers are instances of a general design rooted in symmetry, local pooling, and geometric stability. This first-principles mindset helps practitioners understand and design more robust models, moving beyond ad-hoc approaches.
Embedding assumptions about data symmetries directly into model architecture, known as geometric priors, is essential. This approach counters the "curse of dimensionality" and enables models to generalize and extrapolate effectively beyond their training data distributions.
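To make this concrete, here is a minimal NumPy sketch (not from the episode; the function names and array sizes are illustrative) checking the two symmetry properties the speakers keep returning to: a sum readout over a set is invariant to permuting its elements, and a circular 1-D convolution is equivariant to shifting its input.

```python
import numpy as np

rng = np.random.default_rng(0)

# Permutation invariance: a sum readout over a set of feature vectors.
def sum_readout(x):
    """Aggregate a set of feature vectors; the row order must not matter."""
    return x.sum(axis=0)

x = rng.normal(size=(5, 8))                 # 5 set elements, 8 features each
perm = rng.permutation(5)
assert np.allclose(sum_readout(x), sum_readout(x[perm]))      # invariant

# Translation equivariance: circular 1-D convolution (a CNN layer in miniature).
def circ_conv(signal, kernel):
    """Correlate the kernel against every circular shift of the signal."""
    return np.array([np.dot(np.roll(signal, -i)[:len(kernel)], kernel)
                     for i in range(len(signal))])

signal, kernel, shift = rng.normal(size=16), rng.normal(size=3), 5
shift_then_conv = circ_conv(np.roll(signal, shift), kernel)
conv_then_shift = np.roll(circ_conv(signal, kernel), shift)
assert np.allclose(shift_then_conv, conv_then_shift)          # equivariant
```

Constraining a model to satisfy identities like these shrinks the hypothesis space it has to search, which is exactly how geometric priors buy generalization and data efficiency.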
An algorithm's success is often heavily influenced by its fit with available hardware. Models designed for dense matrix operations, like Transformers, have gained widespread adoption partly due to their compatibility with GPU architectures, effectively winning the "Hardware Lottery."
Machine learning involves a constant balance between explicit modeling and data-driven learning. While large datasets can help models learn symmetries, building symmetries directly into the architecture is often more efficient and necessary, especially for domains with vast symmetry groups.
Ultimately, building machine learning on solid mathematical foundations like geometric priors will unlock its full potential for scientific breakthroughs and robust generalization.
Episode Overview
- This episode introduces Geometric Deep Learning (GDL), a principled framework that unifies popular architectures like CNNs, GNNs, and Transformers by grounding them in the mathematics of symmetry and geometry.
- The speakers argue that incorporating geometric priors (inductive biases) is essential for overcoming the "curse of dimensionality" and enabling machine learning models to generalize and extrapolate more effectively.
- The discussion covers the theoretical underpinnings of deep learning, including the power of depth through composition, the limitations of current models in extrapolation, and the practical trade-offs between architectural design and learning from data.
- Key concepts like the "Hardware Lottery" are explored, explaining how a model's compatibility with GPU architecture, not just its theoretical merit, can drive its success and adoption.
- The conversation concludes by examining practical applications in science and reinforcement learning, and delves into the philosophical nature of intelligence, reframing it as the ability to abstract and generalize information.
Key Concepts
- Geometric Priors & Inductive Bias: The core idea is to embed assumptions about the data's structure—specifically its symmetries (like translation, rotation, permutation)—into the model architecture. This helps combat the curse of dimensionality by constraining the function space.
- Geometric Deep Learning Blueprint: A unified perspective viewing popular deep learning models (CNNs, GNNs, Transformers) as instances of a general design based on symmetry (invariance/equivariance), local pooling (scale separation), and geometric stability.
- Erlangen Program: The historical inspiration for GDL. Felix Klein's 1872 program unified different geometries by studying the properties that remain invariant under groups of transformations; GDL applies the same mindset to machine learning.
- Symmetry (Invariance & Equivariance): Invariance means the output does not change when the input is transformed (e.g., an image's class label is unchanged by rotation). Equivariance means the output transforms in the same predictable way as the input (e.g., shifting an image shifts its segmentation mask accordingly).
- Depth Separation: The theoretical advantage of deep networks over shallow ones. Deep networks compose functions, which lets them represent certain complex functions exponentially more efficiently than shallow networks that merely add terms together (illustrated numerically in the tent-map sketch after this list).
- Extrapolation Challenge: A fundamental problem in ML where models, as powerful interpolators, struggle to generalize outside their training data distribution. Building in the right inductive biases is a key approach to enabling robust extrapolation.
- Vector Spaces & Non-Euclidean Geometries: While continuous vector spaces are dominant due to computational convenience and their fit for gradient-based optimization, some data (like hierarchies or trees) are better represented in non-Euclidean spaces, such as hyperbolic spaces.
- Transformers as Graph Neural Networks: A Transformer can be understood as a specific type of GNN operating on a fully connected graph over its inputs (e.g., the words in a sentence), with the attention mechanism learning the graph's edge weights (see the attention-as-message-passing sketch after this list).
- The Hardware Lottery: The concept that an algorithm's success is often heavily influenced by its compatibility with available hardware (e.g., GPUs). Models with dense matrix operations, like Transformers, have "won" the current lottery.
- Geometric Stability: A relaxation of strict equivariance. It ensures that if an input is slightly perturbed (close to a symmetric transformation), the output will also only change slightly. This is crucial for handling the approximate symmetries found in real-world data.
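On depth separation, the following numerical sketch (not from the episode, but in the spirit of classical depth-separation constructions) composes a two-piece "tent" function with itself. Each extra layer of composition doubles the number of linear pieces, so a shallow piecewise-linear model of fixed width cannot keep up.

```python
import numpy as np

def tent(x):
    """A two-piece 'tent' function; each piece is linear, like a tiny ReLU layer."""
    return np.where(x <= 0.5, 2.0 * x, 2.0 * (1.0 - x))

def compose(f, depth):
    """Apply f to its own output `depth` times: one extra layer per composition."""
    def g(x):
        for _ in range(depth):
            x = f(x)
        return x
    return g

xs = np.linspace(0.0, 1.0, 100001)
for depth in (1, 2, 3, 4, 5):
    ys = compose(tent, depth)(xs)
    slope_signs = np.sign(np.diff(ys))
    pieces = np.count_nonzero(np.diff(slope_signs) != 0) + 1
    print(f"depth {depth}: {pieces} linear pieces")   # doubles with every layer (2**depth)
```

A one-hidden-layer piecewise-linear network needs roughly one unit per linear piece in 1-D, so matching the depth-5 curve by "addition" alone takes on the order of 2^5 units; this is the "composition rather than addition" point made at 86:41.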
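On the "Transformers as GNNs" view, this minimal NumPy sketch (illustrative names and sizes, not from the episode) computes single-head self-attention as message passing on a fully connected graph: the softmax attention matrix acts as a dense matrix of learned edge weights, and the layer is permutation-equivariant, just like a GNN layer.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)   # stabilize before exponentiating
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention_as_message_passing(x, w_q, w_k, w_v):
    """Single-head self-attention read as a GNN layer on a complete graph.

    The (n, n) attention matrix plays the role of learned edge weights; the
    weighted sum of value vectors aggregates messages from all 'neighbors'.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    edge_weights = softmax(q @ k.T / np.sqrt(k.shape[-1]))   # dense "adjacency"
    return edge_weights @ v, edge_weights

rng = np.random.default_rng(0)
n, d = 6, 4                                   # 6 tokens, 4-dimensional embeddings
x = rng.normal(size=(n, d))
w_q, w_k, w_v = [rng.normal(size=(d, d)) for _ in range(3)]
out, edges = attention_as_message_passing(x, w_q, w_k, w_v)
assert np.allclose(edges.sum(axis=1), 1.0)    # each node's incoming weights sum to 1

perm = rng.permutation(n)                     # reordering tokens reorders outputs
out_p, _ = attention_as_message_passing(x[perm], w_q, w_k, w_v)
assert np.allclose(out_p, out[perm])          # permutation-equivariant, like a GNN
```

Because the whole layer reduces to a few dense matrix multiplications, it also maps directly onto GPUs, which is the "Hardware Lottery" point above.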
Quotes
- At 0:42 - "Symmetry, as wide or narrow as you may define its meaning, is one idea by which man through the ages has tried to comprehend and create order, beauty, and perfection." - Dr. Scarfe quotes German mathematician Hermann Weyl to emphasize the foundational importance of symmetry.
- At 4:05 - "High-dimensional learning is impossible due to the curse of dimensionality. It only works if we make some very strong assumptions about the regularities of the space of functions that we need to search through." - Dr. Scarfe explains the fundamental challenge that necessitates the use of priors in machine learning.
- At 24:57 - "I like to think of geometric deep learning as not a single method or architecture, but as a mindset. It's a way of looking at machine learning problems from the first principles of symmetry and invariance." - Michael Bronstein defines the core philosophy of geometric deep learning.
- At 25:57 - "What I often see in deep learning when deep learning is taught is that it appears as a bunch of hacks with weak or no justification." - Michael Bronstein critiques the common ad-hoc approach to teaching deep learning, which their principled framework aims to correct.
- At 26:30 - "The knowledge of principles easily compensates the lack of knowledge of facts." - Michael Bronstein quotes Claude-Adrien Helvétius to underscore the value of a first-principles approach over memorizing disparate facts or methods.
- At 57:23 - "I do, however, believe that in order to make progress to the next level and make machine learning achieve its potential... it must be built on solid mathematical foundations." - Michael Bronstein asserting the critical importance of mathematical rigor for the future advancement of machine learning.
- At 57:38 - "I also think that machine learning will drive future scientific breakthroughs... a good litmus test would be a Nobel prize awarded for a discovery made by or with the help of an ML system. It might already happen in the next decade." - Michael Bronstein on his prediction for the transformative impact of machine learning on fundamental science.
- At 86:41 - "This is really understanding which class of functions benefit fundamentally from composition rather than from addition." - This succinctly frames the core difference between deep and shallow learning architectures.
- At 89:23 - "The conservative answer of a statistical learning person would be no, because we don't have good theorems right now that tell us that this is the case." - The speaker provides the standard theoretical answer to the question of whether neural networks can extrapolate.
- At 117:17 - "Vectors are probably the most convenient representation for both humans and computers. We can do algebraic arithmetic operations with them... they are also continuous objects, so it is very easy to use continuous optimization techniques in vector spaces." - Michael Bronstein outlines the practical and mathematical advantages of using vector spaces.
- At 119:46 - "In graph learning, spaces with other more exotic geometries such as hyperbolic spaces have recently become popular... you can see that in certain types of graphs, the number of neighbors grows exponentially with the radius." - Michael Bronstein on why non-Euclidean geometries are necessary for certain data types (see the short calculation after the quotes).
- At 126:26 - "It's always this question of the tradeoff between how much you model and how much you learn... machine learning is always the second best solution." - Michael Bronstein on the fundamental tradeoff between incorporating strong priors (modeling) and learning from data.
- At 146:58 - "To me it makes sense to model as much as possible and learn what is hard or impossible to model." - Michael Bronstein sharing his guiding principle on the balance between explicit modeling and data-driven learning.
- At 147:58 - "You can compensate for explicitly not accounting for rotational symmetry with data augmentation and more complex architectures and larger training sets." - Michael Bronstein explaining how hardware efficiency allows models to learn symmetries from data instead of having them hard-coded.
- At 149:35 - "They can be seen as the graph neural network that has won the current hardware lottery." - Petar Veličković describing Transformers as a GNN variant that is perfectly suited for current hardware, explaining its widespread adoption.
- At 157:28 - "In some cases you don't have a choice. So graphs are a great example... you're never ever going to be able to exhaustively sample that group, and so it's better to just build it in." - Taco Cohen explaining that for domains with massive symmetry groups, building the symmetry into the architecture is the only practical approach.
- At 178:41 - "Essentially, locality is a feature, not a bug, in many situations." - Michael Bronstein arguing that local operations, a core part of many successful deep learning models, are a key strength rather than a limitation.
- At 181:25 - "These architectures that do message passing, but in a way that is equivariant to these rigid transformations, actually are more successful than generic graph neural networks." - Michael Bronstein on the success of building in physical symmetries, as seen in applications like virtual drug screening and AlphaFold.
- At 198:16 - "The question of whether a computer can think is about as relevant as whether a submarine can swim." - Petar Veličković quoting Edsger Dijkstra to argue that the goal of AI is not to perfectly mimic human intelligence but to solve problems effectively.
- At 201:22 - "I would define it as the faculty to abstract information... and the ability to use it in other contexts." - Michael Bronstein offering his perspective on defining intelligence as the ability to abstract and generalize knowledge.
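The exponential-growth remark at 119:46 can be made precise with a short calculation (not from the episode): the number of nodes within r hops of the root of a b-ary tree grows exponentially in r, which matches the volume growth of a hyperbolic ball but not of a Euclidean ball in any fixed dimension.

```latex
\begin{aligned}
\text{$b$-ary tree:}\quad &\#\{\text{nodes within $r$ hops of the root}\}
    \;=\; 1 + b + \dots + b^{r} \;=\; \tfrac{b^{r+1}-1}{b-1} \;\sim\; b^{r}\\[2pt]
\text{Euclidean ball in } \mathbb{R}^{n}:\quad &\operatorname{vol} B(r) \;\propto\; r^{n}
    \quad\text{(polynomial in } r\text{)}\\[2pt]
\text{Hyperbolic disc in } \mathbb{H}^{2}:\quad &\operatorname{area} B(r) \;=\; 2\pi\,(\cosh r - 1)
    \;\sim\; \pi e^{r} \quad\text{(exponential in } r\text{)}
\end{aligned}
```

This mismatch in growth rates is the usual motivation for embedding trees and other hierarchical data in hyperbolic rather than low-dimensional Euclidean space.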
Takeaways
- Adopt a "mindset of first principles" by identifying the underlying symmetries and geometric structures of a problem before selecting or designing a model.
- Use geometric priors to build more data-efficient and generalizable models, especially when data is scarce or high-dimensional.
- View popular architectures like CNNs and Transformers not as disparate inventions, but as specific applications of a general geometric blueprint.
- Recognize the fundamental trade-off between explicitly modeling a problem's structure and relying on large datasets and computation to learn it.
- When choosing an architecture, consider the "Hardware Lottery"—a model's practical success may stem from its compatibility with hardware like GPUs as much as its theoretical elegance.
- For domains with vast symmetry groups (like permutation in graphs), building the symmetry directly into the architecture is often more effective than trying to learn it via data augmentation.
- Don't assume Euclidean space is always the best representation; consider hyperbolic or other non-Euclidean geometries for data with hierarchical or exponential structures.
- Understand that the power of deep learning comes from the compositional nature of deep networks, which can represent certain functions far more efficiently than shallow ones.
- For real-world applications with imperfect data, aim for "geometric stability" rather than strict equivariance, ensuring models are robust to small perturbations.
- Leverage locality as a feature, not a bug. Deep, compositional models with local operations (like CNNs) have proven to be extremely effective.
- In scientific applications like chemistry or physics, building in known physical symmetries (e.g., equivariance to 3D rotation) can lead to breakthroughs and superior model performance.
- Focus on building AI that can effectively abstract, generalize, and solve problems, rather than trying to perfectly replicate the mechanisms of human thought.