If You Can't See Inside, How Do You Know It's THINKING? [Dr. Jeff Beck]

Machine Learning Street Talk · Jan 25, 2026

Audio Brief

This episode explores the intersection of physics, mathematics, and artificial intelligence, focusing on how physical symmetries and energy-based models can create more robust AI systems. There are four key takeaways from this conversation.

First, the industry is moving toward latent prediction rather than pixel-perfect generation. Current generative models often waste vast computational power attempting to predict high-frequency noise, such as individual pixels. A superior approach, exemplified by Joint-Embedding Predictive Architectures or JEPA, compresses inputs into an abstract latent space. This mimics human cognition by predicting the semantic state of the world rather than the precise arrangement of photons.

Second, geometric deep learning offers a massive efficiency gain by baking in the laws of physics. Instead of forcing a neural network to learn fundamental rules from scratch via massive datasets, geometric deep learning mathematically incorporates natural symmetries and invariances. For instance, a model should know a priori that an object remains the same even if rotated. This inductive bias allows models to represent the physical world with significantly greater accuracy and less data.

Third, data scientists must be wary of blind dimensionality reduction. Standard techniques like Principal Component Analysis or PCA often discard low-variance dimensions under the assumption that they contain little information. However, in biological and neural data, the critical manifold structure often resides precisely in these low-variance dimensions. Pre-compressing data blindly runs the risk of throwing away the very structural signal the model is trying to learn.

Fourth, the conversation reframes intelligence as a spectrum of agency defined by complexity. From a Free Energy perspective, the difference between a rock and a human is a matter of degree, not kind. Because we cannot see inside a system to prove it is planning, we adopt an Intentional Stance, modeling systems as if they are agents because it is the most efficient way to predict their behavior. This suggests that future AGI will likely be a modular collective of specialized tools working together, rather than a single monolithic brain.

Ultimately, this discussion highlights that the next leap in AI capability lies in aligning mathematical architectures with the fundamental structures of the physical world.

Episode Overview

  • Explores the intersection of physics, mathematics, and artificial intelligence, specifically how Geometric Deep Learning incorporates physical symmetries into AI models.
  • Examines the philosophical and mathematical definitions of "agency," proposing that the difference between a rock and a human is a matter of degree (complexity) rather than kind.
  • Offers a detailed technical discussion of Energy-Based Models (EBMs) and JEPA (Joint-Embedding Predictive Architecture) as superior alternatives to standard generative models.
  • Discusses the future of AGI and safety, arguing for modular, specialized intelligence over monolithic systems and using Inverse Reinforcement Learning for alignment.

Key Concepts

  • Geometric Deep Learning & Inductive Biases: Instead of forcing a neural network to learn the laws of physics from scratch via massive datasets, Geometric Deep Learning mathematically "bakes in" natural symmetries (invariances). For example, a model should know a priori that an object remains the same object even if rotated. This inductive bias makes models significantly more efficient and accurate when representing the physical world. (See the rotation-invariance sketch after this list.)

  • The Spectrum of Agency & The "Black Box": From a Free Energy perspective, there is no hard line between an object and an agent. A "rock" has a simple, reactive policy, while a "human" has a complex policy involving long time scales and internal states. Because we cannot see inside a system to prove it is "planning," we adopt an "Intentional Stance," modeling systems as if they are agents because it is the most efficient way to predict their behavior.

  • Energy-Based Models (EBMs): Standard neural networks optimize weights to map inputs to outputs. EBMs are fundamentally different because their cost function depends on inputs, outputs, and internal latent states. This allows the model to capture complex dependencies and multiple correct answers. Minimizing "Free Energy" in this context means minimizing expected energy while keeping an entropy term that rewards retained uncertainty, which acts as regularization to prevent the model from collapsing into a single, narrow prediction. (See the free-energy sketch after this list.)

  • Prediction in Latent Space (JEPA): Current generative models often waste computational power trying to predict high-frequency noise (individual pixels). The JEPA architecture (Joint-Embedding Predictive Architecture) compresses inputs into an abstract "latent space" (concepts) and performs predictions there. This mimics human cognition: we predict the semantic state of the world, not the precise arrangement of photons. (See the JEPA sketch after this list.)

  • The Manifold Trap (PCA Risks): A critical concept for data science is understanding that "low variance" does not mean "low information." Standard techniques like PCA discard low-variance dimensions. However, in biological and neural data, the most critical signal, the manifold structure, often resides in these low-variance dimensions. Pre-compressing data can accidentally throw away the structure you are trying to learn. (See the PCA sketch after this list.)

  • Modular vs. Monolithic Intelligence: Intelligence is framed not as a single "general" algorithm, but as an emergent property of specialized modules communicating effectively. Just as the brain has distinct circuits for vision and motor control, future AGI will likely be a collective system of specialized tools working together, rather than one giant "brain."
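
The short sketches that follow are not code from the episode; they are minimal illustrations of the concepts above. This first one shows the "bake the symmetry in" idea from Geometric Deep Learning in plain NumPy: averaging any feature extractor over the C4 group of 90-degree rotations makes its output exactly invariant to those rotations, so the model never has to learn that fact from data. The random-projection features() function is a hypothetical stand-in for a real network.

    import numpy as np

    def features(img):
        """A hypothetical stand-in for a learned feature extractor (fixed random projection)."""
        rng = np.random.default_rng(0)               # fixed seed so every call uses the same weights
        W = rng.normal(size=(16, img.size))
        return W @ img.ravel()

    def c4_invariant_features(img):
        """Average the features over all four 90-degree rotations (the C4 group).
        The set of rotated copies is the same for any rotation of the input,
        so the averaged feature vector is exactly rotation-invariant."""
        rotations = [np.rot90(img, k) for k in range(4)]
        return np.mean([features(r) for r in rotations], axis=0)

    img = np.random.rand(8, 8)
    a = c4_invariant_features(img)
    b = c4_invariant_features(np.rot90(img))          # same image, rotated 90 degrees
    print(np.allclose(a, b))                          # True: the symmetry is built in, not learned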
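
Next, a toy numerical reading of the Energy-Based Model point, assuming SciPy is available. With a handful of discrete latent states and a made-up energy table, the variational free energy (expected energy minus a temperature-weighted entropy) is lower for a Boltzmann spread over near-equivalent latents than for a distribution collapsed onto a single latent, which is the anti-collapse regularization described above.

    import numpy as np
    from scipy.special import logsumexp, softmax

    # Hypothetical energies E(x, y, z) for one fixed (input, output) pair and 5 discrete latents z.
    E = np.array([1.2, 1.3, 1.25, 3.0, 4.0])
    T = 1.0                                        # temperature weighting the entropy term

    def free_energy(q, E, T):
        """Variational free energy of a latent distribution q: expected energy minus T * entropy."""
        q = np.clip(q, 1e-12, 1.0)
        entropy = -(q * np.log(q)).sum()
        return float(q @ E - T * entropy)

    q_collapsed = np.eye(len(E))[np.argmin(E)]     # all mass on the single lowest-energy latent
    q_boltzmann = softmax(-E / T)                  # the distribution that minimizes free energy

    print(free_energy(q_collapsed, E, T))          # = min(E) = 1.2
    print(free_energy(q_boltzmann, E, T))          # ≈ 0.07, lower: keeping uncertainty is rewarded
    print(-T * logsumexp(-E / T))                  # closed form -T * log Z agrees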
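
The JEPA concept can be sketched in a few lines of PyTorch. This is schematic, not the actual I-JEPA implementation: random noise stands in for the context and target views, the target branch is held fixed with a stop-gradient, and the loss compares predicted and actual embeddings rather than pixels.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    dim_in, dim_z = 32, 8
    context_encoder = nn.Sequential(nn.Linear(dim_in, 64), nn.ReLU(), nn.Linear(64, dim_z))
    target_encoder  = nn.Sequential(nn.Linear(dim_in, 64), nn.ReLU(), nn.Linear(64, dim_z))
    predictor       = nn.Linear(dim_z, dim_z)
    opt = torch.optim.Adam(list(context_encoder.parameters()) + list(predictor.parameters()), lr=1e-3)

    for step in range(100):
        x_context = torch.randn(16, dim_in)                    # stand-in for a masked / past view
        x_target  = x_context + 0.1 * torch.randn(16, dim_in)  # stand-in for the full / future view

        with torch.no_grad():                                  # stop-gradient: the target is just a target
            z_target = target_encoder(x_target)

        z_pred = predictor(context_encoder(x_context))
        loss = F.mse_loss(z_pred, z_target)                    # loss lives in latent space, not pixel space

        opt.zero_grad()
        loss.backward()
        opt.step()
        # In practice the target encoder is an exponential-moving-average copy of the
        # context encoder (as in I-JEPA); that update is omitted here for brevity.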
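
Finally for this list, the manifold-trap warning can be demonstrated directly with synthetic data (the numbers here are invented): the label lives entirely in a low-variance dimension, and keeping only the top two principal components throws it away.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 2000
    label = rng.integers(0, 2, n)                             # the structure we actually care about

    # Two high-variance nuisance dimensions plus one low-variance dimension that carries the label.
    X = np.column_stack([
        10.0 * rng.normal(size=n),                            # variance ~100, uninformative
        10.0 * rng.normal(size=n),                            # variance ~100, uninformative
        0.1 * (2 * label - 1) + 0.02 * rng.normal(size=n),    # tiny variance, all of the signal
    ])

    # "Blind" PCA: keep the top two components by variance.
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    X_pca2 = Xc @ Vt[:2].T                                    # the low-variance axis is discarded

    def best_axis_accuracy(Z, label):
        """How well a simple mean-split threshold on any single axis recovers the label."""
        best = 0.0
        for j in range(Z.shape[1]):
            pred = (Z[:, j] > Z[:, j].mean()).astype(int)
            acc = (pred == label).mean()
            best = max(best, acc, 1 - acc)
        return best

    print(best_axis_accuracy(X, label))        # ~1.0: the signal is there in the raw data
    print(best_axis_accuracy(X_pca2, label))   # ~0.5: PCA threw the signal away with the "noise"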

Quotes

  • At 0:40 - "If you want to have a good model of the world as it actually is, it should incorporate those features [symmetries]. You can discover it in a brute-forcey way, but the mathematician in me really wants to build the symmetries in." - Explaining the core philosophy of Geometric Deep Learning.
  • At 2:02 - "If your definition of an agent is something that executes a policy, then anything is an agent. A rock is an agent." - Highlighting the difficulty of defining agency without referencing internal processing or counterfactuals.
  • At 9:52 - "Science is about prediction and data compression, and nothing else." - Providing the pragmatic justification for why we treat complex systems as 'agents'—it compresses the data required to predict them.
  • At 14:04 - "In a traditional neural network... the cost function is just a function of the inputs and the outputs... In an energy-based model, there's another thing that the cost function operates on, and that's... one of the internal states of your model." - Clarifying the technical distinction that allows EBMs to handle more complex reasoning.
  • At 25:24 - "It turns out that in neural data, the dimensions in which there is very little variability are some of the most important dimensions. And so pre-processing with PCA runs a risk of throwing out the most valuable information in your dataset." - A warning against standard dimensionality reduction techniques in complex systems.
  • At 44:35 - "It was really about... interconnected highly specialized intelligences... and their ability to learn how to work together... that gave rise to the technological revolution." - Framing intelligence as a collaborative network of modules rather than a singular entity.
  • At 45:58 - "Your reward function is the one that results in the same outcome that we currently have... You perturb that distribution over outcomes a little bit, and then you evaluate the consequences." - Proposing Inverse Reinforcement Learning as a safer path to AI alignment than explicit instruction.

Takeaways

  • Shift to Latent Prediction: When designing predictive models, move away from pixel-perfect reconstruction (generative) and focus on predicting transitions in the latent space (embedding). This focuses the model on semantic understanding rather than noise.
  • Avoid Blind Dimensionality Reduction: Be extremely cautious when using PCA or similar techniques to pre-process data for neural networks. You are likely discarding the manifold structure (the "signal") because it often looks like low-variance data compared to the noise.
  • Evaluate Agency via Counterfactuals: To determine if a system is truly "intelligent" or just a complex reflex machine, look for evidence of counterfactual processing: is the system simulating futures that don't happen in order to inform the one that does? (A toy lookahead-versus-reflex sketch follows this list.)
  • Align via Observation, Not Instruction: For AI safety, do not rely on hard-coded objective functions (which are prone to loopholes). Instead, use Maximum Entropy Inverse Reinforcement Learning, where the system infers values by observing the stable states of human society. (A toy MaxEnt IRL sketch follows this list.)
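
Below is a toy rendering (not from the episode) of the counterfactual-processing test in the takeaways above: two agents share the same inputs and outputs, but one is a pure reflex rule while the other runs an internal model over actions it never takes before committing to the one it does. The world, goal, and dynamics are invented for illustration.

    import numpy as np

    GOAL = 0.0
    ACTIONS = [-1.0, +1.0]

    def world_step(x, a):
        """True environment dynamics; also reused as the planner's internal model here."""
        return x + 0.5 * a

    def reflex_policy(x):
        """A purely reactive, rock-like rule: the same response regardless of consequences."""
        return +1.0

    def planning_policy(x, model=world_step):
        """Counterfactual processing: imagine the outcome of every action,
        then act on the one whose imagined future is closest to the goal."""
        imagined = [abs(model(x, a) - GOAL) for a in ACTIONS]
        return ACTIONS[int(np.argmin(imagined))]

    def final_distance(policy, x0=3.0, steps=10):
        x = x0
        for _ in range(steps):
            x = world_step(x, policy(x))
        return abs(x - GOAL)

    print(final_distance(reflex_policy))    # 8.0: drifts ever further from the goal
    print(final_distance(planning_policy))  # 0.0: reaches the goal by simulating futures that never happen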
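
And a toy, one-step rendering of the Maximum Entropy Inverse Reinforcement Learning idea from the final takeaway (the full trajectory-level algorithm is more involved): given an observed distribution over outcomes, fit linear reward weights so that a softmax policy over those rewards matches the observed feature expectations, i.e. infer the reward function "that results in the same outcome that we currently have." The outcomes, features, and frequencies below are invented.

    import numpy as np

    # Three possible outcomes described by two hypothetical features each.
    phi = np.array([[1.0, 0.0],
                    [0.0, 1.0],
                    [1.0, 1.0]])
    p_observed = np.array([0.6, 0.3, 0.1])       # the outcome distribution currently being produced

    w = np.zeros(2)                              # reward weights: r(outcome) = w · phi(outcome)
    for _ in range(2000):
        r = phi @ w
        p_model = np.exp(r - r.max())
        p_model /= p_model.sum()                 # maximum-entropy (softmax) choice distribution
        grad = phi.T @ (p_observed - p_model)    # observed minus model feature expectations
        w += 0.1 * grad                          # ascend the log-likelihood of the observed behaviour

    print(np.round(w, 3))                        # inferred reward weights
    print(np.round(p_model, 3))                  # ≈ [0.6, 0.3, 0.1]: observed behaviour reproduced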