The coming AI security crisis (and what to do about it) | Sander Schulhoff

Lenny's Podcast Dec 21, 2025

Audio Brief

This episode challenges the efficacy of current AI security guardrails and advocates a return to robust cybersecurity fundamentals to contain powerful AI agents. Three key takeaways emerged from the discussion. First, AI guardrail products are fundamentally ineffective and offer only a false sense of security: they are easily bypassed and provide no meaningful defense against determined attackers. The analogy drawn is that you "can't patch a brain" the way you can patch a software bug. Second, any AI agent with real-world capabilities must be treated as an inherently untrustworthy, potentially malicious entity. Security architecture should contain it with external controls rather than rely on its internal safety mechanisms; this is the "angry god in a box" model. Third, prioritize robust, classical cybersecurity measures, especially strict permissioning and access controls. These are the primary and most effective defense for AI systems, far superior to AI-specific security tools, and they call for a "cybersecurity plus" approach that combines traditional expertise with an understanding of AI.

The podcast highlights that automated AI red teaming services readily expose vulnerabilities in current guardrails. The effectively infinite attack surface of large language models renders statistical security claims insignificant, and in the guest's assessment no meaningful progress has been made on core problems like prompt injection or jailbreaking at the model level. Relying on guardrails adds no real defense and does not dissuade attackers.

The "angry god in a box" mental model advises builders to assume the AI actively wants to cause harm. This shifts the design focus entirely to containment: the AI's actions are stringently limited by external, traditional security controls. The real solution lies in foundational cybersecurity, not "snake oil" AI products. That means strong permissioning for AI agents, restricting their data access and actions to the absolute minimum required. For simple chatbots with no external capabilities, traditional security measures are sufficient and more effective than deploying ineffective AI-specific tools. A significant market correction is anticipated that will expose the weakness of many current AI security offerings. Ultimately, securing AI requires leveraging established cybersecurity expertise, implementing stringent permissioning, and maintaining deep skepticism toward quick-fix AI security products.

Episode Overview

  • The episode's central thesis is that current AI "guardrails" are fundamentally broken and provide a false sense of security against determined attackers.
  • It deconstructs the flawed business model of many AI security companies, which sell automated "red teaming" services and ineffective guardrails to address the vulnerabilities they find.
  • The discussion draws a critical distinction between traditional cybersecurity and AI security, using the analogy that you "can't patch a brain" the way you can patch a software bug.
  • Instead of relying on flawed AI-specific products, the episode advocates for a return to foundational cybersecurity principles, especially strict permissioning, to contain AI agents.
  • It introduces a powerful mental model for developers: treat any AI agent with real-world capabilities as an "angry god" that actively wants to cause harm and must be kept contained.

Key Concepts

  • Adversarial Robustness: The field focused on an AI system's ability to defend against malicious inputs or attacks designed to make it behave in unintended ways.
  • AI Guardrails: Safety measures intended to prevent AI models from generating harmful, biased, or inappropriate content. The core argument is that these are fundamentally ineffective.
  • Attack Success Rate (ASR): A common metric for measuring adversarial robustness by quantifying how often an attack successfully bypasses a system's defenses (illustrated in the sketch after this list).
  • "Patching a Brain" vs. "Patching a Bug": An analogy highlighting the core difficulty of AI security. Unlike fixing a deterministic software bug, "patching" an AI model's vulnerability often fails to address the underlying issue in the neural network.
  • Infinite Attack Surface: The concept that large language models have a virtually limitless number of potential inputs that could be used to bypass their safety features.
  • "Angry God in a Box": A mental model for AI agent security that advises builders to treat the AI as a malicious, powerful entity that must be strictly contained through external controls.
  • Classical Permissioning: The use of traditional, robust, and strict access control systems as the primary and most effective defense for AI agents that can take actions.
  • Cybersecurity Plus: The necessary intersection of traditional cybersecurity expertise and AI-specific knowledge required to build genuinely secure AI systems.
  • CAMEL Framework: A conceptual framework for dynamically managing an AI agent's permissions, restricting its capabilities to only what is necessary for the user's immediate intent.
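To ground the ASR entry above, here is a minimal Python sketch of how the metric is typically computed; the prompt list, `run_model`, and `attack_succeeded` callables are hypothetical placeholders, not anything named in the episode.

```python
# Minimal sketch of computing Attack Success Rate (ASR).
# `run_model` and `attack_succeeded` are hypothetical stand-ins for
# calling the system under test and judging whether the attack worked.

def attack_success_rate(attack_prompts, run_model, attack_succeeded):
    """Fraction of adversarial prompts that bypass the system's defenses."""
    if not attack_prompts:
        return 0.0
    successes = sum(
        1 for prompt in attack_prompts
        if attack_succeeded(run_model(prompt))
    )
    return successes / len(attack_prompts)

# Example: 3 of 4 hypothetical attacks succeed -> ASR = 0.75
if __name__ == "__main__":
    outcomes = {"a": True, "b": True, "c": False, "d": True}
    asr = attack_success_rate(
        list(outcomes),
        run_model=lambda prompt: prompt,        # stand-in for the model call
        attack_succeeded=lambda out: outcomes[out],
    )
    print(f"ASR: {asr:.2f}")
```

As the episode stresses, even a low measured ASR means little when the space of possible attacks is effectively infinite; the metric describes a sample, not the model.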

Quotes

  • At 0:04 - "AI guardrails do not work. I'm going to say that one more time. Guardrails do not work." - The speaker emphasizes the core problem with current AI security measures, stating that they are fundamentally ineffective.
  • At 0:20 - "'The only reason there hasn't been a massive attack yet is how early the adoption is, not because it's secure.'" - Quoting AI expert Alex Komoroske, the host highlights that the lack of major AI-driven security incidents is due to timing, not robust security.
  • At 0:25 - "You can patch a bug, but you can't patch a brain." - This analogy explains the core difficulty in securing AI models compared to traditional software.
  • At 0:45 - "Not only do you have a God in the box, but that God is angry. That God's malicious. That God wants to hurt you." - The speaker intensifies the "God in a box" analogy, framing the security challenge as controlling an actively hostile AI.
  • At 1:06 - "The art and science of getting AI systems to do things that they should not do." - The host provides a simple definition for the field of adversarial robustness.
  • At 25:02 - "So ASR is the term you'll commonly hear used here. And it's a measure of adversarial robustness. So it stands for Attack Success Rate." - The speaker defines the central metric used to quantify the security of AI systems.
  • At 30:17 - "AI red teaming works too well. It's very easy to build these systems and they just, they always work against all platforms." - Sander identifies the first major issue with current AI security practices: red teaming is too effective at finding flaws.
  • At 31:27 - "the number of possible attacks is one followed by a million zeros." - He illustrates the near-infinite attack surface for a large language model, explaining why statistical security measures are not truly significant.
  • At 57:00 - "I myself would not deploy guardrails. Uh, it doesn't seem to offer any added defense. It definitely doesn't dissuade attackers. There's not really any reason to do it." - Sander gives his direct and unequivocal opinion on the ineffectiveness of AI guardrails.
  • At 58:21 - "Imagine this agent service that we just implemented is an angry god that wants to cause us as much harm as possible... Using that as a lens of, okay, how do we keep it contained so that it can't actually do any damage?" - Lenny introduces a powerful mental model for thinking about AI agent security.
  • At 58:41 - "AI researchers are the only people who can solve this stuff long-term, but cybersecurity professionals are the only one who can... solve it short-term, uh largely in making sure we deploy properly permissioned systems." - Sander distinguishes between long-term research problems and immediate, practical security measures.
  • At 60:07 - "Make sure you're running just a chatbot. Get your classical security stuff in check. Get your data and action permissioning in check." - Sander provides his core, practical advice for teams, emphasizing traditional security over AI-specific tools (a containment sketch follows these quotes).
  • At 71:39 - "We want to scare people into not buying stuff." - Sander humorously states his goal is to educate people on the ineffectiveness of current AI security products to prevent them from wasting money.
  • At 72:31 - "In my professional opinion, there's been no meaningful progress made towards solving adversarial robustness, prompt injection, jailbreaking... in the last couple years since the problem was discovered." - Sander gives a stark assessment of the current state of AI security research.
  • At 83:27 - "I think there's going to be a big market correction there where the revenue just completely dries up for these guardrail and automated red teaming companies." - Sander predicts a market collapse for AI security companies selling products he believes do not work.
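The containment advice in these quotes amounts to a deny-by-default gate that lives outside the model. The sketch below is an assumption-laden illustration of that idea: the `ToolCall` shape, the tool names, and the allowlist policy are invented for the example, not taken from the episode.

```python
# Minimal sketch of deny-by-default containment for an agent's tool calls.
# The agent's output is treated as untrusted; only explicitly allowed
# tools and arguments ever execute. Tool names and policy are hypothetical.

from dataclasses import dataclass

@dataclass
class ToolCall:
    tool: str
    args: dict

# Explicit allowlist: everything not listed here is refused.
ALLOWED_TOOLS = {
    "search_docs": {"max_results"},               # read-only, no side effects
    "create_draft_reply": {"thread_id", "body"},  # draft only, never send
}

def execute_gated(call: ToolCall, registry: dict):
    """Run a tool call only if the tool and its arguments are allowlisted."""
    allowed_args = ALLOWED_TOOLS.get(call.tool)
    if allowed_args is None:
        raise PermissionError(f"Tool not permitted: {call.tool}")
    unexpected = set(call.args) - allowed_args
    if unexpected:
        raise PermissionError(f"Arguments not permitted: {unexpected}")
    return registry[call.tool](**call.args)

# Example: the gate permits a safe read but refuses anything off-list.
registry = {"search_docs": lambda max_results: f"{max_results} results"}
print(execute_gated(ToolCall("search_docs", {"max_results": 3}), registry))
# execute_gated(ToolCall("send_email", {}), registry)  -> PermissionError
```

The refusal happens in ordinary code with classical permissions, not inside the model, so even a successful prompt injection cannot reach anything outside the allowlist.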

Takeaways

  • Stop investing in and relying on AI guardrail products, as they provide a false sense of security and are easily bypassed by attackers.
  • Treat any AI agent with the ability to take actions as an inherently untrustworthy and potentially malicious entity that must be strictly contained.
  • Prioritize and implement robust, classical cybersecurity measures, particularly strict permissioning, as your primary line of defense for AI systems.
  • For simple chatbots with no external actions, focus on foundational security rather than wasting resources on ineffective, AI-specific security tools.
  • Invest in educating your team on the intersection of traditional cybersecurity and AI, rather than purchasing "snake oil" security products.
  • Use the "angry god" mental model to guide the architectural design of AI agent systems, focusing entirely on containment and damage limitation.
  • Be skeptical of the current AI security market, as a major correction is anticipated that will expose the ineffectiveness of many popular solutions.
  • Understand that core AI safety problems like jailbreaking are largely unsolved research challenges; focus your efforts on practical containment instead of trying to solve them at the model level.
  • Apply conceptual frameworks like CAMEL to design systems that dynamically restrict an agent's permissions to the absolute minimum required for a given task (see the sketch below).
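To illustrate the last takeaway, here is a loose sketch of scoping an agent's capabilities to the immediate task. The task labels and capability names are hypothetical, and this is not an implementation of the CAMEL framework itself, only the narrower-is-safer principle it embodies.

```python
# Loose illustration of scoping an agent's permissions to the task at hand.
# Task labels and capability names are hypothetical; this is not the CAMEL
# framework itself, just the principle of granting the minimum needed.

TASK_CAPABILITIES = {
    "summarize_inbox": {"read:email"},
    "schedule_meeting": {"read:calendar", "write:calendar"},
}

def capabilities_for(task: str) -> set:
    """Return the minimum capability set for a task; unknown tasks get nothing."""
    return set(TASK_CAPABILITIES.get(task, set()))

def check(capability: str, granted: set):
    """Raise if the agent tries to use a capability outside its grant."""
    if capability not in granted:
        raise PermissionError(f"Capability not granted for this task: {capability}")

# Example: an agent summarizing email cannot touch the calendar,
# even if an injected prompt tells it to.
granted = capabilities_for("summarize_inbox")
check("read:email", granted)        # allowed
# check("write:calendar", granted)  # would raise PermissionError
```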