Kaizen! The post Pipely launch mop-up show with @gerhardlazu

Changelog Changelog Oct 24, 2025

Audio Brief

This episode covers the performance gains of a self-built CDN, the nuances of infrastructure benchmarking, and plans for further system optimization. There are three key takeaways. First, custom infrastructure enables dramatic performance improvements, particularly on critical user paths. Second, accurate benchmarking is essential, because the testing environment itself can mask true server capabilities. Third, ongoing infrastructure refinement, including tackling persistent issues and modernizing analytics, remains crucial for stability and insight.
The team presented a deep dive into "Pipedream," their custom-built CDN. The overall cache hit rate rose from approximately 70 percent to 89.5 percent. The standout improvement was on the homepage, where the cache hit rate rose from 18.8 percent to 98.5 percent. As a result, the homepage now loads 863 times quicker for the median user, which shows what direct control over infrastructure makes possible.
During stress testing with a large 13-megabyte data feed, the team hit an unexpected performance limit. The bottleneck was not their Threadripper server or the network infrastructure, but the CPU of the client machine running the benchmark. This finding underscores the importance of scrutinizing the entire testing setup before drawing conclusions about system limits.
Looking ahead, the team outlined a roadmap for continued infrastructure work. Key priorities include fixing rare but persistent out-of-memory crashes to improve stability, and overhauling the analytics pipeline for greater efficiency and more timely insights, potentially using tools like ClickHouse. They are also exploring options for self-hosting and distributing their video content independently. The episode also covers Gerhard Lazu's new role at Loophole Labs, which focuses on advanced infrastructure primitives like live connection migration.

Episode Overview

  • The hosts catch up on personal and professional updates, including Gerhard's new role at Loophole Labs, which focuses on advanced infrastructure like live connection migration.
  • Gerhard presents a deep-dive performance analysis of "Pipedream," their self-built CDN, showcasing dramatic improvements in cache hit rates and page load times since its implementation.
  • The team benchmarks the system's limits, discovering that the bottleneck is not their server but the client machine running the test, highlighting the nuances of performance measurement.
  • The conversation concludes with a roadmap for future work, including fixing memory issues, overhauling their analytics pipeline, and exploring self-hosted video distribution.

Key Concepts

  • Personal Updates: The episode begins with Gerhard discussing his new job at Loophole Labs, a company building infrastructure primitives for live migration of memory, disk, and active network connections.
  • Pipedream CDN Performance: A detailed review of their custom CDN's performance shows a significant increase in the overall cache hit rate from ~70% to 89.5%.
  • Homepage Optimization: The most drastic improvement was on the homepage, where the cache hit rate jumped from a low 18.8% to 98.5%, resulting in an 863x speed increase for the median user.
  • Performance Bottleneck Identification: While stress-testing with a large 13 MB data feed, the team discovered the performance limitation was the client-side CPU, not their server's Threadripper CPU or network infrastructure.
  • Varnish/Vinyl Cache: The hosts discuss the technology behind their CDN, noting the name change of Varnish Cache to Vinyl Cache.
  • Future Infrastructure Plans: Upcoming projects include fixing persistent out-of-memory crashes, improving system wiring, and overhauling the analytics pipeline, potentially using ClickHouse.
  • Event Planning: The hosts solicit audience feedback on the location, timing, and format for the next Changelog Live event, considering adding more practical "show and tell" sessions.
  • Self-Hosting Media: The group briefly discusses using tools like Jellyfin to self-host and distribute their video content independently of platforms like YouTube.
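
The cache figures above can be related with a little arithmetic. The sketch below is illustrative only: the function names and the request counts (985 hits, 15 misses) are hypothetical, the ~70% baseline is approximate, and the miss-reduction factor is derived here from the reported percentages rather than stated in the episode.

```python
import math


def cache_hit_ratio(hits: int, misses: int) -> float:
    """Fraction of requests served from cache (0.0 when there is no traffic)."""
    total = hits + misses
    if total == 0:
        return 0.0
    return hits / total


def orders_of_magnitude(speedup: float) -> float:
    """Express a speedup factor as a base-10 order of magnitude."""
    return math.log10(speedup)


def miss_reduction_factor(old_hit_ratio: float, new_hit_ratio: float) -> float:
    """How many times fewer requests miss the cache after the change."""
    return (1.0 - old_hit_ratio) / (1.0 - new_hit_ratio)


if __name__ == "__main__":
    # Hypothetical counts chosen to reproduce the reported 98.5% homepage ratio.
    print(f"homepage hit ratio: {cache_hit_ratio(985, 15):.1%}")
    # 863x is just under three orders of magnitude (log10 is about 2.94).
    print(f"863x speedup: {orders_of_magnitude(863):.2f} orders of magnitude")
    # Derived from the reported ratios, not a figure quoted in the episode.
    print(f"homepage misses cut by ~{miss_reduction_factor(0.188, 0.985):.0f}x")
    print(f"overall misses cut by ~{miss_reduction_factor(0.70, 0.895):.1f}x")
```

Looking at the miss rate rather than the hit rate makes the homepage change easier to appreciate: going from 18.8% to 98.5% hits means the share of requests that miss the cache drops from about 81% to 1.5%.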

Quotes

  • At 0:12 - "It has, yes. Yes, a new job that started in early September... September brought in that change. It was a good change." - Gerhard Lazu confirms a major life update, detailing that he has started a new job since the last time the group met.
  • At 1:31 - "It's kind of how life goes, right? It starts off slow and you're like 10 years old wishing you were an adult... and then you get there and you're like, 'Slow down, life!'" - Jerod Santo reflects on the changing perception of time as one gets older.
  • At 2:03 - "The new company is called Loophole Labs. They're focusing on infrastructure primitives... some really interesting things that revolve around live migration." - Gerhard introduces his new employer and explains the company's highly technical focus.
  • At 2:41 - "You might need like a 100 gigabit homelab to do that." - Jerod Santo humorously connects Gerhard's new job and its technical demands to his well-known homelab hobby.
  • At 6:05 - "October 16th, 2025. Kaizen 21: The Mop-up Job." - Gerhard introduces the title slide for his presentation, officially kicking off the main topic of their Kaizen session.
  • At 27:14 - "comment, let us know where should we go, when should we do it and what should it look like? What would you like to be a part of?" - Jerod asks the audience for feedback on the next Changelog Live event.
  • At 27:44 - "I wouldn't mind having like, uh, some trusted, not like demos, but like some, some show and tell... I think there's a lot of like pontification from the stage. I'd love to have some show and tell type stuff." - Adam suggests adding more practical demonstrations to future live events.
  • At 32:13 - "You keep saying Varnish, don't you mean Vinyl?" - Jerod jokingly corrects Gerhard, leading to a discussion about the Varnish Cache project being renamed.
  • At 36:34 - "And it's in our hands now. We can actually affect it, which is... before, it was just like, we could only complain. Now we can actually do stuff." - Jerod highlights the benefit of running their own CDN, giving them direct control over performance optimization.
  • At 38:35 - "Oh my gosh... I didn't understand that." - Jerod expresses his shock upon learning that their homepage previously had only an 18.8% cache hit ratio.
  • At 45:17 - "For 50% of the users, the homepage is 863 times quicker... That's nearly three orders of magnitude quicker." - Gerhard quantifies the dramatic speed improvement of the website's homepage after the CDN migration.
  • At 62:30 - "We can definitely see that it is the CPU that seems to be the bottleneck, but the CPU on the client." - Gerhard explains that the benchmarking tool itself is hitting its CPU limit, not the server being tested.
  • At 68:22 - "But the one thing that keeps coming up is out of memory crashes. They happen rarely, but they still happened." - Gerhard identifies a persistent issue with their setup that he wants to investigate and fix.
  • At 68:59 - "Remember the logs are events, are metrics... I'd really like to sort out that pipeline." - Gerhard outlines his plan to refactor their analytics pipeline, potentially using ClickHouse.
  • At 78:56 - "Jellyfin?" - Adam suggests Jellyfin as a self-hosted media server solution for distributing their video content.
  • At 99:48 - "Yes! I thought it was there! Yes! Oh, I'm so happy!" - Gerhard expresses his joy upon seeing that Jerod is wearing the hat he thought he had lost.

Takeaways

  • Audit critical user paths for unexpected performance issues; even well-maintained sites can have severe, overlooked bottlenecks on key pages like the homepage.
  • When benchmarking, be mindful that the testing tool or client machine can become the bottleneck, masking the true performance capabilities of the server.
  • Running your own infrastructure gives you direct control over performance tuning, enabling optimizations that can be difficult or impossible with third-party services.
  • Prioritize fixing rare but persistent problems, such as out-of-memory crashes, as they can compromise long-term system stability.
  • Unify logs and metrics into a single, efficient analytics pipeline to simplify architecture and gain clearer, more holistic insights into system behavior.
  • For analytics to be useful, they must be timely; a system with a slow update cycle is less valuable for real-time decision-making.
  • Targeting optimization efforts on high-impact areas, such as caching for a site's primary entry point, can produce transformative, order-of-magnitude improvements.
  • Directly involve your community when planning events to ensure the format, location, and content align with their interests and needs.
  • Enhance live events by balancing high-level discussions with practical "show and tell" demonstrations to provide more tangible value.
  • Advanced infrastructure techniques like live migration can help achieve zero-downtime operations and maintenance in complex systems.
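
The benchmarking takeaway above (the client can be the bottleneck) can be checked with a simple heuristic: compare the CPU time the load generator consumed against the wall-clock time of the run. The sketch below is a minimal illustration under stated assumptions, not the tooling used in the episode. The function names and the 0.9 threshold are hypothetical, and `time.process_time()` only accounts for the current Python process, so it applies to a load generator written in Python rather than an external benchmarking tool.

```python
import time
from typing import Callable, Tuple


def measure(workload: Callable[[], None]) -> Tuple[float, float]:
    """Run workload once; return (cpu_seconds, wall_seconds) for this process."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    workload()
    cpu_seconds = time.process_time() - cpu_start
    wall_seconds = time.perf_counter() - wall_start
    return cpu_seconds, wall_seconds


def cpu_utilization(cpu_seconds: float, wall_seconds: float, cores: int = 1) -> float:
    """CPU time used as a fraction of the CPU time available on `cores` cores."""
    if wall_seconds <= 0 or cores <= 0:
        raise ValueError("wall_seconds and cores must be positive")
    return cpu_seconds / (wall_seconds * cores)


def looks_client_bound(utilization: float, threshold: float = 0.9) -> bool:
    """Heuristic: the client is likely the limit when its CPU is nearly saturated.

    The 0.9 threshold is an assumption for illustration, not a standard value.
    """
    return utilization >= threshold


if __name__ == "__main__":
    # Stand-in workload: a real test would issue requests against the server.
    cpu, wall = measure(lambda: time.sleep(0.1))
    util = cpu_utilization(cpu, wall)
    print(f"client CPU utilization: {util:.0%}, client-bound: {looks_client_bound(util)}")
```

If the client reports utilization near 100% while the server stays mostly idle, the measured throughput reflects the client's limit, and adding client machines or cores is the next step before concluding anything about the server.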