Building Safe AI
A conversation with MIT AI Alignment's Riya Tyagi and Gatlen Culp
Imagine you’re eight years old, born into a wealthy family that owns a sprawling, highly lucrative enterprise. Suddenly you’re orphaned, and responsibility for this vast inheritance falls to you.
As a first step, you smartly search for an adult to manage the estate until you’re older. You interview candidate after candidate, but you keep running into the same question: how do you know who to trust? You want someone who genuinely prioritizes your long-term interests. Several appear sincere, yet it’s impossible to tell who’s truly aligned with you and who’s simply skilled at acting aligned. Or worse, you might end up with someone who obeys your requests blindly, even when those requests drive the company off a cliff.
This analogy is at the heart of “Why AI Alignment Could Be Hard with Modern Deep Learning”, a thought-provoking essay that my learning group read as part of an AI Safety Fundamentals course I recently completed with MIT AI Alignment (MAIA).
In the spirited discussions that followed, we covered a number of strategies, from developing ways to “control” superior intelligences to inspecting their thought processes through techniques such as mechanistic interpretability and probing. My personal takeaway: this stuff is really hard. There’s something inherently slippery about trying to control a mind that could be vastly more intelligent than our own, especially if the time-scales for control extend beyond our lifespans.
It’s these questions, and the larger landscape of AI safety, that set the theme for my recent sit-down with Riya Tyagi and Gatlen Culp, MIT undergrads and MAIA board members. In our conversation, we covered:
Emerging AI safety techniques
Political and economic drivers of existential risk
Future-state scenarios (like AI-powered autonomous slaughterbot drones)
Generational differences in how people view AI
Concrete ways to get involved in and support the AI safety community
Fun fact: Riya is a mechanistic interpretability researcher in the lab of Max Tegmark, the well-known AI researcher, safety advocate, and author of Life 3.0, a book that first opened my eyes to the possibilities - and dangers - of superintelligent AI systems.
If you’re curious about the cutting edge of AI safety research, from the perspective of the next generation of AI researchers, this conversation is for you.
Watch or Listen Now
Also available on:
Spotify: https://open.spotify.com/show/1wfOlAAqXXaIrXRwO5T71R
Apple Podcasts: https://apple.co/40sQCu9
Show Notes
Episode Links:
MIT AI Alignment: https://aialignment.mit.edu/
Cambridge Boston Alignment Initiative (CBAI): https://www.cbai.ai/
Riya’s LinkedIn: https://www.linkedin.com/in/riyatyagi-ai/
Gatlen’s LinkedIn: https://www.linkedin.com/in/gatlen-culp/
Gatlen’s Projects: https://www.mit.edu/~gculp/
Slaughterbots x Future of Life Institute: https://futureoflife.org/video/slaughterbots/
Track II Diplomacy: https://en.wikipedia.org/wiki/Track_II_diplomacy
Tegmark AI Safety Group: https://tegmark.org/
“Nexus” by Yuval Noah Harari: https://www.ynharari.com/book/nexus/
RAND Corporation: https://www.rand.org/topics/artificial-intelligence.html
BlueDot Impact: https://bluedot.org/
“If Anyone Builds It, Everyone Dies” by Eliezer Yudkowsky and Nate Soares: https://en.wikipedia.org/wiki/If_Anyone_Builds_It,_Everyone_Dies
Episode Chapters:
00:00 - Preview & Intro
01:48 - What Is MIT AI Alignment?
02:47 - Why AI Safety?
09:26 - Trends in AI Safety Interest
12:46 - AI Safety Techniques: MechInterp & Beyond
17:15 - Model Situational Awareness
20:58 - Hybrid, Mixture of Experts Models
24:40 - Decomposing a Model & Parallels with Human Brains
29:00 - Private Capital for Safety Research
32:23 - Frontier Lab Mentorship Programs
34:14 - Policy Perspectives & China Competition
36:53 - Ways In Which AI Might Threaten Us
39:31 - Track 2 Diplomacy & International Collaboration Examples
43:13 - Slaughterbots & Dangerous Capability Demos
46:54 - AI-Driven Unemployment
52:06 - Generational Attitudes Towards AI
57:44 - How to Get Involved - Non Technical
01:01:00 - Learning Resources: BlueDot Impact, etc.
01:01:46 - Importance of Communicators & Artists
01:03:50 - How to Support MAIA