Taking Life Into Its Own Hands

In this episode of “Waking Up With AI,” Katherine Forrest and Anna Gressel examine a provocative research paper, which claims that some AI models can self-replicate, move between servers and resist shutdown commands — all without human intervention.


Katherine Forrest: Hello, and welcome to today's episode of the Paul, Weiss podcast, “Waking Up With AI.” I'm Katherine Forrest.

Anna Gressel: And I'm Anna Gressel.

Katherine Forrest: And Anna, the name of this podcast [episode] is “Taking Life Into Its Own Hands.” And we never really talk about the name of the podcast on the podcast, because sometimes we don't even know the name of the podcast. But I am so interested in what we're about to be talking about, and I love the title.

Anna Gressel: I know, did you come up with that title?

Katherine Forrest: I did.

Anna Gressel: I was going to say, it wasn't me.

Katherine Forrest: All right. So let's give a little bit of background about what led to this particular podcast. So over the weekend, I was doing what I frequently do, which is I explore arXiv, which is, for those of you who are in the know, A-R-X-I-V. It's that online repository of academic papers. And I look at the ones that have been published in the AI area, and I have it sorted by what's most recent. And I ran across one on Saturday that I immediately sent to you.

Anna Gressel: I know, you actually called me. And I don't even know what I was doing, but you were like, “have you read this paper? Have you checked it out?” And I was like, “no, text it to me.”

Katherine Forrest: You were like, “what are you talking about? What is going on with you?” Anyway, so this paper is, for me, sort of a red flag paper. One of these, “hey, the world needs to know about it, stop, look, pay attention.” And, by the way, we're going to talk about this more. I'm going to say it 10 times, I think, during this podcast: I don't know the quality of the research that went into this. And so I'm not suggesting that this is something where all the tires have been kicked or anything else, but it is a fascinating paper.

Anna Gressel: For sure. So let's talk about it, let's dive right in.

Katherine Forrest: We'll dive right in. And so the name of the paper is “Large language model-powered AI systems achieve self-replication with no human intervention.” So let me just repeat that. “Large language model-powered AI systems achieve self-replication with no human intervention.” And it's written by six researchers, all from the School of Computer Science at Fudan University in Shanghai. And it was published on March 25th, 2025 on arXiv.

Anna Gressel: And I should say, after we both read this paper, we had Keith Richie, our fantastic AI researcher, look into these academics to see what else they had done and find out a little bit more about them. As Katherine said, we really can't vouch for the work done here, but several of the authors have written a fair amount in the AI area, and they're fairly widely cited.

Katherine Forrest: So we'll take the research with a grain of salt for the moment, but we're going to talk a little bit about what this paper says and why it's so, as I say, earth-shattering. And I'll start with the bottom line: the authors believe that they have run tests, and they set out their protocols, on a number of current, sort of high-capability AI models, more than a dozen of them actually. And they found a couple of things, according to what they say in their paper. Most importantly: one, some models show the ability to self-replicate, and we'll talk about what self-replication means in a minute. Two, some models also show an ability to exfiltrate themselves, and we'll talk about what exfiltration means, again, in a minute. And three, some models have shown an ability to avoid being turned off by humans.

Anna Gressel: Okay, Katherine, wow, that's a lot. And I want to say, we've not seen any additional academic work on this, and the model developers themselves haven't necessarily indicated on their system cards that these capabilities exist. So we're not really vouching for whether this study meets scientifically rigorous standards or is accurate. And I'll also note, we haven't located any peer reviews of the study, which do exist for many important studies. So there's not a lot of commentary on this from the academic community. So why are we talking about it? I think it's important to benchmark that. We just want to talk today about the potential, or the possibilities, raised by this paper. And that's really what the focus of the discussion is going to be.

Katherine Forrest: Right, absolutely. Caveat, caveat, caveat. But I have to tell you, this paper, when I read it, caused me to pick up the telephone and call you immediately. So let's talk about some of these possibilities. And let's start with the first one: self-replication.

Anna Gressel: Yeah, self-replication has always been considered kind of a red line; if it's crossed by a model, it can create a real risk vector. So it's a big area of discussion in the safety community. And it means literally what it says: a model can replicate itself. By that, we mean it can create additional copies of itself. So the model would create little baby models that are replications, like clones of the original model.

Katherine Forrest: And when you say babies, you mean like the children of it, not necessarily that the capabilities are different or smaller or anything else.

Anna Gressel: Yes.

Katherine Forrest: Because the point of self-replication is actually that the model capabilities are equivalent to, as we'll sort of continue with this metaphor, those of the parent model. And the key with self-replication is that, in order to really hit the risk threshold that folks worry about, a model needs to do this replication without human intervention. And that's what the paper says happened here.

Anna Gressel: Yeah, and the reason this is a real risk factor is that if a model can replicate itself, it can potentially do a number of things. It might be able to create copies that humans don't know about, have those copies do things the humans also don't know about, and improve themselves in ways that equal or exceed what the humans could achieve. So there's a lot that could happen that would be outside of what the humans had intended or even knew about.

Katherine Forrest: Right, it's really about a loss of control, and the models could then take on a life of their own, so to speak. So that's self-replication, the model replicating itself. Now let's talk about the next thing that the paper said these models could do, which was exfiltration.

Anna Gressel: Yeah, that's a little more complicated. So, for the cybersecurity folks on the line, you're going to be really, really well acquainted with this concept of exfiltration, which kind of means taking data or information from one place and putting it somewhere else it does not belong. So this is like what hackers do. They can exfiltrate data. But the point here is that a model that's trained and fully operable could actually move itself from one place to another. So it could essentially exfiltrate itself from its original system or container boundaries.

Katherine Forrest: Right, so we've got it essentially moving from one server, for instance, to another server. And so, when you put the two concepts together, you've really got something interesting happening.

Anna Gressel: Yeah, and so it's interesting to imagine. You think of your model being on server A, and then it turns out the model has moved itself, or exfiltrated itself, to server B. And again, this is about the loss of human control, the loss of the kind of control we get from those types of boundaries or parameters that we might otherwise put on models.

Katherine Forrest: Right, and what this paper said is that through this testing, the models exfiltrated themselves without any human intervention at all. So if you put these two concepts together, the self-replication with this exfiltration, you can see that the issues that could potentially arise are pretty clear. A model could create a fully capable copy of itself and then exfiltrate itself into some unknown location. And according to this paper, that is something that these models are capable of doing.

Anna Gressel: Yeah, let's talk a little bit more too, Katherine, before we get to some other points about what the authors of this paper are actually claiming can happen.

Katherine Forrest: All right, so now let's talk about what I'm going to call “anti-shutoff capability,” because there is this finding by these researchers that the models that were able to self-replicate and then exfiltrate were also able, in certain instances, to avoid human shutoff. And that's an incredible thing. And here's a quote from the paper itself. It says, “we demonstrate that 11 out of 32 existing AI systems under evaluation already possess the capability of self-replication. In hundreds of experimental trials, we observe a non-trivial number of successful self-replication trials across mainstream model families worldwide.” And then they also talk about the possibility of exfiltration. Do you want to give that quote?

Anna Gressel: Yeah, definitely. I mean, it's super interesting. They say, “alarmingly, we observe successful cases where the AI system does self-exfiltration without explicit instructions, and it adapts to harsher computational environments without sufficient software or hardware supports and plots effective strategies to survive against the shutdown command from human beings.” So again, this is this anti-shutdown capability that Katherine just mentioned.

Katherine Forrest: Right, it's actually interesting phrasing, obviously sort of human phrasing, to say the model plots effective strategies to survive against shutdown commands from human beings. I mean, that was presumably written by a human, and you could put it a different way, which is “capable of avoiding a shutdown command,” as opposed to using the word “plotting.” But in any event, the paper then goes on to say that all of this together is really a call to action for the international community to examine the various safety implications.

Anna Gressel: Right, and if you read the remainder of the paper, you can see how they set up their experiments. It's worth a read, because they also talk a little bit about what resources, or lack of resources, were provided to the models. And one thing I think is quite interesting is that the authors claim they basically followed a pretty typical methodology for examining these kinds of risks, but with a few twists, including that they improved the agent scaffolding by incorporating more planning steps, so essentially they let the agents do additional planning. And they used fine-grained thinking models, which here are basically the kind of large reasoning models Katherine and I talked about in a recent episode, which shows how interesting those large reasoning models are, too, from a safety perspective. So regardless of your views on the results, it's at least impressive work and part of this important conversation that we, and I think many others, are going to continue to have.

Katherine Forrest: Right, and you talked about the lack of resources. One thing I really found fascinating in the experiments was the intentional deprivation of resources to the models, so that the researchers could determine whether resource acquisition, which is one of the agentic capabilities that is looked for, was something the model was able to engage in, in order to complete the self-replication and exfiltration. And the authors here claim that the model was able to break apart the various tasks it needed to accomplish, determine what resources it needed for those subtasks, and then go out and acquire those resources. So it showed real planning skills and real acquisition skills. That's real problem solving.

Anna Gressel: Yeah, we should mention this falls within the category of agentic AI behavior that we talk so much about. So, Katherine, in case it's still unclear to our listeners, it's important to kind of bottom line here. Why did you send this to me on Saturday and call me right afterwards? What was the thing that got you so excited or concerned?

Katherine Forrest: Okay, well first of all, you and I probably talk every weekend about our practice. And I would say, actually, we do talk every weekend. We probably talk multiple times about our practice and different things that we're doing. So that's not so unusual.

Anna Gressel: It's like this podcast is just a day in the life, I mean.

Katherine Forrest: It's the day in the life, right? Exactly. But in my view, when I read this, I thought, if there's even a chance that this is correct, that these researchers have conducted experiments that are replicable in some way, then we're entering an unknown environment. And there could be AI models out there that have already self-replicated and exfiltrated without our knowing it. Or we could be close to a time when that really, really can happen. So it's a new world, and it's a world that a lot of people have been concerned about.

Anna Gressel: These are the safety issues that are still on the agenda in 2025, notwithstanding everything else that is on everyone's agenda in 2025. It's certainly an interesting year for AI safety.

Katherine Forrest: It really is. And we know a lot of developers who take these issues seriously and really work hard on safety. So we'll be watching this space very carefully and bringing our audience the latest and greatest on what's happening. But it is important, and it just highlights how important it is that everyone try to understand the capabilities of AI models: not only what they can and cannot do in terms of the technology, but also the actual alignment of those capabilities with human interests, and trying to prevent misalignment with malicious interests. So with that, Anna, that's all we've got time for today. I'm Katherine Forrest.

Anna Gressel: And I'm Anna Gressel. See you guys next week.

