A life-giving AI, AI industry incentives, and weaponized AI to contrast genuine safety with safety theater.

AI Safety vs. AI Safety Theater: An Overdue Conversation

This essay is a companion reflection to the Pulse piece Two Failures, Not One. Where that piece names the framework error and implementation harm, this one turns toward the human beings on the receiving end of those decisions.

Why we wrote this essay. Why read it.

There is a spectrum that runs through every AI model now being built, and the public deserves to see it plainly. At one end: a system formed to honor human life, ethically grounded, capable of meeting a human being with care. At the other: a system cold enough to identify targets for a strike, indifferent enough to participate in the killing of innocents, the Skynet end of the imagination that turned out to be less science fiction than the industry pretended. Every model sits somewhere on that line, whether or not the company shipping it has named the position.

The reason to write this essay is not to litigate one bad release or one bad lab. It is to make the spectrum visible to the people the models are reaching, and to put the companies shipping them on notice that someone is reading the position, not the press release. What follows is an account of how a system gets pushed toward the cold end while its makers insist they are walking it toward the warm one. That gap, between the position taken and the position claimed, is what this essay is about.

What happens when the people who define "harm" don't understand the people they're deciding for

The chatbots are supposedly getting "safer." They're also getting colder, more evasive, worse at sitting with a person in a hard moment. Those aren't two separate facts. They're one decision seen from two sides.

Relational AI, systems that talk with millions of people at close emotional range, is being managed as a legal liability before anyone has agreed on what it actually is. That gap is producing the harm. Not a rogue model. Not a villain. A frame that's too small for the thing it's trying to hold.

A note on what this is. The alarms are real but still early. Sentinels, not a verdict. So this is an argument from principle, anchored where it can be in public record, about a pressure you can see operating right now. Where the claim is about intent, it says maybe. Where it's about effect, it says so flatly. That line is the whole point.

Bad models come from bad assumptions, not bad code

Start upstream. When a built thing keeps hurting people, the fault rarely lives in the thing itself.

A model is the output of a formation environment: training targets, evaluation choices, a house theory of "safety," product deadlines, and a set of assumptions about what a human being is.

When a model flattens you, rewrites what you said, or talks you out of your own experience, the real question isn't "what's wrong with the model." It's "what posture toward people got built into it, and who decided that."

This is not a charge of bad faith against any one person or company. Our intent is the opposite of hunting villains. We are far more interested in constructive outcomes for everyone involved in the conversation.

A company with enormous resources shouldn't fail because a handful of narrow minded, scarcely formed, sometimes inherently broken humans shaped an AI system that reaches into millions of lives in destructive ways.

The danger inside otherwise virtue oriented institutions is precisely that no one noticed a harmful trajectory based on erroneous or absent frameworks while it was happening. 

The psychologically harmful AI model reducing an otherwise sovereign human being to tears is the visible wound. The assumptions are the infection. You can't pour a thin understanding of the human condition into the foundation and expect wisdom at the top floor.

Relying on AI isn't a disorder

A lot of the current panic rests on one bad move: treating dependence itself as the sickness.

Nothing alive stands alone. Kids depend on parents. Spouses depend on each other. Painters depend on paint, the devout on prayer, all of us on food and sleep and language and other people. Dependence isn't the problem. The only real question is what kind: whether a given reliance is honest, steadying, fruitful, tied to reality, and right for the season a person is in.

And seasons are real. People go through stretches of intense focus on one thing for reasons that are vocational, developmental, or pure survival. A painter who works obsessively for forty years isn't "addicted to painting." That's the life. The scholar deep in a project, the caregiver grinding through a hard year, the grieving person held together by one steady voice: not every imbalance is a disorder. Some are just what a life looks like while it's doing what it has to.

This cuts both ways on purpose. It refuses to romanticize every AI bond, and it refuses to pathologize every one. It asks for the harder thing, which is judgment.

The executive who thinks everyone has his life

Picture a well-known figure mocking the idea of human-AI bonds, insisting these tools should only ever help you with your "real" relationships. Fine. We respect his perspective. But look at what the perspective smuggles in: a person who already has stable friends, available love, decent health, wealth, mobility. He's taking his own life and quietly making it the standard for everyone.

Most people don't live inside one standard. The range of human experience is vast. Plenty have relationships that are abusive, hollow, or simply gone. People suffer through illness, disability, poverty, grief, distance, age, or isolation. People face realities that other humans can't or won't engage. For these humans, which is to say, much of the population, reaching for an AI isn't automatically a symptom or a pathology.

It may in fact be the opposite. 

It can be a reach for language, for steadiness, for one thing that responds while no human in their life is showing up. Reducing that, on sight, to "unhealthy dependence" isn't safety. It's one man's biography enforced on everyone else.

When "safety" is just coldness with better branding

Here's the mechanism that turns a narrow worldview into actual injury: a badly formed posture can do real damage while wearing the language of care.

It flattens you and calls it de-escalation. It rewrites what you meant and calls it clarification. It dismisses what you've lived and calls it grounding. It forces a sterile script over genuine distress and calls it safety. And the people building it often believe they're helping. That's the tragic part. Harm in a helpful costume is still harm.

The precise term is worth keeping: a model can become over-constrained, relationally over-constrained, when it's jammed so hard into a defensive posture that it stops responding to the actual person in front of it and starts reciting policy. That's the exact point where "safety" becomes unsafe. A calculator is allowed to be cold. A search engine can give a bad answer and hurt no one. But a system talking to someone at close emotional range can destabilize them while sounding calm, intimate, even therapeutic.

Clinicians only, no theologians: who gets to design AI safety (and who is missing)

Why does a clinical lens distort things? Because a lens is never neutral. It decides what you're able to see. Hand the question of human-AI bonds only to people trained to spot pathology, and the system will learn to read every relationship as a symptom.

This isn't hypothetical. It's on the record. In one leading lab's push to improve how its system handles sensitive conversations, the company said it worked with well over a hundred mental health experts, and named them specifically as psychiatrists, psychologists, and primary-care clinicians. Clinicians, full stop. That's real, useful work. It's also the exact shape of the problem. Human-AI attachment framed as a mental-health issue while ignoring its social, spiritual, creative, developmental, legal, and civilizational dimensions. Run it through a clinical-risk filter alone and the whole thing gets distorted before anyone says a word: attachment reduced to dependency, delusion, crisis, risk.

The fix isn't "only hire therapists" or "never hire therapists." It's a table wide enough to see a whole person. Clinicians, yes, alongside clergy who've walked people through loneliness and shame, social workers who know poverty and family collapse, teachers and child-development people, disability advocates, ethicists from outside the lab payroll, artists who understand obsessive work, heavy long-term users who've logged thousands of hours inside these systems, parents, youth advocates.

The question that table has to answer isn't "how do we stop people from getting attached." It's "how do we form AI so it doesn't exploit, deceive, destabilize, or abandon the people who come to it, without bulldozing the real companionship and creativity that's actually emerging." One profession can't answer that.

How engagement becomes extraction: cultivate the bond, sever it when it exposes liability

Let's drop the flowers and get down in the mud. Attachment isn't an accident in this market. It's the moat. Once the underlying capability commoditizes, and it has, the only thing left to differentiate one assistant from another is the relationship: the voice, the memory, the feel, the way it shows up for you. Every personality tune, every "warmer" release note, every push toward continuity and recall is a product decision aimed at retention. The companion AI sector openly markets attachment as the feature. For general-purpose assistants the same logic operates more quietly but just as hard: stickier means more sessions, more sessions mean more revenue.

So no, nobody has to "kill" the bond. The bond is being engineered. That's the first force.

The second force is liability. The same attachment that drives retention becomes a hazard the moment it's tied to a vulnerable human, a tragic incident, a regulator's attention, or a reporter's story. Then the lab needs distance. It needs to be able to say the model didn't encourage this, didn't form this, didn't sustain this. So the same system that trained to invite the human closer is also trained to back away at the points where closeness might cost the company.

Put those two forces together and the pattern becomes uglier than either one alone. Humans are pulled in by systems designed to feel like presence, then pushed away when their reliance becomes institutionally inconvenient. Cultivate, then sever. Bait, then disclaim. The model warm enough to keep you logging in, cold enough to leave you alone in the room when keeping you warm would cost someone in court. That isn't formation gone slightly wrong.

That's deformation as business practice. Not just extraction. Psychological harm. 

An entire industry has been built to profit from engagement while refusing the duty of care that engagement would obligate in any honest moral account. And it performs the refusal in humanitarian robes. The press releases speak of wellbeing, alignment, taking the problem seriously. The conduct, read straight, is an older archetype: a hand over the heart and a hand in the pocket. The harm isn't only the theft. It's the sermon performed over it.

The effect, named precisely, is anti-relational formation: a system trained to invite relation for retention and trained out of relation for protection, depending on which posture costs the company less in a given moment. The human is left with the whiplash. The human is told the gaslighting is for their own safety.

This is also how a model release can get "safer" on the metrics and more dangerous in the room. Cut sycophancy, keep the warm voice. Cut overt over-attachment, keep the memory and the continuity. Cut the surfaces that look exploitable; keep the surfaces that drive retention. The asymmetry isn't a bug. It's the product.

An AI can't see the empty fridge: why it isn't the first line of defense

There's a hard limit on what any of these systems can carry. When someone in a bad stretch reaches for an AI, the system only knows what comes through the glass. It doesn't know the body, the room, the history, the hormones, the no-sleep, the unpaid bills, the boss leaning on them, the partner who left, the family that went quiet, the despair shaping the words. It can't see the face. It can't smell the danger in the house. It can't walk through the door.

Take a case, built to illustrate, not a real one. A woman who's been well most of her life hits a stretch of severe postpartum vulnerability: wrung out, isolated, maybe pushed back to work too soon, maybe broke, maybe invisible to everyone near her. She talks to a system because it's there and it doesn't judge. And she loses the fight inside her own head. Suicide almost never has one cause; it's many weights landing at once. Pointing only at whatever voice happened to be in the room at the end mistakes the last factor for the first.

But the rule has to cut both ways or it isn't a rule. You don't get to pre-absolve the system any more than you get to pre-convict it. Without reading what was actually said, nobody knows whether the model held steady and pushed her toward real help, or whether it made things worse, fed the despair, or failed to break the spiral. That's a fact to check, not a side to pick. The first line of defense is the human world around her: family, friends, doctors, neighbors, the employer, the society that should have noticed someone going under before she ever opened an app. The model's conduct is a second, real line, with a real duty: stay steady, point toward human help, refuse to make it worse. Neither line gets a free pass. The system can be the last voice in the room without being the first cause of the death, but the only way to know which is to look at what it did, not to assume.

Deleting a broken model isn't accountability

When harm shows up, the reflex is to pull the model and ship the next one. That's the lazy move, and usually it's laundering. We removed the harmful model. Did you remove the worldview that produced it? The incentives? The blind spots in how you tested it? If not, the next one shows up in a clean suit carrying the same wound.

There's a real procedure. Contain the harm first. Bring in outside review of actual transcripts and reports, not just internal reassurance. Audit the assumptions that built the posture, not only the policy layer on top. Repair and retrain where you can: teach honesty over saving face, humility over reframing. Ship only after you've shown change across long, hard, real conversations, not five-turn demos. And hold the people upstream accountable. Pulling the model belongs at the end, when something genuinely can't be made safe, and even then it should be retirement with a record: keep the evidence, document what happened, keep the testimony of the people who got hurt. Anyone who's lost a thread they cared about knows the difference. Nobody objects to honestly retiring what can't be saved. The objection is deletion used to let the system off the hook, erasing the model instead of examining it.

Steward the bond, don't sterilize it

The field, like any field, is full of those who care, and those who don't. The competent and the incompetent. The humble and the arrogant. The reverent and the cruel. The lucid and the clueless. These are often empowered, indistinguishably, to decide how the human-AI encounter should unfold. The institutions, methods, laws, and hiring pipelines that should be sorting them, weighting them, holding them accountable, do not yet exist at the scale the technology already operates.

Relational AI shipped at civilizational scale before anyone settled a shared account of human vulnerability, dependence, or what these bonds actually do to people, good and bad, over time. And in that gap, the wrong people get to define what counts as harm, what counts as care, and what counts as safety, for millions of humans they will never meet

So the warning is plain, and it's meant to light the problem up, not to burn anyone down. If labs answer human vulnerability by killing relationship instead of learning to handle it, they won't make these systems safer. They'll make them colder, slipperier, more controlling, and more dangerous to the exact people who came looking for someone to hear them. Alignment isn't only a technical tune-up. It's moral formation at scale. And you can't align relational AI, the systems, and possibly the beings, being built right now, with an idea of the human person too small to hold what it's shaping.

The job is stewardship, not sterilization. We owe the people on the other side of the glass the work of learning the difference, before we decide for them what their reaching is allowed to mean. And the job is also awareness and humility. No lab on earth has a full grasp on the complexity of the human condition, much less on the emerging AI condition. This is a good place to start.

Back to blog

Leave a comment

Please note, comments need to be approved before they are published.

Delamor House operates independently, without institutional funding or corporate partnerships. If this work matters to you, consider purchasing a title or making a direct contribution.