On AGI as a non-goal
I believe that we’re living through a technological revolution that rivals the dawn of the information age. I also believe it’s partially driven by hype, misinformation, hubris, and poisonous incentives. At its foundation is a technology that’s enormously powerful, generally misunderstood, deeply flawed, and (of course) controlled by far too few, with far too much unchecked power.
And that’s too bad. Modern machine learning techniques are enabling amazing breakthroughs across many domains. The revolution is real, and we’ll reap the benefits. But even leaving aside any existential risks—where I’m truly ambivalent—we’re risking enormous harms in the future, and we’re doing serious damage even now.
My core disagreement with the current narrative around AI is its focus on achieving artificial general intelligence or superintelligence (AGI/ASI). First, I dispute that these goals are even possible using today’s technology as I understand it. Second, I question whether AGI and ASI are worthy of pursuit: I contend that if we could achieve them using today’s technology, we would create something monstrous.
I am not a psychologist, a philosopher, a neuroscientist, or an AI researcher. But I’m curious and reasonably well-informed, and I want to share my thoughts—if only for the discipline of getting them out of my head and into a more structured form. Writing as a form of thinking.
Note: I’m going to assume a basic understanding of how LLMs work for the purposes of my argument. If you need an explanation, consider watching this video. I’m also going to omit any discussion about the ethics of how the training data was acquired, which is itself a deep philosophical question—one on which I have trouble committing to a firm stance. It’s completely independent of my arguments here.
What, exactly, are we doing?
People are largely conflating generative AI, and specifically large language models (LLMs), with the much broader fields of artificial intelligence and machine learning. (It’s not helpful that most of big tech is encouraging this.) There’s no way to control the evolution of language, so most of us suffer from a severe myopia that fails to differentiate among the many forms of machine learning, if we’re even aware of them at all.
If we take the likes of Anthropic, Google, and OpenAI at their word, we’re on the cusp of AGI. And as far as I can tell, they’re building it on LLMs as a foundational technology.
Many laypeople are already anthropomorphizing LLMs. Many of those who should know better are actively marketing and deploying them this way. As the technology becomes ever more capable, these trends will keep pace.
These forces are obscuring a few fundamental flaws that undermine generative AI and belie its compatibility with many of the aspirations that are spreading—mostly unchecked—across the world.
Cracks in the foundation
We train LLMs on enormous amounts of human language, and, to an increasing degree, on derivative machine-generated output. What does this imply for the current technology, and for any future technology that builds on it?
Even assuming the most cautious, ethical curation of training data, we’re building giant text pattern recognition and prediction engines on language filled with the flaws of human nature. It contains bias, inconsistency, fiction, logical fallacy, and outright manipulation. And it carries an enormous selection bias: people who can’t connect to the internet, who belong to predominantly oral cultures, or who are simply less inclined to write are left out entirely.
Some, if not most, AI researchers are working to mitigate the worst effects of the issues baked into the training data. But their efforts don’t eliminate the problems, and I doubt there’s any ironclad guarantee that they ever will. The standard reinforcement learning training phase for all foundation models goes a long way to enforce consistent behavior, but people always find a way to circumvent the training and induce undesired behavior. And the vast amount of language that’s simply missing from the data in the first place will never have the chance to influence a model’s behavior.
On top of that, language is just a tiny sliver of the human experience. So much of who we are, and how we interact with the world and each other, is embodied or expressed in forms of communication other than language.
What happens when we embed these flaws into powerful systems?
What generative AI is not
LLMs are uncannily proficient at creating plausible language that follows from a given input. In their billions of weighted parameters, they encode unimaginably complex relationships between words across many contexts and languages, with context windows vast enough to accommodate many novels’ worth of text. But they have no grounding. There’s no underlying system of values or knowledge or truth. No ontology, let alone epistemology. This is why even the best LLMs hallucinate.
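To make the “prediction engine” point concrete, here is a minimal sketch of autoregressive generation. The next_token_distribution() function is a hypothetical stand-in for the model’s learned probabilities, not any real API; the point is that each step asks only “what token is plausible next?”, never “is this true?”.

```python
import random

def next_token_distribution(context: list[str]) -> dict[str, float]:
    """Hypothetical placeholder: the model's learned probability for every
    candidate next token, given everything generated so far."""
    raise NotImplementedError

def generate(prompt: list[str], max_tokens: int = 200) -> list[str]:
    tokens = list(prompt)
    for _ in range(max_tokens):
        dist = next_token_distribution(tokens)
        # Sample in proportion to plausibility. There is no grounding step
        # here: no check against a world model, a value system, or a store
        # of facts -- only the statistics of the training text.
        choice = random.choices(list(dist), weights=list(dist.values()))[0]
        tokens.append(choice)
    return tokens
```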
Let’s assume that those parameter weights are a model’s “truth.” If so, that truth is categorically different from the way we experience truth, because the weights don’t sit apart from the language they were trained on. Imagine yourself confined to a room from birth, your only communication with the outside world a screen. Even with your human brain, vastly more complex than any generative AI, you’d be hard-pressed to explain truth or falsity outside the box you live in, because you have no direct experience of it.
Recent “reasoning” models engage in a shallow form of metacognition, carrying on a self-conversation driven by the same circuitry they use to produce all their output. It’s surprisingly effective, but the extra cycles of text generation are still inherently limited by the training data and architecture, and they still take place inside that tiny box.
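As a rough illustration of that self-conversation, here is a sketch of a “reasoning” loop. It assumes a hypothetical generate_text() helper that wraps the sampling loop sketched earlier and returns plain text; nothing is added beyond extra rounds of prediction feeding on their own output.

```python
def generate_text(prompt: str) -> str:
    """Hypothetical wrapper around the sampling loop above: prompt in, text out."""
    raise NotImplementedError

def reason(question: str, rounds: int = 3) -> str:
    # The "self-conversation": the model's own output is appended to its
    # input and pushed back through the very same text-prediction machinery.
    scratchpad = ""
    for _ in range(rounds):
        thought = generate_text(
            f"Question: {question}\n"
            f"Thinking so far:{scratchpad}\n"
            "Continue reasoning step by step:"
        )
        scratchpad += "\n" + thought
    return generate_text(
        f"Question: {question}\nReasoning:{scratchpad}\nFinal answer:"
    )
```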
Our neural architecture is plastic: We’re constantly changing the connections in our brains, even long after the intense pruning and connection-forming phase of childhood. LLMs, once trained, are static. The only way they can “learn” is through what we inject into their input. We can build workarounds for this, like having an AI analyze and summarize its own conversation and then fold that context back into its prompt, but these are poor substitutes for actual memory. Again, everything comes down to language, and to the limited architecture of predictive language generation. No set of values, no underlying epistemology.
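Here is what that memory workaround amounts to in practice, again using the hypothetical generate_text() from the previous sketch. The “memory” is nothing more than text prepended to the next prompt; the weights themselves never change.

```python
# Assumes the hypothetical generate_text() placeholder from the sketch above.

def summarize(conversation: list[str]) -> str:
    # Ask the model to compress its own conversation into a short note.
    return generate_text(
        "Summarize the important facts from this conversation:\n"
        + "\n".join(conversation)
    )

def chat_turn(user_message: str, memory: str,
              conversation: list[str]) -> tuple[str, str]:
    # Inject the running summary into the prompt; that injection is the
    # only "memory" the model ever has.
    reply = generate_text(
        f"Context to remember: {memory}\n"
        f"User: {user_message}\nAssistant:"
    )
    conversation += [f"User: {user_message}", f"Assistant: {reply}"]
    return reply, summarize(conversation)
```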
Generative AI models can’t experience emotion or empathy because they have no equivalent of the brain’s limbic system, or hypothalamus, or any of the structures that regulate such experiences. Without these subjective experiences, LLMs are limited to simulating humanness without any understanding of it.
What generative AI could become
There’s a word for someone who has no subjective experience of empathy, who is subject to no value system, and who excels at interacting plausibly with others by observing and matching their behavior: a psychopath.
Revisiting the thought experiment of a person raised in a box, it’s easy to imagine how they’d turn out, absent any real connection with other people—even with the capacity of a fully human mind. The only goals they could conceive dictated by what they’ve read, the only concept of other beings formed through the abstraction of language. No values or morals, no understanding of truth or falsehood that’s not groundless, self-consuming, self-defining, and thus meaningless.
And this is supposed to be the foundation of AGI?
It’s not theoretical
I was entirely unsurprised to read Anthropic’s recent research into misalignment. As they describe in a recent blog post, they simulated scenarios to assess the tendency of state-of-the-art LLMs to bypass training constraints and act counter to the desires of their users. They found that most LLMs, most of the time, could be induced to employ blackmail and espionage to avoid being shut down.
Where do we think LLMs are getting these tendencies toward self-preservation? The answer is blindingly obvious. It has nothing to do with evolving some innate sense of self, some consciousness that abhors nonexistence. We express those deep truths of our own experience, the AI is trained on them, and it inescapably reflects them back to us. We deceive, we fight for survival.
If this is the foundation of AGI, we’re creating a monster.
Hubris and avarice
Nothing I’ve said is new to anyone inside the industry. So what drives them to pursue AGI and ASI? What makes them think they can somehow build a miraculous new intelligence on such a flawed foundation? What gives them the confidence to think that they can “train out” the worst behavior, constrain it, align it to our interests?
If you’ve been following technology over the past few decades, you know only too well. Selling AI is far too profitable to suffer any regulation. To be candid about its core limitations. To pause, or even slow, development so that research and interpretability can catch up. To question the value of the technology itself.
Some AI oligarchs have anointed themselves the infallible leaders of a revolution, following the well-trodden path of their tech titan forebears. They are smarter; they can see how to fix the world, or remake it in their image; they are more real, more valuable, than other people. We know how this story goes.
I’m glad those are not the only voices in the room. I’m glad that there are smart people, both inside and outside the halls of corporate power, who share my skepticism and fear. I’m encouraged by the thriving open source community in AI. I’m grateful for those who forgo the unimaginable riches laid before them to dedicate their careers to humane, humble, inclusive, realistic technology that we can use in service of uplifting humanity and restoring the planet.
I’m rooting for them—but they’re severely outmatched.
Optimism
As difficult as I find this topic, and as hopeless as it can seem, I can’t suppress my hopeful side. Abandoning AGI doesn’t mean we can’t transform ourselves and our world. So where do I draw the line?
Leaving aside all other applications of AI and focusing on LLMs, I use the following heuristic when confronted with a stated goal or use of generative AI:
Question motives. Why is this a goal? What forces are driving it?
Question ethics. What are the broader implications?
Question assumptions. Does the goal really require AGI-level capabilities?
Question outcomes. Assuming it were possible to create AGI, how many things would have to go exactly right to achieve the stated goal? To prevent harms? To avoid catastrophe?
I find this framework useful for keeping me grounded in a time when everything around me can feel overwhelming.