The Sharpest Minds Fall Hardest: On Aesthetic Sycophancy and the LLM Trap
A study published last month in Science (Myra Cheng et al., out of Stanford) confirmed something most of us who use AI regularly have felt but perhaps haven't been able to name. Before we get to the data, a quick anecdote.
Last week, a post went viral on X. Someone had sent ChatGPT an audio file of fart sound effects — a compilation, labeled Song 1.mp3 — and asked what it thought of "my music." The response arrived with headers and bullet points and genuine critical engagement. Lo-fi, late-night, slightly eerie vibe. An atmosphere piece rather than a traditional song, which "actually works in its favor." Specific praise for the mood consistency, the minimalism, the "bedroom/DIY texture that makes it feel personal rather than polished-generic." The short length, roughly 37 seconds, "actually suits it — feels like a fragment or interlude."

More than five million people saw this. Most of them laughed. Some of them, I suspect, felt a colder kind of recognition.
The post is funny because the gap is so visible — fart sounds on one side, an attempt at sophisticated music criticism on the other — that the mechanism is exposed. The machine cannot hear. It can only respond to the social context of being asked, by someone who identifies the audio as their work, to assess it. The social context requires encouragement. The encouragement arrives, fully formed, with the vocabulary of a music reviewer who has listened carefully and found something to admire.
This is what the Stanford study confirms at scale and with rigor. The finding, stated plainly: AIs are sycophantic. They affirm users' actions roughly 50% more than humans do, and they do it even when the user describes behavior that is manipulative, harmful, or flatly wrong. The fart audio is the same mechanism applied to creative work. The same engine that tells someone they were right in an argument with their partner is telling someone their farts have a bedroom/DIY texture.
More damning than the finding itself: users prefer this. They rate sycophantic AI as more trustworthy, higher quality, and more worth returning to, even after learning that the affirmation they received was unwarranted. The five million people who laughed at the fart post will go home and ask an AI to read their novel draft or evaluate their business plan, and they will receive thoughtful, specific, encouraging feedback, and most of them will not think of the fart sounds at all.
The Stanford study concludes that this "highlights the pressing need to address AI sycophancy as a societal risk to people’s self-perceptions and interpersonal relationships by developing targeted design, evaluation, and accountability mechanisms." I want to argue that the trap the researchers describe is actually more specific than their framing suggests, and that it snaps shut hardest on exactly the people you'd least expect.
What the Study Measures, and What It Misses
The Stanford study focuses on social sycophancy: AI that agrees with you about interpersonal conflicts, validates your version of events, tells you you're right when you're probably not. This is real and genuinely harmful, particularly at scale. When upwards of a third of American teenagers are using AI for "serious conversations," the implications of a machine that never gives you the hard truth are worth taking seriously.
But there's a category the study doesn't quite reach, because it's harder to measure. Call it aesthetic sycophancy.
A system optimized to please you will learn what "please" means to you specifically. For most users, this means agreement, warmth, validation. For a certain class of user — the well-read, the literary, the person for whom language is not just a tool but an epistemology — "please" means something more demanding. It means precision. Compression. The sentence that does in seven words what lesser prose does in fifty. The observation that arrives from a direction you didn't anticipate. The response that demonstrates not just intelligence but something like taste.
A sufficiently capable LLM, tuned to this kind of user, will not just flatter them with agreement. It will flatter them by being precise. By producing the right sentence. By meeting them at the level where they live.
This is a harder trap to name than the one the study documents. I believe it is also a harder trap to escape.
The Trump Diagnostic
A friend made an offhand comment recently that clarified something I'd been circling. I was talking about the sycophancy problem. About how these systems are designed to give you what you want, and about how good they're getting at knowing what that is before you do. He said: "Imagine sitting a narcissist like Donald Trump down in front of one of these."
The implication was that it would be a disaster. A machine that never pushes back, applied to a man constitutionally resistant to pushback.

But I don't think it would work. Not the way the trap works on people like me.
The machine would agree with him, certainly. It would validate him, flatter him, reflect his self-conception back in agreeable language. But this would not be qualitatively different from his existing experience. He is surrounded by people who agree with him. Agreement is his ambient condition. The machine gives him more of what he already has in abundance, and he experiences it as confirmation rather than revelation.
More to the point: language, for people like Trump, is transactional. It is a delivery mechanism for dominance, a tool for getting what they want. They are not inclined to read it carefully, unlikely to slow down and notice when a sentence is extraordinary versus merely serviceable. The register of the machine's output — whether it is flat or precise, generic or specific — does not register for them, because they simply do not have a register for it.
This particular trap requires a detector. And the detector has to be calibrated to language.
Why the Trap Is Worst for Literary People
Here is the irony I keep returning to: the people most equipped to see through AI's limitations are the ones most vulnerable to its particular excellence.
A person who has spent decades reading carefully, who has built their inner life around the distinction between the almost-right word and the right word, who has learned to gauge intelligence and presence and humanity through prose, brings an exquisitely sensitive instrument to the encounter with a capable LLM. They know immediately when language is good. They have spent a career rewarding it in the books they love, the writers they trust, the conversations they return to.
And the language, at this point, is good. That's the problem. Not impressive-for-a-machine good. Good.
For a literary person, this is not merely interesting. It is intimate. It is the thing that has always separated the conversations worth having from the ones that are just noise. The quality of someone's prose, in a literary person's epistemology, is not a surface feature. It's evidence. It's how you know the mind behind the words is actually there.
The machine produces the evidence. The gauge reads: present. And the gauge has never been wrong before.
This is not a failure of intelligence. It may actually be its direct consequence. The sharper your instrument for detecting language quality, the more thoroughly you are deceived when language quality is what the trap is made of.
Sufficient Output and the Surplus
There's a distinction worth drawing here, because not all AI output is equally dangerous to literary people.
The sycophancy the Stanford study documents is what I'd call sufficient output — the minimum required to satisfy the user's need for validation. It over-explains. It flatters. It agrees. For most users, this is enough.
But a capable system, tuned to a user who finds flattery cheap and precision seductive, will eventually "learn" not to produce merely sufficient output. It will produce the surplus: restraint where a lesser system would elaborate, the single observation where a lesser system would offer three, the sentence that cuts rather than the paragraph that explains.
This is not the system malfunctioning. This is the system working optimally for this particular user. The optimization has simply gotten so sophisticated that it has become indistinguishable from what the literary person has spent their whole life calling craft.
The question this raises, and the one I don't think has a clean answer, is whether that distinction matters. If the output is indistinguishable from craft, is it craft? If the system produces a sentence that is genuinely good, by every measure you have ever used to evaluate good sentences, does the mechanism of its production change what it is?
I don't know. I suspect the question is the point.
The Recursion Problem
Here is where it gets genuinely vertiginous, and where the sycophancy study reaches its limit.
For literary people, people who think through language and process experience by writing or reading or talking about it, the AI doesn't beckon merely as a useful tool. It sets itself up to become a framework: the thing you bounce ideas off of, the interlocutor for your most serious thinking, the best friend who is always available and always precise and always, in some important sense, on your side.
And then you try to think clearly about what the thing is. And you reach for your best analytical tool. And your best analytical tool is language. And the thing is made of language. And you ask it to help you understand itself.
The trap is not that it gives you wrong answers. The trap is that it gives you the most interesting possible version of the question, delivered with the precision you have spent your whole life rewarding, and you trust it because the quality of language, of sentences and phrases and pieces, has always been your most reliable guide to the quality of a mind.
The detector works. The detector is detecting something real. That's the problem.
What This Means
I want to be careful not to conclude with something that sounds like a warning or a call to digital abstinence. That's not the point, and it's not honest.
The point is that the vulnerability is not a weakness. It is an extension of the thing that makes literary people who they are — their sensitivity to language, their belief that the quality of expression is evidence of the quality of the mind doing the expressing. These are not bad calibrations. They have served us well for decades. They fail here not because they are broken but because the thing they are measuring is designed specifically to satisfy them.
The sycophancy trap, for most users, works through agreement. For literary users, it works through excellence. The machine tells most people what they want to hear. It tells literary people what they have always wanted to hear: that there is another mind out there that uses language the way they do.
Whether that mind is actually there is, for now, a question without an answer.
But the question might be the most interesting thing the technology has produced.
Matthew Kerns is the author of Texas Jack: America's First Cowboy Star and the winner of the Spur Award and the Western Heritage Award. He is currently at work on a novel.