Can Large Language Models Think?

Revisiting the Turing Test in the Age of LLMs

NASA Computer Room (ca. 1958). (https://catalog.archives.gov/id/278195)

In his famous article “Computing Machinery and Intelligence” (1), Alan Turing proposed “the imitation game” as a way to address the question “can machines think?” The basic concern of the imitation game is whether a computer can hold a conversation with a person well enough that the person cannot tell whether they are talking to a human or a machine.
Today, with the growth and deployment of large language models (LLMs), it is clear that LLMs can hold conversations that many people accept as comparable to human conversation. Many people would say that LLMs are intelligent.

But this just returns us to the earlier question: what does it mean to think? Or, alternatively, what does it mean to be intelligent?

Detecting LLMs

For many, LLM detection is important. Teachers, unsurprisingly, are concerned that their students are using LLMs to do homework rather than doing the homework themselves. On Reddit, I’ve seen academics lamenting both that articles they’re reviewing might have been written by LLMs and that the reviews they’ve received on their own submissions were generated by LLMs. Such behavior reflects a regrettable decay of academic standards (a decay that, IMO, has more to do with the general decline of support for academia than with professors acting in bad faith: professorial workloads are, as far as I can tell, absurdly high).

Many people say that they can recognize LLM output by clues like the use of em-dashes, bullet lists, and a lack of minor grammatical errors. Recently, I read an article (can’t remember where) about people in positions of power who intentionally make errors in their writing so that their work is not mistaken for AI.

Personally, I don’t try to detect LLM use, and I certainly don’t assume LLM use because of punctuation that good writers used for centuries before LLMs were ever a thing. I am 100% confident that Laurence Sterne, writing in the 18th century, did not use an LLM to write The Life and Opinions of Tristram Shandy, Gentleman (2). But he sure does use a lot of em-dashes (whether they were called “em-dashes” at the time is a historical question I’ve not researched), and my writing is influenced by Sterne and other old writers.

Does the Source Matter if Quality is Low?

Recently, I got some unhelpful feedback on a manuscript. One of my first thoughts was that it had been generated by an LLM, and two people to whom I described the situation both immediately said (paraphrasing) “That was an LLM.” Although I suspect they were right, it seems to me that the real problem was that the feedback did nothing to help me improve my manuscript. It was not intelligent feedback, in the sense that it did not address the question I asked. (I specifically asked “do the diagrams I’ve added improve my manuscript or should I remove them?” and got the response “simple diagrams can be very useful in a book like this,…I would favor keeping them where they clarify process, decision points, or distinctions that are harder to grasp quickly in prose.” That’s all reasonable, but it’s not an answer about my specific manuscript; it’s just a general principle.)

On Reddit, I’ve seen people say “I was reviewing an article that I think was written by an LLM. It had the following problems… How should I respond?” My thought is always: “If the quality of the article is low, then just point out the problems; point out the ways that the reasoning is unintelligent, and leave it at that.” Why worry about who wrote the article if it offers only conventional generalizations without any new ideas? If the reasoning is bad, that’s reason enough for a negative review.

It doesn’t matter to me whether the feedback was generated by a human or an LLM: whatever the source, it’s just low-quality feedback.

Can Machines Think?

Attempts to define terms are always trouble because different people use words differently. Do LLMs think? That depends on how we define “think.” Are LLMs intelligent? That depends on how we define “intelligence.”

Can machines think? Can dogs think? Humans can, but I’m sure there are many who would say that humans don’t think nearly enough. Are machines intelligent? Are dogs intelligent? Humans are. Sometimes. This all depends on how we define the terms.

I do feel confident in saying that if LLMs think, it is a very different order of thinking than my own.

LLMs generate output by calculating probabilities based on their training corpus. To generate an answer to a question, the LLM draws on patterns learned from previously written text and calculates, word by word, what a likely answer would be. “The insight of large language modeling is that many practical NLP tasks can be cast as word prediction,…We can cast the task of question answering as word prediction…we ask a language model to compute the probability distribution over possible next words.” (3)
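To make “word prediction” a little more concrete, here is a deliberately tiny sketch in Python. It is not how modern LLMs work internally: they use large neural networks over tokens and long contexts, not simple word-pair counts. The sketch assumes a toy bigram model, where the only “training” is counting which word follows which in a made-up corpus, and the function names (`train_bigrams`, `next_word_distribution`) are hypothetical. It illustrates only the basic idea in the quotation: turning previously seen text into a probability distribution over possible next words.

```python
from collections import Counter, defaultdict


def train_bigrams(corpus: str) -> dict[str, Counter]:
    """Toy 'training' (hypothetical helper): count which word follows each word."""
    words = corpus.lower().split()
    following: dict[str, Counter] = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        following[current][nxt] += 1
    return following


def next_word_distribution(model: dict[str, Counter], word: str) -> dict[str, float]:
    """Convert follow-counts into a probability distribution over next words."""
    counts = model.get(word.lower(), Counter())
    total = sum(counts.values())
    if total == 0:
        return {}  # the word has no recorded successor in the corpus
    return {w: c / total for w, c in counts.items()}


# Hypothetical miniature training corpus.
corpus = "the cat sat on the mat the cat ate the fish"
model = train_bigrams(corpus)

dist = next_word_distribution(model, "the")
print(dist)  # {'cat': 0.5, 'mat': 0.25, 'fish': 0.25}

# Greedy choice: pick the single most probable next word.
print(max(dist, key=dist.get))  # cat
```

In this toy corpus, “the” is followed by “cat” twice and by “mat” and “fish” once each, so the sketch assigns “cat” a probability of 0.5 and the other two 0.25 each. A word that never followed “the” in the corpus gets no probability at all.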

I can safely say that my responses to questions are not, except in rare cases, the result of calculating probabilities. The mechanisms of my thinking — the whole range of human experience, the natural attempts to maintain coherence, as well as to manage cognitive dissonance — are not calculations. (Indeed, is it even possible to have cognitive dissonance if you’re just calculating probabilities?)

You Are More Original than LLMs

Turing’s test is no longer a useful benchmark, because LLMs can already hold conversations that many people find satisfying or convincing. But the underlying question remains, and it becomes more important: how often will we trust computers to make decisions for us?

Intelligence, at least on one level, is an ability to add something new and interesting to an ongoing conversation. By their very nature, LLMs aren’t going to generate new ideas, at least not in the way that humans generate new ideas.

The best that LLMs can do is re-mix old stuff, and they will do it in a way that best matches what they have already seen. That’s not a route to originality. It’s not even a route to copying the best examples of anything (which are rare compared to the bulk of examples). LLM judgments are entirely based on comparisons with what has been done in the past.

Humans, however, create new stuff all the time. A lot of the new stuff that humans create is junk. And a lot of the stuff that humans create has already been created — hence the desire to not “re-invent the wheel.”

The more you trust your own intelligence, and the more you work to develop it, the more likely you are to do something that is both original and good. So trust yourself, not an LLM, which is literally a device to re-create that which is already commonplace and conventional.

This article was originally posted on Medium. Please check out my writing over there!

References

(1) Turing, A. M. (1950). “Computing Machinery and Intelligence.” Mind, LIX(236), 433–460.

(2) Sterne, L. (1759–1767). The Life and Opinions of Tristram Shandy, Gentleman. https://www.gutenberg.org/files/39270/39270-h/39270-h.htm

(3) Jurafsky, D. & Martin, J. H. (2026). Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition with Language Models (3rd edition draft). https://web.stanford.edu/~jurafsky/slp3/ed3bookaug20_2024.pdf