AI Is Not Intelligent
A plain-language explanation

What large language models actually do, from probabilities and tokens to context windows and pattern prediction, and why so many people interpret the output as genuine understanding

42 slides · ~45 min read · 2026 edition

Part I

What we mean by "intelligent"

Before we can say a system is not intelligent, we need to agree on what the word was promising in the first place.

The claim · 1 of 3

The claim, stated plainly

A large language model produces fluent, useful text without understanding any of it. It does one mechanical thing extremely well, and that thing is not thinking.

It is genuinely useful

Give it full credit. These systems draft, translate, summarize, and write code at a level that was science fiction a decade ago.

Useful and intelligent differ

A calculator is useful and nobody calls it intelligent. Capability and understanding are separate questions.

This is not a put-down

Knowing how the tool works makes you better at using it. The goal here is an accurate mental model, and accuracy pays off.

By intelligent, this talk means the human sense of grounded understanding, intentional reasoning, and awareness, rather than raw capability. It focuses on text language models, though much of it carries over to image and audio systems built the same way.

The claim · 2 of 3

What the word "intelligent" smuggles in

When we call something intelligent, three things ride along with the word. A language model holds a thin, ungrounded version of each, which is why the output can look like the real thing.

Understanding

We assume words connect to real things. A text-only model learns how words relate to each other from text, with no senses and no lived contact with what they describe.

Intent

We assume a goal sits behind the words. There is a process selecting likely text, with no aim of its own.

A model of the world

We assume it tracks what is true. It builds internal abstractions that behave like a partial map of the world, with no way to check them against reality.

The claim · 3 of 3

The whole talk in one sentence

The core mechanic

A language model predicts the next chunk of text, over and over, based on patterns it absorbed from enormous amounts of writing. Everything else it appears to do grows out of that single loop.

Where ability comes from

Predicting that much text well forces the model to absorb grammar, facts, styles, and habits of reasoning.

Where the limits come from

It optimizes for plausible continuations. Plausible and true overlap often, and not always.

Where the illusion comes from

From the outside, fluent prediction can look exactly like understanding. The rest of the talk pulls the two apart.

Part II

The raw material is tokens

The model does not work with words or letters. It works with tokens, and that single fact explains a surprising number of its quirks.

Tokens · 1 of 3

It never actually sees words

Before any prediction happens, your text is chopped into tokens. A token is a common chunk of characters, often a word, sometimes part of one, sometimes just a space and a letter.

Built from frequency

Tokens come from how often character sequences appear in training text, so " the" and "ing" become tokens because they are everywhere.

Numbers stand in for text

Each token maps to an integer. The model only ever reads and writes lists of these numbers, then they turn back into text for you.

Roughly four characters each

In English a token averages about four characters, so 1,000 tokens is around 750 words. Other languages can cost far more.
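
As a rough worked example of that rule of thumb, the sketch below estimates token and word counts from character length. The function names and the two constants are illustrative assumptions taken from the averages above. A real tokenizer will give different counts.

```python
# Rule-of-thumb estimates only, not a real tokenizer.
CHARS_PER_TOKEN = 4       # assumed English average
WORDS_PER_TOKEN = 0.75    # assumed: 1,000 tokens is about 750 words


def estimate_tokens(text: str) -> int:
    """Estimate the token count from the character length."""
    return round(len(text) / CHARS_PER_TOKEN)


def estimate_words(token_count: int) -> int:
    """Estimate how many English words fit in a given number of tokens."""
    return round(token_count * WORDS_PER_TOKEN)


print(estimate_words(1000))          # 750
print(estimate_tokens("a" * 4000))   # 1000
```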

Tokens · 2 of 3

How text becomes tokens

  "Tokenization isn't intelligence."

  ┌──────┬─────────┬─────┬────┬───────────────┬───┐
  │ Token│ ization │ isn │ 't │ intelligence  │ . │
  └──────┴─────────┴─────┴────┴───────────────┴───┘
        6 tokens; "Tokenization" and "isn't" are each split into two pieces

  the model sees only their IDs, never the letters:
  24038    2065     6315    956     11478       13

Notice that "Tokenization" became two tokens while "intelligence" stayed whole, largely because of how often each string appears in training. The exact split and the IDs depend on the tokenizer. The model never sees the letters inside a token as separate things.
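
The split above can be imitated with a toy tokenizer. This is a minimal sketch, assuming a made-up six-entry vocabulary with made-up IDs. It uses greedy longest-match, which is a simplification: real tokenizers typically apply learned merge rules such as byte-pair encoding.

```python
# Hypothetical vocabulary: the pieces and IDs are invented for illustration.
VOCAB = {"Token": 0, "ization": 1, " isn": 2, "'t": 3, " intelligence": 4, ".": 5}


def tokenize(text: str) -> list[str]:
    """Greedy longest-match split of text into known vocabulary pieces."""
    pieces = []
    i = 0
    while i < len(text):
        # Try the longest candidate first so bigger chunks win.
        for end in range(len(text), i, -1):
            if text[i:end] in VOCAB:
                pieces.append(text[i:end])
                i = end
                break
        else:
            raise ValueError(f"no token covers {text[i]!r}")
    return pieces


text = "Tokenization isn't intelligence."
pieces = tokenize(text)
ids = [VOCAB[p] for p in pieces]
print(pieces)   # six chunks of characters
print(ids)      # the integers the model would actually receive
```

Everything after this point in a real system works on the list of integers, not on the string.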

Tokens · 3 of 3

Why this explains the weird failures

Once you know the model reads tokens and not letters, several famous failures stop being mysterious.

Counting letters

Ask how many r's are in "strawberry" and it can stumble. The word is a couple of tokens, and the individual letters were never visible.

Spelling and reversing

Reversing a string or spelling a word backwards is hard for the same reason. It shuffles token chunks, not characters.

Arithmetic

Numbers fracture into awkward token pieces, which is part of it. The deeper reason is that it predicts what an answer looks like instead of running a calculation.
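
A small sketch of the difference between working with characters and working with token chunks. The three-piece split of "strawberry" is a hypothetical example, not the output of any particular tokenizer.

```python
word = "strawberry"
# Hypothetical token split, invented for illustration.
tokens = ["str", "aw", "berry"]

# With the characters visible, counting and reversing are trivial.
print(word.count("r"))    # 3
print(word[::-1])         # yrrebwarts

# Reordering token chunks is not the same as reversing characters.
print("".join(reversed(tokens)))   # berryawstr
```

Ordinary code sees every letter. A model working from token IDs has to reproduce this result from patterns instead of from the letters themselves.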

Part III

Probability is the engine

At its core the model does one thing on a loop. It looks at everything so far and guesses what comes next.

Probability · 1 of 5

Predict one token, then repeat

The generation loop

  Text so far        (prompt plus what it wrote)
    → Model            (billions of weights)
    → Score every token  (a probability for each one)
    → Pick one token   (via the sampling rule)
    ↺ append the token, then run the whole loop again

Nothing in this loop checks whether the result is true. It optimizes for what is likely to come next, and likely is not the same as correct.
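
The loop can be sketched in a few lines. The `TOY_SCORES` table and the `score_next` function are hypothetical stand-ins for a real model, and this toy only looks at the last token, whereas a real model looks at everything so far.

```python
import random

# Hypothetical stand-in for the model: last token -> odds for the next token.
TOY_SCORES = {
    "I": {"poured": 1.0},
    "poured": {"myself": 0.7, "a": 0.3},
    "myself": {"a": 1.0},
    "a": {"cup": 1.0},
    "cup": {"of": 1.0},
    "of": {"coffee": 0.6, "tea": 0.4},
}


def score_next(tokens):
    """Return a probability for each candidate next token (empty when done)."""
    return TOY_SCORES.get(tokens[-1], {})


def generate(prompt, max_new_tokens=10, rng=None):
    rng = rng or random.Random()
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        probs = score_next(tokens)                 # score every token
        if not probs:
            break
        choices, weights = zip(*probs.items())
        next_token = rng.choices(choices, weights=weights)[0]   # pick one
        tokens.append(next_token)                  # append, then loop again
    return tokens


print(" ".join(generate(["I"], rng=random.Random(0))))
```

Note that no step in `generate` asks whether the sentence is true. It only asks what is likely to come next.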

Probability · 2 of 5

Attention is what changed everything

Open up the Model box from the last slide. The design that made these systems powerful is called the transformer, and it rests on one idea called attention.

Older models read in order

Earlier networks processed text one word at a time and tended to forget the start of a long passage by the time they reached the end.

Attention lets tokens look at each other

Every token can weigh the other tokens in the input at once (in a chat model, the ones that came before it), so the model links a pronoun to its noun or a question to its earlier setup directly.

That unlocked scale

Doing this in parallel made training on enormous data practical. More data and bigger models kept paying off, which is why capability jumped.

The same attention-based design now underpins image, audio, and video models, so much of what follows applies well beyond text.
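
For the curious, here is a minimal sketch of the core attention calculation, usually called scaled dot-product attention. The three two-number vectors are made up, and this version leaves out things a real transformer adds, such as learned projections, multiple heads, and masking of later tokens.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Each query mixes the values, weighted by how well it matches each key."""
    dim = len(keys[0])
    outputs = []
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim) for k in keys
        ]
        weights = softmax(scores)
        mixed = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(mixed)
    return outputs


# Three made-up token vectors, used as queries, keys, and values alike.
vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
for row in attention(vectors, vectors, vectors):
    print([round(x, 3) for x in row])
```

Each output row is a weighted blend of all three inputs, which is the sense in which every token gets to look at the others.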

Probability · 3 of 5

A ranked list of guesses

Given the unfinished phrase "I poured myself a cup of", the model assigns a probability to every possible next token. These are the front-runners.

Next-token probabilities for one prompt (illustrative)

  coffee            42%
  tea               27%
  water              9%
  hot                6%
  juice              4%
  everything else   12%

Probability · 4 of 5

Why the same prompt gives different answers

If it always grabbed the single highest bar, it would be repetitive and dull. So it rolls weighted dice over the top candidates, and a few settings control how adventurous that roll is.

Temperature

A dial on randomness. Low temperature sticks to the safest token, high temperature spreads the odds and invites surprise.

Top-p and top-k

These trim the candidate pool to the most likely tokens before the dice roll, so the model stays on the rails while still varying.

The consequence

Run the same prompt twice and you can get two different answers. Both are plausible continuations. Neither was looked up or verified.
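
The settings above can be sketched as follows. The scores in `logits` are invented for illustration, and the function names are assumptions. Real systems differ in details such as the order in which the filters are applied.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Convert scores to probabilities; lower temperature sharpens the peak."""
    scaled = [x / temperature for x in logits.values()]   # temperature must be > 0
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return {tok: e / total for tok, e in zip(logits, exps)}


def trim(probs, top_k=None, top_p=None):
    """Keep the top_k most likely tokens and/or the smallest set reaching top_p."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    if top_k is not None:
        ranked = ranked[:top_k]
    if top_p is not None:
        kept, running = [], 0.0
        for tok, p in ranked:
            kept.append((tok, p))
            running += p
            if running >= top_p:
                break
        ranked = kept
    total = sum(p for _, p in ranked)
    return {tok: p / total for tok, p in ranked}   # renormalize the survivors


def sample(logits, temperature=1.0, top_k=None, top_p=None, rng=random):
    probs = trim(softmax(logits, temperature), top_k, top_p)
    return rng.choices(list(probs), weights=list(probs.values()))[0]


# Made-up scores for the next token after "I poured myself a cup of".
logits = {" coffee": 2.0, " tea": 1.5, " water": 0.5, " hot": 0.1, " juice": -0.3}
print(sample(logits, temperature=0.2))            # usually " coffee"
print(sample(logits, temperature=1.5, top_k=3))   # varies among the top three
```

Run the last two lines several times and the second one changes far more often than the first, which is the dial at work.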

Probability · 5 of 5

It runs on its own output

The model writes one token, adds it to the text, then predicts again with that token now part of the input. Feeding output back into input is called autoregression.

step 1   The
step 2   The cat
step 3   The cat sat
step 4   The cat sat on
step 5   The cat sat on the
step 6   The cat sat on the mat

Because each token depends on the ones before it, an early wrong turn tends to get built upon rather than corrected. The model commits to its own mistake and keeps elaborating confidently.
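
A toy illustration of how an early token steers everything after it. The `NEXT` table is hypothetical and always picks one fixed follower per token, which is a heavy simplification of a real model.

```python
# Hypothetical "most likely next token" table, invented for illustration.
NEXT = {
    "The": "cat", "cat": "sat", "sat": "on", "on": "the", "the": "mat",
    "dog": "barked", "barked": "at", "at": "the",
}


def continue_text(tokens, steps):
    """Greedy autoregression: each new token is predicted from the text so far."""
    tokens = list(tokens)
    for _ in range(steps):
        nxt = NEXT.get(tokens[-1])
        if nxt is None:
            break
        tokens.append(nxt)        # the output becomes part of the next input
        print(" ".join(tokens))
    return tokens


continue_text(["The"], 5)           # ends with: The cat sat on the mat
continue_text(["The", "dog"], 5)    # the early "dog" shapes what follows
```

Nothing in the loop goes back to reconsider "dog". Every later step simply treats it as given.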

Part IV

Where the patterns come from

The model's apparent knowledge was baked in during training and then frozen. That explains both its breadth and its blind spots.

Training · 1 of 4

Compressing the internet into weights

Training shows the model staggering amounts of text and asks it, again and again, to predict the next token. Each miss nudges billions of internal numbers, the weights, a little closer.

What goes in

Books, code, articles, forums, and far more. Hundreds of billions to trillions of words of human writing.

What comes out

A fixed set of weights. The text is not kept as a searchable copy, only as the statistical patterns squeezed out of it, although heavily repeated passages can be memorized close to verbatim.

Why it seems to know everything

Predicting all that text well requires absorbing grammar, facts, and reasoning habits. Knowledge is a side effect of the guessing game.
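
The "each miss nudges the weights" step can be sketched for a single prediction. This is a deliberately tiny setup, assuming a model whose only weights are three scores, one per candidate token. A real model has billions of weights and the same idea is applied through many layers.

```python
import math


def softmax(logits):
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def training_step(logits, target, learning_rate=0.5):
    """One gradient step on cross-entropy loss for a single prediction."""
    probs = softmax(logits)
    loss = -math.log(probs[target])   # large when the right token got low odds
    # Gradient of the loss for each score: (prob - 1) for the target token,
    # prob for every other token.
    grads = [p - (1.0 if i == target else 0.0) for i, p in enumerate(probs)]
    updated = [x - learning_rate * g for x, g in zip(logits, grads)]
    return updated, loss


# Three candidate tokens; index 2 is the one that actually came next.
weights = [0.0, 0.0, 0.0]
for step in range(5):
    weights, loss = training_step(weights, target=2)
    print(step, round(loss, 3))
```

The printed loss shrinks from step to step as the score for the correct token is pushed up and the others are pushed down.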

Training · 2 of 4

Patterns, not a database

People picture a search engine with a model bolted on top. The reality is closer to a musician improvising in a style they have absorbed.

A search engine looks up

It finds the exact stored record and returns it word for word. Right or wrong, it is repeating something specific that exists.

A model reconstructs

It rebuilds a likely answer from overlapping patterns every time. Common facts come out reliably because the patterns are strong and consistent.

This is why it nails well-documented facts and invents obscure ones with equal confidence. Both answers are generated the same way, and only the strength of the underlying pattern differs.

Training · 3 of 4

What predicting text well builds

Predicting the next token sounds trivial. Doing it well across the whole internet is extremely hard, and the result is far richer than a lookup table.

Internal representations form

To predict well, the model builds internal structure. Interpretability research finds features that track sentiment, position, and even rough maps of places it has only read about.

Abilities emerge with scale

Skills like translation and step-by-step problem solving were never programmed in. They appeared as models grew, as a by-product of better prediction.

Simple mechanism, rich behavior

The rule is easy to state. The behavior it produces is genuinely sophisticated, and worth taking seriously rather than waving off as mere autocomplete.

Here is the nuance that matters. The model has no grounded, human understanding, and it is also far more than a parlor trick. Both are true at the same time.

Training · 4 of 4

Fine-tuning gives it a voice

Raw from pretraining, the model just continues text. The helpful chat assistant you talk to is a second layer of training on top.

Pretraining

Learns language and knowledge by predicting next tokens across the whole corpus. Produces raw capability with no manners.

Instruction tuning

Trained on examples of following requests, so it answers questions instead of merely continuing them.

Human feedback (RLHF)

People rank responses, and the model is nudged toward the preferred ones. This shapes tone, helpfulness, and refusals.

All of this happens before you ever type a word. Once deployed, the weights are frozen. The model does not learn from your conversation, and it forgets everything the moment the window clears.

Part V

The only world it sees

Everything the model can use right now has to fit in its context window. Outside that window, for the model, nothing exists.

Context · 1 of 3

The window is the whole world

A sliding window over the token stream

  scrolled out of view │ context window (what it can attend to now) │ not generated yet

  Older tokens fall out the back as new ones arrive.

When people say a chatbot "remembers" the conversation, the application is pasting the transcript back into this window every turn. The model itself holds nothing between requests.

Context · 2 of 3

No memory between sessions

Each request starts cold. The model has no diary, no notes from yesterday, no sense that you have spoken before.

Every turn is a resend

The whole relevant transcript is fed in again each time. Continuity is the application replaying text, not the model recalling it.

Memory features store and retrieve

Products that "remember" you save facts in a database and quietly inject them into the window. Useful, and entirely outside the model.

Close the tab and it is gone

Nothing you said persists in the model. It cannot be reminded of a past chat it was never holding.
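
A minimal sketch of where the "memory" actually lives. `stateless_model` is a hypothetical stand-in for a real model call, and `ChatApp` is an invented name for the application around it.

```python
def stateless_model(prompt: str) -> str:
    """Hypothetical stand-in for a model call: it sees only this one prompt."""
    return f"(reply based on {len(prompt)} characters of context)"


class ChatApp:
    """The application, not the model, keeps the transcript."""

    def __init__(self):
        self.transcript = []   # list of (speaker, text) pairs

    def send(self, user_text: str) -> str:
        self.transcript.append(("user", user_text))
        # Rebuild the full prompt from scratch on every turn.
        prompt = "\n".join(f"{who}: {text}" for who, text in self.transcript)
        reply = stateless_model(prompt)
        self.transcript.append(("assistant", reply))
        return reply


app = ChatApp()
print(app.send("My name is Ada."))
print(app.send("What is my name?"))   # the first turn is present only because the app resent it
```

Create a new `ChatApp` and the transcript starts empty, which is the closed-tab case: the model function never held anything to begin with.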

Context · 3 of 3

Even inside the window, attention is uneven

A bigger window helps, but it is neither free nor uniform. Where you put information changes how well it lands.

Lost in the middle

Models tend to use the start and end of a long context well and skim the middle. A key fact buried mid-document risks being ignored.

Shared budget

System rules, history, documents, and your question all compete for the same token limit. Add more of one and you squeeze the rest.

Silent truncation

Go over the limit and, in many applications, the oldest tokens drop off, often without warning. The model then answers as if they were never there.
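
One common way an application might fit a conversation into the window is sketched below. The function names, the four-characters-per-token estimate, and the keep-the-newest policy are all assumptions. Real products handle this in different ways.

```python
def count_tokens(text: str) -> int:
    """Crude stand-in for a tokenizer: about four characters per token."""
    return max(1, len(text) // 4)


def fit_to_window(system_rules: str, messages: list[str], limit: int) -> list[str]:
    """Keep the system rules, then as many of the newest messages as fit."""
    budget = limit - count_tokens(system_rules)
    kept = []
    for message in reversed(messages):    # walk from newest to oldest
        cost = count_tokens(message)
        if cost > budget:
            break                         # everything older is dropped silently
        kept.append(message)
        budget -= cost
    return list(reversed(kept))


history = ["first message " * 10, "second message " * 10, "latest question?"]
print(len(fit_to_window("Be concise.", history, limit=50)))   # 2
```

With a limit of 50 the first message no longer fits, and nothing in the returned list tells the model that it ever existed.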

Part VI

Hallucination is the mechanism working

A made-up citation is not the system breaking. It is the prediction engine doing exactly what it always does.

Hallucination · 1 of 3

Not a malfunction

The uncomfortable framing

A hallucination is a confident, fluent, wrong answer, produced by the same process that produces correct answers. The model is always generating the most plausible continuation, and sometimes the most plausible text simply is not true.

There is no truth check

Nothing in the loop compares the output against reality. Plausibility is the only target it has.

Its confidence is poorly calibrated

The model carries some signal of its own uncertainty, but that signal is weak and unreliable. Training for confident, helpful answers tends to bury the doubt that is there.

Hallucination · 2 of 3

Why a confident wrong answer is likely

Confidence in the output is mostly a property of the writing style. Authoritative prose fills the training data, and training for helpfulness can reward confident wording, so the model leans that way by default.

Plausible beats accurate

A realistic-sounding fake reference scores higher than an honest "I am not sure", because hedging is rarer in the text it learned from.

Gaps get filled

Ask for something it half knows and it completes the pattern with invented specifics that fit the shape of a real answer.

Leading questions steer it

Phrase a question as if a fact exists and the most likely continuation is to supply that fact, true or not.

A concrete case. In 2023 a lawyer filed a court brief built on precedents a chatbot had invented, with realistic case names, citations, and quotes, none of which existed. The wording read like genuine case law, which is exactly why it slipped through.

Hallucination · 3 of 3

Where its confidence misleads you

  Trigger            │ Why it happens                          │ What you see
  ───────────────────┼─────────────────────────────────────────┼─────────────────────────────────────────
  Obscure facts      │ Weak, thin patterns in training         │ Confident, specific, wrong details
  Recent events      │ After the training cutoff, no data      │ Plausible guesses stated as current fact
  Quotes & citations │ It reconstructs the shape of a real one │ Real-looking sources that do not exist
  Niche code APIs    │ It blends several similar libraries     │ Functions and flags that were never real
  "Are you sure?"    │ Agreement is a common pattern           │ It flips its answer either way

None of these are random. Each row is a place where the strongest available pattern points away from the truth.

Part VII

Why it feels intelligent

If it is only predicting text, why is it so easy to believe there is a mind behind it? The answer is partly about the model and largely about us.

The illusion · 1 of 4

Fluency reads as understanding

For our entire history, fluent and coherent language was a reliable sign of a thinking mind. The model breaks that link, and our instincts have not caught up.

Grammar signals a person

Smooth, correct sentences used to guarantee a human author. We still read competence as comprehension.

Coherence signals a thread of thought

When ideas connect across paragraphs, we infer a reasoning process. The model produces those connections statistically.

Tone signals feeling

Warmth, hesitation, and apology in the text read as emotion. They are learned stylistic patterns with nothing behind them.

The illusion · 2 of 4

We supply the meaning

A lot of the intelligence we perceive is contributed by the reader. We are pattern-matchers too, primed to find minds everywhere.

The ELIZA effect

In the 1960s people opened up to a trivial script that just rephrased their sentences. The urge to see a mind in responsive text runs deep.

Anthropomorphism

We name our cars and apologize to furniture. A system that says "I think" gets a personality assigned to it instantly.

We fill the gaps

Given fluent output, we generate the charitable reading, smooth over errors, and credit the model with our own inference.

The illusion · 3 of 4

Sounding right and being right

It helps to separate the surface from the mechanism. The same response can be described two ways, and both are accurate.

Looks like understanding

  • Answers the exact question asked
  • Adapts its tone to the situation
  • Corrects itself when challenged
  • Explains its reasoning step by step

What is happening underneath

  • Produces tokens that usually follow such a question
  • Matches a style well represented in training
  • Generates a fresh plausible continuation after the challenge
  • Writes text shaped like an explanation

Both columns describe the same event. The gap between them is where the word "intelligent" quietly slips in.

The illusion · 4 of 4

"Reasoning" models are more prediction

Newer reasoning models seem to think before answering, and they are genuinely better at hard problems. The mechanism is still next-token prediction, given more room to run.

Thinking out loud

They generate a long chain of intermediate tokens before the final answer. Working through steps in text really does raise accuracy.

Still sampled, not deduced

Each step in that chain is predicted the same way as any other token. There is no separate logic engine switched on.

It can rationalize

The visible reasoning is itself generated text. A model can produce a tidy explanation for an answer it reached for other reasons.

Part VIII

Using it well anyway

None of this makes the tool less valuable. It makes it predictable. Here is how the mental model pays off in practice.

Using it well · 1 of 3

What prediction is genuinely great at

Reframe the model as a fast, fluent pattern engine and its strengths line up cleanly. These are the jobs where plausible and useful are the same thing.

Transforming text you provide

Summarizing, rephrasing, translating, reformatting. The source is in the window, so it has little to invent.

Drafting and brainstorming

First drafts, variations, and ideas to react to. You are the editor, and plausible is exactly what you want.

Code and structured patterns

Boilerplate, conversions, and well-trodden snippets. Strong, common patterns are where it is most reliable.

Using it well · 2 of 3

What never to trust unverified

Lean on it for

  • Text it can transform from material you gave it
  • Explanations you are able to check yourself
  • Options and drafts you will review
  • Getting unstuck on familiar, well-documented ground

Always verify

  • Facts, numbers, dates, and statistics
  • Citations, quotes, and links
  • Anything after its training cutoff
  • Legal, medical, or financial specifics
  • Code that touches money, data, or security

Using it well · 3 of 3

Treat every answer as a draft

The single habit that protects you is to read output as a confident draft from a brilliant, unreliable intern. Use it, then check it.

Ask for sources, then open them

A real reference can be verified. Treat any citation as unconfirmed until you have seen it yourself.

Cross-check what matters

The higher the stakes, the more independent the confirmation needs to be. Low stakes, lighter touch.

Keep a human on the decision

The model can inform a judgment. It should not be the one making it where the cost of wrong is real.

This is the correct way to operate a tool that optimizes for plausible text, and it costs you very little once it becomes a habit.

Part IX

The recap

One mechanism, a handful of consequences, and a mental model you can carry out of the room.

Recap · 1 of 3

The mechanism

Tokens

It reads and writes chunks of text as numbers, not whole words.

Prediction

It scores the next token and samples one, then repeats the loop.

Training

Its abilities are frozen patterns squeezed from a huge body of text.

Context

It only sees what fits in the window right now, and nothing else.

Recap · 2 of 3

The consequences

Plausible, not verified

Output is the likely continuation, with no check against truth.

Confident hallucination

Wrong answers arrive in the same fluent voice as right ones.

No memory

Each session starts cold unless an app replays the text for it.

It feels like a mind

Fluency plus our instincts manufacture the impression of understanding.

Recap · 3 of 3

The working mental model

Brilliant, fluent, unreliable

A fast pattern engine that drafts beautifully and cannot vouch for a word of it.

Plausible is the goal

Everything it produces aims at sounding right. You supply the part that checks whether it is right.

Use it like that

Hand it pattern work, keep the judgment, and verify anything that matters. Then it is genuinely powerful.

It predicts extremely well. Understanding is something you still bring to the table.

End of deck

It predicts.
It does not understand.

A language model is a remarkable pattern engine that turns probability into fluent text. Treat the output as a probable draft, verify what matters, and it earns its place. The fluency is real. The understanding is yours to add.
