3 Comments
EMOTOBALL

This explanation of AI as “next-word prediction” is accurate — and the article does a strong job breaking down attention, tokens, and how the system actually works.

But there’s one layer missing.

Everything described here happens inside a single pass:

tokens → attention → probabilities → next word.

In real use, though, that process doesn’t happen once.

It happens across turns — with interaction.

So a more complete way to say it is:

“It predicts the next word under continuously updated constraints imposed by interaction.”

Because the user isn’t just providing input.

They are:

• reinforcing or rejecting outputs

• shifting tone and framing

• applying pressure for precision

• and shaping what the model prioritizes next

The model computes probabilities.

The interaction reshapes them over time.

That layer isn’t in most explanations — but it’s where the system actually becomes useful.
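
As a rough illustration of that point, here is a minimal toy sketch in Python. It is not how a real language model scores words: the vocabulary, the counting-based scoring rule, and the names `next_word_probs` and `run_conversation` are all hypothetical. What it tries to show is that the prediction function stays fixed while the context it is applied to grows with every turn, so the resulting probabilities shift.

```python
import math

# Hypothetical toy vocabulary, purely for illustration.
VOCAB = ["brief", "detailed", "formal", "casual"]

def next_word_probs(context):
    """Single pass: score each candidate against the context, then softmax."""
    words = context.lower().split()
    # Toy stand-in for a model: a candidate scores higher each time
    # it already appears in the context.
    logits = [float(words.count(w)) for w in VOCAB]
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return {w: e / total for w, e in zip(VOCAB, exps)}

def run_conversation(turns):
    """Apply the same fixed function to a context that grows each turn."""
    context = ""
    history = []
    for turn in turns:
        context = (context + " " + turn).strip()
        history.append(next_word_probs(context))
    return history

turns = [
    "please explain attention",
    "more detailed please",
    "still more detailed and formal",
]
history = run_conversation(turns)
for i, probs in enumerate(history, start=1):
    print(i, {w: round(p, 2) for w, p in probs.items()})
```

Nothing inside `next_word_probs` changes between turns. Only its input does, which is the sense in which the interaction reshapes the probabilities.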

Dr Teodora Szasz

Totally agree. This is an intro. I go into much more detail in many of my other articles.

Comment removed
Jan 12
Dr Teodora Szasz

Thank you — glad the cocktail party framing landed!

You're touching on something important. The "Are Sixteen Heads Really Better than One?" paper showed exactly this: many attention heads turn out to be redundant at test time, and in its experiments roughly 20-40% of them could be pruned with minimal performance loss (some layers could even be reduced to a single head).
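
For readers who want to see what pruning a head means mechanically, here is a minimal NumPy sketch. It is an illustration under assumed shapes and random weights, not the paper's code: `multi_head_attention`, `head_mask`, and all dimensions are hypothetical. Each head's output is multiplied by a 0/1 mask entry before the output projection, so a masked head contributes nothing.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(x, wq, wk, wv, wo, head_mask):
    """Self-attention with a per-head mask (1.0 keeps a head, 0.0 prunes it).

    x:        (seq, d_model)
    wq/wk/wv: (heads, d_model, d_head)
    wo:       (heads, d_head, d_model)
    """
    q = np.einsum("sd,hde->hse", x, wq)
    k = np.einsum("sd,hde->hse", x, wk)
    v = np.einsum("sd,hde->hse", x, wv)
    scores = np.einsum("hse,hte->hst", q, k) / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)
    per_head = np.einsum("hst,hte->hse", weights, v)
    per_head = per_head * head_mask[:, None, None]  # zero out pruned heads
    # Summing over heads here is equivalent to concatenating and projecting.
    return np.einsum("hse,hed->sd", per_head, wo)

# Assumed toy sizes and random (untrained) weights.
rng = np.random.default_rng(0)
seq, d_model, heads, d_head = 4, 8, 4, 2
x = rng.normal(size=(seq, d_model))
wq, wk, wv = (rng.normal(size=(heads, d_model, d_head)) for _ in range(3))
wo = rng.normal(size=(heads, d_head, d_model))

full = multi_head_attention(x, wq, wk, wv, wo, np.ones(heads))
pruned = multi_head_attention(x, wq, wk, wv, wo, np.array([1.0, 0.0, 1.0, 0.0]))
change = np.linalg.norm(full - pruned) / np.linalg.norm(full)
print("relative change after pruning 2 of 4 heads:", change)
```

With random weights every head matters about equally, so the change here will be large. The paper's observation concerns trained models, where removing many heads changes the output far less.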

It's a fascinating tension — we design for diversity, but the model learns toward convergence. Great fodder for a future post on efficient transformers and why smaller models can punch above their weight.

Appreciate you reading closely!