Large Language Models

Last time, you made a small language model that could generate simple text.

But large language models, or "LLMs" for short, can generate poems, conversations, stories, and more.

What makes them so good at it?

Prediction

First, let's see what large language models actually do.

Models look at text and then predict, or guess, what should come next.

[Interactive demo: a model finishes a sentence from Alice in Wonderland, one token at a time.]

Notice that a model generates text one token at a time.

A token is a piece of text, like a word.
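
If you're curious what that looks like in code, here is a minimal sketch in Python. It is not how a real LLM works inside, but it captures the idea: look at the text so far, then guess the most likely next token. The example sentences are invented for illustration.

from collections import Counter

# Invented example text the toy "model" has seen before.
examples = [
    "alice went up to the door and knocked",
    "alice went up to the door and waited",
    "alice went up to the door and knocked",
]

def predict_next_token(prompt):
    """Guess the next token: the one that most often followed the prompt."""
    prompt_tokens = prompt.split()   # one token per word, for simplicity
    followers = Counter()
    for text in examples:
        tokens = text.split()
        for i in range(len(tokens) - len(prompt_tokens)):
            if tokens[i:i + len(prompt_tokens)] == prompt_tokens:
                followers[tokens[i + len(prompt_tokens)]] += 1
    return followers.most_common(1)[0][0] if followers else None

print(predict_next_token("the door and"))   # -> "knocked" (seen twice, "waited" once)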

Best Guess

Choose the word you think best finishes each sentence.

You'll see the LLM's prediction after you make your choice.

Alice went timidly up to the door, and


Try this sentence, too.

Splash! She was up to her chin in


What about this one?

"Off with her head!" the Queen


How did you pick the best word to finish each sentence?

Odds are, you relied on the word's meaning.

Meaning

Let's look at this sentence again:

Alice went timidly up to the door, and

Many words could finish this sentence.

But only a couple of words fit this sentence's meaning.

LLMs are good at generating text because they seem to understand what words mean.

But do they? Let's explore.

Closer or Further?

First, let's see how you think about words' meanings.

Drag and drop the words below, putting the words closest in meaning to Door at the top.


When you're ready, click Reveal to see how an LLM thinks about these words' meanings.

Notice how the LLM gave each word a number.

Entrance 0.46

This number represents the distance between Door and Entrance.

But what does it mean for two words to be closer to or further from one another?

Enter two words of your choice into the boxes below.

Door 0.46 Entrance

Can you find two words close to each other?

What about two words far from each other?

Did you notice that words closer in meaning also get smaller numbers?
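
Behind that number is ordinary arithmetic. Here is a minimal sketch in Python, where each word's meaning is stored as a short list of numbers (more on where those numbers come from in a moment). The values below are invented so that Door and Entrance land about 0.46 apart, matching the example above.

import math

# Invented "meanings": each word is a point described by two numbers.
# A real model uses thousands of numbers per word.
meanings = {
    "door":     [0.80, 0.10],
    "entrance": [0.52, 0.47],
    "banana":   [-0.70, 0.90],
}

def distance(word_a, word_b):
    """Euclidean distance: how far apart two word-points are."""
    a, b = meanings[word_a], meanings[word_b]
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

print(round(distance("door", "entrance"), 2))  # -> 0.46 (close in meaning)
print(round(distance("door", "banana"), 2))    # -> 1.7  (far in meaning)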

To see why, let's look at how language models understand words.

Meanings from Math

Large language models don't understand words like humans do.

Instead, they use math!

Language models use a kind of math with dimensions.

A dimension is a scale or a measurement. For example, black to white, small to large, or light to heavy.

Let's start simple, with just one dimension.

As you move the slider along this dimension, how does the prediction change?

[Interactive slider: Exiting ↔ Entering]

Predicted Sentence: Alice went timidly up to the door, and entered.

Just one dimension might not be enough to find the right meaning.

Let's add another dimension: timid to bold.

Can you find the right combination of meanings to finish this sentence?

[Interactive sliders: Entering ↔ Exiting and Timid ↔ Bold]

Predicted Sentence: Alice went timidly up to the door, and entered.

Using more than one dimension makes it easier to find the right meaning.
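
Here is a sketch of the same idea in Python, using the two sliders above as dimensions. Each candidate word gets an invented position on the entering-to-exiting and timid-to-bold scales, and the point closest to the meaning we want wins.

import math

# Invented positions on two scales:
#   first number:  exiting -1 ... +1 entering
#   second number: bold    -1 ... +1 timid
candidates = {
    "entered":  ( 0.9, -0.6),
    "knocked":  ( 0.7,  0.8),
    "left":     (-0.9,  0.1),
    "broke in": ( 0.8, -0.9),
}

def best_word(target_meaning):
    """Pick the candidate word whose point is closest to the target."""
    return min(candidates, key=lambda w: math.dist(candidates[w], target_meaning))

# "Alice went timidly up to the door, and ..." wants entering and timid:
print(best_word((0.8, 0.9)))   # -> "knocked"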

Large language models use thousands of dimensions to understand words.

That's one reason they're good at finding the right word!

But it's not the whole story.

The Right Words To Say

Language models use dimensions to represent the meanings of words.

But how do they pick the right word?

In short, language models are trained.

At first, language models pick the wrong words.

Alice went timidly up to the door, and broke in.

So, training is about fixing these predictions until they're correct.

Alice went timidly up to the door, and knocked.

Remember how words can be closer or further away from each other?

Knocked 1.39 Broke in

Here's another way to think about training:

It's about reducing the distance between what a model actually predicts and what it should predict.

Let's see this idea in action.

Training

During training, a model sees examples of text.

Correct Sentence: Alice went timidly up to the door and knocked.

Afterward, it tries to complete similar text on its own.

Predicted Sentence: Alice went timidly up to the door and broke in.

The distance between the correct text and the predicted text tells the model how well it did.

Correct Sentence: Alice went timidly up to the door and knocked.

Distance: 1.39

Predicted Sentence: Alice went timidly up to the door and broke in.

If the model's prediction isn't close, it will try to fix it.

But a model can only fix its prediction a little at a time.

Correct Sentence: Alice went timidly up to the door and knocked.

Distance: 0.64

Predicted Sentence: Alice went timidly up to the door and opened it.

As the model sees more and more examples, its predictions get better and better.

Correct Sentence: Alice went timidly up to the door and knocked.

Distance: 0.09

Predicted Sentence: Alice went timidly up to the door and rang the bell.

Until...

Well, why not see for yourself?

Can you help this model adjust its predictions until they're correct?

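Here is a minimal sketch of that adjust-a-little-at-a-time idea in Python. The correct answer and the prediction are just points in meaning space (with invented coordinates), and each training step moves the prediction a small fraction of the way toward the target, so the distance shrinks step by step. Real training adjusts millions of internal numbers, but the spirit is the same.

import math

correct    = [0.7, 0.8]    # where "knocked" sits in meaning space (invented)
prediction = [0.8, -0.9]   # the model starts out wrong, near "broke in"
step_size  = 0.2           # fix the prediction only a little at a time

for step in range(8):
    print(f"step {step}: distance = {math.dist(prediction, correct):.2f}")
    # Nudge each number in the prediction a small step toward the target.
    prediction = [p + step_size * (c - p) for p, c in zip(prediction, correct)]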

Training Data

All of the examples a model trains with have a name: training data.

Training data is chosen by humans.

If training data is incorrect, unfair, or biased, then the model's predictions will be too.

See for yourself!

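Here is a tiny sketch in Python of how that happens. This toy predictor just repeats whatever its training data said most often, so skewed examples produce a skewed prediction. The sentences are invented to make the bias obvious.

from collections import Counter

# Deliberately skewed, invented training data: three examples say "he",
# only one says "she", so the model learns the imbalance.
training_data = [
    "the doctor said he", "the doctor said he",
    "the doctor said he", "the doctor said she",
]

def predict_next(prompt):
    """Guess the token that most often followed the prompt in training."""
    followers = Counter(text.split()[-1] for text in training_data
                        if text.startswith(prompt + " "))
    return followers.most_common(1)[0][0]

print(predict_next("the doctor said"))  # -> "he": learned bias, not truth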

Meaning Machines

Large language models aren't magic.

Instead, they use clever math and lots of data to predict words based on their meanings.

Next time you use a large language model, think about what might be going on behind the scenes!