Large Language Models
Last time, you made a small language model that could generate simple text.
But large language models, or "LLMs" for short, can generate poems, conversations, stories, and more.
What makes them so good at it?
First, let's see what large language models actually do.
Models look at text and then predict—make a guess—about what should come next.
Notice that a model generates text one token at a time.
A token is a piece of text, like a word.
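Generating text one token at a time can be sketched in code. Here's a minimal sketch, assuming a toy "model" that just looks up the most likely next token in a small hand-made table — a real LLM learns patterns like these from data instead:

```python
# A toy next-token predictor: for each token, list what might come next
# and how likely each option is. These probabilities are invented for
# illustration; a real LLM learns millions of such patterns.
next_token_odds = {
    "the": {"cat": 0.6, "dog": 0.4},
    "cat": {"sat": 0.7, "ran": 0.3},
    "sat": {"down": 1.0},
}

def predict_next(token):
    """Pick the most likely next token, or None if there's no guess."""
    options = next_token_odds.get(token)
    if not options:
        return None
    return max(options, key=options.get)

# Generate text one token at a time, feeding each prediction back in.
text = ["the"]
while True:
    nxt = predict_next(text[-1])
    if nxt is None:
        break
    text.append(nxt)

print(" ".join(text))  # the cat sat down
```

Each loop picks one token, adds it to the text, and then predicts again from the new last token — the same loop a real model runs, just with far richer predictions.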
Choose the word you think best finishes each sentence.
You'll see the LLM's prediction after you make your choice.
Try this sentence, too.
What about this one?
How did you pick the best word to finish each sentence?
Odds are, you relied on each word's meaning.
Let's look at this sentence again:
Many words could finish this sentence.
But, only a couple of words fit this sentence's meaning.
LLMs are good at generating text because they seem to understand what words mean.
But do they? Let's explore.
First, let's see how you think about words' meanings.
Drag and drop the words below, putting the words closest in meaning to Door at the top.
When you're ready, click Reveal to see how an LLM thinks about these words' meanings.
Notice how the LLM gave each word a number.
Each number represents the distance between Door and that word. Entrance's number, for example, is the distance between Door and Entrance.
But what does it mean for two words to be closer to or further from one another?
Enter two words of your choice into each box below.
Can you find two words close to each other?
What about two words far from each other?
Did you notice that words closer in meaning are also closer together?
To see why, let's look at how language models understand words.
Large language models don't understand words like humans do.
Instead, they use math!
Language models use a kind of math built on dimensions.
A dimension is a scale or a measurement. For example, black to white, small to large, or light to heavy.
Let's start simple, with just one dimension.
As you move the slider along this dimension, how does the prediction change?
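The slider demo above can be sketched in code: give each word a position along one dimension, then pick the word closest to wherever the slider sits. A minimal sketch — the words and their positions are invented for illustration:

```python
# One dimension: "small to large", as a number from 0.0 to 1.0.
# These positions are invented for illustration.
size_dimension = {
    "ant": 0.05,
    "cat": 0.3,
    "horse": 0.6,
    "elephant": 0.9,
}

def closest_word(slider_position):
    """Find the word whose position on the scale is nearest the slider."""
    return min(size_dimension,
               key=lambda w: abs(size_dimension[w] - slider_position))

# Slide along the dimension and watch the prediction change.
print(closest_word(0.1))   # ant
print(closest_word(0.85))  # elephant
```

As the slider value moves, a different word becomes the nearest one — which is why the prediction changes as you drag.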
Just one dimension might not be enough to find the right meaning.
Let's add another dimension: timid to bold.
Can you find the right combination along both dimensions to finish this sentence?
Using more than one dimension makes it easier to find the right meaning.
Large language models use thousands of dimensions to understand words.
That's one reason they're good at finding the right word!
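With more dimensions, each word becomes a list of numbers, and "distance" becomes ordinary straight-line distance between those lists. Here's a minimal sketch using just two dimensions — real models use thousands, and these values are invented for illustration:

```python
import math

# Each word is a point: (small-to-large, timid-to-bold).
# The values are invented for illustration.
words = {
    "door": (0.5, 0.2),
    "entrance": (0.55, 0.25),
    "banana": (0.3, 0.9),
}

def distance(a, b):
    """Straight-line (Euclidean) distance between two word-points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(words[a], words[b])))

# Words closer in meaning get smaller numbers.
print(round(distance("door", "entrance"), 2))  # 0.07
print(round(distance("door", "banana"), 2))    # 0.73
```

The same formula works in two dimensions or two thousand — each extra dimension just adds one more term inside the square root.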
But, it's not the whole story.
Language models use dimensions to represent the meanings of words.
But how do they pick the right word?
In short, language models are trained.
At first, language models pick the wrong words.
So, training is about fixing these predictions until they're correct.
Remember how words can be closer or further away from each other?
Here's another way to think about training:
It's about reducing the distance between what a model actually predicts and what it should predict.
Let's see this idea in action.
During training, a model sees examples of text.
Afterward, it tries to complete similar text on its own.
The distance between the correct text and the predicted text tells the model how well it did.
If the model's prediction isn't close, it will try to fix it.
But, a model can only fix its prediction a little at a time.
As the model sees more and more examples, its predictions get better and better.
Until...
Well, why not see for yourself?
Can you help this model adjust its predictions until they're correct?
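The training idea above can be sketched as a tiny loop: measure how far the prediction is from the correct answer, then nudge it a small step in the right direction, over and over. A minimal sketch — a real model adjusts millions of numbers this way, not just one:

```python
# The model's "prediction" here is a single number it can adjust.
correct = 0.8
prediction = 0.1
step_size = 0.1  # a model can only fix its prediction a little at a time

for example in range(50):
    gap = correct - prediction     # how far off are we, and in which direction?
    prediction += step_size * gap  # fix it a little

print(round(prediction, 3))  # very close to 0.8 after many small steps
```

Early on the gap is large, so each step makes a noticeable fix; as the prediction closes in on the correct answer, the steps shrink — just like the lesson describes.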
All of the examples a model trains with have a name: training data.
Training data is chosen by humans.
If training data is incorrect, unfair, or biased, then the model's predictions will be too.
See for yourself!
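One way to see this effect in code: a toy model that just counts which word follows a phrase in its training data will repeat whatever imbalance that data contains. A minimal sketch — the training sentences are invented for illustration:

```python
from collections import Counter

# Toy training data where every example says "the doctor said he ..."
biased_data = [
    "the doctor said he was busy",
    "the doctor said he would help",
    "the doctor said he agreed",
]

# "Training": count which word follows "doctor said".
counts = Counter()
for sentence in biased_data:
    tokens = sentence.split()
    for i, tok in enumerate(tokens[:-1]):
        if tok == "said" and tokens[i - 1] == "doctor":
            counts[tokens[i + 1]] += 1

# The model's prediction mirrors the bias in its data.
print(counts.most_common(1)[0][0])  # he
```

The model isn't choosing to be unfair — it simply never saw anything else, so its prediction reflects the data it was given.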
Large language models aren't magic.
Instead, they use clever math and lots of data to predict words based on their meanings.
Next time you use a large language model, think about what might be going on behind the scenes!