Tokenization is the process of breaking text down into smaller units called tokens, which can be words, characters, or subwords; most modern LLMs use subword tokenization. This is a fundamental step in how LLMs process and understand language.
Analogy: Imagine cutting a sentence into individual words and punctuation marks. Each piece is like a token.
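To make the analogy concrete, here is a minimal, hypothetical word-and-punctuation splitter in Python. The `naive_tokenize` helper is illustrative only; real LLM tokenizers are more sophisticated and typically split text into subwords rather than whole words.

```python
import re

def naive_tokenize(text: str) -> list[str]:
    # Cut the sentence into pieces: runs of word characters, or single
    # punctuation marks, mirroring the scissors-and-sentence analogy.
    return re.findall(r"\w+|[^\w\s]", text)

print(naive_tokenize("LLMs don't read words; they read tokens!"))
# ['LLMs', 'don', "'", 't', 'read', 'words', ';', 'they', 'read', 'tokens', '!']
```

Note how even this crude splitter breaks "don't" into several pieces; subword tokenizers make similar splits, which is why token counts rarely match word counts.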
Why It Matters: The length of the context window, measured in tokens, limits how much information the model can consider at once.
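Because the limit is counted in tokens rather than words, budgeting text against the context window means counting tokens. Below is a minimal sketch, assuming the third-party tiktoken library is installed and using its "cl100k_base" encoding; the 8,192-token limit is an illustrative example, as actual limits vary by model.

```python
import tiktoken

CONTEXT_WINDOW = 8192  # example limit; real context windows vary by model

# Load a common subword encoding and count the tokens in a piece of text.
enc = tiktoken.get_encoding("cl100k_base")
text = "Tokenization breaks text into subword units before the model sees it."

tokens = enc.encode(text)
print(f"{len(tokens)} tokens used, {CONTEXT_WINDOW - len(tokens)} remaining")
```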