
Tokenization

Tokenization is the process of breaking text into smaller units called tokens, which may be whole words, individual characters, or subword fragments. It is a fundamental step in how LLMs process and understand language.

Analogy: Imagine cutting a sentence into individual words and punctuation marks. Each piece is like a token.

Why It Matters: The length of the context window, measured in tokens, limits how much information the model can consider at once.
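To make this concrete, the sketch below uses the tiktoken library (an assumption; any tokenizer with an encode/decode interface would work similarly) to split a sentence into subword tokens and count them, which is how prompt length is measured against the context window.

```python
# Minimal sketch: counting and inspecting tokens with tiktoken.
# Assumptions: tiktoken is installed, and "cl100k_base" is a suitable encoding
# for the model being used.
import tiktoken

encoding = tiktoken.get_encoding("cl100k_base")

text = "Tokenization breaks text into smaller units called tokens."
token_ids = encoding.encode(text)                            # list of integer token IDs
token_pieces = [encoding.decode([t]) for t in token_ids]     # each ID mapped back to its text piece

print(f"{len(token_ids)} tokens: {token_pieces}")
# A longer prompt produces more tokens, and the total must fit within
# the model's context window.
```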

