The context window size for large language models (LLMs) varies significantly depending on the specific model and version. Here is a very approximate breakdown of context window sizes, in both tokens and approximate words, for a range of popular LLMs (a short code sketch after the list shows how the word estimates are derived):
GPT-3 Series:
- GPT-3 (Ada, Babbage, Curie, Davinci)
  - Tokens: 2,048
  - Approximate Words: 1,500
- GPT-3.5
  - Tokens: 4,096
  - Approximate Words: 3,000
GPT-4 Series:
- GPT-4 (standard)
  - Tokens: 8,192
  - Approximate Words: 6,000
- GPT-4 (extended)
  - Tokens: 32,768
  - Approximate Words: 24,000
OpenAI Codex:
- Codex (code-davinci-002)
  - Tokens: 8,001
  - Approximate Words: 6,000
Google LaMDA:
- LaMDA
  - Tokens: Approximately 4,096
  - Approximate Words: 3,000
Anthropic Claude:
- Claude (Claude 1)
  - Tokens: 9,000
  - Approximate Words: 6,750
EleutherAI GPT-J:
- GPT-J
  - Tokens: 2,048
  - Approximate Words: 1,500
Cohere:
- Cohere Command
  - Tokens: 2,048
  - Approximate Words: 1,500
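If you want to see where the word estimates come from, here is a quick Python sketch that applies the ~0.75 words-per-token rule of thumb (summarized below) to a few of the token limits listed above. The figures in the list are rounded, so the output differs slightly.

```python
# Quick sketch: derive approximate word counts from the token limits above
# using the ~0.75 words-per-token rule of thumb. Illustrative only.
WORDS_PER_TOKEN = 0.75

# Context window sizes (in tokens) as listed in this article.
context_windows = {
    "GPT-3 (Ada/Babbage/Curie/Davinci)": 2048,
    "GPT-3.5": 4096,
    "GPT-4 (standard)": 8192,
    "GPT-4 (extended)": 32768,
    "Claude 1": 9000,
    "GPT-J": 2048,
}

for model, tokens in context_windows.items():
    print(f"{model}: {tokens:,} tokens ≈ {round(tokens * WORDS_PER_TOKEN):,} words")
```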
Summary:
- Tokens to Words Approximation: Generally, one token is roughly 0.75 words (depending on the complexity of the language and tokenization method).
- Context Windows: The context window caps how much text, prompt plus generated response, the model can process in a single pass; the sketch below shows one way to check a prompt against a given limit.
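If you want exact numbers for your own prompts rather than the rule of thumb, a tokenizer gives the real count. Below is a minimal sketch assuming OpenAI's tiktoken package is installed; the helper name and the sample prompt are placeholders I made up, and the 8,192-token limit is GPT-4's standard window from the list above.

```python
# Minimal sketch: count tokens with tiktoken (pip install tiktoken) and
# check whether a prompt fits in a given context window.
import tiktoken

def fits_in_context(text: str, context_window: int, model: str = "gpt-4") -> bool:
    enc = tiktoken.encoding_for_model(model)   # tokenizer matching the model
    n_tokens = len(enc.encode(text))           # exact token count for this text
    approx_words = round(n_tokens * 0.75)      # ~0.75 words per token rule of thumb
    print(f"{n_tokens} tokens (~{approx_words} words) against a {context_window}-token window")
    return n_tokens <= context_window

# Example: check a prompt against GPT-4's standard 8,192-token window.
fits_in_context("Summarize the quarterly report in three bullet points.", 8192)
```

Keep in mind that the window covers the prompt and the completion together, so in practice you want to leave headroom for the model's response rather than filling the entire limit with input.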
These values are approximations, and the exact word count can vary based on the specific text being tokenized. Additionally, given the pace of development, this information may be out of date a few minutes after I publish it 🙂