The Context Window Size for LLMs

The context window size for large language models (LLMs) varies significantly depending on the specific model and version. Here is a very approximate breakdown of context window sizes, in both tokens and approximate words, for a range of popular LLMs:

GPT-3 Series:

  1. GPT-3 (Ada, Babbage, Curie, Davinci)
  • Tokens: 2048 tokens
  • Approximate Words: 1500 words
  2. GPT-3.5
  • Tokens: 4096 tokens
  • Approximate Words: 3000 words

GPT-4 Series:

  1. GPT-4
  • Tokens: 8192 tokens (standard)
  • Approximate Words: 6000 words
  2. GPT-4 (extended)
  • Tokens: 32,768 tokens
  • Approximate Words: 24,000 words

OpenAI Codex:

  1. Codex (code-davinci-002)
  • Tokens: 4096 tokens
  • Approximate Words: 3000 words

Google LaMDA:

  1. LaMDA
  • Tokens: Approximately 4096 tokens
  • Approximate Words: 3000 words

Anthropic Claude:

  1. Claude (Claude 1)
  • Tokens: 9000 tokens
  • Approximate Words: 6750 words

EleutherAI GPT-J:

  1. GPT-J
  • Tokens: 2048 tokens
  • Approximate Words: 1500 words

Cohere:

  1. Cohere Command
  • Tokens: 2048 tokens
  • Approximate Words: 1500 words
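
To keep these figures handy, here is a small Python sketch that collects the numbers listed above into a dictionary and converts a token budget into an approximate word count using the rough 0.75-words-per-token rule discussed below. The model labels are informal shorthand for the entries in this post, and the values will go stale just as quickly as the list itself:

```python
# Rough context-window sizes (in tokens) from the list above.
# These are approximations and informal labels, not official model IDs.
CONTEXT_WINDOWS = {
    "gpt-3 (ada/babbage/curie/davinci)": 2048,
    "gpt-3.5": 4096,
    "gpt-4": 8192,
    "gpt-4 (extended)": 32768,
    "codex (code-davinci-002)": 4096,
    "lamda": 4096,
    "claude-1": 9000,
    "gpt-j": 2048,
    "cohere command": 2048,
}

WORDS_PER_TOKEN = 0.75  # rule of thumb; varies with language and tokenizer


def approximate_words(tokens: int) -> int:
    """Convert a token budget into a rough English word count."""
    return round(tokens * WORDS_PER_TOKEN)


if __name__ == "__main__":
    for model, tokens in CONTEXT_WINDOWS.items():
        print(f"{model}: {tokens} tokens ≈ {approximate_words(tokens)} words")
```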

Summary:

  • Tokens to Words Approximation: Generally, one token is roughly 0.75 words, depending on the complexity of the language and the tokenization method (see the counting sketch below).
  • Context Windows: The context window size is a critical factor that determines how much text the model can process at once.
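
The 0.75 ratio is only a rule of thumb. If you actually need to know whether a prompt fits in a context window, count tokens with the model's own tokenizer rather than estimating from words. Here is a minimal sketch using OpenAI's tiktoken library; it applies to the OpenAI models above, while other vendors' models use their own tokenizers, so treat it as illustrative:

```python
# pip install tiktoken
import tiktoken

text = "The context window limits how much text the model can attend to at once."

# encoding_for_model selects the tokenizer that matches a given OpenAI model.
enc = tiktoken.encoding_for_model("gpt-4")
tokens = enc.encode(text)

print(f"{len(tokens)} tokens, {len(text.split())} words")
print(f"observed words-per-token ratio: {len(text.split()) / len(tokens):.2f}")
```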

These values are approximations, and the exact word count can vary based on the specific text being tokenized. Additionally, given the pace of development, this information may be out of date a few minutes after I publish it 🙂