Not all LLMs adhere to the same cost structures: there are different modalities, front-end access, API access, Pro plans, and an ever-increasing array of offerings. In that context, here are some generalities:
- Context Window: The LLM may not re-read the entire interaction history with each new query. Instead, there is a “context window” that contains a certain amount of recent conversation history.
- Limited Memory: This context window has a limit. It doesn’t contain the entire conversation from the very beginning if it’s a long interaction.
- Rolling Context: As the conversation progresses, older parts may be pushed out of the context window to make room for new information.
- Relevant Information: The context window aims to keep the most relevant recent information for our ongoing conversation.
- No Long-Term Memory: Most LLMs don't have true long-term memory or the ability to access information from past conversations or interactions outside of their current context window. Additionally, the size of this "window" varies by provider and may be reduced over time.
- Fresh Start: If you were to start a new conversation with an LLM, it wouldn't have access to previous interactions.
So, while LLMs do consider recent context for each new query, it's not a complete re-reading of the entire interaction history. This approach balances maintaining context with efficient processing and resource use.
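To make the rolling-window idea concrete, here is a minimal sketch in Python. The message format, the four-characters-per-token estimate, and the limits are illustrative assumptions, not any vendor's actual implementation; real systems use a proper tokenizer and model-specific budgets.

```python
# A minimal sketch of a rolling context window, assuming a crude
# 4-characters-per-token estimate; real systems use actual tokenizers.

def estimate_tokens(text: str) -> int:
    """Rough token estimate: roughly 4 characters per token in English."""
    return max(1, len(text) // 4)

def build_context(history: list[str], token_limit: int = 4096) -> list[str]:
    """Keep the most recent messages that fit within the token budget.

    Walks the history from newest to oldest, adding messages until the
    budget is exhausted; anything older simply falls out of the window.
    """
    window: list[str] = []
    used = 0
    for message in reversed(history):
        cost = estimate_tokens(message)
        if used + cost > token_limit:
            break  # older messages are pushed out of the window
        window.append(message)
        used += cost
    window.reverse()  # restore chronological order
    return window

history = [f"message {i}: " + "lorem ipsum " * 50 for i in range(100)]
context = build_context(history, token_limit=2000)
print(f"{len(context)} of {len(history)} messages fit in the window")
```

Running this keeps only the handful of newest messages that fit the budget, which is exactly the "rolling" behaviour described above.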
To elaborate further on how an LLM processes information and maintains context:
Dynamic Context Window:
The context window is dynamic, meaning it adjusts based on the conversation flow.
It typically includes your most recent messages and LLM responses, but the exact amount can vary.
Token Limit:
The context window has a token limit: a fixed number of tokens (sub-word units of text, not whole words or characters) that an LLM can "remember" at any given time.
This limit is tiny compared with the volume of text the LLM was trained on.
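For a concrete sense of what a token is, here is a small sketch using OpenAI's tiktoken library as one example tokenizer; other model families tokenize differently, and the 8192 limit below is an illustrative figure, not any particular model's real limit.

```python
# Counting tokens with a real tokenizer (OpenAI's tiktoken, as one example).
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

text = "Tokens are sub-word units, not whole words or single characters."
tokens = enc.encode(text)

TOKEN_LIMIT = 8192  # illustrative; actual limits vary widely by model
print(f"{len(tokens)} tokens used, {TOKEN_LIMIT - len(tokens)} remaining")
```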
Prioritization of Recent Information:
More recent messages are given priority in the context window.
As new information comes in, older information may be compressed or removed to stay within the token limit.
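One way this prioritization might work is sketched below: the system prompt stays pinned, the newest turns are kept verbatim, and older turns are collapsed into a single summary message. The summarize helper is a hypothetical stand-in for whatever compression step a real system might use (often another LLM call).

```python
# A sketch of prioritizing recent messages: pin the system prompt, keep
# the newest turns verbatim, and compress everything older into a stub.

def summarize(messages: list[dict]) -> str:
    """Hypothetical placeholder; a real system might call an LLM here."""
    return f"[summary of {len(messages)} earlier messages]"

def prioritize(messages: list[dict], keep_recent: int = 6) -> list[dict]:
    system, turns = messages[0], messages[1:]
    if len(turns) <= keep_recent:
        return messages  # everything still fits; nothing to compress
    old, recent = turns[:-keep_recent], turns[-keep_recent:]
    compressed = {"role": "system", "content": summarize(old)}
    return [system, compressed] + recent

conversation = [{"role": "system", "content": "You are a helpful assistant."}]
conversation += [{"role": "user", "content": f"question {i}"} for i in range(20)]
print(prioritize(conversation))
```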
Conversation Flow:
The context window attempts to maintain a coherent flow of conversation, keeping relevant information even if it’s not the most recent.
No Access to External Data:
Some LLMs cannot access or retrieve information outside of their current context window or training data.
In other words, they cannot look up external sources during your interaction.
Stateless Nature:
Each response is based solely on the current context window and training.
For now, LLMs don't build or maintain a persistent model of the user or the conversation over time.
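In API terms, statelessness means the client has to resend the accumulated history with every request, roughly as sketched below; call_llm is a hypothetical placeholder, not a real endpoint.

```python
# Statelessness in practice: the model keeps nothing between calls, so
# the client resends the full accumulated history on every request.

def call_llm(messages: list[dict]) -> str:
    """Hypothetical stand-in for a chat-completion API call."""
    return f"(reply based on {len(messages)} messages of context)"

messages: list[dict] = []
for user_input in ["What is a context window?", "And a token limit?"]:
    messages.append({"role": "user", "content": user_input})
    reply = call_llm(messages)  # the full history is sent every time
    messages.append({"role": "assistant", "content": reply})
    print(reply)
```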
Limitations:
This system means LLMs will sometimes forget details mentioned earlier in a very long conversation.
It also means an LLM can't learn or update its knowledge base from conversations.
Efficiency vs. Completeness:
This approach balances the need for contextual understanding with computational efficiency.
It allows for natural conversation flow without the need to process the entire conversation history with each response.
Understanding these aspects of how LLMs process information may help users interact more effectively, knowing both the capabilities and limitations of the system.