How Does It Work. 2024

Although the subject could fill a whole book in itself, artificial intelligence (AI) and machine learning (ML) have been around since the late 1940s and early 1950s, and the articulation of, and experimentation around, the concept of neural networks (NNs) goes back to the 1800s. Rapid improvement in computer processing power and architecture ultimately made large neural networks practical, leading to OpenAI’s first large language model (LLM) in 2018.


Large Language Models (LLMs) work by leveraging deep neural network architectures trained on vast amounts of text data. The training process involves predicting the next word in a sequence based on context, which helps the model learn patterns and relationships in language. LLMs use attention mechanisms to focus on relevant parts of the input and generate responses based on learned associations. During inference, given a prompt, the model generates output by sampling or decoding the most likely next tokens. However, it’s important to note that LLMs primarily rely on statistical patterns and lack true understanding or consciousness.
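
As a rough illustration of the attention mechanism mentioned above, the sketch below implements scaled dot-product attention, the form used in transformer models, in plain Python. The function names and the toy vectors are assumptions made for illustration; real models use learned projection matrices, many attention heads and optimized tensor libraries.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    scale = math.sqrt(len(query))
    # Similarity of the query to every key, scaled by sqrt(dimension).
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    # The output is a weighted average of the value vectors.
    dim = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, output


# Toy example: three input positions with 2-dimensional vectors (made-up numbers).
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
query = [1.0, 1.0]

weights, output = attention(query, keys, values)
print(weights)  # the third position matches the query best, so it gets the largest weight
print(output)
```

The weights show how strongly the query "focuses" on each input position; the output blends the values accordingly.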

    A large language model (LLM) is a type of artificial intelligence (AI) model that can generate human-like text based on the input it receives. These models are trained on vast amounts of text data to learn patterns, relationships, and contextual information.


    In a language model, text is broken down into smaller units called tokens. A token can be as short as one character or as long as one word, depending on the language and the specific tokenizer used. For example, in English, a token can represent a single character like “a” or a whole word like “apple.” Tokens allow the model to process and understand text at a granular level.

    The tokenization process generally follows these steps:

    • Text Preprocessing: e.g. removing special characters or converting to lowercase.
    • Tokenization Strategy: this can be based on words, subwords, or characters.
    • Token Generation: text is split into tokens based on the model strategy.
    • Special Tokens: these are added to indicate the start and end of the text.
    • Tokenized Input: The tokenized text is now ready for further processing, such as encoding & generating responses.
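
The steps above can be sketched in a few lines of Python. This is a minimal word-level example under stated assumptions: the special tokens `<s>`, `</s>` and `<unk>`, the function names and the sample sentence are all hypothetical. Production tokenizers typically use subword schemes such as byte-pair encoding and a fixed, pre-built vocabulary.

```python
import re

# Hypothetical special tokens; real tokenizers define their own.
START, END, UNKNOWN = "<s>", "</s>", "<unk>"


def preprocess(text):
    """Step 1: lowercase and drop everything except letters, digits and spaces."""
    text = text.lower()
    return re.sub(r"[^a-z0-9\s]", "", text)


def tokenize(text):
    """Steps 2-4: word-level strategy, split on whitespace, add special tokens."""
    words = preprocess(text).split()
    return [START] + words + [END]


def build_vocab(tokens):
    """Assign each distinct token an integer id, in order of first appearance."""
    vocab = {UNKNOWN: 0}
    for token in tokens:
        if token not in vocab:
            vocab[token] = len(vocab)
    return vocab


def encode(tokens, vocab):
    """Step 5: map tokens to ids, falling back to the unknown-token id."""
    return [vocab.get(token, vocab[UNKNOWN]) for token in tokens]


tokens = tokenize("The cat sat on the mat!")
vocab = build_vocab(tokens)
ids = encode(tokens, vocab)

print(tokens)  # ['<s>', 'the', 'cat', 'sat', 'on', 'the', 'mat', '</s>']
print(ids)     # [1, 2, 3, 4, 5, 2, 6, 7]
```

Note that both occurrences of "the" map to the same id, which is what lets the model treat them as the same unit.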

    The amount of data that a language model has learned from depends on the data set it was trained on. The precise contents of the training data, the training infrastructure and the model implementation are often considered proprietary, although OpenAI periodically releases information via

    During the training process, the language model learns to predict the next token in a sequence of text. By analyzing the context provided by the preceding tokens, the model generates the most likely token to follow. This predictive capability enables the model to generate coherent and contextually appropriate responses.
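
To make next-token prediction concrete, here is a deliberately tiny sketch: a bigram model that only counts which token follows which. It is an assumption-laden stand-in, not how an LLM is trained (LLMs use neural networks conditioned on long contexts), but the objective is the same: estimate the probability of the next token given what came before. The corpus and function names are made up for illustration.

```python
from collections import Counter, defaultdict


def train_bigram(tokens):
    """Count, for each token, which tokens follow it in the training text."""
    counts = defaultdict(Counter)
    for current, following in zip(tokens, tokens[1:]):
        counts[current][following] += 1
    return counts


def next_token_probs(counts, token):
    """Turn the follow-up counts for one token into probabilities."""
    followers = counts[token]
    total = sum(followers.values())
    return {tok: n / total for tok, n in followers.items()}


corpus = "the cat sat on the mat and the cat slept".split()
model = train_bigram(corpus)

probs = next_token_probs(model, "the")
print(probs)                      # 'cat' follows 'the' twice, 'mat' once
print(max(probs, key=probs.get))  # cat
```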

    When interacting with a language model, you provide a prompt or a starting sentence to guide the model’s output. The model then processes the tokens in the prompt and generates a continuation based on its learned knowledge and patterns from the training data. The length of the output can vary depending on the model’s settings or the specific instructions given.
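
One of the settings that commonly shapes the output is how the next token is chosen from the model's scores. The sketch below contrasts greedy selection with temperature sampling. The three-word vocabulary and the scores are invented for illustration, and the function names are assumptions rather than any particular library's API.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Convert raw scores into probabilities; lower temperature sharpens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def greedy(vocab, logits):
    """Always pick the highest-scoring token."""
    return vocab[logits.index(max(logits))]


def sample(vocab, logits, temperature=1.0, rng=random):
    """Draw a token at random, weighted by its softmax probability."""
    probs = softmax(logits, temperature)
    return rng.choices(vocab, weights=probs, k=1)[0]


# Made-up scores for the token that follows "The cat sat on the".
vocab = ["mat", "sofa", "moon"]
logits = [2.0, 1.0, 0.1]

print(greedy(vocab, logits))              # mat
print(softmax(logits, temperature=1.0))
print(softmax(logits, temperature=0.5))   # more probability mass on "mat"
print(sample(vocab, logits, temperature=0.8, rng=random.Random(0)))
```

Greedy selection is deterministic; sampling lets the same prompt produce different continuations, with the temperature controlling how adventurous the choice is.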

    The Sequential Process

    Here is an overview of the sequential process that takes place from your prompt to the AI answer:

    • Receiving the Prompt: The AI receives the text prompt or question that you input.
    • Tokenization: The prompt is tokenized, breaking it down into smaller units (tokens) like words, subwords, or characters. This helps prepare the text for further processing.
    • Model Input Encoding: The tokenized prompt is encoded to create a numerical representation that the model can understand. This encoding typically involves mapping the tokens to their corresponding indices in the model’s vocabulary.
    • Model Processing: The encoded prompt is passed through the model’s architecture. The specific architecture can vary, but it generally consists of neural network layers that process and analyze the input.
    • Contextual Understanding: The model leverages its pre-trained knowledge and contextual understanding to interpret the prompt and generate an initial response. This involves considering the patterns and relationships learned during training on large amounts of text data.
    • Response Generation: Based on the initial understanding of the prompt, the model generates a response. The response generation can involve various techniques, such as autoregressive decoding, where the model predicts the next token based on the preceding context.
    • Decoding and Token Generation: The model produces its response as a series of tokens in numerical form. These token IDs are decoded back into human-readable text by mapping each ID to its token in the vocabulary and joining the tokens together.
    • Post-processing: The generated tokens may undergo post-processing steps, such as removing special tokens, formatting, or adjusting the response to improve readability or coherence.
    • Output: The final processed response is provided as the answer to your prompt.

    This is a general overview of the process; the specific implementation details can vary depending on the model architecture and the system configuration.
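
The whole sequence, from prompt to answer, can be sketched end to end. In this hypothetical example a fixed lookup table (`NEXT_TOKEN`) stands in for the neural network, so that the surrounding steps (tokenization, encoding, autoregressive generation, decoding and post-processing) are visible. The vocabulary, names and behaviour are assumptions for illustration only, and the toy handles only words in its vocabulary.

```python
# Hypothetical vocabulary; a real model has tens of thousands of tokens.
VOCAB = ["<s>", "</s>", "hello", "world", "how", "are", "you", "fine", "thanks"]
TOKEN_TO_ID = {token: i for i, token in enumerate(VOCAB)}
END_ID = TOKEN_TO_ID["</s>"]

# Stand-in for the trained network: maps the last token to the next one.
NEXT_TOKEN = {"you": "fine", "fine": "thanks", "thanks": "</s>"}


def tokenize(prompt):
    """Lowercase, drop question marks, split on spaces, add a start token."""
    return ["<s>"] + prompt.lower().replace("?", "").split()


def encode(tokens):
    """Map tokens to their indices in the vocabulary."""
    return [TOKEN_TO_ID[t] for t in tokens]


def toy_model(ids):
    """Return the id of the next token given all ids so far."""
    last = VOCAB[ids[-1]]
    return TOKEN_TO_ID[NEXT_TOKEN.get(last, "</s>")]


def generate(prompt_ids, max_new_tokens=10):
    """Autoregressive decoding: append one predicted token at a time."""
    ids = list(prompt_ids)
    new_ids = []
    for _ in range(max_new_tokens):
        next_id = toy_model(ids)
        if next_id == END_ID:  # stop when the model emits the end token
            break
        ids.append(next_id)
        new_ids.append(next_id)
    return new_ids


def decode(ids):
    """Map ids back to tokens and join them into text."""
    return " ".join(VOCAB[i] for i in ids)


def postprocess(text):
    """Tidy the raw text for display."""
    return text.capitalize() + "."


prompt = "How are you?"
ids = encode(tokenize(prompt))
answer = postprocess(decode(generate(ids)))
print(ids)     # [0, 4, 5, 6]
print(answer)  # Fine thanks.
```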

    The Pace of Development

    The pace of development is astounding. The first GPT, launched by OpenAI in 2018, had 117 million parameters.
    OpenAI’s GPT-2, released in 2019, had 1.5 billion parameters and was trained on roughly 10 billion tokens of text (with an average token size of about 4 characters).
    By 2020, GPT-3 had extended this to 175 billion parameters.
    GPT-4 followed in March 2023. OpenAI has not disclosed its parameter count; widely circulated figures in the order of 100 trillion are unconfirmed speculation.
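
To put the figures for the first three GPT models in perspective, the short calculation below works out the growth factor between releases and the approximate size of GPT-2's training text. It assumes the counts quoted above, 4 characters per token and 1 byte per character, all of which are approximations.

```python
# Parameter counts quoted above.
params = {"GPT (2018)": 117e6, "GPT-2 (2019)": 1.5e9, "GPT-3 (2020)": 175e9}

names = list(params)
for earlier, later in zip(names, names[1:]):
    factor = params[later] / params[earlier]
    print(f"{earlier} -> {later}: about {factor:.0f}x more parameters")

# Rough size of GPT-2's training text, assuming 4 characters per token
# and 1 byte per character (both approximations).
tokens = 10e9
chars_per_token = 4
gigabytes = tokens * chars_per_token / 1e9
print(f"about {gigabytes:.0f} GB of text")
```

Under these assumptions, each release grew by roughly one to two orders of magnitude, and 10 billion tokens corresponds to about 40 GB of text.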

    2018 – BERT (Bidirectional Encoder Representations from Transformers): Google introduces BERT, significantly improving natural language understanding by leveraging bidirectional context in transformer models.
    2019 – GPT-2 (Generative Pre-trained Transformer 2): OpenAI releases GPT-2 with 1.5 billion parameters, capable of generating coherent and contextually relevant text, demonstrating the power of large-scale language models.
    2020 – T5 (Text-To-Text Transfer Transformer): Google introduces T5, framing NLP tasks as a text-to-text problem and achieving state-of-the-art results on multiple benchmarks.
    2020 – GPT-3: OpenAI releases GPT-3 with 175 billion parameters, setting new standards in natural language understanding and generation, and demonstrating few-shot learning capabilities.
    2021 – Codex: OpenAI introduces Codex, a descendant of GPT-3, specifically fine-tuned for programming and capable of generating code from natural language descriptions.
    2021 – DALL-E: OpenAI unveils DALL-E, a model capable of generating images from textual descriptions, showcasing the potential of multimodal AI.
    2021 – CLIP (Contrastive Language-Image Pre-Training): OpenAI releases CLIP, which learns visual concepts from natural language descriptions, enabling zero-shot transfer to various image classification tasks.
    2022 – Chinchilla: DeepMind introduces Chinchilla, a model trained according to a revised scaling law, suggesting that training a smaller model on more data can lead to better performance for the same compute budget, shifting focus from ever-larger models to more data-efficient training.
    2022 – LaMDA (Language Model for Dialogue Applications): Google releases LaMDA, designed specifically for dialogue applications, demonstrating advanced conversational abilities and context understanding.
    2022 – PaLM (Pathways Language Model): Google introduces PaLM, a model designed to handle a wide variety of NLP tasks with high efficiency, leveraging the Pathways system to enable large-scale training across multiple TPUs.
    2023 – GPT-4: OpenAI releases GPT-4, further advancing language generation and understanding capabilities and improving contextual coherence and response accuracy. OpenAI has not disclosed the model’s size.
    2023 – Gemini: Google DeepMind announces Gemini, a multimodal model that handles text, images, audio and video, pushing the boundaries of AI interaction across different modalities. Further releases follow in 2024.
    2024 – ChatGPT: OpenAI continues to update ChatGPT (first launched in November 2022) with GPT-4-class models, enhancing user interaction with improved contextual understanding and more natural responses.

    This is a somewhat simplistic explanation of how an LLM works. The technical details are often discussed and analysed in depth in artificial intelligence papers published in Cornell University’s