How Does It Work?

Whilst the history of artificial intelligence (AI) and machine learning (ML) is a whole book in itself, the fields have been around since the late 1940s and early 1950s, and the articulation of and experimentation with the concept of neural networks goes back even further, to the 1800s. Rapid improvements in computer processing power and architecture ultimately led to the emergence of practical neural networks (NNs), and OpenAI released its first LLM (large language model) in 2018.

TL;DR:

Large Language Models (LLMs) work by leveraging deep neural network architectures trained on vast amounts of text data. The training process involves predicting the next word in a sequence based on context, which helps the model learn patterns and relationships in language. LLMs use attention mechanisms to focus on relevant parts of the input and generate responses based on learned associations. During inference, given a prompt, the model generates output by sampling or decoding the most likely next tokens. However, it’s important to note that LLMs primarily rely on statistical patterns and lack true understanding or consciousness.
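
To make the attention idea concrete, here is a minimal sketch of scaled dot-product attention, the core operation inside transformer LLMs. All the vectors and numbers below are invented for illustration; real models use hundreds of dimensions and many attention heads.

    import math

    def softmax(xs):
        exps = [math.exp(x) for x in xs]
        total = sum(exps)
        return [e / total for e in exps]

    # Invented query/key/value vectors for a two-token sequence.
    queries = [[1.0, 0.0], [0.0, 1.0]]
    keys    = [[1.0, 0.0], [0.0, 1.0]]
    values  = [[0.5, 0.5], [0.9, 0.1]]
    d = len(keys[0])

    # For each token, compare its query with every key, turn the scores
    # into weights, and use the weights to blend the value vectors.
    for q in queries:
        scores  = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        blended = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(d)]
        print(weights, blended)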


In some ways the process, as we now know it, resembles the search algorithms that let you find a specific result in a Google or Bing search. Over 40 years of work on such algorithms enabled a significant leap in natural language processing (NLP), ultimately leading to GPT-3 (Generative Pre-trained Transformer 3) and to generative artificial intelligence, also called generative AI or GenAI.

The important word here is Generative: the models are trained on large data sets and, in response to prompts, can generate text, images, or other output. The pace of development is impressive; we are now seeing art, music, and creative-writing capabilities emerging from various models and various companies.

Arthur C. Clarke and His Three Laws

In this context it may be useful to think of science fiction writer Arthur C. Clarke and his three laws:

  • When a distinguished but elderly scientist states that something is possible, he is almost certainly right. When he states that something is impossible, he is very probably wrong.
  • The only way of discovering the limits of the possible is to venture a little way past them into the impossible.
  • Any sufficiently advanced technology is indistinguishable from magic.

The individuals and organizations at the forefront of LLM development have created something that works, and it can certainly feel like magic.

A large language model, such as GPT-3.5, is a type of artificial intelligence (AI) model that can generate human-like text based on the input it receives. These models are trained on vast amounts of text data to learn patterns, relationships, and contextual information.

Tokenization

In a language model, text is broken down into smaller units called tokens. A token can be as short as one character or as long as one word, depending on the language and the specific tokenizer used. For example, in English, a token can represent a single character like “a” or a whole word like “apple.” Tokens allow the model to process and understand text at a granular level.

The tokenization process generally follows these steps (sketched in code after this list):

  • Text Preprocessing: e.g., removing special characters or converting to lowercase.
  • Tokenization Strategy: tokens can be based on words, subwords, or characters.
  • Token Generation: the text is split into tokens according to the chosen strategy.
  • Special Tokens: markers are added to indicate the start and end of the text.
  • Tokenized Input: the tokenized text is now ready for further processing, such as encoding and generating responses.
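
As a rough illustration of those steps, here is a toy word-level tokenizer in Python. Real LLM tokenizers typically work on subwords and are far more sophisticated; the vocabulary and special tokens below are invented for the example.

    def tokenize(text, vocab):
        # Step 1 - text preprocessing: lowercase and drop punctuation.
        cleaned = "".join(c for c in text.lower() if c.isalnum() or c.isspace())
        # Steps 2 and 3 - strategy and token generation: split on whitespace.
        tokens = cleaned.split()
        # Step 4 - special tokens marking the start and end of the text.
        tokens = ["<BOS>"] + tokens + ["<EOS>"]
        # Step 5 - tokenized input: map tokens to IDs, ready for the model.
        return [vocab.get(t, vocab["<UNK>"]) for t in tokens]

    vocab = {"<BOS>": 0, "<EOS>": 1, "<UNK>": 2, "the": 3, "cat": 4, "sat": 5}
    print(tokenize("The cat sat!", vocab))   # [0, 3, 4, 5, 1]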

How much a language model effectively "knows" depends on the data it was trained on. GPT-3, for instance, was trained on a massive corpus of diverse texts, including books, articles, websites, and more. The precise contents of the training data and the model implementation are often considered proprietary, although OpenAI periodically releases information via https://github.com/openai

During the training process, the language model learns to predict the next token in a sequence of text. By analyzing the context provided by the preceding tokens, the model generates the most likely token to follow. This predictive capability enables the model to generate coherent and contextually appropriate responses.
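
In code, "the most likely token to follow" comes down to turning the model's raw scores into probabilities. The scores below are invented for illustration; a real model computes them from billions of parameters.

    import math

    # Hypothetical scores (logits) for candidate tokens after the
    # context "The cat sat on the". The numbers are made up.
    logits = {"mat": 3.2, "sofa": 2.1, "moon": 0.3, "carburetor": -1.5}

    # Softmax converts raw scores into a probability distribution.
    total = sum(math.exp(v) for v in logits.values())
    probs = {tok: math.exp(v) / total for tok, v in logits.items()}

    for tok, p in sorted(probs.items(), key=lambda kv: -kv[1]):
        print(f"{tok:12s} {p:.3f}")
    # "mat" wins; the model appends it and repeats the process to
    # predict the token after that.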

When interacting with a language model, you provide a prompt or a starting sentence to guide the model’s output. The model then processes the tokens in the prompt and generates a continuation based on its learned knowledge and patterns from the training data. The length of the output can vary depending on the model’s settings or the specific instructions given.
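
For example, here is a minimal sketch of prompting a model through OpenAI's Python client (the openai package, v1-style interface). It assumes an API key is set in the OPENAI_API_KEY environment variable; the model name and settings are illustrative.

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",             # which trained model to use
        messages=[{"role": "user",
                   "content": "Explain tokenization in one sentence."}],
        max_tokens=60,    # cap on the length of the continuation
        temperature=0.7,  # higher values sample next tokens more freely
    )
    print(response.choices[0].message.content)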

Statistical Pattern Matching versus Consciousness

It’s important to note that while LLMs can generate impressive and contextually relevant text, they don’t possess genuine understanding or consciousness. They operate based on statistical patterns learned from training data and may occasionally produce incorrect or nonsensical responses, a failure known as AI hallucination. This is something to watch for, since the AI simply fabricates the information needed to answer your question.

The training process involves identifying statistical regularities and patterns in the data, allowing the model to make predictions or generate responses based on probabilities.

Lack of Consciousness: While large language models can generate impressive responses, they do not possess consciousness. Consciousness refers to subjective awareness, self-reflection, and the ability to have subjective experiences. Language models do not have a subjective perspective and do not possess consciousness in the way humans do. They do not have beliefs, desires, or emotions.

In the context of machine learning, parameters are the variables incorporated within a trained model that enable the generation of new content through inference. Parameters are not words or tokens; they are the variables that encapsulate the knowledge learned from the training data. They include the weights and biases in neural networks, which are adjusted during training to optimize the model’s performance and enable accurate predictions or text generation.

The specific nature of these variables varies depending on the type of machine learning model. In the case of neural networks, which are the basis for many language models including GPT-3.5, the parameters consist of weights and biases.
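
To see what "parameters" means in practice, here is a toy calculation of the parameter count for a tiny fully connected network. The layer sizes are invented; GPT-scale models repeat the same idea across billions of connections.

    # A tiny network: 4 inputs -> 8 hidden units -> 2 outputs.
    layers = [(4, 8), (8, 2)]   # (inputs, outputs) for each layer

    total = 0
    for n_in, n_out in layers:
        weights = n_in * n_out   # one weight per connection
        biases = n_out           # one bias per output neuron
        total += weights + biases
    print(total)                 # (4*8 + 8) + (8*2 + 2) = 58 parameters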

Weights

Weights are the numerical values assigned to the connections between neurons in the neural network. They determine the strength of the connections and govern the impact of one neuron’s output on another. Adjusting the weights allows the model to learn and adapt to the patterns in the data during the training process.

Biases

Biases are additional parameters in a neural network that provide an offset or baseline activation for each neuron. They help the model account for variations and trends in the data that may not be captured by the weights alone.
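
Putting weights and biases together, a single artificial neuron computes a weighted sum of its inputs, adds its bias, and passes the result through an activation function. The values below are invented; training would adjust the weights and bias to fit the data.

    import math

    inputs  = [0.5, -1.2, 3.0]
    weights = [0.8,  0.1, -0.4]   # strength of each incoming connection
    bias    = 0.2                 # baseline offset for this neuron

    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    output = 1 / (1 + math.exp(-z))   # sigmoid squashes z into (0, 1)
    print(output)                     # about 0.327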

The Sequential Process

Here is an overview of the sequential process that takes place from your prompt to the AI’s answer:

  • Receiving the Prompt: The AI receives the text prompt or question that you input.
  • Tokenization: The prompt is tokenized, breaking it down into smaller units (tokens) like words, subwords, or characters. This helps prepare the text for further processing.
  • Model Input Encoding: The tokenized prompt is encoded to create a numerical representation that the model can understand. This encoding typically involves mapping the tokens to their corresponding indices in the model’s vocabulary.
  • Model Processing: The encoded prompt is passed through the model’s architecture. The specific architecture can vary, but it generally consists of neural network layers that process and analyze the input.
  • Contextual Understanding: The model leverages its pre-trained knowledge and contextual understanding to interpret the prompt and generate an initial response. This involves considering the patterns and relationships learned during training on large amounts of text data.
  • Response Generation: Based on the initial understanding of the prompt, the model generates a response. The response generation can involve various techniques, such as autoregressive decoding, where the model predicts the next token based on the preceding context.
  • Decoding and Token Generation: The tokens generated by the model are decoded from their numerical representation back into human-readable text.
  • Post-processing: The generated tokens may undergo post-processing steps, such as removing special tokens, formatting, or adjusting the response to improve readability or coherence.
  • Output: The final processed response is provided as the answer to your prompt.

This is a general overview of the process; the specific implementation details can vary depending on the model architecture and the system configuration.
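
Here is that pipeline compressed into a toy Python sketch. The "model" is a hard-coded lookup table standing in for billions of learned parameters, and the vocabulary is invented, so this is purely illustrative.

    # Toy vocabulary and its reverse mapping for decoding.
    vocab = {"<EOS>": 0, "the": 1, "cat": 2, "sat": 3, "on": 4, "mat": 5}
    ids_to_tokens = {i: t for t, i in vocab.items()}

    # Toy "model": maps the last token ID to the most likely next ID.
    next_token = {1: 2, 2: 3, 3: 4, 4: 5, 5: 0}

    def generate(prompt, max_new_tokens=5):
        ids = [vocab[w] for w in prompt.lower().split()]  # tokenize + encode
        for _ in range(max_new_tokens):                   # autoregressive loop
            nxt = next_token.get(ids[-1], 0)
            if nxt == 0:                                  # <EOS> means stop
                break
            ids.append(nxt)
        return " ".join(ids_to_tokens[i] for i in ids)    # decode to text

    print(generate("The cat"))   # -> "the cat sat on mat"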

The Pace of Development

The pace of development is astounding:

  • GPT-1 (2018): OpenAI’s first GPT, with 117 million parameters.
  • GPT-2 (2019): 1.5 billion parameters, trained on roughly 10 billion tokens (the average token is about 4 characters); see https://openai.com/research/gpt-2-1-5b-release
  • GPT-3 (2020): 175 billion parameters.
  • GPT-4 (March 2023): the parameter count has not been confirmed by OpenAI; circulating estimates run as high as 100 trillion, though these are speculative.

This is a somewhat simplistic explanation of how an LLM works; the technical details are discussed and analysed in depth in the artificial intelligence papers published via Cornell University’s arXiv: https://arxiv.org/list/cs.AI/recent