What is a Foundation Model?
Trained on massive datasets, foundation models (FMs) are large deep learning neural networks that have changed the way data scientists approach machine learning (ML). Rather than develop artificial intelligence (AI) from scratch, data scientists use a foundation model as a starting point to develop ML models that power new applications more quickly and cost-effectively. The term foundation model was coined by researchers to describe ML models trained on a broad spectrum of generalized and unlabeled data and capable of performing a wide variety of general tasks such as understanding language, generating text and images, and conversing in natural language.
What is unique about foundation models?
A unique feature of foundation models is their adaptability. These models can perform a wide range of disparate tasks with a high degree of accuracy based on input prompts. Some tasks include natural language processing (NLP), question answering, and image classification. The size and general-purpose nature of FMs make them different from traditional ML models, which typically perform specific tasks, like analyzing text for sentiment, classifying images, and forecasting trends.
You can use foundation models as base models for developing more specialized downstream applications. These models are the culmination of more than a decade of work that saw them increase in size and complexity.
For example, BERT, one of the first bidirectional foundation models, was released in 2018. It was trained with 340 million parameters on a roughly 16 GB text corpus. In 2023, only five years later, OpenAI released GPT-4, a far larger model whose parameter count and training data size have not been publicly disclosed. According to OpenAI, the compute used in the largest AI training runs has doubled roughly every 3.4 months since 2012. Today’s FMs, such as the large language models (LLMs) Claude 2 and Llama 2, and the text-to-image model Stable Diffusion from Stability AI, can perform a range of tasks out of the box spanning multiple domains, such as writing blog posts, generating images, solving math problems, engaging in dialog, and answering questions based on a document.
Why is foundation modeling important?
Foundation models are poised to significantly change the machine learning lifecycle. Although developing a foundation model from scratch currently costs millions of dollars, FMs are useful in the long run. It’s faster and cheaper for data scientists to use pre-trained FMs to develop new ML applications rather than train unique ML models from the ground up.
One potential use is automating tasks and processes, especially those that require reasoning capabilities. Here are a few applications for foundation models:
- Customer support
- Language translation
- Content generation
- Copywriting
- Image classification
- High-resolution image creation and editing
- Document extraction
- Robotics
- Healthcare
- Autonomous vehicles
How do foundation models work?
Foundation models are a form of generative artificial intelligence (generative AI). They generate output from one or more inputs (prompts) in the form of human language instructions. The models are based on complex neural network architectures, including generative adversarial networks (GANs), transformers, and variational autoencoders.
Although each type of network functions differently, the principles behind how they work are similar. In general, an FM uses learned patterns and relationships to predict the next item in a sequence. For example, in diffusion-based image generation, the model starts from random noise and repeatedly refines it into a sharper, more clearly defined image that matches the prompt. Similarly, with text, the model predicts the next word in a string of text based on the previous words and their context. It then samples the next word from the resulting probability distribution.
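To make the next-word step concrete, here is a minimal sketch in plain Python with NumPy. The five-word vocabulary and the raw scores (logits) are made up for illustration; a real FM would produce scores over tens of thousands of tokens using a deep neural network.

```python
import numpy as np

# Made-up vocabulary and raw scores (logits) for the prompt "The cat sat on the".
# A real foundation model computes these scores with a deep transformer network.
vocab = ["mat", "roof", "moon", "table", "dog"]
logits = np.array([3.2, 1.1, -0.5, 2.4, 0.3])

# Softmax converts raw scores into a probability distribution over the vocabulary.
probs = np.exp(logits - logits.max())
probs /= probs.sum()

# Sample the next word from that distribution.
rng = np.random.default_rng(seed=0)
next_word = rng.choice(vocab, p=probs)
print(dict(zip(vocab, probs.round(3))), "->", next_word)
```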
Foundation models use self-supervised learning to create labels from the input data itself, which means no one has to instruct or train the model with manually labeled training datasets. This feature separates FMs from earlier ML architectures, which rely on supervised or unsupervised learning.
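A short sketch of how self-supervised labels arise in next-word prediction: the targets are just the input sequence shifted by one position, so no human annotation is needed. The token list here is purely illustrative.

```python
# The "labels" for next-word prediction come from the data itself:
# each position's target is simply the word that follows it.
tokens = ["foundation", "models", "learn", "from", "unlabeled", "text"]

inputs = tokens[:-1]   # what the model sees
targets = tokens[1:]   # what it must predict at each step

for i, target in enumerate(targets, start=1):
    print(inputs[:i], "->", target)
```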
What can foundation models do?
Even though foundation models are pre-trained, they can adapt to new data supplied in prompts at inference time. This means that you can develop comprehensive outputs through carefully curated prompts. Tasks that FMs can perform include language processing, visual comprehension, code generation, and human-centered engagement.
Language processing
These models can answer natural language questions and even write short scripts or articles in response to prompts. They can also translate between languages using NLP technologies.
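As an illustration, a question-answering task can be run with the Hugging Face pipeline API; the checkpoint name below is one commonly used public model, chosen only as an example.

```python
from transformers import pipeline

# Extractive question answering with a publicly available checkpoint
# (model choice is illustrative, not a recommendation).
qa = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")

result = qa(
    question="What are foundation models trained on?",
    context="Foundation models are trained on broad, largely unlabeled datasets "
            "and can be adapted to many downstream tasks.",
)
print(result["answer"], round(result["score"], 3))
```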
Visual comprehension
FMs excel in computer vision, especially with regard to identifying images and physical objects. These capabilities may find use in applications such as autonomous driving and robotics. Another capability is the generation of images from input text, as well as photo and video editing.
Code generation
Foundation models can generate computer code in various programming languages from natural language descriptions. They can also be used to evaluate and debug existing code.
Human-centered engagement
Generative AI models use human inputs to learn and improve predictions. An important and sometimes overlooked application is the ability of these models to support human decision-making. Potential uses include clinical diagnoses, decision support systems, and analytics.
Another capability is the development of new AI applications by fine-tuning existing foundation models.
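As a rough sketch of what fine-tuning an existing foundation model can look like, the example below uses the Hugging Face Trainer API. The base checkpoint, dataset, and hyperparameters are placeholders for illustration, not recommendations.

```python
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# Placeholder base model and dataset; substitute your own task-specific data.
base_model = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForSequenceClassification.from_pretrained(base_model, num_labels=2)

# Tokenize a small slice of a public sentiment dataset for demonstration.
dataset = load_dataset("imdb", split="train[:2000]")
dataset = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True,
                            padding="max_length", max_length=256),
    batched=True,
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="fm-finetune", num_train_epochs=1,
                           per_device_train_batch_size=8),
    train_dataset=dataset,
)
trainer.train()  # adapts the pre-trained model to the narrower downstream task
```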
Speech to text
Because FMs understand language, they can be used for speech-to-text tasks such as transcription and video captioning in a variety of languages.
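As a sketch, the Hugging Face automatic-speech-recognition pipeline can transcribe an audio file with a multilingual Whisper checkpoint; the model name and file path are placeholders.

```python
from transformers import pipeline

# Automatic speech recognition with a multilingual checkpoint
# (model choice and audio file path are placeholders).
asr = pipeline("automatic-speech-recognition", model="openai/whisper-small")

result = asr("meeting_recording.wav")  # path to a local audio file (hypothetical)
print(result["text"])
```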
What are examples of foundation models?
The number and size of foundation models on the market have grown at a rapid pace. There are now dozens of models available. Here is a list of prominent foundation models released since 2018.
BERT
Released in 2018, Bidirectional Encoder Representations from Transformers (BERT) was one of the first foundation models. BERT is a bidirectional model that analyzes the context of a complete sequence and then makes a prediction. It was trained on a plain-text corpus and Wikipedia, using 3.3 billion tokens (words) and 340 million parameters. BERT can answer questions, predict sentences, and translate texts.
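BERT’s masked-word objective can be tried directly with the Hugging Face fill-mask pipeline; the original bert-base-uncased checkpoint is used here purely as an illustration.

```python
from transformers import pipeline

# BERT was pre-trained to predict masked words using context from both directions.
fill = pipeline("fill-mask", model="bert-base-uncased")

for candidate in fill("Foundation models are trained on [MASK] datasets."):
    print(candidate["token_str"], round(candidate["score"], 3))
```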
GPT
The Generative Pre-trained Transformer (GPT) model was developed by OpenAI in 2018. It uses a 12-layer transformer decoder with a self-attention mechanism, and it was trained on the BookCorpus dataset, which holds over 11,000 free novels. A notable feature of GPT-1 was its ability to perform zero-shot learning on some tasks.
GPT-2 was released in 2019. OpenAI trained it using 1.5 billion parameters (compared with the 117 million parameters used in GPT-1). GPT-3 has a 96-layer neural network and 175 billion parameters and was trained on roughly 500 billion tokens of text, much of it drawn from the Common Crawl dataset. The popular ChatGPT chatbot is based on GPT-3.5. GPT-4 launched in March 2023 and passed a simulated Uniform Bar Examination with a score in the top 10 percent of test takers.
Amazon Titan
Amazon Titan FMs are pretrained on large datasets, which makes them powerful, general-purpose models. You can use them as is or customize them privately with company-specific data for a particular task, without annotating large volumes of data. Initially, Titan offers two models. The first is a generative LLM for tasks such as summarization, text generation, classification, open-ended Q&A, and information extraction. The second is an embeddings LLM that translates text inputs, including words, phrases, and larger units of text, into numerical representations (known as embeddings) that capture the semantic meaning of the text. This LLM does not generate text, but it is useful for applications like personalization and search: by comparing embeddings, the model produces more relevant and contextual responses than keyword matching. To support best practices in the responsible use of AI, Titan FMs are built to detect and remove harmful content in the data, reject inappropriate content in user input, and filter model outputs that contain inappropriate content such as hate speech, profanity, and violence.
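The embeddings use case relies on a simple idea: texts with similar meaning map to nearby vectors, so search compares vectors instead of keywords. The sketch below uses tiny made-up vectors in place of real model output; only the cosine-similarity comparison is meant literally.

```python
import numpy as np

# Toy vectors standing in for the output of an embeddings model; a real model
# such as Titan Embeddings returns much higher-dimensional vectors.
query_vec = np.array([0.9, 0.1, 0.3])  # "How do I reset my password?"
doc_vecs = {
    "Password reset instructions": np.array([0.8, 0.2, 0.4]),
    "Quarterly sales report":      np.array([0.1, 0.9, 0.2]),
}

def cosine_similarity(a, b):
    # Vectors pointing in similar directions (similar meaning) score near 1.0.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

ranked = sorted(doc_vecs, key=lambda d: cosine_similarity(query_vec, doc_vecs[d]),
                reverse=True)
print(ranked)  # the semantically closer document ranks first
```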
AI21 Jurassic
Released in 2021, Jurassic-1 is a 76-layer auto-regressive language model with 178 billion parameters. Jurassic-1 generates human-like text and solves complex tasks. Its performance is comparable to GPT-3.
In March 2023, AI21 Labs released Jurassic-2, which has improved instruction following and language capabilities.
Claude
Claude 3.5 Sonnet
Anthropic’s most intelligent and advanced model, Claude 3.5 Sonnet, demonstrates exceptional capabilities across a diverse range of tasks and evaluations while also outperforming Claude 3 Opus.
Claude 3 Opus
Opus is a highly intelligent model with reliable performance on complex tasks. It can navigate open-ended prompts and sight-unseen scenarios with remarkable fluency and human-like understanding. Use Opus to automate tasks, and accelerate research and development across a diverse range of use cases and industries.
Claude 3 Haiku
Haiku is Anthropic’s fastest, most compact model for near-instant responsiveness. Haiku is the best choice for building seamless AI experiences that mimic human interactions. Enterprises can use Haiku to moderate content, optimize inventory management, produce quick and accurate translations, summarize unstructured data, and more.
Cohere
Cohere has two LLMs: one is a generation model with capabilities similar to GPT-3, and the other is a representation model intended for understanding language. Although Cohere’s generation model has only 52 billion parameters, it outperforms GPT-3 in many respects.
Stable Diffusion
Stable Diffusion is a text-to-image model that can generate realistic-looking, high-definition images. It was released in 2022 and has a diffusion model that uses noising and denoising technologies to learn how to create images.
The model is smaller than competing diffusion technologies, like DALL-E 2, which means it does not need an extensive computing infrastructure. Stable Diffusion will run on a normal graphics card, or even on a smartphone with a Snapdragon 8 Gen 2 platform.
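For illustration, Stable Diffusion can be run locally with the Hugging Face diffusers library. The checkpoint name below is one publicly available release, and a CUDA-capable GPU is assumed.

```python
import torch
from diffusers import StableDiffusionPipeline

# Load a publicly available Stable Diffusion checkpoint (name is illustrative).
pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")  # assumes a CUDA-capable GPU

# The pipeline starts from random noise and iteratively denoises it into an
# image that matches the text prompt.
image = pipe("a watercolor painting of a lighthouse at sunset").images[0]
image.save("lighthouse.png")
```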
BLOOM
BLOOM is a multilingual model with an architecture similar to GPT-3. It was developed in 2022 as a collaborative effort involving over a thousand scientists and the Hugging Face team. The model has 176 billion parameters, and training took three and a half months using 384 Nvidia A100 GPUs. Although the BLOOM checkpoint requires 330 GB of storage, it will run on a standalone PC with 16 GB of RAM. BLOOM can create text in 46 languages and write code in 13 programming languages.
Hugging Face
Hugging Face is a platform that offers open-source tools for you to build and deploy machine learning models. It acts as a community hub, and developers can share and explore models and datasets. Membership for individuals is free, although paid subscriptions offer higher levels of access. You have public access to nearly 200,000 models and 30,000 datasets.
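A short sketch of pulling a shared model and dataset from the Hub for inference; the repository names are examples of publicly available resources, not endorsements.

```python
from datasets import load_dataset
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Download a community-shared sentiment model and a public dataset from the Hub.
model_id = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

reviews = load_dataset("imdb", split="test[:5]")
inputs = tokenizer(reviews["text"], truncation=True, padding=True, return_tensors="pt")
predictions = model(**inputs).logits.argmax(dim=-1)
print(predictions.tolist())  # 0 = negative, 1 = positive
```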
What are challenges with foundation models?
Foundation models can coherently respond to prompts on subjects they haven’t been explicitly trained on. But they have certain weaknesses. Here are some of the challenges facing foundation models:
- Infrastructure requirements. Building a foundation model from scratch is expensive and requires enormous resources, and training may take months.
- Front-end development. For practical applications, developers need to integrate foundation models into a software stack, including tools for prompt engineering, fine-tuning, and pipeline engineering.
- Lack of comprehension. Although they can provide grammatically and factually correct answers, foundation models have difficulty comprehending the context of a prompt. And they aren’t socially or psychologically aware.
- Unreliable answers. Answers to questions on certain subject matter may be unreliable and sometimes inappropriate, toxic, or incorrect.
- Bias. Bias is a distinct possibility as models can pick up hate speech and inappropriate undertones from training datasets. To avoid this, developers should carefully filter training data and encode specific norms into their models.
How can AWS help?
Amazon Bedrock is the easiest way to build and scale generative AI applications with foundation models. Amazon Bedrock is a fully managed service that makes foundation models from Amazon and leading AI startups available through an API, so you can choose from various FMs to find the model that’s best suited for your use case. With Bedrock, you can accelerate the development and deployment of scalable, reliable, and secure generative AI applications without managing infrastructure.
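As a minimal sketch, a foundation model can be invoked through the Bedrock runtime API with boto3. The model ID and the request and response shapes below assume the Amazon Titan text format and should be verified against the current Bedrock documentation.

```python
import json
import boto3

# Bedrock exposes foundation models behind a single runtime API. The model ID
# and request/response shapes here assume the Amazon Titan text format and
# should be checked against the current Bedrock documentation.
client = boto3.client("bedrock-runtime", region_name="us-east-1")

body = json.dumps({
    "inputText": "Summarize the benefits of foundation models in two sentences."
})
response = client.invoke_model(modelId="amazon.titan-text-express-v1", body=body)

payload = json.loads(response["body"].read())
print(payload["results"][0]["outputText"])
```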
Amazon SageMaker JumpStart, an ML hub offering models, algorithms, and solutions, provides access to hundreds of foundation models, including top-performing publicly available foundation models. New foundation models continue to be added, including Llama 2, Falcon, and Stable Diffusion XL 1.0.