Meta's Llama in Amazon Bedrock

Build the future of AI with Llama

Introducing Llama 4

The Llama 4 models mark the beginning of a new era for the Llama ecosystem, delivering the most scalable generation of Llama. With native multimodality, mixture-of-experts architecture, expanded context windows, significant performance improvements, and optimized computational efficiency, Llama 4 is engineered to address diverse application requirements. The Llama 4 models come in easy-to-deploy sizes, making them adaptable for various use cases.

Llama 4 Maverick 17B

Llama 4 Maverick is a natively multimodal model for image and text understanding with advanced intelligence and fast responses at a low cost.

Llama 4 Scout 17B

Llama 4 Scout is a natively multimodal model that integrates advanced text and visual intelligence with efficient processing capabilities. The model enables comprehensive multi-document analysis, robust codebase reasoning, and sophisticated data processing through its extensive context handling.

Benefits

Llama 3.2 offers a more personalized AI experience, with on-device processing. The Llama 3.2 models are designed to be more efficient, with reduced latency and improved performance, making them suitable for a wide range of applications.
128K context length allows Llama to capture even more nuanced relationships in data.
Llama models are trained on over 15 trillion tokens from online public data sources to better comprehend language intricacies.
Llama 3.2 is multilingual and supports eight languages including English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai.
The Amazon Bedrock managed API makes using Llama models easier than ever. Organizations of all sizes can access the power of Llama without worrying about the underlying infrastructure. Since Amazon Bedrock is serverless, you don't have to manage any infrastructure, and you can securely integrate and deploy the generative AI capabilities of Llama into your applications using the AWS services you are already familiar with. This means you can focus on what you do best—building your AI applications.

Meet Llama

For over the past decade, Meta has been focused on putting tools into the hands of developers and fostering collaboration and advancements among developers, researchers, and organizations. Llama models are available in a range of parameter sizes, enabling developers to select the model that best fits their needs and inference budget. Llama models in Amazon Bedrock open up a world of possibilities because developers don't need to worry about scalability or managing infrastructure. Amazon Bedrock is a turnkey way for developers to get started using Llama.

Use cases

Llama models excel at image understanding and visual reasoning, language nuances, contextual understanding, and complex tasks, such as visual data analysis, image captioning, dialogue generation, and translation, and can handle multistep tasks seamlessly. Additional use cases Llama models are a great fit for include sophisticated visual reasoning and understanding, image-text-retrieval, visual grounding, document visual question answering, text summarization and accuracy, text classification, sentiment analysis and nuance reasoning, language modeling, dialog systems, code generation, and following instructions.

Model versions

Llama 4 Maverick 17B

A general purpose model featuring 128 experts and 400 billion total parameters. It excels in text understanding across 12 languages and English image understanding, making it suitable for versatile assistant and chat applications.

Max tokens: 1M

Languages: English, French, German, Hindi, Italian, Portuguese, Spanish, Thai, Arabic, Indonesian, Tagalog, and Vietnamese; [image] English only

Fine-tuning supported: No

Supported use cases: High-quality multilingual assistant and chat applications with image understanding, coding assistance, and document understanding for structured data extraction, customer support with image analysis capabilities, creative content generation across languages, and research applications requiring text analysis and multimodal data integration

Read the blog

Llama 4 Scout 17B

A general purpose multimodal model with 16 experts, 17 billion active parameters, and 109 billion total parameters. Its multimillion context window enables comprehensive multi-document analysis, establishing it as a uniquely powerful and efficient model in its class.

Max tokens: 3.5M (10M coming soon)

Languages: English, French, German, Hindi, Italian, Portuguese, Spanish, Thai, Arabic, Indonesian, Tagalog, and Vietnamese; [image] English only

Fine-tuning supported: No

Supported use cases: Chat applications requiring high-quality responses and image understanding in multilingual contexts, coding assistance and document intelligence for extracting structured data, customer support with image analysis capabilities, creative content generation across multiple languages, and research applications requiring multimodal data integration

Read the blog

Llama 3.3 70B

Text-only 70B instruction-tuned model that provides enhanced performance relative to Llama 3.1 70B–and to Llama 3.2 90B when used for text-only applications. Llama 3.3 70B delivers similar performance to Llama 3.1 405B while requiring only a fraction of the computational resources.

Max tokens: 128K

Languages: English, German, French, Italian, Portuguese, Spanish, and Thai

Fine-tuning supported: No

Supported use cases: Conversational AI designed for content creation, enterprise applications, and research, offering advanced language understanding capabilities, including text summarization, classification, sentiment analysis, and code generation. The model also supports the ability to leverage model outputs to improve other models including synthetic data generation and distillation

Read the blog

Llama 3.2 90B

Multimodal model that takes both text and image inputs and outputs. Ideal for applications requiring sophisticated visual intelligence, such as image analysis, document processing, multimodal chatbots, and autonomous systems.

Max tokens: 128K

Languages: English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai

Fine-tuning supported: Yes

Supported use cases: Image understanding, visual reasoning, and multimodal interaction, enabling advanced applications such as image captioning, image-text retrieval, visual grounding, visual question answering, and document visual question answering, with a unique ability to reason and draw conclusions from visual and textual inputs

Read the blog

Llama 3.2 11B

Multimodal model that takes both text and image inputs and outputs. Ideal for applications requiring sophisticated visual intelligence, such as image analysis, document processing, and multimodal chatbots.

Max tokens: 128K

Languages: English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai.

Fine-tuning supported: Yes

Supported use cases: Image understanding, visual reasoning, and multimodal interaction, enabling advanced applications such as image captioning, image-text retrieval, visual grounding, visual question answering, and document visual question answering

Read the blog

Llama 3.2 3B

Text-only lightweight model built to deliver highly accurate and relevant results. Designed for applications requiring low-latency inferencing and limited computational resources. Ideal for query and prompt rewriting, mobile AI-powered writing assistants, and customer service applications, particularly on edge devices where its efficiency and low latency enable seamless integration into various applications, including mobile AI-powered writing assistants and customer service chatbots.

Max tokens: 128K

Languages: English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai

Fine-tuning supported: Yes

Supported use cases: Advanced text generation, summarization, sentiment analysis, emotional intelligence, contextual understanding, and common sense reasoning

Read the blog

Llama 3.2 1B

Text-only lightweight model built to deliver fast and accurate responses. Ideal for edge devices and mobile applications. The model enables on-device AI capabilities while preserving user privacy and minimizing latency.

Max tokens: 128K

Languages: English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai

Fine-tuning supported: Yes

Supported use cases: Multilingual dialogue use cases such as personal information management, multilingual knowledge retrieval, and rewriting tasks

Read the blog

Llama 3.1 405B

Ideal for enterprise-level applications, research and development, synthetic data generation and model distillation. With latency-optimized inference capabilities available in public preview, this model delivers exceptional performance and scalability, enabling organizations to accelerate their AI initiatives while maintaining high-quality outputs across diverse use cases.

Max tokens
: 128K

Languages: English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai

Fine-tuning supported: No

Supported use cases: General knowledge, long-form text generation, machine translation, enhanced contextual understanding, advanced reasoning and decision making, better handling of ambiguity and uncertainty, increased creativity and diversity, steerability, math, tool use, multilingual translation, and coding

Read the blog

Llama 3.1 70B

Ideal for content creation, conversational AI, language understanding, research development, and enterprise applications. With new latency-optimized inference capabilities available in public preview, this model sets a new performance benchmark for AI solutions that process extensive text inputs, enabling applications to respond more quickly and handle longer queries more efficiently.

Max tokens: 128K

Languages: English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai

Fine-tuning supported: Yes

Supported use cases: Text summarization, text classification, sentiment analysis, and language translation

Read the blog

Llama 3.1 8B

Ideal for limited computational power and resources, faster training times, and edge devices.

Max tokens: 128K

Languages: English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai

Fine-tuning supported: Yes

Supported use cases: Text summarization, text classification, sentiment analysis, and language translation

Read the blog

Llama 3 70B

Ideal for content creation, conversational AI, language understanding, research development, and enterprise applications. 

Max tokens: 8K

Languages: English

Fine-tuning supported: No

Supported use cases: Text summarization and accuracy, text classification and nuance, sentiment analysis and nuance reasoning, language modeling, dialogue systems, code generation, and following instructions

Read the blog

Llama 3 8B

Ideal for limited computational power and resources, faster training times, and edge devices.

Max tokens: 8K

Languages: English

Fine-tuning supported: No

Supported use cases: Text summarization, text classification, sentiment analysis, and language translation

Read the blog

Llama 2 70B

Fine-tuned model in the parameter size of 70B. Suitable for larger-scale tasks such as language modeling, text generation, and dialogue systems.

Max tokens: 4K

Languages: English

Fine-tuning supported: Yes

Supported use cases: Assistant-like chat

Read the blog

Llama 2 13B

Fine-tuned model in the parameter size of 13B. Suitable for smaller-scale tasks such as text classification, sentiment analysis, and language translation.

Max tokens: 4K

Languages: English

Fine-tuning supported: Yes

Supported use cases: Assistant-like chat

Read the blog

Nomura uses Llama models from Meta in Amazon Bedrock to democratize generative AI

 

Aniruddh Singh, Nomura's Executive Director and Enterprise Architect, outlines the financial institution’s journey to democratize generative AI firm-wide using Amazon Bedrock and Llama models from Meta. Amazon Bedrock provides critical access to leading foundation models like Llama, enabling seamless integration. Llama offers key benefits to Nomura, including faster innovation, transparency, bias guardrails, and robust performance across text summarization, code generation, log analysis, and document processing. 

TaskUs revolutionizes customer experiences using Llama models from Meta in Amazon Bedrock

TaskUs, a leading provider of outsourced digital services and next-generation customer experience to the world’s most innovative companies, helps its clients represent, protect, and grow their brands. Its innovative TaskGPT platform, powered by Amazon Bedrock and Llama models from Meta, empowers teammates to deliver exceptional service. TaskUs builds tools on TaskGPT that leverage Amazon Bedrock and Llama for cost-effective paraphrasing, content generation, comprehension, and complex task handling.