AWS Deep Learning AMIs
The AWS Deep Learning AMIs (DLAMIs) equip machine learning (ML) practitioners and researchers with the infrastructure and tools to accelerate deep learning in the cloud at scale. You can quickly launch Amazon Elastic Compute Cloud (EC2) instances preinstalled with PyTorch to train sophisticated, custom artificial intelligence (AI) models, experiment with new algorithms, or learn new skills and techniques.
DLAMIs come preconfigured with the NVIDIA CUDA interface and the NVIDIA CUDA Deep Neural Network library (cuDNN). DLAMIs also support Habana Gaudi–based Amazon EC2 DL1 instances, AWS Inferentia–powered Amazon EC2 Inf1 instances, and the AWS Neuron libraries. To begin building PyTorch models using DLAMIs, review the DLAMI tutorial.
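As an illustration, the following is a minimal sketch of launching a DLAMI-backed GPU instance with the AWS SDK for Python (Boto3). The AMI ID, key pair name, and instance type are placeholders you would replace with values for your own account and region:

```python
import boto3

# Minimal sketch, assuming default AWS credentials and region are configured.
ec2 = boto3.client("ec2", region_name="us-east-1")

response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",  # placeholder; look up the current DLAMI ID for your region
    InstanceType="p3.2xlarge",        # GPU instance type; adjust to your workload
    MinCount=1,
    MaxCount=1,
    KeyName="my-key-pair",            # hypothetical key pair for SSH access
)

# Print the ID of the newly launched instance.
print(response["Instances"][0]["InstanceId"])
```

Once the instance is running, you can SSH in and use the preinstalled PyTorch environment immediately, with no driver or framework setup required.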
AWS Deep Learning Containers
AWS Deep Learning Containers are Docker images preinstalled with PyTorch, making it easier to deploy custom ML environments quickly instead of building and optimizing them from scratch. Deep Learning Containers provide optimized environments and are available in the Amazon Elastic Container Registry (Amazon ECR).
Amazon SageMaker provides containers for its built-in algorithms and prebuilt Docker images for PyTorch. If you would like to extend a prebuilt SageMaker algorithm or model Docker image, you can modify the SageMaker image. If you would like to adapt a preexisting PyTorch container image to work with SageMaker, you can modify the Docker container to use either the SageMaker Training Toolkit or the SageMaker Inference Toolkit.
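For illustration, here is a minimal sketch of running a training job from a custom or extended container image with the SageMaker Python SDK. The image URI, role ARN, and S3 path are placeholders, and the image is assumed to include the SageMaker Training Toolkit so SageMaker can invoke it:

```python
from sagemaker.estimator import Estimator

# Minimal sketch of training with a custom or extended container image.
# All identifiers below are placeholders for your own resources.
estimator = Estimator(
    image_uri="123456789012.dkr.ecr.us-east-1.amazonaws.com/my-pytorch:latest",
    role="arn:aws:iam::123456789012:role/MySageMakerRole",
    instance_count=1,
    instance_type="ml.p3.2xlarge",
)

# Start the training job, reading input data from a hypothetical S3 location.
estimator.fit("s3://my-bucket/training-data")
```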
To get started with PyTorch on AWS Deep Learning Containers, use the following resources:
- Deep Learning Containers for Amazon EC2 using PyTorch: Training | Inference
- Deep Learning Containers for Amazon Elastic Container Service (ECS) using PyTorch: Training | Inference
- Deep Learning Containers for Amazon Elastic Kubernetes Service (EKS) using PyTorch: Training | Distributed Training | Inference
- Deep Learning Containers for Amazon SageMaker using PyTorch: Using Docker containers with SageMaker
Amazon SageMaker
You can use Amazon SageMaker to train and deploy a model with custom PyTorch code. The Amazon SageMaker Python SDK, with its PyTorch estimators and models, and the SageMaker open-source PyTorch containers simplify the process of writing and running a PyTorch script. SageMaker removes the heavy lifting from each step of the ML lifecycle to make it easier to develop high-quality models. Use the SageMaker distributed training libraries with PyTorch to train large models more quickly by automatically splitting deep learning models and training datasets across AWS GPU instances through data parallelism or model parallelism.
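For example, a training job with the SageMaker Python SDK's PyTorch estimator might look like the following sketch; the script name, role ARN, and S3 path are placeholders:

```python
from sagemaker.pytorch import PyTorch

# Minimal sketch of a SageMaker PyTorch training job.
# train.py is your own training script; the role ARN must point to an
# IAM role with SageMaker permissions.
estimator = PyTorch(
    entry_point="train.py",
    role="arn:aws:iam::123456789012:role/MySageMakerRole",
    framework_version="1.13",
    py_version="py39",
    instance_count=2,
    instance_type="ml.p3.16xlarge",
    # Optional: enable the SageMaker distributed data parallelism library.
    distribution={"smdistributed": {"dataparallel": {"enabled": True}}},
)

# Launch training, reading input data from a hypothetical S3 channel.
estimator.fit({"training": "s3://my-bucket/training-data"})
```

SageMaker provisions the instances, runs the script inside its prebuilt PyTorch container, and tears the cluster down when training completes.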
To get started with PyTorch on SageMaker, use the following resources:
- Use PyTorch with Amazon SageMaker
- PyTorch in the Amazon SageMaker Python SDK
- Amazon SageMaker PyTorch container
- Amazon SageMaker PyTorch serving container
- Amazon SageMaker Model Training
- Amazon SageMaker Distributed Training Libraries
- Sample App: Shop by Style
- Extending containers
- Getting Started with TorchServe
Amazon EC2 Inf1 instances and AWS Inferentia
Amazon EC2 Inf1 instances are built from the ground up to support machine learning inference applications. Inf1 instances feature up to 16 AWS Inferentia chips, high-performance machine learning inference chips designed and built by AWS. Inf1 instances deliver up to 3x higher throughput and up to 40% lower cost per inference than Amazon EC2 G4 instances, which were already the lowest-cost instances for machine learning inference in the cloud. Using Inf1 instances, you can run large-scale machine learning inference with PyTorch models at the lowest cost in the cloud. To get started, see our tutorial on running PyTorch models on Inf1.
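As a sketch of how a PyTorch model can be compiled for Inferentia with the AWS Neuron SDK, the following assumes the torch-neuron package is installed (it is preinstalled on the Deep Learning AMIs); the torchvision ResNet-50 model is used only as an example:

```python
import torch
import torch_neuron  # registers the torch.neuron namespace for Inf1 compilation
from torchvision import models

# Load an example pretrained model and put it in inference mode.
model = models.resnet50(pretrained=True)
model.eval()

# An example input matching the model's expected shape, used for tracing.
example_input = torch.rand(1, 3, 224, 224)

# Compile the model for Inferentia; operators the compiler does not
# support automatically fall back to CPU.
model_neuron = torch.neuron.trace(model, example_inputs=[example_input])

# Save the compiled TorchScript model; it can later be loaded with
# torch.jit.load on an Inf1 instance for inference.
model_neuron.save("resnet50_neuron.pt")
```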