How do I fix a compute environment that's not valid in AWS Batch?

My compute environment in AWS Batch is in the INVALID state. How do I troubleshoot the error?

Short description

You receive the error: "CLIENT_ERROR - Your compute environment has been INVALIDATED and scaled down because none of the instances joined the underlying ECS Cluster. Common issues preventing instances joining are the following: VPC/Subnet configuration preventing communication to ECS, incorrect Instance Profile policy preventing authorization to ECS, or customized AMI or LaunchTemplate configurations affecting ECS agent."

Issues preventing your instances from joining an Amazon Elastic Container Service (Amazon ECS) cluster include:

  • Amazon Virtual Private Cloud (Amazon VPC) or subnet configuration settings that prevent communication with Amazon ECS.
  • An incorrect instance profile policy that prevents authorization to Amazon ECS.
  • Customized Amazon Machine Image (AMI) or launch template configurations that affect the ECS agent.

The CLIENT_ERROR message indicates that the Amazon Elastic Compute Cloud (Amazon EC2) instances created by the AWS Batch compute environment have failed to join the ECS cluster. When the CLIENT_ERROR message occurs, AWS Batch automatically terminates the EC2 instance and then moves the compute environment into an INVALID state.
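To read the state and the full error text without the console, you can query the compute environment through the API. The following is a minimal sketch, assuming a boto3 Batch client and its describe_compute_environments call. The helper name get_environment_status and the environment name my-compute-env are placeholders for illustration only.

```python
def get_environment_status(batch, name):
    """Return (state, status, statusReason) for one compute environment.

    `batch` is assumed to be a boto3 Batch client, for example
    boto3.client("batch"). `name` is the environment name or ARN.
    """
    response = batch.describe_compute_environments(computeEnvironments=[name])
    environments = response.get("computeEnvironments", [])
    if not environments:
        raise LookupError(f"Compute environment not found: {name}")
    env = environments[0]
    # When status is INVALID, statusReason carries the error text.
    return env.get("state"), env.get("status"), env.get("statusReason", "")


# Example call (requires boto3 and AWS credentials):
#   import boto3
#   state, status, reason = get_environment_status(
#       boto3.client("batch"), "my-compute-env")
```

The status value shows whether the environment is INVALID, and the status reason contains the error message to match against the resolutions in this article.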

If your compute environment is in the INVALID state, choose one of the following resolutions based on the error message that you receive:

CLIENT_ERROR - Not authorized to perform sts:AssumeRole
Complete the steps in the Fix a service role that's not valid section.

CLIENT_ERROR - Parameter: SpotFleetRequestConfig.IamFleetRole is invalid
Complete the steps in the Fix a Spot Fleet role that's not valid section.

CLIENT_ERROR - The specified launch template, with template ID [xxx], does not exist
Complete the steps in the Deactivate and delete your compute environment section.

CLIENT_ERROR - Access denied
Create a service role with the correct permissions or choose an existing service role with the correct permissions.

Internal Error
Complete the steps in the Deactivate and then activate your compute environment section.

INVALID CLIENT_ERROR - null
Complete the steps in the Deactivate and then activate your compute environment section.

CLIENT_ERROR - The request uses the same client token as previous, but non-identical request
Complete the steps in the Deactivate and then activate your compute environment section.

CLIENT_ERROR - You are not authorized to use launch template
Check the following:

  • Check whether your service role grants the required permissions for Amazon EC2 and Auto Scaling groups. Then, complete the steps in the Fix a service role that's not valid section.
  • Check whether your account is part of an organization in AWS Organizations and whether any service control policies (SCPs) block access to your Amazon EC2 permissions. Then, update the SCPs, if needed.
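The mapping from error message to resolution can also be expressed as a small lookup, which might help if you check many compute environments at once. This is an illustrative sketch only: the names RESOLUTIONS and suggest_resolution are hypothetical, and the substring matching assumes that the status reason contains the error text exactly as listed above.

```python
# (error text fragment, resolution) pairs taken from the list above.
RESOLUTIONS = [
    ("Not authorized to perform sts:AssumeRole",
     "Fix a service role that's not valid"),
    ("SpotFleetRequestConfig.IamFleetRole is invalid",
     "Fix a Spot Fleet role that's not valid"),
    ("The specified launch template",
     "Deactivate and delete your compute environment"),
    ("Access denied",
     "Create or choose a service role with the correct permissions"),
    ("Internal Error",
     "Deactivate and then activate your compute environment"),
    ("CLIENT_ERROR - null",
     "Deactivate and then activate your compute environment"),
    ("The request uses the same client token",
     "Deactivate and then activate your compute environment"),
    ("You are not authorized to use launch template",
     "Review service role permissions and service control policies"),
]


def suggest_resolution(status_reason):
    """Return the matching resolution, or None if the error is not listed."""
    text = status_reason.lower()
    for fragment, resolution in RESOLUTIONS:
        # Case-insensitive match, because capitalization may vary.
        if fragment.lower() in text:
            return resolution
    return None
```

If the function returns None, the error isn't one of those covered here, and the general causes in the short description are the place to start.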

Resolution

Fix a service role that's not valid

1.    Open the AWS Batch console.

2.    In the navigation pane, choose Compute environments.

3.    Choose the compute environment that's in the INVALID state.
Note: If your compute environment is in the DISABLED state, choose Enable to activate your compute environment.

4.    Choose Edit.

5.    For Service role, choose a service role with the permissions needed for AWS Batch to make calls to other AWS services.
Important: Your service role manages the resources that you use with the service. Before you can use the service, you must have an AWS Identity and Access Management (IAM) policy and role that provides the necessary permissions to AWS Batch. You must create a service role with permissions if you don't have one.

6.    Choose Save.
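The same change can be scripted. The sketch below assumes a boto3 Batch client and its update_compute_environment call with the serviceRole parameter. The helper name set_service_role and the example role ARN are placeholders, not values from this article.

```python
def set_service_role(batch, name, role_arn):
    """Point a compute environment at a different service role.

    `batch` is assumed to be a boto3 Batch client. `role_arn` should
    identify a role that AWS Batch is allowed to assume.
    """
    response = batch.update_compute_environment(
        computeEnvironment=name,
        serviceRole=role_arn,
    )
    # The response echoes the name and ARN of the updated environment.
    return response.get("computeEnvironmentArn")


# Example call (requires boto3 and AWS credentials):
#   import boto3
#   set_service_role(
#       boto3.client("batch"),
#       "my-compute-env",
#       "arn:aws:iam::123456789012:role/MyBatchServiceRole")
```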

Fix a Spot Fleet role that's not valid

For managed compute environments that use Amazon EC2 Spot Fleet Instances, you must create a role that grants the Spot Fleet the following permissions:

  • Bidding on instances
  • Launching instances
  • Tagging instances
  • Terminating instances

If you don't have a Spot Fleet role, complete the following steps to create one for your compute environment:

1.    Open the IAM console.

2.    In the navigation pane, choose Roles.

3.    Choose Create role.

4.    Choose AWS service. Then, choose EC2 as the service that will use the role that you're creating.

5.    In the Select your use case section, choose EC2 Spot Fleet Role.
Important: Don't choose the similarly named EC2 - Spot Fleet.

6.    Choose Next: Permissions.

7.    Choose Next: Tags. Then, choose Next: Review.

8.    For Role name, enter AmazonEC2SpotFleetRole.

9.    Choose Create role.
Note: Use your new Spot Fleet role to create new compute environments. You can't change the Spot Fleet role of an existing compute environment. To remove the obsolete environment, deactivate and then delete it in the following steps.

10.    Open the AWS Batch console.

11.    In the navigation pane, choose Compute environments.

12.    Choose the compute environment that's in the INVALID state. Then, choose Disable.

13.    Choose Delete.
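The role creation in steps 1 through 9 can also be sketched with the IAM API. This example assumes a boto3 IAM client and rests on two assumptions that you should verify in your account: that the role is trusted by the spotfleet.amazonaws.com service principal, and that the AWS managed policy AmazonEC2SpotFleetTaggingRole provides the permissions listed above. The function names are hypothetical.

```python
import json

# Assumed managed policy for Spot Fleet roles; verify before use.
SPOT_FLEET_POLICY_ARN = (
    "arn:aws:iam::aws:policy/service-role/AmazonEC2SpotFleetTaggingRole"
)


def spot_fleet_trust_policy():
    """Build a trust policy that lets Spot Fleet assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "spotfleet.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def create_spot_fleet_role(iam, role_name="AmazonEC2SpotFleetRole"):
    """Create the role, attach the managed policy, and return the role ARN.

    `iam` is assumed to be a boto3 IAM client, for example
    boto3.client("iam").
    """
    role = iam.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(spot_fleet_trust_policy()),
    )
    iam.attach_role_policy(RoleName=role_name, PolicyArn=SPOT_FLEET_POLICY_ARN)
    return role["Role"]["Arn"]
```

The returned ARN is the value to supply as the Spot Fleet role when you create the replacement compute environment.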

Deactivate and delete your compute environment

The launch template that's associated with your compute environment doesn't exist, so you can't use that compute environment. You must deactivate and delete the compute environment, and then create a new one.

1.    Open the AWS Batch console.

2.    In the navigation pane, choose Compute environments.

3.    Select the compute environment that's in the INVALID state. Then, choose Disable.

4.    Choose Delete.

5.    Create a new compute environment.
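Steps 3 and 4 can be sketched with the API as follows, assuming a boto3 Batch client. The helper names, polling interval, and poll limit are illustrative choices. The sketch also assumes that the environment is no longer associated with any job queue, because deletion is expected to fail otherwise.

```python
import time


def wait_until_settled(batch, name, poll_seconds=5, max_polls=60):
    """Poll until the environment leaves UPDATING; return its status."""
    for _ in range(max_polls):
        response = batch.describe_compute_environments(
            computeEnvironments=[name])
        environments = response["computeEnvironments"]
        if not environments:
            raise LookupError(f"Compute environment not found: {name}")
        status = environments[0].get("status")
        if status != "UPDATING":
            return status
        time.sleep(poll_seconds)
    raise TimeoutError(f"{name} is still UPDATING")


def disable_and_delete(batch, name, poll_seconds=5, max_polls=60):
    """Disable the compute environment, wait for the update, then delete it."""
    batch.update_compute_environment(computeEnvironment=name, state="DISABLED")
    wait_until_settled(batch, name, poll_seconds, max_polls)
    batch.delete_compute_environment(computeEnvironment=name)
```

Waiting between the two calls is a cautious choice: the delete request is only sent after the state change has finished.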

Deactivate and then activate your compute environment

1.    Open the AWS Batch console.

2.    In the navigation pane, choose Compute environments.

3.    Choose the compute environment that's in the INVALID state. Then, choose Disable.

4.    Choose the same compute environment from step 3. Then, choose Enable.
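The deactivate and activate cycle above can be sketched with the API, assuming a boto3 Batch client. The function name restart_environment and the polling values are hypothetical. The sketch waits for each state change to finish before it sends the next one.

```python
import time


def restart_environment(batch, name, poll_seconds=5, max_polls=60):
    """Disable and then enable a compute environment; return the final status."""
    status = None
    for state in ("DISABLED", "ENABLED"):
        batch.update_compute_environment(computeEnvironment=name, state=state)
        # Poll until this state change is no longer in progress.
        for _ in range(max_polls):
            response = batch.describe_compute_environments(
                computeEnvironments=[name])
            environments = response["computeEnvironments"]
            if not environments:
                raise LookupError(f"Compute environment not found: {name}")
            status = environments[0].get("status")
            if status != "UPDATING":
                break
            time.sleep(poll_seconds)
        else:
            raise TimeoutError(f"{name} is still UPDATING")
    return status


# Example call (requires boto3 and AWS credentials):
#   import boto3
#   print(restart_environment(boto3.client("batch"), "my-compute-env"))
```

A final status of VALID suggests that the cycle cleared the error. If the status is still INVALID, read the status reason again and match it against the resolutions in this article.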


Related information

Troubleshooting AWS Batch

Why is my Amazon ECS or Amazon EC2 instance unable to join the cluster?

Why is my AWS Batch job stuck in RUNNABLE status?

AWS OFFICIAL
Updated 2 years ago