Why didn’t Amazon EC2 Auto Scaling terminate an unhealthy instance?

I have an Amazon Elastic Compute Cloud (Amazon EC2) Auto Scaling group set up, but it's not terminating an unhealthy Amazon EC2 instance. How can I fix this?

Short description

Amazon EC2 Auto Scaling determines the health status of an instance from Amazon EC2 status checks and Elastic Load Balancing (ELB) health checks. All scaling actions of an Amazon EC2 Auto Scaling group are logged in Activity History on the Amazon EC2 console. However, Activity History alone doesn't always show why Amazon EC2 Auto Scaling didn't terminate an unhealthy instance.

You can find further details about an unhealthy instance's state, and how to terminate that instance, within the Amazon EC2 console. Check the following settings:

  • Health check grace period
  • Suspended processes
  • Instance state in the EC2 console
  • Instance state in Auto Scaling groups
  • ELB health checks

Resolution

First, note the state of the instance in Amazon EC2 Auto Scaling:

  1. Sign in to the Amazon EC2 console. In the navigation pane under Auto Scaling, choose Auto Scaling Groups, and then select the instance's group.
  2. Choose the Instances view and note the health state of the instance.
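
The same health state can also be read programmatically. The following is a minimal sketch, assuming a response dictionary shaped like the output of the boto3 describe_auto_scaling_instances call. The helper name instance_health, the group name, and the sample values are hypothetical.

```python
def instance_health(response, instance_id):
    """Return (HealthStatus, LifecycleState) for instance_id, or None if not found."""
    for inst in response.get("AutoScalingInstances", []):
        if inst["InstanceId"] == instance_id:
            return inst["HealthStatus"], inst["LifecycleState"]
    return None


# Hand-written sample in the shape of a DescribeAutoScalingInstances response.
# With boto3 installed and credentials configured, the response would come from:
#   boto3.client("autoscaling").describe_auto_scaling_instances(InstanceIds=[...])
sample_response = {
    "AutoScalingInstances": [
        {
            "InstanceId": "i-0123456789abcdef0",
            "AutoScalingGroupName": "my-asg",  # hypothetical group name
            "HealthStatus": "UNHEALTHY",
            "LifecycleState": "InService",
        }
    ]
}

print(instance_health(sample_response, "i-0123456789abcdef0"))
```

An instance that is reported as unhealthy but still InService is the case that the following sections help to explain.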

Health Check Grace Period

Amazon EC2 Auto Scaling doesn't terminate an instance that recently came into service because of failed EC2 status checks or ELB health checks until the health check grace period expires. To find the grace period length:

  1. On the Amazon EC2 console navigation pane, under Auto Scaling, choose Auto Scaling Groups, and then select the instance's group.
  2. Choose the Details view and note the Health Check Grace Period length.
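
Once you know the grace period length, a quick calculation shows whether an instance is still protected by it. The following is a rough sketch, assuming that the grace period is counted from the moment the instance entered service. The function names and the timestamps are hypothetical, and the grace period value would be the group's Health Check Grace Period in seconds.

```python
from datetime import datetime, timedelta, timezone


def grace_period_end(in_service_at, grace_seconds):
    """Return the moment the health check grace period expires."""
    return in_service_at + timedelta(seconds=grace_seconds)


def in_grace_period(in_service_at, grace_seconds, now=None):
    """Return True while the grace period has not yet expired."""
    now = now or datetime.now(timezone.utc)
    return now < grace_period_end(in_service_at, grace_seconds)


# Hypothetical values: the instance entered service at 12:00 UTC and the
# group's Health Check Grace Period is 300 seconds.
in_service_at = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
two_minutes_later = in_service_at + timedelta(seconds=120)
ten_minutes_later = in_service_at + timedelta(seconds=600)

print(in_grace_period(in_service_at, 300, now=two_minutes_later))
print(in_grace_period(in_service_at, 300, now=ten_minutes_later))
```

If the first check is true for your instance, waiting for the grace period to expire is usually enough before investigating further.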

Suspended Processes

Suspending the HealthCheck, ReplaceUnhealthy, or Terminate process prevents Amazon EC2 Auto Scaling from detecting, replacing, or terminating unhealthy instances. To resume suspended processes:

  1. On the Amazon EC2 console navigation pane, under Auto Scaling, choose Auto Scaling Groups, and then select the instance's group.
  2. Choose the Details view.
  3. Choose Edit and remove any of the following processes from Suspended Processes if they are present: HealthCheck, ReplaceUnhealthy, or Terminate.
  4. Choose Save to resume the processes.
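
The steps above can also be sketched in code. This example assumes a group dictionary shaped like one entry of the boto3 describe_auto_scaling_groups response, and an Auto Scaling client object that is passed in by the caller. The helper names and the sample group are hypothetical.

```python
# Processes named in this article whose suspension blocks unhealthy-instance handling.
HEALTH_PROCESSES = {"HealthCheck", "ReplaceUnhealthy", "Terminate"}


def blocking_processes(group):
    """Return the suspended processes that affect health-based replacement."""
    suspended = {p["ProcessName"] for p in group.get("SuspendedProcesses", [])}
    return sorted(suspended & HEALTH_PROCESSES)


def resume_blocking_processes(client, group):
    """Resume the blocking processes through the given client; return their names.

    client is assumed to be boto3.client("autoscaling") or a compatible object.
    """
    names = blocking_processes(group)
    if names:
        client.resume_processes(
            AutoScalingGroupName=group["AutoScalingGroupName"],
            ScalingProcesses=names,
        )
    return names


# Hand-written sample group; only Terminate is relevant to this article.
sample_group = {
    "AutoScalingGroupName": "my-asg",
    "SuspendedProcesses": [
        {"ProcessName": "Terminate", "SuspensionReason": "User suspended"},
        {"ProcessName": "AZRebalance", "SuspensionReason": "User suspended"},
    ],
}

print(blocking_processes(sample_group))
```

Other suspended processes, such as AZRebalance in the sample, are left untouched because they might have been suspended on purpose.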

Instance State in Amazon EC2 Console

Amazon EC2 Auto Scaling does not immediately terminate instances with an Impaired status. Instead, Amazon EC2 Auto Scaling waits a few minutes for the instance to recover. To check if an instance is impaired:

  1. On the Amazon EC2 console navigation pane, under Instances, choose Instances, and then select the instance.
  2. Choose the Status Checks view and note if the instance's status is Impaired.

Amazon EC2 Auto Scaling might also delay or not terminate instances that fail to report data for status checks. This usually happens when there is insufficient data for the status check metrics in Amazon CloudWatch. To terminate these instances manually:

  1. On the Amazon EC2 console navigation pane, under Instances, choose Instances, and then select the instance.
  2. Choose the Monitoring view and note the status of the instance.
  3. If the status is Insufficient Data, select the instance again, choose the Actions menu, choose Instance State, and then choose Terminate.
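
Both conditions in this section can be checked from the status check results. The following sketch assumes a response dictionary shaped like the output of the boto3 EC2 describe_instance_status call, and it assumes that the insufficient-data value in that response corresponds to the Insufficient Data status shown in the console. The helper names and the sample values are hypothetical.

```python
def status_checks(response, instance_id):
    """Return the system and instance status check results, or None if not found."""
    for status in response.get("InstanceStatuses", []):
        if status["InstanceId"] == instance_id:
            return {
                "system": status["SystemStatus"]["Status"],
                "instance": status["InstanceStatus"]["Status"],
            }
    return None


def diagnose(checks):
    """Classify status check results into the cases this section describes."""
    values = set(checks.values())
    if "impaired" in values:
        return "impaired"
    if "insufficient-data" in values:
        return "insufficient-data"
    return "ok" if values == {"ok"} else "other"


# Hand-written sample in the shape of a DescribeInstanceStatus response.
# With boto3 the response would come from:
#   boto3.client("ec2").describe_instance_status(
#       InstanceIds=[...], IncludeAllInstances=True)
sample_status = {
    "InstanceStatuses": [
        {
            "InstanceId": "i-0123456789abcdef0",
            "SystemStatus": {"Status": "ok"},
            "InstanceStatus": {"Status": "impaired"},
        }
    ]
}

print(diagnose(status_checks(sample_status, "i-0123456789abcdef0")))
```

A result of impaired suggests waiting a few minutes for recovery, while insufficient-data points to the manual termination steps above.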

Instance State in Auto Scaling Group

Amazon EC2 Auto Scaling does not perform health checks on instances in the Standby state. To set Standby instances back to the InService state:

  1. On the Amazon EC2 console navigation pane, under Auto Scaling, choose Auto Scaling Groups, select the instance's group, and then choose the Instances view.
  2. Choose the filter menu Any Lifecycle State, and then select Standby.
  3. To resume health checks, open the context (right-click) menu for an instance, and then choose Set to InService, which exits the Standby state.
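
The Standby check above can be sketched as follows, assuming a group dictionary shaped like one entry of the boto3 describe_auto_scaling_groups response and an Auto Scaling client object supplied by the caller. The helper names and the sample group are hypothetical.

```python
def standby_instance_ids(group):
    """Return the IDs of instances in the group that are in the Standby state."""
    return [
        inst["InstanceId"]
        for inst in group.get("Instances", [])
        if inst["LifecycleState"] == "Standby"
    ]


def return_to_service(client, group):
    """Move all Standby instances back to InService; return their IDs.

    client is assumed to be boto3.client("autoscaling") or a compatible object.
    """
    ids = standby_instance_ids(group)
    if ids:
        client.exit_standby(
            InstanceIds=ids,
            AutoScalingGroupName=group["AutoScalingGroupName"],
        )
    return ids


# Hand-written sample group with one Standby instance.
sample_group = {
    "AutoScalingGroupName": "my-asg",
    "Instances": [
        {"InstanceId": "i-0aaaaaaaaaaaaaaaa", "LifecycleState": "InService"},
        {"InstanceId": "i-0bbbbbbbbbbbbbbbb", "LifecycleState": "Standby"},
    ],
}

print(standby_instance_ids(sample_group))
```

Health checks resume only after the instance leaves Standby, so an unhealthy instance might be replaced shortly after it returns to service.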

Amazon EC2 Auto Scaling delays terminating an instance while a lifecycle hook is waiting to complete. To find the lifecycle status and complete the lifecycle hook:

  1. On the Amazon EC2 console navigation pane, under Auto Scaling, choose Auto Scaling Groups, and then select the instance's group.
  2. Choose the Instances view and note the Lifecycle status for the instance.
  3. If the status is Terminating:Wait, check the heartbeat timeout, and then run the complete-lifecycle-action command to complete the lifecycle hook.
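
The lifecycle hook steps above can be sketched in code. This example assumes a response dictionary shaped like the output of the boto3 describe_lifecycle_hooks call and an Auto Scaling client object supplied by the caller. The helper names, the hook name, and the timeout value are hypothetical.

```python
TERMINATING = "autoscaling:EC2_INSTANCE_TERMINATING"


def termination_hooks(response):
    """Return (hook name, heartbeat timeout in seconds) for each termination hook."""
    return [
        (hook["LifecycleHookName"], hook.get("HeartbeatTimeout"))
        for hook in response.get("LifecycleHooks", [])
        if hook["LifecycleTransition"] == TERMINATING
    ]


def complete_hook(client, group_name, hook_name, instance_id):
    """Ask Auto Scaling to stop waiting on the hook and continue terminating.

    client is assumed to be boto3.client("autoscaling") or a compatible object.
    """
    return client.complete_lifecycle_action(
        AutoScalingGroupName=group_name,
        LifecycleHookName=hook_name,
        InstanceId=instance_id,
        LifecycleActionResult="CONTINUE",
    )


# Hand-written sample in the shape of a DescribeLifecycleHooks response.
sample_hooks = {
    "LifecycleHooks": [
        {
            "LifecycleHookName": "drain-before-terminate",  # hypothetical name
            "LifecycleTransition": TERMINATING,
            "HeartbeatTimeout": 3600,
        },
        {
            "LifecycleHookName": "warm-up",  # hypothetical name
            "LifecycleTransition": "autoscaling:EC2_INSTANCE_LAUNCHING",
            "HeartbeatTimeout": 300,
        },
    ]
}

print(termination_hooks(sample_hooks))
```

A long heartbeat timeout, such as the hour in the sample, can explain why an instance stays in Terminating:Wait well after it was marked unhealthy.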

Amazon EC2 Auto Scaling also delays terminating an instance until the ELB connection draining period completes. To confirm that the group is waiting for connection draining:

  1. On the Amazon EC2 console navigation pane, under Auto Scaling, choose Auto Scaling Groups, and then select the instance's group.
  2. Choose the Instances view and confirm that the instance's Lifecycle is Terminating.
  3. Choose the Activity History view.
  4. For Filter, select Waiting for ELB connection draining to confirm whether the group is waiting to terminate the instance.
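
The Activity History filter above can be approximated in code. This sketch assumes a response dictionary shaped like the output of the boto3 describe_scaling_activities call, and it assumes that the WaitingForELBConnectionDraining status code matches the console filter. The helper name and the sample activities are hypothetical.

```python
DRAINING = "WaitingForELBConnectionDraining"


def draining_activities(response):
    """Return (activity ID, description) for activities waiting on connection draining."""
    return [
        (activity["ActivityId"], activity.get("Description", ""))
        for activity in response.get("Activities", [])
        if activity.get("StatusCode") == DRAINING
    ]


# Hand-written sample in the shape of a DescribeScalingActivities response.
# With boto3 the response would come from:
#   boto3.client("autoscaling").describe_scaling_activities(
#       AutoScalingGroupName="my-asg")
sample_activities = {
    "Activities": [
        {
            "ActivityId": "activity-1",
            "Description": "Terminating EC2 instance: i-0123456789abcdef0",
            "StatusCode": DRAINING,
        },
        {
            "ActivityId": "activity-2",
            "Description": "Launching a new EC2 instance: i-0fedcba9876543210",
            "StatusCode": "Successful",
        },
    ]
}

print(draining_activities(sample_activities))
```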

ELB Health Checks

ELB settings can affect health checks and instance replacements. Note the instance's status on the ELB console:

  1. On the Amazon EC2 console navigation pane, under Load Balancing, choose Load Balancers, and then select the load balancer to which the instance is registered.
  2. Choose the Instances view and note the instance's status and description.

Amazon EC2 Auto Scaling doesn't use the results of ELB health checks to determine an instance's health status when the group's health check type is set to EC2. As a result, Amazon EC2 Auto Scaling doesn't terminate instances that fail ELB health checks. If an instance's status is OutOfService on the ELB console, but the instance's status is Healthy on the Amazon EC2 Auto Scaling console, confirm that the health check type is set to ELB:

  1. On the Amazon EC2 console navigation pane, under Auto Scaling, choose Auto Scaling Groups, and then select the instance's group.
  2. Choose the Details view and note the Health Check Type.
  3. Choose Edit and select ELB for Health Check Type, and then choose Save.
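
The health check type can also be inspected and changed in code. The following is a sketch, assuming a group dictionary shaped like one entry of the boto3 describe_auto_scaling_groups response and an Auto Scaling client object supplied by the caller. The helper names and the sample group are hypothetical.

```python
def ignores_elb_health(group):
    """Return True when a load balancer is attached but only EC2 checks are used."""
    attached = bool(group.get("LoadBalancerNames") or group.get("TargetGroupARNs"))
    return attached and group.get("HealthCheckType") == "EC2"


def use_elb_health_checks(client, group):
    """Switch the group to ELB health checks if needed; return True if changed.

    client is assumed to be boto3.client("autoscaling") or a compatible object.
    """
    if not ignores_elb_health(group):
        return False
    client.update_auto_scaling_group(
        AutoScalingGroupName=group["AutoScalingGroupName"],
        HealthCheckType="ELB",
    )
    return True


# Hand-written sample group: a load balancer is attached, but the type is EC2.
sample_group = {
    "AutoScalingGroupName": "my-asg",
    "HealthCheckType": "EC2",
    "LoadBalancerNames": ["my-load-balancer"],  # hypothetical name
    "TargetGroupARNs": [],
}

print(ignores_elb_health(sample_group))
```

After the change, instances that the load balancer reports as unhealthy become candidates for replacement once the health check grace period has passed.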

If the group's health check type is already ELB and the instance's status on the ELB console is OutOfService, use the status description that you noted earlier to determine further steps:

  • Instance registration is still in progress: wait for the load balancer to complete instance registration and for the instance to enter the InService state.
  • Instance is in the Amazon EC2 Availability Zone for which LoadBalancer is not configured to route traffic to: edit the subnets of the Auto Scaling group or the load balancer so that they match the instance's subnets.
  • Instance hasn't passed the configured HealthyThreshold number of health checks consecutively: wait for ELB to complete health checks and the instance to enter the InService state.
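
The mapping from status description to next step can be sketched as a small lookup. This example assumes a state dictionary shaped like one entry of InstanceStates in the boto3 Classic Load Balancer describe_instance_health response. Matching on keyword fragments is an assumption made for illustration because the exact wording of the description can vary, and the helper name and sample values are hypothetical.

```python
# (fragment of the status description, suggested next step from this article)
NEXT_STEPS = [
    ("registration is still in progress",
     "wait for the load balancer to complete instance registration"),
    ("Availability Zone",
     "align the subnets of the group or load balancer with the instance's subnets"),
    ("HealthyThreshold",
     "wait for the health checks to pass"),
]


def next_step(state):
    """Return a suggested next step for an OutOfService instance, else None."""
    if state.get("State") != "OutOfService":
        return None
    description = state.get("Description", "")
    for fragment, step in NEXT_STEPS:
        if fragment in description:
            return step
    return "review the status description manually"


# Hand-written sample in the shape of one InstanceStates entry.
# With boto3 the entries would come from:
#   boto3.client("elb").describe_instance_health(LoadBalancerName="...")
sample_state = {
    "InstanceId": "i-0123456789abcdef0",
    "State": "OutOfService",
    "Description": "Instance registration is still in progress.",
}

print(next_step(sample_state))
```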

Related information

Troubleshooting instances with failed status checks

Why did Amazon EC2 Auto Scaling terminate an instance?

AWS OFFICIAL
Updated 2 years ago