Why am I seeing high or increasing CPU usage in my ElastiCache for Redis cluster?

I'm seeing high or increasing CPU usage in my Amazon ElastiCache for Redis cluster. How can I troubleshoot this?

Short description

There are two Amazon CloudWatch CPU metrics for ElastiCache for Redis:

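  • EngineCPUUtilization: This metric reports CPU utilization of the Redis engine thread. Because the Redis engine processes commands on a single thread, host-level CPU can understate how busy Redis is on larger nodes. For example, a fully saturated engine thread on a four-vCPU node shows up as only roughly 25 percent CPUUtilization. It's a best practice to monitor the EngineCPUUtilization metric for nodes with four or more vCPUs.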
  • CPUUtilization: This metric shows the percentage of CPU utilization for the entire host. For smaller nodes with two or fewer vCPUs, use the CPUUtilization metric to monitor the cluster workload.
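The following is a minimal Python sketch of that guidance. It assumes you query CloudWatch with boto3's get_metric_statistics call; the helper names cpu_metric_for and cpu_query, the cluster and node IDs, the three-hour window, and the 60-second period are hypothetical choices for illustration. Treating every node with more than two vCPUs as an EngineCPUUtilization node is also an assumption, because the guidance above only covers nodes with two or fewer and four or more vCPUs.

```python
from datetime import datetime, timedelta, timezone


def cpu_metric_for(vcpus: int) -> str:
    """Return the CloudWatch CPU metric to watch for a node with `vcpus` vCPUs."""
    if vcpus <= 2:
        return "CPUUtilization"  # small nodes: host-level CPU
    # Assumption: anything above two vCPUs is handled like the
    # "four or more vCPUs" case and monitored on the engine thread.
    return "EngineCPUUtilization"


def cpu_query(cluster_id: str, node_id: str, vcpus: int, hours: int = 3) -> dict:
    """Build keyword arguments for boto3's CloudWatch get_metric_statistics call."""
    end = datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/ElastiCache",
        "MetricName": cpu_metric_for(vcpus),
        "Dimensions": [
            {"Name": "CacheClusterId", "Value": cluster_id},
            {"Name": "CacheNodeId", "Value": node_id},
        ],
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
        "Period": 60,  # hypothetical choice: one datapoint per minute
        "Statistics": ["Average", "Maximum"],
    }


if __name__ == "__main__":
    # Hypothetical IDs; replace them with your own cluster and node.
    params = cpu_query("my-redis-001", "0001", vcpus=4)
    print(params["MetricName"])  # EngineCPUUtilization
    # With boto3 installed and AWS credentials configured:
    #   import boto3
    #   response = boto3.client("cloudwatch").get_metric_statistics(**params)
    #   for point in response["Datapoints"]:
    #       print(point["Timestamp"], point["Maximum"])
```

Building the request as a plain dictionary keeps the metric choice testable without AWS access; the commented lines show where the real API call would go.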

Resolution

High EngineCPUUtilization

The following are common reasons for high EngineCPUUtilization:

  • A long-running command that consumes high CPU time: Commands with high time complexity, such as KEYS, HKEYS, and HGETALL, consume more CPU time. For the time complexity and performance suggestions for each command, see Commands on the redis.io website. Lua scripts (run by the EVAL or EVALSHA Redis commands) run atomically in Redis: all other server activity is blocked for the entire run time of a script, which causes high EngineCPUUtilization. Use the Redis slow log (the SLOWLOG command) to check for long-running commands or long-running Lua scripts.
  • A high number of requests: Check command statistics to determine whether there are command bursts or whether latency is increasing. You can check command statistics with CloudWatch metrics such as GetTypeCmds or HashBasedCmds, or with the Redis INFO commandstats command. If the high number of requests comes from the expected application workload, consider scaling the cluster.
  • Backup and replication: Check the SaveInProgress metric to see whether a backup or replication is in progress. This binary metric returns "1" when a background save (forked or forkless) is in progress and "0" when one isn't. Make sure that you have enough memory to create a Redis snapshot.
  • High number of NewConnections: Establishing a TCP connection is computationally expensive, especially for TLS-enabled clusters. A high number of new client connection requests in a short time period might increase EngineCPUUtilization. Starting with Redis 6.2, ElastiCache includes performance improvements for TLS-enabled clusters that use x86 node types with eight or more vCPUs or Graviton2 node types with four or more vCPUs. For recommendations on handling a large number of connections, see Best practices: Redis clients and Amazon ElastiCache for Redis.
  • High number of evictions: Redis evicts keys according to the maxmemory-policy parameter. Eviction happens when the cache doesn't have enough memory to hold new data. If eviction volume is high, EngineCPUUtilization increases because Redis is busy evicting keys. You can monitor eviction volume with the CloudWatch Evictions metric. If evictions are high, scale your cluster up by using a larger node type, or scale out by adding more nodes.
  • High number of reclaimed keys: To free up memory, Redis samples keys that have a timeout set and deletes the ones that have expired. This process is called "reclaim." If there is a high number of expirations, EngineCPUUtilization increases because Redis is busy reclaiming keys. You can monitor the number of key expiration events with the CloudWatch Reclaimed metric. As a best practice, avoid having a large number of keys expire at the same time, for example by setting the same EXPIREAT timestamp on many keys.
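To check the slow log, you can run SLOWLOG GET 128 from redis-cli. The following Python sketch ranks the returned entries by total time per command. It assumes each entry uses the raw reply layout [id, timestamp, duration in microseconds, [command, args...], ...]; the function name slowlog_summary and the sample entries are hypothetical. The slow log only records commands that ran longer than the server's slowlog-log-slower-than threshold, so an empty result doesn't rule out a high request volume.

```python
from collections import defaultdict
from typing import Iterable, List, Sequence, Tuple


def slowlog_summary(entries: Iterable[Sequence], top: int = 5) -> List[Tuple[str, int, int]]:
    """Group SLOWLOG GET entries by command name.

    Each entry is assumed to follow the raw reply layout:
    [id, unix_timestamp, duration_microseconds, [command, arg1, ...], ...]
    Returns (command, occurrences, total_microseconds), largest total first.
    """
    totals = defaultdict(lambda: [0, 0])  # command -> [count, total_us]
    for entry in entries:
        duration_us = int(entry[2])
        args = entry[3]
        name = args[0] if args else "?"
        if isinstance(name, bytes):
            name = name.decode("utf-8", errors="replace")
        name = str(name).upper()  # so "keys" and "KEYS" are counted together
        totals[name][0] += 1
        totals[name][1] += duration_us
    ranked = sorted(totals.items(), key=lambda item: item[1][1], reverse=True)
    return [(name, count, total) for name, (count, total) in ranked[:top]]


if __name__ == "__main__":
    # Hypothetical entries in the raw SLOWLOG GET layout.
    sample = [
        [14, 1700000000, 52000, ["KEYS", "*"], "10.0.0.5:50000", ""],
        [13, 1699999990, 31000, ["EVALSHA", "abc123", "0"], "10.0.0.6:50001", ""],
        [12, 1699999980, 48000, ["KEYS", "user:*"], "10.0.0.5:50000", ""],
    ]
    print(slowlog_summary(sample))
    # [('KEYS', 2, 100000), ('EVALSHA', 1, 31000)]
```

Ranking by total time rather than by the single slowest call highlights commands such as KEYS that are individually slow and also frequent.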
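INFO commandstats returns one line per command in the form cmdstat_<command>:calls=...,usec=...,usec_per_call=..., and newer Redis versions add fields such as rejected_calls and failed_calls. The following Python sketch parses that text, ranks commands by total CPU time (usec), and compares two snapshots to estimate calls per second, which can reveal command bursts. The function names are hypothetical, and the parser assumes the raw text output (for example, from redis-cli INFO commandstats) rather than a client library's already-parsed dictionary.

```python
from typing import Dict, List, Tuple

Stats = Dict[str, Dict[str, float]]


def parse_commandstats(text: str) -> Stats:
    """Parse raw INFO commandstats output into {command: {field: value}}."""
    stats: Stats = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("cmdstat_"):
            continue  # skip the "# Commandstats" header and blank lines
        name, _, fields = line.partition(":")
        values = {}
        for field in fields.split(","):
            key, _, value = field.partition("=")
            values[key] = float(value)  # calls, usec, usec_per_call, ...
        stats[name[len("cmdstat_"):]] = values
    return stats


def top_by_cpu_time(stats: Stats, n: int = 5) -> List[Tuple[str, Dict[str, float]]]:
    """Commands ranked by total microseconds spent (usec), highest first."""
    return sorted(stats.items(), key=lambda kv: kv[1].get("usec", 0.0), reverse=True)[:n]


def call_rates(before: Stats, after: Stats, interval_s: float) -> Dict[str, float]:
    """Calls per second for each command between two snapshots interval_s apart."""
    return {
        cmd: (fields.get("calls", 0.0) - before.get(cmd, {}).get("calls", 0.0)) / interval_s
        for cmd, fields in after.items()
    }


if __name__ == "__main__":
    first = parse_commandstats(
        "# Commandstats\r\n"
        "cmdstat_get:calls=21,usec=175,usec_per_call=8.33\r\n"
        "cmdstat_hgetall:calls=5,usec=2000,usec_per_call=400.00\r\n"
    )
    second = parse_commandstats(
        "cmdstat_get:calls=41,usec=340,usec_per_call=8.29\r\n"
        "cmdstat_hgetall:calls=305,usec=122000,usec_per_call=400.00\r\n"
    )
    print(top_by_cpu_time(second, 1)[0][0])  # hgetall
    print(call_rates(first, second, 10.0))  # {'get': 2.0, 'hgetall': 30.0}
```

The counters are cumulative, so a single snapshot shows long-term totals while the difference between two snapshots shows the current rate. If the counters reset between snapshots (for example, after a node restart), the computed rates are meaningless.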
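Several causes in the list above (background saves, evictions, and reclaimed keys) appear as their own CloudWatch metrics, so one way to test a suspicion is to check how often high-CPU periods line up with them. The following Python sketch assumes datapoints in the shape returned by boto3's get_metric_statistics (dictionaries with a Timestamp key and one key per statistic). The 90 percent CPU threshold and the helper names to_series and overlap_ratio are hypothetical; for SaveInProgress, any value above 0 counts as a save in progress.

```python
from typing import Dict, Hashable, Iterable, Mapping, Optional


def to_series(datapoints: Iterable[Mapping], stat: str) -> Dict[Hashable, float]:
    """Map each datapoint's Timestamp to the requested statistic (e.g. "Maximum")."""
    return {dp["Timestamp"]: float(dp[stat]) for dp in datapoints}


def overlap_ratio(
    cpu: Mapping[Hashable, float],
    cause: Mapping[Hashable, float],
    cpu_threshold: float = 90.0,  # assumption: what counts as "high" CPU
    cause_threshold: float = 0.0,  # SaveInProgress is 0 or 1
) -> Optional[float]:
    """Fraction of high-CPU timestamps at which the cause metric is also elevated.

    Returns None when no datapoint reaches cpu_threshold.
    Timestamps missing from `cause` are treated as not elevated.
    """
    hot = [ts for ts, value in cpu.items() if value >= cpu_threshold]
    if not hot:
        return None
    hits = sum(1 for ts in hot if cause.get(ts, 0.0) > cause_threshold)
    return hits / len(hot)


if __name__ == "__main__":
    cpu = to_series(
        [{"Timestamp": 1, "Maximum": 95.0}, {"Timestamp": 2, "Maximum": 40.0},
         {"Timestamp": 3, "Maximum": 92.0}],
        "Maximum",
    )
    saves = to_series(
        [{"Timestamp": 1, "Maximum": 1.0}, {"Timestamp": 2, "Maximum": 1.0},
         {"Timestamp": 3, "Maximum": 0.0}],
        "Maximum",
    )
    print(overlap_ratio(cpu, saves))  # 0.5
```

Both series must be requested with the same Period and time range so that their timestamps match. For Evictions or Reclaimed, you would typically request the Sum statistic and choose a cause_threshold that fits your workload.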
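Most Redis clients, such as redis-py, already include a connection pool, and reusing connections through it is usually the simplest way to avoid bursts of NewConnections. To show the idea without depending on a particular client, the following is a stdlib-only Python sketch; SimplePool is hypothetical, and factory and closer stand in for whatever opens and closes a connection in your client library.

```python
import queue
from typing import Callable, Generic, TypeVar

T = TypeVar("T")


class SimplePool(Generic[T]):
    """Keep up to max_idle open connections for reuse instead of opening one per request."""

    def __init__(
        self,
        factory: Callable[[], T],
        max_idle: int = 10,
        closer: Callable[[T], None] = lambda conn: None,
    ) -> None:
        self._factory = factory  # hypothetical: opens one TCP (or TLS) connection
        self._closer = closer  # hypothetical: closes a connection the pool won't keep
        self._idle = queue.LifoQueue(maxsize=max_idle)  # thread-safe idle list

    def acquire(self) -> T:
        try:
            return self._idle.get_nowait()  # reuse an idle connection if one exists
        except queue.Empty:
            return self._factory()  # pay the handshake cost only when none is idle

    def release(self, conn: T) -> None:
        try:
            self._idle.put_nowait(conn)  # keep it open for the next request
        except queue.Full:
            self._closer(conn)  # already holding max_idle connections


if __name__ == "__main__":
    opened = []

    def open_fake_connection():
        opened.append(object())  # stand-in for a real client connection
        return opened[-1]

    pool = SimplePool(open_fake_connection, max_idle=2)
    for _ in range(1000):
        conn = pool.acquire()
        pool.release(conn)
    print(len(opened))  # 1: the same connection was reused every time
```

This sketch bounds idle connections, not the total number open at once. A LIFO queue hands out the most recently used connection first, so a small set of connections stays warm while extras are closed when they can't be returned.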
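One way to follow the expiration best practice is to add a small random offset (jitter) to each TTL so that keys written together don't all expire together. The following Python sketch assumes a default spread of plus or minus 10 percent; both the function name jittered_ttl and that spread are hypothetical choices.

```python
import random
from typing import Optional


def jittered_ttl(base_seconds: int, spread: float = 0.1,
                 rng: Optional[random.Random] = None) -> int:
    """Return base_seconds shifted randomly by up to +/- spread * base_seconds.

    spread=0.1 (a hypothetical default) means +/-10 percent.
    The result is always at least 1 second.
    """
    if base_seconds < 1:
        raise ValueError("base_seconds must be at least 1")
    if not 0.0 <= spread < 1.0:
        raise ValueError("spread must be in [0, 1)")
    rng = rng or random.Random()
    delta = int(base_seconds * spread)  # maximum offset in whole seconds
    return max(1, base_seconds + rng.randint(-delta, delta))


if __name__ == "__main__":
    # Keys written in the same batch now expire across roughly a 12-minute
    # window (3600 s +/- 360 s) instead of in the same second.
    print(sorted(jittered_ttl(3600) for _ in range(5)))
```

With redis-py, for example, you could pass the result as the ex argument of set, such as r.set(key, value, ex=jittered_ttl(3600)).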

For more information on troubleshooting high EngineCPUUtilization, see Troubleshooting connections - CPU usage.

High CPUUtilization

The following are common reasons for high CPUUtilization:


AWS OFFICIAL
Updated 2 years ago