Why is my Amazon Aurora DB cluster clone, snapshot restore, or point in time restore taking so long?

I'm performing a cluster clone, snapshot restore, or a point in time operation on my Amazon Aurora cluster.

Short description

Amazon Aurora’s continuous backup and restore techniques are optimized to avoid variation in restore times. They also help the cluster’s storage volume to reach full performance as soon as the cluster becomes available. Long restore times are generally caused by long-running transactions in the source database at the time that the backup is taken.

Resolution

Note: If you receive errors when running AWS Command Line Interface (AWS CLI) commands, make sure that you’re using the most recent AWS CLI version.

Amazon Aurora backs up your cluster volume's changes automatically and continuously. The backups are retained for the length of your backup retention period. With this continuous backup, you can restore your data to a new cluster at any point in time within the retention period. This avoids the need for a lengthy binlog roll-forward process. Because you create a new cluster, there is no performance impact on or interruption to your original database.

When you initiate a clone, snapshot restore, or point-in-time restore, Amazon Relational Database Service (Amazon RDS) calls the following APIs on your behalf:

Either RestoreDBClusterFromSnapshot or RestoreDBClusterToPointInTime. These APIs create a new cluster and restore the cluster volume from Amazon Simple Storage Service (Amazon S3). This can take up to a few hours to complete. When you restore data to an Aurora cluster, all data is brought in parallel from Amazon S3 to the six copies of the volume across your three Availability Zones.
Cluster storage volume cloning is a variation of RestoreDBClusterToPointInTime. It uses the copy-on-write protocol, and usually completes in a few minutes.

When this step completes, the cluster changes into the Available state. You can check your cluster state by refreshing the console or checking with the AWS CLI.
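As a rough illustration of checking the cluster state programmatically, the following Python sketch polls DescribeDBClusters until the status is available. It assumes that `rds` is a boto3 RDS client (for example, `boto3.client("rds")`). The function name, polling interval, and timeout are illustrative choices and don't come from this article.

```python
import time


def wait_for_cluster_available(rds, cluster_id, poll_seconds=30,
                               timeout_seconds=4 * 3600, sleep=time.sleep):
    """Poll DescribeDBClusters until the cluster reports 'available'.

    `rds` is assumed to be a boto3 RDS client. Returns the distinct
    statuses observed, in order, ending with 'available'.
    """
    seen = []
    waited = 0
    while True:
        response = rds.describe_db_clusters(DBClusterIdentifier=cluster_id)
        status = response["DBClusters"][0]["Status"]
        # Record a status only when it differs from the previous one.
        if not seen or seen[-1] != status:
            seen.append(status)
        if status == "available":
            return seen
        if waited >= timeout_seconds:
            raise TimeoutError(
                f"cluster {cluster_id} still '{status}' after {waited} seconds"
            )
        sleep(poll_seconds)
        waited += poll_seconds


# Example usage (requires boto3 and AWS credentials; the identifier is hypothetical):
#   import boto3
#   print(wait_for_cluster_available(boto3.client("rds"), "my-restored-cluster"))
```

The `sleep` parameter exists only so that the wait can be replaced in tests. boto3 also provides built-in waiters that might suit this better, depending on your boto3 version.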

The instance creation process starts only when the cluster is Available. This happens in two stages: setting up the instance configuration and database crash recovery.

You can check if the API has finished setting up the instance by looking for the MySQL error log file. You can do this even if the instance is in the Creating status. If the error log file is available to download, then the instance is set up and the engine is now performing crash recovery. The error log file is also the best resource to check on the progress of your database crash recovery, along with Amazon CloudWatch metrics.
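The error log check described above might look like the following Python sketch. It assumes that `rds` is a boto3 RDS client and uses the DescribeDBLogFiles and DownloadDBLogFilePortion operations. The helper names and the "error" file name filter are assumptions made for illustration. The sketch also doesn't handle the case where the API rejects the call because the instance isn't yet visible.

```python
def error_log_files(rds, instance_id):
    """Return the names of log files whose name contains 'error'.

    `rds` is assumed to be a boto3 RDS client.
    """
    response = rds.describe_db_log_files(
        DBInstanceIdentifier=instance_id,
        FilenameContains="error",
    )
    return [f["LogFileName"] for f in response.get("DescribeDBLogFiles", [])]


def creation_stage(rds, instance_id):
    """Guess the creation stage from whether an error log file exists yet."""
    if error_log_files(rds, instance_id):
        # Per the text above: log available means setup is done and the
        # engine is performing (or has finished) crash recovery.
        return "crash recovery"
    return "instance setup"


def tail_error_log(rds, instance_id, log_file_name, lines=50):
    """Fetch the most recent lines of a log file to follow recovery progress."""
    response = rds.download_db_log_file_portion(
        DBInstanceIdentifier=instance_id,
        LogFileName=log_file_name,
        NumberOfLines=lines,  # With no Marker, the most recent lines are returned.
    )
    return response.get("LogFileData") or ""
```

A stage of "crash recovery" here means only that an error log exists. To see how far recovery has progressed, read the log contents themselves.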

Note: If you're using the AWS CLI or API to perform a restore operation, then you must call CreateDBInstance yourself because the DB instance isn't created automatically.
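To make the two-call sequence concrete, here is a minimal Python sketch of a clone followed by the explicit CreateDBInstance call. It assumes that `rds` is a boto3 RDS client. The function name, the `aurora-mysql` engine default, and all identifiers are hypothetical. Whether to wait for the cluster to become available between the two calls is left to the caller.

```python
def clone_cluster_with_instance(rds, source_cluster_id, clone_cluster_id,
                                instance_id, instance_class,
                                engine="aurora-mysql"):
    """Clone an Aurora cluster, then create a DB instance in the clone.

    `rds` is assumed to be a boto3 RDS client. The engine must match the
    source cluster's engine; "aurora-mysql" is only a default for this sketch.
    """
    # Cloning is RestoreDBClusterToPointInTime with the copy-on-write type.
    rds.restore_db_cluster_to_point_in_time(
        SourceDBClusterIdentifier=source_cluster_id,
        DBClusterIdentifier=clone_cluster_id,
        RestoreType="copy-on-write",
        UseLatestRestorableTime=True,
    )
    # The restore call creates only the cluster; the instance is a separate call.
    rds.create_db_instance(
        DBInstanceIdentifier=instance_id,
        DBClusterIdentifier=clone_cluster_id,
        DBInstanceClass=instance_class,
        Engine=engine,
    )


# Example usage (requires boto3 and AWS credentials; all names are hypothetical):
#   import boto3
#   clone_cluster_with_instance(boto3.client("rds"), "prod-cluster",
#                               "prod-clone", "prod-clone-1", "db.r6g.large")
```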

Check for long-running write operations on the source database

It’s a best practice to confirm that there aren’t long-running write operations on the source database at the time of the snapshot, point-in-time restore, or clone. Any long-running DCL, DDL, or DML (open write transactions) might lengthen the time it takes for the restored database to become available.

For example, if you activate the binary log for an Aurora cluster, then recovery takes longer. This is because InnoDB automatically checks the logs and performs a roll-forward of the database to the present. It then rolls back any uncommitted transactions that are present at the time of the recovery. For more information on InnoDB crash recovery, see InnoDB recovery.
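Because uncommitted transactions are rolled back during recovery, it can help to list open transactions on the source before you take the snapshot or start the clone. The following Python sketch queries `information_schema.innodb_trx` for this. It assumes an Aurora MySQL-Compatible source and a DB-API cursor that returns rows as tuples and accepts `%s` placeholders (for example, a default PyMySQL cursor). The function name and the 60-second threshold are illustrative.

```python
# Transactions that have been open for at least the given number of seconds.
LONG_TRX_SQL = """
SELECT trx_mysql_thread_id,
       trx_started,
       TIMESTAMPDIFF(SECOND, trx_started, NOW()) AS age_seconds,
       trx_rows_modified
FROM information_schema.innodb_trx
WHERE TIMESTAMPDIFF(SECOND, trx_started, NOW()) >= %s
ORDER BY trx_started
"""


def long_running_transactions(cursor, min_age_seconds=60):
    """Return open InnoDB transactions older than `min_age_seconds`.

    `cursor` is assumed to be a DB-API cursor connected to the source
    database that yields tuple rows.
    """
    cursor.execute(LONG_TRX_SQL, (min_age_seconds,))
    columns = ("thread_id", "started", "age_seconds", "rows_modified")
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


# Example usage (the connection object is hypothetical):
#   with connection.cursor() as cur:
#       for trx in long_running_transactions(cur, min_age_seconds=300):
#           print(trx)
```

An empty result suggests that no transaction has been open longer than the threshold at the moment of the check. It doesn't guarantee that none starts before the backup is taken.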

When the instance finishes the creation and recovery processes, the cluster and the instance are then ready to accept incoming connections.

Note: Aurora doesn't require the binary log. It's a best practice to deactivate it unless it's required. For cross-Region replication, you can evaluate the Aurora global databases instead. Aurora global databases also don't require binary logs.
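One way to check whether the binary log is active is to read the `binlog_format` parameter from the DB cluster parameter group. The following Python sketch does this with DescribeDBClusterParameters. It assumes that `rds` is a boto3 RDS client, and that on Aurora MySQL-Compatible a `binlog_format` value of `OFF` means that binary logging is deactivated. The function names are hypothetical.

```python
def binlog_format(rds, parameter_group_name):
    """Return the binlog_format value from a DB cluster parameter group.

    `rds` is assumed to be a boto3 RDS client. Returns None when the
    parameter is missing or has no value.
    """
    marker = None
    while True:
        kwargs = {"DBClusterParameterGroupName": parameter_group_name}
        if marker:
            kwargs["Marker"] = marker
        page = rds.describe_db_cluster_parameters(**kwargs)
        for parameter in page.get("Parameters", []):
            if parameter.get("ParameterName") == "binlog_format":
                return parameter.get("ParameterValue")
        # Results can span several pages; follow the marker until it runs out.
        marker = page.get("Marker")
        if not marker:
            return None


def binary_logging_enabled(rds, parameter_group_name):
    """True or False when binlog_format is set, None when it can't be determined."""
    value = binlog_format(rds, parameter_group_name)
    if value is None:
        return None
    # Assumption: OFF is the value that deactivates binary logging.
    return value.upper() != "OFF"
```

A result of `None` means that the sketch couldn't find a value, so treat it as unknown rather than as deactivated.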

Related information

Amazon Aurora storage and reliability

Restoring from a DB cluster snapshot
