How do I update all Amazon EMR nodes after the bootstrap phase?

2 minute read
0

I want to update all nodes on an Amazon EMR cluster after the BOOTSTRAPPING state is complete.

Short description

The bootstrap phase occurs before Amazon EMR installs and configures applications such as Apache Hadoop and Apache Spark. To make additional changes on all cluster nodes after Amazon EMR installs and configures the applications, run a bootstrap action that downloads and runs another script.

Resolution

1.    Create a bash script that specifies the changes that you want to make on all cluster nodes. Refer to the following examples.

Example 1: This script waits for a configuration file ( /etc/hadoop/conf/hadoop-env.sh) to become available, and then performs additional work.

#!/bin/bash
#
# This is an example of script_b.sh
#
while [ ! -f /etc/hadoop/conf/hadoop-env.sh ]
do
  sleep 1
done
#
# Now the file is available, do your work here
#

exit 0

Example 2: This script waits for the EMR cluster to enter the WAITING state before implementing additional customization. This allows the EMR cluster to finish installing and configuring Hadoop and other applications.

#!/bin/bash
#
# This is an example of script_b.sh
#
# Wait for EMR provisioning to become successful.
#
while [[ $(sed '/localInstance {/{:1; /}/!{N; b1}; /nodeProvision/p}; d' /emr/instance-controller/lib/info/job-flow-state.txt | sed '/nodeProvisionCheckinRecord {/{:1; /}/!{N; b1}; /status/p}; d' | awk '/SUCCESSFUL/' | xargs) != "status: SUCCESSFUL" ]];
do
  sleep 1
done
#
# Now the EMR cluster is ready. Do your work here.
#

exit 0

2.    Launch an EMR cluster and add a bootstrap action similar to the following. This script downloads the script that you created in the previous step ( script_b.sh) and then runs it in the background.

#!/bin/bash
aws s3 cp s3://doc-example-bucket/script_b.sh .
chmod +x script_b.sh
nohup ./script_b.sh &>/dev/null &

Related information

Understanding the cluster lifecycle

AWS OFFICIAL
AWS OFFICIALUpdated 2 years ago