How do I encrypt an HBase table in Amazon EMR using AES?


I want to use the Advanced Encryption Standard (AES) to encrypt an Apache HBase table on an Amazon EMR cluster.

Resolution

You can encrypt a new or existing HBase table using the transparent encryption feature. This feature encrypts HFile data and write-ahead logs (WAL) at rest.

Note: When you use Amazon Simple Storage Service (Amazon S3) as the data source rather than HDFS, you can protect data at rest and in transit using server-side and client-side encryption. For more information, see Protecting data using encryption.

Encrypt a new HBase table

1.    Open the Amazon EMR console.

2.    Choose a cluster that already has HBase, or create a new cluster with HBase.

3.    Connect to the master node using SSH.

4.    Use the keytool command to create a secret key of an appropriate length for AES encryption. Provide a keystore password and an alias for the key.

Example command:

sudo keytool -keystore /etc/hbase/conf/hbase.jks -storetype jceks -storepass:file mysecurefile -genseckey -keyalg AES -keysize 128 -alias your-alias

Note: The file mysecurefile contains the keystore password that's passed to the -storepass option. Make sure that only the file owner can read the file, and delete the file after use.

Example output:

Enter key password for <your_key_store>
    (RETURN if same as keystore password):
Warning:
The JCEKS keystore uses a proprietary format. It is recommended to migrate to PKCS12 which is an industry standard format using "keytool -importkeystore -srckeystore /etc/hbase/conf/hbase.jks -destkeystore /etc/hbase/conf/hbase.jks -deststoretype pkcs12".
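
The password file that step 4 passes to -storepass:file can be prepared before you run keytool. The following is a minimal sketch, assuming a file named mysecurefile in the current directory and the placeholder password your_password; the PASS_FILE variable is a hypothetical name used only for this illustration.

```shell
# Hypothetical path; it must match the argument given to -storepass:file.
PASS_FILE="${PASS_FILE:-./mysecurefile}"

# umask 077 makes newly created files readable and writable by the owner only.
(
  umask 077
  printf '%s\n' 'your_password' > "$PASS_FILE"   # placeholder password
)

# Tighten permissions explicitly in case the file already existed.
chmod 600 "$PASS_FILE"
ls -l "$PASS_FILE"

# After keytool has created the key, remove the file:
# rm -f "$PASS_FILE"
```

The subshell keeps the umask change from affecting the rest of the session.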

5.    Add the following properties to the hbase-site.xml file on each node in the EMR cluster. In the hbase.crypto.keyprovider.parameters property, provide the path to hbase.jks and the password. This is the same password that you specified in the keytool command in step 4. In the hbase.crypto.master.key.name property, specify your alias.

  <property>
    <name>hbase.crypto.keyprovider.parameters</name>
    <value>jceks:///etc/hbase/conf/hbase.jks?password=your_password</value>
  </property>

  <property>
    <name>hbase.crypto.master.key.name</name>
    <value>your-alias</value>
  </property>

  <property>
    <name>hbase.regionserver.hlog.reader.impl</name>
    <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogReader</value>
  </property>

  <property>
    <name>hbase.regionserver.hlog.writer.impl</name>
    <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogWriter</value>
  </property>

  <property>
    <name>hfile.format.version</name>
    <value>3</value>
  </property>

  <property>
    <name>hbase.regionserver.wal.encryption</name>
    <value>true</value>
  </property>

  <property>
    <name>hbase.crypto.keyprovider</name>
    <value>org.apache.hadoop.hbase.io.crypto.KeyStoreKeyProvider</value>
  </property>
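
Because the same properties must be present on every node, a quick check of each node's hbase-site.xml can catch a missed edit before the restart. The following is a sketch only: check_hbase_site is a hypothetical helper, not part of HBase or Amazon EMR. It looks for the property names from step 5 and doesn't validate their values.

```shell
# Report which encryption-related properties are missing from an hbase-site.xml.
# Usage: check_hbase_site /etc/hbase/conf/hbase-site.xml
check_hbase_site() {
  site_file="$1"
  missing=0
  for prop in \
    hbase.crypto.keyprovider \
    hbase.crypto.keyprovider.parameters \
    hbase.crypto.master.key.name \
    hbase.regionserver.hlog.reader.impl \
    hbase.regionserver.hlog.writer.impl \
    hbase.regionserver.wal.encryption \
    hfile.format.version
  do
    # -F matches a fixed string. Including the closing tag keeps
    # hbase.crypto.keyprovider from matching hbase.crypto.keyprovider.parameters.
    if ! grep -qF "<name>${prop}</name>" "$site_file"; then
      echo "missing: ${prop}"
      missing=1
    fi
  done
  # Returns 0 when every property name was found, 1 otherwise.
  return "$missing"
}
```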

6.    Copy the hbase.jks file to all cluster nodes. Be sure to copy the file to the location that's specified in the hbase.crypto.keyprovider.parameters property. In the following example, replace HostToCopy and ToHost with the public DNS name of the node that you're copying the file to, and repeat the commands for each node.

cd /etc/hbase/conf
scp hbase.jks HostToCopy:/tmp
ssh ToHost
sudo cp /tmp/hbase.jks /etc/hbase/conf/
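
For a cluster with several nodes, the copy in step 6 can be repeated in a loop. The following sketch only prints the commands so that you can review them before you run anything. The print_copy_commands function and the example host names are hypothetical, and the ssh line assumes that the login user can run sudo without a password prompt.

```shell
# Print (do not run) the copy commands for each destination node.
# Pass the public DNS names of the nodes as arguments.
print_copy_commands() {
  for host in "$@"; do
    echo "scp /etc/hbase/conf/hbase.jks ${host}:/tmp"
    echo "ssh ${host} 'sudo cp /tmp/hbase.jks /etc/hbase/conf/'"
  done
}

# Placeholder host names; replace them with your own nodes.
print_copy_commands core-node-1.example.com core-node-2.example.com
```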

7.    Restart all HBase services on the master and core nodes, as shown in the following example. Repeat the hbase-regionserver stop and start commands on each core node.

Note: Stopping and starting RegionServers might interrupt ongoing reads and writes to HBase tables on your cluster. Therefore, stop and start the HBase daemons only during planned downtime. Test the impact on a test cluster before you restart the daemons on a production cluster.

Amazon EMR 5.30.0 and later release versions:

sudo systemctl stop hbase-master
sudo systemctl stop hbase-regionserver

sudo systemctl start hbase-master
sudo systemctl start hbase-regionserver

Amazon EMR 4.x to Amazon EMR 5.29.0 release versions:

sudo initctl stop hbase-master
sudo initctl stop hbase-regionserver

sudo initctl start hbase-master
sudo initctl start hbase-regionserver
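
If you script the restart across clusters on different releases, the choice between systemctl and initctl can be derived from the release label. The following sketch encodes only the two version ranges listed above. The hbase_service_tool function is a hypothetical helper, and releases earlier than 4.x are reported as unknown because this article doesn't cover them.

```shell
# Print the service manager for a given Amazon EMR release label.
# Accepts labels such as "emr-5.30.0" or plain versions such as "5.30.0".
hbase_service_tool() {
  version="${1#emr-}"      # strip an optional "emr-" prefix
  major="${version%%.*}"   # text before the first dot
  rest="${version#*.}"
  minor="${rest%%.*}"      # text between the first and second dots
  if [ "$major" -gt 5 ] || { [ "$major" -eq 5 ] && [ "$minor" -ge 30 ]; }; then
    echo systemctl
  elif [ "$major" -ge 4 ]; then
    echo initctl
  else
    echo unknown
    return 1
  fi
}

hbase_service_tool emr-5.30.0   # prints systemctl
hbase_service_tool emr-5.29.0   # prints initctl
```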

8.    Log in to the HBase shell:

hbase shell

9.    Create a table with AES encryption:

create 'table1',{NAME=>'columnfamily',ENCRYPTION=>'AES'}

Example output:

0 row(s) in 1.6760 seconds
=> Hbase::Table - table1

10.    Describe the table to confirm that AES encryption is enabled:

describe 'table1'

Example output:

Table table1 is ENABLED
table1
COLUMN FAMILIES DESCRIPTION
{NAME => 'columnfamily', BLOOMFILTER => 'ROW', ENCRYPTION => 'AES', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE',
DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
1 row(s) in 0.0320 seconds
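
The same confirmation can be scripted by searching the describe output for the ENCRYPTION attribute. The following is a sketch: has_aes_encryption is a hypothetical helper that reads describe output on standard input, and the commented usage line assumes that your HBase version supports the shell's -n (non-interactive) option.

```shell
# Succeeds when the describe output on stdin shows AES encryption.
has_aes_encryption() {
  grep -q "ENCRYPTION => 'AES'"
}

# Usage on the cluster (assumes the hbase command is on the PATH):
# echo "describe 'table1'" | hbase shell -n | has_aes_encryption && echo "encrypted"

# Self-contained demonstration with a shortened line from the example output:
sample="{NAME => 'columnfamily', BLOOMFILTER => 'ROW', ENCRYPTION => 'AES', VERSIONS => '1'}"
if printf '%s\n' "$sample" | has_aes_encryption; then
  echo "AES encryption is enabled"
else
  echo "AES encryption is not enabled"
fi
```

The check matches any column family in the output, so for a table with several column families, inspect each family's line separately.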

Encrypt an existing table

1.    Describe the unencrypted table:

describe 'table2'

Example output:

Table table2 is ENABLED
table2
COLUMN FAMILIES DESCRIPTION
{NAME => 'cf2', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE',
TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
1 row(s) in 0.0140 seconds

2.    Use the alter command to enable AES encryption:

alter 'table2',{NAME=>'cf2',ENCRYPTION=>'AES'}

Example output:

Updating all regions with the new schema...
1/1 regions updated.
Done.
0 row(s) in 1.9000 seconds

3.    Confirm that the table is encrypted:

describe 'table2'

Example output:

Table table2 is ENABLED
table2
COLUMN FAMILIES DESCRIPTION
{NAME => 'cf2', BLOOMFILTER => 'ROW', ENCRYPTION => 'AES', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE',
DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
1 row(s) in 0.0120 seconds
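
The steps for an existing table can also be generated as a small HBase shell script. The following sketch prints the commands rather than running them. The encrypt_table_commands function is hypothetical, and the major_compact line is an addition that isn't part of the steps above: it rests on the assumption that HFiles written before the alter command stay unencrypted until a compaction rewrites them.

```shell
# Print HBase shell commands that enable AES on one column family.
# Arguments: table name, column family name.
encrypt_table_commands() {
  table="$1"
  family="$2"
  cat <<EOF
alter '${table}',{NAME=>'${family}',ENCRYPTION=>'AES'}
major_compact '${table}'
describe '${table}'
EOF
}

encrypt_table_commands table2 cf2

# To run the commands instead of printing them (assumes the shell's -n option):
# encrypt_table_commands table2 cf2 | hbase shell -n
```

A major compaction can be I/O intensive on a large table, so consider running it during the same maintenance window as the restart.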

Note: If you create a secondary index on the table (for example, with Apache Phoenix), then WAL encryption might not work. When this happens, you get a "java.lang.NullPointerException" response. To resolve this issue, set hbase.regionserver.wal.encryption to false in the hbase-site.xml file. Example:

  <property>
    <name>hbase.regionserver.wal.encryption</name>
    <value>false</value>
  </property>

Related information

Using the HBase shell

Transparent encryption in HDFS on Amazon EMR

AWS OFFICIAL: Updated 2 years ago