Why do I get the error "HIVE_BAD_DATA: Error parsing field value for field X for input string:" when I query JSON data in Amazon Athena?

7 minute read
0

When I query data in Amazon Athena, I get an error similar to one of the following: "HIVE_BAD_DATA: Error parsing field value for field X for input string:" or "HIVE_BAD_DATA: Error parsing column '0': target scale must be larger than source scale."

Short description

There are several versions of the HIVE_BAD_DATA error. The error message might specify a null or empty input string, such as "For input string: """. For this type of error message, see Why does my Amazon Athena query fail with the error "HIVE_BAD_DATA: Error parsing field value for field X: For input string: "12312845691""?

Errors that specify an input string with a value occur under one of the following conditions:

  • The data type that's defined in the table definition doesn't match the actual source data.
  • A single field contains different types of data, such as an integer value for one record and a decimal value for another record.

Resolution

It's a best practice to use only one data type in a column. Otherwise, the query might fail. To resolve errors, be sure that each column contains values of the same data type and that the values are in the allowed ranges.

If you still get errors, then change the column's data type to a compatible data type that has a higher range. If the change to the data type doesn't resolve the issue, then try the solutions in the following examples.

Example 1

  • Source format: JSON
  • Issue: In the last record, the id key value is "0.54." This key value is the DECIMAL data type. For the other records, the id key value is set to INT.

Source data:

{ "id" : 50, "name":"John" }
{ "id" : 51, "name":"Jane" }
{ "id" : 53, "name":"Jill" }
{ "id" : 0.54, "name":"Jill" }

Data Definition Language (DDL) statement:

CREATE EXTERNAL TABLE jsontest_error_hive_bad_data (    id INT,
    name STRING
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES ( 'ignore.malformed.json' = 'true')
LOCATION 's3://awsexamplebucket/jsontest_error_hive_bad_data/';

Data Manipulation Language (DML) statement:

SELECT * FROM jsontest_error_hive_bad_data

Error:

HIVE_BAD_DATA: Error reading field value: For input string: "0.54"
This query ran against the 'default' database, unless qualified by the query. Please post the error message on our forum or contact customer support with Query ID: add6f434-3397-4c80-a55f-1672a0102e32

To resolve this issue, redefine the id column as STRING. The STRING data type can correctly represent all values in this dataset.

Modified DDL Statement:

CREATE EXTERNAL TABLE jsontest_error_hive_bad_data_correct_id_data_type (    id STRING,
    name STRING
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES ( 'ignore.malformed.json' = 'true')
LOCATION 's3://awsexamplebucket/jsontest_error_hive_bad_data/';

DML statement:

SELECT * FROM jsontest_error_hive_bad_data_correct_id_data_type

You can also cast to the desired data type. For example, you can cast a string as an integer. However, for some data types this might return null or inaccurate results that you cast from and to. Values that can't be cast are discarded. For example, if you cast the string value "0.54" to INT, then null results are returned:

SELECT TRY_CAST(id AS INTEGER) FROM jsontest_error_hive_bad_data_correct_id_data_type

Example output:

Results     _col0
1    50
2    51
3    53
4

The output shows that the value "0.54" discarded. You can't cast that value directly from a string to an integer. To resolve this issue, use the COALESCE function to cast the mixed type values in the same column as the output. Then, allow the aggregate function to run on the column. Example:

SELECT COALESCE(TRY_CAST(id AS INTEGER), TRY_CAST(id AS DECIMAL(10,2))) FROM jsontest_error_hive_bad_data_correct_id_data_type

Output:

Results     _col0
1    50.00
2    51.00
3    53.00
4    0.54

Run aggregate functions:

SELECT SUM(COALESCE(TRY_CAST(id AS INTEGER), TRY_CAST(id AS DECIMAL(10,2)))) FROM jsontest_error_hive_bad_data_correct_id_data_type

Output:

  _col01    154.54

Example 2

  • Source format: JSON
  • Issue: The id column is defined as INT. Athena couldn't parse "49612833315" because the range for INT values in Presto is -2147483648 to 2147483647.

Source data:

{ "id" : 50, "name":"John" }
{ "id" : 51, "name":"Jane" }
{ "id" : 53, "name":"Jill" }
{ "id" : 49612833315, "name":"Jill" }

DDL statement:

CREATE EXTERNAL TABLE jsontest_error_hive_bad_data_sample_2 (    id INT,
    name STRING
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES ( 'ignore.malformed.json' = 'true')
LOCATION 's3://awsexamplebucket/jsontest_error_hive_bad_data_2/';

DML statement:

SELECT * FROM jsontest_error_hive_bad_data_sample_2

Error:

HIVE_BAD_DATA: Error reading field value: For input string: "49612833315"
This query ran against the 'default' database, unless qualified by the query. Please post the error message on our forum or contact customer support with Query ID: 25be5ca1-0799-4c21-855d-b9171aadff47

To resolve this issue, define the id column as BIGINT, which can read the value "49612833315." For more information, see Integer types.

Modified DDL Statement:

CREATE EXTERNAL TABLE jsontest_error_hive_bad_data_sample_2_corrected (    id BIGINT,
    name STRING
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES ( 'ignore.malformed.json' = 'true')
LOCATION 's3://awsexamplebucket/jsontest_error_hive_bad_data_2/';

Example 3

  • Source format: JSON
  • Issue: The input data is DECIMAL, and the column is defined as DECIMAL in the table definition. However, the scale is defined as 2, and 2 doesn't match the "0.000054" value. For more information, see DECIMAL or NUMERIC type.

Source data:

{ "id" : 0.50, "name":"John" }
{ "id" : 0.51, "name":"Jane" }
{ "id" : 0.53, "name":"Jill" }
{ "id" : 0.000054, "name":"Jill" }

DDL statement:

CREATE EXTERNAL TABLE jsontest_error_hive_bad_data_sample_3(    id DECIMAL(10,2),
    name STRING
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES ( 'ignore.malformed.json' = 'true')
LOCATION 's3://awsexamplebucket/jsontest_error_hive_bad_data_3/';

DML statement:

SELECT * FROM jsontest_error_hive_bad_data_sample_3

Error:

HIVE_BAD_DATA: Error Parsing a column in the table: target scale must be larger than source scale
This query ran against the 'default' database, unless qualified by the query. Please post the error message on our forum or contact customer support with Query ID: 97789f62-d2a5-4749-a506-26480a91a1db

To resolve this issue, redefine the column with a scale that captures all input values. For example, instead of (10,2), use (10,7).

Modified DDL Statement:

CREATE EXTERNAL TABLE jsontest_error_hive_bad_data_sample_3_corrected(    id DECIMAL(10,7),
    name STRING
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES ( 'ignore.malformed.json' = 'true')
LOCATION 's3://awsexamplebucket/jsontest_error_hive_bad_data_3/';

Example 4

  • Source format: JSON
  • Issue: The input data is STRING, and the column is defined as INT in the table definition. However, Athena can't parse the STRING record as id key defined INT.

Source Data

{ "id" : 50, "name":"John" }
{ "id" : 51, "name":"Jane" }
{ "id" : 53, "name":"Jill" }
{ "id" : one, "name":"Jenny" }

DDL Statement:

CREATE EXTERNAL TABLE jsontest_error_hive_bad_data_sample_4(    
id INT,
name STRING
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES ( 'ignore.malformed.json' = 'true')
LOCATION 's3://awsexamplebucket/jsontest_error_hive_bad_data_4/';

DML Statement:

SELECT * FROM jsontest_error_hive_bad_data_sample_4

Error:

HIVE_BAD_DATA: Error reading field value: For input string: "one"
This query ran against the 'default' database, unless qualified by the query. Please post the error message on our forum or contact customer support with Query ID: 291b69f6-48f9-4381-af72-6786b3a90826

This error occurs because the actual source data has an incompatible data type definition.

To resolve this issue, redefine the id column as STRING. The STRING data type can correctly represent all values in this dataset. Example:

Modified DDL statement:

CREATE EXTERNAL TABLE jsontest_error_hive_bad_data_sample_4_corrected (    
id STRING,
name STRING
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES ( 'ignore.malformed.json' = 'true')
LOCATION 's3://awsexamplebucket/jsontest_error_hive_bad_data_4/';

Related information

JSON related errors

Troubleshooting in Athena

AWS OFFICIAL
AWS OFFICIALUpdated 2 months ago