How do I troubleshoot data loading errors when I use the COPY command in Amazon Redshift?

4 minute read
0

I tried to use the COPY command to load a flat file. However, I experience data loading issues or errors in Amazon Redshift.

Short description

Use the STL_LOAD_ERRORS table to identify data loading errors that occur during a flat file load. The STL_LOAD_ERRORS table can help you track the progress of a data load and record any failures or errors. After you troubleshoot the issue, use the COPY command to reload the data in the flat file.

Note: If you use the COPY command to load a flat file in Parquet format, then you can also use the SVL_S3LOG table to identify errors.

Resolution

Note: The following steps use an example dataset of cities and venues.

To use the STL_LOAD_ERRORS table to identify data loading errors, complete the following steps:

  1. Check the data in your sample flat file and confirm that the source data is valid:

    7|BMO Field|Toronto|ON|016|TD Garden|Boston|MA|0  
    23|The Palace of Auburn Hills|Auburn Hills|MI|0  
    28|American Airlines Arena|Miami|FL|0  
    37|Staples Center|Los Angeles|CA|0  
    42|FedExForum|Memphis|TN|0  
    52|PNC Arena|Raleigh|NC  ,25   |0  
    59|Scotiabank Saddledome|Calgary|AB|0  
    66|SAP Center|San Jose|CA|0  
    73|Heinz Field|Pittsburgh|PA|65050

    In the preceding example demo.txt file, a pipe character separates the five fields that are used. For more information, see Load LISTING from a pipe-delimited file (default delimiter).

  2. Open the Amazon Redshift console.

  3. Use the following data definition language (DDL) to create a sample table:

    CREATE TABLE VENUE1(VENUEID SMALLINT,  
    VENUENAME VARCHAR(100),  
    VENUECITY VARCHAR(30),  
    VENUESTATE CHAR(2),  
    VENUESEATS INTEGER  
    ) DISTSTYLE EVEN;
  4. To identify the cause of the data loading error, create a view to preview the relevant columns from the STL_LOAD_ERRORS table:

    create view loadview as(select distinct tbl, trim(name) as table_name, query, starttime,  
    trim(filename) as input, line_number, colname, err_code,  
    trim(err_reason) as reason  
    from stl_load_errors sl, stv_tbl_perm sp  
    where sl.tbl = sp.id);
  5. To load the data, run the COPY command:

    copy Demofrom 's3://your_S3_bucket/venue/'  
    iam_role 'arn:aws:iam::123456789012:role/redshiftcopyfroms3'  
    delimiter '|' ;

    Note: Replace your_S3_bucket with the name of your S3 bucket and arn:aws:iam::123456789012:role/redshiftcopyfroms3 with the ARN for your AWS Identity and Access Management (IAM) role. The IAM role must have permissions to access data from your S3 bucket. For more information, see Parameters.

  6. To display and review the error load details of the table, query the load view:

    testdb=# select * from loadview where table_name='venue1';tbl | 265190  
    table_name | venue1  
    query | 5790  
    starttime | 2017-07-03 11:54:22.864584  
    input | s3://  
    your_S3_bucket/venue/venue_pipe0000_part_00  
    line_number | 7  
    colname | venuestate  
    err_code | 1204  
    reason | Char length exceeds DDL length

    In the preceding example, the exception is caused by the length value and must be added to the venuestate column. The (NC ,25 |) value is longer than the length defined in the VENUESTATE CHAR(2) DDL.
    To resolve this issue, complete one of the following tasks:
    If the data is expected to exceed the defined length of the column, then update the table definition to modify the column length.
    -or-
    If the data isn't correctly formatted or transformed, then modify the data in file to use the correct value.
    The output from the query includes the following information:
    The file that causes the error
    The column that causes the error
    The line number in the input file
    The reason for the exception

  7. Modify the data in your load file to use the correct values:

    7|BMO Field|Toronto|ON|016|TD Garden|Boston|MA|0  
    23|The Palace of Auburn Hills|Auburn Hills|MI|0  
    28|American Airlines Arena|Miami|FL|0  
    37|Staples Center|Los Angeles|CA|0  
    42|FedExForum|Memphis|TN|0  
    52|PNC Arena|Raleigh|NC|0  
    59|Scotiabank Saddledome|Calgary|AB|0  
    66|SAP Center|San Jose|CA|0  
    73|Heinz Field|Pittsburgh|PA|65050

    Note: The length must align with the defined column length.

  8. Reload the data load:

    testdb=# copy Demofrom 's3://your_S3_bucket/sales/'  
    iam_role 'arn:aws:iam::123456789012:role/redshiftcopyfroms3' delimiter '|' ;  
    INFO:  Load into table 'venue1' completed, 808 record(s) loaded successfully.

    Note: The STL_LOAD_ERRORS table can hold only a limited number of logs and for approximately 4 to 5 days. Standard users can view only their own data when they query the STL_LOAD_ERRORS table. To view all the table data, you must be a superuser.

Related information

Amazon Redshift best practices for designing tables

Amazon Redshift best practices for loading data

System tables for troubleshooting data loads

Working with recommendations from Amazon Redshift Advisor

AWS OFFICIAL
AWS OFFICIALUpdated 4 months ago