How do I use Amazon Athena to analyze VPC flow logs?

4 minute read

I want to use Amazon Athena to analyze my Amazon Virtual Private Cloud (Amazon VPC) flow logs.

Short description

Use Amazon VPC Flow Logs to analyze network traffic patterns, and identify threats and risks across your Amazon VPC network.

Resolution

Use Athena to analyze Amazon VPC flow logs

To use Athena to analyze the Amazon VPC flow logs, complete the following steps:

Use the Amazon Athena console query editor, to run the following command to create a database. Replace test_db_vpclogs with a name for your database:
```
CREATE DATABASE test_db_vpclogs;
```
Important: It's a best practice to create the database in the same AWS Region as the Amazon Simple Storage Service (Amazon S3) bucket that the flow logs are stored in.

In the database, run the following command to create a table for the VPC flow logs. Replace test_table_vpclogs with the name of your table. Also, modify the LOCATION parameter to point to the Amazon S3 bucket that contains your log data:

CREATE EXTERNAL TABLE `test_table_vpclogs`(
  `version` int, 
  `account_id` string, 
  `interface_id` string, 
  `srcaddr` string, 
  `dstaddr` string, 
  `srcport` int, 
  `dstport` int, 
  `protocol` bigint, 
  `packets` bigint, 
  `bytes` bigint, 
  `start` bigint, 
  `end` bigint, 
  `action` string, 
  `log_status` string, 
  `vpc_id` string, 
  `subnet_id` string, 
  `instance_id` string, 
  `tcp_flags` int, 
  `type` string, 
  `pkt_srcaddr` string, 
  `pkt_dstaddr` string, 
  `az_id` string, 
  `sublocation_type` string, 
  `sublocation_id` string, 
  `pkt_src_aws_service` string, 
  `pkt_dst_aws_service` string, 
  `flow_direction` string, 
  `traffic_path` int)
PARTITIONED BY ( 
  `region` string, 
  `day` string)
ROW FORMAT DELIMITED 
  FIELDS TERMINATED BY ' ' 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.mapred.TextInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  's3://awsexamplebucket/awsexampleprefix/awsexamplelogs/1111222233334444/vpcflowlogs/'
TBLPROPERTIES (
  'projection.day.format'='yyyy/MM/dd', 
  'projection.day.range'='2021/01/01,NOW', 
  'projection.day.type'='date', 
  'projection.enabled'='true', 
  'projection.region.type'='enum', 
  'projection.region.values'='us-east-1,us-west-2,ap-south-1,eu-west-1', 
  'skip.header.line.count'='1', 
  'storage.location.template'='s3://awsexamplebucket/awsexampleprefix/awsexamplelogs/1111222233334444/vpcflowlogs/${region}/${day}'
  )

Note: The preceding command uses partition projection to create a table, partition the table, and automatically populate the partitions. If a projected partition doesn't exist in Amazon S3, then Athena still projects the partition. It's a best practice to use partitioned attributes in your queries.

Use the Amazon Athena console query editor to run SQL statements on the table. You can save the queries, view previous queries, or download query results in CSV format.

Example queries

Note: Replace test_table_vpclogs with the name of your table. Modify the column values and other variables to fit your query.

To view the first 100 flow log entries in chronological order for a specified period of time, run a query similar to the following:

SELECT * 
 FROM test_table_vpclogs 
 WHERE day >= '2021/02/01' AND day < '2021/02/28' 
 ORDER BY day ASC 
 LIMIT 100;

To view the server that receives the top ten number of HTTP packets for a specified period of time, run a query similar to the following:

SELECT SUM(packets) AS packetcount, 
       dstaddr
FROM test_table_vpclogs
WHERE dstport = 443
  AND day >= '2021/03/01'
  AND day < '2021/03/31'
GROUP BY dstaddr
ORDER BY packetcount DESC
LIMIT 10;

The preceding query counts the number of packets that are received on HTTPS port 443 and groups them by destination IP address. Then, it returns the top 10 entries from the previous week.

To check the logs that were created for a specified time range, run a query similar to the following:

SELECT interface_id,
       srcaddr,
       action,
       protocol,
       to_iso8601(from_unixtime(start)) AS start_time,
       to_iso8601(from_unixtime("end")) AS end_time
FROM test_table_vpclogs
WHERE DAY >= '2021/04/01'
  AND DAY < '2021/04/30';

To view flow logs for a specific source IP address between a specified time range, run a query similar to the following:

SELECT *
FROM test_table_vpclogs
WHERE srcaddr = '10.117.1.22'
  AND day >= '2021/02/01'
  AND day < '2021/02/28';

To list the rejected TCP connections between a specified time range, run a query similar to the following:

SELECT day,
       interface_id,
       srcaddr,
       action,
       protocol
FROM test_table_vpclogs
WHERE action = 'REJECT' 
    AND protocol = 6 
    AND day >= '2021/02/01' AND day < '2021/02/28'
LIMIT 10;

To view the flow logs for the IP address range that starts with '10.117', run a query similar to the following:

SELECT *
FROM test_table_vpclogs
WHERE split_part(srcaddr,'.', 1)='10'
  AND split_part(srcaddr,'.', 2) ='117'

To view the flow logs for a specific destination IP address between a time range, run a query similar to the following:

SELECT *
FROM test_table_vpclogs
WHERE dstaddr = '10.0.1.14'
AND day >= '2021/01/01'
AND day < '2021/01/31'

Related information

How do I monitor traffic in my VPC with flow logs?

Analyzing VPC Flow Logs using Amazon Athena, and Amazon QuickSight

Topics

Analytics

Relevant content

VPC flow logs difference with VPC traffic mirroring
Accepted Answer
JosepMaria
asked 10 months ago
VPC Flow Logs and DX
Accepted Answer
davidgmre
asked 8 years ago
AWS VPC Flow Logs - centralized
Orlando
asked 6 months ago
Do NW Load Balancers External IP appear in VPC Flow Logs? & VPC Flow Log timestamps?
rePost-User-3910006
asked a year ago
Is GuardDuty inspecting the AWS Firewall flow logs or VPC flow logs only ?
JFN
asked 15 days ago
How can I analyze custom VPC Flow Logs using CloudWatch Logs Insights?
AWS OFFICIALUpdated 2 years ago
How do I use Amazon VPC Reachability Analyzer to troubleshoot connectivity issues with an Amazon VPC resource?
AWS OFFICIALUpdated 7 months ago
How do I push VPC flow logs to Splunk using Amazon Kinesis Firehose?
AWS OFFICIALUpdated a year ago
How do I activate Amazon VPC Flow Logs?
AWS OFFICIALUpdated 9 months ago
How to Analyze your AWS IoT Logs with Standard SQL queries
EXPERT
Charlyscott237
published 6 months ago