How can I analyze custom VPC Flow Logs using CloudWatch Logs Insights?


I have configured custom VPC Flow Logs. How can I discover patterns and trends with Amazon CloudWatch Logs Insights?

Short description

You can use CloudWatch Logs Insights to analyze VPC Flow Logs. CloudWatch Logs Insights automatically discovers fields in many AWS-provided logs, as well as in JSON-formatted log events, which makes query construction and log exploration easy. VPC Flow Logs in the default format are discovered automatically by CloudWatch Logs Insights.

However, the VPC Flow Logs in this article use a custom format. Because their fields aren't discovered automatically, you must modify the queries. This article provides several example queries that you can customize and extend to match your use case.

This custom VPC Flow Logs format is used:

${account-id} ${vpc-id} ${subnet-id} ${interface-id} ${instance-id} ${srcaddr} ${srcport} ${dstaddr} ${dstport} ${protocol} ${packets} ${bytes} ${action} ${log-status} ${start} ${end} ${flow-direction} ${traffic-path} ${tcp-flags} ${pkt-srcaddr} ${pkt-src-aws-service} ${pkt-dstaddr} ${pkt-dst-aws-service} ${region} ${az-id} ${sublocation-type} ${sublocation-id}

Resolution

Retrieve latest VPC Flow Logs

Because CloudWatch Logs Insights doesn't automatically discover fields in custom-format logs, you must use the parse keyword to isolate the desired fields. In this query, the results are sorted by the flow log start time and restricted to the two most recent log entries.

Query

#Retrieve latest custom VPC Flow Logs
parse @message "* * * * * * * * * * * * * * * * * * * * * * * * * * *" as account_id, vpc_id, subnet_id, interface_id, instance_id, srcaddr, srcport, dstaddr, dstport, protocol, packets, bytes, action, log_status, start, end, flow_direction, traffic_path, tcp_flags, pkt_srcaddr, pkt_src_aws_service, pkt_dstaddr, pkt_dst_aws_service, region, az_id, sublocation_type, sublocation_id
| sort start desc
| limit 2

Results

account_id    vpc_id                 subnet_id                 interface_id           instance_id  srcaddr        srcport
123456789012  vpc-0b69ce8d04278ddd   subnet-002bdfe1767d0ddb0  eni-0435cbb62960f230e               172.31.0.104   55125
123456789012  vpc-0b69ce8d04278ddd1  subnet-002bdfe1767d0ddb0  eni-0435cbb62960f230e               91.240.118.81  49422
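Conceptually, the parse step splits each space-delimited @message into positional fields: each "*" in the pattern maps to one whitespace-separated token. A minimal Python sketch of the same idea (the sample log line and its values are illustrative, not taken from a real account):

```python
# Field names mirror the custom VPC Flow Logs format defined above.
FIELDS = [
    "account_id", "vpc_id", "subnet_id", "interface_id", "instance_id",
    "srcaddr", "srcport", "dstaddr", "dstport", "protocol", "packets",
    "bytes", "action", "log_status", "start", "end", "flow_direction",
    "traffic_path", "tcp_flags", "pkt_srcaddr", "pkt_src_aws_service",
    "pkt_dstaddr", "pkt_dst_aws_service", "region", "az_id",
    "sublocation_type", "sublocation_id",
]

def parse_flow_log(message: str) -> dict:
    """Split one custom-format flow log line into named fields,
    the way the Logs Insights parse pattern does."""
    return dict(zip(FIELDS, message.split()))

# Illustrative log line with 27 space-delimited values ("-" means no data):
line = ("123456789012 vpc-0b69ce8d04278ddd subnet-002bdfe1767d0ddb0 "
        "eni-0435cbb62960f230e - 172.31.0.104 55125 172.31.1.247 80 6 "
        "10 840 ACCEPT OK 1648826000 1648826060 ingress - 2 "
        "172.31.0.104 - 172.31.1.247 - us-east-1 use1-az1 - -")
record = parse_flow_log(line)
print(record["srcaddr"], record["srcport"])  # → 172.31.0.104 55125
```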

Summarize data transfers by source/destination IP address pairs

Next, summarize the network traffic by source/destination IP address pair. In this example, the sum statistic aggregates the bytes field to calculate a cumulative total of the data transferred between hosts. The flow_direction field is included for additional context. The result of the aggregation is assigned to the Data_Transferred field. Then, the results are sorted by Data_Transferred in descending order, and the two largest pairs are returned.

Query

parse @message "* * * * * * * * * * * * * * * * * * * * * * * * * * *" as account_id, vpc_id, subnet_id, interface_id, instance_id, srcaddr, srcport, dstaddr, dstport, protocol, packets, bytes, action, log_status, start, end, flow_direction, traffic_path, tcp_flags, pkt_srcaddr, pkt_src_aws_service, pkt_dstaddr, pkt_dst_aws_service, region, az_id, sublocation_type, sublocation_id
| stats sum(bytes) as Data_Transferred by srcaddr, dstaddr, flow_direction
| sort by Data_Transferred desc
| limit 2

Results

srcaddr       dstaddr        flow_direction  Data_Transferred
172.31.1.247  3.230.172.154  egress          346952038
172.31.0.46   3.230.172.154  egress          343799447
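The stats step is a group-by aggregation: every record with the same (srcaddr, dstaddr, flow_direction) key contributes its bytes to one running total. A small Python sketch of the same aggregation, using illustrative sample records:

```python
from collections import defaultdict

# Illustrative flow log records (bytes values are made up).
records = [
    {"srcaddr": "172.31.1.247", "dstaddr": "3.230.172.154",
     "flow_direction": "egress", "bytes": 200},
    {"srcaddr": "172.31.1.247", "dstaddr": "3.230.172.154",
     "flow_direction": "egress", "bytes": 300},
    {"srcaddr": "172.31.0.46", "dstaddr": "3.230.172.154",
     "flow_direction": "egress", "bytes": 100},
]

# "stats sum(bytes) as Data_Transferred by srcaddr, dstaddr, flow_direction"
totals = defaultdict(int)
for r in records:
    key = (r["srcaddr"], r["dstaddr"], r["flow_direction"])
    totals[key] += r["bytes"]

# "sort by Data_Transferred desc | limit 2"
top = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:2]
print(top)
```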

Analyze data transfers by EC2 instance ID

With custom VPC Flow Logs, you can analyze traffic by Amazon Elastic Compute Cloud (Amazon EC2) instance ID directly. Building on the previous query, you can determine the most active EC2 instances by aggregating on the instance_id field.

Query

parse @message "* * * * * * * * * * * * * * * * * * * * * * * * * * *" as account_id, vpc_id, subnet_id, interface_id, instance_id, srcaddr, srcport, dstaddr, dstport, protocol, packets, bytes, action, log_status, start, end, flow_direction, traffic_path, tcp_flags, pkt_srcaddr, pkt_src_aws_service, pkt_dstaddr, pkt_dst_aws_service, region, az_id, sublocation_type, sublocation_id
| stats sum(bytes) as Data_Transferred by instance_id
| sort by Data_Transferred desc
| limit 5

Results

instance_id          Data_Transferred
-                    1443477306
i-03205758c9203c979  517558754
i-0ae33894105aa500c  324629414
i-01506ab9e9e90749d  198063232
i-0724007fef3cb06f3  54847643

Filter for rejected SSH traffic

To better understand traffic that's denied by your security groups and network access control lists (ACLs), filter on rejected VPC Flow Logs. You can narrow this filter further by protocol and destination port. To identify hosts with rejected SSH traffic, extend the filter to include the TCP protocol (protocol 6) and traffic with a destination port of 22.

Query

parse @message "* * * * * * * * * * * * * * * * * * * * * * * * * * *" as account_id, vpc_id, subnet_id, interface_id, instance_id, srcaddr, srcport, dstaddr, dstport, protocol, packets, bytes, action, log_status, start, end, flow_direction, traffic_path, tcp_flags, pkt_srcaddr, pkt_src_aws_service, pkt_dstaddr, pkt_dst_aws_service, region, az_id, sublocation_type, sublocation_id
| filter action = "REJECT" and protocol = 6 and dstport = 22
| stats sum(bytes) as SSH_Traffic_Volume by srcaddr
| sort by SSH_Traffic_Volume desc
| limit 2

Results

srcaddr        SSH_Traffic_Volume
23.95.222.129  160
179.43.167.74  80
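The filter step is a simple predicate applied to every parsed record. The same logic in Python, with illustrative sample records (note that parsed flow log fields are strings, so the comparisons here are string comparisons):

```python
# Illustrative records; only REJECT + TCP (protocol 6) + dstport 22 should match.
records = [
    {"srcaddr": "23.95.222.129", "dstport": "22", "protocol": "6",
     "action": "REJECT", "bytes": 160},
    {"srcaddr": "172.31.1.247", "dstport": "443", "protocol": "6",
     "action": "ACCEPT", "bytes": 5000},
    {"srcaddr": "179.43.167.74", "dstport": "22", "protocol": "6",
     "action": "REJECT", "bytes": 80},
]

# Equivalent of: filter action = "REJECT" and protocol = 6 and dstport = 22
rejected_ssh = [
    r for r in records
    if r["action"] == "REJECT" and r["protocol"] == "6" and r["dstport"] == "22"
]
print([r["srcaddr"] for r in rejected_ssh])  # → ['23.95.222.129', '179.43.167.74']
```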

Isolate HTTP data stream for a specific source/destination pair

To further investigate trends in your data using CloudWatch Logs Insights, isolate bidirectional traffic between two IP addresses. In this query, the list ["172.31.1.247","172.31.11.212"] matches flow logs where either IP address appears as the source or the destination. To isolate HTTP traffic, the filter statements match VPC Flow Log events with protocol 6 (TCP) and port 80. Use the display keyword to return only a subset of the available fields.

Query

#HTTP Data Stream for Specific Source/Destination Pair
parse @message "* * * * * * * * * * * * * * * * * * * * * * * * * * *" as account_id, vpc_id, subnet_id, interface_id, instance_id, srcaddr, srcport, dstaddr, dstport, protocol, packets, bytes, action, log_status, start, end, flow_direction, traffic_path, tcp_flags, pkt_srcaddr, pkt_src_aws_service, pkt_dstaddr, pkt_dst_aws_service, region, az_id, sublocation_type, sublocation_id
| filter srcaddr in ["172.31.1.247","172.31.11.212"] and dstaddr in ["172.31.1.247","172.31.11.212"] and protocol = 6 and (dstport = 80 or srcport = 80)
| display interface_id,srcaddr, srcport, dstaddr, dstport, protocol, bytes, action, log_status, start, end, flow_direction, tcp_flags
| sort by start desc
| limit 2

Results

interface_id           srcaddr        srcport  dstaddr        dstport  protocol  bytes    action  log_status
eni-0b74120275654905e  172.31.11.212  80       172.31.1.247   29376    6         5160876  ACCEPT  OK
eni-0b74120275654905e  172.31.1.247   29376    172.31.11.212  80       6         97380    ACCEPT  OK
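The two in filters together act as a direction-agnostic match: a record passes when both endpoints belong to the pair, whichever one is the source. A small Python sketch of that predicate (the flow record is illustrative):

```python
# Both endpoints must belong to this pair, in either direction.
PAIR = {"172.31.1.247", "172.31.11.212"}

def is_http_between_pair(r: dict) -> bool:
    """Mirror the query's filter: both addresses in the pair,
    TCP (protocol 6), and port 80 on either side."""
    return (r["srcaddr"] in PAIR and r["dstaddr"] in PAIR
            and r["protocol"] == "6"
            and (r["dstport"] == "80" or r["srcport"] == "80"))

# Illustrative record: the server side (port 80) is the source here.
flow = {"srcaddr": "172.31.11.212", "srcport": "80",
        "dstaddr": "172.31.1.247", "dstport": "29376", "protocol": "6"}
print(is_http_between_pair(flow))  # → True
```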

Visualize data transfers over time

You can use CloudWatch Logs Insights to visualize results as bar or pie charts. If the query includes the bin() function, then the results are returned with a timestamp. You can then visualize this time series with a line or stacked area graph.

Building on the previous query, you can use stats sum(bytes) as Data_Transferred by bin(1m) to calculate the cumulative data transferred over one-minute intervals. To view the visualization, toggle between the Logs and Visualization tabs in the CloudWatch Logs Insights console.

Query

parse @message "* * * * * * * * * * * * * * * * * * * * * * * * * * *" as account_id, vpc_id, subnet_id, interface_id, instance_id, srcaddr, srcport, dstaddr, dstport, protocol, packets, bytes, action, log_status, start, end, flow_direction, traffic_path, tcp_flags, pkt_srcaddr, pkt_src_aws_service, pkt_dstaddr, pkt_dst_aws_service, region, az_id, sublocation_type, sublocation_id
| filter srcaddr in ["172.31.1.247","172.31.11.212"] and dstaddr in ["172.31.1.247","172.31.11.212"] and protocol = 6 and (dstport = 80 or srcport = 80)
| stats sum(bytes) as Data_Transferred by bin(1m)

Results

bin(1m)                  Data_Transferred
2022-04-01 15:23:00.000  17225787
2022-04-01 15:21:00.000  17724499
2022-04-01 15:20:00.000  1125500
2022-04-01 15:19:00.000  101525
2022-04-01 15:18:00.000  81376
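Conceptually, bin(1m) truncates each record's timestamp to the start of its one-minute interval and sums the bytes within each bucket, producing the time series shown above. A Python sketch of the same bucketing, using illustrative epoch-second timestamps from the flow log start field:

```python
from collections import defaultdict

# Illustrative records; "start" is an epoch-seconds timestamp.
records = [
    {"start": 1648826581, "bytes": 500},  # 2022-04-01 15:23:01 UTC
    {"start": 1648826599, "bytes": 700},  # 2022-04-01 15:23:19 UTC
    {"start": 1648826461, "bytes": 300},  # 2022-04-01 15:21:01 UTC
]

# Equivalent of: stats sum(bytes) as Data_Transferred by bin(1m)
buckets = defaultdict(int)
for r in records:
    minute = r["start"] - (r["start"] % 60)  # truncate to the minute
    buckets[minute] += r["bytes"]

# Newest bucket first, mirroring the console's time-series output.
for minute, total in sorted(buckets.items(), reverse=True):
    print(minute, total)
```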

Related information

Supported logs and discovered fields

Analyzing log data with CloudWatch Logs Insights

CloudWatch Logs Insights query commands

Tutorial: Run a query that produces a time series visualization
