
Monitoring

This document describes application monitoring methods based on this catalog's configuration.

Overview

Application monitoring for the constructed system uses Amazon CloudWatch, AWS's standard monitoring service. Specifically, the standard metrics of API Gateway and AWS Lambda are added to a CloudWatch dashboard, and a CloudWatch alarm is configured for each metric to send notifications by email or to a messaging tool such as Slack.

The following metrics are added to the CloudWatch dashboard, and thresholds are set on them for the alarms described later.

AWS Lambda Standard Metrics

Metric name | Notes
--- | ---
Throttles | Number of throttled invocations; detects reaching the burst limit
Duration | Detects unusually long execution times, such as an unresponsive external system during external system integration
ConcurrentExecutions | Monitors the number of concurrent executions during operation; optimize the function or request a quota increase when the count approaches the limit

For metrics not listed above, add dashboard widgets and alarms as needed.
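The dashboard described in the overview can also be defined in code. The sketch below is a minimal, hypothetical helper that builds a CloudWatch dashboard body (the JSON string accepted by the `DashboardBody` parameter) with one metric widget per Lambda metric from the table. The statistic chosen for each metric, the widget layout, and the helper name are assumptions; API Gateway metrics are not included.

```python
import json

# Assumed statistic per Lambda metric from the table; adjust to your needs.
LAMBDA_METRICS = [
    ("Throttles", "Sum"),
    ("Duration", "Maximum"),
    ("ConcurrentExecutions", "Maximum"),
]


def build_dashboard_body(function_name: str, region: str, period: int = 60) -> str:
    """Hypothetical helper: returns a dashboard body with one widget per metric."""
    widgets = []
    for index, (metric_name, stat) in enumerate(LAMBDA_METRICS):
        widgets.append({
            "type": "metric",
            "x": (index % 2) * 12,   # two 12-unit-wide widgets per row
            "y": (index // 2) * 6,
            "width": 12,
            "height": 6,
            "properties": {
                "title": f"{function_name} {metric_name}",
                "view": "timeSeries",
                "region": region,
                "stat": stat,
                "period": period,
                # [namespace, metric name, dimension name, dimension value]
                "metrics": [["AWS/Lambda", metric_name, "FunctionName", function_name]],
            },
        })
    return json.dumps({"widgets": widgets})
```

With boto3, the result could be passed as `boto3.client("cloudwatch").put_dashboard(DashboardName="core-lambda", DashboardBody=build_dashboard_body("shopping-cart-web-lambda", "us-east-1"))`, where the dashboard name is a placeholder.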

[B] Application Log (Lambda) Monitoring

The application is assumed to output JSON structured logs like the following, using the Logger from the AWS Lambda Powertools library.

{
  "cold_start": true,
  "function_arn": "arn:aws:lambda:us-east-1:123456789012:function:shopping-cart-web-lambda",
  "function_memory_size": 128,
  "function_request_id": "c6af9ac6-7b61-11e6-9a41-93e812345678",
  "function_name": "shopping-cart-web-lambda",
  "level": "ERROR",
  "message": "This is an ERROR log with some context",
  "service": "shopping-cart-web-loader",
  "timestamp": "2023-12-12T21:21:08.921Z",
  "xray_trace_id": "abcdef123456abcdef123456abcdef123456"
}
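With Powertools itself, this output comes from `Logger(service="shopping-cart-web-loader")` and calls such as `logger.error(...)`; the Lambda context fields (`cold_start`, `function_name`, and so on) are added when the handler is decorated with `@logger.inject_lambda_context`. To show only the shape of the fields the metric filter relies on, here is a hypothetical standard-library stand-in (not Powertools) that writes one JSON object per log record:

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Hypothetical stand-in for the Powertools Logger: one JSON line per record."""

    def __init__(self, service: str):
        super().__init__()
        self.service = service

    def format(self, record: logging.LogRecord) -> str:
        ts = datetime.fromtimestamp(record.created, tz=timezone.utc)
        return json.dumps({
            "level": record.levelname,      # the field matched by { $.level = "ERROR" }
            "message": record.getMessage(),
            "service": self.service,
            # ISO 8601 UTC with milliseconds, e.g. 2023-12-12T21:21:08.921Z
            "timestamp": ts.isoformat(timespec="milliseconds").replace("+00:00", "Z"),
        })


def get_logger(service: str) -> logging.Logger:
    logger = logging.getLogger(service)
    logger.setLevel(logging.INFO)
    # StreamHandler writes to stderr, which Lambda also sends to CloudWatch Logs
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter(service))
    logger.handlers = [handler]  # replace handlers so repeated calls do not duplicate output
    logger.propagate = False
    return logger
```

Usage: `get_logger("shopping-cart-web-loader").error("This is an ERROR log with some context")`.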

These structured logs can be searched in the CloudWatch Logs console by specifying field values with filter patterns such as { $.level = "ERROR" }. The same patterns can be used in subscription filters, which match structured log fields in a CloudWatch Logs log group and forward matching events to other services (such as Kinesis, Lambda, or S3 via Kinesis Data Firehose). For monitoring, however, we use metric filters, which turn matching log events into CloudWatch metrics.
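To make the matching rule concrete, here is a minimal sketch, assuming only the simplest `{ $.<field> = "<value>" }` form; real CloudWatch filter patterns also support nested selectors, numeric comparisons, `&&`/`||`, and wildcards, which this hypothetical `matches` helper does not implement.

```python
import json
import re

# Only the `{ $.<field> = "<value>" }` form is supported by this sketch.
_PATTERN = re.compile(r'^\{\s*\$\.(\w+)\s*=\s*"([^"]*)"\s*\}$')


def matches(filter_pattern: str, log_line: str) -> bool:
    """Hypothetical helper: does one log event match a simple JSON filter pattern?"""
    m = _PATTERN.match(filter_pattern.strip())
    if m is None:
        raise ValueError(f"unsupported filter pattern: {filter_pattern}")
    field, expected = m.groups()
    try:
        event = json.loads(log_line)
    except json.JSONDecodeError:
        return False  # a non-JSON event cannot match a JSON selector
    return isinstance(event, dict) and event.get(field) == expected
```

A metric filter effectively applies such a test to every event in the log group and emits its `MetricValue` for each match.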

Note: For frontend applications whose frameworks output unstructured logs that are difficult to customize, you can instead use filter patterns that perform plain string (term) matching. For details, refer to the official CloudWatch Logs documentation on filter pattern syntax.
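For unstructured logs, the idea can be approximated as follows. This is a simplified, hypothetical sketch that treats each term as a case-sensitive substring check: plain terms must all appear, `?`-prefixed terms are alternatives, and `-`-prefixed terms must not appear. Quoted terms and other details of the real syntax are not handled; the official documentation is authoritative.

```python
def matches_terms(filter_pattern: str, log_line: str) -> bool:
    """Hypothetical approximation of term matching for unstructured log events."""
    required, optional, excluded = [], [], []
    for term in filter_pattern.split():
        if term.startswith("?"):
            optional.append(term[1:])   # at least one alternative must appear
        elif term.startswith("-"):
            excluded.append(term[1:])   # must not appear
        else:
            required.append(term)       # all plain terms must appear
    if any(t in log_line for t in excluded):
        return False
    if not all(t in log_line for t in required):
        return False
    return not optional or any(t in log_line for t in optional)
```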

Metric filters can be created from the CloudWatch Logs console. For configuration values, refer to the following sample written as a CloudFormation template.

# "Core" is an example of a Lambda function name or alias
CoreErrorLogMetricFilter:
  Type: AWS::Logs::MetricFilter
  DependsOn: CoreLogGroup
  Properties:
    LogGroupName: !Ref CoreLogGroup
    FilterPattern: '{ $.level = "ERROR" }'
    MetricTransformations:
      - MetricValue: 1
        MetricNamespace: !Join [ '', [ 'Logs/', !Ref CoreLambdaFunction ]]
        MetricName: Errors

CoreWarnLogMetricFilter:
  # This sample code is included in the catalog AMI.
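If the metric filter is created from a script instead of a template, the same values map onto the arguments of the boto3 CloudWatch Logs `put_metric_filter` call. The sketch below only builds those arguments; the filter name is a hypothetical naming convention, and it assumes the default Lambda log group `/aws/lambda/<function name>` (in the template, `!Ref CoreLambdaFunction` resolves to the function name).

```python
def metric_filter_kwargs(function_name: str, log_group_name: str) -> dict:
    """Hypothetical helper: put_metric_filter arguments mirroring CoreErrorLogMetricFilter."""
    return {
        "logGroupName": log_group_name,
        "filterName": f"{function_name}-error-log",   # hypothetical name
        "filterPattern": '{ $.level = "ERROR" }',
        "metricTransformations": [{
            "metricName": "Errors",
            "metricNamespace": f"Logs/{function_name}",
            "metricValue": "1",                        # each matching event adds 1
        }],
    }
```

Usage: `boto3.client("logs").put_metric_filter(**metric_filter_kwargs("shopping-cart-web-lambda", "/aws/lambda/shopping-cart-web-lambda"))`.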

Each time a log event with level ERROR is written, the CloudWatch Logs metric filter records it in a custom metric. By setting an alarm on that metric, you can build a monitoring mechanism that sends a notification when the number of error logs exceeds a threshold within a given time window, for example 10 minutes.
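The data the alarm sees can be sketched as follows: each ERROR event contributes 1, and the Sum statistic adds those values per period. This hypothetical helper uses the JSON `timestamp` field as a stand-in for the log event's own timestamp and buckets events into fixed 60-second periods.

```python
import json
from collections import Counter
from datetime import datetime


def error_counts_per_period(log_lines, period_seconds=60):
    """Hypothetical helper: {period_start_epoch_seconds: number_of_ERROR_events}."""
    counts = Counter()
    for line in log_lines:
        event = json.loads(line)
        if event.get("level") != "ERROR":
            continue
        # "Z" suffix rewritten because fromisoformat only accepts it from Python 3.11
        ts = datetime.fromisoformat(event["timestamp"].replace("Z", "+00:00"))
        epoch = int(ts.timestamp())
        counts[epoch - epoch % period_seconds] += 1  # start of the enclosing period
    return dict(counts)
```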

Configure Alarms to Automate Anomaly Detection

A CloudWatch alarm compares a metric with a threshold and normally stays in the OK state. If the metric value exceeds (or falls below) the threshold across the configured evaluation periods, the alarm changes to the ALARM state, and this state change triggers a notification through Amazon SNS (for example, by email) with a message describing the new state and how the data points compare with the threshold. Notifications can be sent not only for OK → ALARM transitions but also for the reverse (ALARM → OK).
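The evaluation rule can be modeled roughly as below. CloudWatch computes one data point per `Period` (using the configured statistic), looks at the most recent `EvaluationPeriods` data points, and goes to ALARM when at least `DatapointsToAlarm` of them breach; when `DatapointsToAlarm` is not set it defaults to `EvaluationPeriods`. This hypothetical sketch ignores missing-data handling (`TreatMissingData`) and simply reports INSUFFICIENT_DATA when too few points exist.

```python
from typing import Optional, Sequence

# Subset of CloudWatch comparison operators.
_COMPARATORS = {
    "GreaterThanThreshold": lambda v, t: v > t,
    "GreaterThanOrEqualToThreshold": lambda v, t: v >= t,
    "LessThanThreshold": lambda v, t: v < t,
    "LessThanOrEqualToThreshold": lambda v, t: v <= t,
}


def evaluate_alarm(datapoints: Sequence[float], threshold: float, evaluation_periods: int,
                   comparison: str = "GreaterThanThreshold",
                   datapoints_to_alarm: Optional[int] = None) -> str:
    """Simplified model: one datapoint per Period, newest last."""
    if len(datapoints) < evaluation_periods:
        return "INSUFFICIENT_DATA"
    required = datapoints_to_alarm or evaluation_periods  # default: all N must breach
    recent = datapoints[-evaluation_periods:]
    breaching = sum(_COMPARATORS[comparison](v, threshold) for v in recent)
    return "ALARM" if breaching >= required else "OK"
```

For example, `evaluate_alarm([50, 0, 0, 0, 0], 10, 5)` returns "OK": a single burst of 50 in one period does not breach in all five periods.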

Alarms are configured for standard metrics as follows:

CoreLambdaThrottlesAlarm:
  Type: AWS::CloudWatch::Alarm
  Properties:
    AlarmName: prod-CoreLambdaThrottles-Alarm
    AlarmDescription: Throttled error of Decade API core Lambda function
    AlarmActions:
      - !Ref MonitoringTopic
    Namespace: AWS/Lambda
    Dimensions:
      - Name: FunctionName
        Value: !Ref CoreLambdaFunction
    EvaluationPeriods: 5 # evaluate the 5 most recent periods
    MetricName: Throttles
    Period: 60 # one data point per 60 seconds
    Statistic: Sum
    Threshold: 10 # ALARM when the per-minute Sum exceeds 10 in all 5 periods
    ComparisonOperator: GreaterThanThreshold

CoreLambdaDurationAlarm:
  # This sample code is included in the catalog AMI.

CoreLambdaConcurrentExecutionAlarm:
  # This sample code is included in the catalog AMI.
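The same throttles alarm can be expressed as arguments to the boto3 CloudWatch `put_metric_alarm` call, which is convenient when alarms are managed from scripts. This sketch only builds the arguments; the topic ARN is a placeholder (in the template, `!Ref MonitoringTopic` resolves to the SNS topic ARN).

```python
def throttles_alarm_kwargs(function_name: str, topic_arn: str) -> dict:
    """Hypothetical helper: put_metric_alarm arguments mirroring CoreLambdaThrottlesAlarm."""
    return {
        "AlarmName": "prod-CoreLambdaThrottles-Alarm",
        "AlarmDescription": "Throttled error of Decade API core Lambda function",
        "AlarmActions": [topic_arn],
        "Namespace": "AWS/Lambda",
        "MetricName": "Throttles",
        "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
        "Statistic": "Sum",
        "Period": 60,              # seconds per data point
        "EvaluationPeriods": 5,    # most recent data points evaluated
        "Threshold": 10.0,
        "ComparisonOperator": "GreaterThanThreshold",
    }
```

Usage: `boto3.client("cloudwatch").put_metric_alarm(**throttles_alarm_kwargs("shopping-cart-web-lambda", topic_arn))`.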

The same approach applies not only to standard metrics but also to custom metrics created from structured logs with the metric filters described earlier.

CoreLambdaErrorsAlarm:
  Type: AWS::CloudWatch::Alarm
  Properties:
    AlarmName: prod-webapp-CoreLambdaErrors-Alarm
    AlarmDescription: "Log level `ERROR` count in CoreLambdaFunction"
    AlarmActions:
      - !Ref MonitoringTopic # SNS Topic
    Namespace: !Join [ '', [ 'Logs/', !Ref CoreLambdaFunction ]]
    EvaluationPeriods: 5 # 5 consecutive periods (5 minutes in total)
    MetricName: Errors
    Period: 60 # 60 seconds
    Statistic: Sum
    Threshold: 10 # ALARM when more than 10 error logs occur in each of the 5 one-minute periods
    ComparisonOperator: GreaterThanThreshold
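With `Period: 60` and `EvaluationPeriods: 5`, CloudWatch compares each one-minute Sum with the threshold, so the alarm above fires only when every one of five consecutive minutes has more than 10 error logs. If the goal is instead "more than 10 error logs in total over 5 minutes", a single 300-second period (`Period: 300`, `EvaluationPeriods: 1`) is the usual choice. The hypothetical sketch below compares the two configurations on the same per-minute counts; it assumes the five minutes line up with one 300-second period and ignores missing data.

```python
def alarm_states(per_minute_counts, threshold=10):
    """Compare two configurations on the last 5 per-minute error counts:
      A: Period=60,  EvaluationPeriods=5 -> every minute must exceed the threshold
      B: Period=300, EvaluationPeriods=1 -> the 5-minute total must exceed it
    """
    last5 = per_minute_counts[-5:]
    complete = len(last5) == 5
    a = "ALARM" if complete and all(c > threshold for c in last5) else "OK"
    b = "ALARM" if complete and sum(last5) > threshold else "OK"
    return a, b
```

For steady low-level errors such as `[3, 3, 3, 3, 3]`, configuration A stays OK while B alarms, which is the main practical difference between the two.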