Client-side telemetry: Alarms

Setting up automated CloudWatch alarms that send errors to your inbox.

Graeme Zinck

Senior software engineer at LVL Wellbeing

This is the 5th article in a 5-part series:

Rolling your own client-side telemetry solution using AWS CDK

A step-by-step walkthrough on deploying a client-side telemetry stack using AWS CDK, Lambda, API Gateway, and CloudWatch.

  1. Client-side telemetry: Series overview
  2. Client-side telemetry: Setting up a new CDK project
  3. Client-side telemetry: Deploying a Typescript Lambda function with CDK
  4. Client-side telemetry: Lambda permissions and APIs in CDK
  5. Client-side telemetry: Alarms

So you have error logs stashed away and you're able to make pretty graphs using CloudWatch metrics. That's enough to help you dig through the logs for customer-reported bugs, but a good engineer will fix bugs before they reach customers' hands.

If you have a time turner or a TARDIS, you can just go back in time after a customer complains! For the rest of us, though, it's more practical to have a robust pipeline with multiple environments, solid testing, and alarms.

Today, we'll set up an alarm that sends an email every time something goes wrong.

Creating an SNS topic for notifications

The first thing we need to set up is a Simple Notification Service (SNS) topic that will email us when something goes wrong. Replace the email address in the following code with the one where you want to receive notifications.

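// lib/infra-fe-telemetry-stack.ts
//
// NOTE: the following subscription will send emails to
//       your_email@example.com. Replace it with your email!
   import * as sns from "aws-cdk-lib/aws-sns";
   import * as subs from "aws-cdk-lib/aws-sns-subscriptions";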

// export class InfraFETelemetryStack extends cdk.Stack {
//   constructor() {
//     ...
       const topic = new sns.Topic(this, "TelemetryTopic", {
         displayName: `${serviceName} ${environment} Notifications`,
       });
       const sub = new subs.EmailSubscription("your_email@example.com");
       topic.addSubscription(sub);
//   }
// }

Deploy it with npx cdk deploy and you should get an email from "fe-telemetry-service Notifications" (no-reply@sns.amazonaws.com) asking you to confirm your subscription to the new SNS topic. Click the confirmation link, or you won't receive alerts when alarms go off!

You can see your new SNS resources in the AWS console for your selected region.

https://[REGION-CODE].console.aws.amazon.com/sns/v3/home#/topics

Adding alarms to the stack

Now, we need to create alarms that notify our SNS topic (which then emails us) when our CloudWatch metrics show there are errors! We'll configure a separate alarm for each severity level. In theory, we could customize each alarm to trigger on different thresholds (e.g., a severity 1 alarm might trigger on 1 error/5 minutes, but a severity 2 might trigger on 10 errors/hour), but we're going to keep things simple here.

We're going to...

  • Create an alarm for severity levels 1–3 (4 and 5 are for information only)
  • Attach the alarm to the appropriate metric namespace, severity, and environment (defined in lambda/error-logger/src/clients/cloudwatchClient.ts)
  • Set the alarm threshold (1 error/5 minutes)
  • Add an alarm action to send an email to our SNS topic when it goes off

// lib/constructs/Alarms.ts
import { Construct } from "constructs";
import * as cwa from "aws-cdk-lib/aws-cloudwatch-actions";
import * as sns from "aws-cdk-lib/aws-sns";
import * as cw from "aws-cdk-lib/aws-cloudwatch";
import * as cdk from "aws-cdk-lib";

interface AlarmsProps {
  readonly serviceName: string;
  readonly environment: string;
  readonly snsTopic: sns.ITopic;
  readonly errorMetricNamespace: string;
}

export class Alarms extends Construct {
  public readonly alarms: cw.Alarm[];

  constructor(scope: Construct, id: string, props: AlarmsProps) {
    super(scope, id);

    // Create an alarm for each of severity levels 1 through 3
    // (a lower number means a more severe error)
    this.alarms = ["1", "2", "3"].map(
      (severity) =>
        new cw.Alarm(this, `Sev${severity}Alarm`, {
          metric: new cw.Metric({
            namespace: `${props.errorMetricNamespace}/Errors`,
            metricName: "client.error",
            dimensionsMap: {
              "error.severity": severity,
              environment: props.environment,
            },
            period: cdk.Duration.minutes(5),
            statistic: "Sum",
          }),
          threshold: 1,
          evaluationPeriods: 1,
          comparisonOperator:
            cw.ComparisonOperator.GREATER_THAN_OR_EQUAL_TO_THRESHOLD,
          actionsEnabled: true,
          alarmDescription: `At least one Sev${severity} error occurred in the last 5 minutes`,
          alarmName: `${props.serviceName}-${props.environment}-Sev${severity}Alarm`,
          treatMissingData: cw.TreatMissingData.NOT_BREACHING,
        }),
    );

    this.alarms.forEach((alarm) => {
      alarm.addAlarmAction(new cwa.SnsAction(props.snsTopic));

      // Add the line below to email when the alarm is resolved
      // alarm.addOkAction(new cwa.SnsAction(props.snsTopic));
    });
  }
}
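
The construct above uses the same threshold for every severity. As a rough sketch of the per-severity customization mentioned earlier, the settings could live in a small lookup table. The names SeverityAlarmSettings, ALARM_SETTINGS, and alarmSettingsFor are hypothetical and not part of the series' code. Only the severity 1 and 2 numbers come from the example thresholds above; the severity 3 values are placeholders.

```typescript
// Hypothetical per-severity alarm settings (not part of the series' code).
interface SeverityAlarmSettings {
  /** Number of errors within one period that trips the alarm. */
  readonly threshold: number;
  /** Length of the evaluation period, in minutes. */
  readonly periodMinutes: number;
}

const ALARM_SETTINGS: Record<string, SeverityAlarmSettings> = {
  "1": { threshold: 1, periodMinutes: 5 }, // 1 error / 5 minutes
  "2": { threshold: 10, periodMinutes: 60 }, // 10 errors / hour
  "3": { threshold: 50, periodMinutes: 60 }, // placeholder values, tune to taste
};

// Look up the settings for a severity, failing loudly on unknown levels.
function alarmSettingsFor(severity: string): SeverityAlarmSettings {
  const settings = ALARM_SETTINGS[severity];
  if (settings === undefined) {
    throw new Error(`No alarm settings defined for severity ${severity}`);
  }
  return settings;
}

// Inside the Alarms construct, the values could replace the hard-coded ones:
//   const { threshold, periodMinutes } = alarmSettingsFor(severity);
//   period: cdk.Duration.minutes(periodMinutes),
//   threshold,
```

Keeping the numbers in one table means tuning an alarm is a one-line change, and the alarm description can be built from the same values so it never drifts from the real threshold.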

Now we just need to add the construct to our stack...

// lib/infra-fe-telemetry-stack.ts
   import { Alarms } from './constructs/Alarms';
// ...
// export class InfraFETelemetryStack extends cdk.Stack {
//   constructor() {
//     ...
       new Alarms(this, 'Alarms', {
         serviceName,
         environment,
         snsTopic: topic,
         errorMetricNamespace,
       });
//   }
// }

And deploy it with npx cdk deploy.

Now, hit that endpoint!

curl -X POST 'https://fe-telemetry.{local|staging|production}.{your-domain}.com/error' \
  -H "Content-Type: application/json" \
  -d '{
    "severity": 2,
    "errorCode": "UNCAUGHT_ERROR",
    "device": "iPhone 15",
    "os": "iOS 17.2",
    "appVersion": "1.0.0",
    "error": "Some error message"
  }'

If you see an email within 10 minutes, you're all set! 🚀
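
Why does a single request trip the alarm? The alarm sums the metric over one 5-minute period and compares the sum against a threshold of 1. The snippet below is a deliberately simplified model of that comparison, not CloudWatch's actual evaluation logic; the function name evaluatePeriod is made up for illustration.

```typescript
// Simplified model of a single-period "Sum >= threshold" alarm.
// This is an illustration, not CloudWatch's actual evaluation logic.
type AlarmState = "ALARM" | "OK";

function evaluatePeriod(errorCounts: number[], threshold: number): AlarmState {
  // No datapoints in the period: treated as not breaching,
  // mirroring TreatMissingData.NOT_BREACHING in the construct.
  if (errorCounts.length === 0) {
    return "OK";
  }
  const sum = errorCounts.reduce((total, count) => total + count, 0);
  return sum >= threshold ? "ALARM" : "OK";
}

console.log(evaluatePeriod([1], 1)); // one error from the curl request: "ALARM"
console.log(evaluatePeriod([], 1)); // quiet period with no datapoints: "OK"
```

The second case is why the alarm stays quiet when no errors are logged at all: with no datapoints, the period counts as healthy instead of leaving the alarm in an "insufficient data" state.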

If you want to add more environments, it's as easy as adding a new stack in bin/infra-fe-telemetry.ts.

Configuring multiple environments

Here's the real magic of CDK: we can instantly deploy more environments with almost no effort!

Adding environments simply involves adding more stacks to the bin/infra-fe-telemetry.ts file.

// bin/infra-fe-telemetry.ts
// ...
// const localStack = new InfraFETelemetryStack(app, "FETelemetryLocal", {
//   ...
// });
// cdk.Tags.of(localStack).add("environment", "local");

   const stagingStack = new InfraFETelemetryStack(app, "FETelemetryStaging", {
     env: {
       account: process.env.CDK_DEFAULT_ACCOUNT,
       region: process.env.CDK_DEFAULT_REGION,
     },
     domainName: "my-domain-name.com",
     subdomain: "fe-telemetry.staging",
     environment: "staging",
     serviceName: "fe-telemetry-service",
   });
   cdk.Tags.of(stagingStack).add("environment", "staging");

   const productionStack = new InfraFETelemetryStack(app, "FETelemetryProduction", {
     env: {
       account: process.env.CDK_DEFAULT_ACCOUNT,
       region: process.env.CDK_DEFAULT_REGION,
     },
     domainName: "my-domain-name.com",
     subdomain: "fe-telemetry.production",
     environment: "production",
     serviceName: "fe-telemetry-service",
   });
   cdk.Tags.of(productionStack).add("environment", "production");

To deploy all the stacks at once, run npx cdk deploy --all. Easy!
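
The staging and production blocks differ only in the environment name, so the per-environment values could be derived from a single list. Here's a minimal sketch, assuming every environment follows the same naming pattern as the stacks above; the names Environment, EnvironmentStackConfig, and stackConfigFor are hypothetical.

```typescript
// Hypothetical helper: derive per-environment stack settings from one list.
type Environment = "local" | "staging" | "production";

interface EnvironmentStackConfig {
  readonly stackId: string;
  readonly domainName: string;
  readonly subdomain: string;
  readonly environment: Environment;
  readonly serviceName: string;
}

function stackConfigFor(environment: Environment): EnvironmentStackConfig {
  // "staging" -> "Staging", used to build ids like "FETelemetryStaging"
  const capitalized =
    environment.charAt(0).toUpperCase() + environment.slice(1);
  return {
    stackId: `FETelemetry${capitalized}`,
    domainName: "my-domain-name.com",
    subdomain: `fe-telemetry.${environment}`,
    environment,
    serviceName: "fe-telemetry-service",
  };
}

const environments: Environment[] = ["local", "staging", "production"];
const configs = environments.map(stackConfigFor);

// In bin/infra-fe-telemetry.ts, each config could then feed the stack
// (`env` is assumed to hold the account/region object shown above):
//   for (const { stackId, ...props } of configs) {
//     const stack = new InfraFETelemetryStack(app, stackId, { env, ...props });
//     cdk.Tags.of(stack).add("environment", props.environment);
//   }
```

Whether this is worth it depends on how much your environments diverge. With three near-identical stacks, spelling them out by hand is perfectly readable too.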

One last thing: in staging and production, we want to alarm on errors and send emails to our SNS topic. However, in our local env, we probably don't want to spam our inbox with alerts.

To avoid sending unnecessary emails, we can add a condition to the stack that skips creating the SNS topic and alarms in the local env.

// lib/infra-fe-telemetry-stack.ts
// ...
// export class InfraFETelemetryStack extends cdk.Stack {
//   constructor() {
//     ...
       if (environment !== 'local') {
//       const topic = new sns.Topic(this, "TelemetryTopic", {
//         ...
//       });
//       ...
//       new Alarms(this, 'Alarms', {
//         ...
//       });
       }
//   }
// }

After another npx cdk deploy, the alarms should be gone from the local env!

Wrapping up

In this series, we've built a fully scalable, serverless stack on AWS to log and alarm on errors from any client! We did it all with AWS CDK and TypeScript, leveraging reusable constructs and stacks to make it easy to add new environments down the road.

Next steps

An error logger isn't useful until you hook up your clients! If you're building in React Native, you can add an error boundary to your app and send fetch requests to your endpoint. Remember that error boundaries don't catch errors in event handlers and asynchronous code (e.g., setTimeout or requestAnimationFrame). You'll need to use try/catch statements in those cases.
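
As a starting point, here's a rough sketch of the client side. The ErrorReport shape mirrors the JSON body from the curl example above; everything else (reportError, withErrorReporting, FetchLike) is a hypothetical helper, not an API from React Native or from this series. The fetch implementation is passed in as an argument so the sketch doesn't depend on any particular platform.

```typescript
// Payload shape mirrors the curl example; everything else here is hypothetical.
interface ErrorReport {
  severity: number;
  errorCode: string;
  device: string;
  os: string;
  appVersion: string;
  error: string;
}

// Minimal subset of the fetch signature, so the sketch has no platform dependency.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<unknown>;

// POSTs a report to the telemetry endpoint and ignores delivery failures.
async function reportError(
  endpoint: string,
  report: ErrorReport,
  fetchImpl: FetchLike,
): Promise<void> {
  try {
    await fetchImpl(endpoint, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(report),
    });
  } catch (_error) {
    // Swallow network failures: telemetry should never crash the app.
  }
}

// Wraps an event handler so thrown errors are reported, then rethrown.
function withErrorReporting<Args extends unknown[]>(
  handler: (...args: Args) => void,
  onError: (error: unknown) => void,
): (...args: Args) => void {
  return (...args: Args) => {
    try {
      handler(...args);
    } catch (error) {
      onError(error);
      throw error;
    }
  };
}
```

In an app, you would pass the platform's own fetch as fetchImpl, call reportError from your error boundary, and wrap event handlers with withErrorReporting to cover the cases the boundary misses. Note that the wrapper only catches synchronous throws; promise rejections still need their own catch.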

With these same building blocks, we could also add a Lambda to record how long it takes for content to load and plug it into the same API Gateway with a new set of metrics and alarms. All you need to do is add the new resources, run npm run build && npx cdk deploy, and kick back while CDK does the rest!

Thank you!

If you found this series helpful, connect with me on LinkedIn. And if you're ever passing through Halifax, Canada, let's grab a coffee. I may be a nerd, but I also like talking to fellow human beings. Cheers!