Monitoring & observability services

Natasha Ong
4 min read

In a nutshell:

Because AWS resources scale up and down elastically, monitoring services keep watch over them to make sure your systems are running as expected.
AWS CloudTrail lets you know exactly who did what, when, and from where. It answers all of your AWS auditing questions except one: why a user performed an action.
Amazon CloudWatch can provide near real-time understanding of how your system is behaving, including alerting you to anything unusual that requires your attention. CloudWatch also gives you the ability to look at those metrics over time as you tune your system for maximum performance.
AWS X-Ray is a valuable tool for developers working on complex programs. It helps them trace errors in their code and pinpoint where issues might be occurring.

Monitoring in an AWS account is a bit like having your personal weather forecast for the cloud.

It's the habit of keeping a close eye on your digital resources, services, and apps, making sure they're all running smoothly and securely.

Just as you'd want to know if your picnic might get rained on, businesses monitor their AWS systems to predict, prevent, and fix issues:

  • You would want to know if an EC2 instance is constantly over-utilised. Maybe it's time to make a Lambda function that can trigger a scaling event to automatically launch EC2 instances.
  • You would want to know if an application starts sending error responses at a high rate, so you can ask an employee to investigate what's happening.

Let's learn AWS's monitoring tools! They will help you measure how your systems are performing, alert you when things aren't right, and even help you debug and troubleshoot issues as they come along.

AWS CloudTrail

Introducing AWS CloudTrail, the comprehensive auditing tool! You can think of CloudTrail as the trail of breadcrumbs that someone leaves behind when they take actions in AWS.

Remember that API calls are the way we give instructions to set up and control things in AWS?

With CloudTrail, you can view a complete history of user activity and API calls for your applications and resources. This means requests get logged in CloudTrail, like launching an EC2 instance, adding a row to a DynamoDB table (once data event logging is turned on for that table), and changing a user's permissions.

CloudTrail records the details of each API call too:

  • Who exactly made the request?
  • When did they send the API call?
  • Where were they?
  • What was their IP address?
  • What was the response? Was the request denied? Did something change? What is the new state?
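
To make those questions concrete, here is a minimal sketch of what one logged event can look like and how you might read the answers out of it. The field names (userIdentity, eventTime, awsRegion, sourceIPAddress, errorCode) follow the CloudTrail record format, but every value is made up for illustration, and summarise_event is a hypothetical helper rather than part of any AWS library.

```python
# Illustrative CloudTrail-style event record. All values are made up.
sample_event = {
    "eventTime": "2024-01-01T09:00:00Z",
    "eventSource": "ec2.amazonaws.com",
    "eventName": "RunInstances",
    "awsRegion": "us-east-1",
    "sourceIPAddress": "203.0.113.10",
    "userIdentity": {"type": "IAMUser", "userName": "example-admin"},
}


def summarise_event(event):
    """Answer who / what / when / where for one event record."""
    identity = event.get("userIdentity", {})
    return {
        "who": identity.get("userName", identity.get("type", "unknown")),
        "what": event.get("eventName"),
        "when": event.get("eventTime"),
        "region": event.get("awsRegion"),
        "ip": event.get("sourceIPAddress"),
        # Failed or denied calls carry an errorCode field.
        "denied_or_failed": "errorCode" in event,
    }


print(summarise_event(sample_event))
```

Notice that nothing in the record explains why the user made the request. That part is still up to you to find out.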

Key facts

  • Events are usually recorded in CloudTrail within 15 minutes after an API call.
  • You can filter events by specifying the time and date that an API call occurred, the user who requested the action, the type of resource that was involved in the API call, and more.
  • You can also save CloudTrail's logs in secure S3 buckets.

Example scenario

Suppose you are browsing through AWS Identity and Access Management (IAM) and find that a new IAM user named Mary was created.

Huh, you don't know who created this new user, when, or by which method.

To answer these questions, visit AWS CloudTrail. In the Event History section, you apply a filter so you're only seeing the events where new IAM users are created. You start scanning through the results...

A-ha, mystery solved! CloudTrail has a record of this event. On January 1, 2020 at 9:00 AM, IAM user John created a new IAM user (Mary) through the AWS Management Console.

CloudTrail event details: What happened, who made the request, when the request occurred, and how the request was made.
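
If you prefer code to the console, the same search can be sketched with boto3, the AWS SDK for Python. This is a rough sketch rather than the only way to do it: it assumes boto3 is installed and your AWS credentials are configured, and the function names build_create_user_lookup and find_new_iam_users are made up for this example. CreateUser is the name of the IAM API call that creates a user.

```python
from datetime import datetime, timezone


def build_create_user_lookup(start, end):
    """Build keyword arguments for CloudTrail's LookupEvents call."""
    return {
        # LookupEvents accepts a single lookup attribute per request.
        "LookupAttributes": [
            {"AttributeKey": "EventName", "AttributeValue": "CreateUser"}
        ],
        "StartTime": start,
        "EndTime": end,
    }


def find_new_iam_users(start, end):
    """Query CloudTrail event history. Needs boto3 and AWS credentials."""
    import boto3  # imported here so the helper above works without it

    client = boto3.client("cloudtrail")
    response = client.lookup_events(**build_create_user_lookup(start, end))
    # Each event carries the caller's user name, the API call, and the time.
    return [
        (event.get("Username"), event["EventName"], event["EventTime"])
        for event in response["Events"]
    ]
```

Calling find_new_iam_users(datetime(2020, 1, 1, tzinfo=timezone.utc), datetime(2020, 1, 2, tzinfo=timezone.utc)) would return one tuple per matching event in that window, assuming the events are still within the period that CloudTrail's event history retains.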

CloudTrail Insights

CloudTrail Insights is an optional feature that lets CloudTrail automatically detect unusual API activity in your AWS account.

For example, CloudTrail Insights might detect that a higher number of Amazon EC2 instances than usual have recently launched in your account. You can then look over the full event details to figure out which actions you need to take next.

Amazon CloudWatch

You need a way to monitor the health and operations of your solutions - how do you know whether things are running well?

Introducing Amazon CloudWatch!

  • CloudWatch allows you to monitor your AWS infrastructure and the applications you run on AWS in real time.
  • It works by tracking and monitoring metrics. Think of metrics as numbers that measure different parts of your resources. For example, the CPU utilisation of an EC2 instance, or the total number of requests made to an Amazon S3 bucket.
  • AWS services send metrics to CloudWatch. CloudWatch uses these metrics to create graphs that show how performance has changed over time.
  • You can also set alarms that trigger when values cross certain limits, and use a dashboard to see all your metrics at a glance. More on this below!
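
To show what "tracking a metric" can look like in practice, here is a small sketch that asks CloudWatch for the average CPU utilisation of one EC2 instance. It assumes boto3 is installed and credentials are configured. The helper names, the five-minute period, and the one-hour window are choices made for this example, not requirements.

```python
from datetime import datetime, timedelta, timezone


def build_cpu_query(instance_id, hours=1, now=None):
    """Build keyword arguments for CloudWatch's GetMetricStatistics call."""
    now = now or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "StartTime": now - timedelta(hours=hours),
        "EndTime": now,
        "Period": 300,  # one datapoint per five minutes
        "Statistics": ["Average"],
    }


def average_cpu(datapoints):
    """Average the 'Average' values of the returned datapoints."""
    values = [point["Average"] for point in datapoints]
    return sum(values) / len(values) if values else None


def fetch_average_cpu(instance_id):
    """Query CloudWatch. Needs boto3 and AWS credentials."""
    import boto3  # imported here so the helpers above work without it

    client = boto3.client("cloudwatch")
    response = client.get_metric_statistics(**build_cpu_query(instance_id))
    return average_cpu(response["Datapoints"])
```

fetch_average_cpu takes an instance ID such as the placeholder "i-0123456789abcdef0" and returns a single number, or None if CloudWatch has no datapoints for that window.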

CloudWatch alarms

With CloudWatch, you can set up alarms that do things automatically when your metric goes above or below a limit you choose.

  • Let's say your company's developers use Amazon EC2 instances for their work, but sometimes they forget to turn them off. The instances will keep running and costing money.
  • You can create a CloudWatch alarm to stop an Amazon EC2 instance when its CPU use stays below a certain level for a specific time. And you can even get a heads-up when the alarm goes off.
  • Even better, CloudWatch alarms are integrated with Amazon SNS (Simple Notification Service). So we can then send an SMS to the team's manager to say, "hey, some EC2 instances are still running."
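
The idle-instance scenario above could be sketched like this with boto3. Treat it as one possible setup under some assumptions: the 5% CPU threshold and 60-minute window are example values, the function names are made up, and the SNS topic is assumed to exist already with the manager's phone number subscribed to it.

```python
def build_idle_instance_alarm(instance_id, region, sns_topic_arn,
                              cpu_threshold=5.0, idle_minutes=60):
    """Build keyword arguments for CloudWatch's PutMetricAlarm call."""
    period = 300  # evaluate the metric in five-minute chunks
    return {
        "AlarmName": f"idle-ec2-{instance_id}",
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": period,
        # CPU must stay below the threshold for this many periods in a row.
        "EvaluationPeriods": (idle_minutes * 60) // period,
        "Threshold": cpu_threshold,
        "ComparisonOperator": "LessThanThreshold",
        "AlarmActions": [
            f"arn:aws:automate:{region}:ec2:stop",  # built-in stop action
            sns_topic_arn,  # notifies whoever is subscribed to the topic
        ],
    }


def create_idle_instance_alarm(instance_id, region, sns_topic_arn):
    """Create the alarm. Needs boto3 and AWS credentials."""
    import boto3  # imported here so the helper above works without it

    client = boto3.client("cloudwatch", region_name=region)
    client.put_metric_alarm(
        **build_idle_instance_alarm(instance_id, region, sns_topic_arn)
    )
```

With the defaults, the alarm fires after 12 consecutive five-minute periods of low CPU use, then stops the instance and publishes a message to the SNS topic.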

CloudWatch dashboard

CloudWatch dashboard showing metrics for Amazon RDS, Amazon EC2, and Amazon EBS

The CloudWatch dashboard is like your one-stop shop for checking out all the metrics in real-time.

  • Dashboards auto-refresh while they are open, so we can see an up-to-date view of our resources.
  • You can even customise separate dashboards for different business purposes, applications, or resources.
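
Dashboards can also be created from code. Below is a minimal sketch of a dashboard with a single widget that graphs EC2 CPU utilisation. It assumes boto3 and configured credentials. The function names, the widget size, and the title are illustrative choices.

```python
import json


def build_dashboard_body(instance_id, region):
    """Return the JSON body for a one-widget CloudWatch dashboard."""
    widget = {
        "type": "metric",
        "x": 0,
        "y": 0,
        "width": 12,
        "height": 6,
        "properties": {
            # Each metric is [namespace, metric name, dimension name, value].
            "metrics": [
                ["AWS/EC2", "CPUUtilization", "InstanceId", instance_id]
            ],
            "period": 300,
            "stat": "Average",
            "region": region,
            "title": "EC2 CPU utilisation",
        },
    }
    return json.dumps({"widgets": [widget]})


def publish_dashboard(name, instance_id, region):
    """Create or update the dashboard. Needs boto3 and AWS credentials."""
    import boto3  # imported here so the helper above works without it

    client = boto3.client("cloudwatch", region_name=region)
    client.put_dashboard(
        DashboardName=name,
        DashboardBody=build_dashboard_body(instance_id, region),
    )
```

Adding more widgets to the list is how you would build separate views for different applications or teams.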

AWS X-Ray

X-Ray is a really helpful tool for developers wanting to fix any errors in their code. When you're working on big and complex programs, especially those with a microservices architecture (i.e. a program broken into many smaller parts), it can be really difficult to trace where errors are coming from. Which part of your program could possibly have gone wrong?

With X-Ray, you can look at all of the tasks and requests that your application is handling and pinpoint exactly which part of your application is causing an issue or performing much slower than the rest.

X-Ray does this by tracing the end-to-end journey of requests as they travel through your application. It can even turn this data into a visual map to show how each part of your application is connected to the others, and where the errors are occurring in this map.
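
To get a feel for the idea, here is a toy sketch of the kind of question tracing data lets you answer. It does not call X-Ray at all. The segments are made-up, simplified records for one request, loosely modelled on fields that X-Ray segments carry (name, start_time, end_time, fault), and the service names are hypothetical.

```python
# Made-up, simplified segments for one request (times in seconds).
segments = [
    {"name": "frontend", "start_time": 0.00, "end_time": 0.05, "fault": False},
    {"name": "orders-service", "start_time": 0.05, "end_time": 0.15, "fault": False},
    {"name": "payments-service", "start_time": 0.15, "end_time": 1.05, "fault": True},
    {"name": "inventory-service", "start_time": 1.05, "end_time": 1.20, "fault": False},
]


def duration(segment):
    """Time spent in one part of the application."""
    return segment["end_time"] - segment["start_time"]


def slowest_service(trace):
    """Name of the segment that took the longest."""
    return max(trace, key=duration)["name"]


def faulty_services(trace):
    """Names of the segments that reported a fault."""
    return [segment["name"] for segment in trace if segment["fault"]]


print("Slowest:", slowest_service(segments))
print("Faulty:", faulty_services(segments))
```

In this made-up trace, payments-service is both the slowest part and the one reporting a fault, so that is where you would start digging.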

Even if there aren't any errors that need fixing, you can still use X-Ray to get insights into how you can improve your code and reduce downtime.