Automate, Schedule and Deploy AMI backups of EBS-Backed EC2 Instances using Lambda, CloudWatch and Terraform

Last week a friend asked for a solution to automatically take backups of their EBS-backed EC2 instances on AWS. The EC2 instances included a mix of Linux and Windows AMIs. This solution uses an AWS Lambda function written in Python that is scheduled with CloudWatch, and the whole solution is deployed to AWS using Terraform.

A couple of articles that create similar Lambda functions manually through the console served as a starting point for my solution.

Lambda Function

This solution (https://github.com/Irtaza/maintain_amis/blob/master/lambda_function.py) uses a single Lambda function that does the following:

  1. Look for all EC2 instances that have a tag with key Backup and value True
  2. For each instance that has the Backup tag, look for the tag with key Retention, whose integer value specifies the number of days the backup AMI should be retained for. If this tag doesn’t exist, a default value of 7 days is used.
  3. Create an AMI for each EC2 instance that has the Backup tag with a value of True. The AMI creation process will automatically create snapshots of each instance’s root volume and any other EBS volumes attached to that instance.
  4. Add a tag with key DeleteOn to each new AMI, using the value of the Retention tag to calculate the date on which the AMI should be deleted.
  5. Look for all AMIs that have a DeleteOn tag with a date value less than the execution date, and deregister them.
  6. Deregistering an AMI doesn’t automatically delete the snapshots of its EBS volumes, so the final step finds all snapshots linked to the expired AMIs and deletes them.
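
The actual function is available at the GitHub link above; the sketch below is a minimal boto3 illustration of the same flow rather than the original code. Details such as the NoReboot flag, the AMI naming scheme and the YYYY-MM-DD format of the DeleteOn tag are assumptions made for this example.

import boto3
from datetime import date, datetime, timedelta

ec2 = boto3.client('ec2')
DEFAULT_RETENTION_DAYS = 7


def lambda_handler(event, context):
    create_backup_amis()
    delete_expired_amis()


def create_backup_amis():
    # Find all instances tagged Backup=True
    reservations = ec2.describe_instances(
        Filters=[{'Name': 'tag:Backup', 'Values': ['True']}]
    )['Reservations']
    for reservation in reservations:
        for instance in reservation['Instances']:
            tags = {t['Key']: t['Value'] for t in instance.get('Tags', [])}
            retention = int(tags.get('Retention', DEFAULT_RETENTION_DAYS))
            delete_on = (date.today() + timedelta(days=retention)).strftime('%Y-%m-%d')

            # Creating the AMI also snapshots the root volume and any other
            # attached EBS volumes. NoReboot=True is an assumption here; the
            # original function may allow the instance to be rebooted.
            ami = ec2.create_image(
                InstanceId=instance['InstanceId'],
                Name='{0}-backup-{1}'.format(instance['InstanceId'], date.today().isoformat()),
                NoReboot=True,
            )
            ec2.create_tags(
                Resources=[ami['ImageId']],
                Tags=[{'Key': 'DeleteOn', 'Value': delete_on}],
            )


def delete_expired_amis():
    # Find AMIs owned by this account that carry a DeleteOn tag
    images = ec2.describe_images(
        Owners=['self'],
        Filters=[{'Name': 'tag-key', 'Values': ['DeleteOn']}],
    )['Images']
    for image in images:
        tags = {t['Key']: t['Value'] for t in image.get('Tags', [])}
        delete_on = datetime.strptime(tags['DeleteOn'], '%Y-%m-%d').date()
        if delete_on >= date.today():
            continue
        # Deregistering the AMI does not remove its snapshots, so collect
        # them first and delete them afterwards.
        snapshot_ids = [
            bdm['Ebs']['SnapshotId']
            for bdm in image.get('BlockDeviceMappings', [])
            if 'Ebs' in bdm and 'SnapshotId' in bdm['Ebs']
        ]
        ec2.deregister_image(ImageId=image['ImageId'])
        for snapshot_id in snapshot_ids:
            ec2.delete_snapshot(SnapshotId=snapshot_id)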

Terraform

The complete Terraform script for deploying the Lambda function is available on GitHub in the same repository. I will break down the Terraform script in the following sections.

1. Create the Lambda function

The first step is to create the Lambda function.

resource "aws_lambda_function" "maintain_amis" {
  function_name = "maintain-amis"

  # The bucket that contains the lambda source code
  s3_bucket = "irtaza-code-repo"
  s3_key    = "lambda/maintain_amis/v1.0.0/maintain_amis.zip"

  # "lambda_function" is the filename within the zip file (lambda_function.py)
  # and "lambda_handler" is the name of the method where the lambda starts
  handler = "lambda_function.lambda_handler"

  runtime = "python3.6"
  timeout = "600"

  role = "${aws_iam_role.lambda_exec.arn}"
}
2. Create an IAM execution role

The next step is to create an IAM role that the Lambda function will assume during execution.

# IAM role which dictates what other AWS services the Lambda function
# may access.
resource "aws_iam_role" "lambda_exec" {
  name = "maintain_amis_lambda_role"

  assume_role_policy = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": "sts:AssumeRole",
      "Principal": {
        "Service": "lambda.amazonaws.com"
      },
      "Effect": "Allow",
      "Sid": ""
    }
  ]
}
EOF
}
3. Create an AWS IAM policy

The Lambda function needs access to various AWS resources, and this is where an IAM policy comes in. The policy is defined in two steps. First, we create the policy resource:

# IAM policy that allows the maintain_amis_lambda_role to get required
# permissions
resource "aws_iam_policy" "policy" {
  name        = "maintain_amis_policy"
  description = "A policy for creating and deleting AMIs via Lambda"

  path   = "/"
  policy = "${data.aws_iam_policy_document.policy_document.json}"
}

Then we use a Terraform data source to construct a JSON representation of the IAM policy document, which is referenced in the last line of the resource above via the policy argument. The policy lists all the actions that the Lambda function needs to perform on various AWS resources.

# Generate json policy document for maintain_amis_policy
data "aws_iam_policy_document" "policy_document" {
  statement {
    sid = "1"

    actions = [
      "logs:CreateLogGroup",
      "logs:CreateLogStream",
      "logs:PutLogEvents",
      "ec2:CreateImage",
      "ec2:CreateTags",
      "ec2:DescribeSnapshots",
      "ec2:DeleteSnapshot",
      "ec2:DeregisterImage",
      "ec2:DescribeImages",
      "ec2:DescribeInstances",
    ]

    resources = [
      "*",
    ]
  }
}
4. Attach the policy to the role

Now we will attach the policy created in the last step to the IAM role we created in the second step.

# Attach the policy to the IAM role
resource "aws_iam_policy_attachment" "policy_attach" {
  name       = "policy_attachment"
  roles      = ["${aws_iam_role.lambda_exec.name}"]
  policy_arn = "${aws_iam_policy.policy.arn}"
}
5. Create and attach a CloudWatch rule

The last step is to add a CloudWatch Events rule to trigger the Lambda function once a week. Note that CloudWatch schedule expressions are evaluated in UTC.

# Creates a cloudwatch event rule
resource "aws_cloudwatch_event_rule" "every-saturday-three-am" {
  name                = "every-saturday-three-am"
  description         = "Fires every Saturday at 3 am"
  schedule_expression = "cron(0 3 ? * SAT *)"
}

# Links the lambda function to the cloudwatch rule
resource "aws_cloudwatch_event_target" "check_manintain_amis_staurday_three_am" {
  rule      = "${aws_cloudwatch_event_rule.every-saturday-three-am.name}"
  target_id = "maintain_amis"
  arn       = "${aws_lambda_function.maintain_amis.arn}"
}

# Grant permission to CloudWatch to invoke the Lambda function
resource "aws_lambda_permission" "allow_cloudwatch_to_call_check_maintain_amis" {
  statement_id  = "AllowExecutionFromCloudWatch"
  action        = "lambda:InvokeFunction"
  function_name = "${aws_lambda_function.maintain_amis.function_name}"
  principal     = "events.amazonaws.com"
  source_arn    = "${aws_cloudwatch_event_rule.every-saturday-three-am.arn}"
}

If you don’t know how to deploy this script using Terraform, you should follow the “Getting Started” guide on Terraform’s website.
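
If you already have Terraform installed and your AWS credentials configured, the usual workflow from the directory containing these files is roughly:

terraform init
terraform plan
terraform apply

Note that the zipped lambda_function.py must already exist at the S3 bucket and key referenced in the aws_lambda_function resource before running terraform apply.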
