Testing Infrastructure-as-Code Using Dynamic Tooling

Erik Steringer, NCC Group

Overview

TL;DR: Go check out https://github.com/ncc-erik-steringer/Aerides

As public cloud service consumption has grown, engineering and security professionals have responded with different tools and techniques to achieve security in the cloud. As a consultancy, we at NCC Group have published multiple tools that we use to guide testing and identify risks for our clients.

In recent years, cloud providers and other companies have also released infrastructure-as-code (IaC) solutions to manage infrastructure and set up environments. IaC allows cloud customers to write code that represents the infrastructure they want to deploy with a cloud provider. Rather than depending on a cloud engineer to create infrastructure piece by piece and without documentation, IaC lets engineers produce reusable code. This centralizes the process of building and deploying infrastructure and helps prevent entire classes of risk.

One key benefit of IaC is the ability to perform analysis of the code to identify risks in the infrastructure before the infrastructure is deployed. For example, if an engineer writes code that creates an Amazon S3 bucket with a bucket policy, it is possible to pull the contents of the bucket policy and then look for risks such as making the bucket world-readable.
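As a minimal sketch of the bucket policy check described above, the following Python function inspects a policy document and flags statements that let anyone read objects. The function name and the example policy are illustrative assumptions, and the matching is deliberately simplified: it ignores action wildcards such as `s3:Get*`, case-insensitive action names, and `NotPrincipal`/`NotAction`.

```python
import json

# Actions treated as "can read objects" in this simplified sketch.
READ_ACTIONS = {"s3:GetObject", "s3:*", "*"}


def _as_list(value):
    """Policy fields may hold a single value or a list; normalize to a list."""
    return value if isinstance(value, list) else [value]


def is_world_readable(policy_json: str) -> bool:
    """Return True if an unconditional Allow statement grants reads to everyone."""
    policy = json.loads(policy_json)
    for stmt in _as_list(policy.get("Statement", [])):
        if stmt.get("Effect") != "Allow":
            continue
        principal = stmt.get("Principal")
        # Both "*" and {"AWS": "*"} mean "any principal".
        anyone = principal == "*" or (
            isinstance(principal, dict)
            and "*" in _as_list(principal.get("AWS", []))
        )
        actions = set(_as_list(stmt.get("Action", [])))
        # A Condition may narrow access, so only flag unconditional statements.
        if anyone and actions & READ_ACTIONS and "Condition" not in stmt:
            return True
    return False


# Hypothetical policy, as it might appear in a Terraform template.
EXAMPLE_POLICY = json.dumps({
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": "*",
        "Action": "s3:GetObject",
        "Resource": "arn:aws:s3:::example-bucket/*",
    }],
})

print(is_world_readable(EXAMPLE_POLICY))
```

A production check would need to handle the cases this sketch skips, which is one reason purpose-built policy analysis tools exist.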

Static Tools

Much like in the Application Security space, we can classify Cloud Security tools as either static or dynamic. Static tools interpret and analyze code, rather than pulling data from a cloud provider. Static tools easily integrate with practices such as continuous integration and continuous deployment. Engineers can write security or policy-as-code checks that run each time someone commits changes. This can prevent high-severity misconfiguration issues from affecting deployed infrastructure.
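To illustrate what a static policy-as-code check might look like, here is a rough sketch that walks the JSON form of a Terraform plan (as produced by `terraform plan -out=plan.out` followed by `terraform show -json plan.out`) and reports S3 buckets planned with a public canned ACL. The function name and the example plan are assumptions made for illustration; the check only looks at the `acl` attribute and is not a complete public-access test.

```python
# Canned ACLs that grant access to everyone.
PUBLIC_ACLS = {"public-read", "public-read-write"}

# Resource types that carry an "acl" attribute in the AWS provider.
ACL_RESOURCE_TYPES = {"aws_s3_bucket", "aws_s3_bucket_acl"}


def find_public_buckets(plan: dict) -> list:
    """Return addresses of planned resources that set a public canned ACL."""
    findings = []
    for change in plan.get("resource_changes", []):
        # "after" is null for resources that are being destroyed.
        after = (change.get("change") or {}).get("after") or {}
        if change.get("type") in ACL_RESOURCE_TYPES and after.get("acl") in PUBLIC_ACLS:
            findings.append(change.get("address"))
    return findings


# Hypothetical excerpt of a plan in JSON form.
EXAMPLE_PLAN = {
    "resource_changes": [
        {
            "address": "aws_s3_bucket.assets",
            "type": "aws_s3_bucket",
            "change": {"actions": ["create"], "after": {"bucket": "assets", "acl": "public-read"}},
        },
        {
            "address": "aws_s3_bucket.logs",
            "type": "aws_s3_bucket",
            "change": {"actions": ["create"], "after": {"bucket": "logs", "acl": "private"}},
        },
    ]
}

print(find_public_buckets(EXAMPLE_PLAN))
```

In a CI job, a non-empty result would typically fail the build so the change cannot be merged.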

Figure 1 – Static Analysis Methodology

Dynamic Tools

Dynamic tools, such as ScoutSuite, interact with a cloud provider to pull data, then interpret and analyze that data to identify risks. This means they catch risks in the current state of the infrastructure, rather than in the intended design and desired state. Engineers can write security and policy-as-code checks that run periodically or after deployments. Dynamic tools, as they currently stand, cannot identify risks before deployment. However, they can find risks that static tools cannot, such as those created by resource configuration drift or by independent deployments of IaC to a common account/subscription/project of a cloud provider.
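For contrast with a static check, a dynamic check queries the provider's API and reasons about what is actually there. The sketch below is not how ScoutSuite is implemented; it is a small illustrative function (the name is an assumption) that accepts a boto3 IAM client, such as `boto3.client("iam")`, and lists users with no MFA device. Pagination is omitted for brevity, so it only covers the first page of users.

```python
def users_without_mfa(iam_client) -> list:
    """Return the names of IAM users that have no MFA device registered.

    iam_client is expected to behave like boto3.client("iam"). Only the
    first page of results is examined in this simplified sketch.
    """
    flagged = []
    for user in iam_client.list_users()["Users"]:
        name = user["UserName"]
        devices = iam_client.list_mfa_devices(UserName=name)["MFADevices"]
        if not devices:
            flagged.append(name)
    return flagged
```

Because the function only depends on the client's methods, it can be exercised against a real account, an emulated endpoint, or a stub object in unit tests.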

The Best of Both Worlds?

Beyond these general tradeoffs, the cloud security tools that identify risks deeper than individual misconfigurations (for example, graph-based analysis of IAM or network access controls, as with PMapper¹, CloudMapper², and Cartography³) are dynamic rather than static. This means we miss out on the value these tools bring before deployment unless we invest the time to build interpreters for the different ways that people can write infrastructure-as-code.

However, there is a project designed specifically for continuous integration and testing, called LocalStack⁴, that sets up a mock AWS API endpoint. By using LocalStack, it is possible to take dynamic tools and use them like static tools by pointing them at the emulated AWS API. It should also be possible to take test cases from normal dynamic testing and port them to IaC testing.
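For tools built on boto3, pointing at the emulated API mostly comes down to overriding the endpoint URL. The following is a minimal sketch under some assumptions: the helper names and the `LOCALSTACK_ENDPOINT` environment variable are hypothetical, the default endpoint is LocalStack's usual `http://localhost:4566`, and the credentials are placeholder values because the emulator does not need real ones.

```python
import os


def localstack_client_kwargs(endpoint_url=None) -> dict:
    """Build keyword arguments that direct a boto3 client at LocalStack."""
    return {
        # LOCALSTACK_ENDPOINT is a hypothetical variable used by this sketch only.
        "endpoint_url": endpoint_url
        or os.environ.get("LOCALSTACK_ENDPOINT", "http://localhost:4566"),
        "region_name": "us-east-1",
        # Placeholder credentials; never put real keys here.
        "aws_access_key_id": "test",
        "aws_secret_access_key": "test",
    }


def make_client(service: str, endpoint_url=None):
    """Create a boto3 client for the given service that talks to LocalStack."""
    import boto3  # imported here so the helper above works without boto3 installed

    return boto3.client(service, **localstack_client_kwargs(endpoint_url))
```

With LocalStack running, `make_client("iam").list_users()` would return the users created by the deployed IaC rather than anything in a real AWS account.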

Figure 2 – Dynamic Analysis Methodology

Demonstration

Today, we are releasing a project called Aerides. This project demonstrates how to integrate LocalStack and dynamic tools for assessing IaC. Aerides includes mock infrastructure for a web service that is written using Terraform’s HCL. It is hosted on GitHub and uses GitHub Actions to perform automatic tests for pull requests.

Clone the repository ( https://github.com/ncc-erik-steringer/Aerides ) onto your machine and install its dependencies. Navigate into the Aerides/infracode directory and run:

# this will take ~30s to spin up
localstack start -d

terraform init
terraform apply -var "acctid=000000000000"

This will launch LocalStack (daemon mode) and deploy the Terraform code. Now it is possible to run commands and see the mock infrastructure. For example:

# set fake access keys, set default region to us-east-1
aws configure --profile localstack
aws iam list-users \
    --profile localstack \
    --endpoint-url http://localhost:4566

Run PMapper against LocalStack like so:

pmapper --profile localstack graph create \
    --localstack-endpoint http://localhost:4566 \
    --exclude-services autoscaling

# should output 000000000000.svg if graphviz is installed
pmapper --account 000000000000 visualize

Figure 3 – PMapper Visualization of Infra-as-Code

In the repository, there are currently four pull requests that demonstrate different types of risks that can be detected before deployment. The GitHub Actions that run the tests are hosted in the same repository. The test cases are written with Python’s `unittest` framework and show how you can programmatically handle the data generated by these tools.
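To give a feel for what such a test case can look like, here is a self-contained sketch using `unittest`. It does not use PMapper's API or the repository's actual tests: the edge list, the principal names, and the helper function are hypothetical stand-ins for data that a graph-based tool would produce. The test asserts that no non-admin principal can reach an admin through a chain of access edges.

```python
import unittest
from collections import deque

# Hypothetical, simplified graph: principal -> principals it can access.
EDGES = {
    "user/developer": ["role/deployer"],
    "role/deployer": [],
    "user/admin": [],
}
ADMINS = {"user/admin"}


def can_reach_admin(start, edges, admins) -> bool:
    """Breadth-first search from start; True if any admin is reachable."""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        if node in admins:
            return True
        for neighbor in edges.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return False


class TestPrivilegeEscalation(unittest.TestCase):
    def test_no_non_admin_reaches_admin(self):
        for principal in EDGES:
            if principal in ADMINS:
                continue
            # subTest reports each offending principal separately.
            with self.subTest(principal=principal):
                self.assertFalse(can_reach_admin(principal, EDGES, ADMINS))
```

Saved to a file, this can be run with `python -m unittest`. Adding an edge from `role/deployer` to `user/admin` would make the test fail, which is the signal a pull request check would surface.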

How to Build Continuous Integration with LocalStack

Although the Aerides repository uses GitHub with GitHub Actions, it is possible to use the same technique with other CI solutions. To generalize the process:

1. Download the repository source
2. Install dependencies including Terraform and LocalStack, as well as any other dynamic tools to use for testing (note: with CI solutions that support it, it might be wise to create an image that has these dependencies installed and ready to go rather than install them every time you execute this process)
3. Initialize LocalStack and allow it to run in the background throughout the remainder of the process (`-d` parameter)
4. Initialize Terraform
5. Use Terraform to apply the IaC to the running instance of LocalStack
6. (Depending on the dynamic tools used) Initialize mitmproxy and allow it to run in the background throughout the remainder of the process
7. Run dynamic tools to gather data from LocalStack, using the proxy when necessary
8. Run test cases against the data gathered from the dynamic tools (see the testcode folder)

Figure 4 – The general CI processes for this technique
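The general process can be sketched as a small driver script for CI systems that do not have a native workflow format. This is an illustration rather than the workflow used in the repository. It assumes the checkout and dependency installation have already happened, skips the optional mitmproxy step, does not wait for LocalStack to become ready, and assumes that `infracode` and `testcode` are directories at the repository root. The `-auto-approve` flag is added so that Terraform does not prompt for confirmation.

```python
import subprocess

ENDPOINT = "http://localhost:4566"
INFRA_DIR = "infracode"  # assumed location of the Terraform code
TEST_DIR = "testcode"    # assumed location of the unittest test cases


def build_pipeline(account_id="000000000000", profile="localstack") -> list:
    """Return (working_directory, command) pairs in execution order."""
    return [
        (".", ["localstack", "start", "-d"]),
        (INFRA_DIR, ["terraform", "init"]),
        (INFRA_DIR, ["terraform", "apply", "-auto-approve",
                     "-var", f"acctid={account_id}"]),
        (".", ["pmapper", "--profile", profile, "graph", "create",
               "--localstack-endpoint", ENDPOINT,
               "--exclude-services", "autoscaling"]),
        (".", ["python", "-m", "unittest", "discover", "-s", TEST_DIR]),
    ]


def run_pipeline(steps) -> None:
    """Run each step in order; check=True stops at the first failure."""
    for cwd, command in steps:
        subprocess.run(command, cwd=cwd, check=True)
```

Calling `run_pipeline(build_pipeline())` from the repository root would execute the steps in order and raise `subprocess.CalledProcessError` as soon as one fails, which most CI systems treat as a failed job.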

Advantages and Disadvantages

The technique we demonstrate in Aerides does involve tradeoffs. It hinges entirely on LocalStack. Any delta between LocalStack’s API and the actual AWS APIs leads to unexpected behavior in dynamic tools. This ranges from errors and exceptions in the tools to false negatives and false positives in their reporting. When trying to use different dynamic tools, we ran into several instances of these issues. We can help mitigate this disadvantage by making contributions to LocalStack and its underlying dependencies (i.e. Moto⁶). As LocalStack improves, these gaps shrink and the signals from dynamic cloud security tools become clearer.

Additionally, although LocalStack covers a wide range of AWS services and can mock several of the resources they make available, not all services and resources are supported, and some are only available through LocalStack’s premium offering. This means there will be coverage gaps, and engineers trying to use LocalStack will need to adjust their templates to accommodate them.

However, the biggest benefit from utilizing LocalStack is speed. The actual process of standing up LocalStack, deploying the infrastructure-as-code, running dynamic tools, and running test cases altogether takes around thirty seconds for a small project with a few test cases. It can also be executed from an engineer’s development device rather than in CI processes. This is far faster than committing changes to a repository, then waiting for those changes to get picked up and deployed to a cloud provider (such as in a test/dev account/subscription/project), then executing the gathering/testing processes. This speedup scales with the number of resources (due to the number of HTTP requests made over the Internet versus via loopback). This is a nicer experience compared to pushing a change, then finding out a half-hour later that it created a major vulnerability in a live cloud environment.
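A back-of-the-envelope model shows why the time saved grows with the number of resources. The round-trip figures below are purely illustrative assumptions, not measurements, and the model assumes requests are made one after another.

```python
def gather_seconds(request_count: int, round_trip_ms: float) -> float:
    """Estimate total time for sequential API requests."""
    return request_count * round_trip_ms / 1000.0


# Illustrative round-trip times only; real values depend on the network.
INTERNET_MS = 50.0
LOOPBACK_MS = 1.0

for requests in (100, 1_000, 10_000):
    remote = gather_seconds(requests, INTERNET_MS)
    local = gather_seconds(requests, LOOPBACK_MS)
    print(f"{requests} requests: {remote:.1f}s remote, {local:.1f}s local, "
          f"{remote - local:.1f}s saved")
```

Under these assumptions the ratio between the two stays constant, but the absolute time saved grows linearly with the number of requests the dynamic tools have to make.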

References
