Technical Advisory: containerd – containerd-shim API Exposed to Host Network Containers (CVE-2020-15257)

30 November 2020

Vendor: containerd Project
Vendor URL: https://containerd.io/
Versions affected: 1.3.x, 1.2.x, 1.4.x, others likely
Systems Affected: Linux
Author: Jeff Dileo
CVE Identifier: CVE-2020-15257
Advisory URL: https://github.com/containerd/containerd/security/advisories/GHSA-36xw-fx78-c5r4
Risk: High (full root container escape for a common container configuration)

Summary

containerd is a container runtime underpinning Docker and common Kubernetes configurations. It handles abstractions related to containerization and provides APIs to manage container lifecycles. containerd-shim is a binary spawned by containerd that serves as the parent of a container and implements container lifecycle and reconnection logic, which it exposes to containerd through the containerd shim API. This API is exposed over an abstract namespace Unix domain socket that is accessible from the root network namespace. Due to this, containers that use host networking and are not user namespaced can access this API and cause containerd-shim to perform dangerous actions and spin up arbitrarily privileged containers, enabling container escapes and escalation to full root privileges on the host.

Location

containerd/containerd
- runtime/v1/shim/client/client.go: WithStart(), newCommand()
- cmd/containerd-shim/main_unix.go: serve()
- cmd/containerd-shim/shim_linux.go: newServer()
containerd/ttrpc (via vendor/github.com/containerd/ttrpc/unixcreds_linux.go)
- unixcreds_linux.go: UnixSocketRequireSameUser()

Impact

An attacker that is able to run or compromise a host network container running as UID 0 can escape the container, escalate privileges, and compromise the host.

Details

containerd is a core container runtime that manages runc-based containers and is used by Docker (from which it was spun out) and by Kubernetes, either through Docker or directly through the containerd CRI shim. Generally, containerd exists as a long-running service daemon that exposes gRPC APIs (e.g. those for containers and tasks) for container lifecycle management operations (e.g. container execution and supervision, image handling, etc.). To implement its APIs, containerd does not directly parent the containers that it creates and oversees on behalf of its clients. Instead, containerd spawns a containerd-shim process to manage the lifecycle of each container. containerd-shim stays alive for the lifetime of the container to manage it, and invokes the runc binary to spawn and run the container itself.

To serve its own APIs (e.g. v1 and v2) over gRPC (actually ttrpc, an embedded gRPC implementation and wire protocol), containerd-shim listens on an abstract Unix domain socket. Abstract Unix domain sockets are a Linux-specific feature: their addresses are keys that begin with a null byte, whose length is given by the address length passed to bind(2) or connect(2) rather than by a terminating null, and which may contain arbitrary binary sequences. These containerd-shim sockets take different forms across containerd versions; however, a common behavior is that they embed a trailing null byte in the abstract socket's sun_path key, which prevents a number of common Unix tools (e.g. socat) from connecting to them.

@/containerd-shim///shim.sock
@/containerd-shim/.sock

While containerd-shim is capable of binding and listening on such a socket itself when passed the --socket CLI flag, it also supports receiving an arbitrary socket file descriptor from its parent process. containerd uses this approach: it pre-creates, and listen(2)s on, the abstract Unix domain socket before the containerd-shim child process is created so that the child may be initialized with a handle to it. containerd-shim then starts its containerd shim API ttrpc server on the socket. As abstract Unix domain sockets are otherwise permissionless, containerd-shim uses standard Unix domain socket features (peer credentials obtained via SO_PEERCRED) to validate that incoming connections have the same UID and EUID (effective UID) as the containerd-shim process itself (typically UID:0 and EUID:0, root).

However, unlike normal Unix domain sockets, which are bound to file paths, abstract Unix domain sockets are tied to the network namespace of a process. As a result, containers that use host networking (e.g. docker run --network host alpine ...) are able to access these sockets. Furthermore, while most containerization platforms run their containers with a minimal set of Linux capabilities (the constituent privileges of root), they also do not run the containers in user namespaces, so the containers run as a root user with reduced privileges. Consequently, such containers run by default with a host user namespace UID and EUID of 0. This combination enables such containers to enumerate containerd-shim sockets (e.g. via netstat -xl or /proc/net/unix) and successfully connect to them.

containerd-shim exposes a number of dangerous APIs that can be used to escape a container and execute privileged commands. Across the two main versions of containerd(-shim) in use, 1.2.x and 1.3.x, the following exploit primitives are exposed to users, among others:

- Arbitrary file reads
- Arbitrary file appends
- Arbitrary file writes
- Arbitrary command execution in the context of containerd-shim (root)
- Creating a container from a runc config.json file
- Starting a created container

As a result, it is trivial for an attacker to compromise the host if they can reach the containerd shim API.

Technical Recommendation

Abstract namespace Unix domain sockets should not be used to communicate with containerd-shim. Instead, the connection should be performed over unnamed Unix domain sockets created with socketpair(2), or Unix domain sockets bound to a file path, like /run/containerd/containerd.sock and /run/containerd/containerd.sock.ttrpc. If this is not feasible, stricter access control checks would need to be performed to validate incoming shim API clients, and it may be necessary to modify the connection handshake to provide additional authentication data and/or identification. It should be noted that it is insufficient to check that the connecting process is not a child of containerd-shim itself as the process could still connect to the shim API of a different container’s containerd-shim.

User Recommendation

For users running container workloads on vulnerable systems, this issue may be mitigated by disallowing host networking for any containers that are not user namespaced, or by ensuring that such containers are run with a non-zero UID/GID.
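
The mitigation amounts to a simple admission-style rule. The sketch below is hypothetical: ContainerConfig and its fields are illustrative stand-ins, not a Docker, Kubernetes, or OCI runtime spec API, and the rule conservatively flags a container when either its UID or its GID is 0.

```go
package main

import "fmt"

// ContainerConfig is a hypothetical, simplified view of a container's settings.
type ContainerConfig struct {
	Name           string
	HostNetwork    bool // shares the host's network namespace
	UserNamespaced bool // runs in its own user namespace
	UID, GID       uint32
}

// exposedToShimAPI conservatively reports whether a container matches the
// risky configuration: host networking, no user namespace, and UID or GID 0.
func exposedToShimAPI(c ContainerConfig) bool {
	if !c.HostNetwork || c.UserNamespaced {
		return false
	}
	return c.UID == 0 || c.GID == 0
}

func main() {
	configs := []ContainerConfig{
		{Name: "host-net-root", HostNetwork: true},
		{Name: "host-net-nonroot", HostNetwork: true, UID: 1000, GID: 1000},
		{Name: "host-net-userns", HostNetwork: true, UserNamespaced: true},
		{Name: "bridge-root"},
	}
	for _, c := range configs {
		fmt.Printf("%-18s exposed=%v\n", c.Name, exposedToShimAPI(c))
	}
}
```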

Users should update to the newest versions of containerd that include patches for this issue. Additionally, as any running containers created prior to updating containerd to a fixed version will remain vulnerable after the update, users will need to ensure that all containers are fully stopped and then restarted after the update is completed.

For users who are uncertain about whether CVE-2020-15257 affects them, the following command can be used to quickly determine whether a container created by a vulnerable version of containerd is still running. If any results are returned, a vulnerable containerd-shim process is running.

$ cat /proc/net/unix | grep 'containerd-shim' | grep '@'

Vendor Communication

6/03/20 - NCC Group emailed the security email of the containerd project
          (security@containerd.io) asking for a means of secure
          communication to disclose vulnerability information
6/03/20 - NCC Group disclosed vulnerability to the containerd project along
          with exploit code targeting containerd 1.2.x and 1.3.x
6/04-05/20 - After some initial conversation over email about possible
             remediations, communication migrated to GitHub.
6/05/20 - NCC Group discussed the (in)feasibility of relying on
          AppArmor/SELinux to remediate this issue.
6/12/20 - NCC Group requests an update.
6/15/20 - Issue is not accepted as a security vulnerability in containerd.
          The containerd project indicates that while a fix will be applied, it
          will not be backported to in-use branches. A sample patch is shared
          with NCC Group.
6/15-16/20 - Further replies and conversation occurred about the aforementioned
             patch's implementation and its incompatibility with prior versions
             of containerd. NCC Group provided information on an alternate
             approach that could work for all versions.
6/19-24/20 - A containerd maintainer further develops a patch and requests
             and receives permission to open a public pull request. The
             implementation follows NCC Group's original recommendation and
             would be compatible across containerd versions.
7/10/20 - NCC Group requests an update and an estimate on when the fix will be
          merged and applied to older containerd branches.
7/13/20 - A containerd maintainer replies stating that the upcoming 1.4.0
          release will forgo having the fix applied, and that instead, it will
          be applied as a fix in 1.4.1 and to at least the 1.3.x branch.
9/04/20 - After a lack of updates, NCC Group states an intention to publish a
          technical advisory for this issue, and asks whether anyone can
          confirm that the fix has been applied or backported, as a comment on
          the open pull request indicated that it had been pushed back to the
          future 1.5.x release. NCC Group also asks for a timeline on when the
          issue will be fixed and states that, since the issue was not accepted
          as a vulnerability, they can wait up to 30 days (10/05/20), or until
          a fix is released, before publishing the advisory.
9/10/20 - A containerd maintainer replies stating that the issue is still not
          fixed and that the pull request is not likely to be merged soon. They
          ask for reconsideration of the backwards-incompatible fix.
9/10/20 - NCC Group replies with concerns about the approach of the
          backwards-incompatible fix, including a timing side channel in the
          implementation that would enable guessing the authentication secret,
          and a bias in the PRNG used to create it.
10/02/20 - A maintainer replies with a potential fix based on verifying that the
           PID of the connecting process is on the host mount namespace.
           Immediately after this, a containerd security advisor asks if NCC
           Group still plans to publish a technical advisory on 10/05/20 and if
           they would be open to having a conversation about the issue.
10/02/20 - NCC Group replies raising a concern over a possible race condition
           in the underlying mechanism of the potential fix. NCC Group also states
           that they can postpone publishing the advisory, and would be happy
           to converse about the issue if it would help to have it fixed. Over
           email, meeting availability is exchanged.
10/06/20 - NCC Group, a containerd security advisor and two containerd
           maintainers discuss the issue in a call and agree on a plan to
           remediate the issue as a vulnerability, with patches applied to
           supported branches of containerd.
10/06-11/04/20 - The containerd project works on implementing the fixes
                 across several supported protocol versions and backports the
                 patches to the 1.4.x and 1.3.x branches.
10/16/20 - CVE-2020-15257 is issued for this vulnerability.
11/10-13/20 - NCC Group reviews and tests the patches, and provides feedback
              on the changes; no major issues are identified. Subsequent
              discussion resolves questions raised in the feedback.
11/13/20 - A follow-up call occurs to discuss disclosure timelines, patch
           releases, and embargo dates.
11/13-30/20 - Patches are provided under embargo to vendors and Linux
              distributions.
11/19-25/20 - A containerd security maintainer backports the patches to the
              end-of-life containerd 1.2.x branch for Linux distributions using that
              version. After discussion and analysis, a backport based on
              similar patches provided by Canonical and Google is selected for
              merging into the 1.2.x branch.
11/30/20 - containerd publishes a security advisory for this issue,
           CVE-2020-15257.
11/30/20 - NCC Group publishes this security advisory following the containerd
           publication.

Thanks to

Michael Crosby, Samuel Karp, and Derek McGowan of the containerd project.

About NCC Group

NCC Group is a global expert in cyber security and risk mitigation, working with businesses to protect their brand, value and reputation against the ever-evolving threat landscape.

With our knowledge, experience and global footprint, we are best placed to help businesses identify, assess, mitigate and respond to the risks they face.

We are passionate about making the Internet safer and revolutionizing the way in which organizations think about cybersecurity.

Jeff Dileo