Exploiting Samba CVE-2015-0240 on Ubuntu 12.04 and Debian 7 32-bit

23 March 2015

tl;dr

It was found that Ubuntu 12.04 32-bit and Debian 7 Samba binaries contained a stack layout that was suitable for exploiting the recent _netr_ServerPasswordSet bug. I was able to develop a reliable exploit that grants pre-authentication remote root against both systems.

Introduction

On March 2, 2015 I posted a blog entry [1] discussing the exploitability of a recent uninitialized variable bug that was found in Samba, tracked as CVE-2015-0240. My post showed why the binaries I tested did not appear to be exploitable, but also showed how you would go about testing for exploitable cases in general.

A few other people were reporting on different binaries that were not exploitable. Most importantly, though, a researcher named Worawit Wang (@sleepya_) kept updating their findings on different binaries and eventually noted that Ubuntu 12.04 32-bit and Debian 7 appeared exploitable [2]. Kudos to @sleepya_ for keeping at it and finding these interesting results. I’ve been testing a lot of other binaries, but haven’t found anything fruitful beyond their results yet.

I couldn’t resist trying to exploit this bug, and any resulting capability would be useful during the penetration tests performed by our consultants. What follows is an overview of the approach I used to achieve reliable and relatively speedy exploitation.

This post assumes you’re familiar with the bug (see previous blog [1] if not), as well as how Return Oriented Programming (ROP) works, and things like stack pivoting.

Exploitable condition

Information about how this bug manifests on Ubuntu 12.04 32-bit was updated in @sleepya_’s public proof of concept exploit [3]. Specifically it says:

Ubuntu 12.04 x86 (samba 3.6.3): (confirmed code execution)

- 'creds' value is '_ptr_server_name' value in ndr_pull_netr_ServerPasswordSet() function

And in the exploit there is another comment:

primaryName = nrpc.PLOGONSRV_HANDLE()
# ReferentID field of PrimaryName controls the uninitialized value of creds in ubuntu 12.04 32bit
primaryName.fields['ReferentID'] = 0x41414141

This is really interesting. From previous analysis we know that the uninitialized variable creds will be passed to TALLOC_FREE(), meaning we effectively have an arbitrary free condition. Before talking about what to free, I’ll give a bit more background on what the code allowing this control looks like.

First let’s look at the ndr_pull_netr_ServerPasswordSet() function. In Samba the NDR pull routines are used to demarshal network data into locally understood structures. In this case the function is populating a netr_ServerPasswordSet structure.

The first variable in the function is the _ptr_server_name variable:

static enum ndr_err_code ndr_pull_netr_ServerPasswordSet(struct ndr_pull *ndr, int flags, struct netr_ServerPasswordSet *r)
{
     uint32_t _ptr_server_name;
     TALLOC_CTX *_mem_save_server_name_0;
     TALLOC_CTX *_mem_save_credential_0;
     TALLOC_CTX *_mem_save_return_authenticator_0;
     TALLOC_CTX *_mem_save_new_password_0;

     if (flags & NDR_IN) {
           ZERO_STRUCT(r->out);
           NDR_CHECK(ndr_pull_generic_ptr(ndr, &_ptr_server_name));
           if (_ptr_server_name) {
                NDR_PULL_ALLOC(ndr, r->in.server_name);
           } else {
                r->in.server_name = NULL;
           }
           /* ... */

We can see that the ndr_pull_generic_ptr() function is used to populate this variable we’re interested in. What does it do? As its comment says, it pulls out a pointer referent identifier. The ndr_pull_uint3264() function is used to populate v, which points to _ptr_server_name:

/*
  parse a pointer referent identifier
*/

_PUBLIC_ enum ndr_err_code ndr_pull_generic_ptr(struct ndr_pull *ndr, uint32_t *v)
{
     NDR_CHECK(ndr_pull_uint3264(ndr, NDR_SCALARS, v));
     if (*v != 0) {
           ndr->ptr_count++;
     }
     return NDR_ERR_SUCCESS;
}

A referent ID is just used to identify a pointer in a marshalled packet. Some more information about this is available at [4]. In this case what we’re most interested in is that we can control the referent ID by manipulating the PrimaryName field of our packet, as @sleepya_ showed.

Running @sleepya_’s proof of concept confirms that we crash on TALLOC_FREE(0x41414141).

The plan

We know we have an arbitrary free, and that ASLR and NX are in place. We also know that Samba is a forking daemon, which means it spawns a new server process for every connection, so crashing on a bad attempt is okay. So I’ll first note what we’ll need to solve in an exploit.

TALLOC_FREE() will expect a legitimate chunk in order to function, and the heap is randomized, so we’ll have to defeat heap ASLR to know the address of a controlled chunk and make TALLOC_FREE() do something interesting. If you’re familiar with talloc already, you know there is a destructor function pointer in-band inside the chunk header. So if TALLOC_FREE() operates on a chunk we control, it will eventually call a controlled destructor pointer, which we can use to get code execution.

Once we can achieve code execution at an arbitrary address we still need to defeat .text ASLR, since in order to defeat NX we’ll also likely have to use ROP, and that means knowing the address of the smbd binary or some other library.

We have to figure out two different independently randomized addresses, so to defeat ASLR we’re going to need to either find an information leak or bruteforce through the address space entropy. The most important bit about this is that bruteforcing two separately randomized addresses at once is really not ideal. The bruteforce complexity explodes too quickly to make the timeframes feasible. So we want to look for a way to bruteforce the addresses separately, or find a way to reveal memory.

What to free?

First we need to free something, ideally something we control. But how do we find it? Given that the bug only allows an arbitrary free, at this point (short of a secondary bug) there is no way to reveal memory, so we must resort to bruteforce.

But we just noted that we don’t want to have to bruteforce the heap and .text at the same time. So assuming we find a legitimate heap address, how can we differentiate between a crash due to a bad address and a crash due to a good address? If we end up pointing TALLOC_FREE() at a chunk we do control and it doesn’t contain legitimate values, Samba will still crash in a way that is indistinguishable from our other failed guesses. If we can instead populate the chunk with values that make TALLOC_FREE() behave differently, we can tell that we pointed to a controlled chunk.

To solve this problem we must understand a bit more about how talloc, and specifically TALLOC_FREE(), works under the hood.

Every allocation on the talloc heap results in a chunk that has a chunk header. This is defined in lib/talloc/talloc.c. The talloc_chunk structure describes the header:

struct talloc_chunk {
     struct talloc_chunk *next, *prev;
     struct talloc_chunk *parent, *child;
     struct talloc_reference_handle *refs;
     talloc_destructor_t destructor;
     const char *name;
     size_t size;
     unsigned flags;
     void *pool;
};

On 32-bit x86 the ten members take up 40 bytes, and the header is padded with an additional 8 bytes after the pool member to ensure 16-byte alignment, so it ends up being 48 bytes total. When you allocate memory, this header precedes the address the allocation routine gives back to you, so by subtracting 48 bytes from the pointer passed to TALLOC_FREE(), you can obtain a pointer to the talloc_chunk header.

The TALLOC_FREE() macro expands to _talloc_free(), which calls talloc_chunk_from_ptr() in order to obtain the talloc_chunk structure shown above:

_PUBLIC_ int _talloc_free(void *ptr, const char *location)
{
     struct talloc_chunk *tc;
     if (unlikely(ptr == NULL)) {
           return -1;
     }
     tc = talloc_chunk_from_ptr(ptr);

The talloc_chunk_from_ptr() function does a few things that concern us.

/* panic if we get a bad magic value */
static inline struct talloc_chunk *talloc_chunk_from_ptr(const void *ptr)
{
     const char *pp = (const char *)ptr;
     struct talloc_chunk *tc = discard_const_p(struct talloc_chunk, pp - TC_HDR_SIZE);
     if (unlikely((tc->flags & (TALLOC_FLAG_FREE | ~0xF)) != TALLOC_MAGIC)) {
           if ((tc->flags & (~0xFFF)) == TALLOC_MAGIC_BASE) {
                talloc_abort_magic(tc->flags & (~0xF));
                return NULL;
           }

           if (tc->flags & TALLOC_FLAG_FREE) {
                talloc_log("talloc: access after free error - first free may be at %s\n", tc->name);
                talloc_abort_access_after_free();
                return NULL;
           } else {
                talloc_abort_unknown_value();
                return NULL;
           }
     }
     return tc;
}

You can see above that tc is set to the pointer minus TC_HDR_SIZE, which is sizeof(struct talloc_chunk) rounded up to a multiple of 16 (48 bytes here). So assuming we point to the address right after a fake chunk header we provided, we have to be sure that we provide the right TALLOC_MAGIC in the flags field of our header. We also want to be sure that we don’t set the TALLOC_FLAG_FREE flag, to avoid having the allocator think there is a double free.

TALLOC_MAGIC is a macro that uses a constant and the talloc version information to generate a value. Samba uses talloc version 2.0.

#define TALLOC_MAGIC_BASE 0xe814ec70
#define TALLOC_MAGIC ( \
     TALLOC_MAGIC_BASE + \
     (TALLOC_VERSION_MAJOR << 12) + \
     (TALLOC_VERSION_MINOR << 4) \
)

#define TALLOC_VERSION_MAJOR 2
#define TALLOC_VERSION_MINOR 0

So if we provide the right magic, the chunk header will be passed back to _talloc_free() and we can continue. The code below shows that the first member checked is tc->refs, which indicates whether any other contexts reference this chunk.

     if (unlikely(tc->refs != NULL)) {
           struct talloc_reference_handle *h;
           if (talloc_parent(ptr) == null_context && tc->refs->next == NULL) {
                /* in this case we do know which parent should
                   get this pointer, as there is really only
                   one parent */
                return talloc_unlink(null_context, ptr);
           }

           talloc_log("ERROR: talloc_free with references at %s\n",
                   location);

           for (h=tc->refs; h; h=h->next) {
                talloc_log("\treference at %s\n",
                        h->location);
           }
           return -1;
     }

     return _talloc_free_internal(ptr, location);

The return -1 seems interesting, since it would mean there is no crash, so let’s walk through it. We control the full chunk header, so we can make tc->refs any value. Next it checks whether this chunk’s parent is null_context. We can avoid this condition by setting tc->parent to NULL, or to any other value, as we don’t know the address of the null_context global anyway.

Next an error is logged and then the for loop runs, which looks dangerous. Since tc->refs must be non-NULL to reach this code in the first place, the loop will definitely dereference h->next and h->location, which means tc->refs has to be a legitimate pointer. However, for each attempted heap address we can assume that the address is legitimate, so we can point tc->refs at the same address and populate h->next with NULL to terminate the loop and h->location with a pointer to a NUL byte (an empty string). This is a legitimate way to keep TALLOC_FREE() from crashing, but it is somewhat complicated due to the pointer dependencies, so I kept looking for something easier.

If we provide tc->refs of NULL instead, and skip the whole condition we just described, we will call into _talloc_free_internal(). First it re-does a bunch of stuff that was already done in the last function, which we can ignore. The first thing it does after re-checking tc->refs for NULL is this:

     if (unlikely(tc->flags & TALLOC_FLAG_LOOP)) {
           /* we have a free loop - stop looping */
           return 0;
     }

This looks like exactly what we wanted and doesn’t have any of the complications we originally saw. All we have to do is set the TALLOC_FLAG_LOOP bit in the tc->flags member of our fake chunk, and the function will return.

To be clear, this lets TALLOC_FREE() return without crashing when it points at a legitimate heap address that we control, which means Samba won’t crash and we’ll get an RPC packet in response to our _netr_ServerPasswordSet request. We can use this behavior as a boolean indicator to determine when we’ve provided a legitimate heap address that we control.

You might now be wondering about a reliability quirk here. What about the case where we supply a pointer that is a legitimate heap address, but rather than pointing at a chunk we control, it points to another legitimate chunk on the heap that is freed properly and so doesn’t cause a crash? We need a way to be sure that it’s actually our chunk we’re pointing to. To solve this problem we can use the destructor function pointer in the chunk, which we’re going to be using anyway to try to get code execution. To see how, let’s look a little further into _talloc_free_internal(). Immediately following the test for the TALLOC_FLAG_LOOP flag, we see:

     if (unlikely(tc->destructor)) {
           talloc_destructor_t d = tc->destructor;
           if (d == (talloc_destructor_t)-1) {
                return -1;
           }

           tc->destructor = (talloc_destructor_t)-1;
           if (d(ptr) == -1) {
                tc->destructor = d;
                return -1;
           }
           tc->destructor = NULL;
     }

So if tc->destructor is set, it will try to execute that address. How does this help? As long as we interact with the Samba server the same way each time we try to trigger some behavior, the heap layout will be deterministic, because smbd is fork()-based. So, if we find a heap address that we think might point to our chunk because TALLOC_FREE() didn’t crash, we use the exact same address on a fresh attempt, but unset the TALLOC_FLAG_LOOP flag and set the destructor value to an invalid address. If it’s our chunk at that address, the previously safe free now crashes, which confirms it.

Chunk spraying

We now know what we want to free in order for us to bruteforce a heap address without having to bruteforce a code address at the same time. But we still need to come up with a good way to find our chunk.

Samba allows you to send DCERPC packets of up to 15 MB, so if we’re able to send that much data, it’s possible to bruteforce the address space relatively quickly (depending on network speed). Smaller sizes can be used too. In the end, you want to spray at least a few adjacent pages with chunks you control, so that at least one entire page is full of your fake chunks.

Basically the idea is to have a whole bunch of adjacent fake chunks sprayed into memory, all identical and able to trigger the condition we’re looking for. To be sure that a guessed heap address lines up with one of our fake chunks, we have to test every guessed address at multiple offsets. So if our fake chunk is 48 bytes, we have to test 12 offsets (one per 4-byte-aligned position) for each address.

I’ve illustrated the sprayed chunks to show why you need to bruteforce all of the offsets to ensure success.

Honey, I shrunk the chunk! AKA chunk compression

This thing I’ve dubbed “chunk compression” is not an explicit requirement of exploitation, but I thought it would be interesting to talk about, since it can help reduce the heap bruteforce complexity a decent amount.

In the previous section I showed how you can construct a fake talloc chunk, spray it into pages, and then for every guessed heap address try every offset within the chunk to ensure the free points to a correct alignment that would trigger our oracle.

There is a lot of useless cruft (from our perspective) in the talloc chunk, so having to bruteforce every 4-byte offset in 48 bytes is annoying. I wanted to craft a smaller chunk that packs the important bits into as little space as possible, but that, when sprayed next to identical fake chunks, combines with them to form a legitimate chunk header of the expected size, with every member we need to control holding the correct value. If you’re familiar with the unlinkme chunk from Phrack [5] way back when, compressing our talloc chunks is in a similar vein.

I was able to reduce the chunk size to 20 bytes instead of 48. This means for every heap address I only have to test 5 offsets instead of 12, cutting the number of probes on that front by more than half. The trick is to identify the smallest block of data that, when repeated back to back, still provides offsets that trigger our desired boolean indicator when referenced.

I’ve illustrated how this works in the following image.

To better appreciate what the compressed chunks and associated offsets look like compared to a normal-sized talloc chunk, below I’ve illustrated a regular chunk overlaying the compressed chunks.

I’d be curious to see if someone came up with an even smaller approach.

Bruteforcing .text and getting code execution

Now that we know the address of our fake chunk, we can start bruteforcing the .text address until we can kick off our ROP chain. This can be bruteforced one 4 KB page at a time. Because my exploit supports both a vulnerable Ubuntu 12.04 and a vulnerable Debian 7 package, I try twice for every guessed address: once with one target’s .text offset, and then with the other’s. The caveat is that the more targets you have, the slower this portion of the bruteforce becomes. That said, unless you’re able to somehow profile the OS prior to exploitation, there is no way to know for sure which target you’re exploiting.

Once you’ve bruteforced the correct .text address and the destructor is dispatched, a few registers point into your chunk. However, because you sprayed your fake chunks into this space, after the stack pivot you don’t have a register pointing directly at a ROP chain payload.

To work around this I used a “ROP sled”, which just means selecting a gadget that adjusts the stack pointer to point to the same relative offset in the next adjacent fake chunk, which re-executes the gadget with the new stack pointer, and so on. We can place these gadget addresses in unused fields of the fake chunks without having to worry about unwanted side effects.

For those unfamiliar with this concept, I’ve illustrated it as well.

Eventually execution slides down the heap to the end of the sprayed chunks, where we can execute an actual ROP payload appended to the end of the packet. From there I just used a typical mmap() + memcpy() ROP chain that jumps to a stage 2 payload, which spawns a root shell over the existing socket.

Timing

The exploit has taken under 2 minutes in some runs and up to 20 minutes in others. The timing will vary based on where in your bruteforce range the target’s address space actually falls, network speed, etc. Having to wait 20 minutes isn’t really ideal, but given it is a pre-auth remote root, it’s usually worth the wait.

Mitigations

As you saw, the two primary mitigations in place, ASLR and NX, were defeated using bruteforce and ROP; pretty standard stuff.

You could use something like grsecurity [6] to slow down bruteforcing. By enabling PaX you’d also have the benefit of increased ASLR entropy which would make bruteforcing even slower.

Obviously the talloc allocator [7] could be hardened as well. Hardening measures like out of band metadata, a randomly generated chunk cookie instead of static magic, or destructor function encoding would’ve all made exploitation significantly more difficult.

Conclusion

This was a fun bug to exploit. I hope to find or hear about some more packages that exhibit the exploitable condition. If you run into one and want to share, please feel free to contact me on twitter @fidgetingbits or aaron <.> adams <@> nccgroup <.> com. Feedback and corrections are always welcome.

References

Published date: 23 March 2015

Written by: Aaron Adams
