
Cisco ASA series part four: dlmalloc-2.8.x, libdlmalloc, & dlmalloc on Cisco ASA

09 October 2017

By Matt Lewis

This article is part of a series of blog posts. We recommend that you start at the beginning. Alternatively, scroll to the bottom of this article to navigate through the whole series.

This article is meant to provide a summary of some key functionality for dlmalloc-2.8.x and introduce a debugging plugin called libdlmalloc [1] that is designed to aid analysis of dlmalloc-2.8.x heap structures and associated chunks. Analysis and development was primarily done using dlmalloc-2.8.3, as that was the version found on the specific Cisco ASA systems under which analysis was carried out.

Introduction

The Doug Lea allocator, better known as dlmalloc [2], is a fairly popular and well-researched heap. dlmalloc-2.7.x is the original heap from which ptmalloc2 [3] was forked, which is likely the most well-known version. ptmalloc2 (or more precisely, a modification of it) is widely used nowadays, as it served as the base for the GNU libc heap. dlmalloc-2.8.x used to be the default heap provided in libbionic [4] (Android’s libc), before the switch to jemalloc. Various versions of dlmalloc are found in many embedded devices. You can find a list of historical dlmalloc versions on the official server [5]. dlmalloc-2.8.x also used to be the default heap allocator on Cisco ASA firewalls, before the switch to the glibc one based on ptmalloc2.

To obtain a copy of dlmalloc-2.8.x for reference, see the listing [5]. To follow along exactly with what we describe, you can grab malloc-2.8.3.[ch] from [5], which is the version we specifically analysed. To see the changelog of the history of dlmalloc, and especially the changes introduced in the 2.8.x branch, see the History: section in the latest dlmalloc release [6].

The Doug Lea allocator

In this section we go into some of the more interesting details about dlmalloc-2.8.x, as well as the history of dlmalloc versions and their relationship to ptmalloc. If you are already familiar with dlmalloc-2.8.x feel free to skip on to the libdlmalloc tool section.

dlmalloc vs ptmalloc vs glibc

The ptmalloc allocator, which is part of glibc, was regularly forked from dlmalloc. In the case of ptmalloc2, glibc makes its own modifications to the allocator in its own fork. We did not investigate if this was also the case for ptmalloc, but it presumably was. A quick note about the table below, for those that may be unfamiliar: a bin is a linked list used to track free chunks. The following table demonstrates the relationship of versions:

dlmalloc         | ptmalloc  | glibc   | Types of bins
dlmalloc 2.5.x   | N/A       | N/A     | bins
dlmalloc 2.6.x   | ptmalloc  | forked? | smallbins/bins
dlmalloc 2.7.x   | ptmalloc2 | forked  | fastbins/smallbins/largebins
dlmalloc 2.8.x   | ptmalloc3 | N/A     | smallbins/treebins

Often when people refer to dlmalloc they don’t explicitly mention the version. However, these versions can make a notable difference from a functionality and exploitation perspective. For example, dlmalloc 2.6.x lacks fastbins, 2.7.x adds fastbins, and 2.8.x drops fastbins again and introduces a tree-based structure for tracking large free chunks. Fastbins, for instance, are the source of some exploit techniques described by Phantasmal Phantasmagoria [7] and others, which means those techniques don’t directly apply to any dlmalloc version outside of 2.7.x.

With that noted, it’s important to reiterate that in this article we are explicitly talking about dlmalloc-2.8.x and not ptmalloc2 (forked from dlmalloc-2.7.x). Nor are we talking about ptmalloc3 (forked from dlmalloc-2.8.x). Although they share many similarities, we did not analyse ptmalloc3. The tool will be unable to accurately analyse a ptmalloc3 heap without additional functionality being added.

These are critical distinctions because often when people casually refer to dlmalloc, they are referring to ptmalloc2 or sometimes even glibc’s custom modified version of ptmalloc2. Or they are talking about some other specific branch of dlmalloc without the per-thread enhancements, but don’t specify which version. Although these heaps all share the same roots, when you’re doing heap exploitation the minor differences often become of major significance, and thus we feel it is important to highlight them. As with the rest of this article, the libdlmalloc tool we discuss in this document is specifically designed to analyse dlmalloc-2.8.x.

Historical analysis of dlmalloc-2.8.x / ptmalloc3

Over the past couple of decades there have been quite a few fantastic heap articles that focus on some older versions of dlmalloc (2.7.x or earlier), ptmalloc, or the glibc allocator specifically. However, very few of the well known heap articles focus on dlmalloc-2.8.x or ptmalloc3 specifically. The following is an (incomplete) look at some papers or exploits that discuss dlmalloc-2.8.x or ptmalloc3 directly in some capacity.

blackngel’s Phrack 66 paper MALLOC DES-MALEFICARUM [8] briefly notes that the HOUSE OF SPIRIT can work on ptmalloc3.

blackngel’s Phrack 67 paper The House Of Lore: Reloaded [9] specifically talks about the details of ptmalloc3 and compares it to ptmalloc2. It discusses porting the House of Lore technique directly to ptmalloc3, with a wargames-style example. This appears to be the most comprehensive analysis of ptmalloc3 (and thus dlmalloc-2.8.x) from an exploitation perspective.

Ian Beer’s summary of Pinkie Pie’s Pwn2Own 2013 exploit [10] specifically mentions dlmalloc on Android, which was presumably dlmalloc 2.8.5 or 2.8.6 (versions [11] used by libbionic prior to the jemalloc move). The paper describes his (Pinkie’s) approach to target dlmalloc, as well as an overview of dlmalloc-2.8.x.

Exodus Intelligence’s CVE-2016-1287 exploit paper [12] targeted dlmalloc-2.8.x, although didn’t go into much detail about any nuances of 2.8.x itself.

That appears to be most of the good references. Other passing references to Android’s dlmalloc are made in numerous papers, but little to no actual implementation details about the algorithm are provided.

Note that it’s entirely possible we’re missing some important references, so if you are aware of any we would appreciate hearing about them and will update this section as necessary.

High level differences between dlmalloc-2.8.x and dlmalloc-2.7.x

Although The House Of Lore: Reloaded touched on a number of the changes between ptmalloc2 and ptmalloc3, and is well worth reading, we have decided to (re)summarise some of the notable differences between dlmalloc 2.7.x and 2.8.x in more detail.

We don’t go into exhaustive detail about all the differences in this article, but we do touch on some areas of specific interest with regards to what we were looking at. For now, the best place to find additional details about anything you’re interested in is the dlmalloc source code as most things are described in the source code comments.

mstates and arenas

In dlmalloc 2.7.x (and thus ptmalloc2) there is the concept of an arena, which is effectively a structure used to manage the range of memory used to service non-mmap-based allocations. In dlmalloc 2.7.x, the terminology leans towards mstate or malloc_state, and the structure isn’t typedef’ed to an arena-related name. Instead a single global malloc_state called av_ is used, looked up using the get_malloc_state() macro. Of note is that the malloc_state structure tracks a single region of memory that can be extended, and this region of memory is used to service the allocations. The dlmalloc-2.7.x comments refer to the memory managed by an mstate as an arena, and sometimes the well-known ‘main arena’.

In ptmalloc2 this becomes somewhat more complicated in that it allows threads to have their own associated arenas for servicing allocations, which can help reduce lock contention on the heap. In these scenarios the default arena for use is called the main_arena. ptmalloc2 uses the arena terminology much more, which a lot of past exploit-related research uses.

In dlmalloc-2.8.x the arena terminology is minimised even further and things are now described using the concepts of mstates and mspaces. In relation to mstates, the concept of a memory segment was introduced. Instead of an mstate only being able to track a single region of memory, it now tracks a list of malloc_segment structures (aka msegment), each of which can be allocated independently from different regions of memory.

Although this isn’t particularly important to a lot of exploit scenarios, it’s important to have the correct mental model and the tool we release specifically shows some information about the segments.

smallbin (2.8.x) vs fastbin (2.7.x)

The smallbins in dlmalloc-2.8.x are used to track free chunks in doubly linked lists and are most similar to what you’d traditionally imagine when thinking of largebins in something like ptmalloc2/dlmalloc 2.7.x.

The chunk structure for small chunks looks like this:

struct malloc_chunk {
  size_t               prev_foot;  /* Size of previous chunk (if free). */
  size_t               head;       /* Size and inuse bits. */
  struct malloc_chunk* fd;         /* double links -- used only if free. */
  struct malloc_chunk* bk;
};

The use of only smallbins and treebins in dlmalloc-2.8.x differs significantly from dlmalloc-2.7.x, as the latter used singly linked lists called fastbins to manage especially small chunks, smallbins to track small chunks, and largebins to track bigger chunks below the mmap threshold.

treebin (2.8.x) vs largebin (2.7.x)

Unlike its predecessors, dlmalloc-2.8.x moves away from a purely doubly linked list for tracking large chunks and towards a tree structure, a bitwise trie [17]. The head of each such tree is tracked in what is referred to as a treebin. This leads to significantly more metadata being stored for large free chunks, and in some situations can complicate corruption scenarios as there is more logic to work around. Conversely, the additional code paths can also introduce logic that aids exploitation.

These large chunks are now referred to as malloc_tree_chunk structures and look like the following:

struct malloc_tree_chunk {
  /* The first four fields must be compatible with malloc_chunk */
  size_t                    prev_foot;
  size_t                    head;
  struct malloc_tree_chunk* fd;
  struct malloc_tree_chunk* bk;

  struct malloc_tree_chunk* child[2];
  struct malloc_tree_chunk* parent;
  bindex_t                  index;
};

As mentioned above, these additional fields can both complicate and aid some exploitation scenarios. In the more traditional approach of abusing a coalescing scenario to achieve a mirror write (a controlled overwrite that occurs when a doubly linked list entry is unlinked), you can imagine that if you corrupted the fd and bk pointers you’d have to be careful what values you place into the additional entries.

Generally the easiest way to simplify exploitation in this scenario is to set parent to NULL, as this will prevent the additional fields from being parsed.

For example, when unlinking a large chunk, the unlink_large_chunk() macro is called. It is defined as follows:

#define unlink_large_chunk(M, X) {\
  tchunkptr XP = X->parent;\
  tchunkptr R;\
  if (X->bk != X) {\
    tchunkptr F = X->fd;\
    R = X->bk;\
    if (RTCHECK(ok_address(M, F))) {\
      F->bk = R;\
      R->fd = F;\
    }\
    else {\
      CORRUPTION_ERROR_ACTION(M);\
    }\
  }\
  else {\
    [...]\
  }

This will unlink the chunk from the doubly linked list, if the list is not empty. Afterwards, as shown below, it checks whether the parent (XP) is NULL. If it is not NULL you can see it does a number of additional actions, including manipulating the child nodes. However, if parent is NULL this whole portion of the unlink logic is skipped. If you just want to leverage this logic for a traditional mirror write, ensuring the parent is NULL means you’re safe.

  if (XP != 0) {\
    tbinptr* H = treebin_at(M, X->index);\
    if (X == *H) {\
      if ((*H = R) == 0)\
        clear_treemap(M, X->index);\
    }\
    else if (RTCHECK(ok_address(M, XP))) {\
      if (XP->child[0] == X)\
        XP->child[0] = R;\
      else\
        XP->child[1] = R;\
    }\
    else\
      CORRUPTION_ERROR_ACTION(M);\
    if (R != 0) {\
      if (RTCHECK(ok_address(M, R))) {\
        tchunkptr C0, C1;\
        R->parent = XP;\
        if ((C0 = X->child[0]) != 0) {\
          if (RTCHECK(ok_address(M, C0))) {\
            R->child[0] = C0;\
            C0->parent = R;\
          }\
          else\
            CORRUPTION_ERROR_ACTION(M);\
        }\
        if ((C1 = X->child[1]) != 0) {\
          if (RTCHECK(ok_address(M, C1))) {\
            R->child[1] = C1;\
            C1->parent = R;\
          }\
          else\
            CORRUPTION_ERROR_ACTION(M);\
        }\
      }\
      else\
        CORRUPTION_ERROR_ACTION(M);\
    }\
  }\
}

This is something we do in our CVE-2016-1287 IKEv1 exploit that we will detail in a future blog post.

mspaces

dlmalloc-2.8.x introduced the concept of an mspace, which is enabled when using the MSPACES constant. This seems, at least in part, to provide something analogous to per-thread arenas, but also serves other purposes. Ostensibly, an mspace is just an opaque structure that refers to a heap. In practice, an mspace is simply cast directly to an mstate, which is then used to manage that heap. The point of the MSPACES constant is to facilitate the creation and management of multiple discrete heaps, rather than using the more traditional single global mstate structure (called _gm_) to track a single heap.

A few compile-time constants of interest are introduced relating to mspaces. First, the MSPACES and ONLY_MSPACES constants. By defining MSPACES you enable the use of allocation wrappers called mspace_xxx(), such as mspace_malloc(), mspace_free(), etc. The point of this is to allow a developer to create a dedicated space, by using create_mspace(), which can then be passed to these functions. The ONLY_MSPACES constant is significant because if you don’t define it, the dlmalloc library will provide both the malloc, free, calloc, and realloc functions that become your default allocator and the mspace_xxx() functions that can be used for dedicated heaps. In some cases, developers only want the mspace versions, and ONLY_MSPACES allows that. This is noteworthy because you might run into dlmalloc-2.8.x on a system where it isn’t the default allocator, but is only used for some specific functionality with a dedicated mspace.

When you call create_mspace(), the allocator maps a memory segment of the requested capacity and inserts an mstate into it as the first chunk.

MSPACES brings with it some important functionality related to another constant called FOOTERS, which we’ll look at shortly.

2.8.x security mechanisms

dlmalloc-2.8.x has a dedicated section in its comments about security. The main points it raises are the FOOTERS, PROCEED_ON_ERROR, and INSECURE constants. We would also include the DEBUG and USE_DEV_RANDOM constants here, as they can provide some significant security enhancements.

INSECURE

In general, setting the INSECURE constant will disable almost all of the validation that the heap does. This includes disabling sanity checks of chunk header flags, and disabling sanity checks on addresses, found in chunks, that normally ensure the addresses fall within expected memory ranges associated with the heap. The constant also dictates whether or not any error conditions that are encountered will abort execution.

PROCEED_ON_ERROR

This constant simply dictates the assignment for the CORRUPTION_ERROR_ACTION and USAGE_ERROR_ACTION macros, which control what happens when an error is encountered. If PROCEED_ON_ERROR is set, a detected corrupted state will simply reset the state of the heap and reinitialise the entire mstate rather than failing. It also unsets USAGE_ERROR_ACTION so nothing occurs on error.

The default behavior when PROCEED_ON_ERROR is unset is to abort the program when errors are encountered.

FOOTERS, MSPACES magic

One interesting security mechanism of dlmalloc-2.8.x is the functionality that comes from the combination of the MSPACES and FOOTERS constants and how they relate to the malloc_params (aka mparams) magic member.

The FOOTERS constant tells dlmalloc, when creating an in-use chunk, to store a size_t value into the adjacent chunk header. This differs from other heap behavior that will often re-use the first member of the adjacent header as a spillover area to save space and simplify alignment. The dlmalloc comment describes the FOOTERS constant as follows:

If FOOTERS is defined nonzero, then each allocated chunk
carries an additional check word to verify that it was malloced
from its space. These check words are the same within each
execution of a program using malloc, but differ across
executions, so externally crafted fake chunks cannot be
freed. This improves security by rejecting frees/reallocs that
could corrupt heap memory, in addition to the checks preventing
writes to statics that are always on. This may further improve
security at the expense of time and space overhead. (Note that
FOOTERS may also be worth using with MSPACES.)

In dlmalloc-2.8.x the first member of a malloc_chunk header is not called prev_size anymore but instead called prev_foot. The name foot is used because, when FOOTERS is specified, it is a dedicated field used to hold a specially calculated value to identify the associated heap space. Alternatively, when a chunk is free, the chunk adjacent to it will have a prev_foot value holding the size of the previous chunk, more like what you would expect from dlmalloc-2.7.x and ptmalloc2. Note that this dedicated footer field differs from other configurations where the analogous field can serve as spillover data for a previous in-use chunk.

The value stored in the footer for an allocated chunk, as described in the quoted paragraph above, is used to verify that the chunk is correctly associated with a specific space. This space can either be the default global mstate referenced by _gm_ or if MSPACES is used, then whatever mspace the chunk is associated with.

Note that the quoted text says the footer value is the same for any given space in any given execution, but differs across multiple executions. This is as you’d expect for any sort of global cookie value. Let’s look at how the footer value is calculated. The macro is mark_inuse_foot():

/* Set foot of inuse chunk to be xor of mstate and seed */
#define mark_inuse_foot(M,p,s)\
  (((mchunkptr)((char*)(p) + (s)))->prev_foot = ((size_t)(M) ^ mparams.magic))

The parameter M points to the mstate associated with this chunk, p points to the in-use chunk whose foot is being marked, and s is the size of the in-use chunk. This effectively results in the prev_foot of the adjacent chunk’s header being set to the address of M XORed with mparams.magic.

The effectiveness of this as a security mechanism thus relies on the inability to predict whatever M and mparams.magic are used. M is largely dependent on both ASLR and if you can predict what mstate is being used. We will touch on the implications of a predictable M later.

For now, let’s understand what this mparams.magic value is. The magic value here is calculated and stored both in the malloc_params (aka mparams) structure and the malloc_state (aka mstate) structure. We can see the initialisation of this value in ensure_initialization().

/* Ensure mparams initialized */
#define ensure_initialization() (void)(mparams.magic != 0 || init_mparams())

The init_mparams() function initialises this magic member when the heap is first being initialised and sets it to one of two values. If USE_DEV_RANDOM is set it will read sizeof(size_t) bytes from /dev/urandom. Otherwise it uses time(0).

#if USE_DEV_RANDOM
    int fd;
    unsigned char buf[sizeof(size_t)];
    /* Try to use /dev/urandom, else fall back on using time */
    if ((fd = open("/dev/urandom", O_RDONLY)) >= 0 &&
        read(fd, buf, sizeof(buf)) == sizeof(buf)) {
      s = *((size_t *) buf);
      close(fd);
    }
    else
#endif /* USE_DEV_RANDOM */
      s = (size_t)(time(0) ^ (size_t)0x55555555U);

    s |= (size_t)8U;    /* ensure nonzero */
    s &= ~(size_t)7U;   /* improve chances of fault for bad values */

Obviously USE_DEV_RANDOM is much better, especially if you have any problems with your time functions (more on that later). The next question is where and when this prev_foot is validated at runtime. The macro to decode the expected mstate address is get_mstate_for():

#define get_mstate_for(p)\
  ((mstate)(((mchunkptr)((char*)(p) +\
    (chunksize(p))))->prev_foot ^ mparams.magic))

We can see this in use during calls to dlfree():

#if FOOTERS
    mstate fm = get_mstate_for(p);
    if (!ok_magic(fm)) {
      USAGE_ERROR_ACTION(fm, p);
      return;
    }
#else /* FOOTERS */
#define fm gm
#endif /* FOOTERS */

and mspace_free():

#if FOOTERS
    mstate fm = get_mstate_for(p);
#else
    mstate fm = (mstate)msp;
#endif
    if (!ok_magic(fm)) {
      USAGE_ERROR_ACTION(fm, p);
      return;
    }

It’s also checked in the dlrealloc() and mspace_realloc() functions.

First, this fetches what it believes to be an mstate address from a chunk address. In the process it validates, with ok_magic(), that the magic member of that mstate matches the global mparams.magic value. So, in the event that the prev_foot value is incorrect, the magic value won’t match.

Whether or not an abort occurs is based on the compile-time constant called PROCEED_ON_ERROR that we touched on earlier.

#if (FOOTERS && !INSECURE)
/* Check if (alleged) mstate m has expected magic field */
#define ok_magic(M) ((M)->magic == mparams.magic)
#else
#define ok_magic(M) (1)
#endif

Safe unlinking

dlmalloc-2.8.5 introduced safe unlinking, something glibc’s ptmalloc2, forked from dlmalloc-2.7.x, has had for some time. This safe unlinking is done for both small chunks and tree chunks.

The unlink_small_chunk() macro from malloc-2.8.4.c and earlier is:

/* Unlink a chunk from a smallbin */
#define unlink_small_chunk(M, P, S) {\
  mchunkptr F = P->fd;\
  mchunkptr B = P->bk;\
  bindex_t I = small_index(S);\
  assert(P != B);\
  assert(P != F);\
  assert(chunksize(P) == small_index2size(I));\
  if (F == B)\
    clear_smallmap(M, I);\
  else if (RTCHECK((F == smallbin_at(M,I) || ok_address(M, F)) &&\
                   (B == smallbin_at(M,I) || ok_address(M, B)))) {\
    F->bk = B;\
    B->fd = F;\
  }\
  else {\
    CORRUPTION_ERROR_ACTION(M);\
  }\
}

Whereas, for malloc-2.8.5.c, they introduced the additional checks F->bk == P and B->fd == P.

/* Unlink a chunk from a smallbin */
#define unlink_small_chunk(M, P, S) {\
  mchunkptr F = P->fd;\
  mchunkptr B = P->bk;\
  bindex_t I = small_index(S);\
  assert(P != B);\
  assert(P != F);\
  assert(chunksize(P) == small_index2size(I));\
  if (RTCHECK(F == smallbin_at(M,I) || (ok_address(M, F) && F->bk == P))) {\
    if (B == F) {\
      clear_smallmap(M, I);\
    }\
    else if (RTCHECK(B == smallbin_at(M,I) ||\
                     (ok_address(M, B) && B->fd == P))) {\
      F->bk = B;\
      B->fd = F;\
    }\
    else {\
      CORRUPTION_ERROR_ACTION(M);\
    }\
  }\
  else {\
    CORRUPTION_ERROR_ACTION(M);\
  }\
}

Note that RTCHECK() only actually performs its check when INSECURE is unset, and the __builtin_expect() form is specific to GCC 3 and later. On failure, behavior is dictated by CORRUPTION_ERROR_ACTION, which we described earlier.

/* In gcc, use __builtin_expect to minimize impact of checks */
#if !INSECURE
#if defined(__GNUC__) && __GNUC__ >= 3
#define RTCHECK(e)  __builtin_expect(e, 1)
...

Similarly, malloc-2.8.4.c has an unlink_large_chunk() defined as follows:

#define unlink_large_chunk(M, X) {\
  tchunkptr XP = X->parent;\
  tchunkptr R;\
  if (X->bk != X) {\
    tchunkptr F = X->fd;\
    R = X->bk;\
    if (RTCHECK(ok_address(M, F))) {\
      F->bk = R;\
      R->fd = F;\
    }\
    else {\
      CORRUPTION_ERROR_ACTION(M);\
    }\
  }

And malloc-2.8.5.c introduced the more secure version:

#define unlink_large_chunk(M, X) {\
  tchunkptr XP = X->parent;\
  tchunkptr R;\
  if (X->bk != X) {\
    tchunkptr F = X->fd;\
    R = X->bk;\
    if (RTCHECK(ok_address(M, F) && F->bk == X && R->fd == X)) {\
      F->bk = R;\
      R->fd = F;\
    }\
    else {\
      CORRUPTION_ERROR_ACTION(M);\
    }\
...

It’s useful to know your version in case you run into a system using an older version.

DEBUG constant

Although it’s not explicitly marked as a security feature, the DEBUG constant will more aggressively validate handled chunks and therefore does result in more security. This is in part because, in DEBUG mode, assert()'s will abort the program, preventing further exploitation if triggered. It’s also because of the introduction of additional functions called at runtime like check_inuse_chunk():

#if ! DEBUG

#define check_free_chunk(M,P)
#define check_inuse_chunk(M,P)
#define check_malloced_chunk(M,P,N)
#define check_mmapped_chunk(M,P)
#define check_malloc_state(M)
#define check_top_chunk(M,P)

#else
#define check_free_chunk(M,P) do_check_free_chunk(M,P)
#define check_inuse_chunk(M,P) do_check_inuse_chunk(M,P)
#define check_top_chunk(M,P) do_check_top_chunk(M,P)
#define check_malloced_chunk(M,P,N) do_check_malloced_chunk(M,P,N)
#define check_mmapped_chunk(M,P) do_check_mmapped_chunk(M,P)
#define check_malloc_state(M) do_check_malloc_state(M)

By default most allocation routines don’t do a lot of validation, but these DEBUG-specific functions actually do. That said, even though there are a lot defined above, most of them aren’t called during the normal allocation and free code paths. One that is called is check_inuse_chunk(), which we can take a closer look at.

/* Check properties of inuse chunks */
static void do_check_inuse_chunk(mstate m, mchunkptr p) {
  do_check_any_chunk(m, p);
  assert(cinuse(p));
  assert(next_pinuse(p));
  /* If not pinuse and not mmapped, previous chunk has OK offset */
  assert(is_mmapped(p) || pinuse(p) || next_chunk(prev_chunk(p)) == p);
  if (is_mmapped(p))
    do_check_mmapped_chunk(m, p);
}

We can see that it does some pretty obvious checks. If the current chunk is in use it should have the CINUSE flag set and the adjacent forward chunk should have the PINUSE flag set. If the chunk isn’t mmapped and the previous chunk is free, it can work out the previous chunk’s starting location using the size stored in prev_foot. So it validates that the size of the previous chunk points to the in-use chunk as you’d expect.

In addition to these we see the first call is do_check_any_chunk(), so let’s take a look at what this does as well.

/* Check properties of any chunk, whether free, inuse, mmapped etc */
static void do_check_any_chunk(mstate m, mchunkptr p) {
  assert((is_aligned(chunk2mem(p))) || (p->head == FENCEPOST_HEAD));
  assert(ok_address(m, p));
}

This is quite straightforward. It ensures that the chunk is aligned on an expected boundary or that the chunk is a special FENCEPOST value. It calls a macro ok_address(). This is defined as follows:

#if !INSECURE
/* Check if address a is at least as high as any from MORECORE or MMAP */
#define ok_address(M, a) ((char*)(a) >= (M)->least_addr)

This check is interesting as it could, for example, hinder a bug where you can free an arbitrary address being used to free a fake chunk located on the stack. However, it only prevents this if the stack address is lower than the segment(s) of memory the mstate is managing.

All in all, these do_check_inuse_chunk() checks are pretty easy to overcome. Where is check_inuse_chunk() actually called? Mostly in debug functions that aren’t directly called. But most notably it is called by dlfree() and mspace_free() before freeing a chunk:

void mspace_free(mspace msp, void* mem) {
  if (mem != 0) {
    mchunkptr p = mem2chunk(mem);
#if FOOTERS
    mstate fm = get_mstate_for(p);
#else /* FOOTERS */
    mstate fm = (mstate)msp;
#endif /* FOOTERS */
    if (!ok_magic(fm)) {
      USAGE_ERROR_ACTION(fm, p);
      return;
    }
    if (!PREACTION(fm)) {
      check_inuse_chunk(fm, p);

It is also checked in internal_realloc(), which is called by dlrealloc() and mspace_realloc(), and a few other places like inside do_check_malloced_chunk(). This latter function is called quite a bit throughout the allocation routines and means there are even more checks happening regularly. You can see below that it does a few additional checks to validate the size of the chunk in use.

/* Check properties of malloced chunks at the point they are malloced */
static void do_check_malloced_chunk(mstate m, void* mem, size_t s) {
  if (mem != 0) {
    mchunkptr p = mem2chunk(mem);
    size_t sz = p->head & ~(PINUSE_BIT|CINUSE_BIT);
    do_check_inuse_chunk(m, p);
    assert((sz & CHUNK_ALIGN_MASK) == 0);
    assert(sz >= MIN_CHUNK_SIZE);
    assert(sz >= s);
    /* unless mmapped, size is less than MIN_CHUNK_SIZE more than request */
    assert(is_mmapped(p) || sz < (s + MIN_CHUNK_SIZE));
  }
}

We mentioned earlier that safe unlinking isn’t present in versions 2.8.4 and earlier, so it is interesting to note that the check_free_chunk() function does do a linkage check itself.

/* Check properties of free chunks */
static void do_check_free_chunk(mstate m, mchunkptr p) {
  size_t sz = p->head & ~(PINUSE_BIT|CINUSE_BIT);
  mchunkptr next = chunk_plus_offset(p, sz);
  do_check_any_chunk(m, p);
  assert(!cinuse(p));
  assert(!next_pinuse(p));
  assert(!is_mmapped(p));
  if (p != m->dv && p != m->top) {
    if (sz >= MIN_CHUNK_SIZE) {
      assert((sz & CHUNK_ALIGN_MASK) == 0);
      assert(is_aligned(chunk2mem(p)));
      assert(next->prev_foot == sz);
      assert(pinuse(p));
      assert(next == m->top || cinuse(next));
      assert(p->fd->bk == p);
      assert(p->bk->fd == p);
    }
    else  /* markers are always of size SIZE_T_SIZE */
      assert(sz == SIZE_T_SIZE);
  }
}

However, unlike the other DEBUG functions, it turns out check_free_chunk() is rarely called. The only normal code path along which it is called is a specific conditional case in prepend_alloc(), which in turn is only called in very specific cases via sys_alloc(). Therefore, for typical corruption situations it would never actually touch a corrupted free chunk where you had modified the linkage.

The takeaway from all of this is that with DEBUG builds you’ll have to be more careful when modifying heap chunk headers, but you can often still abuse unlinking during coalescing and other common exploit tricks despite the extra checks.

There is an interesting case with DEBUG builds where some aggressive checking functions, like traverse_and_check(), are defined and also do these additional DEBUG-based checks.

/* Traverse each chunk and check it; return total */
static size_t traverse_and_check(mstate m) {
  size_t sum = 0;
  if (is_initialized(m)) {
    msegmentptr s = m->seg;
    sum += m->topsize + TOP_FOOT_SIZE;
    while (s != 0) {
      mchunkptr q = align_as_chunk(s->base);
      mchunkptr lastq = 0;
      assert(pinuse(q));
      while (segment_holds(s, q) &&
             q != m->top && q->head != FENCEPOST_HEAD) {
        sum += chunksize(q);
        if (cinuse(q)) {
          assert(!bin_find(m, q));
          do_check_inuse_chunk(m, q);
        }
        else {
          assert(q == m->dv || bin_find(m, q));
          assert(lastq == 0 || cinuse(lastq)); /* Not 2 consecutive free */
          do_check_free_chunk(m, q);
        }
        lastq = q;
        q = next_chunk(q);
      }
      s = s->next;
    }
  }
  return sum;
}

These functions are not used by any default heap functions but can be used by specific modifications of dlmalloc, as is the case for the Cisco ASA Checkheaps implementation. We will detail this in a future blog post dedicated to Checkheaps.

libdlmalloc

Before looking at how dlmalloc was built on Cisco ASA devices, it is useful to introduce a new tool that can help with this sort of analysis.

libdlmalloc [1] was originally developed to aid in observing the success or failure of heap feng shui attempts and exploit states, but it should provide general purpose value to both developers and exploit writers doing any sort of dlmalloc-related analysis. libdlmalloc was primarily modelled after similar heap analysis tools such as libtalloc [13], developed by Aaron Adams, and the jemalloc plugin shadow [14], developed by argp and huku at CENSUS labs. In the same vein as these tools, the main functionality is provided by a set of discrete debugger commands. Some Python functions are included that replicate various macros and functions inside dlmalloc-2.8.x, as is the approach in cloudburst’s libheap [15]; however, this wasn’t the primary focus.

It is worth noting that we don’t currently abstract out much of the debugging logic. However, we plan to do it eventually, so in that sense it lags behind the more recent design changes in both shadow [16] and libheap.

Currently we provide a file containing the main logic called libdlmalloc_28x.py. Although some portions contain gdb-specific functionality, in general it can be used to some capacity without being run inside gdb as well, which is useful for offline analysis of a heap snapshot for instance.

import libdlmalloc_28x as libdlmalloc

When used with our asadbg scripts [19], libdlmalloc is very powerful because it will be automatically loaded and available on whatever firmware version you are debugging. We detailed this in a previous blog post.

Overview of commands

dlhelp

This is the main function to view the available commands. Each of the commands supports the -h option which allows you to obtain more detailed usage instructions.

(gdb) dlhelp
[libdlmalloc] dlmalloc commands for gdb
[libdlmalloc] dlchunk : show one or more chunks metadata and contents
[libdlmalloc] dlmstate : print mstate structure information. caches address after first use
[libdlmalloc] dlcallback : register a callback or query/modify callback status
[libdlmalloc] dlhelp : this help message
[libdlmalloc] NOTE: Pass -h to any of these commands for more extensive usage. Eg: dlchunk -h

dlmstate

First we’ll show the dlmstate command. We can use this command to analyse the contents of an mstate structure at a specified address.

(gdb) dlmstate -h
[libdlmalloc] usage: dlmstate [-v] [-f] [-x] [-c] <addr>
[libdlmalloc] <addr>   a mstate struct addr. Optional if mstate cached
[libdlmalloc] -v       use verbose output (multiples for more verbosity)
[libdlmalloc] -c       print bin counts
[libdlmalloc] --depth  how deep to count each bin (default 10)
[libdlmalloc] NOTE: Last defined mstate will be cached for future use

The usage is straightforward. Normally you will simply supply an address or, if you want to use a cached version, no address at all.

For our example we happen to know that the mstate is at address 0xa8400008:

(gdb) dlmstate 0xa8400008
struct dl_mstate @ 0xa8400008 {
smallmap = 0b000111101100001000011111111100
treemap = 0b000000000000001011011010000111
dvsize = 0xed268
topsize = 0x2eac55c0
least_addr = 0xa8400000
dv = 0xad03d778
top = 0xad13aa10
trim_check = 0x200000
magic = 0x2900d4d8
smallbin[00] (sz 0x0) = 0xa840002c, 0xa840002c [EMPTY]
smallbin[01] (sz 0x8) = 0xa8400034, 0xa8400034 [EMPTY]
smallbin[02] (sz 0x10) = 0xacff8068, 0xa88647f0
smallbin[03] (sz 0x18) = 0xa95059c0, 0xa9689640
[SNIP]
smallbin[13] (sz 0x68) = 0xad0297e8, 0xad0297e8 [EMPTY]
smallbin[14] (sz 0x70) = 0xad029420, 0xad029420 [EMPTY]
smallbin[15] (sz 0x78) = 0xad03d688, 0xad03d688
[SNIP]
smallbin[27] (sz 0xd8) = 0xad03d3e0, 0xad03d3e0 [EMPTY]
smallbin[28] (sz 0xe0) = 0xad0370d0, 0xad0370d0 [EMPTY]
smallbin[29] (sz 0xe8) = 0xac4e5760, 0xac4e5760 [EMPTY]
smallbin[30] (sz 0xf0) = 0xad029330, 0xad029330 [EMPTY]
smallbin[31] (sz 0xf8) = 0xad0368b8, 0xad0368b8 [EMPTY]
treebin[00] (sz 0x180) = 0xacff0e98
treebin[01] (sz 0x200) = 0xacfefc00
treebin[02] (sz 0x300) = 0xa9509958
treebin[03] (sz 0x400) = 0x0 [EMPTY]
[SNIP]
treebin[28] (sz 0x600000) = 0x0 [EMPTY]
treebin[29] (sz 0x800000) = 0x0 [EMPTY]
treebin[30] (sz 0xc00000) = 0x0 [EMPTY]
treebin[31] (sz 0xffffffff) = 0x0 [EMPTY]
footprint = 0x33800000
max_footprint = 0x33800000
mflags = 0x7
mutex = 0x0,0x0,0x0,0x0,0xa8400000,
seg = struct malloc_segment @ 0xa84001d4 {
base = 0xa8400000
size = 0x33800000
next = 0x0
sflags = 0x8

We can see that there is some useful information presented, like the state of the various small and tree bins, including if they contain any chunks. Whether or not a bin is marked as [EMPTY] is dictated by checking the corresponding smallmap or treemap bitmaps and whether or not the bin entry has legitimate pointers. We can also get segment information that tracks the various backing pages used to store actual chunks.

In the case above we happen to know that the mstate on a 32-bit system we were analysing was at 0xa8400008. We don’t currently support symbol resolution, however if symbols are present it would be possible to find this address by querying the _gm_ global from dlmalloc similar to how, on ptmalloc2, tools will often look up main_arena.

It’s worth highlighting that some values shown must be fuzzily inferred, which means they may be prone to error. For instance, we don’t necessarily know the native system’s mutex size, or if mutexes were even compiled into dlmalloc. As such, we try to work it out based on a simple heuristic check.

We will refer back to the originally shown dlmstate structure in order to further demonstrate other commands.

Another nice feature: sometimes you want an approximate idea of how many chunks live in a specific bin, for instance to ensure you have only one chunk in a bin. You can use dlmstate -c to count each bin's entries. By default it counts a maximum of ten entries per bin, so that it's not too sluggish over slow debugging connections.

(gdb) dlmstate -c
[libdlmalloc] Using cached mstate
smallbin[00] (sz 0x0) = 0xa840002c, 0xa840002c [EMPTY]
smallbin[01] (sz 0x8) = 0xa8400034, 0xa8400034 [EMPTY]
smallbin[02] (sz 0x10) = 0xa94f59c0, 0xa88647f0 [10+]
smallbin[03] (sz 0x18) = 0xacb59f70, 0xa9689a30 [10+]
smallbin[04] (sz 0x20) = 0xacff2be0, 0xa87206f8 [10+]
smallbin[05] (sz 0x28) = 0xa883dd48, 0xa948a100 [10+]
smallbin[06] (sz 0x30) = 0xa8a1d6a8, 0xa8a1e230 [10+]
smallbin[07] (sz 0x38) = 0xac787d80, 0xacfe5070 [8]
smallbin[08] (sz 0x40) = 0xa94af598, 0xa94af598 [2]
smallbin[09] (sz 0x48) = 0xac4e5088, 0xac4e5088 [EMPTY]
smallbin[10] (sz 0x50) = 0xac4e5080, 0xac4e5080 [EMPTY]
smallbin[11] (sz 0x58) = 0xa8a1d680, 0xa8a1d680 [EMPTY]
smallbin[12] (sz 0x60) = 0xac782c08, 0xac782c08 [EMPTY]
smallbin[13] (sz 0x68) = 0xac4e5068, 0xac4e5068 [2]
smallbin[14] (sz 0x70) = 0xac4e4fe0, 0xac4e4fe0 [EMPTY]
...
smallbin[31] (sz 0xf8) = 0xacff1e28, 0xacff1e28 [EMPTY]
treebin[00] (sz 0x180) = 0xac799f38 [1]
treebin[01] (sz 0x200) = 0xa883da10 [2]
treebin[02] (sz 0x300) = 0x0 [EMPTY]
...
treebin[31] (sz 0xffffffff) = 0x0 [EMPTY]

dlchunk

The dlchunk command is used to show information related to a chunk.

(gdb) dlchunk -h
[libdlmalloc] usage: dlchunk [-v] [-f] [-x] [-c <count>] <addr>
[libdlmalloc] <addr>   a dlmalloc chunk header
[libdlmalloc] -v       use verbose output (multiples for more verbosity)
[libdlmalloc] -f       use <addr> explicitly, rather than be smart
[libdlmalloc] -x       hexdump the chunk contents
[libdlmalloc] -m       max bytes to dump with -x
[libdlmalloc] -c       number of chunks to print
[libdlmalloc] -s       search pattern when print chunks
[libdlmalloc] --depth  depth to search inside chunk
[libdlmalloc] -d debug and force printing stuff

To demonstrate the dlchunk command we will analyse the chunk that was shown at smallbin[15] in the first dlmstate output example (not shown in the dlmstate -c example). Let’s analyse this chunk at 0xad03d688 that we expect to have a size of 0x78 (the bin size).

(gdb) dlchunk 0xad03d688
0xad03d688 F sz:0x00078 fl:-P

We see that this is a free chunk (F) of size 0x78 and the PINUSE flag is set, meaning the previously adjacent chunk is in use. If we want a bit more detail we use the -v flag:

(gdb) dlchunk -v 0xad03d688
struct malloc_chunk @ 0xad03d688 {
prev_foot = 0x8140d4d0
head = 0x78 (PINUSE)
fd = 0xa84000a4
bk = 0xa84000a4

Here we get some more information, as well as the fd/bk pointers which happen to point into the mstate bin index for this size of free chunk. This is an example where, if you don’t yet know the mstate address but you’re analysing some heap chunk, you might still be able to find a reference to somewhere inside the mstate structure and work out the base address.
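Since a smallbin-resident chunk's fd/bk point at the bin head inside the mstate, you can work backwards to the mstate base. A minimal sketch: the smallbin[00] offset (0x24) and the 8-byte bin stride are taken from the 32-bit dlmstate output shown earlier and are build-specific assumptions, not universal constants.

```python
# Sketch: recover the mstate base from a free chunk's fd/bk when it points
# at a smallbin head. Offsets observed on this 32-bit build:
SMALLBIN0_OFFSET = 0x24   # 0xa840002c - 0xa8400008 in the dump above
BIN_STRIDE = 0x8          # two 4-byte pointers per bin on 32-bit

def mstate_base_from_bin_ptr(bin_ptr, bin_index):
    """bin_ptr: a fd/bk value landing on smallbin[bin_index]'s head."""
    return bin_ptr - (SMALLBIN0_OFFSET + BIN_STRIDE * bin_index)
```

For the 0x78 chunk above, its fd of 0xa84000a4 sits on smallbin[15], giving back the mstate base of 0xa8400008.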

If we need to, we can dump the contents of the chunk using the additional -x flag:

(gdb) dlchunk -v 0xad03d688 -x
struct malloc_chunk @ 0xad03d688 {
prev_foot = 0x8140d4d0
head = 0x78 (PINUSE)
fd = 0xa84000a4
bk = 0xa84000a4
0x68 bytes of chunk data:
0xad03d698: 0xf3ee0123 0x00000000 0xad03d618 0xad03d708
0xad03d6a8: 0x08a7c048 0x096cfb47 0x00000000 0x00000000
0xad03d6b8: 0x00000000 0x00000000 0x00000000 0xffffffff
0xad03d6c8: 0x0000001c 0x0000002e 0x00000007 0x0000001e
0xad03d6d8: 0x00000004 0x00000075 0x00000002 0x00000095
0xad03d6e8: 0x00000000 0x00000000 0x02b0a0c5 0x00000000
0xad03d6f8: 0x5ee33210 0xf3eecdef

Using -m, the output from -x can be limited to a maximum number of bytes from the chunk.

If we want to look at a few chunks adjacent to this free chunk we’re analysing, we use -c:

(gdb) dlchunk -c 10 0xad03d688
0xad03d688 F sz:0x00078 fl:-P
0xad03d700 M sz:0x00078 fl:C-
0xad03d778 F sz:0xed268 fl:-P
0xad12a9e0 M sz:0x10030 fl:C-
0xad13aa10 F sz:0x2eac55c0 fl:-P
0xdbbfffd0 F sz:0x00030 fl:--
<<< end of heap segment >>>

We specify that we want to see up to ten chunks starting at the 0x78-byte chunk of interest (fewer are printed because the end of the segment is reached first). What we see above is that there is an adjacent in-use chunk (denoted by the CINUSE flag C being set), an adjacent free chunk, another adjacent allocated chunk, and then an extremely large free chunk followed by a special small free chunk (special because a chunk should normally always have the P or C flag set).

If you are familiar with dlmalloc, you will recognise that the 0x2eac55c0-byte chunk is the top chunk which is a free chunk encompassing almost the entire remainder of available heap memory. The final free chunk following it is a special marker indicating the actual end of the segment. Note that the top chunk is not specific to dlmalloc-2.8.x, but exists in 2.6.x and 2.7.x. This top chunk is often referred to as the wilderness.
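The one-line F/M markers and -P/C- flags map directly onto the low bits of the dlmalloc-2.8.x head field; a small sketch of the bit layout (mirroring the PINUSE_BIT/CINUSE_BIT macros in malloc-2.8.3.c):

```python
# dlmalloc-2.8.x head word: bit 0 (PINUSE) means the previous adjacent
# chunk is in use, bit 1 (CINUSE) means this chunk itself is in use.
# The chunk size is the head word with both flag bits masked off.
PINUSE_BIT = 0x1
CINUSE_BIT = 0x2

def chunksize(head):
    return head & ~(PINUSE_BIT | CINUSE_BIT)

def flags(head):
    return '{}{}'.format('C' if head & CINUSE_BIT else '-',
                         'P' if head & PINUSE_BIT else '-')

# The free 0x78 chunk above prints "F sz:0x00078 fl:-P", i.e. a stored
# head word of 0x79: size 0x78 with only PINUSE set.
```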

Back to the commands: we can get significantly more detail about these chunks by combining the -v and -c options:

(gdb) dlchunk -v -c 10 0xad03d688
struct malloc_chunk @ 0xad03d688 {
prev_foot = 0x8140d4d0
head = 0x78 (PINUSE)
fd = 0xa84000a4
bk = 0xa84000a4
--
struct malloc_chunk @ 0xad03d700 {
prev_foot = 0x78
size = 0x78 (CINUSE)
--
struct malloc_tree_chunk @ 0xad03d778 {
prev_foot = 0x8140d4d0
head = 0xed268 (PINUSE)
fd = 0x0
bk = 0x0
left = 0x0
right = 0x0
parent = 0x5ee33210
bindex = 0xa84003a4
--
struct malloc_chunk @ 0xad12a9e0 {
prev_foot = 0xed268
size = 0x10030 (CINUSE)
--
struct malloc_tree_chunk @ 0xad13aa10 {
prev_foot = 0x8140d4d0
head = 0x2eac55c0 (PINUSE)
fd = 0x0
bk = 0x0
left = 0x0
right = 0x0
parent = 0x0
bindex = 0x0
--
struct malloc_chunk @ 0xdbbfffd0 {
prev_foot = 0x0
head = 0x30
fd = 0x0
bk = 0x0
--
<<< end of heap segment >>>

As you can see, the tool will stop parsing if it hits what it determines is the edge of the heap segment.

We support large free chunk analysis which, as we noted earlier, uses a tree-based structure (technically a bitwise trie [17]) rather than the usual doubly linked list seen in small chunks. We can see in the output above that the chunk at 0xad03d778 is both large and free; correspondingly, the output additionally shows the left, right, parent, and bindex values for the tree structure. However, the reader will notice that these values seem wrong, and indeed they aren’t accurate in this particular case. Why is that? It’s because this chunk happens to be the most recently freed chunk and is therefore currently what is called the ‘designated victim’, meaning it hasn’t yet been inserted into a treebin tree. You can validate this by checking against the dv value shown in the mstate structure:

(gdb) dlmstate 0xa8400008
struct dl_mstate @ 0xa8400008 {
smallmap = 0b000111101100001000011111111100
treemap = 0b000000000000001011011010000111
dvsize = 0xed268
topsize = 0x2eac55c0
least_addr = 0xa8400000
dv = 0xad03d778
top = 0xad13aa10
[...]

As you can see, dv and dvsize match the chunk we were analysing. The ability to correlate this type of information when analysing heap behavior is very useful. There are times during exploitation where you specifically want to rely on the designated victim chunk’s behavior, so this can help realise those scenarios.

Going back to the treebin structure, by analysing a chunk actually on a treebin, we can see that the fields are in fact set correctly:

(gdb) dlchunk -v 0xad029ac0
struct malloc_tree_chunk @ 0xad029ac0 {
prev_foot = 0x8140d4d0
head = 0xc530 (PINUSE)
fd = 0xad029ac0
bk = 0xad029ac0
left = 0x0
right = 0x0
parent = 0xa8400170
bindex = 0xf

Another useful option of dlchunk is searching:

(gdb) dlchunk -x 0xad03d700
0xad03d700 M sz:0x00078 fl:C- alloc_pc:0x08a7c048,-
0x70 bytes of chunk data:
0xad03d708: 0xa11c0123 0x00000048 0x00000000 0x00000000
0xad03d718: 0xad03d618 0xa84003c4 0x08a7c048 0x096cfb47
0xad03d728: 0x00000000 0x00000000 0x00000000 0x00000000
0xad03d738: 0x00000000 0xffffffff 0x0000001c 0x00000010
0xad03d748: 0x00000008 0x0000001e 0x00000004 0x00000075
0xad03d758: 0x00000002 0x00000095 0x00000000 0x00000000
0xad03d768: 0x02cc1773 0x00000000 0xa11ccdef 0x00000000
(gdb) dlchunk -s 0x02cc1773 0xad03d700
0xad03d700 M sz:0x00078 fl:C- alloc_pc:0x08a7c048,- [MATCH]

We see above that the value 0x02cc1773 is present in the chunk’s hex output, so we search for it and get a match. This can be coupled with the -c option to search across a number of chunks. This would let you see, for example, whether a series of adjacent chunks of some recognisable size hold some value you had tried to populate them with, without manually searching through the hexdumps.

We can further augment this using the --depth argument, which lets you specify how deep into the chunk you want to search for the value. This is useful if you know that one of a hundred chunks you are interested in will have some wanted value in its first 16 bytes, but you don’t care about the remaining bytes of the chunk, no matter its size.
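A rough model of this depth-limited search (not the tool's actual code; chunk_matches is a hypothetical helper for illustration):

```python
# Depth-limited chunk search in the spirit of dlchunk -s/--depth: only the
# first `depth` bytes of each chunk's data are scanned, so a run of chunks
# can be filtered quickly even over a slow debug connection.
import struct

def chunk_matches(data, pattern, depth=None):
    """data: raw chunk contents; pattern: a 32-bit value to look for."""
    needle = struct.pack('<I', pattern)
    window = data if depth is None else data[:depth]
    return needle in window
```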

dlcallback

Another feature we added, specifically to help with analysis of Cisco ASA devices, is the concept of callbacks that the dlchunk and dlmstate commands can invoke to provide additional information. The callback we created to test this functionality is from a separate heap analysis tool we call libmempool, which we detail in a future blog post. In short, it is designed to describe the Cisco-specific structures injected into dlmalloc chunks by special heap wrapping functions.

(gdb) dlcallback -h
[libdlmalloc] usage: dlcallback

In order to use a callback you need to register a function that will be passed information about the chunk or mstate being inspected. In our case, libmempool.py provides the mpcallback function, which takes a dictionary containing a lot of information supplied by libdlmalloc. We can register this function using the following command:

(gdb) dlcallback register mpcallback libmempool/libmempool
[libmempool] loaded
[libdlmalloc] mpcallback registered as callback

In the case above, the libmempool.py package located in the libmempool/ folder is loaded and the mpcallback function is dynamically looked up. The code will attempt to either lookup the specified function in the global namespace or load the specified module and look up the function in that module’s namespace.
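The lookup logic just described can be sketched roughly as follows (resolve_callback is a hypothetical name for illustration; the real implementation lives in libdlmalloc_28x.py):

```python
# Resolve a callback either from the global namespace or from a module
# path like "libmempool/libmempool", mirroring the behaviour described
# above: try globals() first, otherwise import the module and look the
# function up in its namespace.
import importlib

def resolve_callback(func_name, module_path=None):
    if module_path is None:
        return globals()[func_name]          # already-defined function
    module_name = module_path.replace('/', '.')
    module = importlib.import_module(module_name)
    return getattr(module, func_name)
```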

We can validate that the callback is registered:

(gdb) dlcallback status
[libdlmalloc] a callback is registered and enabled

If we want to temporarily disable the callback for whatever reason, we can do so and re-enable it at a later time. We can also clear it entirely:

(gdb) dlcallback status
[libdlmalloc] a callback is registered and enabled
(gdb) dlcallback disable
[libdlmalloc] callback disabled
(gdb) dlcallback status
[libdlmalloc] a callback is registered and disabled
(gdb) dlcallback enable
[libdlmalloc] callback enabled
(gdb) dlcallback clear
[libdlmalloc] callback cleared
(gdb) dlcallback status
[libdlmalloc] a callback is not registered

For the sake of example, let’s register the mpcallback function. When run, it’s entirely up to the callback to do whatever it wants to do with the information provided to it by libdlmalloc. In our example case, mpcallback will look for a Cisco-specific structure inside the chunk and dump out the values.

(gdb) dlcallback register mpcallback libmempool/libmempool
[libmempool] loaded
[libdlmalloc] mpcallback registered as callback

Then we look at some dlmalloc chunks on a Cisco ASA device, where we know they’ve been wrapped by a function that inserts a mempool header inside:

(gdb) dlcallback disable
[libdlmalloc] callback disabled
(gdb) dlchunk 0xad03d700
0xad03d700 M sz:0x00078 fl:C-
(gdb) dlcallback enable
[libdlmalloc] callback enabled
(gdb) dlchunk 0xad03d700
0xad03d700 M sz:0x00078 fl:C- alloc_pc:0x08a7c048,-

In the example above we compare the output from a non-verbose dlchunk listing with and without the callback enabled. We see that the callback adds a special alloc_pc field, which represents the address of the function that was responsible for the allocation, which Cisco helpfully tracks. We can get even more information by using the verbose listing:

(gdb) dlcallback disable
[libdlmalloc] callback disabled
(gdb) dlchunk -v 0xad03d700
struct malloc_chunk @ 0xad03d700 {
prev_foot = 0x78
size = 0x78 (CINUSE)
(gdb) dlcallback enable
[libdlmalloc] callback enabled
(gdb) dlchunk -v 0xad03d700
struct malloc_chunk @ 0xad03d700 {
prev_foot = 0x78
size = 0x78 (CINUSE)
struct mp_header @ 0xad03d708 {
mh_magic = 0xa11c0123
mh_len = 0x48
mh_refcount = 0x0
mh_unused = 0x0
mh_fd_link = 0xad03d618 (OK)
mh_bk_link = 0xa84003c4 (-)
allocator_pc = 0x8a7c048 (-)
free_pc = 0x96cfb47 (-)

We can see all the contents of the internal structure parsed by the callback. You can imagine that this type of information is extremely helpful if, for example, you are corrupting an adjacent heap chunk and these structure values must hold specific values.
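Based on the verbose output above, the injected header can also be parsed offline. A sketch assuming the field order and 32-bit widths shown in the dump (the layout is inferred from this 9.2.4 32-bit build's output, not from Cisco definitions):

```python
# Parse the 0x20-byte mempool header that the Cisco wrappers inject at the
# start of a dlmalloc chunk's data, as seen in the dlchunk -v output above.
import struct

MP_FIELDS = ('mh_magic', 'mh_len', 'mh_refcount', 'mh_unused',
             'mh_fd_link', 'mh_bk_link', 'allocator_pc', 'free_pc')

def parse_mp_header(data):
    """Unpack eight little-endian 32-bit fields from raw chunk data."""
    return dict(zip(MP_FIELDS, struct.unpack('<8I', data[:0x20])))
```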

A similar use for callbacks is when analysing an mstate. Although it’s likely to be rare, in theory an mstate could be modified to hold additional information. In the case of the Cisco ASA devices we looked at, this is exactly what happens: the mstate structure contains additional bookkeeping bins and statistics located after the end of the mstate’s seg member.

(gdb) dlcallback status
[libdlmalloc] a callback is registered and enabled
(gdb) dlcallback disable
[libdlmalloc] callback disabled
(gdb) dlmstate
struct dl_mstate @ 0xa8400008 {
smallmap = 0b000111101100001000011111111100
treemap = 0b000000000000001011011010000111
dvsize = 0xed268
topsize = 0x2eac55c0
least_addr = 0xa8400000
dv = 0xad03d778
top = 0xad13aa10
trim_check = 0x200000
magic = 0x2900d4d8
smallbin[00] (sz 0x0) = 0xa840002c, 0xa840002c [EMPTY]
smallbin[01] (sz 0x8) = 0xa8400034, 0xa8400034 [EMPTY]
smallbin[02] (sz 0x10) = 0xacff8068, 0xa88647f0
smallbin[03] (sz 0x18) = 0xa95059c0, 0xa9689640
smallbin[04] (sz 0x20) = 0xad0247a8, 0xa87206f8
...
smallbin[30] (sz 0xf0) = 0xad029330, 0xad029330 [EMPTY]
smallbin[31] (sz 0xf8) = 0xad0368b8, 0xad0368b8 [EMPTY]
treebin[00] (sz 0x180) = 0xacff0e98
treebin[01] (sz 0x200) = 0xacfefc00
treebin[02] (sz 0x300) = 0xa9509958
treebin[03] (sz 0x400) = 0x0 [EMPTY]
treebin[04] (sz 0x600) = 0x0 [EMPTY]
...
treebin[30] (sz 0xc00000) = 0x0 [EMPTY]
treebin[31] (sz 0xffffffff) = 0x0 [EMPTY]
footprint = 0x33800000
max_footprint = 0x33800000
mflags = 0x7
mutex = 0x0,0x0,0x0,0x0,0xa8400000,
seg = struct malloc_segment @ 0xa84001d4 {
base = 0xa8400000
size = 0x33800000
next = 0x0
sflags = 0x8

Above, we call dlmstate with the callback disabled and it shows us the contents of the mstate structure only. Now we’ll enable the callback and check again:

(gdb) dlmstate
struct dl_mstate @ 0xa8400008 {
smallmap = 0b000111101100001000011111111100
treemap = 0b000000000000001011011010000111
dvsize = 0xed268
topsize = 0x2eac55c0
least_addr = 0xa8400000
dv = 0xad03d778
top = 0xad13aa10
trim_check = 0x200000
magic = 0x2900d4d8
smallbin[00] (sz 0x0) = 0xa840002c, 0xa840002c [EMPTY]
smallbin[01] (sz 0x8) = 0xa8400034, 0xa8400034 [EMPTY]
smallbin[02] (sz 0x10) = 0xacff8068, 0xa88647f0
smallbin[03] (sz 0x18) = 0xa95059c0, 0xa9689640
smallbin[04] (sz 0x20) = 0xad0247a8, 0xa87206f8
...
smallbin[30] (sz 0xf0) = 0xad029330, 0xad029330 [EMPTY]
smallbin[31] (sz 0xf8) = 0xad0368b8, 0xad0368b8 [EMPTY]
treebin[00] (sz 0x180) = 0xacff0e98
treebin[01] (sz 0x200) = 0xacfefc00
treebin[02] (sz 0x300) = 0xa9509958
treebin[03] (sz 0x400) = 0x0 [EMPTY]
treebin[04] (sz 0x600) = 0x0 [EMPTY]
...
treebin[30] (sz 0xc00000) = 0x0 [EMPTY]
treebin[31] (sz 0xffffffff) = 0x0 [EMPTY]
footprint = 0x33800000
max_footprint = 0x33800000
mflags = 0x7
mutex = 0x0,0x0,0x0,0x0,0xa8400000,
seg = struct malloc_segment @ 0xa84001d4 {
base = 0xa8400000
size = 0x33800000
next = 0x0
sflags = 0x8
struct mp_mstate @ 0xa84001e4 {
mp_smallbin[00] - sz: 0x00000000 cnt: 0x0000, mh_fd_link: 0x0
mp_smallbin[01] - sz: 0x00000008 cnt: 0x0000, mh_fd_link: 0x0
mp_smallbin[02] - sz: 0x00000010 cnt: 0x0000, mh_fd_link: 0x0
mp_smallbin[03] - sz: 0x00000018 cnt: 0x0000, mh_fd_link: 0x0
mp_smallbin[04] - sz: 0x00000020 cnt: 0x0000, mh_fd_link: 0x0
mp_smallbin[05] - sz: 0x00000028 cnt: 0x0000, mh_fd_link: 0x0
mp_smallbin[06] - sz: 0x00000030 cnt: 0x0212, mh_fd_link: 0xacff5230
mp_smallbin[07] - sz: 0x00000038 cnt: 0x0cb6, mh_fd_link: 0xa94b2290
mp_smallbin[08] - sz: 0x00000040 cnt: 0x1c8e, mh_fd_link: 0xad01c8d8
mp_smallbin[09] - sz: 0x00000048 cnt: 0x0273, mh_fd_link: 0xad017fb8
mp_smallbin[10] - sz: 0x00000050 cnt: 0x0426, mh_fd_link: 0xacfdd1c8
mp_smallbin[11] - sz: 0x00000058 cnt: 0x0120, mh_fd_link: 0xad03d5c0
mp_smallbin[12] - sz: 0x00000060 cnt: 0x0127, mh_fd_link: 0xad03d560
mp_smallbin[13] - sz: 0x00000068 cnt: 0x09ff, mh_fd_link: 0xacb53fc0
mp_smallbin[14] - sz: 0x00000070 cnt: 0x003f, mh_fd_link: 0xacff6c78
mp_smallbin[15] - sz: 0x00000078 cnt: 0x0074, mh_fd_link: 0xad03d708
...
mp_smallbin[30] - sz: 0x000000f0 cnt: 0x0006, mh_fd_link: 0xacfe83f0
mp_smallbin[31] - sz: 0x000000f8 cnt: 0x0045, mh_fd_link: 0xad0184e8
mp_treebin[00] - sz: 0x00000100 cnt: 0x0191, mh_fd_link: 0xad024698
mp_treebin[01] - sz: 0x00000200 cnt: 0x0134, mh_fd_link: 0xacff5380
mp_treebin[02] - sz: 0x00000300 cnt: 0x016e, mh_fd_link: 0xacffc548
mp_treebin[03] - sz: 0x00000400 cnt: 0x004e, mh_fd_link: 0xad002f08
mp_treebin[04] - sz: 0x00000600 cnt: 0x0071, mh_fd_link: 0xa9506260
mp_treebin[05] - sz: 0x00000800 cnt: 0x0030, mh_fd_link: 0xacb50bf0
mp_treebin[06] - sz: 0x00000c00 cnt: 0x0273, mh_fd_link: 0xacffb828
mp_treebin[07] - sz: 0x00001000 cnt: 0x004f, mh_fd_link: 0xa9506690
mp_treebin[08] - sz: 0x00001800 cnt: 0x003e, mh_fd_link: 0xacb4d448
mp_treebin[09] - sz: 0x00002000 cnt: 0x0010, mh_fd_link: 0xac74f1e8
mp_treebin[10] - sz: 0x00003000 cnt: 0x0024, mh_fd_link: 0xac781f00
mp_treebin[11] - sz: 0x00004000 cnt: 0x0028, mh_fd_link: 0xacf9e618
mp_treebin[12] - sz: 0x00006000 cnt: 0x009b, mh_fd_link: 0xac795fc0
mp_treebin[13] - sz: 0x00008000 cnt: 0x000b, mh_fd_link: 0xacae3998
mp_treebin[14] - sz: 0x0000c000 cnt: 0x0026, mh_fd_link: 0xad003428
mp_treebin[15] - sz: 0x00010000 cnt: 0x000b, mh_fd_link: 0xacab70b8
...
mp_treebin[30] - sz: 0x00c00000 cnt: 0x0001, mh_fd_link: 0xaae411d0
mp_treebin[31] - sz: 0xffffffff cnt: 0x0001, mh_fd_link: 0xab641700 [UNSORTED]

You can see that an mp_mstate structure is now shown after the mstate structure, which was populated by the callback implemented in libmempool.

When we used dlmstate above, we didn’t provide the address we had originally passed to it. This highlights a ‘caching’ feature: libdlmalloc caches a copy of the mstate structure and its address, and uses them on subsequent calls unless an address is explicitly specified. This is convenient on systems where you’re debugging over serial, where reading a large amount of data can be cumbersome. The caveat is that you can sometimes forget you’re looking at a stale copy of the mstate structure that does not reflect the latest allocations.

Future libdlmalloc development

In the future we hope to properly abstract out the gdb-related portions so that the tool could be easily integrated into other debuggers, in a similar vein to what argp did with shadow, his jemalloc analysis tool. We hope to eventually test on more versions of dlmalloc-2.8.x and with additional compile time options, as currently all testing has been on Cisco devices. We would like to implement other commands such as allowing for easy bin walking and searching, as well as linear in-use searching in a given segment or series of segments.

If you want to play around with libdlmalloc_28x.py, you can compile one of the malloc-2.8.x.c files from the official server [5] and give it a try.

Cisco implementation of dlmalloc

As noted many times now, the libdlmalloc tool was developed while researching Cisco ASA devices, and as such it has been specifically tested on this platform. This means that it is currently biased towards dlmalloc-2.8.3 despite supporting 2.8.x in general, so expect some inconsistencies if looking at other versions.

On 32-bit and old 64-bit Cisco ASA devices, lina uses a version of dlmalloc compiled directly into its ELF binary, rather than relying on an external version provided in something like glibc or some other library. lina does in fact link to glibc, but due to the way it wraps allocation functions it won’t end up using them. Rather, it calls directly into the dlmalloc functions. We will talk more about the wrappers in another article.

Identifying dlmalloc version used in lina

When reversing lina, we identified several assertions, such as the one below:

.text:09BE4DE0 loc_9BE4DE0:
.text:09BE4DE0 mov edx, ebx
.text:09BE4DE2 mov ecx, offset aHeapMemoryCorr ; "Heap memory corrupted"
.text:09BE4DE7 sub edx, [ebx] ; unk
.text:09BE4DE9 mov eax, ebx ; chunk_addr
.text:09BE4DEB call print_checkheaps_failure
.text:09BE4DF0 mov dword ptr [esp+8], 0B0Ah
.text:09BE4DF8 mov dword ptr [esp+4], offset aMalloc_c ; "malloc.c"
.text:09BE4E00 mov dword ptr [esp], offset aNext_pinuseP_0 ; "(next_pinuse(p))"
.text:09BE4E07 call __lina_assert

This can easily be matched to the dlmalloc [2] heap allocator source code. It is worth identifying the exact version to get a good idea of what we’re up against and to see whether it supports things like safe unlinking.

Looking at lina, starting from validate_buffers() and mspace_malloc(), we identify the following functions: check_free_chunk, check_inuse_chunk, check_mmapped_chunk and check_top_chunk. They can be found by looking at the debugging strings passed to __lina_assert. After dumping all malloc*.c files from the official website [18] into a local directory, we can easily deduce that it is almost certainly version 2.8.3, or a modification of it, based on two debugging strings being used:

dlmalloc$ grep "(next == m->top || cinuse(next))" *
malloc-2.8.0.c: assert (next == m->top || cinuse(next));
malloc-2.8.1.c: assert (next == m->top || cinuse(next));
malloc-2.8.2.c: assert (next == m->top || cinuse(next));
malloc-2.8.3.c: assert (next == m->top || cinuse(next));
dlmalloc$ grep "segment_holds(sp, (char*)sp)" *
malloc-2.8.3.c: assert(segment_holds(sp, (char*)sp));
malloc-2.8.4.c: assert(segment_holds(sp, (char*)sp));
malloc-2.8.5.c: assert(segment_holds(sp, (char*)sp));
malloc-2.8.6.c: assert(segment_holds(sp, (char*)sp));

This means the dlmalloc version embedded in lina does not have safe unlinking!
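To illustrate why the missing mitigation matters, here is a minimal Python model (our own illustration, not Cisco's code) of dlmalloc-2.8.3's unchecked small-bin unlink next to a glibc-style safe unlink, with memory modelled as a dict of 32-bit words:

```python
# Memory is modelled as {address: value}. dlmalloc-2.8.3 unlinks a small
# chunk with F->bk = B; B->fd = F and no sanity checks, so corrupted fd/bk
# pointers yield an arbitrary write. Offsets: fd at +8, bk at +12 (32-bit).
def unlink_283(mem, chunk):
    F = mem[chunk + 8]    # chunk->fd (attacker-controlled on corruption)
    B = mem[chunk + 12]   # chunk->bk
    mem[F + 12] = B       # F->bk = B  -- unchecked write
    mem[B + 8] = F        # B->fd = F  -- unchecked write

def unlink_safe(mem, chunk):
    # glibc-style safe unlinking, absent from dlmalloc-2.8.3
    F, B = mem[chunk + 8], mem[chunk + 12]
    if mem[F + 12] != chunk or mem[B + 8] != chunk:
        raise RuntimeError('corrupted double-linked list')
    mem[F + 12] = B
    mem[B + 8] = F
```

With fd set to target-12 and bk set to an arbitrary value, unlink_283() writes that value to target, the classic unlink primitive that the safe variant refuses.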

As a side note, you might wonder what this validate_buffers() function is and why it’s packed with all these asserts. This will be covered in our blog post dedicated to Checkheaps.

Cisco build constants

Given that we know we’re looking at dlmalloc-2.8.3, we still need to poke around and work out other configuration details. Based on other asserts and on testing, we can infer that the following configuration constants of interest were used:

MSPACES          1
FOOTERS 1
USE_LOCKS 1
ONLY_MSPACES 1
DEBUG 1
INSECURE 0
USE_DEV_RANDOM 0
PROCEED_ON_ERROR 0

Most notable is that DEBUG is set, which means a significant number of chunk validation routines will be called, though as we described earlier they’re not always enough to catch corruption.

Working out these types of build-time constants for a target system can help you compile a toy build of dlmalloc for offline testing, which you can use to try out ideas without having to debug a live device (which can be slow over serial).

Static magic and static mstates

Earlier we described the concepts of FOOTERS and MSPACES and how these provide a security mechanism effectively by serving as a form of cookie at the end of a chunk. We showed that USE_DEV_RANDOM could further improve the random value selected for use as the magic member. Let’s take a look at how all of these available pieces can still not come together correctly when looking at the implementation of this logic on the Cisco ASA (analysis taken from 9.2.4 32-bit).

If we return to init_param(), in Cisco’s case they clearly do not use the USE_DEV_RANDOM constant and therefore rely on a time API for their random value. Specifically, a function __wrap_time(0) is called. This in turn calls unix_time(), which in turn calls clock_get_time() and clock_epoch_to_unix_time().

.text:09BE4AE0 loc_9BE4AE0: ; CODE XREF: create_mspace_with_base+E
.text:09BE4AE0 mov ds:mparams_mmap_threshold, 40000h
.text:09BE4AEA mov ds:mparams_trim_threshold, 200000h
.text:09BE4AF4 mov ds:mparam_default_mflags, 3
.text:09BE4AFE mov dword ptr [esp], 0
.text:09BE4B05 call __wrap_time
.text:09BE4B0A mov edx, ds:mparams_magic
.text:09BE4B10 test edx, edx
.text:09BE4B12 jnz short loc_9BE4B2E
.text:09BE4B14 xor eax, 55555555h
.text:09BE4B19 or eax, 8
.text:09BE4B1C and eax, 0FFFFFFF8h
.text:09BE4B1F mov ds:mparams_magic, eax
.text:09BE4B24 mov eax, ds:mparam_default_mflags
.text:09BE4B29 mov ds:_gm__mflags, eax

The magic member of mparams is initially set up when creating a new mspace via create_mspace_with_base().

__wrap_time() is defined as follows:

.text:09C00D70 arg_0             = dword ptr 8
.text:09C00D70
.text:09C00D70 push ebp
.text:09C00D71 mov ebp, esp
.text:09C00D73 push ebx
.text:09C00D74 sub esp, 4
.text:09C00D77 mov ebx, [ebp+arg_0]
.text:09C00D7A call unix_time
.text:09C00D7F test ebx, ebx
.text:09C00D81 jz short loc_9C00D85
.text:09C00D83 mov [ebx], eax
.text:09C00D85
.text:09C00D85 loc_9C00D85: ; CODE XREF: __wrap_time+11
.text:09C00D85 add esp, 4
.text:09C00D88 pop ebx
.text:09C00D89 pop ebp
.text:09C00D8A retn
.text:09C00D8A endp

The interesting thing about this is that at the time init_param() runs from create_mspace_with_base(), it is early enough during boot that the __wrap_time(0) call will always end up returning a static value. This value appears to be the NTP epoch rollover value 0x7c558180, corresponding to ‘Wed Feb 6 23:28:16 2036’. This static value is returned because clock_get_time() reads the current system time from a global variable that isn’t yet initialised this early in boot, so clock_get_time() returns 0. This is then passed to clock_epoch_to_unix_time(), which tries to convert it by doing the following:

.text:09BC4E70 arg_0            = dword ptr 8
.text:09BC4E70
.text:09BC4E70 push ebp
.text:09BC4E71 mov ebp, esp
.text:09BC4E73 mov eax, [ebp+arg_0]
.text:09BC4E76 pop ebp
.text:09BC4E77 mov eax, [eax]
.text:09BC4E79 add eax, 7C558180h
.text:09BC4E7E retn
.text:09BC4E7E endp
.text:09BC4E7E

This leads to the result of 0x7c558180 being returned. When XORed with 0x55555555 and bit-fiddled as in the dlmalloc source, the resulting magic value is 0x2900d4d8:

s = (size_t)(time(0) ^ (size_t)0x55555555U);
s |= (size_t)8U;  /* ensure nonzero */
s &= ~(size_t)7U; /* improve chances of fault for bad values */
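The same computation can be reproduced offline; a small sketch in Python (32-bit arithmetic, using the static __wrap_time(0) seed described above):

```python
# Reproduce dlmalloc's magic computation with the boot-time static seed
# 0x7c558180, mirroring: s = time(0) ^ 0x55555555; s |= 8; s &= ~7.
def dlmalloc_magic(seed):
    s = (seed ^ 0x55555555) & 0xffffffff
    s |= 8                 # ensure nonzero
    s &= 0xfffffff8        # improve chances of fault for bad values
    return s
```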

We can confirm that this is in fact the value by looking at the magic value output from calling the dlmstate command:

(gdb) dlmstate 0xa8400008
struct dl_mstate @ 0xa8400008 {
...
magic = 0x2900d4d8
...

If you’re playing around with a Cisco ASA device you could confirm that this magic value never changes across builds or reboots on your system. So, why is this significant?

Imagine a system with no ASLR where a constant mparams.magic value is used to produce these footers. The mstate address should be relatively predictable, and mparams.magic is entirely predictable. As such, we can predict the value of prev_foot, which means that if we’re abusing a memory corruption bug on a Cisco ASA device, we have a good chance of bypassing the footer checks.

As an example using the addresses and values previously shown in this article, our mstate is at 0xa8400008 and mparams.magic is 0x2900d4d8. This means we’d expect the prev_foot values to be 0xa8400008^0x2900d4d8=0x8140d4d0. Let’s take a look at an in-use chunk:

(gdb) dlchunk -x 0xacfe5430-8
0xacfe5428 M sz:0x00030 fl:CP alloc_pc:0x09bec1ae,-
0x28 bytes of chunk data:
0xacfe5430: 0xa11c0123 0x00000004 0x00000000 0x00000000
0xacfe5440: 0xacfee540 0xa84002a4 0x09bec1ae 0x00000000
0xacfe5450: 0xacfe9ab0 0xa11ccdef

We see an in-use chunk of 0x30 bytes. Next we look at the prev_foot value of the chunk adjacent to it:

(gdb) dlchunk 0xacfe5458 -v
struct malloc_chunk @ 0xacfe5458 {
prev_foot = 0x8140d4d0
head = 0x10 (PINUSE)
fd = 0xacff2ca0
bk = 0xacff3e58

As we can see, the prev_foot value is exactly what we expected.

How predictable are the mstate addresses in practice? On our Cisco ASA 5505 devices we’ve only ever observed two addresses for the most heavily used mspace. There are, however, multiple mspaces in use on this device; one for general shared heap allocations and one dedicated to DMA-related allocations. We found that the mstate for the global shared mempool that most allocations come from is almost always at 0xa8800008 or 0xa8400008.

However, this variance can reduce the chances of successful exploitation if you were to get it wrong. This isn’t ideal. But it turns out another customisation Cisco made to dlmalloc makes all of what we just described irrelevant for the security of the allocator anyway. As we showed earlier, normally the beginning of a call like mspace_free() is as follows:

void mspace_free(mspace msp, void* mem) {
  if (mem != 0) {
    mchunkptr p = mem2chunk(mem);
#if FOOTERS
    mstate fm = get_mstate_for(p);
    msp = msp; /* placate people compiling -Wunused */
#else /* FOOTERS */
    mstate fm = (mstate)msp;
#endif /* FOOTERS */
    if (!ok_magic(fm)) {
      USAGE_ERROR_ACTION(fm, p);
      return;
    }

However, upon reversing Cisco’s implementation of mspace_free(), it turns out they have a custom modification where they assume that the correct mspace is passed if non-NULL and don’t rely on using the pointer derived from a footer at all!

/* Approximate implementation on Cisco ASA 9.2.4 */
void mspace_free(mspace msp, void* mem) {
  if (mem != 0) {
    mchunkptr p = mem2chunk(mem);
    mstate fm;

    if (msp != NULL) {
      fm = (mstate)msp;
    } else {
      fm = get_mstate_for(p);
    }
    if (!ok_magic(fm)) {
      USAGE_ERROR_ACTION(fm, p);
      return;
    }

A similar lack of checking is present in mspace_realloc().

This means we don’t have to correctly guess the footer at all! Interestingly, you can confirm this in practice if you’re testing Exodus Intel’s CVE-2016-1287 exploit. They use a static prev_foot that, when decoded, corresponds to an mstate located at 0xc8000008. With an unmodified dlmalloc-2.8.3 this should fail on our systems, where we’ve been seeing 0xa8400008 and 0xa8800008, but it still succeeds!

A quick look at dlmalloc on 64-bit

All of our earlier examples were 32-bit output taken from a 32-bit Cisco ASA device. One interesting thing to note is that for general heap allocations, 64-bit Cisco ASA devices use dlmalloc only to track a special mempool-bookkeeping structure (a custom extension to the dlmalloc mstate structure) and nothing else.

Let’s take a look. In the example below we’ve found the mstate chunk and it starts at 0x7ffff7ff7000. If we list the chunks from this point we see:

(gdb) dlchunk -c 4 0x7ffff7ff7000
0x7ffff7ff7000 M sz:0x010c0 fl:CP alloc_pc:0x00000000,-
0x7ffff7ff80c0 F sz:0x00ee0 fl:-P free_pc:0x00000000,-
0x7ffff7ff8fa0 F sz:0x00060 fl:-- free_pc:0x00000000,-
<<< end of heap segment >>>

In the example above, the 0xee0 chunk is the top chunk and the 0x60 chunk is the special dlmalloc segment footer. If we take a look at the actual mstate structure we see the following:

(gdb) dlmstate 0x7ffff7ff7010
struct dl_mstate @ 0x7ffff7ff7010 {
smallmap = 0b000000000000000000000000000000
treemap = 0b000000000000000000000000000000
dvsize = 0x0
topsize = 0xee0
least_addr = 0x7ffff7ff7000
dv = 0x0
top = 0x7ffff7ff80c0
trim_check = 0x200000
magic = 0x2900d4d8
smallbin[00] (sz 0x0) = 0x7ffff7ff7050, 0x7ffff7ff7050 [EMPTY]
smallbin[01] (sz 0x8) = 0x7ffff7ff7060, 0x7ffff7ff7060 [EMPTY]
smallbin[02] (sz 0x10) = 0x7ffff7ff7070, 0x7ffff7ff7070 [EMPTY]
smallbin[03] (sz 0x18) = 0x7ffff7ff7080, 0x7ffff7ff7080 [EMPTY]
smallbin[04] (sz 0x20) = 0x7ffff7ff7090, 0x7ffff7ff7090 [EMPTY]
...
smallbin[29] (sz 0xe8) = 0x7ffff7ff7220, 0x7ffff7ff7220 [EMPTY]
smallbin[30] (sz 0xf0) = 0x7ffff7ff7230, 0x7ffff7ff7230 [EMPTY]
smallbin[31] (sz 0xf8) = 0x7ffff7ff7240, 0x7ffff7ff7240 [EMPTY]
treebin[00] (sz 0x180) = 0x0 [EMPTY]
treebin[01] (sz 0x200) = 0x0 [EMPTY]
treebin[02] (sz 0x300) = 0x0 [EMPTY]
treebin[03] (sz 0x400) = 0x0 [EMPTY]
treebin[04] (sz 0x600) = 0x0 [EMPTY]
...
treebin[29] (sz 0x800000) = 0x0 [EMPTY]
treebin[30] (sz 0xc00000) = 0x0 [EMPTY]
treebin[31] (sz 0xffffffff) = 0x0 [EMPTY]
footprint = 0x2000
max_footprint = 0x2000
mflags = 0x7
mutex = 0x0,0x0,0x55555e506260,0x0,0x0,0x7ffff7ff7000,
seg = struct malloc_segment @ 0x7ffff7ff73a0 {
base = 0x7ffff7ff7000
size = 0x2000
next = 0x0
sflags = 0x8

We see that it’s basically empty and that the segment for the heap is only 0x2000 bytes, corresponding to the chunks we saw before. This is because the 64-bit firmware relies on glibc, whose allocator is a modified ptmalloc2, to service all actual allocations.

Conclusions

We took a look at dlmalloc-2.8.x and noted the important differences between this version and dlmalloc-2.7.x. We also introduced a gdb-based Python script called libdlmalloc, which is fully integrated with asadbg. libdlmalloc is designed to aid in the analysis of dlmalloc-2.8.x heaps, and we demonstrated its use on a real-world Cisco ASA 5505 system running 9.2.4, along with some examples of how the allocator is configured and customised on these devices and how those changes impact the security of the system.

Hopefully this post also highlighted the importance of being explicit when documenting heap exploitation research and heap tools. As time goes on, there will be an increasing number of heap versions, branches, and forks, many of them introducing their own subtle changes, so it will become increasingly important to clarify exactly which versions a piece of research pertains to and has been tested on.


We would appreciate any feedback or corrections. You can test out the libdlmalloc code and feel free to send pull requests for any issues you have. The tool is not perfect, so don’t be surprised if you run into bugs. If you would like to contact us we can be reached by email or twitter: aaron(dot)adams(at)nccgroup(dot)trust / @fidgetingbits and cedric(dot)halbronn(at)nccgroup(dot)trust / @saidelike.

Read all posts in the Cisco ASA series

References

[1] https://github.com/nccgroup/libdlmalloc

[2] http://g.oswego.edu/dl/html/malloc.html

[3] http://www.malloc.de/en/

[4] https://android.googlesource.com/platform/bionic/+log/8921060/libc/upstream-dlmalloc/malloc.c

[5] http://g.oswego.edu/pub/misc/

[6] http://g.oswego.edu/pub/misc/malloc.c

[7] https://dl.packetstormsecurity.net/papers/attack/MallocMaleficarum.txt

[8] http://phrack.org/issues/66/10.html

[9] http://phrack.org/issues/67/8.html

[10] https://repo.zenk-security.com/Techniques%20d.attaques%20%20.%20%20Failles/Mobile%20Pwn2Own%20Autumn%202013%20-%20Chrome%20on%20Android%20-%20Exploit%20Writeup.pdf

[11] https://android.googlesource.com/platform/bionic/+log/8921060/libc/upstream-dlmalloc/malloc.c

[12] https://blog.exodusintel.com/2016/02/10/firewall-hacking/

[13] https://github.com/nccgroup/libtalloc

[14] https://github.com/CENSUS/shadow/

[15] https://github.com/cloudburst/libheap

[16] https://census-labs.com/media/shadow-infiltrate-2017.pdf

[17] https://en.wikipedia.org/wiki/Trie

[18] http://gee.cs.oswego.edu/pub/misc/

[19] https://github.com/nccgroup/asadbg/

Published date:  09 October 2017

Written by:  Aaron Adams and Cedric Halbronn