Originally written by Jack Leadford
Introduction
In their eternal quest for performance, compilers like GCC perform clever optimizations behind the scenes to make your code run faster. One example is adding padding to struct objects so that accesses to their members are memory-aligned and therefore more efficient.
However, this can lead to subtle vulnerabilities!
In this post we will:
- Take a detailed stroll through what structs are and how GCC uses padding bytes to achieve efficient alignment of members
- Look at how this padding can lead to a process disclosing stack memory, with two examples: an RPC server and a kernel driver
- Discuss how this issue can be fixed
C Structures
To start off, let us understand what a structure is in the C programming language.
A structure, or struct, groups some logically tied data; each item of data is known as a struct member. The intention is similar to something you may be more familiar with from other languages, classes: we can reuse the notion of some “thing”, expect it to have certain attributes, and read and write them.
As an example, consider the following struct declaration, representing users in a hypothetical program:
#include <stdbool.h>

struct user {
    int userid;
    bool superuser;
};
When we operate on a user in our program, we declare an associated struct, and then we have access to the members we expect users to have (such as their userid).
Besides declaration, there are initialization and assignment of structs.
You can control initialization to give (some or all) members a value when creating an instance of some particular struct.
#include <stdbool.h>

struct user {
    long userid;
    bool superuser;
};

int main(void)
{
    struct user alice = { 1, true };
    return 0;
}
And assignment is used to fill in members’ values after declaration or initialization.
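As a minimal sketch of assignment, the code below reuses the two-member struct user from the initialization example above. The helper make_admin is hypothetical and exists only for illustration: it declares a struct first and then assigns each member.

```c
#include <assert.h>
#include <stdbool.h>

struct user {
    long userid;
    bool superuser;
};

/* Hypothetical helper: declare first, then assign each member. */
struct user make_admin(long id)
{
    struct user u;       /* members hold indeterminate values here */
    u.userid = id;       /* member assignment after declaration */
    u.superuser = true;
    return u;            /* the whole struct is copied to the caller */
}
```

A whole struct can also be assigned in one statement (e.g. bob = alice;), which copies every member.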
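You do not have to initialize a struct’s members, however. The following program declares a struct user without initializing it and prints the bytes of its name member:
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

struct user {
    long userid;
    bool superuser;
    unsigned char name[10];
};

int main(void)
{
    struct user alice;   /* deliberately left uninitialized */
    size_t len = sizeof(alice.name);

    for (size_t i = 0; i < len; i++) {
        printf("%02X\n", alice.name[i]);
    }
    return 0;
}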
Note that the program above prints some junk to stdout on a commonplace platform. [1] This is because an uninitialized member has an indeterminate value, and in practice it simply holds whatever latent stack memory was already there; we will see more of this later on. You can avoid this by, for example, initializing at least one member whenever you place a struct on the stack: the compiler will zero or null the omitted members. [2][3] This can be somewhat unexpected, so you can use GCC’s -Wmissing-field-initializers to warn on members that are zeroed because they lack an initializer, [4] but note that this flag doesn’t catch an empty initializer list (i.e., struct foo bar = {};), which still zeroes members [5] and padding bytes.
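The zeroing of omitted members can be checked with a small sketch using the three-member struct user from the example above. make_guest and name_is_zeroed are hypothetical helpers; note that this only checks members, not padding bytes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct user {
    long userid;
    bool superuser;
    unsigned char name[10];
};

/* Hypothetical helper: only userid has an initializer, so the omitted
   members (superuser and name) are zeroed: false and all-zero bytes. */
struct user make_guest(long id)
{
    struct user u = { .userid = id };
    return u;
}

/* Returns true if every element of name[] is zero. */
bool name_is_zeroed(const struct user *u)
{
    for (size_t i = 0; i < sizeof u->name; i++) {
        if (u->name[i] != 0)
            return false;
    }
    return true;
}
```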
GCC also has a -Wuninitialized flag, which will alert on code that uses uninitialized memory, including struct members, [6] though it has some open areas of improvement. [7]
The compiler will not reorder members in the struct’s memory layout (C requires them to appear in declaration order), but some tricks are indeed performed on that layout, which we will cover later on. For more background, check out this page. [8]
Another note: some references are live examples on the Godbolt Compiler Explorer, a fantastic open source web application that makes it easy to illustrate and share how code is translated into a binary. Check it out here! [9]
Padding and Alignment
As previously mentioned, there are some particulars to a struct’s memory layout. This is an artifact of the machine. Quoting from the Linux kernel’s documentation: [10]
Unaligned memory accesses occur when you try to read N bytes of data starting from an address that is not evenly divisible by N (i.e. addr % N != 0). For example, reading 4 bytes of data from address 0x10004 is fine, but reading 4 bytes of data from address 0x10005 would be an unaligned memory access.
If a memory access, such as one for a struct’s member, is misaligned, Bad Things occur, which can include degraded performance (since multiple memory accesses, each of word size, must be performed) or exceptions (e.g., on various ARM architectures when the MMU is disabled). Modern CPUs are less sensitive to such requirements. [11]
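The divisibility rule quoted from the kernel documentation can be written down directly. This is a sketch: is_aligned and int_ptr_is_aligned are hypothetical helpers, and _Alignof (C11) reports the alignment the compiler requires for a type on the current platform.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* An N-byte access at addr is aligned when addr % N == 0,
   matching the definition quoted from the kernel documentation. */
bool is_aligned(uintptr_t addr, size_t n)
{
    return addr % n == 0;
}

/* True if p satisfies the alignment the compiler requires for int. */
bool int_ptr_is_aligned(const void *p)
{
    return is_aligned((uintptr_t)p, _Alignof(int));
}
```

With the addresses from the quote, is_aligned(0x10004, 4) is true and is_aligned(0x10005, 4) is false.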
To ensure proper alignment in memory, the compiler will step in here and add padding to our structs.
We take the following example from this [12] helpful blog post by katecpp:
The key insight to observe in the example above is the presence of padding, which is used to achieve alignment among members. The exact behavior is implementation-defined.
If it is unclear what the padding may be in some scenario, it is easiest to write some example code, and compile and run it using the compiler and platform you expect your code to run on (the Compiler Explorer is great for this). More illustrative examples and general rules can be found here. [13] [14]
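Following that advice, a sketch like the one below prints the layout of struct test (an int, a char, and an int), the struct used in the kernel driver and server examples later in this post. offsetof and sizeof are standard; padding_after_b and print_layout are hypothetical helpers. On common platforms with a 4-byte, 4-byte-aligned int (such as x86-64), it typically reports 3 padding bytes after b and a total size of 12, but the exact numbers are implementation-defined.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

struct test {
    int a;
    char b;
    int c;
};

/* These properties hold on any conforming implementation:
   no leading padding, declaration order, and c aligned for int. */
static_assert(offsetof(struct test, a) == 0, "no padding before a");
static_assert(offsetof(struct test, b) < offsetof(struct test, c),
              "members are laid out in declaration order");
static_assert(offsetof(struct test, c) % _Alignof(int) == 0,
              "c is aligned for int");

/* Number of padding bytes the compiler inserted between b and c. */
size_t padding_after_b(void)
{
    return offsetof(struct test, c)
         - (offsetof(struct test, b) + sizeof(char));
}

void print_layout(void)
{
    printf("a at %zu, b at %zu, c at %zu, sizeof = %zu\n",
           offsetof(struct test, a), offsetof(struct test, b),
           offsetof(struct test, c), sizeof(struct test));
    printf("padding between b and c: %zu bytes\n", padding_after_b());
}
```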
The reader may ask, “well, where do the values of the padding bytes come from?” C17, for example, notes: “…any padding bytes take unspecified values…” [15] This behavior was discussed at some length during Linux kernel development, [16] as it was not understood what GCC does with respect to padding bytes. In short, padding bytes are uninitialized, which in practice means they hold whatever values were previously written to that part of the stack, because the stack pointer is simply moved when creating a new stack frame (holding the struct). See, for example, this paper, [17] which relies on this behavior to “spray” a controlled value into an uninitialized read. This is a different setting, but hopefully it illustrates the behavior of such memory.
In fact, this is why comparing structs with, say, memcmp does not work, [18] [19] as you end up comparing the “unspecified” padding bytes as well; you must compare member by member.
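A member-by-member comparison for a struct test like the one in the SEI CERT example later in this post (an int, a char, and an int) might look like the sketch below. test_equal and fill_test are hypothetical helpers. fill_test deliberately pre-fills the object with a chosen byte before assigning members, so two logically equal structs can end up with different padding bytes, which is exactly the case where memcmp can give the wrong answer.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

struct test {
    int a;
    char b;
    int c;
};

/* Hypothetical helper: pre-fill every byte (padding included) with
   `fill`, then assign the members. Different fills typically leave
   different padding contents behind. */
void fill_test(struct test *t, unsigned char fill, int a, char b, int c)
{
    memset(t, fill, sizeof *t);
    t->a = a;
    t->b = b;
    t->c = c;
}

/* Compare members only; padding bytes are ignored. */
bool test_equal(const struct test *x, const struct test *y)
{
    return x->a == y->a && x->b == y->b && x->c == y->c;
}
```

For two structs filled with 0x00 and 0xFF respectively but given identical members, test_equal reports them equal, while memcmp(&x, &y, sizeof x) would typically report a difference.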
It is worth noting that the C11 standard attempted to better define what happens with padding bytes, and in some cases they may be initialized as per C11; for example, when an incomplete designated initializer list for the struct is used [20] (i.e., not all members have initializers [21]). However, it appears [22] that this has been the behavior in practice for some time, and the behavior’s utility [23] was the reasoning behind suppressing -Wmissing-field-initializers (included in -Wextra) in some cases. [24] While writing this post, the author observed that some versions of GCC (experimentally, >= 4.7, < 8.0) do not zero padding when an empty initializer list is used under a certain code pattern: the struct’s members are assigned, the entire struct (i.e., sizeof(STRUCTNAME)) is then memcpy’d into an intermediate buffer, and that buffer is what the code uses going forward. [25] This appears to stem from how optimization passes interact with GCC’s built-in memcpy, since passing -fno-builtin-memcpy restores the expected behavior. [26]
Disclosing Bytes
So far, we have discussed structs; specifically, that it is common for a struct’s underlying block of memory to contain “unspecified” padding bytes (i.e., uninitialized stack memory). Now, we’ll see how this can be exploited, in a few different scenarios.
Kernel Drivers
This case is well-documented, and we will discuss SEI CERT’s example [27] code:
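#include <stddef.h>

struct test {
    int a;
    char b;
    int c;
};

/* Simplified declaration for illustration; the kernel's real copy_to_user()
   is declared in <linux/uaccess.h> with a slightly different signature. */
extern int copy_to_user(void *dest, void *src, size_t size);

void do_stuff(void *usr_buf)
{
    struct test arg = {.a = 1, .b = 2, .c = 3};

    /* Copies all sizeof(arg) bytes, padding included, to userspace. */
    copy_to_user(usr_buf, &arg, sizeof(arg));
}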
The function above, do_stuff, takes in a pointer to some memory address in userspace, initializes a struct on the stack (with associated padding bytes), and then uses the kernel API copy_to_user to copy the struct’s memory to the userspace address. This API performs some validation, but is largely a memcpy, [28] [29] [30] so all of the struct’s bytes are copied to the userspace address, including the uninitialized padding bytes (or uninitialized members).
See an illustration of the disclosure described above, from a blog post by Alexander Popov. [31]
This has been the source of several bugs in the Linux kernel [32] [33], and it is a common pattern in driver code reachable through a user’s ioctl: process the user’s input, and return the result as a struct that is copied to a user-supplied address.
Netcode
This case is less well-known, though it is mentioned in this blog post, [34] and we have seen it in the wild, so it bears mentioning.
It resembles the kernel driver case outlined above — padding bytes are disclosed across a trust boundary — but in this case, a server process discloses memory to a client.
This is a simple TCP “RPC server” that can be compiled with gcc -g server.c -o server and run with ./server PORT. Its struct has the same layout as the kernel driver example, and thus discloses three bytes of stack memory.
// adapted from http://www.cs.rpi.edu/~moorthy/Courses/os98/Pgms/socket.html
#include <netinet/in.h>
#include <stdio.h>
#include <stdlib.h>
#include <strings.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

void error(char *msg)
{
    perror(msg);
    exit(1);
}

struct responsepacket {
    int timestamp;  // cmd timestamp
    char response;  // noting a response packet
    int command;    // which command was issued
};

int main(int argc, char *argv[])
{
    int sockfd, newsockfd, portno;
    socklen_t clilen;
    char buffer[256];
    struct sockaddr_in serv_addr, cli_addr;
    int n;

    if (argc < 2) {
        fprintf(stderr, "ERROR, no port provided\n");
        exit(1);
    }
    sockfd = socket(AF_INET, SOCK_STREAM, 0);
    if (sockfd < 0)
        error("ERROR opening socket");
    bzero((char *) &serv_addr, sizeof(serv_addr));
    portno = atoi(argv[1]);
    serv_addr.sin_family = AF_INET;
    serv_addr.sin_addr.s_addr = INADDR_ANY;
    serv_addr.sin_port = htons(portno);
    if (bind(sockfd, (struct sockaddr *) &serv_addr, sizeof(serv_addr)) < 0)
        error("ERROR on binding");
    listen(sockfd, 5);
    clilen = sizeof(cli_addr);
    newsockfd = accept(sockfd, (struct sockaddr *) &cli_addr, &clilen);
    if (newsockfd < 0)
        error("ERROR on accept");
    bzero(buffer, 256);
    n = read(newsockfd, buffer, 255);
    if (n < 0)
        error("ERROR reading from socket");

    int cmd = 0;
    cmd = atoi(buffer);
    switch (cmd) {
    case 1: {
        printf("foo");
        time_t timestamp;
        struct responsepacket foo = {.timestamp = time(&timestamp), .response = 1, .command = cmd};
        FILE *file = fopen("/tmp/foo", "w");
        int i = 1;
        fprintf(file, "%d", i);
        fclose(file);
        // writes all sizeof(foo) bytes, including the padding after .response
        n = write(newsockfd, &foo, sizeof(foo));
        if (n < 0)
            error("ERROR writing to socket");
        break;
    }
    case 2: {
        printf("bar");
        time_t timestamp;
        struct responsepacket bar = {.timestamp = time(&timestamp), .response = 1, .command = cmd};
        FILE *file = fopen("/tmp/bar", "w");
        int i = 1;
        fprintf(file, "%d", i);
        fclose(file);
        n = write(newsockfd, &bar, sizeof(bar));
        if (n < 0)
            error("ERROR writing to socket");
        break;
    }
    case 3: {
        time_t timestamp;
        printf("clear");
        struct responsepacket clear = {.timestamp = time(&timestamp), .response = 1, .command = cmd};
        remove("/tmp/foo");
        remove("/tmp/bar");
        n = write(newsockfd, &clear, sizeof(clear));
        if (n < 0)
            error("ERROR writing to socket");
        break;
    }
    default: {
        printf("baz");
        time_t timestamp;
        struct responsepacket baz = {.timestamp = time(&timestamp), .response = 1, .command = cmd};
        n = write(newsockfd, &baz, sizeof(baz));
        if (n < 0)
            error("ERROR writing to socket");
        break;
    }
    }
    return 0;
}
This is a simple “RPC client”, which can be run with python3 client.py PORT COMMAND. The command 1 creates a file at /tmp/foo, 2 creates a file at /tmp/bar, and 3 deletes these files.
#!/usr/bin/env python3
import socket
import sys

TCP_IP = '127.0.0.1'
TCP_PORT = int(sys.argv[1])
BUFFER_SIZE = 1024
MESSAGE = bytes(sys.argv[2], "UTF-8")

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect((TCP_IP, TCP_PORT))
s.send(MESSAGE)
data = s.recv(BUFFER_SIZE)
s.close()

print("Time is: %d" % (int.from_bytes(data[:4], "little")))
if data[4] == 0x01:
    response = True
else:
    response = False
print("Is this a response packet: %s" % (response))
print("Command was: %d" % (int.from_bytes(data[8:], "little")))
print("\n")
print("----")
print(data)
print("----")
Reviewing three subsequent runs (of python3 client.py 1340 1), we can see that, in addition to the response byte, three padding bytes (i.e., uninitialized stack memory from the server) are returned to our client, as the entire struct (including padding bytes) is written to the response packet with write(2).
Note the change in the three bytes within the packet’s second field, after the timestamp’s four bytes at the beginning and immediately after the static byte 0x01 indicating a response packet.
Time is: 1566977146
Is this a response packet: True
Command was: 1
----
b'z,f]\x01\xea\x9aK\x01\x00\x00\x00'
----

Time is: 1566977158
Is this a response packet: True
Command was: 1
----
b'\x86,f]\x01e\xd2T\x01\x00\x00\x00'
----

Time is: 1566977465
Is this a response packet: True
Command was: 1
----
b'\xb9-f]\x01W\xf0\xd2\x01\x00\x00\x00'
----
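To make the leak concrete, the sketch below splits the first 12-byte response above into its fields, using the layout of the server’s struct responsepacket on a typical x86-64 build: a 4-byte little-endian timestamp, a 1-byte response flag, 3 padding bytes, and a 4-byte little-endian command. The names decoded_response, load_le32, and decode_response are hypothetical, and the offsets are an assumption about that platform’s layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of one 12-byte response packet. */
struct decoded_response {
    uint32_t timestamp;      /* bytes 0..3, little-endian */
    unsigned char response;  /* byte 4 */
    unsigned char leaked[3]; /* bytes 5..7: the server's padding bytes */
    uint32_t command;        /* bytes 8..11, little-endian */
};

/* Assemble a 32-bit value from 4 little-endian bytes. */
static uint32_t load_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

struct decoded_response decode_response(const unsigned char pkt[12])
{
    struct decoded_response d;
    d.timestamp = load_le32(pkt);
    d.response = pkt[4];
    for (size_t i = 0; i < 3; i++)
        d.leaked[i] = pkt[5 + i];
    d.command = load_le32(pkt + 8);
    return d;
}
```

Fed the bytes of the first run, this yields timestamp 1566977146, command 1, and leaked bytes 0xea, 0x9a, and 'K'; the leaked bytes differ from run to run.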
Now to Fix
In both of these examples, a less privileged domain can disclose three bytes of stack memory from a more privileged domain — each time it exploits the bug! This is always perilous, but concrete exploitability depends on your program and its stack during exploitation.
We will discuss one particular fix here, using code from SEI CERT’s page on the issue, though note that there are other approaches that may be more appropriate for your project.
The struct may be zeroed (including its padding bytes) before assignment:
/* requires #include <string.h> for memset */
struct test arg;
/* Set all bytes (including padding bytes) to zero */
memset(&arg, 0, sizeof(arg));
arg.a = 1;
arg.b = 2;
arg.c = 3;
Compare the generated code with and without the memset. [35]
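Another approach, not taken from the SEI CERT page, is to avoid sending the in-memory struct altogether and serialize each member into a byte buffer explicitly, so there are no padding bytes to leak. The following is a minimal sketch for struct test, assuming a 32-bit int and a little-endian wire format chosen purely for illustration; pack_test, store_le32, and TEST_WIRE_SIZE are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct test {
    int a;
    char b;
    int c;
};

/* Wire size: 4 bytes for a, 1 for b, 4 for c; no padding. */
#define TEST_WIRE_SIZE 9

/* Write v as 4 little-endian bytes. */
static void store_le32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v & 0xFF);
    p[1] = (unsigned char)((v >> 8) & 0xFF);
    p[2] = (unsigned char)((v >> 16) & 0xFF);
    p[3] = (unsigned char)((v >> 24) & 0xFF);
}

/* Write every member explicitly, so every output byte is defined. */
size_t pack_test(const struct test *t, unsigned char out[TEST_WIRE_SIZE])
{
    store_le32(out, (uint32_t)t->a);
    out[4] = (unsigned char)t->b;
    store_le32(out + 5, (uint32_t)t->c);
    return TEST_WIRE_SIZE;
}
```

The caller would then pass the buffer and the returned length to write(2) or copy_to_user instead of &arg and sizeof(arg). The trade-off is that the receiver must decode the same format rather than reading the struct directly.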
Tools like Valgrind [36] can be helpful when debugging or discovering disclosure via padding bytes or uninitialized members:
The figure above is the result of running python3 client.py 1337 1 against the “RPC server” code (i.e., disclosure via padding bytes).
In addition, changing the “RPC server” code to the following illustrates Valgrind’s ability to detect disclosure via uninitialized members:
switch (cmd) {
case 1: {
    printf("foo");
    time_t timestamp;
    struct responsepacket foo;
    memset(&foo, 0, sizeof(foo));  // requires #include <string.h>
    foo.timestamp = time(&timestamp);
    foo.response = 1;
    foo.command = cmd;
    FILE *file = fopen("/tmp/foo", "w");
    int i = 1;
    fprintf(file, "%d", i);
    fclose(file);
    n = write(newsockfd, &foo, sizeof(foo));
    if (n < 0)
        error("ERROR writing to socket");
    break;
}
// ... remaining cases as in the original server ...
}
For detecting use of uninitialized memory more generally, there is also MemorySanitizer, [37] which is available in the Clang compiler.
In the Linux kernel space, PaX developed a GCC plugin, STRUCTLEAK. STRUCTLEAK works by emitting the equivalent of an empty designated initializer list if a struct it is inspecting lacks an initializer list (which it detects by looking for a CONSTRUCTOR node emitted by GCC); this zeroes members [38] and padding bytes. [39] See pages 39-41 [40] for details. Something useful to note is that GCC does not [41] emit a CONSTRUCTOR node when an initializer is specified for all struct members (i.e., this case [42]), which is functionally equivalent to this code [43] (individual member assignment). Thus, this GCC behavior a) results in padding bytes remaining uninitialized when a full initializer list is used and b) is why STRUCTLEAK will also zero padding bytes for structs of this nature: to the plugin, such a struct lacks a CONSTRUCTOR node, so it is given the equivalent of an empty initializer list. Upstream Linux has merged an adaptation of PaX’s STRUCTLEAK plugin, [44] and has added illustrative test cases. [45]
As discussed in the last paragraph of this post’s Padding and Alignment section, the behavior of an empty initializer list (zeroing padding and members) is not affected by GCC’s optimization in practice, as long as the struct is not subsequently copied with GCC’s built-in memcpy; for example, fwrite(3) and copy_to_user were tested and do not trigger this behavior. Grsecurity’s latest patches use a separate GCC plugin to address GCC’s erroneous optimization behavior in this case. While the author of this post found Linux kernel code treated by STRUCTLEAK (by adding an empty initializer list, as noted in kernel build logs) that followed this pattern (i.e., after member assignment, the struct is fully copied to some other buffer and not touched again), [46] no concretely exploitable cases were found: in this case, the destination buffer is user-accessible, [47] but the struct is not padded. So, whether there are exploitable cases in the Linux kernel under STRUCTLEAK remains an open question.
This presentation [48] by Alexander Potapenko at 2019’s Linux Security Summit EU gives general background on approaches to dealing with uninitialized memory use in the kernel, such as KMSAN.
Conclusion
In this blog post, we explored in detail how alignment and padding bytes in C structs can lead to subtly dangerous behavior; for example, disclosing bytes on the stack.
This behavior has been exploited in kernel drivers, disclosing memory to userspace, as well as server processes, disclosing memory to clients.
In general, to fix or detect this issue you can:
- memset the struct with zeroes before member assignment
- Use GCC’s -Wuninitialized to alert on usage of uninitialized values, such as struct members (this doesn’t catch uninitialized padding bytes)
- Use Valgrind to detect when any uninitialized memory is transferred elsewhere, e.g. to write(2)
- Use MemorySanitizer to detect usage of uninitialized memory (more generally) if LLVM/Clang can be used
- Enable STRUCTLEAK in your Linux kernel builds with the CONFIG_GCC_PLUGIN_STRUCTLEAK* Kconfig options
If you enjoyed this post or have any questions (or spot errors!), feel free to reach out, I’d love to chat.
Thanks to grsecurity for discussion that improved this post.
References
1 https://godbolt.org/z/uS0GAN
2 https://godbolt.org/z/Q429iK
3 Section 6.7.8 at http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf
4 https://godbolt.org/z/GLxDQ_
5 https://godbolt.org/z/aRT7Jr
6 https://godbolt.org/z/5rsOJ0
7 https://gcc.gnu.org/wiki/Better_Uninitialized_Warnings
8 https://en.wikipedia.org/wiki/Struct_(C_programming_language)
9 https://www.patreon.com/mattgodbolt
10 https://www.kernel.org/doc/Documentation/unaligned-memory-access.txt
11 https://lemire.me/blog/2012/05/31/data-alignment-for-speed-myth-or-reality/
12 https://katecpp.github.io/struct-members-order/
13 https://en.wikipedia.org/wiki/Data_structure_alignment#Typical_alignment_of_C_structs_on_x86
14 http://www.catb.org/esr/structure-packing/
15 Section 6.2.6.1 at https://web.archive.org/web/20181230041359if_/http://www.open-std.org/jtc1/sc22/wg14/www/abq/c17_updated_proposed_fdis.pdf
16 https://lwn.net/Articles/417994/
18 https://godbolt.org/z/mKIz78
19 https://wiki.sei.cmu.edu/confluence/display/c/EXP42-C.+Do+not+compare+padding+data
20 https://lore.kernel.org/kernel-hardening/587E4FDD.31940.D47F642@pageexec.freemail.hu/
21 Contrast https://godbolt.org/z/us5UyR (4/5 members) with https://godbolt.org/z/gnP__C (5/5 members)
22 https://www.pixelbeat.org/programming/gcc/auto_init.html
23 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=36750#c0
24 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=36750#c3
25 Padding not zeroed https://godbolt.org/z/p_U-8N, padding zeroed https://godbolt.org/z/teGCqG
26 https://godbolt.org/z/K28pzP
28 https://github.com/torvalds/linux/blob/v5.2/include/linux/uaccess.h#L97
29 https://github.com/torvalds/linux/blob/v5.2/include/linux/uaccess.h#L149
30 https://github.com/torvalds/linux/blob/v5.2/include/asm-generic/uaccess.h#L40
31 https://a13xp0p0v.github.io/2018/11/04/stackleak.html
32 https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2010-3881
34 https://trust-in-soft.com/trap-representations-and-padding-bits/
35 Compare https://godbolt.org/z/4rzJ-e (memset) with https://godbolt.org/z/3H1hfy (no memset)
36 https://linux.die.net/man/1/valgrind
37 https://clang.llvm.org/docs/MemorySanitizer.html
38 https://godbolt.org/z/ez-Jzj
39 https://godbolt.org/z/hCTmUE
40 https://pax.grsecurity.net/docs/PaXTeam-H2HC13-PaX-gcc-plugins.pdf
41 https://lore.kernel.org/lkml/587D1D70.26193.89AFE10@pageexec.freemail.hu/
42 https://godbolt.org/z/gnP__C
43 https://godbolt.org/z/V4S0AK
44 https://lore.kernel.org/patchwork/patch/750940/
45 https://patchwork.kernel.org/patch/10777077/
46 https://github.com/torvalds/linux/blob/v4.19/drivers/net/wan/hdlc_raw.c#L85
47 https://github.com/torvalds/linux/blob/v4.19/drivers/net/wan/hdlc_raw.c#L56