Rustproofing Linux (Part 4/4 Shared Memory)

16 February 2023

This is a four-part blog post series that starts with Rustproofing Linux (Part 1/4 Leaking Addresses).

Shared memory is often used to share data without the performance hit of copying. Whenever a shared resource is consumed by one component while being modified by another component, there is potential for Time-Of-Check-Time-Of-Use (TOCTOU) or Double Fetch vulnerabilities. In these examples we focus on the case where double fetching occurs in the kernel and the software changing that data is in userspace, making this an avenue for user-to-kernel privilege escalation. However, note that this same type of vulnerability could exist when accessing memory that is shared between a device driver and a peripheral, two userspace processes, hypervisor and kernel, etc.

As a side note, double fetch vulnerabilities can also arise from compiler-introduced problems.

Our vulnerable example is a bit contrived for the sake of brevity, but it should illustrate a common buggy pattern of shared memory usage:

static int vuln_open(struct inode *ino, struct file *filp)
{
    struct file_state *state;

    state = kzalloc(sizeof(*state), GFP_KERNEL);
    if (!state)
        return -ENOMEM;

    state->page = alloc_pages(GFP_KERNEL | __GFP_ZERO, 0);

A memory page is allocated

static int vuln_mmap(struct file *filp, struct vm_area_struct *vma)
{
    struct file_state *state = filp->private_data;
    int ret = 0;

    ret = vm_map_pages_zero(vma, &state->page, 1);
    return ret;
}

The page is mapped into userspace

static long vuln_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
{
    struct file_state *state = filp->private_data;
    volatile u32 *sh_buf = page_to_virt(state->page);
    u8 tmp_buf[32];

    switch (cmd) {
    case VULN_PROCESS_BUF:
        if (sh_buf[0] <= sizeof(tmp_buf)) {
            memcpy(tmp_buf, (void *)&sh_buf[1], sh_buf[0]);

Data is read from shared memory

The vulnerability is that sh_buf[0] is read twice: once for the bounds check and again as the memcpy length. If the memory contents change between the two reads, the copy can overflow tmp_buf.
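The check-then-use pattern can be modelled deterministically in userspace Rust. The following is only a sketch under stated assumptions: RacyWord is a hypothetical stand-in for a word of shared memory whose other owner rewrites it after every read, and the function names are ours, not part of the driver.

```rust
use std::cell::Cell;

/// Hypothetical model of a shared word: the first fetch observes
/// `values[0]`, every later fetch observes `values[1]`.
struct RacyWord {
    values: [u32; 2],
    reads: Cell<usize>,
}

impl RacyWord {
    fn new(first: u32, second: u32) -> Self {
        RacyWord { values: [first, second], reads: Cell::new(0) }
    }

    /// Each fetch may observe a different value, as with real shared memory.
    fn fetch(&self) -> u32 {
        let n = self.reads.get();
        self.reads.set(n + 1);
        self.values[n.min(1)]
    }
}

const TMP_BUF_LEN: usize = 32;

/// Mirrors the buggy pattern: one fetch for the check, another for the use.
/// Returns the length that would be handed to the copy.
fn double_fetch_len(word: &RacyWord) -> Option<usize> {
    if word.fetch() as usize <= TMP_BUF_LEN {
        // Second fetch: the value may no longer satisfy the check.
        Some(word.fetch() as usize)
    } else {
        None
    }
}

/// Fetches once, so the checked value and the used value are the same.
fn single_fetch_len(word: &RacyWord) -> Option<usize> {
    let len = word.fetch() as usize;
    if len <= TMP_BUF_LEN {
        Some(len)
    } else {
        None
    }
}
```

With a word that reads 32 first and 128 afterwards, double_fetch_len(&RacyWord::new(32, 128)) passes the check and yields Some(128), while single_fetch_len on the same input yields Some(32).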

A PoC was created to change the value of sh_buf[0] between the two fetches by repeatedly changing the memory contents in one process while calling vuln_ioctl in the other:

    volatile u32 *buf = mmap(NULL, LEN, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
    if (buf == MAP_FAILED) {
        perror("mmap");
        return -1;
    }

    int child = fork() == 0;

    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(child, &set);
    if (sched_setaffinity(getpid(), sizeof(set), &set) < 0) {
        perror("sched_setaffinity error");
        return -1;
    }

    if (child) {
        while (1) {
            buf[0] = 32;
            buf[0] = 128;
        }
    } else {
        while (1) {
            ioctl(fd, VULN_PROCESS_BUF, 0);
        }
    }

One process changing memory contents, the other calling VULN_PROCESS_BUF

When this PoC is executed, KASAN reports the vulnerability as a 128-byte out-of-bounds write.

Porting to Rust

The code we ported to Rust looks similar, but is guided by mm::virt::Area and pages::Pages abstractions. This starts with the mmap implementation:

fn mmap(state: &Self, _file: &File, vma: &mut mm::virt::Area) -> Result {
    vma.insert_page(vma.start(), &state.mutable.lock().page)?;

    Ok(())
}

mmap() callback implementation in Rust

The mmap method we implement for file::Operations has a vma: mm::virt::Area argument. While this struct only has one member, a pointer to C’s struct vm_area_struct, that member is private, so we need to use the only available method to create a mapping, insert_page().

insert_page() requires a pages::Pages<0> argument, and similarly we don’t get access to the underlying struct page and are limited to provided methods to access the memory contents:

fn ioctl(state: &Self, _file: &File, cmd: &mut IoctlCommand) -> Result<i32> {
    let (cmd, _arg) = cmd.raw();
    match cmd {
        VULN_PROCESS_BUF => {
            let mut tmp_buf = Box::try_new([0u8; 32])?; // on heap

            let page = &state.mutable.lock().page;

            let mut size = 0u32;
            unsafe { page.read(&mut size as *mut u32 as _, 0, 4)? };
            if size as usize <= core::mem::size_of_val(&*tmp_buf) { // size of the array, not of the Box
                unsafe { page.read(&mut size as *mut u32 as _, 0, 4)? };
                unsafe { page.read(tmp_buf.as_mut_ptr(), 4, size as usize)? };

                if tmp_buf[0] == 'A' as u8 {
                    return Ok(0);
                }
            }

ioctl() callback using Pages<0>::read() to read memory

Compare the two identical page.read() calls that fetch size in the listing above with the C version, where the first word of the shared buffer is accessed simply as sh_buf[0]. Since the second call serves no purpose other than to intentionally introduce a TOCTOU vulnerability, we believe it would be very unusual for a developer to write it. Thus, such TOCTOU vulnerabilities seem unlikely to be naively ported from C to Rust.

Variant Using Raw Pointers

In the above port, the abstractions prevented us from dereferencing a memory pointer like we did in C. Since Rust is a low-level language, we should be able to bypass the Pages struct abstraction and directly use the C struct page it contains. In our experiment we created our own copy of Pages, named ExposedPages, and used core::mem::transmute to effectively cast Pages into our new type.

VULN_PROCESS_BUF => {
    let mut tmp_buf = Box::try_new([0u8; 32])?; // on heap

    let page = &state.mutable.lock().page;
    // page.pages is private, page.kmap() is private, tricks required
    let page: &ExposedPages = unsafe { core::mem::transmute(page) };
    let sh_buf: *mut u32 = unsafe { bindings::kmap(page.pages) } as _;

    // XXX assembly shows this will be only one access to *sh_buf
    if unsafe { *sh_buf } as usize <= tmp_buf.len() {
        unsafe { core::ptr::copy(sh_buf.offset(1) as *mut u8, tmp_buf.as_mut_ptr(), *sh_buf as _) };

        if tmp_buf[0] == 'A' as u8 {
            return Ok(0);
        }
    }

Dereferencing a raw pointer to access shared memory

This PoC is closer to the C-language version (sh_buf[0] in C code could also be written as *sh_buf, so that part could be identical), but since we can’t just mark the pointer as volatile, the compiler optimises out the second *sh_buf. For those interested, a full example is provided.

Variant With Volatile Pointer Access

While Rust has no volatile keyword, it does offer a way to dereference pointers the same way with core::ptr::read_volatile() and core::ptr::write_volatile().

Our next variation uses read_volatile instead of pointer dereference:

if unsafe { read_volatile(sh_buf) } as usize <= tmp_buf.len() {
    unsafe { copy(sh_buf.offset(1) as *mut u8, tmp_buf.as_mut_ptr(), read_volatile(sh_buf) as _) };

Using core::ptr::read_volatile

This does trigger the TOCTOU vulnerability, and one could find it plausible for a developer to use read_volatile(sh_buf) twice instead of declaring a temporary variable.
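For contrast, here is a minimal userspace sketch of the temporary-variable approach: the length is fetched once with read_volatile, and the same local value is used for both the check and the copy. The function name process_buf and the buffer layout (a u32 length followed by the payload) are assumptions made for illustration; this is not the driver code.

```rust
use core::ptr::read_volatile;

const TMP_BUF_LEN: usize = 32;

/// Copies a length-prefixed payload out of a shared buffer.
/// Assumed layout: word 0 holds the payload length in bytes,
/// and the payload starts at word 1.
///
/// # Safety
/// `sh_buf` must be aligned for `u32` and point to at least
/// `4 + TMP_BUF_LEN` readable bytes.
unsafe fn process_buf(sh_buf: *const u32) -> Option<[u8; TMP_BUF_LEN]> {
    // Single fetch: snapshot the length into a local exactly once.
    let len = unsafe { read_volatile(sh_buf) } as usize;
    if len > TMP_BUF_LEN {
        return None;
    }

    let mut tmp_buf = [0u8; TMP_BUF_LEN];
    let payload = unsafe { sh_buf.add(1) } as *const u8;
    // The copy is bounded by the checked local, never by a second fetch.
    for i in 0..len {
        tmp_buf[i] = unsafe { read_volatile(payload.add(i)) };
    }
    Some(tmp_buf)
}
```

The payload bytes can still change while they are being copied, but the copy length can no longer exceed the value that passed the bounds check.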

We have also explored accessing raw contents of mm::virt::Area instead of pages::Pages, but the source code then becomes even more like C, and uses more C bindings.

Takeaways

The ways we have tried to access shared memory in a vulnerable way all felt a bit forced or contrived, and did not feel like idiomatic Rust. Rust abstractions require us to read memory in a way that makes a double fetch more obvious. While the abstractions can be bypassed, even a cursory code inspection should pick up the unsafe block with transmute and later also a read_volatile, making sure that the code would be harshly reviewed, and maybe even removed.

Overall Conclusions

To conclude this four-part blog series (one, two, three, four) we note that Rust brings some very nice features to the table. Writing Linux device drivers in Rust will almost certainly improve the kernel’s overall security posture.

However, the security improvements in the Rust language are not free or completely automatic. Porting C code to Rust is a non-trivial matter that has its own set of unique pitfalls. We believe that Rust is a tool which still requires considerable expertise of its master to avoid shooting themself in the foot. As we’ve shown, naïve ports from C to Rust may still exhibit vulnerabilities.

While it is easy to spot the unsafe keyword when auditing Rust code, thoroughly inspecting and documenting it requires a deeper understanding of Rust and the driver code. Even with all unsafe blocks removed (or proven to be memory safe) there’s still potential for other vulnerabilities, although those will probably be less severe, since by design they should not be related to memory safety.

In particular, we wish to highlight the MutexGuard usage caveat that we discussed in post #2 – while the automatic unlock at the guard variable’s end of life is very nice, one should be aware of patterns like the demonstrated .lock() method chaining, where we produced a race condition because a mutex was unlocked between two guarded variable accesses.
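The caveat can be sketched in userspace with std::sync::Mutex. This is an illustration under assumptions, not the code from post #2: the State struct and both function names are hypothetical, and std's lock() returns a Result that needs unwrap(), unlike the kernel's Mutex.

```rust
use std::sync::Mutex;

struct State {
    len: usize,
    data: Vec<u8>,
}

/// Chained `.lock()` calls: each call creates a temporary guard that is
/// dropped at the end of its statement, so the mutex is released between
/// reading `len` and reading `data`.
fn chained_read(state: &Mutex<State>) -> Option<u8> {
    let len = state.lock().unwrap().len; // lock, read, unlock
    // Another thread may modify `state` at this point.
    if len == 0 {
        return None;
    }
    let last = state.lock().unwrap().data.get(len - 1).copied(); // lock again
    last
}

/// One named guard: the mutex stays locked across both accesses.
fn guarded_read(state: &Mutex<State>) -> Option<u8> {
    let guard = state.lock().unwrap();
    if guard.len == 0 {
        return None;
    }
    guard.data.get(guard.len - 1).copied()
}
```

Both functions return the same result when no other thread interferes; only guarded_read guarantees that len and data are observed under a single lock acquisition.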

From our experimentation, integer overflows as well as shared memory accesses seem to be less likely causes of vulnerabilities, since the programmer needs to go out of their way to introduce a bug.

Finally, leaking kernel addresses seems to be as easy as always. While the benefits of KASLR are questioned by some already, the bypasses probably won’t go away either.

We hope the future is less buggy and software more secure. As Rust gets used more in the Linux kernel, we predict that the security research community will start to discover new manifestations of traditional driver vulnerabilities. Collectively, we probably need more time to discover these new vulnerability patterns, and better tools are likely needed to automatically detect and eliminate them.

Acknowledgements

Thanks to Miguel Ojeda, Alex Gaynor, Gary Guo and other Rust for Linux maintainers for valuable insights.

Special thanks to Jeremy Boone for all his help and suggestions.

Domen Puncer Kugler