Rustproofing Linux (Part 2/4 Race Conditions)

08 February 2023

This is a four part blog post series that starts with Rustproofing Linux (Part 1/4 Leaking Addresses).

This post uses a simple example to demonstrate a class of vulnerability that we encounter quite frequently when auditing kernel drivers and firmware. It’s a race condition, or more precisely a TOCTOU vulnerability.

The complete vulnerable C driver is a bit more complex, but the parts that are most relevant to the vulnerability are shown below:

struct file_state {
    u8 *buf;
    size_t buf_size;
};

static ssize_t vuln_write(struct file *filp, const char __user *buf, size_t count, loff_t *offset)
{
    struct file_state *state = filp->private_data;

    if (count == 0)
        return 0;

    if (state->buf_size < count)
        return -ENOSPC;

    if (copy_from_user(state->buf, buf, count) != 0)
        return -EFAULT;

    return count;
}

static long vuln_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
{
    struct file_state *state = filp->private_data;

    switch (cmd) {
    case VULN_SETUP_BUF:
        if (arg == 0)
            return -EINVAL;

        state->buf = krealloc(state->buf, arg, GFP_KERNEL);
        if (state->buf == NULL)
            return -ENOMEM;

        state->buf_size = arg;
    }

    pr_err("error: wrong ioctl command: %#xn", cmd);
    return -EINVAL;
}

C driver vulnerable to a race condition between ioctl and write operations

Quickly summarised, the vuln_write function checks that the kernel buffer is large enough to hold the attacker-provided userspace data, then copies the data into the buffer. The vuln_ioctl function reallocates a new buffer and sets its size.

This would be fine, if only one thread calls these functions, but if they’re called concurrently, a small buffer could be allocated in vuln_ioctl while another thread is already processing vuln_write and has successfully proceeded past the buf_size check. There are a few variations how to trigger a bug, but the root cause is the same – shared data is not properly synchronised between threads.

If we run our PoC for the C-language vulnerability, KASAN will complain as expected:

root@(none):/# /xxx/rustproofing-linux/poc/test.sh vuln_race_device
[   81.997668] vuln_race_device: loading out-of-tree module taints kernel.
[   82.032372] ==================================================================
[   82.034436] BUG: KASAN: slab-out-of-bounds in _copy_from_user+0x35/0x70
[   82.036122] Write of size 40 at addr ffff888001eacf00 by task poc_vuln_race_d/191
[...]
[   82.052548]  kasan_report+0xc1/0xf0
[   82.053580]  ? _copy_from_user+0x35/0x70
[   82.054706]  kasan_check_range+0x2bd/0x2e0
[   82.055881]  _copy_from_user+0x35/0x70
[   82.056919]  vuln_write+0x5b/0x80 [vuln_race_device]

PoC demonstrating an out of bounds write caused by a race condition

The above problem may sound rather obvious, but we should keep in mind many device drivers are often designed to be used by only one process/thread, and testing might not cover even the simplest multithreaded cases. As a result, this class of vulnerability is quite prevalent in Linux kernel drivers.

The fix is to simply add a mutex and make sure all operations on buf_size and buf happen while the lock is held.

Porting to Rust With Atomics

A naïve port to Rust won’t even compile since the first argument to ioctl/write isn’t mutable. The trick is to use interior mutability, and simplest way to do this is to use atomics. The above code could be translated into something that looks like the following:

struct RustVuln {
    buf: AtomicPtr<core::ffi::c_void>,
    buf_size: AtomicUsize,
}

    fn write(state:  Self, _:  File, data:  mut impl IoBufferReader, _offset: u64) -> Result<usize> {
        if data.is_empty() {
            return Ok(0);
        }

        if state.buf_size.load(Ordering::Relaxed) < data.len() {
            return Err(ENOSPC);
        }

        // SAFETY: 'buf' is allocated and big enough ('buf_size' check)
        unsafe { data.read_raw(state.buf.load(Ordering::Relaxed) as *mut u8, data.len())?; }

        Ok(data.len())
    }

    fn ioctl(state:  Self, _:  File, cmd:  mut IoctlCommand) -> Result<i32> {
        let (cmd, arg) = cmd.raw();
        match cmd {
            VULN_SETUP_BUF => {
                if arg == 0 {
                    return Err(EINVAL)
                }

                // SAFETY: only calling C's krealloc
                let newbuf = unsafe { bindings::krealloc(state.buf.load(Ordering::Relaxed), arg, bindings::GFP_KERNEL) };
                state.buf.store(newbuf, Ordering::Relaxed);
                state.buf_size.store(arg, Ordering::Relaxed);

                Ok(0)
            }
            _ => {
                pr_err!("error: wrong ioctl command: {:#x}n", cmd);
                Err(EINVAL)
            }
        }
    }

Rust version of the driver with the same vulnerability

The above code is a good example of how not to use atomics, so hopefully readers of this blog post will not be tempted to do this. Of course, this Rust version also exhibits the same buggy behaviour as the C version. It should be clear that naïve ports to Rust require careful management of critical sections or else race conditions may still arise.

Porting to Rust With a Mutex

Interior mutability (and thread safety) could also be achieved by guarding data with a Mutex.

Rust mutex syntax is quite neat. Data guarded by the mutex can only ever be accessed when the mutex is locked, since the data is contained inside the mutex. So I rewrote this driver to use Mutexes, which can be seen in the following example. Look for .lock() to see where the mutex is locked:

struct RustVulnState {
    buf: *mut core::ffi::c_void,
    buf_size: usize,
}
// SAFETY: only used within Mutex, so it's safe to be accessed from multiple threads
unsafe impl Send for RustVulnState {}

struct RustVuln {
    mutable: Mutex<RustVulnState>,
}

    fn write(state:  Self, _:  File, data:  mut impl IoBufferReader, _offset: u64) -> Result<usize> {
        if data.is_empty() {
            return Ok(0);
        }

        if state.mutable.lock().buf_size < data.len() {
            return Err(ENOSPC);
        }

        // SAFETY: 'buf' is allocated and big enough ('buf_size' check)
        unsafe { data.read_raw(state.mutable.lock().buf as _, data.len())?; }

        Ok(data.len())
    }

    fn ioctl(state:  Self, _:  File, cmd:  mut IoctlCommand) -> Result<i32> {
        let (cmd, arg) = cmd.raw();
        match cmd {
            VULN_SETUP_BUF => {
                if arg == 0 {
                    return Err(EINVAL)
                }

                let mut mutable = state.mutable.lock();
                // SAFETY: only calling C's krealloc
                let newbuf = unsafe { bindings::krealloc(mutable.buf, arg, bindings::GFP_KERNEL) };
                mutable.buf = newbuf;
                mutable.buf_size = arg;

                Ok(0)
            }

Using a Mutex instead of atomics

Since memory safety guarantees aren’t enforced inside unsafe blocks, it’s good to try to minimise their use. There are three unsafe blocks above and two more related to mutex initialisation. We should be able to remove at least some of them, by using Rust constructs.

First, sharing the pointer requires the use of unsafe impl Send for RustVulnState {} syntax. However, this unsafe can be eliminated by changing the pointer to an usize and adding a few casts (source code).

Then there’s an unsafe call to C’s krealloc, which can be replaced with Box allocations. Eventually, after enacting these few improvements to make the code look less like C and more like Rust, we end up with the following (full source):

struct RustVulnState {
    buf: Box<[core::mem::MaybeUninit<u8>]>,
    buf_size: usize,
}

struct RustVuln {
    mutable: Mutex<RustVulnState>,
}

    fn write(state:  Self, _:  File, data:  mut impl IoBufferReader, _offset: u64) -> Result<usize> {
        if data.is_empty() {
            return Ok(0);
        }

        if state.mutable.lock().buf_size < data.len() {
            return Err(ENOSPC);
        }

        let buf = state.mutable.lock().buf.as_mut_ptr() as *mut u8;
        // SAFETY: 'buf' is allocated and big enough ('buf_size' check)
        unsafe { data.read_raw(buf, data.len())? };

        Ok(data.len())
    }

    fn ioctl(state:  Self, _:  File, cmd:  mut IoctlCommand) -> Result<i32> {
        let (cmd, arg) = cmd.raw();
        match cmd {
            VULN_SETUP_BUF => {
                if arg == 0 {
                    return Err(EINVAL)
                }

                let mut mutable = state.mutable.lock();
                mutable.buf = Box::try_new_uninit_slice(arg)?;
                mutable.buf_size = arg;

                Ok(0)
            }

More Rust idioms, fewer unsafe blocks

The single remaining unsafe is required because IoBufferReader.read_raw() reads into a raw kernel buffer, and the programmer must make sure the buffer and length are valid.

So we execute the PoC again, and there’s still the bug:

root@(none):/# /xxx/rustproofing-linux/poc/test.sh rust_vuln_race_device_v6_box
[  249.302940] ==================================================================
[  249.304771] BUG: KASAN: use-after-free in _copy_from_user+0x35/0x70
[  249.306319] Write of size 40 at addr ffff888001809280 by task poc_vuln_race_d/204
[...]
[  249.323241]  kasan_check_range+0x2bd/0x2e0
[  249.324360]  _copy_from_user+0x35/0x70
[  249.325385]  _RNvXs_NtCsfATHBUcknU9_6kernel8user_ptrNtB4_18UserSlicePtrReaderNtNtB6_9io_buffer14IoBufferReader8read_raw+0x37/0x80
[  249.328424]  ? _RNvMs3_NtCsfATHBUcknU9_6kernel4fileINtB5_16OperationsVtableINtNtB7_7miscdev12RegistrationNtCscqXYApJJQRL_28rust_vuln_race_device_v6_box8RustVulnEB1p_E14write_callbackB1r_+0xaf/0x140 [rust_vuln_race_device_v6_box]
[...]
[  249.343846] Allocated by task 205:
[  249.344520]  kasan_set_track+0x3d/0x60
[  249.345368]  __kasan_kmalloc+0x85/0x90
[  249.346195]  __kmalloc_node_track_caller+0xa0/0x190
[  249.347324]  krealloc+0x54/0xc0
[  249.347823]  _RNvXCscqXYApJJQRL_28rust_vuln_race_device_v6_boxNtB2_8RustVulnNtNtCsfATHBUcknU9_6kernel4file10Operations5ioctl+0x40/0x120 [rust_vuln_race_device_v6_box]

PoC still works in Rust version

Call traces for Rust code are a bit harder to read because of symbol mangling. While eliminating the third unsafe will go some way towards fixing the bug (not completely though, as discussed later), arguably the actual cause of the use-after-free is not the use of read_raw(), but rather, the incorrect Mutex usage in fn write. It is useful to focus on this nuanced behaviour for a minute, because it illustrates yet another footgun when porting C-to-Rust:

if state.mutable.lock().buf_size < data.len() {
    return Err(ENOSPC);
}

let buf = state.mutable.lock().buf.as_mut_ptr() as *mut u8;
// SAFETY: 'buf' is allocated and big enough ('buf_size' check)
unsafe { data.read_raw(buf, data.len())? };

Ok(data.len())

Incorrect use of a Mutex

Note above how there isn’t an unlock() corresponding with the highlighted lock(). That is actually how mutexes are supposed to work in Rust – Mutex.lock() creates a MutexGuard and when that goes out of scope it is automatically unlocked. Well, MutexGuard above is very much temporary and goes out of scope on the same line where it’s created. The code is effectively mutex lock/unlock for .buf_size and another lock/unlock that covers the use of .buf. This is probably not what the developer intended, but hopefully we can see how easy it might be to accidentally introduce such a bug.

Instead, what the code should do is to ensure that the same MutexGuard is used to cover access to both .buf_size and .buf. Below, MutexGuard mutable covers all of that, because it goes out of scope after the last use of the covered variables and references to them (full source):

let mut mutable = state.mutable.lock();
if mutable.buf_size < data.len() {
    return Err(ENOSPC);
}

let buf = mutable.buf.as_mut_ptr() as *mut u8;
// SAFETY: 'buf' is allocated and big enough ('buf_size' check)
unsafe { data.read_raw(buf, data.len())? };

Ok(data.len())

Correct use of a Mutex

Some readers will already know that the Box type stores the object size, so buf_size is redundant really. The next iteration explores this, and the results are as expected. That is, the driver exhibits the bug when we use two MutexGuards, but when we use only one guard that covers all buf access, the bug is eliminated (source code of fixed mutex use).

A zero initialised boxed slice was attempted in order to allow the use of IoBufferReader.read_slice(), which would help eliminate the remaining unsafe. This was not successful, since slice would need to be a compile-time constant size.

Other Variations

Vec, a growable array type, could also be used. In C terms, type Vec is made of three parts: a buffer pointer, a capacity and a length. The APIs don’t nicely match the allocations that C does, but an attempt was made using uninitialised Vec. On the other hand, if one uses a zero-initialised Vec, this can be coded without the remaining unsafe that was required in the above snippets.

struct RustVulnState {
    buf: Vec<u8>,
}

    fn write(state:  Self, _:  File, data:  mut impl IoBufferReader, _offset: u64) -> Result<usize> {
        if data.is_empty() {
            return Ok(0);
        }

        if state.mutable.lock().buf.len() < data.len() {
            return Err(ENOSPC);
        }

        // use only data.len() bytes
        data.read_slice(state.mutable.lock().buf.split_at_mut(data.len()).0)?;

        Ok(data.len())
    }

    fn ioctl(state:  Self, _:  File, cmd:  mut IoctlCommand) -> Result<i32> {
        let (cmd, arg) = cmd.raw();
        match cmd {
            VULN_SETUP_BUF => {
                if arg == 0 {
                    return Err(EINVAL)
                }

                state.mutable.lock().buf.try_resize(arg, 0u8)?;

                Ok(0)

Variant that uses a Vec

The code does not use unsafe blocks, but it still uses mutexes incorrectly. So what happens when the PoC is executed?

root@(none):/# /xxx/rustproofing-linux/poc/test.sh rust_vuln_race_device_v9_vec_init
[   18.988595] rust_vuln_race_device_v9_vec_init: loading out-of-tree module taints kernel.
[   19.021009] rust_kernel: panicked at 'assertion failed: mid <= self.len()', /home/kali/rust/rustproofing-linux/rust_vuln_race_device_v9_vec_init.rs:59:50
[   19.024106] ------------[ cut here ]------------
[   19.025212] kernel BUG at rust/helpers.c:45!
[...]
[   19.050883] Call Trace:
[   19.051528]  
[   19.052180]  rust_begin_unwind+0x66/0x80
[   19.053428]  ? _RNvXsP_NtCs3yuwAp0waWO_4core3fmtRhNtB5_5Debug3fmtCsfATHBUcknU9_6kernel+0x50/0x50
[   19.055508]  ? bit_wait_io_timeout+0xc0/0xc0
[   19.056591]  ? _RNvNtCs3yuwAp0waWO_4core9panicking9panic_fmt+0x2c/0x30
[   19.057952]  ? virtqueue_get_buf_ctx+0x1a1/0x4c0
[   19.059298]  ? _RNvNtCs3yuwAp0waWO_4core9panicking5panic+0x49/0x50
[   19.060680]  ? _RNvMs3_NtCsfATHBUcknU9_6kernel4fileINtB5_16OperationsVtableINtNtB7_7miscdev12RegistrationNtCsffK8dNQUbvs_33rust_vuln_race_device_v9_vec_init8RustVulnEB1p_E14write_callbackB1r_+0x145/0x170 [rust_vuln_race_device_v9_vec_init]

PoC on code without unsafe blocks results in a call to BUG()

It’s not a memory error anymore, but it is a call to BUG(). That didn’t happen because of a write to memory, but just a bit before, where buf‘s slice is split at an invalid location. This is possible due to a race condition that arises because the mutex was briefly unlocked after the buf.len() check.

What about the last two unsafe blocks that we conveniently ignored so far? If we use a simple mutex, kernel::sync::smutex::Mutex, instead of kernel’s struct mutex, kernel::sync::Mutex, we can simplify the initialisation and be rid of the unsafe. Code samples are in the files with names that end with _smutex.rs.

Takeaways

Rust’s Mutex API has a very nice way of describing the data it protects, and a nice locking API. However, a developer still needs to think about unlocking even if there’s no explicit call to unlock. Calling .lock() on the same mutex multiple times could be an indicator of a race condition. On the other hand, holding a mutex for too long also has a downside – that is, the performance might suffer.

We learned that porting drivers that use mutexes from C to Rust is not a straightforward activity. There are some pitfalls and it is still possible to introduce problems or to accidentally create a race condition. If at all possible, it might be best to just bite the bullet and refactor the code. The trivial example used here could be quite easily changed to remove allocation from fn ioctl and change fn write to use IoBufferReader.read_all() which allocates a new Vec and copies data into it. However, in real driver code, it might not be as straightforward to do this sort of refactoring. Ultimately, thinking about interfaces and their fragile points is a worthwhile exercise, whatever the language.

In the Next Part…

Part 3 is about integer overflows and plausibility of them persisting when C code is ported to Rust.

Domen Puncer Kugler