This is a four part blog post series that starts with Rustproofing Linux (Part 1/4 Leaking Addresses).
This post uses a simple example to demonstrate a class of vulnerability that we encounter quite frequently when auditing kernel drivers and firmware. It’s a race condition, or more precisely a TOCTOU vulnerability.
The complete vulnerable C driver is a bit more complex, but the parts that are most relevant to the vulnerability are shown below:
struct file_state { u8 *buf; size_t buf_size; }; static ssize_t vuln_write(struct file *filp, const char __user *buf, size_t count, loff_t *offset) { struct file_state *state = filp->private_data; if (count == 0) return 0; if (state->buf_size < count) return -ENOSPC; if (copy_from_user(state->buf, buf, count) != 0) return -EFAULT; return count; } static long vuln_ioctl(struct file *filp, unsigned int cmd, unsigned long arg) { struct file_state *state = filp->private_data; switch (cmd) { case VULN_SETUP_BUF: if (arg == 0) return -EINVAL; state->buf = krealloc(state->buf, arg, GFP_KERNEL); if (state->buf == NULL) return -ENOMEM; state->buf_size = arg; } pr_err("error: wrong ioctl command: %#xn", cmd); return -EINVAL; }C driver vulnerable to a race condition between
ioctl
and write
operations
Quickly summarised, the vuln_write
function checks that
the kernel buffer is large enough to hold the attacker-provided
userspace data, then copies the data into the buffer. The
vuln_ioctl
function reallocates a new buffer and sets its
size.
This would be fine, if only one thread calls these functions, but if
they’re called concurrently, a small buffer could be allocated in
vuln_ioctl
while another thread is already processing
vuln_write
and has successfully proceeded past the
buf_size
check. There are a few variations how to trigger a
bug, but the root cause is the same – shared data is not properly
synchronised between threads.
If we run our PoC for the C-language vulnerability, KASAN will complain as expected:
root@(none):/# /xxx/rustproofing-linux/poc/test.sh vuln_race_device [ 81.997668] vuln_race_device: loading out-of-tree module taints kernel. [ 82.032372] ================================================================== [ 82.034436] BUG: KASAN: slab-out-of-bounds in _copy_from_user+0x35/0x70 [ 82.036122] Write of size 40 at addr ffff888001eacf00 by task poc_vuln_race_d/191 [...] [ 82.052548] kasan_report+0xc1/0xf0 [ 82.053580] ? _copy_from_user+0x35/0x70 [ 82.054706] kasan_check_range+0x2bd/0x2e0 [ 82.055881] _copy_from_user+0x35/0x70 [ 82.056919] vuln_write+0x5b/0x80 [vuln_race_device]PoC demonstrating an out of bounds write caused by a race condition
The above problem may sound rather obvious, but we should keep in mind many device drivers are often designed to be used by only one process/thread, and testing might not cover even the simplest multithreaded cases. As a result, this class of vulnerability is quite prevalent in Linux kernel drivers.
The fix
is to simply add a mutex and make sure all operations on
buf_size
and buf
happen while the lock is
held.
Porting to Rust With
Atomic
s
A naïve
port to Rust won’t even compile since the first argument to
ioctl
/write
isn’t mutable. The trick is to use
interior mutability, and simplest way to do this is to use atomics. The
above code could be translated
into something that looks like the following:
struct RustVuln { buf: AtomicPtr<core::ffi::c_void>, buf_size: AtomicUsize, } fn write(state: Self, _: File, data: mut impl IoBufferReader, _offset: u64) -> Result<usize> { if data.is_empty() { return Ok(0); } if state.buf_size.load(Ordering::Relaxed) < data.len() { return Err(ENOSPC); } // SAFETY: 'buf' is allocated and big enough ('buf_size' check) unsafe { data.read_raw(state.buf.load(Ordering::Relaxed) as *mut u8, data.len())?; } Ok(data.len()) } fn ioctl(state: Self, _: File, cmd: mut IoctlCommand) -> Result<i32> { let (cmd, arg) = cmd.raw(); match cmd { VULN_SETUP_BUF => { if arg == 0 { return Err(EINVAL) } // SAFETY: only calling C's krealloc let newbuf = unsafe { bindings::krealloc(state.buf.load(Ordering::Relaxed), arg, bindings::GFP_KERNEL) }; state.buf.store(newbuf, Ordering::Relaxed); state.buf_size.store(arg, Ordering::Relaxed); Ok(0) } _ => { pr_err!("error: wrong ioctl command: {:#x}n", cmd); Err(EINVAL) } } }Rust version of the driver with the same vulnerability
The above code is a good example of how not to use atomics, so hopefully readers of this blog post will not be tempted to do this. Of course, this Rust version also exhibits the same buggy behaviour as the C version. It should be clear that naïve ports to Rust require careful management of critical sections or else race conditions may still arise.
Porting to Rust With a
Mutex
Interior mutability (and thread safety) could also be achieved by guarding data with a Mutex.
Rust mutex syntax is quite neat. Data guarded by the mutex can only
ever be accessed when the mutex is locked, since the data is contained
inside the mutex. So I rewrote this driver to use Mutexes, which can be
seen in the following
example. Look for .lock()
to see where the mutex is
locked:
struct RustVulnState { buf: *mut core::ffi::c_void, buf_size: usize, } // SAFETY: only used within Mutex, so it's safe to be accessed from multiple threads unsafe impl Send for RustVulnState {} struct RustVuln { mutable: Mutex<RustVulnState>, } fn write(state: Self, _: File, data: mut impl IoBufferReader, _offset: u64) -> Result<usize> { if data.is_empty() { return Ok(0); } if state.mutable.lock().buf_size < data.len() { return Err(ENOSPC); } // SAFETY: 'buf' is allocated and big enough ('buf_size' check) unsafe { data.read_raw(state.mutable.lock().buf as _, data.len())?; } Ok(data.len()) } fn ioctl(state: Self, _: File, cmd: mut IoctlCommand) -> Result<i32> { let (cmd, arg) = cmd.raw(); match cmd { VULN_SETUP_BUF => { if arg == 0 { return Err(EINVAL) } let mut mutable = state.mutable.lock(); // SAFETY: only calling C's krealloc let newbuf = unsafe { bindings::krealloc(mutable.buf, arg, bindings::GFP_KERNEL) }; mutable.buf = newbuf; mutable.buf_size = arg; Ok(0) }Using a
Mutex
instead of atomics
Since memory safety guarantees aren’t enforced inside
unsafe
blocks, it’s good to try to minimise their use.
There are three unsafe
blocks above and two more related to
mutex initialisation. We should be able to remove at least some of them,
by using Rust constructs.
First, sharing the pointer requires the use of
unsafe impl Send for RustVulnState {}
syntax. However, this
unsafe
can be eliminated by changing the pointer to an
usize
and adding a few casts (source
code).
Then there’s an unsafe
call to C’s
krealloc
, which can be replaced with Box
allocations. Eventually, after enacting these few improvements to make
the code look less like C and more like Rust, we end up with the
following (full
source):
struct RustVulnState { buf: Box<[core::mem::MaybeUninit<u8>]>, buf_size: usize, } struct RustVuln { mutable: Mutex<RustVulnState>, } fn write(state: Self, _: File, data: mut impl IoBufferReader, _offset: u64) -> Result<usize> { if data.is_empty() { return Ok(0); } if state.mutable.lock().buf_size < data.len() { return Err(ENOSPC); } let buf = state.mutable.lock().buf.as_mut_ptr() as *mut u8; // SAFETY: 'buf' is allocated and big enough ('buf_size' check) unsafe { data.read_raw(buf, data.len())? }; Ok(data.len()) } fn ioctl(state: Self, _: File, cmd: mut IoctlCommand) -> Result<i32> { let (cmd, arg) = cmd.raw(); match cmd { VULN_SETUP_BUF => { if arg == 0 { return Err(EINVAL) } let mut mutable = state.mutable.lock(); mutable.buf = Box::try_new_uninit_slice(arg)?; mutable.buf_size = arg; Ok(0) }More Rust idioms, fewer
unsafe
blocks
The single remaining unsafe
is required because
IoBufferReader.read_raw()
reads into a raw kernel buffer,
and the programmer must make sure the buffer and length are valid.
So we execute the PoC again, and there’s still the bug:
root@(none):/# /xxx/rustproofing-linux/poc/test.sh rust_vuln_race_device_v6_box [ 249.302940] ================================================================== [ 249.304771] BUG: KASAN: use-after-free in _copy_from_user+0x35/0x70 [ 249.306319] Write of size 40 at addr ffff888001809280 by task poc_vuln_race_d/204 [...] [ 249.323241] kasan_check_range+0x2bd/0x2e0 [ 249.324360] _copy_from_user+0x35/0x70 [ 249.325385] _RNvXs_NtCsfATHBUcknU9_6kernel8user_ptrNtB4_18UserSlicePtrReaderNtNtB6_9io_buffer14IoBufferReader8read_raw+0x37/0x80 [ 249.328424] ? _RNvMs3_NtCsfATHBUcknU9_6kernel4fileINtB5_16OperationsVtableINtNtB7_7miscdev12RegistrationNtCscqXYApJJQRL_28rust_vuln_race_device_v6_box8RustVulnEB1p_E14write_callbackB1r_+0xaf/0x140 [rust_vuln_race_device_v6_box] [...] [ 249.343846] Allocated by task 205: [ 249.344520] kasan_set_track+0x3d/0x60 [ 249.345368] __kasan_kmalloc+0x85/0x90 [ 249.346195] __kmalloc_node_track_caller+0xa0/0x190 [ 249.347324] krealloc+0x54/0xc0 [ 249.347823] _RNvXCscqXYApJJQRL_28rust_vuln_race_device_v6_boxNtB2_8RustVulnNtNtCsfATHBUcknU9_6kernel4file10Operations5ioctl+0x40/0x120 [rust_vuln_race_device_v6_box]PoC still works in Rust version
Call traces for Rust code are a bit harder to read because of symbol
mangling. While eliminating the third unsafe
will go some
way towards fixing the bug (not completely though, as discussed later),
arguably the actual cause of the use-after-free is not the use of
read_raw()
, but rather, the incorrect Mutex
usage in fn write
. It is useful to focus on this nuanced
behaviour for a minute, because it illustrates yet another footgun when
porting C-to-Rust:
if state.mutable.lock().buf_size < data.len() { return Err(ENOSPC); } let buf = state.mutable.lock().buf.as_mut_ptr() as *mut u8; // SAFETY: 'buf' is allocated and big enough ('buf_size' check) unsafe { data.read_raw(buf, data.len())? }; Ok(data.len())Incorrect use of a
Mutex
Note above how there isn’t an unlock()
corresponding
with the highlighted lock()
. That is actually how mutexes
are supposed to work in Rust – Mutex.lock()
creates a
MutexGuard
and when that goes out of scope it is
automatically unlocked. Well, MutexGuard
above is very much
temporary and goes out of scope on the same line where it’s created. The
code is effectively mutex lock/unlock for .buf_size
and
another lock/unlock that covers the use of .buf
. This is
probably not what the developer intended, but hopefully we can see how
easy it might be to accidentally introduce such a bug.
Instead, what the code should do is to ensure that the same
MutexGuard
is used to cover access to both
.buf_size
and .buf
. Below,
MutexGuard
mutable
covers all of that, because
it goes out of scope after the last use of the covered variables and
references to them (full
source):
let mut mutable = state.mutable.lock(); if mutable.buf_size < data.len() { return Err(ENOSPC); } let buf = mutable.buf.as_mut_ptr() as *mut u8; // SAFETY: 'buf' is allocated and big enough ('buf_size' check) unsafe { data.read_raw(buf, data.len())? }; Ok(data.len())Correct use of a
Mutex
Some readers will already know that the Box
type stores
the object size, so buf_size
is redundant really. The next
iteration explores this, and the results are as expected. That is,
the driver exhibits the bug when we use two MutexGuard
s,
but when we use only one guard that covers all buf
access,
the bug is eliminated (source
code of fixed mutex use).
A zero initialised boxed slice was attempted in order to allow the
use of IoBufferReader.read_slice()
, which would help
eliminate the remaining unsafe
. This was not successful,
since slice
would need to be a compile-time constant size.
Other Variations
Vec
, a growable array type, could also be used. In C
terms, type Vec
is made of three parts: a buffer pointer, a
capacity and a length. The APIs don’t nicely match the allocations that
C does, but an attempt
was made using uninitialised Vec
. On the other hand, if
one uses a zero-initialised Vec
, this can be coded
without the remaining unsafe
that was required in the above
snippets.
struct RustVulnState { buf: Vec<u8>, } fn write(state: Self, _: File, data: mut impl IoBufferReader, _offset: u64) -> Result<usize> { if data.is_empty() { return Ok(0); } if state.mutable.lock().buf.len() < data.len() { return Err(ENOSPC); } // use only data.len() bytes data.read_slice(state.mutable.lock().buf.split_at_mut(data.len()).0)?; Ok(data.len()) } fn ioctl(state: Self, _: File, cmd: mut IoctlCommand) -> Result<i32> { let (cmd, arg) = cmd.raw(); match cmd { VULN_SETUP_BUF => { if arg == 0 { return Err(EINVAL) } state.mutable.lock().buf.try_resize(arg, 0u8)?; Ok(0)Variant that uses a
Vec
The code does not use unsafe
blocks, but it still uses
mutexes incorrectly. So what happens when the PoC is executed?
root@(none):/# /xxx/rustproofing-linux/poc/test.sh rust_vuln_race_device_v9_vec_init [ 18.988595] rust_vuln_race_device_v9_vec_init: loading out-of-tree module taints kernel. [ 19.021009] rust_kernel: panicked at 'assertion failed: mid <= self.len()', /home/kali/rust/rustproofing-linux/rust_vuln_race_device_v9_vec_init.rs:59:50 [ 19.024106] ------------[ cut here ]------------ [ 19.025212] kernel BUG at rust/helpers.c:45! [...] [ 19.050883] Call Trace: [ 19.051528]PoC on code without[ 19.052180] rust_begin_unwind+0x66/0x80 [ 19.053428] ? _RNvXsP_NtCs3yuwAp0waWO_4core3fmtRhNtB5_5Debug3fmtCsfATHBUcknU9_6kernel+0x50/0x50 [ 19.055508] ? bit_wait_io_timeout+0xc0/0xc0 [ 19.056591] ? _RNvNtCs3yuwAp0waWO_4core9panicking9panic_fmt+0x2c/0x30 [ 19.057952] ? virtqueue_get_buf_ctx+0x1a1/0x4c0 [ 19.059298] ? _RNvNtCs3yuwAp0waWO_4core9panicking5panic+0x49/0x50 [ 19.060680] ? _RNvMs3_NtCsfATHBUcknU9_6kernel4fileINtB5_16OperationsVtableINtNtB7_7miscdev12RegistrationNtCsffK8dNQUbvs_33rust_vuln_race_device_v9_vec_init8RustVulnEB1p_E14write_callbackB1r_+0x145/0x170 [rust_vuln_race_device_v9_vec_init]
unsafe
blocks results in
a call to BUG()
It’s not a memory error anymore, but it is a call to
BUG()
. That didn’t happen because of a write to memory, but
just a bit before, where buf
‘s slice
is split
at an invalid location. This is possible due to a race condition that
arises because the mutex was briefly unlocked after the
buf.len()
check.
What about the last two unsafe
blocks that we
conveniently ignored so far? If we use a simple mutex,
kernel::sync::smutex::Mutex
, instead of kernel’s
struct mutex
, kernel::sync::Mutex
, we can
simplify the initialisation and be rid of the unsafe
. Code samples
are in the files with names that end with _smutex.rs
.
Takeaways
Rust’s Mutex API has a very nice way of describing the data it
protects, and a nice locking API. However, a developer still needs to
think about unlocking even if there’s no explicit call to unlock.
Calling .lock()
on the same mutex multiple times could be
an indicator of a race condition. On the other hand, holding a mutex for
too long also has a downside – that is, the performance might
suffer.
We learned that porting drivers that use mutexes from C to Rust is
not a straightforward activity. There are some pitfalls and it is still
possible to introduce problems or to accidentally create a race
condition. If at all possible, it might be best to just bite the bullet
and refactor the code. The trivial example used here could be quite
easily changed to remove allocation from fn ioctl
and
change fn write
to use
IoBufferReader.read_all()
which allocates a new
Vec
and copies data into it. However, in real driver code,
it might not be as straightforward to do this sort of refactoring.
Ultimately, thinking about interfaces and their fragile points is a
worthwhile exercise, whatever the language.
In the Next Part…
Part 3 is about integer overflows and plausibility of them persisting when C code is ported to Rust.