Saltar a la navegación Saltar al contenido principal Ir al pie de página

Constant-Time Data Processing At a Secret Offset, Privacy and QUIC

05 septiembre 2022

By Gérald Doussot

Introduction

NCC Group Cryptography Services team assessed security aspects of several implementations of the QUIC protocol. During the course of their reviews, the team found a number of recurrent cryptography side channel findings of arguably negligible privacy risk to users, across these implementations. However, repetition in itself makes these findings somehow worth having a deeper look, as it may indicate design issues, including complexity of implementing security controls, and/or potential misunderstandings. In this blog post, we will focus on explaining timing side channels that may arise from processing data that starts at a secret offset, and potential remediation. We then offer a full Rust implementation of the constant-time proof of concept code, and an extra proof of concept implementation of constant-time data processing at a secret offset in the Common Lisp, a general-purpose, multi-paradigm programming language. For a primer on constant-time cryptography, first read the excellent BearSSL “Why Constant-Time Crypto?” article.

QUIC Protocol Privacy Controls

The QUIC protocol describes and mandates privacy preserving or enhancing controls throughout RFC 9000 “QUIC: A UDP-Based Multiplexed and Secure Transport”, and RFC 9001 “Using TLS to Secure QUIC”.

Of interest for the purpose of this blog post, the former standard document explains that an endpoint that moves between networks may not wish to have their activity correlated by any entity other than their peer. It provides a number of security controls to protect against activity correlation, including but not limited to header protection. In section 9.5, the standard states that “Header protection ensures that packet numbers cannot be used to correlate activity“, noting further that “This does not prevent other properties of packets, such as timing and size, from being used to correlate activity.

The latter standard document describes some of the requirements in adding and removing header protection:

For authentication to be free from side channels, the entire process of header protection removal, packet number recovery, and packet protection removal MUST be applied together without timing and other side channels.

For the sending of packets, construction and protection of packet payloads and packet numbers MUST be free from side channels that would reveal the packet number or its encoded size.

The packet number is used as input to the AEAD nonce in the encryption, and decryption of QUIC data. The designers considered the “Nonces are Noticed: AEAD Revisited” paper, and QUIC provides nonce privacy.

Timing Side Channels

NCC Group Cryptography Services team identified deviations from the two standards in all reviewed QUIC implementations, for instance where the processing of packet numbers and sizes conditionally branches based on their values, or where data lookup depends on packet number sizes, thus inducing side channels that may assist attackers in guessing these values. They don’t reveal cryptographic keys or passwords; at worst, they may reveal a packet number and/or size, (and incidentally, the size of the embedded encrypted TLS record payload, after the QUIC packet number field).

One of these uncovered side channel issues is more interesting than others, as it is about processing data after a secret offset in a given payload, such as a QUIC packet in our case. It appears to be a less common issue, and there is no known efficient way to address them in the general case. Note that in the aforementioned BearSSL article, CBC padding) verification is one instance of processing data at a secret offset, for which a specific, relatively efficient solution was identified, and implemented to remediate the TLS “Lucky Thirteen” attack (HTTP link).

Before we delve into constant-time processing of data at a secret offset, let’s quickly recall a few concepts:

  • A timing side channel is a vulnerability where an attacker may learn some or all information about the secret data being processed, because the execution trace varies with the secret value itself. A typical, and somehow more widely-know timing side channel issue materializes when a given user hashed password is compared against a server record of that hashed password in a web application. How long it takes to compare the hashed passwords may reveal that the first few bytes up to the length of the hashed passwords match or not. This may help an attacker guess the hashed password, and possibly the password if it is weak.
  • Constant-time processing of secret data aspires to not reveal that secret data, via timing side channels. In our web application example, this would mean that the comparison of the hashed passwords would not return until all bytes have been compared, whether some or all of the two hashed passwords bytes differ or not.

Processing Data in Constant-Time at a Secret Offset Applied to QUIC

So, what do we mean by constant-time processing of (potentially secret) data, at a secret offset? To illustrate the issue, we will look at the structure of a QUIC application packet, and how one would process it. It is composed of the following fields:

  • Packet Header Byte, fixed length, one byte, encrypted. The least significant two bits of the decrypted Packet Header Byte encode the secret packet number field size, in bytes: b00 = 1, b01 = 2, b10 = 3, or b11 = 4.
  • Destination Connection ID, zero to twenty bytes, in plaintext.
  • Packet Number, variable length of 1 to 4 bytes, encrypted.
  • Encrypted Application Traffic Data, variable length.
  • AEAD Authentication Tag, fixed length, in plaintext.

A naïve QUIC implementation may perform the following to retrieve the application data from this packet:

  1. Read one Packet Header Byte, at a fixed offset from the beginning of the packet.
  2. Decrypt the Packet Header Byte.
  3. Extract the length of the Packet Number field, from the decrypted Packet Header Byte.
  4. Read Destination Connection ID, at a fixed offset from the beginning of the packet.
  5. Read 1, 2, 3 or 4 bytes depending on the Packet Number field length, extracted from Packet Header Byte above, at a publicly known offset from beginning of packet.
  6. Read Encrypted Application Traffic Data, up to packet length minus AEAD Authentication Tag length, at a variable offset from beginning of packet.
  7. Read AEAD Authentication Tag, at a fixed offset from the end of packet.
  8. Decrypt Encrypted Application Traffic Data using AEAD Authentication Tag.
  9. Process decrypted application traffic.

Steps 5. and 6. are in effect look-ups indexed by secret data, the packet number length. The access time to an indexed element in memory can vary with its index, depending on whether a cache-miss has occurred or not. This may reveal the value of the packet number length, and incidentally, the size of the Encrypted Application Traffic Data.

Ensuring that code does not leak the size of the packet number can be implemented using constant-time selection of bytes for each possible offset over the whole QUIC packet, starting at the offset of the packet number size field.

Constant-Time Proof Of Concept Code

We will try to implement a prototype of constant-time processing at a secret offset in the Rust programming language, using a simplified problem. Our sample application processes packets consisting of the following fields:

  • Packet Header Byte . The least significant two bits of the decrypted Packet Header Byte encode the secret packet number field size, in bytes: b00 = 1, b01 = 2, b10 = 3, or b11 = 4. Assume it is in plaintext, e.g. decrypted earlier by our application for our purpose.
  • Packet Number variable length of 1 to 4 bytes. Encrypted, and the actual packet number value is not used in our example. The length of the field is secret, and must be determined in constant-time.
  • Data, variable length, padded to maximum packet size.

We arbitrarily choose a maximum packet size of 12 bytes in our example, so the Data payload may range from 7 to 10 bytes long. With the above, how do we extract and return Data without revealing Packet Number length?

We first need to implement three constant-time primitives. Function is_zero() takes a byte, using an unsigned 32 bits representation, and returns 2^32 – 1 if the byte is equal to 0, or 0 otherwise:

// return 2^32 -1 if x is 0, otherwise 0 in constant-time
// #[inline(always)]
pub fn is_zero(x: u32) -> u32 {
    !(((x as i32) | (x.wrapping_neg() as i32)) >> 31) as u32
}

If argument x to function is_zero() is set to 0, then both x and -x are equal to 0. The bitwise OR | and arithmetic right shift >> operators do not affect that result. If x is not equal to 0, at least x or -x will have the leftmost bit set, and the (signed) arithmetic right shift will fill the rest of the byte with 1s, forming the value 2^32 – 1. The negation, which inverts the result from 0 to 2^32 -1, and vice versa is not necessary – for the purpose of this post, it makes it easier to relate the code to boolean values true (2^32 -1) and false (0), and hopefully aid comprehension.

We model this algorithm using the Z3 Theorem Prover to validate its correctness, and elucidate potential incorrect assumptions or misunderstandings.

(push)
;; if x == 0 then our function will return 2^32 - 1
(assert
 (forall ((x  (_ BitVec 32)))
	 (=> (= x (_ bv0 32)) ; x == 0
	     (= (_ bv4294967295 32) ; result == 4294967295
		(bvnot
		 (bvashr ;;(signed) arithmetic shift 
			 (bvor x
			       (bvadd (_ bv1 32) (bvnot x))) ; modeling of two-complement negation of x
			 (_ bv31 32)))))))

(check-sat)

(pop)
(push)

;; if x > 0 then our function will return 0
(assert
 (forall ((x  (_ BitVec 32)))
	 (=> (bvugt x (_ bv0 32)) ; x > 0
	     (= (_ bv0 32) ; result == 0
		(bvnot
		 (bvashr
		  (bvor x (bvadd (_ bv1 32) (bvnot x)))
		  (_ bv31 32)) )))))

(check-sat)

(pop)
(push)

;; for all x, our function will return 0 or 2 ^ 32 - 1
(assert
 (forall ((x  (_ BitVec 32)))
	 (or (= (_ bv4294967295 32)
		(bvnot
		 (bvashr
		  (bvor x (bvadd (_ bv1 32) (bvnot x)))
		  (_ bv31 32)) ))
	     (= (_ bv0 32)
		(bvnot
		 (bvashr
		  (bvor x (bvadd (_ bv1 32) (bvnot x)))
		  (_ bv31 32)) )))))

(check-sat)

Z3 should return three consecutive (sat), showing that our assertions hold, and strengthening our confidence in our algorithm, assuming that we modeled the algorithm correctly, and that our Rust implementation implements the same algorithm as Z3. Of course, because our input is small (one byte), we can write a unit test case in our target implementation language, which verifies the results for all potential byte input values. Z3 can verify the results for much larger input e.g. 32 or 64 bits.

The second primitive, is_equal() compares two bytes, and returns 2^32 – 1 if they are equal, or 0 otherwise:

// return 2^32 -1  if x and y are equal, otherwise 0 in constant-time
// #[inline(always)]
pub fn is_equal(x: u32, y: u32) -> u32 {
    is_zero(x ^ y)
}

It builds upon our previous function is_zero() and uses the XOR operation, which returns 2^32 – 1 if both operands are equal. Then is_zero(0) is 2^32 -1, and is_zero(x>0) is 0. The last primitive conditional_select_ct() is a conditional selection between two values (without actual branching and therefore timing side channels), based on a given choice value, which in our case, can be either 0 or 2^32 – 1:

// return y if choice is zero, else x (choice i == 2^32 -1 ) in constant-time
// #[inline(always)]
pub fn conditional_select_ct(x: u32, y: u32, choice: u32) -> u32 {
    return y ^ (choice   (x ^ y));
}

If choice is 0, then the right expression (choice (x ^ y)) returns (bitwise AND between 0 and any other value always return 0), and conditional_select_ct() returns the value y XOR 0, therefore y.

if choice is 2^32 -1, bitwise AND works as an identity function (over the length of the 32 bits argument), and returns the right most expression (x ^ y). We are left with expression y ^ x ^ y , with both ys “canceling” each other (y ^ y == 0), ultimately evaluating to x ^ 0, therefore x.

Now, we can implement the main function that correctly returns the data at the secret index offset (either +1, +2, +3 or +4) in constant-time, using our last primitive:

// PACKET HEADER BYTE (1) | PACKET NUMBER (1..4) | DATA TO EXTRACT ...Zero padded (7-10) |
//                        ^                      ^
//                        |                      |-- Secret offset
//                        |-- Known offset

const PACKET_NUMBER_FIELD_START_OFFSET: usize = 1;
const PACKET_NUMBER_MIN_LEN : usize = 1;
const PACKET_NUMBER_MAX_LEN : usize = 4;
const DATA_FRAME_SIZE: usize = 12;

// Take a buffer of data of packet header,
// packet number (whose len is secret and ranges from 1 to 4),
// and of data to extract at secret offset
// and return extracted data in constant-time
// Returned data must be processed in constant-time
// otherwise it will reveal length of packet number
pub fn extract_data_at_secret_index (data:  [u8]) -> [u8; DATA_FRAME_SIZE] {
    assert!(data.len() == DATA_FRAME_SIZE);
    let mut data_out = [0u8; DATA_FRAME_SIZE];

    let secret_length = (data[0]   0x03) + 1; // compute the length of secret data

    for offset in PACKET_NUMBER_MIN_LEN..=PACKET_NUMBER_MAX_LEN {
        let mut i = offset;
        while i < DATA_FRAME_SIZE - PACKET_NUMBER_FIELD_START_OFFSET {
            data_out[i-offset] =
            conditional_select_ct( data[i+PACKET_NUMBER_FIELD_START_OFFSET] as u32,
                data_out[i-offset] as u32,
                is_equal(offset as u32 , secret_length as u32)) as u8;
            i += 1;
        }
    }
    data_out

}

After we extracted our secret packet number field length, we loop over the packet 4 (offset possible range of values) times. In each loop, we compare the offset with our secret packet number length. If they match, we conditionally select and copy in constant-time the correct byte value from our input (for each byte of the input), otherwise we just copy the previous byte value again. This means that in 1 out of 4 loops, we copy the correct value from our input, and that in 3 out of 4 loops, we copy the previous value again (whether it was set to the correct value yet, or not).

Let’s write an unit test case, to demonstrate the input and expected output for a secret offset of value 2 (meaning that the packet number field size is 2 bytes). We expect our attacker to not learn anything about the size of the packet number field, during the processing of the data field. For the purpose of our test, we set the packet number value to an arbitrary value, 0xffff, which has no incidence on the objectives of our test:

    #[test]
    fn displacement_2() {
        let data = [ 0x01u8, 0xff, 0xff, 1, 2, 3, 4, 5, 6, 7, 0, 0 ];
        let result = [1u8, 2, 3, 4, 5, 6, 7, 0, 0, 0, 0, 0];
        let r = extract_data_at_secret_index( data);
        assert_eq!(result, r);
    }

As we can see above, function extract_data_at_secret_index() stripped the `Packet Header Byte, of value 0x40 (0b01000000), the Packet Number of value 0xffff from the packet, and outputted the plaintext DATA (1,2,3,4,5,6,7), padded with extra 0s up to the length of the original packet.

Potential Shortcomings

For the implementation to be effectively constant-time, we have to have padded data. Otherwise, processing of the extracted data would reveal the size of the data after the secret offset, and therefore of the secret offset in the packet.

Furthermore, all additional processing of the extracted data must continue to be constant-time, as it may again reveal the size of the secret offset. This may be an insurmountable task, depending of the data to be processed. Areas of risk may include decryption, as in the case of the QUIC protocol, but also deserialization of data (e.g. JSON, base64), business logic etc.

Compilers, computer architectures, and operating environments also play a substantial role in enforcing constant-time execution. Compilers may or may not emit constant-time code with the same input, from one release to another. Several computer architectures may have non constant-time operations, such as multiplication, and binary right shift. Constant-time code implementers must carefully review the disassembled code output of their compilers in the context of the target operating environments, and actually time their code for increased assurance.

We analyzed the disassembly output for the Rust x86_64 compiler version 1.61.0 on macOS, and found it to be free of side-channels. For example, when is_zero is not compiled inline, it produces the following output, which does not contain any branching based on secret data:

objdump  -disassemble -x86-asm-syntax=intel target/release/libct_secret_pos.rlib

target/release/libct_secret_pos.rlib(lib.rmeta):	file format mach-o 64-bit x86-64


target/release/libct_secret_pos.rlib(ct_secret_pos-ffc0c1f54738cb18.ct_secret_pos.e06b5aa2-cgu.0.rcgu.o):	file format mach-o 64-bit x86-64


Disassembly of section __TEXT,__text:

0000000000000000 <__ZN13ct_secret_pos7is_zero17h31d480cf09b5c4d9E>:
       0: 55                           	push	rbp
       1: 48 89 e5                     	mov	rbp, rsp
       4: 83 ff 01                     	cmp	edi, 1
       7: 19 c0                        	sbb	eax, eax
       9: 5d                           	pop	rbp
       a: c3                           	ret
       b: 0f 1f 44 00 00               	nop	dword ptr [rax + rax]

Cost

We now hopefully have a constant-time implementation to extract data after a secret offset. However, the implementation is costly: we need to iterate 4 times over every received packet, from a publicly known offset near the beginning of the packet, up to its end, including padding. A QUIC packet can be up to 1,350 bytes long, minus 1 byte for the Packet Header Byte, and up to 20 bytes for the Destination Connection ID field. Things get worse thereafter. Remember that in QUIC, Encrypted Application Traffic Data is actually encrypted. We need to decrypt the data at 4 different offsets to not reveal its length, and the actual secret offset by inference, based on the maximum QUIC packet length. Then the application needs to decode, and process the decrypted data, in constant-time, depending of its threat mode, as alluded to earlier in this post.

We also casually omitted in our simplified QUIC protocol, that the value of the packet number is actually encrypted too, and must be decrypted, you guessed, four times, and processed in-constant-time thereafter.

Then there is the cost of the attack to consider. Most side-channel vulnerabilities are thought to be ranging from challenging to impossible to exploit, but this is highly contingent of the execution environment (attackers would fare a better chance if the QUIC process runs in the SGX Trusted Execution Environment, with the attackers controlling the SGX host), attacker location (host, local or publicly addressable network, etc.). It seems unlikely that attackers would expend effort in mounting an attack to reveal the packet number size, to further de-anonymize users, at least in the absence of other vulnerabilities, and in the vast majority of usage contexts.

Potential Improvements

In the general case, if one wants to access data at a secret offset and the secret offset range (maximum minus minimum) is N, then it can be done in log(N) passes.

In the case of QUIC and the packet number field, N = 4 so it hardly justifies doing anything more sophisticated, but it can be helpful in some situations. In the case of TLS 1.2 CBC records (the aforementioned TLS “Lucky Thirteen” attack remediation), the range is N = 20 (the size of a HMAC/SHA-1 output) and it can become interesting to use the log(N) optimization (the 20-byte value is “rotated back” in 5 passes instead of 20). See CBC padding.

Conclusion and Closing Thoughts

The QUIC protocol implements controls to ensure that packet numbers cannot be used to correlate users activity. Decoding, and processing of data based on the packet number field size may reveal information about the packet number, and facilitate correlation of users activity. In order to prevent this, the QUIC protocol mandates that decoding of packet numbers must be performed free of side channels. The QUIC packet number has a variable size encoding, forcing implementors to resort to constant-time processing at a secret offset, which is costly in the general case. We demonstrated a simplified example of such constant-time processing in the Rust programming language, noting that to maintain constant-time properties, one must establish an appropriate process as part of the software development life cycle to minimize risks over time.

Source Code and Extra Material

In this section, we provide the full Rust implementation of the constant-time proof of concept code, and an extra proof of concept implementation of constant-time data processing at a secret offset in the Common Lisp, a general-purpose, multi-paradigm programming language.

Constant-time data processing at a secret offset proof of concept code in Rust:


// return 2^32 -1 if x is 0, otherwise 0 in constant-time
// #[inline(always)]
pub fn is_zero(x: u32) -> u32 {
    !(((x as i32) | (x.wrapping_neg() as i32)) >> 31) as u32
}

// return 2^32 -1  if x and y are equal, otherwise 0 in constant-time
// #[inline(always)]
pub fn is_equal(x: u32, y: u32) -> u32 {
    is_zero(x ^ y)
}

// return y if choice is zero, else x (choice i == 2^32 -1 ) in constant-time
// #[inline(always)]
pub fn conditional_select_ct(x: u32, y: u32, choice: u32) -> u32 {
    return y ^ (choice   (x ^ y));
}

// PACKET HEADER BYTE (1) | PACKET NUMBER (1..4) | DATA TO EXTRACT ...Zero padded (7-10) |
//                        ^                      ^
//                        |                      |-- Secret offset
//                        |-- Known offset

const PACKET_NUMBER_FIELD_START_OFFSET: usize = 1;
const PACKET_NUMBER_MIN_LEN : usize = 1;
const PACKET_NUMBER_MAX_LEN : usize = 4;
const DATA_FRAME_SIZE: usize = 12;

// Take a buffer of data of packet header,
// packet number (whose len is secret and ranges from 1 to 4),
// and of data to extract at secret offset
// and return extracted data in constant-time
// Returned data must be processed in constant-time
// otherwise it will reveal length of packet number
pub fn extract_data_at_secret_index (data:  [u8]) -> [u8; DATA_FRAME_SIZE] {
    assert!(data.len() == DATA_FRAME_SIZE);
    let mut data_out = [0u8; DATA_FRAME_SIZE];

    let secret_length = (data[0]   0x03) + 1; // compute the length of secret data

    for offset in PACKET_NUMBER_MIN_LEN..=PACKET_NUMBER_MAX_LEN {
        let mut i = offset;
        while i < DATA_FRAME_SIZE - PACKET_NUMBER_FIELD_START_OFFSET {
            data_out[i-offset] =
            conditional_select_ct( data[i+PACKET_NUMBER_FIELD_START_OFFSET] as u32,
                data_out[i-offset] as u32,
                is_equal(offset as u32 , secret_length as u32)) as u8;
            i += 1;
        }
    }
    data_out

}

#[cfg(test)]
mod tests {

    use crate::is_zero;
    use crate::is_equal;
    use crate::extract_data_at_secret_index;

    #[test]
    // We test for a byte range [0,255]
    fn it_is_correct_and_does_not_overflow() {
        for i in 1..=u32::MAX {
            assert_eq!(is_zero(i), 0);
        }
        assert_eq!(is_zero(0), u32::MAX);
    }

    #[test]
    fn ct_base_operations() {
        assert_eq!(is_zero(5), 0);
        assert_eq!(is_zero(255), 0);
        assert_eq!(is_zero(0), u32::MAX);
        assert_eq!(is_equal(0, 255), 0);
        assert_eq!(is_equal(255,255), u32::MAX);
        assert_eq!(is_equal(1,2), 0);
    }

    #[test]
    fn displacement_1() {
        let data = [ 0x00u8, 0xff, 1, 2, 3, 4, 5, 6, 7, 0, 0, 0 ];
        let result = [1u8, 2, 3, 4, 5, 6, 7, 0, 0, 0, 0, 0];
        let r = extract_data_at_secret_index( data);
        assert_eq!(result, r);
    }

    #[test]
    fn displacement_2() {
        let data = [ 0x01u8, 0xff, 0xff, 1, 2, 3, 4, 5, 6, 7, 0, 0 ];
        let result = [1u8, 2, 3, 4, 5, 6, 7, 0, 0, 0, 0, 0];
        let r = extract_data_at_secret_index( data);
        assert_eq!(result, r);
    }

    #[test]
    fn displacement_3() {
        let data = [ 0x02u8, 0xff, 0xff, 0xff, 1, 2, 3, 4, 5, 6, 7, 0 ];
        let result = [1u8, 2, 3, 4, 5, 6, 7, 0, 0, 0, 0, 0];
        let r = extract_data_at_secret_index( data);
        assert_eq!(result, r);
    }

    #[test]
    fn displacement_4() {
        let data = [ 0x03u8, 0xff, 0xff, 0xff, 0xff, 1, 2, 3, 4, 5, 6, 7];
        let result = [1u8, 2, 3, 4, 5, 6, 7, 0, 0, 0, 0, 0];
        let r = extract_data_at_secret_index( data);
        assert_eq!(result, r);
    }

}

Extra material: constant-time, allocation free data processing at a secret offset proof of concept code in Common Lisp.

(defconstant PACKET-NUMBER-FIELD-START-OFFSET 1)
(defconstant PACKET-NUMBER-MIN-LEN 1)
(defconstant PACKET-NUMBER-MAX-LEN 4)
(defconstant DATA-FRAME-SIZE 12)

(declaim (ftype (function ((unsigned-byte 32)) (unsigned-byte 32)) zero-p))
;;                          ^input             ^return value     ^function name
(declaim (inline zero-p))

(defun zero-p(x)
  (declare (optimize (speed 3) (safety 0)))
  (ldb (byte 32 0)
       (ash
	(lognor x (- x))
	-31)))

(declaim (ftype (function ((unsigned-byte 32) (unsigned-byte 32)) (unsigned-byte 32)) equal-p))
;;                          ^input 1          ^ input 2         ^return value      ^function name
(declaim (inline equal-p))

(defun equal-p( x y)
  (declare (optimize (speed 3) (safety 0)))
  (declare (inline equal-p))
  (zero-p (logxor x y)))

(declaim (ftype (function ((unsigned-byte 32) (unsigned-byte 32) (unsigned-byte 32))
			  (unsigned-byte 32)) conditional-select-ct))
(declaim (inline conditional-select-ct))

(defun conditional-select-ct (x y choice)
  (declare (optimize (speed 3) (safety 0)))
  (logxor y
	  (logand
	   choice
	   (logxor x y ))))

(declaim (ftype
          (function
           ((simple-array (unsigned-byte 8))
	    (simple-array (unsigned-byte 8))
	    (function))
           (simple-array (unsigned-byte 8)))
          decrypt-data-at-secret-index))

(defun decrypt-data-at-secret-index(data-in data-out decrypt-fn)
  (declare (optimize (speed 3) (safety 0)))
  (declare  (type (simple-array (unsigned-byte 8)) data-in data-out))
  (let ((secret-length (+ 1 (logand (aref data-in 0) #x03))))
    (declare (type (unsigned-byte 32) secret-length))
    (do ((offset PACKET-NUMBER-MIN-LEN (+ offset 1)))
	((> offset PACKET-NUMBER-MAX-LEN)
	 (funcall decrypt-fn data-out))
      (do* ((i offset ( + i 1))
	    (loc (-  i offset) (- i offset)))
	   ((= i (- DATA-FRAME-SIZE PACKET-NUMBER-FIELD-START-OFFSET)))
	(declare (type (unsigned-byte 32) offset i loc))
	(setf (aref data-out loc)
	      (conditional-select-ct
	       (aref data-in (+ i PACKET-NUMBER-FIELD-START-OFFSET))
	       (aref data-out loc)
	       (equal-p offset secret-length)))))))
;; dummy decrypt function

(declaim (ftype
          (function
           ((simple-array (unsigned-byte 8)))
           (simple-array (unsigned-byte 8)))
          echo))
(declaim (inline echo))

(defun echo(data)
  (declare (optimize (speed 3) (safety 0)))
  data)

;; test

(defun test-decrypt-data-at-secret-index ()
  (let ((test-data
	  (list
	   (make-array DATA-FRAME-SIZE
		       :element-type '(unsigned-byte 8)
		       :initial-contents '( #x00 #xff 1 2 3 4 5 6 7 0 0 0 ))
	   (make-array DATA-FRAME-SIZE
		       :element-type '(unsigned-byte 8)
		       :initial-contents '( #x01 #xff #xff 1 2 3 4 5 6 7 0 0))
	   (make-array DATA-FRAME-SIZE
		       :element-type '(unsigned-byte 8)
		       :initial-contents '( #x02 #xff #xff #xff 1 2 3 4 5 6 7 0))
	   (make-array DATA-FRAME-SIZE
		       :element-type '(unsigned-byte 8)
		       :initial-contents '( #x03 #xff #xff #xff #xff 1 2 3 4 5 6 7 ))))
	  (expected-result
	    (make-array DATA-FRAME-SIZE
			:element-type '(unsigned-byte 8)
			:initial-contents '(1 2 3 4 5 6 7 0 0 0 0 0))))
    (dolist (data-in test-data)
      (let ((data-out
	      (make-array DATA-FRAME-SIZE
			  :element-type '(unsigned-byte 8))))
	(assert
	 (equalp
	  (decrypt-data-at-secret-index data-in data-out #'echo) 
	  expected-result))))))

(test-decrypt-data-at-secret-index)

;; Uncomment the following to check assembly code

;; (compile 'decrypt-data-at-secret-index)
;; (compile 'conditional-select-ct)
;; (compile 'equal-p)
;; (compile 'zero-p)
;; (compile 'echo)
;; (disassemble 'decrypt-data-at-secret-index)
;; (disassemble 'conditional-select-ct)
;; (disassemble 'equal-p)
;; (disassemble 'zero-p)
;; (disassemble 'echo)

Example of assembly code output of functions zero-p, and decrypt-data-at-secret-index(), using Steel Bank Common Lisp (SBCL), a Common Lisp compiler on a macOS intel machine:

; disassembly for ZERO-P
; Size: 34 bytes. Origin: #x5361F5E6                          ; ZERO-P
; 5E6:       488BC2           MOV RAX, RDX
; 5E9:       48F7D8           NEG RAX
; 5EC:       4809C2           OR RDX, RAX
; 5EF:       48C1FA1F         SAR RDX, 31
; 5F3:       4883E2FE         AND RDX, -2
; 5F7:       4883F2FE         XOR RDX, -2
; 5FB:       482315C6FFFFFF   AND RDX, [RIP-58]               ; [#x5361F5C8] = #x1FFFFFFFE
; 602:       488BE5           MOV RSP, RBP
; 605:       F8               CLC
; 606:       5D               POP RBP
; 607:       C3               RET

; disassembly for CONDITIONAL-SELECT-CT
; Size: 18 bytes. Origin: #x5361FCA6                          ; CONDITIONAL-SELECT-CT
; A6:       4831FA           XOR RDX, RDI
; A9:       4821D6           AND RSI, RDX
; AC:       4831F7           XOR RDI, RSI
; AF:       488BD7           MOV RDX, RDI
; B2:       488BE5           MOV RSP, RBP
; B5:       F8               CLC
; B6:       5D               POP RBP
; B7:       C3               RET

Acknowledgments

Many thanks to my NCC Group colleagues Giacomo Pope (@isogenies) for his insightful feedback on this blog post, and Thomas Pornin (@bearsslnews), who taught me so much about timing side channels, for his comments.

Author: Gérald Doussot (@gerald_doussot)