Max Groot and Erik Schamper
TL;DR
- Windows Defender (the antivirus shipped with standard installations of Windows) places malicious files into quarantine upon detection.
- Reverse engineering mpengine.dll resulted in finding previously undocumented metadata in the Windows Defender quarantine folder that can be used for digital forensics and incident response.
- Existing scripts that extract quarantined files do not process this metadata, even though it could be useful for analysis.
- Fox-IT’s open-source digital forensics and incident response framework Dissect can now recover this metadata, in addition to recovering quarantined files from the Windows Defender quarantine folder.
- dissect.cstruct allows us to use C-like structure definitions in Python, which enables easy continued research in other programming languages or reverse engineering in tools like IDA Pro.
  - Want to continue in IDA Pro? Just copy paste the structure definitions!
Introduction
During incident response engagements we often encounter antivirus applications that have rightfully triggered on malicious software that was deployed by threat actors. Most commonly we encounter this for Windows Defender, the antivirus solution that is shipped by default with Microsoft Windows. Windows Defender places malicious files in quarantine upon detection, so that the end user may decide to recover the file or delete it permanently. Threat actors, when faced with the detection capabilities of Defender, either disable the antivirus in its entirety or attempt to evade its detection.
The Windows Defender quarantine folder is valuable from the perspective of digital forensics and incident response (DFIR). First of all, it can reveal information about timestamps, locations and signatures of files that were detected by Windows Defender. Especially in scenarios where the threat actor has deleted the Windows Event logs, but left the quarantine folder intact, the quarantine folder is of great forensic value. Moreover, as the entire file is quarantined (so that the end user may choose to restore it), it is possible to recover files from quarantine for further reverse engineering and analysis.
While scripts already exist to recover files from the Defender quarantine folder, the purpose of much of the contents of this folder was previously unknown. We don’t like big unknowns, so we performed further research into the previously unknown metadata to see if we could uncover additional forensic traces.
Rather than just presenting our results, we’ve structured this blog to also describe the process of how we got there. Skip to the end if you are interested in the results rather than the technical details of reverse engineering Windows Defender.
Diving into Windows Defender internals
Existing Research
We started by looking into existing research into the internals of Windows Defender. The most extensive documentation we could find on the structures of Windows Defender quarantine files was Florian Bauch’s whitepaper analyzing antivirus software quarantine files, but we also looked at several scripts on GitHub.
In summary, whenever Defender puts a file into quarantine, it does three things:
- A bunch of metadata pertaining to when, why and how the file was quarantined is held in a QuarantineEntry. This QuarantineEntry is RC4-encrypted and saved to disk in the /ProgramData/Microsoft/Windows Defender/Quarantine/Entries folder.
- The contents of the malicious file are stored in a QuarantineEntryResourceData file, which is also RC4-encrypted and saved to disk in the /ProgramData/Microsoft/Windows Defender/Quarantine/ResourceData folder.
- Within the /ProgramData/Microsoft/Windows Defender/Quarantine/Resource folder, a Resource file is made. Both from previous research as well as from our own findings during reverse engineering, it appears this file contains no information that cannot be obtained from the QuarantineEntry and the QuarantineEntryResourceData files. Therefore, we ignore the Resource file for the remainder of this blog.
While previous scripts are able to recover some properties from the ResourceData and QuarantineEntry files, large segments of data were left unparsed, which gave us a hunch that additional forensic artefacts were yet to be discovered.
Windows Defender encrypts both the QuarantineEntry and the ResourceData files using a hardcoded RC4 key defined in mpengine.dll. This hardcoded key was initially published by Cuckoo and is paramount for the offline recovery of the quarantine folder.
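Because RC4 is a symmetric stream cipher, a single routine both encrypts and decrypts. The following is a minimal pure-Python sketch of RC4 and not the Dissect implementation. The function name rc4_crypt is chosen for illustration, and the hardcoded Defender key is deliberately not reproduced here, so the example uses a widely published RC4 test vector instead.

```python
def rc4_crypt(key: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt data with RC4 (the operation is symmetric)."""
    # Key-scheduling algorithm (KSA): permute the state using the key.
    state = list(range(256))
    j = 0
    for i in range(256):
        j = (j + state[i] + key[i % len(key)]) & 0xFF
        state[i], state[j] = state[j], state[i]

    # Pseudo-random generation algorithm (PRGA): XOR the keystream with data.
    out = bytearray(len(data))
    i = j = 0
    for n, byte in enumerate(data):
        i = (i + 1) & 0xFF
        j = (j + state[i]) & 0xFF
        state[i], state[j] = state[j], state[i]
        out[n] = byte ^ state[(state[i] + state[j]) & 0xFF]
    return bytes(out)


# Public RC4 test vector, used here instead of the Defender key.
ciphertext = rc4_crypt(b"Key", b"Plaintext")
print(ciphertext.hex())  # prints bbf316e8d940af0ad3
```

For offline recovery, the same function would be called with the key taken from mpengine.dll and the raw bytes of a chunk from the quarantine folder.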
Pivoting off of public scripts and Bauch’s whitepaper, we loaded mpengine.dll into IDA to further review how Windows Defender places a file into quarantine. Using the PDB available from the Microsoft symbol server, we get a head start with some functions and structures already defined.
Recovering metadata by investigating the QuarantineEntry file
Let us begin with the QuarantineEntry file. From this file, we would like to recover as much of the QuarantineEntry structure as possible, as this holds all kinds of valuable metadata. The QuarantineEntry file is not encrypted as one RC4 cipherstream, but consists of three chunks that are each individually encrypted using RC4.
These three chunks are what we have come to call QuarantineEntryFileHeader, QuarantineEntrySection1 and QuarantineEntrySection2.
- QuarantineEntryFileHeader describes the size of QuarantineEntrySection1 and QuarantineEntrySection2, and contains CRC checksums for both sections.
- QuarantineEntrySection1 contains valuable metadata that applies to all QuarantineEntryResource instances within this QuarantineEntry file, such as the DetectionName and the ScanId associated with the quarantine action.
- QuarantineEntrySection2 denotes the length and offset of every QuarantineEntryResource instance within this QuarantineEntry file so that they can be correctly parsed individually.
A QuarantineEntry has one or more QuarantineEntryResource instances associated with it. Each instance contains additional information such as the path of the quarantined artefact, and the type of artefact that has been quarantined (e.g. regkey or file).
An overview of the different structures within QuarantineEntry is provided in Figure 1:
Figure 1: An example overview of a QuarantineEntry. In this example, two files were simultaneously quarantined by Windows Defender. Hence, there are two QuarantineEntryResource structures contained within this single QuarantineEntry.
As QuarantineEntryFileHeader is mostly a structure that describes how QuarantineEntrySection1 and QuarantineEntrySection2 should be parsed, we will first look into what those two consist of.
QuarantineEntrySection1
When reviewing mpengine.dll within IDA, the contents of both QuarantineEntrySection1 and QuarantineEntrySection2 appear to be determined in the QexQuarantine::CQexQuaEntry::Commit function.
The function receives an instance of the QexQuarantine::CQexQuaEntry class. Unfortunately, the PDB file that Microsoft provides for mpengine.dll does not contain contents for this structure. Most fields could, however, be derived using the function names in the PDB that are associated with the CQexQuaEntry class:
Figure 2: Functions retrieving properties from QuarantineEntry
The Id, ScanId, ThreatId, ThreatName and Time fields are most important, as these will be written to the QuarantineEntry file.
At the start of the QexQuarantine::CQexQuaEntry::Commit function, the size of Section1 is determined.
Figure 3: Reviewing the decompiled output of CQexQuaEntry::Commit shows the size of QuarantineEntrySection1 being set to the length of ThreatName plus 53.
This sets section1_size to a value of the length of the ThreatName variable plus 53. We can determine what these additional 53 bytes consist of by looking at what values are set in the QexQuarantine::CQexQuaEntry::Commit function for the Section1 buffer.
This took some experimentation and required trying different fields, offsets and sizes for the QuarantineEntrySection1 structure within IDA. After every change, we would review what these changes would do to the decompiled IDA view of the QexQuarantine::CQexQuaEntry::Commit function.
Some trial and error landed us the following structure definition:
While reviewing the final decompiled output (right) for the assembly code (left), we noticed a field always being set to 1:
Figure 4: A field of QuarantineEntrySection1 always being set to the value of 1.
Given that we do not know what this field is used for, we opted to name the field ‘One’ for now. Most likely, it’s a boolean value that is always true within the context of the QexQuarantine::CQexQuaEntry::Commit function.
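To illustrate how the 53 fixed bytes could be accounted for, the sketch below assumes a layout of a 16-byte Id, a 16-byte ScanId, an 8-byte Time, an 8-byte ThreatId and the 4-byte One field (52 bytes in total), followed by a null-terminated ThreatName whose terminator supplies the 53rd byte. These field sizes and their order are assumptions made for illustration and should be checked against the structure definition. The function name parse_section1 and the sample detection name are hypothetical.

```python
import struct

# Assumed fixed part: Id (16), ScanId (16), Time (8), ThreatId (8), One (4).
SECTION1_FIXED = struct.Struct("<16s16sQQI")


def parse_section1(buf: bytes) -> dict:
    """Parse a decrypted Section1 buffer under the assumed layout."""
    entry_id, scan_id, time, threat_id, one = SECTION1_FIXED.unpack_from(buf, 0)
    # The remainder is taken to be the null-terminated detection name.
    raw_name = buf[SECTION1_FIXED.size:].split(b"\x00", 1)[0]
    return {
        "Id": entry_id.hex(),
        "ScanId": scan_id.hex(),
        "Time": time,  # left as a raw integer; its encoding is not assumed here
        "ThreatId": threat_id,
        "One": one,
        "ThreatName": raw_name.decode("utf-8", errors="replace"),
    }


# Hypothetical sample: zeroed identifiers and a made-up detection name.
sample = SECTION1_FIXED.pack(bytes(16), bytes(16), 0, 0, 1) + b"Trojan:Win32/Example\x00"
parsed = parse_section1(sample)
print(len(sample) - len(parsed["ThreatName"]))  # fixed overhead under this layout
```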
QuarantineEntrySection2
Now that we have a structure definition for the first section of a QuarantineEntry, we move on to the second part. QuarantineEntrySection2 holds the number of QuarantineEntryResource objects confined within a QuarantineEntry, as well as the offsets into the QuarantineEntry structure where they are located.
In most scenarios, one threat gets detected at a time, and one QuarantineEntry will be associated with one QuarantineEntryResource. This is not always the case: for example, if one unpacks a ZIP folder that contains multiple malicious files, Windows Defender might place them all into quarantine. Each individual malicious file of the ZIP would then be one QuarantineEntryResource, but they are all confined within one QuarantineEntry.
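A layout that matches this description is a count followed by that many offsets. The sketch below assumes both are 32-bit little-endian integers, which is an assumption for illustration. What the offsets are relative to is left to the caller, and the function name parse_section2 is hypothetical.

```python
import struct


def parse_section2(buf: bytes) -> list[int]:
    """Return the resource offsets from a decrypted Section2 buffer.

    Assumed layout: a 32-bit count, then `count` 32-bit offsets.
    """
    (count,) = struct.unpack_from("<I", buf, 0)
    return list(struct.unpack_from(f"<{count}I", buf, 4))


# Made-up example: two resources, as when two files are quarantined together.
sample = struct.pack("<III", 2, 12, 96)
offsets = parse_section2(sample)
print(offsets)  # prints [12, 96]
```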
QuarantineEntryResource
To be able to parse QuarantineEntryResource instances, we look into the CQexQuaResource::ToBinary function. This function receives a QuarantineEntryResource object, as well as a pointer to a buffer to which it needs to write the binary output. If we can reverse the logic within this function, we can convert the binary output back into a parsed instance during forensic recovery.
Looking into the CQexQuaResource::ToBinary function, we see two loops very similar to what was observed before for serializing the ThreatName of QuarantineEntrySection1. By reviewing various decrypted QuarantineEntry files, it quickly became apparent that these loops are responsible for reserving space in the output buffer for DetectionPath and DetectionType, with DetectionPath being UTF-16 encoded:
Figure 5: Reservation of space for DetectionPath and DetectionType at the beginning of CQexQuaResource::ToBinary
Fields
When reviewing the QexQuarantine::CQexQuaEntry::Commit function, we observed an interesting loop that (after investigating function calls and renaming variables) explains the data that is stored between the DetectionType and DetectionPath:
Figure 6: Alignment logic for serializing Fields
It appears QuarantineEntryResource structures have one or more QuarantineEntryResourceField instances associated with them, with the number of fields associated with a QuarantineEntryResource being stored in a single byte in between the DetectionPath and DetectionType. When saving the QuarantineEntry to disk, fields have an alignment of 4 bytes. We could not find mentions of QuarantineEntryResourceField structures in prior Windows Defender research, even though they can hold valuable information.
The CQexQuaResource class has several different implementations of AddField, accepting different kinds of parameters. Reviewing these functions showed that fields have an Identifier, Type, and a buffer Data with a size of Size, resulting in a simple TLV-like format:
To understand what kinds of types and identifiers are possible, we delve further into the different versions of the AddField functions, which all accept a different data type:
Figure 7: Finding different field types based on different implementations of the CQexQuaResource::AddField function
Visiting these functions, we reviewed the Type and Size variables to understand the different possible types of fields that can be set for QuarantineEntryResource instances. This yields the following FIELD_TYPE enum:
As the AddField functions are part of a virtual function table (vtable) of the CQexQuaResource class, we cannot trivially find all places where the AddField function is called, as they are not directly called (which would yield an xref in IDA). Therefore, we have not exhausted all code paths leading to a call of AddField to identify all possible Identifier values and how they are used. Our research yielded the following field identifiers as the most commonly observed, and of the most forensic value:
Especially CreationTime, LastAccessTime and LastWriteTime can provide crucial data points during an investigation.
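The TLV-like format and the 4-byte alignment described in this section can be combined into a small field walker. This is a sketch under stated assumptions and not the Dissect parser: it assumes a 2-byte Size followed by a 2-byte value that packs a 12-bit Identifier and a 4-bit Type, and it assumes alignment is computed relative to the start of the buffer passed in. The names align4 and iter_fields, and the identifier and type values in the usage lines, are placeholders.

```python
import struct

# Assumed field header: Size (2 bytes), then Identifier and Type packed in 2 bytes.
FIELD_HEADER = struct.Struct("<HH")


def align4(offset: int) -> int:
    """Round offset up to the next multiple of 4."""
    return (offset + 3) & ~3


def iter_fields(buf: bytes, offset: int, count: int):
    """Yield (identifier, field_type, data) for `count` 4-byte aligned fields."""
    for _ in range(count):
        offset = align4(offset)
        size, packed = FIELD_HEADER.unpack_from(buf, offset)
        identifier = packed & 0x0FFF  # assumed: low 12 bits
        field_type = packed >> 12     # assumed: high 4 bits
        start = offset + FIELD_HEADER.size
        yield identifier, field_type, buf[start:start + size]
        offset = start + size


# Two made-up fields: 3 data bytes (padded to a 4-byte boundary), then 8 bytes.
buf = (
    FIELD_HEADER.pack(3, (1 << 12) | 0x010) + b"abc" + b"\x00"
    + FIELD_HEADER.pack(8, (2 << 12) | 0x020) + bytes(8)
)
fields = list(iter_fields(buf, 0, 2))
```

Keeping the alignment step at the top of the loop means the walker never depends on how much padding the writer emitted after the previous field.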
Revisiting the QuarantineEntrySection2 and QuarantineEntryResource structures
Now that we have an understanding of how fields work and how they are stored within the QuarantineEntryResource, we can derive the following structure for it:
Revisiting the QexQuarantine::CQexQuaEntry::Commit function, we can now understand how this function determines at which offset every QuarantineEntryResource is located within QuarantineEntry. Using these offsets, we will later be able to parse individual QuarantineEntryResource instances. Thus, the QuarantineEntrySection2 structure is fairly straightforward:
The last step for recovery of QuarantineEntry: the QuarantineEntryFileHeader
Now that we have a proper understanding of the QuarantineEntry, we want to know how it ends up written to disk in encrypted form, so that we can properly parse the file upon forensic recovery. By inspecting the QexQuarantine::CQexQuaEntry::Commit function further, we can find how this ends up passing QuarantineEntrySection1 and QuarantineEntrySection2 to a function named CUserDatabase::Add.
We noted earlier that the QuarantineEntry contains three RC4-encrypted chunks. The first chunk of the file is created in the CUserDatabase::Add function, and is the QuarantineEntryFileHeader. The second chunk is QuarantineEntrySection1. The third chunk starts with QuarantineEntrySection2, followed by all QuarantineEntryResource structures and their 4-byte aligned QuarantineEntryResourceField structures.
We knew from Bauch’s work that the QuarantineEntryFileHeader has a static size of 60 bytes, and contains the size of QuarantineEntrySection1 and QuarantineEntrySection2. Thus, we need to decrypt the QuarantineEntryFileHeader first.
Based on Bauch’s work, we started with the following structure for QuarantineEntryFileHeader:
That leaves quite some bytes unknown though, so we went back to trusty IDA. Inspecting the CUserDatabase::Add function helps us further understand the QuarantineEntryFileHeader structure. For example, we can see the hardcoded magic header and footer:
Figure 8: Magic header and footer being set for the QuarantineEntryFileHeader
A CRC checksum calculation can be seen for both the buffer of QuarantineEntrySection1 and QuarantineEntrySection2:
Figure 9: CRC Checksum logic within CUserDatabase::Add
These checksums can be used upon recovery to verify the validity of the file. The CUserDatabase::Add function then writes the three chunks in RC4-encrypted form to the QuarantineEntry file buffer.
Based on these findings of the Magic header and footer and the CRC checksums, we can revise the structure definition for the QuarantineEntryFileHeader:
This was the last piece to be able to parse QuarantineEntry structures from their on-disk form. However, we do not want just the metadata: we want to recover the quarantined files as well.
Recovering files by investigating QuarantineEntryResourceData
We can now correctly parse QuarantineEntry files, so it is time to turn our attention to the QuarantineEntryResourceData file. This file contains the RC4-encrypted contents of the file that has been placed into quarantine.
Step one: eyeball hexdumps
Let’s start by letting Windows Defender quarantine a Mimikatz executable and reviewing its output files in the quarantine folder. One would think that merely RC4 decrypting the QuarantineEntryResourceData file would result in the contents of the original file. However, a quick hexdump of a decrypted QuarantineEntryResourceData file shows us that there is more information contained within:
As visible in the hexdump, the MZ value (which is located at the beginning of the buffer of the Mimikatz executable) only starts at offset 0xCC. This gives reason to believe there is potentially valuable information preceding it.
There is also additional information at the end of the ResourceData file:
At the end of the hexdump, we see an additional buffer, which some may recognize as the “Zone Identifier”, or the “Mark of the Web”. As this Zone Identifier may tell you something about where a file originally came from, it is valuable for forensic investigations.
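The content of a Zone.Identifier stream is a small INI-style text with a ZoneTransfer section, so once recovered it can be read with a standard INI parser. The sketch below is illustrative only: the function name parse_zone_identifier and the sample content (including the example.com URL) are made up, and the text is assumed to be already decoded to a string.

```python
import configparser

# Standard Windows URL security zones.
ZONES = {
    0: "Local machine",
    1: "Local intranet",
    2: "Trusted sites",
    3: "Internet",
    4: "Restricted sites",
}


def parse_zone_identifier(text: str) -> dict:
    """Parse decoded Zone.Identifier content into a dictionary."""
    # Interpolation is disabled because URLs may contain '%' characters.
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key case (ZoneId, HostUrl)
    parser.read_string(text)
    info = dict(parser["ZoneTransfer"])
    info["ZoneName"] = ZONES.get(int(info.get("ZoneId", -1)), "Unknown")
    return info


# Made-up sample content for a file downloaded from the internet.
sample = "[ZoneTransfer]\r\nZoneId=3\r\nHostUrl=https://example.com/file.exe\r\n"
info = parse_zone_identifier(sample)
print(info["ZoneName"])  # prints Internet
```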
Step two: open IDA
To understand where these additional buffers come from and how we can parse them, we again dive into the bowels of mpengine.dll. If we review the QuarantineFile function, we see that it receives a QuarantineEntryResource and QuarantineEntry as parameters. When following the code path, we see that the BackupRead function is called to write to a buffer of which we know that it will later be RC4-encrypted by Defender and written to the quarantine folder:
Figure 10: BackupRead being called within the QuarantineFile function.
Step three: RTFM
A glance at the documentation of BackupRead reveals that this function returns a buffer separated by Win32 stream IDs. The streams stored by BackupRead contain all data streams as well as security data about the owner and permissions of a file. On NTFS file systems, a file can have multiple data attributes or streams: the “main” unnamed data stream and optionally other named data streams, often referred to as “alternate data streams”. For example, the Zone Identifier is stored in a separate Zone.Identifier data stream of a file. It makes sense that a function intended for backing up data preserves these alternate data streams as well.
The fact that BackupRead preserves these streams is also good news for forensic analysis. First of all, malicious payloads can be hidden in alternate data streams. Moreover, alternate data streams such as the Zone Identifier and the security data can help to understand where a file has come from and what it contains. We just need to recover the streams as they have been saved by BackupRead!
Diving into IDA is not necessary, as the documentation tells us all that we need. For each data stream, the BackupRead function writes a WIN32_STREAM_ID to disk, which denotes (among other things) the size of the stream. Afterwards, it writes the data of the stream to the destination file and continues to the next stream. The WIN32_STREAM_ID structure definition is documented on the Microsoft Learn website:
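Walking such a sequence of streams can be sketched in a few lines of Python. The header layout below follows the documented WIN32_STREAM_ID fields (dwStreamId, dwStreamAttributes, Size, dwStreamNameSize, followed by the UTF-16 stream name). The helper names iter_streams and pack_stream are hypothetical, and the sketch assumes the buffer passed in starts at the first stream header, so anything that precedes it in a decrypted ResourceData file is not handled here.

```python
import struct

# WIN32_STREAM_ID header: dwStreamId, dwStreamAttributes, Size, dwStreamNameSize.
STREAM_HEADER = struct.Struct("<IIQI")

STREAM_TYPES = {
    1: "BACKUP_DATA",
    2: "BACKUP_EA_DATA",
    3: "BACKUP_SECURITY_DATA",
    4: "BACKUP_ALTERNATE_DATA",
}


def iter_streams(buf: bytes):
    """Yield (stream_type, name, data) for each stream in a BackupRead buffer."""
    offset = 0
    while offset + STREAM_HEADER.size <= len(buf):
        stream_id, _attrs, size, name_size = STREAM_HEADER.unpack_from(buf, offset)
        offset += STREAM_HEADER.size
        # The name size is given in bytes; the name itself is UTF-16.
        name = buf[offset:offset + name_size].decode("utf-16-le")
        offset += name_size
        data = buf[offset:offset + size]
        offset += size
        yield STREAM_TYPES.get(stream_id, str(stream_id)), name, data


def pack_stream(stream_id: int, name: str, data: bytes) -> bytes:
    """Build one stream record, only used to create sample input."""
    raw_name = name.encode("utf-16-le")
    return STREAM_HEADER.pack(stream_id, 0, len(data), len(raw_name)) + raw_name + data


# Made-up sample: a main data stream and a Zone.Identifier alternate stream.
zone = b"[ZoneTransfer]\r\nZoneId=3\r\n"
sample = pack_stream(1, "", b"MZ\x90\x00") + pack_stream(4, ":Zone.Identifier:$DATA", zone)
streams = list(iter_streams(sample))
```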
Who slipped this by the code review?
While reversing parts of mpengine.dll, we came across an interesting looking call in the HandleThreatDetection function. We appreciate that threats must be dealt with swiftly and with utmost discipline, but could not help but laugh at the curious choice of words when it came to naming this particular function.
Figure 11: A function call to SendThreatToCamp, a ‘call’ to action that seems pretty harsh.
Implementing our findings into Dissect
We now have all structure definitions that we need to recover all metadata and quarantined files from the quarantine folder. There is only one step left: writing an implementation.
During incident response, we do not want to rely on scripts scattered across home directories and git repositories. This is why we integrate our research into Dissect.
We can leave all the boring stuff of parsing disks, volumes and evidence containers to Dissect, and write our implementation as a plugin to the framework. Thus, the only thing we need to do is parse the artefacts and feed the results back into the framework.
The dive into Windows Defender of the previous sections resulted in a number of structure definitions that we need to recover data from the Windows Defender quarantine folder. When making an implementation, we want our code to reflect these structure definitions as closely as possible, to make our code both readable and verifiable. This is where dissect.cstruct comes in. It can parse structure definitions and make them available in your Python code. This removes a lot of boilerplate code for parsing structures and greatly enhances the readability of your parser. Let’s review how easily we can parse a QuarantineEntry file using dissect.cstruct:
As you can see, when the structure format is known, parsing it is trivial using dissect.cstruct. The only caveat is that the QuarantineEntryFileHeader, QuarantineEntrySection1 and QuarantineEntrySection2 structures are individually encrypted using the hardcoded RC4 key. Because only the size of QuarantineEntryFileHeader is static (60 bytes), we parse that first and use the information contained in it to decrypt the other sections.
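The order of operations described here (decrypt the fixed-size header, read the two sizes, then decrypt each remaining chunk on its own) can be sketched with the standard library alone. This is not the Dissect code: the offset of the size fields within the header is an assumption, the second size is assumed to cover the whole third chunk, and decryption is passed in as a callable so that the sketch runs without the Defender key. The names split_entry and xor_stub are hypothetical.

```python
import struct
from typing import Callable

HEADER_SIZE = 60   # static header size stated in the text
SIZES_OFFSET = 40  # assumed position of the two section sizes in the header


def split_entry(raw: bytes, decrypt: Callable[[bytes], bytes]) -> tuple:
    """Split a raw QuarantineEntry file into its three decrypted chunks."""
    # Each chunk is decrypted independently, so the cipher restarts per chunk.
    header = decrypt(raw[:HEADER_SIZE])
    size1, size2 = struct.unpack_from("<II", header, SIZES_OFFSET)
    start1 = HEADER_SIZE
    start2 = start1 + size1
    chunk2 = decrypt(raw[start1:start2])
    chunk3 = decrypt(raw[start2:start2 + size2])
    return header, chunk2, chunk3


def xor_stub(chunk: bytes) -> bytes:
    """Stand-in for RC4 so the example runs without the real key."""
    return bytes(b ^ 0x5A for b in chunk)


# Made-up file: a 60-byte header announcing chunks of 5 and 7 bytes.
header = bytes(40) + struct.pack("<II", 5, 7) + bytes(12)
raw = xor_stub(header) + xor_stub(b"AAAAA") + xor_stub(b"BBBBBBB")
parts = split_entry(raw, xor_stub)
```

In practice the decrypt argument would be an RC4 routine bound to the hardcoded key, and the returned chunks would then be handed to the structure parsers.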
To parse the individual fields contained within the QuarantineEntryResource, we have to do a bit more work. We cannot add the QuarantineEntryResourceField directly to the QuarantineEntryResource structure definition within dissect.cstruct, as it currently does not support the type of alignment used by Windows Defender. However, it does support the QuarantineEntryResourceField structure definition, so all we have to do is follow the alignment logic that we saw in IDA:
We can use dissect.cstruct’s dumpstruct function to visualize our parsing to verify if we are correctly loading in all data:
And just like that, our parsing is done. Utilizing dissect.cstruct makes parsing structures much easier to understand and implement. This also facilitates rapid iteration: we have altered our structure definitions dozens of times during our research, which would have been pure pain without having the ability to blindly copy-paste structure definitions into our Python editor of choice.
Implementing the parser within the Dissect framework brings great advantages. We do not have to worry at all about the format in which the forensic evidence is provided. Implementing the Defender recovery as a Dissect plugin means it just works on standard forensic evidence formats such as E01 or ASDF, or against forensic packages the likes of KAPE and Acquire, and even on a live virtual machine:
The full implementation of Windows Defender quarantine recovery can be found on GitHub.
Conclusion
We hope to have shown that there can be great benefits to reverse engineering the internals of Microsoft Windows to discover forensic artifacts. By reverse engineering mpengine.dll, we were able to further understand how Windows Defender places detected files into quarantine. We could then use this knowledge to discover (meta)data that was previously not fully documented or understood. The main result is the recovery of more information about the original quarantined file, such as various timestamps and additional NTFS data streams like the Zone.Identifier, all of which can be useful in digital forensics or incident response investigations.
The documentation of QuarantineEntryResourceField was not available prior to this research and we hope others can use this to further investigate which fields are yet to be discovered. We have also documented how the BackupRead functionality is used by Defender to preserve the different data streams present in the NTFS file, including the Zone Identifier and Security Descriptor.
When writing our parser, using dissect.cstruct allowed us to tightly integrate our findings of reverse engineering in our parsing, enhancing the readability and verifiability of the code. This can in turn help others to pivot off of our research, just like we did when pivoting off of the research of others into the Windows Defender quarantine folder.
This research has been implemented as a plugin for the Dissect framework. This means that our parser can operate independently of the type of evidence it is being run against. This functionality has been added to dissect.target as of January 2nd 2023 and is installed with Dissect as of version 3.4.