Digital Investigation 30 (2019) 127–147
Reverse engineering of ReFS
Rune Nordvik a, b, *, Henry Georges a, Fergus Toolan b, Stefan Axelsson a, c
a Norwegian University of Science and Technology, Norway
b Norwegian Police University College, Norway
c Halmstad University, Sweden
Article info
Article history:
Received 23 March 2019
Received in revised form 4 July 2019
Accepted 17 July 2019
Available online 23 July 2019
Keywords:
Digital forensics
ReFS
File system
Abstract
File system forensics is an important part of Digital Forensics. Investigators of storage media have
traditionally focused on the most commonly used file systems such as NTFS, FAT, ExFAT, Ext2-4, HFS+,
APFS, etc. NTFS is the current file system used by Windows for the system volume, but this may change in
the future. In this paper we will show the structure of the Resilient File System (ReFS), which has been
available since Windows Server 2012 and Windows 8. The main purpose of ReFS is to be used on storage
spaces in server systems, but it can also be used in Windows 8 or newer. Although ReFS is not the current
standard file system in Windows, users have the option to create ReFS file systems, and digital forensic
investigators need to investigate the file systems identified on seized media. Further, we will focus on
remnants of non-allocated metadata structures or attributes. This may allow metadata carving, which
means searching for specific attributes that are not allocated. Attributes found can then be used for file
recovery. ReFS uses superblocks and checkpoints in addition to a VBR, which is different from other
Windows file systems. If the partition is reformatted with another file system, the backup superblocks
can be used for partition recovery. Further, it is possible to search for checkpoints in order to recover both
metadata and content.
Another concept not seen in other Windows file systems is the sharing of blocks. When a file is copied, both
the original and the new file will share the same content blocks. If the user changes the copy, new data
runs will be created for the modified content, but unchanged blocks remain shared. This may impact file
carving, because part of the blocks previously used by a deleted file might still be in use by another file.
The large default cluster size, 64 KiB, in ReFS v1.2 is an advantage when carving for deleted files, since
most deleted files are less than 64 KiB and therefore only use a single cluster. For ReFS v3.2 this
advantage has decreased because the standard cluster size is 4 KiB.
Preliminary support for ReFS v1.2 has been available in EnCase 7 and 8, but the implementation has
not been documented or peer-reviewed. The same is true for Paragon Software, which recently added
ReFS support to their forensic product. Our work documents how ReFS v1.2 and ReFS v3.2 are structured
at an abstraction level that allows digital forensic investigation of this new file system. At the time of
writing this paper, Paragon Software is the only digital forensic tool that supports ReFS v3.x.
It is the most recent version of the ReFS file system that is most relevant for digital forensics, as
Windows automatically updates the file system to the latest version on mount. This is why we have
included information about ReFS v3.2. However, it is possible to change a registry value to avoid
updating. The latest ReFS version observed is 3.4, but the information presented about 3.2 is still valid. In
any criminal case, the investigator needs to investigate the file system version found.
© 2019 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND
license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
* Corresponding author. Norwegian University of Science and Technology,
Norway.
E-mail address: rune.nordvik@phs.no (R. Nordvik).
https://doi.org/10.1016/j.diin.2019.07.004
Introduction
Reverse engineering of closed source file systems is a prerequisite for digital forensic investigations (Marshall and Paige, 2018). Hence, when investigating a digital storage medium it is imperative to retrieve the pertinent files from different file systems. Most investigators use digital forensic tools to retrieve this information, and depend on these tools because of large backlogs (Scanlon, 2016). Digital forensic tools do not support all existing file systems, and there might be deviations between which file systems different tools support. Further, different tools might implement support for the same file system differently, and not all versions of the file system are necessarily supported. Because of limited budgets, digital forensic investigators only have access to a limited number of digital forensic tools (Garfinkel, 2010). Without tools to parse the underlying file system in a forensically sound manner, we are left with a large unknown volume. It will still be possible to carve files based on known file internal signatures (headers and/or footers), but this approach will not find the metadata structures of files, and often only partly retrieves the content of fragmented files (Garfinkel, 2007).
Showing both metadata and the corresponding files in the
directory structure increases the evidentiary value of the evidence,
and the lack of file system parsing support in current digital
forensic tools motivated us to reverse engineer the Resilient file
system (ReFS). At the start of this research EnCase v.8 had support
for ReFS v1.2, and we needed to understand the structures of this
file system in order to verify results from this tool.
We found that Paragon Software now supports full forensic ReFS
access (Paragon Software, 2019b), but their implementation is
closed source.
Objectives
The main research problem is that the low level structures of the
new ReFS file system are undocumented, and investigators are left
with just a few tools that support parsing. It is best practice to test
tools, but this is not feasible for tools that implement ReFS support
without describing the structures. Hence the objectives are:
How can digital forensic investigators verify the reliability of ReFS file system support in existing digital forensic tools?
How will B+ tree balancing impact the recoverability of files in ReFS?
Can digital forensic investigators be confident in the ReFS integrity protection mechanism of metadata?
How will ReFS impact the recovery of files compared to NTFS?
We aim to solve the first research question by describing the
structures necessary to parse the file system. Knowing the struc-
tures will give digital forensic investigators the ability to verify the
digital forensic tools that claim ReFS support. The investigator will
be able to verify tool results, if they understand the low level
structures, by manually parsing the file system on a low level of
abstraction. The investigator can do this by using an existing
methodology, for instance the Framework of Reliable Experimental
Design (FRED) (Horsman, 2018).
The second research question is about B+ tree balancing, which
often leaves remnants (Stahlberg et al., 2007), and we aim to verify
if B+ tree node remnants in ReFS contain artifacts relevant for file
recoverability. An introduction to B+ trees related to file systems
has been described by Carrier [5, p.290]. The aim is to identify
possibilities for recovering files based on unallocated metadata
structures.
The third research question is to test whether existing tools
can manipulate metadata in ReFS in a way that affects our
confidence in the integrity protection mechanism. Would it be
possible to manually manipulate a timestamp using a hex editor
without the file system detecting or fixing it? Is ReFS resilient
to this kind of manipulation?
The fourth research question is about the use of remnants found
in unallocated space for recovery of files. This is important to know
since it may allow the investigator to restore previous files. We
compare this with NTFS, where unallocated records in the $MFT can
be recovered as long as they are not overwritten [5, p.328].
In order to answer the research questions we will need to first
reverse engineer and interpret the structures used.
Features important for digital forensics
ReFS, Microsoft's newest file system, increases the availability of
data (Microsoft, 2018c). If integrity streams for data (file content)
are enabled, the integrity of the data is also increased. Unfortunately,
integrity streams for data are not enabled by default
(Microsoft, 2018b). However, integrity streams for metadata are
enabled. This is a very important feature, because it means
increased reliability of the metadata.
ReFS still uses the concept of attributes. The attributes found in
the NTFS $MFT are similar to the attributes found in ReFS, but not
identical (Head, 2015). However, the $MFT is not a part of ReFS
(Head, 2015). Instead, the attributes are now located in B+ tree
nodes.
If B+ tree reorganizing leaves remnants of confidential information,
then these remnants might further support recovery of
deleted data, which is highly relevant to digital forensics. The use of B+ trees
in databases is known to leave remnants of private data (Stahlberg et al.,
2007).
Currently, there are tools used for manipulation of metadata on
NTFS, for instance SetMace (Schicht, 2014), and we tested this tool
on ReFS, but it did not work since there was no $MFT in ReFS. This
will be similar for other tools that depend on manipulating the
$MFT. As long as no ReFS metadata manipulation tools exist, digital
forensic investigators may have increased trust in the validity of the
metadata.
In this paper we describe the main structures necessary to
manually interpret ReFS, and we have published a prototype tool
that is able to parse the structures of ReFS v1.2. We have published
the prototype tool under an open source license, and the tool is
available from: https://github.com/chef2505/refs. Publishing the
tool allows peers to review our interpretation of ReFS, and to test
the reproducibility of our research, in order to comply with the
Daubert criteria (US-Supreme-Court, 1993). Making the prototype
tool available as open source also allows other developers to
implement support for newer versions. Its purpose is to automate
the manual parsing of the file system in order to test our hypoth-
eses. Therefore, the prototype tool is not discussed further in this
paper.
Limitations
This reverse engineering of ReFS, a closed source file system,
was initially performed without the checked/debug version of
Windows 10, and therefore the names of the structures might not
correspond with the names given by the file system developers.
When finalizing this paper, we debugged the file system using the
partially checked (debug) build of Windows 10.0.17134 (64-bit),
including its symbols. We found that the names for structures
and structure field names were not included, only the function
names. Therefore, the real names of the structures are still
unknown.
The selection of file system instances does not cover all
possible use cases. Further, the results of this study are only valid
for the file system versions described.
We describe only ReFS v1.2 and ReFS v3.2 in this paper. A driver
is software running within the kernel, and like all software it may
be updated to newer versions. Microsoft may develop new features
that change the structures described in this paper. During the
finalizing of this paper we found that the latest ReFS version was
ReFS v3.4 (Unknown, 2019). We were able to manually parse the
ReFS v3.4, using the same structures as those defined for ReFS v3.2.
Organization of this paper
The remainder of this paper describes previous work and our
contributions in section 2. Then we describe the main methods we
have used to reverse engineer ReFS in section 3. Our results for ReFS
v1.2 are described in section 4, and the results for ReFS v3.2 are
described in section 5. In section 6 we discuss the results, and
finally in section 7 we summarize and describe further work.
Literature review
Little peer-reviewed research on ReFS has been published, and
therefore we will describe similar research performed on other
popular file systems in addition to previous work on ReFS. This is
relevant because there are similarities between file systems.
Jeff Hamm (Hamm, 2009) was the first to publish documentation
about the ExFAT file system structures, and this work helped
practitioners to understand and analyze this file system. The sys-
tem is similar to the old FAT systems, but each file has multiple
directory entries which describe metadata about files. In addition
there is a file allocation table which is used mainly for fragmented
files, and a bitmap file for describing which cluster (block) is allo-
cated. Vandermeer et al. (2018) continued research of ExFAT and
were able to separate deleted entries from renamed entries by
correlation of the FAT table, the bitmap file and the directory
entries.
Brian Carrier (2005) has documented the NTFS file system, a
work based on the reverse engineering performed by the Linux-
NTFS Project. In NTFS everything is a file, and one of the most
important system files with forensic value is the master file table
($MFT). All files have at least one entry, even the $MFT itself. NTFS is
the main file system on Windows, and is still used on Windows
system volumes including Windows 10.
Microsoft (2018a) has built the new Resilient File System (ReFS)
on the basis of NTFS. However, instead of using a master file table,
they now use a single B+ tree where metadata, bitmaps (allocators),
files and folders can be found. Unfortunately, the inner
structures are not documented.
Metz (2013) has partly described some of the structures within
the ReFS file system. We assume his description applies to ReFS v1.1 and 1.2;
however, the results are preliminary. He identified structures that he referred
to as object identifiers at levels 0, 1, 2 and 3. These object identifiers
are metadata structures that are 16 KiB in size. We have named
level 0 the superblock, level 1 the checkpoint, level 2
$Object_tree, and level 3 directories, which can contain files and
subdirectories.
Head (2015) has compared FAT32, NTFS and ReFS and he found
that the File System Recognition Structure was used in the ReFS
volume boot record. Head also described that there is no $MFT in
ReFS, and there are no instances of FILE0 or FILE to indicate any MFT
records. Further, Head found attribute style entries, and a $I30 in-
dex attribute in every folder. Head also describes that ReFS file
content is not saved resident within an attribute. Head also shows
that a previous name of a renamed folder was found in an earlier 16
KiB metadata block.
Gudadhe et al. (2015) describe some of the features that ReFS
has and describe that ReFS uses B+ trees. Further they describe that
a checksum is always used for preserving the integrity of metadata,
and that a checksum for preserving file content can be enabled per
file, directory, or volume. They do not describe structures or arti-
facts that might be important for digital forensic investigation.
Ballenthin has published information about ReFS in memory
structures (Ballenthin, 2018a) and ReFS on disk structures
(Ballenthin, 2018b). Our work started on the basis of this work.
Georges (2018) has published a master's thesis about the reverse
engineering of ReFS v1.2, and his interpretation is at a low level. We
scrutinize his work, and improve it. Georges describes that he was
unable to document the structures of ReFS v3.2, therefore, we
continue the reverse engineering of ReFS v3.2.
Paragon Software (Paragon Software, 2019a) was the first to
release a ReFS driver for Linux, and it supports ReFS v1.x and ReFS
v3.x. They have not released the source code for the driver, and
customers need to contact them in order to get information about
this driver. They have also included support for ReFS in their digital
forensic tool (Paragon Software, 2019b).
Brian Carrier (2005) has also documented Ext2 and Ext3, which
use superblocks and group descriptors for file system layout, and
inodes for file metadata.
Kevin Fairbanks (2012) has documented Ext4, which is similar to
Ext2 and Ext3, but Ext4 has additional features.
Hansen and Toolan (2017) reverse engineered the APFS file
system, which enabled investigators to analyse iOS and Mac de-
vices. In 2017, none of the commercial digital forensic tools had
support for APFS. APFS also uses inodes for describing metadata
about files. The APFS file system uses B+ trees extensively.
Plum and Dewald (2018) continued this work and proposed
novel methods for file recovery in APFS. They utilize known
structures in order to recover files.
In October 2018 Apple released the APFS specifications which
show the actual structures and their meaning (Apple, 2018a), which
also could be used for digital forensic purposes. Apple has also
published technical information about the HFS+ file system, which
also uses B+ trees (Apple, 2018b).
Stahlberg et al. (2007) describe the threat to privacy when using
database systems that utilize B+ trees. They propose a system that
overwrites obsolete data when the B+ tree is balanced and still in
memory. This is relevant for this paper because it gave us the
hypothesis that B+ tree balancing in ReFS will leave remnants of
confidential information.
Method
We used reverse engineering as a method for finding the
structures of ReFS. Initially we used the diskpart command or the
format command in Windows 10 x64 (ver 10.0.14393) to format
different partitions with ReFS file systems on one disk. We tried to
use different cluster sizes when formatting ReFS partitions, how-
ever this was not possible for ReFS v1.2. We also created ReFS
volumes where we added both small files and large files.
In order to enable formatting of ReFS v1.2 we added a registry
hack which allows formatting ReFS over non-mirrored volumes
(Winaero, 2018). In Windows 10 Pro we were able to format ReFS
v3.2 without any hack. Even when the registry hack was enabled,
we were not able to format USB thumb drives using ReFS. We have
observed that Microsoft has removed the option to format ReFS in
Windows 10 Pro x64 (ver 10.0.17134), and the previous registry hacks do
not work. We are still able to mount ReFS volumes with read and write support in
this version of Windows 10. We can still use one of the Windows
Server editions to format a ReFS volume, and such a volume can be attached
to Windows 10, which automatically updates the volume to the
latest ReFS version. We are not sure why Microsoft has removed the
support for formatting ReFS volumes in Windows 10 Pro.
We used FTK Imager or ewfacquire to create forensic images of
the disks, and mounted these forensic images using ewfmount in
Linux.
In order to find the volume boot records we parsed the MBR or
the GPT using a hex viewer/editor. Then we skipped to the sector
location where the volume boot record starts in order to interpret
the volume (Carrier, 2005). This start sector location is always
important when investigating a disk image with multiple ReFS
volumes, because the ReFS file system volume uses pointer offsets
that are relative to the start of the volume boot record.
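The role of the volume start sector can be illustrated with a minimal Python sketch. It assumes a raw (dd-style) image with a classic MBR and 512-byte sectors; GPT disks are not handled. The function names are hypothetical and are not part of any tool used in this study. The MBR layout relied upon is the standard one: a partition table of four 16-byte entries at byte 446, the start LBA at entry offset 8, and the signature 0x55AA at byte 510.

```python
import struct

SECTOR_SIZE = 512  # assumption: 512-byte sectors


def mbr_partitions(mbr: bytes):
    """Return (type, start_lba, sector_count) for each used MBR entry."""
    if len(mbr) < 512 or mbr[510:512] != b"\x55\xaa":
        raise ValueError("no 0x55AA boot signature; not a classic MBR")
    partitions = []
    for index in range(4):
        # Partition table: four 16-byte entries starting at byte 446.
        entry = mbr[446 + 16 * index:446 + 16 * (index + 1)]
        partition_type = entry[4]
        start_lba, sector_count = struct.unpack_from("<II", entry, 8)
        if partition_type != 0:  # type 0 marks an unused entry
            partitions.append((partition_type, start_lba, sector_count))
    return partitions


def disk_offset(start_lba: int, volume_offset: int,
                sector_size: int = SECTOR_SIZE) -> int:
    """Translate a byte offset relative to the VBR into a disk image offset."""
    return start_lba * sector_size + volume_offset
```

For a volume starting at sector 0x800, a field at byte 0x28 of the VBR would be read from disk image offset 0x800 * 512 + 0x28. Every pointer found inside the volume has to be translated in the same way before it is used against the full disk image.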
Comparing to other solutions
When we started this research only EnCase v7 or newer had
support for parsing ReFS v1.2, but no tool was able to parse ReFS
v3.2. We used information we observed when opening our forensic
images in EnCase v7, and have tried to use the same names on
system files as EnCase used.
ReFS volume boot record
Using knowledge about the NTFS volume boot record, we
assumed ReFS should contain information such as sector size,
cluster size, volume serial number, the location of the metadata
structure for files ($MFT), etc. Since the command line tool fsutil
can yield important properties for the file system, we used this tool
to verify our interpretation of the fields found in the VBR.
B+ tree
We knew from the sparse documentation from Microsoft that
they extensively used B+ trees in ReFS (Microsoft, 2018a). Microsoft
describes that one single B+ tree structure was used that could
include other B+ tree structures (Microsoft, 2018a). Therefore, we
started searching for blocks of data that could have a typical node
structure. Then we tried to identify the entry point that gives access
to the top node of the B+ tree. When analyzing the B+ tree
structures we used knowledge about the HFS+ file system (Apple,
2018b).
We knew from Stahlberg et al. (2007) that B+ trees in database
systems could be a threat to privacy, because balancing B+ trees
without wiping the content might leave remnants. For ReFS, remnants
of previous metadata records, for example records containing
attributes, could have an evidentiary value for digital forensic
investigation. Therefore, we tried to identify whether attributes remain intact
even when they are no longer referenced by pointers in the pointer area of
a node.
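The idea of a record that survives without a pointer can be sketched with a toy model in Python. This is an illustration only: the class, its fields and its record framing are hypothetical and do not reflect the ReFS on-disk layout. It merely shows why removing an entry from a pointer area, without wiping the record area, leaves recoverable bytes behind.

```python
class ToyNode:
    """Toy node: a record area plus a pointer area (NOT the ReFS layout)."""

    def __init__(self, size: int = 256):
        self.data = bytearray(size)   # raw record area, as seen in a hex dump
        self.free = 0                 # next unused byte in the record area
        self.pointers = []            # (offset, length) of allocated records

    def _record(self, pointer):
        offset, length = pointer
        return bytes(self.data[offset:offset + length])

    def insert(self, record: bytes) -> None:
        if self.free + len(record) > len(self.data):
            raise ValueError("node is full")
        self.data[self.free:self.free + len(record)] = record
        self.pointers.append((self.free, len(record)))
        self.free += len(record)
        # Logical order is given by the pointer area, not by physical position.
        self.pointers.sort(key=self._record)

    def delete(self, record: bytes) -> None:
        # Only the pointer is dropped; the record bytes are not wiped.
        self.pointers = [p for p in self.pointers if self._record(p) != record]

    def allocated(self):
        """Records reachable through the pointer area."""
        return [self._record(p) for p in self.pointers]

    def has_remnant(self, record: bytes) -> bool:
        """True if the bytes are still present but no pointer references them."""
        return record in self.data and record not in self.allocated()
```

In this model, deleting a record changes what a parser following the pointers would report, while a raw search of the record area still finds the old bytes. Metadata carving targets exactly this kind of situation.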
Keywords or signatures
When we found different keywords or signatures, we searched
for information about these to see if they could be connected to
known structures.
In memory structures
Often structures in memory are saved directly to disk. We used
previous work on ReFS memory structures in order to see if they are
also found on disk (Ballenthin, 2018a). We did not perform any
in-depth kernel debugging or memory analysis, but rather used
structures from previous work. The main reason for this was that
we were not able to get hold of a checked/debug build of Windows
10. When using a partially checked build of Windows 10, we did not get access
to private symbols for the refs.sys driver. We did, however, list
public function names that were available in the partially checked
Windows 10 version.
Known structures
Automation
Manually performing every test in a hex viewer is time
consuming, and therefore we created a prototype tool to parse the
structures found. This tool was used in our testing of ReFS v1.2. We
have made the tool available as open source for other researchers to
validate our work.
Experiments
When we reverse engineered ReFS v1.2, we used several ex-
periments comparing different states of sectors within the file
system, and we started by trying to understand the first sector of
the file system, the volume boot record. Describing all these ex-
periments is beyond the scope of this paper. Based on observations,
we defined research hypotheses that could explain what we
observed. Then we performed new experiments trying to falsify the
null hypothesis, in order to indirectly get support for our main
hypothesis about the meaning of a field.
For example, one unknown field was the two bytes starting at offset
0x28 in the VBR. We observed the value 0x0102. When the ReFS
file system was upgraded to version 3.2 we saw that the value
changed to 0x0302. Therefore, we defined the hypothesis H1 that this was the
field for the file system major and minor version. We tested this by
formatting new instances of the file system with the old and the
new version several times, and the field always corresponded
to the version of the file system. We also identified that
the file system was automatically updated after Microsoft released
a new version of the driver. When we used fsutil to verify the file
system version, it always corresponded to the value found in the
VBR. We had to reject our null hypothesis H0 that the changes in
these values were the result of chance alone. We did not once observe
that the null hypothesis H0 was true. Therefore, the alternative
hypothesis H1 was indirectly supported. Similar experiments were
performed for the other values in the VBR, and values found in
other structures.
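The version check described above can be repeated programmatically. The following Python sketch decodes the fields at the offsets listed in Table 1 from the first sector of a volume. It assumes that multi-byte integers are stored little-endian, which is an assumption on our part for this sketch; the function name and dictionary keys are hypothetical.

```python
import struct


def parse_refs_vbr(sector: bytes) -> dict:
    """Decode the VBR fields listed in Table 1 (little-endian assumed)."""
    if len(sector) < 0x40:
        raise ValueError("need at least the first 0x40 bytes of the VBR")
    # 0x18: sectors in volume (8), 0x20: bytes per sector (4),
    # 0x24: sectors per cluster (4)
    sectors_in_volume, bytes_per_sector, sectors_per_cluster = \
        struct.unpack_from("<QII", sector, 0x18)
    return {
        "fs_name": sector[0x03:0x0B].rstrip(b"\x00").decode("ascii", "replace"),
        "identifier": bytes(sector[0x10:0x14]),
        "sectors_in_volume": sectors_in_volume,
        "bytes_per_sector": bytes_per_sector,
        "sectors_per_cluster": sectors_per_cluster,
        "cluster_size": bytes_per_sector * sectors_per_cluster,
        # 0x28: major version (1 byte), 0x29: minor version (1 byte)
        "version": (sector[0x28], sector[0x29]),
        "volume_serial": struct.unpack_from("<Q", sector, 0x38)[0],
    }
```

Comparing the returned `version` and `cluster_size` with the output of fsutil for the same volume mirrors the manual verification described above.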
After the reverse engineering of the ReFS file system, we per-
formed a number of experiments in order to test if ReFS is resilient
to metadata manipulation. First we tested the tool SetMace
(Schicht, 2014) to see if it succeeds in changing the timestamp of a
file located on a ReFS volume. We also tested manually changing
the timestamp in an FNA file attribute.
We also checked if there are remnants of attributes not currently
in use by the system, and if it is possible to recover data based on
information found in these remnants.
Results - ReFS v1.2
We used our knowledge of known structures and compared
them to patterns discovered in hexdumps in order to give them
meaning. Whenever a structure was interpreted, multiple tests
were performed in order to try to falsify our interpretation (Popper,
1953). We also compared our observations with the ReFS structures
on disk that Ballenthin has published (Ballenthin, 2018b), which
also pointed us towards entry block 0x1E and
the standard block size of an entry block.
In this section we will describe the structures necessary to
manually parse the ReFS v1.2 file system. The forensic container file
refs-v1_2.E01, available at Mendeley (Nordvik, 2019), allows the
reader to follow our examples. These structures are the result of
performing reverse engineering on the file system and testing
forensic image containers containing ReFS file systems. We present
the results as a guided tour through the structures that must be
interpreted in order to find the metadata, files and their contents.
We will start by describing our results for ReFS v1.2. The first
thing to note is that ReFS v1.2 always has 64 KiB clusters. The
format tools in Windows 10 did not support formatting ReFS v1.2
with cluster sizes other than 64 KiB, even though the documentation
(Microsoft, 2018c) claims support for both 4 KiB (default) and
64 KiB cluster sizes. Our observations show that ReFS v3.2 corresponds
to the documentation in (Microsoft, 2018c).
ReFS volume boot record
For ReFS, the volume boot record is in the first sector of the file
system volume, as it is for FAT32 and NTFS (Carrier, 2005). For
FAT32 and NTFS the first 3 bytes are jump instructions to the boot
code [5, p. 254]. However, since ReFS v1.2 and ReFS v3.2 cannot be
booted, the first 3 bytes are just zeros. We also found a C structure
from Microsoft (Computing, 2018) defining the fields of the file
system recognition structure (FSRS), as can be seen in Fig. 1.¹
Therefore, we got a kick start in interpreting the ReFS VBR. The first
6 fields in Table 1 were found by using Fig. 1. When interpreting the
length of the types, we assume a Windows OS.
The names used for the FSRS structure are the names from the
developers of ReFS, while the names from byte offset 0x18 are given
based on our experiments.
¹ We added comments to make the structure easier to read for those not familiar
with C structures.
Fig. 1. File system recognition structure.
Table 1
Structure of the volume boot record.
Offset  Length  Description
0x00    3       Jmp (Jump instructions)
0x03    8       FSName
0x0B    5       MustBeZero
0x10    4       Identifier
0x14    2       Length (of FSRS)
0x16    2       Checksum (of FSRS)
0x18    8       Sectors in volume
0x20    4       Bytes per sector
0x24    4       Sectors per cluster
0x28    1       File system major version
0x29    1       File system minor version
0x2A    14      Unknown
0x38    8       Volume Serial Number
Entry block
During our reverse engineering we found blocks of 16 KiB and
blocks of 64 KiB. The blocks of 16 KiB were used for metadata and
system files. However, for data streams 64 KiB allocation blocks
were used. We named the 16 KiB blocks entry blocks.
The first entry blocks we identified were file system metadata
blocks with pointers to other entry blocks, and finally to the first
entry block containing a B+ tree. We will describe how we found
these entry blocks in section 4.4. Entry blocks in the B+ tree include
nodes that followed a typical B+ tree pattern. We found a structure
similar to that used by other file systems that use B+ trees. We
identified that all these 16 KiB blocks started with a descriptor for
the block. The entry block includes a descriptor (size 0x30) and can
include one or more nodes as shown in Fig. 2. When using the
structure tables presented in this paper, the E offset means a byte
offset relative to the entry block, while the R offset means a relative
offset to the actual structure. Whenever the E and R offsets are
equal, we will only show the E offset.
The first 8 byte field found at offset 0x0 of the entry block
descriptor contains its entry block number. We also found at offset
0x18 a field describing the node identifier (Node ID). However,
nodes that contain metadata will typically have the value 0 for the
node ID.
Table 2
Structure of the entry block descriptor.
E offset  Length  Description
0x00      0x08    Entry Block number
0x08      0x08    Unknown
0x10      0x08    Unknown
0x18      0x08    Node ID
0x20      0x08    Unknown
0x28      0x08    Unknown
We found just one or two nodes in an entry block, but it could be
more. Each node can have one or more records. A record contains
the sub entries of a node. A node describing a directory will contain
records of files and sub directories. There will be more than one
record per file. In NTFS each file has at least one MFT record, which
consists of a number of attributes, while in ReFS each of the attri-
butes are contained within records in directory nodes. A file's
standard information attribute is often more than 1000 bytes,
which means there are not many files required until the entry block
node is full. If an entry block contains a node that runs out of space
for new records, then the B+ tree system will utilize a new entry
block which consists of a node that has extent records. Extents
make it possible for a node to extend its capacity by including re-
cords to other entry blocks, and adding more nodes, which for
instance allows for more files within a directory node. In this case
record 1 will contain an extent pointer to the entry blocks con-
taining the existing node that is running out of space, and record 2
will have another extent pointer that points to a new entry block
where the new records can be stored in a new node. We have not
tested if the records could be reorganized between the two nodes.
The order of records within a node is decided by the order of
pointers in the pointer area. This means that records can appear in
any order in the hex dump. We detected the extents records when
we were experimenting with different numbers of files in the same
directory.
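The entry block descriptor described in this section (Table 2) can be decoded with a few lines of Python. The sketch below assumes little-endian 8-byte fields, which is an assumption for illustration; the function names are hypothetical. Only the entry block number (offset 0x00) and the node ID (offset 0x18) are interpreted, and the remaining fields are returned as unknown values.

```python
import struct

ENTRY_BLOCK_SIZE = 16 * 1024   # 16 KiB entry blocks, as observed for ReFS v1.2
DESCRIPTOR_SIZE = 0x30         # six 8-byte fields (Table 2)


def parse_entry_block_descriptor(block: bytes) -> dict:
    """Decode the 0x30-byte descriptor at the start of an entry block."""
    if len(block) < DESCRIPTOR_SIZE:
        raise ValueError("entry block is shorter than its descriptor")
    fields = struct.unpack_from("<6Q", block, 0)
    return {
        "entry_block_number": fields[0],   # E offset 0x00
        "node_id": fields[3],              # E offset 0x18
        "unknown": (fields[1], fields[2], fields[4], fields[5]),
    }


def descriptor_matches(block: bytes, expected_number: int) -> bool:
    """Sanity check: does the block claim the entry block number we read it from?"""
    descriptor = parse_entry_block_descriptor(block)
    return descriptor["entry_block_number"] == expected_number
```

Because the descriptor stores its own entry block number, a check such as `descriptor_matches` may help to separate genuine entry blocks from unrelated 16 KiB regions when searching unallocated space. This use is a suggestion and was not part of the experiments reported here.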
Superblock
At first we had only identified this superblock as an entry block
that points to $Tree_Control. When we reverse engineered ReFS
v3.2, we found the string SUPB in the entry block descriptor, which
we believe is an abbreviation for superblock. From other file systems,
such as ext4, we know the superblock is similar to the volume
boot record. We experimented by trying to find pointers to
the structures within the volume boot record; however, we were
looking in the wrong place. Microsoft has included these pointers in
the superblock entries. We find it strange that they did not
include all the information found in the VBR within the
superblock.
Fig. 2. Standard structures used by ReFS.
Finding the Bþ tree check point
In order to find the Bþ tree check point, the root of the Bþ tree,
we first need to find the superblock, then we follow the pointer to
the check point. In this section we will try to manually find the
starting node for the ReFS Bþ tree. EnCase had a system file named
$Tree_Control, which we believe is a check point. In ReFS v3.2
Microsoft includes the string CHKP in the entry block descriptor for
$Tree_Control. Using the sector offset to this structure, we found
the location where this structure started. However, Guidance
Software/Open Text does not explain how to find the structure. In
order to find out how, we searched for the values in the 0x1E entry
block, the superblock, that could point to the check point. Assuming
every entry block was 16 KiB, we could find the entry block (EB)
offset number of the checkpoint $Tree_Control by subtracting the VBR
sector from the sector offset (the EB offset is relative to the start
of the VBR), multiplying the result by 512 (sector size), and dividing
it by 16 KiB (entry block size). The equation used to find the EB
offset is shown in Equation (1).
\[
\text{EB offset} = \frac{(\text{EB sector} - \text{VBR sector}) \times \text{Sector size}}{\text{EB size}} \tag{1}
\]
Converting the value to hexadecimal made it easy to find the
same value in the superblock. By searching for this EB offset hex
value (using Big Endian), we always found the superblock in entry
block 0x1E. However, it was not just one superblock.
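The conversion in Equation (1) can be sketched in a few lines of Python. This is only an illustration under the assumptions stated in the text (512-byte sectors, 16 KiB entry blocks); the function name and the sample sector value are hypothetical, and the VBR sector is subtracted before the unit conversion so that the result is expressed in entry blocks.

```python
SECTOR_SIZE = 512        # bytes per sector, as stated in the text
EB_SIZE = 16 * 1024      # 16 KiB entry block size, as assumed in the text


def eb_number_from_sector(eb_sector: int, vbr_sector: int) -> int:
    """Convert an absolute sector offset into an entry block number
    relative to the start of the VBR (hypothetical helper)."""
    byte_offset = (eb_sector - vbr_sector) * SECTOR_SIZE
    if byte_offset < 0 or byte_offset % EB_SIZE:
        raise ValueError("sector is not on an entry block boundary in this volume")
    return byte_offset // EB_SIZE


# Hypothetical sector value: 0xBC0 lies 0x1E entry blocks after a VBR at 0x800.
print(hex(eb_number_from_sector(0xBC0, 0x800)))  # prints 0x1e
```

The resulting number is the value to look for inside the superblock records.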
We found three superblocks that had records with pointers to
$Tree_Control. One of these three superblocks was always found in
entry block 0x1E, and another one was found in the third last entry
block in the volume. In addition we found an extra superblock
backup in the entry block after the second one. When using ewfmount
(in order to mount E01 files) and tools like dd and xxd (a
hex viewer), we could show the content of the 0x1E superblock.
The command used is shown in Listing 1. The start of the VBR
volume is at sector 0x800 on the test image, and 0x1E is the entry
block we would like to show. In Listing 1 we have used a block size
of 512 bytes, and a count of 1 block, which will only show 512 bytes
of output. However, in the figures showing hex dumps, we have not
included all 512 bytes to make it more understandable. In Equation
(2) we show how we compute the sector offset to the sector start of
the entry block, which we use in the skip option of the dd
command.
\[
\text{EB sector} = \text{VBR sector} + \frac{\text{EB number} \times \text{EB size}}{\text{Sector size}} \tag{2}
\]
Listing 1: Command to show the superblock at 0x1E.
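A command of the kind described above can be assembled as in the following sketch, which applies Equation (2) to compute the value for the skip option of dd. The image path `/mnt/ewf/ewf1` and both function names are hypothetical, and the exact command used on the test image may have differed; only the dd options mentioned in the text (block size, skip, count) and a pipe to xxd are used.

```python
import shlex

SECTOR_SIZE = 512        # bytes per sector, as stated in the text
EB_SIZE = 16 * 1024      # 16 KiB entry block size, as assumed in the text


def eb_sector(eb_number: int, vbr_sector: int) -> int:
    """Equation (2): absolute sector at which an entry block starts."""
    return vbr_sector + eb_number * EB_SIZE // SECTOR_SIZE


def dd_command(image_path: str, eb_number: int, vbr_sector: int) -> str:
    """Build a dd | xxd pipeline showing the first 512 bytes of an entry block."""
    skip = eb_sector(eb_number, vbr_sector)
    return (f"dd if={shlex.quote(image_path)} bs={SECTOR_SIZE} "
            f"skip={skip} count=1 | xxd")


# Hypothetical mount point; VBR at sector 0x800, superblock at entry block 0x1E.
print(dd_command("/mnt/ewf/ewf1", 0x1E, 0x800))
# prints: dd if=/mnt/ewf/ewf1 bs=512 skip=3008 count=1 | xxd
```

Replacing 0x1E with another entry block number gives the command for any other entry block on the same volume.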
Both the 0x1E superblock and $Tree_Control check point have a
special structure. Table 3 shows the structure of these in order to
parse the superblock (entry point), and an extract from the hex
dump is shown in Fig. 3.
In the superblock 0x1E at offset 0x50 (4 bytes in length) the
value 0xA0 was found, which is the byte offset to where we find the
first entry block pointer. At the entry block byte offset 0xA0 (8 bytes
in length) we find the entry block pointer to $Tree_Control. There is
another pointer at offset 0xA8 (8 bytes in length), and we assume
this is a pointer to the backup $Tree_Control. The highlighted value
found in the hex dump shown in Fig. 3 is 0x1471 (LE) for the first
pointer and 0xF3F7 (LE) for the second pointer.
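The steps above can be sketched with Python's struct module. The field positions follow Table 3; the function name is hypothetical, and the buffer is synthetic, filled only with the values quoted in the text rather than taken from a real image. The sketch also assumes that the pointers are stored back to back as 8-byte little-endian values and that the pointer count in this example is 2.

```python
import struct


def superblock_pointers(entry_block: bytes) -> list[int]:
    """Return the entry block pointers stored in a superblock (hypothetical helper)."""
    # Table 3: 4-byte offset to the first pointer at 0x50, 4-byte pointer count at 0x54.
    first_ptr_offset, ptr_count = struct.unpack_from("<II", entry_block, 0x50)
    # Assumption: pointers are consecutive 8-byte little-endian entry block numbers.
    return list(struct.unpack_from(f"<{ptr_count}Q", entry_block, first_ptr_offset))


# Synthetic 512-byte buffer mimicking the values reported for the test image.
buf = bytearray(512)
struct.pack_into("<II", buf, 0x50, 0xA0, 2)          # offset 0xA0, two pointers (assumed)
struct.pack_into("<QQ", buf, 0xA0, 0x1471, 0xF3F7)   # $Tree_Control and assumed backup
print([hex(p) for p in superblock_pointers(bytes(buf))])  # prints ['0x1471', '0xf3f7']
```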
To show the content of entry block 0x1471, we use the same
command as shown in Listing 1, but we replace 0x1E with the value
0x1471. This takes us to $Tree_Control, which is the
checkpoint for our B+ tree structure.
In this section we have shown how to navigate to the entry block
that controls the B+ tree. This entry block is a check point for the B+
tree. It is named $Tree_Control because it can be
used to navigate the B+ tree.
Top of the node tree
The entry block containing the top node of the B+ tree is called
$Tree_Control by EnCase. From this entry block we can find
Table 3
Structure of the superblock.
| E offset | R offset | Length | Description |
|---|---|---|---|
| 0x30 | 0x00 | 0x10 | GUID |
| 0x40 | 0x10 | 0x10 | Unknown |
| 0x50 | 0x20 | 0x04 | Offset to first entry block pointer |
| 0x54 | 0x24 | 0x04 | Number of entry block pointers |
| 0x58 | 0x28 | 0x04 | Offset to first record |
| 0x5C | 0x2C | 0x04 | Length of each record |
Table 4
Structure of the tree control checkpoint entry block.
| E offset | R offset | Length | Description |
|---|---|---|---|
| 0x30 | 0x00 | 0x04 | Unknown |
| 0x34 | 0x04 | 0x02 | Major version |
| 0x36 | 0x06 | 0x02 | Minor version |
| 0x38 | 0x08 | 0x04 | Offset to first record |
| 0x3C | 0x0C | 0x04 | Size of a record |
| 0x40 | 0x10 | 0x10 | Unknown |
| 0x50 | 0x20 | 0x08 | Unknown |
| 0x58 | 0x28 | 0x04 | Number of additional records |
| 0x5C | 0x2C | Var | 4-byte offsets to each record |
• A pointer to the $Allocator_Lrg system file can be found in
the entry block at offset 0xB0, and in this example it was
pointing to entry block 0x38 (LE). This system file includes a
bitmap of large blocks (512 MiB).
• A pointer to the $Allocator_Med system file can be found in
the entry block at offset 0xC8, and in this example it was
pointing to entry block 0x20 (LE). This system file includes a
bitmap of medium sized blocks (64 KiB).
• A pointer to the $Allocator_Sml system file can be found in
the entry block at offset 0xE0, and in this example it was
pointing to entry block 0x21 (LE). This system file includes a
bitmap of small sized blocks (16 KiB).
• A pointer to the $Attribute_List system file can be found in
the entry block at offset 0xF8, and in this example it was
pointing to entry block 0x244 (LE). We are uncertain of the
meaning of this system file.
• A pointer to the $Object system file can be found in the entry
block at offset 0x110, and in this example it was pointing to
entry block 0x240 (LE). This system file describes the child-
parent dependencies, and can be used to rebuild the directory
paths for files.
Until now we have used special entry blocks (superblock and
check point), but in order to parse the normal nodes we introduce
the structure for the standard node descriptor in Table 5 and the
standard node header in Table 6. We will use them to interpret the
normal nodes which we find in the sub nodes pointed to by
$Tree_Control. We can think of $Tree_Control as the top block that
maintains control of all the sub B+ trees, or as the root of the single
B+ tree.
The standard node descriptor and standard node header can be
used from the level we have described as MSB+ and below in the
illustration in Fig. 5. MSB+ is a magic signature in ReFS v3.2,
and such an entry block includes one or more nodes. In ReFS v1.2 the
magic signature is not available, but we decided to refer to entry
blocks containing nodes as MSB+ anyway, to be consistent with ReFS v3.2.
We assume the abbreviation stands for Microsoft B+ tree.
In this section we have shown how to find the starting check
point, named $Tree_Control. This checkpoint makes it possible to
find all other nodes in the file system. In the illustration in Fig. 5 we
have visualized the main structures of ReFS. The superblock was
Table 5
Structure of the standard node descriptor.
| E offset | R offset | Length | Description |
|---|---|---|---|
| 0x30 | 0x00 | 0x04 | Length of node descriptor |
| 0x34 | 0x04 | 0x14 | Unknown |
| 0x48 | 0x18 | 0x02 | Number of extents |
| 0x4A | 0x1A | 0x06 | Unknown |
| 0x50 | 0x20 | 0x04 | Number of records in node |
| 0x54 | 0x24 | Var | Unknown |
Fig. 3. Hex dump of the superblock 0x1E.
pointers to six different nodes that have a special purpose, and are
shown by EnCase as system files. In section 4.4 we show how to find
the $Tree_Control checkpoint manually by using a hex viewer. Now
we continue to interpret the top level checkpoint node
($Tree_Control).
We skip the first 0x30 bytes in Fig. 4, which is for the entry block
descriptor, and focus on the checkpoint descriptor. This entry block
does not contain a typical B+ tree node, because the pointer area at
the end is missing.
Using Table 4, we saw in the hex dump in Fig. 4 that the
offset to the first record was 0x80, and this first record was a record
entry for this $Tree_Control check point. Further, we saw from the
major and minor version that this was ReFS v1.2. At offset 0x80 we
found a 0x18 byte record. The first 8 bytes of this record had the
value 0x1471 (LE). This was the same as the entry block number of
the $Tree_Control check point, as could also be seen in the first 8
bytes of the entry block. This means that each of the records in this
checkpoint started with an entry block number to where the record
points.
According to byte offset 0x58 the number of records was 0x06.
At byte offset 0x5C we found a table of offsets, where each offset
was 4 bytes and included byte offsets to each of the additional
records in this entry block. In the example hex dump in Fig. 4 we
found the following offset values: 0x98, 0xB0, 0xC8, 0xE0, 0xF8,
0x110. The offsets were relative to the start of the entry block. The
first record started at offset 0x98, the next at 0xB0, etc. The list
below shows what we will find in these records. Each of the records
starts with an entry block pointer.
• A pointer to the $Object_Tree system file can be found in the
entry block at offset 0x98, and in this example it was pointing to
entry block 0x23F (LE). This system file includes nodes and records
that have information about directories and files in sub
entry blocks.
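The checkpoint layout in Table 4 can be walked as in the sketch below. It is a minimal illustration, not a complete parser: the names are hypothetical, the buffer is synthetic and filled with the offsets and pointer values quoted for the test image, and the sketch assumes that every record begins with an 8-byte little-endian entry block number. Mapping the records to system file names by position reflects this one example only.

```python
import struct
from typing import NamedTuple


class Checkpoint(NamedTuple):
    major: int
    minor: int
    self_pointer: int
    record_pointers: list


def parse_checkpoint(eb: bytes) -> Checkpoint:
    """Read the version, the self-referencing record and the additional records."""
    major, minor = struct.unpack_from("<HH", eb, 0x34)         # version fields
    (first_record_offset,) = struct.unpack_from("<I", eb, 0x38)
    (self_pointer,) = struct.unpack_from("<Q", eb, first_record_offset)
    (count,) = struct.unpack_from("<I", eb, 0x58)              # additional records
    offsets = struct.unpack_from(f"<{count}I", eb, 0x5C)       # 4-byte offset table
    pointers = [struct.unpack_from("<Q", eb, off)[0] for off in offsets]
    return Checkpoint(major, minor, self_pointer, pointers)


# Synthetic buffer built from the values reported in the text.
offsets = [0x98, 0xB0, 0xC8, 0xE0, 0xF8, 0x110]
targets = [0x23F, 0x38, 0x20, 0x21, 0x244, 0x240]
buf = bytearray(512)
struct.pack_into("<HH", buf, 0x34, 1, 2)                       # ReFS v1.2
struct.pack_into("<II", buf, 0x38, 0x80, 0x18)                 # first record, record size
struct.pack_into("<Q", buf, 0x80, 0x1471)                      # record for the checkpoint itself
struct.pack_into("<I", buf, 0x58, len(offsets))
struct.pack_into("<6I", buf, 0x5C, *offsets)
for off, target in zip(offsets, targets):
    struct.pack_into("<Q", buf, off, target)

cp = parse_checkpoint(bytes(buf))
names = ["$Object_Tree", "$Allocator_Lrg", "$Allocator_Med",
         "$Allocator_Sml", "$Attribute_List", "$Object"]
for name, ptr in zip(names, cp.record_pointers):
    print(f"{name}: entry block {ptr:#x}")
```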
Fig. 4. Hex dump of the first part of check point $Tree_Control.
Table 6
Structure of the standard node header.
| E offset | R offset | Length | Description |
|---|---|---|---|
| 0x120 | 0x00 | 0x04 | Length of node header |
| 0x124 | 0x04 | 0x04 | Offset to next free record |
| 0x128 | 0x08 | 0x04 | Free space in node |
| 0x12C | 0x0C | 0x04 | Unknown |
| 0x130 | 0x10 | 0x04 | Offset to first pointer |
| 0x134 | 0x14 | 0x04 | Number of pointers in node |
| 0x138 | 0x18 | 0x08 | Offset to end of node |
found in entry block 0x1E, and we have found the checkpoint
$Tree_Control. There was also another $Tree_Control which we
assume is a backup. We have already found the pointers to the child
nodes of $Tree_Control.
Files and folders
In this section we will show how the B+ tree structures can be
parsed to find directories and files. We start by looking at the entry
block for $Object_Tree which, in this example, was located at 0x23F.
We used the command described in Listing 1, but replaced 0x1E
with the entry block we wished to investigate. The hex dump of
$Object_Tree is shown in Fig. 6. We skipped the entry block
descriptor, which was 0x30 bytes. Here we found the node
descriptor that started with a length field, which had the value
0xF0. We skipped the node descriptor to make this section easier to
read.
From byte offset 0x120 we found the node header, which
included information about the length of the node header (0x20
bytes), the offset to the next free record, the free space in the node,
and the offset to the first record pointer (0x354C (LE)). This is a
relative offset from the start of the node header. To find the entry
block byte offset we added 0x120. Therefore, the pointer area started
at offset 0x366C.
We also found the number of pointers (this should be equal to the
number of allocated records), which was 9. These pointers are 4
bytes in size and point to records in this node, and each offset is
relative to the node header (we needed to add 0x120 in this case).
In the node header we also found an offset to the end of the node.
We skipped to the first record by moving to offset 0x20 + 0x120 =
0x140. This record is followed by the fourth record at 0x70 + 0x120 =
0x190.
We could have used the pointers from the pointer area to find all
allocated records, which is what should be done for any node.
However, we only show two of the records here. It is the pointers
that decide the order of records; we cannot assume that all records
are in sequence or that all records are in use.
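Resolving the pointer area into entry block offsets can be sketched as follows, using the field positions from Tables 5 and 6. The function name is hypothetical and the buffer is synthetic: only the two relative offsets quoted in the text (0x20 and 0x70) are filled in, so the pointer count is set to 2 instead of the 9 found in the real node. Treating each 4-byte pointer as a plain little-endian offset is an assumption.

```python
import struct

EB_DESCRIPTOR_LEN = 0x30   # entry block descriptor that is skipped in the text


def record_offsets(eb: bytes) -> list[int]:
    """Return entry block byte offsets of the records, in pointer order."""
    # Table 5: the node descriptor starts with its own 4-byte length.
    (descriptor_len,) = struct.unpack_from("<I", eb, EB_DESCRIPTOR_LEN)
    header = EB_DESCRIPTOR_LEN + descriptor_len     # 0x30 + 0xF0 = 0x120 in the example
    # Table 6: offset to first pointer at +0x10, number of pointers at +0x14.
    first_ptr_rel, ptr_count = struct.unpack_from("<II", eb, header + 0x10)
    pointer_area = header + first_ptr_rel           # 0x120 + 0x354C = 0x366C
    relative = struct.unpack_from(f"<{ptr_count}I", eb, pointer_area)
    # Pointers are relative to the node header, so add its offset back.
    return [header + r for r in relative]


# Synthetic 16 KiB entry block with the values quoted in the text.
buf = bytearray(16 * 1024)
struct.pack_into("<I", buf, 0x30, 0xF0)             # node descriptor length
struct.pack_into("<II", buf, 0x130, 0x354C, 2)      # pointer area offset, count (reduced)
struct.pack_into("<II", buf, 0x366C, 0x20, 0x70)    # two of the record pointers
print([hex(o) for o in record_offsets(bytes(buf))])  # prints ['0x140', '0x190']
```

Iterating over the returned list, not over the record area itself, follows the rule that the pointers decide which records are allocated and in which order.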
At offset 0x140 in Fig. 6 we found the start of the records area.
Each record was 0x50 bytes. All the records were related to di-
rectories, either directories the user created or system created di-
rectories. These records could be analyzed using Table 7. The first
record was a pointer to $Volume (node id 0x500), which contained
information about the volume and also included a timestamp for
volume creation, which could be of value to an investigation.
The fourth record contained a pointer for the root directory (node
id 0x600). At this level in the B+ tree we did not see the names of
the nodes, but most of the names could be found within the entry
blocks pointed to by the records at this level. We observed that
records with node IDs from 0x700 and above were normal di-
rectories either created by the user or system created directories.
We have illustrated this in Fig. 7. All the records with a node id
within the 0x500 range were metadata directories, and not shown
by File Explorer when parsing the root directory. All node IDs of
root sub-directories were in the 0x700 range.
Volume information
We used Table 7 to interpret the records in the entry block
$Object_Tree. The record for the 0x500 node id was found in entry
Fig. 5. Illustration of the ReFS structure.