

Digital Investigation 30 (2019) 127–147
Contents lists available at ScienceDirect
Digital Investigation
journal homepage: www.elsevier.com/locate/diin

Reverse engineering of ReFS

Rune Nordvik (a, b, *), Henry Georges (a), Fergus Toolan (b), Stefan Axelsson (a, c)
(a) Norwegian University of Science and Technology, Norway
(b) Norwegian Police University College, Norway
(c) Halmstad University, Sweden

Article info
Article history: Received 23 March 2019; received in revised form 4 July 2019; accepted 17 July 2019; available online 23 July 2019
Keywords: Digital forensics, ReFS, File system

Abstract
File system forensics is an important part of digital forensics. Investigators of storage media have traditionally focused on the most commonly used file systems, such as NTFS, FAT, ExFAT, Ext2-4, HFS+ and APFS. NTFS is the current file system used by Windows for the system volume, but this may change in the future. In this paper we show the structure of the Resilient File System (ReFS), which has been available since Windows Server 2012 and Windows 8. ReFS is mainly intended for storage spaces in server systems, but it can also be used in Windows 8 or newer. Although ReFS is not the current standard file system in Windows, users have the option to create ReFS file systems, and digital forensic investigators need to investigate whatever file systems are identified on seized media. Further, we focus on remnants of non-allocated metadata structures or attributes. These may allow metadata carving, which means searching for specific attributes that are not allocated. Attributes found can then be used for file recovery. Unlike other Windows file systems, ReFS uses superblocks and checkpoints in addition to a VBR. If the partition is reformatted with another file system, the backup superblocks can be used for partition recovery.
Further, it is possible to search for checkpoints in order to recover both metadata and content. Another concept not previously seen in Windows file systems is block sharing. When a file is copied, both the original and the new file share the same content blocks. If the user changes the copy, new data runs are created for the modified content, but unchanged blocks remain shared. This may impact file carving, because some of the blocks previously used by a deleted file might still be in use by another file. The large default cluster size of ReFS v1.2, 64 KiB, is an advantage when carving for deleted files, since most deleted files are smaller than 64 KiB and therefore use only a single cluster. For ReFS v3.2 this advantage has decreased, because the standard cluster size is 4 KiB. Preliminary support for ReFS v1.2 has been available in EnCase 7 and 8, but the implementation has not been documented or peer-reviewed. The same is true for Paragon Software, which recently added ReFS support to their forensic product. Our work documents how ReFS v1.2 and ReFS v3.2 are structured, at an abstraction level that allows digital forensic investigation of this new file system. At the time of writing, Paragon Software's product is the only digital forensic tool that supports ReFS v3.x. The most recent version of ReFS is the most relevant for digital forensics, as Windows automatically updates the file system to the latest version on mount. This is why we have included information about ReFS v3.2. However, it is possible to change a registry value to avoid updating. The latest ReFS version observed is 3.4, but the information presented about 3.2 is still valid. In any criminal case, the investigator needs to investigate the file system version found. © 2019 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/). * Corresponding author.
Norwegian University of Science and Technology, Norway. E-mail address: rune.nordvik@phs.no (R. Nordvik).
https://doi.org/10.1016/j.diin.2019.07.004
1742-2876/© 2019 The Authors. Published by Elsevier Ltd.

Introduction
Reverse engineering of closed source file systems is a prerequisite for digital forensic investigations (Marshall and Paige, 2018). When investigating a digital storage medium, it is imperative to retrieve the pertinent files from different file systems. Most investigators use digital forensic tools to retrieve this information,
and depend on these tools because of large backlogs (Scanlon, 2016). Digital forensic tools do not support all existing file systems, and the set of supported file systems may differ between tools. Further, different tools might implement support for the same file system differently, and not all versions of a file system are necessarily supported. Because of limited budgets, digital forensic investigators only have access to a limited number of digital forensic tools (Garfinkel, 2010). Without tools that parse the underlying file system in a forensically sound manner, we are left with a large unknown volume. It is still possible to carve files based on known internal file signatures (headers and/or footers), but this approach will not find the metadata structures of files, and often only partly retrieves the content of fragmented files (Garfinkel, 2007). Showing both metadata and the corresponding files in the directory structure increases evidentiary value, and the lack of file system parsing support in current digital forensic tools motivated us to reverse engineer the Resilient File System (ReFS). At the start of this research, EnCase v8 supported ReFS v1.2, and we needed to understand the structures of this file system in order to verify the results from this tool. We found that Paragon Software now supports full forensic ReFS access (Paragon Software, 2019b), but their implementation is closed source.

Objectives
The main research problem is that the low-level structures of the new ReFS file system are undocumented, and investigators are left with just a few tools that support parsing it. It is best practice to test tools, but this is not feasible for tools that implement ReFS support without describing the structures.
Hence, the objectives are:
• How can digital forensic investigators verify the reliability of ReFS file system support in existing digital forensic tools?
• How will B+ tree balancing impact the recoverability of files in ReFS?
• Can digital forensic investigators be confident in the ReFS integrity protection mechanism for metadata?
• How will ReFS impact the recovery of files compared to NTFS?

We aim to answer the first research question by describing the structures necessary to parse the file system. Knowing the structures will give digital forensic investigators the ability to verify the digital forensic tools that claim ReFS support: an investigator who understands the low-level structures can verify tool results by manually parsing the file system at a low level of abstraction, for instance using an existing methodology such as the Framework of Reliable Experimental Design (FRED) (Horsman, 2018). The second research question is about B+ tree balancing, which often leaves remnants (Stahlberg et al., 2007); we aim to verify whether B+ tree node remnants in ReFS contain artifacts relevant for file recoverability. An introduction to B+ trees in the context of file systems is given by Carrier (2005, p. 290). The aim is to identify possibilities for recovering files based on unallocated metadata structures. The third research question is to test whether existing tools can manipulate metadata in ReFS in a way that would undermine our confidence in the integrity protection mechanism. Would it be possible to manually manipulate a timestamp using a hex editor without the file system detecting or fixing it? Is ReFS resilient to this kind of manipulation? The fourth research question is about using remnants found in unallocated space to recover files. This is important to know, since it may allow the investigator to restore previous files.
We compare this with NTFS, where unallocated records in the $MFT can be recovered as long as they are not overwritten (Carrier, 2005, p. 328). In order to answer the research questions, we first need to reverse engineer and interpret the structures used.

Features important for digital forensics
ReFS, Microsoft's newest file system, increases the availability of data (Microsoft, 2018c). If integrity streams for data (file content) are enabled, the integrity of the data is also increased. Unfortunately, integrity streams for data are not enabled by default (Microsoft, 2018b). However, integrity streams for metadata are enabled. This is a very important feature, because it increases the reliability of the metadata. ReFS still uses the concept of attributes. The attributes found in the NTFS $MFT are similar, but not identical, to the attributes found in ReFS (Head, 2015). However, the $MFT is not part of ReFS (Head, 2015); instead, the attributes are located in B+ tree nodes. If B+ tree reorganization leaves remnants of confidential information, these remnants might support recovery of deleted data, which is highly relevant to digital forensics. The use of B+ trees in databases is known to leave remnants of privacy-sensitive data (Stahlberg et al., 2007). There are existing tools for manipulating metadata on NTFS, for instance SetMace (Schicht, 2014). We tested this tool on ReFS, but it did not work, since there is no $MFT in ReFS. The same will apply to other tools that depend on manipulating the $MFT. As long as no ReFS metadata manipulation tools exist, digital forensic investigators may have increased trust in the validity of the metadata. In this paper we describe the main structures necessary to manually interpret ReFS, and we have published, under an open source license, a prototype tool that is able to parse the structures of ReFS v1.2. The tool is available from https://github.com/chef2505/refs.
Publishing the tool allows peers to review our interpretation of ReFS and to test the reproducibility of our research, in order to comply with the Daubert criteria (US-Supreme-Court, 1993). Making the prototype tool available as open source also allows other developers to implement support for newer versions. Its purpose is to automate the manual parsing of the file system in order to test our hypotheses; therefore, the prototype tool is not discussed further in this paper.

Limitations
This reverse engineering of ReFS, a closed source file system, was initially performed without the checked/debug version of Windows 10, and therefore the names of the structures might not correspond to the names given by the file system developers. When finalizing this paper, we debugged the file system using the partly debug/checked version of Windows 10.0.17134 x64, including its symbols. We found that the names of structures and structure fields were not included, only the function names. Therefore, the real names of the structures are still unknown. The selection of file system instances does not cover all possible use cases. Further, the results of this study are only valid for the file system versions described; we describe only ReFS v1.2 and ReFS v3.2 in this paper. A driver is software running within the kernel, and like all software it may be updated to newer versions. Microsoft may develop new features that change the structures described in this paper. During the
finalizing of this paper we found that the latest ReFS version was ReFS v3.4 (Unknown, 2019). We were able to manually parse ReFS v3.4 using the same structures as those defined for ReFS v3.2.

Organization of this paper
The remainder of this paper describes previous work and our contributions in section 2. Then we describe the main methods we used to reverse engineer ReFS in section 3. Our results for ReFS v1.2 are described in section 4, and the results for ReFS v3.2 in section 5. In section 6 we discuss the results, and finally in section 7 we summarize and describe further work.

Literature review
Little peer-reviewed research on ReFS has been published, and therefore we describe similar research performed on other popular file systems in addition to previous work on ReFS. This is relevant because there are similarities between file systems. Jeff Hamm (Hamm, 2009) was the first to publish documentation about the ExFAT file system structures, and this work helped practitioners to understand and analyze this file system. The system is similar to the old FAT systems, but each file has multiple directory entries which describe its metadata. In addition, there is a file allocation table which is used mainly for fragmented files, and a bitmap file describing which clusters (blocks) are allocated. Vandermeer et al. (2018) continued research on ExFAT and were able to separate deleted entries from renamed entries by correlating the FAT, the bitmap file and the directory entries. Brian Carrier (2005) has documented the NTFS file system, a work based on the reverse engineering performed by the Linux-NTFS Project. In NTFS everything is a file, and one of the most important system files with forensic value is the master file table ($MFT). All files have at least one entry, even the $MFT itself.
NTFS is the main file system on Windows, and is still used on Windows system volumes, including in Windows 10. Microsoft (2018a) has built the new Resilient File System (ReFS) on the basis of NTFS. However, instead of a master file table, ReFS uses a single B+ tree in which metadata, bitmaps (allocators), files and folders can be found. Unfortunately, the inner structures are not documented. Metz (2013) has partly described some of the structures within the ReFS file system (we assume for ReFS v1.1 and 1.2), but the results are preliminary. He identified structures that he referred to as object identifiers at levels 0, 1, 2 and 3. These object identifiers are metadata structures that are 16 KiB in size. We have named level 0 the superblock, level 1 the checkpoint, level 2 $Object_tree, and level 3 directories, which can contain files and subdirectories. Head (2015) compared FAT32, NTFS and ReFS, and found that the File System Recognition Structure was used in the ReFS volume boot record. Head also described that there is no $MFT in ReFS, and there are no instances of FILE0 or FILE to indicate any MFT records. Further, Head found attribute-style entries, and a $I30 index attribute in every folder. Head also describes that ReFS file content is not stored resident within an attribute, and shows that a previous name of a renamed folder was found in an earlier 16 KiB metadata block. Gudadhe et al. (2015) describe some of the features of ReFS, including its use of B+ trees. Further, they describe that a checksum is always used to preserve the integrity of metadata, and that a checksum preserving file content can be enabled per file, directory, or volume. They do not describe structures or artifacts that might be important for digital forensic investigation. Ballenthin has published information about ReFS in-memory structures (Ballenthin, 2018a) and ReFS on-disk structures (Ballenthin, 2018b).
Our work started on the basis of this work. Georges (2018) has published a master's thesis about the reverse engineering of ReFS v1.2, and his interpretation is at a low level. We scrutinize his work and improve it. Georges states that he was unable to document the structures of ReFS v3.2; therefore, we continue the reverse engineering of ReFS v3.2. Paragon Software (Paragon Software, 2019a) was the first to release a ReFS driver for Linux, and it supports ReFS v1.x and ReFS v3.x. They have not released the source code for the driver, and customers need to contact them in order to get information about it. They have also included support for ReFS in their digital forensic tool (Paragon Software, 2019b). Brian Carrier (2005) has also documented Ext2 and Ext3, which use superblocks and group descriptors for the file system layout, and inodes for file metadata. Kevin Fairbanks (2012) has documented Ext4, which is similar to Ext2 and Ext3 but has additional features. Hansen and Toolan (2017) reverse engineered the APFS file system, which enabled investigators to analyse iOS and Mac devices. In 2017, none of the commercial digital forensic tools supported APFS. APFS also uses inodes to describe file metadata, and it uses B+ trees extensively. Plum and Dewald (2018) continued this work and proposed novel methods for file recovery in APFS that utilize known structures. In October 2018 Apple released the APFS specification, which shows the actual structures and their meaning (Apple, 2018a) and could also be used for digital forensic purposes. Apple has also published technical information about the HFS+ file system, which also uses B+ trees (Apple, 2018b). Stahlberg et al. (2007) describe the privacy threat posed by database systems that utilize B+ trees. They propose a system that overwrites obsolete data when the B+ tree is balanced and still in memory.
This is relevant for this paper because it gave us the hypothesis that B+ tree balancing in ReFS will leave remnants of confidential information.

Method
We used reverse engineering as a method for finding the structures of ReFS. Initially we used the diskpart or format command in Windows 10 x64 (version 10.0.14393) to format different partitions on one disk with ReFS file systems. We tried to use different cluster sizes when formatting ReFS partitions; however, this was not possible for ReFS v1.2. We also created ReFS volumes to which we added both small and large files. In order to enable formatting of ReFS v1.2, we added a registry hack which allows formatting ReFS on non-mirrored volumes (Winaero, 2018). In Windows 10 Pro we were able to format ReFS v3.2 without any hack. Even with the registry hack enabled, we were not able to format USB thumb drives with ReFS. We have observed that Microsoft has removed the option to format ReFS in Windows 10.0.17134 x64 Pro, and the previous registry hacks no longer work. We are still able to mount ReFS volumes with read and write support in this version of Windows 10. We can also still use one of the Windows Server editions to format a ReFS volume and attach it to Windows 10, which automatically updates the volume to the latest ReFS version. We are not sure why Microsoft has removed support for formatting ReFS volumes in Windows 10 Pro. We used FTK Imager or ewfacquire to create forensic images of the disks, and mounted these forensic images using ewfmount in Linux. In order to find the volume boot records we parsed the MBR or
the GPT using a hex viewer/editor. Then we skipped to the sector where the volume boot record starts in order to interpret the volume (Carrier, 2005). This start sector is always important when investigating a disk image with multiple ReFS volumes, because the ReFS file system uses pointer offsets that are relative to the start of the volume boot record.

Comparing to other solutions
When we started this research, only EnCase v7 or newer supported parsing ReFS v1.2, and no tool was able to parse ReFS v3.2. We used the information we observed when opening our forensic images in EnCase v7, and have tried to use the same names for system files as EnCase.

ReFS volume boot record
Based on our knowledge of the NTFS volume boot record, we assumed the ReFS VBR would contain information such as the sector size, cluster size, volume serial number, and the location of the metadata structure for files (the equivalent of the $MFT). Since the command line tool fsutil can report important properties of the file system, we used this tool to verify our interpretation of the fields found in the VBR.

B+ tree
We knew from Microsoft's sparse documentation that B+ trees are used extensively in ReFS (Microsoft, 2018a). Microsoft describes that a single B+ tree structure is used, which can include other B+ tree structures (Microsoft, 2018a). Therefore, we started searching for blocks of data that could have a typical node structure. Then we tried to identify the entry point that gives access to the top node of the B+ tree. When analyzing the B+ tree structures we used knowledge about the HFS+ file system (Apple, 2018b). We knew from Stahlberg et al. (2007) that B+ trees in database systems could be a threat to privacy, because balancing B+ trees without wiping the content might leave remnants.
For ReFS, remnants of previous metadata records, for example records containing attributes, could have evidentiary value for digital forensic investigation. Therefore, we tried to identify whether attributes remain intact even when they are no longer referenced by the pointers in the pointer area of a node.

Keywords or signatures
When we found different keywords or signatures, we searched for information about them to see if they could be connected to known structures.

In memory structures
Structures in memory are often saved directly to disk. We used previous work on ReFS memory structures in order to see if they are also found on disk (Ballenthin, 2018a). We did not perform any in-depth kernel debugging or memory analysis, but rather used structures from previous work. The main reason for this was that we were not able to get hold of a checked/debug build of Windows 10. When using the partly checked Windows 10, we did not get access to private symbols for the refs.sys driver. We did, however, list the public function names that were available in the partially checked Windows 10 version.

Known structures

Automation
Manually performing every test in a hex viewer is time consuming, and therefore we created a prototype tool to parse the structures found. This tool was used in our testing of ReFS v1.2. We have made the tool available as open source so that other researchers can validate our work.

Experiments
When we reverse engineered ReFS v1.2, we performed several experiments comparing different states of sectors within the file system, starting with the first sector of the file system, the volume boot record. Describing all these experiments is beyond the scope of this paper. Based on our observations, we defined research hypotheses that could explain what we observed. Then we performed new experiments trying to falsify the null hypothesis, in order to indirectly obtain support for our main hypothesis about the meaning of a field.
For example, one unknown field was the two bytes starting at offset 0x28 in the VBR. We observed the byte sequence 0x01 0x02. When the ReFS file system was upgraded to version 3.2, the bytes changed to 0x03 0x02. Therefore, we defined the hypothesis H1 that this was the field for the file system major and minor version. We tested this by formatting new instances of the file system with the old and the new version several times, and the fields always corresponded to the version of the file system. We also found that the file system was automatically updated after Microsoft released a new version of the driver. When we used fsutil to verify the file system version, it always corresponded to the value found in the VBR. We therefore rejected the null hypothesis H0, that the changes in these values were the result of chance alone; we did not observe a single case where H0 held. Therefore, the alternative hypothesis H1 was indirectly supported. Similar experiments were performed for the other values in the VBR and for values found in other structures. After reverse engineering the ReFS file system, we performed a number of experiments to test whether ReFS is resilient to metadata manipulation. First, we tested whether the tool SetMace (Schicht, 2014) succeeds in changing the timestamp of a file located on a ReFS volume. We also tried to manually change the timestamp in an FNA file attribute. Finally, we checked whether there are remnants of attributes not currently in use by the system, and whether it is possible to recover data based on information found in these remnants.

Results - ReFS v1.2
We used our knowledge of known structures and compared them to patterns discovered in hex dumps in order to give them meaning. Whenever a structure was interpreted, multiple tests were performed to try to falsify our interpretation (Popper, 1953).
We also compared our observations with the ReFS on-disk structures published by Ballenthin (Ballenthin, 2018b), which pointed us towards searching for entry block 0x1E and gave us the standard size of an entry block. In this section we describe the structures necessary to manually parse the ReFS v1.2 file system. The forensic container file refs-v1_2.E01, available at Mendeley (Nordvik, 2019), allows the reader to follow our examples. These structures are the result of reverse engineering the file system and testing forensic image containers containing ReFS file systems. We present the results as a guided tour through the structures that must be interpreted in order to find the metadata, the files and their contents. We will start by describing our results for ReFS v1.2. The first
thing to note is that ReFS v1.2 always has 64 KiB clusters. The format tools in Windows 10 did not support formatting ReFS v1.2 with cluster sizes other than 64 KiB, even though the documentation (Microsoft, 2018c) claims support for both 4 KiB (default) and 64 KiB cluster sizes. Our observations show that ReFS v3.2 corresponds to the documentation (Microsoft, 2018c).

ReFS volume boot record
For ReFS, the volume boot record is in the first sector of the file system volume, as it is for FAT32 and NTFS (Carrier, 2005). For FAT32 and NTFS the first 3 bytes are jump instructions to the boot code (Carrier, 2005, p. 254). However, since ReFS v1.2 and ReFS v3.2 volumes cannot be booted, the first 3 bytes are just zeros. We also found a C structure from Microsoft (Computing, 2018) defining the fields of the file system recognition structure (FSRS), as can be seen in Fig. 1.¹ This gave us a kick start in interpreting the ReFS VBR. The first 6 fields in Table 1 were found by using Fig. 1. When interpreting the lengths of the types, we assume a Windows OS. The names used for the FSRS structure are the names given by the developers of ReFS, while the names from byte offset 0x18 onwards are based on our experiments.

Fig. 1. File system recognition structure.

¹ We added comments to make the structure easier to read for those not familiar with C structures.

Table 1
Structure of the volume boot record.
Offset | Length | Description
0x00 | 3 | Jmp (jump instructions)
0x03 | 8 | FSName
0x0B | 5 | MustBeZero
0x10 | 4 | Identifier
0x14 | 2 | Length (of FSRS)
0x16 | 2 | Checksum (of FSRS)
0x18 | 8 | Sectors in volume
0x20 | 4 | Bytes per sector
0x24 | 4 | Sectors per cluster
0x28 | 1 | File system major version
0x29 | 1 | File system minor version
0x2A | 14 | Unknown
0x38 | 8 | Volume serial number

Entry block
During our reverse engineering we found blocks of 16 KiB and blocks of 64 KiB. The 16 KiB blocks were used for metadata and system files, while 64 KiB allocation blocks were used for data streams. We named the 16 KiB blocks entry blocks. The first entry blocks we identified were file system metadata blocks with pointers to other entry blocks, and finally to the first entry block containing a B+ tree. We will describe how we found these entry blocks in section 4.4. Entry blocks in the B+ tree include nodes that follow a typical B+ tree pattern, similar to the structures used by other file systems that use B+ trees. We found that all these 16 KiB blocks start with a descriptor for the block. The entry block consists of a descriptor (size 0x30) and can include one or more nodes, as shown in Fig. 2. In the structure tables presented in this paper, an E offset is a byte offset relative to the start of the entry block, while an R offset is relative to the start of the structure itself. Whenever the E and R offsets are equal, we only show the E offset. The first 8-byte field, at offset 0x0 of the entry block descriptor, contains the block's own entry block number. We also found a field at offset 0x18 describing the node identifier (Node ID). However, nodes that contain metadata typically have the value 0 as node ID. The descriptor is summarized in Table 2.

Table 2
Structure of the entry block descriptor.
E offset | Length | Description
0x00 | 0x08 | Entry block number
0x08 | 0x08 | Unknown
0x10 | 0x08 | Unknown
0x18 | 0x08 | Node ID
0x20 | 0x08 | Unknown
0x28 | 0x08 | Unknown

We found only one or two nodes per entry block, but there could be more. Each node can have one or more records. A record contains the sub-entries of a node. A node describing a directory will contain records of files and subdirectories, with more than one record per file. In NTFS each file has at least one MFT record, which consists of a number of attributes, while in ReFS each attribute is contained within records in directory nodes. A file's standard information attribute is often more than 1000 bytes, which means that only a few files are needed before the entry block node is full.
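To make Tables 1 and 2 concrete, the following Python sketch decodes a volume boot record and an entry block descriptor using the standard struct module. It assumes that all multi-byte fields are little-endian (as the LE pointer values reported in this paper suggest) and that there is no padding between fields. The names RefsVbr, parse_vbr, EntryBlockDescriptor, parse_descriptor and e_offset are hypothetical and are not taken from the prototype tool. Fields marked Unknown are read but ignored.

```python
import struct
from dataclasses import dataclass

# Table 1 as a struct format (ASSUMPTION: little-endian, no padding):
# Jmp[3] FSName[8] MustBeZero[5] Identifier:u32 Length:u16 Checksum:u16
# SectorsInVolume:u64 BytesPerSector:u32 SectorsPerCluster:u32
# Major:u8 Minor:u8 Unknown[14] SerialNumber:u64  -> 0x40 bytes in total
VBR_FORMAT = "<3s8s5sIHHQIIBB14sQ"

# Table 2: the entry block descriptor is six 8-byte fields (0x30 bytes).
DESCRIPTOR_SIZE = 0x30


@dataclass
class RefsVbr:
    fs_name: bytes
    sectors_in_volume: int
    bytes_per_sector: int
    sectors_per_cluster: int
    major: int
    minor: int
    serial: int

    @property
    def cluster_size(self) -> int:
        # Cluster size in bytes, derived from two VBR fields.
        return self.bytes_per_sector * self.sectors_per_cluster

    @property
    def version(self) -> str:
        # Offsets 0x28/0x29 are separate bytes, so 0x01 0x02 means "1.2".
        return f"{self.major}.{self.minor}"


def parse_vbr(sector: bytes) -> RefsVbr:
    """Decode the first 0x40 bytes of the VBR sector following Table 1."""
    (_jmp, fs_name, _zero, _ident, _length, _checksum, sectors, bps, spc,
     major, minor, _unknown, serial) = struct.unpack_from(VBR_FORMAT, sector, 0)
    return RefsVbr(fs_name, sectors, bps, spc, major, minor, serial)


@dataclass
class EntryBlockDescriptor:
    entry_block_number: int  # E offset 0x00
    node_id: int             # E offset 0x18 (typically 0 for metadata nodes)


def parse_descriptor(block: bytes) -> EntryBlockDescriptor:
    """Decode the 0x30-byte descriptor at the start of a 16 KiB entry block."""
    fields = struct.unpack_from("<6Q", block, 0)
    return EntryBlockDescriptor(entry_block_number=fields[0], node_id=fields[3])


def e_offset(structure_start: int, r_offset: int) -> int:
    """Convert an R offset (relative to a structure) into an E offset."""
    return structure_start + r_offset
```

Reading the version as two separate bytes mirrors the version-field experiment described in the Method section. For a structure that starts directly after the descriptor, e_offset(DESCRIPTOR_SIZE, 0x20) gives 0x50, which matches the E and R offset pairs in Table 3.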
If an entry block contains a node that runs out of space for new records, the B+ tree system will use a new entry block containing a node with extent records. Extents make it possible for a node to extend its capacity by holding records that point to other entry blocks, and by adding more nodes, which for instance allows for more files within a directory node. In this case, record 1 will contain an extent pointer to the entry block containing the existing node that is running out of space, and record 2 will contain another extent pointer to a new entry block, where new records can be stored in a new node. We have not tested whether records can be reorganized between the two nodes. The order of records within a node is determined by the order of the pointers in the pointer area. This means that records can appear in any order in the hex dump. We discovered the extent records while experimenting with different numbers of files in the same directory.

Superblock
At first, we had only identified the superblock as an entry block that points to $Tree_Control. When we reverse engineered ReFS v3.2, we found the string SUPB in the entry block descriptor, which we believe is an abbreviation of superblock. From other file systems, such as ext4, we know that the superblock is similar to the volume boot record. We had been trying to find pointers to these structures within the volume boot record, but we were looking in the wrong place: Microsoft has included these pointers in the superblock entries. We find it strange that they did not include all the information found in the VBR within the superblock.
Fig. 2. Standard structures used by ReFS.

Finding the B+ tree checkpoint
In order to find the B+ tree checkpoint, the root of the B+ tree, we first need to find the superblock and then follow its pointer to the checkpoint. In this section we manually find the starting node of the ReFS B+ tree. EnCase had a system file named $Tree_Control, which we believe is a checkpoint. In ReFS v3.2, Microsoft includes the string CHKP in the entry block descriptor of $Tree_Control. Using the sector offset of this structure, we found the location where it starts. However, Guidance Software/OpenText does not explain how to find the structure. To find out how, we searched the 0x1E entry block, the superblock, for values that could point to the checkpoint. Assuming every entry block is 16 KiB, we could find the entry block (EB) offset of the checkpoint $Tree_Control by multiplying its sector offset by 512 (the sector size) and dividing by 16 KiB (the entry block size). Since the EB offset is relative to the start of the VBR, we subtract the VBR sector first. The resulting calculation is shown in Equation (1).

$$\text{EB offset} = \frac{(\text{EB sector} - \text{VBR sector}) \times \text{Sector size}}{\text{EB size}} \tag{1}$$

Converting the value to hexadecimal made it easy to find the same value in the superblock. By searching for this EB offset hex value (using big endian), we always found the superblock in entry block 0x1E. However, there was not just one superblock. We found three superblocks containing records with pointers to $Tree_Control. One of these was always found in entry block 0x1E, and another was found in the third-to-last entry block of the volume. In addition, we found an extra backup superblock in the entry block following the second one. Using ewfmount (to mount E01 files) and tools such as dd and xxd (a hex viewer), we could display the content of the 0x1E superblock.
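As an alternative to dd and xxd, the arithmetic of Equations (1) and (2) and the superblock lookup described in this section can be sketched in Python as follows. This is a minimal sketch that assumes 512-byte sectors, 16 KiB entry blocks and little-endian fields. The function names are hypothetical, and checkpoint_pointers follows the superblock layout of Table 3 (offset of the first pointer at E offset 0x50, number of pointers at 0x54). superblock_candidates encodes the observed superblock locations and assumes the volume size is counted in whole entry blocks from the VBR.

```python
import io
import struct

SECTOR_SIZE = 512         # bytes per sector on the test images
ENTRY_BLOCK_SIZE = 16384  # 16 KiB entry blocks (ReFS v1.2)


def sector_to_entry_block(eb_sector: int, vbr_sector: int,
                          sector_size: int = SECTOR_SIZE,
                          eb_size: int = ENTRY_BLOCK_SIZE) -> int:
    """Equation (1): absolute sector -> entry block number relative to the VBR."""
    return (eb_sector - vbr_sector) * sector_size // eb_size


def entry_block_to_sector(eb_number: int, vbr_sector: int,
                          sector_size: int = SECTOR_SIZE,
                          eb_size: int = ENTRY_BLOCK_SIZE) -> int:
    """Equation (2): entry block number -> absolute sector (the dd skip= value)."""
    return vbr_sector + eb_number * eb_size // sector_size


def read_entry_block(image: io.BufferedIOBase, vbr_sector: int,
                     eb_number: int) -> bytes:
    """Read one whole 16 KiB entry block (32 sectors of 512 bytes)."""
    image.seek(entry_block_to_sector(eb_number, vbr_sector) * SECTOR_SIZE)
    block = image.read(ENTRY_BLOCK_SIZE)
    if len(block) != ENTRY_BLOCK_SIZE:
        raise ValueError(f"short read for entry block {eb_number:#x}")
    return block


def superblock_candidates(total_entry_blocks: int) -> list[int]:
    """Observed superblock locations: 0x1E, the third-to-last entry block and
    the block after it. ASSUMPTION: total_entry_blocks counts whole 16 KiB
    blocks from the start of the VBR to the end of the volume."""
    return [0x1E, total_entry_blocks - 3, total_entry_blocks - 2]


def checkpoint_pointers(superblock: bytes) -> list[int]:
    """Table 3: the u32 at E offset 0x50 is the offset of the first entry
    block pointer, and the u32 at 0x54 is the number of pointers. Each
    pointer is read as a little-endian u64 entry block number."""
    first, count = struct.unpack_from("<II", superblock, 0x50)
    return list(struct.unpack_from(f"<{count}Q", superblock, first))
```

For the test image, entry_block_to_sector(0x1E, 0x800) gives sector 0xBC0, the skip value for the superblock. Assuming the pointer count at 0x54 is 2 (as the two pointers at 0xA0 and 0xA8 suggest), applying checkpoint_pointers to that block should yield the two $Tree_Control pointers, 0x1471 and 0xF3F7. On a real case, image would be the raw file exposed by ewfmount, opened in binary mode.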
The dd and xxd command used to display the 0x1E superblock is shown in Listing 1. The VBR starts at sector 0x800 on the test image, and 0x1E is the entry block we want to show. In Listing 1 we used a block size of 512 bytes and a count of 1 block, so only 512 bytes are output. In the figures showing hex dumps, however, we have not included all 512 bytes, to make them easier to read. Equation (2) shows how we compute the start sector of the entry block, which we use in the skip option of the dd command.
$$\text{EB sector} = \text{VBR sector} + \frac{\text{EB offset} \times \text{EB size}}{\text{Sector size}} \tag{2}$$
Listing 1: Command to show the superblock at 0x1E.
Both the 0x1E superblock and the $Tree_Control checkpoint have special structures. Table 3 shows the structure needed to parse the superblock (the entry point), and an extract from its hex dump is shown in Fig. 3. In superblock 0x1E, at offset 0x50 (4 bytes long), we found the value 0xA0, which is the byte offset of the first entry block pointer. At byte offset 0xA0 (8 bytes long) we find the entry block pointer to $Tree_Control. There is another pointer at offset 0xA8 (8 bytes long), which we assume points to the backup $Tree_Control. The highlighted values in the hex dump in Fig. 3 are 0x1471 (LE) for the first pointer and 0xF3F7 (LE) for the second. To show the content of entry block 0x1471, we use the same command as in Listing 1, but replace 0x1E with 0x1471. This moves to $Tree_Control, the checkpoint for our B+ tree structure. In this section we have shown how to navigate to the entry block that controls the B+ tree. This entry block is a checkpoint of the B+ tree, and it is named $Tree_Control because it can be used to navigate the B+ tree.
Top of the node tree
The entry block containing the top node of the B+ tree is called $Tree_Control by EnCase.
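Reaching $Tree_Control from the VBR combines Equation (2) with the superblock fields in Table 3, and the checkpoint fields in Table 4 then give the pointers to the system files. The following Python sketch reads those fields with the standard `struct` module. It assumes 512-byte sectors, 16 KiB entry blocks, little-endian fields at the offsets and lengths listed in Tables 3 and 4, and 8-byte entry block pointers at the start of each checkpoint record. The function names, and the name order in `SYSTEM_FILES` (taken from the example checkpoint described below), are illustrative assumptions rather than an on-disk guarantee.

```python
import struct
from typing import BinaryIO, List, Tuple

SECTOR_SIZE = 512
EB_SIZE = 16 * 1024          # assumed entry block size (16 KiB)

# Order of the additional checkpoint records in the paper's ReFS v1.2 example.
SYSTEM_FILES = ("$Object_Tree", "$Allocator_Lrg", "$Allocator_Med",
                "$Allocator_Sml", "$Attribute_List", "$Object")


def eb_sector(vbr_sector: int, eb_number: int,
              sector_size: int = SECTOR_SIZE, eb_size: int = EB_SIZE) -> int:
    """Equation (2): start sector of an entry block (the dd skip= value with bs=512)."""
    return vbr_sector + eb_number * eb_size // sector_size


def read_entry_block(image: BinaryIO, vbr_sector: int, eb_number: int) -> bytes:
    """Read one whole entry block from a raw image (e.g. one exposed by ewfmount)."""
    image.seek(eb_sector(vbr_sector, eb_number) * SECTOR_SIZE)
    return image.read(EB_SIZE)


def superblock_pointers(sb: bytes) -> List[int]:
    """Table 3: u32 at 0x50 = offset of first EB pointer, u32 at 0x54 = pointer count."""
    first, count = struct.unpack_from("<II", sb, 0x50)
    return [struct.unpack_from("<Q", sb, first + 8 * i)[0] for i in range(count)]


def checkpoint_pointers(cp: bytes) -> Tuple[Tuple[int, int], List[Tuple[str, int]]]:
    """Table 4: version at 0x34/0x36, record count at 0x58, u32 offsets from 0x5C.

    Each additional record is assumed to start with an 8-byte EB pointer.
    """
    major, minor = struct.unpack_from("<HH", cp, 0x34)
    count = struct.unpack_from("<I", cp, 0x58)[0]
    offsets = struct.unpack_from(f"<{count}I", cp, 0x5C)
    pointers = [struct.unpack_from("<Q", cp, off)[0] for off in offsets]
    names = [SYSTEM_FILES[i] if i < len(SYSTEM_FILES) else f"record_{i}"
             for i in range(count)]
    return (major, minor), list(zip(names, pointers))
```

On the test image, `superblock_pointers(read_entry_block(img, 0x800, 0x1E))` on an image opened in binary mode would be expected to return the two $Tree_Control pointers (0x1471 and 0xF3F7); reading the first of these and passing it to `checkpoint_pointers` should then list the system file pointers.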
Table 3
Structure of the superblock (E offset: offset from the start of the entry block; R offset: offset relative to the start of the structure).
E offset  R offset  Length  Description
0x30      0x00      0x10    GUID
0x40      0x10      0x10    Unknown
0x50      0x20      0x04    Offset to first entry block pointer
0x54      0x24      0x04    Number of entry block pointers
0x58      0x28      0x04    Offset to first record
0x5C      0x2C      0x04    Length of each record
Fig. 3. Hex dump of the superblock 0x1E.
From $Tree_Control we can find pointers to six different nodes that have a special purpose and are shown by EnCase as system files. In Section 4.4 we show how to find the $Tree_Control checkpoint manually by using a hex viewer. We now continue by interpreting the top-level checkpoint node ($Tree_Control).
Table 4
Structure of the tree control checkpoint entry block.
E offset  R offset  Length  Description
0x30      0x00      0x04    Unknown
0x34      0x04      0x02    Major version
0x36      0x06      0x02    Minor version
0x38      0x08      0x04    Offset to first record
0x3C      0x0C      0x04    Size of a record
0x40      0x10      0x10    Unknown
0x50      0x20      0x08    Unknown
0x58      0x28      0x04    Number of additional records
0x5C      0x2C      Var     4-byte offset to each record
Fig. 4. Hex dump of the first part of checkpoint $Tree_Control.
We skip the first 0x30 bytes in Fig. 4, which hold the entry block descriptor, and focus on the checkpoint descriptor. This entry block does not contain a typical B+ tree node, because the pointer area at the end is missing. Using Table 4, we saw in the hex dump in Fig. 4 that the offset to the first record was 0x80, and that this first record was a record entry for the $Tree_Control checkpoint itself. The major and minor version further showed that this was ReFS v1.2. At offset 0x80 we found a 0x18-byte record whose first 8 bytes had the value 0x1417 (LE). This was the same as the entry block number of the $Tree_Control checkpoint, which can also be seen in the first 8 bytes of the entry block. This means that each record in this checkpoint starts with the number of the entry block the record points to. According to byte offset 0x58, the number of additional records was 0x06. At byte offset 0x5C we found a table of 4-byte offsets, one for each additional record in this entry block; in the example hex dump in Fig. 4 these were 0x98, 0xB0, 0xC8, 0xE0, 0xF8 and 0x110. The offsets are relative to the start of the entry block, so the first record started at offset 0x98, the next at 0xB0, and so on. The list below shows what we found in these records, each of which starts with an entry block pointer:
• The pointer to the $Object_Tree system file is found at offset 0x98 in the entry block; in this example it pointed to entry block 0x23F (LE). This system file includes nodes and records that hold information about directories and files in sub entry blocks.
• The pointer to the $Allocator_Lrg system file is found at offset 0xB0 in the entry block; in this example it pointed to entry block 0x38 (LE). This system file includes a bitmap of large blocks (512 MiB).
• The pointer to the $Allocator_Med system file is found at offset 0xC8 in the entry block; in this example it pointed to entry block 0x20 (LE). This system file includes a bitmap of medium-sized blocks (64 KiB).
• The pointer to the $Allocator_Sml system file is found at offset 0xE0 in the entry block; in this example it pointed to entry block 0x21 (LE). This system file includes a bitmap of small blocks (16 KiB).
• The pointer to the $Attribute_List system file is found at offset 0xF8 in the entry block; in this example it pointed to entry block 0x244 (LE). We are uncertain of the meaning of this system file.
• The pointer to the $Object system file is found at offset 0x110 in the entry block; in this example it pointed to entry block 0x240 (LE). This system file describes the child-parent dependencies, and can be used to rebuild the directory paths for files.
Until now we have used special entry blocks (the superblock and the checkpoint), but to parse the normal nodes we introduce the structure of the standard node descriptor in Table 5 and of the standard node header in Table 6. We use them to interpret the normal nodes found in the sub nodes pointed to by $Tree_Control. We can think of $Tree_Control as the top block that maintains control of all the sub B+ trees, or as the root of a single B+ tree. The standard node descriptor and standard node header apply from the level labeled MSB+ and below in the illustration in Fig. 5. MSB+ is a magic signature in ReFS v3.2, found in entry blocks that contain one or more nodes. ReFS v1.2 has no such signature, but we still call entry blocks containing nodes MSB+, to be consistent with ReFS v3.2. We assume the abbreviation stands for Microsoft B+ tree.
Table 5
Structure of the standard node descriptor.
E offset  R offset  Length  Description
0x30      0x00      0x04    Length of node descriptor
0x34      0x04      0x14    Unknown
0x48      0x18      0x02    Number of extents
0x4A      0x1A      0x06    Unknown
0x50      0x20      0x04    Number of records in node
0x54      0x24      Var     Unknown
Table 6
Structure of the standard node header.
E offset  R offset  Length  Description
0x120     0x00      0x04    Length of node header
0x124     0x04      0x04    Offset to next free record
0x128     0x08      0x04    Free space in node
0x12C     0x0C      0x04    Unknown
0x130     0x10      0x04    Offset to first pointer
0x134     0x14      0x04    Number of pointers in node
0x138     0x18      0x08    Offset to end of node
In this section we have shown how to find the starting checkpoint, named $Tree_Control, which makes it possible to find all other nodes in the file system. Fig. 5 illustrates the main structures of ReFS. The superblock was found in entry block 0x1E, and from it we found the checkpoint $Tree_Control. There was also another $Tree_Control, which we assume is a backup. We have already found the pointers to the child nodes of $Tree_Control.
Files and folders
In this section we show how the B+ tree structures can be parsed to find directories and files. We start by looking at the entry block for $Object_Tree, which in this example was located at 0x23F. We used the command described in Listing 1, but replaced 0x1E with the entry block we wished to investigate. The hex dump of $Object_Tree is shown in Fig. 6. We skipped the 0x30-byte entry block descriptor. Next came the node descriptor, which started with a length field with the value 0xF0; we skip the rest of the node descriptor to keep this section readable. At byte offset 0x120 (0x30 + 0xF0) we found the node header, which includes the length of the node header (0x20 bytes), the offset to the next free record, the free space in the node, and the offset to the first record pointer (0x354C (LE)). This offset is relative to the start of the node header, so to get the entry block byte offset we added 0x120; the pointer area therefore started at offset 0x366C. We also found the number of pointers, 9, which should equal the number of allocated records. These pointers are 4 bytes each and point to records in this node; their offsets are also relative to the node header, so we again added 0x120. In the node header we also found an offset to the end of the node.
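The node-header arithmetic above can be captured in a short Python sketch. It assumes, as in the $Object_Tree example, that the node header directly follows the node descriptor (0x30 + 0xF0 = 0x120), that its fields sit at the relative offsets in Table 6, and that each pointer is a plain 4-byte offset relative to the node header; `record_offsets` is a hypothetical helper name. Records are returned in pointer-area order, which need not match their physical order in the hex dump.

```python
import struct
from typing import List

EB_DESCRIPTOR_LEN = 0x30   # entry block descriptor precedes the node descriptor


def node_header_offset(eb: bytes) -> int:
    """Table 5: the node descriptor starts at 0x30 with its own length field."""
    (descriptor_len,) = struct.unpack_from("<I", eb, EB_DESCRIPTOR_LEN)
    return EB_DESCRIPTOR_LEN + descriptor_len


def record_offsets(eb: bytes) -> List[int]:
    """Table 6: absolute byte offsets of the records, in pointer-area order."""
    hdr = node_header_offset(eb)
    first_ptr, n_ptrs = struct.unpack_from("<II", eb, hdr + 0x10)
    pointers = struct.unpack_from(f"<{n_ptrs}I", eb, hdr + first_ptr)
    return [hdr + p for p in pointers]
```

For the example node, a pointer value of 0x20 maps to the record at 0x20 + 0x120 = 0x140, and the pointer area itself starts at 0x354C + 0x120 = 0x366C.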
We skipped to the first record by moving to offset 0x20 + 0x120 = 0x140. In the hex dump this record is directly followed by the fourth record, at 0x70 + 0x120 = 0x190. We could have used the pointers from the pointer area to find all allocated records, which is what should be done for any node; here, however, we only show two of the records. Because the pointers decide the order of the records, we cannot assume that the records are in sequence or that all of them are in use.
At offset 0x140 in Fig. 6 we found the start of the record area. Each record was 0x50 bytes. All the records were related to directories, created either by the user or by the system. These records can be analyzed using Table 7. The first record was a pointer to $Volume (node ID 0x500), which contained information about the volume, including a volume creation timestamp that could be of value to an investigation. The fourth record contained a pointer to the root directory (node ID 0x600). At this level in the B+ tree we did not see the names of the nodes, but most of the names could be found within the entry blocks pointed to by the records at this level. We observed that records with node IDs of 0x700 and above were normal directories, created either by the user or by the system. We have illustrated this in Fig. 7. All the records with a node ID in the 0x500 range were metadata directories, which File Explorer does not show when parsing the root directory. All node IDs of root subdirectories were in the 0x700 range.
Fig. 5. Illustration of the ReFS structure.
Volume information
We used Table 7 to interpret the records in the entry block $Object_Tree. The record for the 0x500 node ID was found in entry