Digital Investigation 30 (2019) 127-147
Contents lists available at ScienceDirect
Digital Investigation
journal homepage: www.elsevier.com/locate/diin

Reverse engineering of ReFS

Rune Nordvik a, b, *, Henry Georges a, Fergus Toolan b, Stefan Axelsson a, c
a Norwegian University of Science and Technology, Norway
b Norwegian Police University College, Norway
c Halmstad University, Sweden

Article info
Article history: Received 23 March 2019. Received in revised form 4 July 2019. Accepted 17 July 2019. Available online 23 July 2019.
Keywords: Digital forensics, ReFS, File system

Abstract

File system forensics is an important part of Digital Forensics. Investigators of storage media have traditionally focused on the most commonly used file systems such as NTFS, FAT, ExFAT, Ext2-4, HFS+, APFS, etc. NTFS is the current file system used by Windows for the system volume, but this may change in the future. In this paper we will show the structure of the Resilient File System (ReFS), which has been available since Windows Server 2012 and Windows 8. The main purpose of ReFS is to be used on storage spaces in server systems, but it can also be used in Windows 8 or newer. Although ReFS is not the current standard file system in Windows, users have the option to create ReFS file systems, so digital forensic investigators need to investigate the file systems identified on seized media. Further, we will focus on remnants of non-allocated metadata structures or attributes. This may allow metadata carving, which means searching for specific attributes that are not allocated. Attributes found can then be used for file recovery. ReFS uses superblocks and checkpoints in addition to a VBR, which is different from other Windows file systems. If the partition is reformatted with another file system, the backup superblocks can be used for partition recovery.
Further, it is possible to search for checkpoints in order to recover both metadata and content. Another concept not previously seen in Windows file systems is the sharing of blocks. When a file is copied, both the original and the new file will share the same content blocks. If the user changes the copy, new data runs will be created for the modified content, but unchanged blocks remain shared. This may impact file carving, because some of the blocks previously used by a deleted file might still be in use by another file. The large default cluster size, 64 KiB, in ReFS v1.2 is an advantage when carving for deleted files, since most deleted files are less than 64 KiB and therefore only use a single cluster. For ReFS v3.2 this advantage has decreased because the standard cluster size is 4 KiB. Preliminary support for ReFS v1.2 has been available in EnCase 7 and 8, but the implementation has not been documented or peer-reviewed. The same is true for Paragon Software, which recently added ReFS support to their forensic product. Our work documents how ReFS v1.2 and ReFS v3.2 are structured at an abstraction level that allows digital forensic investigation of this new file system. At the time of writing this paper, Paragon Software is the only digital forensic tool that supports ReFS v3.x. The most recent version of the ReFS file system is the most relevant for digital forensics, as Windows automatically updates the file system to the latest version on mount. This is why we have included information about ReFS v3.2. However, it is possible to change a registry value to avoid updating. The latest ReFS version observed is 3.4, but the information presented about 3.2 is still valid. In any criminal case, the investigator needs to investigate the file system version found.

© 2019 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

* Corresponding author.
Norwegian University of Science and Technology, Norway. E-mail address: rune.nordvik@phs.no (R. Nordvik).
https://doi.org/10.1016/j.diin.2019.07.004
1742-2876/© 2019 The Authors. Published by Elsevier Ltd.

Introduction

Reverse engineering of closed source file systems is a prerequisite for digital forensic investigations (Marshall and Paige, 2018). Hence, when investigating a digital storage medium it is imperative to retrieve the pertinent files from different file systems. Most investigators use digital forensic tools to retrieve this information, and depend on these tools because of large backlogs (Scanlon, 2016). Digital forensic tools do not support all existing file systems, and tools may differ in which file systems they support. Further, different tools might implement support for the same file system differently, and not all versions of the file system are necessarily supported. Because of limited budgets, digital forensic investigators only have access to a limited number of digital forensic tools (Garfinkel, 2010). Without tools to parse the underlying file system in a forensically sound manner, we are left with a large unknown volume. It will still be possible to carve files based on known file internal signatures (headers and/or footers), but this approach will not find the metadata structures of files, and often only partly retrieves the content of fragmented files (Garfinkel, 2007). Showing both metadata and the corresponding files in the directory structure increases the evidentiary value of the evidence, and the lack of file system parsing support in current digital forensic tools motivated us to reverse engineer the Resilient File System (ReFS). At the start of this research EnCase v.8 had support for ReFS v1.2, and we needed to understand the structures of this file system in order to verify results from this tool. We found that Paragon Software now supports full forensic ReFS access (Paragon Software, 2019b), but their implementation is closed source.

Objectives

The main research problem is that the low level structures of the new ReFS file system are undocumented, and investigators are left with just a few tools that support parsing. It is best practice to test tools, but this is not feasible for tools that implement ReFS support without describing the structures. Hence the objectives are:

- How can digital forensic investigators verify the reliability of ReFS file system support in existing Digital Forensic tools?
- How will B+ tree balancing impact the recoverability of files in ReFS?
- Can digital forensic investigators be confident in the ReFS integrity protection mechanism of metadata?
- How will ReFS impact the recovery of files compared to NTFS?

We aim to solve the first research question by describing the structures necessary to parse the file system. Knowing the structures will give digital forensic investigators the ability to verify the digital forensic tools that claim ReFS support. Investigators who understand the low level structures will be able to verify tool results by manually parsing the file system at a low level of abstraction. They can do this by using an existing methodology, for instance the Framework of Reliable Experimental Design (FRED) (Horsman, 2018).

The second research question is about B+ tree balancing, which often leaves remnants (Stahlberg et al., 2007), and we aim to verify whether B+ tree node remnants in ReFS contain artifacts relevant for file recoverability. An introduction to B+ trees related to file systems has been described by Carrier [5, p.290]. The aim is to identify possibilities for recovering files based on unallocated metadata structures.

The third research question is to test whether there are existing tools that can manipulate metadata in ReFS in a way that impacts our confidence in the integrity protection mechanism. Would it be possible to manually manipulate a timestamp using a hex editor, without the file system detecting or fixing it? Will ReFS be resilient to this kind of manipulation?

The fourth research question is about the use of remnants found in unallocated space for recovery of files. This is important to know since it may allow the investigator to restore previous files. We compare this with NTFS, where unallocated records in the $MFT can be recovered as long as they are not overwritten [5, p.328].

In order to answer the research questions we will first need to reverse engineer and interpret the structures used.

Features important for digital forensics

ReFS, Microsoft's newest file system, increases the availability of data (Microsoft, 2018c). If integrity streams for data (file content) are enabled, the integrity of the data is also increased. Unfortunately, integrity streams for data are not enabled by default (Microsoft, 2018b). However, integrity streams for metadata are enabled. This is a very important feature, because it means increased reliability of the metadata.

ReFS still uses the concept of attributes. The attributes found in the NTFS $MFT are similar to the attributes found in ReFS, but not identical (Head, 2015). However, the $MFT is not a part of ReFS (Head, 2015). Instead, the attributes are now located in B+ tree nodes.

If B+ tree reorganizing leaves remnants of confidential information, then these remnants might further support recovery of deleted data, which is highly relevant to digital forensics. The use of B+ trees in databases implies remnants of privacy data (Stahlberg et al., 2007).

Currently, there are tools used for manipulation of metadata on NTFS, for instance SetMace (Schicht, 2014). We tested this tool on ReFS, but it did not work since there is no $MFT in ReFS. The same will hold for other tools that depend on manipulating the $MFT. As long as no ReFS metadata manipulation tools exist, digital forensic investigators may have increased trust in the validity of the metadata.

In this paper we describe the main structures necessary to manually interpret ReFS, and we have published a prototype tool that is able to parse the structures of ReFS v1.2. We have published the prototype tool under an open source license, and the tool is available from: https://github.com/chef2505/refs. Publishing the tool allows peers to review our interpretation of ReFS, and to test the reproducibility of our research in order to make it comply with the Daubert criteria (US-Supreme-Court, 1993).
Making the prototype tool available as open source also allows other developers to implement support for newer versions. Its purpose is to automate the manual parsing of the file system in order to test our hypotheses. Therefore, the prototype tool is not discussed further in this paper.

Limitations

This reverse engineering of ReFS, a closed source file system, was initially performed without the checked/debug version of Windows 10, and therefore the names of the structures might not correspond with the names given by the file system developers. When finalizing this paper, we debugged the file system using the partly debug/checked version of Windows 10.0.17134 x64, including its symbols. We found that the names of structures and structure fields were not included, only the function names. Therefore, the real names of the structures are still unknown. The selection of file system instances does not cover all possible use cases. Further, the results of this study are only valid for the file system versions described. We describe only ReFS v1.2 and ReFS v3.2 in this paper. A driver is software running within the kernel and, like all software, it may be updated to newer versions. Microsoft may develop new features that change the structures described in this paper. During the finalizing of this paper we found that the latest ReFS version was ReFS v3.4 (Unknown, 2019). We were able to manually parse ReFS v3.4 using the same structures as those defined for ReFS v3.2.

Organization of this paper

The remainder of this paper describes previous work and our contributions in section 2. Then we describe the main methods we have used to reverse engineer ReFS in section 3. Our results for ReFS v1.2 are described in section 4, and the results for ReFS v3.2 are described in section 5. In section 6 we discuss the results, and finally in section 7 we summarize and describe further work.

Literature review

Little peer-reviewed research on ReFS has been published, and therefore we will describe similar research performed on other popular file systems in addition to previous work on ReFS. This is relevant because there are similarities between file systems.

Jeff Hamm (Hamm, 2009) was the first to publish documentation about the ExFAT file system structures, and this work helped practitioners to understand and analyze this file system. The system is similar to the old FAT systems, but each file has multiple directory entries which describe metadata about files. In addition there is a file allocation table which is used mainly for fragmented files, and a bitmap file describing which clusters (blocks) are allocated. Vandermeer et al. (2018) continued research on ExFAT and were able to separate deleted entries from renamed entries by correlating the FAT table, the bitmap file and the directory entries.

Brian Carrier (2005) has documented the NTFS file system, a work based on the reverse engineering performed by the Linux-NTFS Project. In NTFS everything is a file, and one of the most important system files with forensic value is the master file table ($MFT). All files have at least one entry, even the $MFT itself.
NTFS is the main file system on Windows, and is still used on Windows system volumes including Windows 10.

Microsoft (2018a) has built the new Resilient File System (ReFS) on the basis of NTFS. However, instead of using a master file table, they now use a single B+ tree where metadata, bitmaps (allocators), files and folders can be found. Unfortunately, the inner structures are not documented.

Metz (2013) has partly described some of the structures within the ReFS file system. We assume his description covers ReFS v1.1 and v1.2; the results are, however, preliminary. He identified structures that he referred to as Object identifiers at levels 0, 1, 2 and 3. These object identifiers are metadata structures that are 16 KiB in size. We have named level 0 as the superblock, level 1 as checkpoint, level 2 as $Object_tree and level 3 as directories which can contain files and subdirectories.

Head (2015) has compared FAT32, NTFS and ReFS and he found that the File System Recognition Structure was used in the ReFS volume boot record. Head also described that there is no $MFT in ReFS, and there are no instances of FILE0 or FILE to indicate any MFT records. Further, Head found attribute style entries, and a $I30 index attribute in every folder. Head also describes that ReFS file content is not saved resident within an attribute, and shows that a previous name of a renamed folder was found in an earlier 16 KiB metadata block.

Gudadhe et al. (2015) describe some of the features of ReFS, including its use of B+ trees. Further, they describe that a checksum is always used for preserving the integrity of metadata, and that a checksum for preserving file content can be enabled per file, directory, or volume. They do not describe structures or artifacts that might be important for digital forensic investigation.

Ballenthin has published information about ReFS in memory structures (Ballenthin, 2018a) and ReFS on disk structures (Ballenthin, 2018b).
Our work started on the basis of this work.

Georges (2018) has published a master's thesis about the reverse engineering of ReFS v1.2, and his interpretation is at a low level. We scrutinize his work, and improve it. Georges describes that he was unable to document the structures of ReFS v3.2; therefore, we continue the reverse engineering of ReFS v3.2.

Paragon Software (Paragon Software, 2019a) was the first to release a ReFS driver for Linux, and it supports ReFS v1.x and ReFS v3.x. They have not released the source code for the driver, and customers need to contact them in order to get information about this driver. They have also included support for ReFS in their digital forensic tool (Paragon Software, 2019b).

Brian Carrier (2005) has also documented Ext2 and Ext3, which use superblocks and group descriptors for file system layout, and inodes for file metadata. Kevin Fairbanks (2012) has documented Ext4, which is similar to Ext2 and Ext3, but Ext4 has additional features.

Hansen and Toolan (2017) reverse engineered the APFS file system, which enabled investigators to analyse iOS and Mac devices. In 2017, none of the commercial digital forensic tools had support for APFS. APFS also uses inodes for describing metadata about files. The APFS file system uses B+ trees extensively. Plum and Dewald (2018) continued the work and proposed novel methods for file recovery in APFS. They utilize known structures in order to recover files. In October 2018 Apple released the APFS specifications which show the actual structures and their meaning (Apple, 2018a), and which could also be used for digital forensic purposes. Apple has also published technical information about the HFS+ file system, which also uses B+ trees (Apple, 2018b).

Stahlberg et al. (2007) describe the threat to privacy when using database systems that utilize B+ trees. They propose a system that overwrites obsolete data when the B+ tree is balanced and still in memory.
This is relevant for this paper because it gave us the hypothesis that B+ tree balancing in ReFS will leave remnants of confidential information.

Method

We used reverse engineering as a method for finding the structures of ReFS. Initially we used the diskpart command or the format command in Windows 10 x64 (version 10.0.14393) to format different partitions with ReFS file systems on one disk. We tried to use different cluster sizes when formatting ReFS partitions; however, this was not possible for ReFS v1.2. We also created ReFS volumes where we added both small files and large files.

In order to enable formatting of ReFS v1.2 we added a registry hack which allows formatting ReFS over non-mirrored volumes (Winaero, 2018). In Windows 10 Pro we were able to format ReFS v3.2 without any hack. Even when the registry hack was enabled, we were not able to format USB thumb drives using ReFS. We have observed that Microsoft has removed the option to format ReFS in Windows 10 Pro x64 (version 10.0.17134), and the previous registry hacks do not work. We are still able to mount with read and write support in this version of Windows 10. We can still use one of the Windows Server editions to format a ReFS volume, and such volumes can be attached to Windows 10, which automatically updates the volume to the latest ReFS version. We are not sure why Microsoft has removed the support for formatting ReFS volumes in Windows 10 Pro.

We used FTK Imager or ewfacquire to create forensic images of the disks, and mounted these forensic images using ewfmount in Linux. In order to find the volume boot records we parsed the MBR or the GPT using a hex viewer/editor. Then we skipped to the sector location where the volume boot record starts in order to interpret the volume (Carrier, 2005). This start sector location is always important when investigating a disk image with multiple ReFS volumes, because the ReFS file system volume uses pointer offsets that are relative to the start of the volume boot record.

Comparing to other solutions

When we started this research only EnCase v7 or newer had support for parsing ReFS v1.2, but no tool was able to parse ReFS v3.2. We used information we observed when opening our forensic images in EnCase v7, and have tried to use the same names for system files as EnCase used.

ReFS volume boot record

Using knowledge about the NTFS volume boot record, we assumed ReFS should contain information such as sector size, cluster size, volume serial number, the location of the metadata structure for files ($MFT), etc. Since the command line tool fsutil can yield important properties for the file system, we used this tool to verify our interpretation of the fields found in the VBR.

B+ tree

We knew from the sparse documentation from Microsoft that they extensively used B+ trees in ReFS (Microsoft, 2018a). Microsoft describes that one single B+ tree structure was used that could include other B+ tree structures (Microsoft, 2018a). Therefore, we started searching for blocks of data that could have a typical node structure. Then we tried to identify the entry point that gives access to the top node of the B+ tree. When analyzing the B+ tree structures we used knowledge about the HFS+ file system (Apple, 2018b).

We knew from Stahlberg et al. (2007) that B+ trees in database systems could be a threat to privacy, because balancing B+ trees without wiping the content might leave remnants.
For ReFS, remnants of previous metadata records, for example records containing attributes, could have an evidentiary value for digital forensic investigation. Therefore, we tried to identify whether attributes are intact even when they are not pointed to by pointers in the pointer area of a node.

Keywords or signatures

When we found different keywords or signatures, we searched for information about these to see if they could be connected to known structures.

In memory structures

Often structures in memory are saved directly to disk. We used previous work on ReFS memory structures in order to see if they are also found on disk (Ballenthin, 2018a). We did not perform any in depth kernel debugging or memory analysis, but rather used structures from previous work. The main reason for this was that we were not able to get hold of a checked/debug build of Windows 10. When using the partly checked Windows 10, we did not get access to private symbols for the refs.sys driver. We did, however, list public function names that were available in the partially checked Windows 10 version.

Known structures

Automation

Manually performing every test in a hex viewer is time consuming, and therefore we created a prototype tool to parse the structures found. This tool was used in our testing of ReFS v1.2. We have made the tool available as open source for other researchers to validate our work.

Experiments

When we reverse engineered ReFS v1.2, we used several experiments comparing different states of sectors within the file system, and we started by trying to understand the first sector of the file system, the volume boot record. Describing all these experiments is beyond the scope of this paper. Based on observations, we defined research hypotheses that could explain what we observed. Then we performed new experiments trying to falsify the null hypothesis, in order to indirectly get support for our main hypothesis about the meaning of a field.
For example, one field that was unknown was the two bytes from 0x28 in the VBR. We observed the values 0x0102. When the ReFS file system was upgraded to version 3.2 we saw that the value was changed to 0x0302. Therefore, we defined the hypothesis H1 that this was the field for the file system major and minor version. We tested by formatting new instances of the file system with the old and the new version several times, and the fields always corresponded to the version of the file system. We also identified that the file system was automatically updated after Microsoft released a new version of the driver. When we used fsutil to verify the file system version, it always corresponded to the value found in the VBR. We had to reject our null hypothesis H0 that the changes of these values were the result of chance alone. We did not observe once that the null hypothesis H0 was true. Therefore, the alternate hypothesis H1 was indirectly supported. Similar experiments were performed for the other values in the VBR, and values found in other structures.

After the reverse engineering of the ReFS file system, we performed a number of experiments in order to test if ReFS is resilient to metadata manipulation. First we tested the tool SetMace (Schicht, 2014) to see if it succeeds in changing the timestamp of a file located on a ReFS volume. We also tried manually changing the timestamp in an FNA file attribute. We also checked if there are remnants of attributes not currently in use by the system, and if it is possible to recover data based on information found in these remnants.

Results - ReFS v1.2

We used our knowledge of known structures and compared them to patterns discovered in hexdumps in order to give them meaning. Whenever a structure was interpreted, multiple tests were performed in order to try to falsify our interpretation (Popper, 1953).
We also compared our observations with the ReFS structures on disk that Ballenthin has published (Ballenthin, 2018b), which also pointed us towards searching for the entry block 0x1E and the standard block size of an entry block.

In this section we will describe the structures necessary to manually parse the ReFS v1.2 file system. The forensic container file refs-v1_2.E01, available at Mendeley (Nordvik, 2019), allows the reader to follow our examples. These structures are the result of performing reverse engineering on the file system and testing forensic image containers containing ReFS file systems. We present the results as a guided tour through the structures necessary to interpret in order to find the metadata, files and their contents.

We will start by describing our results for ReFS v1.2. The first thing to note is that ReFS v1.2 always has 64 KiB clusters. The format tools in Windows 10 did not support formatting ReFS v1.2 with cluster sizes other than 64 KiB, even though the documentation (Microsoft, 2018c) claims support for both 4 KiB (default) and 64 KiB cluster sizes. Our observations show that ReFS v3.2 corresponds to the documentation in (Microsoft, 2018c).

ReFS volume boot record

For ReFS, the volume boot record is in the first sector of the file system volume, as it is for FAT32 and NTFS (Carrier, 2005). For FAT32 and NTFS the first 3 bytes are jump instructions to the boot code [5, p. 254]. However, since ReFS v1.2 and ReFS v3.2 cannot be booted, the first 3 bytes are just zeros. We also found a C structure from Microsoft (Computing, 2018) defining the fields of the file system recognition structure (FSRS), as can be seen in Fig. 1 (footnote 1). Therefore, we got a kick start in interpreting the ReFS VBR. The first 6 fields in Table 1 were found by using Fig. 1. When interpreting the length of the types, we assume a Windows OS. The names used for the FSRS structure are the names from the developers of ReFS, while the names from byte offset 0x18 are given based on our experiments.

Footnote 1: We added comments to make the structure easier to read for those not familiar with C structures.

Fig. 1. File system recognition structure.

Table 1. Structure of the volume boot record.
Offset  Length  Description
0x00    3       Jmp (Jump instructions)
0x03    8       FSName
0x0B    5       MustBeZero
0x10    4       Identifier
0x14    2       Length (of FSRS)
0x16    2       Checksum (of FSRS)
0x18    8       Sectors in volume
0x20    4       Bytes per sector
0x24    4       Sectors per cluster
0x28    1       File system major version
0x29    1       File system minor version
0x2A    14      Unknown
0x38    8       Volume Serial Number

Entry block

During our reverse engineering we found blocks of 16 KiB and blocks of 64 KiB. The blocks of 16 KiB were used for metadata and system files. However, for data streams 64 KiB allocation blocks were used. We named the 16 KiB blocks entry blocks. The first entry blocks we identified were file system metadata blocks with pointers to other entry blocks, and finally to the first entry block containing a B+ tree. We will describe how we found these entry blocks in section 4.4. Entry blocks in the B+ tree include nodes that followed a typical B+ tree pattern. We found a structure similar to that used by other file systems that use B+ trees. We identified that all these 16 KiB blocks started with a descriptor for the block. The entry block includes a descriptor (size 0x30) and can include one or more nodes as shown in Fig. 2. When using the structure tables presented in this paper, the E offset means a byte offset relative to the entry block, while the R offset means an offset relative to the actual structure. Whenever the E and R offsets are equal, we will only show the E offset.

Table 2. Structure of the entry block descriptor.
E offset  Length  Description
0x00      0x08    Entry block number
0x08      0x08    Unknown
0x10      0x08    Unknown
0x18      0x08    Node ID
0x20      0x08    Unknown
0x28      0x08    Unknown

The first 8 byte field found at offset 0x0 of the entry block descriptor contains its entry block number. We also found at offset 0x18 a field describing the node identifier (Node ID). However, nodes that contain metadata will typically have the value 0 for the node ID. We found just one or two nodes in an entry block, but there could be more. Each node can have one or more records. A record contains the sub entries of a node. A node describing a directory will contain records of files and subdirectories. There will be more than one record per file. In NTFS each file has at least one MFT record, which consists of a number of attributes, while in ReFS each of the attributes is contained within records in directory nodes. A file's standard information attribute is often more than 1000 bytes, which means that only a few files are needed before the entry block node is full.
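
The fixed layouts in Table 1 and Table 2 lend themselves to offset-based unpacking. The following Python sketch is only an illustration of that idea: the class and function names are hypothetical, little-endian byte order is an assumption (consistent with the little-endian pointers reported later in the paper), and the field meanings are the interpretations from the tables rather than confirmed developer definitions.

```python
import struct
from typing import NamedTuple


class VolumeBootRecord(NamedTuple):
    """Fields of Table 1 (names are illustrative, not official)."""
    fs_name: bytes            # offset 0x03, 8 bytes
    fsrs_identifier: bytes    # offset 0x10, 4 bytes
    fsrs_length: int          # offset 0x14, 2 bytes
    fsrs_checksum: int        # offset 0x16, 2 bytes
    sectors_in_volume: int    # offset 0x18, 8 bytes
    bytes_per_sector: int     # offset 0x20, 4 bytes
    sectors_per_cluster: int  # offset 0x24, 4 bytes
    major_version: int        # offset 0x28, 1 byte
    minor_version: int        # offset 0x29, 1 byte
    volume_serial: int        # offset 0x38, 8 bytes


# "<" = little-endian without padding (assumption); "x" skips one byte.
# 3x skips Jmp, 5x skips MustBeZero, 14x skips the unknown bytes at 0x2A.
VBR_FORMAT = "<3x8s5x4sHHQIIBB14xQ"


def parse_vbr(sector: bytes) -> VolumeBootRecord:
    """Unpack the first 0x40 bytes of the volume's first sector."""
    return VolumeBootRecord(*struct.unpack_from(VBR_FORMAT, sector, 0))


class EntryBlockDescriptor(NamedTuple):
    """Known fields of Table 2; the other four 8-byte fields are unknown."""
    entry_block_number: int   # E offset 0x00
    node_id: int              # E offset 0x18


def parse_entry_block_descriptor(block: bytes) -> EntryBlockDescriptor:
    """Unpack the 0x30-byte descriptor at the start of a 16 KiB entry block."""
    number, _unk1, _unk2, node_id, _unk3, _unk4 = struct.unpack_from("<6Q", block, 0)
    return EntryBlockDescriptor(number, node_id)
```

With such a parser, the cluster size follows as bytes_per_sector multiplied by sectors_per_cluster, which should give 64 KiB on a ReFS v1.2 volume according to the observations above.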
If an entry block contains a node that runs out of space for new records, then the B+ tree system will utilize a new entry block which consists of a node that has extent records. Extents make it possible for a node to extend its capacity by including records pointing to other entry blocks, and adding more nodes, which for instance allows for more files within a directory node. In this case record 1 will contain an extent pointer to the entry block containing the existing node that is running out of space, and record 2 will have another extent pointer that points to a new entry block where the new records can be stored in a new node. We have not tested whether the records could be reorganized between the two nodes. The order of records within a node is decided by the order of pointers in the pointer area. This means that records can appear in any order in the hex dump. We detected the extent records when we were experimenting with different numbers of files in the same directory.

Superblock

At first we had only identified this superblock as an entry block that points to $Tree_Control. When we reverse engineered ReFS v3.2, we found the string SUPB in the entry block descriptor, which we believe is an abbreviation for superblock. From other file systems, such as ext4, we know the superblock is similar to the volume boot record. We were experimenting by trying to find pointers to the structures within the volume boot record; however, we were looking in the wrong place. Microsoft has included these pointers in these superblock entries. We find it strange that they did not include all the information found in the VBR within the superblock.

Fig. 2. Standard structures used by ReFS.

Finding the B+ tree check point

In order to find the B+ tree check point, the root of the B+ tree, we first need to find the superblock, then we follow the pointer to the check point. In this section we will try to manually find the starting node for the ReFS B+ tree. EnCase had a system file named $Tree_Control, which we believe is a check point. In ReFS v3.2 Microsoft includes the string CHKP in the entry block descriptor for $Tree_Control. Using the sector offset to this structure, we found the location where this structure started. However, Guidance Software/Open Text does not explain how to find the structure. In order to find out how, we searched for the values in the 0x1E entry block, the superblock, that could point to the check point. Assuming every entry block was 16 KiB, we could find the entry block (EB) offset number of the checkpoint $Tree_Control by multiplying the sector offset by 512 (sector size), and dividing it by 16 KiB (entry block size). Since the EB offset is relative to the start of the VBR, the VBR sector is subtracted from the sector offset first. The equation used to find the EB offset is shown in Equation (1).

\[
\text{EB offset} = \frac{(\text{EB sector} - \text{VBR sector}) \times \text{Sector size}}{\text{EB size}} \tag{1}
\]

Converting the value to a hex value made it easy to find the same value in the superblock. By searching for this EB offset hex value (using Big Endian), we always found the superblock in entry block 0x1E. However, it was not just one superblock. We found three superblocks that had records with pointers to $Tree_Control. One of these three superblocks was always found in entry block 0x1E, and another one was found in the third last entry block in the volume. In addition we found an extra superblock backup in the entry block after the second one. When using ewfmount (in order to mount E01 files) and tools like dd and xxd (a hex viewer), we could show the content of the 0x1E superblock.
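
As a worked illustration of Equation (1), the following minimal Python sketch converts an absolute sector number into a volume-relative entry block number. The function name is hypothetical, and the sector values in the usage comment are illustrative rather than taken from the test image; only the 512-byte sector size and 16 KiB entry block size come from the text.

```python
SECTOR_SIZE = 512               # bytes per sector, as used in the text
ENTRY_BLOCK_SIZE = 16 * 1024    # 16 KiB entry blocks


def entry_block_number(eb_sector: int, vbr_sector: int,
                       sector_size: int = SECTOR_SIZE,
                       eb_size: int = ENTRY_BLOCK_SIZE) -> int:
    """Equation (1): entry block number relative to the start of the VBR.

    eb_sector is the absolute sector where the entry block starts on the
    disk image; vbr_sector is the absolute sector of the volume boot record.
    """
    byte_offset = (eb_sector - vbr_sector) * sector_size
    if byte_offset < 0 or byte_offset % eb_size:
        raise ValueError("sector is not the start of an entry block in this volume")
    return byte_offset // eb_size


# Illustrative values: a volume starting at sector 0x800 and a structure
# found at absolute sector 169504.
# (169504 - 2048) * 512 / 16384 = 5233
print(hex(entry_block_number(169504, 0x800)))  # 0x1471
```

The resulting number is the value one would then look for among the pointers stored in the superblock.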
The command used is shown in Listing 1. The start of the VBR volume is at sector 0x800 on the test image, and 0x1E is the entry block we would like to show. In Listing 1 we have used a block size of 512 bytes and a count of 1 block, which will only show 512 bytes of output. However, in the figures showing hex dumps, we have not included all 512 bytes, to make them easier to read. In Equation (2) we show how we compute the sector offset to the start of the entry block, which we use in the skip option of the dd command.

$$\text{EB sector} = \text{VBR sector} + \frac{\text{EB number} \times \text{EB size}}{\text{Sector size}} \tag{2}$$

Listing 1: Command to show the superblock at 0x1E.

Both the 0x1E superblock and the $Tree_Control check point have a special structure. Table 3 shows the structure needed to parse the superblock (entry point), and an extract from the hex dump is shown in Fig. 3. In the superblock 0x1E at offset 0x50 (4 bytes in length) the value 0xA0 was found, which is the byte offset to where we find the first entry block pointer. At the entry block byte offset 0xA0 (8 bytes in length) we find the entry block pointer to $Tree_Control. There is another pointer at offset 0xA8 (8 bytes in length), and we assume this is a pointer to the backup $Tree_Control. The highlighted values in the hex dump shown in Fig. 3 are 0x1471 (LE) for the first pointer and 0xF3F7 (LE) for the second pointer. To show the content of entry block 0x1471, we use the same command as shown in Listing 1, but we replace 0x1E with the value 0x1471. This will change location to $Tree_Control, which is the checkpoint for our B+ tree structure.

In this section we have shown how to navigate to the entry block that controls the B+ tree. This entry block is a check point to the B+ tree. It is named $Tree_Control because it can be used to navigate the B+ tree.

Top of the node tree

The entry block containing the top node of the B+ tree is called $Tree_Control by EnCase.
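The superblock lookup described above (Equation (2), followed by the pointer offset at 0x50 and the 8-byte pointers it leads to) can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the command from Listing 1: all function names are hypothetical, the field positions are taken from Table 3, and the pointers are assumed to be stored as consecutive 8-byte little-endian values, as the 0xA0 and 0xA8 offsets in the example suggest.

```python
import struct

SECTOR_SIZE = 512        # bytes per sector, as stated in the text
EB_SIZE = 16 * 1024      # entry block size (16 KiB), as assumed in the text


def eb_sector(vbr_sector: int, eb_number: int) -> int:
    """Equation (2): absolute sector at which an entry block starts."""
    return vbr_sector + eb_number * EB_SIZE // SECTOR_SIZE


def read_entry_block(image_path: str, vbr_sector: int, eb_number: int) -> bytes:
    """Read one whole entry block from a raw image (hypothetical helper)."""
    with open(image_path, "rb") as image:
        image.seek(eb_sector(vbr_sector, eb_number) * SECTOR_SIZE)
        return image.read(EB_SIZE)


def superblock_pointers(block: bytes) -> list[int]:
    """Entry block pointers listed in a superblock, using the Table 3 layout."""
    # 0x50: offset to the first entry block pointer, 0x54: number of pointers.
    pointer_offset, pointer_count = struct.unpack_from("<II", block, 0x50)
    # Assumption: the pointers are consecutive 8-byte little-endian values.
    return list(struct.unpack_from(f"<{pointer_count}Q", block, pointer_offset))
```

With a raw image such as the one exposed by ewfmount (the path is whatever the examiner mounted), `superblock_pointers(read_entry_block(path, 0x800, 0x1E))` would be expected to return the two values highlighted in Fig. 3 on the authors' test image. The first of them is the entry block number of $Tree_Control and can be passed back to `read_entry_block`.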
Table 3. Structure of the superblock.

E offset  R offset  Length  Description
0x30      0x00      0x10    GUID
0x40      0x10      0x10    Unknown
0x50      0x20      0x04    Offset to first entry block pointer
0x54      0x24      0x04    Amount of entry block pointers
0x58      0x28      0x04    Offset to first record
0x5C      0x2C      0x04    Length of each record

Fig. 3. Hex dump of the superblock 0x1E.

From the $Tree_Control entry block we can find pointers to six different nodes that have a special purpose, and are shown by EnCase as system files. In section 4.4 we show how to find the $Tree_Control checkpoint manually by using a hex viewer. Now we continue to interpret the top level checkpoint node ($Tree_Control).

Table 4. Structure of the tree control checkpoint entry block.

E offset  R offset  Length  Description
0x30      0x00      0x04    Unknown
0x34      0x04      0x02    Major version
0x36      0x06      0x02    Minor version
0x38      0x08      0x04    Offset to first record
0x3C      0x0C      0x04    Size of a record
0x40      0x10      0x10    Unknown
0x50      0x20      0x08    Unknown
0x58      0x28      0x04    Amount of additional records
0x5C      0x2C      Var     4 byte offset to each record

Fig. 4. Hex dump of the first part of check point $Tree_Control.

We skip the first 0x30 bytes in Fig. 4, which is for the entry block descriptor, and focus on the checkpoint descriptor. This entry block does not contain a typical B+ tree node, because the pointer area at the end is missing. Using Table 4, we saw in the hex dump in Fig. 4 that the offset to the first record was 0x80, and this first record was a record entry for this $Tree_Control check point. Further, we saw from the major and minor version that this was ReFS v1.2. At offset 0x80 we found a 0x18 byte record. The first 8 bytes of this record had the value 0x1471 (LE). This was the same as the entry block number of the $Tree_Control check point, as could also be seen in the first 8 bytes of the entry block. This means that each of the records in this checkpoint started with the entry block number to which the record points. According to byte offset 0x58 the number of additional records was 0x06. At byte offset 0x5C we found a table of offsets, where each offset was 4 bytes and gave the byte offset to one of the additional records in this entry block. In the example hex dump in Fig. 4 we found the following offset values: 0x98, 0xB0, 0xC8, 0xE0, 0xF8, 0x110. The offsets were relative to the start of the entry block. The first additional record started at offset 0x98, the next at 0xB0, etc. The list below shows what we will find in these records. Each of the records starts with an entry block pointer.

- Pointer offset to the $Object_Tree system file can be found in the entry block at offset 0x98, and in this example it was pointing to entry block 0x23F (LE). This system file includes nodes and records that have information about directories, and files in sub entry blocks.
- Pointer offset to the $Allocator_Lrg system file can be found in the entry block at offset 0xB0, and in this example it was pointing to entry block 0x38 (LE). This system file includes a bitmap of large blocks (512 MiB).
- Pointer offset to the $Allocator_Med system file can be found in the entry block at offset 0xC8, and in this example it was pointing to entry block 0x20 (LE). This system file includes a bitmap of medium sized blocks (64 KiB).
- Pointer offset to the $Allocator_Sml system file can be found in the entry block at offset 0xE0, and in this example it was pointing to entry block 0x21 (LE). This system file includes a bitmap of small sized blocks (16 KiB).
- Pointer offset to the $Attribute_List system file can be found in the entry block at offset 0xF8, and in this example it was pointing to entry block 0x244 (LE). We are uncertain of the meaning of this system file.
- Pointer offset to the $Object system file can be found in the entry block at offset 0x110, and in this example it was pointing to entry block 0x240 (LE). This system file describes the child-parent dependencies, and can be used to rebuild the directory paths for files.

Until now we have used special entry blocks (superblock and check point), but in order to parse the normal nodes we introduce the structure for the standard node descriptor in Table 5 and the standard node header in Table 6. We will use them to interpret the normal nodes which we find in the sub nodes pointed to by $Tree_Control.

Table 5. Structure of the standard node descriptor.

E offset  R offset  Length  Description
0x30      0x00      0x04    Length of node descriptor
0x34      0x04      0x14    Unknown
0x48      0x18      0x02    Number of extents
0x4A      0x1A      0x06    Unknown
0x50      0x20      0x04    Number of records in node
0x54      0x24      Var     Unknown

Table 6. Structure of the standard node header.

E offset  R offset  Length  Description
0x120     0x00      0x04    Length of node header
0x124     0x04      0x04    Offset to next free record
0x128     0x08      0x04    Free space in node
0x12C     0x0C      0x04    Unknown
0x130     0x10      0x04    Offset to first pointer
0x134     0x14      0x04    Number of pointers in node
0x138     0x18      0x08    Offset to end of node

We can think of $Tree_Control as the top block that maintains control of all the sub B+ trees, or as the root of the single B+ tree. The standard node descriptor and standard node header can be used from the level we have described as MSB+ and below in the illustration in Fig. 5. MSB+ is a magic signature in ReFS v3.2, and an MSB+ entry block includes one or more nodes. In ReFS v1.2 the magic signature is not available, but we decided to use the name MSB+ for entry blocks containing nodes anyway, to make this consistent with ReFS v3.2. The abbreviation, we assume, stands for Microsoft B+ tree.

In this section we have shown how to find the starting check point, named $Tree_Control. This checkpoint makes it possible to find all other nodes in the file system. In the illustration in Fig. 5 we have visualized the main structures of ReFS. The superblock was found in entry block 0x1E, and we have found the checkpoint $Tree_Control. There was also another $Tree_Control which we assume is a backup. We have already found the pointers to the child nodes of $Tree_Control.

Files and folders

In this section we will show how the B+ tree structures can be parsed to find directories and files. We start by looking at the entry block for $Object_Tree which, in this example, was located at 0x23F. We used the command described in Listing 1, but replaced 0x1E with the entry block we wished to investigate. The hex dump of $Object_Tree is shown in Fig. 6. We skipped the entry block descriptor, which was 0x30 bytes. Here we found the node descriptor, which started with a length field that had the value 0xF0. We skipped the node descriptor to make this section easier to read. From byte offset 0x120 we found the node header, which included the length of the node header (0x20 bytes), the offset to the next free record, the free space in the node, and the offset to the first record pointer (0x354C (LE)). This is a relative offset from the start of the node header. To find the entry block byte offset we added 0x120. Therefore, the pointer area started at offset 0x366C. We also found the number of pointers (this should be equal to the number of allocated records), which was 9. These pointers are 4 bytes in size and point to records in this node, and each offset is relative to the node header (we needed to add 0x120 in this case). In the node header we also found an offset to the end of the node.
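The walk from the checkpoint to the records of a node can be sketched in Python as follows. It is a rough sketch under stated assumptions and not a complete ReFS parser: the function names are hypothetical, the field positions come from Tables 4, 5 and 6, and each 4-byte pointer in the pointer area is treated in full as an offset relative to the node header, which is how the text describes it.

```python
import struct

DESCRIPTOR_START = 0x30  # length of the entry block descriptor, per the text


def checkpoint_pointers(block: bytes) -> tuple[tuple[int, int], list[int]]:
    """Return ((major, minor), entry block numbers) from a checkpoint (Table 4)."""
    # 0x34 major, 0x36 minor, 0x38 offset to first record, 0x3C record size.
    major, minor, first_record, _record_size = struct.unpack_from("<HHII", block, 0x34)
    (extra_count,) = struct.unpack_from("<I", block, 0x58)
    extra_offsets = struct.unpack_from(f"<{extra_count}I", block, 0x5C)
    # Each record starts with the 8-byte entry block number it points to.
    pointers = [struct.unpack_from("<Q", block, offset)[0]
                for offset in (first_record, *extra_offsets)]
    return (major, minor), pointers


def record_offsets(block: bytes) -> list[int]:
    """Entry block byte offsets of a node's allocated records, in pointer order."""
    # The node descriptor starts with its own length (Table 5); the node
    # header follows it, e.g. 0x30 + 0xF0 = 0x120 in the example.
    (descriptor_length,) = struct.unpack_from("<I", block, DESCRIPTOR_START)
    header = DESCRIPTOR_START + descriptor_length
    # Node header (Table 6): six 4-byte fields, then one 8-byte field.
    fields = struct.unpack_from("<6IQ", block, header)
    first_pointer, pointer_count = fields[4], fields[5]
    pointers = struct.unpack_from(f"<{pointer_count}I", block, header + first_pointer)
    # Pointers are relative to the node header, so add its offset back.
    return [header + pointer for pointer in pointers]
```

The list returned by `checkpoint_pointers` starts with the checkpoint's own entry block number, followed by the additional records in the order of the offset table, so on the example image the second element would be the $Object_Tree entry block. `record_offsets` returns the records in pointer order, which is the logical order of the node and need not match their physical order in the hex dump.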
We skipped to the first record by moving to offset 0x20 + 0x120 = 0x140. This record is followed by the fourth record at 0x70 + 0x120 = 0x190. We could have used the pointers from the pointer area to find all allocated records, which we actually should do for any node. However, we only showed two of the records here. It is the pointers that decide the order of records; we cannot assume that all records are in sequence or that all records are in use. At offset 0x140 in Fig. 6 we found the start of the records area. Each record was 0x50 bytes. All the records were related to directories, either directories the user created or system-created directories. These records could be analyzed using Table 7. The first record was a pointer to $Volume (node ID 0x500), which contained information about the volume and also included a timestamp for volume creation, which could be of value to the investigation. The fourth record contained a pointer for the root directory (node ID 0x600). At this level in the B+ tree we did not see the names of the nodes, but most of the names could be found within the entry blocks pointed to by the records at this level. We observed that records with node IDs from 0x700 and above were normal directories, either created by the user or created by the system. We have illustrated this in Fig. 7. All the records with a node ID within the 0x500 range were metadata directories, and not shown by File Explorer when parsing the root directory. All node IDs of root sub-directories were in the 0x700 range.

Fig. 5. Illustration of the ReFS structure.

Volume information

We used Table 7 to interpret the records in the entry block $Object_Tree. The record for the 0x500 node ID was found in entry
