Friday, September 26, 2014

Back to basics - NTFS Data Runs

This is not really basic but rather advanced knowledge from a technical point of view.  Since it is a published process explained in great detail, it becomes basic knowledge.  Those in non-scientific fields are not used to calculating and verifying steps and procedures, and that basic premise moves the field of digital forensics toward an educational definition of the STEM fields.  STEM stands for Science, Technology, Engineering, and Math.

This post will discuss the complex process and understanding of data storage in the New Technology File System ( NTFS ), specifically the lesser understood structure of the $80 attribute's data runs.


This image is from the book "Guide to Computer Forensics and Investigations", September 28, 2009, by Bill Nelson (Author), Amelia Phillips (Author), Christopher Steuart (Author) 


Thus, based on the image above, the data run can be extracted and analyzed for the actual data cluster locations.


If you want to create the same analysis and documentation of the data clusters, here is the actual string of the data runs: 32B1078C8C0022630795ED32BC063C360122350302FA210B6CFE229E01E904

The example above contains 6,830 clusters for the file, with positive and negative offsets to the cluster runs.  You cannot get any more complex than this one.  If you understand this example, you understand how NTFS saves non-resident files.  If you are into programming, I would suggest you do this analysis by hand or with a simple application, as I did here with Excel, before attempting to write a program in a lower-level programming language.
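The decoding steps described above can also be sketched in a few lines of code.  This is a minimal parser of my own (the function name and structure are mine, not taken from any forensic tool); it walks the run list exactly as done by hand: header nibble sizes, little-endian run length, signed little-endian offset relative to the previous run.

```python
def parse_data_runs(hexstr):
    """Decode an NTFS $80 attribute data run list into (length, start LCN) pairs."""
    data = bytes.fromhex(hexstr)
    runs = []
    i = 0
    lcn = 0  # logical cluster number; each offset is relative to the previous run
    while i < len(data) and data[i] != 0x00:  # a 0x00 header byte terminates the list
        length_size = data[i] & 0x0F   # low nibble: size of the run-length field
        offset_size = data[i] >> 4     # high nibble: size of the offset field
        i += 1
        run_length = int.from_bytes(data[i:i + length_size], 'little')
        i += length_size
        # the offset is signed, since runs can move backward on the volume
        offset = int.from_bytes(data[i:i + offset_size], 'little', signed=True)
        i += offset_size
        lcn += offset
        runs.append((run_length, lcn))
    return runs

runs = parse_data_runs('32B1078C8C0022630795ED32BC063C360122350302FA210B6CFE229E01E904')
print(runs)
print(sum(length for length, _ in runs))  # 6830 clusters, matching the example
```

Running it on the data run string from this post yields six runs, two of them with negative offsets, totaling 6,830 clusters.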

Good luck practicing and getting better in understanding technology at a deeper level.

Sunday, September 21, 2014

Drive size IEC vs. SI

What is the big deal?  The size of the drive is reported by the forensic tool and I just need to bookmark it or document it.  Forensic tools are tested and vetted in courts, so I don't need to worry about them.  Right?  The answer has not been that simple since 1998.  In 1998, the International Electrotechnical Commission (IEC) decided to resolve the long-standing conflict over orders of magnitude like kilo or mega, which represent a Base-10 prefix and not a Base-2 prefix.  Thus, a 1000 m run can be referred to as 1 km, while a 1024-byte memory block is referred to as 1 KiB, 1 kibibyte.

The calculation does not change; only the unit of measure reflects the binary nature of the order of magnitude.


There is not much focus on this change and many experts might not even know about it, but it is annoying if the tools we use do not conform to this changed standard.  As long as we can refer to the byte value, there is no problem, since only the prefix needs to be examined for the correct spelling.

I have seen hard drive manufacturers following this new standard for years now, while software vendors lag behind.

http://www.seagate.com/www-content/product-content/nas-fam/nas-hdd/en-us/docs/100724684.pdf
i.e.
7,814,037,168 sectors * 512 bytes = 4,000,787,030,016 bytes, and 4,000,787,030,016 / 1,000,000,000,000 = 4 TB.

So, what do we see in forensic tools, in operating systems, and in generic tools?  Well, it depends.

AccessData FTK Imager 3.1.3 calculates the drive sizes for an easy and quick reference.  We can also easily find the drive sector sizes in this tool.

Physicaldrive0 Sector Count = 103,824    = 53,157,888 bytes
Physicaldrive1 Sector Count = 18,874,368 = 9,663,676,416 bytes
Physicaldrive2 Sector Count = 20,480     = 10,485,760 bytes
Physicaldrive3 Sector Count = 208,896    = 106,954,752 bytes
Physicaldrive4 Sector Count = 31,457,280 = 16,106,127,360 bytes

Reference calculations:
Physicaldrive0 Size = 50.69 MiB or 53.15 MB
Physicaldrive1 Size = 9 GiB     or 9.66 GB
Physicaldrive2 Size = 10 MiB    or 10.48 MB
Physicaldrive3 Size = 102 MiB   or 106.95 MB
Physicaldrive4 Size = 15 GiB    or 16.1 GB


Sample calculation based on PhysicalDrive4
Total Sectors: 31,457,280    Bytes: 16,106,127,360

International Electrotechnical Commission (IEC)    International System of Units ( Metric )
kibibyte  KiB  15,728,640                          kilobyte  kB  16,106,127
mebibyte  MiB  15,360                              megabyte  MB  16,106.127
gibibyte  GiB  15                                  gigabyte  GB  16.106127
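The two columns can be reproduced with a few lines of code.  This helper is my own sketch (the function name is mine, not from any of the tools mentioned); it takes a sector count and reports the capacity in both IEC (Base-2) and SI (Base-10) units.

```python
def drive_sizes(total_sectors, sector_size=512):
    """Report a drive's capacity in both IEC (Base-2) and SI (Base-10) units."""
    nbytes = total_sectors * sector_size
    return {
        'bytes': nbytes,
        'KiB': nbytes / 2**10, 'kB': nbytes / 10**3,   # kibibytes vs. kilobytes
        'MiB': nbytes / 2**20, 'MB': nbytes / 10**6,   # mebibytes vs. megabytes
        'GiB': nbytes / 2**30, 'GB': nbytes / 10**9,   # gibibytes vs. gigabytes
    }

sizes = drive_sizes(31457280)          # PhysicalDrive4 from the table above
print(sizes['GiB'], sizes['GB'])       # 15.0 16.10612736
```

The same byte count comes out as 15 GiB or 16.1 GB, which is exactly why a mislabeled unit of measure matters.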

I'm not really sure where FTK Imager got some of the values for its physical size: drive 1 seems to be in GiB, drive 2 is a mystery number, drive 3 seems to be in MB, and drive 4 in GB.

EnCase Forensic Imager 7.06 also shows the cluster count and the drive sizes in an easy format.  It also lists the sizes in Base-2 format while using the Base-10 units of measure, but it is more consistent than FTK Imager.


Windows Management Instrumentation Command-line (WMIC) shows the physical devices, but the size and total sector values it reports are not the physical size values.

Windows shows physical sizes that are not even close to the actual size of the devices, but we know from the MBR master partition table calculations that partition sizes are based on Base-2 calculations.



Example value calculated from the MBR of Disk 1, first partition entry.
The total-sectors field 00 20 03 00 = 0x00032000 = 204,800 sectors in the partition; thus the number of bytes in the partition is 204,800 * 512 = 104,857,600 bytes / ( 1024 * 1024 ) = 100 MiB.
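The little-endian decoding above can be shown in code.  In this sketch, only the total-sectors bytes (00 20 03 00) come from the example; the rest of the 16-byte partition entry is filled with placeholder values of my own for illustration.

```python
import struct

# A 16-byte MBR partition entry; only the last field (total sectors,
# bytes 12-15, little-endian) is taken from the example above -- the
# other fields hold illustrative placeholder values.
entry = bytes([0x80,                    # bootable flag (placeholder)
               0x01, 0x01, 0x00,        # CHS start (placeholder)
               0x07,                    # partition type (placeholder: NTFS)
               0xFE, 0xFF, 0xFF,        # CHS end (placeholder)
               0x00, 0x08, 0x00, 0x00,  # starting LBA (placeholder: 2048)
               0x00, 0x20, 0x03, 0x00]) # total sectors = 0x00032000

(total_sectors,) = struct.unpack_from('<I', entry, 12)  # little-endian uint32
print(total_sectors)                         # 204800
print(total_sectors * 512 // (1024 * 1024))  # 100 (MiB, a Base-2 conversion)
```

The `<I` format string tells struct to read the four bytes as an unsigned little-endian integer, which is why 00 20 03 00 on disk becomes 0x00032000.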

So, Microsoft is using the wrong units of measure to display storage device size information.  Disk 2 and Disk 3 size values are way off from either of the calculated values, but the values that are right are calculated by the Base-2 conversion method, so the units of measure should be MiB and GiB, not MB and GB.

Linux, on the other hand, uses the Base-10 conversion with the correct units of measure in MB and GB.

/dev/hda size was an anomaly, and I was not able to find a suitable explanation for why the value was off, but it might have had something to do with a virtual IDE hard drive.  I have verified the existence of sector 104447 using dcfldd and xxd ( dcfldd if=/dev/hda bs=512 skip=104447 | xxd ).  Even though all other tools showed only 103,824 sectors on the drive, I did locate a 104,448th sector.


/dev/sda -> 18,874,368 sectors, consistent with the other Windows tools, but the capacity is correctly calculated in MB as 9663 MB, or 9.66 GB.

/dev/sdb -> 20,480 sectors, consistent with the other Windows tools, but the capacity is correctly calculated in MB as 10 MB.

/dev/sdc -> 208,896 sectors, consistent with the other Windows tools, but the capacity is correctly calculated in MB as 106 MB.

/dev/sdd -> 31,457,280 sectors, consistent with the other Windows tools, but the capacity is correctly calculated in GB as 16.1 GB.

So, my conclusion is that Windows-based software vendors have not made the adjustment in the last 16 years to label their storage device sizes properly.  The most surprising are the forensic tool vendors, who do not see the need to label the drives properly or to show their proper capacity.  As long as the size is referred to in bytes, the values are correct, and it might be necessary to start referring to evidence size in bytes to avoid confusion.

Sunday, September 14, 2014

Back to basics - Operator Precedence

Why do we need to test forensic tools when the programmers compiled the code without any errors?  Logical errors and flawed algorithm implementations cannot be detected by compiling code; they can only be found by continuous testing with the right input, while the output is monitored for the correct values.  We need to avoid garbage-in, garbage-out conditions for reliable tool testing.  One of the implementation issues that can be detected by testing is operator precedence.

In this presentation, I wanted to talk about the order of operations, which is ignored in many cases.  Order of operations is used by systems to evaluate the value of an expression by parsing the expression according to operator precedence as defined for the given system.

Analyzing code requires not just recognizing specific code patterns, but also recognizing logical errors that might have been exploited.

In this chart, I give an example of the flow of operator evaluation, but the accompanying video will give a more in-depth explanation.  http://youtu.be/7EQ5YZOU7tw

You can practice operator precedence on the command line by setting variables
with arithmetic operations:

C:\>set /a test=(9*9)*4/(9*(5*5*5)-(14-6))
0

This operation can also be represented in postfix notation and used with the dc command-line utility.  The above expression in postfix notation is 9 9 * 4 * 9 5 5 * 5 * * 14 6 - - /

 Download UnxUtils.zip from
https://sourceforge.net/projects/unxutils/
Extract files from UnxUtils.zip to c:\temp
Change directory to
cd c:\temp\UnxUtils\usr\local\wbin
Type dc to start
dc

You will only see a blinking cursor, but that is your prompt and you can just type values.
Type
34
2
/
p

p prints the result to the screen.  When you are done using it, type q to exit.


c:\Users\<UID>\Downloads\UnxUtils\usr\local\wbin>dc
9
9
*
4
*
9
5
5
*
5
*
*
14
6
-
-
/
p
0
q

c:\Users\<UID>\Downloads\UnxUtils\usr\local\wbin>


As you can see, dc only performs integer operations, so the result will be truncated, but it should still give you a good idea of how postfix notation works.
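The stack machine inside dc can be sketched in a few lines.  This evaluator is my own minimal version (with dc-style integer division), useful for checking postfix conversions by hand:

```python
def eval_postfix(tokens):
    """Evaluate a postfix expression using a stack, like dc does."""
    stack = []
    for tok in tokens.split():
        if tok in '+-*/':
            b = stack.pop()   # right operand is on top of the stack
            a = stack.pop()
            if tok == '+':
                stack.append(a + b)
            elif tok == '-':
                stack.append(a - b)
            elif tok == '*':
                stack.append(a * b)
            else:
                stack.append(int(a / b))  # integer division, truncated like dc
        else:
            stack.append(int(tok))
    return stack[-1]

# (9*9)*4 / (9*(5*5*5) - (14-6)) = 324 / 1117 = 0 in integer arithmetic
print(eval_postfix('9 9 * 4 * 9 5 5 * 5 * * 14 6 - - /'))  # 0
```

Every operator pops its two operands off the stack and pushes the result back, which is exactly why operand order matters for - and /.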

Here is an online converter to make the conversion easier, but only use it to verify your conversions; otherwise, you will never learn how to do it on your own.  It is very important for you to learn this in order to understand queue and stack operations.

http://www.mathblog.dk/tools/infix-postfix-converter/ 

Saturday, September 6, 2014

Back to Basics - FAT File/Folder Structure

Have you ever wondered how the File Allocation Table ( FAT ) maintains the file system structure?  Many forensic books and certification exams discuss the structure of the file system, but I have yet to see a discussion on how the file system links the directory structure together.  In this post, I wanted to examine and model the links between files and folders.

Many books discuss the concept that we can navigate the file system by running cd . or cd .. to change to the current directory or to the parent of the current directory.  The . and .. entries turned out to be very important for understanding how FAT maintains the directory structure.

Each directory maintains its own Directory Entry ( DE ) in a unique cluster, where the root DE is considered cluster 0.  Cluster 1 is never referenced.  Referring to the FAT table, we know that the FAT signature in FAT16 is F8FF, followed by another FFFF that refers to the DE.  Thus, F8FF is cluster reference 0, while the FFFF following F8FF should be the reference to cluster 1.  Thus, the first usable cluster for files is cluster 2.

I have created a test case on a thumb drive using the following structure:

D:\file1.txt
D:\folder1
         ->file2.txt
         ->folder1-1
                 ->file3.txt

I traced the file system structures to their starting and ending sector numbers to find a pattern, which led me to understand how the files are stored.


The chart of sector numbers was used to develop a model of the file structure on the storage device.


The model can be verified by examining the actual structure of the DEs to establish the links between the DE entries.


A simplified view of the relevant cluster number designations shows a repeating pattern: each folder points to itself via its . entry, which refers to the cluster number where the DE holding the folder's file entries resides, while the .. entry refers to the parent's DE cluster.
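The on-disk layout of such a DE record can be sketched in code.  In this example of my own, the bytes are synthetic (a made-up . entry for a subdirectory whose DE lives in cluster 3), but the field offsets follow the standard 32-byte FAT short directory entry layout:

```python
import struct

# A synthetic 32-byte FAT16 directory entry for the '.' entry of a
# subdirectory whose DE resides in cluster 3 -- the values are made up
# for illustration, but the field layout is the standard one.
dot_entry = (b'.          '            # 11-byte 8.3 name: '.' padded with spaces
             + bytes([0x10])           # attribute byte: 0x10 = directory
             + bytes(14)               # NT flags, timestamps, etc. (zeroed here)
             + struct.pack('<H', 3)    # first cluster (low word) -> its own DE cluster
             + struct.pack('<I', 0))   # file size (0 for directories)

name = dot_entry[0:11].decode('ascii').rstrip()
(first_cluster,) = struct.unpack_from('<H', dot_entry, 26)  # offset 26, little-endian
print(name, first_cluster)  # . 3
```

A .. entry has the same layout; only its first-cluster field holds the parent's DE cluster instead (or 0 when the parent is the root), which is the link the model above describes.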


In some cases, we can examine the actual data structures on disk to reveal patterns that can be used to understand how technology works.  The steps, documentation, and methodology are all crucial skills for any beginning forensic examiner or analyst, while forensic technicians would not have to know technology at this level.  Only education and hard work can develop a forensic analyst with a higher level of understanding of data structures, while training alone will never develop forensic technicians into professionals capable of these kinds of skills.  I hope this type of document will help even technicians understand that there is more to learn about technology than pushing buttons and reading output from unvalidated tools.