Saturday, April 30, 2016

ListIterator in Java as It Relates to File Recovery


Understanding lists and especially linked lists are not just essential for a flexible array operations, but also key concept in data structures.  The list iterator class is a useful class to traverse the list contents in an easy an convenient way.

I can just imagine Bill Gates learning about data structures and linked lists in college and thinking to himself that this kind of structure can be used to catalog file chunks and keep all of the chunks in a simple object array. 

In the technical fields, we can find linked lists in the implementation of file systems like File Allocation Table ( FAT ).  The Directory entry is an array of pointers pointing to the first node in the list of nodes connecting all clusters that belong to a single file.  The end of the file is marked by setting of the next pointer to null or FFFF in a case of FAT16.  Looking at the FAT table, you would expect to see these values in little endian format, 2001 2101 FFFF written at cluster numbers 287, 288,289 respectively.


In this case, you can see the file named _ESTFIL.TXT is the deleted file that we'll try to reconstruct.  We know its size is 1443 bytes and the start of the linked list that contains the file contents is starting at cluster 287.  The underscore character is replaced by the σ ( sigma or 0xE5 in Base-16 value ) character when the file is deleted.  Since we know the file size and since this is a floppy drive with cluster size of 1 or 512 bytes, we can easily calculate that it will take 3 clusters to store this file, thus 3 clusters need to be located in order to recover it.  Thus, we'll have 512bytes in the first cluster, 512 bytes in the second cluster, and 419 bytes in the third cluster.  As you can see below, ( look at the yellow numbers only ) 288,289, and <EOF> are highlighted. The number of 288 is on top of cluster 287 where our file started.  So, 287 is the location of the cluster number and the number is the next pointer that points to cluster 288.  At cluster number 288, the value is 289 meaning that cluster 288 is pointing to 289.  As you can see, cluster value 289 ( see image below ) is holding the value <EOF> designating the end of the file.
java.util.LinkedList -  is a doubly-linked list implementation of the List and Deque interfaces. Implements all optional list operations, and permits all elements (including null).

java.util.List - An ordered collection (also known as a sequence). The user of this interface has precise control over where in the list each element is inserted. The user can access elements by their integer index (position in the list), and search for elements in the list.

java.util.ListIterator - An iterator for lists that allows the programmer to traverse the list in either direction, modify the list during iteration, and obtain the iterator's current position in the list.

java.util.ArrayList - Resizable-array implementation of the List interface. Implements all optional list operations, and permits all elements, including null.

Simplified UML diagram of the example code discussed in this post.



import java.util.ArrayList;
import java.util.List;
import java.util.ListIterator;
import java.util.LinkedList;


public class CarDemo{
  public static void main(String args[]){
    ListIterator<Car> litr = null;
    List<Car> cars = new ArrayList<Car>();
    cars.add(new Car("Toyota","Supra"));
    cars.add(new Car("Honda","Accord"));
    cars.add(new Car("Mazda","MPV"));
    cars.add(new Car("Porsche","Carrera"));
    cars.add(new Car("BMW","M3"));

   
    LinkedList<Car> linkedlist = new LinkedList<Car>();

         linkedlist.add(new Car("Toyota","Supra"));
         linkedlist.add(new Car("Honda","Accord"));
         linkedlist.add(new Car("Mazda","MPV"));
         linkedlist.add(new Car("Porsche","Carrera"));
         linkedlist.add(new Car("BMW","M3"));

        
    litr=cars.listIterator();

    while (litr.hasNext()) { //all of this is done on a single node
           //set a new model for the first node, next() will advance
           litr.next().setModel("Volkswagen");
          //go back to the first node and get its node and get its model, sets the position back one node
          System.out.println(litr.previous().getModel()); 
         //go forward again to the first node and get its make
         System.out.println(litr.next().getMake());
         litr.previous();      //this will cause us to be in an infinite loop
    }
  }
}


public class Car{
    private String make;
    private String model;

    public Car() {
        make="";
        model="";

    }
    public Car(String ma,String mo) {
        make=ma;
        model=mo;

    }
    public String getMake()    {return make;    }
    public String getModel()    {return model;    }
    public void setMake(String ma)    { make=ma;    }
    public void setModel(String mo)    { model=mo;    }

}

The code above will create an array of file objects of Car.


The linked list is structured very differently and it is much closer structure to the file system example we started this discussion.  Each car object added to the list has its next and prev pointers.

         linkedlist.add(new Car("Toyota","Supra"))       //first node, prev=null
         linkedlist.add(new Car("Honda","Accord"))
         linkedlist.add(new Car("Mazda","MPV"))
         linkedlist.add(new Car("Porsche","Carrera"))
         linkedlist.add(new Car("BMW","M3"))             
//last node, next=null


Object structure pointing to the first node of the linked list.


As you can see below the last pointer's next pointer is null indicating the last node and the end of the linked list.
 

Prepare to use Regular Expressions in Java



Regular expression quick reference:
  .        Wildcard: any character
  *        Repeat: zero or more occurrences of previous character or class
  ^        Line position: beginning of line
  $        Line position: end of line
  [class]  Character class: any one character in set
  [^class] Inverse class: any one character not in set
  [x-y]    Range: any characters within the specified range
  \x       Escape: literal use of metacharacter x
  \<xyz    Word position: beginning of word
  xyz\>    Word position: end of word

For full information on FINDSTR regular expressions refer to the online Command Reference.

Change to a directory where we can work with files.
C:\>cd c:\temp

Create a file that we can copy later.  The contents are not important.  ( Press Ctrl+Z to close the file )
C:\temp>copy con a.txt
hello
^Z
        1 file(s) copied.

Create a few files that we can work with.
C:\temp>for %i in ( 1 2 3 4 5 6 7 8 9 ) do copy a.txt myFile%i.txt

List the files and find a pattern to match.  Your times and dates will be different, but think about the pattern not the exact command.
C:\temp>dir
 Volume in drive C is OS
 Volume Serial Number is E828-D083

 Directory of C:\temp

11/11/2015  10:18 PM    <DIR>          .
11/11/2015  10:18 PM    <DIR>          ..
11/11/2015  10:17 PM                 7 a.txt
11/11/2015  10:17 PM                 7 myFile1.txt
11/11/2015  10:17 PM                 7 myFile2.txt
11/11/2015  10:17 PM                 7 myFile3.txt
11/11/2015  10:17 PM                 7 myFile4.txt
11/11/2015  10:17 PM                 7 myFile5.txt
11/11/2015  10:17 PM                 7 myFile6.txt
11/11/2015  10:17 PM                 7 myFile7.txt
11/11/2015  10:17 PM                 7 myFile8.txt
11/11/2015  10:17 PM                 7 myFile9.txt
              16 File(s)        992,802 bytes
               3 Dir(s)  415,352,266,752 bytes free

You can learn about the options we can use with FINDSTR utility by typing findstr /? To learn about the regular expression syntax as we pipe the DIR output as an input to FINDSTR.  In this example, we’ll match any line that contains two digitsfollowed by a forward slash.
C:\temp>dir|findstr /r [0-9][0-9]/
11/11/2015  10:18 PM    <DIR>          .
11/11/2015  10:18 PM    <DIR>          ..
11/11/2015  10:17 PM                 7 a.txt
11/11/2015  10:17 PM                 7 myFile1.txt
11/11/2015  10:17 PM                 7 myFile2.txt
11/11/2015  10:17 PM                 7 myFile3.txt
11/11/2015  10:17 PM                 7 myFile4.txt
11/11/2015  10:17 PM                 7 myFile5.txt
11/11/2015  10:17 PM                 7 myFile6.txt
11/11/2015  10:17 PM                 7 myFile7.txt
11/11/2015  10:17 PM                 7 myFile8.txt
11/11/2015  10:17 PM                 7 myFile9.txt

In this example, we’ll match an line that contains 11 followed by a forward slash.
Note: the first two digits will be different in your case, just type what you see as the first two digits after running dir in the current directory
C:\temp>dir|findstr /r 11/
11/11/2015  10:18 PM    <DIR>          .
11/11/2015  10:18 PM    <DIR>          ..
11/11/2015  10:17 PM                 7 a.txt
11/11/2015  10:17 PM                 7 myFile1.txt
11/11/2015  10:17 PM                 7 myFile2.txt
11/11/2015  10:17 PM                 7 myFile3.txt
11/11/2015  10:17 PM                 7 myFile4.txt
11/11/2015  10:17 PM                 7 myFile5.txt
11/11/2015  10:17 PM                 7 myFile6.txt
11/11/2015  10:17 PM                 7 myFile7.txt
11/11/2015  10:17 PM                 7 myFile8.txt
11/11/2015  10:17 PM                 7 myFile9.txt

In this example, we’ll match any line that starts with 11 followed by a forward slash.
Note: the first two digits will be different in your case, just type what you see as the first two digits after running dir in the current directory
C:\temp>dir|findstr /r ^11/
11/11/2015  10:18 PM    <DIR>          .
11/11/2015  10:18 PM    <DIR>          ..
11/11/2015  10:17 PM                 7 a.txt
11/11/2015  10:17 PM                 7 myFile1.txt
11/11/2015  10:17 PM                 7 myFile2.txt
11/11/2015  10:17 PM                 7 myFile3.txt
11/11/2015  10:17 PM                 7 myFile4.txt
11/11/2015  10:17 PM                 7 myFile5.txt
11/11/2015  10:17 PM                 7 myFile6.txt
11/11/2015  10:17 PM                 7 myFile7.txt
11/11/2015  10:17 PM                 7 myFile8.txt
11/11/2015  10:17 PM                 7 myFile9.txt

In this example, we’ll match any line that starts with 11 followed by a forward slash and any string.
Note: the first two digits will be different in your case, just type what you see as the first two digits after running dir in the current directory
C:\temp>dir|findstr /r ^11/.*
11/11/2015  10:18 PM    <DIR>          .
11/11/2015  10:18 PM    <DIR>          ..
11/11/2015  10:17 PM                 7 a.txt
11/11/2015  10:17 PM                 7 myFile1.txt
11/11/2015  10:17 PM                 7 myFile2.txt
11/11/2015  10:17 PM                 7 myFile3.txt
11/11/2015  10:17 PM                 7 myFile4.txt
11/11/2015  10:17 PM                 7 myFile5.txt
11/11/2015  10:17 PM                 7 myFile6.txt
11/11/2015  10:17 PM                 7 myFile7.txt
11/11/2015  10:17 PM                 7 myFile8.txt
11/11/2015  10:17 PM                 7 myFile9.txt

In this example, we’ll match any line that starts with 11 followed by forward slash, any number of characters followed by the string File.  Notice how the directory entries and file a.txt does not show up anymore. Note: the first two digits will be different in your case, just type what you see as the first two digits after running dir in the current directory
C:\temp>dir|findstr /r ^11/.*File
11/11/2015  10:17 PM                 7 myFile1.txt
11/11/2015  10:17 PM                 7 myFile2.txt
11/11/2015  10:17 PM                 7 myFile3.txt
11/11/2015  10:17 PM                 7 myFile4.txt
11/11/2015  10:17 PM                 7 myFile5.txt
11/11/2015  10:17 PM                 7 myFile6.txt
11/11/2015  10:17 PM                 7 myFile7.txt
11/11/2015  10:17 PM                 7 myFile8.txt
11/11/2015  10:17 PM                 7 myFile9.txt

In this example, we’ll match any line that starts with 11 followed by a forward slash, any number of characters, and ends with the letter t. Note: the first two digits will be different in your case, just type what you see as the first two digits after running dir in the current directory
C:\temp>dir|findstr /r ^11/.*t$
11/11/2015  10:17 PM                 7 a.txt
11/11/2015  10:17 PM                 7 myFile1.txt
11/11/2015  10:17 PM                 7 myFile2.txt
11/11/2015  10:17 PM                 7 myFile3.txt
11/11/2015  10:17 PM                 7 myFile4.txt
11/11/2015  10:17 PM                 7 myFile5.txt
11/11/2015  10:17 PM                 7 myFile6.txt
11/11/2015  10:17 PM                 7 myFile7.txt
11/11/2015  10:17 PM                 7 myFile8.txt
11/11/2015  10:17 PM                 7 myFile9.txt

In this example, we’ll match any lines that starts with 11 followed by a forward slash, followed by any number of characters followed by a digit.txt. Notice the meaning of the period is escaped by the forward slash to take it as a literal character. Note: the first two digits will be different in your case, just type what you see as the first two digits after running dir in the current directory
C:\temp>dir|findstr /r ^11/.*[1-9]\.txt
11/11/2015  10:17 PM                 7 myFile1.txt
11/11/2015  10:17 PM                 7 myFile2.txt
11/11/2015  10:17 PM                 7 myFile3.txt
11/11/2015  10:17 PM                 7 myFile4.txt
11/11/2015  10:17 PM                 7 myFile5.txt
11/11/2015  10:17 PM                 7 myFile6.txt
11/11/2015  10:17 PM                 7 myFile7.txt
11/11/2015  10:17 PM                 7 myFile8.txt
11/11/2015  10:17 PM                 7 myFile9.txt

Let’s change one of the files to a unique sequence of characters.
C:\temp>ren myFile4.txt myFileZ.tat

In this example, we’ll match any lines that start with 11 followed by a forward slash, any number of characters followed by a digit, followed by any number of characters, and the line will end with a letter t.
Note: the first two digits will be different in your case, just type what you see as the first two digits after running dir in the current directory
C:\temp>dir|findstr /r ^11/.*[1-9].*t$
11/11/2015  10:17 PM                 7 a.txt
11/11/2015  10:17 PM                 7 myFile1.txt
11/11/2015  10:17 PM                 7 myFile2.txt
11/11/2015  10:17 PM                 7 myFile3.txt
11/11/2015  10:17 PM                 7 myFile5.txt
11/11/2015  10:17 PM                 7 myFile6.txt
11/11/2015  10:17 PM                 7 myFile7.txt
11/11/2015  10:17 PM                 7 myFile8.txt
11/11/2015  10:17 PM                 7 myFile9.txt
11/11/2015  10:17 PM                 7 myFileZ.tat

In this example, we’ll match any line that matches all the lines as before, but now we’ll end the line with at and not just a t. Note: the first two digits will be different in your case, just type what you see as the first two digits after running dir in the current directory
C:\temp>dir|findstr /r ^11/.*[1-9].*at$
11/11/2015  10:17 PM                 7 myFileZ.tat

In this example, we’ll match our uniquely named file ignoring everything else.
Note: the first two digits will be different in your case, just type what you see as the first two digits after running dir in the current directory
C:\temp>dir|findstr /r ^11/.*[1-9]\.[a-zA-Z].*at$

Oops, something went wrong.  Why did the previous search pattern not match our file name, but the following pattern will?
Note: the first two digits will be different in your case, just type what you see as the first two digits after running dir in the current directory
C:\temp>dir|findstr /r ^11/.*[a-zA-Z].*at$
11/11/2015  10:17 PM                 7 myFileZ.tat

Write your answer here ( or the paper handed to you by your instructor ) and hand it back to your instructor.



Now, create a new file named myfile.txt using notepad and add the text (214)234-4567 to the file.  Find the text in all text files using FINDSTR utility.
C:\temp>findstr /r ^[(\[][0-9][0-9] *.txt

myfile.txt:(214)234-4567

Example Java code using regular expression to match pattern.


import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * This program tests a customer number to determine
 * whether it is in the proper format.
 */
public class SuspectNumbers  {
   public static void main(String[] args) throws IOException   {  
         File inputFile = new File("zip.txt");
         Scanner input=new Scanner(inputFile);
         boolean isEqual,isEqual2;
         Matcher m;
         String value;
         String match="75201";
         Pattern p = Pattern.compile("^75201");

      while(input.hasNext()){
        value=input.nextLine();
        if(value.startsWith(match))
              System.out.println(value);
        
        m = p.matcher(value);
        isEqual=Pattern.matches("^75201.*", value);
        if(isEqual)
              System.out.println("Regular Expression matched: "+value);
        
        isEqual2=m.find();
        if(isEqual2)
              System.out.println("Regular Expression 2 matched: "+value);
        
        if(Pattern.matches(" [0-9]{3}-[0-9]{2}-[0-9]{4} ", value))
              System.out.println("Social Security number matched: "+value);
        
        if(Pattern.matches("[\\[\\(][0-9]{3}[\\]\\)][0-9]{3}-[0-9]{4}.*", value))
              System.out.println("Phone number matched: "+value);
        
        if(Pattern.matches("[\\[\\(][0-9]{3}[\\]\\)][0-9]{3}-[0-9]{3}[a-zA-Z].*", value))
             System.out.println("Fake phone number matched: "+value);
      }
   }
}

Based on the code above, what sample value would you place in the input file to match fake phone numbers?




Appendix A: How to use regular expressions in MS Word


Save a Word Document with a phone number in it in a format (214)345-3456. Click on Find->Advanced Find …., check Use wildcards and type the following into the “Find what:”  <[0-9]{3}\)[0-9]{3}-[0-9]{4}>

      

Appendix A: How to use regular expressions in Notepad++


Save a Document with a phone number in it like [124]123-3456. Click on Search, check “Regular expression” and type the following into the “Find what:”  \[[0-9]{3}\][0-9]{3}-[0-9]{4}>


You can generate simple text files with patterns that you can search for and match with regular expressions.


75201    Dallas    TX
75202    Dallas    TX
75203    Dallas    TX
75204    Dallas    TX
75205    Dallas    TX
75206    Dallas    TX
 75378    Dallas    TX
(214)234-456I                                     // notice 456I is not ending with a one, it is the capital letter 'i'
75379    Dallas    TX
75047    Garland    TX
75048    Garland    TX
[123]345_4567                                  //phone numbers can be in different format
75049    Garland    TX
75080    Richardson    TX
75081    Richardson    TX
75082    Richardson    TX
76016    Arlington    TX
76017    Arlington    TX
234-05-3456                                     //social security numbers should be easy to match
76018    Arlington    TX
76019    Arlington    TX