Saturday, April 30, 2016

Prepare to use Regular Expressions in Java



Regular expression quick reference:
  .        Wildcard: any character
  *        Repeat: zero or more occurrences of previous character or class
  ^        Line position: beginning of line
  $        Line position: end of line
  [class]  Character class: any one character in set
  [^class] Inverse class: any one character not in set
  [x-y]    Range: any characters within the specified range
  \x       Escape: literal use of metacharacter x
  \<xyz    Word position: beginning of word
  xyz\>    Word position: end of word
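Since the goal is to use these ideas in Java, here is a minimal sketch of how the quick-reference entries above might be written with java.util.regex. Most metacharacters carry over unchanged. The main differences are that FINDSTR's word positions \< and \> are usually written as the word boundary \b in Java, and that every backslash must be doubled inside a Java string literal. The class name FindstrToJava and its found helper are made up for this illustration.

```java
import java.util.regex.Pattern;

/** Hypothetical helper: FINDSTR-style patterns rewritten for java.util.regex. */
public class FindstrToJava {

    /** True if the pattern occurs anywhere in the line, which is how FINDSTR decides to print a line. */
    public static boolean found(String regex, String line) {
        return Pattern.compile(regex).matcher(line).find();
    }

    public static void main(String[] args) {
        // .  wildcard: any single character
        System.out.println(found("c.t", "cat"));               // true
        // *  zero or more occurrences of the previous character
        System.out.println(found("ab*c", "ac"));               // true
        // ^ and $  beginning and end of the line
        System.out.println(found("^hello$", "hello"));         // true
        // [^class] with a range: any one character that is not a digit
        System.out.println(found("[^0-9]", "12345"));          // false
        // \x escape: the regex \. is written "\\." in Java source
        System.out.println(found("a\\.txt", "a.txt"));         // true
        // \<xyz word position: written "\\b" in Java
        System.out.println(found("\\bFile", "myFile1.txt"));   // false, F is mid-word
        System.out.println(found("\\bmy", "myFile1.txt"));     // true
    }
}
```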

For full information on FINDSTR regular expressions refer to the online Command Reference.

Change to a directory where we can work with files.
C:\>cd c:\temp

Create a file that we can copy later. The contents are not important. (Press Ctrl+Z, then Enter, to close the file.)
C:\temp>copy con a.txt
hello
^Z
        1 file(s) copied.

Create a few files that we can work with.
C:\temp>for %i in ( 1 2 3 4 5 6 7 8 9 ) do copy a.txt myFile%i.txt

List the files and find a pattern to match. Your times and dates will be different, so focus on the pattern, not the exact command.
C:\temp>dir
 Volume in drive C is OS
 Volume Serial Number is E828-D083

 Directory of C:\temp

11/11/2015  10:18 PM    <DIR>          .
11/11/2015  10:18 PM    <DIR>          ..
11/11/2015  10:17 PM                 7 a.txt
11/11/2015  10:17 PM                 7 myFile1.txt
11/11/2015  10:17 PM                 7 myFile2.txt
11/11/2015  10:17 PM                 7 myFile3.txt
11/11/2015  10:17 PM                 7 myFile4.txt
11/11/2015  10:17 PM                 7 myFile5.txt
11/11/2015  10:17 PM                 7 myFile6.txt
11/11/2015  10:17 PM                 7 myFile7.txt
11/11/2015  10:17 PM                 7 myFile8.txt
11/11/2015  10:17 PM                 7 myFile9.txt
              16 File(s)        992,802 bytes
               3 Dir(s)  415,352,266,752 bytes free

You can learn about the options available with the FINDSTR utility by typing findstr /?. To learn the regular expression syntax, we’ll pipe the DIR output as input to FINDSTR. In this example, we’ll match any line that contains two digits followed by a forward slash.
C:\temp>dir|findstr /r [0-9][0-9]/
11/11/2015  10:18 PM    <DIR>          .
11/11/2015  10:18 PM    <DIR>          ..
11/11/2015  10:17 PM                 7 a.txt
11/11/2015  10:17 PM                 7 myFile1.txt
11/11/2015  10:17 PM                 7 myFile2.txt
11/11/2015  10:17 PM                 7 myFile3.txt
11/11/2015  10:17 PM                 7 myFile4.txt
11/11/2015  10:17 PM                 7 myFile5.txt
11/11/2015  10:17 PM                 7 myFile6.txt
11/11/2015  10:17 PM                 7 myFile7.txt
11/11/2015  10:17 PM                 7 myFile8.txt
11/11/2015  10:17 PM                 7 myFile9.txt
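FINDSTR prints a line whenever the pattern occurs anywhere in it. In Java, the closest equivalent is Matcher.find(); String.matches(), by contrast, succeeds only if the pattern covers the entire line. The sketch below assumes the lines are already in memory (two of them copied from the dir output above); the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/** Sketch: a Java counterpart of  dir|findstr /r [0-9][0-9]/  over lines held in memory. */
public class TwoDigitsSlash {

    private static final Pattern TWO_DIGITS_SLASH = Pattern.compile("[0-9][0-9]/");

    /** FINDSTR semantics: keep the line if the pattern occurs anywhere in it. */
    public static boolean findstrLike(String line) {
        return TWO_DIGITS_SLASH.matcher(line).find();
    }

    /** Whole-line semantics: String.matches() must consume the entire line. */
    public static boolean wholeLine(String line) {
        return line.matches("[0-9][0-9]/");
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "11/11/2015  10:17 PM                 7 a.txt",
                " Volume in drive C is OS");
        for (String line : lines) {
            System.out.println(findstrLike(line) + " " + wholeLine(line) + "  <- " + line);
        }
    }
}
```

Forgetting this difference is a common reason a pattern that works in FINDSTR appears to "fail" in Java.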

In this example, we’ll match any line that contains 11 followed by a forward slash.
Note: the first two digits will be different in your case; just type the first two digits you see after running dir in the current directory.
C:\temp>dir|findstr /r 11/
11/11/2015  10:18 PM    <DIR>          .
11/11/2015  10:18 PM    <DIR>          ..
11/11/2015  10:17 PM                 7 a.txt
11/11/2015  10:17 PM                 7 myFile1.txt
11/11/2015  10:17 PM                 7 myFile2.txt
11/11/2015  10:17 PM                 7 myFile3.txt
11/11/2015  10:17 PM                 7 myFile4.txt
11/11/2015  10:17 PM                 7 myFile5.txt
11/11/2015  10:17 PM                 7 myFile6.txt
11/11/2015  10:17 PM                 7 myFile7.txt
11/11/2015  10:17 PM                 7 myFile8.txt
11/11/2015  10:17 PM                 7 myFile9.txt

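In this example, we’ll match any line that starts with 11 followed by a forward slash. The pattern is enclosed in double quotes because cmd.exe treats an unquoted ^ as its own escape character and strips it before FINDSTR sees it.
Note: the first two digits will be different in your case; just type the first two digits you see after running dir in the current directory.
C:\temp>dir|findstr /r "^11/"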
11/11/2015  10:18 PM    <DIR>          .
11/11/2015  10:18 PM    <DIR>          ..
11/11/2015  10:17 PM                 7 a.txt
11/11/2015  10:17 PM                 7 myFile1.txt
11/11/2015  10:17 PM                 7 myFile2.txt
11/11/2015  10:17 PM                 7 myFile3.txt
11/11/2015  10:17 PM                 7 myFile4.txt
11/11/2015  10:17 PM                 7 myFile5.txt
11/11/2015  10:17 PM                 7 myFile6.txt
11/11/2015  10:17 PM                 7 myFile7.txt
11/11/2015  10:17 PM                 7 myFile8.txt
11/11/2015  10:17 PM                 7 myFile9.txt

In this example, we’ll match any line that starts with 11 followed by a forward slash and any string.
Note: the first two digits will be different in your case; just type the first two digits you see after running dir in the current directory.
C:\temp>dir|findstr /r "^11/.*"
11/11/2015  10:18 PM    <DIR>          .
11/11/2015  10:18 PM    <DIR>          ..
11/11/2015  10:17 PM                 7 a.txt
11/11/2015  10:17 PM                 7 myFile1.txt
11/11/2015  10:17 PM                 7 myFile2.txt
11/11/2015  10:17 PM                 7 myFile3.txt
11/11/2015  10:17 PM                 7 myFile4.txt
11/11/2015  10:17 PM                 7 myFile5.txt
11/11/2015  10:17 PM                 7 myFile6.txt
11/11/2015  10:17 PM                 7 myFile7.txt
11/11/2015  10:17 PM                 7 myFile8.txt
11/11/2015  10:17 PM                 7 myFile9.txt

In this example, we’ll match any line that starts with 11 followed by a forward slash, any number of characters, and the string File. Notice that the directory entries and the file a.txt no longer show up. Note: the first two digits will be different in your case; just type the first two digits you see after running dir in the current directory.
C:\temp>dir|findstr /r "^11/.*File"
11/11/2015  10:17 PM                 7 myFile1.txt
11/11/2015  10:17 PM                 7 myFile2.txt
11/11/2015  10:17 PM                 7 myFile3.txt
11/11/2015  10:17 PM                 7 myFile4.txt
11/11/2015  10:17 PM                 7 myFile5.txt
11/11/2015  10:17 PM                 7 myFile6.txt
11/11/2015  10:17 PM                 7 myFile7.txt
11/11/2015  10:17 PM                 7 myFile8.txt
11/11/2015  10:17 PM                 7 myFile9.txt
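A Java version of this filter might look like the sketch below. It applies the same pattern, ^11/.*File, to each line with Matcher.find(), so the ^ still pins the match to the start of the line and .* absorbs everything between the date and File. As in the FINDSTR examples, the leading 11 is the month from this particular listing; the class name and sample lines are illustrative only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical sketch: keep only the lines that FINDSTR would print for ^11/.*File. */
public class StartsWithFilter {

    // ^ anchors at the start of the line; .* skips any run of characters before "File"
    private static final Pattern P = Pattern.compile("^11/.*File");

    public static List<String> filter(List<String> lines) {
        List<String> kept = new ArrayList<>();
        for (String line : lines) {
            if (P.matcher(line).find()) {
                kept.add(line);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> dir = Arrays.asList(
                "11/11/2015  10:18 PM    <DIR>          .",
                "11/11/2015  10:17 PM                 7 a.txt",
                "11/11/2015  10:17 PM                 7 myFile1.txt");
        // Only the myFile1.txt line survives, just like the FINDSTR output
        for (String line : filter(dir)) {
            System.out.println(line);
        }
    }
}
```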

In this example, we’ll match any line that starts with 11 followed by a forward slash, any number of characters, and ends with the letter t. Note: the first two digits will be different in your case; just type the first two digits you see after running dir in the current directory.
C:\temp>dir|findstr /r "^11/.*t$"
11/11/2015  10:17 PM                 7 a.txt
11/11/2015  10:17 PM                 7 myFile1.txt
11/11/2015  10:17 PM                 7 myFile2.txt
11/11/2015  10:17 PM                 7 myFile3.txt
11/11/2015  10:17 PM                 7 myFile4.txt
11/11/2015  10:17 PM                 7 myFile5.txt
11/11/2015  10:17 PM                 7 myFile6.txt
11/11/2015  10:17 PM                 7 myFile7.txt
11/11/2015  10:17 PM                 7 myFile8.txt
11/11/2015  10:17 PM                 7 myFile9.txt

In this example, we’ll match any line that starts with 11 followed by a forward slash, any number of characters, a digit, and .txt. Notice that the period is escaped with a backslash so that it is treated as a literal character instead of the wildcard. Note: the first two digits will be different in your case; just type the first two digits you see after running dir in the current directory.
C:\temp>dir|findstr /r "^11/.*[1-9]\.txt"
11/11/2015  10:17 PM                 7 myFile1.txt
11/11/2015  10:17 PM                 7 myFile2.txt
11/11/2015  10:17 PM                 7 myFile3.txt
11/11/2015  10:17 PM                 7 myFile4.txt
11/11/2015  10:17 PM                 7 myFile5.txt
11/11/2015  10:17 PM                 7 myFile6.txt
11/11/2015  10:17 PM                 7 myFile7.txt
11/11/2015  10:17 PM                 7 myFile8.txt
11/11/2015  10:17 PM                 7 myFile9.txt
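The same escape works in Java, with one extra wrinkle: the backslash itself must be escaped inside a string literal, so the regex \. is written "\\." in source code (a single "\." does not even compile). The sketch below, with made-up file names and a hypothetical class name, contrasts an unescaped period, an escaped one, and Pattern.quote(), which treats an entire string as literal text.

```java
import java.util.regex.Pattern;

/** Hypothetical sketch: three ways to write "a digit followed by .txt". */
public class EscapedPeriod {

    // Unescaped: . matches ANY character, so "1xtxt" also matches
    public static final Pattern ANY_CHAR = Pattern.compile("[1-9].txt");
    // Escaped: \. matches only a literal period; the backslash is doubled in the Java string
    public static final Pattern LITERAL_DOT = Pattern.compile("[1-9]\\.txt");
    // Pattern.quote() wraps ".txt" so that every character in it is literal
    public static final Pattern QUOTED = Pattern.compile("[1-9]" + Pattern.quote(".txt"));

    public static void main(String[] args) {
        for (String name : new String[] {"myFile1.txt", "myFile1xtxt"}) {
            System.out.printf("%-12s any=%b literal=%b quoted=%b%n", name,
                    ANY_CHAR.matcher(name).find(),
                    LITERAL_DOT.matcher(name).find(),
                    QUOTED.matcher(name).find());
        }
    }
}
```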

Let’s rename one of the files so that its name contains a unique sequence of characters.
C:\temp>ren myFile4.txt myFileZ.tat

In this example, we’ll match any line that starts with 11 followed by a forward slash, any number of characters, a digit, any number of characters, and ends with the letter t.
Note: the first two digits will be different in your case; just type the first two digits you see after running dir in the current directory.
C:\temp>dir|findstr /r "^11/.*[1-9].*t$"
11/11/2015  10:17 PM                 7 a.txt
11/11/2015  10:17 PM                 7 myFile1.txt
11/11/2015  10:17 PM                 7 myFile2.txt
11/11/2015  10:17 PM                 7 myFile3.txt
11/11/2015  10:17 PM                 7 myFile5.txt
11/11/2015  10:17 PM                 7 myFile6.txt
11/11/2015  10:17 PM                 7 myFile7.txt
11/11/2015  10:17 PM                 7 myFile8.txt
11/11/2015  10:17 PM                 7 myFile9.txt
11/11/2015  10:17 PM                 7 myFileZ.tat

In this example, we’ll use the same pattern as before, but now the line must end with at instead of just t. Note: the first two digits will be different in your case; just type the first two digits you see after running dir in the current directory.
C:\temp>dir|findstr /r "^11/.*[1-9].*at$"
11/11/2015  10:17 PM                 7 myFileZ.tat
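The $ anchor behaves the same way in Java when each line is tested on its own. The sketch below (hypothetical class name, lines copied from the listing above) checks the patterns from the last two examples against a.txt and myFileZ.tat.

```java
import java.util.regex.Pattern;

/** Hypothetical sketch: "ends with t" versus "ends with at". */
public class EndAnchor {

    // Ends with t: matches both a.txt and myFileZ.tat
    private static final Pattern ENDS_T = Pattern.compile("^11/.*[1-9].*t$");
    // Ends with at: only myFileZ.tat qualifies
    private static final Pattern ENDS_AT = Pattern.compile("^11/.*[1-9].*at$");

    public static boolean endsWithT(String line) {
        return ENDS_T.matcher(line).find();
    }

    public static boolean endsWithAt(String line) {
        return ENDS_AT.matcher(line).find();
    }

    public static void main(String[] args) {
        String[] lines = {
            "11/11/2015  10:17 PM                 7 a.txt",
            "11/11/2015  10:17 PM                 7 myFileZ.tat"
        };
        for (String line : lines) {
            System.out.println(endsWithT(line) + " " + endsWithAt(line) + "  " + line);
        }
    }
}
```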

In this example, we’ll match our uniquely named file ignoring everything else.
Note: the first two digits will be different in your case; just type the first two digits you see after running dir in the current directory.
C:\temp>dir|findstr /r "^11/.*[1-9]\.[a-zA-Z].*at$"

Oops, something went wrong.  Why did the previous search pattern not match our file name, but the following pattern will?
Note: the first two digits will be different in your case; just type the first two digits you see after running dir in the current directory.
C:\temp>dir|findstr /r "^11/.*[a-zA-Z].*at$"
11/11/2015  10:17 PM                 7 myFileZ.tat

Write your answer here (or on the paper handed to you by your instructor) and hand it back to your instructor.



Now, create a new file named myfile.txt using Notepad and add the text (214)234-4567 to the file. Find the text in all text files using the FINDSTR utility.
C:\temp>findstr /r "^[(\[][0-9][0-9]" *.txt
myfile.txt:(214)234-4567
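The same search can be written in Java. The sketch below is an illustration under stated assumptions rather than a port of FINDSTR: it checks in-memory strings instead of reading *.txt files, and it spells out the whole phone number with {n} quantifiers, which FINDSTR does not support. Inside a Java character class, [ and ] must be escaped while ( and ) need not be. The class name PhoneFinder is hypothetical, and like the FINDSTR pattern it accepts mismatched brackets such as (214].

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: find a phone number written as (214)234-4567 or [214]234-4567. */
public class PhoneFinder {

    // ( or [, three digits, ) or ], three digits, a hyphen, four digits; each digit run is a group
    private static final Pattern PHONE =
            Pattern.compile("[(\\[]([0-9]{3})[)\\]]([0-9]{3})-([0-9]{4})");

    /** Returns the ten digits of the first phone number in the line, or null if there is none. */
    public static String digitsOf(String line) {
        Matcher m = PHONE.matcher(line);
        return m.find() ? m.group(1) + m.group(2) + m.group(3) : null;
    }

    public static void main(String[] args) {
        System.out.println(digitsOf("(214)234-4567")); // 2142344567
        System.out.println(digitsOf("[124]123-3456")); // 1241233456
        System.out.println(digitsOf("hello"));         // null
    }
}
```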

Example Java code that uses regular expressions to match patterns.


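import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * This program reads zip.txt line by line and reports lines that start with
 * the ZIP code 75201, look like Social Security numbers, or look like phone numbers.
 */
public class SuspectNumbers {
    public static void main(String[] args) throws FileNotFoundException {
        File inputFile = new File("zip.txt");
        String match = "75201";
        Pattern p = Pattern.compile("^75201");

        // try-with-resources closes the Scanner (and the file) when the loop ends
        try (Scanner input = new Scanner(inputFile)) {
            // hasNextLine() pairs with nextLine(); hasNext() checks for tokens instead
            while (input.hasNextLine()) {
                String value = input.nextLine();

                // 1. Plain String method, no regular expression
                if (value.startsWith(match)) {
                    System.out.println(value);
                }

                // 2. Pattern.matches() must match the WHOLE line, hence the trailing .*
                boolean isEqual = Pattern.matches("^75201.*", value);
                if (isEqual) {
                    System.out.println("Regular Expression matched: " + value);
                }

                // 3. Matcher.find() only needs the pattern somewhere; ^ pins it to the start
                Matcher m = p.matcher(value);
                boolean isEqual2 = m.find();
                if (isEqual2) {
                    System.out.println("Regular Expression 2 matched: " + value);
                }

                // Social Security number such as 234-05-3456; \\s* tolerates surrounding spaces
                if (Pattern.matches("\\s*[0-9]{3}-[0-9]{2}-[0-9]{4}\\s*", value)) {
                    System.out.println("Social Security number matched: " + value);
                }

                // ( or [, three digits, ) or ], three digits, hyphen, four digits, then anything
                if (Pattern.matches("[\\[\\(][0-9]{3}[\\]\\)][0-9]{3}-[0-9]{4}.*", value)) {
                    System.out.println("Phone number matched: " + value);
                }

                // Same shape, but the last part is three digits followed by a letter
                if (Pattern.matches("[\\[\\(][0-9]{3}[\\]\\)][0-9]{3}-[0-9]{3}[a-zA-Z].*", value)) {
                    System.out.println("Fake phone number matched: " + value);
                }
            }
        }
    }
}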
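The program above mixes two ways of testing a line: Pattern.matches(), which must consume the whole line, and Matcher.find(), which only needs the pattern to occur somewhere. Matcher also offers lookingAt(), which must start at the beginning of the input but may stop early. The sketch below (hypothetical class name, sample lines taken from the data file listed later in this post) compares all three. It also shows why the line that begins with a space, " 75378", is never reported by a pattern that expects the ZIP code at position 0.

```java
import java.util.regex.Pattern;

/** Hypothetical sketch: matches() versus lookingAt() versus find() on the same input. */
public class MatchModes {

    /** Returns "matches lookingAt find" as three booleans separated by spaces. */
    public static String modes(String regex, String input) {
        Pattern p = Pattern.compile(regex);
        // A fresh Matcher for each call, so one call's state cannot affect the next
        boolean whole = p.matcher(input).matches();      // entire input must match
        boolean prefix = p.matcher(input).lookingAt();   // must start at index 0, may end early
        boolean anywhere = p.matcher(input).find();      // may start anywhere
        return whole + " " + prefix + " " + anywhere;
    }

    public static void main(String[] args) {
        System.out.println(modes("75201", "75201    Dallas    TX"));    // false true true
        System.out.println(modes("75201.*", "75201    Dallas    TX"));  // true true true
        System.out.println(modes("75378", " 75378    Dallas    TX"));   // false false true
    }
}
```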

Based on the code above, what sample value would you place in the input file to match fake phone numbers?




Appendix A: How to use regular expressions in MS Word


Save a Word document that contains a phone number in the format (214)345-3456. Click Find > Advanced Find..., check Use wildcards, and type the following into the “Find what:” box:  <[0-9]{3}\)[0-9]{3}-[0-9]{4}>

      

Appendix B: How to use regular expressions in Notepad++


Save a document that contains a phone number such as [124]123-3456. Click Search > Find..., select the “Regular expression” search mode, and type the following into the “Find what:” box:  \[[0-9]{3}\][0-9]{3}-[0-9]{4}
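For comparison, the Word and Notepad++ patterns above could be written in Java roughly as follows. Word's < and > word markers become \b, and every backslash is doubled in the string literal. This is a sketch with a made-up class name and sample strings, not an exact emulation of either editor's search engine.

```java
import java.util.regex.Pattern;

/** Hypothetical sketch: the two editor patterns from the appendices, in Java syntax. */
public class EditorPatterns {

    // Word wildcard <[0-9]{3}\)[0-9]{3}-[0-9]{4}>  :  \b replaces the < and > word markers
    public static final Pattern WORD_STYLE =
            Pattern.compile("\\b[0-9]{3}\\)[0-9]{3}-[0-9]{4}\\b");

    // Notepad++ \[[0-9]{3}\][0-9]{3}-[0-9]{4}  :  same regex with the backslashes doubled
    public static final Pattern BRACKET_STYLE =
            Pattern.compile("\\[[0-9]{3}\\][0-9]{3}-[0-9]{4}");

    public static void main(String[] args) {
        System.out.println(WORD_STYLE.matcher("Call (214)345-3456 today").find());    // true
        System.out.println(WORD_STYLE.matcher("(214)345-34567").find());              // false, no word end after 3456
        System.out.println(BRACKET_STYLE.matcher("Call [124]123-3456 today").find()); // true
    }
}
```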


You can generate simple text files with patterns that you can search for and match with regular expressions.


75201    Dallas    TX
75202    Dallas    TX
75203    Dallas    TX
75204    Dallas    TX
75205    Dallas    TX
75206    Dallas    TX
 75378    Dallas    TX
(214)234-456I                                     // notice 456I is not ending with a one, it is the capital letter 'i'
75379    Dallas    TX
75047    Garland    TX
75048    Garland    TX
[123]345_4567                                  //phone numbers can be in different format
75049    Garland    TX
75080    Richardson    TX
75081    Richardson    TX
75082    Richardson    TX
76016    Arlington    TX
76017    Arlington    TX
234-05-3456                                     //social security numbers should be easy to match
76018    Arlington    TX
76019    Arlington    TX
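Lines like these can also be taken apart with capturing groups. The sketch below is one possible approach, assuming each ZIP line has the layout ZIP code, one-word city, and two-letter state separated by white space; the class name ZipLineParser and its output format are made up. The annotated lines and the line with a leading space do not fit that layout, so the method returns null for them.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: extract the ZIP code and city from lines such as "75201    Dallas    TX". */
public class ZipLineParser {

    // Group 1: five-digit ZIP, group 2: one-word city, group 3: two-letter state
    private static final Pattern ZIP_LINE =
            Pattern.compile("^([0-9]{5})\\s+([A-Za-z]+)\\s+([A-Z]{2})$");

    /** Returns "City ZIP" for a well-formed line, or null when the line does not fit the layout. */
    public static String parse(String line) {
        Matcher m = ZIP_LINE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return m.group(2) + " " + m.group(1);
    }

    public static void main(String[] args) {
        System.out.println(parse("75201    Dallas    TX"));  // Dallas 75201
        System.out.println(parse(" 75378    Dallas    TX")); // null, leading space
        System.out.println(parse("234-05-3456"));            // null
    }
}
```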

Even / Odd Array Index

In some cases, you might have to process array values where only every other value needs to be processed. This example shows how to do that, including how to split a string with a regular expression.


public class GradeSplit {
    public static void main(String[] args) {
        String input = "claude 99 mike 89 paul 88";

        // Review regular expression syntax
        // https://docs.oracle.com/javase/7/docs/api/
        // "\\s+" splits on white space; the + matches the white space character one or more times.
        // The String class split method returns a String[], so every even index (2*i)
        // holds a name and every odd index (2*i+1) holds the matching grade.
        String[] data = input.split("\\s+");

        // Values always come in pairs, so divide the length by 2
        // to size an array that holds only the grades
        int[] x = new int[data.length / 2];              // add a break point and examine the data array

        // Fill the grade array by parsing every odd index value
        for (int i = 0; i < x.length; i++) {
            x[i] = Integer.parseInt(data[2 * i + 1]);    // add a break point and step through the loop
        }

        // Print every even value from the original array together with every value
        // from the grade array, processing the two as parallel arrays
        for (int i = 0; i < x.length; i++) {
            System.out.printf("Your name is: %-6s and your grade is: %3d%n", data[2 * i], x[i]);
        }
    }
}


Thus you can calculate the index to use in your code with simple arithmetic. Computer science is about problem solving through pattern recognition, and pattern recognition does not work without writing things down on paper in a logical fashion until a pattern emerges.

index    odd index (index*2+1)    even index (index*2)
0        1                        0
1        3                        2
2        5                        4
3        7                        6
4        9                        8
5        11                       10
6        13                       12
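Index arithmetic is one way to pair names with grades. Since this post is about regular expressions, here is an alternative sketch that lets capturing groups do the pairing instead; it assumes every name is a single word of letters followed by a whole-number grade. The class name GradePairs is hypothetical, and a LinkedHashMap is used only to keep the names in input order.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: pair names and grades with capturing groups instead of 2*i and 2*i+1. */
public class GradePairs {

    // Group 1: a name made of letters, group 2: the grade that follows it
    private static final Pattern PAIR = Pattern.compile("([A-Za-z]+)\\s+([0-9]+)");

    public static Map<String, Integer> pairs(String input) {
        Map<String, Integer> grades = new LinkedHashMap<>();
        Matcher m = PAIR.matcher(input);
        while (m.find()) {   // each find() picks up the next name/grade pair
            grades.put(m.group(1), Integer.parseInt(m.group(2)));
        }
        return grades;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, Integer> e : pairs("claude 99 mike 89 paul 88").entrySet()) {
            System.out.printf("Your name is: %-6s and your grade is: %3d%n", e.getKey(), e.getValue());
        }
    }
}
```

The trade-off: the regex version tolerates extra white space and does not depend on the values arriving in strict pairs at known positions, while the split-and-index version makes the array layout explicit, which is the point of the exercise.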


Contents of the data array after the split operation:
data = {"claude", "99", "mike", "89", "paul", "88"}
Contents of the x integer array, which is half the size of the data array and holds only the grades, now as integers:
x = {99, 89, 88}
Output of the code above:
Your name is: claude and your grade is:  99
Your name is: mike   and your grade is:  89
Your name is: paul   and your grade is:  88


Tuesday, April 26, 2016

Create Runnable JAR File in Eclipse

Create a project in Eclipse.
Add a package named practice.
Add each of the three classes below as a separate source file in the practice package.
In Run Configurations..., set the Main class to practice.MyEffort.
Compile the project and make sure it runs.
Export the project as a Runnable JAR file (File > Export... > Java > Runnable JAR file).
Double-click the resulting JAR file to see your program run.


package practice;

import javax.swing.JOptionPane;

/** This class will be used as a driver class for the application
*@author Zoltan Szabo
*@version 1.0.0
*/
public class MyEffort {

    public static void main(String[] args) {
        Alphabet letters = new Alphabet();
        String word;
        int answer;
        do {
            word = JOptionPane.showInputDialog(null, "Please, enter a single word you'd like to convert to a percentage.", "Word 2 Effort", JOptionPane.QUESTION_MESSAGE);
            // showInputDialog returns null when the user clicks Cancel or closes the dialog
            if (word == null) {
                break;
            }
            Word2Percentage result = new Word2Percentage(word, letters);
            String msg = "The effort you put in your work is: " + result.getResult() + "%";
            JOptionPane.showMessageDialog(null, msg, "Answer", JOptionPane.INFORMATION_MESSAGE);

            answer = JOptionPane.showConfirmDialog(null, "Would you like to convert another word?", "Convert Another One!!!", JOptionPane.YES_NO_OPTION, JOptionPane.INFORMATION_MESSAGE);
            if (answer == JOptionPane.NO_OPTION) {
                JOptionPane.showMessageDialog(null, "Thank you for using this application, hope to see you again soon!!!", "Good Bye", JOptionPane.INFORMATION_MESSAGE);
            } else if (answer != JOptionPane.YES_OPTION) {
                // CLOSED_OPTION: the dialog was closed without choosing Yes or No
                JOptionPane.showMessageDialog(null, "Invalid selection, bye!!!", "Good Bye", JOptionPane.INFORMATION_MESSAGE);
            }
        } while (answer == JOptionPane.YES_OPTION);
    }
}


package practice;

/**This is a class that populates an array with alphabet characters using a loop
 * @author Zoltan
 * @version 1.0.0
 */
public class Alphabet {

    /**
     * The number of letters in the English alphabet
     */
    public static final int SIZE = 26;
    private Character[] letters;

    /**
     * Class constructor that fills the array with the letters 'A' through 'Z'
     */
    public Alphabet() {
        letters = new Character[SIZE];
        for (int i = 0; i < SIZE; i++)
            letters[i] = (char) ('A' + i);
    }

    /**This method will return a character corresponding to its index value
     * @param index of character (0 for 'A' through 25 for 'Z')
     * @return character at index
     */
    public char getCharacter(int index) {
        return letters[index].charValue();
    }

}

package practice;

/**Class that converts the given word to a percentage representation
*@author Zoltan Szabo
*@version 1.0.0
*/
public class Word2Percentage {
    private int total;

    /**Constructor to the class
    *@param word the user supplied word
    *@param toConvert the corresponding array where the index value will be used to convert each character to a numerical value
    */
    public Word2Percentage(String word, Alphabet toConvert) {
        String tempWord = word.toUpperCase();
        // For each character, find its position in the alphabet and add position + 1 (A=1 ... Z=26);
        // characters that are not letters A-Z never match and are ignored
        for (int i = 0; i < tempWord.length(); i++) {
            for (int j = 0; j < Alphabet.SIZE; j++) {
                if (toConvert.getCharacter(j) == tempWord.charAt(i)) {
                    total += (j + 1);
                }
            }
        }
    }

    /**Method to return the final converted numerical value representation of the given word
    *@return numerical value of given word
    */
    public int getResult() {
        return total;
    }

}
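The nested loop in Word2Percentage searches the alphabet for every character. Because 'A' through 'Z' are consecutive char values, the same letter score can also be computed directly as c - 'A' + 1. The standalone sketch below (hypothetical class LetterSum, not part of the practice package) assumes plain ASCII letters and, like the original, ignores every other character.

```java
/** Hypothetical sketch: the Word2Percentage score computed with char arithmetic. */
public class LetterSum {

    /** Sums A=1 ... Z=26 over the letters of the word; other characters add nothing. */
    public static int score(String word) {
        int total = 0;
        for (char c : word.toCharArray()) {
            char upper = Character.toUpperCase(c);
            if (upper >= 'A' && upper <= 'Z') {
                total += upper - 'A' + 1;   // 'A' -> 1, 'B' -> 2, ..., 'Z' -> 26
            }
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(score("Attitude"));  // 100
        System.out.println(score("knowledge")); // 96
    }
}
```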