Regular expression quick reference:
.
Wildcard: any character
*
Repeat: zero or more occurrences of previous character or class
^
Line position: beginning of line
$
Line position: end of line
[class]
Character class: any one character in set
[^class] Inverse class: any one character not
in set
[x-y]
Range: any characters within the specified range
\x
Escape: literal use of metacharacter x
\<xyz
Word position: beginning of word
xyz\>
Word position: end of word
For full
information on FINDSTR regular expressions refer to the online Command
Reference.
Change to a directory where we can work with files.
C:\>cd c:\temp
Create a file that we can copy later.
The contents are not important. (
Press Ctrl+Z to close the file )
C:\temp>copy con a.txt
hello
^Z
1 file(s) copied.
Create a few files that we can work with.
C:\temp>for %i in ( 1 2 3 4 5 6 7 8 9 ) do copy a.txt myFile%i.txt
List the files and find a pattern to match. Your times and dates will be different, but
think about the pattern not the exact command.
C:\temp>dir
Volume in drive C is OS
Volume Serial Number is E828-D083
Directory of C:\temp
11/11/2015 10:18 PM
<DIR> .
11/11/2015 10:18 PM
<DIR> ..
11/11/2015 10:17 PM 7 a.txt
11/11/2015 10:17 PM 7 myFile1.txt
11/11/2015 10:17 PM 7 myFile2.txt
11/11/2015 10:17 PM 7 myFile3.txt
11/11/2015 10:17 PM 7 myFile4.txt
11/11/2015 10:17 PM 7 myFile5.txt
11/11/2015 10:17 PM 7 myFile6.txt
11/11/2015 10:17 PM 7 myFile7.txt
11/11/2015 10:17 PM 7 myFile8.txt
11/11/2015 10:17 PM 7 myFile9.txt
16 File(s) 992,802 bytes
3 Dir(s) 415,352,266,752 bytes free
You can learn about the options we can use with FINDSTR utility by typing
findstr /? To learn about the regular expression syntax as we pipe the DIR
output as an input to FINDSTR. In this
example, we’ll match any line that contains two digitsfollowed by a forward
slash.
C:\temp>dir|findstr /r [0-9][0-9]/
11/11/2015 10:18 PM
<DIR> .
11/11/2015 10:18 PM
<DIR> ..
11/11/2015 10:17 PM 7 a.txt
11/11/2015 10:17 PM 7 myFile1.txt
11/11/2015 10:17 PM
7 myFile2.txt
11/11/2015 10:17 PM 7 myFile3.txt
11/11/2015 10:17 PM 7 myFile4.txt
11/11/2015 10:17 PM 7 myFile5.txt
11/11/2015 10:17 PM 7 myFile6.txt
11/11/2015 10:17 PM
7 myFile7.txt
11/11/2015 10:17 PM 7 myFile8.txt
11/11/2015 10:17 PM 7 myFile9.txt
In this example, we’ll match an line that contains 11 followed by a
forward slash.
Note: the first two digits will be different in your case, just type what
you see as the first two digits after running dir in the current directory
C:\temp>dir|findstr /r 11/
11/11/2015 10:18 PM
<DIR> .
11/11/2015 10:18 PM
<DIR> ..
11/11/2015 10:17 PM 7 a.txt
11/11/2015 10:17 PM 7 myFile1.txt
11/11/2015 10:17 PM 7 myFile2.txt
11/11/2015 10:17 PM 7 myFile3.txt
11/11/2015 10:17 PM 7 myFile4.txt
11/11/2015 10:17 PM 7 myFile5.txt
11/11/2015 10:17 PM 7 myFile6.txt
11/11/2015 10:17 PM 7 myFile7.txt
11/11/2015 10:17 PM 7 myFile8.txt
11/11/2015 10:17 PM 7 myFile9.txt
In this example, we’ll match any line that starts with 11 followed by a
forward slash.
Note: the first two digits will be different in your case, just type what
you see as the first two digits after running dir in the current directory
C:\temp>dir|findstr /r ^11/
11/11/2015 10:18 PM
<DIR> .
11/11/2015 10:18 PM
<DIR> ..
11/11/2015 10:17 PM 7 a.txt
11/11/2015 10:17 PM 7 myFile1.txt
11/11/2015 10:17 PM 7 myFile2.txt
11/11/2015 10:17 PM 7 myFile3.txt
11/11/2015 10:17 PM 7 myFile4.txt
11/11/2015 10:17 PM 7 myFile5.txt
11/11/2015 10:17 PM 7 myFile6.txt
11/11/2015 10:17 PM 7 myFile7.txt
11/11/2015 10:17 PM 7 myFile8.txt
11/11/2015 10:17 PM 7 myFile9.txt
In this example, we’ll match any line that starts with 11 followed by a
forward slash and any string.
Note: the first two digits will be different in your case, just type what
you see as the first two digits after running dir in the current directory
C:\temp>dir|findstr /r ^11/.*
11/11/2015 10:18 PM
<DIR> .
11/11/2015 10:18 PM
<DIR> ..
11/11/2015 10:17 PM 7 a.txt
11/11/2015 10:17 PM 7 myFile1.txt
11/11/2015 10:17 PM 7 myFile2.txt
11/11/2015 10:17 PM 7 myFile3.txt
11/11/2015 10:17 PM 7 myFile4.txt
11/11/2015 10:17 PM 7 myFile5.txt
11/11/2015 10:17 PM 7 myFile6.txt
11/11/2015 10:17 PM 7 myFile7.txt
11/11/2015 10:17 PM 7 myFile8.txt
11/11/2015 10:17 PM 7 myFile9.txt
In this example, we’ll match any line that starts with 11 followed by forward
slash, any number of characters followed by the string File. Notice how the directory entries and file
a.txt does not show up anymore. Note: the first two digits will be
different in your case, just type what you see as the first two digits after running
dir in the current directory
C:\temp>dir|findstr /r ^11/.*File
11/11/2015 10:17 PM 7 myFile1.txt
11/11/2015 10:17 PM 7 myFile2.txt
11/11/2015 10:17 PM 7 myFile3.txt
11/11/2015 10:17 PM 7 myFile4.txt
11/11/2015 10:17 PM 7 myFile5.txt
11/11/2015 10:17 PM 7 myFile6.txt
11/11/2015 10:17 PM 7 myFile7.txt
11/11/2015 10:17 PM 7 myFile8.txt
11/11/2015 10:17 PM 7 myFile9.txt
In this example, we’ll match any line that starts with 11 followed by a
forward slash, any number of characters, and ends with the letter t. Note: the first two digits will be different in your case, just type what
you see as the first two digits after running dir in the current directory
C:\temp>dir|findstr /r ^11/.*t$
11/11/2015 10:17 PM 7 a.txt
11/11/2015 10:17 PM 7 myFile1.txt
11/11/2015 10:17 PM 7 myFile2.txt
11/11/2015 10:17 PM
7 myFile3.txt
11/11/2015 10:17 PM 7 myFile4.txt
11/11/2015 10:17 PM 7 myFile5.txt
11/11/2015 10:17 PM 7 myFile6.txt
11/11/2015 10:17 PM 7 myFile7.txt
11/11/2015 10:17 PM
7 myFile8.txt
11/11/2015 10:17 PM 7 myFile9.txt
In this example, we’ll match any lines that starts with 11 followed by a
forward slash, followed by any number of characters followed by a digit.txt.
Notice the meaning of the period is escaped by the forward slash to take it as
a literal character. Note: the first two digits will be
different in your case, just type what you see as the first two digits after
running dir in the current directory
C:\temp>dir|findstr /r ^11/.*[1-9]\.txt
11/11/2015 10:17 PM 7 myFile1.txt
11/11/2015 10:17 PM 7 myFile2.txt
11/11/2015 10:17 PM 7 myFile3.txt
11/11/2015 10:17 PM 7 myFile4.txt
11/11/2015 10:17 PM 7 myFile5.txt
11/11/2015 10:17 PM 7 myFile6.txt
11/11/2015 10:17 PM 7 myFile7.txt
11/11/2015 10:17 PM 7 myFile8.txt
11/11/2015 10:17 PM 7 myFile9.txt
Let’s change one of the files to a unique sequence of characters.
C:\temp>ren myFile4.txt myFileZ.tat
In this example, we’ll match any lines that start with 11 followed by a
forward slash, any number of characters followed by a digit, followed by any
number of characters, and the line will end with a letter t.
Note: the first two digits will be different in your case, just type what
you see as the first two digits after running dir in the current directory
C:\temp>dir|findstr /r ^11/.*[1-9].*t$
11/11/2015 10:17 PM 7 a.txt
11/11/2015 10:17 PM 7 myFile1.txt
11/11/2015 10:17 PM 7 myFile2.txt
11/11/2015 10:17 PM 7 myFile3.txt
11/11/2015 10:17 PM 7 myFile5.txt
11/11/2015 10:17 PM 7 myFile6.txt
11/11/2015 10:17 PM 7 myFile7.txt
11/11/2015 10:17 PM 7 myFile8.txt
11/11/2015 10:17 PM 7 myFile9.txt
11/11/2015 10:17 PM 7 myFileZ.tat
In this example, we’ll match any line that matches all the lines as
before, but now we’ll end the line with at
and not just a t. Note: the first two digits will be different in your case, just type what
you see as the first two digits after running dir in the current directory
C:\temp>dir|findstr /r ^11/.*[1-9].*at$
11/11/2015 10:17 PM 7 myFileZ.tat
In this example, we’ll match our uniquely named file ignoring everything
else.
Note: the first two digits will be different in your case, just type what
you see as the first two digits after running dir in the current directory
C:\temp>dir|findstr /r ^11/.*[1-9]\.[a-zA-Z].*at$
Oops, something went wrong. Why
did the previous search pattern not match our file name, but the following
pattern will?
Note: the first two digits will be different in your case, just type what
you see as the first two digits after running dir in the current directory
C:\temp>dir|findstr /r ^11/.*[a-zA-Z].*at$
11/11/2015 10:17 PM 7 myFileZ.tat
Write your answer here ( or the paper handed to you by your instructor ) and
hand it back to your instructor.
Now, create a new file named myfile.txt using notepad and add the text
(214)234-4567 to the file. Find the text
in all text files using FINDSTR utility.
C:\temp>findstr /r ^[(\[][0-9][0-9] *.txt
myfile.txt:(214)234-4567
Example Java code using regular expression to match pattern.
import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
/**
* This program tests a
customer number to determine
* whether it is in the
proper format.
*/
public class SuspectNumbers {
public static void main(String[] args)
throws IOException {
File inputFile = new File("zip.txt");
Scanner input=new
Scanner(inputFile);
boolean isEqual,isEqual2;
Matcher m;
String value;
String match="75201";
Pattern p = Pattern.compile("^75201");
while(input.hasNext()){
value=input.nextLine();
if(value.startsWith(match))
System.out.println(value);
m =
p.matcher(value);
isEqual=Pattern.matches("^75201.*",
value);
if(isEqual)
System.out.println("Regular Expression
matched: "+value);
isEqual2=m.find();
if(isEqual2)
System.out.println("Regular Expression 2
matched: "+value);
if(Pattern.matches(" [0-9]{3}-[0-9]{2}-[0-9]{4} ",
value))
System.out.println("Social Security
number matched: "+value);
if(Pattern.matches("[\\[\\(][0-9]{3}[\\]\\)][0-9]{3}-[0-9]{4}.*", value))
System.out.println("Phone number
matched: "+value);
if(Pattern.matches("[\\[\\(][0-9]{3}[\\]\\)][0-9]{3}-[0-9]{3}[a-zA-Z].*", value))
System.out.println("Fake phone number
matched: "+value);
}
}
}
Based on the code above, what sample value would you place in the input
file to match fake phone numbers?
Appendix A: How to use regular expressions in MS Word
Save a Word
Document with a phone number in it in a format (214)345-3456. Click on
Find->Advanced Find …., check Use wildcards and type the following into the
“Find what:” <[0-9]{3}\)[0-9]{3}-[0-9]{4}>
Appendix A: How to use regular expressions in Notepad++
Save a
Document with a phone number in it like [124]123-3456.
Click on Search, check “Regular expression” and type the following into the
“Find what:” \[[0-9]{3}\][0-9]{3}-[0-9]{4}>
You can generate simple text files with patterns that you can search for and match with regular expressions.
75201 Dallas TX
75202 Dallas TX
75203 Dallas TX
75204 Dallas TX
75205 Dallas TX
75206 Dallas TX
75378 Dallas TX
(214)234-456I // notice 456I is not ending with a one, it is the capital letter 'i'
75379 Dallas TX
(214)234-456I // notice 456I is not ending with a one, it is the capital letter 'i'
75379 Dallas TX
75047 Garland TX
75048 Garland TX
75048 Garland TX
[123]345_4567 //phone numbers can be in different format
75049 Garland TX
75080 Richardson TX
75081 Richardson TX
75082 Richardson TX
75049 Garland TX
75080 Richardson TX
75081 Richardson TX
75082 Richardson TX
76016 Arlington TX
76017 Arlington TX
234-05-3456 //social security numbers should be easy to match
76018 Arlington TX
76019 Arlington TX
76017 Arlington TX
234-05-3456 //social security numbers should be easy to match
76018 Arlington TX
76019 Arlington TX
No comments:
Post a Comment