Using Regular Expressions
-------------------------

A regular expression is a string of text in which symbolic characters describe
the pattern the user wants to find. Like a mathematical expression, it is built
up from simple elements combined with operators. Regular expressions are built
from the following elements:

  .           Matches any single character.
  [ ]         Matches any one of the characters in the brackets, any one of a
              range of characters separated by a hyphen '-', or a character
              class operator (see below).
  [^ ]        Matches any single character except those listed after the
              caret '^'.
  ^           Anchors the match to the start of a line.
  $           Anchors the match to the end of a line.
  ( )         Encloses a subexpression.
  *           Matches 0 or more of what precedes.
  ?           Matches 0 or 1 of what precedes.
  +           Matches 1 or more of what precedes.
  {count}     Matches exactly count of what precedes.
  {min,}      Matches at least min of what precedes.
  {min,max}   Matches between min and max (inclusive) of what precedes.
  |           Matches either what is to the left or what is to the right.

To match one of these special characters literally, precede it with a
backslash:

  \.          Matches '.'.
  \[          Matches '['.
  \]          Matches ']'.
  \^          Matches '^'.
  \$          Matches '$'.
  \(          Matches '('.
  \)          Matches ')'.
  \*          Matches '*'.
  \?          Matches '?'.
  \+          Matches '+'.
  \{          Matches '{'.
  \}          Matches '}'.
  \|          Matches '|'.
  \\          Matches '\'.

Character Class Operators
-------------------------

These operators are used inside brackets (see [ ] above).

  [:alnum:]   Alphanumeric characters (letters and digits).
  [:alpha:]   Alphabetic characters.
  [:blank:]   Space and tab.
  [:cntrl:]   Control characters.
  [:digit:]   Decimal digits.
  [:graph:]   All printable characters except space.
  [:lower:]   Lowercase letters.
  [:print:]   Printable characters, including space.
  [:punct:]   Punctuation characters.
  [:space:]   Whitespace characters (space, tab, newline, and so on).
  [:upper:]   Uppercase letters.
  [:xdigit:]  Hexadecimal digits.

Examples
--------

  abcde       Matches the characters 'a', 'b', 'c', 'd', and 'e' in
              contiguous order.
  d.n         Matches three characters starting with 'd' and ending with 'n',
              such as 'dan', 'den', or 'd9n'.
  [abcde]     Matches any one of the characters 'a', 'b', 'c', 'd', or 'e'.
  [a-e]       Matches any one of the characters 'a', 'b', 'c', 'd', or 'e'.
  [A-Za-z]    Matches any single letter.
  re[aei]d    Matches 'read', 'reed', or 'reid'.
  [^a-e]      Matches any single character except 'a', 'b', 'c', 'd', or 'e'.
  h[^u]t      Matches three characters starting with 'h' and ending with 't',
              such as 'hat' or 'hit', but not 'hut'.
  ^start      Matches 'start' only at the beginning of a line.
  end$        Matches 'end' only at the end of a line.
  ^$          Matches an empty line (start of line, nothing, end of line).
  ^[0-9]$     Matches a line consisting of a single digit.
  ho*p        Matches 'hp', 'hop', 'hoop', and so on.
  ho?p        Matches 'hp' and 'hop', but not 'hoop'.
  ho+p        Matches 'hop', 'hoop', and so on, but not 'hp'.
  [0-9]+      Matches an unsigned integer (one or more digits).
  ho{2}p      Matches 'hoop', but not 'hop' or 'hooop'.
  ho{1,}p     Matches 'hop', 'hoop', and so on, but not 'hp' (the same as
              ho+p).
  ho{1,2}p    Matches 'hop' and 'hoop', but not 'hp' or 'hooop'.
  [2-9][0-9][0-9]-[0-9]{4}
              Matches a seven-digit North American local phone number: a
              digit from 2 to 9, two more digits, a hyphen, and four digits.
  hop|hoop    Matches 'hop' or 'hoop'.
  (Sam|Mary|Bill) (Smith|Jones)
              Matches any one of the six names 'Sam Smith', 'Sam Jones',
              'Mary Smith', 'Mary Jones', 'Bill Smith', or 'Bill Jones'.
  \[\]\\      Matches the characters '[', ']', and '\' in contiguous order.

Using Replacement Expressions
-----------------------------

The following escape sequences are defined in both regular expressions and
replacement expressions:

  \n          The newline character.
  \t          The tab character.
  \\          The backslash character.

The following elements refer to subexpressions enclosed by '(' and ')'. In a
regular expression they match the text that the subexpression matched; in a
replacement expression they insert that text. Subexpressions are numbered from
left to right:

  \0          The text matched by the entire expression.
  \1          The 1st subexpression.
  \2          The 2nd subexpression.
  \3          The 3rd subexpression.
  \4          The 4th subexpression.
  \5          The 5th subexpression.
  \6          The 6th subexpression.
  \7          The 7th subexpression.
  \8          The 8th subexpression.
  \9          The 9th subexpression.
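The numbering rule can be tried out in Python's standard re module, which numbers subexpressions the same way. This is a minimal sketch assuming Python 3; it illustrates the rule and is not the syntax reference for the program described here. Note one difference: Python spells the entire match as group(0) in code and \g<0> in a replacement string, because it reads a bare \0 as an octal escape.

```python
import re

# Subexpressions are numbered by their opening '(' from left to right;
# group 0 is the whole match. Here group 1 encloses groups 2 and 3.
m = re.search(r"((Sam|Mary|Bill) (Smith|Jones))!", "Hello, Mary Jones!")
assert m is not None
print(m.group(0))  # Mary Jones!
print(m.group(1))  # Mary Jones
print(m.group(2))  # Mary
print(m.group(3))  # Jones

# In a Python replacement string, \1 to \9 behave as described above,
# but the whole match is written \g<0>: Python reads a bare \0 as an
# octal escape for the NUL character.
swapped = re.sub(r"(Sam|Mary|Bill) (Smith|Jones)", r"\2, \1 (\g<0>)",
                 "Bill Smith")
print(swapped)  # Smith, Bill (Bill Smith)
```

Because the outer subexpression's '(' comes first, it takes number 1 even though it contains the other two.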
Examples
--------

The regular expression:

  (word) \1 \1

matches:

  word word word

The regular expression:

  (first) (second) (third)

combined with the replacement expression:

  \3 \2 \1

reverses the order of the words, so that:

  first second third

becomes:

  third second first
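Both examples can be reproduced with Python's standard re module, assuming its backreference syntax (\1 in patterns and in re.sub replacement strings) as a stand-in for this program's. The last call goes beyond the examples above: it uses Python's \w+ shorthand (one or more letters, digits, or underscores) to show that the same replacement reverses any three words.

```python
import re

# \1 in the pattern must repeat exactly the text that subexpression 1
# matched, so this pattern needs "word" three times in a row.
repeat = re.compile(r"(word) \1 \1")
print(repeat.search("word word word") is not None)  # True
print(repeat.search("word word") is not None)       # False

# \3 \2 \1 in the replacement writes the subexpressions back in reverse.
reversed_words = re.sub(r"(first) (second) (third)", r"\3 \2 \1",
                        "first second third")
print(reversed_words)  # third second first

# The same replacement with \w+ reverses any three space-separated words.
general = re.sub(r"(\w+) (\w+) (\w+)", r"\3 \2 \1", "one two three")
print(general)  # three two one
```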