BREs use three different types of "wildcard" characters. The question mark "?" matches any single character. The asterisk "*" matches zero or more characters. Square brackets match a single character drawn from a set or range of characters. (NOTE: Technically, there are a few other special characters, but I will not be describing them here.)
People are most familiar with the use of BRE in directory listings. For example, to list a single file named "autoexec.bat", you would type:
The question mark matches a single character. For example, "autoexec.ba?" matches "autoexec.bat" and "autoexec.bak". To match exactly four characters, use four "?" characters in a row. For example, "auto????.bat" matches "autoexec.bat" but not "auto123.bat", which has only three characters between "auto" and ".bat".
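The "?" behaviour described above can be tried out with Python's standard `fnmatch` module, which implements this style of wildcard matching. This is only a small sketch: the file names and the `glob_filter` helper are made up for illustration.

```python
from fnmatch import fnmatchcase

# Hypothetical directory contents, used only for illustration.
names = ["autoexec.bat", "autoexec.bak", "auto123.bat", "config.sys"]

def glob_filter(pattern, candidates):
    """Return the candidates that the wildcard pattern matches in full."""
    return [name for name in candidates if fnmatchcase(name, pattern)]

# One "?" stands for exactly one character.
one_char = glob_filter("autoexec.ba?", names)
print(one_char)    # ['autoexec.bat', 'autoexec.bak']

# Four "?" in a row stand for exactly four characters.
four_chars = glob_filter("auto????.bat", names)
print(four_chars)  # ['autoexec.bat']
```

`fnmatchcase` is used rather than `fnmatch` so that the comparison is case-sensitive on every platform.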
Square brackets are used to denote ranges of characters. "[abcde]" will match a single character that is one of "a", "b", "c", "d", or "e". This can also be written "[a-e]". To list all files that begin with a number, you can use:
Ranges can also be negated. To match all strings that do not begin with a number, you can use "[^0-9]*". The initial "^" in the range means "any character not in the range".
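Bracket ranges and their negation can be sketched the same way with Python's `fnmatch` module. One caveat: `fnmatch` spells negation as "[!0-9]" rather than "[^0-9]", so the sketch uses "!" even though the text above uses "^". The file names are made up for illustration.

```python
from fnmatch import fnmatchcase

# Hypothetical file names, used only for illustration.
names = ["1stfile.txt", "9lives.doc", "autoexec.bat", "zebra.txt"]

# "[0-9]*": first character is a digit, followed by anything.
starts_with_digit = [n for n in names if fnmatchcase(n, "[0-9]*")]
print(starts_with_digit)  # ['1stfile.txt', '9lives.doc']

# "[!0-9]*": first character is anything except a digit.
# (Python's fnmatch uses "!" where the text above uses "^".)
no_leading_digit = [n for n in names if fnmatchcase(n, "[!0-9]*")]
print(no_leading_digit)   # ['autoexec.bat', 'zebra.txt']

# "[a-e]*" is equivalent to "[abcde]*".
a_to_e = [n for n in names if fnmatchcase(n, "[a-e]*")]
print(a_to_e)             # ['autoexec.bat']
```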
ERE definitions contain characters for matching and counters to define the number of matches. The counter always comes after the character(s) that match.
Matching Purpose | BRE | ERE |
---|---|---|
single character | ? (question mark) | . (period) |
range of characters | [...] (brackets) | [...] (brackets) |
zero or more characters | * (asterisk) | .* (match any character, 0 or more times) |
one or more characters | ?* | .+ (match any character, 1 or more times) |
5 characters | ????? | .{5} (match any character, 5 times -- the {...} defines the count.) |
5 to 8 characters | Requires 4 expressions: ?????, ??????, ???????, ???????? | .{5,8} (match any character, at least 5 times and at most 8 times -- the {low,high} defines the count range) |
any string containing "car" | *car* | (.*)car(.*) (the parentheses group characters with their counters) |
any string containing "car" or "rat" | Requires two expressions: *car* and *rat* | (.*)(car|rat)(.*) the "|" means "or" |
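
The ERE column of the table can be checked with Python's `re` module. This is a rough sketch, not a reference implementation: the `full` helper and the sample words are invented for illustration, and `re.fullmatch` is used so that the expression has to cover the whole string, the way a wildcard pattern covers a whole file name.

```python
import re

def full(pattern, text):
    """True if the expression matches the entire string."""
    return re.fullmatch(pattern, text) is not None

# ".*" accepts the empty string; ".+" needs at least one character.
print(full(r".*", ""), full(r".+", ""))  # True False

# ".{5}" is exactly five characters; ".{5,8}" is five to eight.
words = ["cat", "hello", "elephant", "crocodile"]
exactly_five = [w for w in words if full(r".{5}", w)]
five_to_eight = [w for w in words if full(r".{5,8}", w)]
print(exactly_five)   # ['hello']
print(five_to_eight)  # ['hello', 'elephant']

# "|" inside parentheses means "or": the string contains "car" or "rat".
samples = ["scarf", "pirate", "dog"]
car_or_rat = [s for s in samples if full(r"(.*)(car|rat)(.*)", s)]
print(car_or_rat)     # ['scarf', 'pirate']
```
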
In addition, most ERE systems (such as Fern's, or the ones used in Perl and grep) define other characters for "shorthand" notations:
Match purpose | Longhand notation | Shorthand notation |
---|---|---|
whitespace | [ \f\n\r\t\v] | \s [[:space:]] |
non-whitespace | [^ \f\n\r\t\v] | \S |
spaces or tabs (not newlines or vertical tabs) | [ \t] | [[:blank:]] |
digit | [0-9] | \d [[:digit:]] |
hexadecimal digits | [0-9a-fA-F] | [[:xdigit:]] |
non-digit | [^0-9] | \D |
word character | [a-zA-Z0-9_] | \w |
non-word character | [^a-zA-Z0-9_] | \W |
letters | [a-zA-Z] | [[:alpha:]] |
letters or numbers | [a-zA-Z0-9] | [[:alnum:]] |
lowercase letters | [a-z] | [[:lower:]] |
uppercase letters | [A-Z] | [[:upper:]] |
non-blank characters | | [[:graph:]] |
printable characters (like non-blank but includes spaces) | | [[:print:]] |
control characters | | [[:cntrl:]] |
punctuation characters | | [[:punct:]] |
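
As a quick check that the backslash shorthands and the longhand bracket forms agree, here is a small Python sketch. It covers only `\d`, `\w` and `\s`, because Python's `re` module does not implement the POSIX `[[:name:]]` classes. The sample text is invented and deliberately plain ASCII, since Python's `\w`, `\d` and `\s` also accept non-ASCII characters by default.

```python
import re

# Made-up ASCII sample containing spaces, a tab and a newline.
text = "Order 66 shipped\tto bay 7\n"

# Digits: shorthand \d versus longhand [0-9].
digits_short = re.findall(r"\d+", text)
digits_long = re.findall(r"[0-9]+", text)
print(digits_short)  # ['66', '7']

# Word characters: shorthand \w versus longhand [a-zA-Z0-9_].
words_short = re.findall(r"\w+", text)
words_long = re.findall(r"[a-zA-Z0-9_]+", text)
print(words_short)   # ['Order', '66', 'shipped', 'to', 'bay', '7']

# Whitespace: shorthand \s versus longhand [ \f\n\r\t\v].
space_short = re.findall(r"\s", text)
space_long = re.findall(r"[ \f\n\r\t\v]", text)
print(len(space_short))  # 6
```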
A few examples of ERE usage:
Example | ERE |
---|---|
Match a generic "username@hostname" | \w+@(\w|\.)+ (the "\." means "really a period and not the match-any-character") |
Match a generic "username@hostname" where the hostname is no more than 64 characters and the username is up to 8 characters | \w{1,8}@(\w|\.){1,64} |
...and there are no two periods next to each other in the hostname and it doesn't end with a period. | \w{1,8}@(\w{1,64}|((\.\w|\w\w|\w){0,32})) |
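
To see how the last expression in the table behaves, it can be run through Python's `re` module. This is only a sketch: the addresses are invented, the `is_match` helper is hypothetical, and `fullmatch` is an added assumption so that the whole string has to match.

```python
import re

# Last expression from the table above, copied as written.
ADDRESS = re.compile(r"\w{1,8}@(\w{1,64}|((\.\w|\w\w|\w){0,32}))")

def is_match(candidate):
    """True if the whole candidate string matches the expression."""
    return ADDRESS.fullmatch(candidate) is not None

print(is_match("alice@example.com"))         # True
print(is_match("alice@example..com"))        # False: two periods in a row
print(is_match("alice@example.com."))        # False: ends with a period
print(is_match("verylongname@example.com"))  # False: username over 8 characters
print(is_match("alice@.com"))                # True: a leading period is not rejected
```

As the last line shows, the expression as written still accepts a hostname that begins with a period, so it should be read as an illustration of counters and grouping rather than as a complete address validator.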