Extended Regular Expressions

Because extended regular expressions (ERE) are complex, this document describes only the basics. For further information, we highly recommend the book Mastering Regular Expressions by J. Friedl, published by O'Reilly and Associates (ISBN 1-56592-257-3).

What are "Regular Expressions"?

A regular expression is a pattern, written with ordinary and special characters, that is used to match text. Regular expressions are generally classified into two types: Basic Regular Expressions (BRE) and Extended Regular Expressions (ERE).

Basic Regular Expressions (BRE)

Basic Regular Expressions (BRE) are the simpler form of regular expressions. Although they are not implemented in Fern, they provide a basis for understanding the Extended Regular Expressions used by Fern.

BRE use three different types of "wildcard" characters. The question mark "?" matches any single character. The asterisk "*" matches zero or more characters. Square brackets match a single character from a set or range of characters. (NOTE: Technically, there are a few other special characters, but I will not be describing them here.)

People are most familiar with the use of BRE in directory listings. For example, to list a single file named "autoexec.bat", you would type:

    dir autoexec.bat

To list all "autoexec" files:

    dir autoexec.*

And to list any file beginning with "auto":

    dir auto*

In these examples, the "*" expands to match "any string of characters". Just typing "dir" is the same as typing "dir *".

The question mark matches a single character. For example, autoexec.ba? matches "autoexec.bat" and "autoexec.bak". If you want to match an exact number of characters, you can repeat the "?". For example, "auto????.bat" requires exactly four characters after "auto", so it matches "autoexec.bat" but not "auto123.bat".

Square brackets are used to denote ranges of characters. "[abcde]" will match a single character that is one of "a", "b", "c", "d", or "e". This can also be written "[a-e]". To list all files that begin with a number, you can use:

    dir [0-9]*

The range matches the first character (in this case, it must be a number) and the "*" matches all other characters in the file name.

Ranges can also be negated. To match all strings that do not begin with a number, you can use "[^0-9]*". The initial "^" in the range means "any character not in the range".
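
The wildcard rules described in this section can be tried out in Python, whose standard fnmatch module implements the same style of pattern. This is only a sketch for experimenting; it is not how Fern or the "dir" command works, and the file names are made up for the example. One difference to be aware of: fnmatch writes a negated range as "[!0-9]" rather than "[^0-9]".

```python
from fnmatch import fnmatchcase

# Hypothetical file names, used only for illustration.
FILES = ["autoexec.bat", "autoexec.bak", "auto123.bat", "config.sys", "9lives.txt"]

def listing(pattern):
    """Return the names in FILES that match a wildcard pattern."""
    return [name for name in FILES if fnmatchcase(name, pattern)]

print(listing("autoexec.*"))    # "*" matches any string of characters
print(listing("autoexec.ba?"))  # "?" matches exactly one character
print(listing("auto????.bat"))  # four "?" match exactly four characters
print(listing("[0-9]*"))        # names that begin with a digit
print(listing("[!0-9]*"))       # names that do not begin with a digit
```

The whole name must match the pattern, which is why "auto????.bat" rejects "auto123.bat": only three characters sit between "auto" and ".bat".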

Extended Regular Expressions

BRE have difficulty expressing some complex patterns. For example, a single BRE cannot match "all strings that begin with 'car' or 'rat'." Extended Regular Expressions (ERE) are used to define very complex patterns for matching.

ERE definitions contain characters for matching and counters to define the number of matches. The counter always comes after the character(s) that match.

Matching purpose                      BRE                ERE
single character                      ? (question mark)  . (period)
range of characters                   [...] (brackets)   [...] (brackets)
zero or more characters               * (asterisk)       .*
one or more characters                ?*                 .+
5 characters                          ?????              .{5}
5 to 8 characters                     4 expressions (a)  .{5,8}
any string containing "car"           *car*              (.*)car(.*)
any string containing "car" or "rat"  2 expressions (b)  (.*)(car|rat)(.*)

Notes on the BRE column:
(a) Requires 4 expressions: ?????, ??????, ???????, ????????
(b) Requires two expressions: *car* and *rat*

Notes on the ERE column:
.* matches any character, 0 or more times.
.+ matches any character, 1 or more times.
.{5} matches any character, 5 times; the {...} defines the count.
.{5,8} matches any character, at least 5 times and at most 8 times; the {low,high} defines the count range.
In (.*)car(.*), the parentheses keep characters with their counters.
In (.*)(car|rat)(.*), the "|" means "or".
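
The ERE column can be checked with a short Python sketch. Python's re module is used here as a stand-in and is an assumption of this example; it is not Fern's own engine, although it accepts the same counters, parentheses, and "|" shown above. The helper name "matches" and the sample strings are made up for illustration.

```python
import re

def matches(pattern, text):
    """True if the whole of text matches the ERE-style pattern."""
    return re.fullmatch(pattern, text) is not None

# The counter always follows the thing it counts.
print(matches(r".{5}", "abcde"))        # exactly 5 characters
print(matches(r".{5,8}", "abcdefghi"))  # 9 characters is more than 8
print(matches(r".+", ""))               # "+" needs at least one character
print(matches(r".*", ""))               # "*" accepts zero characters

# Parentheses group characters; "|" means "or".
CAR_OR_RAT = r"(.*)(car|rat)(.*)"
print(matches(CAR_OR_RAT, "scarf"))
print(matches(CAR_OR_RAT, "pirate"))
print(matches(CAR_OR_RAT, "dog"))
```

The sketch uses re.fullmatch so that, as with the wildcard patterns, the pattern has to account for the entire string.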

In addition, most ERE systems (such as Fern's, or the ones used in Perl and grep) define other characters for "shorthand" notations:

Match purpose                                              Longhand notation  Shorthand notation
whitespace                                                 [ \f\n\r\t\v]      \s or [[:space:]]
non-whitespace                                             [^ \f\n\r\t\v]     \S
spaces or tabs (not newlines or vertical tabs)             [ \t]              [[:blank:]]
digit                                                      [0-9]              \d or [[:digit:]]
hexadecimal digits                                         [0-9a-fA-F]        [[:xdigit:]]
non-digit                                                  [^0-9]             \D
word character                                             [a-zA-Z0-9_]       \w
non-word character                                         [^a-zA-Z0-9_]      \W
letters                                                    [a-zA-Z]           [[:alpha:]]
letters or numbers                                         [a-zA-Z0-9]        [[:alnum:]]
lowercase letters                                          [a-z]              [[:lower:]]
uppercase letters                                          [A-Z]              [[:upper:]]
non-blank characters                                       (none)             [[:graph:]]
printable characters (like non-blank but includes spaces)  (none)             [[:print:]]
control characters                                         (none)             [[:cntrl:]]
punctuation characters                                     (none)             [[:punct:]]
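
The backslash shorthands can be compared against their longhand forms with a small Python sketch. Two assumptions apply: Python's re module stands in for Fern's engine, and it supports the backslash shorthands but not the [[:name:]] forms, so only the former are tested. The re.ASCII flag limits \s, \d, and \w to ASCII so that they correspond to the longhand ranges listed above. The helper name "agree_on_ascii" is made up for this example.

```python
import re

# Shorthand -> longhand pairs taken from the table above.
PAIRS = {
    r"\s": r"[ \f\n\r\t\v]",
    r"\S": r"[^ \f\n\r\t\v]",
    r"\d": r"[0-9]",
    r"\D": r"[^0-9]",
    r"\w": r"[a-zA-Z0-9_]",
    r"\W": r"[^a-zA-Z0-9_]",
}

def agree_on_ascii(shorthand, longhand):
    """True if both patterns accept exactly the same ASCII characters."""
    short_re = re.compile(shorthand, re.ASCII)
    long_re = re.compile(longhand, re.ASCII)
    return all(
        bool(short_re.fullmatch(chr(code))) == bool(long_re.fullmatch(chr(code)))
        for code in range(128)
    )

for shorthand, longhand in PAIRS.items():
    print(shorthand, longhand, agree_on_ascii(shorthand, longhand))
```

Without re.ASCII, Python's \d and \w also accept non-ASCII digits and letters, so the equivalence holds only under the flag used here.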

A few examples of ERE usage:

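Match a generic "username@hostname":

    \w+@(\w+\.)+\w+

The "\." means "really a period and not the match-any-character". Each "\w+\." matches one hostname component and the period after it.

Match a generic "username@hostname" where the hostname is no more than 64 characters and the username is up to 8 characters:

    \w{1,8}@(\w|\.){1,64}

...and there are no two periods next to each other in the hostname and it doesn't end with a period:

    \w{1,8}@(\w{1,64}|((\.\w|\w\w|\w){0,32}))

Here a period is only ever matched together with the word character that follows it, and each of the up to 32 repeated units is at most two characters long.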
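
A username@hostname pattern of this kind can be exercised in Python as a rough check. Everything in the sketch is an assumption made for illustration: Python's re module stands in for Fern's engine, the two patterns are written so that each hostname component may be longer than one character, and the addresses are invented.

```python
import re

# Assumed patterns for this sketch (not taken from Fern itself).
# GENERIC: username, "@", one or more "component." groups, final component.
GENERIC = re.compile(r"\w+@(\w+\.)+\w+")
# LIMITED: username of 1 to 8 characters, hostname of 1 to 64 word
# characters or periods.
LIMITED = re.compile(r"\w{1,8}@(\w|\.){1,64}")

def is_match(regex, text):
    """True if the whole of text matches the compiled pattern."""
    return regex.fullmatch(text) is not None

print(is_match(GENERIC, "user@example.com"))   # ordinary address
print(is_match(GENERIC, "user@a..b"))          # two adjacent periods
print(is_match(LIMITED, "user@host.example"))  # within both length limits
print(is_match(LIMITED, "toolongname@host"))   # username longer than 8
print(is_match(LIMITED, "user@" + "a" * 65))   # hostname longer than 64
```

Neither pattern validates real e-mail addresses; they only illustrate how counters and "\." combine.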


Document revision: 18 June 2000 for Fern 2.40
Copyright 1996-2000 N.A. Krawetz
Modification, republication, and redistribution of this document is strictly prohibited. All rights reserved.