Regular expression
From Conservapedia
A regular expression is particular instance of a non-deterministic finite state automaton. Regular expressions are a type-3 grammar in the Chomsky hierarchy of language.
Contents |
Overview
A regular expression is read from left to right and is processed one piece at a time. Certain characters have special meanings within the description of a regular language:
- '*' - the previous construction is matched 0 or more times
- '+' - the previous construction is matched 1 or more times
- '?' - the previous construction is matched 0 or 1 times
- '(...)' - the contents between the parentheses is considered a single construction
- '...|...' - the construction on the left of the '|' may be matched or the construction on the right of the '|' may be matched.
- '[...]' - match one of the characters contained within the square braces.
- '.' - match any single character
- '^' - beginning of a string
- '$' - end of a string
There also exists a wide range of special character classes distinguished with a backslash (a small list):
- '\\' - a literal backslash
- '\.' - a literal period
- '\s' - any whitespace character
- '\b' - a word boundary
- '\d' - a digit
Regular expressions have also been extended by many languages, some of which extend them to the point where they are able to match a wider range of languages than is specified by a regular language.
Examples
- /b[ae]t/ - matches 'bat', 'bet', 'batik', 'abet'
- /Mrs?\. Smith/ - matches 'Mr. Smith', 'Mrs. Smith'
- /^a*b*$/ - matches 'a', 'ab', 'b', 'aaaab', 'abbbbb', 'aaabbb',
- /[1-9][0-9]*(\.0|([1-9][0-9]*))+/ - matches version numbers: 1, 1.0, 2.3, 103.4, 42.5.6.8.9, 7.0.8.9
Limitations of regular expressions
A regular expression is not able to count. This is because there is a finite number of states. Consider the language that is specified by anban. Examples of this language include b, aba, aabaa, aaabaaa, etc... A regular expression - being a finite state automaton itself - has a finite number of states that it can be in. If there a point at which the state loops back on itself it is no longer able to match that language.
