Regular expression

From Conservapedia

Jump to: navigation, search

A regular expression is particular instance of a non-deterministic finite state automaton. Regular expressions are a type-3 grammar in the Chomsky hierarchy of language.

Contents

Overview

A regular expression is read from left to right and is processed one piece at a time. Certain characters have special meanings within the description of a regular language:

  • '*' - the previous construction is matched 0 or more times
  • '+' - the previous construction is matched 1 or more times
  • '?' - the previous construction is matched 0 or 1 times
  • '(...)' - the contents between the parentheses is considered a single construction
  • '...|...' - the construction on the left of the '|' may be matched or the construction on the right of the '|' may be matched.
  • '[...]' - match one of the characters contained within the square braces.
  • '.' - match any single character
  • '^' - beginning of a string
  • '$' - end of a string

There also exists a wide range of special character classes distinguished with a backslash (a small list):

  • '\\' - a literal backslash
  • '\.' - a literal period
  • '\s' - any whitespace character
  • '\b' - a word boundary
  • '\d' - a digit

Regular expressions have also been extended by many languages, some of which extend them to the point where they are able to match a wider range of languages than is specified by a regular language.

Examples

  • /b[ae]t/ - matches 'bat', 'bet', 'batik', 'abet'
  • /Mrs?\. Smith/ - matches 'Mr. Smith', 'Mrs. Smith'
  • /^a*b*$/ - matches 'a', 'ab', 'b', 'aaaab', 'abbbbb', 'aaabbb',
  • /[1-9][0-9]*(\.0|([1-9][0-9]*))+/ - matches version numbers: 1, 1.0, 2.3, 103.4, 42.5.6.8.9, 7.0.8.9

Limitations of regular expressions

A regular expression is not able to count. This is because there is a finite number of states. Consider the language that is specified by anban. Examples of this language include b, aba, aabaa, aaabaaa, etc... A regular expression - being a finite state automaton itself - has a finite number of states that it can be in. If there a point at which the state loops back on itself it is no longer able to match that language.

See also

Further reading

Personal tools