The Chomsky hierarchy, sometimes known as the Chomsky-Schützenberger hierarchy, is a hierarchy of formal grammatical systems for describing various classes of languages; the hierarchy can apply to both human and computer languages. This hierarchy was first published in 1956 by the linguist Chomsky.  These grammatical systems were referred to by Chomsky as transformational grammars. Other linguists have since developed a variety of similar grammars; together these are known as generative grammars.
The basic idea of a generative grammar is to provide a set of rules which are capable of generating all possible grammatical sentences in some language; the rules must constrain the generation so that only grammatical sentences are generated. The rules can also be used in reverse to parse a sentence and check if it is grammatical.
In the late 1950s, John Backus and Peter Naur developed a notation for describing programming languages. This notation is known as Backus-Naur Form (or BNF) and turns out to be equivalent to Chomsky's grammar number two (context-free grammar), although it was developed independently of Chomsky's work. Backus-Naur Form was first used in the Algol 60 Report to specify the syntax of Algol 60, and has since been used for specifying the syntax of other programming languages.
- 1 Example of a very simple generative grammar
- 2 Caution
- 3 Formal grammar
- 4 Chomsky hierarchy
- 5 References
Example of a very simple generative grammar
This very simple example can handle only a tiny subset of English and should not be taken seriously as a guide to English grammar or vocabulary. It is based on an example by David Crystal  and is a slight extension of an example on slide 9 of this lecture. 
Sentence -> NounPhrase VerbPhrase # A Sentence consists of a NounPhrase followed by a VerbPhrase NounPhrase -> Determiner Noun # A NounPhrase consists of a Determiner followed by a Noun VerbPhrase -> Verb NounPhrase # A VerbPhrase consists of a Verb followed by a NounPhrase Determiner -> a, the # A Determiner is one of: a or the Noun -> boy, girl, dog # A Noun is one of: boy or girl or dog Verb -> chased, heard, saw # A Verb is one of: chased or heard or saw
Notation used above:
- The symbol "->" means "can be replaced by".
- When there are several items on the right hand side separated by commas, just one of these items can be selected.
- The comments preceded by "#" are not part of the generative grammar, but hopefully will help folks understand the example a little better.
By making appropriate substitutions we can generate such sentences as:
- The dog chased a girl.
- The girl heard the dog.
- The boy saw a girl.
The stages in the generation of these examples are as follows:
- -> NounPhrase VerbPhrase
- -> Determiner Noun Verb NounPhrase
- -> Determiner Noun Verb Determiner Noun
By appropriate selection of words to substitute for Determiner, Noun, and Verb we can produce the 3 example sentences above.
The simple grammar defines a few words which can be substituted for Determiner, Noun, and Verb. There are two possibilities for Determiner, three for Noun, and three for Verb, with both Determiner and Noun appearing twice in the expansion. Hence the total number of grammatical sentences which can be generated by this simple grammar is 108 (= 2 * 3 * 3 * 2 * 3). In this particular case the possible words have been carefully chosen so that all of these 108 sentences also actually mean something.
Note that the use of italics and boldface in this example is merely a convenient way of indicating the difference between items which can be expanded further (e.g. concepts such as NounPhrase) and items which can not be expanded further (e.g. words such as girl). In the linguistic literature, items which can be expanded further are referred to as 'non-terminal' items, while items which can't be expanded further are known as 'terminal' items.
The 'comma' notation for alternatives is a form of shorthand. In some formulations this would be shown as several separate rules, so that:
Determiner -> a, the
could be shown as two rules:
Determiner -> a Determiner -> the
While a generative grammar may produce sentences which are grammatically correct, there is no certainty that such sentences will actually mean anything. The classic example of such a meaningless sentence is from Chomsky's book  with the grammatically correct sentence:
- "Colorless green ideas sleep furiously."
- This example illustrates the difference between grammar/syntax and meaning/semantics.
Another problem when analysing a sentence is that many English words have multiple meanings, so that there may be more than one way to interpret a sentence. A simple example is the sentence:
- "Time flies."
- Is this an instruction to check how fast flies are flying, or an indication that time appears to be passing quickly?
- A varied and entertaining selection of such ambiguities has been compiled by Jeff Gray.
Note: this section and subsequent sections are, by necessity, slightly more technical/formal than the earlier parts of this article, though specialist notation and abbreviations have been largely avoided. Make sure that you understand the 'Simple example' above before proceeding further, since references will be made to that example to help explain the technicalities, and variations on that example will be used to illustrate new ideas.
A formal generative grammar has four components (as described in slide 19 of this lecture  ):
- a finite number of non-terminal concepts (e.g. VerbPhrase in the example, or anything else in italics),
- a finite number of terminal items (e.g. girl in the example, or anything else in bold-face).
- a finite set of rules for transforming non-terminal items (e.g. the example has 6 rules).
- a single starting point (e.g. Sentence in the example).
- Terminal items are those which can not be transformed into anything else.
- A list of all the terminal items is effectively the vocabulary from which sentences are built.
- Non-terminal items are those which can be transformed into something else.
- Linguists have developed various classes of grammars by using constraints to govern what sort of rules can be used for any particular grammar.
Changing the rules
An relatively small change to even a single rule can make a large change in the language which can be generated. For example, change the first rule in the simple example to read:
Sentence -> NounPhrase VerbPhrase, NounPhrase VerbPhrase and NounPhrase VerbPhrase
This means that a Sentence is either a NounPhrase followed by a VerbPhrase or a NounPhrase followed by a VerbPhrase followed by and followed by a NounPhrase followed by a VerbPhrase. In addition to the shorter sentences already discussed, this could also generate a sentence such as: "A dog saw a girl and the dog chased the girl". The total number of grammatically-correct sentences which can be generated by this modified grammar is now 11,772 (108 + 108*108), though some of these may not be very meaingful.
A significantly different rule change can have an astounding effect. Consider changing the first rule to read:
Sentence -> NounPhrase VerbPhrase, NounPhrase VerbPhrase and Sentence
This can generate a potentially infinite number of sentences (any number of these five-word sentences linked by and). This is an example of what is known as a recursive definition, whereby a Sentence is partially defined in terms of itself. In such a recursive rule there must be at least one transformation option which does not involve Sentence, otherwise the transformation could never stop.
The above change is known as right-recursive; a left-recursive formulation is also possible, and can in this example generate exactly the same sentences. As shown below under 'Context-free grammar', there are situations where the choice of left- or right-recursive formulations can change the meaning.
Sentence -> NounPhrase VerbPhrase, Sentence and NounPhrase VerbPhrase
The distinction between left- and right-recursive grammars becomes important when trying to parse/analyse a sentence; some types of parser work best with left-recursive grammars, while other types work best with right-recursive grammars. This distinction is particularly relevant in connection with compiling or translating a program written in some high-level programming language so that it can be used on a computer.
By imposing different constraints  as to what sort of transformation rules could be used, Chomsky was able to define four strictly nested transformational grammars, which he numbered from 0 to 3. As the grammar number increases, the generated language becomes simpler. As the grammar number decreases, parsing a sentence in that language becomes more difficult. Other linguists have described formal grammars which fit in between the various Chomsky levels.
The following table provides a summary, which is then explained further in the subsequent subsections.
|linguistic name||Rule constraints||Recogniser+||Notes|
|0||recursively enumerable||[anything] -> [anything]||Turing machine||most human languages|
|1||context-sensitive||X A Y -> X [anything] Y||linear bounded automaton||X and Y provide the context in which A can be transformed|
|2||context-free||A -> [anything]||non-deterministic push-down automaton||syntax analysis for most programming languages|
|3||regular|| A -> [nothing]
A -> xyz
A -> xyz B
| finite automaton
or finite state machine
|lexical analysis (combining characters into words/tokens)|
+ The recogniser column indicates the type of mathematical or computer science theoretical model needed to recognise that a sentence belongs to a language, essentially by applying the transformation rules in reverse to work backwards from the actual words in order to finish up with a sentence (in programming languages the starting concept is usually a program). A finite automaton uses only a small fixed amount of memory, operates in time proportional to the length of the input, and always reaches a decision, while a Turing machine may use an infinite amount of memory and time and may fail to reach a decision (may never stop).
In the linguistic literature, [nothing] is often represented by the symbol ε.
The descriptions of the various Chomsky levels which follow are presented in reverse order, so as to work up from a simple case to a rather complicated case.
Level 3: regular grammar
The transformation rules for regular grammars are required to be very simple.
- Only a single non-terminal item can appear on the left-hand side of a rule.
- The right-hand side of any rule must be one of:
- a terminal item.
- a terminal item followed by a non-terminal item.
Note that instead of a terminal item we could have a non-terminal item whose transformation rule immediately resolves to a single terminal item, or to one of several terminal items.
Informally we can define a 'word' as a sequence of one or more Letters. The following regular grammar provides a more formal definition; for simplicity it is restricted to lower-case letters.
Word -> Letter PartWord PartWord -> [nothing] PartWord -> Letter PartWord Letter -> a, b, ... z
Note that PartWord is recursively defined, with the recursion stopping if it is transformed into [nothing].
For one-letter words we expand as:
- Word -> Letter PartWord -> Letter [nothing] -> Letter
For two-letter words we expand as:
- Word -> Letter PartWord -> Letter Letter PartWord -> Letter Letter [nothing] -> Letter Letter
Obviously, words of any length can be constructed, though they may well not actually be words from any language.
Trying to devise transformation rules to produce words which are more English-like is much more difficult - how do you insist on at least one vowel or on no more than 5 consecutive consonants (e.g. as in 'strengths').
Further examples in Regular grammar show how numbers can be defined via a regular grammar, and also show how a hyphenated word could be defined.
One of the limitations of a regular grammar is that it can't cope with nested matching brackets, so that it is impossible to write a regular grammar to describe an expression such as 2 * (3+4) where that grammar is expected to allow brackets nested to any depth. A context-free grammar is needed for this, as described in the next section.
Regular grammars or equivalent are widely used by compilers in the lexical analysis of programming languages.
Level 2: context-free grammar
The transformation rules for context-free grammars don't appear that much different from those for regular grammars.
- Only a single non-terminal item can appear on the left-hand side of a rule.
- The right-hand side of any rule must be one of:
- any combination of terminal items and non-terminal items.
The key difference is that in a regular grammar, any non-empty rule must effectively start with terminal item so that every rule transformation makes some immediate progress, while in a context-free grammar a rule could start with a non-terminal item which itself could be subject to arbitrary further transformations.
The following example shows how context-free grammars can cope with arithmetic expressions involving possibly nested brackets. Also an expression such as "2 + 3 * 4" could be interpreted as either "(2 + 3) * 4" = 20 or as "2 + (3 * 4)" = 14. The latter interpretation is that which is normally used - multiplication "*" is said to have a higher priority than addition "+" - so the grammar must incorporate this priority concept. Furthermore, consecutive operators of the same priority are normally applied from left to right, so that "8 - 2 + 3" is interpreted as "(8 - 2) + 3" = 9, rather than as "8 - (2 + 3)" = 3. This ordering must also be built in to the grammar.
To avoid distracting side-issues, this example only deals with numbers which consist solely of a single digit. The example is a simplified version of the definition of an arithmetic expression given in the Revised Algol 60 Report (Section 3.3.1). 
Expression -> Term # an Expression is either a Term Expression -> Expression AddSubOp Term # or an Expression followed by + or - followed by a Term # Term -> Primary # a Term is either a Primary Term -> Term MulDivOp Primary # or a Term followed by * or / followed by a Primary # Primary -> Number # a Primary is either a Number Primary -> ( Expression ) # or an Expression enclosed in brackets # MulDivOp -> *. / # multiplication and division have the same priority AddSubOp -> +. - # addition and subtraction have the same priority Number -> 0, 1, ... 9 # this simple-minded definition of a Number is enough for this example
- Both Expression and Term are each defined via a left-recursive rule; this ensures that operators of the same priority are applied from left to right.
- The distinction between Expression and Term ensures that multiplication/division are applied before addition/subtraction.
- Expression is indirectly recursive via Term and Primary to allow bracketed sub-expressions to any depth.
Example 1: generate 2 + 3 * 4
Expression -> Expression AddSubOp Term -> Term AddSubOp Term -> Primary AddSubOp Term -> 2 + Term # + can not be applied until Term has been evaluated -> 2 + Term MulDivOp Primary -> 2 + Primary MulDipOp Primary -> 2 + 3 * 4
Example 2: generate 2 * 3 + 4
Expression -> Expression AddSubOp Term -> Term AddSubOp Term -> Term MulDivOp Primary AddSubOp Term -> Primary MulDivOp Primary AddSubOp Term -> 2 * 3 AddSubOp Term # Term to left of AddSubOp is evaluated first -> 2 * 3 AddSubOp Primary -> 2 * 3 + 4
Context-free grammars (or equivalent definitions using Backus-Naur-Form) are widely used both to define the syntax of programming languages and to (semi-automatically) generate parsers for these languages.
A limitation/feature of both regular grammars and context-free grammars is that the left-hand side of any rule is always a single non-terminal item, so any transformation is applied regardless of the context in which that non-terminal item appears. A simple example which depends on context is the use of singular and plural nouns and verbs in English. We say: the man runs or the men run, but neither of the man run nor the men runs. Which form of verb to use depends on the noun, i.e. there is a context dependency.
Level 1: context-sensitive grammar
For context-sensitive grammars the left-hand side of a rule may indicate the context in which that rule applies, so that the same non-terminal item may be transformed in different ways depending on the context in which it appears.
There is only one transformation rule for context-sensitive grammars:
- X [non-terminal] Y -> X [anything] Y
where X and Y determine the context in which the [non-terminal] is transformed. Note that X and Y may each be terminals or non-terminals or [nothing]. When both X and Y are nothing there is no context and this reduces to a Level 2 or Level 3 grammar.
Taylor has provided several examples  and the following, inspired by his third example, is an extension of the original simple example above to handle singulars and plurals:
Sentence -> NounPhrase VerbPhrase # simple sentence structure NounPhrase -> Determiner Noun # simple phrase structure VerbPhrase -> Verb NounPhrase # simple phrase structure # Noun -> SingularNoun, PluralNoun # nouns can be singular or plural # Determiner SingularNoun -> SingularDeterminer SingularNoun # context-sensitive: Determiner PluralNoun -> PluralDeterminer PluralNoun # determiner must agree with noun # SingularNoun Verb -> SingularNoun SingularVerb # context-sensitive: PluralNoun Verb -> PluralNoun PluralVerb # verb must agree with noun # SingularDeterminer -> a, the # singular determiners PluralDeterminer -> the # plural determiners # SingularNoun -> boy, sheep, dog # singular nouns PluralNoun -> boys, sheep, dogs # plural nouns # SingularVerb -> chases, hears, sees # singular verbs PluralVerb -> chase, hear, see # plural verbs
The above grammar can generate sentences such as: a boy chases a dog, the boys chase a dog, etc. Because some words have the same form in both singular and plural, it can also generate meaningful (and grammatically-correct) sentences such as: the sheep sees the dog, the sheep see the dog.
But the above grammar can NOT generate sentences such as: a boys chases the sheep, since this would violate the context-sensitive rules.
It is in fact possible to handle singular/plural issues by using only a Level 2 context-free grammar, but the above rules expressed as a Level 1 context-sensitive grammar are a closer match to ordinary concepts of English grammar (e.g. nouns and verbs must agree in number i.e. singular/plural).
Level 0: recursively enumerable grammar
A recursively enumerable grammar has absolutely no restriction on what sort of transformation rules can be applied; hence it is also known as an unrestricted grammar. Rules take the general form:
- [anything] -> [anything]
The following rules can generate all symbols of the form: abc, aabbcc, aaabbbccc, etc., where there are equal numbers of a's, b's and c's. This example is taken from a lecture.
S -> a B S c S -> a B c B a -> a B B c -> b c B b -> b b
The derivations of the first three items in this infinite sequence are:
- S -> aBc -> abc
- S -> aBSc -> aBaBcc -> aaB Bcc -> aaBbcc -> aabbcc
- S -> aBSc -> aBaBScc -> aBaBaBccc -> aaBBaBccc -> aaBaBBccc -> aaaBBBccc -> aaaBBbccc -> aaaBbbccc -> aaabbbccc
Beyond Level 0
These lecture notes  show that languages exist which can not be described by any set of rules. As is common with logical proofs of this nature, the proof doesn't actually produce or demonstrate such a language.
- Chomsky, Noam (1956). "Three models for the description of language". IRE Transactions on Information Theory 2: 113–124. http://www.chomsky.info/articles/195609--.pdf. Retrieved 2012-04-08.
- Crystal, David (1987). The Cambridge Encyclopedia of Language. Cambridge University Press, page 97. ISBN 0-521-42443-7.
- Lecture notes: Formal Languages. Retrieved on 2012-04-07.
- Chomsky, Noam (1957). Syntactic Structures. The Hague/Paris: Mouton, page 15. ISBN 3-11-017279-8.
- Jeff Gray. Collection of Ambiguous or Inconsistent/Incomplete Statements. Retrieved on 2012-04-10.
- The Chomsky Hierarchy. Retrieved on 2012-04-06.
- Lecture notes: Languages & Grammars. Retrieved on 2012-04-07.
- Revised Report on the Algorithmic Language Algol 60. Retrieved on 2012-04-22.
- Taylor. Context-sensitive Examples. Retrieved on 2012-04-26.
- Grammars, Recursively Enumerable Languages, and Turing Machines. Retrieved on 2012-05-10.
- Recursive and Recursively Enumerable Languages. Retrieved on 2012-05-05.