Changes

Context-free grammar

13,300 bytes added, 22:05, May 15, 2012
provide formal definition and example
A :'''ContextAn equivalent grammar [[Backus-Free''' Grammar Naur Form]] is a tuple (T, V, S, P);T is a set often used for describing the syntax/grammar of atomic symbols.V is a set of variables.S is a special start symbol.P is a set of productions, each production describes how to produce a variable using a sequence of tokens and variablesprogramming languages.''
For example, A '''context-free grammar''' is a Context-Free Grammar formal grammatical system for the real numbers would describing a particular class of language. The basic idea of such a grammar is to provide a set of rules which can be::T = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, used to generate all possible grammatical 'sentences'; these rules must also avoid generating any ungrammatical sentences.}:V = {realA context-numberfree grammar is not powerful enough to describe a natural language such as [[English]], part, digit}:S = real-number:P = { digit → 0::digit → 1::digit → 2::digit → 3::digit → 4::digit → 5::digit → 6::digit → 7::digit → 8::digit → 9::part → digit::part → digit part::real-number → part but it is powerful enough to describe nearly all [[programming language]]s. part:}
A context-free grammar is the level 2 grammar in the [[Chomsky hierarchy]] and was first described in 1956 by the linguist [[Noam Chomsky|Chomsky]].
<ref>{{cite journal
| title = Three models for the description of language
| year = 1956
| last = Chomsky
| first = Noam
| journal = IRE Transactions on Information Theory
| volume = 2
| pages = 113–124
| url = http://www.chomsky.info/articles/195609--.pdf
| accessdate = 2012-04-08
}}</ref>
 
==Example of a very simple context-free grammar==
This very simple example can handle only a tiny subset of English and should not be taken seriously as a guide to English grammar or vocabulary. It is based on an example by David Crystal
<ref>{{cite book
| last = Crystal
| first = David
| title = The Cambridge Encyclopedia of Language
| publisher = Cambridge University Press
| year = 1987
| isbn = 0-521-42443-7
| pages = page 97
}}</ref> and is a slight extension of an example on slide 9 of this lecture.
<ref>{{cite web
| url = http://www.cs.princeton.edu/~rywang/99f126/slides/16th.pdf
| title = Lecture notes: Formal Languages
| accessdate = 2012-04-07}}
</ref>
 
''Sentence'' -> ''NounPhrase'' ''VerbPhrase'' # A ''Sentence'' consists of a ''NounPhrase'' followed by a ''VerbPhrase''
''NounPhrase'' -> ''Determiner'' ''Noun'' # A ''NounPhrase'' consists of a ''Determiner'' followed by a ''Noun''
''VerbPhrase'' -> ''Verb'' ''NounPhrase'' # A ''VerbPhrase'' consists of a ''Verb'' followed by a ''NounPhrase''
''Determiner'' -> '''a''', '''the''' # A ''Determiner'' is one of: '''a''' or '''the'''
''Noun'' -> '''boy''', '''girl''', '''dog''' # A ''Noun'' is one of: '''boy''' or '''girl''' or '''dog'''
''Verb'' -> '''chased''', '''heard''', '''saw''' # A ''Verb'' is one of: '''chased''' or '''heard''' or '''saw'''
 
Notation used above:
:The symbol "->" means "can be replaced by".
:When there are several items on the right hand side separated by commas, just one of these items can be selected.
:The comments preceded by "#" are not part of the generative grammar, but hopefully will help folks understand the example a little better.
 
By making appropriate substitutions we can generate such sentences as:
:The dog chased a girl.
:The girl heard the dog.
:The boy saw a girl.
 
The stages in the generation of these examples are as follows:
:''Sentence''
:-> ''NounPhrase'' ''VerbPhrase''
:-> ''Determiner'' ''Noun'' ''Verb'' ''NounPhrase''
:-> ''Determiner'' ''Noun'' ''Verb'' ''Determiner'' ''Noun''
 
By appropriate selection of words to substitute for ''Determiner'', ''Noun'', and ''Verb'' we can produce the 3 example sentences above.
 
The simple grammar defines a few words which can be substituted for ''Determiner'', ''Noun'', and ''Verb''. There are two possibilities for ''Determiner'', three for ''Noun'', and three for ''Verb'', with both ''Determiner'' and ''Noun'' appearing twice in the expansion. Hence the total number of grammatical sentences which can be generated by this simple grammar is 108 (= 2 * 3 * 3 * 2 * 3). In this particular case the possible words have been carefully chosen so that all of these 108 sentences also actually mean something.
 
Note that the use of ''italics'' and '''boldface''' in this example is merely a convenient way of indicating the difference between items which can be expanded further (e.g. concepts such as ''NounPhrase'') and items which can not be expanded further (e.g. words such as '''girl'''). In the linguistic literature, items which can be expanded further are referred to as 'non-terminal' items, while items which can't be expanded further are known as 'terminal' items.
 
The 'comma' notation for alternatives is a form of shorthand. In some formulations this would be shown as several separate rules, so that:
''Determiner'' -> '''a''', '''the'''
could be shown as two rules:
''Determiner'' -> '''a'''
''Determiner'' -> '''the'''
 
==Caution==
While a generative grammar may produce sentences which are grammatically correct, there is no certainty that such sentences will actually mean anything. The classic example of such a meaningless sentence is from Chomsky's book
<ref>{{cite book
| last = Chomsky
| first = Noam
| title = Syntactic Structures
| year = 1957
| publisher = Mouton
| location = The Hague/Paris
| pages = page 15
| isbn = 3-11-017279-8
}}</ref> with the grammatically correct sentence:
 
:"Colorless green ideas sleep furiously."
 
:This example illustrates the difference between grammar/syntax and meaning/semantics.
 
Another problem when analysing a sentence is that many English words have multiple meanings, so that there may be more than one way to interpret a sentence. A simple example is the sentence:
:"Time flies."
:Is this an instruction to check how fast flies are flying, or an indication that time appears to be passing quickly?
:A varied and entertaining selection of such ambiguities has been compiled by Jeff Gray.<ref>{{Cite web
| title = Collection of Ambiguous or Inconsistent/Incomplete Statements
| url = http://www.gray-area.org/Research/Ambig/
| author = Jeff Gray
| accessdate = 2012-04-10
}}</ref>
 
==Formal grammar==
'''Note''': this section and subsequent sections are, by necessity, slightly more technical/formal than the earlier parts of this article, though specialist notation and abbreviations have been largely avoided. Make sure that you understand the 'Simple example' above before proceeding further, since references will be made to that example to help explain the technicalities, and variations on that example will be used to illustrate new ideas.
 
A formal generative grammar has four components (as described in slide 19 of this lecture
<ref>{{cite web
| url = http://adammikeal.org/courses/compute/Chomsky%20Presentation.pdf
| title = The Chomsky Hierarchy
| accessdate = 2012-04-06
}}</ref>
):
# a finite number of non-terminal concepts (e.g. ''VerbPhrase'' in the example, or anything else in ''italics''),
# a finite number of terminal items (e.g. '''girl''' in the example, or anything else in '''bold-face''').
# a finite set of rules for transforming non-terminal items (e.g. the example has 6 rules).
# a single starting point (e.g. ''Sentence'' in the example).
 
Notes:
:Terminal items are those which can not be transformed into anything else.
:A list of all the terminal items is effectively the vocabulary from which sentences are built.
:Non-terminal items are those which can be transformed into something else.
:Linguists have developed various classes of grammars by using constraints to govern what sort of rules can be used for any particular grammar. The constraints for context-free grammars are detailed in a later section.
 
===Changing the rules===
An relatively small change to even a single rule can make a large change in the language which can be generated. For example, change the first rule in the simple example to read:
''Sentence'' -> ''NounPhrase'' ''VerbPhrase'', ''NounPhrase'' ''VerbPhrase'' '''and''' ''NounPhrase'' ''VerbPhrase''
This means that a ''Sentence'' is either a ''NounPhrase'' followed by a ''VerbPhrase'' or a ''NounPhrase'' followed by a ''VerbPhrase'' followed by '''and''' followed by a ''NounPhrase'' followed by a ''VerbPhrase''. In addition to the shorter sentences already discussed, this could also generate a sentence such as: "A dog saw a girl and the dog chased the girl". The total number of grammatically-correct sentences which can be generated by this modified grammar is now 11,772 (108 + 108*108), though some of these may not be very meaingful.
 
===Towards infinity===
A significantly different rule change can have an astounding effect. Consider changing the first rule to read:
''Sentence'' -> ''NounPhrase'' ''VerbPhrase'', ''NounPhrase'' ''VerbPhrase'' '''and''' ''Sentence''
This can generate a potentially infinite number of sentences (any number of these five-word sentences linked by '''and'''). This is an example of what is known as a [[Recursion|recursive]] definition, whereby a ''Sentence'' is partially defined in terms of itself. In such a recursive rule there must be at least one transformation option which does not involve ''Sentence'', otherwise the transformation could never stop.
 
The above change is known as right-recursive; a left-recursive formulation is also possible, and can in this example generate exactly the same sentences. As shown below under 'Context-free grammar', there are situations where the choice of left- or right-recursive formulations can change the meaning.
''Sentence'' -> ''NounPhrase'' ''VerbPhrase'', ''Sentence'' '''and''' ''NounPhrase'' ''VerbPhrase''
 
The distinction between left- and right-recursive grammars becomes important when trying to parse/analyse a sentence; some types of parser work best with left-recursive grammars, while other types work best with right-recursive grammars. This distinction is particularly relevant in connection with [[compiler|compiling or translating]] a program written in some high-level [[programming language]] so that it can be used on a [[computer]].
 
The use of left-recursive or right-recursive rules may also affect the meaning, as shown in the 'arithmetic expression' example below.
 
===Constraints for a context-free grammar===
The transformation rules for context-free grammars are fairly simple:
*Only a single non-terminal item can appear on the left-hand side of a rule.
*The right-hand side of any rule must be one of:
**[nothing].
**any combination of terminal items and non-terminal items.
 
In the linguistic literature, [nothing] is often represented by the symbol '''ε'''.
 
==Examples==
===Arithmetic expressions===
The following example shows how context-free grammars can cope with arithmetic expressions involving possibly nested brackets. Also an expression such as "2 + 3 * 4" could be interpreted as either "(2 + 3) * 4" = 20 or as "2 + (3 * 4)" = 14. The latter interpretation is that which is normally used - multiplication "*" is said to have a higher priority than addition "+" - so the grammar must incorporate this priority concept. Furthermore, consecutive operators of the same priority are normally applied from left to right, so that "8 - 2 + 3" is interpreted as "(8 - 2) + 3" = 9, rather than as "8 - (2 + 3)" = 3. This ordering must also be built in to the grammar.
 
To avoid distracting side-issues, this example only deals with numbers which consist solely of a single digit. The example is a simplified version of the definition of an arithmetic expression given in the Revised Algol 60 Report (Section 3.3.1).
<ref>{{cite web
| url = http://www.masswerk.at/algol60/report.htm
| title = Revised Report on the Algorithmic Language Algol 60
| accessdate = 2012-04-22
}}</ref>
 
''Expression'' -> ''Term'' # an ''Expression'' is either a ''Term''
''Expression'' -> ''Expression'' ''AddSubOp'' ''Term'' # or an ''Expression'' followed by '''+''' or '''-''' followed by a ''Term''
#
''Term'' -> ''Primary'' # a ''Term'' is either a ''Primary''
''Term'' -> ''Term'' ''MulDivOp'' ''Primary'' # or a ''Term'' followed by '''*''' or '''/''' followed by a ''Primary''
#
''Primary'' -> ''Number'' # a ''Primary'' is either a ''Number''
''Primary'' -> '''(''' ''Expression'' ''')''' # or an ''Expression'' enclosed in brackets
#
''MulDivOp'' -> '''*'''. '''/''' # multiplication and division have the same priority
''AddSubOp'' -> '''+'''. '''-''' # addition and subtraction have the same priority
''Number'' -> '''0''', '''1''', ... '''9''' # this simple-minded definition of a ''Number'' is enough for this example
 
Notes:
* Both ''Expression'' and ''Term'' are each defined via a left-recursive rule; this ensures that operators of the same priority are applied from left to right.
* The distinction between ''Expression'' and ''Term'' ensures that multiplication/division are applied before addition/subtraction.
* ''Expression'' is indirectly recursive via ''Term'' and ''Primary'' to allow bracketed sub-expressions to any depth.
 
Application 1: generate 2 + 3 * 4
 
''Expression'' -> ''Expression'' ''AddSubOp'' ''Term''
-> ''Term'' ''AddSubOp'' ''Term''
-> ''Primary'' ''AddSubOp'' ''Term''
-> '''2''' '''+''' ''Term'' # '''+''' can not be applied until ''Term'' has been evaluated
-> '''2''' '''+''' ''Term'' ''MulDivOp'' ''Primary''
-> '''2''' '''+''' ''Primary'' ''MulDipOp'' ''Primary''
-> '''2''' '''+''' '''3''' '''*''' '''4'''
 
Application 2: generate 2 * 3 + 4
 
''Expression'' -> ''Expression'' ''AddSubOp'' ''Term''
-> ''Term'' ''AddSubOp'' ''Term''
-> ''Term'' ''MulDivOp'' ''Primary'' ''AddSubOp'' ''Term''
-> ''Primary'' ''MulDivOp'' ''Primary'' ''AddSubOp'' ''Term''
-> '''2''' '''*''' '''3''' ''AddSubOp'' ''Term'' # ''Term'' to left of ''AddSubOp'' is evaluated first
-> '''2''' '''*''' '''3''' ''AddSubOp'' ''Primary''
-> '''2''' '''*''' '''3''' '''+''' '''4'''
 
 
==References==
{{reflist}}
Or in '''Backus-Naur notation''':
:<real-numbers> → <part> . <part>
:<part> → <digit> | <digit> <part>
:<digit> → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
[[category:linguistics]]
[[categoryCategory:computer scienceComputer Science]]
SkipCaptcha
265
edits