Читать книгу A Companion to Chomsky - Группа авторов - Страница 55
5.2.2 Restrictions on Grammars
ОглавлениеThere are (at least) two reasons why we might question the usefulness of unrestricted rewriting grammars in our theories of natural language.
An obvious one, perhaps, is that since they can carry out any computational procedure at all, adopting this class of grammars would not constitute a meaningful hypothesis about what is a possible human language. This is an objection on the grounds of the generative capacity of the formalism.
The concern that is more prominent in Chomsky's early discussions, however, involves the notion of a structural description. This concern is not about the range of possible grammars made available by the formalism, but about what is said by the fact that a particular grammar generates a particular string. The idea is that if a grammar generates a sentence , then we would like to be able to identify “a structural description of (with respect to the grammar ) giving certain information which will facilitate and serve as the basis for an account of how is used and understood by speakers of the language whose grammar is ; i.e., which will indicate whether is ambiguous, to what other sentences it is structurally similar, etc.” (Chomsky, 1959, p. 138). Chomsky's chosen way to make this notion precise was to import the existing idea of “immediate constituent analysis” (Harris, 1946, 1954; Wells, 1947) and take trees of the now‐familiar sort to be the desired kind of structural descriptions (Chomsky, 1959, p. 141).10 But there is no obvious way to associate a derivation in an unrestricted rewriting grammar with a tree structure that provides an immediate constituent analysis for the generated string. For example, there is no natural way to assign such a tree structure to the string aaaa on the basis of the derivation shown in Figure 5.1. This is roughly because the grammar contains rules that do not simply replace a single symbol with a string (Chomsky and Miller, 1963, p. 294).
Another computational device that had attracted much attention in the 1950s was the finite‐state automaton (FSA) (e.g. Rabin and Scott, 1959). Chomsky (1956) investigated the generative capacity of FSAs and concluded that it was insufficient for natural language, and showed that a certain sort of phrase structure grammar formalizing the notion of immediate constituent analysis could handle at least some cases that FSAs could not.11
The classes of rewriting grammars investigated in Chomsky 1959 are therefore motivated by the interest in “devices with more generative capacity than finite automata but that are more structured (and, presumably, have less generative capacity) than arbitrary Turing machines” (Chomsky, 1963, p. 360), the idea being that “the intermediate systems are those that assign a phrase structure description to the resulting sentence” (Chomsky, 1959, p. 139). A sequence of three increasingly strict restrictions on rewriting grammars (Chomsky, 1959, p. 142) produces a hierarchy of classes of grammars, the broadest of which corresponds to Turing machines and the narrowest of which corresponds to finite‐state automata.
These three restrictions are shown in Table 5.1. In this table, stands for any nonterminal symbol, and and stand for any (possibly empty) strings of terminal and nonterminal symbols. Notice that each of these restrictions requires that a rule replaces a single symbol with some non‐empty string, in order to allow the possibility of constructing a tree expressing immediate‐constituent structure from any derivation.12
Table 5.1 Three increasing restrictions on rewriting grammars
Restriction | Required form of rules | ||
---|---|---|---|
Restriction 1, “context‐sensitive” | where | is a non‐empty string | |
Restriction 2, “context‐free” | where | is a non‐empty string | |
Restriction 3, “finite‐state” | or | where | is a terminal symbol |
and | is a nonterminal symbol |
To be precise, these are restrictions on the form that individual rules can take, and a grammar is of a certain type if and only if all of its rules satisfy the corresponding restriction. But notice that rules that satisfy Restriction 2 necessarily also satisfy Restriction 1 (by choosing and to be the empty string), and similarly rules that satisfy Restriction 3 also satisfy both of the others. So these restrictions on the form of rules give us four classes of grammars:
1 The Type 0 grammars are simply all unrestricted rewriting grammars.The Type 1 grammars are those satisfying Restriction 1.The Type 2 grammars are those satisfying Restriction 2 (and therefore also Restriction 1).The Type 3 grammars are those satisfying Restriction 3 (and therefore also Restrictions 1 and 2).
The grammar in Figure 5.1 is a Type 0 grammar and does not belong to any of the more restricted classes, since rules such as “a B B a” do not satisfy Restriction 1. It turns out, however, that there is a Type 1 grammar that generates this same powers‐of‐two stringset.13 This raises the question of how these classes of grammars relate to generability of stringsets: are there stringsets that can be generated by a Type 0 grammar but not by any Type 1 grammar? Or is the extra rule‐writing freedom that comes with Type 0 grammars actually inconsequential from the perspective of generable stringsets?
It turns out that all four classes of grammars are distinct in their generative capacity: at each boundary, there are stringsets that can be generated by a Type grammar that cannot be generated by any Type grammar. So there are, for example, Type 0 grammars that generate stringsets that no Type 1 grammar can generate – even if the Type 0 grammar in Figure 5.1 is not one of them.