A Companion to Chomsky - Various authors - Page 58
5.5 Beyond Context‐Free Grammars
From the very outset there were doubts about whether CFGs could form the basis of a theory of natural language syntax. Chomsky (1956, Section 4) argued that even if the generative capacity of CFGs (unlike FSGs) turned out to be sufficient for English (a question he left open), the resulting grammars would be unreasonably complex.16 Relatedly, one motivation for considering Type 1 rules in the first place was the recognition that, in practice, linguists found uses for contextual restrictions on rewrite rules, for example to state selectional restrictions (Chomsky, 1959, p. 148; Chomsky and Miller, 1963, p. 295; Chomsky, 1963, p. 363).
Furthermore, it has more recently been discovered that CFGs might be insufficient, even on the straightforward basis of generative capacity, to describe some natural languages. The best‐known case is a construction in Swiss‐German (Shieber, 1985) that exhibits crossing dependencies of the sort found in the stringset in (6); these contrast with nested dependencies, which are neatly handled by CFGs. See e.g. Pullum (1986), Partee et al. (1990, pp. 503–505), Frank (2004), Kallmeyer (2010, pp. 17–20) and Jäger and Rogers (2012) for useful discussion; ideas closely related to the crucial point about Swiss‐German can be traced back to Huybregts (1976, 1984) and Bresnan et al. (1982).
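The contrast between nested and crossing dependencies can be made concrete with a small sketch. As an illustrative assumption (these are not necessarily the stringsets the text refers to), it uses mirror-image strings, a word followed by its reversal, for nested dependencies, and copy strings, a word followed by itself, for crossing ones. All function names are hypothetical.

```python
def nested_pairs(n):
    """Dependent positions in a mirror-image string w + reversed(w), len(w) == n."""
    return [(i, 2 * n - 1 - i) for i in range(n)]


def crossing_pairs(n):
    """Dependent positions in a copy string w + w, len(w) == n."""
    return [(i, n + i) for i in range(n)]


def crosses(p, q):
    """True if the two arcs overlap without one containing the other."""
    (a, b), (c, d) = sorted([p, q])
    return a < c < b < d


def has_crossing(pairs):
    """True if any two dependency arcs in `pairs` cross."""
    return any(crosses(p, q) for i, p in enumerate(pairs) for q in pairs[i + 1:])
```

For a word of length 3, `nested_pairs(3)` links the outermost positions first and works inward, so no two arcs cross, whereas every pair of arcs in `crossing_pairs(3)` crosses.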
But despite these reasons for looking beyond CFGs, Type 1 or context‐sensitive grammars (CSGs) have not proven to be a particularly useful tool for linguistics; they have turned out to be “too close” to unrestricted rewriting grammars. While CSGs can generate the crossing‐dependency patterns of (6) and of Swiss‐German, their generative capacity extends far beyond this. For example, a CSG can generate stringsets that seem not at all “language‐like.”17 This impression plausibly stems from the property of CSGs that caused Chomsky the most concern initially: contextually restricted rewrites produced structural descriptions that could not be interpreted along the lines of immediate constituent analysis. Immediately after showing that CSGs could generate stringsets that no CFG could generate, Chomsky (1959, p. 148) surmised that “the extra power of grammars that do not meet Restriction 2 appears … to be a defect of such grammars, with regard to the intended interpretation.” The underlying issue here is the absence of any meaningful kind of intersubstitutability at the core of CSGs: what distinguishes a Type 1 grammar from FSGs and CFGs is exactly the fact that the substrings derivable from a symbol in one context might not be derivable from that same symbol in another context.
Chomsky's discussion of the undesirable properties of CSGs focuses on their ability to, in effect, reorder constituents. For example, a permuting rule “CD → DC,” which does not itself satisfy Restriction 1 (recall that the Type 0 grammar in Figure 5.1 contains rules like this), can be mimicked by a sequence of Type 1 rules “CD → XD → XC → DC” (Chomsky, 1959, p. 148; Chomsky, 1963, p. 365). Chomsky considers using this kind of reordering to derive a question form such as will John come in a way that relates it to its corresponding declarative John will come. The CSG in (8) shows how this would work. The first group of rules shown in (8) generates the declaratives John will come and John comes as shown in (9); these are all context‐free rules, and notice that they correctly capture the intersubstitutability of will come with comes, via the nonterminal Pred. The second group of rules in (8) serves to turn “NP Aux” into “Aux NP”; in particular, the derivation in (10) uses them to derive “Aux NP come,” and then eventually will John come, from the (canonically ordered, intuitively) “NP Aux come.”
(8)
S → NP Pred
NP → John
Pred → Aux V
Pred → comes
Aux → will
V → come

NP Aux → X Aux
X Aux → X NP
X NP → Aux NP
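(9) [Figure: derivations of John will come and John comes]
(10) [Figure: derivation of will John come from “NP Aux come”]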
The fact that each step of the derivation in (10) rewrites only a single nonterminal symbol ensures that we can construct a tree structure that indicates which parts of the eventual string were derived from which nonterminal symbols.18 (This would not be possible for a derivation that implemented the reordering directly with the rule “NP Aux → Aux NP”; recall Figure 5.1.) But the resulting tree structure says “that will in this sentence is a noun phrase … and that John is a modal auxiliary, contrary to our intention” (Chomsky, 1963, p. 365). This result is undesirable because we do not want will to be in general intersubstitutable with John, or the other strings that we would expect to be derivable from the nonterminal symbol NP if this tiny grammar were expanded. So the labeled constituency relationships that can be read off the trees associated with Type 1 derivations are not interpretable as statements about intersubstitutability, as they are in more restricted grammars. In other words, Restriction 1's requirement that each derivational step rewrites only a single nonterminal symbol turned out to be insufficient to capture the important linguistic intuitions regarding categorization and intersubstitutability that underlie immediate constituent analysis.
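The point about tree structure can be checked mechanically. The following is a minimal sketch, not the chapter's own formalization: it replays one derivation of will John come with the rules in (8), rewriting a single symbol per step and recording which node each new symbol came from. The order of steps in `STEPS` and the names `Node`, `rewrite`, `derive` and `ancestors` are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    symbol: str
    parent: Optional["Node"] = None  # the node this one was rewritten from


def rewrite(form, index, lhs, rhs, left=None, right=None):
    """Rewrite the single symbol `lhs` at form[index] as `rhs`,
    optionally requiring a neighbouring symbol as context."""
    if form[index].symbol != lhs:
        raise ValueError(f"expected {lhs} at position {index}")
    if left is not None and (index == 0 or form[index - 1].symbol != left):
        raise ValueError(f"left context {left} not met")
    if right is not None and (
        index + 1 >= len(form) or form[index + 1].symbol != right
    ):
        raise ValueError(f"right context {right} not met")
    children = [Node(sym, parent=form[index]) for sym in rhs]
    return form[:index] + children + form[index + 1:]


# (position, symbol rewritten, replacement, left context, right context)
# The ordering of these steps is an assumption, chosen to match the prose.
STEPS = [
    (0, "S", ["NP", "Pred"], None, None),
    (1, "Pred", ["Aux", "V"], None, None),
    (2, "V", ["come"], None, None),
    (0, "NP", ["X"], None, "Aux"),    # NP Aux -> X Aux
    (1, "Aux", ["NP"], "X", None),    # X Aux -> X NP
    (0, "X", ["Aux"], None, "NP"),    # X NP -> Aux NP
    (0, "Aux", ["will"], None, None),
    (1, "NP", ["John"], None, None),
]


def derive(steps):
    """Apply the steps to S; return the final nodes and every sentential form."""
    form = [Node("S")]
    history = [" ".join(n.symbol for n in form)]
    for index, lhs, rhs, left, right in steps:
        form = rewrite(form, index, lhs, rhs, left, right)
        history.append(" ".join(n.symbol for n in form))
    return form, history


def ancestors(node):
    """Labels of the nodes dominating `node`, from nearest to the root."""
    labels = []
    while node.parent is not None:
        node = node.parent
        labels.append(node.symbol)
    return labels
```

Calling `derive(STEPS)` passes through “NP Aux come” and “Aux NP come” before ending in will John come. In the recorded tree, `ancestors` of the node for will includes the NP that is the daughter of S, and `ancestors` of the node for John includes the Aux introduced by Pred, which is the unintended labeling described above.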
In the light of more recent developments, the difficulties raised by the issue of reordering can be seen as stemming from the tight connection between intersubstitutability (in the sense that can be captured in rewriting systems of the sort Chomsky was exploring) and linear contiguity. Only linearly contiguous strings of symbols have the chance to be placed in an equivalence class. Although this connection is familiar, there is nothing necessary about it: a sub‐part of a string might belong to a class of intersubstitutable subexpressions without being contiguous. In this case, the relevant sub‐parts will not themselves be strings, but will be tuples of strings. To illustrate, it suffices to consider tuples of size two, i.e. pairs of strings that are co‐dependent, and together constitute an expression belonging to a meaningful grammatical category, but need not be pronounced together. For example:
(11)
The pair (will, come) and the pair (must, leave) are intersubstitutable, in the sense that we can replace the former with the latter in will the students come to produce must the students leave. (As well as in John will come to produce John must leave.)
The pair (John, to be tall) and the pair (the girl, to win) are intersubstitutable, in the sense that we can replace the former with the latter in John is likely to be tall to produce the girl is likely to win.19
The pair (buy, which book) and the pair (eat, what) are intersubstitutable, in the sense that we can replace the former with the latter in which book did you buy yesterday to produce what did you eat yesterday.
Chomsky's chosen approach to these phenomena, the notion of a grammatical transformation – extensively elaborated elsewhere (e.g., Chomsky, 1957, 1965) but formally somewhat removed from the work described here – was one way to resolve the tension created by the conflation of intersubstitutability and contiguity.20 In the transformational approach, the patterns described in (11) are handled by first deriving a structure in which the co‐dependent elements (e.g., will and come, or John and to be tall) are linearly contiguous, in a base component which functions essentially like a CFG and therefore ties contiguity and intersubstitutability together. This correctly prevents generating an expression that contains will without an accompanying verb like come (see (9)), or contains which book without a verb to select it, or contains the predicate be tall without a subject – but at the cost of grouping these co‐dependent elements together in ways that do not align with their relative linear positions. The transformational component resolves the tension created by tying co‐occurrence to contiguity, transforming a structure that has such co‐dependent elements adjacent into one where they are separated.
But another logical possibility, when we are confronted with the patterns in (11), is to simply break the link between co‐dependence and linear contiguity right from the beginning. Multiple context‐free grammars (MCFGs) (Seki et al., 1991) provide a canonical instantiation of this option; see, e.g., Kallmeyer (2010, ch. 6) and Clark (2014) for overviews. Derivations in these grammars are most naturally understood in terms of a “bottom‐up” composition process, unlike the “top‐down” rewriting grammars that serve as the framework for the Chomsky hierarchy. MCFGs have proven to be a useful reference point for understanding and comparing various mildly context‐sensitive grammar formalisms (Joshi, 1985; Joshi et al., 1990), which sit between CFGs and CSGs on the scale of generative capacity, including formalisms expressed in terms of transformation‐like tree‐manipulating operations, such as Minimalist Grammars (Stabler, 1997, 2011) and Tree‐Adjoining Grammars (Joshi et al., 1975; Abeillé and Rambow, 2000; Frank, 2002).
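The idea of composing tuples of strings bottom‐up can be illustrated with a deliberately small sketch in the spirit of MCFGs. It is not the notation of Seki et al. (1991), and the names `AuxVerb`, `declarative` and `question` are hypothetical. An auxiliary and its verb form one two‐component expression, and each composition function decides where the two components are pronounced relative to the subject.

```python
from typing import NamedTuple


class AuxVerb(NamedTuple):
    """A two-component expression: an auxiliary and its co-dependent verb."""
    aux: str
    verb: str


def declarative(subject: str, pred: AuxVerb) -> str:
    """Compose with both components after the subject: subject aux verb."""
    return f"{subject} {pred.aux} {pred.verb}"


def question(subject: str, pred: AuxVerb) -> str:
    """Compose with the subject between the components: aux subject verb."""
    return f"{pred.aux} {subject} {pred.verb}"


will_come = AuxVerb("will", "come")
must_leave = AuxVerb("must", "leave")
```

Because `will_come` and `must_leave` belong to the same category, either can be passed to either function: `question("the students", will_come)` and `question("the students", must_leave)` yield the two questions in (11), and `declarative("John", will_come)` yields John will come. The two components are never required to be adjacent in the output, which is the sense in which co‐dependence is separated from contiguity.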