Chomsky–Schützenberger enumeration theorem

Chomsky–Schützenberger enumeration theorem In formal language theory, the Chomsky–Schützenberger enumeration theorem is a theorem derived by Noam Chomsky and Marcel-Paul Schützenberger about the number of words of a given length generated by an unambiguous context-free grammar. The theorem provides an unexpected link between the theory of formal languages and abstract algebra.
Contents 1 Statement 2 Usage 2.1 Asymptotic estimates 2.2 Inherent ambiguity 3 References Statement In order to state the theorem, a few notions from algebra and formal language theory are needed.
Let {displaystyle mathbb {N} } denote the set of nonnegative integers. A power series over {displaystyle mathbb {N} } is an infinite series of the form {displaystyle f=f(x)=sum _{k=0}^{infty }a_{k}x^{k}=a_{0}+a_{1}x^{1}+a_{2}x^{2}+a_{3}x^{3}+cdots } with coefficients {displaystyle a_{k}} in {displaystyle mathbb {N} } . The multiplication of two formal power series {displaystyle f} and {displaystyle g} is defined in the expected way as the convolution of the sequences {displaystyle a_{n}} and {displaystyle b_{n}} : {displaystyle f(x)cdot g(x)=sum _{k=0}^{infty }left(sum _{i=0}^{k}a_{i}b_{k-i}right)x^{k}.} In particular, we write {displaystyle f^{2}=f(x)cdot f(x)} , {displaystyle f^{3}=f(x)cdot f(x)cdot f(x)} , and so on. In analogy to algebraic numbers, a power series {displaystyle f(x)} is called algebraic over {displaystyle mathbb {Q} (x)} , if there exists a finite set of polynomials {displaystyle p_{0}(x),p_{1}(x),p_{2}(x),ldots ,p_{n}(x)} each with rational coefficients such that {displaystyle p_{0}(x)+p_{1}(x)cdot f+p_{2}(x)cdot f^{2}+cdots +p_{n}(x)cdot f^{n}=0.} A context-free grammar is said to be unambiguous if every string generated by the grammar admits a unique parse tree or, equivalently, only one leftmost derivation. Having established the necessary notions, the theorem is stated as follows.
Chomsky–Schützenberger theorem. If {displaystyle L} is a context-free language admitting an unambiguous context-free grammar, and {displaystyle a_{k}:=|L cap Sigma ^{k}|} is the number of words of length {displaystyle k} in {displaystyle L} , then {displaystyle G(x)=sum _{k=0}^{infty }a_{k}x^{k}} is a power series over {displaystyle mathbb {N} } that is algebraic over {displaystyle mathbb {Q} (x)} .
Proofs of this theorem are given by Kuich & Salomaa (1985), and by Panholzer (2005).
Usage Asymptotic estimates The theorem can be used in analytic combinatorics to estimate the number of words of length n generated by a given unambiguous context-free grammar, as n grows large. The following example is given by Gruber, Lee & Shallit (2012): the unambiguous context-free grammar G over the alphabet {0,1} has start symbol S and the following rules S → M | U M → 0M1M | ε U → 0S | 0M1U.
To obtain an algebraic representation of the power series {displaystyle G(x)} associated with a given context-free grammar G, one transforms the grammar into a system of equations. This is achieved by replacing each occurrence of a terminal symbol by x, each occurrence of ε by the integer '1', each occurrence of '→' by '=', and each occurrence of '|' by '+', respectively. The operation of concatenation at the right-hand-side of each rule corresponds to the multiplication operation in the equations thus obtained. This yields the following system of equations: S = M + U M = M²x² + 1 U = Sx + MUx² In this system of equations, S, M, and U are functions of x, so one could also write {displaystyle S(x)} , {displaystyle M(x)} , and {displaystyle U(x)} . The equation system can be resolved after S, resulting in a single algebraic equation: {displaystyle x(2x-1)S^{2}+(2x-1)S+1=0} .
This quadratic equation has two solutions for S, one of which is the algebraic power series {displaystyle G(x)} . By applying methods from complex analysis to this equation, the number {displaystyle a_{n}} of words of length n generated by G can be estimated, as n grows large. In this case, one obtains {displaystyle a_{n}in O(2+epsilon )^{n}} but {displaystyle a_{n}notin O(2-epsilon )^{n}} for each {displaystyle epsilon >0} . See (Gruber, Lee & Shallit 2012) for a detailed exposition.
Inherent ambiguity In classical formal language theory, the theorem can be used to prove that certain context-free languages are inherently ambiguous. For example, the Goldstine language {displaystyle L_{G}} over the alphabet {displaystyle {a,b}} consists of the words {displaystyle a^{n_{1}}ba^{n_{2}}bcdots a^{n_{p}}b} with {displaystyle pgeq 1} , {displaystyle n_{i}>0} for {displaystyle iin {1,2,ldots ,p}} , and {displaystyle n_{j}neq j} for some {displaystyle jin {1,2,ldots ,p}} .
It is comparably easy to show that the language {displaystyle L_{G}} is context-free (Berstel & Boasson 1990). The harder part is to show that there does not exist an unambiguous grammar that generates {displaystyle L_{G}} . This can be proved as follows: If {displaystyle g_{k}} denotes the number of words of length {displaystyle k} in {displaystyle L_{G}} , then for the associated power series holds {displaystyle G(x)=sum _{k=0}^{infty }g_{k}x^{k}={frac {1-x}{1-2x}}-{frac {1}{x}}sum _{kgeq 1}x^{k(k+1)/2-1}} . Using methods from complex analysis, one can prove that this function is not algebraic over {displaystyle mathbb {Q} (x)} . By the Chomsky-Schützenberger theorem, one can conclude that {displaystyle L_{G}} does not admit an unambiguous context-free grammar. See (Berstel & Boasson 1990) for detailed account.
References Berstel, Jean; Boasson, Luc (1990). "Context-free languages" (PDF). In van Leeuwen, Jan (ed.). Handbook of Theoretical Computer Science, Volume B: Formal Models and Semantics. Elsevier and MIT press. pp. 59–102. ISBN 0-444-88074-7. Chomsky, Noam; Schützenberger, Marcel-Paul (1963). "The Algebraic Theory of Context-Free Languages" (PDF). In P. Braffort and D. Hirschberg, eds., Computer Programming and Formal Systems (pp. 118–161). Amsterdam: North-Holland. Flajolet, Philippe; Sedgewick, Robert (2009). Analytic Combinatorics. Cambridge: Cambridge University Press. ISBN 978-0-521-89806-5. Gruber, Hermann; Lee, Jonathan; Shallit, Jeffrey (2012). "Enumerating regular expressions and their languages". arXiv:1204.4982 [cs.FL]. Kuich, Werner; Salomaa, Arto (1985). Semirings, Automata, Languages. Berlin: Springer-Verlag. ISBN 978-3-642-69961-0. Panholzer, Alois (2005). "Gröbner Bases and the Defining Polynomial of a Context-free Grammar Generating Function". Journal of Automata, Languages and Combinatorics. 10: 79–97. hide vte Noam Chomsky BibliographyChomsky hierarchy"Colorless green ideas sleep furiously"Honorary degreesPolitical positions Select bibliography Linguistics Syntactic Structures (1957)Current Issues in Linguistic Theory (1964)Aspects of the Theory of Syntax (1965)Cartesian Linguistics (1966)Language and Mind (1966)The Sound Pattern of English (1968)"Remarks on Nominalization" (1970)"Conditions on Transformations" (1973)Reflections on Language (1975)The Logical Structure of Linguistic Theory (1975)Lectures on Government and Binding (1981)Knowledge of Language (1986)The Minimalist Program (1995)New Horizons in the Study of Language and Mind (2000) Politics "The Responsibility of Intellectuals" (1967)American Power and the New Mandarins (1969)Counter-Revolutionary Violence (1973), with Edward S. HermanThe Fateful Triangle (1983)Manufacturing Consent (1988), with Edward S. HermanNecessary Illusions (1989)Deterring Democracy (1991)Letters from Lexington (1993)The Prosperous Few and the Restless Many (1993)World Orders Old and New (1994)Objectivity and Liberal Scholarship (1997)Hegemony or Survival (2003)Failed States (2006)Requiem for the American Dream (2017) Collections Class Warfare (1996)Middle East Illusions (2003)Imperial Ambitions (2005)Interventions (2007)Gaza in Crisis (2010)9-11: Was There An Alternative? (2011)Making the Future (2012)Occupy (2012)On Palestine (2015), with Ilan Pappé Filmography Manufacturing Consent: Noam Chomsky and the Media (1992)Last Party 2000 (2001)Power and Terror: Noam Chomsky in Our Times (2002)Distorted Morality – America's War on Terror? (2003)Noam Chomsky: Rebel Without a Pause (2003) (TV)Peace, Propaganda & the Promised Land (2004)Is the Man Who Is Tall Happy? (2013) Family William Chomsky (father)Carol Chomsky (wife)Aviva Chomsky (daughter) Categories: Noam ChomskyFormal languagesTheorems in discrete mathematics
Si quieres conocer otros artículos parecidos a Chomsky–Schützenberger enumeration theorem puedes visitar la categoría Formal languages.
Deja una respuesta