Born, Logan Orion - Properties of prefix lexicalized synchronous grammars
Term : Summer 2018
Degree : M.Sc.
Degree type : Thesis
Department : School of Computing Science
Faculty : Applied Sciences
Senior supervisor : Anoop Sarkar
Thesis title : Properties of prefix lexicalized synchronous grammars
Given names : Logan Orion
Surname : Born
Abstract : Synchronous grammars, which may be broadly characterized as sets of rules for generating sentence pairs, are used in linguistics and natural language processing (NLP) as a framework for modelling human languages. Prefix lexicalization is a property of some grammars in which each rule contains a terminal symbol (also called a lexical item) at its left edge. This work shows that the class of synchronous context-free grammars (SCFG) is not closed under prefix lexicalization, and presents an algorithm for converting a finitely ambiguous, epsilon-free SCFG to a weakly equivalent prefix lexicalized synchronous tree-adjoining grammar (STAG). This transformation only increases the rank of the grammar by 1; it at most cubes the grammar size, but we provide empirical results showing that the size increase is generally much smaller. We show that this result can also be extended to weighted synchronous grammars without affecting the weights they assign to string pairs. Theoretically, these results serve to generalize extended Greibach normal form from CFGs to SCFGs, and practically they have potential applications in machine translation. They also serve as an avenue for investigating the properties of related grammar formalisms in future work.
Keywords : Formal language theory; Synchronous grammars; Context-free grammars; Tree-adjoining grammars; Weighted grammars; Greibach normal form
Total pages : 66