Vacariu, Andrei Vlad - A High-Throughput Dependency Parser


This thesis has been submitted to the Library for graduation purposes, but must be audited for technical details related to publication before it can be approved for inclusion in the Library collection.
Term: Fall 2017
Degree: M.Sc.
Degree type: Thesis
Department: School of Computing Science
Faculty: Applied Sciences
Senior supervisor: Anoop Sarkar
Thesis title: A High-Throughput Dependency Parser
Given Names: Andrei Vlad
Surname: Vacariu
Abstract: 
Dependency parsing is an important task in NLP, and it is used in many downstream tasks for analyzing the semantic structure of sentences. Analyzing very large corpora in a reasonable amount of time, however, requires a fast parser. In this thesis, we develop a transition-based dependency parser with a neural-network decision function that outperforms spaCy, Stanford CoreNLP, and MALTParser in terms of speed while having comparable, and in some cases better, accuracy. We also develop several variations of our model to investigate the trade-off between accuracy and speed. This leads to a model with a greatly reduced feature set that is much faster but less accurate, as well as a more complex model involving a BiLSTM simultaneously trained to produce POS tags, which is more accurate but much slower. We compare the accuracy and speed of our different parser models against the three parsers mentioned above on the Penn Treebank, Universal Dependencies English, and OntoNotes datasets, using two different dependency tree representations, to show how our parser competes on data from very different domains. Our experimental results reveal that our main model is much faster than the three external parsers while also being more accurate in some cases; our reduced feature set model is significantly faster while remaining competitive in terms of accuracy; and our BiLSTM-based model is somewhat faster than CoreNLP and is significantly more accurate.
Keywords: natural language processing; dependency parsing; transition parsing system; neural network
Total pages: 69
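
The abstract describes a transition-based parser with a neural-network decision function. As an illustration only (this is not the thesis's implementation), the sketch below shows a minimal greedy arc-standard transition loop, where the hypothetical `decide` callback stands in for the learned classifier that would score SHIFT, LEFT-ARC, and RIGHT-ARC actions from features of the stack and buffer.

```python
def parse(words, decide):
    """Greedy arc-standard parse.

    words  -- list of word strings (1-based ids are assigned internally)
    decide -- callable (stack, buffer) -> "SHIFT" | "LEFT-ARC" | "RIGHT-ARC";
              in a real parser this would be a trained classifier.
    Returns a dict mapping each word id to its head id (0 = artificial root).
    """
    buffer = list(range(1, len(words) + 1))  # word ids waiting to be processed
    stack = [0]                              # 0 is the artificial root
    heads = {}
    while buffer or len(stack) > 1:
        action = decide(stack, buffer)
        if action == "SHIFT" and buffer:
            stack.append(buffer.pop(0))      # move next word onto the stack
        elif action == "LEFT-ARC" and len(stack) >= 2:
            dep = stack.pop(-2)              # second-from-top becomes dependent
            heads[dep] = stack[-1]
        elif action == "RIGHT-ARC" and len(stack) >= 2:
            dep = stack.pop()                # top becomes dependent
            heads[dep] = stack[-1]
        else:                                # inapplicable action: fall back
            if buffer:
                stack.append(buffer.pop(0))
            else:
                break
    return heads
```

For example, a toy decision function that always shifts while the buffer is non-empty and then reduces with RIGHT-ARC produces a right-branching chain, attaching each word to the one before it:

```python
heads = parse(["I", "saw", "her"], lambda s, b: "SHIFT" if b else "RIGHT-ARC")
# heads == {3: 2, 2: 1, 1: 0}
```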