Bailey, Sarah Reid - Forecasting Batting Averages in MLB...

This project has been submitted to the Library for purposes of graduation, but needs to be audited for technical details related to publication in order to be approved for inclusion in the Library collection.
Term: 
Fall 2017
Degree: 
M.Sc.
Degree type: 
Project
Department: 
Department of Statistics and Actuarial Science
Faculty: 
Science
Senior supervisor: 
Timothy Swartz
Co-supervisor: 
Jason Loeppky
Thesis title: 
Forecasting Batting Averages in MLB
Given Names: 
Sarah Reid
Surname: 
Bailey
Abstract: 
We consider new baseball data from Statcast, which includes launch angle, launch velocity, and hit distance for batted balls in Major League Baseball during the 2015 and 2016 seasons. Using logistic regression, we train two models on 2015 data to obtain the probability that a player gets a hit on each of their 2015 at-bats. For each player, we sum these probabilities and divide by the player's total at-bats to predict their 2016 batting average. We then use linear regression to express actual 2016 batting averages as a linear combination of the 2016 Statcast predictions and the 2016 PECOTA predictions. When this procedure is used to obtain 2017 predictions, we find that the combined prediction performs better than PECOTA. This information may be used to make better predictions of batting averages for future seasons.
Keywords: 
Batting Average; MLB; Logistic Regression; Big Data; Forecasting
Total pages: 
46
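
Illustrative sketch: 
The two aggregation steps described in the abstract (averaging per-at-bat hit probabilities for each player, then regressing actual batting averages on the Statcast and PECOTA predictions) might look roughly like the following. This is a minimal sketch and not the thesis code: the function names are hypothetical, the per-at-bat hit probabilities are assumed to come from an already fitted logistic regression model, and including an intercept in the linear combination is an assumption.

```python
import numpy as np

def predicted_batting_average(player_ids, hit_probs):
    """Sum each player's per-at-bat hit probabilities and divide by at-bats.

    hit_probs are assumed to be outputs of a fitted logistic regression
    model (not reproduced here).
    """
    player_ids = np.asarray(player_ids)
    hit_probs = np.asarray(hit_probs, dtype=float)
    players, inverse = np.unique(player_ids, return_inverse=True)
    prob_sums = np.bincount(inverse, weights=hit_probs)  # expected hits
    at_bats = np.bincount(inverse)                       # at-bats per player
    return dict(zip(players.tolist(), (prob_sums / at_bats).tolist()))

def fit_combination(statcast_pred, pecota_pred, actual):
    """Least-squares fit of actual = b0 + b1 * statcast + b2 * pecota.

    The intercept term b0 is an assumption of this sketch.
    """
    actual = np.asarray(actual, dtype=float)
    X = np.column_stack([np.ones(len(actual)), statcast_pred, pecota_pred])
    coef, *_ = np.linalg.lstsq(X, actual, rcond=None)
    return coef

def combined_prediction(coef, statcast_pred, pecota_pred):
    """Apply fitted coefficients to new Statcast and PECOTA predictions."""
    statcast_pred = np.asarray(statcast_pred, dtype=float)
    pecota_pred = np.asarray(pecota_pred, dtype=float)
    return coef[0] + coef[1] * statcast_pred + coef[2] * pecota_pred
```

In this sketch the coefficients would be fitted on one season's predictions and actual averages, then applied to the next season's Statcast and PECOTA predictions to produce the combined forecast.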