Zhao, Zijin - Classification in the Presence of Heavy Label Noise: A Markov Chain Sampling Framework...

This thesis has been submitted to the Library for purposes of graduation, but needs to be audited for technical details related to publication in order to be approved for inclusion in the Library collection.
Term: 
Summer 2017
Degree: 
M.Sc.
Degree type: 
Thesis
Department: 
School of Computing Science
Faculty: 
Applied Sciences
Senior supervisor: 
Jian Pei
Thesis title: 
Classification in the Presence of Heavy Label Noise: A Markov Chain Sampling Framework
Given Names: 
Zijin
Surname: 
Zhao
Abstract: 
Heavy label noise arises in many practical scenarios where the observed labels of instances are corrupted. Classification under heavy label noise is an important problem that has attracted considerable attention, since label noise can have many negative consequences. Many state-of-the-art approaches assume that label noise is class-dependent, and thus do not generalize to settings where this assumption does not hold. In this thesis, we propose a Markov chain sampling framework, MCS, to overcome the limitations of existing methods for the binary classification problem. The main idea is to use the predictions of a sequence of classifiers as an ensemble to detect mislabeled instances. The classifiers in the sequence are trained on different subsets of the training data, obtained by sampling the states of a carefully designed Markov chain via a random walk. The MCS framework is general and can accommodate a wide spectrum of classification algorithms. We theoretically prove the correctness and effectiveness of the MCS framework, and we present experimental results demonstrating the effectiveness and efficiency of the framework and its derivative algorithms.
Keywords: 
classification; label noise; Markov chain; sampling
Total pages: 
43