
A Study of Gradient-Based Algorithms

Hallén, Rasmus (2017) MASK01 20171
Mathematical Statistics
Abstract
Gradient-based algorithms are popular when solving unconstrained optimization problems. By exploiting knowledge of the gradient of the objective function to optimize, each iteration of a gradient-based algorithm aims at approaching the minimizer of said function.
In the age of web-scale prediction problems, many venerable algorithms may encounter difficulties. However, as the data sets grow larger, so does our capability to interpret them. Thanks to developments in the computer-science subfield called Machine Learning, it is possible to produce self-adapting optimizers able to deal with different challenging scenarios, without requiring the user to rewrite algorithms specialized for the experiment at hand.
This thesis shall compare the performance of two different gradient-based algorithms: Gradient Descent (GD) and Stochastic Gradient Descent (SGD). First, a synthetic linear regression model is considered. The performance of the algorithms is evaluated by comparison with the maximum likelihood estimator (MLE). Thereafter, a convergence study is presented. Lastly, SGD is evaluated over increasingly large, random subsets of the data; such a subset is known as a mini-batch.
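The linear regression comparison above can be sketched in a few lines. This is an illustrative toy setup, not the thesis's actual experiment: the data sizes, learning rate, iteration count, and mini-batch size are all assumptions chosen for demonstration. Both algorithms minimize the least-squares objective, differing only in whether each gradient step uses the full data set or a random mini-batch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic linear regression data: y = X @ beta_true + noise
n, d = 1000, 5
X = rng.normal(size=(n, d))
beta_true = rng.normal(size=d)
y = X @ beta_true + 0.1 * rng.normal(size=n)

# Maximum likelihood estimator (ordinary least squares) as the reference point
beta_mle = np.linalg.lstsq(X, y, rcond=None)[0]

def gradient_descent(X, y, lr=0.01, iters=500):
    """Full-batch GD: every step uses the gradient over all n observations."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        grad = X.T @ (X @ beta - y) / len(y)  # gradient of 0.5 * mean squared error
        beta -= lr * grad
    return beta

def sgd(X, y, lr=0.01, iters=500, batch=32):
    """Mini-batch SGD: each step uses a random subset (mini-batch) of the data."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        idx = rng.choice(len(y), size=batch, replace=False)
        grad = X[idx].T @ (X[idx] @ beta - y[idx]) / batch
        beta -= lr * grad
    return beta

beta_gd = gradient_descent(X, y)
beta_sgd = sgd(X, y)
```

Each GD step costs O(nd), while an SGD step costs only O(batch * d), which is why SGD scales better as the data set grows; the price is gradient noise, which is also why a larger mini-batch does not automatically mean a more precise estimate.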
When transitioning to the multinomial logistic regression setting, a model to recognize handwritten digits is constructed. The data is provided by the Modified National Institute of Standards and Technology (MNIST) database. Each digit is represented by a vector with pixel weights as entries. The algorithms are assessed by how well the estimated model predicts which digit each vector represents.
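A minimal sketch of multinomial logistic regression trained with mini-batch SGD follows. To stay self-contained it uses a synthetic stand-in for MNIST (random "pixel" vectors with near-linearly-separable labels); the dimensions, learning rate, and epoch count are illustrative assumptions, not the thesis's settings.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stand-in for MNIST: n "images" of d pixel weights, labels in {0, ..., 9}
n, d, k = 600, 64, 10
W_true = rng.normal(size=(d, k))
X = rng.normal(size=(n, d))
labels = np.argmax(X @ W_true + rng.normal(size=(n, k)), axis=1)

def softmax(Z):
    Z = Z - Z.max(axis=1, keepdims=True)  # shift logits for numerical stability
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

def train_softmax_sgd(X, labels, k, lr=0.1, epochs=30, batch=32):
    """Mini-batch SGD on the multinomial logistic (cross-entropy) loss."""
    n, d = X.shape
    W = np.zeros((d, k))
    Y = np.eye(k)[labels]  # one-hot targets
    for _ in range(epochs):
        order = rng.permutation(n)
        for start in range(0, n, batch):
            idx = order[start:start + batch]
            P = softmax(X[idx] @ W)                        # class probabilities
            W -= lr * X[idx].T @ (P - Y[idx]) / len(idx)   # gradient step
    return W

W = train_softmax_sgd(X, labels, k)
accuracy = np.mean(np.argmax(X @ W, axis=1) == labels)
```

The gradient of the cross-entropy loss with respect to the weight matrix is X^T(P - Y)/batch, so full-batch GD is obtained by simply setting the batch size to n; the success rates reported below compare exactly these two regimes.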
In the case of linear regression, it is found that SGD outperforms GD as the data sets grow larger. Furthermore, the precision of SGD does not necessarily improve when the mini-batch size is increased. For multinomial logistic regression, GD produces the most accurate model, with a 73.39% success rate. However, training this model is time consuming; by partitioning the data into mini-batches, SGD achieves a 70.49% success rate in a more reasonable timeframe.
author: Hallén, Rasmus
course: MASK01 20171
year: 2017
type: M2 - Bachelor Degree
language: English
id: 8904399
date added to LUP: 2017-03-09 14:53:59
date last changed: 2017-03-09 14:53:59
@misc{8904399,
  abstract     = {Gradient-based algorithms are popular when solving unconstrained optimization problems. By exploiting knowledge of the gradient of the objective function to optimize, each iteration of a gradient-based algorithm aims at approaching the minimizer of said function.
In the age of web-scale prediction problems, many venerable algorithms may encounter difficulties. However, as the data sets grow larger, so does our capability to interpret them. Thanks to developments in the computer-science subfield called Machine Learning, it is possible to produce self-adapting optimizers able to deal with different challenging scenarios, without requiring the user to rewrite algorithms specialized for the experiment at hand.
This thesis shall compare the performance of two different gradient-based algorithms: Gradient Descent (GD) and Stochastic Gradient Descent (SGD). First, a synthetic linear regression model is considered. The performance of the algorithms is evaluated by comparison with the maximum likelihood estimator (MLE). Thereafter, a convergence study is presented. Lastly, SGD is evaluated over increasingly large, random subsets of the data; such a subset is known as a mini-batch.
When transitioning to the multinomial logistic regression setting, a model to recognize handwritten digits is constructed. The data is provided by the Modified National Institute of Standards and Technology (MNIST) database. Each digit is represented by a vector with pixel weights as entries. The algorithms are assessed by how well the estimated model predicts which digit each vector represents.
In the case of linear regression, it is found that SGD outperforms GD as the data sets grow larger. Furthermore, the precision of SGD does not necessarily improve when the mini-batch size is increased. For multinomial logistic regression, GD produces the most accurate model, with a 73.39% success rate. However, training this model is time consuming; by partitioning the data into mini-batches, SGD achieves a 70.49% success rate in a more reasonable timeframe.},
  author       = {Hallén, Rasmus},
  language     = {eng},
  note         = {Student Paper},
  title        = {A Study of Gradient-Based Algorithms},
  year         = {2017},
}