
A Study of Gradient-Based Algorithms

Hallén, Rasmus (2017) MASK01 20171
Mathematical Statistics
Abstract
Gradient-based algorithms are popular when solving unconstrained optimization problems. By exploiting knowledge of the gradient of the objective function to optimize, each iteration of a gradient-based algorithm aims at approaching the minimizer of said function.
In the age of web-scale prediction problems, many venerable algorithms may encounter difficulties. However, as the data sets grow larger, so does our capability to interpret them. Thanks to developments in the computer-science subfield called Machine Learning, it is possible to produce self-adapting optimizers able to deal with different challenging scenarios, without requiring the user to rewrite algorithms specialized for the experiment at hand.
This thesis shall compare the performance of two different gradient-based algorithms: Gradient Descent (GD) and Stochastic Gradient Descent (SGD). First, a synthetic linear regression model is considered. The performance of the algorithms is evaluated by comparison with the maximum likelihood estimator (MLE). Thereafter, a convergence study is presented. Lastly, SGD is evaluated over increasingly large, random subsets of the data; such a subset is known as a mini-batch.
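The linear regression comparison above can be sketched in a few lines. This is an illustrative toy setup, not the thesis's actual experiment: the data sizes, learning rate, iteration count, and mini-batch size are all assumptions chosen for demonstration. Both algorithms minimize the least-squares objective, differing only in whether each gradient step uses the full data set or a random mini-batch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic linear regression data: y = X @ beta_true + noise
n, d = 1000, 5
X = rng.normal(size=(n, d))
beta_true = rng.normal(size=d)
y = X @ beta_true + 0.1 * rng.normal(size=n)

# Maximum likelihood estimator (ordinary least squares) as the reference point
beta_mle = np.linalg.lstsq(X, y, rcond=None)[0]

def gradient_descent(X, y, lr=0.01, iters=500):
    """Full-batch GD: every step uses the gradient over all n observations."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        grad = X.T @ (X @ beta - y) / len(y)  # gradient of 0.5 * mean squared error
        beta -= lr * grad
    return beta

def sgd(X, y, lr=0.01, iters=500, batch=32):
    """Mini-batch SGD: each step uses a random subset (mini-batch) of the data."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        idx = rng.choice(len(y), size=batch, replace=False)
        grad = X[idx].T @ (X[idx] @ beta - y[idx]) / batch
        beta -= lr * grad
    return beta

beta_gd = gradient_descent(X, y)
beta_sgd = sgd(X, y)
```

Each GD step costs O(nd), while an SGD step costs only O(batch * d), which is why SGD scales better as the data set grows; the price is gradient noise, which is also why a larger mini-batch does not automatically mean a more precise estimate.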
When transitioning to the multinomial logistic regression setting, a model to recognize handwritten digits is constructed. The data is provided by the Modified National Institute of Standards and Technology (MNIST) database. Each digit is represented by a vector with pixel weights as entries. The algorithms are assessed by how well the estimated model predicts which digit each vector represents.
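A minimal sketch of multinomial logistic regression trained with mini-batch SGD follows. To stay self-contained it uses a synthetic stand-in for MNIST (random "pixel" vectors with near-linearly-separable labels); the dimensions, learning rate, and epoch count are illustrative assumptions, not the thesis's settings.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stand-in for MNIST: n "images" of d pixel weights, labels in {0, ..., 9}
n, d, k = 600, 64, 10
W_true = rng.normal(size=(d, k))
X = rng.normal(size=(n, d))
labels = np.argmax(X @ W_true + rng.normal(size=(n, k)), axis=1)

def softmax(Z):
    Z = Z - Z.max(axis=1, keepdims=True)  # shift logits for numerical stability
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

def train_softmax_sgd(X, labels, k, lr=0.1, epochs=30, batch=32):
    """Mini-batch SGD on the multinomial logistic (cross-entropy) loss."""
    n, d = X.shape
    W = np.zeros((d, k))
    Y = np.eye(k)[labels]  # one-hot targets
    for _ in range(epochs):
        order = rng.permutation(n)
        for start in range(0, n, batch):
            idx = order[start:start + batch]
            P = softmax(X[idx] @ W)                        # class probabilities
            W -= lr * X[idx].T @ (P - Y[idx]) / len(idx)   # gradient step
    return W

W = train_softmax_sgd(X, labels, k)
accuracy = np.mean(np.argmax(X @ W, axis=1) == labels)
```

The gradient of the cross-entropy loss with respect to the weight matrix is X^T(P - Y)/batch, so full-batch GD is obtained by simply setting the batch size to n; the success rates reported below compare exactly these two regimes.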
In the case of linear regression, it is found that SGD outperforms GD as the data sets grow larger. Furthermore, the precision of SGD does not necessarily improve when the mini-batch size is increased. For multinomial logistic regression, GD produces the most accurate model, with a 73.39% success rate. However, training this model is time consuming; by partitioning the data into mini-batches, SGD achieves a 70.49% success rate in a more reasonable timeframe.
author: Hallén, Rasmus
course: MASK01 20171
year: 2017
type: M2 - Bachelor Degree
language: English
id: 8904399
date added to LUP: 2017-03-09 14:53:59
date last changed: 2017-03-09 14:53:59
@misc{8904399,
  abstract     = {Gradient-based algorithms are popular when solving unconstrained optimization problems. By exploiting knowledge of the gradient of the objective function to optimize, each iteration of a gradient-based algorithm aims at approaching the minimizer of said function.
In the age of web-scale prediction problems, many venerable algorithms may encounter difficulties. However, as the data sets grow larger, so does our capability to interpret them. Thanks to developments in the computer-science subfield called Machine Learning, it is possible to produce self-adapting optimizers able to deal with different challenging scenarios, without requiring the user to rewrite algorithms specialized for the experiment at hand.
This thesis shall compare the performance of two different gradient-based algorithms: Gradient Descent (GD) and Stochastic Gradient Descent (SGD). First, a synthetic linear regression model is considered. The performance of the algorithms is evaluated by comparison with the maximum likelihood estimator (MLE). Thereafter, a convergence study is presented. Lastly, SGD is evaluated over increasingly large, random subsets of the data; such a subset is known as a mini-batch.
When transitioning to the multinomial logistic regression setting, a model to recognize handwritten digits is constructed. The data is provided by the Modified National Institute of Standards and Technology (MNIST) database. Each digit is represented by a vector with pixel weights as entries. The algorithms are assessed by how well the estimated model predicts which digit each vector represents.
In the case of linear regression, it is found that SGD outperforms GD as the data sets grow larger. Furthermore, the precision of SGD does not necessarily improve when the mini-batch size is increased. For multinomial logistic regression, GD produces the most accurate model, with a 73.39% success rate. However, training this model is time consuming; by partitioning the data into mini-batches, SGD achieves a 70.49% success rate in a more reasonable timeframe.},
  author       = {Hallén, Rasmus},
  language     = {eng},
  note         = {Student Paper},
  title        = {A Study of Gradient-Based Algorithms},
  year         = {2017},
}