A Study of Gradient-Based Algorithms
(2017) MASK01 20171, Mathematical Statistics
 Abstract
 Gradient-based algorithms are popular when solving unconstrained optimization problems. By exploiting knowledge of the gradient of the objective function to optimize, each iteration of a gradient-based algorithm aims at approaching the minimizer of said function.
In the age of web-scale prediction problems, many venerable algorithms may encounter difficulties. However, as the data sets grow larger, so does our capability to interpret them. Thanks to developments in the computer-science subfield called Machine Learning, it is possible to produce self-adapting optimizers able to deal with different challenging scenarios, without requiring the user to rewrite algorithms specialized for the experiment at hand.
This thesis shall compare the performance of two different gradient-based algorithms: Gradient Descent (GD) and Stochastic Gradient Descent (SGD). First, a synthetic linear regression model is considered. The performance of the algorithms is evaluated by comparison with the maximum likelihood estimator (MLE). Thereafter, a convergence study is presented. Lastly, SGD is evaluated over increasingly large, random subsets of the data; such a subset is known as a mini-batch.
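The comparison above can be sketched in a few lines. This is an illustrative example only, not the thesis code: a synthetic least-squares regression problem (all variable names, the learning rate, and the batch size are assumptions), where full-batch GD uses the entire data set per step while mini-batch SGD uses a random subset, and the closed-form MLE serves as the benchmark.

```python
import numpy as np

# Synthetic linear regression data: y = X @ beta_true + noise.
rng = np.random.default_rng(0)
n, p = 1000, 3
X = rng.normal(size=(n, p))
beta_true = np.array([2.0, -1.0, 0.5])
y = X @ beta_true + 0.1 * rng.normal(size=n)

def gd(X, y, lr=0.1, iters=500):
    """Full-batch GD: each step uses the gradient over the whole data set."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        grad = X.T @ (X @ beta - y) / len(y)  # gradient of mean squared error
        beta -= lr * grad
    return beta

def sgd(X, y, lr=0.1, iters=500, batch_size=32):
    """Mini-batch SGD: each step uses only a random subset of the data."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        idx = rng.choice(len(y), size=batch_size, replace=False)
        grad = X[idx].T @ (X[idx] @ beta - y[idx]) / batch_size
        beta -= lr * grad
    return beta

# Closed-form MLE (ordinary least squares) as the benchmark estimator.
beta_mle = np.linalg.lstsq(X, y, rcond=None)[0]
```

The key trade-off the thesis studies is visible in the cost per step: GD touches all `n` rows each iteration, while SGD touches only `batch_size` rows, at the price of noisier updates.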
When transitioning to the multinomial logistic regression setting, a model to recognize handwritten digits is constructed. The data is provided by the Modified National Institute of Standards and Technology (MNIST) database. Each digit is represented by a vector with pixel weights as entries. The algorithms are assessed by how well they estimate a model that predicts which vector belongs to which digit.
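A minimal sketch of this setting, under assumed notation (not the thesis implementation): each image is a flattened pixel vector, `W` holds one weight column per digit class, a softmax turns scores into class probabilities, and one gradient step descends the multinomial cross-entropy loss.

```python
import numpy as np

def softmax(z):
    """Turn a matrix of class scores into row-wise probabilities."""
    z = z - z.max(axis=1, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def gradient_step(W, X, y_onehot, lr=0.1):
    """One GD step on the multinomial cross-entropy loss.

    X: (n, d) pixel vectors; W: (d, k) weights; y_onehot: (n, k) labels.
    """
    probs = softmax(X @ W)                 # predicted class probabilities
    grad = X.T @ (probs - y_onehot) / len(X)
    return W - lr * grad

def predict(W, X):
    """Assign each pixel vector to the class with the highest score."""
    return np.argmax(X @ W, axis=1)
```

Replacing the full `X` in `gradient_step` with a random mini-batch of rows turns this into the SGD variant compared in the thesis.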
In the case of linear regression, it is found that SGD outperforms GD as the data sets grow larger. Furthermore, the precision of SGD does not necessarily improve when increasing the mini-batch size. For multinomial logistic regression, GD produces the most accurate model, with a 73.39% success rate. However, training this model is time consuming, and by partitioning the data into mini-batches, SGD achieves a 70.49% success rate in a more reasonable timeframe.
Please use this URL to cite or link to this publication:
http://lup.lub.lu.se/student-papers/record/8904399
author: Hallén, Rasmus
supervisor: Umberto Picchini (LU)
organization:
course: MASK01 20171
year: 2017
type: M2 - Bachelor Degree
subject:
language: English
id: 8904399
date added to LUP: 2017-03-09 14:53:59
date last changed: 2017-03-09 14:53:59
@misc{8904399,
  abstract = {Gradient-based algorithms are popular when solving unconstrained optimization problems. By exploiting knowledge of the gradient of the objective function to optimize, each iteration of a gradient-based algorithm aims at approaching the minimizer of said function. In the age of web-scale prediction problems, many venerable algorithms may encounter difficulties. However, as the data sets grow larger, so does our capability to interpret them. Thanks to developments in the computer-science subfield called Machine Learning, it is possible to produce self-adapting optimizers able to deal with different challenging scenarios, without requiring the user to rewrite algorithms specialized for the experiment at hand. This thesis shall compare the performance of two different gradient-based algorithms: Gradient Descent (GD) and Stochastic Gradient Descent (SGD). First, a synthetic linear regression model is considered. The performance of the algorithms is evaluated by comparison with the maximum likelihood estimator (MLE). Thereafter, a convergence study is presented. Lastly, SGD is evaluated over increasingly large, random subsets of the data; such a subset is known as a mini-batch. When transitioning to the multinomial logistic regression setting, a model to recognize handwritten digits is constructed. The data is provided by the Modified National Institute of Standards and Technology (MNIST) database. Each digit is represented by a vector with pixel weights as entries. The algorithms are assessed by how well they estimate a model that predicts which vector belongs to which digit. In the case of linear regression, it is found that SGD outperforms GD as the data sets grow larger. Furthermore, the precision of SGD does not necessarily improve when increasing the mini-batch size. For multinomial logistic regression, GD produces the most accurate model, with a 73.39% success rate. However, training this model is time consuming, and by partitioning the data into mini-batches, SGD achieves a 70.49% success rate in a more reasonable timeframe.},
  author = {Hallén, Rasmus},
  language = {eng},
  note = {Student Paper},
  title = {A Study of Gradient-Based Algorithms},
  year = {2017},
}