
LUP Student Papers

LUND UNIVERSITY LIBRARIES

Expanding an adaptive learning-rate algorithm to handle mini-batch training

Holst Andersen, Sören LU (2024) FYSK04 20241
Department of Physics
Abstract
Resilient backpropagation, Rprop, is a robust and accurate optimization method used in neural network training with batch learning. As a result of its adaptive step sizes, Rprop requires copious amounts of data at each iteration, which slows it down when dealing with large datasets compared with mini-batch methods. We create and empirically evaluate a version of Rprop, S-Rprop, which can handle mini-batch learning. S-Rprop with optimized hyper-parameters matches the Stochastic Gradient Descent (SGD) benchmark performance with optimized hyper-parameters using the same convolutional neural network (CNN) architecture. In a deep-learning setting designed to generate vanishing gradient problems, we show that S-Rprop outperforms both Rprop and SGD when re-using the optimal parameters from the CNN.
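
For context, the sign-based update at the heart of standard Rprop can be sketched as follows. This is a minimal Python/NumPy illustration of a common Rprop variant (iRprop-), not the S-Rprop algorithm developed in the thesis, which is not specified in this record; the function name and hyper-parameter values are illustrative only.

import numpy as np

def rprop_step(weights, grad, prev_grad, step,
               eta_plus=1.2, eta_minus=0.5,
               step_min=1e-6, step_max=50.0):
    """One sign-based Rprop update with per-parameter adaptive step sizes.

    Illustrative sketch of a common Rprop variant (iRprop-); the thesis'
    mini-batch S-Rprop is not described in this record.
    """
    sign_change = grad * prev_grad                        # >0: same sign, <0: sign flipped
    # Grow the step where the gradient kept its sign, shrink it where it flipped.
    step = np.where(sign_change > 0, np.minimum(step * eta_plus, step_max), step)
    step = np.where(sign_change < 0, np.maximum(step * eta_minus, step_min), step)
    grad = np.where(sign_change < 0, 0.0, grad)           # skip the update after a sign flip
    weights = weights - np.sign(grad) * step              # only the gradient's sign is used
    return weights, step, grad                            # returned grad becomes next prev_grad

Because only the sign of the gradient enters the update, the gradient estimate has to be reliable, which is why standard Rprop is computed over the full training set.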
Popular Abstract
Artificial intelligence plays an increasingly important role in our social lives and in daily economic activity. But artificial intelligence without an optimization method is just artificial.

An optimization method is an algorithm used in a neural network that tells the network how to improve itself in response to processing new data. It carries the instructions which the network uses to reach the optimal solution with the least amount of error possible. Optimization comes from the Latin word optimus, which means best. This is how we get from a random cacophony of numbers, when a neural network model is first created, to the best possible answer to our problem. Different algorithms are used depending on the characteristics of the dataset at hand, such as its size or complexity, and how much computing power is available to attack the dataset. The DNA-sourced optimization method which is the default for most humans is called "learning from your mistakes" and takes many years to take effect. Artificial neural networks have the advantage of being able to make hundreds or thousands of mistakes per second.

In my thesis, I seek to combine two optimization methods which have previously been considered incompatible. One, Rprop, needs a significant volume of data before taking a step towards the optimal solution. This yields high accuracy and rapid convergence, as long as the dataset is small enough. Stochastic Gradient Descent, on the other hand, uses small subsets of data in each step. Each step towards the optimal model is then less precise, pointing only vaguely in the correct direction, but so many more steps are taken that we are sure to reach the desired destination. The contrasting amount of data needed for each optimizer step is the reason for their incompatibility, which has received little attention. In my efforts to integrate the two, I hope to shed light on the nature of the algorithms and, with cautious optimism, produce a novel optimization method which could find its own area of application. The development of optimization methods is key to the continued impact AI is having on all aspects of society.
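
To make the contrast concrete, here is a small Python/NumPy sketch of a full-batch step (as Rprop uses) next to a mini-batch SGD pass. The function and parameter names are placeholders and do not come from the thesis; loss_grad is assumed to return the gradient of the training loss with respect to the weights.

import numpy as np

def full_batch_step(weights, X, y, loss_grad, lr=0.01):
    """One precise step from the gradient over the entire dataset (batch learning)."""
    g = loss_grad(weights, X, y)                     # gradient averaged over all samples
    return weights - lr * g

def minibatch_sgd_epoch(weights, X, y, loss_grad, lr=0.01, batch_size=32):
    """Many cheaper, noisier steps per pass over the data (mini-batch SGD)."""
    idx = np.random.permutation(len(X))
    for start in range(0, len(X), batch_size):
        batch = idx[start:start + batch_size]
        g = loss_grad(weights, X[batch], y[batch])   # gradient on a small subset only
        weights = weights - lr * g
    return weights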
Please use this url to cite or link to this publication:
author
Holst Andersen, Sören LU
supervisor
organization
course
FYSK04 20241
year
type
M2 - Bachelor Degree
subject
keywords
Rprop, adaptive learning-rates, deep learning, SGD, algorithm, algorithm development, CNN, supervised-learning, bench-marking, Machine Learning, vanishing gradient problem, optimization method, training, mini-batch training, batch training
language
English
id
9166149
date added to LUP
2024-06-24 08:32:16
date last changed
2024-06-24 08:32:16
@misc{9166149,
  abstract     = {{Resilient backpropagation, Rprop, is a robust and accurate optimization method used in neural network training with batch learning. As a result of its adaptive step sizes, Rprop requires copious amounts of data at each iteration, which slows it down when dealing with large datasets compared with mini-batch methods. We create and empirically evaluate a version of Rprop, S-Rprop, which can handle mini-batch learning. S-Rprop with optimized hyper-parameters matches the Stochastic Gradient Descent (SGD) benchmark performance with optimized hyper-parameters using the same convolutional neural network (CNN) architecture. In a deep-learning setting designed to generate vanishing gradient problems, we show that S-Rprop outperforms both Rprop and SGD when re-using the optimal parameters from the CNN.}},
  author       = {{Holst Andersen, Sören}},
  language     = {{eng}},
  note         = {{Student Paper}},
  title        = {{Expanding an adaptive learning-rate algorithm to handle mini-batch training}},
  year         = {{2024}},
}