Expanding an adaptive learning-rate algorithm to handle mini-batch training
(2024) FYSK04 20241, Department of Physics
- Abstract
- Resilient backpropagation, Rprop, is a robust and accurate optimization method used in neural network training with batch learning. Because of its adaptive step sizes, Rprop requires copious amounts of data at each iteration, which slows it down on large datasets compared with mini-batch methods. We create and empirically evaluate a version of Rprop, S-Rprop, which can handle mini-batch learning. With optimized hyper-parameters, S-Rprop matches the performance of a Stochastic Gradient Descent (SGD) benchmark with optimized hyper-parameters, using the same convolutional neural network (CNN) architecture. In a deep-learning setting designed to generate vanishing gradient problems, we show that S-Rprop outperforms both Rprop and SGD when re-using the optimal parameters from the CNN.
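The adaptive step sizes mentioned in the abstract are per-weight step sizes that grow or shrink depending on whether the gradient keeps its sign between full-batch iterations. As an illustrative sketch only (not the thesis code), the update rule of a standard Rprop variant (iRprop-) could look like the Python below; the hyper-parameter values are the commonly cited defaults, not necessarily those used in the thesis.

```python
import numpy as np

def rprop_step(w, grad, prev_grad, delta,
               eta_plus=1.2, eta_minus=0.5,
               delta_min=1e-6, delta_max=50.0):
    """One full-batch Rprop update per weight (iRprop- variant, sketch only)."""
    sign_change = grad * prev_grad
    # Gradient kept its sign: grow this weight's step size (capped at delta_max).
    delta = np.where(sign_change > 0, np.minimum(delta * eta_plus, delta_max), delta)
    # Gradient flipped sign: shrink the step size and skip this weight once.
    delta = np.where(sign_change < 0, np.maximum(delta * eta_minus, delta_min), delta)
    grad = np.where(sign_change < 0, 0.0, grad)
    # The update uses only the gradient's sign, not its magnitude.
    w = w - np.sign(grad) * delta
    return w, grad, delta
```

Because only the gradient's sign is used, the rule relies on the full-batch gradient being a trustworthy direction estimate, which is why noisy mini-batch gradients are a problem for plain Rprop.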
- Popular Abstract
- Artificial intelligence plays an increasingly important role in our social lives and in daily economic activity. But artificial intelligence without an optimization method is just artificial.
An optimization method is an algorithm used in a neural network that tells the network how to improve itself in response to processing new data. It carries the instructions that the network uses to achieve the optimal solution with the least amount of error possible. Optimization comes from the Latin word optimus, which means best. This is how we get from a random cacophony of numbers, when a neural network model is first created, to the best possible answer to our problem. Different algorithms are used depending on the characteristics of the dataset at hand, such as its size or complexity, and on how much computing power is available to attack the dataset. The DNA-sourced optimization method which is the default for most humans is called "learning from your mistakes" and takes many years to take effect. Artificial neural networks have the advantage of being able to make hundreds or thousands of mistakes per second.
In my thesis, I seek to combine two optimization methods which have previously been considered incompatible. One, Rprop, needs a significant volume of data before taking a step towards the optimal solution. This yields high accuracy and rapid convergence, as long as the dataset is small enough. On the other hand, Stochastic Gradient Descent uses small subsets of data in each step. This means that each step taken towards the optimal model is less precise, vaguely in the correct direction, but so many more steps are taken that we are sure to reach the desired destination. The contrasting amounts of data needed for each optimizer step are the reason for their incompatibility, which has received little attention. In my efforts to integrate these two, I hope to shed light on the nature of the algorithms, and, with cautious optimism, produce a novel optimization method which could find its own area of application. The development of optimization methods is key to the continued impact AI is having on all aspects of society.
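To make the contrast with full-batch Rprop concrete, the sketch below shows one epoch of mini-batch SGD in Python. It is illustrative only: loss_grad is a placeholder for whatever computes the mini-batch gradient, and the learning rate and batch size are arbitrary example values, not the optimized hyper-parameters from the thesis.

```python
import numpy as np

def sgd_epoch(w, data, targets, loss_grad, lr=0.01, batch_size=32, rng=None):
    """One epoch of mini-batch SGD (illustrative sketch, not the thesis code).

    loss_grad(w, x_batch, y_batch) returns the gradient of the loss w.r.t. w,
    estimated from the mini-batch alone: each step is noisy but cheap,
    and many steps are taken per pass over the data.
    """
    rng = np.random.default_rng() if rng is None else rng
    order = rng.permutation(len(data))
    for start in range(0, len(data), batch_size):
        idx = order[start:start + batch_size]
        g = loss_grad(w, data[idx], targets[idx])  # noisy mini-batch gradient
        w = w - lr * g                             # small fixed-size step
    return w
```

The two sketches are meant only to show why the update styles make different demands on the data at each step; they do not describe S-Rprop itself.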
Please use this URL to cite or link to this publication:
http://lup.lub.lu.se/student-papers/record/9166149
- author
- Holst Andersen, Sören LU
- supervisor
- Patrik Edén LU
- organization
- course
- FYSK04 20241
- year
- 2024
- type
- M2 - Bachelor Degree
- subject
- keywords
- Rprop, adaptive learning-rates, deep learning, SGD, algorithm, algorithm development, CNN, supervised-learning, bench-marking, Machine Learning, vanishing gradient problem, optimization method, training, mini-batch training, batch training
- language
- English
- id
- 9166149
- date added to LUP
- 2024-06-24 08:32:16
- date last changed
- 2024-06-24 08:32:16
@misc{9166149,
  abstract = {{Resilient backpropagation, Rprop, is a robust and accurate optimization method used in neural network training with batch learning. Because of its adaptive step sizes, Rprop requires copious amounts of data at each iteration, which slows it down on large datasets compared with mini-batch methods. We create and empirically evaluate a version of Rprop, S-Rprop, which can handle mini-batch learning. With optimized hyper-parameters, S-Rprop matches the performance of a Stochastic Gradient Descent (SGD) benchmark with optimized hyper-parameters, using the same convolutional neural network (CNN) architecture. In a deep-learning setting designed to generate vanishing gradient problems, we show that S-Rprop outperforms both Rprop and SGD when re-using the optimal parameters from the CNN.}},
  author = {{Holst Andersen, Sören}},
  language = {{eng}},
  note = {{Student Paper}},
  title = {{Expanding an adaptive learning-rate algorithm to handle mini-batch training}},
  year = {{2024}},
}