Expanding an adaptive learning-rate algorithm to handle mini-batch training
(2024) FYSK04 20241, Department of Physics
- Abstract
- Resilient backpropagation, Rprop, is a robust and accurate optimization method used in neural network training with batch learning. Because of its adaptive step sizes, Rprop requires copious amounts of data at each iteration, which slows it down on large datasets compared with mini-batch methods. We create and empirically evaluate a version of Rprop, S-Rprop, which can handle mini-batch learning. With optimized hyper-parameters, S-Rprop matches the performance of a Stochastic Gradient Descent (SGD) benchmark with optimized hyper-parameters, using the same convolutional neural network (CNN) architecture. In a deep-learning setting designed to generate vanishing gradient problems, we show that S-Rprop outperforms both Rprop and SGD when re-using the optimal parameters from the CNN.
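The adaptive step sizes mentioned in the abstract are per-weight step sizes that grow or shrink depending on whether the gradient keeps its sign between full-batch iterations. As an illustrative sketch only (not the thesis code), the update rule of a standard Rprop variant (iRprop-) could look like the Python below; the hyper-parameter values are the commonly cited defaults, not necessarily those used in the thesis.

```python
import numpy as np

def rprop_step(w, grad, prev_grad, delta,
               eta_plus=1.2, eta_minus=0.5,
               delta_min=1e-6, delta_max=50.0):
    """One full-batch Rprop update per weight (iRprop- variant, sketch only)."""
    sign_change = grad * prev_grad
    # Gradient kept its sign: grow this weight's step size (capped at delta_max).
    delta = np.where(sign_change > 0, np.minimum(delta * eta_plus, delta_max), delta)
    # Gradient flipped sign: shrink the step size and skip this weight once.
    delta = np.where(sign_change < 0, np.maximum(delta * eta_minus, delta_min), delta)
    grad = np.where(sign_change < 0, 0.0, grad)
    # The update uses only the gradient's sign, not its magnitude.
    w = w - np.sign(grad) * delta
    return w, grad, delta
```

Because only the gradient's sign is used, the rule relies on the full-batch gradient being a trustworthy direction estimate, which is why noisy mini-batch gradients are a problem for plain Rprop.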
- Popular Abstract
- Artificial intelligence plays an increasingly important role in our social lives and in daily economic activity. But artificial intelligence without an optimization method is just artificial.
An optimization method is an algorithm used in a neural network that tells the network how to improve itself in response to processing new data. It carries the instructions that the network uses to achieve the optimal solution with the least amount of error possible. Optimization comes from the Latin word optimus, which means best. This is how we get from a random cacophony of numbers, when a neural network model is first created, to the best possible answer to our problem. Different algorithms are used depending on the characteristics of the dataset at hand, such as its size or complexity, and on how much computing power is available to attack the dataset. The DNA-sourced optimization method which is the default for most humans is called "learning from your mistakes" and takes many years to take effect. Artificial neural networks have the advantage of being able to make hundreds or thousands of mistakes per second.
In my thesis, I seek to combine two optimization methods which have previously been considered incompatible. One, Rprop, needs a significant volume of data before taking a step towards the optimal solution. This yields high accuracy and rapid convergence, as long as the dataset is small enough. On the other hand, Stochastic Gradient Descent uses small subsets of data in each step. This means that each step taken towards the optimal model is less precise, vaguely in the correct direction, but so many more steps are taken that we are sure to reach the desired destination. The contrasting amounts of data needed for each optimizer step are the reason for their incompatibility, which has received little attention. In my efforts to integrate these two, I hope to shed light on the nature of the algorithms, and, with cautious optimism, produce a novel optimization method which could find its own area of application. The development of optimization methods is key to the continued impact AI is having on all aspects of society.
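To make the contrast with full-batch Rprop concrete, the sketch below shows one epoch of mini-batch SGD in Python. It is illustrative only: loss_grad is a placeholder for whatever computes the mini-batch gradient, and the learning rate and batch size are arbitrary example values, not the optimized hyper-parameters from the thesis.

```python
import numpy as np

def sgd_epoch(w, data, targets, loss_grad, lr=0.01, batch_size=32, rng=None):
    """One epoch of mini-batch SGD (illustrative sketch, not the thesis code).

    loss_grad(w, x_batch, y_batch) returns the gradient of the loss w.r.t. w,
    estimated from the mini-batch alone: each step is noisy but cheap,
    and many steps are taken per pass over the data.
    """
    rng = np.random.default_rng() if rng is None else rng
    order = rng.permutation(len(data))
    for start in range(0, len(data), batch_size):
        idx = order[start:start + batch_size]
        g = loss_grad(w, data[idx], targets[idx])  # noisy mini-batch gradient
        w = w - lr * g                             # small fixed-size step
    return w
```

The two sketches are meant only to show why the update styles make different demands on the data at each step; they do not describe S-Rprop itself.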
Please use this URL to cite or link to this publication:
http://lup.lub.lu.se/student-papers/record/9166149
- author
- Holst Andersen, Sören LU
- supervisor
- Patrik Edén LU
- organization
- course
- FYSK04 20241
- year
- 2024
- type
- M2 - Bachelor Degree
- subject
- keywords
- Rprop, adaptive learning-rates, deep learning, SGD, algorithm, algorithm development, CNN, supervised-learning, bench-marking, Machine Learning, vanishing gradient problem, optimization method, training, mini-batch training, batch training
- language
- English
- id
- 9166149
- date added to LUP
- 2024-06-24 08:32:16
- date last changed
- 2024-06-24 08:32:16
@misc{9166149,
  abstract = {{Resilient backpropagation, Rprop, is a robust and accurate optimization method used in neural network training with batch learning. Because of its adaptive step sizes, Rprop requires copious amounts of data at each iteration, which slows it down on large datasets compared with mini-batch methods. We create and empirically evaluate a version of Rprop, S-Rprop, which can handle mini-batch learning. With optimized hyper-parameters, S-Rprop matches the performance of a Stochastic Gradient Descent (SGD) benchmark with optimized hyper-parameters, using the same convolutional neural network (CNN) architecture. In a deep-learning setting designed to generate vanishing gradient problems, we show that S-Rprop outperforms both Rprop and SGD when re-using the optimal parameters from the CNN.}},
  author = {{Holst Andersen, Sören}},
  language = {{eng}},
  note = {{Student Paper}},
  title = {{Expanding an adaptive learning-rate algorithm to handle mini-batch training}},
  year = {{2024}},
}