
Optimizing L2-regularization for Binary Classification Tasks

Bolinder, Oskar LU (2022) FYTK02 20221
Computational Biology and Biological Physics - Undergoing reorganization
Abstract
An Artificial Neural Network (ANN) is a type of machine learning algorithm with widespread usage. When training an ANN, there is a risk that it becomes overtrained and cannot solve the task for new data. Methods to prevent this, such as L2-regularization, introduce hyperparameters that are time-consuming to optimize. In this thesis, I investigate a hypothesis which postulates how the optimal L2-regularization strength for a binary classification task depends on the number of input dimensions and available training patterns. First, I generated binary classification tasks consisting of Gaussian clouds of different sizes in different numbers of dimensions. Several networks were then trained with varying L2-regularization strengths to see which ones achieved the lowest validation error on the tasks. The results were promising in favor of the hypothesis. No statistical significance was shown, but the results were similar to the behaviour predicted by the hypothesis when the number of training patterns lay within a certain interval.
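The experimental setup described above can be illustrated with a short sketch. This is not the thesis code: it uses scikit-learn's MLPClassifier, whose alpha parameter sets the L2 penalty strength, and the dimension count, pattern counts, and alpha grid are illustrative assumptions.

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
d, n = 10, 200  # input dimensions and patterns per class (assumed values)

# Binary task: two Gaussian clouds in d dimensions, one per class.
X = np.vstack([rng.normal(-1.0, 1.0, (n, d)),
               rng.normal(+1.0, 1.0, (n, d))])
y = np.concatenate([np.zeros(n), np.ones(n)])
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.5, random_state=0)

# Sweep the L2 strength and keep the one with the lowest validation error.
best_err, best_alpha = min(
    (1.0 - MLPClassifier(hidden_layer_sizes=(16,), alpha=a, max_iter=2000,
                         random_state=0).fit(X_tr, y_tr).score(X_val, y_val), a)
    for a in np.logspace(-4, 1, 6))
print(f"lowest validation error {best_err:.3f} at L2 strength {best_alpha:.0e}")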
Popular Abstract
Artificial intelligence, often abbreviated AI, is behind many recent advancements in areas such as image and speech recognition. The workhorse that allows many computers to learn is the artificial neural network, or ANN for short. Just as we learn by forming connections between neurons in our brains, an ANN forms connections between “nodes”, and the strength of these links determines what the computer has learned.

In order to learn, we need something to study. It is the same for computers: they need training data that can teach them how to solve a problem. However, just as humans may employ more or less efficient study techniques, an ANN does not necessarily learn how to solve the problem that its training data represents. Instead, it can become overtrained. An overtrained network is like a student who prepares for a difficult math exam by memorizing the answers to old exams. This strategy works great if the upcoming exam contains exactly the same questions as the old ones. Unfortunately, if the questions are even slightly different, the student has no chance of passing.

What does this mean for an AI? An AI trained to recognize cats and dogs from a set of 100 images may achieve 100% accuracy on the training data, yet fail to pick the correct animal when shown an image outside the training set. Naturally, such an AI would not be useful. Thankfully, methods to prevent overtraining exist. A popular one is to punish networks that show signs of overtraining. How strong this punishment should be depends on how large the ANN is and what type of problem it is designed to solve.
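In ANN terms, this “punishment” is a penalty added to the training loss; L2-regularization, the method studied in the thesis, adds the sum of squared connection weights scaled by a strength parameter. A toy sketch, with made-up weights and strength:

import numpy as np

def l2_penalized_loss(data_loss, weights, lam):
    # Total loss = task error + lam * sum of squared weights, so large
    # weights (a symptom of overtraining) make the total loss worse and
    # training is steered toward smaller, smoother solutions.
    return data_loss + lam * sum(np.sum(w ** 2) for w in weights)

weights = [np.array([[0.5, -2.0], [1.5, 0.1]])]    # made-up network weights
print(l2_penalized_loss(0.30, weights, lam=0.01))  # 0.30 + 0.01 * 6.51 = 0.3651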

Finding the right strength of this punishment can be difficult as there is currently no way to accurately predict it. Thus, those who design ANNs have to try different strengths, compare the results, and select the one that seems to work best. This can be a very time-consuming process, especially since there are many other settings for the ANN that must be carefully selected. A common method to save time is to randomly choose values for the settings a few times and see what works best. However, there might be a way to predict the best strength for the punishment. This is what our research aims to investigate.
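The random-search idea above can be sketched in a few lines. Only the L2 strength is sampled here, and train_and_validate is a hypothetical stand-in for training a network with that strength and returning its validation error:

import math
import random

def random_search_l2(train_and_validate, n_trials=20, seed=0):
    # Draw L2 strengths log-uniformly between 1e-5 and 10 and return the
    # one whose trained network reports the lowest validation error.
    rng = random.Random(seed)
    trials = [10 ** rng.uniform(-5, 1) for _ in range(n_trials)]
    return min(trials, key=train_and_validate)

# Usage with a toy error surface whose minimum lies near 0.01:
best = random_search_l2(lambda lam: (math.log10(lam) + 2) ** 2)
print(f"best L2 strength found: {best:.2e}")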

If it were possible to predict this strength, there would be one less setting to determine when designing an ANN. This would be useful in many fields, since overtraining is a major issue when training data are scarce, as in medical applications of machine learning. Being able to immediately choose the optimal punishment strength would therefore make the process of finding a good ANN for these applications faster.
author: Bolinder, Oskar LU
supervisor:
organization: Computational Biology and Biological Physics - Undergoing reorganization
course: FYTK02 20221
year: 2022
type: M2 - Bachelor Degree
subject:
keywords: Machine learning, Overtraining, L2-regularization, Model selection, Binary classification
language: English
id: 9090312
date added to LUP: 2022-06-28 11:30:04
date last changed: 2022-06-28 11:30:04
@misc{9090312,
  abstract     = {{An Artificial Neural Network (ANN) is a type of machine learning algorithm with widespread usage. When training an ANN, there is a risk that it becomes overtrained and cannot solve the task for new data. Methods to prevent this, such as L2-regularization, introduce hyperparameters that are time-consuming to optimize. In this thesis, I investigate a hypothesis which postulates how the optimal L2-regularization strength for a binary classification task depends on the number of input dimensions and available training patterns. First, I generated binary classification tasks consisting of Gaussian clouds of different sizes in different numbers of dimensions. Several networks were then trained with varying L2-regularization strengths to see which ones achieved the lowest validation error on the tasks. The results were promising in favor of the hypothesis. No statistical significance was shown, but the results were similar to the behaviour predicted by the hypothesis when the number of training patterns lay within a certain interval.}},
  author       = {{Bolinder, Oskar}},
  language     = {{eng}},
  note         = {{Student Paper}},
  title        = {{Optimizing L2-regularization for Binary Classification Tasks}},
  year         = {{2022}},
}