Investigating Relations between Regularization and Weight Initialization in Artificial Neural Networks

Sjöö, Rasmus

Sjöö, Rasmus (LU) (2022) FYTK02 20221
Computational Biology and Biological Physics - Has been reorganised

Abstract: L2 regularization is a common method for preventing overtraining in artificial neural networks. However, the regularization strength has to be properly adjusted for the method to work as intended. This value is usually found by trial and error, which can be time-consuming, especially for larger networks. The process could be simplified if there were a mathematical relationship that predicted the best strength from the network's hyperparameters. The aim of this project is to verify part of such a relation, specifically whether the optimal regularization strength is proportional to the inverse of the number of training patterns. This was tested using a network that solves binary classification problems. Several regularization strengths were tested for different numbers of training patterns, and the best ones were compared to the proposed relation. Additional tests were performed to check whether weight initialization affects this relation, and whether the relation also holds for L1 regularization.
Popular Abstract (translated from Swedish): Artificial neural networks (ANNs) are a form of machine learning that is currently popular. They have many applications in today's technology, for example image recognition. Programming such a network is, however, not the easiest of tasks. A common problem that arises when an ANN is built is overtraining, which occurs when the network is too powerful for the problem it is meant to solve. A common method for getting around this problem is L2 regularization, which modifies the network so that it shrinks its weights during training. This requires a suitable regularization strength. The most common way to find the best strength is to try several different values and see which one gives the best result. This process can take a lot of time, however, especially for a large network that is slow to train. The work would be easier if there were a mathematical relationship that determined the best regularization strength before training. The goal of this project was to find out whether such a relationship exists and what it would look like.
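As an illustration of the relation described in the abstracts, the sketch below shows what "regularization strength proportional to the inverse number of training patterns" could look like in code. It is not taken from the thesis: the function names, the reference values `lambda_ref` and `n_ref`, and the choice of a mean cross-entropy data term are all assumptions made for the example.

```python
import math


def scaled_strength(lambda_ref, n_ref, n_train):
    """Hypothesised scaling: strength proportional to 1 / n_train.

    lambda_ref is a strength assumed to work well for n_ref training
    patterns (both are hypothetical reference values).
    """
    return lambda_ref * n_ref / n_train


def l2_penalty(weights, strength):
    """L2 penalty: strength times the sum of squared weights."""
    return strength * sum(w * w for w in weights)


def l1_penalty(weights, strength):
    """L1 penalty: strength times the sum of absolute weights."""
    return strength * sum(abs(w) for w in weights)


def mean_cross_entropy(probs, targets):
    """Mean binary cross-entropy over all training patterns."""
    eps = 1e-12  # clamp probabilities to avoid log(0)
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(targets)


def regularized_loss(probs, targets, weights, strength, penalty=l2_penalty):
    """Data term plus a weight penalty (L2 by default, L1 optionally)."""
    return mean_cross_entropy(probs, targets) + penalty(weights, strength)


if __name__ == "__main__":
    # Doubling the number of training patterns halves the strength.
    for n in (100, 200, 400):
        print(n, scaled_strength(lambda_ref=0.01, n_ref=100, n_train=n))
```

Under this assumed scaling, a strength tuned once on a small training set could be rescaled for a larger one instead of being searched for again. Whether that holds in practice is the question the thesis investigates.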

Please use this url to cite or link to this publication: http://lup.lub.lu.se/student-papers/record/9098131

author

Sjöö, Rasmus (LU)

supervisor

Patrik Edén (LU)

organization

Computational Biology and Biological Physics - Has been reorganised

course

FYTK02 20221

year

2022

type

M2 - Bachelor Degree

subject

Physics and Astronomy

keywords

Artificial Neural Networks, L1 Regularization, L2 Regularization, Loss Function, Maximum Likelihood, Regularization Strength, Synthetic Data Generation, Weight Initialization

language

English

id

9098131

date added to LUP

2022-08-25 10:24:34

date last changed

2022-08-25 10:24:34

@misc{9098131,
  abstract     = {{L2 regularization is a common method used to prevent overtraining in artificial neural networks. However, an issue with this method is that the regularization strength has to be properly adjusted for it to work as intended. This value is usually found by trial and error which can take some time, especially for larger networks. This process could be alleviated if there was a mathematical relationship that predicted the best strength based on the network's hyperparameters. The aim of this project is to prove part of such a relation, specifically if the optimal regularization strength is proportional to the inverse number of training patterns. This was tested using a network that solves binary classification problems. Several regularization strengths were tested for different amounts of training patterns. The best ones were compared to the proposed relation. Additional tests were performed to check if weight initialization had an effect on said relation, and if it works for L1 regularization as well.}},
  author       = {{Sjöö, Rasmus}},
  language     = {{eng}},
  note         = {{Student Paper}},
  title        = {{Investigating Relations between Regularization and Weight Initialization in Artificial Neural Networks}},
  year         = {{2022}},
}

LUP Student Papers

LUND UNIVERSITY LIBRARIES
