
LUP Student Papers

LUND UNIVERSITY LIBRARIES

Combined Regularisation Techniques for Artificial Neural Networks

Binns, Joseph LU (2022) FYTK02 20221
Computational Biology and Biological Physics - Undergoing reorganization
Abstract
Artificial neural networks are prone to overfitting – the process of learning details specific to a particular training data set. Success in preventing overfitting by combining the L2 and dropout regularisation techniques has led to the combination’s recent popularity. However, each additional regularisation technique introduced to an artificial neural network brings new hyperparameters, which must be tuned in an increasingly complex and computationally expensive manner. Motivated by L2’s action as a Gaussian prior on the loss function, we hypothesise an analytic relation for an optimal L2 strength’s dependence on the number of patterns. Conducted on an artificial neural network composed of a single hidden layer, this systematic study tests the hypothesis for optimal L2 strength and considers what interactions the additional involvement of dropout and early stopping may have on the relation. On an otherwise static problem and network calibration, the results of this thesis support the hypothesis within a valid working region. The results usefully inform the choice of L2 strength, drop rate and early stopping usage, and suggest that the predictor may find real-world applications.
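For readers who want the Gaussian-prior motivation spelled out, the following is a minimal sketch of the standard correspondence between an L2 penalty and a Gaussian weight prior; the notation (pattern errors E_p, weights w, prior width sigma_w, L2 strength lambda, N patterns) is generic and assumed here rather than taken from the thesis.

\[
  -\log p(\mathbf{w}\mid\mathcal{D})
    = \sum_{p=1}^{N} E_p(\mathbf{w})
      + \frac{1}{2\sigma_w^{2}}\,\lVert\mathbf{w}\rVert^{2}
      + \text{const},
\]
so minimising the L2-regularised loss
\[
  \sum_{p=1}^{N} E_p(\mathbf{w}) + \frac{\lambda}{2}\,\lVert\mathbf{w}\rVert^{2},
  \qquad \lambda = \frac{1}{\sigma_w^{2}},
\]
is maximum a posteriori estimation under a zero-mean Gaussian prior on the weights. How the optimal \(\lambda\) then varies with the number of patterns N depends on how the error term is normalised over patterns, which is precisely the dependence the thesis examines.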
Popular Abstract
Artificial Intelligence’s (AI’s) potential for incredible state-of-the-art performance has not gone unnoticed; from medicine to military, the interest of all manner of fields has been piqued [1]. This has encouraged the rapid integration of AI into our everyday lives [2]. However, in the recent swarm of industrial excitement, whilst new applications have taken the limelight, rigour and understanding have begun to lag behind. By shining light on a popular choice of mechanisms which assist in the training of AI, known as dropout, L2 and early stopping, my study aimed to be a small step towards designing AI in a more informed and understood manner.

Artificial Neural Networks (ANNs) are a collection of computational architectures inspired by the brain; they are currently the most widely realised form of AI. If an ANN is insufficient in size, it will lack the capacity to solve even the simplest of problems. However, if an ANN is too large, that excess capacity seldom lies dormant. Instead, in a process known as overfitting, the ANN tends to learn undesirable peculiarities of a data set, such as fuzzy noise. This, in turn, can result in an ANN that generalises poorly to new data – that is, a tendency to underperform on previously unseen variations of the same underlying problem [3].

Driven by a desire to suppress overfitting, a variety of so-called regularisation techniques have been developed; L2, dropout and early stopping are common choices. In particular, L2 and dropout have recently received praise and popularity for providing good results when applied in conjunction [4, 5]. Though regularisation techniques offer significant benefits – often being of practical necessity – their implementation does not come without costs. Notably, both L2 and dropout have associated values controlling their strengths, each of which must be exhaustively fine-tuned to the specific problem and chosen ANN architecture [6].
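As an illustration only (the thesis’s own code, data and calibration are not part of this record), a single-hidden-layer network combining all three techniques might be set up as follows with Keras; the layer width, L2 strength, drop rate, patience and synthetic data below are placeholder choices, not values from the study.

import numpy as np
from tensorflow import keras
from tensorflow.keras import layers, regularizers

# Synthetic stand-in data; the thesis's actual problem is not reproduced here.
rng = np.random.default_rng(0)
x_train = rng.normal(size=(200, 10)).astype("float32")
y_train = (x_train.sum(axis=1) > 0).astype("float32")
x_val = rng.normal(size=(50, 10)).astype("float32")
y_val = (x_val.sum(axis=1) > 0).astype("float32")

# Single hidden layer; the L2 penalty acts on its weights, dropout on its outputs.
model = keras.Sequential([
    layers.Input(shape=(10,)),
    layers.Dense(32, activation="relu",
                 kernel_regularizer=regularizers.l2(1e-4)),  # L2 strength (placeholder)
    layers.Dropout(0.5),                                     # drop rate (placeholder)
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")

# Early stopping halts training once the validation loss stops improving.
early_stop = keras.callbacks.EarlyStopping(monitor="val_loss", patience=10,
                                           restore_best_weights=True)
model.fit(x_train, y_train, validation_data=(x_val, y_val),
          epochs=500, callbacks=[early_stop], verbose=0)

Here the L2 strength and drop rate are exactly the two values that would normally need fine-tuning to the specific problem and architecture.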

To help navigate what can become a lengthy and troublesome process of trial and error, my study aimed to test a hypothesised predictor for optimal L2 strength. The predictor proposed that optimal L2 strength is proportional to the amount of available training data. I then observed how using L2 in conjunction with the dropout and early stopping regularisation techniques affects this optimal L2 strength.
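Taken at face value, the proportionality described above amounts to a simple rescaling rule. The sketch below is a hypothetical illustration of how such a predictor could be applied; the function name and the numbers are invented here, and the thesis itself should be consulted for the exact relation and its working region.

# Hypothetical illustration of the proportionality predictor: an L2 strength
# tuned on a reference number of training patterns is rescaled linearly to a
# new data-set size. Names and values below are invented for illustration.
def extrapolate_l2_strength(lambda_ref, n_ref, n_new):
    return lambda_ref * n_new / n_ref

# e.g. a strength of 1e-4 found with 1000 patterns would scale to 1e-3 at 10000 patterns
print(extrapolate_l2_strength(1e-4, 1000, 10000))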

The results, which suggest the predictor is successful within a suitable region, have helped to improve understanding of the interactions between these combined regularisation techniques. The predictor also shows promise for real-world usage through its extrapolation to situations with many training patterns, which would otherwise rely upon a time-consuming hyperparameter search.
author: Binns, Joseph LU
supervisor:
organization:
course: FYTK02 20221
year: 2022
type: M2 - Bachelor Degree
subject:
keywords: Artificial Neural Networks, ANNs, Overfitting, Combined Regularisation Techniques, Regularisation Techniques, Regularisation, L2, Weight Decay, Dropout, Early Stopping
other publication id: LU-TP 22-28
language: English
id: 9088789
date added to LUP: 2022-06-23 11:06:44
date last changed: 2022-06-29 15:21:29
@misc{9088789,
  abstract     = {{Artificial neural networks are prone to overfitting – the process of learning details specific to a particular training data set. Success in preventing overfitting by combining the L2 and dropout regularisation techniques has led to the combination’s recent popularity. However, each additional regularisation technique introduced to an artificial neural network brings new hyperparameters, which must be tuned in an increasingly complex and computationally expensive manner. Motivated by L2’s action as a Gaussian prior on the loss function, we hypothesise an analytic relation for an optimal L2 strength’s dependence on the number of patterns. Conducted on an artificial neural network composed of a single hidden layer, this systematic study tests the hypothesis for optimal L2 strength and considers what interactions the additional involvement of dropout and early stopping may have on the relation. On an otherwise static problem and network calibration, the results of this thesis support the hypothesis within a valid working region. The results usefully inform the choice of L2 strength, drop rate and early stopping usage, and suggest that the predictor may find real-world applications.}},
  author       = {{Binns, Joseph}},
  language     = {{eng}},
  note         = {{Student Paper}},
  title        = {{Combined Regularisation Techniques for Artificial Neural Networks}},
  year         = {{2022}},
}