
LUP Student Papers

LUND UNIVERSITY LIBRARIES

The derivation of first- and second-order backpropagation methods for fully-connected and convolutional neural networks

Sjögren, Simon LU (2021) In Master's Theses in Mathematical Sciences NUMM03 20211
Mathematics (Faculty of Engineering)
Centre for Mathematical Sciences
Abstract
We introduce rigorous theory for deriving first- and second-order backpropagation methods for deep neural networks (DNNs) while remaining consistent with existing theory on DNN optimization. We begin by formally defining a neural network with its respective components and state the first- and second-order chain rules in terms of its partial derivatives. The partial derivatives in the chain rule formula are related by an operator, and this operator depends on how the feedforward process is defined. As a corollary, we note that the chain rule behaves independently of which neural network architecture is used; its operations depend solely on the feedforward structure of the neural network (NN). When changing from a fully-connected (FC) NN to a convolutional neural network (CNN), the backpropagation algorithm, via the chain rule, reuses the structure of the respective feedforward method and therefore applies to both CNN and FC networks. We compare first- and second-order optimization methods on the CIFAR-10 dataset: PyTorch's own implementation of backpropagation, which uses Autograd \cite{autograd} to compute gradients, against our derivation and implementation of the second-order method AdaHessian \cite{adah}, which shows equal if not slightly improved results. The code for this project can be found at \url{https://github.com/Simonws92/Code/tree/main/Master_thesis}.
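As an illustration of the two chain rules the abstract refers to (the notation below is ours and need not match the thesis), consider a scalar cost C evaluated through a differentiable layer map y = g(x) with Jacobian J_{ki} = \partial y_k / \partial x_i:

\nabla_x C = J^{\top} \nabla_y C,
\qquad
\frac{\partial^2 C}{\partial x_i \, \partial x_j}
  = \sum_{k,m} \frac{\partial y_k}{\partial x_i}\,\frac{\partial y_m}{\partial x_j}\,
    \frac{\partial^2 C}{\partial y_k \, \partial y_m}
  + \sum_{k} \frac{\partial C}{\partial y_k}\,
    \frac{\partial^2 y_k}{\partial x_i \, \partial x_j},
\qquad\text{i.e.}\quad
H_x = J^{\top} H_y\, J + \sum_k (\nabla_y C)_k\, \nabla_x^2 y_k .

The first identity is what first-order backpropagation applies layer by layer; the second shows the additional curvature term a second-order method must track. Neither depends on whether g is a fully-connected or a convolutional layer, only on the feedforward map itself.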
Popular Abstract
Deep learning has replaced many roles in decision making; it remains central in existing systems and will no doubt grow in relevance in the future. Such deep learning models mostly use a form of machine learning called supervised learning. This technique tells the deep learning model how to correctly label a set of examples. The model makes a guess about what some input is, and if it is wrong, the supervised method tells the model to make adjustments. These adjustments are based on the error between the guess and the correct label: the higher the error, the bigger the adjustments. The function that measures this error is called the cost function, and to calculate the adjustments to the model we use a method called gradient descent. This relies on a type of derivative of the cost function that measures the change we need to make. The gradient of the cost function tells us the direction in which the change is biggest, and to update our deep learning model we move in the negative direction. However, these methods tell us nothing about the function beyond the point where the gradient is evaluated. Second-order methods give us more information about the cost function itself, so we develop methods, with the help of calculus, to determine the function's properties. Some of these properties are valuable for the machine learning process and can help guide the model more efficiently. We can combine these properties with existing algorithms for faster convergence when minimizing the cost function.

Higher-order derivatives are also necessary when solving partial differential equations (PDEs) with physics-informed neural networks (PINNs). In this case, the derivatives act as a constraint on the learning process in the supervised learning algorithm. Physics-informed neural networks are deep learning models that give us a solution of some PDE based on how we use our higher-order derivatives together with the initial and boundary conditions supplied to the network. The idea is that a PINN model delivers the solution almost immediately, instead of our having to solve the PDE anew every time we make minor changes to the initial and boundary conditions. For certain applications, solving a PDE may take hours even on modern supercomputers; using PINN models we may speed up the process significantly.
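A minimal sketch of the two kinds of update described above, assuming PyTorch (which the thesis builds on) and the Rademacher probe-vector estimate of the diagonal Hessian that AdaHessian relies on; the toy cost, step size, and variable names are illustrative only and are not the thesis code:

import torch

# Toy cost C(w) = 0.5 * w^T A w with very uneven curvature (100 vs. 1),
# so a single learning rate suits the two coordinates very differently.
A = torch.diag(torch.tensor([100.0, 1.0]))
w = torch.tensor([1.0, 1.0], requires_grad=True)
loss = 0.5 * w @ A @ w

# First-order information: the gradient, via autograd.
(grad,) = torch.autograd.grad(loss, w, create_graph=True)

# Plain gradient descent: step against the gradient with one global step size.
lr = 0.01
w_gd = (w - lr * grad).detach()

# Second-order information: estimate diag(H) with a Rademacher probe z,
# using E[z * (Hz)] = diag(H); Hz is a Hessian-vector product obtained by
# differentiating the gradient once more.
z = torch.randint_like(w, 2) * 2.0 - 1.0      # entries are +1 or -1
(hz,) = torch.autograd.grad(grad, w, grad_outputs=z)
diag_h = (z * hz).abs() + 1e-8

# Curvature-scaled step: divide each gradient coordinate by its estimated
# curvature, so steep and flat directions are treated more evenly.
w_curv = (w - lr * grad / diag_h).detach()

print("gradient step:   ", w_gd)
print("curvature-scaled:", w_curv)

On this toy problem the plain step moves the steep coordinate a hundred times further than the flat one, while the curvature-scaled step advances both at the same rate; AdaHessian adds momentum and averaging on top of this basic idea.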
author: Sjögren, Simon LU
supervisor:
organization: Mathematics (Faculty of Engineering); Centre for Mathematical Sciences
course: NUMM03 20211
year: 2021
type: H2 - Master's Degree (Two Years)
subject:
publication/series: Master's Theses in Mathematical Sciences
report number: LUNFNA-3036-2021
ISSN: 1404-6342
other publication id: 2021:E66
language: English
id: 9067752
date added to LUP: 2021-11-23 14:31:33
date last changed: 2021-11-23 14:31:33
@misc{9067752,
  abstract     = {{We introduce rigorous theory for deriving first- and second-order backpropagation methods for deep neural networks (DNNs) while remaining consistent with existing theory on DNN optimization. We begin by formally defining a neural network with its respective components and state the first- and second-order chain rules in terms of its partial derivatives. The partial derivatives in the chain rule formula are related by an operator, and this operator depends on how the feedforward process is defined. As a corollary, we note that the chain rule behaves independently of which neural network architecture is used; its operations depend solely on the feedforward structure of the neural network (NN). When changing from a fully-connected (FC) NN to a convolutional neural network (CNN), the backpropagation algorithm, via the chain rule, reuses the structure of the respective feedforward method and therefore applies to both CNN and FC networks. We compare first- and second-order optimization methods on the CIFAR-10 dataset: PyTorch's own implementation of backpropagation, which uses Autograd \cite{autograd} to compute gradients, against our derivation and implementation of the second-order method AdaHessian \cite{adah}, which shows equal if not slightly improved results. The code for this project can be found at \url{https://github.com/Simonws92/Code/tree/main/Master_thesis}.}},
  author       = {{Sjögren, Simon}},
  issn         = {{1404-6342}},
  language     = {{eng}},
  note         = {{Student Paper}},
  series       = {{Master's Theses in Mathematical Sciences}},
  title        = {{The derivation of first- and second-order backpropagation methods for fully-connected and convolutional neural networks}},
  year         = {{2021}},
}