The derivation of first- and second-order backpropagation methods for fully-connected and convolutional neural networks
(2021) In Master's Theses in Mathematical Sciences NUMM03 20211
Mathematics (Faculty of Engineering)
Centre for Mathematical Sciences
 Abstract
 We introduce a rigorous theory for deriving first- and second-order backpropagation methods for deep neural networks (DNNs) that is consistent with existing theory in DNN optimization. We begin by formally defining a neural network and its components, and state the first- and second-order chain rules with respect to its partial derivatives. The partial derivatives in the chain rule formula are related by an operator, and this operator depends on how the feedforward process is defined. As a corollary, the chain rule behaves independently of the neural network (NN) architecture used; its operations depend solely on the feedforward structure of the network. When changing from a fully-connected (FC) NN to a convolutional neural network (CNN), the backpropagation algorithm, with the chain rule, uses the same structure as the respective feedforward method and can therefore be used for both CNNs and FC networks. We compare first- and second-order optimization methods on the CIFAR-10 dataset. The baseline is PyTorch's own implementation of backpropagation, which uses Autograd \cite{autograd} to compute its gradients; the alternative is our implementation and derivation of the second-order method AdaHessian \cite{adah}, which shows equal if not slightly improved results. The code for this project can be found at \url{https://github.com/Simonws92/Code/tree/main/Master_thesis}.
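As a rough illustration of the kind of second-order information AdaHessian relies on, the sketch below estimates the diagonal of a Hessian with Hutchinson's method, averaging z * (Hz) over random sign vectors z. This is not the thesis implementation: the toy quadratic cost, the matrix `A`, and the helper names `hutchinson_diagonal` and `hessian_vector_product` are assumptions made for this example, and a real network would obtain the Hessian-vector product from backpropagation rather than from an explicit matrix.

```python
import random


def hutchinson_diagonal(hvp, dim, num_samples=1000, seed=0):
    """Estimate diag(H) as the average of z * (H z) over Rademacher vectors z.

    `hvp` is any function returning the Hessian-vector product H v.
    """
    rng = random.Random(seed)
    estimate = [0.0] * dim
    for _ in range(num_samples):
        # Rademacher vector: each entry is -1 or +1 with equal probability.
        z = [rng.choice((-1.0, 1.0)) for _ in range(dim)]
        hz = hvp(z)
        for i in range(dim):
            estimate[i] += z[i] * hz[i] / num_samples
    return estimate


# Hypothetical toy cost f(w) = 0.5 * w^T A w, whose Hessian is the constant matrix A.
A = [
    [4.0, 1.0, 0.0],
    [1.0, 3.0, 0.5],
    [0.0, 0.5, 2.0],
]


def hessian_vector_product(v):
    """Exact H v for the toy cost; stands in for a backpropagated product."""
    return [sum(A[i][j] * v[j] for j in range(len(v))) for i in range(len(A))]


diag_estimate = hutchinson_diagonal(hessian_vector_product, dim=3, num_samples=20000)
print(diag_estimate)  # approximately the true diagonal [4.0, 3.0, 2.0]
```

The estimator only needs Hessian-vector products, so the full Hessian never has to be formed; the off-diagonal contributions average out as the number of samples grows.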
 Popular Abstract
 Deep learning has taken over many decision-making roles; it continues to be central in existing systems and will no doubt become more relevant in the future. Most such deep learning models are trained with a machine learning technique called supervised learning, which teaches the model to correctly label its inputs. The model first makes a guess about what some input is, and if the guess is wrong, the supervised method tells the model to make adjustments. These adjustments are based on the error between the guess and the correct label: the higher the error, the bigger the adjustments. The function that measures the error is called the cost function, and to calculate the adjustments to the model we use a method called gradient descent. This method uses the gradient, a type of derivative of the cost function, to measure the change we need to make. The gradient of the cost function tells us in which direction the cost increases fastest, so to update our deep learning model we move in the negative direction. However, such first-order methods tell us nothing about the function beyond the point at which the gradient is evaluated. Second-order methods give us more information about the cost function itself, so we develop methods with the help of calculus to determine this function's properties. Some of these properties are valuable for the learning process and can help guide the model more efficiently. We can combine them with existing algorithms for faster convergence when minimizing the cost function. Higher-order derivatives are also necessary when solving partial differential equations (PDEs) with physics-informed neural networks (PINNs). In this case, the derivatives act as a constraint on the learning process in the supervised learning algorithm.
Physics-informed neural networks are deep learning models that give us a solution to a PDE based on the higher-order derivatives and the initial and boundary conditions that we supply to the network. The idea is that these PINN models give us the solution almost immediately, instead of our having to solve the PDE anew every time we make minor changes to the initial and boundary conditions. For certain applications, solving a PDE may take hours even on modern supercomputers. Using PINN models, we may speed up the process significantly.
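The difference between first- and second-order updates described above can be sketched on a one-dimensional cost function. The example is illustrative only: the cost 2(w - 3)^2, the learning rate, and the function names are assumptions chosen for this sketch, not taken from the thesis.

```python
def cost(w):
    """Hypothetical cost function with its minimum at w = 3."""
    return 2.0 * (w - 3.0) ** 2


def gradient(w):
    """First derivative of the cost."""
    return 4.0 * (w - 3.0)


def curvature(w):
    """Second derivative of the cost (constant for a quadratic)."""
    return 4.0


def gradient_descent(w, learning_rate=0.1, steps=10):
    """First-order update: move against the gradient with a fixed step size."""
    for _ in range(steps):
        w -= learning_rate * gradient(w)
    return w


def newton_step(w):
    """Second-order update: scale the gradient by the inverse curvature."""
    return w - gradient(w) / curvature(w)


w_first_order = gradient_descent(0.0)
w_second_order = newton_step(0.0)
print(w_first_order, w_second_order)
```

On this quadratic cost a single curvature-scaled step lands on the minimum, while ten fixed-size gradient steps are still approaching it. Real cost functions are not quadratic, so this gap is not representative of what to expect in practice.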
Please use this url to cite or link to this publication:
http://lup.lub.lu.se/studentpapers/record/9067752
 author
 Sjögren, Simon ^{LU}
 supervisor

 Tony Stillfjord ^{LU}
 organization
 course
 NUMM03 20211
 year
 2021
 type
 H2 - Master's Degree (Two Years)
 subject
 publication/series
 Master's Theses in Mathematical Sciences
 report number
 LUNFNA-3036-2021
 ISSN
 1404-6342
 other publication id
 2021:E66
 language
 English
 id
 9067752
 date added to LUP
 2021-11-23 14:31:33
 date last changed
 2021-11-23 14:31:33
@misc{9067752,
  abstract = {{We introduce a rigorous theory for deriving first- and second-order backpropagation methods for deep neural networks (DNNs) that is consistent with existing theory in DNN optimization. We begin by formally defining a neural network and its components, and state the first- and second-order chain rules with respect to its partial derivatives. The partial derivatives in the chain rule formula are related by an operator, and this operator depends on how the feedforward process is defined. As a corollary, the chain rule behaves independently of the neural network (NN) architecture used; its operations depend solely on the feedforward structure of the network. When changing from a fully-connected (FC) NN to a convolutional neural network (CNN), the backpropagation algorithm, with the chain rule, uses the same structure as the respective feedforward method and can therefore be used for both CNNs and FC networks. We compare first- and second-order optimization methods on the CIFAR-10 dataset. The baseline is PyTorch's own implementation of backpropagation, which uses Autograd \cite{autograd} to compute its gradients; the alternative is our implementation and derivation of the second-order method AdaHessian \cite{adah}, which shows equal if not slightly improved results. The code for this project can be found at \url{https://github.com/Simonws92/Code/tree/main/Master_thesis}.}},
  author = {{Sjögren, Simon}},
  issn = {{1404-6342}},
  language = {{eng}},
  note = {{Student Paper}},
  series = {{Master's Theses in Mathematical Sciences}},
  title = {{The derivation of first- and second-order backpropagation methods for fully-connected and convolutional neural networks}},
  year = {{2021}},
}