# LUP Student Papers

## LUND UNIVERSITY LIBRARIES

### The derivation of first- and second-order backpropagation methods for fully-connected and convolutional neural networks

(2021) In Master's Theses in Mathematical Sciences NUMM03 20211
Mathematics (Faculty of Engineering)
Centre for Mathematical Sciences
#### Abstract
We introduce rigorous theory for deriving first- and second-order backpropagation methods for Deep Neural Networks (DNNs) while remaining consistent with existing theory on DNN optimization. We begin by formally defining a neural network and its components, and state the first- and second-order chain rules with respect to its partial derivatives. The partial derivatives in the chain-rule formula are related by an operator, and this operator depends on how the feedforward process is defined. As a corollary, we observe that the chain rule behaves independently of the particular neural network architecture: its operations depend solely on the feedforward structure of the neural network (NN). When changing from a fully-connected (FC) NN to a convolutional neural network (CNN), the backpropagation algorithm, via the chain rule, mirrors the structure of the respective feedforward method and therefore applies to both CNNs and FC networks. We compare first- and second-order optimization methods on the CIFAR-10 dataset, using PyTorch's own implementation of backpropagation, which relies on Autograd \cite{autograd} to compute its gradients, against our implementation and derivation of the second-order method AdaHessian \cite{adah}, which achieves equal or slightly better results. The code for this project can be found at \url{https://github.com/Simonws92/Code/tree/main/Master_thesis}.
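The abstract's central observation, that backpropagation applies the chain rule layer by layer and mirrors the structure of the feedforward pass, can be illustrated with a minimal sketch. This is not the thesis code: the two-layer fully-connected network, sigmoid activation, and squared-error loss below are arbitrary choices for the example, and the gradient is checked against a finite difference.

```python
import numpy as np

# Illustrative sketch (not the thesis implementation): first-order
# backpropagation for a tiny FC network y = W2 @ sigma(W1 @ x) with
# squared-error loss L = 0.5 * ||y - t||^2.
rng = np.random.default_rng(0)
x = rng.standard_normal(3)          # input
t = rng.standard_normal(2)          # target
W1 = rng.standard_normal((4, 3))    # first-layer weights
W2 = rng.standard_normal((2, 4))    # second-layer weights

def sigma(z):
    return 1.0 / (1.0 + np.exp(-z))

# Feedforward pass
z1 = W1 @ x
a1 = sigma(z1)
y = W2 @ a1
loss = 0.5 * np.sum((y - t) ** 2)

# Backpropagation: the chain rule applied in reverse, one factor per
# feedforward step, so the backward pass mirrors the forward structure.
dy = y - t                      # dL/dy
dW2 = np.outer(dy, a1)          # dL/dW2
da1 = W2.T @ dy                 # chain rule through W2
dz1 = da1 * a1 * (1.0 - a1)     # chain rule through sigma
dW1 = np.outer(dz1, x)          # dL/dW1

# Finite-difference check of one entry of dW1
eps = 1e-6
W1p = W1.copy()
W1p[0, 0] += eps
loss_p = 0.5 * np.sum((W2 @ sigma(W1p @ x) - t) ** 2)
assert abs((loss_p - loss) / eps - dW1[0, 0]) < 1e-4
```

The same reverse traversal applies to a CNN; only the operator relating the partial derivatives (a convolution instead of a matrix product) changes, which is the structural point the abstract makes.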
#### Popular Abstract

Deep learning has taken over many roles in decision making; it remains central to existing systems and will no doubt only grow in relevance. Most such deep learning models use a form of machine learning called supervised learning. This technique tells the deep learning model how to correctly label a given set of inputs. The model first makes a guess about what some input is, and if the guess is wrong, the supervised method tells the model to make adjustments. These adjustments are based on the error between the guess and the correct label: the higher the error, the bigger the adjustments. The function that measures the error is called the cost function, and to calculate these adjustments of the model...
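The adjustment rule described above, where a larger error produces a larger correction, is in essence gradient descent on the cost function. A minimal sketch (not tied to the thesis code) with a hypothetical one-parameter quadratic cost:

```python
# Gradient descent on a toy cost C(w) = (w - 3)^2, whose minimum is w = 3.
# The step is proportional to the gradient, which grows with the error,
# so larger errors produce larger adjustments.
def cost(w):
    return (w - 3.0) ** 2

def grad(w):
    return 2.0 * (w - 3.0)

w = 0.0          # initial guess
lr = 0.1         # learning rate
for _ in range(100):
    w -= lr * grad(w)   # big error -> big gradient -> big step

assert abs(w - 3.0) < 1e-3
```

First- and second-order methods differ in how this step is scaled: a second-order method such as AdaHessian additionally uses curvature information to size each adjustment.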
- author: Sjögren, Simon
- supervisor:
- organization: Mathematics (Faculty of Engineering), Centre for Mathematical Sciences
- course: NUMM03 20211
- year: 2021
- type: H2 - Master's Degree (Two Years)
- subject:
- publication/series: Master's Theses in Mathematical Sciences
- report number: LUNFNA-3036-2021
- ISSN: 1404-6342
- other publication id: 2021:E66
- language: English
- id: 9067752
- date added to LUP: 2021-11-23 14:31:33
- date last changed: 2021-11-23 14:31:33
```bibtex
@misc{9067752,
  abstract     = {{We introduce rigorous theory for deriving first- and second-order backpropagation methods for Deep Neural Networks (DNNs) while remaining consistent with existing theory on DNN optimization. We begin by formally defining a neural network and its components, and state the first- and second-order chain rules with respect to its partial derivatives. The partial derivatives in the chain-rule formula are related by an operator, and this operator depends on how the feedforward process is defined. As a corollary, we observe that the chain rule behaves independently of the particular neural network architecture: its operations depend solely on the feedforward structure of the neural network (NN). When changing from a fully-connected (FC) NN to a convolutional neural network (CNN), the backpropagation algorithm, via the chain rule, mirrors the structure of the respective feedforward method and therefore applies to both CNNs and FC networks. We compare first- and second-order optimization methods on the CIFAR-10 dataset, using PyTorch's own implementation of backpropagation, which relies on Autograd \cite{autograd} to compute its gradients, against our implementation and derivation of the second-order method AdaHessian \cite{adah}, which achieves equal or slightly better results. The code for this project can be found at \url{https://github.com/Simonws92/Code/tree/main/Master_thesis}.}},
  author       = {{Sjögren, Simon}},
  issn         = {{1404-6342}},
  language     = {{eng}},
  note         = {{Student Paper}},
  series       = {{Master's Theses in Mathematical Sciences}},
  title        = {{The derivation of first- and second-order backpropagation methods for fully-connected and convolutional neural networks}},
  year         = {{2021}},
}
```