
Lund University Publications


Applications of Diversity and the Self-Attention Mechanism in Neural Networks

Berg, Axel (2022) In Licentiate Thesis in Mathematical Sciences 2022(1).
Abstract
This thesis covers three contributions in applications of neural networks. The first is related to diversity and ensemble learning, while the other two cover novel applications of the self-attention mechanism. An important aspect of training a neural network is the choice of objective function. Regression via Classification (RvC) is often used to tackle problems in deep learning where the target variable is continuous, but standard regression objectives fail to capture the underlying distance metric of the domain. Using RvC can result in better performance of the trained model, but the optimal choice of discrete classes used in RvC is not well understood. In Paper 1, we introduce the concept of label diversity by generalizing the RvC method. By exploiting the fact that labels can be generated in arbitrary ways for continuous and ordinal target variables, we show that using multiple labels can improve the prediction accuracy of a neural network compared to using a single label, and provide theoretical justification from ensemble theory. We apply our method to several tasks in computer vision and show increased performance compared to regression and RvC baselines. The performance of a neural network is also influenced by the choice of network architecture, and in the design process it is important to consider the domain of the inputs and its symmetries. Graph neural networks (GNNs) are the family of networks that operate on graphs, where information is propagated between the graph nodes using, for example, self-attention. However, self-attention can be used for other data domains as well if the inputs can be converted into graphs, which is not always trivial. In Paper 2, we do this for audio by using a complete graph over audio features extracted from different time slots. We apply this technique to the task of keyword spotting and show that a neural network based solely on self-attention is more accurate than previously considered architectures. Finally, in Paper 3 we apply attention-based learning to point cloud processing, where the permutation symmetry must be preserved. In order to make the self-attention mechanism both more efficient and more expressive, we propose a hierarchical approach that allows individual points to interact on both a local and a global scale. Through extensive experiments on several benchmarks, we show that this approach improves the descriptiveness of the learned features, while simultaneously reducing the computational complexity compared to an architecture that applies self-attention naively to all input points.
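
As a concrete illustration of the label diversity idea from Paper 1, the sketch below discretizes a continuous target into several different class partitions, trains one classification head per partition, and averages the heads' predictions at inference time, in the spirit of an ensemble. This is a minimal PyTorch sketch written only from the description above; the LabelDiversityHead name, the bin-edge choices, and the expected-value decoding are illustrative assumptions, not the thesis implementation.

# Hypothetical sketch of label diversity for Regression via Classification (RvC),
# written from the abstract only; the class name, bin choices and decoding scheme
# are illustrative assumptions, not the thesis implementation.
import torch
import torch.nn as nn


class LabelDiversityHead(nn.Module):
    """One classification head per discretization (labelling) of a continuous target."""

    def __init__(self, feature_dim, bin_edges_list):
        super().__init__()
        # One sorted 1-D tensor of bin edges per labelling.
        self.bin_edges_list = [torch.as_tensor(e, dtype=torch.float32) for e in bin_edges_list]
        # len(edges) + 1 classes per labelling: below the first edge, between
        # consecutive edges, and above the last edge.
        self.heads = nn.ModuleList(
            nn.Linear(feature_dim, len(edges) + 1) for edges in self.bin_edges_list
        )

    def targets(self, y):
        # Map a continuous target y of shape [B] to one class index per labelling.
        return [torch.bucketize(y, edges) for edges in self.bin_edges_list]

    def loss(self, features, y):
        # Sum of cross-entropy terms, one per labelling (the "diverse" labels).
        ce = nn.CrossEntropyLoss()
        return sum(
            ce(head(features), target)
            for head, target in zip(self.heads, self.targets(y))
        )

    @torch.no_grad()
    def predict(self, features):
        # Ensemble-style decoding: expected bin centre under each head's class
        # distribution, averaged over all labellings.
        preds = []
        for head, edges in zip(self.heads, self.bin_edges_list):
            probs = head(features).softmax(dim=-1)              # [B, C]
            mids = (edges[:-1] + edges[1:]) / 2                 # interior bin centres
            centres = torch.cat([edges[:1], mids, edges[-1:]])  # [C]
            preds.append(probs @ centres)                       # [B]
        return torch.stack(preds).mean(dim=0)


# Toy usage: a target in [0, 100], three different discretizations of it.
features = torch.randn(4, 64)
y = torch.tensor([12.0, 45.5, 70.2, 99.0])
model = LabelDiversityHead(64, [torch.linspace(0, 100, n) for n in (11, 8, 14)])
print(model.loss(features, y).item(), model.predict(features))

Each choice of bin edges yields a different labelling of the same target, so the heads make partially independent errors; averaging their predictions is where the ensemble-style gain described in the abstract would come from.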
author: Berg, Axel
publishing date: 2022
type: Thesis
publication status: published
keywords: point cloud, self-attention, keyword spotting, deep learning, computer vision, machine learning, label diversity
in: Licentiate Thesis in Mathematical Sciences
volume: 2022
issue: 1
pages: 39
publisher: Lund University, Faculty of Science, Centre for Mathematical Sciences, Mathematics
ISSN: 1404-0034
ISBN: 978-91-8039-151-1, 978-91-8039-152-8
project: Deep Learning for Simultaneous Localization and Mapping
language: English
LU publication?: yes
id: 81b7e624-4031-41ad-b03f-3cca05002ec8
date added to LUP: 2022-04-06 15:20:08
date last changed: 2022-05-17 02:19:38
@misc{81b7e624-4031-41ad-b03f-3cca05002ec8,
  abstract     = {{This thesis covers three contributions in applications of neural networks. The first is related to diversity and ensemble learning, while the other two cover novel applications of the self-attention mechanism. An important aspect of training a neural network is the choice of objective function. Regression via Classification (RvC) is often used to tackle problems in deep learning where the target variable is continuous, but standard regression objectives fail to capture the underlying distance metric of the domain. Using RvC can result in better performance of the trained model, but the optimal choice of discrete classes used in RvC is not well understood. In Paper 1, we introduce the concept of label diversity by generalizing the RvC method. By exploiting the fact that labels can be generated in arbitrary ways for continuous and ordinal target variables, we show that using multiple labels can improve the prediction accuracy of a neural network compared to using a single label, and provide theoretical justification from ensemble theory. We apply our method to several tasks in computer vision and show increased performance compared to regression and RvC baselines. The performance of a neural network is also influenced by the choice of network architecture, and in the design process it is important to consider the domain of the inputs and its symmetries. Graph neural networks (GNNs) are the family of networks that operate on graphs, where information is propagated between the graph nodes using, for example, self-attention. However, self-attention can be used for other data domains as well if the inputs can be converted into graphs, which is not always trivial. In Paper 2, we do this for audio by using a complete graph over audio features extracted from different time slots. We apply this technique to the task of keyword spotting and show that a neural network based solely on self-attention is more accurate than previously considered architectures. Finally, in Paper 3 we apply attention-based learning to point cloud processing, where the permutation symmetry must be preserved. In order to make the self-attention mechanism both more efficient and more expressive, we propose a hierarchical approach that allows individual points to interact on both a local and a global scale. Through extensive experiments on several benchmarks, we show that this approach improves the descriptiveness of the learned features, while simultaneously reducing the computational complexity compared to an architecture that applies self-attention naively to all input points.}},
  author       = {{Berg, Axel}},
  isbn         = {{978-91-8039-151-1}},
  issn         = {{1404-0034}},
  keywords     = {{point cloud; self-attention; keyword spotting; deep learning; computer vision; machine learning; label diversity}},
  language     = {{eng}},
  note         = {{Licentiate Thesis}},
  number       = {{1}},
  publisher    = {{Lund University, Faculty of Science, Centre for Mathematical Sciences, Mathematics}},
  series       = {{Licentiate Thesis in Mathematical Sciences}},
  title        = {{Applications of Diversity and the Self-Attention Mechanism in Neural Networks}},
  url          = {{https://lup.lub.lu.se/search/files/116591878/kappa.pdf}},
  volume       = {{2022}},
  year         = {{2022}},
}