
Lund University Publications


Applications of Diversity and the Self-Attention Mechanism in Neural Networks

Berg, Axel (2022) In Licentiate Thesis in Mathematical Sciences 2022(1).
Abstract
This thesis covers three contributions in applications of neural networks. The first is related to diversity and ensemble learning, while the other two cover novel applications of the self-attention mechanism. An important aspect of training a neural network is the choice of objective function. Regression via Classification (RvC) is often used to tackle problems in deep learning where the target variable is continuous, but standard regression objectives fail to capture the underlying distance metric of the domain. Using RvC can result in better performance of the trained model, but the optimal choice of discrete classes used in RvC is not well understood. In Paper 1, we introduce the concept of label diversity by generalizing the RvC method. By exploiting the fact that labels can be generated in arbitrary ways for continuous and ordinal target variables, we show that using multiple labels can improve the prediction accuracy of a neural network compared to using a single label, and provide theoretical justification from ensemble theory. We apply our method to several tasks in computer vision and show increased performance compared to regression and RvC baselines. The performance of a neural network is also influenced by the choice of network architecture, and in the design process it is important to consider the domain of the inputs and its symmetries. Graph neural networks (GNNs) are the family of networks that operate on graphs, where information is propagated between the graph nodes using, for example, self-attention. However, self-attention can be used for other data domains as well if the inputs can be converted into graphs, which is not always trivial. In Paper 2, we do this for audio by using a complete graph over audio features extracted from different time slots. We apply this technique to the task of keyword spotting and show that a neural network based solely on self-attention is more accurate than previously considered architectures. Finally, in Paper 3 we apply attention-based learning to point cloud processing, where the permutation symmetry must be preserved. In order to make the self-attention mechanism both more efficient and more expressive, we propose a hierarchical approach that allows individual points to interact on both a local and a global scale. Through extensive experiments on several benchmarks, we show that this approach improves the descriptiveness of the learned features, while simultaneously reducing the computational complexity compared to an architecture that applies self-attention naively to all input points.
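
As a concrete illustration of the label diversity idea from Paper 1, the sketch below discretizes a continuous target into several different class partitions, trains one classification head per partition, and averages the heads' predictions at inference time, in the spirit of an ensemble. This is a minimal PyTorch sketch written only from the description above; the LabelDiversityHead name, the bin-edge choices, and the expected-value decoding are illustrative assumptions, not the thesis implementation.

# Hypothetical sketch of label diversity for Regression via Classification (RvC),
# written from the abstract only; the class name, bin choices and decoding scheme
# are illustrative assumptions, not the thesis implementation.
import torch
import torch.nn as nn


class LabelDiversityHead(nn.Module):
    """One classification head per discretization (labelling) of a continuous target."""

    def __init__(self, feature_dim, bin_edges_list):
        super().__init__()
        # One sorted 1-D tensor of bin edges per labelling.
        self.bin_edges_list = [torch.as_tensor(e, dtype=torch.float32) for e in bin_edges_list]
        # len(edges) + 1 classes per labelling: below the first edge, between
        # consecutive edges, and above the last edge.
        self.heads = nn.ModuleList(
            nn.Linear(feature_dim, len(edges) + 1) for edges in self.bin_edges_list
        )

    def targets(self, y):
        # Map a continuous target y of shape [B] to one class index per labelling.
        return [torch.bucketize(y, edges) for edges in self.bin_edges_list]

    def loss(self, features, y):
        # Sum of cross-entropy terms, one per labelling (the "diverse" labels).
        ce = nn.CrossEntropyLoss()
        return sum(
            ce(head(features), target)
            for head, target in zip(self.heads, self.targets(y))
        )

    @torch.no_grad()
    def predict(self, features):
        # Ensemble-style decoding: expected bin centre under each head's class
        # distribution, averaged over all labellings.
        preds = []
        for head, edges in zip(self.heads, self.bin_edges_list):
            probs = head(features).softmax(dim=-1)              # [B, C]
            mids = (edges[:-1] + edges[1:]) / 2                 # interior bin centres
            centres = torch.cat([edges[:1], mids, edges[-1:]])  # [C]
            preds.append(probs @ centres)                       # [B]
        return torch.stack(preds).mean(dim=0)


# Toy usage: a target in [0, 100], three different discretizations of it.
features = torch.randn(4, 64)
y = torch.tensor([12.0, 45.5, 70.2, 99.0])
model = LabelDiversityHead(64, [torch.linspace(0, 100, n) for n in (11, 8, 14)])
print(model.loss(features, y).item(), model.predict(features))

Each choice of bin edges yields a different labelling of the same target, so the heads make partially independent errors; averaging their predictions is where the ensemble-style gain described in the abstract would come from.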
author: Berg, Axel
publishing date: 2022
type: Thesis
publication status: published
keywords: point cloud, self-attention, keyword spotting, deep learning, computer vision, machine learning, label diversity
in: Licentiate Thesis in Mathematical Sciences
volume: 2022
issue: 1
pages: 39
publisher: Lund University, Faculty of Science, Centre for Mathematical Sciences, Mathematics
ISSN: 1404-0034
ISBN: 978-91-8039-151-1, 978-91-8039-152-8
project: Deep Learning for Simultaneous Localization and Mapping
language: English
LU publication?: yes
id: 81b7e624-4031-41ad-b03f-3cca05002ec8
date added to LUP: 2022-04-06 15:20:08
date last changed: 2022-05-17 02:19:38
@misc{81b7e624-4031-41ad-b03f-3cca05002ec8,
  abstract     = {{This thesis covers three contributions in applications of neural networks. The first is related to diversity and ensemble learning, while the other two cover novel applications of the self-attention mechanism. An important aspect of training a neural network is the choice of objective function. Regression via Classification (RvC) is often used to tackle problems in deep learning where the target variable is continuous, but standard regression objectives fail to capture the underlying distance metric of the domain. Using RvC can result in better performance of the trained model, but the optimal choice of discrete classes used in RvC is not well understood. In Paper 1, we introduce the concept of label diversity by generalizing the RvC method. By exploiting the fact that labels can be generated in arbitrary ways for continuous and ordinal target variables, we show that using multiple labels can improve the prediction accuracy of a neural network compared to using a single label, and provide theoretical justification from ensemble theory. We apply our method to several tasks in computer vision and show increased performance compared to regression and RvC baselines. The performance of a neural network is also influenced by the choice of network architecture, and in the design process it is important to consider the domain of the inputs and its symmetries. Graph neural networks (GNNs) are the family of networks that operate on graphs, where information is propagated between the graph nodes using, for example, self-attention. However, self-attention can be used for other data domains as well if the inputs can be converted into graphs, which is not always trivial. In Paper 2, we do this for audio by using a complete graph over audio features extracted from different time slots. We apply this technique to the task of keyword spotting and show that a neural network based solely on self-attention is more accurate than previously considered architectures. Finally, in Paper 3 we apply attention-based learning to point cloud processing, where the permutation symmetry must be preserved. In order to make the self-attention mechanism both more efficient and more expressive, we propose a hierarchical approach that allows individual points to interact on both a local and a global scale. Through extensive experiments on several benchmarks, we show that this approach improves the descriptiveness of the learned features, while simultaneously reducing the computational complexity compared to an architecture that applies self-attention naively to all input points.}},
  author       = {{Berg, Axel}},
  isbn         = {{978-91-8039-151-1}},
  issn         = {{1404-0034}},
  keywords     = {{point cloud; self-attention; keyword spotting; deep learning; computer vision; machine learning; label diversity}},
  language     = {{eng}},
  note         = {{Licentiate Thesis}},
  number       = {{1}},
  publisher    = {{Lund University, Faculty of Science, Centre for Mathematical Sciences, Mathematics}},
  series       = {{Licentiate Thesis in Mathematical Sciences}},
  title        = {{Applications of Diversity and the Self-Attention Mechanism in Neural Networks}},
  url          = {{https://lup.lub.lu.se/search/files/116591878/kappa.pdf}},
  volume       = {{2022}},
  year         = {{2022}},
}