Points to patches: Enabling the use of self-attention for 3D shape recognition

Berg, Axel; Oskarsson, Magnus; O'Connor, Mark

Berg, Axel (LU); Oskarsson, Magnus (LU) and O'Connor, Mark (2022) 26th International Conference on Pattern Recognition, 2022. In International Conference on Pattern Recognition, p. 528-534

Abstract: While the Transformer architecture has become ubiquitous in the machine learning field, its adaptation to 3D shape recognition is non-trivial. Due to its quadratic computational complexity, the self-attention operator quickly becomes inefficient as the set of input points grows larger. Furthermore, we find that the attention mechanism struggles to find useful connections between individual points on a global scale. In order to alleviate these problems, we propose a two-stage Point Transformer-in-Transformer (Point-TnT) approach which combines local and global attention mechanisms, enabling both individual points and patches of points to attend to each other effectively. Experiments on shape classification show that such an approach provides more useful features for downstream tasks than the baseline Transformer, while also being more computationally efficient. In addition, we also extend our method to feature matching for scene reconstruction, showing that it can be used in conjunction with existing scene reconstruction pipelines.
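The efficiency argument in the abstract can be made concrete with a rough count of pairwise attention scores. The sketch below is illustrative only and is not taken from the paper: the function names, the assumption that local attention runs inside each patch and global attention runs between patches, and the example sizes (1024 points, 64 patches of 16 points) are all assumptions chosen for the arithmetic.

```python
def full_attention_pairs(num_points: int) -> int:
    """Pairwise scores when every point attends to every other point."""
    return num_points * num_points


def two_stage_attention_pairs(num_patches: int, points_per_patch: int) -> int:
    """Pairwise scores under an assumed two-stage scheme.

    Local stage: points attend only to points in the same patch.
    Global stage: one token per patch attends to all other patch tokens.
    """
    local = num_patches * points_per_patch ** 2
    global_ = num_patches ** 2
    return local + global_


# Hypothetical sizes, not values reported in the paper.
full = full_attention_pairs(1024)
staged = two_stage_attention_pairs(num_patches=64, points_per_patch=16)
print(full, staged)
```

Under these assumed sizes the full scheme computes 1,048,576 scores per attention head and layer, while the two-stage scheme computes 20,480, which is the kind of gap the abstract alludes to when it calls global point-to-point attention inefficient.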

Please use this url to cite or link to this publication: https://lup.lub.lu.se/record/981ccdb2-f80d-4ce4-b744-690fe33f6107

author

Berg, Axel (LU); Oskarsson, Magnus (LU) and O'Connor, Mark

organization

publishing date

2022-08-21

type

Chapter in Book/Report/Conference proceeding

publication status

published

subject

Computer graphics and computer vision

host publication

2022 26th International Conference on Pattern Recognition (ICPR)

series title

International Conference on Pattern Recognition

pages

7 pages

publisher

IEEE - Institute of Electrical and Electronics Engineers Inc.

conference name

26th International Conference on Pattern Recognition, 2022

conference location

Montreal, Canada

conference dates

2022-08-21 - 2022-08-25

external identifiers

scopus:85143583530

ISSN

2831-7475

1051-4651

ISBN

978-1-6654-9062-7

DOI

10.48550/arXiv.2204.03957

project

Deep Learning for Simultaneous Localization and Mapping

language

English

LU publication?

yes

id

981ccdb2-f80d-4ce4-b744-690fe33f6107

date added to LUP

2022-12-16 08:32:51

date last changed

2025-12-14 01:05:40

@inproceedings{981ccdb2-f80d-4ce4-b744-690fe33f6107,
  abstract     = {{While the Transformer architecture has become ubiquitous in the machine learning field, its adaptation to 3D shape recognition is non-trivial. Due to its quadratic computational complexity, the self-attention operator quickly becomes inefficient as the set of input points grows larger. Furthermore, we find that the attention mechanism struggles to find useful connections between individual points on a global scale. In order to alleviate these problems, we propose a two-stage Point Transformer-in-Transformer (Point-TnT) approach which combines local and global attention mechanisms, enabling both individual points and patches of points to attend to each other effectively. Experiments on shape classification show that such an approach provides more useful features for downstream tasks than the baseline Transformer, while also being more computationally efficient. In addition, we also extend our method to feature matching for scene reconstruction, showing that it can be used in conjunction with existing scene reconstruction pipelines.}},
  author       = {{Berg, Axel and Oskarsson, Magnus and O'Connor, Mark}},
  booktitle    = {{2022 26th International Conference on Pattern Recognition (ICPR)}},
  isbn         = {{978-1-6654-9062-7}},
  issn         = {{2831-7475}},
  language     = {{eng}},
  month        = {{08}},
  pages        = {{528--534}},
  publisher    = {{IEEE - Institute of Electrical and Electronics Engineers Inc.}},
  series       = {{International Conference on Pattern Recognition}},
  title        = {{Points to patches: Enabling the use of self-attention for 3D shape recognition}},
  url          = {{http://dx.doi.org/10.48550/arXiv.2204.03957}},
  doi          = {{10.48550/arXiv.2204.03957}},
  year         = {{2022}},
}

Lund University Publications
