
Lund University Publications


Adaptive noise-augmented attention for enhancing Transformer fine-tuning on longitudinal medical data

Amirahmadi, Ali; Etminani, Farzaneh and Ohlsson, Mattias (2025) In Frontiers in Artificial Intelligence 8.
Abstract

Transformer models pre-trained on self-supervised tasks and fine-tuned on downstream objectives have achieved remarkable results across a variety of domains. However, fine-tuning these models for clinical predictions from longitudinal medical data, such as electronic health records (EHR), remains challenging due to limited labeled data and the complex, event-driven nature of medical sequences. While self-attention mechanisms are powerful for capturing relationships within sequences, they may underperform when modeling subtle dependencies between sparse clinical events under limited supervision. We introduce a simple yet effective fine-tuning technique, Adaptive Noise-Augmented Attention (ANAA), which injects adaptive noise directly into the self-attention weights and applies a 2D Gaussian kernel to smooth the resulting attention maps. This mechanism broadens the attention distribution across tokens while refining it to emphasize more informative events. Unlike prior approaches that require expensive modifications to the architecture and pre-training phase, ANAA operates entirely during fine-tuning. Empirical results across multiple clinical prediction tasks demonstrate consistent performance improvements. Furthermore, we analyze how ANAA shapes the learned attention behavior, offering interpretable insights into the model's handling of temporal dependencies in EHR data.
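As a rough illustration of the mechanism summarized in the abstract, the sketch below shows what noise-injected, Gaussian-smoothed self-attention could look like in PyTorch. It is a minimal, single-head example based only on the abstract's description: the function names, the fixed noise_scale, kernel_size, and sigma parameters are placeholders and assumptions, not the authors' ANAA implementation (in particular, the "adaptive" noise in ANAA is presumably learned or scheduled rather than the fixed scale used here).

# Minimal sketch, assuming a PyTorch setting; not the authors' implementation.
import math
import torch
import torch.nn.functional as F


def gaussian_kernel2d(kernel_size: int = 5, sigma: float = 1.0) -> torch.Tensor:
    # Build a normalized 2D Gaussian kernel of shape (1, 1, k, k).
    coords = torch.arange(kernel_size) - (kernel_size - 1) / 2.0
    g = torch.exp(-(coords ** 2) / (2 * sigma ** 2))
    kernel = torch.outer(g, g)
    kernel = kernel / kernel.sum()
    return kernel.view(1, 1, kernel_size, kernel_size)


def noise_augmented_attention(q, k, v, noise_scale=0.1, kernel_size=5, sigma=1.0):
    # Single-head attention whose weights are perturbed with noise and then
    # smoothed with a 2D Gaussian kernel before being applied to the values.
    # q, k, v: (batch, seq_len, d_model) tensors.
    d = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d)   # (B, L, L)
    attn = F.softmax(scores, dim=-1)

    if torch.is_grad_enabled():
        # Inject noise into the attention weights (fixed scale here; ANAA's
        # noise is described as adaptive, which this sketch does not model).
        attn = attn + noise_scale * torch.randn_like(attn)

    # Smooth the attention map with a 2D Gaussian kernel, treating it as an image.
    kernel = gaussian_kernel2d(kernel_size, sigma).to(attn.device, attn.dtype)
    attn = F.conv2d(attn.unsqueeze(1), kernel, padding=kernel_size // 2).squeeze(1)

    # Re-normalize so each row is again a valid distribution over key positions.
    attn = attn.clamp(min=0)
    attn = attn / attn.sum(dim=-1, keepdim=True).clamp(min=1e-9)
    return attn @ v                                   # (B, L, d_model)


if __name__ == "__main__":
    x = torch.randn(2, 16, 32)
    out = noise_augmented_attention(x, x, x)
    print(out.shape)  # torch.Size([2, 16, 32])

The noise broadens the attention distribution during fine-tuning, and the Gaussian smoothing spreads weight to neighboring positions in the sequence, which is one plausible reading of how ANAA "broadens the attention distribution across tokens while refining it."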

author
Amirahmadi, Ali; Etminani, Farzaneh and Ohlsson, Mattias
organization
publishing date
2025
type
Contribution to journal
publication status
published
subject
keywords
adaptive noise, augmentation, electronic health records (EHR), fine-tuning, medical data, representation learning, self-attention, Transformer
in
Frontiers in Artificial Intelligence
volume
8
article number
1663484
publisher
Frontiers Media S. A.
external identifiers
  • scopus:105018335736
ISSN
2624-8212
DOI
10.3389/frai.2025.1663484
language
English
LU publication?
yes
id
6ce119d0-702e-4aaf-837b-bb1f820038bb
date added to LUP
2026-01-08 15:38:42
date last changed
2026-01-08 15:39:40
@article{6ce119d0-702e-4aaf-837b-bb1f820038bb,
  abstract     = {{Transformer models pre-trained on self-supervised tasks and fine-tuned on downstream objectives have achieved remarkable results across a variety of domains. However, fine-tuning these models for clinical predictions from longitudinal medical data, such as electronic health records (EHR), remains challenging due to limited labeled data and the complex, event-driven nature of medical sequences. While self-attention mechanisms are powerful for capturing relationships within sequences, they may underperform when modeling subtle dependencies between sparse clinical events under limited supervision. We introduce a simple yet effective fine-tuning technique, Adaptive Noise-Augmented Attention (ANAA), which injects adaptive noise directly into the self-attention weights and applies a 2D Gaussian kernel to smooth the resulting attention maps. This mechanism broadens the attention distribution across tokens while refining it to emphasize more informative events. Unlike prior approaches that require expensive modifications to the architecture and pre-training phase, ANAA operates entirely during fine-tuning. Empirical results across multiple clinical prediction tasks demonstrate consistent performance improvements. Furthermore, we analyze how ANAA shapes the learned attention behavior, offering interpretable insights into the model's handling of temporal dependencies in EHR data.}},
  author       = {{Amirahmadi, Ali and Etminani, Farzaneh and Ohlsson, Mattias}},
  issn         = {{2624-8212}},
  keywords     = {{adaptive noise; augmentation; electronic health records (EHR); fine-tuning; medical data; representation learning; self-attention; Transformer}},
  language     = {{eng}},
  publisher    = {{Frontiers Media S. A.}},
  series       = {{Frontiers in Artificial Intelligence}},
  title        = {{Adaptive noise-augmented attention for enhancing Transformer fine-tuning on longitudinal medical data}},
  url          = {{http://dx.doi.org/10.3389/frai.2025.1663484}},
  doi          = {{10.3389/frai.2025.1663484}},
  volume       = {{8}},
  year         = {{2025}},
}