Lund University Publications

Evaluate Transformer model and Self-Attention mechanism in the Yangtze River basin runoff prediction

Wei, Xikun; Wang, Guojie; Schmalz, Britta; Hagan, Daniel Fiifi Tawia; and Duan, Zheng (2023). In Journal of Hydrology: Regional Studies 47, 101438.
Abstract

Study region: The Yangtze River basin, China.

Study focus: We applied a recently popular deep learning (DL) algorithm, the Transformer (TSF), and two commonly used DL methods, Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU), to evaluate how well TSF predicts runoff in the Yangtze River basin. We also added the core component of TSF, the Self-Attention (SA) mechanism, to the LSTM and GRU models (referred to as LSTM-SA and GRU-SA) to investigate whether including SA improves prediction capability. Seven observed climate variables (mean temperature, maximum temperature, precipitation, etc.) serve as the model inputs. The whole dataset was divided into training, validation and test sets. In addition, we investigated the relationship between model performance and the length of the input time window.

New hydrological insights for the region: Our experimental results show that GRU performs best with the fewest parameters, while TSF performs worst owing to insufficient training data. The GRU and LSTM models outperform TSF for runoff prediction when training samples are limited (e.g., when the number of model parameters is roughly ten times the number of samples). Furthermore, the SA mechanism improves prediction accuracy when added to the LSTM and GRU structures. Different input time steps (5, 10, 15, 20, 25 and 30 days) were used to train the DL models with different prediction lengths, showing that an appropriate input time step can significantly improve model performance.
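For illustration, the GRU-SA idea described in the abstract can be sketched as a recurrent encoder whose hidden states are passed through a self-attention layer before a prediction head. The snippet below is a minimal sketch in PyTorch, not the authors' implementation; the layer sizes, the exact attention wiring, and names such as GRUSelfAttention are illustrative assumptions.

import torch
import torch.nn as nn

class GRUSelfAttention(nn.Module):
    """Hypothetical GRU-SA sketch: GRU encoder + self-attention + linear head."""
    def __init__(self, n_features=7, hidden=64, heads=4, horizon=5):
        super().__init__()
        self.gru = nn.GRU(n_features, hidden, batch_first=True)
        # Self-attention over the GRU hidden states (queries = keys = values);
        # the exact placement of SA in the paper's models is an assumption here.
        self.attn = nn.MultiheadAttention(hidden, heads, batch_first=True)
        self.head = nn.Linear(hidden, horizon)

    def forward(self, x):              # x: (batch, input_time_steps, n_features)
        h, _ = self.gru(x)             # hidden states for every input time step
        a, _ = self.attn(h, h, h)      # attention-weighted context at each step
        return self.head(a[:, -1])     # runoff forecast for `horizon` days

# Usage: a 30-day window of 7 climate variables -> a 5-day runoff forecast.
model = GRUSelfAttention(n_features=7, horizon=5)
x = torch.randn(8, 30, 7)             # batch of 8 input sequences
print(model(x).shape)                 # torch.Size([8, 5])

The input window length (30 here) corresponds to the "input time steps" the study varies between 5 and 30 days; the horizon corresponds to the prediction length.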

author
Wei, Xikun; Wang, Guojie; Schmalz, Britta; Hagan, Daniel Fiifi Tawia; and Duan, Zheng
organization
publishing date
2023
type
Contribution to journal
publication status
published
subject
keywords
GRU, LSTM, Runoff prediction, Self-Attention, Transformer
in
Journal of Hydrology: Regional Studies
volume
47
article number
101438
publisher
Elsevier
external identifiers
  • scopus:85163278876
ISSN
2214-5818
DOI
10.1016/j.ejrh.2023.101438
language
English
LU publication?
yes
id
23b675e2-9e78-44dd-9bc7-8475ce70951f
date added to LUP
2023-09-18 13:12:42
date last changed
2023-09-18 13:12:42
@article{23b675e2-9e78-44dd-9bc7-8475ce70951f,
  abstract     = {{Study region: In the Yangtze River basin of China. Study focus: We applied a recently popular deep learning (DL) algorithm, Transformer (TSF), and two commonly used DL methods, Long-Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU), to evaluate the performance of TSF in predicting runoff in the Yangtze River basin. We also add the main structure of TSF, Self-Attention (SA), to the LSTM and GRU models, namely LSTM-SA and GRU-SA, to investigate whether the inclusion of the SA mechanism can improve the prediction capability. Seven climatic observations (mean temperature, maximum temperature, precipitation, etc.) are the input data in our study. The whole dataset was divided into training, validation and test datasets. In addition, we investigated the relationship between model performance and input time steps. New hydrological insights for the region: Our experimental results show that the GRU has the best performance with the fewest parameters while the TSF has the worst performance due to the lack of sufficient data. GRU and the LSTM models are better than TSF for runoff prediction when the training samples are limited (such as the model parameters being ten times larger than the samples). Furthermore, the SA mechanism improves the prediction accuracy when added to the LSTM and the GRU structures. Different input time steps (5 d, 10 d, 15 d, 20 d, 25 d and 30 d) are used to train the DL models with different prediction lengths to understand their relationship with model performance, showing that an appropriate input time step can significantly improve the model performance.}},
  author       = {{Wei, Xikun and Wang, Guojie and Schmalz, Britta and Hagan, Daniel Fiifi Tawia and Duan, Zheng}},
  issn         = {{2214-5818}},
  keywords     = {{GRU; LSTM; Runoff prediction; Self-Attention; Transformer}},
  language     = {{eng}},
  publisher    = {{Elsevier}},
  series       = {{Journal of Hydrology: Regional Studies}},
  title        = {{Evaluate Transformer model and Self-Attention mechanism in the Yangtze River basin runoff prediction}},
  url          = {{http://dx.doi.org/10.1016/j.ejrh.2023.101438}},
  doi          = {{10.1016/j.ejrh.2023.101438}},
  volume       = {{47}},
  year         = {{2023}},
}