Imputation of missing values in a precipitation-runoff process database

Kalteh, Aman Mohammad; Hjorth, Peder

Kalteh, Aman Mohammad and Hjorth, Peder (2009) In Hydrology Research 40(4). p.420-432

Abstract: Hydrologists are often faced with the problem of missing values in a precipitation-runoff process database to construct runoff prediction models. They tend to use simple and naive methods to deal with the problem of missing data. Thus far, the common practice has been to discard observations with missing values. In this paper, we present some statistically principled methods for gap filling and discuss the pros and cons of these methods. We employ and discuss imputations of missing values by means of self-organizing map (SOM), multilayer perceptron (MLP), multivariate nearest-neighbor (MNN), regularized expectation-maximization algorithm (REGEM) and multiple imputation (MI) in the context of a precipitation-runoff process database in northern Iran in order to construct a serially complete database for analyses such as runoff prediction. In our case, the SOM and MNN tend to give similar and robust results. REGEM and MI build on the assumption of multivariate normal data, which we don't seem to have in one of our cases. MLP tends to produce inferior results because it fragments the data into 68 different models. Therefore, we conclude that it makes most sense to use either the computationally simple MNN method or the more demanding SOM.
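
As a rough illustration of the nearest-neighbor idea the abstract recommends, the following is a minimal sketch and not the authors' MNN procedure, whose details are not given in this record. It assumes Euclidean distance on standardized columns and a single donor row per gap; the function name `nearest_neighbor_impute` and the toy precipitation and runoff values are hypothetical.

```python
import numpy as np


def nearest_neighbor_impute(data):
    """Fill NaNs in each row from the closest fully observed row.

    Generic single-donor sketch (an assumption, not the paper's method).
    """
    data = np.asarray(data, dtype=float)
    filled = data.copy()

    # Donor pool: rows with no missing values at all.
    complete = data[~np.isnan(data).any(axis=1)]
    if complete.size == 0:
        raise ValueError("need at least one complete row to act as a donor")

    # Scale columns so that no single variable dominates the distance.
    scale = complete.std(axis=0)
    scale[scale == 0] = 1.0

    for i, row in enumerate(data):
        missing = np.isnan(row)
        observed = ~missing
        if not missing.any() or not observed.any():
            # Nothing to fill, or nothing to measure distance with.
            continue
        # Distance to every donor, using only the columns this row has.
        diffs = (complete[:, observed] - row[observed]) / scale[observed]
        donor = complete[np.argmin((diffs ** 2).sum(axis=1))]
        filled[i, missing] = donor[missing]
    return filled


# Hypothetical toy records: precipitation (mm), runoff (m^3/s).
records = np.array([
    [10.0, 2.0],
    [30.0, 6.5],
    [12.0, np.nan],
    [np.nan, 6.0],
])
completed = nearest_neighbor_impute(records)
```

Rows that are entirely missing are left untouched here, since there is nothing to match on; a real gap-filling workflow would need a separate rule for them.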

Please use this url to cite or link to this publication: https://lup.lub.lu.se/record/1463185

author

Kalteh, Aman Mohammad and Hjorth, Peder

organization

Division of Water Resources Engineering

publishing date

2009

type

Contribution to journal

publication status

published

subject

Water Engineering

keywords

missing values, imputation methods: SOM, MLP, MNN, REGEM, MI, data fill in, serially complete data

in

Hydrology Research

volume

40

issue

4

pages

420 - 432

publisher

IWA Publishing

external identifiers

wos:000267568500006
scopus:67651163465

ISSN

1998-9563

DOI

10.2166/nh.2009.001

language

English

LU publication?

yes

id

ce578f48-019c-4676-96da-671fd5937157 (old id 1463185)

date added to LUP

2016-04-01 12:00:54

date last changed

2025-04-04 13:56:45

@article{ce578f48-019c-4676-96da-671fd5937157,
  abstract     = {{Hydrologists are often faced with the problem of missing values in a precipitation-runoff process database to construct runoff prediction models. They tend to use simple and naive methods to deal with the problem of missing data. Thus far, the common practice has been to discard observations with missing values. In this paper, we present some statistically principled methods for gap filling and discuss the pros and cons of these methods. We employ and discuss imputations of missing values by means of self-organizing map (SOM), multilayer perceptron (MLP), multivariate nearest-neighbor (MNN), regularized expectation-maximization algorithm (REGEM) and multiple imputation (MI) in the context of a precipitation-runoff process database in northern Iran in order to construct a serially complete database for analyses such as runoff prediction. In our case, the SOM and MNN tend to give similar and robust results. REGEM and MI build on the assumption of multivariate normal data, which we don't seem to have in one of our cases. MLP tends to produce inferior results because it fragments the data into 68 different models. Therefore, we conclude that it makes most sense to use either the computationally simple MNN method or the more demanding SOM.}},
  author       = {{Kalteh, Aman Mohammad and Hjorth, Peder}},
  issn         = {{1998-9563}},
  keywords     = {{values; missing; MI; REGEM; MNN; MLP; data fill in; imputation methods: SOM; serially complete data}},
  language     = {{eng}},
  number       = {{4}},
  pages        = {{420--432}},
  publisher    = {{IWA Publishing}},
  journal      = {{Hydrology Research}},
  title        = {{Imputation of missing values in a precipitation-runoff process database}},
  url          = {{http://dx.doi.org/10.2166/nh.2009.001}},
  doi          = {{10.2166/nh.2009.001}},
  volume       = {{40}},
  year         = {{2009}},
}
