Components of uncertainty in species distribution analysis: a case study of the Great Grey Shrike

Dormann, Carsten F.; Purschke, Oliver; Marquez, Jaime R. Garcia; Lautenbach, Sven and Schroeder, Boris (2008) In Ecology 89(12). p. 3371-3386
Abstract
Sophisticated statistical analyses are common in ecological research, particularly in species distribution modeling. The effects of sometimes arbitrary decisions during the modeling procedure on the final outcome are difficult to assess, and to date are largely unexplored. We conducted an analysis quantifying the contribution of uncertainty in each step during the model-building sequence to variation in model validity and climate change projection uncertainty. Our study system was the distribution of the Great Grey Shrike in the German federal state of Saxony. For each of four steps (data quality, collinearity method, model type, and variable selection), we ran three different options in a factorial experiment, leading to 81 different model approaches. Each was subjected to a fivefold cross-validation, measuring area under curve (AUC) to assess model quality. Next, we used three climate change scenarios times three precipitation realizations to project future distributions from each model, yielding 729 projections. Again, we analyzed which step introduced most variability (the four model-building steps plus the two scenario steps) into predicted species prevalences by the year 2050. Predicted prevalences ranged from a factor of 0.2 to a factor of 10 of present prevalence, with the majority of predictions between 1.1 and 4.2 (inter-quartile range). We found that model type and data quality dominated this analysis. In particular, artificial neural networks yielded low cross-validation robustness and gave very conservative climate change predictions. Generalized linear and additive models were very similar in quality and predictions, and superior to neural networks. Variations in scenarios and realizations had very little effect, due to the small spatial extent of the study region and its relatively small range of climatic conditions. We conclude that, for climate projections, model type and data quality were the most influential factors. 
Since comparison of model types has received good coverage in the ecological literature, effects of data quality should now come under more scrutiny.
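The counts quoted in the abstract follow from a full factorial design: four model-building steps with three options each give 3^4 = 81 model approaches, and crossing these with three climate change scenarios and three precipitation realizations gives 81 × 9 = 729 projections. The sketch below only reproduces that arithmetic. The step names are taken from the abstract and the model types are the three it mentions; every other option label (`option_1`, `scenario_1`, `realization_1`, and so on) and the helper `full_factorial` are hypothetical placeholders, not the authors' code or naming.

```python
from itertools import product

# Step names come from the abstract. Option labels are placeholders,
# because the abstract does not name the three options of most steps.
model_steps = {
    "data_quality": ["option_1", "option_2", "option_3"],
    "collinearity_method": ["option_1", "option_2", "option_3"],
    # The three model types mentioned in the abstract.
    "model_type": ["GLM", "GAM", "ANN"],
    "variable_selection": ["option_1", "option_2", "option_3"],
}

# The two scenario steps used for the climate projections (labels assumed).
scenario_steps = {
    "climate_scenario": ["scenario_1", "scenario_2", "scenario_3"],
    "precipitation_realization": [
        "realization_1",
        "realization_2",
        "realization_3",
    ],
}


def full_factorial(steps):
    """Return every combination of one option per step as a list of dicts."""
    names = list(steps)
    return [dict(zip(names, combo)) for combo in product(*steps.values())]


# 3 options ** 4 steps = 81 model approaches.
model_approaches = full_factorial(model_steps)

# Each approach crossed with 3 scenarios * 3 realizations = 729 projections.
projections = full_factorial({**model_steps, **scenario_steps})

print(len(model_approaches))
print(len(projections))
```

Each entry of `projections` is one dictionary naming the option chosen at all six steps, which is the unit over which the abstract's variability analysis is described.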
type: Contribution to journal
publication status: published
keywords: Saxony, stepwise model selection, Germany, sequential, regression, species distribution model, prediction, GLM, Generalized Linear Models, GAM, Generalized Additive Models, data uncertainty, collinearity, climate change, artificial neural network, best subset regression
in: Ecology
volume: 89
issue: 12
pages: 3371-3386
publisher: Ecological Society of America
external identifiers:
  • wos:000261524000013
  • scopus:60249084749
ISSN: 0012-9658
DOI: 10.1890/07-1772.1
language: English
LU publication?: yes
id: 29da3b3b-bf3d-4c73-a12d-413f3b3b23b2 (old id 1305686)
date added to LUP: 2009-03-23 12:24:45
date last changed: 2017-11-12 03:36:45
@article{29da3b3b-bf3d-4c73-a12d-413f3b3b23b2,
  abstract     = {Sophisticated statistical analyses are common in ecological research, particularly in species distribution modeling. The effects of sometimes arbitrary decisions during the modeling procedure on the final outcome are difficult to assess, and to date are largely unexplored. We conducted an analysis quantifying the contribution of uncertainty in each step during the model-building sequence to variation in model validity and climate change projection uncertainty. Our study system was the distribution of the Great Grey Shrike in the German federal state of Saxony. For each of four steps (data quality, collinearity method, model type, and variable selection), we ran three different options in a factorial experiment, leading to 81 different model approaches. Each was subjected to a fivefold cross-validation, measuring area under curve (AUC) to assess model quality. Next, we used three climate change scenarios times three precipitation realizations to project future distributions from each model, yielding 729 projections. Again, we analyzed which step introduced most variability (the four model-building steps plus the two scenario steps) into predicted species prevalences by the year 2050. Predicted prevalences ranged from a factor of 0.2 to a factor of 10 of present prevalence, with the majority of predictions between 1.1 and 4.2 (inter-quartile range). We found that model type and data quality dominated this analysis. In particular, artificial neural networks yielded low cross-validation robustness and gave very conservative climate change predictions. Generalized linear and additive models were very similar in quality and predictions, and superior to neural networks. Variations in scenarios and realizations had very little effect, due to the small spatial extent of the study region and its relatively small range of climatic conditions. We conclude that, for climate projections, model type and data quality were the most influential factors. 
Since comparison of model types has received good coverage in the ecological literature, effects of data quality should now come under more scrutiny.},
  author       = {Dormann, Carsten F. and Purschke, Oliver and Marquez, Jaime R. Garcia and Lautenbach, Sven and Schroeder, Boris},
  issn         = {0012-9658},
  keyword      = {Saxony,stepwise model selection,Germany,sequential,regression,species distribution model,prediction,GLM,Generalized Linear Models,GAM,Generalized Additive Models,data uncertainty,collinearity,climate change,artificial neural network,best subset regression},
  language     = {eng},
  number       = {12},
  pages        = {3371--3386},
  publisher    = {Ecological Society of America},
  journal      = {Ecology},
  title        = {Components of uncertainty in species distribution analysis: a case study of the Great Grey Shrike},
  url          = {http://dx.doi.org/10.1890/07-1772.1},
  volume       = {89},
  year         = {2008},
}