Application of random forest and generalised linear model and their hybrid methods with geostatistical techniques to count data: Predicting sponge species richness

Li, Jin-Song; Alvarez, Belinda (LU); Siwabessy, Justy; Tran, Maggie; Huang, Zhi; Przeslawski, Rachel; Radke, Lynda; Howard, Floyd and Nichol, Scott (2017) In Environmental Modelling and Software 97. p. 112-129
Abstract

Spatial distribution of sponge species richness (SSR) and its relationship with environment are important for marine ecosystem management, but they are either unavailable or unknown. Hence we applied random forest (RF), generalised linear model (GLM) and their hybrid methods with geostatistical techniques to SSR data by addressing relevant issues with variable selection and model selection. It was found that: 1) of five variable selection methods, one is suitable for selecting optimal RF predictive models; 2) traditional model selection methods are unsuitable for identifying GLM predictive models and joint application of RF and AIC can select accuracy-improved models; 3) highly correlated predictors may improve RF predictive accuracy; 4) hybrid methods for RF can accurately predict count data; and 5) effects of model averaging are method-dependent. This study depicted the non-linear relationships of SSR and predictors, generated spatial distribution of SSR with high accuracy and revealed the association of high SSR with hard seabed features.

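author
Li, Jin-Song; Alvarez, Belinda; Siwabessy, Justy; Tran, Maggie; Huang, Zhi; Przeslawski, Rachel; Radke, Lynda; Howard, Floyd and Nichol, Scott
publishing date
2017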
type
Contribution to journal
publication status
published
subject
keywords
Feature selection, Machine learning, Model selection, Predictive accuracy, Spatial prediction, Spatial predictive model
in
Environmental Modelling and Software
volume
97
pages
18 pages
publisher
Elsevier
external identifiers
  • scopus:85026776020
ISSN
1364-8152
DOI
10.1016/j.envsoft.2017.07.016
language
English
LU publication?
no
id
d328d545-c7aa-44b0-9acf-126216ed42f1
date added to LUP
2017-08-25 11:04:55
date last changed
2017-09-10 05:23:15
@article{d328d545-c7aa-44b0-9acf-126216ed42f1,
  abstract     = {Spatial distribution of sponge species richness (SSR) and its relationship with environment are important for marine ecosystem management, but they are either unavailable or unknown. Hence we applied random forest (RF), generalised linear model (GLM) and their hybrid methods with geostatistical techniques to SSR data by addressing relevant issues with variable selection and model selection. It was found that: 1) of five variable selection methods, one is suitable for selecting optimal RF predictive models; 2) traditional model selection methods are unsuitable for identifying GLM predictive models and joint application of RF and AIC can select accuracy-improved models; 3) highly correlated predictors may improve RF predictive accuracy; 4) hybrid methods for RF can accurately predict count data; and 5) effects of model averaging are method-dependent. This study depicted the non-linear relationships of SSR and predictors, generated spatial distribution of SSR with high accuracy and revealed the association of high SSR with hard seabed features.},
  author       = {Li, Jin-Song and Alvarez, Belinda and Siwabessy, Justy and Tran, Maggie and Huang, Zhi and Przeslawski, Rachel and Radke, Lynda and Howard, Floyd and Nichol, Scott},
  issn         = {1364-8152},
  keywords     = {Feature selection, Machine learning, Model selection, Predictive accuracy, Spatial prediction, Spatial predictive model},
  language     = {eng},
  month        = {11},
  pages        = {112--129},
  publisher    = {Elsevier},
  journal      = {Environmental Modelling and Software},
  title        = {Application of random forest and generalised linear model and their hybrid methods with geostatistical techniques to count data: Predicting sponge species richness},
  url          = {http://dx.doi.org/10.1016/j.envsoft.2017.07.016},
  volume       = {97},
  year         = {2017},
}