Application of random forest and generalised linear model and their hybrid methods with geostatistical techniques to count data: Predicting sponge species richness
(2017) In Environmental Modelling and Software 97. p.112-129
- abstract
Spatial distribution of sponge species richness (SSR) and its relationship with environment are important for marine ecosystem management, but they are either unavailable or unknown. Hence we applied random forest (RF), generalised linear model (GLM) and their hybrid methods with geostatistical techniques to SSR data by addressing relevant issues with variable selection and model selection. It was found that: 1) of five variable selection methods, one is suitable for selecting optimal RF predictive models; 2) traditional model selection methods are unsuitable for identifying GLM predictive models and joint application of RF and AIC can select accuracy-improved models; 3) highly correlated predictors may improve RF predictive accuracy; 4) hybrid methods for RF can accurately predict count data; and 5) effects of model averaging are method-dependent. This study depicted the non-linear relationships of SSR and predictors, generated spatial distribution of SSR with high accuracy and revealed the association of high SSR with hard seabed features.
- author
- Li, Jin-Song ; Alvarez, Belinda ; Siwabessy, Justy ; Tran, Maggie ; Huang, Zhi ; Przeslawski, Rachel ; Radke, Lynda ; Howard, Floyd and Nichol, Scott
- publishing date
- 2017-11-01
- type
- Contribution to journal
- publication status
- published
- keywords
- Feature selection, Machine learning, Model selection, Predictive accuracy, Spatial prediction, Spatial predictive model
- in
- Environmental Modelling and Software
- volume
- 97
- pages
- 18 pages
- publisher
- Elsevier
- external identifiers
- scopus:85026776020
- ISSN
- 1364-8152
- DOI
- 10.1016/j.envsoft.2017.07.016
- language
- English
- LU publication?
- no
- id
- d328d545-c7aa-44b0-9acf-126216ed42f1
- date added to LUP
- 2017-08-25 11:04:55
- date last changed
- 2022-03-17 00:33:29
@article{d328d545-c7aa-44b0-9acf-126216ed42f1,
  abstract = {{Spatial distribution of sponge species richness (SSR) and its relationship with environment are important for marine ecosystem management, but they are either unavailable or unknown. Hence we applied random forest (RF), generalised linear model (GLM) and their hybrid methods with geostatistical techniques to SSR data by addressing relevant issues with variable selection and model selection. It was found that: 1) of five variable selection methods, one is suitable for selecting optimal RF predictive models; 2) traditional model selection methods are unsuitable for identifying GLM predictive models and joint application of RF and AIC can select accuracy-improved models; 3) highly correlated predictors may improve RF predictive accuracy; 4) hybrid methods for RF can accurately predict count data; and 5) effects of model averaging are method-dependent. This study depicted the non-linear relationships of SSR and predictors, generated spatial distribution of SSR with high accuracy and revealed the association of high SSR with hard seabed features.}},
  author = {{Li, Jin-Song and Alvarez, Belinda and Siwabessy, Justy and Tran, Maggie and Huang, Zhi and Przeslawski, Rachel and Radke, Lynda and Howard, Floyd and Nichol, Scott}},
  issn = {{1364-8152}},
  keywords = {{Feature selection; Machine learning; Model selection; Predictive accuracy; Spatial prediction; Spatial predictive model}},
  language = {{eng}},
  month = {{11}},
  pages = {{112--129}},
  publisher = {{Elsevier}},
  journal = {{Environmental Modelling and Software}},
  title = {{Application of random forest and generalised linear model and their hybrid methods with geostatistical techniques to count data: Predicting sponge species richness}},
  url = {{http://dx.doi.org/10.1016/j.envsoft.2017.07.016}},
  doi = {{10.1016/j.envsoft.2017.07.016}},
  volume = {{97}},
  year = {{2017}},
}