Convolutional Neural Network Emulators for DGVMs - A Supervised Machine Learning Approach to Big Data Processing

Nilsson, Amanda



Nilsson, Amanda ^LU (2019) In LUNFMS-4030-2019 MASK11 20191
Mathematical Statistics

Abstract: This paper investigates the possibility of training a convolutional neural network (CNN) that, by capturing temporal features in weather data, can estimate the expected amount of wheat produced during any year at any geographical location. The aim is to establish whether a CNN can be used to emulate simulated global crop production (the responses to changes in CO2, temperature, water, and nitrogen levels) retrieved from the dynamic global vegetation model (DGVM) Lund-Potsdam-Jena General Ecosystem Simulator (LPJ-GUESS), which takes part in the Global Gridded Crop Model Intercomparison (GGCMI) study. In other words, the question is whether a CNN can provide yield estimates at a lower computational cost than the DGVM itself.
Before investigating different CNNs and whether they can be used to emulate annual yield, the paper analyzes the weather data and introduces the basic concepts of convolutional neural networks, how to construct them, and how to analyze what they learn.
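To make the idea of a CNN that captures temporal features in weather data concrete, here is a minimal NumPy sketch of a forward pass. It is not the architecture used in the thesis: the three input channels, the 7-day filter width, the single convolutional layer with global max pooling, the random weights and the function names `conv1d_valid` and `cnn_yield` are all hypothetical. It shows how a bank of filters slides along a (channels, days) weather series and how global pooling turns the filter responses into a fixed-size feature vector. A side effect of the pooling is that the output does not depend on when in the series a pattern occurs.

```python
import numpy as np

def conv1d_valid(x, w, b):
    """Valid 1D cross-correlation (what deep learning libraries call convolution).
    x: (channels_in, time), w: (filters, channels_in, width), b: (filters,)
    Returns an array of shape (filters, time - width + 1)."""
    n_filters, _, width = w.shape
    t_out = x.shape[1] - width + 1
    out = np.empty((n_filters, t_out))
    for t in range(t_out):
        window = x[:, t:t + width]  # (channels_in, width) slice of the series
        # Sum over channels and filter width for every filter at once.
        out[:, t] = np.tensordot(w, window, axes=([1, 2], [0, 1])) + b
    return out

def cnn_yield(x, params):
    """Hypothetical emulator: convolution -> ReLU -> global max pool -> linear head."""
    h = np.maximum(conv1d_valid(x, params["w"], params["b"]), 0.0)
    pooled = h.max(axis=1)  # one number per filter, independent of position in time
    return float(params["v"] @ pooled + params["c"])

# Usage with random (untrained) weights and a hypothetical 100-day, 3-channel series.
rng = np.random.default_rng(0)
params = {"w": rng.normal(size=(4, 3, 7)), "b": np.zeros(4),
          "v": rng.normal(size=4), "c": 0.0}
bump = rng.normal(size=(3, 10))              # a 10-day weather "event"
early = np.zeros((3, 100)); early[:, 20:30] = bump
late = np.zeros((3, 100)); late[:, 50:60] = bump
y_early, y_late = cnn_yield(early, params), cnn_yield(late, params)
# y_early equals y_late: the pooled network cannot tell when the event happened.
```

This positional invariance is convenient for pattern recognition, but it also means that, unless extra information is supplied, the network cannot by itself tell which phase of the growing season a pattern belongs to.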
The results show that a CNN can emulate annual wheat yield at any location and year without being given spatiotemporal positional arguments, and it should hence be considered a worthy candidate for emulation. The temporal resolution of the input could be reduced from daily values to five-day averages; however, replacing the series with summary statistics was not investigated further. We are thus still left with the problem of unequally sized input series. The results also raise the question of whether the assumed expected heading and harvest dates can be relied upon, since such dates would help the CNN, whose pattern recognition is otherwise location invariant, to distinguish more easily between the periods around sowing, heading, and reaping.
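The reduction from daily values to five-day averages, and the remaining issue of unequally sized input series, can be sketched as follows. This is an illustration under assumptions, not the thesis's preprocessing: the function names `five_day_means` and `pad_batch` are hypothetical, a trailing block shorter than five days is simply averaged over the days it contains, and zero-padding with a mask is only one possible way of batching series of different lengths.

```python
import numpy as np

def five_day_means(daily, block=5):
    """Average a (channels, days) array over consecutive, non-overlapping blocks.
    A trailing block shorter than `block` days is averaged over the days it has."""
    n_days = daily.shape[1]
    n_blocks = -(-n_days // block)  # ceiling division
    out = np.empty((daily.shape[0], n_blocks))
    for k in range(n_blocks):
        out[:, k] = daily[:, k * block:(k + 1) * block].mean(axis=1)
    return out

def pad_batch(series_list):
    """Zero-pad variable-length (channels, steps) arrays to a common length.
    Returns the stacked batch and a boolean mask marking the real time steps."""
    max_len = max(s.shape[1] for s in series_list)
    n_ch = series_list[0].shape[0]
    batch = np.zeros((len(series_list), n_ch, max_len))
    mask = np.zeros((len(series_list), max_len), dtype=bool)
    for i, s in enumerate(series_list):
        batch[i, :, :s.shape[1]] = s
        mask[i, :s.shape[1]] = True
    return batch, mask

# Two hypothetical single-channel seasons of different length (12 and 20 days).
season_a = np.arange(12, dtype=float).reshape(1, 12)
season_b = np.ones((1, 20))
a5 = five_day_means(season_a)  # block means of days 0-4, 5-9 and 10-11
b5 = five_day_means(season_b)  # four full blocks
batch, mask = pad_batch([a5, b5])
```

Here `a5` holds the means 2.0, 7.0 and 10.5, and the padded batch has shape (2, 1, 4), with the mask flagging the padded final step of the shorter season.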
Popular Abstract: A growing world population, alongside climate change and greater uncertainty in the weather, increases concern about food security and raises questions about vulnerabilities and potential adaptation strategies in the agricultural sector. This has created a need for dynamic global vegetation models (DGVMs) that can predict vegetation under many different climate scenarios, many of which have not yet been observed in the historical record, as well as at new potential cultivation locations with no previous record of food production.
A problem with such vegetation models is their computational burden, especially when many climate scenarios are of interest. Cheap estimates of the simulator outputs can be obtained from an emulator, or surrogate model, that mimics the simulator: a statistical representation of the simulator, trained to model the mapping from input data to output targets.
This paper investigates the possibility of training a convolutional neural network (CNN) that, by capturing temporal features in weather data, can estimate the expected amount of wheat produced during any year and at any geographical location. The aim is to establish whether a CNN can be used to emulate simulated global crop yield (the responses to changes in CO2, temperature, water, and nitrogen levels) retrieved from the dynamic global vegetation model (DGVM) Lund-Potsdam-Jena General Ecosystem Simulator (LPJ-GUESS), which takes part in the Global Gridded Crop Model Intercomparison (GGCMI) study. In other words, the question is whether a CNN can provide yield estimates at a lower computational cost than the DGVM itself.
CNNs can process big data in a massively parallel fashion, can be combined with many well-established machine learning and statistical techniques, and have become a popular tool for pattern recognition in weather- and climate-related problems like the one considered here. Neural networks can perform remarkably well without demanding much understanding or statistical knowledge of the modeler, but CNNs in particular allow for a thorough analysis of what they learn, and what they learn is easy to visualize. By displaying where the convolutional neural network puts most weight, we can better understand how the weather affects the yield and how, if possible, to reduce or aggregate the input weather data.
The results show that a CNN can emulate annual wheat yield at any location and year without being given spatiotemporal positional arguments, and it should hence be considered a worthy candidate for emulation.
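Displaying where the network puts most weight can be done in several ways. One common, model-agnostic option is occlusion sensitivity, sketched below in NumPy; it is not necessarily the visualization used in the thesis. The `predict` argument stands for any trained emulator mapping a (channels, days) series to a yield. Here it is replaced by a hypothetical toy function, `toy_emulator`, that only looks at days 30 to 39 of the first channel, so the expected map is easy to check. Each five-day window is replaced by the per-channel mean, and the absolute change in predicted yield is recorded.

```python
import numpy as np

def occlusion_sensitivity(predict, x, window=5, baseline=None):
    """Model-agnostic occlusion map for a (channels, time) input.
    Each non-overlapping window is replaced, in all channels, by `baseline`
    (default: the per-channel mean of x), and the absolute change in the
    prediction is assigned to every time step in that window."""
    if baseline is None:
        baseline = x.mean(axis=1, keepdims=True)  # shape (channels, 1), broadcasts
    reference = predict(x)
    n_steps = x.shape[1]
    scores = np.zeros(n_steps)
    for start in range(0, n_steps, window):
        occluded = x.copy()
        occluded[:, start:start + window] = baseline
        scores[start:start + window] = abs(predict(occluded) - reference)
    return scores

def toy_emulator(x):
    """Hypothetical stand-in for a trained CNN: depends only on days 30-39 of channel 0."""
    return float(x[0, 30:40].sum())

x_demo = np.arange(200, dtype=float).reshape(2, 100)  # 2 channels, 100 days
scores = occlusion_sensitivity(toy_emulator, x_demo)
# Nonzero scores appear only for days 30-39, the period the toy emulator uses.
```

With a real emulator, windows whose occlusion changes the prediction most mark the periods it relies on; averaging such maps over many locations and years could indicate which parts of the season might be aggregated more coarsely.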

Please use this URL to cite or link to this publication: http://lup.lub.lu.se/student-papers/record/9004780

author

Nilsson, Amanda ^LU

supervisor

Johan Lindström ^LU

organization

Mathematical Statistics

course

MASK11 20191

year

2019

type

M2 - Bachelor Degree

subject

Mathematics and Statistics

keywords

Big data, convolutional neural network (CNN), emulator, surrogate model, dynamic global vegetation model (DGVM), statistical modeling, feature detection, automated pattern recognition, supervised machine learning, deep learning, predictive analytics.

publication/series

LUNFMS-4030-2019

report number

2019:K28

ISSN

1654-6229

language

English

id

9004780

date added to LUP

2020-10-05 14:28:36

date last changed

2020-10-05 14:28:36

@misc{9004780,
  abstract     = {{This paper investigates the possibility of training a convolutional neural network (CNN) that, by capturing temporal features in weather data, can estimate the expected amount of wheat produced during any year at any geographical location. The aim is to establish whether a CNN can be used to emulate simulated global crop production (the responses to changes in CO2, temperature, water, and nitrogen levels) retrieved from the dynamic global vegetation model (DGVM) Lund-Potsdam-Jena General Ecosystem Simulator (LPJ-GUESS), which takes part in the Global Gridded Crop Model Intercomparison (GGCMI) study. In other words, the question is whether a CNN can provide yield estimates at a lower computational cost than the DGVM itself.
Before investigating different CNNs and whether they can be used to emulate annual yield, the paper analyzes the weather data and introduces the basic concepts of convolutional neural networks, how to construct them, and how to analyze what they learn.
The results show that a CNN can emulate annual wheat yield at any location and year without being given spatiotemporal positional arguments, and it should hence be considered a worthy candidate for emulation. The temporal resolution of the input could be reduced from daily values to five-day averages; however, replacing the series with summary statistics was not investigated further. We are thus still left with the problem of unequally sized input series. The results also raise the question of whether the assumed expected heading and harvest dates can be relied upon, since such dates would help the CNN, whose pattern recognition is otherwise location invariant, to distinguish more easily between the periods around sowing, heading, and reaping.}},
  author       = {{Nilsson, Amanda}},
  issn         = {{1654-6229}},
  language     = {{eng}},
  note         = {{Student Paper}},
  series       = {{LUNFMS-4030-2019}},
  title        = {{Convolutional Neural Network Emulators for DGVMs - A Supervised Machine Learning Approach to Big Data Processing}},
  year         = {{2019}},
}

LUP Student Papers

LUND UNIVERSITY LIBRARIES
