Combined analysis of satellite and ground data for winter wheat yield forecasting
(2023) In Smart Agricultural Technology 3.
- Abstract
We built machine learning and image analysis tools to forecast winter wheat yield based on a rich multidimensional tensor of agricultural information spanning different scales. This information consists of satellite multi-band images, local soil samples obtained from national databases, local weather, and field data from 23 farms cultivating winter wheat in southern Sweden. This is inherently a large multi-scale problem due to the large temporal and spatial variation of the input data. We aggregate the data into spatially averaged features over grids, which temporally span a seasonal timeline from seeding to harvest. Because of cloud obstructions, the satellite images are cleaned through interpolation. Furthermore, the data is heavily imbalanced, since the amount of satellite information far exceeds that of the ground data. Data variance can therefore be an issue, which we counter by using a decision tree approach. We find that the Light Gradient Boosting decision tree trained on 262 input features is able to predict winter wheat yield with 82% accuracy. Subsequently, we employ game theory to better understand the relational importance of specific input features towards forecasting yield. Specifically, we find that some of the most important features towards the resulting predictions are the percent clay and magnesium in the soil. Similarly, the most important features from the satellite data are: a) the NORM index (Euclidean distance of all bands) computed in the second week of April, b) the NORM index computed in the middle of May, and c) the second spectral band from the last week of June.
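The abstract's satellite preprocessing (a per-pixel NORM index, spatial averaging, and interpolation over cloud-obstructed observations) can be illustrated with a minimal sketch. It is not the authors' code. It assumes that "Euclidean distance of all bands" means the Euclidean norm of the band vector at each pixel, that cloud gaps are filled by linear interpolation along the time axis, and that averaging is done over a whole field rather than a grid cell. The function names `norm_index`, `fill_cloud_gaps`, and `weekly_norm_feature` are hypothetical.

```python
import numpy as np


def norm_index(bands):
    """Per-pixel Euclidean norm across spectral bands.

    bands: array of shape (n_bands, height, width).
    Assumption: this is what the abstract calls the NORM index.
    """
    bands = np.asarray(bands, dtype=float)
    return np.sqrt(np.sum(bands ** 2, axis=0))


def fill_cloud_gaps(series):
    """Linearly interpolate NaN entries of a 1-D time series.

    Assumption: the abstract does not name the interpolation scheme;
    linear interpolation is used here purely for illustration.
    Gaps at either end take the nearest valid value.
    """
    series = np.asarray(series, dtype=float)
    t = np.arange(series.size)
    valid = ~np.isnan(series)
    if not valid.any():
        return series.copy()  # nothing to interpolate from
    return np.interp(t, t[valid], series[valid])


def weekly_norm_feature(stack, cloudy):
    """Spatially averaged NORM index per week, with cloudy weeks filled in.

    stack:  array of shape (n_weeks, n_bands, height, width).
    cloudy: boolean array of shape (n_weeks,), True where the image is unusable.
    """
    stack = np.asarray(stack, dtype=float)
    means = np.array([norm_index(week).mean() for week in stack])
    means[np.asarray(cloudy, dtype=bool)] = np.nan
    return fill_cloud_gaps(means)


if __name__ == "__main__":
    # Three weeks of a 4-band, 2x2-pixel field; week 1 is cloud-covered.
    stack = np.ones((3, 4, 2, 2))
    stack[2] *= 3.0
    print(weekly_norm_feature(stack, cloudy=[False, True, False]))
```

Each resulting weekly value would be one column in a feature table of the kind a gradient-boosted tree model is trained on. The actual 262 features used in the paper are not specified in this record.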
- author: Broms, Camilla; Nilsson, Mikael LU; Oxenstierna, Andreas; Sopasakis, Alexandros LU and Åström, Karl LU
- organization:
- publishing date: 2023-02
- type: Contribution to journal
- publication status: published
- subject:
- keywords: Decision trees, Relational importance, Satellite, Shapley values, Soil samples, Winter wheat yield
- in: Smart Agricultural Technology
- volume: 3
- article number: 100107
- publisher: Elsevier
- external identifiers: scopus:85148379558
- ISSN: 2772-3755
- DOI: 10.1016/j.atech.2022.100107
- language: English
- LU publication?: yes
- additional info: Publisher Copyright: © 2022 The Authors
- id: 243ccda8-3658-483d-a9e6-fa1f4572db83
- date added to LUP: 2023-03-05 21:06:54
- date last changed: 2023-11-17 10:06:36
@article{243ccda8-3658-483d-a9e6-fa1f4572db83,
  abstract  = {{We built machine learning and image analysis tools in order to forecast winter wheat yield based on a rich multi dimensional tensor of agricultural information spanning different scales. This information consists of satellite multi-band images, local soil samples obtained from national databases, local weather as well as field data from 23 farms cultivating winter wheat in southern Sweden. This is inherently a large multi-scale problem due to the large temporal and spatial variation of the input data. We aggregate the data on spatially averaged features over grids which temporally span a seasonal timeline from seeding to harvest. Data cleaning is performed through interpolation for satellite images due to cloud obstructions. Furthermore data is heavily imbalanced since the amount of satellite information far exceeds that of the ground data. Data variance therefore can be an issue which we counter by using a decision tree approach. We find that the Light Gradient Boosting decision tree trained on 262 input features is able to predict winter wheat yield with 82% accuracy. Subsequently we employ game theory in order to better understand the relational importance of specific input features towards forecasting yield. Specifically we find that some of the most important features towards the resulting predictions are the percent clay and magnesium in the soil. Similarly the most important features from the satellite data are: a) the NORM index (Euclidean distance of all bands) computed in the second week of April, b) the NORM index computed in the middle of May as well as c) the second spectral band from the last week of June.}},
  author    = {Broms, Camilla and Nilsson, Mikael and Oxenstierna, Andreas and Sopasakis, Alexandros and Åström, Karl},
  issn      = {{2772-3755}},
  keywords  = {{Decision trees; Relational importance; Satellite; Shapley values; Soil samples; Winter wheat yield}},
  language  = {{eng}},
  publisher = {{Elsevier}},
  journal   = {{Smart Agricultural Technology}},
  title     = {{Combined analysis of satellite and ground data for winter wheat yield forecasting}},
  url       = {{http://dx.doi.org/10.1016/j.atech.2022.100107}},
  doi       = {{10.1016/j.atech.2022.100107}},
  volume    = {{3}},
  year      = {{2023}},
}