Combined use of Sentinel-2 and Sentinel-1 data for wheat crop yield forecasting with machine learning algorithms

Mallol Díaz, Jesús

Mallol Díaz, Jesús (2023) In Student thesis series INES NGEM01 20231
Dept of Physical Geography and Ecosystem Science

Abstract: Vegetation indices derived from remotely sensed optical data are commonly used for crop monitoring in precision agriculture, but their usefulness is limited by the presence of clouds. Consequently, areas with frequent cloud cover, like the southwest of Sweden, may have limited access to usable optical data. Synthetic aperture radar (SAR) data can provide supporting information in these conditions because radar penetrates clouds and retrieves information about the physical characteristics of the surface.

This thesis evaluated the combined use of remotely sensed vegetation indices derived from Sentinel-2 and radar polarizations from Sentinel-1 for developing machine learning models to forecast wheat yield during the growing season. Multiple linear regression (MLR) and three non-linear machine learning algorithms, namely support vector regression (SVR), random forest regression (RFR), and CatBoost, were used for developing wheat yield forecasting models in three experiments.

The time span experiments, which used multiple dates of Sentinel-2 data, determined that the first half of July was the optimal time period for forecasting wheat yield with MLR and Sentinel-2 data alone (R² = 0.74; RMSE = 0.48 t/ha). Additionally, the best performance was achieved by models with fewer Sentinel-2 dates. This observation prompted the use of single Sentinel-2 dates for producing the models in the growth stage experiments, which evaluated the impact of adding Sentinel-1 data on the forecasting accuracy of the four regression methods at different developmental stages of wheat. The best model of the growth stage experiments was obtained 42 days before the harvesting date using CatBoost and combining Sentinel-2 and Sentinel-1, with an R² of 0.75 and an RMSE of 0.48 t/ha. Finally, the performance of the best growth stage model over the course of the growing season was evaluated in the temporal evolution experiments. Crop yield forecasting maps were produced every two weeks, and multidate models were developed to evaluate the impact of adding Sentinel-1 data. Early detection of high and low yield areas in the field was achieved 72 days before the harvesting date using the crop yield maps, thus providing valuable information within the actionable timeframe for adjusting farming management practices. In addition, the most accurate forecast was produced 32 days before the harvesting date by combining multiple dates of Sentinel-2 and Sentinel-1 data with the CatBoost algorithm (R² = 0.83; RMSE = 0.39 t/ha).

This study highlights the potential of empirical regression models for forecasting wheat yield by combining Sentinel-2 and Sentinel-1 data. Specifically, the results indicate that non-linear algorithms outperform linear regression models, while the addition of Sentinel-1 data improves model performance regardless of the evaluated time period.
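
The abstract reports model accuracy as a coefficient of determination (R²) and a root mean square error (RMSE) in t/ha. As a minimal sketch of how such figures are commonly computed, assuming R² is defined as 1 - SS_res / SS_tot (the record does not state which definition the thesis uses), the two metrics could be written as follows. The function names and the yield values are hypothetical and invented for illustration; they are not data from the thesis.

```python
import math


def rmse(observed, predicted):
    """Root mean square error, in the same units as the inputs (e.g. t/ha)."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(observed))


def r_squared(observed, predicted):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_observed = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_observed) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values have zero variance")
    return 1.0 - ss_res / ss_tot


# Hypothetical yields in t/ha, invented for illustration only.
observed_yield = [6.0, 7.0, 8.0, 9.0]
predicted_yield = [6.5, 6.5, 8.5, 8.5]

print(f"RMSE = {rmse(observed_yield, predicted_yield):.2f} t/ha")
print(f"R2   = {r_squared(observed_yield, predicted_yield):.2f}")
```

RMSE keeps the units of the yield data, which is why the abstract can quote it directly in t/ha, whereas R² is unitless and describes the share of yield variance explained by the model.
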
Popular Abstract: This thesis evaluated the combined use of optical data from Sentinel-2 and radar data from Sentinel-1 for forecasting wheat crop yield in a field located in the southwest of Sweden. Optical data is commonly used in the form of vegetation indices, which are formulas that combine the different spectral bands of a satellite to quantify plant health and growth. However, vegetation indices are of limited use in the presence of clouds, so areas with frequent cloud cover, like the study site, have less usable optical data. This limitation can be overcome by supplementing optical data with synthetic aperture radar (SAR) data, which penetrates clouds to collect information about the physical characteristics of the surface.

Four regression methods were implemented for forecasting wheat yield in a series of experiments. One linear approach was used as a benchmark to compare the performance of three non-linear machine learning algorithms. While the linear models studied the linear relationships between the yield and the satellite data, the machine learning algorithms evaluated non-linear relationships that can improve the forecasting accuracy of the models.

The time span experiments used multiple dates of optical data and the linear approach to identify suitable time periods for developing models, concluding that the best models used fewer dates and fell close to the first half of July. The growth stage experiments evaluated the use of single dates of optical data and the addition of radar data with the four regression methods. The best growth stage model achieved the highest forecasting accuracy 42 days before the harvesting date using a non-linear algorithm called CatBoost and combining optical and radar data. Finally, the temporal evolution experiments explored the performance of the best growth stage model over the course of the growing season by producing crop yield forecasting maps. In the second part of the temporal evolution experiments, models were produced using the period where the forecasting maps resembled the final observed yield map to test the impact of adding radar data. The earliest forecasting map was produced 72 days before the harvesting date, while the highest forecasting accuracy was achieved 32 days before the harvesting date by a model developed in the second part of the temporal evolution experiments combining optical and radar data.

The best growth stage model and the most accurate temporal evolution model were both obtained after the date by which farming management practices can still be adjusted to improve wheat yield. However, the earliest forecasting map provided useful information in time for applying fertilizer in the areas with low yield, making it an effective crop yield monitoring tool. In conclusion, this thesis shows that accurate and reliable wheat yield forecasts can be achieved by combining optical and radar data and using machine learning algorithms.
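
The popular abstract describes vegetation indices as formulas that combine a satellite's spectral bands. As an illustrative sketch only, the widely used Normalized Difference Vegetation Index (NDVI) is computed as (NIR - Red) / (NIR + Red); for Sentinel-2 the near-infrared and red bands are B8 and B4. This record does not say which indices the thesis used, so the choice of NDVI, the function name, and the reflectance values below are assumptions made for illustration.

```python
def ndvi(nir, red):
    """NDVI from near-infrared and red surface reflectance (unitless, -1 to 1).

    NDVI is used here as an assumed example of a vegetation index; the
    thesis record does not name the indices that were actually used.
    """
    denominator = nir + red
    if denominator == 0:
        # Undefined when both bands are zero (e.g. a no-data pixel).
        return float("nan")
    return (nir - red) / denominator


# Hypothetical reflectance values, invented for illustration only.
dense_canopy = ndvi(nir=0.45, red=0.05)  # strong NIR reflection, low red
sparse_cover = ndvi(nir=0.25, red=0.20)  # similar NIR and red reflection

print(f"dense canopy NDVI: {dense_canopy:.2f}")
print(f"sparse cover NDVI: {sparse_cover:.2f}")
```

Healthy vegetation reflects strongly in the near-infrared and absorbs red light, so denser canopies give values closer to 1. A cloud-covered pixel yields no usable value, which is the gap the radar data is meant to fill.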

Please use this url to cite or link to this publication: http://lup.lub.lu.se/student-papers/record/9135398

author: Mallol Díaz, Jesús
supervisor: Houssaine Bouras (LU), Lars Eklundh (LU)
organization: Dept of Physical Geography and Ecosystem Science
course: NGEM01 20231
year: 2023
type: H2 - Master's Degree (Two Years)
subject: Earth and Environmental Sciences
keywords: Physical Geography and Ecosystem Analysis, crop yield, precision agriculture, remote sensing, machine learning, Sentinel-2, Sentinel-1
publication/series: Student thesis series INES
report number: 612
language: English
id: 9135398
date added to LUP: 2023-08-29 10:43:28
date last changed: 2024-08-28 03:43:10

@misc{9135398,
  abstract     = {{Vegetation indices derived from remotely sensed optical data are commonly used for crop monitoring in precision agriculture, but their information capability can be hindered by the presence of clouds. Consequently, areas with frequent cloud cover like the southwest of Sweden may have limited access to usable optical data. Nonetheless, the addition of synthetic aperture radar (SAR) data can provide supporting information due to being capable of penetrating clouds to retrieve information about the physical characteristics of the surface. 

This thesis evaluated the combined use of remotely sensed vegetation indices derived from Sentinel-2 and radar polarizations from Sentinel-1 for developing machine learning models to forecast wheat yield during the growing season. Multiple linear regression (MLR) and three non-linear machine learning algorithms, namely support vector regression (SVR), random forest regression (RFR), and CatBoost, were used for developing wheat yield forecasting models in three experiments. 

The time span experiments, which used multiple dates of Sentinel-2 data, determined that the first half of July was the most optimal time period for forecasting wheat yield relying on MLR and Sentinel-2 data alone (R2 = 0.74; RMSE = 0.48 t/ha). Additionally, the best performance results were achieved by models with fewer Sentinel-2 dates. This observation prompted the use of single Sentinel-2 dates for producing the models in the growth stage experiments, which evaluated the impact of adding Sentinel-1 data on the forecasting accuracy of the four regression methods at different developmental stages of wheat. The most optimal model of the growth stage experiments occurred 42 days before the harvesting date using CatBoost and combining Sentinel-2 and Sentinel-1, with an R2 of 0.75 and an RMSE of 0.48 t/ha. Finally, the performance of the best growth stage model over the course of the growing season was evaluated in the temporal evolution experiments. Crop yield forecasting maps were produced every two weeks, and multidate models were developed to evaluate the impact of adding Sentinel-1 data. Early detection of high and low yield areas in the field was achieved 72 days before the harvesting date using the crop yield maps, thus providing valuable information within the actionable timeframe for adjusting farming management practices. As well, the most accurate forecast was produced 32 days before the harvesting date by combining multiple dates of Sentinel-2 and Sentinel-1 data with the CatBoost algorithm (R2 = 0.83; RMSE = 0.39 t/ha).

This study highlights the potential of empirical regression models for forecasting wheat yield by combining Sentinel-2 and Sentinel-1 data. Specifically, the results indicate that non-linear algorithms outperform linear regression models, while the addition of Sentinel-1 data improves model performance regardless of the evaluated time period.}},
  author       = {{Mallol Díaz, Jesús}},
  language     = {{eng}},
  note         = {{Student Paper}},
  series       = {{Student thesis series INES}},
  title        = {{Combined use of Sentinel-2 and Sentinel-1 data for wheat crop yield forecasting with machine learning algorithms}},
  year         = {{2023}},
}
