
LUP Student Papers

LUND UNIVERSITY LIBRARIES

Wind Turbine Recovery Forecasting using Survival Analysis

Palets, Anton LU (2023) In Master’s Theses in Mathematical Sciences MASM02 20231
Mathematical Statistics
Abstract
The goal of this thesis is to present a methodology for predicting the time until recovery of failed wind turbines. The need is motivated by the potential for more accurate wind energy export forecasts. The current approach rests entirely on having an expert examine the turbine and produce a time estimate, so by its nature such a prediction cannot be made immediately upon failure. Five common survival analysis models are evaluated on their ability to correctly classify recovery as happening within or after 24 hours of failure, and on their point prediction error when the failure event is predicted to resolve within 24 hours. A method for nonparametric clustering of survival curves is developed and used to reduce the number of variables in the examined models. The Weibull Accelerated Failure Time model with the clustered error codes and the logarithm of energy produced in the month prior to failure is found to perform significantly better than the alternatives. Classification is optimized by finding optimal thresholds using ROC curves. An attempt is made to present the theory necessary to motivate the models used.
Popular Abstract
This Master's thesis in Mathematical Statistics aims to develop a model that can reliably predict whether a broken wind turbine will recover within 24 hours and, if so, in how many hours. We opt to develop a methodology that could be readily applied to data different from that examined in our work. In particular, this means that although the data at hand could be approached with simpler models, we give it a more general treatment, as our data is simply one example of what a turbine failure and recovery dataset can look like. These preferences lead us to choose survival analysis as the framework. Survival analysis is a statistical field that addresses any time-to-event data, and it originally got its name from the event being death in clinical trials.

In our case, the event is the recovery of a turbine; nonetheless, we keep all terminology standard, so the probability that a turbine has not yet recovered by a given time is still referred to as the 'survival probability'. As the goal is prediction on unseen data, we pursue a Machine Learning approach where the available data is randomly split into a training and a test dataset. The training data is used to fit the models and perform any associated optimizations, whereas the test data is used for evaluation. The benefit of this approach is that the true recovery times in the test data are known, so the predictions of the fitted models can be compared against them.
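The random split described above can be sketched as follows. This is a minimal illustration using only the Python standard library; the function name `train_test_split`, the 20% test fraction, the seed, and the record layout are assumptions made for the example and are not taken from the thesis.

```python
import random

def train_test_split(events, test_fraction=0.2, seed=0):
    """Randomly split a list of failure events into training and test lists."""
    rng = random.Random(seed)      # fixed seed so the split is reproducible
    shuffled = events[:]           # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical records: (turbine id, error code, hours until recovery)
events = [("T%02d" % i, "E%d" % (i % 3), 1.0 + i) for i in range(10)]
train, test = train_test_split(events, test_fraction=0.2, seed=42)
```

The models would be fitted on `train` only, and the known recovery times in `test` would be held back for evaluation.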

In this work we examine five different models. These models vary substantially in their nature, which limits the common ground on which their performance can be compared. In more formal terms, the goal is first to classify a recovery event into one of two categories: recovery within 24 hours of failure, or recovery after 24 hours of failure. If the classification is that the recovery will occur within 24 hours, we would like to know a more precise value, which we refer to as the point prediction (of recovery time). The two problems are coupled, since the point prediction is only of interest when recovery is classified as occurring within 24 hours, so they are evaluated together. To evaluate the classification we use two common metrics, sensitivity and accuracy, where the former is a good indicator of how a model deals with false negatives (recovery events that were predicted to resolve within 24 hours, but in reality recovered after 24 hours). For the point prediction, we use mean absolute error, which is preferred over mean squared error in this case as it does not give additional weight to predictions that were far off from the true value.
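A small sketch of the three metrics may help fix the definitions. It assumes that the positive class is "recovery after 24 hours" (consistent with the description of false negatives above), that classification is read directly off a predicted recovery time, and that the absolute error is averaged over every event predicted to resolve within 24 hours. The function name `evaluate` and the sample numbers are hypothetical.

```python
def evaluate(true_hours, predicted_hours, horizon=24.0):
    """Return (sensitivity, accuracy, MAE) for the 24-hour recovery problem.

    Positive class (an assumption of this sketch): recovery takes longer
    than `horizon` hours. MAE is computed only over events that were
    predicted to recover within `horizon` hours.
    """
    tp = tn = fp = fn = 0
    abs_errors = []
    for truth, pred in zip(true_hours, predicted_hours):
        actual_late = truth > horizon
        predicted_late = pred > horizon
        if actual_late and predicted_late:
            tp += 1
        elif actual_late:
            fn += 1            # predicted within 24 h, actually later
        elif predicted_late:
            fp += 1
        else:
            tn += 1
        if not predicted_late:
            abs_errors.append(abs(truth - pred))
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    mae = sum(abs_errors) / len(abs_errors) if abs_errors else float("nan")
    return sensitivity, accuracy, mae

# Hypothetical true and predicted recovery times in hours
true_hours = [2.0, 5.0, 30.0, 40.0, 10.0]
predicted_hours = [3.0, 26.0, 28.0, 12.0, 10.0]
sensitivity, accuracy, mae = evaluate(true_hours, predicted_hours)
```

In this toy data one of the two late recoveries is missed, so the sensitivity is 0.5, and the missed event also dominates the mean absolute error because its point prediction is 28 hours off.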

As is common for survival analysis datasets, the two categories we classify into are very unbalanced: the number of turbines that recover within 24 hours is substantially higher than the number that do not. This presents additional challenges for classification, which we address by relatively standard means, namely the study of ROC curves and optimal thresholds. This approach allows us to balance the classification in our situation, where one of the categories is inherently more likely.
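One common way to choose a threshold from a ROC curve is to maximize Youden's J statistic (true positive rate minus false positive rate). The text does not state which optimality criterion the thesis uses, so Youden's J is an assumption here, as are the function names and the sample scores. The sketch also assumes both classes are present in the data.

```python
def roc_points(scores, labels):
    """Return (threshold, false positive rate, true positive rate) triples.

    An event is classified positive when its score >= threshold.
    Assumes at least one positive (1) and one negative (0) label.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = []
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((threshold, fp / negatives, tp / positives))
    return points

def best_threshold(scores, labels):
    """Pick the threshold that maximizes Youden's J = TPR - FPR."""
    return max(roc_points(scores, labels), key=lambda p: p[2] - p[1])[0]

# Hypothetical scores: predicted probability that recovery takes more than 24 hours.
# Label 1 means the turbine actually took more than 24 hours to recover.
scores = [0.05, 0.10, 0.15, 0.20, 0.30, 0.35, 0.40, 0.60]
labels = [0, 0, 0, 0, 0, 1, 0, 1]
threshold = best_threshold(scores, labels)
```

With these toy numbers the selected threshold is 0.35 rather than the default 0.5, which illustrates how threshold selection can compensate for a rare positive class.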

We use several variables to model the recovery time distribution. The one that is always used is the error code, which enters the models not in its raw form but as a cluster. Error codes are unique representations of potential final causes of failure, and each failure event in our dataset is accompanied by one of them. Using clusters of codes instead of the codes directly allows us to reduce the number of variables in the subsequent models, and is justified by the reasonable assumption that recovery from many problems is similar enough not to warrant distinguishing between them. We develop a clustering method that allows us to group the codes in a way that does not impose unwanted assumptions. Another variable we use is the logarithm of the energy produced by the turbine in the month before it failed. The assumption behind this variable is that it could act as a proxy for wear and tear, which could ultimately impact the recovery time. Alternatively, well-performing turbines could receive priority treatment, which would accelerate their recovery. Finally, we investigate the impact of past failure history on present recovery. For each turbine and for each cluster of failures, a counter is constructed recording the number of times a failure in that cluster has happened to the given turbine.
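The two derived variables, the per-turbine, per-cluster failure counter and the log of prior-month energy, can be sketched as below. The field names, the kWh unit, the code-to-cluster mapping, and the choice to count only failures that occurred before the current one are all assumptions of this example; the clustering itself is taken as given.

```python
import math
from collections import defaultdict

def add_history_and_log_energy(events, cluster_of):
    """Add 'cluster', 'prior_failures' and 'log_energy' to a copy of each event.

    events: chronologically ordered dicts with keys 'turbine', 'code' and
            'energy_kwh' (energy in the month before failure, assumed > 0).
    cluster_of: mapping from error code to cluster label.
    """
    counts = defaultdict(int)          # (turbine, cluster) -> failures seen so far
    enriched = []
    for event in events:
        cluster = cluster_of[event["code"]]
        key = (event["turbine"], cluster)
        row = dict(event)
        row["cluster"] = cluster
        row["prior_failures"] = counts[key]   # count before the current failure
        row["log_energy"] = math.log(event["energy_kwh"])
        counts[key] += 1
        enriched.append(row)
    return enriched

# Hypothetical mapping and events
cluster_of = {"E1": "A", "E2": "A", "E3": "B"}
events = [
    {"turbine": "T1", "code": "E1", "energy_kwh": 1000.0},
    {"turbine": "T1", "code": "E2", "energy_kwh": 1200.0},
    {"turbine": "T2", "code": "E1", "energy_kwh": 900.0},
    {"turbine": "T1", "code": "E3", "energy_kwh": 1100.0},
]
rows = add_history_and_log_energy(events, cluster_of)
```

The second event for turbine T1 gets a counter of 1 because codes E1 and E2 fall in the same hypothetical cluster, which is the point of counting per cluster rather than per code.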

With all of the above, the work attempts to develop the theory needed to justify the models and methods used. This results in a fairly lengthy theoretical section, most of which can be skipped by those who have no inherent interest in the subject. The model we find to be best is a fully parametric model with fairly few variables, which has the benefit of quick fitting and prediction.
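To illustrate why a fully parametric model predicts quickly, here is a sketch of the standard Weibull Accelerated Failure Time form, the model named in the abstract. It uses the common parametrization S(t | x) = exp(-(t / lambda(x))^k) with lambda(x) = exp(b0 + b . x). The coefficients, covariate values, and the use of the median as the point prediction are assumptions for illustration and are not results from the thesis.

```python
import math

def weibull_aft_scale(intercept, coefs, covariates):
    """Scale lambda(x) = exp(intercept + sum_j coef_j * x_j)."""
    return math.exp(intercept + sum(b * x for b, x in zip(coefs, covariates)))

def survival(t, scale, shape):
    """S(t | x) = exp(-(t / scale) ** shape): probability of no recovery by time t."""
    return math.exp(-((t / scale) ** shape))

def median_time(scale, shape):
    """Time at which S(t | x) equals 0.5."""
    return scale * math.log(2.0) ** (1.0 / shape)

# Hypothetical coefficients and covariates (for example a cluster indicator
# and the log of prior-month energy); none of these numbers come from the thesis.
scale = weibull_aft_scale(intercept=1.0, coefs=[0.5, -0.1], covariates=[1.0, 7.0])
p_unrecovered_24h = survival(24.0, scale, shape=0.8)
point_prediction = median_time(scale, shape=0.8)
```

Once the coefficients are fitted, both the probability of still being unrecovered at 24 hours and a point prediction are closed-form expressions, so prediction requires no numerical integration.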
author: Palets, Anton LU
course: MASM02 20231
year: 2023
type: H2 - Master's Degree (Two Years)
keywords: Survival analysis, Recovery Forecast, Wind Turbine, Availability Forecast, AFT model, Aalen's model, Cox regression, Cox Proportional Hazards, Variation Processes
publication/series: Master’s Theses in Mathematical Sciences
report number: LUNFMS-3123-2023
ISSN: 1404-6342
other publication id: 2023:E67
language: English
id: 9133707
date added to LUP: 2023-11-30 13:15:14
date last changed: 2023-11-30 13:15:14
@misc{9133707,
  abstract     = {{The goal of this thesis is to present a methodology for predicting time until recovery of failed wind turbines. The necessity is motivated by the potential for more accurate wind energy export forecasts. The current approach rests entirely on having an expert examine the turbine and produce a time estimate. Due to its nature, such a prediction cannot be made immediately upon failure. Five common survival analysis models are evaluated in regard to their ability to correctly classify recovery as happening within or after 24 hours of failure, and point prediction error in the case that the failure event is predicted to resolve within 24 hours. A method for nonparametric clustering of survival curves is developed, that is used to reduce the number of variables in the examined models. The Weibull Accelerated Failure Time model with the clustered error codes and logarithm of energy produced in the month prior to failure is found to perform significantly better than alternatives. Classification is optimized by finding optimal thresholds using ROC curves. An attempt is made to present theory necessary to motivate the models used.}},
  author       = {{Palets, Anton}},
  issn         = {{1404-6342}},
  language     = {{eng}},
  note         = {{Student Paper}},
  series       = {{Master’s Theses in Mathematical Sciences}},
  title        = {{Wind Turbine Recovery Forecasting using Survival Analysis}},
  year         = {{2023}},
}