CN114925620A

CN114925620A - Short-term wind power prediction method and system based on ensemble learning algorithm

Info

Publication number: CN114925620A
Application number: CN202210709609.0A
Authority: CN
Inventors: 李晨; 伍仰金; 郭茜婷; 王超君; 林晨杰; 陈岳晟; 黄丁婕; 郑传良; 叶家玮; 张良; 郑涛; 周宇; 魏兰兰; 付馨慧
Original assignee: State Grid Fujian Electric Power Co Ltd; Ningde Power Supply Co of State Grid Fujian Electric Power Co Ltd
Current assignee: State Grid Fujian Electric Power Co Ltd; Ningde Power Supply Co of State Grid Fujian Electric Power Co Ltd
Priority date: 2022-06-21
Filing date: 2022-06-21
Publication date: 2022-08-19

Abstract

The invention relates to a short-term wind power prediction method and system based on an ensemble learning algorithm, comprising the following steps: preprocessing data by eliminating abnormal data and supplementing missing data to obtain sample data; performing correlation analysis on the sample data of each influence factor, and taking the influence factors highly correlated with the wind power as reference factors; establishing a Stacking ensemble learning model with a double-layer structure comprising a base learner at a first layer and a meta-learner at a second layer; acquiring a plurality of groups of actual data to form a training sample set; performing K-fold cross-validation training on the base learner using the training sample set to obtain a wind power prediction model; and predicting short-term wind power using the wind power prediction model. Compared with existing schemes, the method effectively increases prediction accuracy, reduces prediction error across different time steps and different seasons, and improves the controllability of wind power.

Description

Short-term wind power prediction method and system based on ensemble learning algorithm

Technical Field

The invention relates to a short-term wind power prediction method and system based on an ensemble learning algorithm, and belongs to the field of wind power prediction.

Background

As wind power is applied ever more widely in China's energy sector, factors such as natural weather and wind turbine operating parameters bring challenges such as randomness and volatility to wind power generation. According to the prediction time scale and the corresponding control strategy, wind power prediction can be divided into ultra-short-term, short-term, medium-term and long-term prediction, and prediction methods can be divided into statistical methods, physical methods, combined prediction methods, spatial correlation prediction methods and the like. Statistical methods generally analyze historical data on a given time scale, find the association between the input factors and the wind power, and then predict; they are widely applied to ultra-short-term and short-term wind power prediction. Traditional linear statistical methods such as the grey model, Kalman filtering and the autoregressive integrated moving average (ARIMA) model cannot capture the nonlinear characteristics of wind power. Artificial intelligence and deep learning methods were therefore proposed, and they give better results for ultra-short-term and short-term time-series prediction of wind power. However, most existing prediction models are single prediction models, which offer limited room for improving prediction accuracy and only moderate generalization capability (the adaptability of a machine learning algorithm to fresh samples). Multi-model fusion can therefore effectively improve prediction accuracy: by turning the differences between models into complementary advantages, it compensates for the insufficient accuracy of a single prediction model.

Disclosure of Invention

In order to solve the problems in the prior art, the invention provides a novel short-term wind power prediction model based on an ensemble learning algorithm.

The technical scheme of the invention is as follows:

on one hand, the invention provides a short-term wind power prediction method based on an ensemble learning algorithm, which comprises the following steps:

data preprocessing, namely selecting a plurality of influence factors related to wind power, collecting all actual data of each influence factor, detecting abnormal values by the Pauta (3σ) criterion, removing the abnormal data, and filling the resulting missing data by a forward filling method to obtain sample data of each influence factor;

performing correlation analysis according to the sample data of each influence factor, and acquiring influence factors with high correlation with the wind power as reference factors;

establishing a Stacking integrated learning model which is a double-layer structure comprising a base learner at a first layer and a meta-learner at a second layer, wherein at least one machine learning model is arranged in each of the base learner and the meta-learner;

acquiring a plurality of groups of actual data of each reference factor as characteristic parameters to form a training sample set;

performing K-fold cross-validation training on the base learner using the training sample set, inputting the output of the base learner after K iterations as raw data into the meta-learner, training the meta-learner, evaluating the prediction accuracy of the result output by the meta-learner, outputting the final prediction result once the preset prediction accuracy is reached, storing the parameters of the base learner and the meta-learner, and outputting the trained Stacking ensemble learning model as the wind power prediction model;

and predicting the short-term wind power by using the wind power prediction model.
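The data preprocessing step listed above can be illustrated with a minimal Python sketch. It assumes that the Pauta (3σ) criterion is applied to each influence factor as a single series, that missing readings are represented by `None`, and that the sample standard deviation is used; the function names are hypothetical and are not taken from the invention.

```python
from statistics import mean, stdev


def three_sigma_mask(values, k=3.0):
    """Flag points lying more than k sample standard deviations from the mean."""
    present = [v for v in values if v is not None]
    mu, sigma = mean(present), stdev(present)
    return [v is not None and abs(v - mu) > k * sigma for v in values]


def forward_fill(values):
    """Replace each None with the most recent earlier valid value.

    A leading None stays None because there is nothing earlier to copy.
    """
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled


def preprocess(values, k=3.0):
    """Remove 3-sigma outliers, then forward-fill the gaps they leave."""
    flags = three_sigma_mask(values, k)
    cleaned = [None if bad else v for v, bad in zip(values, flags)]
    return forward_fill(cleaned)
```

Note that with very short series a single outlier inflates the standard deviation enough to escape the 3σ bound, so this sketch is only meaningful for reasonably long records.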

As a preferred embodiment, the method for performing correlation analysis according to sample data of each influence factor and acquiring the influence factor with high correlation with the wind power as a reference factor specifically includes:

introducing a Copula function, and selecting an optimal Copula function which is closest to sample data of each influence factor from the alternative Copula functions respectively based on the sample data of each influence factor;

calculating Spearman rank correlation coefficients of the optimal Copula function corresponding to the influence factors;

and determining the relevance of each influence factor relative to the wind power according to the Spearman rank correlation coefficient corresponding to each influence factor, and selecting a plurality of influence factors with high relevance as reference factors.

As a preferred embodiment, the specific step of introducing the Copula function, and based on the sample data of each influence factor, selecting the optimal Copula function closest to the sample data of each influence factor from the candidate Copula functions respectively, is:

inputting actual data of each influence factor and of the short-term wind power, and calling the ksdensity function in MATLAB to obtain the marginal distribution functions of each influence factor and of the short-term wind power; based on these marginal distribution functions, obtaining parameter estimates of the candidate Copula functions for each pairing of an influence factor with the short-term wind power by maximum likelihood estimation, and thereby obtaining the candidate Copula functions for each pairing;

inputting actual data of each influence factor and of the short-term wind power, calling the ecdf function in MATLAB to obtain the empirical distribution functions of each influence factor and of the short-term wind power, and calling the spline function to obtain the empirical distribution function values at the original sample points by spline interpolation; the original sample points are the actual data of one influence factor paired with the short-term wind power;

defining an empirical Copula function in MATLAB, and calculating the empirical Copula function value and each candidate Copula function value at the original sample points from the empirical distribution function values obtained;

calculating the Euclidean distance between the empirical Copula function values and each candidate Copula function's values at the original sample points, and taking the candidate Copula function with the minimum Euclidean distance as the optimal Copula function between the corresponding influence factor and the short-term wind power;

and calculating a corresponding Spearman rank correlation coefficient through an optimal Copula function.
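As an illustration of the empirical distribution step in the list above, the following Python sketch evaluates the empirical distribution function directly at the original sample points. This is a simplification: the text calls MATLAB's ecdf function followed by spline interpolation, whereas here the step function is evaluated exactly, which is an assumption made for brevity. The function names are hypothetical.

```python
from bisect import bisect_right


def ecdf_values(sample):
    """Empirical CDF at each original sample point: F_n(x_i) = #{x_j <= x_i} / n."""
    ordered = sorted(sample)
    n = len(ordered)
    return [bisect_right(ordered, x) / n for x in sample]


def pseudo_observations(factor, power):
    """Pair the empirical CDF values (u_i, v_i) of one influence factor and the wind power."""
    if len(factor) != len(power):
        raise ValueError("factor and power must have the same length")
    return list(zip(ecdf_values(factor), ecdf_values(power)))
```

The pairs returned by `pseudo_observations` play the role of the empirical distribution function values at the original sample points used when the empirical and candidate Copula functions are compared.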

As a preferred embodiment, the base learner is provided with more than one machine learning model selected from a K nearest neighbor algorithm, a random forest algorithm, a gradient boosting decision tree, an XG-Boost and a Light-GBM;

and a Light-GBM machine learning model is arranged in the meta-learner.

In a preferred embodiment, in the step of evaluating the prediction accuracy of the result output by the meta-learner, the evaluation indexes are the mean absolute error (MAE) and the root mean square error (RMSE).

On the other hand, the invention provides a short-term wind power prediction system based on an ensemble learning algorithm, which comprises the following components:

the data preprocessing module is used for selecting a plurality of influence factors related to the wind power, collecting all actual data of each influence factor, detecting abnormal values by the Pauta (3σ) criterion, removing the abnormal data, and filling the resulting missing data by a forward filling method to obtain sample data of each influence factor;

the influence factor screening module is used for carrying out correlation analysis according to the sample data of each influence factor and acquiring influence factors with high correlation with the wind power as reference factors;

the Stacking ensemble learning module is used for establishing a Stacking ensemble learning model, which is a double-layer structure comprising a base learner at a first layer and a meta-learner at a second layer, wherein at least one machine learning model is arranged in each of the base learner and the meta-learner;

and the sample data making module is used for obtaining a plurality of groups of actual data of each reference factor as characteristic parameters to form a training sample set.

introducing Copula functions, and selecting the optimal Copula function closest to the sample data of each influence factor from the alternative Copula functions respectively based on the sample data of each influence factor;

calculating Spearman rank correlation coefficients of the optimal Copula function corresponding to all the influence factors;

and determining the correlation of each influence factor relative to the wind power according to the Spearman rank correlation coefficient corresponding to each influence factor, and selecting a plurality of influence factors with high correlation as reference factors.

As a preferred embodiment, the specific step of introducing the Copula function, and based on the sample data of each influencing factor, selecting the optimal Copula function closest to the sample data of each influencing factor from the candidate Copula functions respectively is as follows:

inputting actual data of each influence factor and of the short-term wind power, and calling the ksdensity function in MATLAB to obtain the marginal distribution functions of each influence factor and of the short-term wind power; based on these marginal distribution functions, obtaining parameter estimates of the candidate Copula functions for each pairing of an influence factor with the short-term wind power by maximum likelihood estimation, and thereby obtaining the candidate Copula functions for each pairing;

calculating the Euclidean distance between the empirical Copula function values and each candidate Copula function's values at the original sample points, and taking the candidate Copula function with the minimum Euclidean distance as the optimal Copula function between the corresponding influence factor and the short-term wind power;

and a Light-GBM machine learning model is arranged in the meta learner.

In a preferred embodiment, in the step of evaluating the prediction accuracy of the result output by the meta-learner, the evaluation indexes are the mean absolute error (MAE) and the root mean square error (RMSE).

The invention has the following beneficial effects:

According to the invention, the Copula function is introduced to connect the joint distribution function of the variables with their marginal distribution functions, so that the correlation and nonlinear relation among the variables can be better analyzed. Among the many factors influencing short-term wind power, the optimal Copula function is selected and its Spearman rank correlation coefficient is calculated, so that the influence factors highly correlated with the short-term wind power can be identified, which reduces the workload of the subsequent prediction calculation.

The invention also provides a Stacking ensemble learning model with a double-layer structure comprising a base learner and a meta-learner. The base learner combines several types of machine learning algorithms, and the meta-learner is a Light-GBM machine learning model. Combining the double-layer training scheme with machine learning algorithms of different types improves prediction accuracy over a single prediction model, lets the strengths and weaknesses of the algorithms complement one another, and greatly improves the accuracy with which the data are trained across different data spaces and structures.

The prediction accuracy of the output of the meta-learner is evaluated by the mean absolute error (MAE) and the root mean square error (RMSE). Compared with existing schemes, the prediction accuracy is effectively improved, the prediction error is reduced, and the effectiveness of the wind power is improved.

Drawings

FIG. 1 is a flow chart of short-term wind power prediction integrated learning according to the present invention.

FIG. 2 illustrates the influencing factors of wind power prediction selected in the present invention.

FIG. 3 is a diagram of the setting of the empirical Copula function of the present invention in MATLAB.

FIG. 4 is the calculation result of the Euclidean distance for each influence variable in the present invention.

FIG. 5 is the Spearman correlation coefficient data derived from the optimal Copula function.

FIG. 6 is a flow chart of the K-fold cross-validation training of the present invention.

FIG. 7 is a graph comparing MAE and RMSE errors of the prior art scheme and the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

It should be understood that the step numbers used herein are for convenience of description only and are not intended as limitations on the order in which the steps are performed.

It is to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in this specification and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.

The terms "comprises" and "comprising" indicate the presence of the described features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

The term "and/or" refers to any and all possible combinations of one or more of the associated listed items and includes such combinations.

Example one:

referring to fig. 1, a short-term wind power prediction method based on an ensemble learning algorithm includes the following steps:

the relevant influence factors are mainly divided into two aspects, as shown in fig. 2:

internal factors of the wind turbine generator, such as wheel hub temperature, gearbox bearing temperature, gearbox oil temperature, generator speed, rotor speed, generator winding temperature and the like;

and weather factors, such as temperature, wind speed, wind direction and the like.

establishing a Stacking integrated learning model, wherein the Stacking integrated learning model is a double-layer structure comprising a base learner at a first layer and a meta learner at a second layer, and at least one machine learning model is arranged in each of the base learner and the meta learner;

and predicting short-term wind power by using the wind power prediction model.

In the short-term wind power prediction model under the Stacking ensemble learning framework, combining the double-layer training scheme with machine learning models of different types improves prediction accuracy over a single prediction model, lets the strengths and weaknesses of the individual algorithms complement one another, and greatly improves the accuracy with which the data are trained across different data spaces and structures. Moreover, because the machine learning models in the base learner are diverse and flexible, the framework leaves room for divergent lines of future research and effectively improves the adaptability of the model.

As a preferred implementation manner of this embodiment, the method for performing correlation analysis according to sample data of each influence factor and acquiring an influence factor with high correlation with the wind power as a reference factor specifically includes:

As a preferred implementation of this embodiment, the specific step of introducing the Copula function, and based on the sample data of each influence factor, selecting the optimal Copula function that is closest to the sample data of each influence factor from the candidate Copula functions respectively is:

inputting actual data of each influence factor and of the short-term wind power, calling the ecdf function in MATLAB to obtain the empirical distribution functions of each influence factor and of the short-term wind power, and calling the spline function to obtain the empirical distribution function values at the original sample points by spline interpolation; the original sample points are the actual data of one influence factor paired with the short-term wind power;

The calculation formula of the Euclidean distance is as follows:

$$d_j = \sqrt{\sum_{i=1}^{n} \left| \hat{C}_n(u_i, v_i) - C_j(u_i, v_i) \right|^2}$$

where $\hat{C}_n(u_i, v_i)$ is the empirical Copula function, $C_j(u_i, v_i)$ is the $j$-th candidate Copula function, $(u_i, v_i)$ are the empirical distribution function values of the $i$-th original sample point, and $n$ is the number of original sample points.

Referring specifically to FIG. 4, which shows, for each influence factor paired with the wind power, the Euclidean distance between each candidate Copula function value and the empirical Copula function value in this embodiment.
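The selection rule described above, namely choosing the candidate Copula function with the smallest Euclidean distance to the empirical Copula function, can be sketched in Python as follows. The sketch assumes the empirical distribution function values `u` and `v` are already available. The candidate set (independence, comonotone and a Frank Copula with a fixed parameter) is purely illustrative; in the method described here the candidate parameters come from maximum likelihood estimation, which is not reproduced.

```python
import math


def empirical_copula(u, v):
    """Empirical Copula value at each sample point (u_i, v_i)."""
    n = len(u)
    return [
        sum(1 for uj, vj in zip(u, v) if uj <= ui and vj <= vi) / n
        for ui, vi in zip(u, v)
    ]


def independence_copula(u, v):
    return u * v


def comonotone_copula(u, v):
    return min(u, v)


def frank_copula(theta):
    """Frank Copula with a fixed, nonzero parameter theta (hypothetical value)."""
    def c(u, v):
        num = math.expm1(-theta * u) * math.expm1(-theta * v)
        return -math.log1p(num / math.expm1(-theta)) / theta
    return c


def euclidean_distance(u, v, candidate):
    """Distance between empirical and candidate Copula values over all sample points."""
    emp = empirical_copula(u, v)
    return math.sqrt(sum((e - candidate(ui, vi)) ** 2 for e, ui, vi in zip(emp, u, v)))


def best_copula(u, v, candidates):
    """Return the name of the closest candidate and the distance of every candidate."""
    distances = {name: euclidean_distance(u, v, c) for name, c in candidates.items()}
    return min(distances, key=distances.get), distances
```

For a perfectly monotone sample, for instance `u = v = [0.1, 0.2, ..., 1.0]`, the comonotone candidate reproduces the empirical Copula exactly and is therefore selected.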

The Spearman rank correlation coefficient ρ is calculated as follows:

$$\rho = 12\int_0^1\!\!\int_0^1 uv\,\mathrm{d}C(u,v) - 3$$

The value range of the Spearman rank correlation coefficient is [-1, 1]. It reflects the strength of correlation between variables: the larger its absolute value, the stronger the dependence between the variables, and a value of 0 indicates no correlation. The several influence factors with the strongest dependence are screened on this basis.
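A Spearman rank correlation coefficient can be obtained numerically from a given Copula function, as the following Python sketch shows. It relies on the standard equivalent form ρ = 12∫∫C(u,v) du dv − 3, which avoids differentiating the Copula, and approximates the double integral with a midpoint rule; the grid size and the Frank parameter values are assumptions chosen for illustration only.

```python
import math


def frank_copula(theta):
    """Frank Copula with a fixed, nonzero parameter theta (hypothetical value)."""
    def c(u, v):
        num = math.expm1(-theta * u) * math.expm1(-theta * v)
        return -math.log1p(num / math.expm1(-theta)) / theta
    return c


def spearman_rho(copula, grid=200):
    """Approximate rho = 12 * integral of C(u, v) over the unit square - 3 (midpoint rule)."""
    h = 1.0 / grid
    total = 0.0
    for i in range(grid):
        u = (i + 0.5) * h
        for j in range(grid):
            v = (j + 0.5) * h
            total += copula(u, v)
    return 12.0 * total * h * h - 3.0


rho_frank = spearman_rho(frank_copula(5.0))  # positive dependence, so 0 < rho < 1
```

As a sanity check on the range stated above, the independence Copula `u * v` gives a coefficient of approximately 0, and the comonotone Copula `min(u, v)` gives approximately 1.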

For example, the Frank-Copula function is selected as the optimal Copula function between the generator speed (GS) and the wind power because it has the minimum Euclidean distance; the Spearman correlation coefficient derived from the Frank-Copula function therefore best describes the nonlinear correlation between the generator speed and the wind power. The correlation coefficient of each selected influence factor is then read from the Copula-Spearman correlation coefficient table (shown in FIG. 5). If the correlation coefficient of the selected influence factor is greater than a set threshold, the influence factor is strongly correlated with the wind power output and is valid as an input variable of the wind power prediction model.
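The threshold screening described above can be sketched in a few lines of Python. The sketch compares the absolute value of each coefficient with the threshold, on the assumption that strong negative dependence should also qualify; the factor names, coefficient values and threshold used in the usage note are hypothetical and are not data from FIG. 5.

```python
def select_reference_factors(rho_by_factor, threshold):
    """Keep factors whose |rho| exceeds the threshold, strongest dependence first."""
    kept = {name: rho for name, rho in rho_by_factor.items() if abs(rho) > threshold}
    return sorted(kept, key=lambda name: abs(kept[name]), reverse=True)
```

For instance, calling `select_reference_factors({"generator_speed": 0.92, "wind_direction": 0.12}, 0.3)` would keep only `generator_speed`.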

In this embodiment, a Copula function is introduced to analyze the correlation between each influence factor and the wind power, and the Spearman rank correlation coefficient is calculated from the Copula function to determine the nonlinear relation between each influence factor and the wind power. This approach imposes no strict requirement on the data distributions of the influence factors or the wind power, offers a flexible choice for solving the joint distribution function, and simplifies the solving process. In practical application, the influence factors with high correlation strength can be selected effectively, which saves workload in the subsequent prediction calculation.

Referring specifically to FIG. 6, as a preferred implementation of this embodiment, the base learner at the first layer of the Stacking ensemble learning model uses Boosting ensemble learning models, represented by the gradient boosted decision tree (GBDT), XG-Boost and Light-GBM, a Bagging ensemble learning model, represented by the random forest (RF), and, finally, the K-nearest neighbor algorithm (KNN). Machine learning models of different types are selected so that the Stacking ensemble learning model can train on the data from different data spaces and structures, which gives full play to the advantages of the different algorithms and optimizes the prediction performance.

When selecting the machine learning model for the meta-learner at the second layer, overfitting must be prevented, so a model with good generalization capability, namely the Light-GBM algorithm, is selected.

In summary, the base learner at the first layer comprises GBDT, XG-Boost, Light-GBM, RF and KNN, and Light-GBM is arranged in the meta-learner at the second layer. The choice of hyper-parameters for each machine learning model directly influences the prediction accuracy of the final Stacking model. Therefore, the hyper-parameters of each model are tuned by a Bayesian optimization algorithm, a super learner is defined, and the base learner and the meta-learner are packaged into the super learner for 5-fold cross-validation training.
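The K-fold training of the double-layer structure can be illustrated by the following dependency-free Python sketch. It is not the configuration described here: a least-squares line and a K-nearest-neighbor regressor stand in for the base learners, a K-nearest-neighbor regressor stands in for the Light-GBM meta-learner, Bayesian hyper-parameter tuning is omitted, and refitting the base learners on all data after cross-validation is an assumed design choice. All class and function names are hypothetical.

```python
def kfold_indices(n, k):
    """Yield (train_idx, val_idx) for k contiguous folds over range(n)."""
    bounds = [n * f // k for f in range(k + 1)]
    for f in range(k):
        lo, hi = bounds[f], bounds[f + 1]
        yield [i for i in range(n) if i < lo or i >= hi], list(range(lo, hi))


class LinearModel:
    """Stand-in base learner: least-squares line on the first feature."""
    def fit(self, X, y):
        xs = [row[0] for row in X]
        mx, my = sum(xs) / len(xs), sum(y) / len(y)
        sxx = sum((x - mx) ** 2 for x in xs)
        self.slope = sum((x - mx) * (t - my) for x, t in zip(xs, y)) / sxx
        self.intercept = my - self.slope * mx
        return self

    def predict(self, X):
        return [self.intercept + self.slope * row[0] for row in X]


class KNNModel:
    """Stand-in learner: mean target of the k nearest training rows."""
    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        self.X, self.y = [list(r) for r in X], list(y)
        return self

    def predict(self, X):
        out = []
        for row in X:
            order = sorted(
                range(len(self.X)),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(self.X[i], row)),
            )
            nearest = order[: self.k]
            out.append(sum(self.y[i] for i in nearest) / len(nearest))
        return out


def fit_stacking(base_factories, meta, X, y, k=5):
    """Train the meta-learner on out-of-fold base predictions, then refit the bases."""
    oof = [[0.0] * len(base_factories) for _ in range(len(X))]
    for train, val in kfold_indices(len(X), k):
        X_tr, y_tr = [X[i] for i in train], [y[i] for i in train]
        X_val = [X[i] for i in val]
        for m, make in enumerate(base_factories):
            for i, p in zip(val, make().fit(X_tr, y_tr).predict(X_val)):
                oof[i][m] = p
    meta.fit(oof, y)
    return [make().fit(X, y) for make in base_factories], meta


def predict_stacking(bases, meta, X):
    """Feed the base predictions for X to the meta-learner."""
    columns = [b.predict(X) for b in bases]
    return meta.predict([list(row) for row in zip(*columns)])
```

The key point is that the meta-learner only ever sees predictions the base learners made on data they were not trained on, which is what the K-fold split provides.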

As a preferred implementation of this embodiment, in the step of evaluating the prediction accuracy of the result output by the meta-learner, the evaluation indexes are the mean absolute error (MAE) and the root mean square error (RMSE).

The specific equations for the mean absolute error (MAE) and the root mean square error (RMSE) are as follows:

$$\mathrm{MAE} = \frac{1}{K}\sum_{i=1}^{K}\left|\hat{y}^{(i)} - y^{(i)}\right|$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{K}\sum_{i=1}^{K}\left(\hat{y}^{(i)} - y^{(i)}\right)^{2}}$$

where $K$ is the amount of training or test data, $\hat{y}^{(i)}$ is the predicted value, $y^{(i)}$ is the actual value, $\hat{y}_{ave}$ is the predicted average value, and $y_{ave}$ is the actual average. The smaller the values of the indexes MAE and RMSE, the smaller the error and the better the prediction performance of the model.
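The two evaluation indexes can be computed with the short Python sketch below, where MAE is the mean of the absolute differences between predicted and actual values and RMSE is the square root of the mean squared difference. The function names and the sample values in the usage note are illustrative only.

```python
import math


def mae(actual, predicted):
    """Mean absolute error over K paired values."""
    return sum(abs(p - a) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error over K paired values."""
    return math.sqrt(sum((p - a) ** 2 for a, p in zip(actual, predicted)) / len(actual))
```

With actual values `[1, 2, 3, 4]` and predictions `[1, 2, 3, 8]`, the single error of 4 gives an MAE of 1.0 and an RMSE of 2.0, which shows how RMSE penalizes large individual errors more heavily than MAE.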

In order to verify that the wind power prediction model provided by the embodiment has higher prediction accuracy under the condition of different prediction step lengths, short-term wind power output power prediction of four different time step lengths, namely 10min, 1h, 6h and 12h, is respectively carried out, and is analyzed and compared with a traditional single prediction model, the effectiveness of the method is verified by analyzing an MAE value and an RMSE value, and the experimental result is shown in figure 7.
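Building training samples for the four prediction step lengths can be sketched as follows. The sketch assumes the measurements are sampled every 10 minutes, which the text does not state explicitly, and pairs the feature row at time t with the power value observed one horizon later; the function names are hypothetical.

```python
def horizon_steps(horizon_minutes, resolution_minutes=10):
    """Convert a prediction horizon into a number of sampling steps."""
    if horizon_minutes % resolution_minutes:
        raise ValueError("horizon must be a multiple of the sampling resolution")
    return horizon_minutes // resolution_minutes


def make_samples(features, power, steps):
    """Pair the feature row at time t with the power value `steps` samples later."""
    return [(features[t], power[t + steps]) for t in range(len(power) - steps)]


# Horizons of 10 min, 1 h, 6 h and 12 h expressed in 10-minute steps.
STEPS = {name: horizon_steps(m) for name, m in
         {"10min": 10, "1h": 60, "6h": 360, "12h": 720}.items()}
```

A separate model would then be trained and scored on the samples for each horizon, so that the MAE and RMSE values can be compared per step length.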

Example two:

the embodiment provides a short-term wind power prediction system based on an ensemble learning algorithm, which includes:

the data preprocessing module is used for selecting a plurality of influence factors related to wind power, collecting all actual data of each influence factor, detecting abnormal values by the Pauta (3σ) criterion, eliminating the abnormal data, and filling the resulting missing data by a forward filling method to obtain sample data of each influence factor;

the Stacking ensemble learning module is used for establishing a Stacking ensemble learning model, which is a double-layer structure comprising a base learner at a first layer and a meta-learner at a second layer, wherein at least one machine learning model is arranged in each of the base learner and the meta-learner;

the sample data making module is used for obtaining a plurality of groups of actual data of each reference factor as characteristic parameters to form a training sample set;

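The calculation formula of the Euclidean distance is as follows:

$$d_j = \sqrt{\sum_{i=1}^{n} \left| \hat{C}_n(u_i, v_i) - C_j(u_i, v_i) \right|^2}$$

where $\hat{C}_n(u_i, v_i)$ is the empirical Copula function, $C_j(u_i, v_i)$ is the $j$-th candidate Copula function, $(u_i, v_i)$ are the empirical distribution function values of the $i$-th original sample point, and $n$ is the number of original sample points.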

The Spearman correlation coefficient ρ is calculated as follows:

$$\rho = 12\int_0^1\!\!\int_0^1 uv\,\mathrm{d}C(u,v) - 3$$

The value range of the Spearman correlation coefficient is [-1, 1]. It reflects the strength of correlation between variables: the larger its absolute value, the stronger the dependence between the variables, and a value of 0 indicates no correlation. The several influence factors with the strongest dependence are screened on this basis.

As a preferred embodiment of this embodiment, the base learner is provided with more than one machine learning model selected from a K nearest neighbor algorithm, a random forest algorithm, a gradient boosting decision tree, XG-Boost, and Light-GBM;

and a Light-GBM machine learning model is arranged in the meta-learner.

The selection of a plurality of different types of algorithms in the base learner respectively considers that the selection of different types of algorithms can enable the model to train data from different data spaces and structures, so that the advantages of different algorithms are fully exerted, and the prediction performance is optimal. In the meta-learner selection, the Light-GBM algorithm is selected as a model with good generalization capability in consideration of preventing the over-fitting phenomenon.

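The mean absolute error (MAE) and the root mean square error (RMSE) are calculated as follows:

$$\mathrm{MAE} = \frac{1}{K}\sum_{i=1}^{K}\left|\hat{y}^{(i)} - y^{(i)}\right|$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{K}\sum_{i=1}^{K}\left(\hat{y}^{(i)} - y^{(i)}\right)^{2}}$$

where $K$ is the amount of training or test data, $\hat{y}^{(i)}$ is the predicted value, and $y^{(i)}$ is the actual value.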

Example three:

the present embodiment provides a computer-readable storage medium, on which a computer program is stored, and the computer program, when executed by a processor, implements the short-term wind power prediction method and system according to any embodiment of the present invention.

Example four:

the embodiment provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and when the processor executes the program, the method and the system for predicting short-term wind power according to any embodiment of the present invention are implemented.

The above description is only an embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes performed by the present specification and drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims

1. A short-term wind power prediction method based on an ensemble learning algorithm is characterized by comprising the following steps:

and predicting short-term wind power by using the wind power prediction model.

2. The short-term wind power prediction method based on the ensemble learning algorithm according to claim 1, wherein the method for performing correlation analysis according to the sample data of the influencing factors and acquiring the influencing factors with high correlation with the wind power as reference factors comprises the following specific steps:

3. The method for predicting short-term wind power based on the ensemble learning algorithm according to claim 2, wherein the specific step of introducing the Copula function, and based on the sample data of each influence factor, selecting the optimal Copula function closest to the sample data of each influence factor from the alternative Copula functions respectively comprises:

inputting actual data of each influence factor and of the short-term wind power, and calling the ksdensity function in MATLAB to obtain the marginal distribution functions of each influence factor and of the short-term wind power; based on these marginal distribution functions, obtaining parameter estimates of the candidate Copula functions for each pairing of an influence factor with the short-term wind power by maximum likelihood estimation, and thereby obtaining the candidate Copula functions for each pairing;

inputting actual data of each influence factor and of the short-term wind power, calling the ecdf function in MATLAB to obtain the empirical distribution functions of each influence factor and of the short-term wind power, and calling the spline function to obtain the empirical distribution function values at the original sample points by spline interpolation; the original sample points are the actual data of one influence factor paired with the short-term wind power.

4. The short-term wind power prediction method based on the ensemble learning algorithm according to claim 1, wherein,

the base learner is provided with a K nearest neighbor algorithm, a random forest algorithm, a gradient boosting decision tree, an XG-Boost and a Light-GBM;

and a Light-GBM machine learning model is arranged in the meta-learner.

5. The short-term wind power prediction method based on the ensemble learning algorithm as claimed in claim 1, wherein in the step of evaluating the prediction accuracy of the result output by the meta-learner, the evaluation indexes are the mean absolute error (MAE) and the root mean square error (RMSE).

6. A short-term wind power prediction system based on an ensemble learning algorithm is characterized by comprising:

the Stacking ensemble learning module is used for establishing a Stacking ensemble learning model, which is a double-layer structure comprising a base learner at a first layer and a meta-learner at a second layer, wherein at least one machine learning model is arranged in each of the base learner and the meta-learner;

7. The short-term wind power prediction system based on the ensemble learning algorithm according to claim 6, wherein the method for performing correlation analysis according to the sample data of each influence factor to obtain the influence factor with high correlation with the wind power as the reference factor specifically comprises:

8. The short-term wind power prediction system based on ensemble learning algorithm according to claim 6, wherein the specific step of introducing Copula function, based on the sample data of each influencing factor, and selecting the optimal Copula function closest to the sample data of each influencing factor from the alternative Copula functions respectively is as follows:

9. The short-term wind power prediction system based on ensemble learning algorithm as claimed in claim 6,

the base learner is provided with: a K nearest neighbor algorithm, a random forest algorithm, a gradient boosting decision tree, an XG-Boost and a Light-GBM;

and a Light-GBM machine learning model is arranged in the meta-learner.

10. The short-term wind power prediction system based on ensemble learning algorithm as claimed in claim 6, wherein in the step of evaluating the prediction accuracy of the result output by the meta-learner, the evaluation indexes are the mean absolute error (MAE) and the root mean square error (RMSE).