CN114925620A - Short-term wind power prediction method and system based on ensemble learning algorithm

Info

Publication number
CN114925620A
Authority
CN
China
Prior art keywords
wind power
short
term wind
learner
influence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210709609.0A
Other languages
Chinese (zh)
Inventor
李晨
伍仰金
郭茜婷
王超君
林晨杰
陈岳晟
黄丁婕
郑传良
叶家玮
张良
郑涛
周宇
魏兰兰
付馨慧
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Fujian Electric Power Co Ltd
Ningde Power Supply Co of State Grid Fujian Electric Power Co Ltd
Original Assignee
State Grid Fujian Electric Power Co Ltd
Ningde Power Supply Co of State Grid Fujian Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Fujian Electric Power Co Ltd and Ningde Power Supply Co of State Grid Fujian Electric Power Co Ltd
Priority to CN202210709609.0A
Publication of CN114925620A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 Computer-aided design [CAD]
    • G06F30/20 Design optimisation, verification or simulation
    • G06F30/27 Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformations in the plane of the image
    • G06T3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4007 Scaling of whole images or parts thereof, e.g. expanding or contracting based on interpolation, e.g. bilinear interpolation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2113/00 Details relating to the application field
    • G06F2113/06 Wind turbines or wind farms
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2119/00 Details relating to the type or aim of the analysis or the optimisation
    • G06F2119/06 Power analysis or power optimisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2119/00 Details relating to the type or aim of the analysis or the optimisation
    • G06F2119/08 Thermal analysis or thermal optimisation
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04 INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04S SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00 Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50 Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • General Engineering & Computer Science (AREA)
  • Geometry (AREA)
  • Computer Hardware Design (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Supply And Distribution Of Alternating Current (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to a short-term wind power prediction method and system based on an ensemble learning algorithm. The method comprises the following steps: preprocessing the data by eliminating abnormal data and filling in missing data to obtain sample data; performing correlation analysis on the sample data of each influence factor and taking the influence factors highly correlated with wind power as reference factors; establishing a Stacking ensemble learning model with a two-layer structure comprising a base learner in the first layer and a meta-learner in the second layer; acquiring multiple groups of actual data to build a training sample set; performing K-fold cross-validation training on the base learner with the training sample set to obtain a wind power prediction model; and predicting short-term wind power with the wind power prediction model. Compared with existing schemes, the method improves prediction accuracy, reduces prediction error across different time steps and seasons, and improves the controllability of wind power.

Description

Short-term wind power prediction method and system based on ensemble learning algorithm
Technical Field
The invention relates to a short-term wind power prediction method and system based on an ensemble learning algorithm, and belongs to the field of wind power prediction.
Background
As wind power is increasingly applied in China's energy sector, factors such as natural weather conditions and turbine operating parameters introduce randomness and volatility into wind power generation. According to the prediction time scale and the corresponding control strategy, wind power prediction can be divided into ultra-short-term, short-term, medium-term and long-term prediction; prediction methods can be divided into statistical methods, physical methods, combined prediction methods, spatial correlation prediction methods and others. Statistical methods analyze historical data on a given time scale, find the association between the input factors and wind power, and then predict; they are widely used for ultra-short-term and short-term wind power prediction. Traditional linear statistical methods such as the grey model, Kalman filtering and the autoregressive integrated moving average (ARIMA) model cannot capture the nonlinear characteristics of wind power. Artificial intelligence and deep learning methods have therefore been proposed, and they give better results for ultra-short-term and short-term time-series prediction of wind power. However, most existing prediction models are single models, which offer limited improvement in prediction accuracy and only moderate generalization ability (the ability of a machine learning algorithm to adapt to new samples). Multi-model fusion can effectively improve prediction accuracy: it turns the differences between models into complementary strengths and thereby compensates for the insufficient accuracy of a single prediction model.
Disclosure of Invention
To solve the problems in the prior art, the invention provides a new short-term wind power prediction model based on an ensemble learning algorithm.
The technical scheme of the invention is as follows:
In one aspect, the invention provides a short-term wind power prediction method based on an ensemble learning algorithm, comprising the following steps:
data preprocessing: selecting a plurality of influence factors related to wind power, collecting all actual data of each influence factor, detecting abnormal values with the Pauta criterion (3σ rule), removing the abnormal data, and filling the resulting gaps by forward filling to obtain sample data of each influence factor;
performing correlation analysis on the sample data of each influence factor, and taking the influence factors highly correlated with wind power as reference factors;
establishing a Stacking ensemble learning model with a two-layer structure comprising a base learner in the first layer and a meta-learner in the second layer, wherein at least one machine learning model is arranged in each of the base learner and the meta-learner;
acquiring multiple groups of actual data of each reference factor as feature parameters to form a training sample set;
performing K-fold cross-validation training on the base learner with the training sample set, feeding the outputs of the base learner after K iterations into the meta-learner as its input data, training the meta-learner, evaluating the prediction accuracy of the meta-learner's output, outputting the final prediction result once the preset prediction accuracy is reached, saving the parameters of the base learner and the meta-learner, and outputting the trained Stacking ensemble learning model as the wind power prediction model;
and predicting short-term wind power with the wind power prediction model.
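The preprocessing step can be illustrated with a minimal Python sketch. It assumes the abnormal-value criterion is the usual 3σ rule (values farther than three standard deviations from the mean are abnormal) and that forward filling copies the most recent normal value; the function name `three_sigma_clean` and the sample readings are hypothetical, not taken from the patent.

```python
import statistics


def three_sigma_clean(series):
    """Flag values outside mean +/- 3*std as abnormal, then forward-fill.

    `series` is a list of floats; None marks a value that is already missing.
    """
    valid = [x for x in series if x is not None]
    mean = statistics.fmean(valid)
    std = statistics.pstdev(valid)  # population standard deviation

    cleaned = []
    last_valid = None  # most recent normal observation seen so far
    for x in series:
        is_abnormal = x is None or abs(x - mean) > 3 * std
        if is_abnormal:
            cleaned.append(last_valid)  # stays None if nothing precedes it
        else:
            cleaned.append(x)
            last_valid = x
    return cleaned


# Hypothetical wind-speed readings with one obvious outlier.
wind_speed = [10.0] * 20 + [1000.0, 11.0]
print(three_sigma_clean(wind_speed)[-3:])
```

Note that the mean and standard deviation are computed from the raw series, so a single pass may miss outliers that are masked by larger ones; repeating the pass is one possible refinement.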
As a preferred embodiment, performing correlation analysis on the sample data of each influence factor and taking the influence factors highly correlated with wind power as reference factors specifically comprises:
introducing Copula functions and, for each influence factor, selecting from the candidate Copula functions the optimal Copula function that best fits the sample data of that influence factor;
calculating the Spearman rank correlation coefficient of the optimal Copula function corresponding to each influence factor;
and determining the correlation of each influence factor with wind power from its Spearman rank correlation coefficient, and selecting several highly correlated influence factors as reference factors.
As a preferred embodiment, the step of introducing Copula functions and selecting the optimal Copula function for each influence factor specifically comprises:
inputting the actual data of each influence factor and of the short-term wind power, and calling the ksdensity function in MATLAB to obtain the marginal distribution functions of each influence factor and the short-term wind power; based on these marginal distribution functions, estimating the parameters of each candidate Copula function by maximum likelihood estimation, thereby obtaining the candidate Copula functions for each pair of an influence factor and the short-term wind power;
inputting the actual data of each influence factor and of the short-term wind power, calling the ecdf function in MATLAB to obtain the empirical distribution functions of each influence factor and the short-term wind power, and calling the spline function to obtain the empirical distribution function values at the original sample points by spline interpolation; an original sample point is a pair consisting of the actual data of one influence factor and of the short-term wind power;
defining an empirical Copula function in MATLAB, and calculating the empirical Copula function value and each candidate Copula function value at the original sample points from the empirical distribution function values obtained above;
calculating the Euclidean distance between the empirical Copula function values and each candidate Copula function's values at the original sample points, and taking the candidate Copula function with the minimum Euclidean distance as the optimal Copula function between the corresponding influence factor and the short-term wind power;
and calculating the corresponding Spearman rank correlation coefficient from the optimal Copula function.
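The selection procedure above can be sketched in plain Python rather than MATLAB. This is an illustration under simplifying assumptions, not the patent's implementation: the empirical distribution values are taken directly as rank/n (no kernel density estimate or spline interpolation), the candidate Copula parameters are fixed by hand instead of being fitted by maximum likelihood, and the distance is taken as a sum of squared differences. All function names and the sample data are hypothetical.

```python
import math


def pseudo_observations(xs):
    """Empirical distribution values rank_i / n (assumes no tied values)."""
    n = len(xs)
    order = sorted(range(n), key=lambda i: xs[i])
    u = [0.0] * n
    for rank, i in enumerate(order, start=1):
        u[i] = rank / n
    return u


def empirical_copula(u, v):
    """Empirical Copula evaluated at each sample point (u_i, v_i)."""
    n = len(u)
    return [sum(1 for k in range(n) if u[k] <= u[i] and v[k] <= v[i]) / n
            for i in range(n)]


def frank(theta):
    """Frank Copula with a fixed, nonzero parameter theta."""
    def c(u, v):
        num = (math.exp(-theta * u) - 1.0) * (math.exp(-theta * v) - 1.0)
        return -math.log(1.0 + num / (math.exp(-theta) - 1.0)) / theta
    return c


def clayton(theta):
    """Clayton Copula with a fixed parameter theta > 0."""
    def c(u, v):
        return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)
    return c


def select_copula(x, y, candidates):
    """Return the candidate closest to the empirical Copula, plus all distances."""
    u, v = pseudo_observations(x), pseudo_observations(y)
    c_emp = empirical_copula(u, v)
    distances = {
        name: sum((ce - c(ui, vi)) ** 2 for ce, ui, vi in zip(c_emp, u, v))
        for name, c in candidates.items()
    }
    return min(distances, key=distances.get), distances


# Hypothetical data: generator speed and a power curve that rises with it.
speed = [float(i) for i in range(1, 21)]
power = [s ** 3 for s in speed]
best, dist = select_copula(
    speed, power,
    {"independence": lambda u, v: u * v, "frank": frank(8.0), "clayton": clayton(2.0)},
)
print(best, dist)
```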
As a preferred embodiment, the base learner contains more than one machine learning model selected from the K-nearest neighbor algorithm, the random forest algorithm, the gradient boosting decision tree, XGBoost and LightGBM;
and the meta-learner contains a LightGBM machine learning model.
In a preferred embodiment, in the step of evaluating the prediction accuracy of the meta-learner's output, the evaluation indices are the mean absolute error (MAE) and the root mean square error (RMSE).
In another aspect, the invention provides a short-term wind power prediction system based on an ensemble learning algorithm, comprising:
a data preprocessing module, configured to select a plurality of influence factors related to wind power, collect all actual data of each influence factor, detect abnormal values with the Pauta criterion (3σ rule), remove the abnormal data, and fill the resulting gaps by forward filling to obtain sample data of each influence factor;
an influence factor screening module, configured to perform correlation analysis on the sample data of each influence factor and take the influence factors highly correlated with wind power as reference factors;
a Stacking ensemble learning module with a two-layer structure comprising a base learner in the first layer and a meta-learner in the second layer, wherein at least one machine learning model is arranged in each of the base learner and the meta-learner;
and a sample data preparation module, configured to acquire multiple groups of actual data of each reference factor as feature parameters to form a training sample set.
As a preferred embodiment, performing correlation analysis on the sample data of each influence factor and taking the influence factors highly correlated with wind power as reference factors specifically comprises:
introducing Copula functions and, for each influence factor, selecting from the candidate Copula functions the optimal Copula function that best fits the sample data of that influence factor;
calculating the Spearman rank correlation coefficient of the optimal Copula function corresponding to each influence factor;
and determining the correlation of each influence factor with wind power from its Spearman rank correlation coefficient, and selecting several highly correlated influence factors as reference factors.
As a preferred embodiment, the step of introducing Copula functions and selecting the optimal Copula function for each influence factor specifically comprises:
inputting the actual data of each influence factor and of the short-term wind power, and calling the ksdensity function in MATLAB to obtain the marginal distribution functions of each influence factor and the short-term wind power; based on these marginal distribution functions, estimating the parameters of each candidate Copula function by maximum likelihood estimation, thereby obtaining the candidate Copula functions for each pair of an influence factor and the short-term wind power;
inputting the actual data of each influence factor and of the short-term wind power, calling the ecdf function in MATLAB to obtain the empirical distribution functions of each influence factor and the short-term wind power, and calling the spline function to obtain the empirical distribution function values at the original sample points by spline interpolation; an original sample point is a pair consisting of the actual data of one influence factor and of the short-term wind power;
defining an empirical Copula function in MATLAB, and calculating the empirical Copula function value and each candidate Copula function value at the original sample points from the empirical distribution function values obtained above;
calculating the Euclidean distance between the empirical Copula function values and each candidate Copula function's values at the original sample points, and taking the candidate Copula function with the minimum Euclidean distance as the optimal Copula function between the corresponding influence factor and the short-term wind power;
and calculating the corresponding Spearman rank correlation coefficient from the optimal Copula function.
As a preferred embodiment, the base learner contains more than one machine learning model selected from the K-nearest neighbor algorithm, the random forest algorithm, the gradient boosting decision tree, XGBoost and LightGBM;
and the meta-learner contains a LightGBM machine learning model.
In a preferred embodiment, in the step of evaluating the prediction accuracy of the meta-learner's output, the evaluation indices are the mean absolute error (MAE) and the root mean square error (RMSE).
The invention has the following beneficial effects:
By introducing Copula functions, which link the joint distribution function of the variables to their marginal distribution functions, the invention can better analyze the correlation and nonlinear relationships among the variables. For the many factors that influence short-term wind power, the optimal Copula function is selected and its Spearman rank correlation coefficient is calculated, so that the influence factors highly correlated with short-term wind power are identified and the workload of subsequent prediction calculation is reduced.
The invention also provides a Stacking ensemble learning model with a two-layer structure of a base learner and a meta-learner. The base learner combines several types of machine learning algorithms, and the meta-learner is a LightGBM machine learning model. Combining the two-layer training scheme with different types of machine learning algorithms improves prediction accuracy relative to a single prediction model, lets the strengths and weaknesses of the algorithms complement one another, and improves how well the model is trained across different data spaces and structures.
The prediction accuracy of the meta-learner's output is evaluated with the mean absolute error (MAE) and the root mean square error (RMSE). Compared with existing schemes, the invention improves prediction accuracy, reduces prediction error, and improves the effectiveness of wind power utilization.
Drawings
FIG. 1 is a flow chart of the ensemble learning procedure for short-term wind power prediction according to the invention.
FIG. 2 shows the influence factors for wind power prediction selected in the invention.
FIG. 3 shows the definition of the empirical Copula function in MATLAB.
FIG. 4 shows the calculated Euclidean distances for the influence variables.
FIG. 5 shows the Spearman correlation coefficients derived from the optimal Copula functions.
FIG. 6 is a flow chart of the K-fold cross-validation training.
FIG. 7 compares the MAE and RMSE errors of the existing schemes and the invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be understood that the step numbers used herein are for convenience of description only and are not intended as limitations on the order in which the steps are performed.
It is to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in this specification and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
The terms "comprises" and "comprising" indicate the presence of the described features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
The term "and/or" refers to any and all possible combinations of one or more of the associated listed items and includes such combinations.
Embodiment 1:
Referring to FIG. 1, a short-term wind power prediction method based on an ensemble learning algorithm includes the following steps:
data preprocessing: selecting a plurality of influence factors related to wind power, collecting all actual data of each influence factor, detecting abnormal values with the Pauta criterion (3σ rule), removing the abnormal data, and filling the resulting gaps by forward filling to obtain sample data of each influence factor;
the relevant influence factors fall into two groups, as shown in FIG. 2:
internal factors of the wind turbine, such as wheel load temperature, gearbox bearing temperature, gearbox oil temperature, generator speed, rotor speed and generator winding temperature;
and weather factors, such as temperature, wind speed and wind direction;
performing correlation analysis on the sample data of each influence factor, and taking the influence factors highly correlated with wind power as reference factors;
establishing a Stacking ensemble learning model with a two-layer structure comprising a base learner in the first layer and a meta-learner in the second layer, wherein at least one machine learning model is arranged in each of the base learner and the meta-learner;
acquiring multiple groups of actual data of each reference factor as feature parameters to form a training sample set;
performing K-fold cross-validation training on the base learner with the training sample set, feeding the outputs of the base learner after K iterations into the meta-learner as its input data, training the meta-learner, evaluating the prediction accuracy of the meta-learner's output, outputting the final prediction result once the preset prediction accuracy is reached, saving the parameters of the base learner and the meta-learner, and outputting the trained Stacking ensemble learning model as the wind power prediction model;
and predicting short-term wind power with the wind power prediction model.
In the short-term wind power prediction model under the Stacking ensemble learning framework, combining the two-layer training scheme with different types of machine learning models improves prediction accuracy relative to a single prediction model, lets the strengths and weaknesses of the algorithms complement one another, and improves how well the model is trained across different data spaces and structures. Because the machine learning models in the base learner can be varied flexibly, the framework leaves room for future extensions and improves the adaptability of the model.
As a preferred implementation of this embodiment, performing correlation analysis on the sample data of each influence factor and taking the influence factors highly correlated with wind power as reference factors specifically comprises:
introducing Copula functions and, for each influence factor, selecting from the candidate Copula functions the optimal Copula function that best fits the sample data of that influence factor;
calculating the Spearman rank correlation coefficient of the optimal Copula function corresponding to each influence factor;
and determining the correlation of each influence factor with wind power from its Spearman rank correlation coefficient, and selecting several highly correlated influence factors as reference factors.
As a preferred implementation of this embodiment, the step of introducing Copula functions and selecting the optimal Copula function for each influence factor specifically comprises:
inputting the actual data of each influence factor and of the short-term wind power, and calling the ksdensity function in MATLAB to obtain the marginal distribution functions of each influence factor and the short-term wind power; based on these marginal distribution functions, estimating the parameters of each candidate Copula function by maximum likelihood estimation, thereby obtaining the candidate Copula functions for each pair of an influence factor and the short-term wind power;
inputting the actual data of each influence factor and of the short-term wind power, calling the ecdf function in MATLAB to obtain the empirical distribution functions of each influence factor and the short-term wind power, and calling the spline function to obtain the empirical distribution function values at the original sample points by spline interpolation; an original sample point is a pair consisting of the actual data of one influence factor and of the short-term wind power;
defining an empirical Copula function in MATLAB, and calculating the empirical Copula function value and each candidate Copula function value at the original sample points from the empirical distribution function values obtained above;
calculating the Euclidean distance between the empirical Copula function values and each candidate Copula function's values at the original sample points, and taking the candidate Copula function with the minimum Euclidean distance as the optimal Copula function between the corresponding influence factor and the short-term wind power;
The Euclidean distance is calculated as follows:
[Equation image BDA0003706555820000111]
where the symbol shown in [image BDA0003706555820000112] is the empirical Copula function, $C_j(u_i, v_i)$ is the j-th candidate Copula function, and $(u_i, v_i)$ are the empirical distribution function values.
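The distance formula itself survives only as an image reference in this text. A form consistent with the symbol definitions given here, and commonly used for this kind of Copula selection, is the squared Euclidean distance below. The notation $\hat{C}_n$ for the empirical Copula, the sample size $n$, and the choice of a squared rather than square-rooted distance are assumptions, not confirmed by the source.

```latex
d_j^2 = \sum_{i=1}^{n} \left| \hat{C}_n(u_i, v_i) - C_j(u_i, v_i) \right|^2,
\qquad
\hat{C}_n(u, v) = \frac{1}{n} \sum_{k=1}^{n} \mathbf{1}\{ u_k \le u,\; v_k \le v \}
```

Because the square root is monotone, taking it or omitting it selects the same candidate $C_j$.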
FIG. 4 shows, for this embodiment, the calculated Euclidean distances between the empirical Copula function values and each candidate Copula function's values for each pair of an influence factor and wind power.
The corresponding Spearman rank correlation coefficient is then calculated from the optimal Copula function.
The Spearman rank correlation coefficient ρ is calculated as follows:
$$\rho = 12 \int_0^1 \int_0^1 uv \, \mathrm{d}C(u, v) - 3$$
The Spearman rank correlation coefficient takes values in [-1, 1] and reflects the strength of correlation between variables: the larger its absolute value, the stronger the dependence between the variables, and a value of 0 indicates no correlation. The influence factors with the strongest dependence are screened on this basis.
For example, the Frank Copula function is selected as the optimal Copula function between generator speed (GS) and wind power because it has the minimum Euclidean distance, so the Spearman correlation coefficient derived from the Frank Copula best describes the nonlinear correlation between generator speed and wind power. The correlation coefficient of each selected influence factor is read from the Copula-Spearman correlation coefficient table (shown in FIG. 5); if it exceeds a set threshold, the influence factor is strongly correlated with wind power output and is valid as an input variable of the wind power prediction model.
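As a numerical illustration, the Spearman coefficient implied by a Frank Copula can be approximated in Python. The sketch uses the equivalent form ρ = 12 ∫∫ C(u, v) du dv − 3, a standard identity for Copulas, and evaluates the integral with a midpoint rule. The parameter value θ = 5 and the grid size are arbitrary choices for illustration, not values from the patent.

```python
import math


def frank_copula(u, v, theta):
    """Frank Copula C(u, v) for a nonzero parameter theta."""
    num = (math.exp(-theta * u) - 1.0) * (math.exp(-theta * v) - 1.0)
    return -math.log(1.0 + num / (math.exp(-theta) - 1.0)) / theta


def spearman_rho(copula, grid=200):
    """Approximate 12 * (integral of C over the unit square) - 3 by the midpoint rule."""
    h = 1.0 / grid
    total = 0.0
    for i in range(grid):
        u = (i + 0.5) * h
        for j in range(grid):
            v = (j + 0.5) * h
            total += copula(u, v)
    return 12.0 * total * h * h - 3.0


rho = spearman_rho(lambda u, v: frank_copula(u, v, theta=5.0))
print(round(rho, 3))
```

As a sanity check, the independence Copula C(u, v) = uv gives ρ = 0 and the upper bound C(u, v) = min(u, v) gives ρ ≈ 1, matching the stated range of the coefficient.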
In this embodiment, Copula functions are introduced to analyze the correlation between each influence factor and wind power, and the Spearman rank correlation coefficient is calculated from the Copula function to quantify the nonlinear relationship between them. The approach imposes no strict requirements on the distributions of the influence factors or of the wind power, offers a flexible way to obtain the joint distribution function, and simplifies the solving process. In practice, it allows the strongly correlated influence factors to be selected effectively, which reduces the workload of subsequent prediction calculation.
Referring specifically to fig. 6, as a preferred implementation of this embodiment, the base learner at the first layer of the Stacking ensemble learning model uses Boosting ensemble learning models, represented by the Gradient Boosted Decision Tree (GBDT), XG-Boost and Light-GBM, a Bagging ensemble learning model, represented by the Random Forest (RF), and the K-nearest neighbor algorithm (KNN). Several different types of machine learning models are selected so that the Stacking ensemble learning model can learn from the data in different data spaces and structures, so that the advantages of the different algorithms are fully exploited and the prediction performance is optimized.
When selecting the machine learning model for the meta-learner at the second layer, overfitting needs to be prevented, so a model with good generalization capability, namely the Light-GBM algorithm, is selected.
In summary, the base learner at the first layer comprises GBDT, XG-Boost, Light-GBM, RF and KNN, and Light-GBM is arranged in the meta-learner at the second layer. The choice of hyper-parameters for each machine learning model directly influences the prediction accuracy of the final Stacking model. Therefore, the hyper-parameters of each model are tuned with a Bayesian optimization algorithm, a super learner is defined, and the base learner and the meta-learner are packaged into the super learner for 5-fold cross-validation training.
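The two-layer training flow can be illustrated with a minimal, standard-library-only Python sketch. It is not the implementation of this embodiment: the simple stand-in learners (KNN and a mean predictor) replace GBDT, XG-Boost, Light-GBM and RF, the function and class names are hypothetical, the data are synthetic, and Bayesian hyper-parameter tuning is omitted. What it shows is how 5-fold out-of-fold predictions of the base learners become the training input of the meta-learner.

```python
import random

def kfold_indices(n, k):
    """Split range(n) into k contiguous folds of near-equal size."""
    folds, start = [], 0
    for f in range(k):
        size = n // k + (1 if f < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

class MeanModel:
    """Stand-in learner: always predicts the training mean."""
    def fit(self, X, y):
        self.mean = sum(y) / len(y)
        return self
    def predict(self, X):
        return [self.mean for _ in X]

class KNNModel:
    """K-nearest-neighbour regressor using squared Euclidean distance."""
    def __init__(self, k=3):
        self.k = k
    def fit(self, X, y):
        self.X, self.y = list(X), list(y)
        return self
    def predict(self, X):
        out = []
        for x in X:
            scored = sorted(
                (sum((a - b) ** 2 for a, b in zip(x, xi)), yi)
                for xi, yi in zip(self.X, self.y)
            )
            nearest = scored[: self.k]
            out.append(sum(yi for _, yi in nearest) / len(nearest))
        return out

def fit_stacking(base_factories, meta_model, X, y, k=5):
    """Train base learners with k-fold CV; their out-of-fold outputs train the meta-learner."""
    n = len(X)
    meta_X = [[0.0] * len(base_factories) for _ in range(n)]
    for fold in kfold_indices(n, k):
        held_out = set(fold)
        train_idx = [i for i in range(n) if i not in held_out]
        X_tr = [X[i] for i in train_idx]
        y_tr = [y[i] for i in train_idx]
        for j, make in enumerate(base_factories):
            preds = make().fit(X_tr, y_tr).predict([X[i] for i in fold])
            for i, p in zip(fold, preds):
                meta_X[i][j] = p
    meta_model.fit(meta_X, y)
    # Refit every base learner on the full training set for later prediction.
    base_models = [make().fit(X, y) for make in base_factories]
    return base_models, meta_model

def predict_stacking(base_models, meta_model, X):
    cols = [m.predict(X) for m in base_models]
    return meta_model.predict([list(row) for row in zip(*cols)])

# Synthetic data: power roughly proportional to the cube of wind speed (illustrative only).
random.seed(0)
X = [[random.uniform(3.0, 12.0)] for _ in range(60)]
y = [0.5 * x[0] ** 3 + random.gauss(0.0, 5.0) for x in X]
base, meta = fit_stacking(
    [lambda: KNNModel(1), lambda: KNNModel(5), MeanModel], KNNModel(3), X, y, k=5
)
pred = predict_stacking(base, meta, [[8.0]])
print(pred)
```

Because the meta-learner only ever sees predictions made on held-out folds, it is not trained on outputs that the base learners have already memorized, which is the purpose of the cross-validation step.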
As a preferred implementation of this embodiment, in the step of evaluating the prediction accuracy of the result output by the meta-learner, the evaluation indexes are the mean absolute error MAE and the root mean square error RMSE.
The specific equations for the mean absolute error MAE and the root mean square error RMSE are as follows:
$$\mathrm{MAE} = \frac{1}{K}\sum_{i=1}^{K}\left|\hat{y}^{(i)} - y^{(i)}\right|$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{K}\sum_{i=1}^{K}\left(\hat{y}^{(i)} - y^{(i)}\right)^{2}}$$
Figure BDA0003706555820000141
Figure BDA0003706555820000142
where K is the amount of training or test data, $\hat{y}^{(i)}$ is the predicted value, $y^{(i)}$ is the actual value, $\hat{y}_{ave}$ is the predicted average value, and $y_{ave}$ is the actual average. The smaller the values of the indexes MAE and RMSE are, the better the prediction performance of the model is, and the smaller the error is.
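As a small worked example, the two indexes can be computed as follows. The sketch uses the standard definitions of MAE and RMSE; the function names and the sample values are illustrative assumptions.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error over K paired samples."""
    return sum(abs(p - t) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean square error over K paired samples."""
    return math.sqrt(sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

# Illustrative values only: errors are 1, 0, -2, 0.
actual = [2.0, 4.0, 6.0, 8.0]
predicted = [3.0, 4.0, 4.0, 8.0]
print(mae(actual, predicted))   # (1 + 0 + 2 + 0) / 4 = 0.75
print(rmse(actual, predicted))  # sqrt((1 + 0 + 4 + 0) / 4) = sqrt(1.25)
```

RMSE weights the single large error more heavily than MAE does, which is why the two indexes are usually reported together.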
To verify that the wind power prediction model provided by this embodiment achieves higher prediction accuracy at different prediction step lengths, short-term wind power output is predicted at four time step lengths, namely 10 min, 1 h, 6 h and 12 h, and the results are compared with those of traditional single prediction models. The effectiveness of the method is verified by analyzing the MAE and RMSE values, and the experimental results are shown in fig. 7.
Example two:
This embodiment provides a short-term wind power prediction system based on an ensemble learning algorithm, which includes:
the data preprocessing module is used for selecting a plurality of influence factors related to wind power, collecting all actual data of each influence factor, detecting abnormal values through the Pauta criterion (3σ criterion), eliminating the abnormal data, and filling the resulting gaps by a forward filling method to obtain sample data of each influence factor;
the influence factor screening module is used for carrying out correlation analysis according to the sample data of each influence factor and acquiring influence factors with high correlation with the wind power as reference factors;
the Stacking ensemble learning module is used for establishing a Stacking ensemble learning model, wherein the Stacking ensemble learning model is a double-layer structure comprising a base learner at a first layer and a meta-learner at a second layer, and at least one machine learning model is arranged in each of the base learner and the meta-learner;
the sample data making module is used for obtaining a plurality of groups of actual data of each reference factor as characteristic parameters to form a training sample set;
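The work of the data preprocessing module listed above can be sketched as follows. This is a hedged illustration rather than the module itself: the function name and the readings are hypothetical, the criterion is applied once over the whole series, and the use of the population standard deviation is an assumption.

```python
import statistics

def pauta_forward_fill(values, n_sigma=3.0):
    """Flag points more than n_sigma standard deviations from the mean,
    drop them, and fill each gap with the last accepted value."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population std; the sample std is an equally plausible choice
    cleaned, last_good = [], None
    for v in values:
        if std > 0 and abs(v - mean) > n_sigma * std:
            cleaned.append(last_good)  # stays None if no valid value precedes it
        else:
            last_good = v
            cleaned.append(v)
    return cleaned

# Hypothetical sensor readings with one spike at index 10.
readings = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2, 10.1, 9.7, 10.0, 10.4, 95.0,
            10.2, 9.9, 10.1, 10.0, 9.8, 10.3, 10.0, 10.1, 9.9, 10.2]
cleaned = pauta_forward_fill(readings)
print(cleaned[10])  # the spike is replaced by the preceding reading
```

Note that a single outlier can only exceed the 3σ bound when the series is long enough, so the criterion is better suited to large samples than to very short ones.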
As a preferred implementation of this embodiment, the method for performing correlation analysis according to the sample data of each influence factor and acquiring the influence factors with high correlation with the wind power as reference factors specifically includes:
introducing a Copula function, and selecting, based on the sample data of each influence factor, the optimal Copula function closest to that sample data from the candidate Copula functions;
calculating Spearman rank correlation coefficients of the optimal Copula function corresponding to the influence factors;
and determining the relevance of each influence factor relative to the wind power according to the Spearman rank correlation coefficient corresponding to each influence factor, and selecting a plurality of influence factors with high relevance as reference factors.
As a preferred implementation of this embodiment, the specific steps of introducing the Copula function and selecting, based on the sample data of each influence factor, the optimal Copula function closest to that sample data from the candidate Copula functions are:
inputting the actual data of each influence factor and of the short-term wind power, and calling the ksdensity function in MATLAB to obtain the marginal distribution functions of each group of influence factor and short-term wind power; based on these marginal distribution functions, obtaining the parameter estimates of the candidate Copula functions corresponding to each group of influence factor and short-term wind power by the maximum likelihood estimation method, and thereby obtaining the candidate Copula functions corresponding to each group;
inputting the actual data of each influence factor and of the short-term wind power, calling the ecdf function in MATLAB to obtain the empirical distribution functions of each group of influence factor and short-term wind power, and calling the spline function to obtain the empirical distribution function values at the original sample points by spline interpolation; the original sample points are the actual data of one group of influence factor and short-term wind power;
defining an empirical Copula function in MATLAB, and calculating the empirical Copula function values and each candidate Copula function value at the original sample points according to the obtained empirical distribution function values;
calculating the Euclidean distance between the empirical Copula function values at the original sample points and each candidate Copula function value, and taking the candidate Copula function with the minimum Euclidean distance as the optimal Copula function between the corresponding influence factor and the short-term wind power;
the calculation formula of the Euclidean distance is as follows:
$$d_j = \sqrt{\sum_{i=1}^{n}\left(\hat{C}(u_i, v_i) - C_j(u_i, v_i)\right)^{2}}$$
in the formula, $\hat{C}(u_i, v_i)$ is the empirical Copula function, $C_j(u_i, v_i)$ is the j-th candidate Copula function, $(u_i, v_i)$ are the empirical distribution function values of the i-th original sample point, and n is the number of original sample points.
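The selection of the optimal Copula function by minimum Euclidean distance can be sketched in Python as follows. Several points are assumptions made for illustration: the document works in MATLAB (ecdf plus spline interpolation), whereas this sketch uses rank-based empirical distribution values directly; the three candidate families use their standard closed forms; and the parameter values are hypothetical fixed numbers, whereas the document estimates them by maximum likelihood.

```python
import math
import random

def pseudo_observations(xs):
    """Empirical distribution value rank/n of each sample (ties get arbitrary order)."""
    n = len(xs)
    order = sorted(range(n), key=lambda i: xs[i])
    u = [0.0] * n
    for rank, i in enumerate(order, start=1):
        u[i] = rank / n
    return u

def empirical_copula(u, v):
    """Empirical Copula evaluated at every sample point (u_i, v_i)."""
    n = len(u)
    return [sum(1 for a, b in zip(u, v) if a <= ui and b <= vi) / n
            for ui, vi in zip(u, v)]

def clayton(u, v, theta):
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)

def frank(u, v, theta):
    num = (math.exp(-theta * u) - 1.0) * (math.exp(-theta * v) - 1.0)
    return -math.log(1.0 + num / (math.exp(-theta) - 1.0)) / theta

def gumbel(u, v, theta):
    s = (-math.log(u)) ** theta + (-math.log(v)) ** theta
    return math.exp(-s ** (1.0 / theta))

def euclidean_distance(emp, u, v, copula, theta):
    return math.sqrt(sum((e - copula(a, b, theta)) ** 2 for e, a, b in zip(emp, u, v)))

def best_copula(x, y, candidates):
    """Return the name of the closest candidate and all distances."""
    u, v = pseudo_observations(x), pseudo_observations(y)
    emp = empirical_copula(u, v)
    dist = {name: euclidean_distance(emp, u, v, fn, theta)
            for name, (fn, theta) in candidates.items()}
    return min(dist, key=dist.get), dist

# Hypothetical parameters; in the document they come from maximum likelihood estimation.
candidates = {"Clayton": (clayton, 2.0), "Frank": (frank, 5.0), "Gumbel": (gumbel, 2.0)}

# Synthetic, positively dependent data standing in for one factor and the wind power.
random.seed(1)
x = [random.uniform(0.0, 1.0) for _ in range(50)]
y = [xi + random.gauss(0.0, 0.1) for xi in x]
best_name, distances = best_copula(x, y, candidates)
print(best_name, distances)
```

The same routine would be run once per influence factor, each time pairing that factor with the short-term wind power.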
And calculating a corresponding Spearman rank correlation coefficient through an optimal Copula function.
The Spearman correlation coefficient ρ is calculated as follows:
Figure BDA0003706555820000171
The value range of the Spearman correlation coefficient is [-1, 1]. The coefficient reflects the strength of the correlation between variables: the larger its absolute value, the stronger the dependency between the variables, and a value of 0 indicates no correlation. On this basis, the several influence factors with the strongest dependence are screened out.
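When Spearman's coefficient is derived from a Copula function C, one standard identity is ρ = 12 ∫∫ C(u, v) du dv − 3, with both integrals taken over [0, 1]. Whether the equation omitted above is exactly this identity is an assumption. The Python sketch below evaluates it numerically with a midpoint rule; the function names, the grid size and the Frank parameter are illustrative.

```python
import math

def spearman_rho_from_copula(copula, grid=200):
    """Midpoint-rule estimate of 12 * (double integral of C over the unit square) - 3."""
    h = 1.0 / grid
    total = 0.0
    for i in range(grid):
        u = (i + 0.5) * h
        for j in range(grid):
            v = (j + 0.5) * h
            total += copula(u, v)
    return 12.0 * total * h * h - 3.0

def frank(u, v, theta):
    """Frank Copula in its standard closed form."""
    num = (math.exp(-theta * u) - 1.0) * (math.exp(-theta * v) - 1.0)
    return -math.log(1.0 + num / (math.exp(-theta) - 1.0)) / theta

rho_independent = spearman_rho_from_copula(lambda u, v: u * v)  # independence: rho close to 0
rho_comonotone = spearman_rho_from_copula(min)                  # perfect dependence: rho close to 1
rho_frank = spearman_rho_from_copula(lambda u, v: frank(u, v, 5.0))  # hypothetical theta
print(rho_independent, rho_comonotone, rho_frank)
```

The two limiting cases serve as a sanity check of the integration before the routine is applied to a fitted Copula function.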
As a preferred embodiment of this embodiment, the base learner is provided with more than one machine learning model selected from a K nearest neighbor algorithm, a random forest algorithm, a gradient boosting decision tree, XG-Boost, and Light-GBM;
and a Light-GBM machine learning model is arranged in the meta-learner.
Several different types of algorithms are selected for the base learner so that the model can learn from the data in different data spaces and structures, so that the advantages of the different algorithms are fully exploited and the prediction performance is optimized. For the meta-learner, the Light-GBM algorithm is selected as a model with good generalization capability in order to prevent overfitting.
As a preferred implementation of this embodiment, in the step of evaluating the prediction accuracy of the result output by the meta-learner, the evaluation indexes are the mean absolute error MAE and the root mean square error RMSE.
The specific equations for the mean absolute error MAE and the root mean square error RMSE are as follows:
$$\mathrm{MAE} = \frac{1}{K}\sum_{i=1}^{K}\left|\hat{y}^{(i)} - y^{(i)}\right|$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{K}\sum_{i=1}^{K}\left(\hat{y}^{(i)} - y^{(i)}\right)^{2}}$$
Figure BDA0003706555820000181
Figure BDA0003706555820000182
where K is the amount of training or test data, $\hat{y}^{(i)}$ is the predicted value, $y^{(i)}$ is the actual value, $\hat{y}_{ave}$ is the predicted average value, and $y_{ave}$ is the actual average. The smaller the values of the indexes MAE and RMSE are, the better the prediction performance of the model is, and the smaller the error is.
Example three:
This embodiment provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program implements the short-term wind power prediction method and system according to any embodiment of the present invention.
Example four:
This embodiment provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the program, the short-term wind power prediction method and system according to any embodiment of the present invention are implemented.
The above description is only an embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes performed by the present specification and drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (10)

1. A short-term wind power prediction method based on an ensemble learning algorithm is characterized by comprising the following steps:
data preprocessing, namely selecting a plurality of influence factors related to wind power, collecting all actual data of each influence factor, detecting abnormal values through the Pauta criterion (3σ criterion), removing the abnormal data, and filling the resulting gaps by a forward filling method to obtain sample data of each influence factor;
performing correlation analysis according to the sample data of each influence factor, and acquiring influence factors with high correlation with the wind power as reference factors;
establishing a Stacking ensemble learning model, wherein the Stacking ensemble learning model is a double-layer structure comprising a base learner at a first layer and a meta-learner at a second layer, and at least one machine learning model is arranged in each of the base learner and the meta-learner;
acquiring a plurality of groups of actual data of each reference factor as characteristic parameters to form a training sample set;
performing K-fold cross-validation training on the base learner by using the training sample set, inputting the output of the base learner after K iterations as original data into the meta-learner, training the meta-learner, evaluating the prediction accuracy of the result output by the meta-learner, outputting a final prediction result after the preset prediction accuracy is reached, storing the parameters of the base learner and the meta-learner, and outputting the trained Stacking ensemble learning model as a wind power prediction model;
and predicting short-term wind power by using the wind power prediction model.
2. The short-term wind power prediction method based on the ensemble learning algorithm according to claim 1, wherein the method for performing correlation analysis according to the sample data of the influencing factors and acquiring the influencing factors with high correlation with the wind power as reference factors comprises the following specific steps:
introducing Copula functions, and selecting the optimal Copula function closest to the sample data of each influence factor from the alternative Copula functions respectively based on the sample data of each influence factor;
calculating Spearman rank correlation coefficients of the optimal Copula function corresponding to all the influence factors;
and determining the relevance of each influence factor relative to the wind power according to the Spearman rank correlation coefficient corresponding to each influence factor, and selecting a plurality of influence factors with high relevance as reference factors.
3. The method for predicting short-term wind power based on the ensemble learning algorithm according to claim 2, wherein the specific step of introducing the Copula function, and based on the sample data of each influence factor, selecting the optimal Copula function closest to the sample data of each influence factor from the alternative Copula functions respectively comprises:
inputting actual data of each influence factor and the short-term wind power, and calling the ksdensity function in MATLAB to obtain marginal distribution functions of each group of influence factors and the short-term wind power; based on the marginal distribution functions of each group of influence factors and the short-term wind power, obtaining a parameter estimation value of the alternative Copula function corresponding to each group of influence factors and the short-term wind power by using a maximum likelihood estimation method, and further obtaining the alternative Copula function corresponding to each group of influence factors and the short-term wind power;
inputting actual data of each influence factor and short-term wind power, calling the ecdf function in MATLAB to obtain an empirical distribution function of each group of influence factors and short-term wind power, and calling the spline function to obtain an empirical distribution function value of an original sample point by adopting a spline interpolation method; the original sample points are a group of influence factors and actual data of short-term wind power;
defining an empirical Copula function in MATLAB, and calculating the empirical Copula function value of the original sample point and each alternative Copula function value according to the obtained empirical distribution function value of the original sample point;
calculating Euclidean distances between empirical Copula function values of original sample points and alternative Copula function values, and taking the alternative Copula function with the minimum Euclidean distance as an optimal Copula function between corresponding influence factors and short-term wind power;
and calculating a corresponding Spearman rank correlation coefficient through an optimal Copula function.
4. The short-term wind power prediction method based on the ensemble learning algorithm according to claim 1, wherein,
the base learner is provided with a K nearest neighbor algorithm, a random forest algorithm, a gradient boosting decision tree, an XG-Boost and a Light-GBM;
and a Light-GBM machine learning model is arranged in the meta-learner.
5. The short-term wind power prediction method based on the ensemble learning algorithm as claimed in claim 1, wherein in the step of evaluating the prediction accuracy of the result output by the meta-learner, the evaluation index specifically adopts a mean absolute error MAE and a root mean square error RMSE.
6. A short-term wind power prediction system based on an ensemble learning algorithm is characterized by comprising:
the data preprocessing module is used for selecting a plurality of influence factors related to the wind power, collecting all actual data of each influence factor, detecting abnormal values through the Pauta criterion (3σ criterion), removing the abnormal data, and filling the resulting gaps by a forward filling method to obtain sample data of each influence factor;
the influence factor screening module is used for carrying out correlation analysis according to the sample data of each influence factor and acquiring influence factors with high correlation with the wind power as reference factors;
the Stacking ensemble learning module is used for establishing a Stacking ensemble learning model, wherein the Stacking ensemble learning model is a double-layer structure comprising a base learner at a first layer and a meta-learner at a second layer, and at least one machine learning model is arranged in each of the base learner and the meta-learner;
and the sample data making module is used for obtaining a plurality of groups of actual data of each reference factor as characteristic parameters to form a training sample set.
7. The short-term wind power prediction system based on the ensemble learning algorithm according to claim 6, wherein the method for performing correlation analysis according to the sample data of each influence factor to obtain the influence factor with high correlation with the wind power as the reference factor specifically comprises:
introducing a Copula function, and selecting an optimal Copula function which is closest to sample data of each influence factor from the alternative Copula functions respectively based on the sample data of each influence factor;
calculating Spearman rank correlation coefficients of the optimal Copula function corresponding to the influence factors;
and determining the correlation of each influence factor relative to the wind power according to the Spearman rank correlation coefficient corresponding to each influence factor, and selecting a plurality of influence factors with high correlation as reference factors.
8. The short-term wind power prediction system based on ensemble learning algorithm according to claim 6, wherein the specific step of introducing Copula function, based on the sample data of each influencing factor, and selecting the optimal Copula function closest to the sample data of each influencing factor from the alternative Copula functions respectively is as follows:
inputting actual data of each influence factor and short-term wind power, and calling the ksdensity function in MATLAB to obtain marginal distribution functions of each group of influence factors and short-term wind power; based on the marginal distribution functions of each group of influence factors and the short-term wind power, obtaining a parameter estimation value of the alternative Copula function corresponding to each group of influence factors and the short-term wind power by using a maximum likelihood estimation method, and further obtaining the alternative Copula function corresponding to each group of influence factors and the short-term wind power;
inputting actual data of each influence factor and short-term wind power, calling the ecdf function in MATLAB to obtain an empirical distribution function of each group of influence factors and short-term wind power, and calling the spline function to obtain an empirical distribution function value of an original sample point by adopting a spline interpolation method; the original sample points are a group of influence factors and actual data of short-term wind power;
defining an empirical Copula function in MATLAB, and calculating the empirical Copula function value of the original sample point and each alternative Copula function value according to the obtained empirical distribution function value of the original sample point;
calculating Euclidean distances between empirical Copula function values of original sample points and alternative Copula function values, and taking an alternative Copula function with the minimum Euclidean distance as an optimal Copula function between corresponding influence factors and short-term wind power;
and calculating a corresponding Spearman rank correlation coefficient through an optimal Copula function.
9. The short-term wind power prediction system based on ensemble learning algorithm as claimed in claim 6,
the base learner is provided with: a K nearest neighbor algorithm, a random forest algorithm, a gradient boosting decision tree, an XG-Boost and a Light-GBM;
and a Light-GBM machine learning model is arranged in the meta-learner.
10. The short-term wind power prediction system based on ensemble learning algorithm as claimed in claim 6, wherein in the step of evaluating the prediction accuracy of the result output by the meta-learner, the evaluation index specifically adopts the mean absolute error MAE and the root mean square error RMSE.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210709609.0A CN114925620A (en) 2022-06-21 2022-06-21 Short-term wind power prediction method and system based on ensemble learning algorithm

Publications (1)

Publication Number Publication Date
CN114925620A true CN114925620A (en) 2022-08-19

Family

ID=82814432

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117894397A (en) * 2024-03-15 2024-04-16 北京科技大学 Continuous casting mold flux viscosity forecasting method based on machine learning
CN117894397B (en) * 2024-03-15 2024-05-28 北京科技大学 Continuous casting mold flux viscosity forecasting method based on machine learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination