CN115511170A

CN115511170A - Multi-photovoltaic power station power prediction error modeling method

Info

Publication number: CN115511170A
Application number: CN202211152635.4A
Authority: CN
Inventors: 王晨旭; 马骏超; 彭琰; 陆承宇; 王松; 吴俊�; 邓晖; 章枫; 程颖
Original assignee: Electric Power Research Institute of State Grid Zhejiang Electric Power Co Ltd
Current assignee: Electric Power Research Institute of State Grid Zhejiang Electric Power Co Ltd
Priority date: 2022-09-21
Filing date: 2022-09-21
Publication date: 2022-12-23

Abstract

The invention discloses a method for modeling the power prediction errors of multiple photovoltaic power stations. The technical scheme is as follows: first, the historical meteorological and power data set of the multiple photovoltaic power stations is divided into data sets for different weather types by K-means clustering; then, for the data set under each weather type, a correlation model between predicted power and measured power is constructed with a D-vine Copula function, so that the correlation among multi-dimensional variables is modeled accurately; finally, given the known power prediction values of the photovoltaic power stations, the D-vine correlation model is used to model the probability distribution of the power prediction error under different weather conditions, which improves the accuracy with which the uncertainty of photovoltaic power prediction is quantified.

Description

Multi-photovoltaic power station power prediction error modeling method

Technical Field

The invention belongs to the technical field of new energy power generation control, and particularly relates to a Vine-Copula function-based multi-photovoltaic power station power prediction error modeling method.

Background

With the continued development of new power systems dominated by new energy, connecting a high proportion of photovoltaic generation to the power system has become the trend of current energy development. Photovoltaic generation is affected by meteorological factors such as irradiance, temperature and humidity, so its output fluctuates strongly and randomly, which significantly affects peak regulation, frequency regulation and reserve provision in the power grid. As large-scale photovoltaic generation is connected to the grid, the risk that its output uncertainty poses to the dispatch and operation of the power system becomes increasingly prominent. High-accuracy prediction of photovoltaic power provides important information for the planning, operation, stability analysis and market trading of new power systems, and is important for improving overall system energy efficiency and promoting the accommodation of new energy. Photovoltaic power prediction has already been studied extensively; most approaches use machine learning or deep learning models to describe the relation between meteorological factors and generated power, so that a power prediction value for the station is obtained from the prediction model given meteorological forecast data. However, because photovoltaic power is strongly affected by weather fluctuations, the predicted power is prone to deviation. A prediction error model for photovoltaic power therefore needs to be established to support quantifying the influence of prediction uncertainty on system operation and reserve.

At present, power prediction error modeling for photovoltaic power stations, in China and abroad, mainly targets a single photovoltaic power station. However, the outputs of geographically adjacent photovoltaic power stations are strongly correlated, and their power prediction errors show similar characteristics. Because prediction error modeling for multiple photovoltaic power stations involves modeling the correlation of multi-dimensional variables, a modeling method designed for a single station is difficult to transfer directly to the multi-station scenario. N. Zhang et al. (Modeling Conditional Forecast Error for Wind Power in Generation Scheduling [J]. IEEE Transactions on Power Systems, 2014, 29(3): 1316-1324) modeled wind power prediction errors with a multidimensional Gaussian Copula function, and the results show that the wind power prediction value strongly influences the probability distribution of the prediction error. Zhao Weijia et al. (A method for estimating the conditional prediction error probability distribution of photovoltaic power output [J]. Automation of Electric Power Systems, 2015, 39(16): 8-15) modeled photovoltaic power prediction errors under different weather types with a bivariate Gaussian Copula function, and the results show that the weather type strongly influences the probability distribution of the photovoltaic power prediction error; however, that study only uses a Gaussian Copula function describing the correlation of two variables, which cannot accurately describe an asymmetric correlation structure among multi-dimensional variables. Zhao Shuqiang et al. (A day-ahead photovoltaic output prediction error distribution model based on numerical characteristic clustering [J]. Automation of Electric Power Systems, 2019, 43(13): 36-45) classified the prediction errors of a single photovoltaic station by fuzzy C-means clustering and established a Gaussian mixture model suited to describing the distribution of photovoltaic prediction errors; however, the influence of the output of geographically nearby photovoltaic power stations is not considered, so the available data cannot be fully used to obtain a more accurate probability distribution model. Zhang Ying et al. (Wind power prediction error analysis based on clustering and nonparametric kernel density estimation [J]. Acta Energiae Solaris Sinica, 2019, 40(12): 3594-3604) grouped prediction error data with similar characteristics into classes by clustering and then obtained the probability distribution of the prediction error by nonparametric kernel density estimation; however, that approach is only suitable for analyzing error distribution characteristics when both the predicted and the measured historical power data are known, and it is difficult to construct the probability distribution of the predicted output of new energy over a future period on the basis of a known power prediction value alone.

Therefore, the existing new energy power prediction error modeling method is difficult to meet the multi-photovoltaic power station prediction error modeling requirements under different weather types, and a new modeling method needs to be provided.

Disclosure of Invention

To address the difficulty of accurately constructing a power prediction error model for multiple photovoltaic power stations, the invention provides a Vine-Copula-based power prediction error modeling method for multiple photovoltaic power stations. Based on Vine-Copula theory, the method constructs a correlation model between the predicted power and the measured power of multiple photovoltaic power stations under different weather types, so that the prediction error can be modeled accurately when the power prediction value is known.

Therefore, the technical scheme adopted by the invention is as follows: a multi-photovoltaic power station power prediction error modeling method comprises the following model construction steps:

1) According to historical statistical data of a plurality of photovoltaic power stations in an area, a historical data set comprising meteorological factors, photovoltaic predicted power and photovoltaic measured power of the photovoltaic power stations is constructed;

2) Obtaining photovoltaic power station predicted power and actual measurement power data sets under typical weather types by adopting a K-means clustering method according to a plurality of photovoltaic power station historical data sets in the area in the step 1) and taking solar irradiance, air temperature and air pressure as dividing bases;

3) Calculating the power prediction error of the photovoltaic power station in each time period according to the photovoltaic power station predicted power and the actually measured power data set in the typical weather type in the step 2);

4) Constructing a correlation model between the predicted power and the measured power of the photovoltaic power stations under different weather types with a Vine-Copula function, wherein the specific process is as follows:

4-a) establishing a D-vine correlation model of the predicted power and the measured power of the photovoltaic power stations under different weather types by using the D-vine Copula structure;

4-b) selecting the type and parameters of each bivariate Copula in the D-vine correlation model one by one with the Euclidean distance test, according to the output data sets of the photovoltaic power stations under different weather types.

Further, in the step 1), for photovoltaic power stations equipped with meteorological and electrical measurement devices, a historical data set of meteorological factors and a historical data set of photovoltaic power are constructed:

$$W_{i,j} = [I_{i,j},\ T_{i,j},\ V_{i,j}]$$

$$D_{i,j} = [P_{i,j}^{f},\ P_{i,j}^{r}]$$

wherein subscript $i$ represents the serial number of the photovoltaic power station; subscript $j$ represents the serial number of the historical data; $W_{i,j}$ represents the jth group of meteorological historical data of the ith photovoltaic power station; $I_{i,j}$, $T_{i,j}$ and $V_{i,j}$ are the meteorological factors used for power prediction of the photovoltaic power station, namely solar irradiance, temperature and air pressure, respectively; $D_{i,j}$ represents the jth group of photovoltaic power historical data of the ith photovoltaic power station; $P_{i,j}^{f}$ is the predicted power value of the ith photovoltaic power station; and $P_{i,j}^{r}$ is the measured power value of the ith photovoltaic power station.

Further, in the step 2), according to the meteorological historical data set of the photovoltaic power station, the data set is divided into different weather types by the K-means clustering method. The K-means clustering method uses the Euclidean distance between samples to describe their similarity; taking the ith photovoltaic power station as an example, the Euclidean distance between data samples is:

$$\Delta w_{j,k} = \lVert W_{i,j} - W_{i,k} \rVert$$

wherein $\Delta w_{j,k}$ represents the Euclidean distance between the jth sample and the kth sample; $\lVert \cdot \rVert$ denotes the 2-norm.

Further, the K-means clustering method is a typical unsupervised learning method, and the objective function for dividing the data set is:

$$\min \sum_{c=1}^{K} \sum_{W_{i,j} \in \Omega_c} \lVert W_{i,j} - W_{i,c} \rVert^{2}$$

wherein $K$ is the number of clusters, i.e. the number of weather types to be divided; $\Omega_c$ is the set of sample points assigned to the c-th cluster; $W_{i,c}$, the cluster center of the c-th cluster, is determined by the expected value of the sample points belonging to that cluster. In the K-means clustering process, the number of clusters $K$ is determined first and $K$ sample points are randomly selected as the initial cluster centers; then, with minimization of the Euclidean distance from the sample points to the cluster centers as the optimization objective, the cluster membership of the sample points and the cluster centers are updated repeatedly until convergence, which yields the division into different weather types.
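The clustering loop described above can be sketched in a few lines. This is a minimal illustration, not the patented implementation: the function name `kmeans`, the fixed random seed, and the sample values are all hypothetical, and the three meteorological features are assumed to have been normalized to comparable ranges beforehand, since irradiance, temperature and air pressure have very different units.

```python
import random

def kmeans(samples, k, iters=100, seed=0):
    """Cluster samples (lists of floats) into k groups by squared Euclidean distance."""
    rng = random.Random(seed)
    # K randomly chosen sample points serve as the initial cluster centers.
    centers = [list(s) for s in rng.sample(samples, k)]
    labels = None
    for _ in range(iters):
        # Assignment step: each sample joins its nearest cluster center.
        new_labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(s, centers[c])))
            for s in samples
        ]
        if new_labels == labels:  # no membership change: converged
            break
        labels = new_labels
        # Update step: each center becomes the mean of its member samples.
        for c in range(k):
            members = [s for s, lab in zip(samples, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers

# Hypothetical normalized [irradiance, temperature, air pressure] samples.
samples = [
    [0.90, 0.80, 0.60], [0.95, 0.85, 0.62], [0.88, 0.78, 0.61],
    [0.50, 0.60, 0.50], [0.55, 0.58, 0.52], [0.48, 0.62, 0.49],
    [0.10, 0.30, 0.40], [0.12, 0.28, 0.38], [0.08, 0.32, 0.41],
]
labels, centers = kmeans(samples, k=3)
```

Because the initial centers are random, the result is a local optimum; in practice the clustering is usually repeated with several seeds and the partition with the smallest objective value is kept.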

Further, through K-means clustering the meteorological historical data set of the photovoltaic power station is divided into sunny, cloudy, and rainy and snowy days; the corresponding data sets are denoted by the variables $W_i^{1}$, $W_i^{2}$ and $W_i^{3}$, and the sample points in each data set are denoted by $W_{i,j}^{1}$, $W_{i,j}^{2}$ and $W_{i,j}^{3}$, wherein the superscripts 1, 2 and 3 correspond to sunny, cloudy, and rainy and snowy weather, respectively.

Further, in the step 2), after the meteorological historical data set of the photovoltaic power station has been divided by the K-means clustering method into meteorological data sets for sunny, cloudy, and rainy and snowy weather, the photovoltaic power data at the same moments are divided according to the time-series labels of the meteorological data to obtain photovoltaic power data sets for sunny, cloudy, and rainy and snowy weather; the corresponding data sets are denoted by the variables $D_i^{1}$, $D_i^{2}$ and $D_i^{3}$, and the sample points in each data set are denoted by $D_{i,j}^{1}$, $D_{i,j}^{2}$ and $D_{i,j}^{3}$.

Further, in the step 3), after the photovoltaic power data sets under different weather types are obtained, the prediction error is calculated from the predicted and measured photovoltaic power in each data set:

$$e_{i,j} = P_{i,j}^{r} - P_{i,j}^{f}$$

wherein $e_{i,j}$ is the prediction error of the jth sample point of the ith photovoltaic power station; $P_{i,j}^{f}$ is the predicted power value of the ith photovoltaic power station; and $P_{i,j}^{r}$ is the measured power value of the ith photovoltaic power station.
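As a small illustration of the error calculation, the sketch below computes the per-sample error as measured power minus predicted power, which is the sign convention implied by the later definition Δx = x₁ - x₂. The function name and the per-unit sample values are hypothetical.

```python
def prediction_errors(predicted, measured):
    """Return e_j = measured_j - predicted_j for two aligned power series (p.u.)."""
    if len(predicted) != len(measured):
        raise ValueError("predicted and measured series must have the same length")
    return [r - f for f, r in zip(predicted, measured)]

# Hypothetical per-unit samples for one station under one weather type.
p_forecast = [0.20, 0.45, 0.70]
p_measured = [0.25, 0.40, 0.70]
errors = prediction_errors(p_forecast, p_measured)
```

A positive error therefore means the station produced more than was predicted.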

Further, in the step 4-a), taking the data sets of two photovoltaic power stations in sunny weather as an example, assume that the predicted and measured power values of the first photovoltaic power station are $P_1^{f}$ and $P_1^{r}$, and that the predicted and measured power values of the second photovoltaic power station are $P_2^{f}$ and $P_2^{r}$; then the correlation model among the variables $P_1^{r}$, $P_1^{f}$ and $P_2^{f}$ is expressed as:

$$F(P_1^{r}, P_1^{f}, P_2^{f}) = C\big(F(P_1^{r}),\ F(P_1^{f}),\ F(P_2^{f})\big)$$

wherein $F(\cdot)$ represents the cumulative distribution function of a variable; $C(\cdot)$ represents the Copula function describing the correlation of the multi-dimensional variables. The probability density function describing the multivariate correlation is further obtained:

$$f(P_1^{r}, P_1^{f}, P_2^{f}) = c\big(F(P_1^{r}),\ F(P_1^{f}),\ F(P_2^{f})\big) \cdot f(P_1^{r}) \cdot f(P_1^{f}) \cdot f(P_2^{f})$$

wherein $f(\cdot)$ represents the probability density function of a variable; $c(\cdot)$ represents the Copula density function describing the correlation of the multi-dimensional variables;

for convenience of illustration, the variables $x_1$, $x_2$ and $x_3$ are used in place of $P_1^{r}$, $P_1^{f}$ and $P_2^{f}$, and the joint probability density function is further written as:

$$f(x_1, x_2, x_3) = f(x_3) \cdot f(x_2 \mid x_3) \cdot f(x_1 \mid x_2, x_3)$$

wherein $f(x_2 \mid x_3)$ is further expressed as:

$$f(x_2 \mid x_3) = c_{23}\big(F(x_2), F(x_3)\big) \cdot f(x_2)$$

similarly, $f(x_1 \mid x_2, x_3)$ is further expressed as:

$$f(x_1 \mid x_2, x_3) = c_{13|2}\big(F(x_1 \mid x_2), F(x_3 \mid x_2)\big) \cdot c_{12}\big(F(x_1), F(x_2)\big) \cdot f(x_1)$$

according to the above expressions, the joint probability density function of the variables $x_1$, $x_2$ and $x_3$ is expressed as:

$$f(x_1, x_2, x_3) = c_{13|2}\big(F(x_1 \mid x_2), F(x_3 \mid x_2)\big) \cdot c_{12}\big(F(x_1), F(x_2)\big) \cdot c_{23}\big(F(x_2), F(x_3)\big) \cdot f(x_1) \cdot f(x_2) \cdot f(x_3)$$

The key to constructing the D-vine correlation model is to select a suitable Copula function type for each of $c_{12}(\cdot)$, $c_{23}(\cdot)$ and $c_{13|2}(\cdot)$ and to determine its parameters.
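To make the three-variable D-vine structure concrete, the sketch below evaluates the Copula part of the joint density, i.e. the product c₁₃|₂ · c₁₂ · c₂₃, when every pair Copula is assumed to be Gaussian. This is only one of the candidate families named in the text; the function names and parameter values are hypothetical, and the inputs are the marginal CDF values u_k = F(x_k), which must lie strictly between 0 and 1. Multiplying the result by the marginal densities f(x₁)·f(x₂)·f(x₃) gives the joint density.

```python
import math
from statistics import NormalDist

_N = NormalDist()  # standard normal distribution

def gauss_copula_density(u, v, rho):
    """Bivariate Gaussian Copula density c(u, v; rho) for u, v in (0, 1)."""
    a, b = _N.inv_cdf(u), _N.inv_cdf(v)
    r2 = rho * rho
    expo = -(r2 * (a * a + b * b) - 2.0 * rho * a * b) / (2.0 * (1.0 - r2))
    return math.exp(expo) / math.sqrt(1.0 - r2)

def gauss_h(u, v, rho):
    """Conditional CDF F(u | v) implied by the Gaussian Copula (the h-function)."""
    a, b = _N.inv_cdf(u), _N.inv_cdf(v)
    return _N.cdf((a - rho * b) / math.sqrt(1.0 - rho * rho))

def dvine3_copula_density(u1, u2, u3, rho12, rho23, rho13_2):
    """c_{13|2}(F(x1|x2), F(x3|x2)) * c_12(u1, u2) * c_23(u2, u3)."""
    h1 = gauss_h(u1, u2, rho12)   # F(x1 | x2)
    h3 = gauss_h(u3, u2, rho23)   # F(x3 | x2)
    return (gauss_copula_density(h1, h3, rho13_2)
            * gauss_copula_density(u1, u2, rho12)
            * gauss_copula_density(u2, u3, rho23))

# Hypothetical CDF values and pair-Copula parameters.
density = dvine3_copula_density(0.3, 0.35, 0.4, rho12=0.8, rho23=0.6, rho13_2=0.3)
```

With all three parameters set to zero the variables are independent and the Copula density equals 1 everywhere, which is a convenient sanity check.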

Further, in the step 4-b), the optimal Copula function types and parameters of $c_{12}(\cdot)$, $c_{23}(\cdot)$ and $c_{13|2}(\cdot)$ are determined one by one with the Euclidean distance test. Taking $c_{12}(\cdot)$ as an example, assume that the variables $x_1$ and $x_2$ have $m$ historical samples; the difference between the empirical and theoretical values of their joint distribution function is expressed as:

$$d = \sqrt{\sum_{i=1}^{m} \Big| C_{em}\big(F(x_{1,i}), F(x_{2,i})\big) - C\big(F(x_{1,i}), F(x_{2,i})\big) \Big|^{2}}$$

wherein $C_{em}(F(x_{1,i}), F(x_{2,i}))$ is the empirical probability value at the sample point $(x_{1,i}, x_{2,i})$; $C(F(x_{1,i}), F(x_{2,i}))$ is the probability value calculated at the sample point by the theoretical Copula function. A suitable function type is selected from the Gaussian Copula, t Copula and Frank Copula and its parameters are optimized, so that the difference between the empirical and theoretical values of the joint distribution function is minimized.
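The selection procedure can be sketched for a single candidate family. The example below fits a Frank Copula by a simple grid search over its parameter, using the squared Euclidean distance between the empirical and theoretical Copula values, which has the same minimizer as the distance itself. The rank-based estimate of F(x), the parameter grid, the function names and the sample data are all assumptions made for illustration. The Gaussian and t Copulas would be handled the same way, but their distribution functions need numerical integration and are omitted here.

```python
import math

def pseudo_observations(x):
    """Map samples to (0, 1) via rank / (m + 1), an empirical estimate of F(x)."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    u = [0.0] * len(x)
    for rank, i in enumerate(order, start=1):
        u[i] = rank / (len(x) + 1)
    return u

def empirical_copula(u, v):
    """C_em at each sample point: share of samples with u_k <= u_i and v_k <= v_i."""
    m = len(u)
    return [sum(1 for a, b in zip(u, v) if a <= ui and b <= vi) / m
            for ui, vi in zip(u, v)]

def frank_cdf(u, v, theta):
    """Frank Copula distribution function C(u, v; theta), theta != 0."""
    num = math.expm1(-theta * u) * math.expm1(-theta * v)
    return -math.log1p(num / math.expm1(-theta)) / theta

def squared_distance(theta, u, v, c_em):
    return sum((ce - frank_cdf(a, b, theta)) ** 2 for a, b, ce in zip(u, v, c_em))

def fit_frank(u, v, thetas):
    """Return the grid value of theta with the smallest distance, and that distance."""
    c_em = empirical_copula(u, v)
    best = min(thetas, key=lambda t: squared_distance(t, u, v, c_em))
    return best, squared_distance(best, u, v, c_em)

# Hypothetical predicted and measured power samples (p.u.) of one station.
p_forecast = [0.10, 0.25, 0.40, 0.55, 0.70, 0.85]
p_measured = [0.12, 0.22, 0.43, 0.50, 0.74, 0.80]
u, v = pseudo_observations(p_forecast), pseudo_observations(p_measured)
theta, d2 = fit_frank(u, v, thetas=[-10.0, -5.0, -1.0, 1.0, 5.0, 10.0, 20.0])
```

To choose between families, the same distance would be computed for the best parameter of each candidate and the family with the smallest value retained.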

Further, when the power prediction values of the photovoltaic power stations are known, the probability distribution of the power prediction error is estimated from the D-vine correlation model constructed in the step 4). Assume that the predicted values of the first and second photovoltaic power stations are $x_2 = P_1^{f}$ and $x_3 = P_2^{f}$, respectively; the probability distribution of the actual power of the first photovoltaic power station is then expressed as:

$$f(x_1 \mid x_2, x_3) = c_{13|2}\big(F(x_1 \mid x_2), F(x_3 \mid x_2)\big) \cdot c_{12}\big(F(x_1), F(x_2)\big) \cdot f(x_1)$$

since the power prediction error of the first photovoltaic power station is expressed as:

$$\Delta x = x_1 - x_2$$

after the probability distribution of the actual power of the first photovoltaic power station is obtained, subtracting the predicted power value $P_1^{f}$ from it yields the probability distribution of the prediction error $\Delta x$.
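The conditional step can be illustrated numerically. The sketch below evaluates f(x₁ | x₂, x₃) on a grid of actual-power values and shifts the grid by the known prediction x₂ to obtain the error axis. It rests on strong simplifying assumptions that are not taken from the text: all pair Copulas are Gaussian, the parameter values are invented, and the marginal distributions are placeholder uniform distributions on [0, 1] p.u. (so F(x) = x and f(x) = 1), which real station data would replace with fitted marginals.

```python
import math
from statistics import NormalDist

_N = NormalDist()
_EPS = 1e-12  # keeps CDF values strictly inside (0, 1)

def _clip(p):
    return min(max(p, _EPS), 1.0 - _EPS)

def c_gauss(u, v, rho):
    """Bivariate Gaussian Copula density."""
    a, b = _N.inv_cdf(_clip(u)), _N.inv_cdf(_clip(v))
    r2 = rho * rho
    expo = -(r2 * (a * a + b * b) - 2.0 * rho * a * b) / (2.0 * (1.0 - r2))
    return math.exp(expo) / math.sqrt(1.0 - r2)

def h_gauss(u, v, rho):
    """Conditional CDF F(u | v) implied by the Gaussian Copula."""
    a, b = _N.inv_cdf(_clip(u)), _N.inv_cdf(_clip(v))
    return _N.cdf((a - rho * b) / math.sqrt(1.0 - rho * rho))

def conditional_density(x1, x2, x3, rho12, rho23, rho13_2):
    """f(x1 | x2, x3) under the placeholder uniform marginals F(x) = x, f(x) = 1."""
    u1, u2, u3 = x1, x2, x3
    return (c_gauss(h_gauss(u1, u2, rho12), h_gauss(u3, u2, rho23), rho13_2)
            * c_gauss(u1, u2, rho12))

def error_distribution(x2, x3, rho12, rho23, rho13_2, n=2000):
    """Return (error grid, density values) for dx = x1 - x2 given predictions x2, x3."""
    xs = [(k + 0.5) / n for k in range(n)]  # midpoints of n cells on (0, 1)
    dens = [conditional_density(x, x2, x3, rho12, rho23, rho13_2) for x in xs]
    return [x - x2 for x in xs], dens

# Predicted powers of 0.2 p.u. and 0.25 p.u.; the rho values are hypothetical.
errs, dens = error_distribution(0.2, 0.25, rho12=0.8, rho23=0.6, rho13_2=0.3)
total = sum(dens) / len(dens)  # midpoint-rule integral of the density
```

The integral `total` should be close to 1, which confirms that the product of the two pair-Copula densities and the marginal density is a proper conditional density.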

The invention uses K-means clustering to divide the historical data sets of multiple photovoltaic power stations by meteorological condition; it uses the D-vine Copula function to model the correlation between the predicted and measured power of multiple photovoltaic power stations; and, with the constructed D-vine correlation model, it accurately constructs the probability distribution of the photovoltaic power prediction error when the power prediction values of the photovoltaic power stations are known.

The modeling method is also applicable to wind power prediction errors and to joint wind-photovoltaic prediction errors, and is particularly suitable when the outputs of multiple power stations are strongly correlated.

Drawings

FIG. 1 is a flow chart of a multi-photovoltaic power plant power prediction error modeling method of the present invention;

FIG. 2 is a graph of historical data of predicted power and measured power of a photovoltaic power station in an application example of the present invention;

FIG. 3 is a graph of historical data of predicted power and measured power in sunny weather obtained by clustering;

FIG. 4 is a graph of historical data of predicted power and measured power in cloudy weather obtained by clustering;

FIG. 5 is a graph of historical data of predicted power and measured power in rainy and snowy weather obtained by clustering according to the present invention;

FIG. 6 is a scatter diagram of historical data of predicted power and measured power under sunny weather in an application example of the present invention;

FIG. 7 is a scatter diagram of historical data of predicted power and measured power in cloudy weather according to an embodiment of the present invention;

FIG. 8 is a scatter diagram of historical data of predicted power and measured power in rainy and snowy weather in an application example of the present invention;

FIG. 9 is a logical structure diagram of the D-vine Copula in an application example of the present invention;

FIG. 10 is a probability distribution diagram of the power prediction error obtained under sunny conditions when the predicted power of the photovoltaic power station is 0.2 p.u.;

FIG. 11 is a probability distribution diagram of the power prediction error obtained under cloudy conditions when the predicted power of the photovoltaic power station is 0.2 p.u.;

FIG. 12 is a probability distribution diagram of the power prediction error obtained under rainy and snowy conditions when the predicted power of the photovoltaic power station is 0.2 p.u.

Detailed Description

In order to more specifically describe the present invention, the following detailed description of the embodiments of the present invention is provided with reference to the accompanying drawings.

Examples

The embodiment is a Vine-Copula-based power prediction error modeling method for multiple photovoltaic power stations. As shown in fig. 1, the model construction steps are as follows:

1) According to historical statistical data of a plurality of photovoltaic power stations in an area, a historical data set comprising meteorological factors, photovoltaic predicted power and photovoltaic measured power of the photovoltaic power stations is constructed;

2) According to the historical data sets of the photovoltaic power stations in the area in the step 1), with factors such as solar irradiance, air temperature and air pressure as the basis for division, the predicted power and measured power data sets of the photovoltaic power stations under typical weather types such as sunny, cloudy, and rainy and snowy days are obtained by the K-means clustering method;

3) Calculating the power prediction error of the photovoltaic power stations in each time period according to the predicted power and measured power data sets under the typical weather types in the step 2);

4) Constructing a correlation model between the predicted power and the measured power of the photovoltaic power stations under different weather types with a Vine-Copula function, wherein the specific process is as follows:

4-a) establishing a correlation model of the predicted power and the measured power of the photovoltaic power stations under different weather types by using the D-vine Copula structure, wherein the candidate bivariate Copula functions can include the Gaussian Copula, t Copula, Frank Copula and the like;

4-b) selecting the type and parameters of each bivariate Copula in the correlation model one by one with the Euclidean distance test, according to the output data sets of the photovoltaic power stations under different weather types;

When the predicted power value of a photovoltaic power station is known, the probability distribution of its prediction error at that predicted power can be obtained from the D-vine Copula correlation model constructed in the step 4).

Specifically, in the step 1), for photovoltaic power stations equipped with meteorological and electrical measurement devices, a historical data set of meteorological factors and a historical data set of photovoltaic power can be constructed:

$$W_{i,j} = [I_{i,j},\ T_{i,j},\ V_{i,j}]$$

$$D_{i,j} = [P_{i,j}^{f},\ P_{i,j}^{r}]$$

wherein subscript $i$ represents the serial number of the photovoltaic power station; subscript $j$ represents the serial number of the historical data; $W_{i,j}$ represents the jth group of meteorological historical data of the ith photovoltaic power station; $I_{i,j}$, $T_{i,j}$ and $V_{i,j}$ are the meteorological factors commonly used for power prediction of the photovoltaic power station, namely solar irradiance, temperature and air pressure, respectively; $D_{i,j}$ represents the jth group of photovoltaic power historical data of the ith photovoltaic power station; $P_{i,j}^{f}$ is the predicted power value of the ith photovoltaic power station; and $P_{i,j}^{r}$ is the measured power value of the ith photovoltaic power station.

Specifically, in the step 2), according to the meteorological historical data set of the photovoltaic power station, the data set is divided into different weather types such as sunny, cloudy, and rainy and snowy days by the K-means clustering method. The K-means clustering method uses the Euclidean distance between samples to describe their similarity; taking the ith photovoltaic power station as an example, the Euclidean distance between data samples is:

$$\Delta w_{j,k} = \lVert W_{i,j} - W_{i,k} \rVert$$

wherein $\Delta w_{j,k}$ represents the Euclidean distance between the jth sample and the kth sample; $\lVert \cdot \rVert$ denotes the 2-norm.

The K-means clustering method is a typical unsupervised learning method, and the objective function for dividing the data set is:

$$\min \sum_{c=1}^{K} \sum_{W_{i,j} \in \Omega_c} \lVert W_{i,j} - W_{i,c} \rVert^{2}$$

wherein $K$ is the number of clusters, which in this embodiment is the number of weather types to be divided; $\Omega_c$ is the set of sample points assigned to the c-th cluster; $W_{i,c}$, the cluster center of the c-th cluster, is determined by the expected value of the sample points belonging to that cluster. In the K-means clustering process, the number of clusters $K$ is determined first and $K$ sample points are randomly selected as the initial cluster centers; then, with minimization of the Euclidean distance from the sample points to the cluster centers as the optimization objective, the cluster membership of the sample points and the cluster centers are updated repeatedly until convergence, which yields the division into different weather types.

Through K-means clustering, the meteorological historical data set of the photovoltaic power station can be divided into sunny, cloudy, and rainy and snowy days; the corresponding data sets can be denoted by the variables $W_i^{1}$, $W_i^{2}$ and $W_i^{3}$, and the sample points in each data set can be denoted by $W_{i,j}^{1}$, $W_{i,j}^{2}$ and $W_{i,j}^{3}$, wherein the superscripts 1, 2 and 3 correspond to sunny, cloudy, and rainy and snowy weather, respectively.

Specifically, in the step 2), after the meteorological historical data set of the photovoltaic power station has been divided by the K-means clustering method into meteorological data sets for sunny, cloudy, and rainy and snowy weather, the photovoltaic power data at the same moments can be divided according to the time-series labels of the meteorological data to obtain photovoltaic power data sets for sunny, cloudy, and rainy and snowy weather; the corresponding data sets can be denoted by the variables $D_i^{1}$, $D_i^{2}$ and $D_i^{3}$, and the sample points in each data set can be denoted by $D_{i,j}^{1}$, $D_{i,j}^{2}$ and $D_{i,j}^{3}$.
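The time-label alignment can be sketched as a lookup keyed by timestamp. The data layout below (dictionaries keyed by timestamp strings, weather types coded 1 to 3) and the function name are hypothetical choices made for the example; power records without a clustered meteorological record at the same moment are simply skipped.

```python
def split_power_by_weather(weather_labels, power_records):
    """Group (predicted, measured) power pairs by the weather type at the same timestamp.

    weather_labels: dict mapping timestamp -> weather type (1 sunny, 2 cloudy, 3 rain/snow)
    power_records:  dict mapping timestamp -> (predicted_pu, measured_pu)
    """
    groups = {1: [], 2: [], 3: []}
    for t in sorted(power_records):
        if t in weather_labels:  # keep only moments that have a weather label
            groups[weather_labels[t]].append(power_records[t])
    return groups

# Hypothetical labels from the clustering step and power records of one station.
weather_labels = {
    "2022-06-01 10:00": 1,
    "2022-06-01 11:00": 1,
    "2022-06-02 10:00": 3,
}
power_records = {
    "2022-06-01 10:00": (0.60, 0.62),
    "2022-06-01 11:00": (0.70, 0.69),
    "2022-06-02 10:00": (0.20, 0.08),
    "2022-06-03 10:00": (0.40, 0.35),  # no weather label, so it is dropped
}
groups = split_power_by_weather(weather_labels, power_records)
```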

Specifically, in the step 3), after the photovoltaic power data sets under different weather types are obtained, the prediction error can be calculated from the predicted and measured photovoltaic power in each data set:

$$e_{i,j} = P_{i,j}^{r} - P_{i,j}^{f}$$

wherein $e_{i,j}$ is the prediction error of the jth sample point of the ith photovoltaic power station.

Specifically, in the step 4-a), based on the photovoltaic power data sets obtained under different weather types, the correlation between the predicted and measured photovoltaic power can be modeled. Photovoltaic power prediction aims to match the actual output of the photovoltaic power station as closely as possible, so the predicted and measured values are strongly correlated, and this correlation can be modeled with Copula theory. Taking the data sets of two photovoltaic power stations under the sunny weather type as an example, assume that the predicted and measured power values of the first photovoltaic power station are $P_1^{f}$ and $P_1^{r}$, and that the predicted and measured power values of the second photovoltaic power station are $P_2^{f}$ and $P_2^{r}$; then the correlation model among the variables $P_1^{r}$, $P_1^{f}$ and $P_2^{f}$ can be expressed as:

$$F(P_1^{r}, P_1^{f}, P_2^{f}) = C\big(F(P_1^{r}),\ F(P_1^{f}),\ F(P_2^{f})\big)$$

wherein $F(\cdot)$ represents the cumulative distribution function of a variable; $C(\cdot)$ represents the Copula function describing the correlation of the multi-dimensional variables. Further, the probability density function describing the multivariate correlation can be obtained:

$$f(P_1^{r}, P_1^{f}, P_2^{f}) = c\big(F(P_1^{r}),\ F(P_1^{f}),\ F(P_2^{f})\big) \cdot f(P_1^{r}) \cdot f(P_1^{f}) \cdot f(P_2^{f})$$

wherein $f(\cdot)$ represents the probability density function of a variable; $c(\cdot)$ represents the Copula density function describing the correlation of the multi-dimensional variables. Commonly used Copula functions include the Gaussian Copula, t Copula and Frank Copula. For multidimensional variables the correlation structure may be complex, so a Vine-Copula function is preferably used for modeling. The Vine-Copula function usually takes one of two forms, C-vine and D-vine, of which the D-vine structure is suited to cases where the pairwise correlation degrees among the variables are close to one another. For ease of illustration, the variables $x_1$, $x_2$ and $x_3$ are used in place of $P_1^{r}$, $P_1^{f}$ and $P_2^{f}$, and the joint probability density function is further written as:

$$f(x_1, x_2, x_3) = f(x_3) \cdot f(x_2 \mid x_3) \cdot f(x_1 \mid x_2, x_3)$$

wherein $f(x_2 \mid x_3)$ is further expressed as:

$$f(x_2 \mid x_3) = c_{23}\big(F(x_2), F(x_3)\big) \cdot f(x_2)$$

similarly, $f(x_1 \mid x_2, x_3)$ is further expressed as:

$$f(x_1 \mid x_2, x_3) = c_{13|2}\big(F(x_1 \mid x_2), F(x_3 \mid x_2)\big) \cdot c_{12}\big(F(x_1), F(x_2)\big) \cdot f(x_1)$$

according to the above expressions, the joint probability density function of the variables $x_1$, $x_2$ and $x_3$ can be expressed as:

$$f(x_1, x_2, x_3) = c_{13|2}\big(F(x_1 \mid x_2), F(x_3 \mid x_2)\big) \cdot c_{12}\big(F(x_1), F(x_2)\big) \cdot c_{23}\big(F(x_2), F(x_3)\big) \cdot f(x_1) \cdot f(x_2) \cdot f(x_3)$$

According to this expression, the D-vine Copula function models the correlation structure among multiple variables with Copula functions between pairs of variables. The key to constructing the D-vine correlation model is to select a suitable Copula function type for each of $c_{12}(\cdot)$, $c_{23}(\cdot)$ and $c_{13|2}(\cdot)$ and to determine its parameters.
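The conditional factors in the decomposition follow from applying the bivariate Copula relation twice. The short derivation below spells out the intermediate steps. The symbol $C_{12}$, the distribution function corresponding to the density $c_{12}$, is introduced here for illustration, and the last line gives the standard way of evaluating the conditional distribution function that appears as an argument of $c_{13|2}$.

```latex
\begin{aligned}
f(x_2 \mid x_3) &= \frac{f(x_2, x_3)}{f(x_3)}
  = c_{23}\big(F(x_2), F(x_3)\big)\, f(x_2), \\
f(x_1 \mid x_2, x_3) &= \frac{f(x_1, x_3 \mid x_2)}{f(x_3 \mid x_2)}
  = c_{13|2}\big(F(x_1 \mid x_2), F(x_3 \mid x_2)\big)\, f(x_1 \mid x_2), \\
f(x_1 \mid x_2) &= c_{12}\big(F(x_1), F(x_2)\big)\, f(x_1), \\
F(x_1 \mid x_2) &= \frac{\partial\, C_{12}\big(F(x_1), F(x_2)\big)}{\partial F(x_2)} .
\end{aligned}
```

Substituting the third line into the second gives the expression for $f(x_1 \mid x_2, x_3)$ used in the joint density.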

Specifically, in the step 4-b), the optimal Copula function types and parameters of $c_{12}(\cdot)$, $c_{23}(\cdot)$ and $c_{13|2}(\cdot)$ are determined one by one with the Euclidean distance test. Taking $c_{12}(\cdot)$ as an example, assume that the variables $x_1$ and $x_2$ have $m$ historical samples; the difference between the empirical and theoretical values of their joint distribution function can be expressed as:

$$d = \sqrt{\sum_{i=1}^{m} \Big| C_{em}\big(F(x_{1,i}), F(x_{2,i})\big) - C\big(F(x_{1,i}), F(x_{2,i})\big) \Big|^{2}}$$

wherein $C_{em}(F(x_{1,i}), F(x_{2,i}))$ is the empirical probability value at the sample point $(x_{1,i}, x_{2,i})$; $C(F(x_{1,i}), F(x_{2,i}))$ is the probability value calculated at the sample point by the theoretical Copula function. A suitable function type is selected from the commonly used Gaussian Copula, t Copula and Frank Copula and its parameters are optimized, so that the difference between the empirical and theoretical values of the joint distribution function is minimized. By this process, the types of the Copula functions in the D-vine correlation model can be determined one by one.

When the power prediction values of the photovoltaic power stations are known, the probability distribution of the power prediction error is estimated from the D-vine correlation model constructed in the step 4). Assume that the predicted values of the first and second photovoltaic power stations are $x_2 = P_1^{f}$ and $x_3 = P_2^{f}$, respectively; the probability distribution of the actual power of the first photovoltaic power station can then be expressed as:

$$f(x_1 \mid x_2, x_3) = c_{13|2}\big(F(x_1 \mid x_2), F(x_3 \mid x_2)\big) \cdot c_{12}\big(F(x_1), F(x_2)\big) \cdot f(x_1)$$

Further, since the power prediction error of the first photovoltaic power station can be expressed as:

$$\Delta x = x_1 - x_2$$

after the probability distribution of the actual power of the first photovoltaic power station is obtained, subtracting the predicted power value $P_1^{f}$ from it yields the probability distribution of the prediction error $\Delta x$. Similarly, if a correlation model of the outputs of more photovoltaic power stations is constructed in the step 4), then when the power prediction value of each photovoltaic power station is known, the probability distribution of the actual power of each station and the probability distribution of its prediction error can be obtained from the correlation model.

Application example

The method of the invention takes historical data of 2 photovoltaic power stations in east China as an example for explanation, and comprises the following steps:

(1) According to historical statistical data of a plurality of photovoltaic power stations in the area, a historical data set containing the meteorological factors, photovoltaic predicted power and photovoltaic measured power of the photovoltaic power stations is constructed.

Fig. 2 shows the predicted and measured power curves of the first photovoltaic power station, located in East China. From the time-series data recorded by the station's meteorological and electrical measurement devices, a meteorological historical data set and a photovoltaic power historical data set can be constructed:

$$W_{1,j} = [I_{1,j},\ T_{1,j},\ V_{1,j}]$$

$$D_{1,j} = [P_{1,j}^{f},\ P_{1,j}^{r}]$$

wherein $W_{1,j}$ represents the jth group of meteorological historical data of the first photovoltaic power station; $I_{1,j}$, $T_{1,j}$ and $V_{1,j}$ are the meteorological factors used for power prediction of the photovoltaic power station, namely solar irradiance, temperature and air pressure, respectively; $D_{1,j}$ represents the jth group of photovoltaic power historical data of the first photovoltaic power station; $P_{1,j}^{f}$ is the predicted power value of the first photovoltaic power station; and $P_{1,j}^{r}$ is the measured power value of the first photovoltaic power station. Likewise, for the second photovoltaic power station, its meteorological historical data set $W_{2,j}$ and power historical data set $D_{2,j}$ can be obtained.

(2) Using the meteorological factors in the historical data set as the basis for division, the predicted and measured data of the photovoltaic power stations under typical weather types (sunny, cloudy, and rain and snow) are obtained.

According to the meteorological historical data sets of the photovoltaic power stations, the K-means clustering method is used to divide the meteorological historical data set of each station into data sets under different weather types: sunny, cloudy, and rain and snow. The corresponding data sets are denoted by the variables $W_i^1$, $W_i^2$ and $W_i^3$, and the sample points in each data set by $W_{i,j}^1$, $W_{i,j}^2$ and $W_{i,j}^3$, where the superscripts 1, 2 and 3 correspond to sunny, cloudy, and rain and snow weather. According to the time stamps of the meteorological data, the photovoltaic power data at the same moments are divided accordingly to obtain the photovoltaic power data sets under the sunny, cloudy, and rain and snow weather types. The corresponding data sets are denoted by the variables $D_i^1$, $D_i^2$ and $D_i^3$, and the sample points in each data set by $D_{i,j}^1$, $D_{i,j}^2$ and $D_{i,j}^3$.
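The division in step (2) can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the function name `kmeans`, the toy samples, the fixed random seed and the choice K = 3 are made up here and do not come from the source, and the weather features are assumed to be normalized to comparable scales beforehand.

```python
import math
import random

def kmeans(samples, k, iters=100, seed=0):
    """Plain K-means on lists of floats; returns (labels, centers)."""
    rng = random.Random(seed)
    # K randomly chosen sample points serve as the initial cluster centers.
    centers = [list(s) for s in rng.sample(samples, k)]
    labels = [0] * len(samples)
    for _ in range(iters):
        # Assignment step: nearest center by Euclidean distance (2-norm).
        new_labels = [
            min(range(k), key=lambda c: math.dist(s, centers[c]))
            for s in samples
        ]
        # Update step: each center becomes the mean of its members.
        for c in range(k):
            members = [s for s, l in zip(samples, new_labels) if l == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
    return labels, centers

# Hypothetical normalized weather samples [I_j, T_j, V_j] and power samples
# (P^f_j, P^r_j) sharing the same time index j.
weather = [[0.90, 0.80, 0.50], [0.95, 0.82, 0.52], [0.50, 0.60, 0.48],
           [0.55, 0.58, 0.50], [0.10, 0.30, 0.45], [0.12, 0.28, 0.47]]
power = [(0.80, 0.78), (0.85, 0.86), (0.45, 0.38),
         (0.50, 0.57), (0.10, 0.04), (0.12, 0.20)]

labels, centers = kmeans(weather, k=3)
# Split the power data by the cluster label of the weather sample at the same time.
power_by_type = {c: [p for p, l in zip(power, labels) if l == c] for c in range(3)}
```

Because the initial centers are random, K-means may settle in a local optimum. In practice one would typically rerun it with several seeds and keep the partition with the smallest objective value, then map each cluster to a weather type by inspecting its center.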

Figs. 3 to 5 show the photovoltaic predicted and measured power curves under the three weather types. The output fluctuation of the photovoltaic power station clearly differs between weather types. Figs. 6 to 8 are scatter plots of predicted versus measured power under the three weather types. On sunny days, the predicted and actual values are highly correlated and the points lie close to the diagonal; on cloudy days, the correlation weakens and the area covered by the points grows; under rain and snow, the correlation drops markedly and becomes nonlinear. These clear differences between predicted and actual power across meteorological conditions show why prediction error modeling needs to take the weather type into account.

(3) From the predicted and measured power data sets of the photovoltaic power stations under the different weather types, the power prediction error of each station in each time period is calculated:

$e_{i,j} = P_{i,j}^r - P_{i,j}^f$

where $e_{i,j}$ is the prediction error of the jth sample point of the ith photovoltaic power station, and $P_{i,j}^r$ and $P_{i,j}^f$ are the corresponding measured and predicted power values.
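As a small illustration of step (3), the per-sample error can be computed for each weather type as follows. The sign convention (measured minus predicted) is assumed here to match the relation Δx = x1 − x2 used later in the claims; the function name `prediction_errors`, the dictionary keys and the per-unit values are hypothetical.

```python
def prediction_errors(power_samples):
    """Return e_j = P^r_j - P^f_j for a list of (predicted, measured) pairs."""
    return [measured - predicted for predicted, measured in power_samples]

# Hypothetical per-unit power data, already grouped by weather type.
power_by_type = {
    "sunny": [(0.80, 0.78), (0.85, 0.86)],
    "cloudy": [(0.45, 0.38), (0.50, 0.57)],
    "rain_snow": [(0.10, 0.04), (0.12, 0.20)],
}

# One list of prediction errors per weather type.
errors_by_type = {w: prediction_errors(s) for w, s in power_by_type.items()}
```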

(4) A Vine-Copula function is used to construct a correlation model between the predicted and measured power of the 2 photovoltaic power stations under the different weather types. For convenience, the variables $x_1$, $x_2$ and $x_3$ are used in place of $P_1^r$, $P_1^f$ and $P_2^f$.

4-a) A D-vine Copula structure is used to build the correlation model of the predicted and measured power of the photovoltaic power stations under the different weather types, in which the joint probability density function of the variables $x_1$, $x_2$ and $x_3$ can be expressed as:

$f(x_1,x_2,x_3) = c_{13|2}(F(x_1|x_2),F(x_3|x_2)) \cdot c_{12}(F(x_1),F(x_2)) \cdot c_{23}(F(x_2),F(x_3)) \cdot f(x_1) \cdot f(x_2) \cdot f(x_3)$

According to this expression, the D-vine Copula models the correlation structure among multiple variables through Copula functions between pairs of variables. Fig. 9 shows the logical structure of the D-vine Copula function, in which $c_{12}(\cdot)$, $c_{23}(\cdot)$ and $c_{13|2}(\cdot)$ are the bivariate Copula functions whose types are to be determined.
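To make the pair-Copula construction concrete, the following sketch evaluates the Copula part of the D-vine density on the uniform scale u = F(x). It assumes, purely for illustration, that all three pair Copulas are Frank Copulas with made-up parameters; the source leaves the Copula types to be selected in step 4-b). The conditional values F(x1|x2) and F(x3|x2) are obtained from the partial derivative of the pair Copula with respect to its conditioning argument (often called the h-function). All function names are hypothetical.

```python
import math

def frank_pdf(u, v, theta):
    """Frank Copula density c(u, v), for theta != 0."""
    a = math.expm1(-theta * u)
    b = math.expm1(-theta * v)
    d = math.expm1(-theta)
    return -theta * d * math.exp(-theta * (u + v)) / (d + a * b) ** 2

def frank_h(u, v, theta):
    """Conditional CDF F(u | v) = dC(u, v)/dv of the Frank Copula."""
    a = math.expm1(-theta * u)
    b = math.expm1(-theta * v)
    d = math.expm1(-theta)
    return math.exp(-theta * v) * a / (d + a * b)

def dvine_copula_density(u1, u2, u3, th12, th23, th13_2):
    """c13|2(F(x1|x2), F(x3|x2)) * c12(F(x1), F(x2)) * c23(F(x2), F(x3))."""
    f1_given_2 = frank_h(u1, u2, th12)
    f3_given_2 = frank_h(u3, u2, th23)
    return (frank_pdf(f1_given_2, f3_given_2, th13_2)
            * frank_pdf(u1, u2, th12)
            * frank_pdf(u2, u3, th23))

# Hypothetical uniform-scale values and Frank parameters.
density = dvine_copula_density(0.3, 0.2, 0.25, 8.0, 3.0, 1.0)
```

Multiplying `density` by the marginal densities f(x1), f(x2) and f(x3) gives the joint density of the three power variables.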

4-b) According to the power historical data sets of the two photovoltaic power stations, the Euclidean distance test is used to select the optimal Copula function type and parameters for $c_{12}(\cdot)$, $c_{23}(\cdot)$ and $c_{13|2}(\cdot)$. Taking $c_{12}(\cdot)$ as an example, assume that the variables $x_1$ and $x_2$ have m groups of historical data; the difference between the empirical and theoretical values of their joint distribution function can then be expressed as:

$d = \sqrt{\sum_{i=1}^{m} \left| C_{em}(F(x_{1,i}),F(x_{2,i})) - C(F(x_{1,i}),F(x_{2,i})) \right|^2}$

where $C_{em}(F(x_{1,i}),F(x_{2,i}))$ is the actual (empirical) probability value at the sample point $(x_{1,i},x_{2,i})$, and $C(F(x_{1,i}),F(x_{2,i}))$ is the probability value calculated for that sample point by the theoretical Copula function. Commonly used bivariate Copula functions include the Gaussian Copula, t Copula and Frank Copula.
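The selection in step 4-b) can be sketched as a grid search that minimizes the Euclidean distance between the empirical Copula and each candidate. The sketch makes several assumptions that are not in the source: the distance is taken as the root of the summed squared differences; only Copulas with closed-form distribution functions are used, so the Frank Copula is paired with a Clayton Copula as a stand-in candidate, while the Gaussian and t Copulas are omitted because they need numerical integration; and the data are synthetic. All function names are hypothetical.

```python
import math
import random

def pseudo_obs(xs):
    """Empirical CDF values F(x_i) = rank / (m + 1), strictly inside (0, 1)."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    u = [0.0] * len(xs)
    for rank, i in enumerate(order, start=1):
        u[i] = rank / (len(xs) + 1)
    return u

def empirical_copula(u, v):
    """C_em at each sample point: share of samples with u_k <= u_i and v_k <= v_i."""
    m = len(u)
    return [sum(1 for k in range(m) if u[k] <= u[i] and v[k] <= v[i]) / m
            for i in range(m)]

def frank_cdf(u, v, theta):
    num = math.expm1(-theta * u) * math.expm1(-theta * v)
    return -math.log1p(num / math.expm1(-theta)) / theta

def clayton_cdf(u, v, theta):
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)

def euclid_distance(cdf, theta, u, v, c_em):
    """Distance between empirical and theoretical Copula values over all samples."""
    return math.sqrt(sum((c_em[i] - cdf(u[i], v[i], theta)) ** 2
                         for i in range(len(u))))

def select_copula(x1, x2, candidates, thetas):
    """Return (distance, name, theta) of the best candidate on the theta grid."""
    u, v = pseudo_obs(x1), pseudo_obs(x2)
    c_em = empirical_copula(u, v)
    scored = [(euclid_distance(cdf, t, u, v, c_em), name, t)
              for name, cdf in candidates.items() for t in thetas]
    return min(scored)

# Synthetic stand-in data: predicted power and a noisy "measured" power.
rng = random.Random(1)
predicted = [rng.random() for _ in range(200)]
measured = [p + rng.gauss(0.0, 0.1) for p in predicted]
thetas = [0.5 * k for k in range(1, 41)]  # positive-dependence grid 0.5 .. 20
best_d, best_name, best_theta = select_copula(
    measured, predicted, {"frank": frank_cdf, "clayton": clayton_cdf}, thetas)
```

A grid search keeps the example dependency-free. A numerical optimizer over the Copula parameter would serve the same purpose.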

(5) According to the correlation model of the predicted and measured power of the multiple photovoltaic power stations, when the predicted power values of the stations are known, the probability distribution of the prediction error at those predicted values is obtained through the correlation model:

$f(x_1|x_2,x_3) = c_{13|2}(F(x_1|x_2),F(x_3|x_2)) \cdot c_{12}(F(x_1),F(x_2)) \cdot f(x_1)$

where the predicted values of the first and second photovoltaic power stations are assumed to be $x_2 = P_1^f$ and $x_3 = P_2^f$, respectively. Once the probability distribution of the actual power $x_1$ of the first photovoltaic power station is obtained, the predicted value $P_1^f$ can be subtracted from it to obtain the probability distribution of the prediction error.
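As an illustration of step (5), the conditional distribution of the actual power given both predicted values can be evaluated with the standard D-vine recursion F(x1|x2,x3) = h13|2(h12(u1|u2) | h23(u3|u2)), where h is the partial derivative of a pair Copula with respect to its conditioning argument. The sketch below assumes, hypothetically, that all three pair Copulas are Frank Copulas with made-up parameters and that every marginal is uniform on [0, 1] p.u.; none of these choices come from the source. Quantiles of the prediction error are found by bisection and by subtracting the predicted value.

```python
import math

def frank_h(u, v, theta):
    """h(u | v) = dC(u, v)/dv for the Frank Copula: the CDF of U given V = v."""
    a = math.expm1(-theta * u)
    b = math.expm1(-theta * v)
    d = math.expm1(-theta)
    return math.exp(-theta * v) * a / (d + a * b)

def cond_cdf_x1(u1, u2, u3, th12, th23, th13_2):
    """F(x1 | x2, x3) on the uniform scale."""
    f1_given_2 = frank_h(u1, u2, th12)
    f3_given_2 = frank_h(u3, u2, th23)
    return frank_h(f1_given_2, f3_given_2, th13_2)

def error_quantile(p, u2, u3, params, inv_cdf_x1, predicted_1, tol=1e-10):
    """p-quantile of the error dx = x1 - P1^f, by bisection on u1."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cond_cdf_x1(mid, u2, u3, *params) < p:
            lo = mid
        else:
            hi = mid
    return inv_cdf_x1(0.5 * (lo + hi)) - predicted_1

def identity(u):
    """Assumed inverse marginal CDF of x1 (uniform on [0, 1] p.u.)."""
    return u

params = (8.0, 3.0, 1.0)  # hypothetical Frank parameters for c12, c23, c13|2
# Both stations predicted at 0.2 p.u.; with uniform marginals u2 = u3 = 0.2.
q05, q50, q95 = (error_quantile(p, 0.2, 0.2, params, identity, 0.2)
                 for p in (0.05, 0.50, 0.95))
```

With fitted marginals, `identity` would be replaced by the inverse of the estimated distribution function of the measured power, and `u2`, `u3` by the marginal distribution functions evaluated at the predicted values.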

Figs. 10 to 12 show the power prediction error probability distributions of the first photovoltaic power station under the different weather types when the predicted power of both photovoltaic power stations is 0.2 p.u. The comparison shows that the probability distribution obtained by the proposed modeling method agrees closely with the statistics of the historical data, and that it fits more accurately than the Beta distribution currently in common use for fitting photovoltaic power prediction errors, which demonstrates the effectiveness of the method.

Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that various changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims

1. A multi-photovoltaic power station power prediction error modeling method is characterized in that the model construction steps are as follows:

1) According to historical statistical data of a plurality of photovoltaic power stations in the area, a historical data set containing meteorological factors, photovoltaic predicted power and photovoltaic measured power of the photovoltaic power stations is constructed;

4) A Vine-Copula function is used to construct a correlation model between the predicted power and measured power of the photovoltaic power stations under different weather types, the specific process being as follows:

2. The multi-photovoltaic power station power prediction error modeling method of claim 1, characterized in that in step 1), for photovoltaic power stations equipped with meteorological and electrical measurement devices, a historical data set containing meteorological factors and photovoltaic power is constructed:

$W_{i,j} = [I_{i,j}, T_{i,j}, V_{i,j}]$

$D_{i,j} = [P_{i,j}^f, P_{i,j}^r]$

wherein subscript i represents the photovoltaic power station serial number; subscript j represents the historical data serial number; $W_{i,j}$ represents the jth group of meteorological historical data of the ith photovoltaic power station; $I_{i,j}$, $T_{i,j}$ and $V_{i,j}$ are the meteorological factors used for power prediction of the photovoltaic power station, namely solar irradiance, temperature and air pressure; $D_{i,j}$ represents the jth group of photovoltaic power historical data of the ith photovoltaic power station; $P_{i,j}^f$ is the predicted power value of the ith photovoltaic power station; and $P_{i,j}^r$ is the measured power value of the ith photovoltaic power station.

3. The multi-photovoltaic power station power prediction error modeling method of claim 2, characterized in that in step 2), according to the historical data set of meteorological factors of the photovoltaic power stations, the data set is divided into different weather types by the K-means clustering method; the K-means clustering method uses the Euclidean distance between samples to describe their similarity; taking the ith photovoltaic power station as an example, the Euclidean distance between data samples is:

$\Delta w_{j,k} = \lVert W_{i,j} - W_{i,k} \rVert$

wherein $\Delta w_{j,k}$ represents the Euclidean distance between the jth sample and the kth sample, and $\lVert \cdot \rVert$ represents the 2-norm.

4. The multi-photovoltaic power station power prediction error modeling method of claim 2, wherein the K-means clustering method is a typical unsupervised learning method that partitions the data set with the objective function:

$\min \sum_{c=1}^{K} \sum_{W_{i,j} \in c} \lVert W_{i,j} - W_{i,c} \rVert^2$

wherein K is the number of clusters, i.e. the number of weather types to be divided; $W_{i,c}$ is the cluster center of the cth cluster, determined as the expected value of the sample points belonging to that cluster; in the K-means clustering process, the number of clusters K is determined first and K sample points are randomly selected as the initial cluster centers; then, with minimizing the Euclidean distance from the sample points to the cluster centers as the optimization objective, the sample points assigned to each cluster and the cluster centers are updated repeatedly until convergence, realizing the division into different weather types.

5. The multi-photovoltaic power station power prediction error modeling method of claim 4, characterized in that the photovoltaic power station meteorological historical data set is divided by K-means clustering into data sets for sunny, cloudy, and rain and snow weather; the corresponding data sets are denoted by the variables $W_i^1$, $W_i^2$ and $W_i^3$, and the sample points in each data set are denoted by $W_{i,j}^1$, $W_{i,j}^2$ and $W_{i,j}^3$, wherein the superscripts 1, 2 and 3 correspond to sunny, cloudy, and rain and snow weather.

6. The multi-photovoltaic power station power prediction error modeling method of claim 5, wherein in step 2), after the photovoltaic power station meteorological historical data set has been divided by the K-means clustering method into meteorological data sets under the sunny, cloudy, and rain and snow weather types, the photovoltaic power data at the same moments are divided according to the time stamps of the meteorological data to obtain the photovoltaic power data sets under the sunny, cloudy, and rain and snow weather types; the corresponding data sets are denoted by the variables $D_i^1$, $D_i^2$ and $D_i^3$, and the sample points in each data set are denoted by $D_{i,j}^1$, $D_{i,j}^2$ and $D_{i,j}^3$.

7. The multi-photovoltaic power station power prediction error modeling method of claim 1, wherein in step 3), after the photovoltaic power data sets under the different weather types are obtained, the prediction error is calculated from the predicted and measured photovoltaic power data in the data sets:

$e_{i,j} = P_{i,j}^r - P_{i,j}^f$

wherein $e_{i,j}$ is the prediction error of the jth sample point of the ith photovoltaic power station; $P_{i,j}^f$ is the predicted power value of the ith photovoltaic power station; and $P_{i,j}^r$ is the measured power value of the ith photovoltaic power station.

8. The multi-photovoltaic power station power prediction error modeling method of claim 1, characterized in that in step 4-a), taking the data sets of two photovoltaic power stations under the sunny weather type as an example, the predicted and measured power values of the first photovoltaic power station are assumed to be $P_1^f$ and $P_1^r$, respectively, and the predicted and measured power values of the second photovoltaic power station are $P_2^f$ and $P_2^r$, respectively; then the correlation model between the variables $P_1^r$, $P_1^f$ and $P_2^f$ is expressed as:

$F(P_1^r,P_1^f,P_2^f) = C(F(P_1^r),F(P_1^f),F(P_2^f))$

wherein $F(\cdot)$ represents the cumulative probability distribution function of a variable, and $C(\cdot)$ represents the Copula function describing the correlation of the multi-dimensional variables; the probability density function describing the multi-dimensional variable correlation is further obtained:

$f(P_1^r,P_1^f,P_2^f) = c(F(P_1^r),F(P_1^f),F(P_2^f)) \cdot f(P_1^r) \cdot f(P_1^f) \cdot f(P_2^f)$

for ease of illustration, the variables $x_1$, $x_2$ and $x_3$ are used in place of $P_1^r$, $P_1^f$ and $P_2^f$, and the joint probability density function is further written as:

$f(x_1,x_2,x_3) = f(x_3) \cdot f(x_2|x_3) \cdot f(x_1|x_2,x_3)$

wherein $f(x_2|x_3)$ is further expressed as:

$f(x_2|x_3) = c_{23}(F(x_2),F(x_3)) \cdot f(x_2)$

similarly, $f(x_1|x_2,x_3)$ is further expressed as:

$f(x_1|x_2,x_3) = c_{13|2}(F(x_1|x_2),F(x_3|x_2)) \cdot c_{12}(F(x_1),F(x_2)) \cdot f(x_1)$

according to the above expressions, the joint probability density function of the variables $x_1$, $x_2$ and $x_3$ is expressed as:

$f(x_1,x_2,x_3) = c_{13|2}(F(x_1|x_2),F(x_3|x_2)) \cdot c_{12}(F(x_1),F(x_2)) \cdot c_{23}(F(x_2),F(x_3)) \cdot f(x_1) \cdot f(x_2) \cdot f(x_3)$

the key to constructing the D-vine correlation model is to select a suitable Copula function type for $c_{12}(\cdot)$, $c_{23}(\cdot)$ and $c_{13|2}(\cdot)$ and to determine its parameters.

9. The multi-photovoltaic power station power prediction error modeling method of claim 8, wherein in step 4-b), the optimal Copula function type and parameters of $c_{12}(\cdot)$, $c_{23}(\cdot)$ and $c_{13|2}(\cdot)$ are determined one by one using the Euclidean distance test; taking $c_{12}(\cdot)$ as an example, assuming that the variables $x_1$ and $x_2$ have m groups of historical data, the difference between the empirical and theoretical values of their joint distribution function is expressed as:

$d = \sqrt{\sum_{i=1}^{m} \left| C_{em}(F(x_{1,i}),F(x_{2,i})) - C(F(x_{1,i}),F(x_{2,i})) \right|^2}$

wherein $C_{em}(F(x_{1,i}),F(x_{2,i}))$ is the actual (empirical) probability value at the sample point $(x_{1,i},x_{2,i})$; $C(F(x_{1,i}),F(x_{2,i}))$ is the probability value calculated for that sample point by the theoretical Copula function; a suitable function type is selected from the Gaussian Copula, t Copula and Frank Copula and its parameters are optimized, so that the difference between the empirical and theoretical values of the joint distribution function of the variables is minimized.

10. The multi-photovoltaic power station power prediction error modeling method of claim 9, characterized in that when the predicted power values of the photovoltaic power stations are known, the probability distribution of the power prediction error is estimated according to the D-vine correlation model constructed in step 4); assuming that the predicted values of the first and second photovoltaic power stations are $x_2 = P_1^f$ and $x_3 = P_2^f$, respectively, the probability distribution of the actual power of the first photovoltaic power station is expressed as:

$f(x_1|x_2,x_3) = c_{13|2}(F(x_1|x_2),F(x_3|x_2)) \cdot c_{12}(F(x_1),F(x_2)) \cdot f(x_1)$

$\Delta x = x_1 - x_2$