Disclosure of Invention
Due to the complexity of the historical data of the 'wind-solar-load', the correlation of the 'wind-solar-load' is likely to change under different output scenes. Aiming at the characteristic, the invention provides a Copula-based AC/DC hybrid power grid probabilistic power flow calculation method, which adopts a Fuzzy C-means (FCM) Fuzzy clustering algorithm based on a target function, carries out scene division on multidimensional data by the FCM clustering method, further determines the optimal Copula function type and parameters between every two random variables, takes Anderson-Darling (AD) distance as a rattan structure judgment standard, judges the optimal rattan structure of each scene, establishes a hybrid rattan Copula-based wind-solar-load model, and finally carries out probabilistic power flow calculation on an AC/VSC-MTDC hybrid power grid based on a Latin hypercube sampling algorithm by combining the hybrid rattan Copula model.
In order to achieve the purpose, the invention adopts the technical scheme that: a Copula-based AC-DC series-parallel power grid probability load flow calculation method comprises the following steps:
step S1: inputting a historical data set D of random variables of the power system, the number c of clustering centers, a dimension p, a maximum iteration iter and a sample number n, and outputting a clustering scheme by using an FCM algorithm;
step S2: calculating to obtain an improved Xie-Beni index value corresponding to each clustering center number o according to the clustering scheme obtained in the step S1, and outputting the optimal clustering center number obestAnd its corresponding scene division set O ═ (O)1,O2,…,OK) K is equal to the number o of optimal cluster centersbest;
Step S3: input scene partition set O ═ (O)1,O2,…,OK) Outputting an optimal Copula function under each scene;
step S4: respectively building a C rattan Copula model and a D rattan Copula model under each scene according to the optimal Copula function selected in the step S3; inputting a collected historical data set D, and outputting an improved mixed rattan Copula model;
step S5: generating a 'wind-solar-load' sample set L according to the improved mixed rattan Copula model established in the step S4;
step S6: and inputting the wind-solar-load sample set L and the power system parameters generated in the step S5, and outputting the voltage of the AC-DC hybrid power grid, the line load of the AC-DC hybrid power grid and the control parameters of the VSC converter station through the AC-DC hybrid power grid probability load model.
Further preferably, the step S1 includes the following steps:
s11, giving a historical data set D ═ x1,x2,...,xk,...,xn),xkInputting dimension p, maximum iteration iter, weighting index and clustering center number o to be 2-10 respectively for kth data and n being sample number;
s12, initializing a cluster center set R ═ R1,r2,…,ri,…,rj,…ro};riIs the ith cluster center, rjIs the jth cluster center;
s13, calculating a membership matrix U ═ Uik};
Where m represents a weighting index, m ∈ [1, + ∞), typically let m ═ 2; u. ofikRepresenting data xkA degree of membership to class i that satisfies:
dikrepresenting data xkAnd a clustering center riDistance of djkRepresenting data xkAnd a clustering center rjA distance which satisfies:
dik(xk,ri)=||xk-ri|| (3)
s14, updating a clustering center set R;
s15, judging whether the iteration termination condition is met, if not, repeating S13 and S14 until the objective function J is reachedm(U, R) convergence;
t represents transposition, and the selection rule of the matrix A is that when A is a symmetrical positive definite matrix, | | | | is an European norm; when A is a unit matrix, dikRepresenting the euclidean distance;
and S16, obtaining 9 clustering schemes according to the similarity degree of each object in the historical data set and the selected clustering center during each clustering.
Further preferably, the step S2 includes the following steps:
s21, outputting improved Xie-Beni index values V corresponding to results obtained by each clustering schemezmjDetermining the optimal number o of cluster centers of the multi-dimensional databestOutputting a corresponding scene division result;
d(ri,r0)=||ri-r0|| (7)
in the formula, n
iThe number of the class i samples is represented,
is the mean value of the cluster centers;
is the intra-class total variation;
is the total variation between classes, d (r)
i,r
0) Representing the center of the cluster r
iDistance from the mean of the cluster centers;
s22, comparing improved Xie-Beni index values V of all clustering schemeszmjThe number O of clustering centers corresponding to the minimum index value is the optimal number of clustering centers of the multi-dimensional data, and the corresponding scene division set O is output (O)1,O2,…,OK);
S23, according to the FCM clustering analysis result, defining a proportion function Q (i) in each scene, and expressing the formula (9):
in the formula, n (O)i) Representing a scene OiThe number of data contained in (1).
Further preferably, the step S3 includes the following steps:
s31, obtaining an edge cumulative distribution function F (x) of each dimension of data of multi-dimensional random variables by a nonparametric kernel density estimation methodi)
S32, establishing grid intervals by using a Matlab self-carrying mesegrid function and a linear function, and obtaining an experience cumulative distribution function value C of the corresponding experience Copula function of the original multi-dimensional data at the corresponding interval between every two grid intervals by using handle function statisticsn;
S33, taking five Copula functions of Gaussian Copula, t-Copula, Gumbel Copula, Clayton Copula and Frank Copula as candidate Copula functions in each scene;
s34, based on the grid interval obtained in S32, utilizing a copulacdf function with a Matlab self-carrying function to obtain an accumulated distribution function value C of each alternative Copula function in a corresponding intervalb;
S35, calculating Euclidean distance d between the empirical Copula function and each alternative Copula functionEuThe calculation method is as follows:
wherein u isi,viRepresenting two multi-dimensional random variables;
s36, each scene is according to the Euclidean distance dEuAnd respectively selecting an optimal Copula function between every two of the wind power and photovoltaic power, the photovoltaic power and load and the wind power and load.
Further preferably, the building steps of improving the rattan structure of the mixed rattan Copula model in the step S4 are as follows:
s41, inputting a collected historical data set D, and acquiring sampling points of a C rattan Copula model and a D rattan Copula model in each scene;
according to the property of the Copula function, decomposing a joint cumulative distribution function of the multidimensional variable by using a conditional cumulative distribution function, wherein the conditional cumulative distribution function is obtained in the following way:
wherein F (x | e) represents a conditional cumulative distribution function, elRepresenting the l variable, e, in a certain vector e-lIndicates the removal of e from elThe vector remaining after, C (F (x | e)-l),F(el|e-l) Copula function for connecting two conditional cumulative distribution functions, which is a joint cumulative distribution function;
for multidimensional variables X1,X2,…,XnAssuming:
in the formula ZnRepresents [0,1 ]]Uniformly distributed data over the range, Z1For the first dimension of the joint cumulative distribution function, Z2Second dimension data, F (X), being a joint cumulative distribution function1) Is X1The cumulative distribution function of;
the C rattan and D rattan were chosen differently for the vector number l, for C rattan:
for D vines:
the sampling point obtaining mode based on the rattan structure model is that firstly, a p-dimensional sample set which obeys independent uniform distribution is randomly generated, equations are written according to the structures of the C rattan and the D rattan respectively, first-dimensional sample points are uniformly distributed sampling points, sampling points corresponding to a second dimension can be obtained by using the property of condition distribution, and sample points of the first dimension are obtained one by one according to the previous sample points in sequence until sample points of all dimensions are obtained;
s42, generating a simulation data set of two rattan structures in each scene according to the sampling points acquired in the step S41;
s43, obtaining a sample cumulative distribution function value C of the sample multidimensional data Copula function according to the simulation data set generated in the step S42b’;
S44, calculating an experience cumulative distribution function value C of the experience Copula function in each scene at the corresponding intervalnSum sample cumulative integral distribution function value Cb' AD distance between;
the weight of the data location is limited by a weight function,
in the formula, W (u)i,vi) Representing a weight function;
a function value C of the empirical cumulative distribution at the corresponding section based on the empirical Copula function acquired in step S32nEstablishing AD distance d between the Copula model of the mixed rattan and the original dataADIs represented by formula (16):
wherein p is the dimension;
s45, establishing a mixed rattan Copula model according to the average value of the AD distance.
More preferably, the sample set L generating step in step S5 is as follows: sampling a multidimensional random variable by utilizing Latin hypercube sampling; according to the improved mixed rattan Copula model established in the step S4, the multidimensional random can be knownCumulative distribution function F (X) of variable X, and at [0,1 ]]The interval is uniformly distributed; to obtain N samples, [0,1 ]]The interval is divided into N parts to obtain N non-overlapping subintervals, and the length of each interval is 1/N; randomly selecting a sample value from each subinterval by means of the inverse F of the cumulative distribution function F (X)-1(X) calculating a sampling value; if the subinterval midpoint is selected, the sampling point may be calculated as follows:
in the formula, xiIs the sampled value, where i is 1,2, …, N; the N value of each scene is determined by the proportion of the original data, and the specific calculation is according to the formula (9); after the data sets in each scene are obtained, the data sets are combined to generate a 'wind-solar-load' sample set L based on an improved mixed rattan Copula model.
The invention has the advantages that:
firstly, an improved mixed rattan Copula model is adopted to more carefully describe and model the complex correlation between wind and light loads, and the complex correlation is used for the probabilistic load flow calculation of an AC/VSC-MTDC hybrid power grid;
secondly, aiming at the problem that the 'wind-solar-load' data scene with complex correlation is unreasonably divided by the K-means clustering algorithm, the FCM clustering algorithm is applied to endow each object and each class with a weight. A more flexible result is provided for scene division of the wind, light and charge random variable;
third, for the problem that the CvM distance is not reasonable as a scale criterion for selecting the vine structure, it is proposed to apply the Anderson-Darling (AD) distance to give higher weight to the position of high-density data distribution. The modeling precision of the hybrid rattan Copula is improved, and the accuracy of the probability load flow calculation of the AC/VSC-MTDC hybrid power grid is further improved.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings. The embodiments described below with reference to the accompanying drawings are illustrative and are intended to be illustrative of the invention, and should not be construed as limiting the invention.
A Copula-based method for calculating the probability load flow of an alternating-current and direct-current hybrid power grid comprises the following steps:
step S1: inputting a historical data set D of random variables (mainly comprising continuous random variables such as wind speed, illumination and load) of a power system, the number o of clustering centers (set to be 2-10 types), the dimension p, the maximum iteration iter, the number of samples n and the like, and outputting 9 clustering schemes by using an FCM algorithm;
the method comprises the following specific steps:
s11, giving a historical data set D ═ x1,x2,...,xk,...,xn),xkN is the number of samples for the kth data. Inputting a dimension p, a maximum iteration iter, a weighting index and a clustering center number o which is set to be 2-10 respectively;
s12, initializing a cluster center set R ═ R1,r2,…,ri,…,rj,…ro};riIs the ith cluster center, rjIs the jth cluster center;
s13, calculating a membership matrix U ═ Uik};
Where m represents a weighting index, m ∈ [1, + ∞), typically let m ═ 2; u. ofikRepresenting data xkA degree of membership to class i that satisfies:
dikrepresenting data xkAnd a clustering center riDistance of djkRepresenting data xkAnd a clustering center rjA distance which satisfies:
dik(xk,ri)=||xk-ri|| (3)
s14, updating a clustering center set R;
s15, judging whether the iteration termination condition is met, if not, repeating S13 and S14 until the objective function J is reachedm(U, R) convergence;
t represents transposition, and the selection rule of the matrix A is that when A is a symmetrical positive definite matrix, | | | | is an European norm; when A is a unit matrix, dikRepresenting the euclidean distance.
And S16, obtaining 9 clustering schemes according to the similarity degree of each object in the historical data set and the selected clustering center during each clustering. That is, when the number o of clustering centers is 2, the data of the history data set is divided into two types (two sets), and when the number o of clustering centers is 3, the data of the history data set is divided into three types (three sets); and so on, dividing the data of the historical data set into o sets according to the number o of the clustering centers.
Step S2: according to the clustering scheme obtained in the step S1, obtaining an improved Xie-Beni index value corresponding to each clustering center number o by using formulas (7) to (8), and outputting an optimal clustering center number obestAnd its corresponding scene division set O ═ (O)1,O2,…,OK) K is equal to the number o of optimal cluster centersbest。
S21, outputting improved Xie-Beni index values V corresponding to results obtained by each clustering schemezmjDetermining an optimal clustering center number o of the multi-dimensional databestAnd outputting the corresponding scene division result.
d(ri,r0)=||ri-r0|| (7)
In the formula, n
iThe number of the class i samples is represented,
is a class-centered mean;
is the intra-class total variation;
is the total variation between classes, d (r)
i,r
0) Representing the center of the cluster r
iDistance from the mean of class centers;
s22, comparing improved Xie-Beni index values V of all clustering schemeszmjThe number O of clustering centers corresponding to the minimum index value is the optimal number of clustering centers of the multi-dimensional data, and the corresponding scene division set O is output (O)1,O2,…,OK)。
The smaller the total variation in the classes when clustering is considered, the greater the similarity of data in the classes, the greater the total variation between the classes, the greater the separation degree between the classes, and the greater the mean value of the maximum fuzzy value in each class, the more clear the clustering result. Therefore, the best clustering result for FCM clustering is theoretically correspondingly improved by Xie-Beni index value VzmjIs measured.
S23, according to the FCM clustering analysis result, defining a proportion function Q (i) in each scene, and expressing the formula (9):
in the formula, n (O)i) Represents class OiThe number of data contained in (1), and n is the number of samples.
Step S3:input scene partition set O ═ (O)1,O2,…,OK) And outputting the optimal Copula function under each scene.
S31, respectively obtaining an edge cumulative distribution function F (x) of each dimension of data of multi-dimensional random variables by a nonparametric kernel density estimation methodi)。
S32, establishing grid intervals by using a Matlab self-carrying mesegrid function and a linear function, and obtaining experience cumulative distribution function values C of corresponding experience Copula functions of original multi-dimensional data in corresponding intervals by using handle function statistics for each grid intervaln。
S33, taking the five Copula functions of Gaussian Copula, t-Copula, Gumbel Copula, Clayton Copula and Frank Copula as the candidate Copula functions in each scene.
S34, based on the grid interval obtained in S32, utilizing a copulacdf function with a Matlab self-carrying function to obtain an accumulated distribution function value C of each alternative Copula function in a corresponding intervalb。
S35, calculating Euclidean distance d between the empirical Copula function and each alternative Copula functionEuThe smaller the value obtained, the more the Copula function can express its correlation. The calculation method is as follows:
wherein C isbCumulative distribution function value, C, representing each alternative Copula functionnRepresents the empirical cumulative distribution function value, u, of the empirical Copula function at the corresponding intervali,viThe method represents two multi-dimensional random variables, which can be continuous random variables such as wind speed, illumination, load and the like.
S36, each scene is according to the Euclidean distance dEuAnd respectively selecting an optimal Copula function between every two of the wind power and photovoltaic power, the photovoltaic power and load and the wind power and load.
Step S4: respectively building a C rattan Copula model and a D rattan Copula model under each scene according to the optimal Copula function selected in the step S3; and inputting the collected historical data set D, and outputting an improved mixed rattan Copula model.
The construction steps of improving the rattan structure of the mixed rattan Copula model are as follows:
s41, inputting the collected historical data set D, and obtaining sampling points of the C rattan Copula model and the D rattan Copula model in each scene.
According to the property of the Copula function, a joint cumulative distribution function of multidimensional variables can be decomposed by using a conditional cumulative distribution function, and the variables corresponding to the obtained conditional cumulative distribution function are independent from each other. The conditional cumulative distribution function is obtained in the form of equation (11)
Wherein F (x | e) represents a conditional cumulative distribution function, elRepresenting the l variable, e, in a certain vector e-lIndicates the removal of e from elThe vector remaining after, C (F (x | e)-l),F(el|e-l) Is a joint cumulative distribution function, Copula function for connecting two conditional cumulative distribution functions.
Based on this theoretical basis, for multidimensional variable X1,X2,…,XnThe cumulative distribution function of (c) may be assumed as follows:
in the formula ZnRepresents [0,1 ]]Uniformly distributed data over the range, Z1For the first dimension of the joint cumulative distribution function, Z2Second dimension data, F (X), being a joint cumulative distribution function1) Is X1The cumulative distribution function of (a).
The C rattan and D rattan were chosen differently for the vector number l, for C rattan:
for D vines:
the sampling point acquisition method based on the rattan structure model comprises the steps of firstly randomly generating a p-dimensional sample set which obeys independent uniform distribution, writing equations according to the structure columns of the C rattan and the D rattan, respectively, enabling first-dimensional sample points to be uniformly distributed sampling points, obtaining sampling points corresponding to a second dimension by using the property of conditional distribution, and acquiring sample points of the first dimension one by one according to the previous one in sequence until obtaining sample points of all dimensions. Taking the example of establishing a scene Copula model by using a C rattan structure, a specific process can be summarized as follows: generating a set of num multiplied by p (dimension of data number ﹡) uniform variables Z by utilizing Matlab self-contained unifrnd function simulation;
let the first set of variables u to be solved1Equal to the first column Z of the first dimension data1I.e. u1=Z1;
Using equation (11), the second dimension to be solved for variable u
2Can utilize
But it is difficult to obtain each function expression because of the calculation. Thus, Z in the solution
2Generated by step i. C (u)
1,u
2) Represents u
1,u
2The function value at the corresponding Copula function is calculated by the Copula cdf function. Solving the right side, adding a while loop, and solving u by adopting a dichotomy in the loop
2Ending until the difference between the data calculated by the right side of the equation and the left side is very small, and then randomly generating a second dimension variable u to be solved
2The simulation data is the simulation data;
use of
And carrying out iterative solution on subsequent dimension simulation data.
And S42, generating a simulation data set of two rattan structures in each scene according to the sampling points acquired in the step S41.
Because Z ═ generated in step S41 (Z)1,Z2,…,Zn) The interval is still [0,1 ]]It is therefore necessary to convert these data into the desired analog data set according to the inverse of the estimated edge distribution function.
S43, obtaining a sample cumulative distribution function value C of the sample multidimensional data Copula function according to the simulation data set generated in the step S42b’。
Creating a grid interval by using a Matlab self-carrying mesgrid function and a link function, and acquiring a sample cumulative distribution function value C of the optimal Copula function corresponding to the step S3 in the corresponding interval by using a Matlab self-carrying function copulacdf functionb’。
S44, calculating an experience cumulative distribution function value C of the experience Copula function in each scene at the corresponding intervalnAnd sample cumulative distribution function value CbThe AD (Anderson-Darling) distance between'.
The AD distance is actually an improvement over the CvM distance. The weight of the data location is limited by a weight function (as shown in equation (15)). Which corresponds to giving higher weight when calculating the squared distance of the tail data. When the square distance of the intermediate data is calculated, smaller weight is given to the intermediate data, and the modeling precision of the rattan structure model can be tested more effectively.
In the formula, W (u)i,vi) Representing a weight function, Cb’(ui,vi) The values of the distribution functions are accumulated for the samples.
A function value C of the empirical cumulative distribution at the corresponding section based on the empirical Copula function acquired in step S32nEstablishing AD distance d between the Copula model of the mixed rattan and the original dataADIs represented by formula (16):
wherein p is the dimension;
s45, establishing a mixed rattan Copula model according to the average value of the AD distance.
The smaller the AD distance, the higher the modeling accuracy. Thus, d for each sceneADAnd taking the mean value, wherein the minimum value is the optimal rattan Copula model under the scene. Combining the rattan Copula models of all scenes together is the established improved mixed rattan Copula model.
Step S5: and generating a 'wind-solar-load' sample set L according to the improved mixed rattan Copula model established in the step S4.
The sample set L is generated as follows:
multidimensional random variables are sampled using Latin Hypercube Sampling (LHS). The sampling points can completely cover the distribution interval and cannot be repeated, and the method has higher sampling efficiency and better robustness.
According to the improved mixed rattan Copula model established in the step S4, the cumulative distribution function f (X) of the multidimensional random variable X is known, and is a random variable and is in [0,1 ]]The distribution is uniform. To obtain N samples, [0,1 ]]The interval is divided into N parts to obtain N non-overlapping subintervals, and the length of each interval is 1/N. Randomly selecting a sample value from each subinterval by accumulating the inverse F (X) of the distribution function F (X)-1(X) calculating the sample values. If the subinterval midpoint is selected, the sampling point may be calculated as follows:
in the formula, xiIs the sampled value, where i is 1,2, …, N. The N value of each scene is determined by its raw data fraction, and is calculated according to equation (9). After the data sets in each scene are obtained, the data sets are combined to generate a 'wind-solar-load' sample set L based on an improved mixed rattan Copula model.
Step S6: and inputting data such as the wind-solar-load sample set L and the power system parameters generated in the step S5, and outputting the voltage of the alternating-current and direct-current hybrid power grid, the line load of the alternating-current and direct-current hybrid power grid and the control parameters of the VSC converter station through the alternating-current and direct-current hybrid power grid probabilistic load model.
And based on actual historical data of wind speed, illumination and load of a certain provincial power grid, constructing a 'wind-light-load' model considering correlation. In order to verify the effectiveness of the method provided by the invention, the following algorithm is adopted for comparison:
(1) inputting historical data into an alternating current-direct current hybrid power grid group by group to perform probability load flow calculation, and obtaining a probability load flow calculation result, which is called a 'reference method' for short;
(2) the method is characterized in that probability modeling is carried out on historical data of wind-solar-load based on an algorithm provided by a reference document [ Qieyabin, Euonymus, Xubei, and the like ] wind-solar combined power generation correlation modeling based on a mixed rattan Copula model and application thereof in reactive power optimization [ J ] power grid technology, 2017(03):791 and 798 ], and probability load flow calculation is carried out, so that an obtained result is called as a 'comparison method';
(3) the method is characterized in that probability modeling is carried out on historical data of wind and light load on the basis of the algorithm provided by the invention, and probability load flow calculation is carried out, so that the method is called as the method.
Fig. 1 shows the dc voltage amplitude relative error percentage for the "proposed method" and the "comparative method" compared to the "reference method". As can be seen from fig. 1, the relative error of the "contrast method" is much larger than the "proposed method". The average value of the percentage of relative error of the DC voltage amplitude in the comparison method is 13.17 percent, and the average value of the percentage of relative error of the DC voltage amplitude in the proposed method is 3.41 percent. The experimental conclusion fully proves the effectiveness of the calculation method provided by the invention.
The main reasons are as follows:
the method belongs to a hard clustering method and has higher accuracy when the data features are distinct. However, the combined wind and light output has strong correlation and is easily influenced by many factors such as climate, geographical location, etc., and each object in the data set may have characteristics of two or more cluster sets. Therefore, when the scene division is performed, if each object in the data set is divided into a specific scene by using the K-means clustering algorithm, the result of the scene division is not accurate enough. The FCM algorithm adopted by the invention can overcome the defects. The FCM algorithm fuses the essence of a fuzzy theory, and obtains the membership degree of each sample point to all class centers by optimizing an objective function, so as to determine the class of the sample points. The similarity between the objects divided into the same cluster is the largest, the similarity between different clusters is the smallest, and the purpose of classifying the sample data is achieved.
The "contrast method" uses the rattan structure scale index Cramer Von Mise (CvM) distance measure which is the integral of the squared distance of the empirical cumulative distribution function and the target cumulative distribution function. The position distribution of the wind-solar combined output historical data has differences, and most of the differences are concentrated on the tail part. While CvM distance gives the same weight to all positional differences when considering the selection of the vine structure. Therefore, if the CvM distance is used as an evaluation, the key location of the data distribution will be difficult to get important considerations and concerns. If a large amount of wind and light data are distributed at the tail, the rattan structure selected based on the CvM rating index will seriously affect the modeling precision of the mixed rattan Copula. The AD distance chosen by the present invention is actually an improvement over the CvM distance. This weighting function is mainly used to evaluate the weight of the data location by adding a weighting function. Equivalently, higher weight is given when the square distance of the tail data is calculated; when the square distance of the intermediate data is calculated, smaller weight is given to the intermediate data, and the modeling precision of the rattan structure model can be more effectively tested.
While the invention has been described with reference to a preferred embodiment, it will be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the invention.