CN115775045A

CN115775045A - Photovoltaic balance prediction method based on historical similar days and real-time multi-dimensional study and judgment

Info

Publication number: CN115775045A
Application number: CN202211512958.XA
Authority: CN
Inventors: 陈龙; 杨卫东; 李盛盛; 张子谦; 梁淼; 涂金金; 邓箫
Original assignee: NARI Group Corp; Nari Information and Communication Technology Co
Current assignee: NARI Group Corp; Nari Information and Communication Technology Co
Priority date: 2022-11-29
Filing date: 2022-11-29
Publication date: 2023-03-10

Abstract

The invention discloses a photovoltaic balance prediction method based on historical similar days and real-time multi-dimensional study and judgment. The method establishes a cluster-based screening mechanism for historical similar days, provides a historical similar-day fitting strategy based on mutation-point adjustment and a real-time trend prediction strategy based on multiple models such as ARIMA, LSTM and LightGBM, and fits the two strategies with dynamic weights. The invention clusters historical days reasonably and can efficiently select the several historical days most similar to the prediction day; it optimizes the distributed photovoltaic real-time trend prediction mechanism and fuses the prediction results of the multiple models efficiently; and it provides a multivariate fitting and dynamic adjustment method for historical and real-time prediction results, which improves the accuracy of distributed photovoltaic balance prediction. The prediction results can meet the professional requirements of business lines such as dispatching, equipment, marketing and development, and are of significance for the safe and stable operation of the new-type power system.

Description

Photovoltaic balance prediction method based on historical similar days and real-time multi-dimensional study and judgment

Technical Field

The invention relates to a photovoltaic balance prediction method based on historical similar days and real-time multi-dimensional study and judgment, and belongs to the technical field of power grid digitization.

Background

Against the broad background of building a new-type power system, photovoltaic power generation has become one of the important forms of energy in the modern energy internet. As a renewable power generation technology, photovoltaic generation is being deployed at a steadily growing scale, and its accurate prediction has become an important research area in data mining. However, the volatility and intermittency of photovoltaic generation make the energy management and dispatching of the power system increasingly complex, and existing distributed photovoltaic balance prediction falls short of the desired accuracy, which adversely affects the operating stability of the power grid. Accurate and reliable distributed photovoltaic balance prediction therefore helps meet the professional requirements of business lines such as dispatching, equipment, marketing and development, and is of significance for building the new-type power system.

Distributed photovoltaics need orderly access and flexible consumption so that the safe operation of the power grid is guaranteed and the problems that intermittent photovoltaic output causes for the grid, such as difficult load balancing, unstable frequency and poor power quality, are solved. Relying on the enterprise-level middle platform, the electrical and non-electrical data collected from distributed photovoltaic users are integrated to support photovoltaic operation monitoring and power quality analysis in the transformer district. Load forecasting analysis at the day, hour and minute levels is carried out in combination with influencing factors such as weather and illumination intensity, and shared service capabilities for photovoltaic monitoring, analysis, alarm and forecasting are built, so as to promote the flexible consumption of distributed photovoltaics and guarantee the safe and stable operation of the new-type power system.

At present, the following defects exist in the aspect of distributed photovoltaic balance prediction:

1. Massive historical data are not effectively aggregated or reasonably analyzed. As work such as the county-wide rooftop photovoltaic scale-up pilots advances, the installed capacity of distributed photovoltaic generation keeps growing and produces massive historical data. Photovoltaic measurement data are collected and stored in a decentralized manner: on the transformer-district side they are spread across two master station systems, distribution automation and electricity consumption information acquisition. Because these data have not been effectively aggregated or reasonably analyzed, integrating photovoltaic measurement data is complicated and the data application chain is too long. The middle platform needs to aggregate the massive historical data further, and the related data need to be analyzed, studied and judged by a reasonable method.

2. Real-time operating data are not accessed and analyzed promptly and accurately. Distributed photovoltaics in the transformer district currently rely mainly on meter monitoring and lack monitoring of grid-connection switches, inverters and anti-islanding devices. Minute-level access to photovoltaic meter monitoring data from the terminal side has also not yet been achieved; these data include photovoltaic electrical collection items (telemetry, remote signaling, status, events) and non-electrical collection items (weather, temperature, humidity, illumination intensity, wind force, etc.). Accurate, efficient, comprehensive and stable access to the photovoltaic collection items still needs to be achieved, and photovoltaic balance prediction analysis and calculation carried out on that basis.

3. The timeliness and accuracy of distributed photovoltaic balance prediction need to be improved. The volatility and intermittency of photovoltaic generation make the energy management and dispatching of the power system increasingly complex. Information is collected at differing frequencies, and the collection frequency needs to be raised continuously beyond the hour level and the 15-minute level to meet the dispatching requirement for minute-level and even second-level prediction. Existing distributed photovoltaic balance prediction falls short of the desired accuracy and adversely affects the operating stability of the power grid, so its timeliness and accuracy need to be improved to support the professional requirements of business lines such as dispatching, equipment, marketing and development.

In view of the above, there is a need for solving the above technical problems in distributed photovoltaic balance prediction.

Disclosure of Invention

Purpose: To overcome the deficiencies of the prior art, the invention provides a photovoltaic balance prediction method based on historical similar days and real-time multi-dimensional study and judgment.

Technical scheme: To solve the above technical problems, the invention adopts the following technical scheme:

A photovoltaic balance prediction method based on historical similar days and real-time multi-dimensional study and judgment comprises the following steps:

Step 1: Screen out the input feature vectors with high contribution degrees according to the meteorological elements in the historical day data.

Step 2: Cluster the historical day data carrying the screened input features to obtain the clusters.

Step 3: Using the Pearson correlation coefficient method, select from the K clusters of historical day data the cluster most similar to the prediction day as the historical similar days, and screen the input feature values of the historical similar days.

Step 4: Adjust the mutation points of the input feature values of the historical similar days to obtain the adjusted input feature values of the historical similar days.

Step 5: Process the adjusted input feature values of the historical similar days with the historical similar-day weight fitting algorithm to obtain the historical similar-day weight fitting result.

Step 6: Perform multivariate real-time trend prediction on the adjusted input feature values of the historical similar days to obtain the real-time prediction result.

Step 7: Fit the historical similar-day weight fitting result and the real-time prediction result to obtain the final real-time prediction result.

Preferably, the step 1 includes the following steps:

Step 1-1: Obtain the historical day data $D = \{d_1, d_2, \ldots, d_{n_1}\}$, where $D$ is the set of all historical day data, $d$ is the data of one day in $D$, and $n_1$ is the total number of historical days. $d = \{(y_i, X_i),\ i = 1, 2, \ldots, m_1\}$, where $X_i$ is the input feature vector at the i-th time, $x_{ij}$ is the j-th input feature value at the i-th time, and $y_i$ is the output feature value, namely the actual photovoltaic output at the i-th time.

Step 1-2: From the historical day data $D$, calculate the feature contribution degree of each input feature with the Pearson correlation coefficient method, and keep the input feature vectors whose feature contribution degree is greater than or equal to a threshold.

Preferably, the step 2 includes the following steps:

Step 2-1: Divide the historical day data $D = \{d_1, d_2, \ldots, d_{n_1}\}$ by season (spring and autumn, summer, winter) and weather (sunny, non-sunny) into six disjoint subsets $D_1, D_2, \ldots, D_6$, and let $v = \{v_1, v_2, \ldots, v_6\}$ denote the cluster centers corresponding to the subsets, where $n_1$ is the total number of historical days.

Step 2-2: Compute the mean of all elements in subset $D_i$ and store it in $v_i$ as an initial cluster center, for $i = 1, 2, \ldots, 6$.

Step 2-3: When K equals the number of centers in $v$, use the $v_i$ as the initial cluster centers for k-means.

Step 2-4: When K is less than the number of centers in $v$, calculate the correlation between every pair of $v_i$ with the Pearson correlation coefficient method, store the results in an upper triangular matrix $R$, and find the two cluster centers with the largest correlation coefficient $r_{pq}$, where $r_{pq}$ is the Pearson correlation coefficient between cluster centers $v_p$ and $v_q$.

Step 2-5: Merge the sets corresponding to the two cluster centers with the largest correlation to obtain the fused cluster center $v'$.

Step 2-6: Repeat steps 2-4 and 2-5 until the number of centers in $v$ equals K, and output the initial cluster centers $v'$.

Step 2-7: Use $v' = \{v'_1, v'_2, \ldots, v'_K\}$ as the initial cluster centers.

Step 2-8: For each sample $d_i$ in the historical day data, calculate its Euclidean distances to the K cluster centers and assign it to the cluster whose center is nearest.

Step 2-9: For each cluster $D_i$, recalculate its cluster center $v'_i$.

Step 2-10: Repeat steps 2-8 and 2-9 until the cluster centers are stable, and output the clusters.

Step 2-11: Calculate the silhouette (contour) coefficient of the clusters, and select the K with the largest silhouette coefficient as the final number of clusters.

Preferably, the calculation formula of the fused cluster center v' is as follows:

In the formula: $D_i$ denotes the i-th set to be fused, $D_j$ the j-th set to be fused, $m_2$ the number of sets to be fused at the same time, $|D_i|$ the number of samples in cluster $D_i$, $K$ the number of clusters to be formed, $v_i$ the cluster center of the set $D_i$ to be fused, and $v'$ the cluster center obtained after the $m_2$ sets are fused.

Preferably, the step 3 includes the following steps:

Step 3-1: Using the Pearson correlation coefficient method, select from the K clusters the cluster most similar to the prediction day as the historical similar days.

Step 3-2: Apply the historical similar-day feature value screening algorithm to the data of the historical similar days to obtain the input feature values of the historical similar days.

The calculation formula of the historical similar-day feature value screening algorithm is as follows:

where:

denotes selecting, from the $n_1$ historical days, the $m_3$ historical days whose selected feature values differ least from those of the prediction day,

is the p-th input feature value at the i-th time of the historical day, and $r_p$ is the Pearson similarity coefficient corresponding to the p-th input feature value.

is the p-th input feature value at the i-th time of the prediction day, $r_j$ is the Pearson similarity coefficient corresponding to the j-th input feature value, $a_3$ is the number of input feature values, and $n_3$ is the number of time values of the prediction day up to the current time.

Preferably, the step 4 includes the following steps:

Step 4-1: Apply a least-squares high-order polynomial fitting algorithm to the feature values of the historical similar days at all times to obtain the regression fitting curve $f(x)$.

Step 4-2: Calculate the median absolute deviation of all feature values $X_i$ of the historical similar days, $\mathrm{MAD} = \mathrm{median}(|X_i - X'_i|)$, where $\mathrm{median}(X)$ denotes the median of $X$, $X'_i$ is the value on the regression fitting curve $f(x)$ corresponding to $X_i$, and $X_i$ is the input feature value at the i-th time in the historical day data.

Step 4-3: Obtain the mutation-point adjusted value $X''_i$ of the input feature values of the historical similar days, calculated as follows:

where $X''_i$ is the adjusted value of $X_i$ and $\alpha_{mad}$ is a coefficient.
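Steps 4-2 and 4-3 can be illustrated with a minimal Python sketch. The adjustment rule used here is an assumption, because the formula of step 4-3 does not appear in the text: a point is treated as a mutation point when its residual against the fitted curve exceeds `alpha_mad * MAD`, and it is then replaced by the fitted value. The function name `adjust_mutation_points` and the default `alpha_mad=3.0` are hypothetical. The fitted values $X'_i$ are passed in, so any least-squares polynomial fit can supply them.

```python
from statistics import median


def adjust_mutation_points(values, fitted, alpha_mad=3.0):
    """Return (adjusted values, MAD) for one feature series.

    values : observed feature values X_i
    fitted : values X'_i read from the regression curve f(x)
    Assumed rule (not from the source): replace X_i by X'_i when
    |X_i - X'_i| > alpha_mad * MAD, otherwise keep X_i.
    """
    residuals = [abs(x - f) for x, f in zip(values, fitted)]
    mad = median(residuals)  # MAD = median(|X_i - X'_i|), as in step 4-2
    adjusted = [
        f if r > alpha_mad * mad else x
        for x, f, r in zip(values, fitted, residuals)
    ]
    return adjusted, mad


# Illustrative values only: the fourth point jumps far away from the curve.
values = [1.0, 2.1, 2.9, 10.0, 5.1]
fitted = [1.0, 2.0, 3.0, 4.0, 5.0]
adjusted, mad = adjust_mutation_points(values, fitted)
```

Because the median is insensitive to a few large residuals, the threshold is set by the typical deviation from the curve rather than by the mutation points themselves.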

Preferably, the step 5 includes the following steps:

Step 5-1: For the $m_3$ most similar historical days after mutation-point adjustment, calculate in turn the Pearson correlation coefficient between each day's data and the prediction-day data to obtain the Pearson similarity $r'_i$ of each most similar historical day.

Step 5-2: Arrange the historical day data in chronological order and divide them into three time periods: within $t_1$ years, between $t_1$ and $t_2$ years, and more than $t_2$ years. Set a different weight for each period, $\beta = \{\beta_1, \beta_2, \beta_3\}$: $\beta_1$ is the weight of events in the range 0 to $t_1$, $\beta_2$ is the weight of events in the range $t_1$ to $t_2$, and $\beta_3$ is the weight of events beyond $t_2$, where $0 < \beta_1, \beta_2, \beta_3 < 1$ and $\beta_3 = 1 - \beta_1 - \beta_2$.

Step 5-3: Obtain the historical similar-day weight fitting result $d_{history}$ with the historical similar-day weight fitting algorithm, whose calculation formula is as follows:

where $r'_i$ and $r'_j$ are the Pearson correlation coefficients between the i-th and j-th most similar historical day data and the prediction day, $T_i$ and $T_j$ are the weights $\beta$ of the time periods to which the i-th and j-th historical day data belong, $(y'_i, X'_i) = d'_i$ denotes the i-th most similar historical day data, $\theta_i$ is a parameter of the actual photovoltaic output power of the i-th historical day, and $m_3$ is the number of historical similar days.

Preferably,

where $V_i$ is the total installed photovoltaic capacity of the area at the time of the i-th historical day and $V_0$ is the total installed capacity of the area at the time of the prediction day.
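A minimal Python sketch of one way step 5-3 could combine the similar days is given below. The combination rule is an assumption inferred only from the symbol definitions, since the formula itself does not appear in the text: each day receives a normalized weight $r'_i T_i / \sum_j r'_j T_j$, and its output curve $y'_i$ is scaled by $\theta_i$ before summation. The function name `fit_history` and all sample values are hypothetical.

```python
def fit_history(day_outputs, pearson_r, period_weights, thetas):
    """Weighted combination of the m_3 most similar historical days.

    day_outputs    : list of m_3 output curves y'_i (one list per day)
    pearson_r      : similarity r'_i of each day to the prediction day
    period_weights : T_i, the beta weight of the period each day falls in
    thetas         : theta_i, scaling parameter of each day's output
    Assumed rule (not from the source):
        d_history[k] = sum_i (r'_i * T_i / sum_j r'_j * T_j) * theta_i * y'_i[k]
    """
    raw = [r * t for r, t in zip(pearson_r, period_weights)]
    total = sum(raw)
    if total <= 0:
        raise ValueError("sum of r'_i * T_i must be positive")
    horizon = len(day_outputs[0])
    return [
        sum(w / total * th * day[k] for w, th, day in zip(raw, thetas, day_outputs))
        for k in range(horizon)
    ]


# Illustrative values only: two equally similar days from the same period.
d_history = fit_history(
    day_outputs=[[10.0, 20.0], [30.0, 40.0]],
    pearson_r=[0.9, 0.9],
    period_weights=[0.5, 0.5],
    thetas=[1.0, 1.0],
)
```

With equal similarity and equal period weights the result is the plain average of the two curves; a more recent or more similar day would pull the result toward its own curve.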

Preferably, the step 6 comprises the following specific steps:

Step 6-1: Input the adjusted input feature values of the historical similar days into each time series prediction algorithm to obtain the prediction results $d_{i-future}$.

Step 6-2: Calculate the accuracy of each time series prediction algorithm, discard the algorithms whose accuracy is below the accuracy threshold, sort the remaining $m_6$ algorithms in descending order of accuracy, and calculate the weight $\gamma_i$ of each algorithm.

where i is the rank of an algorithm among the remaining $m_6$ algorithms after sorting, and j denotes the j-th of the remaining $m_6$ algorithms.

Step 6-3: From the prediction results $d_{i-future}$ and the weights $\gamma_i$ of the algorithms, obtain the real-time prediction result $d_{future}$.

where $\gamma_i$ is the weight of the i-th algorithm and $d_{i-future}$ is the prediction result of the i-th algorithm.
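The fusion in step 6-3 can be sketched in Python as a weighted sum of the per-algorithm forecasts. This is a sketch under assumptions: the weights are taken as given (the rule that derives $\gamma_i$ from the accuracy ranking is not reproduced in the text), and they are normalized here so that they sum to 1. The function name `fuse_predictions` and the sample numbers are hypothetical.

```python
def fuse_predictions(predictions, weights):
    """Fuse per-algorithm forecasts into one real-time forecast.

    predictions : list of forecasts d_{i-future}, one list per algorithm
    weights     : non-negative weight of each algorithm (gamma_i before
                  normalization); normalizing is an assumption of this sketch
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    gammas = [w / total for w in weights]
    horizon = len(predictions[0])
    return [
        sum(g * pred[t] for g, pred in zip(gammas, predictions))
        for t in range(horizon)
    ]


# Illustrative values only: two algorithms, two forecast time points.
d_future = fuse_predictions([[1.0, 2.0], [3.0, 4.0]], weights=[0.75, 0.25])
```

The algorithm with the larger weight dominates the fused curve, which is the intent of ranking the algorithms by accuracy before weighting them.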

Preferably, the accuracy is calculated as follows:

$C = 1 - E_{rmse}$

where:

in the formula, $n$ is the number of samples,

is the actual power at time i,

is the predicted power at time i, and $V_i$ is the start-up capacity at time i.

The accuracy threshold $C_0$ is calculated as follows:

where $C_i$ is the accuracy of the i-th algorithm and $n_6$ is the total number of algorithms performing real-time trend prediction.
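The accuracy check and the elimination step can be sketched in Python as follows. The form of $E_{rmse}$ is an assumption, because its formula does not appear in the text: it is taken as the root mean square of the power error normalized by the capacity $V_i$ at each time. The threshold is passed in as a parameter because the formula for $C_0$ is likewise missing. The names `accuracy` and `keep_accurate` and all numbers are hypothetical.

```python
import math


def accuracy(actual, predicted, capacity):
    """C = 1 - E_rmse, with E_rmse assumed to be the capacity-normalized RMSE."""
    n = len(actual)
    squared = sum(((a - p) / v) ** 2 for a, p, v in zip(actual, predicted, capacity))
    return 1.0 - math.sqrt(squared / n)


def keep_accurate(accuracies, threshold):
    """Drop algorithms below the threshold and sort the rest by accuracy, descending.

    accuracies : dict mapping algorithm name -> accuracy C_i
    threshold  : accuracy threshold C_0 (supplied by the caller in this sketch)
    """
    kept = [(name, c) for name, c in accuracies.items() if c >= threshold]
    return sorted(kept, key=lambda item: item[1], reverse=True)


# Illustrative values only.
c = accuracy(actual=[50.0, 60.0], predicted=[45.0, 65.0], capacity=[100.0, 100.0])
ranked = keep_accurate({"ARIMA": 0.90, "LSTM": 0.95, "LightGBM": 0.80}, threshold=0.85)
```

Normalizing by capacity keeps the accuracy comparable between periods with different amounts of photovoltaic capacity in operation.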

Preferably, the step 7 includes the following steps:

Step 7-1: From the historical similar-day weight fitting result $d_{history}$, calculate the adjusted historical value $d'_{history}$.

where

is the actual power at time i,

is the historical-day power at time i, and $n_7$ is the total number of time points of the similar day.

Step 7-2: Calculate, for each of the historical similar-day weight fitting result $d_{history}$, the adjusted historical value $d'_{history}$ and the real-time prediction result $d_{future}$, the Euclidean distances to the three time points before the current time of the prediction day.

Step 7-3: Sort the three Euclidean distances of the i-th time point in ascending order, where the j-th Euclidean distance is $s_{ij}$ and the prediction result corresponding to that Euclidean distance is recorded as

Calculate the first fitting result $d_{fitting-1}$ and the second fitting result $d_{fitting-2}$.

In the formula, $\sigma_i$ is the weight in the fitting of the i-th time point before the current time of the prediction day, and $K$ is the number of clusters obtained by the clustering.

Step 7-4: Calculate the accuracy of each of the historical similar-day weight fitting result $d_{history}$, the adjusted historical value $d'_{history}$, the real-time prediction result $d_{future}$, the first fitting result $d_{fitting-1}$ and the second fitting result $d_{fitting-2}$, and select the result with the highest accuracy as the final real-time prediction result.

As a preferred embodiment, wherein

Beneficial effects: The invention provides a photovoltaic balance prediction method based on historical similar days and real-time multi-dimensional study and judgment. It establishes a cluster-based screening mechanism for historical similar days, provides a historical similar-day fitting strategy based on mutation-point adjustment and a real-time trend prediction strategy based on multiple models such as ARIMA, LSTM and LightGBM, and fits the two strategies with dynamic weights. The accuracy of distributed photovoltaic balance prediction is thereby improved, and the safe and stable operation of the new-type power system is supported. Compared with the prior art, the method has the following advantages:

1. A historical similar-day cluster screening strategy is established. An initial cluster center selection strategy and a multivariate initial cluster fusion algorithm are provided, and an improved historical similar-day cluster analysis algorithm is designed so that the historical days can be clustered reasonably. Based on the multiple feature values of the photovoltaic electrical collection items (telemetry, remote signaling, status, events) and non-electrical collection items (weather, temperature, humidity, illumination intensity, wind force, etc.), the historical similar-day feature value screening algorithm is executed, and the several historical days most similar to the prediction day can be selected within the target cluster.

2. The distributed photovoltaic real-time trend prediction mechanism is optimized. Algorithm models such as the time series model ARIMA, LSTM and LightGBM are fused, and a multivariate prediction result fusion strategy is provided based on the accuracy the models achieve in actual operation, which strengthens the generalization ability of the fused model and improves the accuracy of distributed photovoltaic real-time trend prediction.

3. A multivariate fitting and dynamic adjustment method for historical and real-time prediction results is provided. A historical similar-day weight fitting algorithm is provided to form a prediction result based on historical data, and the historical prediction result is adjusted according to the actual operating values. A fitting algorithm for historical and real-time predictions improves the accuracy of the prediction result, and the overall distributed photovoltaic balance prediction method is adjusted in real time according to how well the multiple prediction results match the actual values.

Drawings

FIG. 1 is an overall flow chart of the method of the present invention.

Fig. 2 is a flow chart of a cluster center initial point selection strategy.

Fig. 3 is a graph showing a fusion strategy of the multivariate prediction results.

Detailed Description

The present invention will be further described with reference to the following examples.

As shown in Fig. 1, a photovoltaic balance prediction method based on historical similar days and real-time multi-dimensional study and judgment includes the following steps:

S01: Screen out the input feature vectors with high contribution degrees according to the meteorological elements in the historical day data.

The historical day data contain the meteorological elements temperature, humidity, irradiance, wind speed, wind direction and pressure. These elements influence photovoltaic output to different degrees, and using too many irrelevant meteorological elements in photovoltaic output prediction adds computational redundancy and reduces accuracy. The feature contribution degree is therefore analyzed quantitatively with the Pearson correlation coefficient method to preliminarily simplify the input feature vector and eliminate meteorological elements that are unrelated to photovoltaic output or have little influence on it.

Obtain the historical day data $D = \{d_1, d_2, \ldots, d_{n_1}\}$, where $D$ is the set of all historical day data, $d$ is the data of one day in $D$, and $n_1$ is the total number of historical days. $d = \{(y_i, X_i),\ i = 1, 2, \ldots, m_1\}$, where $X_i$ is the input feature vector at the i-th time, $x_{ij}$ is the j-th input feature value at the i-th time (the input feature values include temperature, humidity, irradiance, wind speed, wind direction and pressure), and $y_i$ is the output feature value, namely the actual photovoltaic output at the i-th time. i corresponds to one time value in each period of the day; for example, with one point every 15 minutes there are 96 time values in a day, so each historical day's data contain the input feature vectors and the output feature values of actual photovoltaic output for those 96 time values. j is the index of the input feature value.

From the historical day data $D$, the feature contribution degree of each input feature is calculated with the Pearson correlation coefficient method, and the input feature vectors whose feature contribution degree is greater than or equal to a threshold are kept.

The feature contribution degree is analyzed quantitatively for all factors that influence photovoltaic power generation, including temperature, humidity, irradiance, wind speed, wind direction and pressure. The Pearson correlation coefficient is calculated as follows:

$$r = \frac{\sum_{i=1}^{m_1}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{m_1}(x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{m_1}(y_i - \bar{y})^2}}$$

where $x_i$ is the input feature value; $\bar{x}$ is the mean of the input feature; $y_i$ is the actual photovoltaic output power; $\bar{y}$ is the mean of the actual photovoltaic output; $m_1$ is the number of input time values; and $r$ is the Pearson correlation coefficient, taken here as the contribution degree. Denote the contribution degree as $r_w$. If $|r_w| < r_0$, the input feature has little influence on the actual photovoltaic output and is deleted. If $|r_w| \ge r_0$, the input feature has a high contribution degree and is retained. The contribution degrees of all input features are calculated in this way to preliminarily simplify the input feature vector. $r_0$ can be configured by the user according to local conditions.
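The screening in S01 can be sketched in a few lines of Python. The function names `pearson` and `screen_features`, the feature names and the sample values are illustrative assumptions, and returning 0 for a constant series is a choice made for this sketch only.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: a constant series contributes nothing
    return cov / math.sqrt(var_x * var_y)


def screen_features(features, pv_output, r0):
    """Keep the input features whose contribution |r_w| is at least r0.

    features  : dict mapping feature name -> values over the time points
    pv_output : actual photovoltaic output y_i over the same time points
    """
    contributions = {name: pearson(vals, pv_output) for name, vals in features.items()}
    kept = [name for name, r in contributions.items() if abs(r) >= r0]
    return kept, contributions


# Illustrative values only (not taken from the patent).
features = {
    "irradiance": [0.0, 200.0, 600.0, 900.0, 500.0, 100.0],
    "wind_direction": [10.0, 350.0, 180.0, 90.0, 270.0, 45.0],
}
pv_output = [0.0, 1.1, 3.2, 4.6, 2.7, 0.5]
kept, contributions = screen_features(features, pv_output, r0=0.8)
```

In this toy data irradiance follows the output closely and is kept, while wind direction is only weakly correlated and is dropped.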

S02: Cluster the historical day data carrying the screened input features to obtain the clusters.

The meteorological elements (including temperature, humidity, irradiance, wind speed, wind direction and pressure) of historical days in different seasons and under different weather differ greatly and show typical clustering characteristics. Performing cluster analysis on all historical days therefore makes it possible to find the historical similar days with the greatest similarity in less time.

Clustering organizes a data set into groups whose members are similar in some respect, and is a technique for discovering such internal structure. The K-means algorithm is the best-known partitional clustering algorithm and, because of its simplicity and efficiency, the most widely used of all clustering algorithms. The algorithm updates iteratively against a given clustering objective function; each iteration proceeds in the direction that reduces the objective function, and the final clustering result brings the objective function to a (local) minimum, which gives a good classification.

However, the original k-means algorithm has drawbacks, such as sensitivity to the initial cluster centers: different initial centers may lead to different clustering results. To address this drawback, the invention uses an improved algorithm to select the initial cluster centers, so that k-means clusters better than it does with randomly selected initial points.

In the invention, preprocessing the data with K-means requires the number of clusters K to be specified in advance, and K is difficult to determine without prior knowledge. The clustering is therefore computed for K = 3, 4, 5 and 6, the clustering quality for each value of K is evaluated with the silhouette (contour) coefficient, and the K with the largest silhouette coefficient is selected as the final number of clusters.

The initial point selection strategy of the cluster center is shown in fig. 2, and comprises the following steps:

(1) Divide the historical day data $D = \{d_1, d_2, \ldots, d_{n_1}\}$ by season (spring and autumn, summer, winter) and weather (sunny, non-sunny) into six disjoint subsets $D_1, D_2, \ldots, D_6$, and let $v = \{v_1, v_2, \ldots, v_6\}$ denote the cluster centers corresponding to the subsets, where $n_1$ is the total number of historical days.

(2) Compute the mean of all elements in subset $D_i$ and store it in $v_i$ as an initial cluster center, for $i = 1, 2, \ldots, 6$.

(3) When K equals the number of centers in $v$, use the $v_i$ as the initial cluster centers of k-means.

(4) When K is less than the number of centers in $v$, calculate the correlation between every pair of $v_i$ with the Pearson correlation coefficient method, store the results in an upper triangular matrix $R$, and find the two cluster centers with the largest correlation coefficient $r_{pq}$, where $r_{pq}$ is the Pearson correlation coefficient between cluster centers $v_p$ and $v_q$.

(5) Merge the sets corresponding to the two cluster centers with the largest correlation to obtain the fused cluster center $v'$; specifically, the merge uses the multivariate initial cluster fusion algorithm.

The invention provides a multivariate initial cluster fusion algorithm for merging several sets, as follows:

In the formula: $D_i$ denotes the i-th set to be fused, $D_j$ the j-th set to be fused, $m_2$ the number of sets to be fused together, $|D_i|$ the number of samples in cluster $D_i$, $K$ the number of clusters to be formed, $v_i$ the cluster center of the set $D_i$ to be fused, and $v'$ the cluster center obtained after the $m_2$ sets are fused.

(6) Repeat steps (4) and (5) until the number of centers in $v$ equals K, and output the initial cluster centers $v'$.
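The initial-center selection in steps (1) to (6) can be sketched in Python as below. The fusion rule is an assumption: the fused center is taken as the sample-count-weighted mean of the two merged centers, which matches the symbols defined for the fusion formula ($|D_i|$, $v_i$) but is not confirmed by the text. The function names and sample values are hypothetical.

```python
import math
from itertools import combinations


def pearson(u, v):
    """Pearson correlation coefficient between two cluster centers."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    var_u = sum((a - mu) ** 2 for a in u)
    var_v = sum((b - mv) ** 2 for b in v)
    if var_u == 0 or var_v == 0:
        return 0.0  # assumption: constant vectors are treated as uncorrelated
    return cov / math.sqrt(var_u * var_v)


def fuse_initial_centers(centers, sizes, k):
    """Merge the most correlated pair of centers until only k remain.

    centers : initial centers v_i (one per season/weather subset)
    sizes   : number of samples |D_i| behind each center
    Assumed fusion rule: size-weighted mean of the two merged centers.
    """
    centers = [list(c) for c in centers]
    sizes = list(sizes)
    while len(centers) > k:
        # Pair (p, q) with the largest Pearson coefficient; p < q always holds.
        p, q = max(
            combinations(range(len(centers)), 2),
            key=lambda pq: pearson(centers[pq[0]], centers[pq[1]]),
        )
        total = sizes[p] + sizes[q]
        merged = [
            (sizes[p] * a + sizes[q] * b) / total
            for a, b in zip(centers[p], centers[q])
        ]
        del centers[q], sizes[q]  # remove q first so index p stays valid
        centers[p], sizes[p] = merged, total
    return centers, sizes


# Illustrative values only: three centers reduced to K = 2.
fused, fused_sizes = fuse_initial_centers(
    centers=[[1.0, 2.0, 3.0], [2.0, 4.0, 6.1], [3.0, 1.0, 2.0]],
    sizes=[10, 30, 5],
    k=2,
)
```

Starting from the season and weather subsets rather than from random points gives k-means a deterministic, physically meaningful initialization.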

The initial cluster centers obtained by the above steps are used for the K-means clustering, which aggregates the historical day data into K clusters. The specific steps are as follows:

(1) Use the already determined $v' = \{v'_1, v'_2, \ldots, v'_K\}$ as the initial cluster centers.

(2) For each sample $d_i$ in the data set, calculate its Euclidean distances to the K cluster centers and assign it to the cluster whose center is nearest. The Euclidean distance $\rho_{ij}$ is calculated as follows:

$$\rho_{ij} = \lVert d_i - v'_j \rVert_2$$

where $d_i$ is the i-th sample in the data set and $v'_j$ is the fused j-th cluster center.

(3) For each cluster $D_i$, recalculate its cluster center $v'_i$ as the centroid of all samples belonging to the cluster:

$$v'_i = \frac{1}{|D_i|}\sum_{d \in D_i} d$$

where $D_i$ is a cluster formed by the clustering, $|D_i|$ is the number of samples in cluster $D_i$, and $d$ is a sample in cluster $D_i$.

(4) Repeat steps (2) and (3) until the cluster centers are stable, and output the clusters.

(5) Calculate the silhouette (contour) coefficient of the clusters, and select the K with the largest silhouette coefficient as the final number of clusters.

The above steps give the clusters for the current value of K. The K-means algorithm is sensitive to K, and different values of K lead to different clustering results. The silhouette (contour) coefficient is therefore used to evaluate the clustering quality for each value of K, and the K with the largest silhouette coefficient is finally selected as the number of clusters.
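The assignment and update loop of K-means with preset initial centers can be sketched in plain Python as follows. The names `kmeans` and `assign`, the stopping tolerance, the iteration cap and the handling of an empty cluster are assumptions of this sketch.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign(samples, centers):
    """Index of the nearest center for every sample."""
    return [
        min(range(len(centers)), key=lambda j: euclidean(s, centers[j]))
        for s in samples
    ]


def kmeans(samples, init_centers, max_iter=100, tol=1e-9):
    """K-means started from given centers; returns (centers, labels)."""
    centers = [list(c) for c in init_centers]
    for _ in range(max_iter):
        labels = assign(samples, centers)
        new_centers = []
        for j, old in enumerate(centers):
            members = [s for s, lab in zip(samples, labels) if lab == j]
            if members:
                # Centroid: coordinate-wise mean of the cluster members.
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(old)  # assumption: keep an empty cluster's center
        shift = max(euclidean(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift <= tol:  # centers are stable
            break
    return centers, assign(samples, centers)


# Illustrative values only: two well-separated groups of "days".
samples = [[0, 0], [0, 1], [10, 10], [10, 11]]
centers, labels = kmeans(samples, init_centers=[[1, 1], [9, 9]])
```

Running the function once for each candidate K (3 to 6 in the text) and scoring each result is then enough to pick the final number of clusters.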

The silhouette (contour) coefficient combines the cohesion and the separation of the clusters to evaluate the clustering quality. It is calculated for every sample point $d_i$ in every cluster. Specifically, the following two indices are calculated for each sample point $d_i$:

a(i): the average distance from sample point $d_i$ to the other sample points in the same cluster. The smaller a(i) is, the more the sample belongs to that cluster; a(i) quantifies the cohesion within the cluster.

b(i): take a cluster $D_j$ that does not contain $d_i$ and calculate the average distance $b_{ij}$ from $d_i$ to all samples in $D_j$; traverse all other clusters and take the minimum of these average distances as b(i), $b(i) = \min(b_{i1}, b_{i2}, \ldots, b_{ik})$. b(i) quantifies the separation between clusters.

The silhouette coefficient of sample point $d_i$ is:

$$s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}$$

The silhouette coefficients of all sample points $d_i$ are calculated, and their average is the overall silhouette coefficient $S$ of the clustering under the current value of K, which measures how tight the clusters are. $S \in [-1, 1]$, and the closer $S$ is to 1, the better the clustering.

And finally, selecting K corresponding to the maximum value of the profile coefficient as the number of the final clusters, and curing the K clusters. And (3) executing a historical similar day clustering analysis process once every 1 day 0.
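As an illustration of the evaluation described above, the following sketch computes the overall coefficient $S$ for a given clustering. The function name `silhouette` and the score of 0 for singleton clusters are assumptions (the latter is a common convention); Euclidean distance is assumed as the distance measure.

```python
import math

def silhouette(samples, labels):
    """Mean silhouette coefficient S of a clustering with at least two clusters."""
    clusters = {}
    for s, lab in zip(samples, labels):
        clusters.setdefault(lab, []).append(s)
    scores = []
    for s, lab in zip(samples, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # assumed convention for a singleton cluster
            continue
        # a(i): mean distance to the other members of the same cluster.
        a = sum(math.dist(s, o) for o in own) / (len(own) - 1)
        # b(i): smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(s, o) for o in other) / len(other)
            for key, other in clusters.items()
            if key != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return sum(scores) / len(scores)

samples = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
good = silhouette(samples, [0, 0, 1, 1])  # compact, well separated clusters
bad = silhouette(samples, [0, 1, 0, 1])   # clusters that mix the two groups
```

Here `good` is close to 1 and `bad` is negative, which is the ordering used to pick K.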

S03: Using the Pearson correlation coefficient method, select from the K clusters of historical day data the cluster most similar to the predicted day as the historical similar days, and screen the input characteristic values of the historical similar days.

Each historical day records the electrical quantity (photovoltaic generation power) and the meteorological elements (temperature, humidity, irradiance, wind speed, wind direction and pressure). To select the historical days whose meteorological elements are most similar to those of the predicted day, first select the cluster most similar to the predicted day among the K clusters by the Pearson correlation coefficient method, and then apply the following historical similar-day characteristic value screening algorithm to all historical days in that cluster:

where

denotes selecting, from the $n_1$ historical days, the $m_3$ historical days whose characteristic values differ least from those of the predicted day;

is the $p$-th input characteristic value at the $i$-th moment of the historical day, and $r_p$ is the Pearson similarity coefficient corresponding to the $p$-th input characteristic value;

is the $p$-th input characteristic value at the $i$-th moment of the predicted day, $r_j$ is the Pearson similarity coefficient corresponding to the $j$-th input characteristic value calculated in step S01, $a_3$ is the number of input characteristic values, and $n_3$ is the number of values up to the current time of the predicted day.
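One way to picture the screening step is the sketch below. The exact screening formula is not reproduced in this text, so the score used here is an assumption: a Pearson-weighted absolute difference, with each $r_p$ normalised by the sum of all $r_j$, summed over the $n_3$ moments already observed on the predicted day. The function and argument names are hypothetical.

```python
import heapq

def screen_similar_days(history, target, r, m3):
    """Return the indices of the m3 historical days closest to the predicted day.

    history: list of days; each day is a list of moments; each moment is a
             list of a3 input characteristic values
    target:  the predicted day's values for its first n3 moments (same layout)
    r:       Pearson similarity coefficient of each input characteristic value
    """
    total_r = sum(r)
    weights = [rp / total_r for rp in r]  # assumption: r_p normalised by sum of r_j

    def difference(day):
        # Assumed score: weighted absolute difference over the n3 observed moments
        # (zip stops at the shorter sequence, i.e. at n3).
        return sum(
            w * abs(x - x_hat)
            for moment, moment_hat in zip(day, target)
            for w, x, x_hat in zip(weights, moment, moment_hat)
        )

    return heapq.nsmallest(m3, range(len(history)), key=lambda i: difference(history[i]))

history = [
    [[20.0, 500.0], [22.0, 600.0]],   # day 0: temperature, irradiance per moment
    [[10.0, 100.0], [12.0, 150.0]],   # day 1
    [[20.4, 512.0], [22.6, 604.0]],   # day 2
]
target = [[20.5, 510.0], [22.5, 605.0]]
selected = screen_similar_days(history, target, r=[0.4, 0.8], m3=2)
```

With these illustrative numbers, day 2 is the closest match and day 0 the second closest.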

S04: Adjust the mutation points of the input characteristic values of the historical similar days to obtain the adjusted input characteristic values of the historical similar days.

Determine whether the characteristic values of the historical days selected in step S03 contain mutation points at any moment; if so, adjust them using a median absolute deviation method based on regression fitting. The specific steps are as follows:

(1) Apply a least-squares high-order polynomial fit to the characteristic values of the historical similar days at each moment. Least squares measures the deviation between the data points and the fitted curve; the fit is considered best when the sum of squared differences between the ordinates of the fitted curve and of the data points is minimal, which gives the regression fitting curve $f(x)$.

(2) Calculate the median absolute deviation of all elements $X_i$: $MAD = \mathrm{median}(|X_i - X'_i|)$, where $\mathrm{median}(X)$ denotes the median of $X$, $X'_i$ is the value of the regression fitting curve $f(x)$ corresponding to $X_i$, and $X_i$ is the input characteristic value at the $i$-th moment in the historical day data.

(3) Then all data can be adjusted with the following algorithm to achieve the adjustment of the mutation point:

where $X''_i$ is the adjusted value of $X_i$, $\alpha_{mad}$ is a coefficient,

and $K$ is the number of clusters.
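The three steps can be sketched with NumPy as follows. The adjustment rule itself is not reproduced in this text, so the rule used here is an assumption: a point whose deviation from the fitted curve exceeds $\alpha_{mad} \cdot MAD$ is replaced by the fitted value. The default `degree` and `alpha_mad` values are likewise illustrative; the source ties $\alpha_{mad}$ to the number of clusters K in a way not shown here.

```python
import numpy as np

def adjust_mutation_points(x, values, degree=2, alpha_mad=3.0):
    """Replace points that deviate strongly from a polynomial fit.

    Returns the adjusted values and a boolean mask of the adjusted points.
    """
    x = np.asarray(x, dtype=float)
    values = np.asarray(values, dtype=float)
    # Step (1): least-squares polynomial fit f(x).
    fitted = np.polyval(np.polyfit(x, values, degree), x)
    # Step (2): MAD = median(|X_i - X'_i|).
    deviation = np.abs(values - fitted)
    mad = np.median(deviation)
    # Step (3), assumed rule: pull outliers back onto the fitted curve.
    is_mutation = deviation > alpha_mad * mad
    return np.where(is_mutation, fitted, values), is_mutation

x = np.arange(12)
clean = 30.0 - (x - 5.5) ** 2      # smooth daily curve (illustrative)
noisy = clean.copy()
noisy[6] += 40.0                   # one injected mutation point
adjusted, flags = adjust_mutation_points(x, noisy)
```

The spike at index 6 is flagged and moved much closer to the smooth curve, while the median-based threshold keeps the spike itself from inflating the tolerance.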

S05: Apply the historical similar-day weight fitting algorithm to the adjusted input characteristic values of the historical similar days to obtain the historical similar-day weight fitting result.

For the $m_3$ most similar historical days after mutation-point adjustment, calculate in turn the Pearson correlation coefficient between each day's data and the predicted-day data, giving the Pearson similarity $r'_i$ of each most similar historical day; the higher the similarity, the higher the weight. Similar days from different periods are also weighted differently: the closer a day is to the predicted day, the greater its reference value and the higher its weight. Arrange the historical day data in chronological order and divide it into three time periods: within $t_1$ years, between $t_1$ and $t_2$ years, and more than $t_2$ years ago, with weights $\beta = \beta_1, \beta_2, \beta_3$. That is, $\beta_1$ is the weight of events in the range $0$ to $t_1$, $\beta_2$ the weight of events in the range $t_1$ to $t_2$, and $\beta_3$ the weight of events beyond $t_2$, where $0 < \beta_1, \beta_2, \beta_3 < 1$ and $\beta_3 = 1 - \beta_1 - \beta_2$; the values can be configured by the user according to local conditions.

The historical similar-day weight fitting algorithm is as follows:

where $r'_i$ and $r'_j$ are the Pearson correlation coefficients between the $i$-th and $j$-th most similar historical day data and the predicted day; $T_i$ and $T_j$ are the weights $\beta$ of the time periods to which the $i$-th and $j$-th historical day data belong; $(y'_i, X'_i) = d'_i$ is the $i$-th most similar historical day data; $\theta_i$ is a parameter of the actual photovoltaic output power of the $i$-th historical day; $V_i$ is the total installed photovoltaic capacity of the area on the $i$-th historical day; $V_0$ is the total installed capacity of the area on the predicted day; $m_3$ is the number of historical similar days; and $d_{history}$ is the result of weight fitting over the $m_3$ most similar historical days.
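A rough sketch of such a weighted fit is given below. The fitting formula is not reproduced in this text, so two assumptions are made and should be read as such: each day's weight is $r'_i T_i$ normalised over all selected days, and the capacity parameter is taken as $V_0 / V_i$, which rescales a historical output to the installed capacity of the predicted day. The dictionary keys, the function name and the default $t_1$, $t_2$ and $\beta$ values are hypothetical.

```python
def fit_history(days, v0, t1=1.0, t2=3.0, betas=(0.5, 0.3, 0.2)):
    """Weighted fit of the most similar historical days.

    days: list of dicts with keys
        'power'     actual PV output values of the day
        'r'         Pearson similarity to the predicted day
        'age_years' how long ago the day lies
        'capacity'  installed PV capacity on that day
    v0:   installed PV capacity on the predicted day
    """
    def period_weight(age):
        # beta_1 within t1 years, beta_2 between t1 and t2 years, beta_3 beyond t2.
        if age <= t1:
            return betas[0]
        if age <= t2:
            return betas[1]
        return betas[2]

    raw = [d["r"] * period_weight(d["age_years"]) for d in days]
    total = sum(raw)  # assumed normalisation so that the weights sum to 1
    n_moments = len(days[0]["power"])
    return [
        sum(w / total * (v0 / d["capacity"]) * d["power"][k] for w, d in zip(raw, days))
        for k in range(n_moments)
    ]

days = [
    {"power": [10.0, 20.0], "r": 0.9, "age_years": 0.5, "capacity": 100.0},
    {"power": [20.0, 40.0], "r": 0.9, "age_years": 2.0, "capacity": 200.0},
]
d_history = fit_history(days, v0=200.0)
```

In this example both days describe the same output once rescaled to 200 units of capacity, so the fitted curve is 20 and 40 regardless of how the weights are split.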

To address the problems encountered in screening historical similar days (the massive volume of historical photovoltaic measurement data, high computational cost, slow screening, inaccurate selection of similar days, and interference of mutation points with the prediction result), the relevant data must be analyzed and judged by a sound method.

The original K-means algorithm has some drawbacks, such as sensitivity to the initial cluster centers: different initial centers may lead to different clustering results. To address this, the invention uses an improved algorithm, the cluster center initial point selection strategy, to select the initial cluster centers, so that K-means clusters better than with randomly selected initial points.

The historical similar-day characteristic value screening algorithm screens the characteristic values of the historical day data, so that the historical day data most similar to the predicted day can be found in the cluster more quickly, improving both calculation speed and search accuracy.

The mutation points in the historical similar-day data are adjusted by the historical similar-day adjustment algorithm, reducing the errors they would cause in subsequent fitting and prediction.

The historical similar-day weight fitting algorithm processes and fits the data of several historical similar days, so that the historical similar days have higher reference value and the prediction accuracy is improved.

S06: Perform multivariate real-time trend prediction according to the adjusted input characteristic values of the historical similar days to obtain the real-time prediction result.

In the prior art, time series prediction algorithms perform differently under different conditions; no single algorithm predicts well under all conditions, and when external conditions change substantially the prediction deviates considerably. The robustness and generalization ability of the prediction algorithm therefore need to be improved.

The method selects several algorithms with strong prediction capability and calculates the accuracy of each on historical data. A multivariate prediction result fusion strategy then eliminates the low-accuracy results, gives higher weight to the high-accuracy results, and fits the multivariate prediction results, improving the generalization ability and anti-interference ability of the fusion model.

The method fuses the time series models ARIMA, LSTM and LightGBM and other multivariate time series analysis models, adjusts the weight of each model, and uses the multivariate prediction result fusion strategy to improve the generalization ability of the fusion model, forming the final integrated algorithm for real-time trend prediction. Time series analysis mainly addresses two kinds of problems: one is the analysis of historical interval data, in which anomaly detection and classification are performed on features extracted and summarized from the historical data; the other is the analysis of future data, that is, predicting the state or actual value at one or more future time points from data at past time points.

1. ARIMA(p, d, q): autoregressive integrated moving average model

ARIMA (Autoregressive Integrated Moving Average), written ARIMA(p, d, q), consists of three parts: AR (autoregression), I (integration, i.e. differencing) and MA (moving average). It predicts the current value from the historical values of a time series and the prediction errors on those historical values. AR is the autoregressive term, and p is the number of autoregressive terms, i.e. how many past days of time series data are weighted. MA is the moving average term, and q is the number of moving average terms, i.e. how many past days of error data are weighted. I denotes differencing, and d is the number of differencing operations needed to make the time series stationary. The ARIMA model can be viewed as a "filter" that tries to separate the signal from the noise and then extrapolates the signal into the future to obtain a prediction; it is particularly suited to fitting data that exhibits non-stationarity.

ARIMA modeling basic steps:

(1) First test the observed sequence for stationarity; if it is not stationary, difference it until the differenced data is stationary;

(2) Once the data is stationary, test it for white noise, white noise being a stationary random sequence with zero mean and constant variance;

(3) If the sequence is a stationary non-white-noise sequence, calculate the ACF (autocorrelation function) and PACF (partial autocorrelation function) and identify the model, for example ARMA (autoregressive moving average);

(4) Estimate the parameters of the identified model, then apply it for prediction and carry out error analysis.
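Two of the building blocks in these steps, differencing (step 1) and the sample autocorrelation used for model identification (step 3), can be sketched in plain Python. The function names are illustrative; the stationarity test, the white noise test and parameter estimation are left to a statistics library and are not shown.

```python
def difference(series, d=1):
    """Apply d rounds of first-order differencing."""
    for _ in range(d):
        series = [b - a for a, b in zip(series, series[1:])]
    return series

def acf(series, max_lag):
    """Sample autocorrelation for lags 0..max_lag (series must not be constant)."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    var = sum(c * c for c in centered)
    return [
        sum(centered[t] * centered[t + lag] for t in range(n - lag)) / var
        for lag in range(max_lag + 1)
    ]

trend = [float(t * t) for t in range(8)]   # quadratic trend: not stationary
once = difference(trend, d=1)              # still trending: 1, 3, 5, ...
twice = difference(trend, d=2)             # constant after two differences
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
rho = acf(alternating, max_lag=1)
```

In this toy case d = 2 removes the quadratic trend, and the alternating series shows the strong negative lag-1 autocorrelation that would steer model identification.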

2. LSTM: long short-term memory model

Long short-term memory (LSTM) is a special kind of RNN that can learn long-range dependencies; it mainly addresses the vanishing-gradient and exploding-gradient problems that arise when training on long sequences. LSTM overcomes the short-term memory of the RNN by adding gates to the RNN model, so that the recurrent neural network can make effective use of long-range temporal information. All RNNs take the form of a chain of repeating neural network units. In a standard RNN this repeating unit has a very simple structure, for example a single tanh layer. On top of the basic RNN structure, LSTM adds three logic control units, the forget gate, the input gate and the output gate, each connected to a multiplication element. By setting weights on the edges that connect the memory cell to the other parts of the network, LSTM selectively forgets part of the historical information, adds part of the current input information, and finally integrates the history into the current state and generates the output state.

Forget gate: this stage selectively forgets the input passed from the previous node, i.e. "forget the unimportant and remember the important"; it controls whether the information in the cell at the previous moment is carried into the cell at the current moment.

Input gate: this stage selectively "remembers" the current input. Important information is recorded with emphasis and less important information is recorded less; the gate controls whether input information flows into the cell.

Output gate: this stage determines which information is output as the current state.
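The interplay of the three gates can be made concrete with a single forward step of a standard LSTM cell. This is a generic textbook formulation written with NumPy, not code from the invention; the stacked weight layout, the sizes and the random inputs are assumptions chosen for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM time step.

    W maps the concatenated [h_prev, x] to the four stacked pre-activations,
    shape (4 * hidden, hidden + inputs); b has shape (4 * hidden,).
    """
    hidden = h_prev.shape[0]
    z = W @ np.concatenate([h_prev, x]) + b
    f = sigmoid(z[0 * hidden:1 * hidden])   # forget gate: how much old cell state to keep
    i = sigmoid(z[1 * hidden:2 * hidden])   # input gate: how much new content to admit
    g = np.tanh(z[2 * hidden:3 * hidden])   # candidate cell content
    o = sigmoid(z[3 * hidden:4 * hidden])   # output gate: how much of the cell to expose
    c = f * c_prev + i * g                  # updated cell state
    h = o * np.tanh(c)                      # updated hidden (output) state
    return h, c

rng = np.random.default_rng(0)
inputs, hidden = 3, 4
W = rng.normal(scale=0.1, size=(4 * hidden, hidden + inputs))
b = np.zeros(4 * hidden)
h, c = np.zeros(hidden), np.zeros(hidden)
for x in rng.normal(size=(5, inputs)):      # five illustrative time steps
    h, c = lstm_step(x, h, c, W, b)
```

Because the cell state is updated additively through the forget and input gates, information can persist across many steps, which is what lets the model use long-range temporal information.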

3. LightGBM

GBDT (Gradient Boosting Decision Tree) iteratively trains weak classifiers (decision trees) to obtain an optimal model; the model trains well and resists overfitting. LightGBM (Light Gradient Boosting Machine) is a framework that implements the GBDT algorithm. It supports efficient parallel training and offers faster training, lower memory consumption, better accuracy, and distributed processing of massive data. To achieve these advantages, LightGBM optimizes the traditional GBDT algorithm as follows:

(1) Histogram optimization

When the original GBDT algorithm splits a node of a decision tree, it traverses every feature over the global data set to find the optimal split value for the current node; building a decision tree therefore requires traversing all samples, which is very time-consuming. LightGBM instead adopts a histogram optimization strategy. Before training, the values of each feature are sorted and then divided into histogram bins (256 bins by default in the algorithm). In subsequent training the algorithm builds the decision trees using only the histogram bins as features, which greatly reduces the number of passes over the sample set.

(2) Leaf-wise splitting strategy with depth limit

Before LightGBM, most tree models grew decision trees level by level (the level-wise strategy): all nodes on the same layer are split at the same time. This allows some multithreaded parallelism and speeds up tree construction, but each split is optimized only over the samples in the current node, so the result may be only locally optimal. In addition, some nodes on the same level are split in parallel although they do not need further splitting. The LightGBM algorithm therefore adopts the leaf-wise strategy: each time, it considers all current leaves globally and splits the one that gives the best split, which avoids the local-optimum problem and reduces the need for post-pruning. Because leaf-wise growth can produce deeper trees and hence overfitting, a maximum-depth limit is added to the model parameters to reduce that risk.

(3) Gradient-based One-Side Sampling (GOSS)

The original GBDT algorithm rests on the idea that the negative gradient of the loss function approximates the residual. Other tree model algorithms based on the Boosting framework draw a random sample each time a decision tree is built, and only those samples update the gradients and take part in building the tree. The LightGBM algorithm instead uses one-side sampling: all samples with larger gradients take part in building the decision tree, and, so that the data distribution of the samples is not distorted, samples with smaller gradients are randomly sampled to take part as well. Experiments show that the one-side sampling strategy of LightGBM performs better than random sampling.
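The sampling rule can be illustrated in a few lines of plain Python. This follows the published description of GOSS (keep the top fraction by absolute gradient, randomly sample a fraction of the remainder, and up-weight the sampled remainder by (1 - a) / b); it is not LightGBM's internal code, and the function name, rates and seed are illustrative.

```python
import random

def goss_sample(gradients, top_rate=0.2, other_rate=0.1, seed=0):
    """Return (indices, weights) selected by gradient-based one-side sampling."""
    n = len(gradients)
    order = sorted(range(n), key=lambda i: abs(gradients[i]), reverse=True)
    n_top = int(top_rate * n)
    n_other = int(other_rate * n)
    top, rest = order[:n_top], order[n_top:]
    # Keep every large-gradient sample; draw the others at random.
    sampled = random.Random(seed).sample(rest, n_other)
    # Up-weight the small-gradient samples so their total contribution to the
    # split-gain estimate stays roughly what it was before sampling.
    amplify = (1.0 - top_rate) / other_rate
    indices = top + sampled
    weights = [1.0] * n_top + [amplify] * n_other
    return indices, weights

grads = [0.01 * i for i in range(100)]   # illustrative gradients
idx, w = goss_sample(grads)
```

With 100 samples, the 20 largest gradients are always kept and 10 of the remaining 80 are drawn at random with weight 8, so each tree is built from 30 samples instead of 100.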

(4) Exclusive Feature Bundling (EFB)

Exclusive feature bundling combines sparse features of different dimensions in the samples into a single feature that enters the model and takes part in building the decision trees. High-dimensional data is usually sparse, which raises the question of whether the feature dimension can be reduced without loss. In a sparse feature space many features are mutually exclusive, that is, they are never non-zero at the same time. Such mutually exclusive features can therefore be bundled into a single feature that takes part in building the final feature histogram.

In addition, algorithms such as Holt-Winters, Facebook Prophet and WaveNet can also be used for real-time trend prediction and fused with the prediction results of the other models to form the final prediction result, improving the generalization ability of the fusion model.

The time series prediction algorithms differ in prediction quality, so the prediction result of each algorithm needs to be evaluated. Following the technical requirements for dispatch-side wind power and photovoltaic power prediction systems, the root mean square error and the accuracy are introduced to evaluate the quality of each algorithm's prediction result. The accuracy is calculated as follows:

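$$C = 1 - E_{rmse}$$

The root mean square error is calculated as follows:

$$E_{rmse} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( \frac{P_{Mi} - P_{Pi}}{V_i} \right)^2}$$

where $n$ is the number of samples, $P_{Mi}$ is the actual power at time $i$, $P_{Pi}$ is the predicted power at time $i$, and $V_i$ is the startup (operating) capacity at time $i$.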
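A small numerical sketch of the evaluation follows. It assumes that the error at each moment is normalised by the startup capacity $V_i$ before squaring, as is usual for capacity-normalised RMSE; the function names and the sample values are illustrative.

```python
import math

def rmse(actual, predicted, capacity):
    """Capacity-normalised root mean square error."""
    n = len(actual)
    return math.sqrt(
        sum(((a - p) / v) ** 2 for a, p, v in zip(actual, predicted, capacity)) / n
    )

def accuracy(actual, predicted, capacity):
    """Accuracy C = 1 - E_rmse."""
    return 1.0 - rmse(actual, predicted, capacity)

actual = [50.0, 60.0, 70.0, 80.0]      # measured power (illustrative units)
predicted = [45.0, 65.0, 70.0, 90.0]   # forecast power
capacity = [100.0] * 4                 # startup capacity at each moment
c = accuracy(actual, predicted, capacity)
```

The normalised errors here are 0.05, 0.05, 0 and 0.1, giving an accuracy of about 0.94.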

Calculate the accuracy of each time series prediction algorithm, eliminate the low-accuracy results according to the multivariate prediction result fusion strategy, and fit the multivariate prediction results after giving higher weight to the high-accuracy results. As shown in fig. 3, the multivariate prediction result fusion strategy includes the following steps:

(1) Determine the rejection threshold $C_0$ for the accuracy of the real-time prediction algorithms according to the accuracy threshold calculation algorithm, which is as follows:

where $C_i$ is the accuracy of the $i$-th algorithm and $n_6$ is the total number of algorithms performing real-time trend prediction.

(2) The prediction results of algorithms with accuracy $C_i < C_0$ are eliminated, the remaining $m_6$ algorithms are sorted in descending order of accuracy, and the weight corresponding to each algorithm is as follows:

where $\gamma_i$ is the weight of the $i$-th algorithm after elimination and descending sorting.

(3) The higher the accuracy of an algorithm, the higher its weight. The trend prediction value calculated by each time series prediction algorithm is multiplied by its weight and the products are summed to obtain the final fitted real-time prediction result. The multivariate prediction result fitting algorithm is as follows:

$$d_{future} = \sum_{i=1}^{m_6} \gamma_i \, d_{i\text{-}future}$$

where $\gamma_i$ is the weight of the $i$-th algorithm, $d_{i\text{-}future}$ is the real-time prediction result of the $i$-th algorithm, and $d_{future}$ is the real-time prediction result after fitting by the multivariate trend prediction algorithms.
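The fusion strategy can be sketched as below. The threshold and weight formulas are not reproduced in this text, so both are assumptions here: the rejection threshold is taken as the mean accuracy of all algorithms, and the weights of the surviving algorithms are taken as proportional to their accuracy. Only the final weighted sum follows directly from step (3). All names and numbers are illustrative.

```python
def fuse_predictions(predictions, accuracies):
    """Drop low-accuracy algorithms and fuse the rest by a weighted sum.

    predictions: {algorithm name: list of predicted values}
    accuracies:  {algorithm name: accuracy C_i}
    """
    # Step (1), assumed rule: the threshold C_0 is the mean accuracy.
    c0 = sum(accuracies.values()) / len(accuracies)
    # Step (2): eliminate C_i < C_0 and sort the rest in descending accuracy.
    kept = sorted(
        (name for name in predictions if accuracies[name] >= c0),
        key=lambda name: accuracies[name],
        reverse=True,
    )
    total = sum(accuracies[name] for name in kept)
    gamma = {name: accuracies[name] / total for name in kept}  # assumed weights
    # Step (3): weighted sum of the surviving predictions.
    horizon = len(next(iter(predictions.values())))
    fused = [
        sum(gamma[name] * predictions[name][t] for name in kept)
        for t in range(horizon)
    ]
    return fused, gamma

predictions = {
    "arima": [100.0, 110.0],
    "lstm": [104.0, 114.0],
    "lightgbm": [60.0, 70.0],
}
accuracies = {"arima": 0.9, "lstm": 0.9, "lightgbm": 0.6}
d_future, gamma = fuse_predictions(predictions, accuracies)
```

In this example the third model falls below the mean accuracy of 0.8 and is dropped, and the two remaining models are averaged with equal weight.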

S07: Fit the historical similar-day weight fitting result and the real-time prediction result to obtain the final real-time prediction result.

In the prior art, historical similar-day prediction and real-time trend prediction produce photovoltaic output predictions in two different ways, and the results of both methods are handed directly to the user, who is left unsure which result to trust and does not know the accuracy of either scheme.

The invention provides a multivariate fitting and dynamic adjustment method for historical and real-time prediction results, so that the two independent prediction results are better fused. Cross-checking the results of the two methods improves the accuracy of the prediction, reduces the design error that may exist when only one method is used, and lets the overall distributed photovoltaic balance prediction method be adjusted in real time according to how well the multivariate prediction results match the actual values.

To improve the accuracy of the prediction result, the weights of the original historical day data, the adjusted history value and the real-time prediction value are dynamically adjusted to obtain the final fitting result.

An adjusted history value $d'_{history}$ is calculated from the original history value $d_{history}$ using the history value adjustment algorithm, which is as follows:

where

is the actual power at time $i$, and

is the historical day power at time $i$; $d_{history}$ is the original historical day data, and $d'_{history}$ is the historical day data obtained by adjusting $d_{history}$.

Calculate the Euclidean distances from the most similar historical day fitted value $d_{history}$, the adjusted history value $d'_{history}$ and the fitted value $d_{future}$ of the multivariate trend prediction algorithm to the three time points before the current time of the predicted day. For the $i$-th time point, sort the three Euclidean distances in ascending order; the distance ranked $j$-th is $s_{ij}$, and the prediction result corresponding to that distance is recorded as

The historical and real-time prediction fitting algorithm is then applied:

where $\sigma_i$ is the fitting weight of the $i$-th time point before the current time of the predicted day, with

$d_{fitting\text{-}1}$ and $d_{fitting\text{-}2}$ are the historical and real-time prediction fitting results, and $K$, the number of clusters produced by the clustering, is used to adjust the weights.

Finally, calculate the accuracy of each of the most similar historical day fitted value $d_{history}$, the adjusted history value $d'_{history}$, the fitted value $d_{future}$ of the multivariate trend prediction algorithm, and the historical and real-time prediction fitting results $d_{fitting\text{-}1}$ and $d_{fitting\text{-}2}$, and select the one with the highest accuracy as the final real-time prediction result:

The invention establishes a historical similar-day clustering and screening strategy that clusters the historical days reasonably and can efficiently select the several historical days most similar to the predicted day. It optimizes the distributed photovoltaic real-time trend prediction mechanism and fuses the prediction results of the multivariate models efficiently. It provides a multivariate fitting and dynamic adjustment method for historical and real-time prediction results, improving the accuracy of distributed photovoltaic balance prediction. The prediction result of the method can meet the professional requirements of business lines such as scheduling, equipment, marketing and development, and is of great significance for the safe and stable operation of a novel power system.

The above description is only of the preferred embodiments of the present invention, and it should be noted that: it will be apparent to those skilled in the art that various modifications and adaptations can be made without departing from the principles of the invention and these are intended to be within the scope of the invention.

Claims

1. A photovoltaic balance prediction method based on historical similar days and real-time multi-dimensional study and judgment, characterized by comprising the following steps:

step 1: screening out input feature vectors with high contribution degrees according to meteorological elements in historical day data;

step 2: clustering the historical day data with the screened input features to obtain clustered clusters;

step 3: selecting, by a Pearson correlation coefficient method, the cluster most similar to the predicted day from the K clusters of historical day data as the historical similar days, and screening input characteristic values of the historical similar days;

step 4: adjusting the mutation points of the input characteristic values of the historical similar days to obtain the adjusted input characteristic values of the historical similar days;

step 5: applying a historical similar-day weight fitting algorithm to the adjusted input characteristic values of the historical similar days to obtain a historical similar-day weight fitting result;

step 6: performing multivariate real-time trend prediction according to the adjusted input characteristic values of the historical similar days to obtain a real-time prediction result;

2. The photovoltaic balance prediction method based on historical similar days and real-time multi-dimensional study and judgment as claimed in claim 1, wherein: the step 1 comprises the following steps:

step 1-1: obtaining historical day data

$D$ denotes the set of all historical day data, $d$ is the data of one day in $D$, and $n_1$ is the total number of historical days; $d = \{(y_i, X_i),\ i = 1, 2, \ldots, m_1\}$,

is the input feature vector at the $i$-th moment,

is the $j$-th input characteristic value at the $i$-th moment, and $y_i$ is the output characteristic value, the actual photovoltaic output at the $i$-th moment;

step 1-2: calculating the feature contribution degree of each input feature from the historical day data $D$ by the Pearson correlation coefficient method, and screening out the input feature vectors whose feature contribution degree is greater than or equal to a threshold.

3. The photovoltaic balance prediction method based on historical similar days and real-time multi-dimensional study and judgment as claimed in claim 1, wherein: the step 2 comprises the following steps:

divided into six disjoint subsets $D_1, D_2, \ldots, D_6$, with $v = v_1, v_2, \ldots, v_6$ denoting the cluster centers corresponding to the historical day data subsets, and $n_1$ the total number of historical days;

step 2-2: calculating the mean of all elements in subset $D_i$ and storing it in $v_i$ as an initial cluster center point, for $i = 1, 2, 3, \ldots, 6$;

step 2-3: when K equals the number of centers in $v$, taking the $v_i$ as the initial cluster centers of K-means;

step 2-4: when K is less than the number of centers in $v$, calculating the correlation between all pairs of $v_i$ by the Pearson correlation coefficient method, storing it in an upper triangular matrix $R$, and finding the two cluster centers with the largest correlation coefficient $r_{pq}$;

wherein $r_{pq}$ is the Pearson correlation coefficient between cluster centers $v_p$ and $v_q$;

step 2-5: merging the sets corresponding to the two cluster centers with the largest correlation to obtain a fused cluster center $v'$;

step 2-6: repeating the above steps until the number of centers in $v$ equals K, and deriving the initial cluster center points $v'$;

step 2-7: taking $v' = v'_1, v'_2, \ldots, v'_K$ as the initial cluster center points;

step 2-8: for each sample $d_i$ in the historical day data, calculating its Euclidean distance to each of the K cluster centers and assigning it to the cluster whose center is nearest;

step 2-9: for each cluster $D_i$, recalculating its cluster center $v'$;

step 2-10: repeating steps 2-8 and 2-9 until the cluster centers are stable, and outputting the resulting clusters;

4. The photovoltaic balance prediction method based on historical similar days and real-time multi-dimensional study and judgment, characterized in that: the fused cluster center $v'$ is calculated as follows:

in the formula: $D_i$ is the $i$-th set to be fused, $D_j$ is the $j$-th set to be fused, $m_2$ is the number of sets to be fused together, $|D_i|$ is the number of samples in cluster $D_i$, $K$ is the number of clusters to be formed, $v_i$ is the cluster center of the set $D_i$ to be fused, and $v'$ is the cluster center after the $m_2$ sets are fused.

5. The photovoltaic balance prediction method based on historical similar days and real-time multi-dimensional study and judgment according to claim 1, characterized in that: the step 3 comprises the following steps:

step 3-1: selecting, by the Pearson correlation coefficient method, the cluster most similar to the predicted day from the K clusters as the historical similar days;

step 3-2: applying the historical similar-day characteristic value screening algorithm to the data of the historical similar days to obtain the input characteristic values of the historical similar days;

wherein:

denotes selecting, from the $n_1$ historical days, the $m_3$ historical days whose characteristic values differ least from those of the predicted day,

is the $p$-th input characteristic value at the $i$-th moment of the historical day, and $r_p$ is the Pearson similarity coefficient corresponding to the $p$-th input characteristic value;

is the $p$-th input characteristic value at the $i$-th moment of the predicted day, $r_j$ is the Pearson similarity coefficient corresponding to the $j$-th input characteristic value, $a_3$ is the number of input characteristic values, and $n_3$ is the number of values up to the current time of the predicted day.

6. The photovoltaic balance prediction method based on historical similar days and real-time multi-dimensional study and judgment according to claim 1, characterized in that: the step 4 comprises the following steps:

step 4-1: applying a least-squares high-order polynomial fitting algorithm to the characteristic values of the historical similar days at all moments to obtain a regression fitting curve $f(x)$;

step 4-2: calculating the median absolute deviation of all characteristic values $X_i$ of the historical similar days, $MAD = \mathrm{median}(|X_i - X'_i|)$, where $\mathrm{median}(X)$ denotes the median of $X$, $X'_i$ is the value of the regression fitting curve $f(x)$ corresponding to $X_i$, and $X_i$ is the input characteristic value at the $i$-th moment in the historical day data;

wherein $X''_i$ is the adjusted value of $X_i$, and $\alpha_{mad}$ is a coefficient.

7. The photovoltaic balance prediction method based on historical similar days and real-time multi-dimensional study and judgment according to claim 1, characterized in that: the step 5 comprises the following steps:

step 5-1: m to be adjusted by mutation point ₃ Calculating the Pearson correlation coefficient of the most similar historical day data and the predicted day data in sequence to obtain the Pearson similarity r between the most similar historical days _i ′；

Step 5-2: arranging the historical daily data according to time sequence, and dividing the historical daily data into t ₁ Within the year, t ₁ -t ₂ Between years, t ₂ Setting different weights beta = beta in three time periods above the year ₁ ，β ₂ ，β ₃ (ii) a I.e. beta ₁ Is 0-t ₁ Weight of events within the range, β ₂ Is t ₁ -t ₂ Weight of events within the range, β ₃ Is greater than t ₂ A weight of the event of the range;

step 5-3: obtaining a historical similar day weight fitting result d according to a historical similar day weight fitting algorithm _history (ii) a The historical similarity day weight fitting algorithm has the following calculation formula:

wherein $r'_i$ and $r'_j$ denote the Pearson correlation coefficients between the i-th and j-th most similar historical day data and the predicted day; $T_i$ and $T_j$ denote the weights $\beta$ of the time periods to which the i-th and j-th historical day data belong; $(y'_i, X'_i) = d'_i$ denotes the i-th most similar historical day data; $\theta_i$ is a parameter of the actual photovoltaic output power of the i-th historical day; and $m_3$ denotes the number of historical similar days.
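The weighting in steps 5-1 to 5-3 can be illustrated with the sketch below. It assumes that each similar day contributes with a normalised weight $r'_i T_i / \sum_j r'_j T_j$; this form is inferred from the symbol definitions only, and the parameter $\theta_i$ is left out. The dictionary keys, function names, and default values of $t_1$, $t_2$ and $\beta$ are hypothetical.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def period_weight(age_years, t1, t2, betas):
    """Pick beta_1, beta_2 or beta_3 from the age of the historical day."""
    if age_years <= t1:
        return betas[0]
    if age_years <= t2:
        return betas[1]
    return betas[2]


def fit_history(similar_days, predicted_features,
                t1=1.0, t2=3.0, betas=(0.5, 0.3, 0.2)):
    """Weighted combination of the power curves of the similar days.

    similar_days: list of dicts with hypothetical keys
    'features' (adjusted X'_i), 'power' (y'_i) and 'age_years'.
    """
    raw = []
    for day in similar_days:
        r = pearson(day["features"], predicted_features)            # r'_i
        raw.append(r * period_weight(day["age_years"], t1, t2, betas))
    total = sum(raw)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # ASSUMPTION: normalised weights r'_i * T_i / sum_j(r'_j * T_j).
    weights = [w / total for w in raw]
    horizon = len(similar_days[0]["power"])
    return [sum(w * day["power"][k] for w, day in zip(weights, similar_days))
            for k in range(horizon)]
```

Under this assumption, a recent and highly correlated day dominates the fitted curve, while older or weakly correlated days contribute less.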

8. The photovoltaic balance prediction method based on historical similar days and real-time multi-dimensional study and judgment according to claim 7, characterized in that: the step 6 comprises the following specific steps:

Step 6-1: inputting the adjusted input characteristic values of the historical similar days into each time series prediction algorithm to obtain a prediction result $d_{i\text{-}future}$;

Step 6-2: calculating the accuracy of each time series prediction algorithm, eliminating the time series prediction algorithms whose accuracy is less than the accuracy threshold, sorting the remaining $m_6$ time series prediction algorithms in descending order of accuracy, and calculating the weight $\gamma_i$ corresponding to each algorithm;

wherein i is the sequence number of an algorithm after the remaining $m_6$ algorithms are sorted, and j denotes the j-th of the remaining $m_6$ algorithms;

Step 6-3: obtaining a real-time prediction result $d_{future}$ according to the prediction results $d_{i\text{-}future}$ and the weight $\gamma_i$ of each algorithm;

wherein $\gamma_i$ is the weight of the i-th algorithm, and $d_{i\text{-}future}$ is the prediction result of the i-th algorithm.
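The filtering and fusion in steps 6-2 and 6-3 can be sketched as follows. The weight formula is not given in the text above, so the sketch assumes rank-based weights: after sorting, the algorithm at rank i receives $(m_6 - i + 1) / \sum_{j=1}^{m_6} j$. The function name and the dictionary layout are hypothetical.

```python
def fuse_predictions(predictions, accuracies, threshold):
    """Drop weak algorithms, weight the rest by rank, and blend them.

    predictions: dict mapping algorithm name -> list of predicted values
    accuracies:  dict mapping algorithm name -> accuracy C of that algorithm
    threshold:   accuracy threshold C_0
    """
    # Eliminate algorithms whose accuracy is below the threshold.
    kept = [name for name in predictions if accuracies[name] >= threshold]
    if not kept:
        raise ValueError("no algorithm reaches the accuracy threshold")
    # Sort the remaining m_6 algorithms in descending order of accuracy.
    kept.sort(key=lambda name: accuracies[name], reverse=True)
    m6 = len(kept)
    # ASSUMPTION: rank-based weights, best algorithm gets m6 / (1 + ... + m6).
    denom = sum(range(1, m6 + 1))
    gammas = {name: (m6 - rank) / denom for rank, name in enumerate(kept)}
    horizon = len(predictions[kept[0]])
    fused = [sum(gammas[name] * predictions[name][k] for name in kept)
             for k in range(horizon)]
    return fused, gammas
```

With three algorithms of accuracy 0.9, 0.8 and 0.5 and a threshold of 0.7, the last one is dropped and the first two receive weights 2/3 and 1/3 under this assumption.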

9. The photovoltaic balance prediction method based on historical similar days and real-time multi-dimensional study and judgment according to claim 8, wherein: the accuracy calculation formula is as follows

$C = 1 - E_{rmse}$

Wherein:

in the formula: n is the number of all samples; the two power terms are the actual power at time i and the predicted power at time i; and $V_i$ is the startup capacity at time i;

the calculation formula of the accuracy threshold $C_0$ is as follows:
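The accuracy measure $C = 1 - E_{rmse}$ of claim 9 can be sketched as follows. The sketch assumes that $E_{rmse}$ is the root-mean-square of the power error normalised by the startup capacity $V_i$ at each time point, which is a common convention for photovoltaic forecast assessment but is not confirmed by the text above. The threshold $C_0$ is treated as an externally supplied number and is not computed here.

```python
from math import sqrt


def accuracy(actual, predicted, capacity):
    """Accuracy C = 1 - E_rmse over n samples.

    actual, predicted: power at each time i
    capacity:          startup capacity V_i at each time i
    """
    n = len(actual)
    # ASSUMPTION: E_rmse = sqrt(mean(((actual_i - predicted_i) / V_i) ** 2)).
    squared = (((a - p) / v) ** 2
               for a, p, v in zip(actual, predicted, capacity))
    e_rmse = sqrt(sum(squared) / n)
    return 1.0 - e_rmse
```

For instance, errors of 10 units at every time point against a capacity of 100 units give an accuracy of 0.9 under this definition.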

10. The photovoltaic balance prediction method based on historical similar days and real-time multi-dimensional study and judgment according to claim 9, wherein: the step 7 comprises the following steps:

Step 7-1: calculating an adjusted history value $d'_{history}$ from the historical similar day weight fitting result $d_{history}$;

wherein the two power terms are the actual power at time i and the historical day power at time i, and $n_7$ is the total number of time points of the similar day;

Step 7-2: respectively calculating the Euclidean distances from the historical similar day weight fitting result $d_{history}$, the adjusted history value $d'_{history}$, and the real-time prediction result $d_{future}$ to the three time points before the current time of the prediction day;

Step 7-3: calculating a first fitting result $d_{fitting\text{-}1}$ and a second fitting result $d_{fitting\text{-}2}$;

in the formula, $\sigma_i$ denotes the weight, in the fitting, of the i-th time point before the current time of the prediction day, and K is the number of clusters obtained from the clustering;

Step 7-4: respectively calculating the accuracy of the historical similar day weight fitting result $d_{history}$, the adjusted history value $d'_{history}$, the real-time prediction result $d_{future}$, the first fitting result $d_{fitting\text{-}1}$, and the second fitting result $d_{fitting\text{-}2}$, and selecting the result with the highest accuracy as the final real-time prediction result.
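The distance calculation of step 7-2 and the selection of step 7-4 can be sketched as follows. The fitting formulas for $d_{fitting\text{-}1}$ and $d_{fitting\text{-}2}$ are not reproduced, so all five candidates are simply passed in as inputs. The sketch assumes that accuracy is evaluated on the three most recent time points for which actual power is available and that it follows the capacity-normalised form $C = 1 - E_{rmse}$; function and key names are hypothetical.

```python
from math import sqrt


def euclidean_distance(series, observed):
    """Euclidean distance between a candidate and the observed values."""
    return sqrt(sum((s - o) ** 2 for s, o in zip(series, observed)))


def accuracy(actual, predicted, capacity):
    """ASSUMPTION: C = 1 - E_rmse with errors normalised by capacity V_i."""
    n = len(actual)
    squared = (((a - p) / v) ** 2
               for a, p, v in zip(actual, predicted, capacity))
    return 1.0 - sqrt(sum(squared) / n)


def select_final(candidates, recent_actual, recent_capacity):
    """Score every candidate on the recent points and pick the best one.

    candidates: dict mapping a label such as 'd_history' or 'd_future'
                to that candidate's values at the recent time points
    """
    scores = {name: accuracy(recent_actual, values, recent_capacity)
              for name, values in candidates.items()}
    best = max(scores, key=scores.get)
    return best, scores
```

A caller would pass the values of each candidate at the three time points before the current time together with the measured power at those points, and use the returned label to choose the final prediction curve.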