CN115775045A - Photovoltaic balance prediction method based on historical similar days and real-time multi-dimensional study and judgment - Google Patents



Publication number: CN115775045A
Authority: CN (China)
Legal status: Pending
Application number: CN202211512958.XA
Other languages: Chinese (zh)
Inventors: 陈龙, 杨卫东, 李盛盛, 张子谦, 梁淼, 涂金金, 邓箫
Current assignee: NARI Group Corp; Nari Information and Communication Technology Co
Original assignee: NARI Group Corp; Nari Information and Communication Technology Co
Application filed by NARI Group Corp and Nari Information and Communication Technology Co
Priority to CN202211512958.XA
Publication of CN115775045A

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04: INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04S: SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00: Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50: Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Abstract

The invention discloses a photovoltaic balance prediction method based on historical similar days and real-time multi-dimensional study and judgment. The method establishes a cluster-based screening mechanism for historical similar days, proposes a historical similar day fitting strategy based on mutation-point adjustment and a real-time trend prediction strategy based on multiple models such as ARIMA, LSTM and LightGBM, and fuses the two strategies with dynamic weights. The invention clusters historical days reasonably and efficiently selects the historical days most similar to the prediction day; it optimizes the real-time trend prediction mechanism for distributed photovoltaics and efficiently fuses the prediction results of multiple models; and it provides a method for fitting and dynamically adjusting historical and real-time prediction results, which improves the accuracy of distributed photovoltaic balance prediction. The prediction results can meet the professional requirements of business lines such as dispatching, equipment, marketing and development, and are of great significance for the safe and stable operation of the novel power system.

Description

Photovoltaic balance prediction method based on historical similar days and real-time multidimensional research and judgment
Technical Field
The invention relates to a photovoltaic balance prediction method based on historical similar days and real-time multi-dimensional study and judgment, and belongs to the technical field of power grid digitization.
Background
Against the broad background of building a novel power system, photovoltaic power generation has become one of the important forms of energy in the modern energy internet. As a renewable generation technology, photovoltaics are being deployed at ever larger scale, and accurate photovoltaic power prediction has become an important research topic in data mining. However, the volatility and intermittency of photovoltaic generation make energy management and dispatch in the power system increasingly complex; existing distributed photovoltaic balance prediction falls short of the desired level and adversely affects the operating stability of the grid. Accurate and reliable distributed photovoltaic balance prediction therefore helps meet the professional requirements of business lines such as dispatching, equipment, marketing and development, and is of great significance for building a novel power system.
To promote orderly grid access and flexible consumption of distributed photovoltaics and to guarantee safe grid operation, the problems that intermittent photovoltaic output causes for the grid, such as difficult load balancing, frequency instability and poor power quality, must be solved. Relying on an enterprise-level middle platform, the electrical and non-electrical data collected from distributed photovoltaic users are integrated to support photovoltaic operation monitoring and power quality analysis for each transformer area; load forecasting at day, hour and minute levels is carried out in combination with influencing factors such as weather and illumination intensity; and shared service capabilities for photovoltaic monitoring, analysis, alarming and forecasting are built. This promotes the flexible consumption of distributed photovoltaics and guarantees the safe and stable operation of the novel power system.
At present, distributed photovoltaic balance prediction has the following shortcomings:
1. Massive historical data has not been effectively aggregated or properly analyzed. As county-wide rooftop photovoltaic pilot programs advance, the installed capacity of distributed photovoltaic generation keeps growing and has produced massive historical data. Photovoltaic measurement data are collected and stored in a scattered way: on the transformer-area side they are split between two master-station systems, distribution automation and electricity consumption information collection. Because these data have not been effectively aggregated and properly analyzed, photovoltaic measurement data integration is complex and the data application chain is too long. The platform needs to aggregate the large volume of historical data and to analyze and assess it with a sound method.
2. Real-time operation data is not accessed and analyzed accurately and promptly. Distributed photovoltaics in transformer areas are currently monitored mainly through meters; grid-connection switches, inverters and anti-islanding devices are not monitored, and access through the transformer-area terminal has not yet been realized. Photovoltaic meter monitoring data should be accessed at minute level, including electrical collection items (telemetry, remote signaling, status, events) and non-electrical collection items (weather, temperature, humidity, illumination intensity, wind, etc.). Accurate, efficient, comprehensive and stable access to photovoltaic collection items is needed so that photovoltaic balance prediction can be analyzed and computed on that basis.
3. The timeliness and accuracy of distributed photovoltaic balance prediction are insufficient. The volatility and intermittency of photovoltaic generation make energy management and dispatch increasingly complex. Data collection frequencies differ across sources, currently at hourly and 15-minute levels, and must keep increasing to meet dispatch requirements for minute-level and even second-level prediction. Existing distributed photovoltaic balance prediction falls short of the desired level and adversely affects grid stability, so its timeliness and accuracy must be improved to support the professional requirements of business lines such as dispatching, equipment, marketing and development.
In view of the above, these technical problems in distributed photovoltaic balance prediction need to be solved.
Disclosure of Invention
Purpose: to overcome the shortcomings of the prior art, the invention provides a photovoltaic balance prediction method based on historical similar days and real-time multi-dimensional study and judgment.
Technical solution: to solve the above technical problems, the invention adopts the following technical solution:
A photovoltaic balance prediction method based on historical similar days and real-time multi-dimensional study and judgment comprises the following steps:
Step 1: Screen out the input feature vectors with high contribution degrees according to the meteorological elements in the historical day data.
Step 2: Cluster the historical day data, restricted to the screened input features, to obtain clusters.
Step 3: Using the Pearson correlation coefficient method, select from the K clusters of historical day data the cluster most similar to the prediction day as the historical similar days, and screen the input feature values of the historical similar days.
Step 4: Adjust the mutation points in the input feature values of the historical similar days to obtain adjusted input feature values.
Step 5: Apply the historical similar day weight fitting algorithm to the adjusted input feature values of the historical similar days to obtain the historical similar day weight fitting result.
Step 6: Perform multi-model real-time trend prediction on the adjusted input feature values of the historical similar days to obtain a real-time prediction result.
Step 7: Fit the historical similar day weight fitting result together with the real-time prediction result to obtain the final real-time prediction result.
Preferably, step 1 comprises the following steps:
Step 1-1: Obtain the historical day data
$$D = \{d_1, d_2, \ldots, d_{n_1}\}$$
where D is the set of all historical day data, d is the data of one day in D, and $n_1$ is the total number of historical days. Each day is $d = \{(y_i, X_i),\ i = 1, 2, \ldots, m_1\}$, where
$$X_i = (x_i^1, x_i^2, \ldots)$$
is the input feature vector at the i-th time point, $x_i^j$ is the j-th input feature value at the i-th time point, and $y_i$ is the output feature value, i.e. the actual photovoltaic output at the i-th time point.
Step 1-2: From the historical day data D, calculate the feature contribution degree of each input feature with the Pearson correlation coefficient method, and retain the input features whose contribution degree (in absolute value) is greater than or equal to a threshold.
Preferably, step 2 comprises the following steps:
Step 2-1: Divide the historical day data
$$D = \{d_1, d_2, \ldots, d_{n_1}\}$$
by season (spring and autumn, summer, winter) and weather (sunny, non-sunny) into six disjoint subsets $D_1, D_2, \ldots, D_6$, and let $v = \{v_1, v_2, \ldots, v_6\}$ denote the cluster centers of these subsets; $n_1$ is the total number of historical days.
Step 2-2: Compute the mean of all elements of subset $D_i$ and store it in $v_i$ as an initial cluster-center point, $i = 1, 2, \ldots, 6$.
Step 2-3: If K equals the number of centers in v, use the $v_i$ as the initial cluster-center points for k-means.
Step 2-4: If K is less than the number of centers in v, compute the pairwise correlations among all $v_i$ with the Pearson correlation coefficient method, store them in the upper triangular matrix R, and find the two cluster centers with the largest correlation coefficient $r_{pq}$:
$$R = [r_{pq}]_{1 \le p < q \le |v|}, \qquad (p^*, q^*) = \arg\max_{p<q} r_{pq}$$
where $r_{pq}$ is the Pearson correlation coefficient between cluster centers $v_p$ and $v_q$, and $|v|$ is the current number of cluster centers.
Step 2-5: Merge the sets corresponding to the two most correlated cluster centers to obtain a fused cluster center v'.
Step 2-6: Repeat steps 2-4 and 2-5 until the number of centers in v equals K, which yields the initial cluster-center points v'.
Step 2-7: Use $v' = \{v'_1, v'_2, \ldots, v'_K\}$ as the initial cluster-center points.
Step 2-8: For each sample $d_i$ in the historical day data, compute its Euclidean distances to the K cluster centers and assign it to the cluster whose center is nearest.
Step 2-9: For each cluster $D_i$, recompute its cluster center $v'_i$.
Step 2-10: Repeat steps 2-8 and 2-9 until the cluster centers stabilize, then output the resulting clusters.
Step 2-11: Compute the silhouette coefficient of the resulting clusters and choose the K with the largest silhouette coefficient as the final number of clusters.
Preferably, the fused cluster center v' is calculated as:
$$v' = \frac{\sum_{i=1}^{m_2} |D_i|\, v_i}{\sum_{j=1}^{m_2} |D_j|}$$
where $D_i$ is the i-th set to be fused, $D_j$ is the j-th set to be fused, $m_2$ is the number of sets fused at the same time, $|D_i|$ is the number of samples in cluster $D_i$, K is the number of clusters to aggregate, $v_i$ is the cluster center of set $D_i$, and v' is the cluster center after the $m_2$ sets are fused.
Preferably, step 3 comprises the following steps:
Step 3-1: Using the Pearson correlation coefficient method, select from the K clusters the cluster most similar to the prediction day as the historical similar days.
Step 3-2: Apply the historical similar day feature value screening algorithm to the data of the historical similar days to obtain their input feature values.
The historical similar day feature value screening algorithm is calculated as follows:
[Formula image not reproduced in the source text.]
where the selection operator picks, from the $n_1$ historical days, the $m_3$ historical days whose selected feature values differ least from those of the prediction day; $x_i^p$ is the p-th input feature value at the i-th time point of a historical day; $r_p$ is the Pearson similarity coefficient of the p-th input feature; $\hat{x}_i^p$ is the p-th input feature value of the prediction day at the i-th time point; $r_j$ is the Pearson similarity coefficient of the j-th input feature; $a_3$ is the number of input features; and $n_3$ is the number of time points of the prediction day available up to the current time.
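Step 3 can be sketched as follows. Because the screening formula image is not reproduced, the distance used here (a weighted absolute difference with weights $|r_p| / \sum_j |r_j|$ over the $n_3$ known time points) is an assumption, as is the premise that the prediction-day vector passed to `most_similar_cluster` (for example, its forecast weather features) is laid out like the cluster centers. All function names are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation between two vectors (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb) if sa and sb else 0.0


def most_similar_cluster(pred_vec, centers):
    """Step 3-1: index of the cluster centre with the highest Pearson r."""
    return max(range(len(centers)), key=lambda j: pearson(pred_vec, centers[j]))


def select_similar_days(days, pred_day, r, m3):
    """Step 3-2 (assumed distance): the m3 days closest to the prediction day.

    days:     hypothetical layout, day id -> list of per-time feature vectors
    pred_day: feature vectors for the n_3 time points known so far
    r:        Pearson coefficients r_p of the a_3 retained features
    """
    n3 = len(pred_day)
    r_sum = sum(abs(v) for v in r)
    weights = [abs(v) / r_sum for v in r]  # ASSUMED weights |r_p| / sum_j |r_j|

    def distance(hist):
        # ASSUMED: weighted absolute difference over the n_3 known time points.
        return sum(w * abs(hist[i][p] - pred_day[i][p])
                   for i in range(n3) for p, w in enumerate(weights))

    return sorted(days, key=lambda k: distance(days[k]))[:m3]
```

In use, `most_similar_cluster` picks the target cluster and `select_similar_days` is then run only over the days of that cluster.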
Preferably, step 4 comprises the following steps:
Step 4-1: Apply least-squares high-order polynomial fitting to the feature values of the historical similar days at all time points to obtain the regression curve f(x).
Step 4-2: Compute the median absolute deviation of all feature values $X_i$ of the historical similar days, $\mathrm{MAD} = \operatorname{median}(|X_i - X'_i|)$, where median(X) is the median of X, $X'_i$ is the value of the regression curve f(x) corresponding to $X_i$, and $X_i$ is the input feature value at the i-th time point of the historical day data.
Step 4-3: Obtain the mutation-point adjusted value $X''_i$ of the input feature values of the historical similar days as follows:
[Formula image not reproduced in the source text.]
where $X''_i$ is the adjusted value of $X_i$ and $\alpha_{mad}$ is a coefficient.
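Step 4 can be sketched as follows. The polynomial fit solves the least-squares normal equations, which is adequate for low degrees on a time axis scaled to [0, 1], and the MAD follows Step 4-2. Because the Step 4-3 formula image is not reproduced, the adjustment rule here (values lying outside $f(x) \pm \alpha_{mad} \cdot \mathrm{MAD}$ are pulled back to the band edge) is an assumption, and the defaults `deg=3` and `alpha_mad=3.0` are illustrative.

```python
import statistics


def polyfit(xs, ys, deg):
    """Least-squares polynomial fit; returns c with f(x) = sum_k c[k] * x**k."""
    m = deg + 1
    # Augmented normal equations [A^T A | A^T y] for the Vandermonde matrix A.
    aug = [[sum(x ** (r + c) for x in xs) for c in range(m)]
           + [sum(y * x ** r for x, y in zip(xs, ys))] for r in range(m)]
    # Gaussian elimination with partial pivoting.
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, m):
            f = aug[r][col] / aug[col][col]
            for c in range(col, m + 1):
                aug[r][c] -= f * aug[col][c]
    coef = [0.0] * m
    for r in range(m - 1, -1, -1):  # back substitution
        s = aug[r][m] - sum(aug[r][c] * coef[c] for c in range(r + 1, m))
        coef[r] = s / aug[r][r]
    return coef


def adjust_mutations(values, deg=3, alpha_mad=3.0):
    """Clip points lying more than alpha_mad * MAD away from the fitted curve."""
    n = len(values)
    if n < 2 or n <= deg:
        raise ValueError("need at least 2 points and more points than the degree")
    xs = [i / (n - 1) for i in range(n)]  # time index scaled to [0, 1]
    coef = polyfit(xs, values, deg)
    fitted = [sum(c * x ** k for k, c in enumerate(coef)) for x in xs]  # X'_i
    mad = statistics.median([abs(v - f) for v, f in zip(values, fitted)])
    band = alpha_mad * mad
    # ASSUMED rule: values outside f(x) +/- band are moved to the band edge.
    return [min(max(v, f - band), f + band) for v, f in zip(values, fitted)]
```

Clipping keeps the direction of a genuine surge while limiting its size; replacing outliers with $f(x)$ outright would be an equally plausible reading of the missing formula.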
Preferably, step 5 comprises the following steps:
Step 5-1: For the $m_3$ most similar historical days after mutation-point adjustment, compute in turn the Pearson correlation coefficient between each day's data and the prediction day's data, giving the Pearson similarity $r'_i$ of each most similar historical day.
Step 5-2: Order the historical day data chronologically and divide it into three periods: within $t_1$ years, between $t_1$ and $t_2$ years, and more than $t_2$ years ago, and assign each period a different weight, $\beta = \{\beta_1, \beta_2, \beta_3\}$. That is, $\beta_1$ is the weight of days within 0 to $t_1$ years, $\beta_2$ is the weight of days within $t_1$ to $t_2$ years, and $\beta_3$ is the weight of days older than $t_2$ years, where $0 < \beta_1, \beta_2, \beta_3 < 1$ and $\beta_3 = 1 - \beta_1 - \beta_2$.
Step 5-3: Obtain the historical similar day weight fitting result $d_{history}$ with the historical similar day weight fitting algorithm, which is calculated as follows:
[Formula image not reproduced in the source text.]
where $r'_i$ and $r'_j$ are the Pearson correlation coefficients between the i-th and j-th most similar historical days and the prediction day; $T_i$ and $T_j$ are the period weights β of the periods to which the i-th and j-th historical days belong; $(y'_i, X'_i) = d'_i$ is the i-th most similar historical day; $\theta_i$ is a parameter applied to the actual photovoltaic output power of the i-th historical day; and $m_3$ is the number of historical similar days.
As a preferred embodiment,
$$\theta_i = \frac{V_0}{V_i}$$
where $V_i$ is the total installed photovoltaic capacity of the area on the i-th historical day and $V_0$ is the total installed capacity of the area on the prediction day.
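Step 5 can be sketched as follows. The Step 5-3 formula image is not reproduced, so the form used here, $d_{history} = \sum_i w_i \theta_i y'_i$ with $w_i = r'_i T_i / \sum_j r'_j T_j$, is an assumption read from the listed symbols. The record layout and the inclusive period boundaries are also hypothetical.

```python
def period_weight(age_years, t1, t2, beta1, beta2):
    """T_i: beta1 within t1 years, beta2 between t1 and t2, beta3 beyond t2."""
    beta3 = 1.0 - beta1 - beta2
    if not all(0.0 < b < 1.0 for b in (beta1, beta2, beta3)):
        raise ValueError("need 0 < beta1, beta2, beta3 < 1")
    if age_years <= t1:  # ASSUMED: boundaries belong to the more recent period
        return beta1
    if age_years <= t2:
        return beta2
    return beta3


def weight_fit(similar_days, v0, t1, t2, beta1, beta2):
    """ASSUMED form: d_history = sum_i w_i * theta_i * y'_i.

    w_i = r'_i T_i / sum_j r'_j T_j and theta_i = V_0 / V_i.
    similar_days: hypothetical records with keys "r" (r'_i), "age" (years),
                  "cap" (installed capacity V_i) and "y" (output curve y'_i).
    """
    raw = [d["r"] * period_weight(d["age"], t1, t2, beta1, beta2)
           for d in similar_days]
    total = sum(raw)
    n_t = len(similar_days[0]["y"])
    result = [0.0] * n_t
    for d, w in zip(similar_days, raw):
        theta = v0 / d["cap"]  # rescale output to today's installed capacity
        for t in range(n_t):
            result[t] += (w / total) * theta * d["y"][t]
    return result
```

Scaling by $V_0 / V_i$ lets a day recorded when less capacity was installed still contribute a comparable output curve.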
Preferably, step 6 comprises the following steps:
Step 6-1: Feed the adjusted input feature values of the historical similar days into each time-series prediction algorithm to obtain the prediction results $d_{i\text{-}future}$.
Step 6-2: Calculate the accuracy of each time-series prediction algorithm, discard the algorithms whose accuracy is below the accuracy threshold, sort the remaining $m_6$ algorithms by accuracy in descending order, and calculate the weight $\gamma_i$ of each algorithm:
[Formula image not reproduced in the source text.]
where i is the rank of an algorithm among the remaining $m_6$ algorithms after sorting, and j indexes the j-th of the remaining $m_6$ algorithms.
Step 6-3: Obtain the real-time prediction result $d_{future}$ from the prediction results $d_{i\text{-}future}$ and the algorithm weights $\gamma_i$:
$$d_{future} = \sum_{i=1}^{m_6} \gamma_i \, d_{i\text{-}future}$$
where $\gamma_i$ is the weight of the i-th algorithm and $d_{i\text{-}future}$ is the prediction result of the i-th algorithm.
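Steps 6-2 and 6-3 can be sketched as follows. The models (ARIMA, LSTM, LightGBM and so on) are represented only by their output curves. Because the $\gamma_i$ formula image is not reproduced, linear rank weights $\gamma_i = (m_6 - i + 1) / \sum_{j=1}^{m_6} j$ are assumed, which fits the description of i as a rank and j as a summation index. The function names are hypothetical.

```python
def rank_weights(m6):
    """ASSUMED rank weights: rank 1 (most accurate) gets m6, rank m6 gets 1."""
    denom = m6 * (m6 + 1) / 2  # = sum_{j=1}^{m6} j
    return [(m6 - i + 1) / denom for i in range(1, m6 + 1)]


def fuse_predictions(predictions, accuracies, c0):
    """Drop algorithms with accuracy < c0, rank the rest, return d_future."""
    kept = [(c, p) for c, p in zip(accuracies, predictions) if c >= c0]
    if not kept:
        raise ValueError("no algorithm reaches the accuracy threshold")
    kept.sort(key=lambda cp: cp[0], reverse=True)  # descending accuracy
    gammas = rank_weights(len(kept))
    n_t = len(kept[0][1])
    # d_future[t] = sum_i gamma_i * d_{i-future}[t]
    return [sum(g * p[t] for g, (_, p) in zip(gammas, kept)) for t in range(n_t)]
```

Rank weights are insensitive to small accuracy differences between models; weighting directly by accuracy would be an alternative if the original formula turned out to be accuracy-proportional.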
Preferably, the accuracy is calculated as
$$C = 1 - E_{rmse}$$
where
$$E_{rmse} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\frac{P_i^{M} - P_i^{P}}{V_i}\right)^2}$$
in which N is the number of samples, $P_i^{M}$ is the actual power at time i, $P_i^{P}$ is the predicted power at time i, and $V_i$ is the operating (online) capacity at time i.
The accuracy threshold $C_0$ is calculated as follows:
[Formula image not reproduced in the source text.]
where $C_i$ is the accuracy of the i-th algorithm and $n_6$ is the total number of algorithms used for real-time trend prediction.
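The accuracy metric can be sketched directly from $C = 1 - E_{rmse}$ with capacity-normalized errors. The $C_0$ formula image is not reproduced, so taking $C_0$ as the mean accuracy of the $n_6$ algorithms is an assumption, and the helper names are hypothetical.

```python
import math


def accuracy(actual, predicted, capacity):
    """C = 1 - E_rmse, with each error normalised by the operating capacity V_i."""
    n = len(actual)
    e_rmse = math.sqrt(
        sum(((a - p) / v) ** 2 for a, p, v in zip(actual, predicted, capacity)) / n
    )
    return 1.0 - e_rmse


def accuracy_threshold(accuracies):
    """ASSUMED C_0: the mean accuracy of the n_6 algorithms."""
    return sum(accuracies) / len(accuracies)


def passing(accuracies):
    """Indices of the algorithms whose accuracy reaches C_0."""
    c0 = accuracy_threshold(accuracies)
    return [i for i, c in enumerate(accuracies) if c >= c0]
```

For example, with actual output 50 and 50, predictions 40 and 60, and a capacity of 100, each normalized error is 0.1, so $E_{rmse} = 0.1$ and $C = 0.9$.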
Preferably, step 7 comprises the following steps:
Step 7-1: From the historical similar day weight fitting result $d_{history}$, calculate the adjusted historical value $d'_{history}$:
[Formula image not reproduced in the source text.]
where $P_i^{M}$ is the actual power at time i, $P_i^{H}$ is the historical-day power at time i, and $n_7$ is the total number of time points of the similar day.
Step 7-2: At each of the three time points before the current time of the prediction day, calculate the Euclidean distance between the actual value and each of the historical similar day weight fitting result $d_{history}$, the adjusted historical value $d'_{history}$ and the real-time prediction result $d_{future}$.
Step 7-3: Sort the three Euclidean distances at the i-th time point in ascending order, denote the j-th distance at the i-th time point by $s_{ij}$, and record the corresponding prediction result as
[Formula image not reproduced in the source text.]
Then calculate the first fitting result $d_{fitting\text{-}1}$ and the second fitting result $d_{fitting\text{-}2}$:
[Formula images not reproduced in the source text.]
where $\sigma_i$ is the weight in the fitting of the i-th time point before the current time of the prediction day, and K is the number of clusters obtained by the clustering.
Step 7-4: Calculate the accuracy of each of the historical similar day weight fitting result $d_{history}$, the adjusted historical value $d'_{history}$, the real-time prediction result $d_{future}$, the first fitting result $d_{fitting\text{-}1}$ and the second fitting result $d_{fitting\text{-}2}$, and select the most accurate one as the final real-time prediction result.
As a preferred embodiment, $\sigma_i$ is given by:
[Formula image not reproduced in the source text.]
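The formulas for $d_{fitting\text{-}1}$, $d_{fitting\text{-}2}$ and $\sigma_i$ are not reproduced, so this sketch covers only the per-point distances of Step 7-2 and the final selection of Step 7-4. It assumes that accuracy in Step 7-4 is $C = 1 - E_{rmse}$ evaluated over the time points already observed before the current time. The candidate names and helper names are hypothetical.

```python
import math


def point_distances(candidates, actual, now, window=3):
    """Step 7-2: |candidate - actual| at each of the `window` points before `now`."""
    start = max(0, now - window)
    return {name: [abs(curve[t] - actual[t]) for t in range(start, now)]
            for name, curve in candidates.items()}


def pick_final(candidates, actual, capacity, now):
    """Step 7-4: the candidate with the highest accuracy C on points before `now`."""
    if now < 1:
        raise ValueError("need at least one observed time point")

    def acc(curve):
        # ASSUMED: C = 1 - E_rmse over the observed points 0 .. now-1.
        e = math.sqrt(sum(((actual[t] - curve[t]) / capacity[t]) ** 2
                          for t in range(now)) / now)
        return 1.0 - e

    scores = {name: acc(curve) for name, curve in candidates.items()}
    return max(scores, key=scores.get), scores
```

In practice `candidates` would hold the five curves $d_{history}$, $d'_{history}$, $d_{future}$, $d_{fitting\text{-}1}$ and $d_{fitting\text{-}2}$, and the selection would be repeated as each new actual value arrives.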
Beneficial effects: the invention provides a photovoltaic balance prediction method based on historical similar days and real-time multi-dimensional study and judgment. It establishes a cluster-based screening mechanism for historical similar days, proposes a historical similar day fitting strategy based on mutation-point adjustment and a real-time trend prediction strategy based on multiple models such as ARIMA, LSTM and LightGBM, and fuses the two strategies with dynamic weights. This improves the accuracy of distributed photovoltaic balance prediction and supports the safe and stable operation of the novel power system. Compared with the prior art, the method has the following advantages:
1. A clustering-based screening strategy for historical similar days is established. A strategy for selecting initial cluster-center points and a multivariate initial cluster fusion algorithm are proposed, and an improved cluster analysis algorithm for historical similar days is designed so that historical days are clustered as reasonably as possible. Based on the multi-dimensional feature values of photovoltaic electrical collection items (telemetry, remote signaling, status, events) and non-electrical collection items (weather, temperature, humidity, illumination intensity, wind, etc.), the historical similar day feature value screening algorithm selects, within the target cluster, the historical days most similar to the prediction day.
2. The real-time trend prediction mechanism for distributed photovoltaics is optimized. Models such as the time-series model ARIMA, LSTM and LightGBM are combined, and a fusion strategy for the multi-model prediction results based on each model's actual operating accuracy is proposed. This strengthens the generalization ability of the fused model and improves the accuracy of real-time trend prediction for distributed photovoltaics.
3. A method for fitting and dynamically adjusting historical and real-time prediction results is provided. A historical similar day weight fitting algorithm produces a prediction based on historical data, and this historical prediction is adjusted according to actual operating values. The history and real-time fitting algorithm improves prediction accuracy, and the overall distributed photovoltaic balance prediction is adjusted in real time according to how well each of the multiple prediction results matches the actual values.
Drawings
FIG. 1 is the overall flow chart of the method of the invention.
FIG. 2 is a flow chart of the strategy for selecting initial cluster-center points.
FIG. 3 is a diagram of the fusion strategy for the multi-model prediction results.
Detailed Description
The present invention will be further described with reference to the following examples.
As shown in fig. 1, a photovoltaic balance prediction method based on historical similar days and real-time multidimensional study and judgment includes the following steps:
S01: Screen out the input feature vectors with high contribution degrees according to the meteorological elements in the historical day data.
The historical day data contain the meteorological elements temperature, humidity, irradiance, wind speed, wind direction and pressure, which influence photovoltaic output to different degrees. Using too many irrelevant meteorological elements in photovoltaic output prediction introduces redundancy and reduces computational accuracy. The feature contribution degree is therefore analyzed quantitatively with the Pearson correlation coefficient method. This preliminarily simplifies the input feature vector by eliminating meteorological elements that are irrelevant to, or have little influence on, photovoltaic output.
Obtain the historical day data
$$D = \{d_1, d_2, \ldots, d_{n_1}\}$$
where D is the set of all historical day data, d is the data of one day in D, and $n_1$ is the total number of historical days. Each day is $d = \{(y_i, X_i),\ i = 1, 2, \ldots, m_1\}$, where
$$X_i = (x_i^1, x_i^2, \ldots)$$
is the input feature vector at the i-th time point and $x_i^j$ is the j-th input feature value at the i-th time point; the input features include temperature, humidity, irradiance, wind speed, wind direction and pressure. $y_i$ is the output feature value, i.e. the actual photovoltaic output at the i-th time point. The index i corresponds to one time point per sampling period of the day; for example, with one point every 15 minutes there are 96 time points per day, so each historical day contains the input feature vectors and actual photovoltaic output values for these 96 time points. The index j identifies one input feature.
From the historical day data D, calculate the feature contribution degree of each input feature with the Pearson correlation coefficient method, and retain the input features whose contribution degree (in absolute value) is greater than or equal to a threshold.
The feature contribution degree is analyzed quantitatively for all factors that influence photovoltaic generation: temperature, humidity, irradiance, wind speed, wind direction and pressure. The Pearson correlation coefficient is:
$$r = \frac{\sum_{i=1}^{m_1}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{m_1}(x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{m_1}(y_i - \bar{y})^2}}$$
where $x_i$ is the input feature value; $\bar{x}$ is the mean of the input feature; $y_i$ is the actual photovoltaic output power; $\bar{y}$ is the mean of the actual photovoltaic output; $m_1$ is the number of input time points; and r is the Pearson correlation coefficient, used here as the contribution degree. Denote the contribution degree by $r_w$. If $|r_w| < r_0$, the input feature has little influence on the actual photovoltaic output and is deleted; if $|r_w| \ge r_0$, the input feature has a high contribution degree and is retained. The contribution degree is calculated for every input feature to preliminarily simplify the input feature vector. The threshold $r_0$ can be configured by the user according to local conditions.
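The screening rule above can be sketched in Python as follows. This is a minimal illustration rather than the patent's implementation. The function names, the dictionary layout of the features, and the choice to return 0 for a constant series (whose correlation is undefined) are assumptions.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient r between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # ASSUMPTION: a constant series is treated as uncorrelated
    return cov / (sx * sy)


def screen_features(features, output, r0):
    """Return {name: r_w} for every feature with |r_w| >= r0.

    features: hypothetical layout, feature name -> list of m_1 values x_i
    output:   list of m_1 actual PV output values y_i
    """
    kept = {}
    for name, values in features.items():
        r_w = pearson(values, output)
        if abs(r_w) >= r0:  # |r_w| < r0 means little influence, so drop it
            kept[name] = r_w
    return kept
```

For instance, `screen_features({"irradiance": [...], "pressure": [...]}, y, r0=0.3)` would keep irradiance when it tracks the output curve and drop a near-constant pressure series. Because the test uses $|r_w|$, strongly negatively correlated features such as humidity on some sites are retained as well.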
S02: Cluster the historical day data, restricted to the screened input features, to obtain clusters.
The meteorological elements (temperature, humidity, irradiance, wind speed, wind direction and pressure) of historical days differ greatly across seasons and weather conditions and show typical clustering characteristics. Clustering all historical days therefore makes it possible to find the historical similar days with the greatest similarity in less time.
Clustering organizes a data set into groups whose members are similar in some way, and is a technique for discovering such internal structure. The K-means algorithm is the best-known partitional clustering algorithm and, owing to its simplicity and efficiency, the most widely used of all clustering algorithms. It iteratively updates the clusters for a given clustering objective function; each iteration moves in the direction that decreases the objective, and the final clustering result attains a (local) minimum of the objective, giving a good classification.
However, the original k-means algorithm has drawbacks, such as sensitivity to the initial cluster centers: different initial centers may yield different clustering results. To address this, the invention uses an improved algorithm to select the initial cluster-center points, so that k-means clusters better than when the initial points are chosen at random.
In the invention, using K-means to preprocess the data requires the number of clusters K to be specified in advance, which is difficult without prior knowledge. The clustering is therefore computed for K = 3, 4, 5 and 6, the clustering quality for each value of K is evaluated with the silhouette coefficient, and the K with the largest silhouette coefficient is selected as the final number of clusters.
The strategy for selecting initial cluster-center points is shown in FIG. 2 and comprises the following steps:
(1) Divide the historical day data
$$D = \{d_1, d_2, \ldots, d_{n_1}\}$$
by season (spring and autumn, summer, winter) and weather (sunny, non-sunny) into six disjoint subsets $D_1, D_2, \ldots, D_6$, and let $v = \{v_1, v_2, \ldots, v_6\}$ denote the cluster centers of these subsets; $n_1$ is the total number of historical days.
(2) Compute the mean of all elements of subset $D_i$ and store it in $v_i$ as an initial cluster-center point, $i = 1, 2, \ldots, 6$.
(3) If K equals the number of centers in v, use the $v_i$ as the initial cluster-center points for k-means.
(4) If K is less than the number of centers in v, compute the pairwise correlations among all $v_i$ with the Pearson correlation coefficient method, store them in the upper triangular matrix R, and find the two cluster centers with the largest correlation coefficient $r_{pq}$:
$$R = [r_{pq}]_{1 \le p < q \le |v|}, \qquad (p^*, q^*) = \arg\max_{p<q} r_{pq}$$
where $r_{pq}$ is the Pearson correlation coefficient between cluster centers $v_p$ and $v_q$, and $|v|$ is the current number of cluster centers.
(5) And combining the sets corresponding to the two clustering centers with the maximum correlation to obtain a fused clustering center v', and particularly, combining by using a multivariate initial clustering fusion algorithm.
The invention provides a multi-element initial clustering fusion algorithm for merging a plurality of sets, which comprises the following specific steps:
Figure BDA0003969766910000162
in the formula: d i Representing the ith set to be fused, D j Denotes the jth set to be fused, m 2 Represents the number of sets that need to be fused together, | D i I represents a cluster D i Number of samples in, K is the number of clusters to be aggregated, v i To need to fuse set D i V' is m 2 And (4) clustering centers after the fusion of the sets.
(6) Steps (4) and (5) are repeated until the number of cluster centers in v equals K, yielding the initial cluster centers v'.
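The initial-center strategy above can be sketched as follows. This is a minimal illustration, not the patented algorithm: because the fusion formula is only given as an image, the sketch ASSUMES that two merged centers are fused by a sample-size-weighted mean. The function names `pearson` and `initial_centers` are hypothetical.

```python
import numpy as np


def pearson(a, b):
    """Pearson correlation coefficient r_pq between two 1-D vectors."""
    a = a - a.mean()
    b = b - b.mean()
    return float(a @ b / np.sqrt((a @ a) * (b @ b)))


def initial_centers(subsets, K):
    """Merge the most correlated subset means until K centers remain.

    ASSUMPTION: two merged centers are fused by a sample-size-weighted mean;
    this stands in for the patent's fusion formula, which is not reproduced.
    """
    centers = [s.mean(axis=0) for s in subsets]  # step (2): v_i = mean of D_i
    sizes = [len(s) for s in subsets]            # |D_i|
    while len(centers) > K:                      # steps (4)-(6)
        best, pair = -np.inf, None
        for p in range(len(centers)):            # upper triangle of R only
            for q in range(p + 1, len(centers)):
                r = pearson(centers[p], centers[q])
                if r > best:
                    best, pair = r, (p, q)
        p, q = pair
        n = sizes[p] + sizes[q]
        fused = (sizes[p] * centers[p] + sizes[q] * centers[q]) / n
        for idx in (q, p):                       # delete the larger index first
            del centers[idx], sizes[idx]
        centers.append(fused)
        sizes.append(n)
    return np.array(centers)


# Usage: six hypothetical subsets of 24-point daily profiles.
rng = np.random.default_rng(0)
subsets = [rng.normal(size=(5, 24)) + k for k in range(6)]
v0 = initial_centers(subsets, K=4)
print(v0.shape)  # (4, 24)
```

With the size-weighted assumption, merging all the way down to K = 1 returns the overall mean of the data, which is a convenient sanity check.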
The initial cluster centers obtained above are then used to run K-means, which groups the historical day data into K clusters, as follows:
(1) The previously determined centers $v' = \{v'_1, v'_2, \ldots, v'_K\}$ are used as the initial cluster centers.
(2) For each sample $d_i$ in the data set, the Euclidean distances to the K cluster centers are computed and the sample is assigned to the cluster whose center is nearest. The Euclidean distance $\rho_{ij}$ is calculated as:
$$\rho_{ij} = \lVert d_i - v'_j \rVert_2 = \sqrt{\sum_{k}\left(d_{i,k} - v'_{j,k}\right)^2}$$
where $d_i$ is the i-th sample in the data set and $v'_j$ is the j-th fused cluster center.
(3) For each cluster $D_i$, the cluster center is recalculated as the centroid of all samples belonging to the cluster:
$$v'_i = \frac{1}{|D_i|}\sum_{d \in D_i} d$$
where $D_i$ is a cluster formed by the clustering, $|D_i|$ is the number of samples in $D_i$, and d is a sample of $D_i$.
(4) Steps (2) and (3) are repeated until the cluster centers are stable, and the resulting clusters are output.
(5) The silhouette coefficient of the resulting clusters is computed, and the K with the largest silhouette coefficient is selected as the final number of clusters.
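Steps (1) to (4) are standard K-means started from fixed initial centers. A minimal sketch, assuming the data are an (n_samples, n_features) array and the initial centers come from the strategy above; the function name `kmeans` and the `max_iter`/`tol` parameters are hypothetical.

```python
import numpy as np


def kmeans(data, init_centers, max_iter=100, tol=1e-8):
    """K-means from fixed initial centers (steps (1)-(4) above)."""
    centers = np.array(init_centers, dtype=float)      # step (1)
    for _ in range(max_iter):
        # step (2): Euclidean distance rho_ij from every sample to every center
        dist = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # step (3): recompute each center as the centroid of its members
        new_centers = np.array([
            data[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(len(centers))
        ])
        # step (4): stop once the centers no longer move
        if np.allclose(new_centers, centers, atol=tol):
            break
        centers = new_centers
    return labels, centers


data = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
labels, centers = kmeans(data, init_centers=[[0.0, 0.0], [5.0, 5.0]])
print(labels)  # [0 0 1 1]
```

An empty cluster keeps its previous center here; that is one common choice, not something the source specifies.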
These steps produce the clusters for the current value of K. Because K-means is sensitive to K, different values give different clusterings; the silhouette coefficient is therefore used to evaluate each candidate K, and the K with the largest silhouette coefficient is chosen as the final number of clusters.
The silhouette coefficient combines the cohesion and the separation of the clusters to evaluate the clustering. It is computed separately for each sample point $d_i$ in each cluster, which requires two quantities:
a(i): the mean distance from $d_i$ to the other sample points in the same cluster. The smaller a(i), the more clearly the sample belongs to that cluster; a(i) quantifies the cohesion within the cluster.
b(i): for each cluster $D_j$ other than the one containing $d_i$, the mean distance $b_{ij}$ from $d_i$ to all samples of $D_j$ is computed; traversing all other clusters, the smallest of these means is $b(i) = \min(b_{i1}, b_{i2}, \ldots, b_{ik})$, which quantifies the separation between clusters.
The silhouette coefficient of sample point $d_i$ is:
$$s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}$$
The mean of s(i) over all sample points is the overall silhouette coefficient S of the clustering for the current K, which measures how compact and well separated the clusters are. $S \in [-1, 1]$, and the closer S is to 1, the better the clustering.
Finally, the K with the largest silhouette coefficient is selected as the final number of clusters, and the K clusters are fixed. The historical similar-day clustering analysis is executed once every 1 day 0.
S03: Using the Pearson correlation coefficient method, select the cluster most similar to the prediction day from the K clusters of historical day data as the historical similar days, and screen the input feature values of the historical similar days.
Each historical day records an electrical quantity (photovoltaic generation power) and meteorological elements (temperature, humidity, irradiance, wind speed, wind direction and air pressure). To select the historical days whose meteorological elements are most similar to those of the prediction day, the cluster most similar to the prediction day is first selected from the K clusters with the Pearson correlation coefficient method, and the following historical similar-day feature value screening algorithm is then applied to all historical days in that cluster:
[Formula image not reproduced]
where the selection operator (shown as an image in the original) denotes choosing, from the $n_1$ historical days, the $m_3$ historical days whose selected feature values differ least from those of the prediction day; $x_i^p$ is the p-th input feature value at the i-th time of a historical day, and its counterpart is the p-th input feature value at the i-th time of the prediction day; $r_p$ is the Pearson similarity coefficient of the p-th input feature, calculated in step S01; $a_3$ is the number of input features; and $n_3$ is the number of time points of the prediction day up to the current time.
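Because the screening formula is only available as an image, the following sketch ASSUMES a plausible form: each historical day is scored by the sum over time points and features of $r_p \lvert x_i^p - \hat{x}_i^p \rvert$ (where $\hat{x}_i^p$ is the prediction-day value), and the $m_3$ lowest-scoring days are kept. The function name and array layout are hypothetical.

```python
import numpy as np


def screen_similar_days(hist, pred, r, m3):
    """Return indices of the m3 historical days closest to the prediction day.

    hist: (n_days, n3, a3) feature values of historical days up to the current time
    pred: (n3, a3) feature values of the prediction day so far
    r:    (a3,) Pearson contribution r_p of each feature (from step S01)
    ASSUMPTION: score = sum_i sum_p r_p * |x_i^p - x_hat_i^p|.
    """
    score = np.abs(hist - pred[None, :, :]) @ r   # weight features: (n_days, n3)
    score = score.sum(axis=1)                      # total over time points
    return np.argsort(score)[:m3]


pred = np.array([[1.0, 1.0], [2.0, 2.0]])          # n3 = 2 times, a3 = 2 features
hist = np.stack([pred + 0.1, pred + 1.0, pred])    # three hypothetical days
idx = screen_similar_days(hist, pred, r=np.array([0.8, 0.5]), m3=2)
print(idx)  # [2 0]
```

Weighting by $r_p$ makes features that contributed more to PV output (per step S01) count more in the similarity.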
S04: Adjust the mutation points (abrupt outliers) in the input feature values of the historical similar days to obtain the adjusted input feature values of the historical similar days.
It is determined whether the feature values of the historical days selected in step S03 contain mutation points at any time; if so, they are adjusted with a regression-fitting-based median absolute deviation (MAD) method, as follows:
(1) A high-order polynomial is fitted by least squares to the feature values of the historical similar days over time. The least-squares criterion measures the deviation between the data points and the fitted curve: the fit is best when the sum of squared differences between the ordinates of the fitted curve and of the data points is minimal, which gives the regression curve f(x).
(2) The median absolute deviation of all elements $X_i$ is computed as $\mathrm{MAD} = \operatorname{median}(\lvert X_i - X'_i \rvert)$, where median(X) is the median of X, $X'_i$ is the value of the regression curve f(x) corresponding to $X_i$, and $X_i$ is the input feature value at the i-th time in the historical day data.
(3) All data are then adjusted with the following algorithm to correct the mutation points:
[Formula image not reproduced]
where $X''_i$ is the adjusted value of $X_i$, $\alpha_{mad}$ is a coefficient given by [formula image not reproduced], and k is the number of clusters.
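The regression-plus-MAD adjustment can be sketched with numpy's least-squares polynomial fit. Steps (1) and (2) follow the text; for step (3), whose formula is not reproduced, the sketch ASSUMES that residuals larger than $\alpha \cdot \mathrm{MAD}$ are clipped back towards the fitted curve, with a fixed hypothetical $\alpha = 3$ instead of the patent's cluster-dependent $\alpha_{mad}$.

```python
import numpy as np


def adjust_mutation_points(x, degree=3, alpha=3.0):
    """Adjust abrupt points in one feature's values over successive times.

    (1) least-squares polynomial fit f(t); (2) MAD of the residuals;
    (3) ASSUMPTION: clip residuals to +/- alpha * MAD around f(t).
    """
    x = np.asarray(x, dtype=float)
    t = np.arange(len(x), dtype=float)
    coeffs = np.polyfit(t, x, degree)            # least-squares polynomial fit
    fitted = np.polyval(coeffs, t)               # X'_i
    resid = x - fitted
    mad = np.median(np.abs(resid))               # MAD = median(|X_i - X'_i|)
    bound = alpha * mad
    return fitted + np.clip(resid, -bound, bound)  # X''_i


smooth = np.sin(np.linspace(0.0, 3.0, 20))
x = smooth.copy()
x[10] += 5.0                                     # inject a hypothetical spike
x_adj = adjust_mutation_points(x)
print(round(x[10] - x_adj[10], 2))               # size of the correction at the spike
```

Points whose residuals lie within the bound are returned unchanged, so only the abrupt values are pulled back.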
S05: Process the adjusted input feature values of the historical similar days with the historical similar-day weight fitting algorithm to obtain the historical similar-day weight fitting result.
For each of the $m_3$ most similar historical days (after mutation-point adjustment), the Pearson correlation coefficient with the prediction-day data is computed, giving the Pearson similarity $r'_i$; the higher the similarity, the higher the weight. Among similar days from different periods, data closer in time to the prediction day are considered more valuable as a reference and are given higher weights. The historical day data are ordered in time and divided into three periods, within $t_1$ years, between $t_1$ and $t_2$ years, and more than $t_2$ years ago, with weights $\beta = \{\beta_1, \beta_2, \beta_3\}$: $\beta_1$ applies to days within 0 to $t_1$ years, $\beta_2$ to days between $t_1$ and $t_2$ years, and $\beta_3$ to days more than $t_2$ years ago, where $0 < \beta_1, \beta_2, \beta_3 < 1$ and $\beta_3 = 1 - \beta_1 - \beta_2$. These values can be configured by the user according to local conditions.
The historical similar-day weight fitting algorithm is:
[Formula image not reproduced]
where $r'_i$ and $r'_j$ are the Pearson correlation coefficients of the i-th and j-th most similar historical days with the prediction day; $T_i$ and $T_j$ are the period weights β to which the i-th and j-th historical days belong; $(y'_i, X'_i) = d'_i$ is the i-th most similar historical day;
[formula image not reproduced] is a quantity derived from the actual PV output power of the i-th historical day; $V_i$ is the total installed PV capacity in the area at the i-th historical time; $V_0$ is the total installed capacity in the area on the prediction day; $m_3$ is the number of similar historical days; and $d_{history}$ is the weighted fitting result of the $m_3$ most similar historical days.
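Since the weight fitting formula is only an image, this sketch ASSUMES one plausible reading of the listed symbols: weights $w_i = r'_i T_i / \sum_j r'_j T_j$, with each historical output rescaled by the capacity ratio $V_0 / V_i$. The period thresholds and β defaults (`t1=1`, `t2=3`, `beta1=0.5`, `beta2=0.3`) and all function names are hypothetical.

```python
import numpy as np


def period_weight(age_years, t1, t2, beta1, beta2):
    """Period weight T_i by age of the historical day; beta3 = 1 - beta1 - beta2."""
    if age_years <= t1:
        return beta1
    if age_years <= t2:
        return beta2
    return 1.0 - beta1 - beta2


def weight_fit(power, r, ages, cap_hist, cap_now,
               t1=1, t2=3, beta1=0.5, beta2=0.3):
    """Weighted fit of the m3 most similar days (ASSUMED form, see lead-in).

    power: (m3, n) actual PV output of each similar day at n time points
    r: (m3,) Pearson similarity r'_i of each day to the prediction day
    """
    T = np.array([period_weight(a, t1, t2, beta1, beta2) for a in ages])
    w = np.asarray(r) * T / np.sum(np.asarray(r) * T)        # normalised weights
    scaled = np.asarray(power) * (cap_now / np.asarray(cap_hist))[:, None]
    return w @ scaled                                         # d_history, shape (n,)


d_history = weight_fit(power=[[10.0, 20.0], [20.0, 40.0]], r=np.array([0.9, 0.9]),
                       ages=[0.5, 5.0], cap_hist=[100.0, 100.0], cap_now=100.0)
print(d_history)
```

With equal similarities, the recent day (weight β1 = 0.5) dominates the older one (weight β3 = 0.2), matching the stated preference for data closer to the prediction day.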
Screening historical similar days raises several problems: PV measurement data form a massive history, so computation is costly and screening is slow; the selection of similar days is not accurate enough; and mutation points interfere with the prediction result. The relevant data therefore have to be analyzed and assessed with a suitable method.
The original K-means algorithm has drawbacks such as sensitivity to the initial cluster centers: different initial centers may lead to different clustering results. To address this, the invention uses an improved initial cluster-center selection strategy, so that K-means clusters better than with randomly chosen initial centers.
The historical similar-day feature value screening algorithm screens the feature values of the historical day data, so that the historical days most similar to the prediction day can be found in the cluster more quickly, improving both computation speed and search accuracy.
The mutation points in the historical similar-day data are corrected by the adjustment algorithm, which reduces the errors they would otherwise introduce into the subsequent fitting and prediction.
The historical similar-day weight fitting algorithm processes and fits the similar-day data of several historical days, which makes the historical similar days a more reliable reference and improves prediction accuracy.
S06: Perform multivariate real-time trend prediction based on the adjusted input feature values of the historical similar days to obtain a real-time prediction result.
In the prior art, time-series prediction algorithms perform differently under different conditions: no single algorithm predicts well in all situations, and when external conditions change sharply a single algorithm can deviate considerably, so its robustness and generalization need to be improved.
The method therefore selects several algorithms with strong prediction capability and computes the accuracy of each from historical data. A multivariate prediction-result fusion strategy discards low-accuracy results and gives higher weights to high-accuracy ones, and the remaining results are then fitted together, improving the generalization and the robustness to interference of the fused model.
Time-series models such as ARIMA, LSTM and LightGBM are fused, the weight of each model is adjusted, and the multivariate prediction-result fusion strategy is used to improve the generalization of the fused model, forming the final ensemble algorithm for real-time trend prediction. Time-series analysis mainly addresses two kinds of problems. The first is the analysis of historical interval data, where anomalies are detected and classified from features extracted from the past data. The second is the analysis of future data, i.e., predicting the state or value at one or more future time points from data at past time points.
1. ARIMA(p, d, q): autoregressive integrated moving average model
ARIMA (Autoregressive Integrated Moving Average model), written ARIMA(p, d, q), consists of three parts: AR (autoregressive), I (integrated, i.e., differencing) and MA (moving average). It predicts the current value from the historical values of the time series and the prediction errors on those values. AR is the autoregressive term and p is the number of autoregressive terms, i.e., how many past values are weighted. MA is the moving-average term and q is the number of moving-average terms, i.e., how many past errors are weighted. I denotes differencing, and d is the number of differences needed to make the series stationary. ARIMA can be viewed as a "filter" that tries to separate the signal from the noise and then extrapolates the signal into the future to obtain a prediction; thanks to differencing, it is particularly suited to fitting non-stationary data.
Basic steps of ARIMA modelling:
(1) First, the stationarity of the observed series is tested; if it is not stationary, it is differenced until the differenced data are stationary;
(2) Once the data are stationary, a white-noise test is performed, white noise being a purely random stationary sequence with zero mean and constant variance;
(3) If the sequence is stationary but not white noise, the ACF (autocorrelation function) and PACF (partial autocorrelation function) are computed and the model (e.g., ARMA) is identified;
(4) The parameters of the identified model are estimated, and the model is finally used for prediction and error analysis.
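The core of these steps (difference d times, fit an autoregression, forecast, undo the differencing) can be sketched with numpy alone. This is deliberately a simplified ARIMA(p, d, 0): it has no moving-average term and skips the stationarity and white-noise tests of steps (1) to (3); a full ARIMA(p, d, q) would normally come from a statistics library. The function name is hypothetical.

```python
import numpy as np


def arima_p_d_0_forecast(y, p=2, d=1, steps=3):
    """Simplified ARIMA(p, d, 0) forecast: no MA term, no statistical tests."""
    y = np.asarray(y, dtype=float)
    series = [y]
    for _ in range(d):                        # step (1): difference d times
        series.append(np.diff(series[-1]))
    z = series[-1]
    # AR design matrix: z_t ~ c + phi_1 z_{t-1} + ... + phi_p z_{t-p}
    X = np.column_stack([np.ones(len(z) - p)] +
                        [z[p - k - 1:len(z) - k - 1] for k in range(p)])
    coef, *_ = np.linalg.lstsq(X, z[p:], rcond=None)   # step (4): estimate
    hist = list(z)
    for _ in range(steps):                    # recursive multi-step forecast
        lags = hist[-1:-p - 1:-1]             # z_{t-1}, ..., z_{t-p}
        hist.append(coef[0] + float(np.dot(coef[1:], lags)))
    fc = np.array(hist[len(z):])
    for level in reversed(series[:-1]):       # integrate back d times
        fc = level[-1] + np.cumsum(fc)
    return fc


trend = 2 * np.arange(20) + 1.0               # a hypothetical linear series
print(arima_p_d_0_forecast(trend, p=2, d=1, steps=3))  # continues the trend
```

One difference (d = 1) turns the linear trend into a constant series, which the AR part then reproduces exactly.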
2. LSTM: long short-term memory model
Long short-term memory (LSTM) is a special type of RNN that can learn long-range dependencies; it was designed mainly to solve the vanishing and exploding gradient problems of training on long sequences. LSTM overcomes the short-term memory of the RNN by adding gates to the RNN structure, so that the recurrent network can effectively exploit long-range temporal information. All RNNs take the form of a chain of repeating neural network units; in a standard RNN this repeating unit has a very simple structure, such as a single tanh layer. LSTM adds three gating units to this basic RNN structure, a forget gate, an input gate and an output gate, each connected to a multiplicative element. By setting weights on the connections between the memory cell and the rest of the network, it selectively forgets part of the past information, adds part of the current input, and finally integrates this information into the current state and produces an output.
Forget gate: this stage selectively forgets the input from the previous node ("forget what is unimportant, remember what is important"), controlling whether the information in the cell at the previous time step is carried over into the cell at the current time step.
Input gate: this stage selectively "remembers" the current input, recording important information with emphasis and less important information less strongly, and controls whether the input information flows into the cell.
Output gate: this stage determines what is output as the current state.
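The three gates can be illustrated with a single LSTM time step in numpy. This is the standard LSTM cell equations; the stacked weight layout and gate order (forget, input, output, candidate) are a hypothetical convention chosen for the sketch, and training is not shown.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step.

    W: (4H, D), U: (4H, H), b: (4H,), rows stacked as forget, input, output, candidate.
    """
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    f = sigmoid(z[0:H])            # forget gate: how much of c_prev to keep
    i = sigmoid(z[H:2 * H])        # input gate: how much new content to write
    o = sigmoid(z[2 * H:3 * H])    # output gate: how much of the cell to expose
    g = np.tanh(z[3 * H:4 * H])    # candidate cell content
    c = f * c_prev + i * g         # new cell state
    h = o * np.tanh(c)             # new hidden state (output)
    return h, c


# A saturated forget gate with a closed input gate carries the cell state over.
H, D = 2, 3
W, U = np.zeros((4 * H, D)), np.zeros((4 * H, H))
b = np.array([50, 50, -50, -50, 0, 0, 0, 0], dtype=float)
c_prev = np.array([0.7, -0.3])
h, c = lstm_step(np.ones(D), np.zeros(H), c_prev, W, U, b)
print(np.round(c, 3))  # [ 0.7 -0.3]
```

The example shows the long-range memory mechanism: when the forget gate is near 1 and the input gate near 0, the cell state passes through a time step unchanged.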
3. LightGBM
The main idea of GBDT is to train weak learners (decision trees) iteratively to obtain an optimal model; it trains well and is not prone to overfitting. LightGBM (Light Gradient Boosting Machine) is a framework implementing the GBDT algorithm; it supports efficient parallel training and offers faster training, lower memory consumption, higher accuracy and distributed processing of large data sets. To achieve these advantages, LightGBM makes the following optimizations to the traditional GBDT algorithm:
(1) Histogram optimization
When the original GBDT algorithm splits a node of a decision tree, it traverses every feature over the whole data set to find the best split value for the current node; because all samples must be traversed while building each tree, this is very time-consuming. LightGBM instead uses a histogram strategy: before training, each feature is sorted and discretized into histogram bins (at most 255 bins per feature by default, set by the max_bin parameter), and during training the algorithm only uses these bins as features when building the decision trees, which greatly reduces the number of passes over the sample set.
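A minimal sketch of the histogram idea, assuming quantile bin edges and a second-order split gain of the form used by gradient-boosting libraries (constant factors and regularization terms other than λ are omitted). This is an illustration, not LightGBM's internal code; the function names and `lam` are hypothetical.

```python
import numpy as np


def build_bins(feature, max_bin=255):
    """Quantile bin edges for one feature, computed once before training."""
    qs = np.linspace(0, 1, max_bin + 1)[1:-1]
    return np.unique(np.quantile(feature, qs))


def best_split(feature, grad, hess, edges, lam=1.0):
    """Scan bin boundaries (not raw samples) for the best split.

    Samples with feature < threshold go to the left child.
    """
    bins = np.searchsorted(edges, feature, side="right")   # bin index per sample
    nb = len(edges) + 1
    G = np.bincount(bins, weights=grad, minlength=nb)       # gradient histogram
    Hs = np.bincount(bins, weights=hess, minlength=nb)      # hessian histogram
    GL, HL = np.cumsum(G)[:-1], np.cumsum(Hs)[:-1]
    GR, HR = G.sum() - GL, Hs.sum() - HL
    gain = GL**2 / (HL + lam) + GR**2 / (HR + lam) - G.sum()**2 / (Hs.sum() + lam)
    k = int(np.argmax(gain))
    return float(edges[k]), float(gain[k])


feature = np.concatenate([np.arange(50.0), np.arange(100.0, 150.0)])
grad = np.concatenate([-np.ones(50), np.ones(50)])
thr, gain = best_split(feature, grad, np.ones(100), build_bins(feature))
print(thr)  # a threshold between the two groups
```

The split search costs one pass over at most max_bin + 1 histogram bins per feature instead of one pass over every sorted sample value.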
(2) Leaf-wise (best-first) splitting strategy
Before LightGBM, most tree models grew decision trees level-wise (breadth-first): all nodes on the same level are split at the same time, which allows some multithreaded parallelism and speeds up tree construction. However, level-wise growth treats every node on a level alike, so nodes whose split brings little gain are split anyway, which wastes computation. LightGBM instead grows trees leaf-wise: at each step it splits the leaf with the largest split gain among all current leaves, so that for the same number of leaves it usually reaches a lower loss and less post-pruning is needed. Because leaf-wise growth can produce deep trees that over-fit, a limit on the maximum tree depth (the max_depth parameter) is added to reduce this risk.
(3) Gradient-based One-Side Sampling strategy (GOSS)
The original GBDT algorithm relies on the idea that the negative gradient of the loss function approximates the residual. Other Boosting-based tree algorithms draw a random sample of instances each time a decision tree is built and use it for the gradient update and tree construction. LightGBM instead uses one-side sampling: all samples with large gradients take part in building the tree, while samples with small gradients are randomly sampled so that the data distribution is not distorted. Experiments show that this one-side sampling strategy performs better than random sampling.
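Gradient-based One-Side Sampling can be sketched as follows: keep the top a·n samples by absolute gradient, randomly sample b·n of the remainder, and up-weight the sampled small-gradient instances by (1 − a)/b so that gradient statistics stay approximately unbiased. The values of `a` and `b` and the function name are hypothetical choices for the example.

```python
import numpy as np


def goss_sample(grad, a=0.2, b=0.1, rng=None):
    """Return (indices, weights) selected by one-side sampling."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(grad)
    order = np.argsort(-np.abs(grad))              # largest |gradient| first
    top_n, rand_n = int(a * n), int(b * n)
    top = order[:top_n]                            # always keep large gradients
    rest = rng.choice(order[top_n:], size=rand_n, replace=False)
    idx = np.concatenate([top, rest])
    weights = np.ones(len(idx))
    weights[top_n:] = (1.0 - a) / b                # compensate for under-sampling
    return idx, weights


idx, w = goss_sample(np.arange(100, dtype=float), rng=np.random.default_rng(0))
print(len(idx))  # 30
```

With a = 0.2 and b = 0.1, each tree sees 30 % of the samples while every large-gradient sample is retained.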
(4) Exclusive Feature Bundling (EFB)
Exclusive feature bundling combines sparse features of different dimensions into a single feature, which then enters the model and takes part in building the decision trees. High-dimensional data are usually sparse, which raises the question of whether the feature dimension can be reduced losslessly. In a sparse feature space many features are mutually exclusive, i.e., they are never non-zero at the same time, so mutually exclusive features can be bundled into a single feature when constructing the final feature histograms.
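The merge step of EFB can be sketched with the offset trick: the non-zero values of the second feature are shifted past the range of the first, so one column holds both and each original feature remains recoverable. This simplified sketch works on raw non-negative values rather than on histogram bin indices as LightGBM does, and omits the greedy search that decides which features to bundle; all names are hypothetical.

```python
import numpy as np


def are_exclusive(f1, f2):
    """True if the two features are never non-zero for the same sample."""
    return not np.any((f1 != 0) & (f2 != 0))


def bundle(f1, f2):
    """Merge two exclusive non-negative features into one column."""
    assert are_exclusive(f1, f2)
    offset = f1.max()                          # f2 values are shifted past f1's range
    return np.where(f2 != 0, f2 + offset, f1), offset


def unbundle(merged, offset):
    """Recover the two original features from the bundled column."""
    f1 = np.where(merged <= offset, merged, 0.0)
    f2 = np.where(merged > offset, merged - offset, 0.0)
    return f1, f2


f1 = np.array([1.0, 0.0, 3.0, 0.0])
f2 = np.array([0.0, 2.0, 0.0, 0.0])
merged, offset = bundle(f1, f2)
print(merged)  # [1. 5. 3. 0.]
```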
In addition, algorithms such as Holt-Winters, Facebook Prophet and WaveNet can also be used for real-time trend prediction; their results are fused with those of the other models into the final prediction, improving the generalization of the fused model.
Each time-series prediction algorithm performs better in some cases and worse in others, so the prediction result of each algorithm has to be evaluated. Following the technical requirements for wind and PV power prediction systems on the dispatching side, the root mean square error and the accuracy are used to evaluate the prediction results. The accuracy is calculated as:
$$C = 1 - E_{rmse}$$
The root mean square error is calculated as:
$$E_{rmse} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\frac{P_{M,i} - P_{P,i}}{V_i}\right)^2}$$
where n is the number of samples, $P_{M,i}$ is the actual power at time i, $P_{P,i}$ is the predicted power at time i, and $V_i$ is the online capacity at time i.
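The accuracy metric can be computed as below, assuming (as in the usual dispatch-side definition of PV forecast accuracy) that each error is normalised by the capacity $V_i$ before taking the root mean square. The function name is hypothetical.

```python
import numpy as np


def accuracy(actual, predicted, capacity):
    """C = 1 - E_rmse, with each error normalised by the capacity V_i."""
    e = (np.asarray(actual, float) - np.asarray(predicted, float)) / np.asarray(capacity, float)
    e_rmse = np.sqrt(np.mean(e ** 2))
    return 1.0 - e_rmse


# A 10-unit miss on a 100-unit capacity at one of two time points.
print(round(accuracy([50.0, 60.0], [40.0, 60.0], [100.0, 100.0]), 4))  # 0.9293
```

Normalising by capacity makes accuracies comparable between areas or periods with different installed PV capacity.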
The accuracy of each time-series prediction algorithm is computed; following the multivariate prediction-result fusion strategy, low-accuracy results are discarded, high-accuracy results are given higher weights, and the results are then fitted together. As shown in Fig. 3, the multivariate prediction-result fusion strategy comprises the following steps:
(1) The rejection threshold $C_0$ for the accuracy of the real-time prediction algorithms is determined with the accuracy threshold algorithm:
[Formula image not reproduced]
where $C_i$ is the accuracy of the i-th algorithm and $n_6$ is the number of algorithms used for real-time trend prediction.
(2) The prediction results of algorithms with accuracy $C_i < C_0$ are discarded, the remaining $m_6$ algorithms are sorted by accuracy in descending order, and the weight of each algorithm is:
[Formula image not reproduced]
where $\gamma_i$ is the weight of the i-th algorithm after rejection and descending sorting.
(3) The higher an algorithm's accuracy, the higher its weight. The trend predictions of the time-series algorithms are multiplied by their weights and summed to obtain the final fitted real-time prediction result. The multivariate prediction-result fitting algorithm is:
$$d_{future} = \sum_{i=1}^{m_6} \gamma_i\, d_{i\text{-}future}$$
where $\gamma_i$ is the weight of the i-th algorithm, $d_{i\text{-}future}$ is the real-time prediction of the i-th algorithm, and $d_{future}$ is the real-time prediction obtained by fitting the multivariate trend prediction algorithms.
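The fusion strategy can be sketched end to end. The threshold and weight formulas are only given as images, so this sketch ASSUMES $C_0$ is the mean accuracy of all $n_6$ algorithms and $\gamma_i$ is proportional to $C_i$ among the survivors; the final weighted sum follows step (3). The function name is hypothetical.

```python
import numpy as np


def fuse_predictions(preds, acc):
    """Fuse the real-time predictions of several algorithms.

    preds: (n6, T) predictions; acc: (n6,) accuracies C_i.
    ASSUMPTIONS: C0 = mean accuracy; gamma_i proportional to C_i.
    """
    preds, acc = np.asarray(preds, float), np.asarray(acc, float)
    c0 = acc.mean()                               # hypothetical rejection threshold
    keep = acc >= c0                              # drop algorithms with C_i < C0
    order = np.argsort(-acc[keep])                # descending accuracy (m6 algorithms)
    kept_acc, kept_pred = acc[keep][order], preds[keep][order]
    gamma = kept_acc / kept_acc.sum()             # higher accuracy, higher weight
    return gamma @ kept_pred                      # d_future = sum_i gamma_i * d_i


# Hypothetical outputs of three algorithms (e.g. ARIMA, LSTM, LightGBM).
d_future = fuse_predictions([[10.0, 10.0], [12.0, 12.0], [30.0, 30.0]], [0.9, 0.9, 0.6])
print(d_future)  # [11. 11.]
```

Because the maximum accuracy is never below the mean, at least one algorithm always survives the rejection step under this assumption.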
S07: Fit the historical similar-day weight fitting result and the real-time prediction result together to obtain the final real-time prediction result.
In the prior art, historical similar-day prediction and real-time trend prediction are two separate ways of predicting PV output, and both results are handed directly to the user, who is left unsure which to trust because the accuracy of neither scheme is known.
The invention provides a method for multivariate fitting and dynamic adjustment of historical and real-time prediction results, so that the two independent results are better fused. Comparing the two results against each other improves accuracy and reduces the errors that relying on a single method may introduce, and the overall distributed PV balance prediction is adjusted in real time according to how well the multivariate prediction results match the actual values.
To improve accuracy, the original historical day data and the weights of the historical and real-time predicted values are adjusted dynamically to obtain the final fitting result.
An adjusted historical value $d'_{history}$ is computed from the original historical value $d_{history}$ with the history value adjustment algorithm:
[Formula image not reproduced]
where [formula image not reproduced] is the actual power at time i, [formula image not reproduced] is the historical-day power at time i, $d_{history}$ is the original historical day data, and $d'_{history}$ is the adjusted historical day data.
For the most similar historical-day fitted value $d_{history}$, the adjusted historical value $d'_{history}$ and the fitted value of the multivariate trend prediction algorithm $d_{future}$, the Euclidean distances over the three time points before the current time of the prediction day are computed. The three Euclidean distances at the i-th time point are sorted in ascending order; the j-th is denoted $s_{ij}$, and the corresponding prediction result is denoted by a symbol shown as an image in the original [formula image not reproduced].
The historical and real-time prediction fitting algorithm is:
[Formula image not reproduced]
[Formula image not reproduced]
where $\sigma_i$ is the fitting weight of the i-th time point before the current time of the prediction day, given by [formula image not reproduced]; $d_{fitting\text{-}1}$ and $d_{fitting\text{-}2}$ are the historical and real-time prediction fitting results; and K is the number of clusters obtained by clustering, used to adjust the weights.
Finally, the accuracy is computed for the most similar historical-day fitted value $d_{history}$, the adjusted historical value $d'_{history}$, the fitted value of the multivariate trend prediction algorithm $d_{future}$, and the historical and real-time fitting results $d_{fitting\text{-}1}$ and $d_{fitting\text{-}2}$, and the result with the highest accuracy is selected as the final real-time prediction result:
$$d_{final} = \underset{d \in \{d_{history},\, d'_{history},\, d_{future},\, d_{fitting\text{-}1},\, d_{fitting\text{-}2}\}}{\arg\max}\; C(d)$$
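The final selection step can be sketched as picking the candidate with the highest accuracy C = 1 − E_rmse. The sketch ASSUMES the accuracy is evaluated on the time points of the prediction day whose actual values are already known; the candidate names and numbers are hypothetical, and the fitted results $d_{fitting\text{-}1}$ and $d_{fitting\text{-}2}$ would be added to the dictionary in the same way.

```python
import numpy as np


def select_final(candidates, actual, capacity):
    """Return the name of the candidate prediction with the highest accuracy C."""
    def acc(pred):
        e = (np.asarray(actual, float) - np.asarray(pred, float)) / np.asarray(capacity, float)
        return 1.0 - np.sqrt(np.mean(e ** 2))     # C = 1 - E_rmse
    return max(candidates, key=lambda name: acc(candidates[name]))


actual = [50.0, 55.0, 60.0]                       # hypothetical observed power
capacity = [100.0, 100.0, 100.0]
candidates = {
    "d_history": [40.0, 45.0, 50.0],
    "d_history_adj": [48.0, 54.0, 59.0],
    "d_future": [52.0, 58.0, 63.0],
}
print(select_final(candidates, actual, capacity))  # d_history_adj
```

Re-running this selection as new actual values arrive is what lets the method switch between the historical, real-time and fitted results over the course of the day.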
The invention establishes a clustering-based screening strategy for historical similar days, which clusters historical days sensibly and efficiently selects the historical days most similar to the prediction day; it optimizes the real-time trend prediction mechanism for distributed PV, achieving efficient fusion of the multivariate model predictions; and it provides a multivariate fitting and dynamic adjustment method for historical and real-time prediction results, improving the accuracy of distributed PV balance prediction. The prediction results can meet the professional requirements of business lines such as dispatching, equipment, marketing and development, and are of great significance for the safe and stable operation of new-type power systems.
The above description covers only the preferred embodiments of the present invention. It should be noted that those skilled in the art can make various modifications and adaptations without departing from the principles of the invention, and these are intended to fall within the scope of the invention.

Claims (10)

1. A photovoltaic balance prediction method based on historical similar days and real-time multidimensional study and judgment, characterized by comprising the following steps:
step 1: screening out input feature vectors with high contribution degrees according to the meteorological elements in historical day data;
step 2: clustering the historical day data with the screened input features to obtain clusters;
step 3: selecting, by the Pearson correlation coefficient method, the cluster most similar to the prediction day from the K clusters of historical day data as the historical similar days, and screening the input feature values of the historical similar days;
step 4: adjusting the mutation points of the input feature values of the historical similar days to obtain adjusted input feature values of the historical similar days;
step 5: processing the adjusted input feature values of the historical similar days with a historical similar-day weight fitting algorithm to obtain a historical similar-day weight fitting result;
step 6: performing multivariate real-time trend prediction according to the adjusted input feature values of the historical similar days to obtain a real-time prediction result;
step 7: fitting the historical similar-day weight fitting result and the real-time prediction result together to obtain a final real-time prediction result.
2. The photovoltaic balance prediction method based on historical similar days and real-time multi-dimensional study and judgment as claimed in claim 1, wherein: the step 1 comprises the following steps:
step 1-1: obtaining historical day data $D = \{d_1, d_2, \ldots, d_{n_1}\}$, where D is the set of all historical day data, d is the data of one day in D, and $n_1$ is the total number of historical days; $d = \{(y_i, X_i),\ i = 1, 2, \ldots, m_1\}$, where $X_i = (x_i^1, x_i^2, \ldots)$ is the input feature vector at the i-th time, $x_i^j$ is the j-th input feature value at the i-th time, and $y_i$ is the output feature value, i.e., the actual photovoltaic output at the i-th time;
step 1-2: and calculating the characteristic contribution degree of each input characteristic by adopting a Pearson correlation coefficient method according to the historical daily data D, and screening out the input characteristic vectors of which the characteristic contribution degree is greater than or equal to a threshold value.
3. The photovoltaic balance prediction method based on historical similar days and real-time multi-dimensional study and judgment as claimed in claim 1, wherein: the step 2 comprises the following steps:
step 2-1: dividing the historical day data $D = \{d_1, d_2, \ldots, d_{n_1}\}$ by season (spring/autumn, summer, winter) and weather type (sunny, non-sunny) into six disjoint subsets $D_1, D_2, \ldots, D_6$, with $v = \{v_1, v_2, \ldots, v_6\}$ denoting the cluster centers corresponding to the subsets and $n_1$ the total number of historical days;
step 2-2: computing the mean of all elements of subset $D_i$ and storing it in $v_i$ as an initial cluster center, for $i = 1, 2, \ldots, 6$;
step 2-3: when K equals the number of elements of v, using the $v_i$ as the initial cluster centers of K-means;
step 2-4: when K is less than the number of elements of v, computing the correlation between all $v_i$ by the Pearson correlation coefficient method, storing it in an upper triangular matrix R, and finding the two cluster centers with the largest correlation coefficient $r_{pq}$:
$$r_{pq} = \frac{\sum_{k}(v_{p,k} - \bar{v}_p)(v_{q,k} - \bar{v}_q)}{\sqrt{\sum_{k}(v_{p,k} - \bar{v}_p)^2}\,\sqrt{\sum_{k}(v_{q,k} - \bar{v}_q)^2}}$$
where $r_{pq}$ is the Pearson correlation coefficient between cluster centers $v_p$ and $v_q$;
step 2-5: merging the sets corresponding to the two clustering centers with the maximum correlation to obtain a fused clustering center v';
step 2-6: repeating steps 2-4 and 2-5 until the number of elements of v equals K, yielding the initial cluster centers v';
step 2-7: using $v' = \{v'_1, v'_2, \ldots, v'_K\}$ as the initial cluster centers;
step 2-8: for each sample $d_i$ in the historical day data, calculating its Euclidean distances to the K cluster centers and assigning it to the cluster whose center is nearest;
step 2-9: for each cluster D i Recalculating its clustering center v';
step 2-10: repeating the steps 2-8 and 2-9 until the clustering center is stable, and outputting clustered clusters;
step 2-11: Compute the silhouette coefficient of the resulting clusters, and select the K that maximizes the silhouette coefficient as the final number of clusters.
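Steps 2-8 to 2-11 can be sketched in NumPy as below. This is a sketch under stated assumptions: the caller supplies the initial centers for each candidate K (for example, from the merging procedure of steps 2-2 to 2-7). The silhouette coefficient is the standard definition, with samples in singleton clusters scored 0. The helper names are hypothetical, and every candidate clustering must contain at least two clusters.

```python
import numpy as np


def kmeans(X, init_centers, max_iter=100, tol=1e-9):
    """Steps 2-8 to 2-10: Lloyd iterations starting from the given initial centers."""
    X = np.asarray(X, dtype=float)
    centers = np.asarray(init_centers, dtype=float).copy()
    for _ in range(max_iter):
        # Step 2-8: assign every sample to the nearest center (Euclidean distance).
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 2-9: recompute each center; an emptied cluster keeps its old center.
        new_centers = np.array([
            X[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
            for k in range(len(centers))
        ])
        # Step 2-10: stop once the centers no longer move.
        converged = np.allclose(new_centers, centers, atol=tol)
        centers = new_centers
        if converged:
            break
    return labels, centers


def silhouette(X, labels):
    """Mean silhouette coefficient; samples in singleton clusters score 0."""
    X = np.asarray(X, dtype=float)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    scores = []
    for i in range(len(X)):
        same = labels == labels[i]
        if same.sum() <= 1:
            scores.append(0.0)
            continue
        a = D[i, same].sum() / (same.sum() - 1)  # mean distance within own cluster
        b = min(D[i, labels == k].mean() for k in np.unique(labels) if k != labels[i])
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))


def choose_k(X, init_centers_by_k):
    """Step 2-11: keep the K whose clustering has the largest silhouette."""
    best = None
    for k, init in init_centers_by_k.items():
        labels, centers = kmeans(X, init)
        score = silhouette(X, labels)
        if best is None or score > best[1]:
            best = (k, score, labels, centers)
    return best
```

With two well-separated groups of samples, K = 2 yields a higher mean silhouette than K = 3, which splits one group, so K = 2 is selected.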
4. The photovoltaic balance prediction method based on historical similar days and real-time multi-dimensional study and judgment, characterized in that the fused cluster center v' is calculated as follows:
$$v' = \frac{\sum_{i=1}^{m_2} |D_i|\, v_i}{\sum_{j=1}^{m_2} |D_j|}$$
where $D_i$ is the i-th set to be fused, $D_j$ is the j-th set to be fused, $m_2$ is the number of sets to be fused together, $|D_i|$ is the number of samples in cluster $D_i$, K is the number of clusters to be formed, $v_i$ is the cluster center of set $D_i$, and v' is the cluster center obtained by fusing the $m_2$ sets.
5. The photovoltaic balance prediction method based on historical similar days and real-time multi-dimensional study and judgment according to claim 1, characterized in that: the step 3 comprises the following steps:
step 3-1: Using the Pearson correlation coefficient method, select from the K clusters the cluster most similar to the prediction day as the historical similar days;
step 3-2: Apply the historical similar day feature value screening algorithm to the data of the historical similar days to obtain their input feature values;
The historical similar day feature value screening algorithm is calculated as follows:
[formula image not reproduced]
where [symbol image not reproduced] denotes selecting, from the $n_1$ historical days, the $m_3$ historical days whose feature values differ least from those of the prediction day; [symbol image not reproduced] is the p-th input feature value at the i-th time point of a historical day, and $r_p$ is the Pearson similarity coefficient of the p-th input feature; [symbol image not reproduced] is the p-th input feature value at the i-th time point of the prediction day, $r_j$ is the Pearson similarity coefficient of the j-th input feature, $a_3$ is the number of input features, and $n_3$ is the number of time points of the prediction day up to the current time.
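The screening formula itself is not recoverable from the text, so the sketch below assumes a plausible form consistent with the listed symbols. Each historical day is scored by summing absolute feature differences, weighted by $r_p / \sum_j r_j$, over the $n_3$ time points observed so far and the $a_3$ features. The $m_3$ lowest-scoring days are kept. The coefficients are assumed non-negative, and `select_similar_days` is a hypothetical name.

```python
import numpy as np


def select_similar_days(hist, pred, r, m3):
    """Return indices of the m3 historical days closest to the prediction day.

    hist: (n1, T, a3) features of the historical days;
    pred: (n3, a3) features of the prediction day observed so far (n3 <= T);
    r:    (a3,) Pearson coefficients of the features (assumed non-negative).
    The weighted absolute-difference score is an assumed form of the claim.
    """
    hist = np.asarray(hist, dtype=float)
    pred = np.asarray(pred, dtype=float)
    w = np.asarray(r, dtype=float) / np.sum(r)          # r_p / sum_j r_j
    n3 = pred.shape[0]
    diff = np.abs(hist[:, :n3, :] - pred[None, :, :])   # (n1, n3, a3)
    score = (diff * w).sum(axis=(1, 2))                 # one score per day
    return np.argsort(score, kind="stable")[:m3]
```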
6. The photovoltaic balance prediction method based on historical similar days and real-time multi-dimensional study and judgment according to claim 1, characterized in that: the step 4 comprises the following steps:
step 4-1: Apply a least-squares high-order polynomial fitting algorithm to the feature values of the historical similar days at all time points to obtain the regression fitting curve f(x);
step 4-2: Calculate the median absolute deviation of all feature values $X_i$ of the historical similar days, $MAD = \mathrm{median}(|X_i - X'_i|)$, where median(X) denotes the median of X, $X'_i$ is the value of the regression curve f(x) corresponding to $X_i$, and $X_i$ is the input feature value at the i-th time point of the historical day data;
step 4-3: Obtain the mutation-point adjusted value $X''_i$ of the input feature values of the historical similar days, calculated as follows:
[formula image not reproduced]
where $X''_i$ is the adjusted value of $X_i$ and $\alpha_{mad}$ is a coefficient.
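Steps 4-1 to 4-3 can be sketched with NumPy's `polyfit` and `polyval`. The text does not reproduce the adjustment formula for $X''_i$, so the clipping rule used here is an assumption: any point farther than $\alpha_{mad} \cdot MAD$ from the fitted curve is pulled back to the edge of that band. The default degree and coefficient values are also assumptions.

```python
import numpy as np


def adjust_mutation_points(t, x, degree=3, alpha_mad=3.0):
    """Sketch of step 4: polynomial least-squares fit, MAD of residuals, clip outliers."""
    t = np.asarray(t, dtype=float)
    x = np.asarray(x, dtype=float)
    coeffs = np.polyfit(t, x, degree)        # step 4-1: least-squares fit f(x)
    fitted = np.polyval(coeffs, t)           # X'_i, the curve value for each X_i
    mad = np.median(np.abs(x - fitted))      # step 4-2: MAD = median(|X_i - X'_i|)
    # Step 4-3 (assumed rule): pull points farther than alpha_mad * MAD from the
    # curve back onto the band fitted +/- alpha_mad * MAD.
    band = alpha_mad * mad
    return np.clip(x, fitted - band, fitted + band)
```

Consider an otherwise linear series with a single spike at its centre, fitted with `degree=1`. Only the spike is pulled toward the fitted line. The remaining points lie inside the band and are returned unchanged.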
7. The photovoltaic balance prediction method based on historical similar days and real-time multi-dimensional study and judgment according to claim 1, characterized in that: the step 5 comprises the following steps:
step 5-1: For each of the $m_3$ mutation-point-adjusted most similar historical days, calculate in turn the Pearson correlation coefficient between its data and the prediction day data, obtaining the Pearson similarity $r'_i$ of each most similar historical day;
step 5-2: Arrange the historical daily data in chronological order and divide it into three periods: within $t_1$ years, between $t_1$ and $t_2$ years, and more than $t_2$ years. Assign each period a different weight, $\beta = \{\beta_1, \beta_2, \beta_3\}$, where $\beta_1$ is the weight of data within 0 to $t_1$ years, $\beta_2$ is the weight of data within $t_1$ to $t_2$ years, and $\beta_3$ is the weight of data older than $t_2$ years;
step 5-3: Obtain the historical similar day weight fitting result $d_{history}$ using the historical similar day weight fitting algorithm, calculated as follows:
[formula image not reproduced]
where $r'_i$ and $r'_j$ are the Pearson correlation coefficients between the prediction day and the i-th and j-th most similar historical days; $T_i$ and $T_j$ are the weights β of the periods to which the i-th and j-th historical days belong; $(y'_i, X'_i) = d'_i$ is the data of the i-th most similar historical day; $\theta_i$ is a parameter of the actual PV output power of the i-th historical day; and $m_3$ is the number of historical similar days.
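Step 5 can be sketched as follows. The text does not reproduce the weight-fitting formula. The sketch assumes that each most similar day's actual output curve $y'_i$ is weighted by $r'_i T_i$, normalised over the $m_3$ days. It omits the parameter $\theta_i$, whose role cannot be recovered. Ages are measured in years before the prediction day. All names, and the β values in the test, are hypothetical.

```python
import numpy as np


def period_weight(age_years, t1, t2, betas):
    """Step 5-2: beta_1 within t1 years, beta_2 between t1 and t2, beta_3 beyond t2."""
    if age_years <= t1:
        return betas[0]
    if age_years <= t2:
        return betas[1]
    return betas[2]


def weighted_history_fit(y_similar, r_prime, ages, t1, t2, betas):
    """Step 5-3 (assumed form): weighted average of the similar days' output
    curves y'_i, using weights r'_i * T_i normalised over the m3 days."""
    y = np.asarray(y_similar, dtype=float)                    # (m3, T)
    T = np.array([period_weight(a, t1, t2, betas) for a in ages])
    w = np.asarray(r_prime, dtype=float) * T
    w = w / w.sum()
    return w @ y                                              # d_history, shape (T,)
```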
8. The photovoltaic balance prediction method based on historical similar days and real-time multi-dimensional study and judgment according to claim 7, characterized in that: the step 6 comprises the following specific steps:
step 6-1: Feed the adjusted input feature values of the historical similar days into each time series prediction algorithm to obtain its prediction result $d_{i\text{-future}}$;
step 6-2: Calculate the accuracy of each time series prediction algorithm and eliminate the algorithms whose accuracy is below the accuracy threshold. Sort the remaining $m_6$ algorithms in descending order of accuracy, and calculate the weight $\gamma_i$ of each algorithm:
[formula image not reproduced]
where i is the rank of an algorithm among the remaining $m_6$ algorithms after sorting, and j indexes the j-th of the remaining $m_6$ algorithms;
step 6-3: From the prediction results $d_{i\text{-future}}$ and the algorithm weights $\gamma_i$, obtain the real-time prediction result $d_{future}$:
$$d_{future} = \sum_{i=1}^{m_6} \gamma_i \, d_{i\text{-future}}$$
where $\gamma_i$ is the weight of the i-th algorithm and $d_{i\text{-future}}$ is the prediction result of the i-th algorithm.
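Steps 6-2 and 6-3 can be sketched as below. The text does not reproduce the $\gamma_i$ formula. The sketch therefore assumes linearly decreasing rank weights, $\gamma_i = (m_6 - i + 1) / \sum_{j=1}^{m_6} j$, which sum to 1 and favour the more accurate algorithms. The accuracies and the threshold are passed in as inputs, and `combine_predictions` is a hypothetical name.

```python
import numpy as np


def combine_predictions(preds, accuracies, threshold):
    """Sketch of steps 6-2 and 6-3.

    preds: (n_algorithms, T) prediction results d_{i-future};
    accuracies: (n_algorithms,) accuracy C of each algorithm.
    The rank weights gamma_i = (m6 - i + 1) / sum_{j=1..m6} j are an assumption.
    """
    preds = np.asarray(preds, dtype=float)
    acc = np.asarray(accuracies, dtype=float)
    keep = np.flatnonzero(acc >= threshold)               # drop algorithms below threshold
    order = keep[np.argsort(-acc[keep], kind="stable")]   # descending accuracy
    m6 = len(order)
    ranks = np.arange(1, m6 + 1)
    gamma = (m6 - ranks + 1) / ranks.sum()
    return gamma @ preds[order]                           # d_future = sum_i gamma_i d_{i-future}
```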
9. The photovoltaic balance prediction method based on historical similar days and real-time multi-dimensional study and judgment according to claim 8, wherein the accuracy is calculated as follows:
$$C = 1 - E_{rmse}$$
where:
$$E_{rmse} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\frac{P_{M,i} - P_{P,i}}{V_i}\right)^2}$$
in which n is the number of samples, $P_{M,i}$ is the actual power at time i, $P_{P,i}$ is the predicted power at time i, and $V_i$ is the online (started-up) capacity at time i;
The accuracy threshold $C_0$ is calculated as follows:
[formula image not reproduced]
where $C_i$ is the accuracy of the i-th algorithm, and $n_6$ is the total number of algorithms used for real-time trend prediction.
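The accuracy of claim 9 can be computed as follows. The sketch assumes that $E_{rmse}$ is the root mean square of the prediction errors, each normalised by the online capacity $V_i$, which is consistent with the variables listed for the formula. The $C_0$ formula cannot be recovered from the text, so the sketch also assumes that $C_0$ is the mean accuracy of the $n_6$ algorithms. Both function names are hypothetical.

```python
import numpy as np


def accuracy(actual, predicted, capacity):
    """C = 1 - E_rmse, with each error normalised by the online capacity V_i."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    capacity = np.asarray(capacity, dtype=float)
    e_rmse = np.sqrt(np.mean(((actual - predicted) / capacity) ** 2))
    return 1.0 - e_rmse


def accuracy_threshold(accuracies):
    """Assumed form of C_0: the mean accuracy of the n6 algorithms."""
    return float(np.mean(accuracies))
```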
10. The photovoltaic balance prediction method based on historical similar days and real-time multi-dimensional study and judgment according to claim 9, wherein: the step 7 comprises the following steps:
step 7-1: From the historical similar day weight fitting result $d_{history}$, calculate the adjusted history value $d'_{history}$:
[formula image not reproduced]
where [symbol image not reproduced] is the actual power at time i, [symbol image not reproduced] is the historical day power at time i, and $n_7$ is the total number of time points of the similar day;
step 7-2: Calculate the Euclidean distances from the historical similar day weight fitting result $d_{history}$, the adjusted history value $d'_{history}$, and the real-time prediction result $d_{future}$ to the prediction day at each of the three time points before the current time;
step 7-3: Sort the three Euclidean distances at the i-th time point in ascending order. Denote the j-th distance by $s_{ij}$ and the corresponding prediction result by [symbol image not reproduced]. Then calculate the first fitting result $d_{fitting-1}$ and the second fitting result $d_{fitting-2}$:
[formula image not reproduced]
[formula image not reproduced]
where $\sigma_i$ is the weight, in the fitting, of the i-th time point before the current time of the prediction day, and K is the number of clusters obtained by clustering;
step 7-4: Calculate the accuracy of each of $d_{history}$, $d'_{history}$, $d_{future}$, $d_{fitting-1}$ and $d_{fitting-2}$, and select the result with the highest accuracy as the final real-time prediction result.
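Steps 7-2 to 7-4 can be partially sketched as follows. At a single time point, the Euclidean distance reduces to an absolute difference. The sketch ranks the three candidate curves ($d_{history}$, $d'_{history}$, $d_{future}$) by this distance at each of the three time points before the current time. The formulas for $d_{fitting-1}$ and $d_{fitting-2}$ cannot be recovered, so they are omitted. For the final selection, the sketch keeps whichever supplied candidate has the highest accuracy against the measured values, using the assumed capacity-normalised accuracy $C = 1 - E_{rmse}$. All names are hypothetical.

```python
import numpy as np


def rank_candidates_near_now(candidates, actual_recent):
    """Sketch of steps 7-2 and 7-3.

    candidates: dict mapping name -> values at the 3 time points before now;
    actual_recent: the prediction day's measured values at those 3 points.
    Returns, for each time point, the candidate names sorted by ascending distance.
    """
    names = list(candidates)
    vals = np.array([candidates[n] for n in names], dtype=float)    # (candidates, points)
    dist = np.abs(vals - np.asarray(actual_recent, dtype=float))   # 1-D Euclidean distance
    return [[names[j] for j in np.argsort(dist[:, i], kind="stable")]
            for i in range(dist.shape[1])]


def pick_final(candidates_full, actual, capacity):
    """Step 7-4 sketch: keep the candidate with the highest accuracy C = 1 - E_rmse."""
    def acc(pred):
        e = (np.asarray(actual, float) - np.asarray(pred, float)) / np.asarray(capacity, float)
        return 1.0 - np.sqrt(np.mean(e ** 2))
    return max(candidates_full, key=lambda n: acc(candidates_full[n]))
```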

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211512958.XA CN115775045A (en) 2022-11-29 2022-11-29 Photovoltaic balance prediction method based on historical similar days and real-time multi-dimensional study and judgment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211512958.XA CN115775045A (en) 2022-11-29 2022-11-29 Photovoltaic balance prediction method based on historical similar days and real-time multi-dimensional study and judgment

Publications (1)

Publication Number Publication Date
CN115775045A 2023-03-10

Family

ID=85391435

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211512958.XA Pending CN115775045A (en) 2022-11-29 2022-11-29 Photovoltaic balance prediction method based on historical similar days and real-time multi-dimensional study and judgment

Country Status (1)

Country Link
CN (1) CN115775045A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116073436A (en) * 2023-04-06 2023-05-05 山东创宇环保科技有限公司 Capacity optimization control method for photovoltaic new energy power system
CN116402240A (en) * 2023-06-08 2023-07-07 北京中科伏瑞电气技术有限公司 Model input construction method and device for wind power prediction of dispatching side area
CN116402240B (en) * 2023-06-08 2023-08-18 北京中科伏瑞电气技术有限公司 Model input construction method and device for wind power prediction of dispatching side area
CN117390379A (en) * 2023-12-11 2024-01-12 博睿康医疗科技(上海)有限公司 On-line signal measuring device and confidence measuring device for signal characteristics
CN117390379B (en) * 2023-12-11 2024-03-19 博睿康医疗科技(上海)有限公司 On-line signal measuring device and confidence measuring device for signal characteristics
CN117688464A (en) * 2024-02-04 2024-03-12 国网上海市电力公司 Hidden danger analysis method and system based on multi-source sensor data
CN117688464B (en) * 2024-02-04 2024-04-19 国网上海市电力公司 Hidden danger analysis method and system based on multi-source sensor data

Similar Documents

Publication Publication Date Title
CN108734355B (en) Short-term power load parallel prediction method and system applied to power quality comprehensive management scene
CN115775045A (en) Photovoltaic balance prediction method based on historical similar days and real-time multi-dimensional study and judgment
CN107766990B (en) Method for predicting power generation power of photovoltaic power station
CN112749904B (en) Power distribution network fault risk early warning method and system based on deep learning
CN110503256B (en) Short-term load prediction method and system based on big data technology
CN110110912B (en) Photovoltaic power multi-model interval prediction method
CN111105104A (en) Short-term power load prediction method based on similar day and RBF neural network
CN110929953A (en) Photovoltaic power station ultra-short term output prediction method based on cluster analysis
CN105701572B (en) Photovoltaic short-term output prediction method based on improved Gaussian process regression
CN105069521A (en) Photovoltaic power plant output power prediction method based on weighted FCM clustering algorithm
CN113554466A (en) Short-term power consumption prediction model construction method, prediction method and device
CN111008726B (en) Class picture conversion method in power load prediction
CN115829105A (en) Photovoltaic power prediction method based on historical data feature search
CN111709554A (en) Method and system for joint prediction of net loads of power distribution network
CN105678406A (en) Short-term load prediction method based on cloud model
CN114792156A (en) Photovoltaic output power prediction method and system based on curve characteristic index clustering
CN115859099A (en) Sample generation method and device, electronic equipment and storage medium
CN112232561A (en) Power load probability prediction method based on constrained parallel LSTM quantile regression
CN115115125A (en) Photovoltaic power interval probability prediction method based on deep learning fusion model
CN114399081A (en) Photovoltaic power generation power prediction method based on weather classification
CN115758151A (en) Combined diagnosis model establishing method and photovoltaic module fault diagnosis method
CN115905904A (en) Line loss abnormity evaluation method and device for power distribution network feeder line
CN111626473A (en) Two-stage photovoltaic power prediction method considering error correction
CN115099511A (en) Photovoltaic power probability estimation method and system based on optimized copula
CN115271242A (en) Training method, prediction method and device of photovoltaic power generation power prediction model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination