CN115759467A

CN115759467A - Time-division integrated learning photovoltaic prediction method for error correction

Info

Publication number: CN115759467A
Application number: CN202211546981.0A
Authority: CN
Inventors: 林晨翔; 郑州; 黄建业; 马腾; 谢炜; 郭俊; 姚文旭; 卢淑敏
Original assignee: Electric Power Research Institute of State Grid Fujian Electric Power Co Ltd; State Grid Fujian Electric Power Co Ltd
Current assignee: Electric Power Research Institute of State Grid Fujian Electric Power Co Ltd; State Grid Fujian Electric Power Co Ltd
Priority date: 2022-12-03
Filing date: 2022-12-03
Publication date: 2023-03-07

Abstract

The invention provides an error-corrected time-sharing integrated learning photovoltaic prediction method, comprising the following steps: step 1: acquiring photovoltaic output data of the photovoltaic user to be predicted for the past 1 year; step 2: carrying out exception processing on the output and meteorological data; step 3: using a GAN algorithm to expand the vector set consisting of the existing associated meteorological data and output data; step 4: modeling each user id separately by time of day; step 5: constructing, in each cluster, an ensemble learning algorithm model based on 3 base models; step 6: applying time-of-day error correction. With this technical scheme, computing resources and enterprise operating costs are saved while accurate photovoltaic output prediction is achieved.

Description

Time-division integrated learning photovoltaic prediction method for error correction

Technical Field

The invention relates to the technical field of new energy, in particular to a time-sharing integrated learning photovoltaic prediction method for error correction.

Background

Photovoltaic power generation makes full use of solar energy resources and, compared with thermal power generation, is a clean and environmentally friendly mode of power generation. However, new energy power generation is intermittent and random, and when photovoltaic power generation is connected to the grid it can affect the safe and stable operation of the power system. Formulating reasonable day-ahead, intraday and real-time photovoltaic output plans ensures the efficient output of photovoltaic power generation. Therefore, a method for accurately predicting photovoltaic power generation has important research significance and value.

Among current mainstream methods, prediction methods based on deep learning are highly accurate but time-consuming, whereas algorithms based on machine learning execute quickly but their prediction accuracy needs to be improved.

Disclosure of Invention

In view of this, the invention aims to provide an error correction time-sharing integrated learning photovoltaic prediction method, which effectively saves computing resources and saves enterprise operation cost under the condition of realizing accurate photovoltaic output prediction.

In order to achieve this purpose, the invention adopts the following technical scheme: an error-corrected time-sharing integrated learning photovoltaic prediction method, comprising the following steps:

step 1: acquiring photovoltaic output data of a photovoltaic user to be predicted in the past 1 year;

step 2: carrying out exception processing on the output and meteorological data;

step 3: using a GAN algorithm to effectively expand a vector set consisting of the existing associated meteorological data and output data;

step 4: modeling each user id at different moments;

step 5: respectively constructing an ensemble learning algorithm model based on 3 base models in each class cluster;

step 6: applying time-division error correction.

In a preferred embodiment, step 1 specifically comprises: acquiring multi-type historical meteorological data for the region where the user is located over the past 1 year, the multi-type meteorological data comprising irradiance, weather type, air temperature, humidity, PM2.5, wind speed and wind direction, at a time interval of 15 minutes or 1 hour; and calculating the Euclidean distance between the longitude and latitude coordinates of the user and the longitude and latitude coordinates of each meteorological data collection point, so as to establish a one-to-one correspondence between user id and meteorological grid id.
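The matching of user id to meteorological grid id can be illustrated with a minimal sketch. It assumes that each user is simply assigned the grid point with the smallest Euclidean distance in longitude/latitude space; the function name `match_users_to_grids`, the ids and the coordinates are hypothetical and are not taken from the patent.

```python
import math


def match_users_to_grids(users, grids):
    """Map each user id to the id of the nearest meteorological grid point.

    users, grids: dicts of id -> (longitude, latitude) in degrees.
    The distance is the plain Euclidean distance between coordinate pairs.
    """
    mapping = {}
    for user_id, (u_lon, u_lat) in users.items():
        mapping[user_id] = min(
            grids,
            key=lambda g: math.hypot(u_lon - grids[g][0], u_lat - grids[g][1]),
        )
    return mapping


# Hypothetical coordinates, for illustration only.
users = {"u1": (119.30, 26.08), "u2": (118.10, 24.48)}
grids = {"g_a": (119.25, 26.00), "g_b": (118.00, 24.50), "g_c": (120.00, 27.00)}
matches = match_users_to_grids(users, grids)
```

In this sketch several users may share one grid point; each user still has exactly one grid id, which is how the one-to-one correspondence is read here.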

In a preferred embodiment, step 2 specifically comprises: calculating, on the data after exception processing, the Pearson correlation coefficient between each type of meteorological data and the photovoltaic output; and screening out, for each photovoltaic user individually, the associated meteorological factors according to a threshold value of 0.2.
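The screening of associated meteorological factors might look like the following sketch. It assumes that the 0.2 threshold is applied to the absolute value of the Pearson coefficient, which the text does not state explicitly; the function names and the sample series are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no linear association
    return cov / math.sqrt(var_x * var_y)


def select_factors(weather, output, threshold=0.2):
    """Keep the factors whose |r| with the output reaches the threshold.

    Using the absolute value is an assumption of this sketch.
    """
    return [
        name
        for name, series in weather.items()
        if abs(pearson(series, output)) >= threshold
    ]


# Hypothetical data for one user.
output = [0.0, 1.2, 2.9, 4.1, 5.0, 3.8]
weather = {
    "irradiance": [0, 150, 380, 520, 640, 480],
    "humidity": [90, 80, 65, 55, 50, 60],
    "wind_direction": [10, 200, 10, 200, 10, 200],
}
selected = select_factors(weather, output)
```

Running the selection per user yields a user-specific list of associated factors, which matches the per-user screening described above.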

In a preferred embodiment, step 3 specifically comprises: the GAN algorithm is based on the idea of a two-player zero-sum game; a neural network capable of generating new data samples is trained as the generator, and a neural network capable of judging whether data is real is trained as the discriminator; the GAN obtained through this mutual game training learns the patterns of the existing real data samples and finally generates the new samples required for the study;

in the optimization process of the GAN, the objective function of the generator is given by formula (1):

where E is the mathematical expectation, z is Gaussian noise with probability distribution P_z(z), G(z) is the data generated by the generator, and D(G(z)) is the output of the discriminator when judging whether the generated data is real; the goal of optimizing the generator during GAN training is to minimize formula (1);

the objective function of the discriminator used in the GAN is given by formula (2):

where x is the real load and meteorological data, P_data(x) is the distribution of the real samples, and D(x) is the output of the discriminator on the real data; the goal of optimizing the discriminator during GAN training is to maximize formula (2).
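Formulas (1) and (2) are not reproduced in this text. Under the symbol definitions given here (E, z, P_z(z), G(z), D(G(z)), x, P_data(x), D(x)), and with the generator minimizing and the discriminator maximizing, the standard GAN objectives would read as follows. This is an assumed reconstruction based on the usual GAN formulation; the symbols V_G and V_D are introduced here for illustration, and the patent's own notation may differ.

```latex
\min_{G} \; V_G = \mathbb{E}_{z \sim P_z(z)}\left[\log\bigl(1 - D(G(z))\bigr)\right] \tag{1}

\max_{D} \; V_D = \mathbb{E}_{x \sim P_{data}(x)}\left[\log D(x)\right]
  + \mathbb{E}_{z \sim P_z(z)}\left[\log\bigl(1 - D(G(z))\bigr)\right] \tag{2}
```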

In a preferred embodiment, step 4 specifically comprises: in the data set expanded by the GAN, the output and associated meteorological factors at the same time of day are divided, in units of one hour, into 12 groups; before the prediction model is constructed, the data in each group is further divided into 4 clusters using an improved K-means++ algorithm based on cosine distance, so that the historical data of each user over the past 1 year is divided into 12 × 4 clusters in total.
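The text does not spell out how K-means++ is improved with cosine distance. The sketch below shows one plausible reading, assumed here: vectors are normalized to unit length, K-means++ seeding uses squared cosine distance as the sampling weight, and cluster centers are the re-normalized means of their members (spherical K-means). All function names and the toy vectors are hypothetical.

```python
import math
import random


def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine_distance(a, b):
    # Inputs are unit vectors, so the dot product is the cosine similarity.
    return 1.0 - sum(x * y for x, y in zip(a, b))


def kmeanspp_seed(points, k, rng):
    """K-means++ style seeding with cosine distance in place of Euclidean."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        weights = [min(cosine_distance(p, c) for c in centers) ** 2 for p in points]
        if sum(weights) == 0:
            centers.append(rng.choice(points))
        else:
            centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers


def cosine_kmeans(vectors, k=4, iterations=50, seed=0):
    """Return a cluster label per vector; k=4 follows the text."""
    rng = random.Random(seed)
    points = [normalize(v) for v in vectors]
    centers = kmeanspp_seed(points, k, rng)
    labels = None
    for _ in range(iterations):
        new_labels = [
            min(range(k), key=lambda j: cosine_distance(p, centers[j]))
            for p in points
        ]
        for j in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                mean = [sum(col) / len(members) for col in zip(*members)]
                centers[j] = normalize(mean)
        if new_labels == labels:
            break
        labels = new_labels
    return labels


# Toy vectors pointing in two clearly different directions (k=2 for the demo).
demo_vectors = [[1, 0.1], [2, 0.2], [3, 0.2], [0.1, 1], [0.2, 2], [0.1, 3]]
labels = cosine_kmeans(demo_vectors, k=2)
```

Because cosine distance ignores vector length, days with the same weather-to-output pattern but different magnitudes fall into the same cluster, which is one possible motivation for choosing it over Euclidean distance.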

In a preferred embodiment, the step 5 is specifically:

step 51: the first base model is an FPN convolutional pyramid neural network; the classification branch of the original FPN is removed, leaving only the regression branch, and a channel attention mechanism is introduced so that the neural network focuses more on specific combination patterns of meteorological factors; the photovoltaic output value of the future time period is predicted by inputting the forecast associated meteorological factors for the 1 hour before the time period to be predicted, so that the cumulative influence of multiple associated meteorological factors on the photovoltaic output is accurately captured;

step 52: the second base model is the CatBoost algorithm, which predicts the photovoltaic output value of the future time period from the forecast associated meteorological factors of the time period to be predicted;

step 53: the third base model is the LightGBM algorithm, which predicts the photovoltaic output value of the future time period from the forecast associated meteorological factors of the time period to be predicted.

In a preferred embodiment, step 6 specifically comprises: based on the photovoltaic output predictions of the most recent 7 days, the 7-day average prediction error is calculated separately for each time of day and used to correct the predicted photovoltaic output at the corresponding future time, so that changes in the user's recent photovoltaic output characteristics are captured in time.

Compared with the prior art, the invention has the following beneficial effects: the method is based on high-accuracy machine learning algorithms and achieves relatively good prediction accuracy while maintaining code execution speed. Therefore, computing resources and enterprise operating costs are effectively saved while accurate photovoltaic output prediction is achieved.

Drawings

FIG. 1 is a general flow diagram of a preferred embodiment of the present invention;

FIG. 2 is a flowchart of the GAN algorithm framework of the preferred embodiment of the present invention.

Detailed Description

The invention is further explained below with reference to the drawings and the embodiments.

It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.

It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments according to the present application; as used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise, and it should be understood that when the terms "comprises" and/or "comprising" are used in this specification, they specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof.

An error-corrected time-of-day integrated learning photovoltaic prediction method is disclosed, wherein an overall algorithm flow chart is shown in fig. 1:

Step 1: acquire the photovoltaic output data of the photovoltaic user to be predicted for the past 1 year at 15-minute intervals. Acquire multi-type historical meteorological data for the area where the user is located over the past 1 year, comprising irradiance, weather type, air temperature, humidity, PM2.5, wind speed and wind direction, at a time interval of 15 minutes or 1 hour. Calculate the Euclidean distance between the longitude and latitude coordinates of the user and those of each meteorological data collection point, so as to establish a one-to-one correspondence between user id and meteorological grid id.

Step 2: carry out exception processing on the output and meteorological data. Then, based on the data after exception processing, calculate the Pearson correlation coefficient between each type of meteorological data and the photovoltaic output. For each photovoltaic user individually, screen out the associated meteorological factors according to a threshold value of 0.2.

Step 3: use a GAN algorithm to effectively expand the vector set consisting of the existing associated meteorological data and output data. The framework flow chart of the GAN algorithm is shown in FIG. 2: the GAN algorithm is based on the idea of a two-player zero-sum game; a neural network capable of generating new data samples is trained as the generator, and a neural network capable of judging whether data is real is trained as the discriminator; the GAN obtained through this mutual game training can learn the patterns of the existing real data samples and finally generate the new samples required for the study.

In the optimization process of the GAN, the objective function of the generator is given by formula (1), where E is the mathematical expectation, z is Gaussian noise with probability distribution P_z(z), G(z) is the data generated by the generator, and D(G(z)) is the output of the discriminator when judging whether the generated data is real; the goal of optimizing the generator during GAN training is to minimize formula (1).

The discriminator objective function used in GAN is as follows (2):

Step 4: model each user id separately by time of day. In the GAN-expanded data set, the output and associated meteorological factors at the same time of day during the daytime (evening data is not needed) are divided, in units of one hour, into 12 groups. Before the prediction model is constructed, the data in each group is further divided into 4 clusters using an improved K-means++ algorithm based on cosine distance, so that the historical data of each user over the past 1 year is divided into 12 × 4 clusters in total. (The vectors being divided consist of the associated meteorological factors and output data of each user at the same time.)
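The hourly grouping of daytime samples can be sketched as follows. The text gives 12 groups but not which hours they cover, so the window 06:00 to 17:59 used below is an assumption, as are the names `DAYTIME_HOURS` and `group_by_hour` and the sample records.

```python
from collections import defaultdict
from datetime import datetime

# Assumed daytime window: 06:00-17:59, which yields 12 hourly groups.
DAYTIME_HOURS = range(6, 18)


def group_by_hour(records):
    """Group (timestamp, vector) records by hour of day, dropping night data.

    Each vector holds the associated meteorological factors plus the output.
    """
    groups = defaultdict(list)
    for timestamp, vector in records:
        if timestamp.hour in DAYTIME_HOURS:
            groups[timestamp.hour].append(vector)
    return dict(groups)


# Hypothetical 15-minute records: [irradiance, air temperature, output].
records = [
    (datetime(2022, 6, 1, 5, 45), [0.0, 18.0, 0.0]),
    (datetime(2022, 6, 1, 9, 0), [310.0, 24.0, 2.1]),
    (datetime(2022, 6, 1, 9, 15), [340.0, 24.5, 2.4]),
    (datetime(2022, 6, 1, 12, 0), [780.0, 29.0, 5.2]),
    (datetime(2022, 6, 1, 20, 0), [0.0, 22.0, 0.0]),
]
groups = group_by_hour(records)
```

Each hourly group would then be clustered separately, giving the 12 × 4 clusters per user described above.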

Step 5: in each cluster, construct an ensemble learning algorithm model based on 3 base models.

Step 51: the first base model is the FPN convolutional pyramid neural network. The classification branch of the original FPN is removed, leaving only the regression branch, and a channel attention mechanism is introduced so that the neural network focuses more on specific combination patterns of meteorological factors. The photovoltaic output value of the future time period is predicted by inputting the forecast associated meteorological factors for the 1 hour before the time period to be predicted (smoothed to 15-minute intervals), so that the cumulative influence of multiple associated meteorological factors on the photovoltaic output is accurately captured.

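Step 52: the second base model is the CatBoost algorithm, which predicts the photovoltaic output value of the future time period from the forecast associated meteorological factors of the time period to be predicted.

Step 53: the third base model is the LightGBM algorithm, which predicts the photovoltaic output value of the future time period from the forecast associated meteorological factors of the time period to be predicted.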

The photovoltaic output predictions of the 3 base models are then weighted and summed according to the proportion of 2.
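The weighted summation of the three base-model predictions can be sketched as below. The full proportion is not legible in this text, so the weights in the example are placeholders chosen only for illustration; dividing by the sum of the weights is likewise an assumption, made so that the proportion can be given as any ratio.

```python
def weighted_ensemble(predictions, weights):
    """Combine base-model predictions by a normalized weighted sum."""
    if len(predictions) != len(weights):
        raise ValueError("one weight is required per base model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(p * w for p, w in zip(predictions, weights)) / total


# Predictions of the FPN, CatBoost and LightGBM models for one time slot (kW).
base_predictions = [4.0, 5.0, 6.0]
# Placeholder weights: NOT the proportion used in the patent.
placeholder_weights = [0.5, 0.3, 0.2]
combined = weighted_ensemble(base_predictions, placeholder_weights)
```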

Step 6: apply time-of-day error correction. Based on the photovoltaic output predictions of the most recent 7 days, the 7-day average prediction error is calculated separately for each time of day and used to correct the predicted photovoltaic output at the corresponding future time, so that changes in the user's recent photovoltaic output characteristics are captured in time. (The model is trained on data from the past 1 year, which may differ from the user's recent photovoltaic characteristics.)
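A minimal sketch of the time-of-day error correction follows. It assumes the error is defined as actual minus predicted and is added to the new prediction, and that the corrected output is clipped at zero; the text specifies neither the sign convention nor the clipping, and the function names and sample values are hypothetical.

```python
from collections import defaultdict


def slot_mean_errors(history):
    """Average (actual - predicted) per time-of-day slot.

    history: iterable of (slot, predicted, actual) covering the last 7 days.
    """
    errors = defaultdict(list)
    for slot, predicted, actual in history:
        errors[slot].append(actual - predicted)
    return {slot: sum(values) / len(values) for slot, values in errors.items()}


def correct(slot, raw_prediction, mean_errors):
    """Shift a raw prediction by the slot's recent mean error."""
    corrected = raw_prediction + mean_errors.get(slot, 0.0)
    return max(corrected, 0.0)  # assumption: output cannot be negative


# Hypothetical 7-day history: over-prediction at 10:00, under-prediction at 11:00.
history = [("10:00", 5.0, 4.6)] * 7 + [("11:00", 6.0, 6.3)] * 7
mean_errors = slot_mean_errors(history)
corrected_10 = correct("10:00", 5.2, mean_errors)
corrected_11 = correct("11:00", 6.0, mean_errors)
```

Keeping a separate mean error per slot lets a systematic bias at one time of day (for example, morning shading) be corrected without affecting the other slots.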

Claims

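1. An error-corrected time-sharing integrated learning photovoltaic prediction method, characterized by comprising the following steps:

step 1: acquiring photovoltaic output data of a photovoltaic user to be predicted in the past 1 year;

step 2: carrying out exception processing on the output and meteorological data;

step 3: using a GAN algorithm to effectively expand a vector set consisting of the existing associated meteorological data and output data;

step 4: modeling each user id at different moments;

step 5: respectively constructing an ensemble learning algorithm model based on 3 base models in each class cluster;

step 6: applying time-division error correction.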

2. The method for time-share integrated learning photovoltaic prediction of error correction according to claim 1, wherein the step 1 specifically comprises: acquiring multi-type historical meteorological data for the region where the user is located over the past 1 year, the multi-type meteorological data comprising irradiance, weather type, air temperature, humidity, PM2.5, wind speed and wind direction, at a time interval of 15 minutes or 1 hour; and calculating the Euclidean distance between the longitude and latitude coordinates of the user and the longitude and latitude coordinates of each meteorological data collection point, so as to establish a one-to-one correspondence between user id and meteorological grid id.

3. The method for time-share integrated learning photovoltaic prediction of error correction according to claim 1, wherein the step 2 specifically comprises: calculating Pearson correlation coefficients of various meteorological data and photovoltaic output based on the data after the exception processing; and screening a plurality of associated meteorological factors one by one according to a threshold value of 0.2.

4. The method for time-share integrated learning photovoltaic prediction of error correction according to claim 1, wherein the step 3 specifically comprises: the GAN algorithm is based on a two-person zero-sum game idea, and a neural network capable of generating new data samples is trained to serve as a generator; training a neural network which can judge whether the data is real or not as a discriminator; the GAN obtained through mutual game training learns the rules of the existing real data samples and finally generates new samples required by research;

in the GAN optimization process, the objective function of the generator is as follows (1):

where E is the mathematical expectation, z is Gaussian noise with probability distribution P_z(z), G(z) is the data generated by the generator, and D(G(z)) is the output of the discriminator when judging whether the generated data is real; the goal of optimizing the generator during GAN training is to minimize formula (1);

the discriminator objective function used in GAN is as follows (2):

5. The method for time-share integrated learning photovoltaic prediction of error correction according to claim 1, wherein the step 4 specifically comprises: in the data set expanded by the GAN, the output and associated meteorological factors at the same time of day are divided, in units of one hour, into 12 groups; before the prediction model is constructed, the data in each group is further divided into 4 clusters using an improved K-means++ algorithm based on cosine distance, so that the historical data of each user over the past 1 year is divided into 12 × 4 clusters in total.

6. The time-of-day integrated learning photovoltaic prediction method for error correction according to claim 1, wherein the step 5 specifically comprises:

step 51: the first base model is an FPN convolutional pyramid neural network; the classification branch of the original FPN is removed, leaving only the regression branch, and a channel attention mechanism is introduced so that the neural network focuses more on specific combination patterns of meteorological factors; the photovoltaic output value of the future time period is predicted by inputting the forecast associated meteorological factors for the 1 hour before the time period to be predicted, so that the cumulative influence of multiple associated meteorological factors on the photovoltaic output is accurately captured;

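step 52: the second base model is the CatBoost algorithm, which predicts the photovoltaic output value of the future time period from the forecast associated meteorological factors of the time period to be predicted;

step 53: the third base model is the LightGBM algorithm, which predicts the photovoltaic output value of the future time period from the forecast associated meteorological factors of the time period to be predicted.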

7. The method for time-share integrated learning photovoltaic prediction of error correction according to claim 1, wherein the step 6 specifically comprises: based on the photovoltaic output predictions of the most recent 7 days, the 7-day average prediction error is calculated separately for each time of day and used to correct the predicted photovoltaic output at the corresponding future time, so that changes in the user's recent photovoltaic output characteristics are captured in time.