CN116702937A

CN116702937A - Photovoltaic output day-ahead prediction method based on K-means mean value clustering and BP neural network optimization

Info

Publication number: CN116702937A
Application number: CN202211610935.2A
Authority: CN
Inventors: 李楠; 黄凯; 王攀; 刘黎
Original assignee: Jingmen Power Supply Co of State Grid Hubei Electric Power Co Ltd
Current assignee: Jingmen Power Supply Co of State Grid Hubei Electric Power Co Ltd
Priority date: 2022-12-14
Filing date: 2022-12-14
Publication date: 2023-09-05

Abstract

The invention relates to a photovoltaic output day-ahead prediction method based on K-means mean value clustering and BP neural network optimization, which comprises the following steps: step 1, collecting historical meteorological data of photovoltaic power stations in the region to be predicted and the corresponding historical photovoltaic output data, and preprocessing both by using an average interpolation method; step 2, calculating the Pearson correlation coefficient between the historical meteorological data and the historical photovoltaic output data, retaining the data whose correlation coefficient is larger than a threshold value, and constructing a training set; step 3, clustering the historical photovoltaic output data by K-means clustering and dividing them into similar-day data sets; step 4, optimizing the weights, thresholds and hidden layer node number of the BP neural network by a genetic algorithm and an ant colony algorithm, and constructing a photovoltaic output day-ahead prediction model; and step 5, inputting the first seventy percent of each similar-day data set into the prediction model as training data, training the neural network, and saving the model with the highest photovoltaic output prediction accuracy.

Description

Photovoltaic output day-ahead prediction method based on K-means mean value clustering and BP neural network optimization

Technical Field

The invention belongs to the technical field of power systems and automation thereof, and particularly relates to a photovoltaic output day-ahead prediction method based on K-means mean value clustering and an optimized BP neural network.

Background

Distributed photovoltaic generation is a renewable energy source with mature technology and is connected to the power grid on a large scale. Because photovoltaic output is random, fluctuating and intermittent, large-scale integration of photovoltaics into the power grid can cause a series of problems for the safe and stable operation of the power system, such as voltage and frequency deviation, voltage fluctuation and grid disconnection. Accurate day-ahead prediction of photovoltaic power helps in making the day-ahead generation plan and reduces the harm caused by photovoltaic grid connection, and therefore has important value and significance.

Photovoltaic output prediction methods mainly comprise physical methods, statistical methods and deep learning methods. A physical method builds a mathematical model from the characteristics of the photovoltaic power generation equipment in order to predict the power; the physical model does not need a large amount of historical data, but the equipment must be calibrated frequently. A statistical method builds a functional mapping between historical data and output power, for example regression prediction, gray theory and time series methods. Statistical models rely on historical data and require abnormal data points to be removed from the historical data before prediction. Deep learning methods benefit from the rapid increase in computing power and learn the mapping between input and output with artificial intelligence algorithms, mainly using nonlinear mapping models. The BP neural network is one of the most widely applied modeling methods because of its simple structure and strong nonlinear mapping capability, but it converges slowly and is easily trapped in local optima; its weights, thresholds and hidden layer node number must be determined manually from experience and lack theoretical support. Photovoltaic output is strongly influenced by weather fluctuation, and output curves under different meteorological conditions differ greatly; without similar-day division the prediction accuracy of the model suffers considerably, so similar-day division is also very important.

Disclosure of Invention

The invention provides a photovoltaic output day-ahead prediction method based on K-means mean clustering and BP neural network optimization, which aims to overcome the defect of low prediction precision in the photovoltaic power generation power prediction method in the prior art, and comprises the following steps:

step 1, collecting historical meteorological data of photovoltaic power stations in a region to be predicted and historical photovoltaic output data corresponding to the historical meteorological data, and preprocessing the historical meteorological data and the historical photovoltaic output data by using an average interpolation method;

step 2, calculating the correlation coefficient of the historical meteorological data and the historical photovoltaic output data by adopting the Pearson correlation coefficient, reserving the data with the correlation coefficient larger than a threshold value, and constructing a training set;

step 3, clustering the historical photovoltaic output data through K-means clustering and dividing a similar day data set;

step 4, optimizing the weight, the threshold value and the hidden layer node number of the BP neural network through a genetic algorithm and an ant colony algorithm, and constructing a photovoltaic output day-ahead prediction model;

step 5, inputting the first seventy percent of each similar day data set as training set data into a prediction model, training a neural network and storing the model with highest prediction precision;

step 6, judging the similar-day type of the day to be predicted according to the forecast weather, and inputting the day-ahead historical meteorological data in the corresponding similar-day data set into the model with the highest photovoltaic output prediction accuracy to obtain the photovoltaic output prediction data of the predicted day.

This example provides a photovoltaic output day-ahead prediction method based on K-means clustering and BP neural network optimization. The method models and trains on the historical meteorological data recorded by photovoltaic power stations in the region to be predicted and the corresponding historical photovoltaic output data. The collected historical data are preprocessed by an average interpolation method; the Pearson correlation coefficient between the historical meteorological data and the historical photovoltaic output data is calculated, and with a suitable threshold the historical meteorological data whose correlation coefficient with the historical photovoltaic output is larger than the threshold are selected as the training set. The historical photovoltaic output data are clustered by the K-means mean value clustering method, and the weights, thresholds and hidden layer node number of the BP neural network are optimized by a genetic algorithm and an ant colony algorithm, so that the BP neural network avoids falling into a local optimal solution. The historical meteorological data whose correlation coefficient with the photovoltaic output is larger than the threshold are then input into the optimized BP neural network to train the prediction model; the optimized BP neural network performs regression prediction on the clustered historical meteorological data, which can improve the day-ahead prediction accuracy of the photovoltaic output.

In the photovoltaic output day-ahead prediction method based on K-means mean value clustering and BP neural network optimization, step 1 collects historical meteorological data of photovoltaic power stations in the region to be predicted and the corresponding historical photovoltaic output data, and preprocesses the historical meteorological data and the historical photovoltaic output data by using an average interpolation method.

In the photovoltaic output day-ahead prediction method based on K-means mean clustering and BP neural network optimization, the specific steps of step 2, namely calculating the Pearson correlation coefficient between the historical meteorological data and the historical photovoltaic output data, retaining the data whose correlation coefficient is larger than a threshold value, and constructing a training set, are as follows:

the Pearson correlation coefficient P between the historical meteorological data X and the photovoltaic output value Y is calculated by the following formula:

$$P = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^2}}$$

wherein n is the sequence length; $x_i$ and $y_i$ are the i-th values of sequence X and sequence Y, respectively; $\bar{x}$ and $\bar{y}$ are the averages of sequence X and sequence Y, respectively. The value range of P is [-1, 1]; the larger the absolute value of P, the higher the degree of correlation between the two sequences.

In the photovoltaic output day-ahead prediction method based on K-means mean clustering and the BP neural network optimization, the step 3 of clustering historical photovoltaic output data through K-means clustering and dividing similar day data sets specifically comprises the following steps:

for a given data set $X = \{X_1, X_2, \ldots, X_n\}$, each object contains t features, so the data set X corresponds to an n × t matrix. By studying the similarity between objects in data set X, the clustering process partitions the samples in X into k different categories $C = \{C_1, C_2, \ldots, C_k\}$ following a certain clustering criterion, and the different categories are independent of each other. To measure the similarity between objects, a distance function is introduced. In data set X, the similarity between any two samples $X_e$ and $X_f$ can be expressed by the Euclidean distance $d_{ef}$:

$$d_{ef} = \sqrt{\sum_{l=1}^{t}\left(x_{el}-x_{fl}\right)^2}$$

where $x_{el}$ and $x_{fl}$ are the l-th features of $X_e$ and $X_f$. The more similar or close samples $X_e$ and $X_f$ are, the smaller $d_{ef}$; otherwise, the larger its value.
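As a small illustration of the distance measure described above, the following Python sketch computes the Euclidean distance between two samples with t features. The function name `euclidean_distance` and the sample values are hypothetical and chosen only for illustration.

```python
import math
from typing import Sequence


def euclidean_distance(x_e: Sequence[float], x_f: Sequence[float]) -> float:
    """Return d_ef, the Euclidean distance between two samples with t features."""
    if len(x_e) != len(x_f):
        raise ValueError("both samples must have the same number of features")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x_e, x_f)))


# Two hypothetical samples with t = 2 features each.
d_ef = euclidean_distance([0.0, 3.0], [4.0, 0.0])
print(d_ef)  # 5.0
```

Identical samples give a distance of zero, and the value grows as the samples move apart, which is the property the clustering step relies on.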

And carrying out K-means mean value clustering on the historical photovoltaic output data, dividing the historical photovoltaic output data with high similarity into the same similar day data set, and storing a clustering center of the similar day data set.

In the photovoltaic output day-ahead prediction method based on K-means mean clustering and BP neural network optimization, step 4 optimizes the weight, threshold and hidden layer node number of the BP neural network through a genetic algorithm and an ant colony algorithm, and the construction of the photovoltaic output day-ahead prediction model comprises the following steps:

before the GA-ACO optimized BP neural network is modeled, the GA algorithm is run to generate an optimized solution for the BP neural network weights, thresholds and hidden layer node number; subsequently, the pheromone distribution is initialized so that the pheromone concentration on the path of this optimized solution is increased, which improves the convergence speed and accuracy of the ACO search.

The initialization formula of the pheromone is

$$\tau = \tau_G + c$$

wherein $\tau_G$ is the pheromone concentration value after GA optimization; c is a pheromone constant.
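The initialization rule can be sketched in a few lines of Python. This is only an illustration: how the GA result is converted into the concentrations $\tau_G$ is not described here, so the list `tau_g` and the function name `init_pheromone` are assumptions.

```python
from typing import List


def init_pheromone(tau_g: List[float], c: float) -> List[float]:
    """Apply tau = tau_G + c to every candidate path."""
    return [t + c for t in tau_g]


# Hypothetical GA-derived concentrations for three candidate paths;
# the second path is assumed to lie on the GA-optimized solution.
tau_g = [0.0, 2.0, 0.0]
tau = init_pheromone(tau_g, c=1.0)
print(tau)  # [1.0, 3.0, 1.0]
```

Every path starts with at least the constant c, while paths favoured by the GA start with more pheromone, which biases the early ACO search toward the GA solution.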

The main steps of the GA-ACO optimized BP neural network are as follows:

Step 4.1: initialize the BP neural network and the ant colony. The parameters to be initialized are: the connection weights $\omega_{ij}$ between the input layer and the hidden layer, the hidden layer threshold α, the connection weights $\omega_{jk}$ between the hidden layer and the output layer, the output layer threshold β, and the hidden layer node number n. These parameters are denoted $p_1, p_2, \ldots, p_n$ and form the element set $I_{ni}$; the number of ants S, the pheromone volatilization coefficient ρ, the target error E and the other settings of the ant colony algorithm are also initialized.

Step 4.2: the S ants start searching and update the pheromone until all S ants have completed the search. While searching, each ant updates in real time the pheromone values on all edges it passes, and the pheromone update formula is:

$$\tau_j^k \leftarrow (1-\rho)\,\tau_j^k + \Delta\tau_j^k$$

wherein ρ is the pheromone volatilization coefficient; $\tau_j^k$ is the amount of pheromone of the k-th ant on the path of the j-th element of the set in the current cycle, and $\Delta\tau_j^k$ is the increment of that amount in the current cycle.
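A minimal Python sketch of a pheromone update is given below. It assumes the common evaporation-plus-deposit form, in which a fraction ρ of the existing pheromone evaporates before the increment is added; this form, the function name `update_pheromone` and the numeric values are assumptions made for illustration.

```python
from typing import List


def update_pheromone(tau: List[float], delta_tau: List[float], rho: float) -> List[float]:
    """Evaporate a fraction rho of each path's pheromone, then add its increment.

    Assumption: rho is a volatilization (evaporation) rate in [0, 1].
    """
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [0, 1]")
    if len(tau) != len(delta_tau):
        raise ValueError("tau and delta_tau must have the same length")
    return [(1.0 - rho) * t + d for t, d in zip(tau, delta_tau)]


# Hypothetical values for two element paths.
tau = [1.0, 2.0]
delta_tau = [0.5, 0.0]  # only the first path was reinforced in this cycle
print(update_pheromone(tau, delta_tau, rho=0.5))  # [1.0, 1.0]
```

The unreinforced path loses half of its pheromone, while the reinforced one keeps its level, so repeated cycles concentrate pheromone on paths that ants keep choosing.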

Step 4.3: the genetic algorithm is added to the ant colony algorithm, and operations such as crossover and mutation are performed on the ant colony. For crossover, the most commonly used single-point crossover is selected: a point in the gene sequence is randomly chosen as the crossover point, and the alleles of two different individuals on one side of that point are interchanged to generate two new gene sequences. For mutation, a normal distribution with mean μ and variance σ is used to mutate some of the genes with a small probability, and the new individual is given by

$$\sigma' = \sigma\, e^{N(0,\Delta\sigma)}$$

$$x' = x + N(0,\Delta\sigma)$$

wherein x is the next path node of the ant colony. A fitness function is then selected to calculate individual fitness; here the minimum mean square error of the learning samples is used as the fitness function. The fitness value is calculated according to formula (1), and it is judged whether the requirement of the current optimal solution is met. If yes, go to step 4.4; otherwise, go to step 4.2.
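The crossover and mutation operators of step 4.3 can be sketched in Python as follows. This is a hedged illustration, not the patented implementation: the function names, the mutation rate argument, and the choice to treat Δσ as a standard deviation when sampling the normal distribution are all assumptions.

```python
import math
import random
from typing import List, Tuple


def single_point_crossover(a: List[float], b: List[float],
                           rng: random.Random) -> Tuple[List[float], List[float]]:
    """Swap the tails of two gene sequences at one random crossover point."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("parents must have equal length of at least 2")
    point = rng.randint(1, len(a) - 1)  # both children keep genes from each parent
    return a[:point] + b[point:], b[:point] + a[point:]


def mutate(genes: List[float], sigma: float, delta_sigma: float,
           rate: float, rng: random.Random) -> Tuple[List[float], float]:
    """Apply sigma' = sigma * exp(N(0, d_sigma)) and x' = x + N(0, d_sigma).

    Assumption: delta_sigma is used as the standard deviation of the normal
    distribution, and each gene mutates independently with probability `rate`.
    """
    new_sigma = sigma * math.exp(rng.gauss(0.0, delta_sigma))
    new_genes = [x + rng.gauss(0.0, delta_sigma) if rng.random() < rate else x
                 for x in genes]
    return new_genes, new_sigma


rng = random.Random(0)
child_1, child_2 = single_point_crossover([1.0] * 4, [2.0] * 4, rng)
mutated, sigma_new = mutate(child_1, sigma=0.1, delta_sigma=0.05, rate=0.1, rng=rng)
print(child_1, child_2, mutated, sigma_new)
```

Keeping the mutation rate small, as the text suggests, preserves most of the information carried by the ant colony's current paths while still injecting some diversity.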

Step 4.4: the optimization result of the ant colony algorithm from the previous step is taken as the parameters of the BP neural network, the neural network is trained, and the error e is calculated as

$$e_q = O_q - Y_q$$

wherein $O_q$ is the expected value; $Y_q$ is the predicted value; q is the index of the neuron, q = 1, 2, …, n.

Step 4.5: update the weights, thresholds and hidden layer node number of the BP neural network according to the result of step 4.4, and judge whether the requirement is met. If yes, the algorithm ends and the optimal weights, thresholds and hidden layer node number are output; otherwise, go to step 4.3.
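To make the role of the optimized parameters concrete, the following Python sketch evaluates one candidate parameter set on a single-hidden-layer network and returns the mean square error that steps 4.3 and 4.4 use as fitness. The network shape (sigmoid hidden layer, linear output neuron), the treatment of the thresholds as additive biases, and all names and values are assumptions made for illustration.

```python
import math
from typing import List


def forward(x: List[float], w_ih: List[List[float]], alpha: List[float],
            w_ho: List[float], beta: float) -> float:
    """Forward pass: sigmoid hidden layer followed by one linear output neuron."""
    hidden = [
        1.0 / (1.0 + math.exp(-(sum(w * xi for w, xi in zip(row, x)) + a)))
        for row, a in zip(w_ih, alpha)
    ]
    return sum(w * h for w, h in zip(w_ho, hidden)) + beta


def fitness(samples: List[List[float]], expected: List[float],
            w_ih: List[List[float]], alpha: List[float],
            w_ho: List[float], beta: float) -> float:
    """Mean square of the errors e_q = O_q - Y_q over all learning samples."""
    errors = [o - forward(x, w_ih, alpha, w_ho, beta)
              for x, o in zip(samples, expected)]
    return sum(e * e for e in errors) / len(errors)


# Hypothetical candidate with 2 inputs and 2 hidden nodes; with all-zero
# input weights every hidden node outputs 0.5, so the prediction is 1.0.
w_ih = [[0.0, 0.0], [0.0, 0.0]]
alpha = [0.0, 0.0]
w_ho = [1.0, 1.0]
beta = 0.0
mse = fitness([[0.3, 0.7], [0.9, 0.1]], [1.0, 3.0], w_ih, alpha, w_ho, beta)
print(mse)  # 2.0
```

In a GA-ACO loop, each ant's path would be decoded into `w_ih`, `alpha`, `w_ho` and `beta`, and candidates with a lower fitness value would be preferred.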

In the photovoltaic output day-ahead prediction method based on K-means mean clustering and BP neural network optimization, the step 5 is to input the first seventy percent of each similar day data set as training set data into a prediction model to train the neural network and save the model with the highest prediction precision, and the method specifically comprises the following steps:

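within each similar-day data set obtained by K-means clustering, for every two adjacent days the weather factors of the earlier day are used as input and the photovoltaic output of the later day is used as output to train the network, and the mean absolute percentage error (MAPE) and the root mean square error (RMSE) are used as the network accuracy evaluation indexes, calculated as follows:

$$\mathrm{MAPE} = \frac{1}{n}\sum_{i=1}^{n}\left|\frac{y_i-\hat{y}_i}{y_i}\right| \times 100\%$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i-\hat{y}_i\right)^2}$$

wherein n is the number of samples, $y_i$ is the actual photovoltaic output and $\hat{y}_i$ is the predicted photovoltaic output.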

and obtaining and storing a prediction model with higher precision through repeated training.
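The training split and the two accuracy indexes can be sketched in Python as below. The helper names are hypothetical, and skipping samples whose actual output is zero when computing MAPE (for example night-time samples) is an assumption added to avoid division by zero.

```python
import math
from typing import List, Sequence, Tuple


def split_first_70(samples: Sequence) -> Tuple[Sequence, Sequence]:
    """Use the first seventy percent of a similar-day data set for training."""
    cut = (len(samples) * 7) // 10  # integer arithmetic avoids float rounding
    return samples[:cut], samples[cut:]


def mape(actual: List[float], predicted: List[float]) -> float:
    """Mean absolute percentage error in percent, skipping zero actual values."""
    pairs = [(a, p) for a, p in zip(actual, predicted) if a != 0]
    if not pairs:
        raise ValueError("MAPE is undefined when every actual value is zero")
    return 100.0 * sum(abs((a - p) / a) for a, p in pairs) / len(pairs)


def rmse(actual: List[float], predicted: List[float]) -> float:
    """Root mean square error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


train, held_out = split_first_70(list(range(10)))
print(len(train), len(held_out))  # 7 3

# Hypothetical actual and predicted outputs in kW.
actual = [100.0, 200.0]
predicted = [110.0, 180.0]
print(mape(actual, predicted), rmse(actual, predicted))
```

With these values both relative errors are 10 percent, so the MAPE is about 10, and the RMSE is the square root of 250.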

In the photovoltaic output day-ahead prediction method based on K-means mean value clustering and BP neural network optimization, step 6, namely judging the similar-day type of the day to be predicted according to the forecast weather and inputting the day-ahead historical meteorological data in the similar-day data set into the model with the highest photovoltaic output prediction accuracy to obtain the photovoltaic output prediction data of the predicted day, specifically comprises the following steps:

Step 6.1: obtain the weather information of the day to be predicted from the weather forecast, calculate the Euclidean distance between this weather information and the clustering centers obtained in step 3, and judge the similar-day type of the day to be predicted.

Step 6.2: input the historical meteorological data of the similar day closest to the predicted day into the corresponding prediction model to obtain the day-ahead photovoltaic output prediction result.
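Step 6.1 amounts to a nearest-centre lookup, which the following Python sketch illustrates. It assumes that each clustering centre is available as a vector in the same normalized weather-feature space as the forecast; the function name and the numeric values are hypothetical.

```python
import math
from typing import List, Sequence


def nearest_cluster(forecast: Sequence[float],
                    centers: List[Sequence[float]]) -> int:
    """Return the index of the clustering centre closest to the forecast."""
    distances = [
        math.sqrt(sum((f - c) ** 2 for f, c in zip(forecast, center)))
        for center in centers
    ]
    return min(range(len(centers)), key=distances.__getitem__)


# Hypothetical centres over two normalized features
# (irradiance, cloud opacity): index 0 is sunny-like, index 1 cloudy-like.
centers = [[0.9, 0.1], [0.3, 0.8]]
forecast = [0.8, 0.2]
similar_day_type = nearest_cluster(forecast, centers)
print(similar_day_type)  # 0
```

The returned index selects both the similar-day data set and the prediction model trained on it, which is then used in step 6.2.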

Compared with the prior art, the technical scheme of the invention has the following beneficial effects: Pearson correlation coefficient analysis and K-means clustering improve the accuracy of similar-day selection and thereby reduce the difficulty of model training. Optimizing the weights, thresholds and hidden layer node number of the BP neural network with a genetic algorithm and an ant colony algorithm can effectively address the local-optimum and overfitting problems of the BP neural network; combining these methods can improve the accuracy of day-ahead photovoltaic output prediction.

Drawings

FIG. 1 is a flow chart of a method according to an embodiment of the present invention;

FIG. 2 is a flow chart of the GA-ACO optimized BP neural network of the invention;

FIG. 3 is a graph comparing the predicted result of the present invention with the predicted result and the true value of the basic BP neural network.

Detailed Description

The technical solutions of the embodiments of the present invention will be clearly and completely described in the following in conjunction with the embodiments of the present invention, and it is obvious that the described embodiments are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

It should be noted that, without conflict, the embodiments of the present invention and features of the embodiments may be combined with each other.

The invention will be further illustrated, but is not limited, by the following examples.

The embodiment relates to a photovoltaic output day-ahead prediction method based on K-means clustering and BP neural network optimization, wherein a flow chart is shown in figure 1, and the method comprises the following steps:

step 1, collecting historical meteorological data of photovoltaic power stations in a region to be predicted and historical photovoltaic output data corresponding to the meteorological data, and preprocessing the historical meteorological data and the historical photovoltaic output data by using an average interpolation method.

In the step, historical meteorological data recorded by photovoltaic power stations in a region to be predicted and corresponding historical photovoltaic output data are collected, the sampling time interval is 15 minutes, the data collected in one natural day are used as a sample, and an average interpolation method is adopted to preprocess the historical meteorological data and the corresponding photovoltaic output data.
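As a minimal sketch of the preprocessing in this step, the following Python fragment fills missing samples in a 15-minute series with the average of the nearest valid neighbours. The text does not spell out the average interpolation method, so this reading, the function name `fill_missing_by_neighbor_mean` and the use of `None` to mark gaps are assumptions.

```python
from typing import List, Optional


def fill_missing_by_neighbor_mean(series: List[Optional[float]]) -> List[float]:
    """Replace None entries with the mean of the nearest valid neighbours.

    Assumption: a gap at the start or end of the series is filled with the
    single nearest valid value, since only one neighbour exists there.
    """
    valid = [i for i, v in enumerate(series) if v is not None]
    if not valid:
        raise ValueError("series contains no valid samples")
    filled = []
    for i, v in enumerate(series):
        if v is not None:
            filled.append(float(v))
            continue
        prev_idx = max((j for j in valid if j < i), default=None)
        next_idx = min((j for j in valid if j > i), default=None)
        neighbours = [series[j] for j in (prev_idx, next_idx) if j is not None]
        filled.append(sum(neighbours) / len(neighbours))
    return filled


# Four hypothetical 15-minute power samples (kW) with one missing reading.
day_sample = [0.0, 1.2, None, 2.0]
filled_sample = fill_missing_by_neighbor_mean(day_sample)
print(filled_sample)
```

Here the gap is replaced by the mean of 1.2 and 2.0; a full natural day at 15-minute resolution would contain 96 such samples.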

In this embodiment, total horizontal solar radiation, air temperature, cloud opacity, atmospheric precipitation, relative humidity, snow depth, surface air pressure and wind speed are selected as the meteorological data; the meteorological data are preprocessed by the average interpolation method and normalized, and the normalization formula is as follows:

$$x^* = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$

wherein $x^*$ represents the normalized value of x, $x_{\min}$ represents the minimum value of x, and $x_{\max}$ represents the maximum value of x;
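The normalization defined by $x^*$, $x_{\min}$ and $x_{\max}$ is min-max scaling, which can be sketched in Python as follows. The function name and the sample temperatures are hypothetical, and mapping a constant feature to zero is an assumption added to avoid division by zero.

```python
from typing import List


def min_max_normalize(values: List[float]) -> List[float]:
    """Scale one meteorological feature into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant feature carries no information, map it to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical air temperatures in degrees Celsius.
air_temperature = [10.0, 15.0, 20.0, 30.0]
normalized = min_max_normalize(air_temperature)
print(normalized)  # [0.0, 0.25, 0.5, 1.0]
```

Each feature is normalized separately, so quantities with very different units, such as air pressure and relative humidity, contribute on a comparable scale.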

Step 2, calculating the Pearson correlation coefficient between the historical meteorological data and the historical photovoltaic output data, retaining the data whose correlation coefficient is larger than a threshold value, and constructing a training set.

In this step, the Pearson correlation coefficient P between the meteorological factor data X and the photovoltaic output values Y is calculated by the following formula:

$$P = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^2}}$$

In the present embodiment, the threshold value of the pearson correlation coefficient is set to 0.2.
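The feature selection of step 2 can be sketched in Python as below. The 0.2 threshold comes from this embodiment; the function names, the example data, and the choice to compare the absolute value of the coefficient with the threshold (so that strongly negative correlations are also kept) are assumptions.

```python
import math
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("sequences must have the same length of at least 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sy = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # assumption: a constant sequence counts as uncorrelated
    return cov / (sx * sy)


def select_features(features: Dict[str, List[float]], output: List[float],
                    threshold: float = 0.2) -> List[str]:
    """Keep the features whose |P| with the photovoltaic output exceeds the threshold."""
    return [name for name, column in features.items()
            if abs(pearson(column, output)) > threshold]


# Hypothetical meteorological columns and photovoltaic output.
features = {
    "irradiance": [100.0, 300.0, 500.0, 700.0],
    "snow_depth": [0.0, 0.0, 0.0, 0.0],
    "humidity": [80.0, 60.0, 50.0, 30.0],
}
pv_output = [1.0, 3.0, 5.0, 7.0]
print(select_features(features, pv_output))  # ['irradiance', 'humidity']
```

In this made-up example the constant snow depth column is dropped, while irradiance (positively correlated) and humidity (negatively correlated) are retained for the training set.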

Step 3, clustering the historical photovoltaic output data by K-means clustering and dividing them into similar-day data sets.

In this step, the historical photovoltaic output data of the training set are clustered by the K-means mean clustering method, and the specific steps are as follows:

for a given data set $X = \{X_1, X_2, \ldots, X_n\}$, each object contains t features, so the data set X corresponds to an n × t matrix. By studying the similarity between objects in data set X, the clustering process partitions the samples in X into k different categories $C = \{C_1, C_2, \ldots, C_k\}$ following a certain clustering criterion, and the different categories are independent of each other. To measure the similarity between objects, a distance function is introduced. In data set X, the similarity between any two samples $X_e$ and $X_f$ can be expressed by the Euclidean distance $d_{ef}$:

$$d_{ef} = \sqrt{\sum_{l=1}^{t}\left(x_{el}-x_{fl}\right)^2}$$

where $x_{el}$ and $x_{fl}$ are the l-th features of $X_e$ and $X_f$.

In this embodiment the number of clusters k is determined by the elbow method; since the initial cluster centers are chosen randomly, the clustering is run multiple times to determine suitable cluster centers.
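The clustering with repeated random initialization can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions: the number of restarts, the within-cluster sum of squared errors (SSE) as the quantity inspected for the elbow, the function names and the six short daily output curves are all hypothetical.

```python
import random
from typing import List, Tuple

Point = List[float]


def _dist2(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two samples."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def kmeans_once(data: List[Point], k: int, rng: random.Random,
                max_iter: int = 100) -> Tuple[List[Point], List[int], float]:
    """One K-means run from randomly chosen initial centres."""
    centers = [list(p) for p in rng.sample(data, k)]
    labels: List[int] = []
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: _dist2(p, centers[c]))
                      for p in data]
        for c in range(k):
            members = [p for p, lab in zip(data, new_labels) if lab == c]
            if members:  # keep the old centre if a cluster becomes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:
            break
        labels = new_labels
    sse = sum(_dist2(p, centers[lab]) for p, lab in zip(data, labels))
    return centers, labels, sse


def kmeans(data: List[Point], k: int, restarts: int = 10,
           seed: int = 0) -> Tuple[List[Point], List[int], float]:
    """Repeat the clustering and keep the run with the smallest SSE."""
    rng = random.Random(seed)
    runs = [kmeans_once(data, k, rng) for _ in range(restarts)]
    return min(runs, key=lambda run: run[2])


# Hypothetical normalized daily output curves: three sunny-like, three cloudy-like.
data = [
    [0.20, 0.90, 0.80, 0.20], [0.25, 0.95, 0.85, 0.20], [0.20, 0.85, 0.90, 0.25],
    [0.05, 0.30, 0.25, 0.05], [0.10, 0.35, 0.30, 0.05], [0.05, 0.25, 0.30, 0.10],
]
sse_by_k = {k: kmeans(data, k)[2] for k in (1, 2, 3)}
centers, labels, _ = kmeans(data, 2)
print(sse_by_k)
print(labels)
```

For the elbow method, the SSE values in `sse_by_k` are inspected for the k after which the decrease flattens; the centres of the chosen run are stored so that step 6 can compare a forecast day against them.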

Step 4, optimizing the weights, thresholds and hidden layer node number of the BP neural network by a genetic algorithm and an ant colony algorithm, and constructing a photovoltaic output day-ahead prediction model.

In the step, the weight, the threshold and the hidden layer node number of the BP neural network are optimized through a genetic algorithm and an ant colony algorithm, and the photovoltaic output prediction model is constructed specifically by the following steps:

The initialization formula of the pheromone is

$$\tau = \tau_G + c$$

The main steps of the GA-ACO optimized BP neural network are as follows:

wherein ρ is the pheromone volatilization coefficient; $\tau_j^k$ is the amount of pheromone of the k-th ant on the path of the j-th element of the set in the current cycle.

$$\sigma' = \sigma\, e^{N(0,\Delta\sigma)}$$

$$x' = x + N(0,\Delta\sigma)$$

$$e_q = O_q - Y_q$$

Step 4.5: update the weights, thresholds and hidden layer node number of the BP neural network according to the result of step 4.4, and judge whether the requirement is met. If yes, the algorithm ends and the optimal weights, thresholds and hidden layer node number are output; otherwise, go to step 4.3.

Step 5, inputting the first seventy percent of each similar-day data set into the prediction model as training data, training the neural network and saving the model with the highest prediction accuracy.

The method for training the neural network and storing the model with highest prediction precision comprises the following steps:

Step 6, judging the similar-day type of the day to be predicted according to the forecast weather, and inputting the day-ahead meteorological data in the similar-day data set into the model with the highest photovoltaic output prediction accuracy to obtain the photovoltaic output prediction data of the predicted day.

In the step, the type of the similar day of the day to be predicted is judged according to the predicted weather, and the weather data before the day in the similar day data set are input into the model with the highest photovoltaic output prediction precision to obtain the photovoltaic output prediction data of the predicted day, specifically the steps are as follows:

In order to verify the effectiveness of the photovoltaic output day-ahead prediction method based on K-means mean clustering and the BP neural network, the following four prediction models are adopted respectively, and the prediction results obtained by the four models are subjected to comparative analysis, wherein the results are shown in fig. 3 and table 1.

Method 1: a BP neural network prediction model;

method 2: optimizing a BP neural network prediction model;

method 3: K-means+BP neural network prediction model;

method 4: the K-means+ optimized BP neural network prediction model is characterized by comprising a model;

Table 1 Comparison of the errors of the different models

As can be seen from the comparison data in Table 1 and FIG. 3, after similar days are selected by Pearson correlation coefficient analysis and K-means clustering, the model prediction accuracy improves markedly because the photovoltaic output fluctuation patterns within a similar-day set are alike. In addition, optimizing the initial weights of the BP neural network with the genetic algorithm and the ant colony algorithm improves its prediction accuracy; combining these methods can improve the accuracy of day-ahead photovoltaic output prediction.

The foregoing is merely illustrative of the preferred embodiments of the present invention and is not intended to limit the embodiments and scope of the present invention, and it should be appreciated by those skilled in the art that equivalent substitutions and obvious variations may be made using the teachings of the present invention, which are intended to be included within the scope of the present invention.

Claims

1. A photovoltaic output day-ahead prediction method based on K-means mean value clustering and BP neural network optimization, characterized in that the method comprises the following steps:

2. The photovoltaic output day-ahead prediction method based on K-means mean clustering and BP neural network optimization, characterized in that: in step 2, the specific steps of calculating the Pearson correlation coefficient between the historical meteorological data and the historical photovoltaic output data, retaining the data whose correlation coefficient is larger than a threshold value, and constructing a training set are as follows:

the Pearson correlation coefficient P between the historical meteorological data X and the historical photovoltaic output values Y is calculated by the following formula:

$$P = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^2}}$$

wherein n is the sequence length; $x_i$ and $y_i$ are the i-th values of sequence X and sequence Y, respectively; $\bar{x}$ and $\bar{y}$ are the averages of sequence X and sequence Y, respectively; the value range of P is [-1, 1], and the larger the absolute value of P, the higher the degree of correlation between the two sequences.

3. The photovoltaic output day-ahead prediction method based on K-means mean clustering and BP neural network optimization, characterized in that: in step 3, the specific steps of clustering the historical photovoltaic output data by K-means clustering and dividing them into similar-day data sets are as follows:

for a given data set $X = \{X_1, X_2, \ldots, X_n\}$, each object contains t features, so the data set X corresponds to an n × t matrix; by studying the similarity between objects in data set X, the cluster analysis process classifies the samples in X into k different categories $C = \{C_1, C_2, \ldots, C_k\}$ following a certain clustering criterion, and the different categories are independent of each other; in order to measure the similarity between objects, a distance function is introduced, and in data set X the similarity between any two samples $X_e$ and $X_f$ can be expressed by the Euclidean distance $d_{ef}$:

$$d_{ef} = \sqrt{\sum_{l=1}^{t}\left(x_{el}-x_{fl}\right)^2}$$

where $x_{el}$ and $x_{fl}$ are the l-th features of $X_e$ and $X_f$; the more similar or close samples $X_e$ and $X_f$ are, the smaller $d_{ef}$; otherwise, the larger its value,

4. The photovoltaic output day-ahead prediction method based on K-means mean clustering and BP neural network optimization, characterized in that: in step 4, the weights, thresholds and hidden layer node number of the BP neural network are optimized by a genetic algorithm and an ant colony algorithm, and the photovoltaic output day-ahead prediction model is constructed by the following specific steps:

step 4.1: initializing the BP neural network and the ant colony, the parameters to be initialized being: the connection weights $\omega_{ij}$ between the input layer and the hidden layer, the hidden layer threshold α, the connection weights $\omega_{jk}$ between the hidden layer and the output layer, the output layer threshold β and the hidden layer node number n, these parameters being denoted $p_1, p_2, \ldots, p_n$ and forming the element set $I_{ni}$; and initializing the number of ants S, the pheromone volatilization coefficient ρ and the target error E of the ant colony algorithm;

step 4.2: the S ants start searching and update the pheromone until all S ants have completed the search; while searching, each ant updates in real time the pheromone values on all edges it passes, and the pheromone update formula is:

$$\tau_j^k \leftarrow (1-\rho)\,\tau_j^k + \Delta\tau_j^k$$

wherein ρ is the pheromone volatilization coefficient; $\tau_j^k$ is the amount of pheromone of the k-th ant on the path of the j-th element of the set in the current cycle, and $\Delta\tau_j^k$ is the increment of that amount in the current cycle;

step 4.3: adding the genetic algorithm to the ant colony algorithm and performing crossover and mutation operations on the ant colony, wherein the crossover uses the most commonly used single-point crossover, namely a point in the gene sequence is randomly chosen as the crossover point and the alleles of two different individuals on one side of that point are interchanged to generate two new gene sequences; for mutation, a normal distribution with mean μ and variance σ is used to mutate some of the genes with a small probability, and the new individual is given by

$$\sigma' = \sigma\, e^{N(0,\Delta\sigma)}$$

$$x' = x + N(0,\Delta\sigma)$$

wherein x is the next path node of the ant colony; a fitness function is selected to calculate individual fitness, the minimum mean square error of the learning samples being used as the fitness function; the fitness value is calculated according to formula (1), and it is judged whether the requirement of the current optimal solution is met; if yes, go to step 4.4; otherwise, go to step 4.2;

step 4.4: taking the optimization result of the ant colony algorithm from the previous step as the parameters of the BP neural network, training the neural network, and calculating the error e as

$$e_q = O_q - Y_q$$

wherein $O_q$ is the expected value; $Y_q$ is the predicted value; q is the index of the neuron, q = 1, 2, …, n;

step 4.5: updating the weights, thresholds and hidden layer node number of the BP neural network according to the result of step 4.4, and judging whether the requirement is met; if yes, the algorithm ends and the optimal weights, thresholds and hidden layer node number are output; otherwise, go to step 4.3.

5. The photovoltaic output day-ahead prediction method based on K-means mean clustering and BP neural network optimization, characterized in that: in step 5, the specific steps of inputting the first seventy percent of each similar-day data set into the prediction model as training data to train the neural network and save the model with the highest prediction accuracy are as follows:

6. The photovoltaic output day-ahead prediction method based on K-means mean clustering and BP neural network optimization, characterized in that: in step 6, the similar-day type of the day to be predicted is judged according to the forecast weather, and the day-ahead historical meteorological data in the similar-day data set are input into the model with the highest photovoltaic output prediction accuracy to obtain the photovoltaic output prediction data of the predicted day, with the following specific steps:

step 6.1: obtaining the weather information of the day to be predicted from the weather forecast, calculating the Euclidean distance between this weather information and the clustering centers obtained in step 3, and judging the similar-day type of the day to be predicted;

step 6.2: inputting the weather factors of the similar day closest to the predicted day into the corresponding prediction model to obtain the day-ahead photovoltaic output prediction result.