CN116345555A - CNN-ISCA-LSTM model-based short-term photovoltaic power generation power prediction method - Google Patents


Info

Publication number
CN116345555A
Authority
CN
China
Prior art keywords
data
cnn
lstm
power generation
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202310321716.0A
Other languages
Chinese (zh)
Inventor
王鑫
苗桂喜
孙浩然
元亮
席晟哲
王继勇
孟红杰
白方亮
连勇
王丽晔
孙大江
郑惠瀛
赵悠悠
王琪
苏子乐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Anyang Power Supply Co of State Grid Henan Electric Power Co Ltd
Original Assignee
Anyang Power Supply Co of State Grid Henan Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Anyang Power Supply Co of State Grid Henan Electric Power Co Ltd filed Critical Anyang Power Supply Co of State Grid Henan Electric Power Co Ltd
Priority to CN202310321716.0A priority Critical patent/CN116345555A/en
Publication of CN116345555A publication Critical patent/CN116345555A/en
Withdrawn legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/04Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/243Classification techniques relating to the number of classes
    • G06F18/24323Tree-organised classifiers
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • G06N3/0442Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/06Electricity, gas or water supply
    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J3/00Circuit arrangements for ac mains or ac distribution networks
    • H02J3/004Generation forecast, e.g. methods or systems for forecasting future energy generation
    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J3/00Circuit arrangements for ac mains or ac distribution networks
    • H02J3/38Arrangements for parallely feeding a single network by two or more generators, converters or transformers
    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J3/00Circuit arrangements for ac mains or ac distribution networks
    • H02J3/38Arrangements for parallely feeding a single network by two or more generators, converters or transformers
    • H02J3/381Dispersed generators
    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02SGENERATION OF ELECTRIC POWER BY CONVERSION OF INFRARED RADIATION, VISIBLE LIGHT OR ULTRAVIOLET LIGHT, e.g. USING PHOTOVOLTAIC [PV] MODULES
    • H02S40/00Components or accessories in combination with PV modules, not provided for in groups H02S10/00 - H02S30/00
    • H02S40/30Electrical components
    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J2203/00Indexing scheme relating to details of circuit arrangements for AC mains or AC distribution networks
    • H02J2203/10Power transmission or distribution systems management focussing at grid-level, e.g. load flow analysis, node profile computation, meshed network optimisation, active network management or spinning reserve management
    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J2203/00Indexing scheme relating to details of circuit arrangements for AC mains or AC distribution networks
    • H02J2203/20Simulating, e g planning, reliability check, modelling or computer assisted design [CAD]
    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J2300/00Systems for supplying or distributing electric power characterised by decentralized, dispersed, or local generation
    • H02J2300/20The dispersed energy generation being of renewable origin
    • H02J2300/22The renewable source being solar energy
    • H02J2300/24The renewable source being solar energy of photovoltaic origin
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04SSYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Abstract

The invention relates to a short-term photovoltaic power prediction method based on a CNN-ISCA-LSTM model, comprising the following steps: first, acquiring the original power generation data and meteorological data of a photovoltaic power station and preprocessing the data; second, calculating the correlation between each meteorological feature and the photovoltaic power generation data using the Spearman correlation method, and selecting the features most strongly correlated with the generated power as input features for model training; on this basis, improving the sine cosine algorithm (SCA) and establishing a CNN-ISCA-LSTM short-term photovoltaic power prediction model; and finally, inverse-normalizing the prediction results, evaluating the prediction performance of the model, and verifying the effectiveness of the method. The improved SCA is used to optimize the hyperparameters of the CNN-LSTM model. Verification on an example shows that the model effectively improves the accuracy of photovoltaic power prediction and retains high prediction stability under non-sunny conditions.

Description

CNN-ISCA-LSTM model-based short-term photovoltaic power generation power prediction method
Technical Field
The invention belongs to the technical field of new-energy power generation prediction, and particularly relates to a short-term photovoltaic power prediction method based on a CNN-ISCA-LSTM model.
Background
Fossil fuels such as petroleum and coal are consumed in large quantities and are gradually being exhausted, and the world now faces both energy shortage and environmental deterioration. Developing renewable energy has become the first choice of many countries. Solar energy, one such renewable source, reaches the entire surface of the earth without regional restriction and has the advantages of being renewable, clean, environmentally friendly, abundant, and convenient to develop and use. Photovoltaic power generation, however, is random and intermittent, and its influence on the power grid during grid connection keeps growing. Predicting photovoltaic power makes the power output more predictable, which ensures the reliability of grid dispatching information, supports the rational planning of grid dispatching, and has practical and guiding significance for the healthy and stable development of the photovoltaic industry.
Many methods currently exist for predicting photovoltaic power, including statistical methods, photovoltaic performance models, deep learning methods, and hybrid models. Because deep learning models can realize end-to-end mapping, predicting photovoltaic output with a combination of several deep learning models not only supplies the model with multi-dimensional, deep data information but also increases model complexity and improves its learning capacity. In addition, using the combined model for several kinds of prediction, including point prediction and interval prediction, refines research on deep-learning-based photovoltaic output prediction and provides a reference for subsequent work on photovoltaic output power problems and time-series prediction.
Against this background, establishing a high-precision deep learning model to predict short-term photovoltaic power has practical value and guiding significance in several respects: helping power dispatching departments refine their dispatching plans, improving photovoltaic power prediction capability, improving the economic benefit of the photovoltaic industry, and supporting related theoretical research.
Disclosure of Invention
At present, in photovoltaic power prediction, the hyperparameters of a prediction model are set largely by experience and by hand; models with different parameters differ greatly in prediction performance, prediction accuracy is insufficient, and prediction performance is not stable enough across operating conditions. To further improve prediction performance, the invention provides a short-term photovoltaic power prediction method that combines an improved sine cosine algorithm (Improved Sine Cosine Algorithm, ISCA) with a CNN-LSTM model. First, the original power generation data and meteorological data of a photovoltaic power station are acquired and preprocessed; second, the correlation between each meteorological feature and the photovoltaic power generation data is calculated using the Spearman correlation method, and the features most strongly correlated with the generated power are selected as input features for model training; on this basis, SCA is improved and a CNN-ISCA-LSTM short-term photovoltaic power prediction model is established; finally, the prediction results are inverse-normalized, the prediction performance of the model is evaluated, and the effectiveness of the method is verified.
The invention adopts the technical scheme that: a short-term photovoltaic power generation power prediction method based on a CNN-ISCA-LSTM model comprises the following steps:
S1: acquiring the original data and preprocessing it;
S2: performing correlation analysis on the original features and eliminating features with low correlation coefficients;
S3: establishing a CNN-LSTM hybrid prediction model and determining the hyperparameters to be optimized;
S4: improving the SCA algorithm and constructing the CNN-ISCA-LSTM short-term photovoltaic power prediction model;
S5: inverse-normalizing the data and evaluating the prediction performance of the model.
Specifically, step S1: acquire the original data and preprocess it. The original data set may contain bad data, such as missing values and outliers caused by faults in the data acquisition device or the sensors, so the data must be preprocessed. The data are then normalized so that differences in scale between features do not reduce the prediction accuracy of the deep learning model. This step specifically comprises:
(1) Missing values in the original data set are filled using a random forest algorithm. Consider a data set of m rows and n columns (n features, each a sequence of length m) in which feature i contains missing values. The rows in which feature i is present form the training set: the other n-1 features of those rows are the training inputs, and the non-missing values of feature i are the training labels. The other n-1 features of the rows in which feature i is missing are the prediction inputs, and the missing values are the targets to be predicted. The model is trained and then used to predict the missing values by regression;
(2) Abnormal data are detected with a box plot and filled using the random forest algorithm. Outliers in the original data are detected using the box-plot principle, then treated as missing values and filled with the random forest algorithm. The box plot identifies a value as an outlier under the following condition:
$$x_a > Q_1 + 1.5\,\mathrm{IQR} \quad \text{or} \quad x_a < Q_3 - 1.5\,\mathrm{IQR}$$
where $x_a$ denotes an outlier; $Q_1$ and $Q_3$ denote the upper quartile and the lower quartile of the box plot, respectively; and IQR denotes the interquartile range, i.e., $\mathrm{IQR} = Q_1 - Q_3$.
(3) Data normalization. If the raw data are used directly for model training, the differences in scale weaken the ability of the model to learn nonlinear features, so the data must be normalized to the interval [0, 1]. The data are normalized using the min-max method, calculated as:
$$x_i'(k) = \frac{x_i(k) - x_{i,min}}{x_{i,max} - x_{i,min}}$$
where $x_i(k)$ is the original value of the k-th sample of feature i, $x_{i,max}$ and $x_{i,min}$ are the maximum and minimum values of feature i, respectively, and $x_i'(k)$ is the normalized value.
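The outlier check and the min-max scaling of step S1 can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions: the function names and the sample irradiance values are hypothetical, the whisker factor 1.5 is the conventional box-plot choice, and the flagged value is simply dropped here, whereas the method described above refills it with a random forest regressor (not shown).

```python
import statistics

def iqr_outlier_mask(values, k=1.5):
    """Flag values lying more than k * IQR beyond the quartiles (box-plot rule)."""
    lower_q, _, upper_q = statistics.quantiles(values, n=4, method="inclusive")
    iqr = upper_q - lower_q
    low, high = lower_q - k * iqr, upper_q + k * iqr
    return [v < low or v > high for v in values]

def min_max_normalize(values):
    """Scale values to [0, 1]; also return (min, max) for later inverse scaling."""
    v_min, v_max = min(values), max(values)
    span = v_max - v_min
    if span == 0:  # constant feature: map everything to 0
        return [0.0 for _ in values], (v_min, v_max)
    return [(v - v_min) / span for v in values], (v_min, v_max)

# Hypothetical irradiance samples with one sensor glitch (5000.0).
irradiance = [0.0, 120.0, 350.0, 610.0, 800.0, 790.0, 5000.0, 400.0, 90.0]
mask = iqr_outlier_mask(irradiance)
clean = [v for v, is_outlier in zip(irradiance, mask) if not is_outlier]
scaled, (v_min, v_max) = min_max_normalize(clean)
```

Returning the minimum and maximum alongside the scaled values keeps the information needed for the inverse normalization in step S5.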
Specifically, step S2: perform correlation analysis on the original features and eliminate features with low correlation coefficients. Changes in photovoltaic power are driven mainly by changes in meteorological factors such as solar irradiance, air temperature, humidity, and air pressure. To screen out the main features affecting photovoltaic output from the many meteorological factors, and to reduce the negative influence of weakly correlated factors on the result, a correlation analysis of the meteorological factors is required.
The Spearman correlation coefficient (Spearman Correlation) provides an effective measure of the similarity of two vectors even when the variables have different value ranges. The Spearman correlation coefficient is therefore selected to analyze the meteorological features and extract the key factors.
The Spearman correlation coefficient of two sequences X and Y is calculated as:
$$S = \frac{\sum_{i=1}^{N}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{N}(x_i - \bar{x})^2 \sum_{i=1}^{N}(y_i - \bar{y})^2}}$$
where N is the length of the sequences, $x_i$ and $y_i$ are the i-th elements of X and Y, respectively, and $\bar{x}$ and $\bar{y}$ are the means of X and Y, respectively. S takes values in [-1, 1]: the closer S is to 0, the weaker the correlation between the two vectors; the closer S is to -1, the stronger the negative correlation; and the closer S is to 1, the stronger the positive correlation.
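A minimal sketch of the correlation screening follows, assuming the usual definition of the Spearman coefficient as the product-moment correlation computed on the ranks of the two sequences (tied values receive their average rank). The function names and the sample data are hypothetical.

```python
import math

def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def correlation(x, y):
    """Product-moment correlation of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def spearman(x, y):
    """Spearman coefficient: correlation of the rank-transformed sequences."""
    return correlation(average_ranks(x), average_ranks(y))

# Hypothetical samples: power rises with irradiance and falls with humidity.
power = [5.0, 30.0, 80.0, 200.0, 450.0]
irradiance = [100.0, 300.0, 500.0, 700.0, 900.0]
humidity = [80.0, 65.0, 70.0, 40.0, 30.0]
s_irradiance = spearman(irradiance, power)
s_humidity = spearman(humidity, power)
```

Features whose coefficient has a small absolute value would be the candidates for elimination; the threshold is a design choice not fixed by the text.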
Specifically, step S3: establish a CNN-LSTM hybrid prediction model and determine the hyperparameters to be optimized.
(1) CNN is a deep neural network that has been widely applied in recent years. Through local connections and weight sharing it processes the original data at a higher and more abstract level, so that deep features in a sequence can be extracted automatically and effectively, the complexity of feature extraction and data reconstruction is reduced, and the quality of the data features is improved. The CNN performs feature extraction with convolutional layers, pooling layers, and fully connected layers. The convolutional layers extract useful feature information from the input data, and the pooling layers select among the features extracted by the convolutional layers and reduce the corresponding computation.
In CNN feature extraction, increasing the number of convolutional layers $N_c$ allows multiple convolution kernels to process the input sequence and generate feature maps of corresponding size; the deeper the extracted information, the more accurate the feature representation. The number of convolution kernels $N_{kernel}$ and the kernel size $S_{kernel}$ in each convolutional layer determine the level of detail of the extracted information. After the convolutional layers extract features, the pooling layer downsamples them, and the resulting feature maps are output through the fully connected layers, which determine the target output from the input features. The number of fully connected layers $N_d$ and the number of neurons per layer $N_{CNN,neural}$ are the most important hyperparameters of the fully connected layers. The hyperparameters to be optimized also include the batch size $S_{CNN,batch}$ and the learning rate $L_{rate}$, which affect convergence and the efficiency of gradient computation. Finally, L2 regularization and dropout are used to prevent network overfitting.
(2) A recurrent neural network (Recurrent Neural Network, RNN) handles time-series problems well but cannot solve the long-term dependency problem: as the input sequence grows longer, the model can no longer use information from earlier in the sequence. The LSTM neural network replaces the neurons in the RNN hidden layer with memory units that have a long-term memory effect, which effectively solves the long-term dependency problem. In the LSTM model, the memory unit comprises three parts: a forget gate, an input gate, and an output gate. The forget gate discards irrelevant information, the input gate determines which new information is stored in the cell state, and the output gate controls the output of the hidden-layer nodes. These gating units give the LSTM the ability to update and control the information flow in different blocks. The LSTM neural network has four hyperparameters to be optimized: the number of LSTM layers $N_l$, the maximum number of iterations $N_{e,max}$, the number of neurons per layer $N_{LSTM,neural}$, and the batch size $S_{LSTM,batch}$.
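For reference, the gate behaviour described in item (2) is commonly written as the following set of equations. The notation is the standard textbook one and is an assumption here, since the text does not give these formulas: $x_t$ is the input at time step t, $h_t$ the hidden state, $c_t$ the cell state, $\sigma$ the logistic sigmoid, $\odot$ element-wise multiplication, and the $W$ and $b$ terms are learned weights and biases.

```latex
\begin{aligned}
f_t &= \sigma\left(W_f\,[h_{t-1}, x_t] + b_f\right) && \text{(forget gate)} \\
i_t &= \sigma\left(W_i\,[h_{t-1}, x_t] + b_i\right) && \text{(input gate)} \\
\tilde{c}_t &= \tanh\left(W_c\,[h_{t-1}, x_t] + b_c\right) && \text{(candidate cell state)} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{(cell state update)} \\
o_t &= \sigma\left(W_o\,[h_{t-1}, x_t] + b_o\right) && \text{(output gate)} \\
h_t &= o_t \odot \tanh(c_t) && \text{(hidden state)}
\end{aligned}
```

The additive update of $c_t$ is what lets information persist over long sequences, which is the long-term dependency property the text attributes to the memory unit.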
Specifically, step S4: improve SCA and construct the CNN-ISCA-LSTM short-term photovoltaic power prediction model. SCA is an intelligent evolutionary algorithm based on the sine and cosine functions, proposed by Mirjalili in 2015. In the initialization phase it accepts or generates a set of random solutions, which constitute a population. In the next phase, a scoring function is used to repeatedly score the individual solutions in the population, and the solutions are then improved using the sine and cosine iterative formulas. Because this population-based method searches for the optimum of the optimization problem stochastically, the optimal solution must be found through multiple iterations. Generating more initial solutions and increasing the number of iterations improve the capability of the algorithm.
SCA comprises an exploration phase and an exploitation phase. In both phases SCA updates positions using the following formulas:
$$X_i^{t+1} = X_i^t + r_1 \sin(r_2)\,\left|r_3 P_i^t - X_i^t\right|$$
$$X_i^{t+1} = X_i^t + r_1 \cos(r_2)\,\left|r_3 P_i^t - X_i^t\right|$$
$$r_1 = a - k\,\frac{a}{k_{max}}$$
where $X_i^t$ is the position of the current solution in the i-th dimension at iteration t; $P_i^t$ is the value of the target solution in the i-th dimension at iteration t; $r_1$ is the sine and cosine amplitude adjustment factor, which determines the region (or direction of movement) of the next position, balances exploration and exploitation of the search space, and finally drives convergence to the global optimum. It is calculated by the formula for $r_1$ above, in which a is a positive constant (generally 2), k is the current iteration number, and $k_{max}$ is the maximum number of iterations. $r_2$ is a random number in (0, 2π) that determines how far the solution moves toward or away from the target; $r_3$ is a random number in (0, 2) that randomly emphasizes ($r_3 > 1$) or de-emphasizes ($r_3 < 1$) the influence of the target on the distance. The two update formulas can be combined as:
$$X_i^{t+1} = \begin{cases} X_i^t + r_1 \sin(r_2)\,\left|r_3 P_i^t - X_i^t\right|, & r_4 < 0.5 \\ X_i^t + r_1 \cos(r_2)\,\left|r_3 P_i^t - X_i^t\right|, & r_4 \ge 0.5 \end{cases}$$
where $r_4$ is a random number in (0, 1), so that the sine and cosine functions are chosen with equal probability.
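The basic (unimproved) SCA update can be sketched as follows, assuming the commonly published form of the algorithm: each coordinate moves by r1 * sin(r2) * |r3 * P - X| or the cosine counterpart, chosen with equal probability via r4, while r1 decays linearly from a to 0. The function name, the bound clamping, and the sphere test function are illustrative assumptions rather than part of the described method.

```python
import math
import random

def sca_minimize(objective, bounds, pop_size=20, max_iter=200, a=2.0, seed=0):
    """Minimize `objective` over box `bounds` with the basic sine cosine algorithm."""
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    best = min(pop, key=objective)[:]        # target solution P
    best_score = objective(best)
    for k in range(max_iter):
        r1 = a - k * a / max_iter            # amplitude decays linearly from a to 0
        for x in pop:
            for j, (lo, hi) in enumerate(bounds):
                r2 = rng.uniform(0.0, 2.0 * math.pi)
                r3 = rng.uniform(0.0, 2.0)
                r4 = rng.random()
                wave = math.sin(r2) if r4 < 0.5 else math.cos(r2)
                x[j] += r1 * wave * abs(r3 * best[j] - x[j])
                x[j] = min(max(x[j], lo), hi)  # keep the solution inside the bounds
            score = objective(x)
            if score < best_score:           # remember the best solution found so far
                best, best_score = x[:], score
    return best, best_score

def sphere(x):
    """Simple convex test function with its minimum of 0 at the origin."""
    return sum(v * v for v in x)

bounds = [(-5.0, 5.0)] * 3
solution, score = sca_minimize(sphere, bounds)
```

In the hyperparameter setting described later, `objective` would train and validate a CNN-LSTM for a given parameter vector, which is far more expensive than this toy function.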
SCA can handle adaptive and non-convex problems and shows robust optimization performance on nonlinear constrained optimization problems. Its main advantages are its small number of parameters and its efficient global search mechanism. Its performance can nevertheless be improved further by modifying $r_1$, $r_2$, $r_3$, and $r_4$. The invention adopts the following three improvement stages to update the four parameters:
(1) The first improvement stage uses an opposition-based (reverse) learning strategy to diversify the population. The principle is as follows: if x is a random real number in (a, b), its opposite number x′ is calculated as:
$$x' = a + b - x$$
In general, let $x_i = \{x_{i1}, x_{i2}, \dots, x_{in}\}$, where $x_{ij}$ is a random number in $(a_j, b_j)$, j = 1, 2, …, n. For the candidate solution $x_i$, the opposite vector is $x'_i = \{x'_{i1}, x'_{i2}, \dots, x'_{in}\}$, where
$$x'_{ij} = a_j + b_j - x_{ij}$$
The positions of the individuals in the original population are
Figure BDA0004151985240000072
To increase the diversity of the individuals in the population, the positions of the individuals of the opposite population are added through the opposition strategy:
Figure BDA0004151985240000073
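The opposition rule x′ = a + b - x can be sketched as below. Keeping the best half of the merged original and opposite populations is a common way to use the opposite solutions and is an assumption here; the text only states that the opposite population is added. All names are hypothetical.

```python
import random

def opposite_solution(x, bounds):
    """Mirror each coordinate inside its interval: x'_j = a_j + b_j - x_j."""
    return [lo + hi - xj for xj, (lo, hi) in zip(x, bounds)]

def opposition_init(objective, bounds, pop_size, seed=0):
    """Build a random population, add its opposite, keep the pop_size best."""
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    opp = [opposite_solution(x, bounds) for x in pop]
    merged = sorted(pop + opp, key=objective)  # lower objective value is better
    return merged[:pop_size]

bounds = [(0.0, 10.0), (5.0, 10.0)]
mirrored = opposite_solution([1.0, 8.0], bounds)
population = opposition_init(lambda x: sum(v * v for v in x), bounds, pop_size=6)
```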
(2) The second improvement stage utilizes the Levy flight strategy to maximize search efficiency and avoids premature algorithm convergence using the following formula:
Figure BDA0004151985240000074
Figure BDA0004151985240000075
wherein alpha is a step-size scaling factor; beta is a fixed constant of 1.5; p and q are two random variables, subject to the following distribution:
Figure BDA0004151985240000076
Figure BDA0004151985240000077
Figure BDA0004151985240000078
where Γ (·) is the gamma function.
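One widely used way to draw Levy-distributed step lengths from two Gaussian variables and the gamma function is Mantegna's algorithm, sketched below with β = 1.5. Whether the formulas above match this algorithm term by term cannot be confirmed from the text, so the code should be read as an assumption; the names `levy_sigma`, `levy_step`, and `levy_update`, and the simple additive update, are hypothetical.

```python
import math
import random

def levy_sigma(beta=1.5):
    """Standard deviation of the numerator variable p in Mantegna's algorithm."""
    numerator = math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
    denominator = math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2)
    return (numerator / denominator) ** (1 / beta)

def levy_step(beta=1.5, rng=random):
    """Draw one heavy-tailed step: p / |q|^(1/beta), p ~ N(0, sigma^2), q ~ N(0, 1)."""
    p = rng.gauss(0.0, levy_sigma(beta))
    q = rng.gauss(0.0, 1.0)
    return p / abs(q) ** (1 / beta)

def levy_update(x, alpha=0.01, beta=1.5, rng=random):
    """Perturb every coordinate by a scaled Levy step (illustrative update rule)."""
    return [xj + alpha * levy_step(beta, rng) for xj in x]

rng = random.Random(1)
moved = levy_update([0.5, 0.5, 0.5], alpha=0.01, rng=rng)
```

Most steps drawn this way are small, with occasional very large jumps, which is the property that helps a search escape local optima.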
(3) The third improvement stage uses chaotic maps to modify the key parameters of SCA and increase the convergence speed. In the invention, four chaotic maps (the sawtooth map, the sine map, the tent map, and the Chebyshev map) are used to adjust the parameters $r_1$, $r_2$, $r_3$, and $r_4$ of SCA. These chaotic maps are expressed as follows:
Figure BDA0004151985240000081
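The four maps named above have several variants in the literature. The sketch below uses commonly cited textbook forms, which are assumptions and may differ in constants from the expressions used in the invention; the function names and parameter defaults are likewise hypothetical.

```python
import math

def sawtooth_map(x):
    """Sawtooth (doubling) map on [0, 1): x_{k+1} = 2 * x_k mod 1."""
    return (2.0 * x) % 1.0

def sine_map(x, a=4.0):
    """Sine map on [0, 1]: x_{k+1} = (a / 4) * sin(pi * x_k)."""
    return (a / 4.0) * math.sin(math.pi * x)

def tent_map(x):
    """Tent map on [0, 1] with its peak at x = 0.7."""
    return x / 0.7 if x < 0.7 else (10.0 / 3.0) * (1.0 - x)

def chebyshev_map(x, order=4):
    """Chebyshev map on [-1, 1]: x_{k+1} = cos(order * arccos(x_k))."""
    return math.cos(order * math.acos(x))

def chaotic_sequence(map_fn, x0, length):
    """Iterate a map from x0 and return the first `length` generated values."""
    values, x = [], x0
    for _ in range(length):
        x = map_fn(x)
        values.append(x)
    return values

tent_values = chaotic_sequence(tent_map, 0.3, 5)
```

A sequence like `tent_values` would replace the uniform random draws for one of the parameters. In binary floating point the doubling map collapses to 0 after a few dozen iterations, so it needs care in long runs.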
On this basis, the CNN-ISCA-LSTM short-term photovoltaic power prediction model is constructed. To further improve prediction accuracy and network performance, ISCA is used to optimize the hyperparameters of the CNN-LSTM. The learning rate and dropout rate of the CNN-LSTM model are continuous-valued hyperparameters, while the batch size, the number of CNN or LSTM layers, the maximum number of iterations, and the number of neuron units are discrete-valued hyperparameters. Because ISCA searches a continuous solution space, which consumes a great deal of time, converting continuous values into discrete values effectively reduces the search time. The conversion uses the following formula:
Figure BDA0004151985240000082
where $x_{ij}$ denotes the continuous value; $y_{ij}$ denotes the discrete value; lb denotes the lower boundary of the search space; and ub denotes the upper boundary of the search space.
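How a continuous ISCA position might be decoded into a mixed continuous and discrete hyperparameter set can be sketched as follows. The conversion used here (clamp to [lb, ub], then round to the nearest integer) is a stand-in chosen for illustration, not the conversion formula of the invention, and the search-space names and ranges are hypothetical.

```python
# Hypothetical search space: (name, lower bound, upper bound, is_discrete).
SEARCH_SPACE = [
    ("learning_rate", 1e-4, 1e-2, False),
    ("dropout_rate", 0.0, 0.5, False),
    ("batch_size", 16, 128, True),
    ("lstm_layers", 1, 3, True),
    ("lstm_units", 16, 256, True),
]

def to_discrete(x, lb, ub):
    """Clamp x to [lb, ub] and round to the nearest integer (illustrative rule)."""
    return int(round(min(max(x, lb), ub)))

def decode(position):
    """Turn one continuous position vector into a hyperparameter dictionary."""
    params = {}
    for value, (name, lb, ub, is_discrete) in zip(position, SEARCH_SPACE):
        if is_discrete:
            params[name] = to_discrete(value, lb, ub)
        else:
            params[name] = min(max(value, lb), ub)  # continuous: clamp only
    return params

params = decode([0.0031, 0.62, 47.8, 2.2, 300.0])
```

Decoding before every fitness evaluation means the optimizer never trains two models that differ only in a fractional batch size or layer count.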
In the ISCA optimization process, an initial solution set of n individuals is generated, where each solution is a multi-dimensional numeric vector whose dimension equals the number of parameters to be optimized. After the population and the chaotic maps are initialized, the current positions of the solutions are updated continuously using the opposition strategy and the Levy flight strategy, which balances the exploration and exploitation phases of the algorithm, until the termination condition is met. The optimal solution obtained gives the optimal values of the CNN-LSTM hyperparameters. To evaluate the effectiveness of each update, the mean square error (MSE) is used as the fitness function, calculated as:
$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - y'_i\right)^2$$
where $y_i$ denotes the actual value, $y'_i$ denotes the predicted value, and n denotes the number of predicted samples.
Specifically, step S5: inverse-normalize the data and evaluate the prediction performance of the model. The predicted photovoltaic power data are inverse-normalized to restore their physical meaning, using the formula:
$$x_i(k) = x_i'(k)\,(x_{i,max} - x_{i,min}) + x_{i,min}$$
The root mean square error ($R_{MSE}$) and the mean absolute error ($M_{AE}$) are used to evaluate the prediction performance of the model; smaller values of $R_{MSE}$ and $M_{AE}$ indicate better prediction. They are calculated as:
$$R_{MSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - y'_i\right)^2}$$
$$M_{AE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - y'_i\right|$$
where $y_i$ denotes the actual value, $y'_i$ denotes the predicted value, and n denotes the number of predicted samples.
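The inverse normalization and the error measures of step S5 translate directly into code. The sketch below uses only the standard library; the function names and the sample values are hypothetical.

```python
import math

def inverse_min_max(scaled, v_min, v_max):
    """Map normalized values in [0, 1] back to the original physical scale."""
    return [s * (v_max - v_min) + v_min for s in scaled]

def mse(actual, predicted):
    """Mean square error, used as the fitness function during optimization."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean square error."""
    return math.sqrt(mse(actual, predicted))

def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical normalized predictions, restored to a 10-30 power range.
restored = inverse_min_max([0.0, 0.5, 1.0], 10.0, 30.0)
actual = [1.0, 2.0, 3.0]
predicted = [1.0, 2.0, 5.0]
errors = (mse(actual, predicted), rmse(actual, predicted), mae(actual, predicted))
```

RMSE penalizes large individual errors more heavily than MAE, so reporting both gives a fuller picture of prediction quality.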
The beneficial effects of the invention are as follows. In photovoltaic power prediction, the hyperparameters of a prediction model are set largely by experience and by hand; models with different parameters differ greatly in prediction performance, prediction accuracy is insufficient, and prediction performance is not stable enough across operating conditions. To further improve prediction performance, the invention provides a short-term photovoltaic power prediction method that combines an improved sine cosine algorithm with a CNN-LSTM model, that is, ISCA is used to optimize the hyperparameters of the CNN-LSTM. Verification on an example shows that the proposed model effectively improves the accuracy of photovoltaic power prediction and retains high prediction stability under non-sunny conditions.
Drawings
FIG. 1 is a structural diagram of the CNN neural network of the present invention;
FIG. 2 is a structural diagram of the LSTM neural unit of the present invention;
FIG. 3 is a diagram of the combined CNN-LSTM model of the present invention;
FIG. 4 is a flowchart of the improved SCA algorithm of the present invention;
FIG. 5 is a box plot of the normalized raw data of the present invention;
FIG. 6 is a heat map of the correlation coefficients of the raw data features of the present invention;
FIG. 7 is a comparison of the prediction results of the models under sunny conditions of the present invention;
FIG. 8 is a comparison of the prediction results of the models under non-sunny conditions of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention. All other embodiments, which can be made by those skilled in the art without making any inventive effort, based on the embodiments of the present invention are within the scope of the present invention, and are specifically described below in connection with the embodiments.
The invention comprises the following steps:
S1: acquire the original data and preprocess it. The original data set may contain bad data, such as missing values and outliers caused by faults in the data acquisition device or the sensors, so the data must be preprocessed. The data are then normalized so that differences in scale between features do not reduce the prediction accuracy of the deep learning model. This step specifically comprises:
(1) Missing values in the original data set are filled using a random forest algorithm. Consider a data set of m rows and n columns (n features, each a sequence of length m) in which feature i contains missing values. The rows in which feature i is present form the training set: the other n-1 features of those rows are the training inputs, and the non-missing values of feature i are the training labels. The other n-1 features of the rows in which feature i is missing are the prediction inputs, and the missing values are the targets to be predicted. The model is trained and then used to predict the missing values by regression;
(2) Abnormal data are detected with a box plot and filled using the random forest algorithm. Outliers in the original data are detected using the box-plot principle, then treated as missing values and filled with the random forest algorithm. The box plot identifies a value as an outlier under the following condition:
$$x_a > Q_1 + 1.5\,\mathrm{IQR} \quad \text{or} \quad x_a < Q_3 - 1.5\,\mathrm{IQR}$$
where $x_a$ denotes an outlier; $Q_1$ and $Q_3$ denote the upper and lower quartiles of the box plot, respectively; and IQR denotes the interquartile range, i.e., $\mathrm{IQR} = Q_1 - Q_3$.
(3) Normalize the data. Feeding the raw data directly into model training weakens the model's ability to learn nonlinear features, so the data must be normalized to the interval [0,1]. The min-max method is used, calculated as:
$$x_i'(k) = \frac{x_i(k) - x_{i,\min}}{x_{i,\max} - x_{i,\min}}$$
where $x_i(k)$ is the original value of the k-th sample of feature i; $x_{i,\max}$ and $x_{i,\min}$ are the maximum and minimum values of feature i, respectively; and $x_i'(k)$ is the normalized value.
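The outlier check and the min-max scaling described above can be sketched in a few lines of Python. This is a minimal illustration, not the patent's implementation: the fence factor `k=1.5` is the conventional box-plot choice and is an assumption, the function names are hypothetical, and the random-forest refill of flagged values is not shown.

```python
import statistics


def find_outliers(values, k=1.5):
    """Return indices of values outside the box-plot fences.

    k=1.5 is the conventional fence factor (an assumption here).
    """
    lower_q, _, upper_q = statistics.quantiles(values, n=4)
    iqr = upper_q - lower_q
    low_fence = lower_q - k * iqr
    high_fence = upper_q + k * iqr
    return [i for i, v in enumerate(values) if v < low_fence or v > high_fence]


def min_max_normalize(values):
    """Scale one feature column to [0, 1]; also return (min, max) for later denormalization."""
    v_min, v_max = min(values), max(values)
    if v_max == v_min:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values], (v_min, v_max)
    span = v_max - v_min
    return [(v - v_min) / span for v in values], (v_min, v_max)


# Toy readings for illustration only; the last value is an obvious outlier.
readings = [1, 2, 3, 4, 5, 6, 7, 8, 100]
outlier_idx = find_outliers(readings)
# Here the flagged values are simply dropped; the method in the text refills them instead.
kept = [v for i, v in enumerate(readings) if i not in outlier_idx]
scaled, (v_min, v_max) = min_max_normalize(kept)
```

Keeping the returned `(min, max)` pair per feature is what later allows predictions to be mapped back to physical units.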
S2: Perform correlation analysis on the original features and eliminate features with low correlation coefficients. The power output of photovoltaic generation is affected mainly by changes in meteorological factors such as solar irradiance, air temperature, humidity and air pressure. To screen out the main features affecting photovoltaic output from the many meteorological factors, and to reduce the negative influence of weakly correlated factors on the result, a correlation analysis of the meteorological factors is required.
The Spearman correlation coefficient provides an effective measure of the similarity between vectors whose variables have different value ranges. The Spearman correlation coefficient is therefore selected to analyze the meteorological features and extract the key factors.
The Spearman correlation coefficient of two sequences X and Y is calculated as:
$$S = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2\,\sum_{i=1}^{n}(y_i-\bar{y})^2}}$$
where n is the length of the sequences; $x_i$ and $y_i$ are the i-th elements of X and Y, respectively; and $\bar{x}$ and $\bar{y}$ are the means of X and Y, respectively.
S takes values in [-1,1]. The closer S is to 0, the weaker the correlation between the two vectors; the closer S is to -1, the stronger the negative correlation; and the closer S is to 1, the stronger the positive correlation.
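As a rough illustration of the screening step, the sketch below computes the Spearman coefficient in its standard form, that is, the product-moment correlation applied to the ranks of the two sequences, and keeps features whose absolute coefficient exceeds a threshold. The function names and the toy sequences are hypothetical; the 0.5 threshold is the one the embodiment reports using. Constant columns are assumed not to occur, since they would make the denominator zero.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman coefficient: Pearson correlation of the rank sequences."""
    return pearson(average_ranks(x), average_ranks(y))


def select_features(features, target, threshold=0.5):
    """Keep feature names whose |Spearman coefficient| with the target exceeds the threshold."""
    return [name for name, col in features.items()
            if abs(spearman(col, target)) > threshold]


# Toy sequences for illustration only (not data from the patent).
power = [0.0, 1.0, 2.0, 3.0, 4.0]
toy_features = {"irradiance": [0, 10, 20, 30, 40], "noise": [3, 1, 4, 1, 5]}
selected = select_features(toy_features, power)
```

Working on ranks makes the coefficient insensitive to the differing value ranges of the meteorological variables, which is the property the text relies on.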
S3: Establish the CNN-LSTM hybrid prediction model and determine the hyperparameters to be optimized.
(1) The CNN is a deep neural network that has been widely applied in recent years. Through local connections and weight sharing it processes the raw data at a higher and more abstract level, so that deep features in a sequence can be extracted automatically and effectively; this reduces the complexity of feature extraction and data reconstruction and improves the quality of the data features. The CNN extracts features using convolutional layers, pooling layers and fully connected layers; a sketch of CNN feature extraction is shown in FIG. 1. The convolutional layers extract useful feature information from the input data, and the pooling layers select among the features extracted by the convolutional layers and reduce the corresponding computation.
When features are extracted by the CNN, increasing the number of convolutional layers $N_c$ allows multiple convolution kernels to process the input data and generate feature maps of corresponding size, and the deeper the extracted information, the more accurate the feature representation. The number of convolution kernels $N_{kernel}$ and the kernel size $S_{kernel}$ in each convolutional layer determine the level of detail of the extracted information. After the convolutional layers extract the features, the pooling layers downsample them, and the resulting feature maps are output through the fully connected layers, which determine the target output from the input features. The number of fully connected layers $N_d$ and the number of neurons per layer $N_{CNN,neural}$ are the most important hyperparameters of the fully connected layers. The hyperparameters to be optimized also include the batch size $S_{CNN,batch}$ and the learning rate $L_{rate}$, which can provide better convergence and more efficient gradient computation. Finally, L2 regularization and dropout are used to prevent the network from overfitting.
(2) A recurrent neural network (RNN) handles time-series problems well but cannot solve the long-term dependency problem: as the input sequence grows longer, the model can no longer use earlier information in the sequence. The LSTM neural network replaces the neurons in the RNN hidden layer with memory units that have a long-term memory effect, which effectively solves the long-term dependency problem. In the LSTM model the memory unit comprises three parts: a forget gate, an input gate and an output gate. The forget gate discards irrelevant information, the input gate determines the new information stored in the cell state, and the output gate controls the output of the hidden-layer nodes. These gating units give the LSTM the ability to update and control the information flow in different blocks.
FIG. 2 shows the LSTM unit structure, and FIG. 3 shows the structure of the CNN-LSTM combined model. The internal state $c_t$ of the LSTM carries information along a linear recurrent path and at the same time outputs information nonlinearly to the external state of the hidden layer. $x_t$ is the input at the current time; $c_{t-1}$ and $c_t$ are the internal states at the previous and current times; $h_{t-1}$ and $h_t$ are the hidden-layer states at the previous and current times; $\odot$ denotes element-wise multiplication and $\oplus$ denotes element-wise addition; s is the Sigmoid activation function and tanh is the hyperbolic tangent activation function; z is the cell state update; $z_f$ is the forget-gate state, which controls how much of the previous internal state $c_{t-1}$ is forgotten; $z_i$ is the input-gate state, which controls how much of the cell state update z is stored at the current time; and $z_o$ is the output-gate state, which controls how much of the current internal state $c_t$ is output to the external state $h_t$. Following the information flow in the figure, the following equations hold:
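$$z = \tanh\left(W\,[x_t\ \ h_{t-1}]^{\mathrm{T}}\right)$$
$$z_f = s\left(W_f\,[x_t\ \ h_{t-1}]^{\mathrm{T}}\right)$$
$$z_i = s\left(W_i\,[x_t\ \ h_{t-1}]^{\mathrm{T}}\right)$$
$$z_o = s\left(W_o\,[x_t\ \ h_{t-1}]^{\mathrm{T}}\right)$$
$$c_t = z_f \odot c_{t-1} \oplus z_i \odot z$$
$$h_t = z_o \odot \tanh(c_t)$$
where $W$, $W_f$, $W_i$ and $W_o$ are the weight matrices of the cell state update, forget gate, input gate and output gate, respectively.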
The LSTM neural network has four hyperparameters to be optimized: the number of LSTM layers $N_l$, the maximum number of iterations $N_{e,max}$, the number of neurons per layer $N_{LSTM,neural}$, and the batch size $S_{LSTM,batch}$.
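To make the gate equations above concrete, the following sketch performs a single LSTM time step with plain Python lists. It is only an illustration under simplifying assumptions: bias terms are omitted (as in the equations in the text), the helper names `sigmoid`, `matvec` and `lstm_step` are hypothetical, and a real model would use a deep learning framework rather than hand-written loops.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def lstm_step(x_t, h_prev, c_prev, W, W_f, W_i, W_o):
    """One LSTM step; each weight matrix has shape hidden x (input + hidden)."""
    xh = list(x_t) + list(h_prev)                  # concatenated [x_t, h_{t-1}]
    z = [math.tanh(v) for v in matvec(W, xh)]      # cell state update
    z_f = [sigmoid(v) for v in matvec(W_f, xh)]    # forget gate
    z_i = [sigmoid(v) for v in matvec(W_i, xh)]    # input gate
    z_o = [sigmoid(v) for v in matvec(W_o, xh)]    # output gate
    # Keep part of the old state and add part of the new update.
    c_t = [f * c + i * g for f, c, i, g in zip(z_f, c_prev, z_i, z)]
    h_t = [o * math.tanh(c) for o, c in zip(z_o, c_t)]
    return h_t, c_t


# With all-zero weights every gate equals 0.5 and the update z equals 0,
# so the new internal state is simply half of the previous one.
zeros = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
h_t, c_t = lstm_step([0.7], [0.1, -0.2], [1.0, -2.0], zeros, zeros, zeros, zeros)
```

The zero-weight case is a convenient sanity check because every intermediate value can be worked out by hand.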
S4: Improve the SCA and construct the CNN-ISCA-LSTM short-term photovoltaic power prediction model. The SCA is an intelligent evolutionary algorithm based on the sine and cosine functions, proposed by Mirjalili in 2015. In the initialization phase it accepts or generates a set of random solutions, which form the population. In the next phase a scoring function repeatedly evaluates each solution in the population, and the solutions are then improved with the sine and cosine iteration formulas. Because this population-based method searches for the optimum stochastically, the optimal solution has to be found through many iterations. Generating more initial solutions and increasing the number of iterations improve the capability of the algorithm.
The SCA comprises an exploration phase and an exploitation phase. In both phases it updates positions using the following formulas:
$$X_i^{t+1} = X_i^{t} + r_1 \sin(r_2)\,\left| r_3 P_i^{t} - X_i^{t} \right|$$
$$X_i^{t+1} = X_i^{t} + r_1 \cos(r_2)\,\left| r_3 P_i^{t} - X_i^{t} \right|$$
$$r_1 = a - \frac{a\,k}{k_{\max}}$$
In the above formulas, $X_i^{t}$ is the position of the solution in the i-th dimension at the t-th iteration; $P_i^{t}$ is the value of the target solution in the i-th dimension at the t-th iteration; $r_1$ determines the region (or direction of movement) of the next position and balances exploration and exploitation of the search space so that the search finally converges to the global optimum; it is calculated from the third formula, where a is a positive constant (generally 2), k is the current iteration number and $k_{\max}$ is the maximum number of iterations; $r_2$ is a random number in $(0, 2\pi)$ that determines how far the solution moves toward or away from the target; and $r_3$ is a random number in (0, 2) that randomly emphasizes ($r_3 > 1$) or reduces ($r_3 < 1$) the influence of the target on the distance. The two update formulas can be combined as:
$$X_i^{t+1} = \begin{cases} X_i^{t} + r_1 \sin(r_2)\,\left| r_3 P_i^{t} - X_i^{t} \right|, & r_4 < 0.5 \\ X_i^{t} + r_1 \cos(r_2)\,\left| r_3 P_i^{t} - X_i^{t} \right|, & r_4 \ge 0.5 \end{cases}$$
where $r_4$ is a random number in (0, 1), so that the sine and cosine updates are selected with equal probability.
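The basic (unimproved) SCA loop described above might be sketched as follows. This is a hedged illustration rather than the patent's code: the function name `sca_minimize`, its defaults, the box clipping and the sphere test function are assumptions, and the linear decay of `r1` from `a` to 0 is the usual SCA schedule, assumed here to match the description.

```python
import math
import random


def sca_minimize(fitness, lower, upper, pop_size=20, max_iter=100, a=2.0, seed=0):
    """Minimize `fitness` over a box with a plain sine cosine algorithm."""
    rng = random.Random(seed)
    dim = len(lower)
    population = [[rng.uniform(lower[j], upper[j]) for j in range(dim)]
                  for _ in range(pop_size)]
    best = min(population, key=fitness)[:]
    best_fit = fitness(best)
    history = [best_fit]
    for k in range(max_iter):
        r1 = a - a * k / max_iter                  # shrinks linearly from a towards 0
        for x in population:
            for j in range(dim):
                r2 = rng.uniform(0.0, 2.0 * math.pi)
                r3 = rng.uniform(0.0, 2.0)
                r4 = rng.random()
                wave = math.sin(r2) if r4 < 0.5 else math.cos(r2)
                x[j] += r1 * wave * abs(r3 * best[j] - x[j])
                x[j] = min(max(x[j], lower[j]), upper[j])   # stay inside the search box
            fit = fitness(x)
            if fit < best_fit:                     # refresh the target as soon as it improves
                best, best_fit = x[:], fit
        history.append(best_fit)
    return best, best_fit, history


def sphere(v):
    """Toy objective with its minimum at the origin."""
    return sum(t * t for t in v)


best, best_fit, history = sca_minimize(sphere, [-10.0, -10.0], [10.0, 10.0],
                                       pop_size=10, max_iter=30)
```

In the method of the text, the fitness would be the validation error of a CNN-LSTM trained with the hyperparameters encoded in each solution, which is far more expensive than this toy objective.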
The SCA can handle adaptive and non-convex problems and offers robust optimization performance on nonlinear constrained optimization problems. Its main advantages are its small number of parameters and its efficient global search mechanism. Its performance can nevertheless be improved further by modifying $r_1$, $r_2$, $r_3$ and $r_4$. The invention adopts the following three improvement stages to update the four parameters:
(1) The first improvement stage uses an opposition-based (reverse) learning strategy to diversify the population. The principle is as follows: if x is a random real number in (a, b), its opposite number x′ is calculated as:
$$x' = a + b - x$$
In general, let $x_i = \{x_{i1}, x_{i2}, \ldots, x_{in}\}$, where $x_{ij}$ is a random number in $(a_j, b_j)$ and j = 1, 2, …, n. For the candidate solution $x_i$, the opposite vector is $x_i' = \{x_{i1}', x_{i2}', \ldots, x_{in}'\}$, where
$$x_{ij}' = a_j + b_j - x_{ij}$$
The positions of the individuals in the original population are
Figure BDA0004151985240000152
To increase the diversity of the population, the opposition strategy adds the positions of the individuals of the opposite population:
Figure BDA0004151985240000153
(2) The second improvement stage utilizes the Levy flight strategy to maximize search efficiency and avoids premature algorithm convergence using the following formula:
Figure BDA0004151985240000154
Figure BDA0004151985240000155
where α is the step-size scaling factor; β is a constant fixed at 1.5; and p and q are two random variables that follow the distributions below:
Figure BDA0004151985240000156
Figure BDA0004151985240000157
Figure BDA0004151985240000158
where Γ (·) is the gamma function.
(3) The third improvement stage uses chaotic maps to modify the key parameters of the SCA and so increase the convergence speed. In the invention, four chaotic maps, namely the sawtooth map, the sine map, the tent map and the Chebyshev map, are used to adjust the SCA parameters $r_1$, $r_2$, $r_3$ and $r_4$. These chaotic maps are expressed as follows:
Figure BDA0004151985240000161
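The three improvement stages can be illustrated with small building blocks. Everything below is a sketch under stated assumptions: the Levy step uses Mantegna's method, which is a common way to draw such steps and matches the two-variable description above but is not confirmed as the patent's exact formula; the four map definitions are common textbook forms (the Chebyshev order 4 and the 0.7 tent breakpoint are arbitrary choices), not necessarily those in the patent figure; and all names are hypothetical.

```python
import math
import random


def opposite_solution(x, lower, upper):
    """Opposition-based learning: mirror each coordinate inside its bounds."""
    return [a + b - v for v, a, b in zip(x, lower, upper)]


def mantegna_sigma(beta=1.5):
    """Standard deviation of the numerator variable in Mantegna's method."""
    num = math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
    den = math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2)
    return (num / den) ** (1 / beta)


def levy_step(rng, beta=1.5):
    """Draw one heavy-tailed step as p / |q|^(1/beta) with Gaussian p and q."""
    p = rng.gauss(0.0, mantegna_sigma(beta))
    q = rng.gauss(0.0, 1.0)
    return p / max(abs(q), 1e-12) ** (1 / beta)    # guard against q == 0


# Assumed textbook forms of the four maps named in the text.
CHAOTIC_MAPS = {
    "sawtooth": lambda x: (2.0 * x) % 1.0,
    "sine": lambda x: math.sin(math.pi * x),
    "tent": lambda x: x / 0.7 if x < 0.7 else (10.0 / 3.0) * (1.0 - x),
    "chebyshev": lambda x: math.cos(4.0 * math.acos(x)),
}


def chaotic_sequence(name, x0, length):
    """Iterate the chosen map `length` times starting from x0."""
    step = CHAOTIC_MAPS[name]
    values, x = [], x0
    for _ in range(length):
        x = step(x)
        values.append(x)
    return values


mirrored = opposite_solution([1.0, 5.0], [0.0, 0.0], [10.0, 10.0])
step = levy_step(random.Random(0))
tent_values = chaotic_sequence("tent", 0.3, 5)
```

A chaotic sequence would then stand in for the uniform random draws of a parameter such as $r_2$ after rescaling to that parameter's range; how the patent assigns maps to parameters is not recoverable from the text here.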
FIG. 4 shows the algorithm flow. On this basis, the CNN-ISCA-LSTM short-term photovoltaic power prediction model is constructed. To further improve prediction accuracy and network performance, the hyperparameters of the CNN-LSTM are optimized with the ISCA. The learning rate and dropout rate of the CNN-LSTM model are continuous-valued hyperparameters, whereas the batch size, the number of CNN or LSTM layers, the maximum number of iterations and the number of neurons are discrete-valued hyperparameters. Because the ISCA searches a continuous solution space, which is time-consuming, converting continuous values to discrete values effectively reduces the search time. The conversion is performed with the following formula:
Figure BDA0004151985240000162
where $x_{ij}$ is the continuous value; $y_{ij}$ is the discrete value; lb is the lower boundary of the search space; and ub is the upper boundary of the search space.
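One plausible way to realize such a conversion is to clip each coordinate to its bounds and round the discrete ones to the nearest integer. The sketch below does exactly that; it is an assumption for illustration, since the patent's own conversion formula is only available as an image. The names `to_discrete`, `decode` and `SEARCH_SPACE` are hypothetical, while the bounds are the value ranges reported in the embodiment's parameter settings.

```python
import math


def to_discrete(x, lb, ub):
    """Clip x to [lb, ub], then round half up to the nearest integer."""
    clipped = min(max(x, lb), ub)
    return int(math.floor(clipped + 0.5))


# (name, lower bound, upper bound, is_discrete); bounds follow the embodiment's ranges.
SEARCH_SPACE = [
    ("learning_rate", 0.001, 0.1, False),
    ("batch_size", 10, 100, True),
    ("num_kernels", 8, 256, True),
    ("kernel_size", 1, 10, True),
    ("dropout", 0.2, 0.5, False),
]


def decode(vector):
    """Turn one continuous ISCA solution vector into usable hyperparameters."""
    params = {}
    for value, (name, lb, ub, is_discrete) in zip(vector, SEARCH_SPACE):
        if is_discrete:
            params[name] = to_discrete(value, lb, ub)
        else:
            params[name] = min(max(value, lb), ub)   # continuous: clip only
    return params


params = decode([0.05, 33.6, 100.2, 3.4, 0.9])
```

Decoding at evaluation time lets the optimizer keep working on real-valued vectors while the network is always built from valid integer settings.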
In the ISCA optimization process, an initial solution set of n individuals is generated, where each solution is a multidimensional vector whose dimension equals the number of parameters to be optimized. After the population and the chaotic maps are initialized, the current positions of the solutions are updated repeatedly with the opposition strategy and the Levy flight strategy, which balances the exploration and exploitation phases of the algorithm, until the termination condition is met. The optimal solution obtained gives the optimal values of the CNN-LSTM hyperparameters. To evaluate the effectiveness of each update, the mean square error (MSE) is used as the fitness function, calculated as:
$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - y_i'\right)^2$$
where $y_i$ is the actual value, $y_i'$ is the predicted value, and n is the number of predicted samples.
S5: Denormalize the data and evaluate the prediction performance of the model. The predicted photovoltaic power data are denormalized to restore their physical meaning, using the formula:
$$x_i(k) = x_i'(k)\,(x_{i,\max} - x_{i,\min}) + x_{i,\min}$$
The root mean square error ($R_{MSE}$) and the mean absolute error ($M_{AE}$) are used to evaluate the prediction performance of the model; smaller values of $R_{MSE}$ and $M_{AE}$ indicate better prediction. They are calculated as:
$$R_{MSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - y_i'\right)^2}$$
$$M_{AE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - y_i'\right|$$
where $y_i$ is the actual value, $y_i'$ is the predicted value, and n is the number of predicted samples.
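The denormalization and the error metrics are simple enough to write out directly. The sketch below follows the standard definitions of MSE, RMSE and MAE; the function names and the toy numbers are illustrative only.

```python
import math


def denormalize(scaled, v_min, v_max):
    """Map values from [0, 1] back to the original physical range."""
    return [s * (v_max - v_min) + v_min for s in scaled]


def mse(actual, predicted):
    """Mean square error, used in the text as the ISCA fitness function."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error."""
    return math.sqrt(mse(actual, predicted))


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Toy values: one of four predictions is off by 4 units.
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 8.0]
errors = (mse(actual, predicted), rmse(actual, predicted), mae(actual, predicted))
restored = denormalize([0.0, 0.5, 1.0], 2.0, 6.0)
```

Because RMSE squares the errors before averaging, a single large miss raises it more than it raises MAE, which is why reporting both gives a fuller picture.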
The validity of the present invention is verified as follows:
The invention uses public data from the website of an Australian solar energy center, covering January 1, 2020 to January 1, 2021 at a time resolution of 5 min, for a total of 105120 records. The data are divided into a training set, a validation set and a test set in the ratio 3:1:1, i.e., 63072 records for training, 21024 for validation and 21024 for testing. The feature information contained in the data set is shown in Table 1:
Table 1. Feature information of the raw data set
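The split sizes quoted above follow directly from the 3:1:1 ratio, as the short sketch below confirms. The chronological ordering of the split (earliest records for training, latest for testing) is an assumption, as the text does not say how the records are assigned; the function names are hypothetical.

```python
def split_sizes(total, ratios=(3, 1, 1)):
    """Return set sizes for the given ratios; any remainder goes to the first set."""
    unit = total // sum(ratios)
    sizes = [unit * r for r in ratios]
    sizes[0] += total - sum(sizes)
    return tuple(sizes)


def chronological_split(rows, ratios=(3, 1, 1)):
    """Split time-ordered rows into training, validation and test sets without shuffling."""
    n_train, n_val, _ = split_sizes(len(rows), ratios)
    return rows[:n_train], rows[n_train:n_train + n_val], rows[n_train + n_val:]


# 5-minute resolution gives 12 records per hour; 365 days yield 105120 records.
records_per_year = 365 * 24 * 12
sizes = split_sizes(records_per_year)
train, val, test = chronological_split(list(range(10)))
```

Avoiding shuffling is a common precaution for time series, since it keeps future observations out of the training set.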
Figure BDA0004151985240000174
Figure BDA0004151985240000181
(1) Data preprocessing
The sample data obtained contain no missing or abnormal values, so the data are normalized directly; the normalized box plot is shown in FIG. 5.
(2) Feature correlation analysis and screening
Spearman correlation analysis is performed on the features and a heat map of the correlation coefficients is drawn, as shown in FIG. 6. Features whose correlation with the output power has an absolute value greater than 0.5 are taken as input features for model training. As the figure shows, the selected features are AC, WT, GHR, DHR, RGT and RDT, 6 input features in total.
(3) Model parameter setting
The initial parameters of the proposed model are set as follows. The initial population size of the ISCA is 50 and the number of iterations is 100. To reduce the influence of human factors, the value ranges of the hyperparameters to be optimized are set as follows: learning rate, (0.001, 0.1); batch size, (10, 100), with the number of training iterations for each batch size set to 20; number of convolution kernels, (8, 256); convolution kernel size, (1, 10); dropout rate, (0.2, 0.5); epoch size, (1, 100); and neural network training number, (1, 500).
(4) Model validity verification
The CNN-LSTM model hyperparameters optimized by the ISCA algorithm are shown in Table 2:
Table 2. CNN-LSTM network model parameters
Figure BDA0004151985240000182
Figure BDA0004151985240000191
To verify the effectiveness of the proposed method, the prediction performance of the LSTM, CNN-LSTM, CNN-SCA-LSTM and CNN-ISCA-LSTM models is compared under sunny and non-sunny conditions. To evaluate the prediction accuracy objectively and avoid the influence of accidental errors, the prediction under each condition is repeated 10 times, and the average of the 10 predicted values is taken as the final prediction, whose error against the true values is then calculated. The prediction results are shown in FIGS. 7 and 8, and the prediction errors are shown in Table 3. For models without parameter optimization, the hyperparameters are set to empirical values. Under sunny conditions, the $R_{MSE}$ and $M_{AE}$ of the proposed model are smaller than those of LSTM, CNN-LSTM and CNN-SCA-LSTM: $R_{MSE}$ is lower by 11.83%, 10.29% and 2.47%, and $M_{AE}$ is lower by 9.77%, 8.86% and 1.51%, respectively. Under non-sunny conditions, the $R_{MSE}$ and $M_{AE}$ of the proposed model are likewise smaller than those of LSTM, CNN-LSTM and CNN-SCA-LSTM: $R_{MSE}$ is lower by 13.27%, 10.85% and 2.55%, and $M_{AE}$ is lower by 11.91%, 10.16% and 2.61%, respectively.
Photovoltaic power fluctuates little under sunny conditions, whereas under non-sunny conditions the photovoltaic output is strongly random and volatile because of weather factors. Consequently, every model predicts less accurately under non-sunny conditions than under sunny conditions. The error fluctuation of the standard LSTM model is large; that of the other two comparison models is smaller but still exceeds 1%; and the error of the proposed model fluctuates by at most 1%. The method can therefore predict photovoltaic power with higher accuracy under both sunny and non-sunny conditions, and the model is reasonably stable.
Table 3. Evaluation index values
Figure BDA0004151985240000192
The above analysis shows that optimizing the hyperparameters of the CNN-LSTM with the SCA achieves better prediction, that the improved method of the invention raises the prediction accuracy further, and that the model is reasonably stable in power prediction under both sunny and non-sunny conditions.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof. The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned.

Claims (10)

1. A short-term photovoltaic power generation power prediction method based on a CNN-ISCA-LSTM model comprises the following steps:
S1: acquiring raw data and preprocessing the data;
S2: performing correlation analysis on the original features and eliminating features with low correlation coefficients;
S3: establishing a CNN-LSTM hybrid prediction model and determining the hyperparameters to be optimized;
S4: improving the SCA algorithm and constructing a CNN-ISCA-LSTM short-term photovoltaic power generation power prediction model;
S5: denormalizing the data and evaluating the model prediction performance.
2. The method for predicting short-term photovoltaic power generation power based on the CNN-ISCA-LSTM model according to claim 1, wherein in step S1, raw data is obtained and the data is preprocessed.
The raw data set may contain bad data, such as missing values and abnormal values caused by faults in the data acquisition device or the sensors, so the data must be preprocessed. The data are then normalized so that differences in scale between features do not reduce the prediction accuracy of the deep learning model. This step specifically comprises:
(1) Fill the missing values in the raw data set using a random forest algorithm. Consider a data set of m rows and n columns (n features, each a sequence of length m) in which feature i contains missing values. The rows in which feature i is missing supply the prediction inputs: the other n-1 feature values in those rows are the test inputs, and the missing values are the targets to be predicted. The rows in which feature i is present supply the training set: the other n-1 feature values are the training features, and the non-missing values of feature i are the training labels. The model is trained on this set and then predicts the missing values by regression;
(2) Detect abnormal data using a box plot and fill them using the random forest algorithm. Abnormal values in the raw data are detected according to the box-plot principle, then treated as missing values and filled with the random forest algorithm. The box plot identifies a value as abnormal under the following condition:
$$x_a > Q_1 + 1.5\,\mathrm{IQR} \quad \text{or} \quad x_a < Q_3 - 1.5\,\mathrm{IQR}$$
where $x_a$ denotes an outlier; $Q_1$ and $Q_3$ denote the upper and lower quartiles of the box plot, respectively; and IQR denotes the interquartile range, i.e., $\mathrm{IQR} = Q_1 - Q_3$.
(3) Normalize the data. Feeding the raw data directly into model training weakens the model's ability to learn nonlinear features, so the data must be normalized to the interval [0,1]. The min-max method is used, calculated as:
$$x_i'(k) = \frac{x_i(k) - x_{i,\min}}{x_{i,\max} - x_{i,\min}}$$
where $x_i(k)$ is the original value of the k-th sample of feature i; $x_{i,\max}$ and $x_{i,\min}$ are the maximum and minimum values of feature i, respectively; and $x_i'(k)$ is the normalized value.
3. The method for predicting short-term photovoltaic power generation power based on the CNN-ISCA-LSTM model according to claim 1, wherein in step S2, the original features are subjected to correlation analysis and features with low correlation coefficients are eliminated. The power output of photovoltaic generation is affected mainly by changes in meteorological factors such as solar irradiance, air temperature, humidity and air pressure. To screen out the main features affecting photovoltaic output from the many meteorological factors, and to reduce the negative influence of weakly correlated factors on the result, a correlation analysis of the meteorological factors is required.
The Spearman correlation coefficient provides an effective measure of the similarity between vectors whose variables have different value ranges. The Spearman correlation coefficient is therefore selected to analyze the meteorological features and extract the key factors.
The Spearman correlation coefficient of two sequences X and Y is calculated as:
$$S = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2\,\sum_{i=1}^{n}(y_i-\bar{y})^2}}$$
where n is the length of the sequences; $x_i$ and $y_i$ are the i-th elements of X and Y, respectively; and $\bar{x}$ and $\bar{y}$ are the means of X and Y, respectively.
S takes values in [-1,1]. The closer S is to 0, the weaker the correlation between the two vectors; the closer S is to -1, the stronger the negative correlation; and the closer S is to 1, the stronger the positive correlation.
4. The method for predicting short-term photovoltaic power generation power based on the CNN-ISCA-LSTM model according to claim 1, wherein in step S3, a CNN-LSTM hybrid prediction model is established and the hyperparameters to be optimized are determined. (1) The CNN is a deep neural network that has been widely applied in recent years. Through local connections and weight sharing it processes the raw data at a higher and more abstract level, so that deep features in a sequence can be extracted automatically and effectively; this reduces the complexity of feature extraction and data reconstruction and improves the quality of the data features. The CNN extracts features using convolutional layers, pooling layers and fully connected layers. The convolutional layers extract useful feature information from the input data, and the pooling layers select among the features extracted by the convolutional layers and reduce the corresponding computation.
When features are extracted by the CNN, increasing the number of convolutional layers $N_c$ allows multiple convolution kernels to process the input data sequence and generate feature maps of corresponding size, and the deeper the extracted information, the more accurate the feature representation. The number of convolution kernels $N_{kernel}$ and the kernel size $S_{kernel}$ in each convolutional layer determine the level of detail of the extracted information. After the convolutional layers extract the features, the pooling layers downsample them, and the resulting feature maps are output through the fully connected layers, which determine the target output from the input features. The number of fully connected layers $N_d$ and the number of neurons per layer $N_{CNN,neural}$ are the most important hyperparameters of the fully connected layers. The hyperparameters to be optimized also include the batch size $S_{CNN,batch}$ and the learning rate $L_{rate}$, which can provide better convergence and more efficient gradient computation. Finally, L2 regularization and dropout are used to prevent the network from overfitting.
(2) A recurrent neural network (RNN) handles time-series problems well but cannot solve the long-term dependency problem: as the input sequence grows longer, the model can no longer use earlier information in the sequence. The LSTM neural network replaces the neurons in the RNN hidden layer with memory units that have a long-term memory effect, which effectively solves the long-term dependency problem. In the LSTM model the memory unit comprises three parts: a forget gate, an input gate and an output gate. The forget gate discards irrelevant information, the input gate determines the new information stored in the cell state, and the output gate controls the output of the hidden-layer nodes. These gating units give the LSTM the ability to update and control the information flow in different blocks. The LSTM neural network has four hyperparameters to be optimized: the number of LSTM layers $N_l$, the maximum number of iterations $N_{e,max}$, the number of neurons per layer $N_{LSTM,neural}$, and the batch size $S_{LSTM,batch}$.
5. The method for predicting short-term photovoltaic power generation power based on the CNN-ISCA-LSTM model according to claim 1, wherein in step S4, the SCA is improved and the CNN-ISCA-LSTM short-term photovoltaic power generation power prediction model is constructed. The SCA is an intelligent evolutionary algorithm based on the sine and cosine functions, proposed by Mirjalili in 2015. In the initialization phase it accepts or generates a set of random solutions, which form the population. In the next phase a scoring function repeatedly evaluates each solution in the population, and the solutions are then improved with the sine and cosine iteration formulas. Because this population-based method searches for the optimum stochastically, the optimal solution has to be found through many iterations. Generating more initial solutions and increasing the number of iterations improve the capability of the algorithm.
The SCA comprises an exploration phase and an exploitation phase. In both phases it updates positions using the following formulas:
$$X_i^{t+1} = X_i^{t} + r_1 \sin(r_2)\,\left| r_3 P_i^{t} - X_i^{t} \right|$$
$$X_i^{t+1} = X_i^{t} + r_1 \cos(r_2)\,\left| r_3 P_i^{t} - X_i^{t} \right|$$
$$r_1 = a - \frac{a\,k}{k_{\max}}$$
In the above formulas, $X_i^{t}$ is the position of the solution in the i-th dimension at the t-th iteration; $P_i^{t}$ is the value of the target solution in the i-th dimension at the t-th iteration; $r_1$ determines the region (or direction of movement) of the next position and balances exploration and exploitation of the search space so that the search finally converges to the global optimum; it is calculated from the third formula, where a is a positive constant (generally 2), k is the current iteration number and $k_{\max}$ is the maximum number of iterations; $r_2$ is a random number in $(0, 2\pi)$ that determines how far the solution moves toward or away from the target; and $r_3$ is a random number in (0, 2) that randomly emphasizes ($r_3 > 1$) or reduces ($r_3 < 1$) the influence of the target on the distance. The two update formulas can be combined as:
$$X_i^{t+1} = \begin{cases} X_i^{t} + r_1 \sin(r_2)\,\left| r_3 P_i^{t} - X_i^{t} \right|, & r_4 < 0.5 \\ X_i^{t} + r_1 \cos(r_2)\,\left| r_3 P_i^{t} - X_i^{t} \right|, & r_4 \ge 0.5 \end{cases}$$
where $r_4$ is a random number in (0, 1), so that the sine and cosine updates are selected with equal probability.
The SCA can handle adaptive and non-convex problems and offers robust optimization performance on nonlinear constrained optimization problems. Its main advantages are its small number of parameters and its efficient global search mechanism. Its performance can nevertheless be improved further by modifying $r_1$, $r_2$, $r_3$ and $r_4$. The invention adopts three improvement stages to update the four parameters.
6. The method for predicting short-term photovoltaic power generation power based on the CNN-ISCA-LSTM model according to claims 1 and 5, wherein in step S4, when the SCA is improved, the first improvement stage uses an opposition-based (reverse) learning strategy to diversify the population. The principle is as follows: if x is a random real number in (a, b), its opposite number x′ is calculated as:
$$x' = a + b - x$$
In general, let $x_i = \{x_{i1}, x_{i2}, \ldots, x_{in}\}$, where $x_{ij}$ is a random number in $(a_j, b_j)$ and j = 1, 2, …, n. For the candidate solution $x_i$, the opposite vector is $x_i' = \{x_{i1}', x_{i2}', \ldots, x_{in}'\}$, where
$$x_{ij}' = a_j + b_j - x_{ij}$$
The positions of the individuals in the original population are
Figure FDA0004151985230000052
To increase the diversity of the population, the opposition strategy adds the positions of the individuals of the opposite population:
Figure FDA0004151985230000053
7. The method for predicting short-term photovoltaic power generation power based on the CNN-ISCA-LSTM model according to claims 1 and 5, wherein in step S4, when the SCA is improved, the second improvement stage uses a Lévy flight strategy to maximize search efficiency and to avoid premature convergence of the algorithm, according to the following formulas:
Figure FDA0004151985230000061
Figure FDA0004151985230000062
where α is a step-size scaling factor; β is a constant fixed at 1.5; p and q are two random variables that follow the distributions below:
Figure FDA0004151985230000063
Figure FDA0004151985230000064
Figure FDA0004151985230000065
where Γ(·) is the gamma function.
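Because the formulas are only available as figures, the sketch below uses the widely used Mantegna method for generating Lévy steps, which matches the description (two Gaussian random variables p and q, β = 1.5, a gamma function in the variance). It is an assumption that the patent uses exactly this form; the names `levy_sigma`, `levy_step` and `levy_move`, and the default α = 0.01, are illustrative.

```python
import math
import random


def levy_sigma(beta=1.5):
    """Standard deviation of p in Mantegna's method (q has unit variance)."""
    num = math.gamma(1.0 + beta) * math.sin(math.pi * beta / 2.0)
    den = math.gamma((1.0 + beta) / 2.0) * beta * 2.0 ** ((beta - 1.0) / 2.0)
    return (num / den) ** (1.0 / beta)


def levy_step(beta=1.5, rng=random):
    """Draw one heavy-tailed Levy step: p / |q|^(1/beta)."""
    p = rng.gauss(0.0, levy_sigma(beta))
    q = rng.gauss(0.0, 1.0)
    return p / abs(q) ** (1.0 / beta)


def levy_move(x, alpha=0.01, beta=1.5, rng=random):
    """Perturb every coordinate of x by alpha times an independent Levy step."""
    return [v + alpha * levy_step(beta, rng) for v in x]
```

Most steps are small, but the heavy tail occasionally produces a long jump, which is what helps an agent escape a local optimum.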
8. The method for predicting short-term photovoltaic power generation according to claims 1 and 5, wherein in step S4, when the SCA is improved, the third improvement stage uses chaotic maps to modify the key parameters of the SCA in order to increase the convergence rate. In the invention, four chaotic maps (the sawtooth map, the sine map, the tent map and the Chebyshev map) are used to adjust the SCA parameters r_1, r_2, r_3 and r_4. These chaotic maps are expressed as follows:
Figure FDA0004151985230000066
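The expressions of the four maps are given only in the figure, so the Python sketch below uses forms of these maps that are common in the chaotic optimization literature. The constants (a = 4 for the sine map, breakpoint 0.7 for the tent map, order 4 for the Chebyshev map) are assumptions and may differ from the patent's figure, and which map drives which of r_1 to r_4 is not specified here.

```python
import math


def sawtooth_map(x):
    """Sawtooth map on [0, 1): x_{k+1} = 2 * x_k mod 1."""
    return (2.0 * x) % 1.0


def sine_map(x, a=4.0):
    """Sine map on [0, 1]: x_{k+1} = (a / 4) * sin(pi * x_k)."""
    return (a / 4.0) * math.sin(math.pi * x)


def tent_map(x):
    """Tent map on [0, 1] with breakpoint 0.7 (assumed constant)."""
    return x / 0.7 if x < 0.7 else (10.0 / 3.0) * (1.0 - x)


def chebyshev_map(x, order=4):
    """Chebyshev map on [-1, 1]: x_{k+1} = cos(order * arccos(x_k))."""
    return math.cos(order * math.acos(x))


def chaotic_sequence(step, x0, n):
    """Iterate a map n times from x0 and return the generated values."""
    seq = []
    x = x0
    for _ in range(n):
        x = step(x)
        seq.append(x)
    return seq
```

A sequence such as `chaotic_sequence(tent_map, 0.37, 100)` could then replace the uniform random draws for one of the SCA parameters after rescaling to that parameter's range.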
9. The method for predicting short-term photovoltaic power generation based on the CNN-ISCA-LSTM model according to claims 1 and 5, wherein in step S4 the CNN-ISCA-LSTM short-term photovoltaic power generation prediction model is constructed on the above basis. To further improve prediction accuracy and network performance, ISCA is used to optimize the hyperparameters of the CNN-LSTM. The learning rate and dropout rate of the CNN-LSTM model are continuous-valued hyperparameters, whereas the batch size, the number of CNN or LSTM layers, the maximum number of iterations and the number of neuron units are discrete-valued hyperparameters. Because ISCA searches a continuous solution space, which consumes a great deal of time, and converting continuous values into discrete values can effectively reduce the search time, the conversion is performed by the following formula:
Figure FDA0004151985230000071
where x_ij represents a continuous value; y_ij represents a discrete value; lb represents the lower boundary of the search space; ub represents the upper boundary of the search space.
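The conversion formula is only available as a figure. As a rough illustration of the idea, the Python sketch below clips a continuous value to [lb, ub] and rounds it to the nearest integer; this rule and the name `to_discrete` are assumptions made for illustration, not the patent's formula.

```python
import math


def to_discrete(x, lb, ub):
    """Map a continuous value to an integer inside [lb, ub].

    Assumption: clip to the bounds, then round half up. The patent's own
    conversion formula is not reproduced in the text.
    """
    clipped = min(max(x, lb), ub)
    return int(math.floor(clipped + 0.5))
```

With this rule a candidate value of 3.6 for the number of LSTM layers, bounded by 1 and 8, becomes 4, and any value above 8 is mapped to 8.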
In the ISCA optimization process, an initial solution set of n individuals is generated, where each solution is a multidimensional vector whose dimension is the number of parameters to be optimized. After the population and the chaotic maps are initialized, the current positions of the solutions are updated continuously using the reverse strategy and the Lévy flight strategy, which keeps the exploration and exploitation phases of the algorithm in balance, until the termination condition is met; the optimal solution obtained then gives the optimal values of the CNN-LSTM hyperparameters. To evaluate the effectiveness of each update strategy, the mean square error (MSE) is used as the fitness function, calculated as:
Figure FDA0004151985230000072
where y_i represents the actual value, y′_i represents the predicted value, and n represents the number of predicted samples.
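A minimal Python sketch of the MSE fitness function follows, using the usual definition of MSE as the mean of the squared differences between actual and predicted values. The wrapper `make_fitness` and its argument `train_and_predict` are hypothetical: they stand in for whatever routine trains the CNN-LSTM with a given hyperparameter vector and returns predictions on validation data.

```python
def mse(actual, predicted):
    """Mean square error between two equally long, non-empty sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((y - p) ** 2 for y, p in zip(actual, predicted)) / len(actual)


def make_fitness(train_and_predict, y_val):
    """Wrap a (hypothetical) training routine as a fitness function.

    `train_and_predict(hyperparams)` is assumed to train the model with the
    given hyperparameters and return predictions aligned with `y_val`.
    """
    def fitness(hyperparams):
        return mse(y_val, train_and_predict(hyperparams))
    return fitness
```

The optimizer then only needs to call `fitness(candidate)` for each candidate solution and keep the one with the lowest value.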
10. The method for predicting short-term photovoltaic power generation power based on the CNN-ISCA-LSTM model according to claim 1, wherein in step S5 the data are denormalized and the prediction performance of the model is evaluated. The predicted photovoltaic power generation power data are denormalized to restore their physical meaning, using the following formula:
x_i(k) = x′_i(k) (x_i,max - x_i,min) + x_i,min
The prediction performance of the model is evaluated using the root mean square error (R_MSE) and the mean absolute error (M_AE); smaller values of R_MSE and M_AE indicate better predictions. They are calculated as:
Figure FDA0004151985230000081
Figure FDA0004151985230000082
where y_i represents the actual value, y′_i represents the predicted value, and n represents the number of predicted samples.
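Step S5 can be sketched in Python as follows. The denormalization follows the formula given in the claim; the error metrics use the standard definitions of root mean square error and mean absolute error, which are assumed to match the figures. The function names are illustrative.

```python
import math


def denormalize(x_norm, x_min, x_max):
    """Invert min-max normalization: x = x' * (x_max - x_min) + x_min."""
    return [v * (x_max - x_min) + x_min for v in x_norm]


def rmse(actual, predicted):
    """Root mean square error of the predictions."""
    n = len(actual)
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(actual, predicted)) / n)


def mae(actual, predicted):
    """Mean absolute error of the predictions."""
    n = len(actual)
    return sum(abs(y - p) for y, p in zip(actual, predicted)) / n
```

Both metrics should be computed on the denormalized power values so that they carry the same physical unit as the measured power.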
CN202310321716.0A 2023-03-29 2023-03-29 CNN-ISCA-LSTM model-based short-term photovoltaic power generation power prediction method Withdrawn CN116345555A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310321716.0A CN116345555A (en) 2023-03-29 2023-03-29 CNN-ISCA-LSTM model-based short-term photovoltaic power generation power prediction method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310321716.0A CN116345555A (en) 2023-03-29 2023-03-29 CNN-ISCA-LSTM model-based short-term photovoltaic power generation power prediction method

Publications (1)

Publication Number Publication Date
CN116345555A true CN116345555A (en) 2023-06-27

Family

ID=86885423

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310321716.0A Withdrawn CN116345555A (en) 2023-03-29 2023-03-29 CNN-ISCA-LSTM model-based short-term photovoltaic power generation power prediction method

Country Status (1)

Country Link
CN (1) CN116345555A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117574180A (en) * 2024-01-17 2024-02-20 华北电力大学 Fuel production and emission system data correlation control management system
CN117574180B (en) * 2024-01-17 2024-03-19 华北电力大学 Fuel production and emission system data correlation control management system
CN117708716A (en) * 2024-02-05 2024-03-15 敏博科技(武汉)有限公司 Regression and time sequence fusion-based photovoltaic power generation power/quantity prediction method and equipment
CN117708716B (en) * 2024-02-05 2024-05-10 敏博科技(武汉)有限公司 Regression and time sequence fusion-based photovoltaic power generation power prediction method and equipment

Similar Documents

Publication Publication Date Title
CN109165774A (en) A kind of short-term photovoltaic power prediction technique
CN107609667B (en) Heat supply load prediction method and system based on Box _ cox transformation and UFCNN
CN110942205A (en) Short-term photovoltaic power generation power prediction method based on HIMVO-SVM
CN112232577B (en) Power load probability prediction system and method for multi-core smart meter
CN112100911B (en) Solar radiation prediction method based on depth BILSTM
CN114462718A (en) CNN-GRU wind power prediction method based on time sliding window
CN113326969A (en) Short-term wind speed prediction method and system based on improved whale algorithm for optimizing ELM
CN112733997A (en) Hydrological time series prediction optimization method based on WOA-LSTM-MC
CN114219181A (en) Wind power probability prediction method based on transfer learning
CN116644970A (en) Photovoltaic power prediction method based on VMD decomposition and lamination deep learning
CN114897129A (en) Photovoltaic power station short-term power prediction method based on similar daily clustering and Kmeans-GRA-LSTM
CN112381282A (en) Photovoltaic power generation power prediction method based on width learning system
CN116345555A (en) CNN-ISCA-LSTM model-based short-term photovoltaic power generation power prediction method
CN117117859B (en) Photovoltaic power generation power prediction method and system based on neural network
CN114004152A (en) Multi-wind-field wind speed space-time prediction method based on graph convolution and recurrent neural network
CN116894384B (en) Multi-fan wind speed space-time prediction method and system
CN112418504A (en) Wind speed prediction method based on mixed variable selection optimization deep belief network
CN116845875A (en) WOA-BP-based short-term photovoltaic output prediction method and device
CN116681154A (en) Photovoltaic power calculation method based on EMD-AO-DELM
CN113449466B (en) Solar radiation prediction method and system for optimizing RELM based on PCA and chaos GWO
CN116090635A (en) Meteorological-driven new energy generation power prediction method
CN115481788A (en) Load prediction method and system for phase change energy storage system
Wu et al. Optimizing CNN-LSTM Model for Short-term PV Power Prediction using Northern Goshawk Optimization
CN116662766B (en) Wind speed prediction method and device based on data two-dimensional reconstruction and electronic equipment
CN116842855B (en) Distributed photovoltaic power distribution network output prediction method and device, electronic equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20230627