CN116245259A

CN116245259A - Photovoltaic power generation prediction method and device based on depth feature selection and electronic equipment

Info

Publication number: CN116245259A
Application number: CN202310526420.2A
Authority: CN
Inventors: 李维虎; 常苏; 吴晓尧; 裴永锋; 王晓磊; 郝敬国; 牛常宁; 王霞; 陈伟
Original assignee: Zhongtai Power Plant Of Huaneng Shandong Power Generation Co ltd
Current assignee: Huaneng Shandong Taifeng New Energy Co ltd; Zhongtai Power Plant Of Huaneng Shandong Power Generation Co ltd
Priority date: 2023-05-11
Filing date: 2023-05-11
Publication date: 2023-06-09
Anticipated expiration: 2043-05-11
Also published as: CN116245259B

Abstract

The invention provides a photovoltaic power generation prediction method and device based on depth feature selection, and electronic equipment, and relates to the technical field of photovoltaic power generation prediction. The method comprises the following steps: acquiring meteorological data of a photovoltaic power station at a moment to be processed; performing feature extraction processing on the meteorological data by using a feature extraction model to obtain target data features; and processing the target data features by using a target extreme learning machine classifier to obtain a photovoltaic power generation amount interval corresponding to the meteorological data at the moment to be processed. The output weight of the hidden layer in the target extreme learning machine classifier is determined by an improved longhorn beetle optimization algorithm; when the improved longhorn beetle optimization algorithm updates the speed of a longhorn beetle individual, the position data of the longhorn beetle leading individual is used in place of the historically optimal position data of all the longhorn beetle individuals, the longhorn beetle leading individual being the longhorn beetle individual with the largest fitness value in each iteration. The technical problem of low prediction precision in existing photovoltaic power generation prediction methods is thereby effectively solved.

Description

Photovoltaic power generation prediction method and device based on depth feature selection and electronic equipment

Technical Field

The invention relates to the technical field of photovoltaic power generation prediction, in particular to a photovoltaic power generation prediction method and device based on depth feature selection and electronic equipment.

Background

Photovoltaic power generation is a major form of renewable energy. It is environmentally friendly, renewable, and suitable for distributed deployment, and has become one of the important options for developing new energy in countries around the world. However, the amount of power generated is highly uncertain and is affected by factors such as weather conditions and seasonal variation. Accurately predicting the photovoltaic power generation amount is therefore key to improving photovoltaic power generation efficiency and operating benefit. Traditional photovoltaic power generation prediction methods mainly use statistical techniques such as regression analysis and time series analysis. However, these methods process the data in a relatively simple way, which results in low prediction accuracy.

Disclosure of Invention

The invention aims to provide a photovoltaic power generation prediction method, a device and electronic equipment based on depth feature selection, so as to solve the technical problem of low prediction precision of the existing photovoltaic power generation prediction method.

In a first aspect, the present invention provides a photovoltaic power generation prediction method based on depth feature selection, including: acquiring meteorological data of a photovoltaic power station at a moment to be processed; performing feature extraction processing on the meteorological data by using a feature extraction model to obtain target data features; processing the target data features by using a target extreme learning machine classifier to obtain a photovoltaic power generation amount interval corresponding to the meteorological data at the moment to be processed; the output weight of the hidden layer in the target extreme learning machine classifier is determined by an improved longhorn beetle optimization algorithm; when the improved longhorn beetle optimization algorithm updates the speed of a longhorn beetle individual, the position data of the longhorn beetle leading individual is used in place of the historically optimal position data of all the longhorn beetle individuals, and the longhorn beetle leading individual is the longhorn beetle individual with the largest fitness value in each iteration.

In an alternative embodiment, the feature extraction model includes: a target multi-branch LSTM model; the method further comprises the steps of: acquiring a training sample set; the training sample set comprises a plurality of meteorological sample data, and each meteorological sample data has a corresponding sample label; each sample tag uniquely corresponds to one photovoltaic power generation capacity interval, and no intersection exists between different photovoltaic power generation capacity intervals; sequentially performing data cleaning treatment and normalization treatment on meteorological sample data in the training sample set to obtain a first sample set; performing sample expansion processing on the first sample set to obtain a second sample set; training an initial multi-branch LSTM model based on the second sample set to obtain the target multi-branch LSTM model; each LSTM branch in the initial multi-branch LSTM model uniquely processes meteorological sample data of one sample label, and the total number of the LSTM branches in the initial multi-branch LSTM model is the same as the total number of the sample labels.

In an alternative embodiment, performing sample expansion processing on the first sample set includes: calculating a target distance between first and second meteorological sample data in the first sample set; wherein the first and second meteorological sample data represent any two meteorological sample data in the first sample set; the target distance includes: Euclidean distance, Manhattan distance, and cosine distance; normalizing the target distance to obtain a normalized target distance; weighting the normalized target distance to obtain the comprehensive distance between the first meteorological sample data and the second meteorological sample data; determining a nearest neighbor set for each meteorological sample data based on the comprehensive distance between the meteorological sample data in the first sample set; and performing sample expansion processing based on all meteorological sample data in the first sample set and the corresponding nearest neighbor sets to obtain the second sample set.

In an alternative embodiment, the sample expansion process is performed based on all meteorological sample data in the first sample set and the corresponding nearest neighbor set, including: calculating the weight of each nearest neighbor sample in the nearest neighbor set of the target meteorological sample data; wherein the target weather sample data represents any weather sample data in the first sample set; determining a target synthetic sample based on the target weather sample data, a target nearest neighbor sample, and a weight of the target nearest neighbor sample; wherein the target nearest neighbor sample represents any nearest neighbor sample in a nearest neighbor set of the target meteorological sample data; the second sample set is constructed based on the first sample set and all synthetic samples.

In an alternative embodiment, the loss function used to train the initial multi-branch LSTM model is:

[Formula not reproduced in the source text.]

Here, Tr represents the total number of LSTM branches in the initial multi-branch LSTM model, and δ represents a preset threshold. The remaining terms of the formula, whose symbols are likewise not reproduced, are: the sample label; the predicted sample label output by the multi-branch LSTM model; the task weight of the t-th LSTM branch, which is defined in terms of N₁, N₂ and N₃, where N₁ represents the accuracy, N₂ the precision and N₃ the recall of the t-th LSTM branch, together with three weighting coefficients subject to a constraint whose right-hand side is 1; an adaptive regularization term, defined in terms of a norm of the fully connected layer parameters and a preset constant; and the regularization coefficient of the t-th LSTM branch.

In an alternative embodiment, the method further comprises: performing feature extraction processing on meteorological sample data in the second sample set by using the target multi-branch LSTM model to obtain sample data features of each meteorological sample data; training an initial extreme learning machine classifier based on sample data features and sample labels of all meteorological sample data to obtain the target extreme learning machine classifier; wherein the initial extreme learning machine classifier uses a uniform distribution to initialize input weights and bias terms.

In an alternative embodiment, the method further comprises: initializing position data and speed data of the longhorn beetle individuals; the parameter space value corresponding to the position data of each longhorn beetle individual is the input weight and the bias term of the extreme learning machine classifier; repeating the following steps until a preset number of iterations is reached, and taking the output weights corresponding to the historically optimal position data of all the longhorn beetle individuals as the output weights of the hidden layer in the target extreme learning machine classifier: calculating a fitness value of a target longhorn beetle individual based on the position data and speed data of the target longhorn beetle individual, and determining the longhorn beetle leading individual based on the fitness value; wherein the target longhorn beetle individual represents any of the longhorn beetle individuals; updating the speed data and position data of the target longhorn beetle individual based on the position data of the longhorn beetle leading individual; calculating an output value of a target sample data feature in the hidden layer according to the updated position data of the longhorn beetle individual; wherein the target sample data feature represents the sample data feature of any of the meteorological sample data; and calculating the output weight of the hidden layer in the extreme learning machine classifier based on the output value of the hidden layer and the sample label corresponding to the target sample data feature.

In a second aspect, the present invention provides a photovoltaic power generation prediction apparatus based on depth feature selection, comprising: a first acquisition module, used for acquiring meteorological data of the photovoltaic power station at the moment to be processed; a feature extraction module, used for performing feature extraction processing on the meteorological data by using a feature extraction model to obtain target data features; and a prediction module, used for processing the target data features by using a target extreme learning machine classifier to obtain a photovoltaic power generation amount interval corresponding to the meteorological data at the moment to be processed; the output weight of the hidden layer in the target extreme learning machine classifier is determined by an improved longhorn beetle optimization algorithm; when the improved longhorn beetle optimization algorithm updates the speed of a longhorn beetle individual, the position data of the longhorn beetle leading individual is used in place of the historically optimal position data of all the longhorn beetle individuals, and the longhorn beetle leading individual is the longhorn beetle individual with the largest fitness value in each iteration.

In a third aspect, the present invention provides an electronic device including a memory and a processor, where the memory stores a computer program executable on the processor, and the processor, when executing the computer program, implements the steps of the photovoltaic power generation prediction method based on depth feature selection according to any one of the foregoing embodiments.

In a fourth aspect, the present invention provides a computer readable storage medium storing computer instructions that, when executed by a processor, implement a depth feature selection based photovoltaic power generation prediction method according to any one of the preceding embodiments.

The depth feature selection-based photovoltaic power generation prediction method provided by the invention adopts a deep learning technology, performs feature extraction processing on meteorological data at the moment to be processed by utilizing a feature extraction model, and processes target data features by adopting the target extreme learning machine classifier based on the improved longhorn beetle optimization algorithm.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are needed in the description of the embodiments or the prior art will be briefly described, and it is obvious that the drawings in the description below are some embodiments of the present invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.

FIG. 1 is a flowchart of a photovoltaic power generation prediction method based on depth feature selection according to an embodiment of the present invention;

FIG. 2 is a flowchart of a sample expansion process for a first sample set according to an embodiment of the present invention;

FIG. 3 is a functional block diagram of a photovoltaic power generation prediction device based on depth feature selection according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of an electronic device according to an embodiment of the present invention.

Detailed Description

For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. The components of the embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.

Thus, the following detailed description of the embodiments of the invention, as presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

Some embodiments of the present invention are described in detail below with reference to the accompanying drawings. The following embodiments and features of the embodiments may be combined with each other without conflict.

Photovoltaic power generation has become a mainstream renewable energy source worldwide, and is widely used in the fields of home, industry, business, and the like. Photovoltaic power generation is a technology for converting light energy into electric energy by utilizing solar energy, and the basic principle is that solar radiation is converted into direct current by a photovoltaic cell and is converted into alternating current by an inverter for power supply.

Photovoltaic power generation prediction refers to predicting the photovoltaic power generation amount over a future period by analyzing and processing factors such as historical data and meteorological data. Short-term photovoltaic power generation prediction refers to predicting photovoltaic power generation from several hours to several days ahead, and is of great significance. First, short-term prediction can help power companies and grid managers carry out real-time scheduling and operation management so as to ensure stable operation of the power system. Second, short-term prediction can provide an important reference for the operation and management of a photovoltaic power station so as to optimize the operating efficiency and economic benefit of the photovoltaic power generation system. Research on short-term photovoltaic power generation prediction methods therefore has important application value.

The traditional prediction method mainly adopts a statistical method, such as regression analysis, time series analysis and the like. These methods are relatively simple to process on the data, resulting in lower prediction accuracy. In view of the above, the embodiment of the invention provides a photovoltaic power generation prediction method based on depth feature selection, so as to improve the prediction accuracy of photovoltaic power generation.

Example 1

FIG. 1 is a flowchart of a photovoltaic power generation prediction method based on depth feature selection according to an embodiment of the present invention. As shown in FIG. 1, the method specifically includes the following steps:

Step S102, acquiring meteorological data of the photovoltaic power station at the moment to be processed.

In photovoltaic power generation prediction, factors influencing photovoltaic power generation output are many, and in order to ensure accuracy of photovoltaic power generation prediction, when predicting the generated energy of a photovoltaic power station, acquired meteorological data of the photovoltaic power station at a moment to be processed at least need to include: solar radiation intensity, temperature, wind speed and humidity.

Step S104, performing feature extraction processing on the meteorological data by using a feature extraction model to obtain target data features.

After the meteorological data is obtained, the embodiment of the invention uses the trained feature extraction model to perform feature extraction processing on the meteorological data so as to obtain the target data features corresponding to the meteorological data. During training, the feature extraction model uses a large amount of meteorological data and the corresponding photovoltaic power generation data, and learns the mapping relation between the data features of the meteorological data and the photovoltaic power generation, so that after training is finished the model can accurately extract the data features of the meteorological data. Compared with traditional statistical methods, extracting features with a deep learning feature extraction model can effectively improve the expressive capacity of the features. The embodiment of the invention does not specifically limit the type of the feature extraction model, and a user can select one according to actual requirements.

Step S106, processing the target data features by using a target extreme learning machine classifier to obtain a photovoltaic power generation amount interval corresponding to the meteorological data at the moment to be processed.

After the target data features corresponding to the meteorological data are obtained, they are used as the input of the target extreme learning machine classifier, and the output of the classifier is taken as the photovoltaic power generation amount prediction result. In the embodiment of the invention, this prediction result is specifically a photovoltaic power generation amount interval. When the classifier is trained, the value range of the photovoltaic power generation amount is first divided into a plurality of photovoltaic power generation amount intervals with no intersection between different intervals; a corresponding power generation amount label is then assigned to each interval; finally, the initial extreme learning machine classifier is trained on a large number of sample data features (of meteorological sample data) marked with the corresponding power generation amount labels, yielding a classifier that can accurately predict the power generation amount label from the data features of the meteorological data. The same approach, assigning a power generation amount label to each photovoltaic power generation amount interval and then training with those labels, also applies to training the feature extraction model.
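As an illustration of the classifier described above, the following is a minimal sketch of a basic extreme learning machine, assuming a sigmoid activation and a pseudo-inverse solution for the output weights (neither is specified in this passage). The function names `train_elm` and `predict_elm`, the hidden layer size, and the 0-based class indices are hypothetical choices made for the example; the embodiment itself determines the output weights with the improved longhorn beetle optimization algorithm rather than with this plain closed-form fit.

```python
import numpy as np


def _sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_elm(features, labels, n_hidden=32, n_classes=None, seed=0):
    """Fit a basic ELM classifier (illustrative sketch, not the patented variant)."""
    rng = np.random.default_rng(seed)
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels, dtype=int)  # 0-based class indices (assumption)
    if n_classes is None:
        n_classes = int(labels.max()) + 1
    # Input weights and bias terms are drawn from a uniform distribution
    # and are not trained afterwards.
    w_in = rng.uniform(-1.0, 1.0, size=(features.shape[1], n_hidden))
    bias = rng.uniform(-1.0, 1.0, size=n_hidden)
    hidden = _sigmoid(features @ w_in + bias)
    # One-hot targets: one column per power generation amount interval.
    targets = np.eye(n_classes)[labels]
    # Least-squares output weights via the Moore-Penrose pseudo-inverse.
    w_out = np.linalg.pinv(hidden) @ targets
    return w_in, bias, w_out


def predict_elm(model, features):
    """Return the predicted interval index for each feature vector."""
    w_in, bias, w_out = model
    hidden = _sigmoid(np.asarray(features, dtype=float) @ w_in + bias)
    return np.argmax(hidden @ w_out, axis=1)
```

Only the output weights are fitted here, which is what makes the extreme learning machine cheap to train compared with a network trained end to end.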

Further, in order to address the insufficient performance of existing extreme learning machine classifiers, the embodiment of the invention provides a scheme for optimizing the performance of the extreme learning machine classifier with an improved longhorn beetle optimization algorithm. Specifically, in the embodiment of the invention, the output weight of the hidden layer in the target extreme learning machine classifier is determined by the improved longhorn beetle optimization algorithm; when the improved longhorn beetle optimization algorithm updates the speed of a longhorn beetle individual, the position data of the longhorn beetle leading individual is used in place of the historically optimal position data of all the longhorn beetle individuals, and the longhorn beetle leading individual is the longhorn beetle individual with the largest fitness value in each iteration. The improved longhorn beetle optimization algorithm can better simulate the behavior of longhorn beetle individuals in real life and can therefore find the global optimal solution more accurately, so that a target extreme learning machine classifier with high accuracy and strong generalization capability is obtained.
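The leader-based update can be sketched as follows. Only two points come from the text above: the leading individual is the one with the largest fitness value in the current iteration, and its position takes the place of the historically optimal position of all individuals in the speed update. The update rule itself is an assumption borrowed from particle-swarm-style methods, and the names `select_leader` and `update_swarm` and the `inertia` and `attraction` parameters are hypothetical.

```python
import numpy as np


def select_leader(positions, fitness):
    """Position of the individual with the largest fitness in this iteration."""
    return positions[int(np.argmax(fitness))]


def update_swarm(positions, velocities, fitness,
                 inertia=0.7, attraction=1.5, rng=None):
    """One speed and position update driven by the current leader.

    Assumed rule (not taken from the source):
        v <- inertia * v + attraction * r * (leader - x)
        x <- x + v
    where r is uniform random in [0, 1) per coordinate.
    """
    rng = np.random.default_rng() if rng is None else rng
    leader = select_leader(positions, fitness)
    r = rng.random(size=positions.shape)
    new_velocities = inertia * velocities + attraction * r * (leader - positions)
    new_positions = positions + new_velocities
    return new_positions, new_velocities
```

Because the leader is re-selected in every iteration, the attraction target can change between iterations, unlike a global best that only changes when a better position is found.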

The photovoltaic power generation prediction method based on depth feature selection provided by the embodiment of the invention adopts a deep learning technology, performs feature extraction processing on meteorological data at the moment to be processed by utilizing a feature extraction model, and processes the target data features by adopting the target extreme learning machine classifier based on the improved longhorn beetle optimization algorithm.

In an alternative embodiment, the feature extraction model includes: a target multi-branch LSTM model; the method of the invention further comprises the following steps:

Step S201, acquiring a training sample set.

The training sample set comprises a plurality of meteorological sample data, and each meteorological sample data has a corresponding sample label; each sample label uniquely corresponds to one photovoltaic power generation capacity interval, and no intersection exists between different photovoltaic power generation capacity intervals.

Specifically, the meteorological sample data refers to historical meteorological data used for training. It has the same data dimensions as the meteorological data in step S102 and likewise includes at least: solar radiation intensity, temperature, wind speed and humidity. The sample label of each meteorological sample data is the power generation amount label introduced above, and different sample labels represent different photovoltaic power generation amount intervals. For example, the sample label corresponding to the photovoltaic power generation amount interval [A, B) is 1, the sample label corresponding to the interval [B, C) is 2, the sample label corresponding to the interval [C, D) is 3, and so on. Assuming that the photovoltaic power generation amount corresponding to certain meteorological sample data is X, and X lies in the interval [C, D), the sample label of that meteorological sample data is 3.
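The labelling rule in the example above can be sketched in a few lines. The function name `interval_label` and the numeric boundaries used in the usage note are hypothetical; the source only fixes the half-open intervals [A, B), [B, C), [C, D) and their labels 1, 2, 3.

```python
from bisect import bisect_right


def interval_label(generation, boundaries):
    """Map a generation amount to the 1-based label of its half-open interval.

    `boundaries` is an ascending list [A, B, C, D, ...]; label k corresponds
    to the interval [boundaries[k-1], boundaries[k]).
    """
    if not boundaries[0] <= generation < boundaries[-1]:
        raise ValueError("generation amount lies outside the covered range")
    return bisect_right(boundaries, generation)
```

With illustrative boundaries `[0, 100, 200, 300]`, a generation amount of 250 falls in the third interval and receives label 3, and a value exactly on a boundary such as 100 belongs to the interval that starts there.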

Step S202, sequentially performing data cleaning processing and normalization processing on meteorological sample data in a training sample set to obtain a first sample set.

Data in photovoltaic power generation prediction applications frequently contain redundant, missing, or erroneous values. Data cleaning processing and normalization processing are applied to ensure the quality of the training data so that it is suitable for building the subsequent learning model.

When the data cleaning process is performed, the data cleaning process can be specifically divided into the following three aspects:

In a first aspect, redundant data or duplicate data is directly deleted, wherein redundant data represents data that can be obtained from other data by a simple linear transformation.

In a second aspect, missing data is filled in by mean imputation. Specifically, the attribute feature is analyzed to determine whether it is a numerical feature. If it is a numerical feature, the average value of that attribute over all remaining objects is calculated and inserted into the gap. If it is a non-numerical feature, the mode principle from statistics is applied: the occurrences of each value of the attribute in the other objects are counted, and the value that occurs most often is used to fill the gap.

In a third aspect, erroneous values are detected using the chi-square test, which is a non-parametric test. Specifically, a statistic is computed on the deviation between the suspected erroneous data and the normal data: the higher the deviation value, the more likely the data are abnormal, and the lower the deviation value, the more likely the data are normal. The formula of the chi-square statistic is as follows:

[Formula not reproduced in the source text.]

where the terms of the formula are the normal data value, the suspected erroneous data value, and the degree of difference between the normal data and the suspected erroneous data. If the degree of difference is large, the data can be identified as erroneous and removed directly.
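The filling rule described in the second aspect above (mean for numerical attributes, most frequent value for non-numerical ones) can be sketched as follows. This is an illustrative sketch only: the function name `fill_missing` and the use of `None` to mark a gap are assumptions, and the chi-square check is not covered because its formula is not available here.

```python
from collections import Counter
from statistics import mean


def fill_missing(column):
    """Fill None entries of one attribute column.

    Numerical attribute: use the mean of the values that are present.
    Non-numerical attribute: use the most frequent present value (the mode).
    """
    present = [v for v in column if v is not None]
    if not present:
        return list(column)  # nothing to estimate from
    numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in present
    )
    if numeric:
        fill = mean(present)
    else:
        fill = Counter(present).most_common(1)[0][0]
    return [fill if v is None else v for v in column]
```

For instance, a temperature column `[20.0, None, 24.0]` would be completed with 22.0, while a categorical column would receive its most common value.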

After the data cleaning processing is performed on the meteorological sample data in the training sample set, normalization processing is performed so as to better reflect the relation between the data and the prediction result and to reduce the influence of differing orders of magnitude on the prediction result. Optionally, the embodiment of the invention adopts the range (min-max) normalization method to normalize the data:

$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$

where $x_{\max}$ and $x_{\min}$ represent the maximum and minimum values in the same data sample, respectively, $x$ represents the data before normalization, and $x'$ represents the data after normalization. The meteorological sample data in the training sample set are processed according to the above method to obtain the first sample set.
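A minimal sketch of range normalization follows, mapping each value to (value - minimum) / (maximum - minimum). The function name `min_max_normalize` is hypothetical, and returning zeros for a constant series is an assumption made to avoid division by zero; the source does not discuss that case.

```python
def min_max_normalize(values):
    """Scale a sequence of numbers into [0, 1] by range normalization."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant series: the range is zero, so map everything to 0.0
        # (assumption; the source does not cover this case).
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]
```

Applied to `[10.0, 20.0, 30.0]`, the smallest value maps to 0.0, the largest to 1.0, and the middle value to 0.5.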

Step S203, performing sample expansion processing on the first sample set to obtain a second sample set.

In order to avoid the technical problem that the trained model performs poorly in certain photovoltaic power generation amount intervals because of sample imbalance, the embodiment of the invention, after obtaining the first sample set, further performs sample expansion on the basis of the first sample set to obtain a second sample set with relatively balanced samples, thereby ensuring the prediction effect of the model. The embodiment of the invention does not specifically limit the sample expansion method, and a user can select one according to actual requirements.

Step S204, training the initial multi-branch LSTM model based on the second sample set to obtain a target multi-branch LSTM model.

Each LSTM branch in the initial multi-branch LSTM model uniquely processes meteorological sample data of one sample label, and the total number of the LSTM branches in the initial multi-branch LSTM model is the same as the total number of the sample labels.

When extracting the features of photovoltaic power generation output and meteorological data, various deep learning models can be selected, such as a convolutional neural network (CNN), a recurrent neural network (RNN), or a long short-term memory network (LSTM). In short-term photovoltaic power generation prediction, recurrent neural networks and long short-term memory networks are typically used for modeling because of the special nature of time series data. The embodiment of the invention provides an improved LSTM deep learning model for extracting the features of photovoltaic power generation output and meteorological data.

Conventional feature extraction methods require manual design of features of time series data, but such methods are not only time consuming and laborious, but may not be able to discover potential features in the data. Thus, there is a need for an automated feature extraction algorithm that can extract useful features from raw data while being applicable to different types of data (i.e., different sample tags). The embodiment of the invention provides an improved LSTM deep learning feature extraction algorithm (namely a target multi-branch LSTM model), which mainly aims at processing meteorological sample data with different sample labels through different LSTM branches and finally carrying out generating capacity prediction through a full connection layer. Compared with a model architecture with only one LSTM branch, the design architecture of the multi-branch LSTM model in the embodiment of the invention can effectively improve the feature extraction capability of the model, and is further beneficial to improving the classification precision.

Therefore, after the second sample set is obtained, the second sample set can be used to train the initial multi-branch LSTM model, so as to obtain the target multi-branch LSTM model. If the sample tags in the embodiment of the invention are Q in total, the target multi-branch LSTM model corresponds to Q LSTM branches, and each LSTM branch is only used for processing meteorological sample data of the same sample tag.

In an alternative embodiment, as shown in fig. 2, in step S203, sample expansion processing is performed on the first sample set, and specifically includes the following steps:

Step S2031, calculating a target distance between first meteorological sample data and second meteorological sample data in the first sample set.

The first and second meteorological sample data represent any two meteorological sample data in the first sample set; the target distance includes: Euclidean distance, Manhattan distance, and cosine distance.

For short-term photovoltaic power generation prediction tasks, the number and quality of training samples have a crucial influence on the prediction effect of the model. Existing sample expansion methods such as SMOTE (Synthetic Minority Over-sampling Technique) are mainly directed to classification tasks and therefore need to be modified to meet the requirements of photovoltaic power generation prediction tasks. The embodiment of the invention expands the training samples of the photovoltaic power generation prediction task with an improved SMOTE algorithm, and first calculates the distance between meteorological sample data in the first sample set. The meteorological sample data after data cleaning and normalization are constructed as feature vectors, denoted $x_i$, with corresponding photovoltaic power generation amount $y_i$, where $i$ takes values from 1 to $n$ and $n$ represents the total number of meteorological sample data in the first sample set. The first sample set may then be expressed as:

$$D = \{(x_i, y_i)\}_{i=1}^{n}$$

In order to better capture the similarity between samples, the embodiment of the invention introduces several distance measurement methods when calculating the distance between meteorological sample data: Euclidean distance, Manhattan distance, and cosine distance. For given first meteorological sample data $x_i$ and second meteorological sample data $x_j$, the feature vectors are respectively $x_i = (x_{i1}, x_{i2}, \ldots, x_{id})$ and $x_j = (x_{j1}, x_{j2}, \ldots, x_{jd})$, where $d$ represents the feature dimension, $x_{ik}$ represents the $k$-th feature of the feature vector of the first meteorological sample data $x_i$, and $x_{jk}$ represents the $k$-th feature of the feature vector of the second meteorological sample data $x_j$.

The Euclidean distance between the first meteorological sample data and the second meteorological sample data is then expressed as:

$$d_E(x_i, x_j) = \sqrt{\sum_{k=1}^{d} (x_{ik} - x_{jk})^2}$$

The Manhattan distance is expressed as:

$$d_M(x_i, x_j) = \sum_{k=1}^{d} |x_{ik} - x_{jk}|$$

The cosine distance is expressed as:

$$d_C(x_i, x_j) = 1 - \frac{x_i \cdot x_j}{\|x_i\| \, \|x_j\|}$$

where $x_i \cdot x_j$ represents the dot product of $x_i$ and $x_j$, and $\|x_i\|$ and $\|x_j\|$ respectively represent the norms of $x_i$ and $x_j$:

$$x_i \cdot x_j = \sum_{k=1}^{d} x_{ik} x_{jk}, \qquad \|x_i\| = \sqrt{\sum_{k=1}^{d} x_{ik}^2}, \qquad \|x_j\| = \sqrt{\sum_{k=1}^{d} x_{jk}^2}$$
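As an illustration, the three distance measures described above can be sketched in plain Python. The function names are illustrative and not part of the embodiment, and the cosine distance is taken here as one minus the cosine similarity, which is an assumption about its exact form.

```python
import math

def euclidean_distance(a, b):
    """Square root of the summed squared feature differences."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def manhattan_distance(a, b):
    """Sum of the absolute feature differences."""
    return sum(abs(p - q) for p, q in zip(a, b))

def cosine_distance(a, b):
    """Assumed form: 1 - (a . b) / (|a| * |b|)."""
    dot = sum(p * q for p, q in zip(a, b))
    norm_a = math.sqrt(sum(p * p for p in a))
    norm_b = math.sqrt(sum(q * q for q in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)
```

Under this assumed form, two orthogonal feature vectors have a cosine distance of 1, and two parallel feature vectors have a cosine distance of approximately 0.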

Step S2032, performing normalization processing on the target distance to obtain a normalized target distance.

Specifically, the distance values obtained by the different distance measurement methods are each normalized using the following formula:

$$d_{\text{norm}} = \frac{d_{\text{raw}} - d_{\min}}{d_{\max} - d_{\min}}$$

where $d_{\text{norm}}$ represents the distance after normalization, $d_{\text{raw}}$ represents the distance before normalization, $d_{\min}$ represents the minimum value of the distance, and $d_{\max}$ represents the maximum value of the distance.
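The normalization step can be sketched as a minimal min-max scaling over all distance values produced by one distance measurement method. The function name and the handling of the degenerate case in which all distances are equal are assumptions of this sketch.

```python
def min_max_normalize(distances):
    """Scale a list of distances into [0, 1] using their minimum and maximum."""
    d_min, d_max = min(distances), max(distances)
    if d_max == d_min:
        # Assumption: when all distances coincide, map every value to 0.
        return [0.0 for _ in distances]
    span = d_max - d_min
    return [(d - d_min) / span for d in distances]
```

For example, the distances 2, 4 and 6 are mapped to 0, 0.5 and 1 respectively.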

Step S2033, performing weighting processing on the normalized target distance to obtain a comprehensive distance between the first meteorological sample data and the second meteorological sample data.

After the normalized target distances are obtained, the weighted sum of the normalized distance values is taken as the comprehensive distance between the first meteorological sample data and the second meteorological sample data. The comprehensive distance is calculated as:

$$d_{\text{comp}}(x_i, x_j) = w_E \, \tilde{d}_E(x_i, x_j) + w_M \, \tilde{d}_M(x_i, x_j) + w_C \, \tilde{d}_C(x_i, x_j)$$

where $\tilde{d}_E$ represents the normalized Euclidean distance, $\tilde{d}_M$ represents the normalized Manhattan distance, $\tilde{d}_C$ represents the normalized cosine distance, and $w_E$, $w_M$, $w_C$ represent the weights of the corresponding distance measurement methods and satisfy $w_E + w_M + w_C = 1$.
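The weighting step can be sketched as follows. The function name and the default weights are purely illustrative assumptions; the embodiment only requires that the three weights sum to 1.

```python
import math

def comprehensive_distance(d_euc, d_man, d_cos, weights=(0.4, 0.3, 0.3)):
    """Weighted sum of three already-normalized distances.

    The default weights are arbitrary placeholders chosen for this sketch.
    """
    w_e, w_m, w_c = weights
    if not math.isclose(w_e + w_m + w_c, 1.0):
        raise ValueError("the three weights must sum to 1")
    return w_e * d_euc + w_m * d_man + w_c * d_cos
```

With weights (0.5, 0.25, 0.25) and normalized distances 1.0, 0.5 and 0.0, the comprehensive distance is 0.625.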

Step S2034, determining a nearest neighbor set for each meteorological sample data based on the integrated distance between the meteorological sample data in the first set of samples.

For the first meteorological sample data $x_i$, the $K$ nearest neighbors in the first sample set $D$ can be determined according to the calculated comprehensive distance $d_{\text{comp}}(x_i, x_j)$. The nearest neighbor set formed by these $K$ nearest neighbors is denoted $N_K(x_i)$, so that for any $x_j \in N_K(x_i)$ there is $d_{\text{comp}}(x_i, x_j) \le d_{\text{comp}}(x_i, x_l)$, where $x_l$ is any other meteorological sample data in $D$ outside $N_K(x_i)$. In this way, a corresponding nearest neighbor set can be determined for each meteorological sample data.
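A minimal sketch of the nearest neighbor determination is given below, assuming the comprehensive distances have already been collected into a square matrix. The function name and the matrix representation are assumptions of this sketch.

```python
def nearest_neighbor_sets(dist, k):
    """Return, for each sample i, the indices of its k nearest neighbors.

    dist[i][j] is the comprehensive distance between samples i and j.
    A sample is never counted as its own neighbor.
    """
    n = len(dist)
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < number of samples")
    neighbors = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        others.sort(key=lambda j: dist[i][j])  # ascending comprehensive distance
        neighbors.append(others[:k])
    return neighbors
```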

Step S2035, performing sample expansion processing based on all meteorological sample data in the first sample set and the corresponding nearest neighbor set, to obtain a second sample set.

After the corresponding nearest neighbor set is determined for each meteorological sample data in the first sample set, sample expansion can be completed based on each meteorological sample data and its nearest neighbor set: new meteorological sample data are synthesized, and the second sample set is formed from the first sample set together with the synthesized samples. By integrating several distance measurement methods, the embodiment of the invention makes the generated synthetic samples better conform to the sample similarity characteristics under different distance measures, which further improves the prediction effect of the short-term photovoltaic power generation prediction model.

In an optional embodiment, the step S2035 performs a sample expansion process based on all weather sample data in the first sample set and the corresponding nearest neighbor set, and specifically includes the following steps:

in step S20351, a weight of each nearest neighbor sample in the nearest neighbor set of target weather sample data is calculated.

Wherein the target weather sample data represents any weather sample data in the first sample set.

Specifically, during sample expansion, for the nearest neighbor set $N_K(x_i)$ of the target meteorological sample data $x_i$, the weight of each meteorological sample data in the set is first calculated [weight formula not reproduced in the source text], where $w_{ij}$ represents the weight of meteorological sample data $x_j$ in the nearest neighbor set $N_K(x_i)$ of the target meteorological sample data $x_i$.

In step S20352, a target synthetic sample is determined based on the target weather sample data, the target nearest neighbor sample, and the weight of the target nearest neighbor sample.

Wherein the target nearest neighbor sample represents any nearest neighbor sample in the nearest neighbor set of target meteorological sample data.

Specifically, for the target meteorological sample data $x_i$, a meteorological sample data $x_j$ is first selected from its nearest neighbor set $N_K(x_i)$, and the initial synthetic sample $x_{\text{new}}$ and the corresponding photovoltaic power generation amount $y_{\text{new}}$ are then calculated [calculation formulas not reproduced in the source text]. These formulas involve a weight factor coefficient whose value range is likewise not reproduced in the source text.

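Then, the embodiment of the invention introduces a local search factor $\eta$ and optimizes the initial synthetic sample $(x_{\text{new}}, y_{\text{new}})$ by using a gradient descent method:

$$x'_{\text{new}} = x_{\text{new}} - \eta \, \nabla_{x_{\text{new}}} L, \qquad y'_{\text{new}} = y_{\text{new}} - \eta \, \nabla_{y_{\text{new}}} L$$

where $L(\cdot)$ is a square loss function, $\nabla_{x_{\text{new}}} L$ is the gradient of $L(\cdot)$ with respect to the sample $x_{\text{new}}$, and $\nabla_{y_{\text{new}}} L$ is the gradient of $L(\cdot)$ with respect to the sample label $y_{\text{new}}$.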

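Finally, in order to improve the quality of the synthesized sample, the loss value of the initial synthesized sample $(x_{\text{new}}, y_{\text{new}})$ is compared with the loss value of the synthesized sample $(x'_{\text{new}}, y'_{\text{new}})$ obtained after local search optimization, and the sample with the smaller loss value is selected as the final synthesized sample, that is,

$$(x_{\text{syn}}, y_{\text{syn}}) = \arg\min_{(x, y) \in \{(x_{\text{new}}, y_{\text{new}}),\, (x'_{\text{new}}, y'_{\text{new}})\}} L(x, y)$$

where $(x_{\text{syn}}, y_{\text{syn}})$ represents the final synthesized sample of the target meteorological sample data $x_i$ and the meteorological sample data $x_j$ in its nearest neighbor set $N_K(x_i)$. Based on the above processing method, a plurality of synthesized samples can be obtained for the target meteorological sample data $x_i$ and all meteorological sample data in its nearest neighbor set after the sample expansion processing.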

According to the methods from step S20351 to step S20352, each meteorological sample data and the corresponding nearest neighbor set are processed respectively, so as to obtain all the synthesized samples. According to the embodiment of the invention, the local search strategy is introduced, so that the generated synthetic sample can better meet the actual requirements of the photovoltaic power generation prediction task, and the model prediction effect is further improved.
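The local search and selection step can be sketched as follows. Because the text does not give the arguments of the square loss function, the loss and its gradient are passed in by the caller; `toy_loss` and `toy_grad` are hypothetical stand-ins used only to make the sketch runnable, and all function names are assumptions.

```python
def refine_synthetic_sample(x_new, y_new, loss, grad, eta=0.1):
    """Take one gradient-descent step of size eta (the local search factor)
    and keep whichever candidate has the smaller loss.

    loss(x, y) -> float and grad(x, y) -> (grad_x, grad_y) are supplied by
    the caller; their concrete form is an assumption of this sketch.
    """
    grad_x, grad_y = grad(x_new, y_new)
    x_opt = [xi - eta * gi for xi, gi in zip(x_new, grad_x)]
    y_opt = y_new - eta * grad_y
    if loss(x_opt, y_opt) < loss(x_new, y_new):
        return x_opt, y_opt
    return x_new, y_new

# Hypothetical square loss for demonstration only: L = (y - sum(x))^2
def toy_loss(x, y):
    return (y - sum(x)) ** 2

def toy_grad(x, y):
    residual = y - sum(x)
    return [-2.0 * residual for _ in x], 2.0 * residual
```

With a small step the optimized candidate has the lower loss and is returned; with an overly large step the gradient update overshoots, and the initial synthetic sample is kept instead.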

Step S20353, constructing a second sample set based on the first sample set and all the synthesized samples.

Suppose the second sample set after sample expansion contains m meteorological sample data, and the feature vector x of each meteorological sample data has feature dimension d. Based on the above description, the embodiment of the invention uses a multi-branch LSTM model for feature extraction of the sample data, where each LSTM branch processes one type of data. Each branch consists of an LSTM layer and a fully connected layer.

Suppose the input data of the target LSTM branch $p$ (i.e., any LSTM branch in the multi-branch LSTM model) is meteorological sample data $x^{(p)}$ with feature dimension $d$. The output of the LSTM layer of the target LSTM branch is then $h^{(p)} = (h^{(p)}_1, h^{(p)}_2, \ldots, h^{(p)}_{d_h})$, where $d_h$ represents the dimension of the LSTM hidden state and $h^{(p)}_j$ represents the $j$-th hidden state output value of the LSTM layer of the target LSTM branch $p$. The fully connected layer of the target LSTM branch maps the output $h^{(p)}$ of the LSTM layer into a $d_f$-dimensional feature space, i.e., $f^{(p)} = W^{(p)} h^{(p)} + b^{(p)}$, where $W^{(p)}$ represents the weights of the fully connected layer of the target LSTM branch $p$ and $b^{(p)}$ represents the bias of the fully connected layer of the target LSTM branch $p$.

When the initial multi-branch LSTM model is trained, a conventional loss function would ordinarily be used. However, because the feature extraction model has a plurality of LSTM branches, a conventional loss function is difficult to apply well to a multi-branch network and tends to degrade the feature extraction effect. To cope with the multi-branch LSTM network structure, in an alternative implementation, the loss function used by the embodiment of the present invention to train the initial multi-branch LSTM model is:

[loss function formula not reproduced in the source text]

where Tr represents the total number of LSTM branches in the initial multi-branch LSTM model; the formula further involves the sample label, the predicted sample label output by the multi-branch LSTM model, and a preset threshold $\delta$. When the error of a sample is less than or equal to $\delta$, a square difference loss function is used; otherwise, an absolute value loss function is used. The square difference loss function is more suitable for non-outlier points, while the absolute value loss function is more suitable for outliers.

The formula also contains the task weight of the $t$-th LSTM branch, which is determined from $N_1$, $N_2$ and $N_3$ [task weight formula not reproduced in the source text], where $N_1$ represents the accuracy of the $t$-th LSTM branch, $N_2$ represents the precision of the $t$-th LSTM branch, and $N_3$ represents the recall of the $t$-th LSTM branch, using three weighting coefficients whose sum is 1. Finally, the formula contains an adaptive regularization term [formula not reproduced in the source text], which is determined from the norm of the parameters of the fully connected layer, a preset constant (used to avoid the case where a denominator is 0), and the regularization coefficient of the $t$-th LSTM branch [formula not reproduced in the source text].

Based on the above, the embodiment of the invention can adaptively adjust the task weight and regularization strength of each branch according to the performance (accuracy, precision and recall) of each LSTM branch. The loss function is essentially an adaptive multitasking mixed regularization loss function combining multiple tasks, adaptive task weights and adaptive regularization terms, and has higher complexity. Because the loss function can adaptively adjust the task weight and the regularization strength according to the task performance, better model performance can be obtained under different tasks and scenes.
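The per-sample switching rule stated for the loss (square difference loss when the error does not exceed the preset threshold, absolute value loss otherwise) can be sketched as below. Only the switching rule is taken from the text; the function name and the absence of any scaling constants are assumptions, and the task weights and adaptive regularization term are not modeled here.

```python
def threshold_loss(y_true, y_pred, delta):
    """Squared error for small errors, absolute error for large ones.

    Assumption: no scaling constants are applied to either piece, so the
    two pieces do not necessarily join continuously at the threshold.
    """
    error = abs(y_true - y_pred)
    if error <= delta:
        return error ** 2
    return error
```

This has the same shape as the well-known Huber loss, which additionally rescales the two pieces so that they meet smoothly at the threshold.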

Further, to avoid model overfitting and improve the generalization capability of the model, regularization techniques may also be used. Specifically, a Dropout layer is added after the fully connected layer of each LSTM branch to randomly set the output of some neurons to 0. And, early stop strategy can also be adopted, i.e. when the error on the verification set is not reduced several times in succession, the training is stopped.
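The early stopping rule can be sketched as a simple patience counter over the sequence of validation errors. The function name and the default patience of 3 are assumptions of this sketch; the text only states that training stops when the validation error fails to decrease several times in succession.

```python
def early_stopping_epoch(val_errors, patience=3):
    """Return the index of the epoch at which training would stop,
    or None if the stopping condition is never met."""
    best = float("inf")
    stale = 0  # consecutive epochs without a new best validation error
    for epoch, err in enumerate(val_errors):
        if err < best:
            best = err
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                return epoch
    return None
```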

After training, when the target multi-branch LSTM model is applied to perform feature extraction on meteorological data, the model inputs the meteorological data to each LSTM branch to perform feature extraction respectively, and feature fusion is performed after feature extraction, wherein the feature fusion mode is as follows:

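$$F = f^{(1)} \oplus f^{(2)} \oplus \cdots \oplus f^{(Q)}$$

where $F$ represents the fused features, $f^{(p)}$ represents the features extracted by the $p$-th LSTM branch, $Q$ is the total number of LSTM branches, and $\oplus$ represents feature addition, i.e., adding the feature values at the same index position.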
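The feature fusion by addition at the same index position can be sketched as follows; the function name is an assumption, and all branch feature vectors are assumed to have the same length.

```python
def fuse_features(branch_features):
    """Element-wise sum of the feature vectors produced by all branches."""
    lengths = {len(f) for f in branch_features}
    if len(lengths) != 1:
        raise ValueError("all branch feature vectors must have the same length")
    return [sum(values) for values in zip(*branch_features)]
```

For example, fusing the branch outputs [1.0, 2.0], [0.5, 0.5] and [2.0, 1.0] gives [3.5, 3.5].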

Having described how the target multi-branch LSTM model is obtained through training, the relevant portions of the extreme learning machine classifier are described in detail below. In an alternative embodiment, the method of the present invention further comprises the steps of:

step S301, performing feature extraction processing on meteorological sample data in the second sample set by using the target multi-branch LSTM model to obtain sample data features of each meteorological sample data.

Step S302, training the initial extreme learning machine classifier based on sample data features and sample labels of all meteorological sample data to obtain a target extreme learning machine classifier.

Wherein the initial extreme learning machine classifier uses a uniform distribution to initialize the input weights and bias terms.

That is, after feature extraction of each meteorological sample data in the second sample set using the feature extraction model, the extracted features are further reused for training of the classifier. The embodiment of the invention provides an innovative extreme learning machine classifier algorithm which can be used for predicting and classifying short-term photovoltaic power generation data after feature extraction, and the algorithm is combined with a longhorn beetle optimization algorithm to realize further optimization of a model.

Specifically, the extreme learning machine classifier (hereinafter abbreviated as ELM) is a fast neural network algorithm, and has fast training speed and good generalization capability. Conventional ELM algorithms typically use normal distribution or randomly generated methods to initialize the input weights and bias terms to build hidden layers, and then use analytic solutions to solve for the output weights. However, these methods generally require adjustment of the parameters of the distribution to obtain an optimal model, and easily result in over-fitting of the model, requiring a significant amount of computational resources during training. To address these problems, the present invention proposes a method of initializing the input weights and bias terms of a classifier using a uniform distribution. Specifically, both the input weights and the bias terms are initialized to random numbers in a uniform distribution, so that the problem of parameter adjustment of the distribution can be avoided, and training of a model can be accelerated. In addition, since the uniform distribution is a widely used distribution, comparison and integration with other algorithms can be more conveniently performed when the ELM algorithm is used.

The prediction output of the extreme learning machine classifier is $g(f) = H(f)\beta$, where $H(f)$ represents the output matrix of the classifier hidden layer after the sample data feature $f$ of the meteorological sample data is input into the classifier, $\beta$ represents the output weight (in essence a matrix), and $g(f)$ is the predicted probability that the sample data feature $f$ belongs to each category. The category with the highest probability in $g(f)$ is selected as the predicted output category of the sample data feature $f$. For example, when $g(f)$ is [0.2, 0.5, 0.1, 0.2], the predicted output category of the classifier for the sample data feature $f$ is the category corresponding to the probability value 0.5, namely the 2nd category.
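The selection of the predicted output category can be sketched as a simple arg-max over g(f); the function name is an assumption, and categories are numbered from 1 to match the example in the text.

```python
def predicted_class(g):
    """Return the 1-based index of the largest entry of g(f)."""
    best_index = max(range(len(g)), key=lambda i: g[i])
    return best_index + 1
```

Applied to the example above, `predicted_class([0.2, 0.5, 0.1, 0.2])` returns 2.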

where $H(f) = [h_1(f), h_2(f), \ldots, h_L(f)]$ represents the output matrix of the hidden layer, $L$ is the number of hidden layer nodes, $h_i(f)$ represents the output of the $i$-th neuron, $h_i(f) = \mathrm{sigm}(w_i \cdot f + b_i)$, sigm() represents the sigmoid function, $w_i$ represents the input weights of the $i$-th neuron, and $b_i$ represents the bias term of the $i$-th neuron.

In the traditional ELM algorithm, the output weight of the classifier hidden layer is $\beta = H^{\dagger} T_s$, where $H^{\dagger}$ is the pseudo-inverse of $H$, $H^{\dagger} = (H^{T} H)^{-1} H^{T}$, and $T_s$ represents the sample label matrix, i.e., the matrix formed row by row from the label values of the samples in the training set. For example, for a multi-classification problem with 3 classes, the size of the output data matrix is N×3, where N is the number of samples in the training set. Each row represents the label value of one sample, with 0 or 1 indicating which category the sample belongs to. The output data matrix $T_s$ is expressed as:

$$T_s = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ \vdots & \vdots & \vdots \end{bmatrix}$$

where the first row corresponds to the first sample, whose label value is category 1, the second row corresponds to the second sample, whose label value is category 2, and so on. It should be noted that each sample belongs to only one class, so there is exactly one 1 in each row and the remaining elements are all 0.
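The traditional ELM training described above (uniformly initialized input weights and bias terms, sigmoid hidden layer, output weights from the pseudo-inverse and a one-hot label matrix) can be sketched with NumPy as follows. The function names, the initialization range [-1, 1], the number of hidden nodes and the 0-based class indices are assumptions of this sketch.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def one_hot(labels, num_classes):
    """Build the N x num_classes label matrix with a single 1 per row."""
    t = np.zeros((len(labels), num_classes))
    t[np.arange(len(labels)), labels] = 1.0
    return t

def train_elm(features, labels, num_classes, hidden_nodes=20, seed=0):
    rng = np.random.default_rng(seed)
    n_features = features.shape[1]
    # Uniform initialisation; the range [-1, 1] is an assumption.
    w = rng.uniform(-1.0, 1.0, size=(n_features, hidden_nodes))
    b = rng.uniform(-1.0, 1.0, size=hidden_nodes)
    h = sigmoid(features @ w + b)  # hidden-layer output matrix H
    # Output weights: pseudo-inverse of H times the label matrix.
    beta = np.linalg.pinv(h) @ one_hot(labels, num_classes)
    return w, b, beta

def predict_elm(features, w, b, beta):
    """Return the 0-based index of the highest-scoring class per sample."""
    return np.argmax(sigmoid(features @ w + b) @ beta, axis=1)
```

`np.linalg.pinv` computes the Moore-Penrose pseudo-inverse, which coincides with $(H^{T}H)^{-1}H^{T}$ when $H$ has full column rank and remains defined otherwise.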

In some alternative embodiments, random Dropout may also be used as a regularization technique: some neurons are randomly selected during training and their output values are set to 0, thereby reducing the risk of overfitting of the model. While the conventional Dropout method is usually applied in the input layer or the fully connected layer, the embodiment of the invention provides a new random Dropout method which can be applied directly in the hidden layer of the ELM algorithm.

Specifically, a random variable $r \in [0, 1]$ is defined, where $r = 0$ indicates that the neuron is randomly selected and turned off, and $r = 1$ indicates that the neuron remains on. In the ELM algorithm, the input weight $w$ is multiplied by the random variable $r$ to obtain the hidden layer output value $C_r(f)$ after random Dropout:

$$C_r(f) = \mathrm{sigm}\big((w \odot r) \cdot f + b\big)$$

where $\odot$ represents element-wise multiplication and sigm() represents the sigmoid function. Since the random Dropout method randomly selects some neurons in each iteration and sets their output values to 0, the risk of overfitting of the model can be effectively reduced, thereby improving the generalization capability of the model.

Unlike the conventional ELM method, the output weight of the hidden layer in the target extreme learning machine classifier used in the embodiment of the present invention is determined by an improved longhorn beetle optimization algorithm, and the determination flow of the output weight will be described in detail below.

In an alternative embodiment, the method of the present invention further comprises the steps of:

step S401, initializing position data and speed data of the individual longhorn beetles.

The parameter space corresponding to the position data of each longicorn individual is taken as the input weight and the bias item of the extreme learning machine classifier.

Steps S402 to S405 below are repeated until the preset number of iterations is reached, and the output weight corresponding to the historically optimal position data of all longicorn individuals is taken as the output weight of the hidden layer in the target extreme learning machine classifier.

Step S402, calculating the fitness value of the target longicorn individual based on the position data and the speed data of the target longicorn individual, and determining the longicorn leader individual based on the fitness value.

Wherein the target longicorn individual represents any of all longicorn individuals.

Step S403, updating the speed data and the position data of the target longicorn individual based on the position data of the longicorn leader individual.

And step S404, calculating the output value of the target sample data characteristic in the hidden layer according to the updated position data of the longicorn individual.

Wherein the target sample data characteristic represents a sample data characteristic of any meteorological sample data.

Step S405, calculating the output weight of the hidden layer in the extreme learning machine classifier based on the output value of the hidden layer and the sample label corresponding to the target sample data feature.

The conventional longhorn beetle optimization algorithm is inspired by the foraging behavior of longhorn beetles: when searching, a beetle randomly selects a direction, moves a certain distance in that direction, and decides whether to change direction according to whether better food is detected ahead. By applying this heuristic search strategy to optimization problems, the conventional longhorn beetle optimization algorithm can search for a global optimal solution in a complex search space.

According to the embodiment of the invention, the parameter space corresponding to the position data of each longhorn beetle in the longhorn beetle optimization algorithm is taken as the input weight and the bias item of the extreme learning machine classifier, so that the historically optimal position data of all longhorn beetle individuals can be found out through the longhorn beetle optimization algorithm, and the output weight of the hidden layer in the target extreme learning machine classifier is calculated based on the input weight and the bias item corresponding to the optimal position data.

As can be seen from the above description of the steps, the following steps are adopted in the embodiment of the present invention to implement the improved longhorn optimization algorithm:

Step 1, initializing position data and speed data of the longicorn individuals.

Step 2, calculating the fitness value z(f) of the target longicorn individual, determining the longicorn leader individual according to the fitness values, and updating the speed and position of the target longicorn individual based on the position data of the longicorn leader individual.

The fitness value z(f) is calculated using the mean square error (MSE) [fitness formula not reproduced in the source text], where H(f) represents the output matrix of the classifier hidden layer after the sample data feature f of the meteorological sample data x is input into the classifier (H(f) is calculated as described above), and the formula also involves the sample label of the meteorological sample data x.

In the conventional longhorn beetle optimization algorithm, if the speed and position of a longicorn individual are expressed as vectors $v$ and $s$, their values are updated using the following formulas:

$$v_i^{t+1} = \omega \, v_i^{t} + c_1 r_1 \big(pbest_i - s_i^{t}\big) + c_2 r_2 \big(gbest - s_i^{t}\big)$$

$$s_i^{t+1} = s_i^{t} + v_i^{t+1}$$

where $i$ represents the $i$-th longicorn individual, $t$ represents the iteration number, $\omega$ represents the inertia weight, $c_1$ and $c_2$ represent acceleration factors, and $r_1$ and $r_2$ represent random numbers. $pbest_i$ represents the historically optimal position of the $i$-th longicorn individual, and $gbest$ represents the historically optimal position of all longicorn individuals. $s_i^{t}$ represents the position of the $i$-th longicorn individual at the $t$-th iteration, and $s_i^{t+1}$ represents its position at the $(t+1)$-th iteration. $v_i^{t}$ represents the speed of the $i$-th longicorn individual at the $t$-th iteration, and $v_i^{t+1}$ represents its speed at the $(t+1)$-th iteration.

In the conventional longhorn beetle optimization algorithm, each longicorn individual moves and searches independently. In real life, however, longhorn beetles generally form a group in which leader and follower relationships exist. The invention therefore provides a longhorn beetle optimization algorithm based on leaders and followers (hereinafter abbreviated as the LFA algorithm), which can better simulate the behavior of longhorn beetles in real life.

Specifically, after the fitness values of all the longicorn individuals in the population are calculated, the longicorn individual with the largest fitness value is selected as the longicorn leader (i.e., the leader), and the remaining individuals will become followers, which will follow the movements and searches of the leader. In the follower moving stage, the LFA algorithm adjusts the position and speed of the follower according to the position and speed of the leader, so that the movement and search of the whole group are realized. Specifically, the LFA algorithm updates the follower's position and velocity using the following formula:

$$v_i^{t+1} = \omega \, v_i^{t} + c_1 r_1 \big(pbest_i - s_i^{t}\big) + c_2 r_2 \big(lbest - s_i^{t}\big)$$

$$s_i^{t+1} = s_i^{t} + v_i^{t+1}$$

where $i$ represents the $i$-th follower, $t$ represents the iteration number, $\omega$ represents the inertia weight, $c_1$ and $c_2$ represent acceleration factors, and $r_1$ and $r_2$ represent random numbers. $pbest_i$ represents the historically optimal position of the $i$-th follower, and $lbest$ represents the position of the leader.
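The leader selection and follower update of the LFA algorithm can be sketched as follows. The update expression is a reconstruction based on the symbol definitions in the text (inertia weight, two acceleration factors, two random numbers, the follower's historically optimal position and the leader's position) and should be read as an assumption; the function names are illustrative, and the random numbers are passed in explicitly so that the sketch is deterministic.

```python
def select_leader(positions, fitness_values):
    """The leader is the individual with the largest fitness value."""
    leader_index = max(range(len(fitness_values)),
                       key=lambda i: fitness_values[i])
    return positions[leader_index]

def update_follower(s, v, pbest, lbest, omega, c1, c2, r1, r2):
    """One speed and position update of a follower toward its own
    historically optimal position (pbest) and the leader position (lbest)."""
    v_next = [omega * vk + c1 * r1 * (pk - sk) + c2 * r2 * (lk - sk)
              for sk, vk, pk, lk in zip(s, v, pbest, lbest)]
    s_next = [sk + vk for sk, vk in zip(s, v_next)]
    return s_next, v_next
```

Compared with the conventional update, the only change is that the leader position takes the place of the historically optimal position of all individuals.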

Step 3, calculating the output value $H(f)$ of the target sample data feature in the classifier hidden layer according to the updated position data of the target longicorn individual.

Step 4, solving the output weight $\beta$ of the hidden layer in the extreme learning machine classifier corresponding to the target longicorn individual according to the formula $\beta = H^{\dagger} T_s$, where $H^{\dagger}$ is the pseudo-inverse of the hidden layer output matrix and $T_s$ is the sample label matrix.

Step 5, iterating steps 2 to 4 until the preset number of iterations is reached, then stopping the iteration and finishing training. The output weight corresponding to the historically optimal position data of all longicorn individuals is taken as the output weight of the hidden layer in the target extreme learning machine classifier.

In some alternative embodiments, batch processing techniques, random Dropout methods, and Early stopping techniques may also be employed to increase the training speed and generalization ability of the model. The batch processing technology is to process a plurality of samples at one time. Specifically, the training set is divided into several small batches, and a portion of the samples are randomly selected from the different small batches at a time to train the model. Such batch processing techniques can effectively reduce computation time and can reduce the risk of overfitting of the model to some extent. In batch processes, it is often necessary to select the appropriate batch size to achieve the best model performance. Generally, the larger the batch size, the faster the training speed, but the greater the risk of overfitting of the model. Conversely, the smaller the batch size, the slower the training speed, but the better the generalization ability of the model. The user needs to select the optimal lot size according to the specific dataset and model.
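The division of the training set into small batches can be sketched as follows. The function name, the shuffling before splitting and the fixed seed are assumptions of this sketch.

```python
import random

def make_batches(samples, batch_size, seed=0):
    """Shuffle the samples and split them into consecutive small batches.

    The last batch may be smaller when the sample count is not a multiple
    of batch_size.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i:i + batch_size]
            for i in range(0, len(shuffled), batch_size)]
```

For 10 samples and a batch size of 4, this yields batches of 4, 4 and 2 samples that together contain every sample exactly once.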

The random Dropout method is applied to the batched samples when calculating the output H of the hidden layer [formula not reproduced in the source text]. In this formula, W represents the input weight matrix, to which a random Dropout matrix is applied, T represents the matrix transpose, m represents the total number of samples in the second sample set, n represents the feature dimension of the sample data features, and a bias term vector is added.

Early stopping is a common regularization method that halts training ahead of schedule, thereby reducing the risk of overfitting of the model. Specifically, the sample set is divided into a training set and a validation set; in each iteration cycle, the training set is used to train the model and the validation set is used to evaluate the performance of the model. When the performance of the model no longer improves, training is stopped early, and the model with the best performance is selected as the final classifier. This method avoids overfitting and improves the generalization capability of the model.

In conclusion, the embodiment of the invention can provide a target extreme learning machine classifier with high accuracy and strong generalization capability, which can be used for prediction and classification of short-term photovoltaic power generation data and has wide application value.

Example two

The embodiment of the invention also provides a photovoltaic power generation prediction device based on depth feature selection, which is mainly used for executing the photovoltaic power generation prediction method based on depth feature selection provided by the embodiment, and the photovoltaic power generation prediction device based on depth feature selection provided by the embodiment of the invention is specifically introduced below.

Fig. 3 is a functional block diagram of a photovoltaic power generation prediction device based on depth feature selection according to an embodiment of the present invention, where, as shown in fig. 3, the device mainly includes: a first acquisition module 10, a feature extraction module 20, a prediction module 30, wherein:

the first acquisition module 10 is used for acquiring meteorological data of the photovoltaic power station at the moment to be processed.

The feature extraction module 20 is configured to perform feature extraction processing on the meteorological data by using the feature extraction model, so as to obtain a target data feature.

And the prediction module 30 is used for processing the target data characteristics by using the target extreme learning machine classifier to obtain a photovoltaic power generation amount interval corresponding to the meteorological data at the moment to be processed.

The output weight of the hidden layer in the target extreme learning machine classifier is determined by an improved longicorn optimization algorithm; when the speed of the longicorn individual is updated by the improved longicorn optimization algorithm, the position data of the longicorn leading individual is used for replacing the historically optimal position data of all the longicorn individuals, and the longicorn leading individual is the longicorn individual with the largest fitness value in each iteration process.

The photovoltaic power generation prediction device based on depth feature selection provided by the embodiment of the invention adopts a deep learning technology, performs feature extraction processing on meteorological data at the moment to be processed by utilizing a feature extraction model, and processes target data features by adopting the target extreme learning machine classifier based on the improved longhorn beetle optimization algorithm.

Optionally, the feature extraction model includes: a target multi-branch LSTM model; the apparatus further comprises:

the second acquisition module is used for acquiring a training sample set; the training sample set comprises a plurality of meteorological sample data, and each meteorological sample data has a corresponding sample label; each sample label uniquely corresponds to one photovoltaic power generation capacity interval, and no intersection exists between different photovoltaic power generation capacity intervals.

The preprocessing module is used for sequentially carrying out data cleaning processing and normalization processing on meteorological sample data in the training sample set to obtain a first sample set.

And the expansion module is used for carrying out sample expansion processing on the first sample set to obtain a second sample set.

The first training module is used for training the initial multi-branch LSTM model based on the second sample set to obtain a target multi-branch LSTM model; each LSTM branch in the initial multi-branch LSTM model uniquely processes meteorological sample data of one sample label, and the total number of the LSTM branches in the initial multi-branch LSTM model is the same as the total number of the sample labels.

Optionally, the expansion module includes:

a calculation unit for calculating a target distance between the first meteorological sample data and the second meteorological sample data in the first sample set; wherein the first and second meteorological sample data represent any two meteorological sample data in the first sample set; the target distance includes: Euclidean distance, Manhattan distance, and cosine distance.

And the normalization unit is used for normalizing the target distance to obtain the normalized target distance.

And the weighting unit is used for carrying out weighting processing on the normalized target distance to obtain the comprehensive distance between the first meteorological sample data and the second meteorological sample data.

A determining unit for determining a nearest neighbor set for each meteorological sample data based on a comprehensive distance between the meteorological sample data in the first set of samples.

And the expansion unit is used for carrying out sample expansion processing based on all meteorological sample data in the first sample set and the corresponding nearest neighbor set to obtain a second sample set.

Optionally, the expansion unit is specifically configured to:

Calculating the weight of each nearest neighbor sample in the nearest neighbor set of target meteorological sample data; wherein the target meteorological sample data represents any meteorological sample data in the first sample set.

Determining a target synthetic sample based on the target meteorological sample data, a target nearest neighbor sample, and the weight of the target nearest neighbor sample; wherein the target nearest neighbor sample represents any nearest neighbor sample in the nearest neighbor set of the target meteorological sample data.

Constructing the second sample set based on the first sample set and all of the synthetic samples.
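The weighted synthesis steps above can be illustrated with the following sketch. The weighting rule (inverse distance, normalized to sum to 1) and the interpolation rule (moving from the target sample toward the neighbor by the neighbor's weight) are assumptions chosen for illustration; the text states only that a weight is computed and used. All names are hypothetical.

```python
from typing import List, Sequence


def neighbor_weights(distances: Sequence[float], eps: float = 1e-12) -> List[float]:
    """Assumed weighting: closer neighbors get larger weights; weights sum to 1."""
    inverse = [1.0 / (d + eps) for d in distances]
    total = sum(inverse)
    return [v / total for v in inverse]


def synthesize(target: Sequence[float], neighbor: Sequence[float], weight: float) -> List[float]:
    """Assumed interpolation: move from the target toward the neighbor by `weight`."""
    return [x + weight * (n - x) for x, n in zip(target, neighbor)]


def expand(
    first_set: Sequence[Sequence[float]],
    neighbor_sets: Sequence[Sequence[int]],
    neighbor_distances: Sequence[Sequence[float]],
) -> List[List[float]]:
    """Return the first sample set plus one synthetic sample per (sample, neighbor) pair."""
    second_set = [list(s) for s in first_set]
    for i, target in enumerate(first_set):
        weights = neighbor_weights(neighbor_distances[i])
        for j, w in zip(neighbor_sets[i], weights):
            second_set.append(synthesize(target, first_set[j], w))
    return second_set


if __name__ == "__main__":
    first = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
    # Hypothetical neighbor indices and composite distances for each sample.
    n_sets = [[1, 2], [0, 2], [0, 1]]
    n_dists = [[1.0, 3.0], [1.0, 1.0], [3.0, 1.0]]
    for sample in expand(first, n_sets, n_dists):
        print(sample)
```

Under this assumed rule a sample with a single neighbor would receive weight 1 and reproduce that neighbor exactly, so the sketch is only meaningful with at least two neighbors per sample.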

Optionally, the loss function used to train the initial multi-branch LSTM model is:

[Equation not preserved in the source text.]

The symbols of the loss function were lost together with the equation. The surviving definitions are, in order: Tr, the total number of LSTM branches in the initial multi-branch LSTM model; the sample label; the predicted sample label output by the multi-branch LSTM model; δ, a preset threshold value; the task weight of the t-th LSTM branch; three weighting coefficients constrained by an expression equal to 1; an adaptive regularization term; the parameters of the fully connected layer and a norm of those parameters; a preset constant; and the regularization coefficient of the t-th LSTM branch.

Optionally, the device is further configured to:

Performing feature extraction processing on the meteorological sample data in the second sample set by using the target multi-branch LSTM model to obtain sample data features of each meteorological sample data.

Training the initial extreme learning machine classifier based on sample data features and sample labels of all meteorological sample data to obtain a target extreme learning machine classifier; wherein the initial extreme learning machine classifier uses a uniform distribution to initialize the input weights and bias terms.
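For readers unfamiliar with extreme learning machines, the following is a minimal sketch of a classifier whose input weights and bias terms are drawn from a uniform distribution and then kept fixed, while the hidden-layer output weights are solved in closed form. The sigmoid activation, the ridge term, the range [-1, 1], and the class and function names are assumptions; the sketch solves the output weights by regularized least squares and does not include the longicorn optimization.

```python
import math
import random
from typing import List, Sequence

Matrix = List[List[float]]


def solve(a: Matrix, b: Matrix) -> Matrix:
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [ra[:] + rb[:] for ra, rb in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [[aug[i][n + k] / aug[i][i] for k in range(len(b[0]))] for i in range(n)]


class ELMClassifier:
    def __init__(self, n_inputs: int, n_hidden: int, n_classes: int,
                 ridge: float = 1e-3, seed: int = 0) -> None:
        rng = random.Random(seed)
        # Input weights and bias terms: uniform initialization, never trained.
        self.w = [[rng.uniform(-1, 1) for _ in range(n_hidden)] for _ in range(n_inputs)]
        self.b = [rng.uniform(-1, 1) for _ in range(n_hidden)]
        self.n_classes = n_classes
        self.ridge = ridge
        self.beta: Matrix = []

    def hidden(self, x: Sequence[float]) -> List[float]:
        """Sigmoid hidden-layer output for one feature vector."""
        return [
            1.0 / (1.0 + math.exp(-(sum(xi * self.w[i][h] for i, xi in enumerate(x)) + self.b[h])))
            for h in range(len(self.b))
        ]

    def fit(self, xs: Sequence[Sequence[float]], labels: Sequence[int]) -> None:
        """Solve (H^T H + ridge * I) beta = H^T T for the output weights."""
        h = [self.hidden(x) for x in xs]
        t = [[1.0 if c == y else 0.0 for c in range(self.n_classes)] for y in labels]
        m = len(self.b)
        hth = [[sum(r[i] * r[j] for r in h) + (self.ridge if i == j else 0.0)
                for j in range(m)] for i in range(m)]
        htt = [[sum(r[i] * tr[c] for r, tr in zip(h, t))
                for c in range(self.n_classes)] for i in range(m)]
        self.beta = solve(hth, htt)

    def predict(self, x: Sequence[float]) -> int:
        hx = self.hidden(x)
        scores = [sum(hx[i] * self.beta[i][c] for i in range(len(hx)))
                  for c in range(self.n_classes)]
        return max(range(self.n_classes), key=scores.__getitem__)


if __name__ == "__main__":
    features = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.0]]  # hypothetical data
    labels = [0, 0, 1, 1]
    elm = ELMClassifier(n_inputs=2, n_hidden=12, n_classes=2)
    elm.fit(features, labels)
    print([elm.predict(x) for x in features])
```

Because only the output weights are fitted, training reduces to one linear solve, which is why the choice of input weights and bias terms matters.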

Optionally, the device is further configured to:

Initializing position data and speed data of each longicorn individual; the parameter space corresponding to the position data of each longicorn individual consists of the input weights and bias terms of the extreme learning machine classifier.

Repeating the following steps until a preset number of iterations is reached, and taking the output weight corresponding to the historically optimal position data of all the longicorn individuals as the output weight of the hidden layer in the target extreme learning machine classifier:

Calculating a fitness value of a target longicorn individual based on the position data and the speed data of the target longicorn individual, and determining a longicorn leader individual based on the fitness value; wherein the target longicorn individual represents any one of the longicorn individuals.

The speed data and the position data of the target longicorn individual are updated based on the position data of the longicorn leader individual.

Calculating an output value of the target sample data characteristic in the hidden layer according to the updated position data of the longicorn individual; wherein the target sample data characteristic represents a sample data characteristic of any meteorological sample data.

Calculating the output weight of the hidden layer in the extreme learning machine classifier based on the output value of the hidden layer and the sample label corresponding to the target sample data characteristic.
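The leader-based update in the steps above can be sketched as a generic swarm search. This is a hedged illustration, not the algorithm of the invention: the velocity formula (inertia plus attraction toward each individual's own best position and toward the current leader), the coefficient values, and the function name are assumptions, and the antenna-based search term of longicorn (beetle) algorithms is omitted. The fitness function is a hypothetical callable; in the setting described here, a position would encode the input weights and bias terms of the extreme learning machine classifier.

```python
import random
from typing import Callable, List, Sequence, Tuple

Position = List[float]


def leader_swarm_search(
    fitness: Callable[[Sequence[float]], float],
    dim: int,
    n_individuals: int = 20,
    n_iterations: int = 100,
    inertia: float = 0.7,
    c_personal: float = 1.5,
    c_leader: float = 1.5,
    seed: int = 0,
) -> Tuple[Position, float]:
    """Maximize `fitness`; return the historically best position and its fitness."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n_individuals)]
    vel = [[0.0] * dim for _ in range(n_individuals)]
    personal_best = [p[:] for p in pos]
    personal_fit = [fitness(p) for p in pos]
    best_idx = max(range(n_individuals), key=personal_fit.__getitem__)
    best_pos, best_fit = pos[best_idx][:], personal_fit[best_idx]

    for _ in range(n_iterations):
        fits = [fitness(p) for p in pos]
        # Leader: the individual with the largest fitness in THIS iteration,
        # used in place of the historical best of the whole swarm.
        leader = pos[max(range(n_individuals), key=fits.__getitem__)][:]
        for i in range(n_individuals):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (
                    inertia * vel[i][d]
                    + c_personal * r1 * (personal_best[i][d] - pos[i][d])
                    + c_leader * r2 * (leader[d] - pos[i][d])
                )
                pos[i][d] += vel[i][d]
            f = fitness(pos[i])
            if f > personal_fit[i]:
                personal_best[i], personal_fit[i] = pos[i][:], f
            if f > best_fit:
                # The historical best is still tracked, but only for the final result.
                best_pos, best_fit = pos[i][:], f
    return best_pos, best_fit


if __name__ == "__main__":
    # Toy fitness with its maximum (0.0) at the point (0.5, 0.5, 0.5).
    toy = lambda p: -sum((x - 0.5) ** 2 for x in p)
    print(leader_swarm_search(toy, dim=3))
```

Steering by the per-iteration leader rather than by the all-time best keeps the attraction point moving, which is one plausible reading of why the replacement is described as an improvement.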

Example III

Referring to fig. 4, an embodiment of the present invention provides an electronic device, including: a processor 60, a memory 61, a bus 62 and a communication interface 63, the processor 60, the communication interface 63 and the memory 61 being connected by the bus 62; the processor 60 is arranged to execute executable modules, such as computer programs, stored in the memory 61.

The memory 61 may include a high-speed random access memory (RAM), and may further include a non-volatile memory, such as at least one magnetic disk memory. The communication connection between the system network element and at least one other network element is achieved via at least one communication interface 63 (which may be wired or wireless), and may use the internet, a wide area network, a local area network, a metropolitan area network, etc.

The bus 62 may be an ISA bus, a PCI bus, an EISA bus, or the like. Buses may be classified as address buses, data buses, control buses, etc. For ease of illustration, only one bi-directional arrow is shown in FIG. 4, but this does not mean that there is only one bus or one type of bus.

The memory 61 is configured to store a program, and the processor 60 executes the program after receiving an execution instruction. The method performed by the apparatus defined by the process disclosed in any of the foregoing embodiments of the present invention may be applied to, or implemented by, the processor 60.

The processor 60 may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be completed by integrated logic circuits of hardware in the processor 60 or by instructions in the form of software. The processor 60 may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), etc.; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The processor may implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present invention. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The steps of the method disclosed in connection with the embodiments of the present invention may be executed directly by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software modules may be located in a storage medium well known in the art, such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or registers. The storage medium is located in the memory 61; the processor 60 reads the information in the memory 61 and, in combination with its hardware, performs the steps of the method described above.

An embodiment of the present invention provides a computer program product for the photovoltaic power generation prediction method, device, and electronic equipment based on depth feature selection. The computer program product comprises a computer-readable storage medium storing non-volatile program code executable by a processor; the instructions included in the program code can be used to execute the method described in the foregoing method embodiments. For the specific implementation, refer to the method embodiments; details are not repeated here.

In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.

The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a non-volatile computer-readable storage medium executable by a processor. Based on this understanding, the technical solution of the present invention, in essence, or the part of it that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and comprises several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or other various media capable of storing program code.

It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further definition or explanation thereof is necessary in the following figures.

In the description of the present invention, it should be noted that directions or positional relationships indicated by terms such as "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner", and "outer" are based on the directions or positional relationships shown in the drawings, or on those in which the inventive product is conventionally placed in use. They are used merely for convenience in describing the present invention and simplifying the description, and do not indicate or imply that the apparatus or element referred to must have a specific orientation or be constructed and operated in a specific orientation; they should therefore not be construed as limiting the present invention. Furthermore, the terms "first," "second," "third," and the like are used merely to distinguish between descriptions and should not be construed as indicating or implying relative importance.

Furthermore, the terms "horizontal," "vertical," "overhanging," and the like do not require that a component be absolutely horizontal or overhanging; the component may be slightly inclined. For example, "horizontal" merely means that the direction is closer to horizontal than "vertical" is; it does not mean that the structure must be perfectly horizontal.

In the description of the present invention, it should also be noted that, unless explicitly specified and limited otherwise, the terms "disposed," "mounted," and "connected" are to be construed broadly. A connection may be, for example, fixed, detachable, or integral; it may be mechanical or electrical; it may be direct, or indirect through an intermediate medium; and it may be internal communication between two elements. The specific meaning of the above terms in the present invention will be understood in specific cases by those of ordinary skill in the art.

Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the invention.

Claims

1. A photovoltaic power generation prediction method based on depth feature selection, characterized by comprising the following steps:

acquiring meteorological data of a photovoltaic power station at a moment to be processed;

performing feature extraction processing on the meteorological data by using a feature extraction model to obtain target data features;

processing the target data characteristics by using a target extreme learning machine classifier to obtain a photovoltaic power generation amount interval corresponding to meteorological data at the moment to be processed;

wherein the output weight of the hidden layer in the target extreme learning machine classifier is determined by an improved longicorn optimization algorithm; when the improved longicorn optimization algorithm updates the speed of a longicorn individual, the position data of the longicorn leader individual is used in place of the historically optimal position data of all the longicorn individuals, and the longicorn leader individual is the longicorn individual with the largest fitness value in each iteration.

2. The depth feature selection-based photovoltaic power generation prediction method according to claim 1, wherein the feature extraction model comprises: a target multi-branch LSTM model; the method further comprises the steps of:

acquiring a training sample set; the training sample set comprises a plurality of meteorological sample data, and each meteorological sample data has a corresponding sample label; each sample label uniquely corresponds to one photovoltaic power generation capacity interval, and no intersection exists between different photovoltaic power generation capacity intervals;

sequentially performing data cleaning processing and normalization processing on the meteorological sample data in the training sample set to obtain a first sample set;

performing sample expansion processing on the first sample set to obtain a second sample set;

training an initial multi-branch LSTM model based on the second sample set to obtain the target multi-branch LSTM model; each LSTM branch in the initial multi-branch LSTM model uniquely processes meteorological sample data of one sample label, and the total number of the LSTM branches in the initial multi-branch LSTM model is the same as the total number of the sample labels.

3. The depth feature selection-based photovoltaic power generation prediction method according to claim 2, wherein performing sample expansion processing on the first sample set comprises:

calculating a target distance between first meteorological sample data and second meteorological sample data in the first sample set; wherein the first meteorological sample data and the second meteorological sample data represent any two meteorological sample data in the first sample set; the target distance includes: Euclidean distance, Manhattan distance, and cosine distance;

normalizing the target distance to obtain a normalized target distance;

weighting the normalized target distance to obtain a comprehensive distance between the first meteorological sample data and the second meteorological sample data;

determining a nearest neighbor set for each meteorological sample data based on the comprehensive distance between the meteorological sample data in the first sample set;

and performing sample expansion processing based on all meteorological sample data in the first sample set and the corresponding nearest neighbor set to obtain the second sample set.

4. The depth feature selection-based photovoltaic power generation prediction method according to claim 3, wherein performing sample expansion processing based on all meteorological sample data in the first sample set and the corresponding nearest neighbor set comprises:

calculating the weight of each nearest neighbor sample in the nearest neighbor set of target meteorological sample data; wherein the target meteorological sample data represents any meteorological sample data in the first sample set;

determining a target synthetic sample based on the target meteorological sample data, a target nearest neighbor sample, and the weight of the target nearest neighbor sample; wherein the target nearest neighbor sample represents any nearest neighbor sample in the nearest neighbor set of the target meteorological sample data;

constructing the second sample set based on the first sample set and all synthetic samples.

5. The depth feature selection-based photovoltaic power generation prediction method of claim 2, wherein the loss function used to train the initial multi-branch LSTM model is:

[Equation not preserved in the source text];

wherein Tr represents the total number of LSTM branches in the initial multi-branch LSTM model; the remaining symbols were lost together with the equation, and their surviving definitions are, in order: the sample label; the predicted sample label output by the multi-branch LSTM model; δ, a preset threshold value; the task weight of the t-th LSTM branch; three weighting coefficients constrained by an expression equal to 1; an adaptive regularization term; the parameters of the fully connected layer and a norm of those parameters; a preset constant; and the regularization coefficient of the t-th LSTM branch.

6. The depth feature selection-based photovoltaic power generation prediction method of claim 2, further comprising:

performing feature extraction processing on meteorological sample data in the second sample set by using the target multi-branch LSTM model to obtain sample data features of each meteorological sample data;

training an initial extreme learning machine classifier based on sample data features and sample labels of all meteorological sample data to obtain the target extreme learning machine classifier; wherein the initial extreme learning machine classifier uses a uniform distribution to initialize input weights and bias terms.

7. The depth feature selection-based photovoltaic power generation prediction method of claim 6, further comprising:

initializing position data and speed data of each longicorn individual; the parameter space corresponding to the position data of each longicorn individual consists of the input weights and bias terms of the extreme learning machine classifier;

repeating the following steps until the preset iteration times are reached, and taking the output weights corresponding to the historically optimal position data of all the longicorn individuals as the output weights of the hidden layers in the target extreme learning machine classifier:

calculating a fitness value of a target longicorn individual based on position data and speed data of the target longicorn individual, and determining the longicorn leader individual based on the fitness value; wherein the target longicorn individual represents any one of the longicorn individuals;

updating the speed data and position data of the target longicorn individual based on the position data of the longicorn leader individual;

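calculating an output value of a target sample data characteristic in the hidden layer according to the updated position data of the longicorn individual; wherein the target sample data characteristic represents a sample data characteristic of any of the meteorological sample data;

calculating the output weight of the hidden layer in the extreme learning machine classifier based on the output value of the hidden layer and the sample label corresponding to the target sample data characteristic.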

8. A photovoltaic power generation prediction device based on depth feature selection, comprising:

the first acquisition module is used for acquiring meteorological data of the photovoltaic power station at the moment to be processed;

the feature extraction module is used for carrying out feature extraction processing on the meteorological data by utilizing a feature extraction model to obtain target data features;

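the prediction module is used for processing the target data characteristics by using a target extreme learning machine classifier to obtain a photovoltaic power generation capacity interval corresponding to the meteorological data at the moment to be processed;

wherein the output weight of the hidden layer in the target extreme learning machine classifier is determined by an improved longicorn optimization algorithm; when the improved longicorn optimization algorithm updates the speed of a longicorn individual, the position data of the longicorn leader individual is used in place of the historically optimal position data of all the longicorn individuals, and the longicorn leader individual is the longicorn individual with the largest fitness value in each iteration.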

9. An electronic device comprising a memory, a processor, the memory having stored thereon a computer program executable on the processor, characterized in that the processor, when executing the computer program, implements the steps of the depth feature selection based photovoltaic power generation prediction method of any of claims 1 to 7.

10. A computer readable storage medium storing computer instructions which, when executed by a processor, implement the depth feature selection based photovoltaic power generation prediction method of any one of claims 1 to 7.