CN118036825A - Power load prediction method and system based on multi-scale feature fusion - Google Patents
- Publication number: CN118036825A (application CN202410367979.XA)
- Authority: CN (China)
- Prior art keywords: power load, load data, data, sample, feature
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Abstract
The invention discloses a power load prediction method and system based on multi-scale feature fusion. The method comprises the following steps: acquiring power load data; extracting first features of the power load data with a trained contrastive learning model, where the historical load data of each sampling time step and its generated enhanced view form a positive sample pair, the historical load data of the remaining moments serve as negative samples, sample features are extracted from the constructed positive and negative pairs, and the contrastive learning model is trained on these sample features; clustering the power load data to obtain its states at different scales, taking these states as nodes and constructing edges according to the relevance and similarity between nodes, and extracting second features from the resulting state evolution graph; and performing weighted fusion of the first and second features and predicting the load from the fused features, which markedly improves the accuracy and robustness of the prediction.
Description
Technical Field
The invention relates to the technical field of power load prediction, in particular to a power load prediction method and system based on multi-scale feature fusion.
Background
The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.
Because power load data are influenced by many factors and exhibit volatility and nonlinearity, accurately predicting their trend remains an open problem for current prediction methods. Power load prediction has evolved from conventional time-series models to deep-learning time-series models. Traditional time-series models, such as the autoregressive integrated moving average (ARIMA) model, have been widely used in power load prediction; however, such models cannot effectively handle nonlinear, highly volatile data, resulting in poor prediction accuracy.
With the development of artificial intelligence, deep-learning time-series prediction models have effectively remedied the shortcomings of traditional models. However, existing deep-learning power load prediction methods still face the following problems:
(1) Power load data exhibit complex and variable patterns, and data at a single frequency can hardly reveal their inherent regularities comprehensively. This limitation makes it difficult for a model to extract effective features of power load variation, which degrades prediction accuracy.
(2) Power load data are often accompanied by nonlinearity and noise, which challenge some deep-learning models. When processing such data, these models often fail to filter the noise effectively, overfit non-essential components of the data, and therefore struggle to capture the data's true trend features.
(3) Existing deep-learning models generally lack sufficient generalization capability in power load prediction: a model may perform well on training data, yet its prediction performance can degrade sharply on new, unseen data, limiting its universality and practicality in real applications.
Disclosure of Invention
In order to solve the above problems, the invention provides a power load prediction method and system based on multi-scale feature fusion. Common trend features shared across data of different scales are mined in depth by designing a contrastive learning model with an adaptive sampling-bias correction mechanism; representative states within each scale's data are identified through cluster analysis; and a state evolution graph is constructed to capture the dynamic relevance and similarity between states, markedly improving the accuracy and robustness of the prediction.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
In a first aspect, the present invention provides a method for predicting electrical loads based on multi-scale feature fusion, comprising:
Acquiring power load data;
extracting first features of the power load data with the trained contrastive learning model; specifically, the historical load data of each sampling time step and its generated enhanced view form a positive sample pair, the historical load data of the remaining moments serve as negative samples, sample features are extracted from the constructed positive and negative pairs, and the contrastive learning model is trained on these sample features;
clustering the power load data to obtain its states at different scales, taking these states as nodes, and constructing edges according to the relevance and similarity between nodes, so that second features are extracted from the resulting state evolution graph;
and performing weighted fusion of the first features and the second features, and predicting the load from the fused features.
As an alternative implementation manner, power load minute-level data and power load day-level data with different sampling frequencies are acquired; after normalization, they are aligned to the same time steps and spliced into a two-dimensional sequence.
As an alternative embodiment, an exponential smoothing method is used to generate the enhanced view s_t = a × x_t + (1 − a) × s_{t−1}, where x_t is the load data at time t, s_{t−1} is the smoothed value at time t − 1, and a is a smoothing constant.
As an alternative embodiment, a plurality of convolutional encoding blocks extract the sample features, and training on these features includes: computing the distribution distance between samples and constructing a difference matrix from these distances; setting the distribution distance of the positive sample pair in the difference matrix as the threshold for judging false negatives, and checking the distances of the remaining negative samples one by one; if a negative sample's distribution distance is less than or equal to the threshold, it is a false negative and its weight is set to 0; if its distribution distance is greater than the threshold, it is a true negative and its weight is set to 1.
As an alternative embodiment, the process of extracting the second feature G_t from the generated state evolution graph includes:
aggregating each node's neighbor messages with a message-propagation mechanism to obtain the feature H_t = v_i(W · N_{t−1} + b);
learning the weight α_t (formula (8)) from the concatenation [H_t ‖ G_{t−1}] via the learnable parameter W_a;
modeling the node encoding feature N_t and the state evolution graph feature G_t;
where N_{t−1} is the node encoding feature in the state evolution graph G_{t−1} at time t − 1; v_i is the state of the i-th cluster after clustering; W, W_a, and b are learnable parameters; ‖ denotes the splicing operation; N_t is the node encoding feature in the state evolution graph G_t; and Y_t is the real data.
As an alternative embodiment, the weighted-fusion process is: G_1 = Sigmoid(H_x); G_2 = Sigmoid(G_t); H = G_1 × H_x + G_2 × G_t, where H_x is the first feature, G_t is the second feature, Sigmoid is the activation function, H is the fused feature, and G_1, G_2 are the weights.
In a second aspect, the present invention provides a power load prediction system based on multi-scale feature fusion, comprising:
a data acquisition module configured to acquire power load data;
a contrastive learning module configured to extract first features of the power load data with the trained contrastive learning model; specifically, the historical load data of each sampling time step and its generated enhanced view form a positive sample pair, the historical load data of the remaining moments serve as negative samples, sample features are extracted from the constructed positive and negative pairs, and the contrastive learning model is trained on these sample features;
a state evolution module configured to cluster the power load data to obtain its states at different scales, take these states as nodes, and construct edges according to the relevance and similarity between nodes, so that second features are extracted from the resulting state evolution graph;
and a prediction module configured to perform weighted fusion of the first features and the second features, and predict the load from the fused features.
In a third aspect, the invention provides an electronic device comprising a memory, a processor, and computer instructions stored on the memory and executable on the processor; when the instructions are executed by the processor, the method of the first aspect is performed.
In a fourth aspect, the present invention provides a computer readable storage medium storing computer instructions which, when executed by a processor, perform the method of the first aspect.
In a fifth aspect, the invention provides a computer program product comprising a computer program which, when executed by a processor, implements the method of the first aspect.
Compared with the prior art, the invention has the beneficial effects that:
According to the invention, by sampling the power load data at different frequencies, including minute-level and day-level data, continuity is maintained in the time dimension and the two scales cover the same period; by designing a contrastive learning model with an adaptive sampling-bias correction mechanism, the common trend features shared across scales are mined in depth, reducing the model's sensitivity to noisy data and markedly improving the accuracy and robustness of its predictions.
According to the invention, representative states within each scale's data are accurately identified through cluster analysis, and the dynamic relevance and similarity between states are captured by constructing a state evolution graph, further extracting and fusing correlated features across scales.
According to the invention, an attention mechanism performs weighted fusion of the features extracted by the two modules and adaptively adjusts the weights of different features, so that the model better adapts to power load data with complex patterns, outputs information-rich features, and markedly reduces the prediction error.
Additional aspects of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention.
FIG. 1 is a flowchart of a power load prediction method based on multi-scale feature fusion provided in embodiment 1 of the present invention;
FIG. 2 is a schematic diagram of a convolutional coding block structure according to embodiment 1 of the present invention;
FIG. 3 is a flow chart of the adaptive negative sample bias mitigation provided in embodiment 1 of the present invention;
fig. 4 is a schematic structural diagram of a state evolution diagram according to embodiment 1 of the present invention.
Detailed Description
The invention is further described below with reference to the drawings and examples.
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the invention. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
It is noted that the terminology used herein is for describing particular embodiments only and is not intended to limit exemplary embodiments according to the invention. As used herein, unless the context clearly indicates otherwise, singular forms are intended to include plural forms as well. Furthermore, the terms "comprises" and "comprising" and any variations thereof cover non-exclusive inclusions: processes, methods, systems, products, or devices that comprise a series of steps or units are not necessarily limited to those expressly listed, but may include other steps or units not expressly listed or inherent to them.
Embodiments of the invention and features of the embodiments may be combined with each other without conflict.
Example 1
The embodiment provides a power load prediction method based on multi-scale feature fusion, as shown in fig. 1, including:
Acquiring power load data;
extracting first features of the power load data with the trained contrastive learning model; specifically, the historical load data of each sampling time step and its generated enhanced view form a positive sample pair, the historical load data of the remaining moments serve as negative samples, sample features are extracted from the constructed positive and negative pairs, and the contrastive learning model is trained on these sample features;
clustering the power load data to obtain its states at different scales, taking these states as nodes, and constructing edges according to the relevance and similarity between nodes, so that second features are extracted from the resulting state evolution graph;
and performing weighted fusion of the first features and the second features, and predicting the load from the fused features.
In this embodiment, power load minute-level data and power load day-level data with different sampling frequencies are first acquired, ensuring that the data remain continuous in the time dimension and that the two scales cover the same period.
Let the power load minute-level data sequence be X^m = [x_1^m, x_2^m, …, x_{T_1}^m], where x_t^m is the minute-level datum at time t, T_1 is the length of the minute-level sequence, and t ∈ {1, 2, …, T_1};
and let the power load day-level data sequence be X^d = [x_1^d, x_2^d, …, x_{T_2}^d], where x_t^d is the day-level datum at time t, T_2 is the length of the day-level sequence, and t ∈ {1, 2, …, T_2}.
Normalize the minute-level and day-level data:
X' = (X − X_min) / (X_max − X_min)  (1)
where X is the original power load data, X_min is the sequence minimum, X_max is the sequence maximum, and X' is the normalized sequence.
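As an illustrative sketch (not part of the claims), the min-max normalization of formula (1) can be written as follows; the function name and the zero-output convention for a constant sequence are choices made here for illustration.

```python
def min_max_normalize(seq):
    """Min-max normalization per formula (1): x' = (x - x_min) / (x_max - x_min)."""
    lo, hi = min(seq), max(seq)
    if hi == lo:
        # Constant sequence: no spread to scale by; map to zeros by convention.
        return [0.0] * len(seq)
    return [(v - lo) / (hi - lo) for v in seq]
```

For example, the sequence [2, 4, 6] maps to [0, 0.5, 1], so the minute-level and day-level series share a common [0, 1] range before alignment.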
After normalizing the minute-level and day-level data with formula (1), apply sliding-window processing to obtain new sequences X^{m'} and X^{d'}, where L_1 and L_2 are the sliding-window sizes of the minute-level and day-level data respectively, and T is the number of window segments.
Since the sequences X^{m'} and X^{d'} have different lengths, this embodiment aligns the data of different sampling frequencies to the same time steps and splices the multi-scale data into a two-dimensional sequence X, providing the basis for subsequent feature extraction and prediction:
X = f([X^{m'} ‖ X^{d'}])  (2)
where f(·) is a neural network that aligns X^{m'} and X^{d'} to the same length L; ‖ denotes the splicing operation; X = [x_1, x_2, …, x_T] ∈ R^{T×L×2}, x_t ∈ R^{L×2}; L is the aligned sequence length; and x_t is a two-dimensional vector containing the minute-level and day-level data at time t.
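The windowing and splicing steps can be sketched as follows. Non-overlapping windows and the element-wise pairing scheme are assumptions made here for illustration only, since the embodiment delegates alignment to the neural network f(·).

```python
def sliding_windows(seq, L):
    """Segment a normalized series into consecutive length-L windows
    (the sliding-window processing that yields the T segments)."""
    return [seq[i:i + L] for i in range(0, len(seq) - L + 1, L)]

def splice(xm, xd):
    """Pair aligned minute-level and day-level windows into a
    two-dimensional sequence of x_t in R^{L x 2} (context of formula (2))."""
    return [list(zip(wm, wd)) for wm, wd in zip(xm, xd)]
```

Each x_t then carries both scales for the same window, which is the shape the convolutional encoders consume.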
In this embodiment, a contrastive learning model is trained by constructing positive and negative sample pairs. The essence of contrastive learning is to learn features shared between samples by minimizing the distance between positive samples and maximizing the distance between negative samples. Therefore, for the historical load data x_t of each sampling time step, a data enhancement technique generates a corresponding enhanced view s_t; the enhanced view and its original sample form a positive pair, and samples at different time points serve as negatives.
The specific enhancement means is exponential smoothing:
s_t = a × x_t + (1 − a) × s_{t−1}  (3)
where s_t is the smoothed value at time t, i.e. the enhanced view of the power load data of each sampling time step, with S = [s_1, s_2, …, s_T] ∈ R^{T×L×2}; x_t is the actual value at time t; s_{t−1} is the smoothed value at time t − 1; and a is a smoothing constant, typically in the range 0 to 1, which determines the degree of smoothing. As a approaches 1, the smoothed value depends more on the most recent actual value and less on the history, which can make the result very sensitive to recent changes; conversely, as a approaches 0, the smoothed value depends more on the history and less on the most recent actual value.
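The enhancement of formula (3) can be sketched as follows; initializing s_0 = x_0 is an assumption made here, since the embodiment does not state the initial smoothed value.

```python
def exponential_smoothing(x, a):
    """Enhanced view per formula (3): s_t = a * x_t + (1 - a) * s_{t-1}.

    Assumes s_0 = x_0 (not specified in the embodiment)."""
    s, prev = [], x[0]
    for xt in x:
        prev = a * xt + (1 - a) * prev  # blend new value with history
        s.append(prev)
    return s
```

With a = 1 the view reproduces the raw series (no smoothing); smaller a yields a view that tracks the longer-term trend, which is what makes it a useful positive counterpart in contrastive training.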
Sample features are then extracted with several convolutional encoders to capture both local and global features of the data. Specifically, the enhanced view S and the power load data X are fed into a stack of convolutional encoding blocks (ConvBlock). The structure of the convolutional encoding block is shown in fig. 2.
H_s = ConvBlock_n(S)  (4)
H_x = ConvBlock_n(X)  (5)
where n is the number of convolutional encoding blocks, H_s is the feature of the enhanced view S, and H_x is the feature of the power load data X.
Training on sample features often produces false negatives, i.e. samples highly similar to the positive sample, which reduce the effectiveness of the contrastive learning model. A weight is therefore learned for each negative sample to eliminate the harm false negatives cause during training, improving the model's ability to distinguish samples and adaptively mitigating the negative-sample bias.
As shown in fig. 3, the specific flow includes:
(1) Accurately compute the distribution distance between samples using the KL divergence, and construct a difference matrix from these distances; the matrix directly reflects the distribution differences between samples.
(2) And setting the distribution distance of the positive sample pairs in the difference matrix as a threshold value for judging the false negative samples, and judging the distribution distances corresponding to other negative samples one by one.
(3) If the distribution distance of a negative sample is less than or equal to the threshold value, then the negative sample is too close in distribution to the positive sample, so the negative sample is effectively a false negative sample; for a false negative sample, its weight is set to 0 during training to eliminate possible misleading.
(4) If the distribution distance of a negative sample is greater than a threshold value, then the negative sample is a true negative sample, and for such negative sample its weight is set to 1 to ensure its normal contribution during training.
Through the steps, the corresponding weight is successfully allocated to each negative sample, and the influence of the false negative sample on the training process is effectively identified and eliminated.
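The four steps above can be sketched as follows, using a discrete KL divergence between sample distributions; treating each sample's feature vector as a normalized distribution is an assumption made here for illustration.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for discrete distributions given as lists summing to 1."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def negative_sample_weights(anchor, positive, negatives):
    """Assign each negative a weight of 0 (false negative) or 1 (true negative).

    The threshold is the anchor-to-positive distribution distance, per the
    adaptive bias-correction rule: negatives at least as close as the
    positive view are treated as false negatives and masked out."""
    threshold = kl_divergence(anchor, positive)
    return [0 if kl_divergence(anchor, neg) <= threshold else 1
            for neg in negatives]
```

A negative identical to the anchor gets weight 0, while a clearly dissimilar one keeps weight 1, so only true negatives contribute to the contrastive loss.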
In this embodiment, the power load data are clustered with the k-means method to obtain their representative states at different scales. For X = [x_1, x_2, …, x_T] ∈ R^{T×L×2}, x_t ∈ R^{L×2}, k-means minimizes the within-cluster squared error:
E = Σ_{i=1}^{K} Σ_{x∈C_i} ‖x − μ_i‖²  (6)
where μ_i is the mean vector of cluster C_i, i ∈ {1, 2, …, K}, and K is the number of clusters.
The k-means algorithm adopts a greedy strategy and obtains an approximate solution through iterative optimization. Minimizing the squared error makes the samples in each cluster surround the mean vector tightly, i.e. within-cluster similarity is high; clustering therefore yields the representative states V = {v_i, 0 ≤ i < K} in the power load data of different scales.
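The state-identification step can be sketched with a minimal 1-D k-means; the deterministic, evenly spaced initialization is an assumption made for reproducibility, as the embodiment does not specify one.

```python
def kmeans_1d(points, k, iters=20):
    """Minimal 1-D k-means sketch of the state-identification step.

    Returns (centroids, labels); the centroids play the role of the mean
    vectors mu_i, i.e. the K representative states used as graph nodes."""
    centroids = [points[i * len(points) // k] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid's cluster.
        labels = [min(range(k), key=lambda j: (p - centroids[j]) ** 2)
                  for p in points]
        # Update step: move each centroid to its cluster mean.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = sum(members) / len(members)
    return centroids, labels
```

The greedy assign/update loop is exactly the iterative optimization described above; each centroid converges to the mean vector of its cluster.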
In this embodiment, a state evolution graph is constructed, representative states in different scale data are represented as nodes, and edges are constructed according to the relevance and similarity between the nodes, so as to extract features including dependency relations between the different scale data.
As shown in fig. 4, the method specifically includes:
A message-passing (MP) mechanism aggregates all neighbor messages to obtain the feature H_t:
H_t = v_i(W · N_{t−1} + b)  (7)
where N_{t−1} is the node feature at time t − 1, v_i is the state of the i-th cluster, W and b are learnable parameters, and N_0 = {μ_i, 0 ≤ i < K}.
After obtaining the feature H_t, the weight α_t is learned (formula (8)) from the concatenation [H_t ‖ G_{t−1}] via the learnable parameter W_a, where ‖ denotes the splicing operation and G_{t−1} is the graph feature at time t − 1.
An LSTM network then models the node feature N_t and the graph feature G_t in combination with the information Y_t (formulas (9)-(10)), where N_t is the node encoding feature in graph G_t, MP is the message-propagation mechanism, H_t is the feature representation after MP aggregates all neighbor messages, and Y_t is the real data.
The LSTM network is computed as formulas (11)-(16):
i_t = σ(W_i · [h_{t−1}, x_t] + b_i)  (11)
f_t = σ(W_f · [h_{t−1}, x_t] + b_f)  (12)
C̃_t = tanh(W_C · [h_{t−1}, x_t] + b_C)  (13)
o_t = σ(W_o · [h_{t−1}, x_t] + b_o)  (14)
C_t = f_t * C_{t−1} + i_t * C̃_t  (15)
h_t = o_t * tanh(C_t)  (16)
where σ and tanh are activation functions; W_i, W_C, W_f, W_o are parameter matrices; b_i, b_C, b_f, b_o are learnable bias vectors; h_{t−1} is the hidden state at the previous time; h_t is the current state; x_t is the input at the current time; and C̃_t is the temporary hidden variable (candidate cell state) at the current time.
Finally, according to the state evolution diagram, the characteristic G t containing the power load data of different scales is obtained.
In this embodiment, the feature H x extracted by the comparison learning model and the feature G t extracted by the state evolution diagram are weighted and fused by the attention mechanism, so as to obtain a feature representation finally containing rich information.
G_1 = Sigmoid(H_x)  (17)
G_2 = Sigmoid(G_t)  (18)
H = G_1 × H_x + G_2 × G_t  (19)
where Sigmoid is the activation function and H is the fused feature.
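Formulas (17)-(19) amount to an element-wise self-gating of the two feature streams. A scalar sketch, with feature vectors reduced to lists of floats for illustration:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gated_fusion(h_x, g_t):
    """Weighted fusion per formulas (17)-(19):
    G1 = sigmoid(H_x), G2 = sigmoid(G_t), H = G1 * H_x + G2 * G_t."""
    return [sigmoid(a) * a + sigmoid(b) * b for a, b in zip(h_x, g_t)]
```

Each feature gates itself: stronger activations receive larger weights, which is how the fusion adaptively emphasizes the more informative of the two streams.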
The fused features are input into a gated recurrent unit (GRU) network to output the final load prediction. The GRU's gating mechanism alleviates the gradient-vanishing and gradient-explosion problems of RNNs, simplifies the LSTM structure, and reduces computation; it also handles the long-term dependencies of time-series data, making it suitable for tasks such as power load prediction.
The specific GRU network calculation formulas are shown in formulas (20) to (23).
z_t = σ(W_z · [h_{t−1}, x_t])  (20)
r_t = σ(W_r · [h_{t−1}, x_t])  (21)
h̃_t = tanh(W · [r_t * h_{t−1}, x_t])  (22)
h_t = (1 − z_t) * h_{t−1} + z_t * h̃_t  (23)
where σ and tanh are activation functions and W_z, W_r, and W are parameter matrices. The GRU merges the LSTM's input, forget, and output gates into two gates, the update gate z_t and the reset gate r_t; the final prediction result ŷ is obtained from the GRU output.
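A scalar sketch of one GRU step per formulas (20)-(23); biases are omitted to match the formulas as printed, and the weight pairs are illustrative placeholders rather than trained parameters.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gru_step(h_prev, x_t, Wz, Wr, W):
    """One scalar GRU step. Each weight is a pair (w_h, w_x) applied to
    [h_{t-1}, x_t], mirroring formulas (20)-(23)."""
    z = sigmoid(Wz[0] * h_prev + Wz[1] * x_t)             # update gate (20)
    r = sigmoid(Wr[0] * h_prev + Wr[1] * x_t)             # reset gate (21)
    h_cand = math.tanh(W[0] * (r * h_prev) + W[1] * x_t)  # candidate state (22)
    return (1 - z) * h_prev + z * h_cand                  # new hidden state (23)
```

The update gate z interpolates between the previous state and the candidate, which is how the GRU retains long-term information with fewer gates than the LSTM.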
To better demonstrate the effect of the method of this embodiment, the results are evaluated with standard regression metrics: mean absolute error (MAE), mean absolute percentage error (MAPE), root mean square error (RMSE), and mean absolute scaled error (MASE), where y_t is the actual value of the time series at time t, ŷ_t is the predicted value of y_t, and n is the test-set length.
The method of this embodiment was compared with three advanced methods: the time-series contrastive learning network TS-TCC, a recurrent neural network (RNN), and the sample convolution and interaction network SCINet. The method of this embodiment achieves the best results on all four metrics, ranking first among the compared methods, which verifies its effectiveness.
Example 2
The embodiment provides a power load prediction system based on multi-scale feature fusion, which comprises:
a data acquisition module configured to acquire power load data;
a contrastive learning module configured to extract first features of the power load data with the trained contrastive learning model; specifically, the historical load data of each sampling time step and its generated enhanced view form a positive sample pair, the historical load data of the remaining moments serve as negative samples, sample features are extracted from the constructed positive and negative pairs, and the contrastive learning model is trained on these sample features;
a state evolution module configured to cluster the power load data to obtain its states at different scales, take these states as nodes, and construct edges according to the relevance and similarity between nodes, so that second features are extracted from the resulting state evolution graph;
and a prediction module configured to perform weighted fusion of the first features and the second features, and predict the load from the fused features.
It should be noted that the above modules correspond to the steps described in embodiment 1; the modules and their corresponding steps share the same examples and application scenarios, but are not limited to the disclosure of embodiment 1. The modules may be implemented as part of a computer system, for example as a set of computer-executable instructions.
In further embodiments, there is also provided:
an electronic device comprising a memory, a processor, and computer instructions stored on the memory and executable on the processor; when the instructions are executed by the processor, the method described in embodiment 1 is performed. For brevity, details are omitted here.
It should be understood that in this embodiment, the processor may be a central processing unit CPU, and the processor may also be other general purpose processors, digital signal processors DSP, application specific integrated circuits ASIC, off-the-shelf programmable gate array FPGA or other programmable logic device, discrete gate or transistor logic devices, discrete hardware components, or the like. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The memory may include read only memory and random access memory and provide instructions and data to the processor, and a portion of the memory may also include non-volatile random access memory. For example, the memory may also store information of the device type.
A computer readable storage medium storing computer instructions which, when executed by a processor, perform the method described in embodiment 1.
The method in embodiment 1 may be directly embodied as a hardware processor executing or executed with a combination of hardware and software modules in the processor. The software modules may be located in a random access memory, flash memory, read only memory, programmable read only memory, or electrically erasable programmable memory, registers, etc. as well known in the art. The storage medium is located in a memory, and the processor reads the information in the memory and, in combination with its hardware, performs the steps of the above method. To avoid repetition, a detailed description is not provided herein.
A computer program product comprising a computer program which, when executed by a processor, implements the method described in embodiment 1.
The present invention also provides at least one computer program product tangibly stored on a non-transitory computer-readable storage medium. The computer program product comprises computer-executable instructions, such as instructions comprised in program modules, executed on a real or virtual processor of a target device to perform the processes/methods described above. Generally, program modules include routines, programs, libraries, objects, classes, components, data structures, etc. that perform particular tasks or implement particular abstract data types. In various embodiments, the functionality of the program modules may be combined or split between program modules as desired. Machine-executable instructions for program modules may be executed within a local or distributed device. In a distributed device, program modules may be located in both local and remote storage media.
Computer program code for carrying out the methods of the present invention may be written in one or more programming languages. This code may be provided to a processor of a general-purpose computer, special-purpose computer, or other programmable data processing apparatus, such that the program code, when executed by the computer or other programmable data processing apparatus, causes the functions/operations specified in the flowcharts and/or block diagrams to be implemented. The program code may execute entirely on the computer, partly on the computer, as a stand-alone software package, partly on the computer and partly on a remote computer, or entirely on the remote computer or server.
In the context of the present invention, computer program code or related data may be carried by any suitable carrier to enable an apparatus, device or processor to perform the various processes and operations described above. Examples of carriers include signals, computer readable media, and the like. Examples of signals may include electrical, optical, radio, acoustical or other form of propagated signals, such as carrier waves, infrared signals, etc.
Those of ordinary skill in the art will appreciate that the elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
While the foregoing embodiments of the present invention have been described in conjunction with the drawings, they are not intended to limit the scope of the invention; rather, the invention is intended to cover all modifications or variations falling within the scope defined by the claims of the present invention.
Claims (10)
1. The power load prediction method based on multi-scale feature fusion is characterized by comprising the following steps of:
Acquiring power load data;
extracting first features of the power load data based on a trained contrastive learning model; wherein the historical load data of each sampling time step and the correspondingly generated enhanced view are taken as a positive sample pair, the historical load data at the remaining time steps are taken as negative samples, positive and negative sample pairs are thereby constructed, sample features are extracted, and the contrastive learning model is trained based on the sample features;
clustering the power load data to obtain states of the power load data at different scales, taking the states at different scales as nodes, and constructing edges according to the relevance and similarity among the nodes, so as to extract second features based on the generated state evolution graph;
and carrying out weighted fusion on the first features and the second features, and carrying out load prediction based on the fused features.
2. The power load prediction method based on multi-scale feature fusion according to claim 1, wherein power load minute-scale data and power load day-scale data with different sampling frequencies are acquired, and, after normalization, the two series are aligned to the same time step and spliced to obtain a two-dimensional sequence.
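A minimal sketch of the alignment-and-splicing step of claim 2, under the assumptions (not fixed by the claim) that min-max normalization is used and that the day-scale series is aligned to the minute time step by repetition:

```python
import numpy as np

def build_two_dim_sequence(minute_load, day_load, steps_per_day):
    """Normalize minute- and day-scale load, align them to the same
    time step, and splice them into a two-dimensional sequence."""
    def min_max(x):
        x = np.asarray(x, dtype=float)
        return (x - x.min()) / (x.max() - x.min())

    minute_n = min_max(minute_load)
    day_n = min_max(day_load)
    # Align the day-scale series to the minute time step by repetition.
    day_aligned = np.repeat(day_n, steps_per_day)[: len(minute_n)]
    # Splice the two scales column-wise into one 2-D sequence.
    return np.stack([minute_n, day_aligned], axis=1)

seq = build_two_dim_sequence([1.0, 2.0, 3.0, 4.0], [10.0, 20.0], 2)
print(seq.shape)  # (4, 2)
```

The function names and the repetition-based alignment are illustrative; a forward-fill or interpolation-based alignment would fit the claim equally well.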
3. The power load prediction method based on multi-scale feature fusion according to claim 1, wherein an exponential smoothing method is used to generate the enhanced view: s_t = a × x_t + (1 − a) × s_{t−1}; wherein x_t represents the load data at time t; s_{t−1} represents the smoothed value at time t−1; and a is a smoothing constant.
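The recursion in claim 3 can be sketched in a few lines; the initialization s_0 = x_0 is an assumption, since the claim does not specify the starting smoothed value:

```python
def enhanced_view(x, a):
    """Generate the enhanced view by exponential smoothing:
    s_t = a * x_t + (1 - a) * s_{t-1}, assuming s_0 = x_0."""
    s = [x[0]]
    for x_t in x[1:]:
        s.append(a * x_t + (1 - a) * s[-1])
    return s

print(enhanced_view([1.0, 2.0, 3.0], 0.5))  # [1.0, 1.5, 2.25]
```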
4. The power load prediction method based on multi-scale feature fusion according to claim 1, wherein the process of extracting sample features using a plurality of convolutional encoding blocks and training based on the sample features comprises: calculating the distribution distance between samples and constructing a difference matrix from these distances; taking the distribution distance of the positive sample pair in the difference matrix as the threshold for judging false negative samples, and judging the distribution distances of the other negative samples one by one; if the distribution distance of a negative sample is less than or equal to the threshold, the negative sample is a false negative sample and its weight is set to 0; if the distribution distance of a negative sample is greater than the threshold, the negative sample is a true negative sample and its weight is set to 1.
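A minimal sketch of the false-negative filtering in claim 4, assuming a precomputed vector of distribution distances from one anchor (the distance metric itself is not fixed by the claim):

```python
import numpy as np

def negative_sample_weights(distances, pos_idx):
    """Weight negatives for one anchor: the anchor-positive distribution
    distance is the threshold; negatives at a distance <= threshold are
    false negatives (weight 0), the rest are true negatives (weight 1)."""
    threshold = distances[pos_idx]
    weights = (distances > threshold).astype(float)
    weights[pos_idx] = 0.0  # the positive itself is not a negative
    return weights

# Index 0 is the positive pair; the rest are candidate negatives.
d = np.array([0.2, 0.1, 0.5, 0.15])
print(negative_sample_weights(d, 0))  # [0. 0. 1. 0.]
```

Samples at indices 1 and 3 fall at or below the anchor-positive distance and are masked out of the contrastive loss; only index 2 survives as a true negative.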
5. The method of claim 1, wherein extracting the second feature G_t based on the generated state evolution graph comprises:
aggregating node neighbor messages by a message propagation mechanism to obtain a feature H_t: H_t = v_i(W N_{t−1} + b);
learning a weight α_t;
modeling the node encoding feature N_t and the state evolution graph G_t;
wherein N_{t−1} is the node encoding feature in the state evolution graph G_{t−1} at time t−1; v_i is the state of the i-th cluster after clustering; W, W_a and b are learnable parameters; ∥ denotes the splicing (concatenation) operation; N_t is the node encoding feature in the state evolution graph G_t; and y_t is the real data.
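Under the assumed reading that the aggregation in claim 5 is an affine map of the previous node encodings followed by an activation, one propagation step could be sketched as follows; the tanh activation is an illustrative choice, since the claim only fixes the affine form W N_{t−1} + b:

```python
import numpy as np

def propagate_messages(N_prev, W, b):
    """One message-propagation step over the state evolution graph:
    H_t = act(W @ N_{t-1} + b), with tanh as an assumed activation."""
    return np.tanh(W @ N_prev + b)

# Toy example: 3 nodes (cluster states), each with a 2-dimensional encoding.
N_prev = np.zeros((2, 3))  # node encoding features at time t-1
W = np.eye(2)              # learnable weight (identity here)
b = np.zeros((2, 1))       # learnable bias (broadcast over nodes)
H_t = propagate_messages(N_prev, W, b)
print(H_t.shape)  # (2, 3)
```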
6. The method of claim 1, wherein the process of weighted fusion comprises: G_1 = Sigmoid(H_x); G_2 = Sigmoid(G_t); H = G_1 × H_x + G_2 × G_t; wherein H_x is the first feature, G_t is the second feature, Sigmoid is an activation function, H is the fused feature, and G_1, G_2 are the weights.
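The gated fusion of claim 6 maps directly to code; the feature values below are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gated_fusion(H_x, G_t):
    """Weighted fusion of the first feature H_x and second feature G_t:
    G1 = Sigmoid(H_x), G2 = Sigmoid(G_t), H = G1 * H_x + G2 * G_t."""
    G1 = sigmoid(H_x)
    G2 = sigmoid(G_t)
    return G1 * H_x + G2 * G_t

H = gated_fusion(np.array([0.0, 1.0]), np.array([0.0, -1.0]))
print(H)
```

Each feature gates itself, so a feature with a large activation contributes more to the fused representation H.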
7. A multi-scale feature fusion-based electrical load prediction system, comprising:
a data acquisition module configured to acquire power load data;
a contrastive learning module configured to extract first features of the power load data based on a trained contrastive learning model; wherein the historical load data of each sampling time step and the correspondingly generated enhanced view are taken as a positive sample pair, the historical load data at the remaining time steps are taken as negative samples, positive and negative sample pairs are thereby constructed, sample features are extracted, and the contrastive learning model is trained based on the sample features;
a state evolution module configured to cluster the power load data to obtain states of the power load data at different scales, take the states at different scales as nodes, and construct edges according to the relevance and similarity among the nodes, so as to extract second features based on the generated state evolution graph;
and a prediction module configured to carry out weighted fusion on the first features and the second features, and carry out load prediction based on the fused features.
8. An electronic device comprising a memory, a processor, and computer instructions stored in the memory and executable on the processor, which, when executed by the processor, perform the method of any one of claims 1-6.
9. A computer readable storage medium storing computer instructions which, when executed by a processor, perform the method of any of claims 1-6.
10. A computer program product comprising a computer program which, when executed by a processor, implements the method of any of claims 1-6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410367979.XA CN118036825A (en) | 2024-03-28 | 2024-03-28 | Power load prediction method and system based on multi-scale feature fusion |
Publications (1)
Publication Number | Publication Date |
---|---|
CN118036825A true CN118036825A (en) | 2024-05-14 |
Family
ID=90986039
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410367979.XA Pending CN118036825A (en) | 2024-03-28 | 2024-03-28 | Power load prediction method and system based on multi-scale feature fusion |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN118036825A (en) |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP6969637B2 (en) | Causality analysis methods and electronic devices | |
CN109120462B (en) | Method and device for predicting opportunistic network link and readable storage medium | |
CN112488183B (en) | Model optimization method, device, computer equipment and storage medium | |
CN111027672A (en) | Time sequence prediction method based on interactive multi-scale recurrent neural network | |
CN111416797A (en) | Intrusion detection method for optimizing regularization extreme learning machine by improving longicorn herd algorithm | |
CN113746696A (en) | Network flow prediction method, equipment, storage medium and device | |
CN117175588B (en) | Space-time correlation-based electricity load prediction method and device | |
CN114694379B (en) | Traffic flow prediction method and system based on self-adaptive dynamic graph convolution | |
CN115018193A (en) | Time series wind energy data prediction method based on LSTM-GA model | |
CN112766603A (en) | Traffic flow prediction method, system, computer device and storage medium | |
CN116542701A (en) | Carbon price prediction method and system based on CNN-LSTM combination model | |
CN115953907A (en) | Traffic speed prediction method based on time-space gating graph convolution network and application thereof | |
CN115439708A (en) | Image data processing method and device | |
CN114677556A (en) | Countermeasure sample generation method of neural network model and related equipment | |
CN117217290A (en) | Causal generation countermeasure network data interpolation method, device, equipment and medium | |
CN118036825A (en) | Power load prediction method and system based on multi-scale feature fusion | |
CN112949590B (en) | Cross-domain pedestrian re-identification model construction method and system | |
CN114880363A (en) | Data center flow prediction system, training method and prediction method | |
JP6233432B2 (en) | Method and apparatus for selecting mixed model | |
CN114239945A (en) | Short-term power load prediction method, device, equipment and storage medium | |
CN110705437A (en) | Face key point detection method and system based on dynamic cascade regression | |
CN111931994A (en) | Short-term load and photovoltaic power prediction method, system, equipment and medium thereof | |
CN114881369B (en) | Wind speed prediction method and system based on nucleated cyclic neural network | |
US20220405599A1 (en) | Automated design of architectures of artificial neural networks | |
US11609936B2 (en) | Graph data processing method, device, and computer program product |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||