CN113887812B

CN113887812B - Clustering-based small sample load prediction method, device, equipment and storage medium

Info

Publication number: CN113887812B
Application number: CN202111200796.1A
Authority: CN
Inventors: 陈东; 张海; 汪启元; 陈致晖; 吴辰晔; 沈灯鸿; 刘之亮; 赵晨; 张然; 王波
Original assignee: Guangdong Power Grid Co Ltd; Electric Power Dispatch Control Center of Guangdong Power Grid Co Ltd
Current assignee: Guangdong Power Grid Co Ltd; Electric Power Dispatch Control Center of Guangdong Power Grid Co Ltd
Priority date: 2021-10-14
Filing date: 2021-10-14
Publication date: 2023-07-07
Anticipated expiration: 2041-10-14
Also published as: CN113887812A

Abstract

The application discloses a small sample load prediction method, a device, equipment and a storage medium based on clustering, wherein the prediction method comprises the following steps: extracting characteristics of the historical power load and the power load to be predicted to obtain a characteristic vector; integrating and clustering the historical power load and the power load to be predicted according to the obtained feature vector to obtain a clustering result; adopting a wavelet noise reduction algorithm to reduce noise of the clustering result, and carrying out averaging treatment on the noise-reduced data to obtain time sequence data with preset length; inputting time sequence data with preset length into a second-order long-short-term memory neural network to obtain a prediction result of the power load; the second-order long-short-term memory neural network is trained through data of historical power loads and power loads to be predicted. The method still has excellent prediction performance under the condition that the power load to be predicted is scarce.

Description

Clustering-based small sample load prediction method, device, equipment and storage medium

Technical Field

The present disclosure relates to the field of power load prediction technologies, and in particular, to a clustering-based small sample load prediction method, apparatus, device, and storage medium.

Background

A method of predicting time-series data (hereinafter, referred to as time-series data) has been widely used in the electric power market. Traditional statistical models have advantages in terms of interpretability but are less performing in terms of prediction accuracy, etc. In recent years, algorithms based on machine learning have gained a very high degree of attention in the field of time series prediction by virtue of higher prediction accuracy.

Another closely related technique is small sample learning. Generally, a large-scale training set is essential for deep learning, and a small sample learning purpose is to improve the existing deep learning algorithm so as to achieve an excellent training effect on the small-scale training set.

However, while deep learning based models have higher accuracy in time series data prediction, these models are less efficient for data utilization. When the training set is small, deep learning-based models often fail to accurately predict future trends in the time series data.

Meanwhile, when a new user joins the power grid, the traditional prediction model cannot effectively obtain priori knowledge capable of supporting load prediction of the new user from historical data of other users so as to improve prediction accuracy.

Disclosure of Invention

The application provides a clustering-based small sample load prediction method, device, equipment and storage medium, which are used for solving the problem of low prediction result precision when a training set of power load is smaller in the prior art.

In order to solve the above technical problems, the present application proposes a small sample load prediction method based on clustering, including: extracting characteristics of the historical power load and the power load to be predicted to obtain a characteristic vector; integrating and clustering the historical power load and the power load to be predicted according to the obtained feature vector to obtain a clustering result; adopting a wavelet noise reduction algorithm to reduce noise of the clustering result, and carrying out averaging treatment on the noise-reduced data to obtain time sequence data with preset length; inputting time sequence data with preset length into a second-order long-short-term memory neural network to obtain a prediction result of the power load; the second-order long-short-term memory neural network is trained through data of historical power loads and power loads to be predicted. Optionally, feature extraction is performed on the historical power load and the power load to be predicted to obtain feature vectors, including: discrete wavelet analysis and statistical analysis based on seasonality, skewness and sample entropy are carried out on the historical power load and the power load to be predicted, so that feature vectors are extracted.

Optionally, denoising the clustering result by adopting a wavelet denoising algorithm, including: setting the discrete wavelet transformation order layer number; wherein the number of levels of the discrete wavelet transform is set to a maximum value that is not affected by the edge effect; and denoising the clustering result by using discrete wavelet transform.

Optionally, the historical power load and the power load to be predicted are clustered in an integrated manner according to the obtained feature vector, so as to obtain a clustering result, including: and obtaining a clustering result based on the K-means algorithm, hierarchical clustering, neighbor propagation algorithm and Gaussian mixture model.

In order to solve the above technical problem, the present application proposes a small sample load prediction device based on clustering, including: the characteristic extraction module is used for extracting characteristics of the historical power load and the power load to be predicted to obtain a characteristic vector; the clustering integration module is used for integrating and clustering the historical power load and the power load to be predicted according to the obtained feature vector to obtain a clustering result; the noise reduction processing module is used for reducing noise of the clustering result by adopting a wavelet noise reduction algorithm, and carrying out averaging processing on the noise-reduced data to obtain time sequence data with preset length; the prediction result module is used for inputting time sequence data with preset length into the second-order long-short-term memory neural network to obtain a prediction result of the power load; the second-order long-short-term memory neural network is trained through data of historical power loads and power loads to be predicted. Optionally, the feature extraction module is further configured to: discrete wavelet analysis and statistical analysis based on seasonality, skewness and sample entropy are carried out on the historical power load and the power load to be predicted, so that feature vectors are extracted.

Optionally, the noise reduction processing module is further configured to: setting the discrete wavelet transformation order layer number; wherein the number of levels of the discrete wavelet transform is set to a maximum value that is not affected by the edge effect; and denoising the clustering result by using discrete wavelet transform.

Optionally, the cluster integration module is further configured to: and obtaining a clustering result based on the K-means algorithm, hierarchical clustering, neighbor propagation algorithm and Gaussian mixture model.

In order to solve the above technical problems, the present application proposes an electronic device, including a memory and a processor, where the memory is connected to the processor, and the memory stores a computer program, and the computer program implements the cluster-based small sample load prediction method when executed by the processor.

To solve the above technical problem, the present application proposes a computer readable storage medium storing a computer program, which when executed implements the clustering-based small sample load prediction method described above.

The application provides a clustering-based small sample load prediction method, a device, equipment and a storage medium, wherein a stable clustering result can be obtained by acquiring a feature vector and integrating and clustering a historical power load and a power load to be predicted according to the acquired feature vector; the wavelet denoising algorithm is adopted to denoise the clustering result, and the denoised data is subjected to the averaging treatment, so that the prediction accuracy can be remarkably improved; inputting time sequence data with preset length into a second-order long-short-term memory neural network to obtain a prediction result of the power load; the second-order long-short-term memory neural network is trained by historical power load and data of power load to be predicted, so that the prediction of a small sample is reliable. According to the method, the historical power load is used for mixing the power load to be predicted, the feature extraction and the integrated clustering result are combined to provide priori knowledge for the second-order LSTM, and then the model is finely adjusted by using a small amount of load data of the power load to be predicted, so that excellent prediction performance is still achieved under the condition that the power load to be predicted is scarce.

Drawings

In order to more clearly illustrate the technical solutions of the present application, the drawings that are needed in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a flow diagram of one embodiment of a cluster-based small sample load prediction method of the present application;

FIG. 2 is a flow chart of an embodiment of feature vector extraction according to the present application;

FIG. 3 is a schematic diagram of one embodiment of an integrated cluster architecture of the present application;

FIG. 4 is a schematic diagram of one embodiment of a second-order long-term memory neural network of the present application;

FIG. 5 is a graph showing the impact of data granularity, training set sample number and time series period T prediction error;

FIG. 6 is a schematic structural diagram of an embodiment of a cluster-based small sample load prediction device of the present application;

FIG. 7 is a schematic diagram of an embodiment of an electronic device of the present application;

FIG. 8 is a schematic diagram of an embodiment of a computer-readable storage medium of the present application.

Detailed Description

In order to better understand the technical solutions of the present application, the clustering-based small sample load prediction method, apparatus, device and storage medium provided in the present application are described in further detail below with reference to the accompanying drawings and detailed description.

The technical problem to be solved by the application mainly comprises two aspects:

the first problem to be solved is to obtain a priori knowledge available for deep learning models in unlabeled historical data of grid users. The classical method of obtaining a priori knowledge from unlabeled data is cluster analysis. Although this is intuitive, how to perform data reduction and pattern discovery on high-dimensional time series data through feature extraction, and providing a more stable clustering result is a problem worthy of optimization. According to the method, discrete wavelet analysis and statistical analysis are carried out on the power load data, a comprehensive feature extraction template is provided for the problem of high-dimensional unlabeled time sequence data clustering, and a stable solution is provided for clustering analysis through the result of integrating a plurality of clusters. This helps to improve the compactness and interpretability of the cluster analysis.

The second problem to be solved is to combine historical load data with new user load data for small sample load prediction. The method carries out noise reduction based on wavelet decomposition on the historical data clustering result after averaging to obtain representative prototype time sequence data. The prototype time sequence data is used for providing priori knowledge for the algorithm to train a basic model, and fine tuning is carried out on the basic model through new user load data. This can significantly improve the accuracy of the deep learning algorithm in time sequential data prediction.

Referring to fig. 1, fig. 1 is a schematic flow chart of an embodiment of a small sample load prediction method based on clustering, in this implementation, the small sample load prediction method based on clustering may include:

s110: and extracting characteristics of the historical power load and the power load to be predicted to obtain characteristic vectors.

In this step, the user power load rule and characteristics need to be analyzed: discrete wavelet analysis and statistical analysis based on seasonality, skewness and sample entropy are carried out on the historical power load and the power load to be predicted, so that feature vectors are extracted. Referring to fig. 2, fig. 2 is a flowchart illustrating an embodiment of feature vector extraction.

When discrete wavelet analysis is performed on time series data, a part corresponding to a new user data timestamp in historical data is decomposed into an L-layer wavelet coefficient balance tree by utilizing discrete wavelet packet decomposition. The total energy of each layer of the balanced tree is defined as E. For the j-th layer contain N _j Wavelet coefficient set d of number _j Its discrete wavelet energy DWE can be expressed as:

the DWE was base 10 log to remove the correlation between discrete wavelet energies of each layer:

on the basis, discrete cosine transform DCT is further carried out on LWE coefficients of each layer in the balance tree to obtain cepstral coefficients WCC of the original time sequence data:

and performing dimension reduction integration on the three coefficients of DWE, LWE and WCC through principal component analysis PCA to obtain a new low-dimensional feature space. In addition, the original time sequence is subjected to statistical analysis on the basis of the characteristic space based on discrete wavelet analysis. First, the time series is listed as X by STL seasonal decomposition _t Decomposition into seasonal S _t Trend T _t Residual E _t Three additive components and define two indicators describing the periodicity and the trending of the time series, respectively:

secondly, in order to measure the non-gaussian property of the random process of the time sequence data, namely the thick tail characteristic of the random probability distribution, the skewness skew of the original time sequence data is calculated:

the non-linear characteristics of the time series data are measured using sample entropy. For time series data X _t ＝{x ₁ ,x ₂ ,..x _N Dividing the original sequence into N-m+1 segment sub-sequences with a window size of length m:

X _m (i)＝{x _i ,x _i+1 ,…,x _i+m-1 },1≤i≤N-m+1；

for two different subsequences with sequence numbers i and j, the distance between the two is calculated as:

a threshold r is designated, all the sub-sequence pairs divided by windows with the length of m and with the interval smaller than r are counted and recorded as N _m The method comprises the steps of carrying out a first treatment on the surface of the Counting the number of pairs of all subsequences cut by a window with length of m+1 and with a spacing less than r, and marking the number as N _m+1 . Then for a finite number N, the temporal data sample entropy is expressed as:

finally, to measure the long-term correlation of the time series, the autocorrelation of the time series is measured using a non-linear hurst index. For time sequence data after Gaussian regularization

Calculate X' _t Is the cumulative sum sequence Y of (2) _t Wherein the ith cumulative sum is denoted +.>

Thus, the hurst index can be expressed as:

s120: and integrating and clustering the historical power load and the power load to be predicted according to the obtained feature vector to obtain a clustering result.

After the feature space with lower user history load data dimension is obtained, a widely used clustering device comprising K-means, hierarchical clustering, neighbor propagation clustering, gaussian mixture model and the like is used for obtaining a stable clustering result, and the result of the clustering device is integrated into a stable clustering solution through a similarity partitioning algorithm CSPA based on clustering. Referring to fig. 3, fig. 3 is a schematic diagram illustrating an embodiment of an integrated cluster structure according to the present application.

Unsupervised clustering provides prototype data for the predictive model to learn a priori knowledge of the predictive task. The algorithm adopts an integrated clustering mode as a clustering mode, and learns and integrates clustering results of an original data set based on a K-means algorithm, hierarchical clustering, a neighbor propagation algorithm and a Gaussian mixture model to obtain data division capable of better reflecting the internal structure of the data set. The superiority of the integrated clustering on the prediction algorithm is confirmed by designing a demonstration experiment and using the prediction accuracy as an index to compare the integrated clustering with the independent clustering methods.

S130: and adopting a wavelet denoising algorithm to denoise the clustering result, and carrying out averaging treatment on the denoised data to obtain time sequence data with preset length.

Integrated clustering utilizes limited data to predict targetHistorical data marked with the same time series data characteristics are aggregated. In order to generate a priori knowledge available to the predictive model, a prototype timing data generator (hereinafter referred to as a generator) receives a plurality of pieces of historical timing data corresponding to the clustering result, and converts the historical timing data into prototype timing data available for the predictive model to learn through averaging and wavelet noise reduction. The generator first denoises the clustering result using a wavelet denoising algorithm. Specifically, the number of levels of the discrete wavelet transform is set to a maximum value that is not affected by the edge effect. The signal length is denoted as l _x The filter length is denoted as l _f The discrete wavelet transform order layer number can be expressed as:

the discrete wavelet transformed time series data will be passed through a hard threshold function to achieve noise reduction, where the threshold is denoted as T, the input signal is denoted as x, and the output signal is denoted as ρ _T (x)：

The plurality of pieces of historical time sequence data subjected to discrete wavelet noise reduction are further averaged, and a generator generates prototype time sequence data X with a specific length ^c 。

S140: inputting time sequence data with preset length into a second-order long-short-term memory neural network to obtain a prediction result of the power load; the second-order long-short-term memory neural network is trained through data of historical power loads and power loads to be predicted.

In order to fully utilize the prior knowledge collected by the feature extraction and the integrated clustering to train a prediction model, the traditional long-short-period memory neural network is expanded into a second-order long-short-period memory neural network (hereinafter referred to as a second-order LSTM). Referring to fig. 4, fig. 4 is a schematic diagram of an embodiment of a second-order long-term memory neural network according to the present application.

The second-order long-short-term memory neural network comprises a first-order LSTM and a second-order LSTM, wherein the historical power load is used for performing basic class training on the first-order LSTM, and the power load to be predicted is used for performing model fine tuning on the second-order LSTM, so that the trained second-order long-short-term memory neural network capable of predicting data is obtained.

According to the above steps, the time series data after the wavelet decomposition and noise reduction is further obtained from the historical power load and the power load to be predicted, and the true value X is obtained in one step _test Finally, the prediction result X can be obtained by combining _pred 。

Specifically, according to the integrated clustering result, historical data X of a plurality of users belonging to the same class as the predicted target is used for ^S 1,X ^S 2,…X ^S n and a small amount of training data of the predicted target

The second order LSTM will train based on both, respectively. Stage one, second order LSTM utilization generator for X ^S 1,X ^S 2,…X ^S n generated prototype timing data X ^c Training, and marking updated network weight as theta ₀ Based on the ability to adapt the neural network quickly to new prediction tasks. In the second stage, the second-order LSTM utilizes a small amount of training data with limited prediction targets to finely adjust the network weight, and the updated network weight is recorded as theta ₁ The trained second-order neural network has the future load data of the predicted target +.>

Is used for the prediction ability of the model (C).

RMSE is a commonly used accuracy index for the predicted result of time series data, if the true value of time series data is denoted as p _x The predicted value is recorded as

The RMSE may be written in the form:

to measure the comprehensive performance of the prediction algorithm on multiple pieces of time sequence data, the average MRMSE of RMSE results based on different pieces of time sequence data is used as an index to represent the prediction accuracy:

where M represents the number of timing data used to calculate the mean of the RMSE, and n represents the length of a single timing data. Fig. 5 shows the effect of data granularity M, training set sample number N and time series period T on prediction error.

In time series data prediction, granularity and periodicity of training data have significant influence on model prediction accuracy. Intuitively, the model performs better with larger granularity and provided training data covering one or more cycles of the time series. And performing verification test by using sine time sequence data with additional Gaussian noise, and representing the prediction accuracy by using MRMSE as an index. Under the condition of controlling other variables, the prediction error is inversely proportional to the data granularity M.

MRMSE∝O(N ^-1 )；

The number of data is measured by the number of samples of the training set of the prediction model, T/M represents the ratio of period to granularity, and when N=PT/M and P is any positive integer, the prediction error reaches the local minimum lower bound.

In summary, the embodiment provides a feature extraction template based on wavelet packet decomposition and statistical analysis for the problem of non-label long-term time series data cluster analysis, and provides a stable solving method for cluster analysis by an integrated clustering method; in order to obtain priori knowledge with available deep learning model and strong universality from the historical time sequence data, the clustering result of the historical time sequence data is averaged and wavelet decomposition noise reduction based on a hard threshold value is carried out; and meanwhile, the deep learning model trained through priori knowledge is finely adjusted by using small sample data with unknown labels, so that a prediction model which is customized for the small sample data and has high accuracy is obtained. The present embodiment proposes the minimum lower bound of the small sample data amount required to achieve the optimal model prediction accuracy at a specific granularity by studying the relationship between the time series data granularity and the small sample data amount.

Based on the above-mentioned small sample load prediction method based on clustering, the present application further provides a small sample load prediction device based on clustering, please refer to fig. 6, fig. 6 is a schematic structural diagram of an embodiment of the small sample load prediction device based on clustering, in this embodiment, the small sample load prediction device 100 based on clustering may include a feature extraction module 110, a cluster integration module 120, a noise reduction processing module 130, and a prediction result module 140.

Specifically, the feature extraction module 110 is configured to perform feature extraction on the historical power load and the power load to be predicted, so as to obtain a feature vector.

The clustering integration module 120 is configured to integrate and cluster the historical power load and the power load to be predicted according to the obtained feature vector, and obtain a clustering result.

The noise reduction processing module 130 is configured to reduce noise of the clustering result by using a wavelet noise reduction algorithm, and perform a mean processing on the noise reduced data to obtain time sequence data with a preset length.

The prediction result module 140 is configured to input time sequence data with a preset length into the second-order long-short-term memory neural network to obtain a prediction result of the power load; the second-order long-short-term memory neural network is trained through data of historical power loads and power loads to be predicted.

Optionally, the feature extraction module 110 is further configured to: discrete wavelet analysis and statistical analysis based on seasonality, skewness and sample entropy are carried out on the historical power load and the power load to be predicted, so that feature vectors are extracted.

Optionally, the noise reduction processing module 130 is further configured to: setting the discrete wavelet transformation order layer number; wherein the number of levels of the discrete wavelet transform is set to a maximum value that is not affected by the edge effect; and denoising the clustering result by using discrete wavelet transform.

Optionally, the cluster integration module 120 is further configured to: and obtaining a clustering result based on the K-means algorithm, hierarchical clustering, neighbor propagation algorithm and Gaussian mixture model.

Based on the clustering-based small sample load prediction method, the application also provides an electronic device, as shown in fig. 7, and fig. 7 is a schematic structural diagram of an embodiment of the electronic device. The electronic device 200 may comprise a memory 21 and a processor 22, the memory 21 being connected to the processor 22, the memory 21 having stored therein a computer program which, when executed by the processor 22, implements the method of any of the embodiments described above. The steps and principles of the method are described in detail in the above method, and are not described in detail herein.

In the present embodiment, the processor 22 may also be referred to as a CPU (central processing unit ). The processor 22 may be an integrated circuit chip having signal processing capabilities. Processor 22 may also be a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.

Based on the clustering-based small sample load prediction method, the application also provides a computer-readable storage medium. Referring to fig. 8, fig. 8 is a schematic structural diagram of an embodiment of a computer readable storage medium of the present application. The computer readable storage medium 300 has stored thereon a computer program 31 which, when executed by a processor, implements the method of any of the embodiments described above. The steps and principles of the method are described in detail in the above method, and are not described in detail herein.

Further, the computer readable storage medium 300 may be a usb disk, a removable hard disk, a read-only memory (ROM), a random access memory (random access memory, RAM), a magnetic tape, or a compact disc, etc. which may store the program code.

The ability to quickly adapt to time series prediction tasks in small sample scenarios with limited data volume is an important challenge for power load prediction and other practical applications. The application provides a small sample time sequence prediction method based on second-order LSTM. According to the method, the existing power load record is utilized through integrated clustering, so that the forecasting task of the time sequence with unknown labels and sparse data size is effectively solved. Numerous studies have shown that this approach can be significantly better than its baseline on both major electrical load data sets. In addition, the performance of FSL-LSTM is demonstrated in terms of both small sample data size and data granularity.

It is to be understood that the specific embodiments described herein are for purposes of illustration only and are not limiting. Further, for ease of description, only some, but not all, of the structures associated with this application are shown in the drawings. The step numbers used herein are also for convenience of description only, and are not limiting as to the order in which the steps are performed. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure.

The terms "first," "second," and the like in this application are used for distinguishing between different objects and not for describing a particular sequential order. Furthermore, the terms "comprise" and "have," as well as any variations thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those listed steps or elements but may include other steps or elements not listed or inherent to such process, method, article, or apparatus.

Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the present application. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments.

The foregoing description is only of embodiments of the present application, and is not intended to limit the scope of the patent application, and all equivalent structures or equivalent processes using the descriptions and the contents of the present application or other related technical fields are included in the scope of the patent application.

Claims

1. A cluster-based small sample load prediction method, comprising:

extracting characteristics of the historical power load and the power load to be predicted to obtain a characteristic vector;

the feature extraction of the historical power load and the power load to be predicted is carried out to obtain a feature vector, and the feature vector comprises:

performing discrete wavelet analysis and statistical analysis based on seasonality, skewness and sample entropy on the historical power load and the power load to be predicted so as to extract a feature vector;

when discrete wavelet analysis is carried out on time series data, discrete wavelet packet decomposition is utilized to decompose a part corresponding to a new user data timestamp in historical data into L layers of wavelet coefficient balance; defining the total energy of each layer of the balance tree as E; for the j-th layer contain N _j Wavelet coefficient set d of number _j Its discrete wavelet energy DWE can be expressed as:

performing dimension reduction integration on three coefficients of DWE, LWE and WCC through principal component analysis PCA to obtain a new low-dimension feature space;

according to the obtained feature vector, integrating and clustering the historical power load and the power load to be predicted to obtain a clustering result;

adopting a wavelet noise reduction algorithm to reduce noise of the clustering result, and carrying out averaging treatment on the noise-reduced data to obtain time sequence data with preset length;

inputting the time sequence data with the preset length into a second-order long-short-term memory neural network to obtain a prediction result of the power load; wherein the second-order long-short-term memory neural network is trained via the historical power load and the data of the power load to be predicted.

2. The cluster-based small sample load prediction method according to claim 1, wherein the denoising the cluster result using a wavelet denoising algorithm comprises:

setting the discrete wavelet transformation order layer number; wherein the number of levels of the discrete wavelet transform is set to a maximum value that is not affected by the edge effect;

and denoising the clustering result by using the discrete wavelet transform.

3. The clustering-based small sample load prediction method according to claim 2, wherein the integrating the historical power load and the power load to be predicted according to the feature vector to obtain a clustering result comprises:

and obtaining the clustering result based on a K-means algorithm, hierarchical clustering, a neighbor propagation algorithm and a Gaussian mixture model.

4. A cluster-based small sample load prediction apparatus, comprising:

the characteristic extraction module is used for extracting characteristics of the historical power load and the power load to be predicted to obtain a characteristic vector; the method is also used for carrying out discrete wavelet analysis and statistical analysis based on seasonality, skewness and sample entropy on the historical power load and the power load to be predicted so as to extract a feature vector;

specifically, discrete wavelet analysis and statistical analysis based on seasonality, skewness and sample entropy are carried out on the historical power load and the power load to be predicted so as to extract a feature vector;

the clustering integration module is used for integrating and clustering the historical power load and the power load to be predicted according to the obtained feature vector to obtain a clustering result;

the noise reduction processing module is used for reducing noise of the clustering result by adopting a wavelet noise reduction algorithm, and carrying out averaging processing on the noise-reduced data to obtain time sequence data with preset length;

the prediction result module is used for inputting the time sequence data with the preset length into a second-order long-short-term memory neural network to obtain a prediction result of the power load; wherein the second-order long-short-term memory neural network is trained via the historical power load and the data of the power load to be predicted.

5. The cluster-based small sample load prediction apparatus of claim 4, wherein the noise reduction processing module is further configured to:

and denoising the clustering result by using the discrete wavelet transform.

6. The cluster-based small sample load prediction apparatus of claim 4, wherein the cluster integration module is further configured to:

7. An electronic device comprising a memory and a processor, the memory being coupled to the processor, the memory storing a computer program that, when executed by the processor, implements the cluster-based small sample load prediction method of any of claims 1-3.

8. A computer readable storage medium, characterized in that a computer program is stored, which computer program, when executed, implements the cluster-based small sample load prediction method of any of claims 1-3.