CN114202012A

CN114202012A - High-dimensional load clustering method based on recursive graph and convolution self-encoder

Info

Publication number: CN114202012A
Application number: CN202111366207.7A
Authority: CN
Inventors: 邓欣宇; 王小璇; 高强伟
Original assignee: State Grid Corp of China SGCC; State Grid Tianjin Electric Power Co Ltd
Current assignee: State Grid Corp of China SGCC; State Grid Tianjin Electric Power Co Ltd
Priority date: 2021-11-17
Filing date: 2021-11-17
Publication date: 2022-03-18

Abstract

The invention relates to a high-dimensional load clustering method based on a recursive graph and a convolution self-encoder, which comprises the following steps: S1, obtaining high-dimensional load power characteristics; S2, constructing a high-dimensional load characteristic enhancement model based on recursive graph theory; S3, constructing a high-dimensional load feature extraction model based on a convolution self-encoder; and S4, constructing a high-dimensional load clustering model based on spectral clustering to obtain a high-dimensional load clustering result. The invention converts one-dimensional load characteristics into two-dimensional recursive graph characteristics to realize characteristic enhancement, and combines this with the convolution self-encoder to realize feature extraction, so that a better clustering effect is achieved using the extracted features.

Description

High-dimensional load clustering method based on recursive graph and convolution self-encoder

Technical Field

The invention belongs to the technical field of power load clustering analysis, relates to a high-dimensional load clustering method, and particularly relates to a high-dimensional load clustering method based on a recursive graph and a convolution self-encoder.

Background

Advancing the informatization and digital transformation of the power system is an important task in the construction of the smart grid. Against this background, information technology and intelligent measurement technology have developed rapidly in the field of power distribution and utilization in recent years, the functions of smart meters have been continuously upgraded, and the available load data has grown exponentially. In the time dimension, the sampling period of new smart meters has gradually shortened from 1 day to 1 hour, half an hour and 15 minutes, so the available daily load data has expanded from 1 point per day to 24, 48 and 96 points per day, providing an important data basis for cluster analysis of users' electricity consumption behavior. The operation, distribution and dispatching of the smart grid and the development of electric power marketing services bring new opportunities for mining the value of power distribution and utilization big data. Power supply enterprises want to comprehensively grasp users' energy consumption information and, on that basis, formulate time-of-use electricity prices and demand response plans, so that users actively participate in grid dispatching, the economy and reliability of grid operation are improved, energy-saving guidance is provided to users, and energy conservation and emission reduction are promoted. To achieve this, it is necessary to mine users' energy consumption patterns in depth using big data algorithms for power distribution and utilization, to perceive users' electricity consumption habits, and to accurately grasp future electricity consumption trends.

User electricity consumption behavior cluster analysis is the process of using a clustering algorithm to mine users' electricity consumption characteristics from massive load curves and obtain typical electricity consumption patterns; it is the basis for implementing load management. Clustering groups users with similar consumption patterns together, so that power supply enterprises can grasp users' energy demand and carry out more efficient demand response management. On the basis of analyzing users' consumption patterns, power supply enterprises can also innovate their service modes: for example, formulating tiered electricity price policies to improve energy efficiency, guiding users to correct unreasonable consumption habits and reduce electricity costs according to the analysis results, and carrying out abnormal consumption detection, power quality optimization, and peak shaving and valley filling, thereby achieving an intelligent, friendly and interactive mode of power distribution and utilization.

However, as the data acquisition frequency of smart meters increases, the sampling period keeps shortening and the data dimension grows rapidly. Traditional clustering methods generally adopt the Euclidean distance as the similarity measure, which cannot accurately measure the similarities and differences between users' electricity consumption behaviors on high-dimensional load data; in practice, samples are often misclassified or missed, so traditional methods perform poorly on high-dimensional load curves. Load clustering methods based on feature dimension reduction first extract the key information in high-dimensional load data and then cluster that key information, which gives them advantages in both clustering effect and computational efficiency. At present, however, existing research mainly focuses on reducing the dimension of the original one-dimensional load data, where feature extraction is difficult and excessive information loss is easily caused. How to effectively extract features from high-dimensional load data and use them to achieve a better clustering effect is therefore a technical problem urgently to be solved by those skilled in the art.

A search found no prior art document identical or similar to the present invention.

Disclosure of Invention

The invention aims to overcome the defects of the prior art, provides a high-dimensional load clustering method based on a recursive graph and a convolution self-encoder, can convert one-dimensional load characteristics into two-dimensional recursive graph characteristics to realize characteristic enhancement, and realizes characteristic extraction by combining the convolution self-encoder, thereby achieving better clustering effect by utilizing the characteristics.

The invention solves the practical problem by adopting the following technical scheme:

A high-dimensional load clustering method based on a recursive graph and a convolution self-encoder comprises the following steps:

S1, acquiring high-dimensional load power characteristics, wherein the characteristics comprise daily load data of N users, each daily load data comprises M data points, and a load power characteristic library of size N × M is formed;

S2, constructing a high-dimensional load characteristic enhancement model based on recursive graph theory: converting the one-dimensional daily load characteristics in the load power characteristic library into two-dimensional recursive graph characteristics, taking a day as the unit, to form N load recursive graph features;

S3, constructing a high-dimensional load feature extraction model based on a convolution self-encoder: inputting the N load recursive graph features obtained in step S2 into the convolution self-encoder and training it, wherein the feature extraction dimension in the convolution self-encoder is set to T; after training is finished, inputting the N load recursive graph features into the encoder of the convolution self-encoder to obtain the high-dimensional load feature extraction result, namely the load key features;

S4, constructing a high-dimensional load clustering model based on spectral clustering: determining the optimal number of clusters by the gap statistic method, and clustering the load key features obtained in step S3 using a spectral clustering algorithm to obtain the high-dimensional load clustering result.

Moreover, the construction and training of the convolutional self-encoder in step S3 both use the TensorFlow and Keras deep learning toolkits in the Python programming language.

Moreover, the convolutional self-encoder in step S3 is composed of an encoder and a decoder, the encoder is responsible for extracting the load recursive graph features into a feature vector, and the decoder is responsible for restoring the feature vector into the original load recursive graph features; the training method of the convolution self-encoder comprises the following steps: the load recursive graph features are used as the input of the convolution self-encoder and the output of the convolution self-encoder, so that the convolution self-encoder can learn the most key information in the original load recursive graph, and the feature extraction of the load recursive graph is realized.

Also, M and N in the steps S1-S3 are both natural numbers, and M is generally greater than (or equal to) 48.

T in step S3 is a natural number, and T is smaller than M.

The invention has the advantages and beneficial effects that:

1. The invention provides a high-dimensional load characteristic enhancement method based on the recursive graph, which converts one-dimensional load characteristics into two-dimensional recursive graph characteristics, facilitates mining the stationarity and internal similarity of a load curve, realizes characteristic enhancement, and lays a good foundation for the subsequent feature extraction step;

2. The invention provides a high-dimensional load feature extraction method based on a convolution self-encoder, which uses the strong feature extraction capability of the convolution self-encoder to reduce the dimension of the two-dimensional load image, reducing feature redundancy and extracting key load features from the recursive graph features; this reduces the data volume, improves clustering efficiency, and is better suited to high-dimensional load clustering;

3. The invention provides a high-dimensional load clustering method based on spectral clustering, in which the optimal number of clusters is obtained by the gap statistic method, and the key load features extracted by the convolution self-encoder are clustered to obtain the high-dimensional load clustering result.

Drawings

FIG. 1 is a one-dimensional load signature graph provided by an embodiment of the present invention;

FIG. 2 is a two-dimensional load recursion diagram provided by an embodiment of the present invention;

FIG. 3 is a graph of the gap statistic analysis results provided by an embodiment of the present invention;

FIG. 4 is a comparison diagram of dimension reduction algorithms provided by an embodiment of the present invention;

FIG. 5 is a comparison graph of clustering of original load curves provided by an embodiment of the present invention;

FIG. 6 is a comparison graph of clusters extracted based on RP-CAE features according to an embodiment of the present invention;

FIG. 7 is a DI index plot for different feature extraction dimensions provided by embodiments of the present invention;

FIG. 8 is a DBI index plot for different feature extraction dimensions provided by an embodiment of the present invention;

FIG. 9 is a high-dimensional load curve clustering result graph provided by an embodiment of the present invention.

Detailed Description

The embodiments of the invention will be described in further detail below with reference to the accompanying drawings:

The high-dimensional load clustering method based on the recursive graph and the convolution self-encoder comprises steps S1 to S4 set out in the Disclosure of Invention above, with the following further details:

The construction and training of the convolutional self-encoder in step S3 both use the TensorFlow and Keras deep learning toolkits in the Python programming language.

The convolutional self-encoder in step S3 is composed of an encoder and a decoder. The encoder is responsible for extracting the load recursive graph features into a feature vector, and the decoder is responsible for restoring the feature vector into the original load recursive graph features. The convolution self-encoder is trained as follows: the load recursive graph features are used as both the input and the target output of the convolution self-encoder, so that it learns the most important information in the original load recursive graph, thereby realizing feature extraction of the load recursive graph.

M and N in the steps S1-S3 are both natural numbers, and M is generally greater than (or equal to) 48.

T in the step S3 is a natural number, and T is smaller than M.

The invention is further illustrated by the following specific examples:

As shown in FIGS. 1 to 9, in this embodiment the load clustering method of the present invention is applied to the Irish smart meter measured data set published by the Irish energy agency, and includes the following steps:

S1, acquiring high-dimensional load power characteristics.

The Irish smart meter measured data set used in this embodiment contains the load curves of 6059 household users from July 21, 2011 to January 6, 2013, with a meter sampling interval of 30 min (i.e., 48 data points are acquired per day). First, the typical electricity consumption pattern of each residential user is extracted, and then the load curves of all users are subjected to cluster analysis. Because residential load is highly random and volatile, the load curve of a single day cannot accurately reflect a user's electricity consumption habits; if the selected time range is too long, however, the user's habits may change. The average of a user's daily load data over one month is therefore used as that user's typical electricity consumption pattern, and the specific time range is August 1, 2012 to August 31, 2012. In addition, because different users consume electricity on different scales, directly clustering the power data may split load curves that follow the same consumption pattern but differ in magnitude into several classes. Min-max normalization is therefore applied to the original load curves, scaling the power data to [0, 1] so that the influence of magnitude on clustering is ignored. The original Irish smart meter data set contains 6059 residential users in total; after 8 invalid users whose power is constantly 0 are removed, the load matrix to be clustered has dimension 6051 × 48, forming a load power characteristic library of size 6051 × 48.
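The preprocessing in step S1 can be illustrated with a minimal sketch. The function names and the toy data below are hypothetical and not taken from the patent; the sketch assumes each user's data is a list of daily curves and drops any user whose averaged curve is constant (which covers the all-zero users mentioned above).

```python
def monthly_mean_profile(daily_curves):
    """Average a user's daily load curves point by point."""
    n_days = len(daily_curves)
    n_points = len(daily_curves[0])
    return [sum(day[t] for day in daily_curves) / n_days for t in range(n_points)]


def min_max_normalize(curve):
    """Scale one load curve to [0, 1]; return None for a constant curve."""
    lo, hi = min(curve), max(curve)
    if hi == lo:
        return None  # constant curve, e.g. an invalid all-zero user
    return [(v - lo) / (hi - lo) for v in curve]


def build_feature_library(users):
    """users: dict of user_id -> list of daily curves.

    Returns a dict of user_id -> normalized typical profile,
    skipping users whose profile is constant.
    """
    library = {}
    for user_id, daily_curves in users.items():
        profile = min_max_normalize(monthly_mean_profile(daily_curves))
        if profile is not None:
            library[user_id] = profile
    return library


# Toy data with 4 points per day instead of 48 (values are made up).
toy_users = {
    "a": [[1.0, 2.0, 3.0, 2.0], [3.0, 4.0, 5.0, 4.0]],
    "b": [[0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]],
}
library = build_feature_library(toy_users)
print(library)
```

With real data each profile would have M = 48 points, and the retained profiles stacked row by row would form the N × M load power characteristic library.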

S2, constructing a high-dimensional load characteristic enhancement model based on recursive graph theory. The one-dimensional daily load characteristics in the load power characteristic library are converted into two-dimensional recursive graph characteristics, taking a day as the unit, to form 6051 load recursive graph features. By way of example, a one-dimensional daily load feature and a two-dimensional recursive graph feature are shown in FIG. 1 and FIG. 2, respectively.
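The patent does not state how the recursive graph is computed, so the following is only a sketch under assumptions: the embedding dimension is taken as 1, the distance between two time points is the absolute difference of their load values, and an optional threshold turns the distance matrix into a binary plot. The function name `recurrence_plot` is hypothetical.

```python
def recurrence_plot(series, threshold=None):
    """Build an M x M recurrence matrix from a 1-D series of length M.

    Assumption: entry (i, j) is |x_i - x_j| (unthresholded plot).
    If a threshold is given, entry (i, j) is 1 when the distance is
    at most the threshold and 0 otherwise (binary plot).
    """
    n = len(series)
    dist = [[abs(series[i] - series[j]) for j in range(n)] for i in range(n)]
    if threshold is None:
        return dist
    return [[1 if d <= threshold else 0 for d in row] for row in dist]


# Toy normalized curve with 4 points (a real daily curve would have 48).
curve = [0.0, 0.5, 1.0, 0.5]
rp = recurrence_plot(curve)
binary_rp = recurrence_plot(curve, threshold=0.25)

for row in rp:
    print(row)
```

Under these assumptions a 48-point daily curve yields a symmetric 48 × 48 image, in which time points with similar load values appear as matching entries away from the main diagonal.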

S3, constructing a high-dimensional load feature extraction model based on the convolution self-encoder. The load recursive graph features obtained in step S2 are input into the convolution self-encoder to train it, with the feature extraction dimension in the convolution self-encoder set to 15. After training is finished, the load recursive graph features are input into the encoder of the convolution self-encoder to obtain the high-dimensional load feature extraction result, namely the load key features.

The network structure of the convolutional self-encoder constructed in this embodiment is shown in Table 1. The encoder part has one convolutional layer with a kernel size of (3 × 3) and 16 kernels, followed by a max pooling layer of size (2 × 2). A Flatten layer then converts the two-dimensional features into one-dimensional features, and a Dense layer compresses them to 15 dimensions (the choice of feature extraction dimension is discussed in part 3 of the experimental results below), giving the encoded feature vector. The decoder part first raises the feature dimension through a Dense layer so that it matches the dimension of the Flatten layer's output vector. A Reshape layer then converts the one-dimensional feature vector into two-dimensional features whose dimensions match the Flatten layer's input. The two-dimensional features are then upsampled with a window size of (2 × 2) to invert the max pooling layer. Finally, two convolutional layers are applied, with 16 and 1 kernels respectively and a window size of (3 × 3); the former inverts the convolution of the encoder, and the latter merges the multi-channel two-dimensional features into a single channel to realize feature reconstruction. "ReLU" is used as the activation function in both the encoding and decoding parts, and the "Sigmoid" function is used in the output part (i.e., the last convolutional layer) to map the output to between 0 and 1. The initial learning rate of the network is set to 0.01, "Adam" is used as the optimizer, and the mean square error (MSE) is used as the loss function.
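To make the layer sequence easier to follow, the sketch below traces the tensor shape after each layer. It is not the patent's Table 1: the 48 × 48 × 1 input size (one recurrence image per 48-point curve) and the "same" padding of the convolutions are assumptions, and the helper `trace_cae_shapes` is hypothetical.

```python
def trace_cae_shapes(m=48, filters=16, code_dim=15, pool=2):
    """Return (layer name, output shape) pairs for the described layer order.

    Assumptions (not stated in the source): the input image is m x m x 1 and
    the 3x3 convolutions use 'same' padding, so they keep height and width.
    """
    h = m // pool                 # side length after 2x2 max pooling
    flat = h * h * filters        # length of the flattened feature vector
    return [
        ("Input", (m, m, 1)),
        ("Conv2D 3x3, ReLU", (m, m, filters)),
        ("MaxPooling 2x2", (h, h, filters)),
        ("Flatten", (flat,)),
        ("Dense (code)", (code_dim,)),
        ("Dense", (flat,)),
        ("Reshape", (h, h, filters)),
        ("UpSampling 2x2", (h * pool, h * pool, filters)),
        ("Conv2D 3x3, ReLU", (h * pool, h * pool, filters)),
        ("Conv2D 3x3, Sigmoid", (h * pool, h * pool, 1)),
    ]


shapes = trace_cae_shapes()
for name, shape in shapes:
    print(f"{name:22s} {shape}")
```

Under these assumptions the Flatten layer outputs 24 × 24 × 16 = 9216 values, which the Dense layer compresses to the 15-dimensional code, and the final layer restores the 48 × 48 × 1 input size so that the reconstruction can be compared with the input using MSE.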

TABLE 1 convolutional self-encoder network architecture

S4, constructing a high-dimensional load clustering model based on spectral clustering. The optimal number of clusters is determined by the gap statistic method, and the load key features obtained in step S3 are clustered using a spectral clustering algorithm to obtain the high-dimensional load clustering result.

The gap statistic (interval statistics) method was used to determine the optimal number of clusters, and the results are shown in FIG. 3. The smallest k satisfying $\mathrm{Gap}_n(k) \ge \mathrm{Gap}_n(k+1) - s_{k+1}$ is 10, so this embodiment takes 10 as the number of clusters.
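The selection rule above can be sketched as a small function. It assumes the values Gap_n(k) and s_k have already been computed for a range of k; the function name `choose_k` and the numbers in the example are made up for illustration and are not the values behind FIG. 3.

```python
def choose_k(gap, s):
    """Return the smallest k with gap[k] >= gap[k + 1] - s[k + 1].

    gap, s: dicts mapping a candidate cluster number k to Gap_n(k) and s_k.
    Returns None if no candidate satisfies the rule.
    """
    for k in sorted(gap):
        if k + 1 in gap and gap[k] >= gap[k + 1] - s[k + 1]:
            return k
    return None


# Hypothetical gap values for k = 2..5 (illustrative only).
gap = {2: 0.30, 3: 0.45, 4: 0.52, 5: 0.53}
s = {2: 0.02, 3: 0.02, 4: 0.02, 5: 0.02}
best_k = choose_k(gap, s)
print(best_k)
```

The rule stops at the first k after which the gap value no longer rises by more than the standard error term s of the next candidate, which favors the smaller of two nearly equivalent cluster numbers.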

In this embodiment, the Davies-Bouldin Index (DBI) and the Dunn Index (DI) are used to evaluate the clustering effect, specifically as follows:

(1) DBI

The DBI is calculated as follows:

$$\mathrm{DBI} = \frac{1}{K}\sum_{i=1}^{K}\max_{j \neq i}\frac{\bar{d}_i + \bar{d}_j}{\lVert c_i - c_j \rVert_2}$$

where $K$ is the number of clusters, $\bar{d}_i$ is the average of the distances between samples in class $i$, and $c_i$ is the cluster center of class $i$. The smaller the DBI, the smaller the intra-class distance and the larger the inter-class distance, and the better the clustering effect.
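As an illustration of the DBI, the sketch below computes it for clusters given as lists of points. It follows the usual Davies-Bouldin convention, which is an assumption here: the intra-class term is the average Euclidean distance from each sample to its cluster center, and the inter-class term is the Euclidean distance between cluster centers. The function names and toy clusters are hypothetical.

```python
import math


def centroid(points):
    """Coordinate-wise mean of a list of points."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def davies_bouldin(clusters):
    """clusters: list of clusters, each a list of points (lists of floats)."""
    centers = [centroid(c) for c in clusters]
    # Average distance from each sample to its own cluster center.
    scatter = [
        sum(math.dist(p, ctr) for p in c) / len(c)
        for c, ctr in zip(clusters, centers)
    ]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        # Worst-case (largest) similarity ratio of cluster i to any other cluster.
        total += max(
            (scatter[i] + scatter[j]) / math.dist(centers[i], centers[j])
            for j in range(k)
            if j != i
        )
    return total / k


# Two toy partitions: same cluster shapes, different separation.
far_apart = [[[0.0, 0.0], [0.0, 2.0]], [[10.0, 0.0], [10.0, 2.0]]]
close_by = [[[0.0, 0.0], [0.0, 2.0]], [[4.0, 0.0], [4.0, 2.0]]]
dbi_far = davies_bouldin(far_apart)
dbi_close = davies_bouldin(close_by)
print(dbi_far, dbi_close)
```

The better separated partition gets the smaller value, matching the statement that a smaller DBI indicates a better clustering effect.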

(2) DI

The DI is calculated as follows:

$$\mathrm{DI} = \frac{\min_{i \neq j} d_{\min}(C_i, C_j)}{\max_{l} \mathrm{diam}(C_l)}$$

where $d_{\min}(C_i, C_j)$ is the minimum distance between samples of class $i$ and class $j$, and $\mathrm{diam}(C_l)$ is the maximum distance between samples within class $l$. The larger the DI, the larger the inter-class distance and the smaller the intra-class distance, and the better the clustering effect.
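The DI can be sketched in the same way, assuming Euclidean distance: the numerator is the smallest distance between samples of two different classes and the denominator is the largest distance between two samples of the same class. The function name and toy clusters are hypothetical, and the sketch requires at least one cluster with two or more distinct points.

```python
import math
from itertools import combinations


def dunn_index(clusters):
    """clusters: list of clusters, each a list of points (lists of floats)."""
    # Smallest distance between any two samples from different clusters.
    min_between = min(
        math.dist(p, q)
        for a, b in combinations(clusters, 2)
        for p in a
        for q in b
    )
    # Largest distance between any two samples in the same cluster (diameter).
    max_diameter = max(
        math.dist(p, q)
        for c in clusters
        for p, q in combinations(c, 2)
    )
    return min_between / max_diameter


# Two toy partitions: same cluster shapes, different separation.
far_apart = [[[0.0, 0.0], [0.0, 2.0]], [[10.0, 0.0], [10.0, 2.0]]]
close_by = [[[0.0, 0.0], [0.0, 2.0]], [[4.0, 0.0], [4.0, 2.0]]]
di_far = dunn_index(far_apart)
di_close = dunn_index(close_by)
print(di_far, di_close)
```

Here the better separated partition gets the larger value, matching the statement that a larger DI indicates a better clustering effect.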

The following are detailed experimental results:

1. Comparison of dimension reduction algorithms

To verify the effectiveness of the present invention, different dimension reduction algorithms are used to reduce the original 48-dimensional load data to 15 dimensions, followed by spectral clustering, so as to compare the clustering effects of the different dimension reduction algorithms. For convenience of description, the feature extraction method provided by the present invention is denoted RP-CAE (Recurrence Plot and Convolutional Auto-Encoder); the comparison methods and their parameter settings are introduced as follows:

(1) Principal component analysis (PCA): PCA is a statistical method that converts a group of correlated variables into linearly uncorrelated variables by means of an orthogonal transformation, yielding the principal components. Because PCA is a linear dimension reduction algorithm and most real high-dimensional data is not linearly separable, its dimension reduction quality is poor. Kernel principal component analysis (Kernel-PCA) therefore first maps the original data to a high-dimensional feature space and then extracts principal components, realizing nonlinear dimension reduction and improving the dimension reduction effect. In this embodiment the kernel function of Kernel-PCA is set to "RBF".

(2) UAE: a three-layer UAE is used; the input and output layer dimensions are both the original load dimension of 48, the feature extraction layer dimension is 15, and the remaining parameters are consistent with Table 1.

(3) LSTM self-encoder (LSTM-AE): LSTM-AE is a dimension reduction method that combines the temporal memory capability of LSTM with the nonlinear feature extraction capability of a self-encoder. Its network structure and parameters are set the same as those of the UAE, except that the fully connected layers in the network are replaced by LSTM layers.
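For the Kernel-PCA baseline in item (1), the "RBF" kernel replaces inner products with a Gaussian similarity before principal components are extracted. The sketch below shows only the kernel matrix computation; the width parameter `gamma` is an assumption, since the patent does not give its value, and the function names are hypothetical.

```python
import math


def rbf_kernel(x, y, gamma):
    """Gaussian (RBF) kernel: exp(-gamma * squared Euclidean distance)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def kernel_matrix(samples, gamma):
    """Pairwise RBF kernel values for a list of samples."""
    return [[rbf_kernel(x, y, gamma) for y in samples] for x in samples]


# Toy 2-D samples (a real load sample would have 48 dimensions).
samples = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
K = kernel_matrix(samples, gamma=0.5)
for row in K:
    print([round(v, 4) for v in row])
```

Kernel-PCA would then center this matrix and take its leading eigenvectors, which is what makes the resulting dimension reduction nonlinear in the original load space.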

FIG. 4 compares the clustering indexes obtained after combining the different dimension reduction algorithms with spectral clustering; the smaller the DBI and the larger the DI, the better the clustering effect. As can be seen from FIG. 4, a better clustering effect is obtained after dimension reduction of the load data, which verifies the effectiveness of the dimension reduction strategy. Among the dimension reduction algorithms, the self-encoder-based methods outperform the traditional PCA and Kernel-PCA, showing that, compared with statistical methods, the feature vectors extracted by a neural network model better reflect the characteristics of the original load curve and give a better dimension reduction effect. Among the self-encoder-based feature extraction methods, the ordinary UAE performs worst, while the LSTM self-encoder performs better because it can capture the temporal characteristics of the load curve.

2. Clustering algorithm comparison

To verify the superiority of spectral clustering, this embodiment compares it with traditional clustering algorithms, including fuzzy C-means clustering, Gaussian mixture model (GMM) clustering, and hierarchical clustering (HAC). The comparative experiment is divided into two parts: clustering the original 48-dimensional daily load curves, with the results shown in FIG. 5, and clustering after dimension reduction with RP-CAE, with the results shown in FIG. 6. As can be seen from FIGS. 5 and 6, spectral clustering performs best among the compared algorithms, which verifies its effectiveness for load clustering. Meanwhile, the four clustering algorithms behave consistently under the two clustering strategies: after RP-CAE dimension reduction of the original load data, the performance of all four algorithms is remarkably improved, which further verifies the effectiveness of the feature extraction method provided by the invention.

3. Influence of feature extraction dimensionality on clustering effect

To study the influence of the feature extraction dimension on the clustering effect, this embodiment extracts features of different dimensions from the original 48-dimensional load data by modifying the network structure of the convolutional self-encoder; the resulting change of the clustering validity indexes with the feature extraction dimension is shown in FIGS. 7 and 8. As can be seen from FIGS. 7 and 8, as the feature extraction dimension increases, the clustering effect first improves and then declines, and is best when the dimension is 15. The number of clusters selected in this embodiment is 10. When the feature extraction dimension is much larger than the number of clusters, feature redundancy remains and the purpose of data dimension reduction is not achieved, so the clustering effect is poor. When the feature extraction dimension is much smaller than the number of clusters, the feature vectors can hardly represent the load curve characteristics comprehensively and cannot distinguish the differences between users' electricity consumption behaviors, so classes are easily missed in clustering. It is therefore presumed that a suitable feature extraction dimension is slightly higher than the number of clusters, which reduces the data dimension while reflecting users' electricity consumption features as fully as possible, thereby enhancing the clustering effect.

4. High dimensional load clustering results

In this embodiment, electricity consumption behavior analysis is performed based on the proposed load clustering method. Each cluster obtained represents one type of electricity consumption behavior, and the results are shown in FIG. 9. In terms of the clustering result, each cluster has a clear profile and its shape differs markedly from the other clusters, indicating a good clustering effect. In terms of load curve characteristics, residents' electricity consumption behaviors are diverse, with single-peak, double-peak and flat consumption patterns. In this embodiment, the average of the curves in a cluster is defined as its typical electricity consumption pattern, and the division of users among the typical patterns is shown in Table 2.
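The definition of a typical pattern as the average of the curves in a cluster can be sketched as follows. The function name `typical_patterns`, the toy curves and the labels are hypothetical and are not the results reported in Table 2.

```python
def typical_patterns(curves, labels):
    """Average the load curves of each cluster point by point.

    curves: list of load curves (lists of floats of equal length).
    labels: cluster label of each curve, in the same order.
    Returns a dict mapping each label to its mean curve.
    """
    sums, counts = {}, {}
    for curve, label in zip(curves, labels):
        if label not in sums:
            sums[label] = [0.0] * len(curve)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], curve)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


# Toy curves with 3 points each (a real curve would have 48 points).
curves = [[0.0, 1.0, 0.0], [0.2, 0.8, 0.0], [1.0, 0.0, 1.0]]
labels = [0, 0, 1]
patterns = typical_patterns(curves, labels)
print(patterns)
```

Counting how many curves carry each label would give the share of users assigned to each typical pattern.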

Claims

1. A high-dimensional load clustering method based on a recursive graph and a convolution self-encoder is characterized in that: the method comprises the following steps:

2. The high-dimensional load clustering method based on the recursive graph and the convolution self-encoder as claimed in claim 1, wherein: the construction and training of the convolutional self-encoder in step S3 both use the TensorFlow and Keras deep learning toolkits in the Python programming language.

3. The high-dimensional load clustering method based on the recursive graph and the convolution self-encoder as claimed in claim 1, wherein: the convolutional self-encoder in the step S3 is composed of an encoder and a decoder, the encoder is responsible for extracting the load recursive graph feature into a feature vector, and the decoder is responsible for restoring the feature vector into the original load recursive graph feature; the training method of the convolution self-encoder comprises the following steps: the load recursive graph features are used as the input of the convolution self-encoder and the output of the convolution self-encoder, so that the convolution self-encoder can learn the most key information in the original load recursive graph, and the feature extraction of the load recursive graph is realized.

4. The high-dimensional load clustering method based on the recursive graph and the convolution self-encoder as claimed in claim 1, wherein: m and N in the steps S1-S3 are both natural numbers, and M is generally greater than or equal to 48.

5. The high-dimensional load clustering method based on the recursive graph and the convolution self-encoder as claimed in claim 1, wherein: t in the step S3 is a natural number, and T is smaller than M.