CN111723876B - Load curve integrated spectrum clustering method considering double-scale similarity - Google Patents

Load curve integrated spectrum clustering method considering double-scale similarity

Info

Publication number
CN111723876B
CN111723876B (application CN202010699981.9A)
Authority
CN
China
Prior art keywords
clustering
load
similarity
distance
dbi
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010699981.9A
Other languages
Chinese (zh)
Other versions
CN111723876A (en)
Inventor
万灿
徐胜蓝
于建成
曹照静
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN202010699981.9A priority Critical patent/CN111723876B/en
Publication of CN111723876A publication Critical patent/CN111723876A/en
Application granted granted Critical
Publication of CN111723876B publication Critical patent/CN111723876B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • G06F18/232Non-hierarchical techniques
    • G06F18/2321Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06Energy or water supply

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Business, Economics & Management (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Economics (AREA)
  • Health & Medical Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Public Health (AREA)
  • Water Supply & Treatment (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a load curve integrated spectral clustering method considering double-scale similarity. The similarity of load morphological change is calculated through the cosine distance of the load difference vectors, and a double-scale similarity measure is constructed for measuring load similarity. Clustering performance improvement and effective combination of the two measures are realized through ensemble clustering: spectral clustering is used as the method for generating base clustering models, and the diversity of the base clusterings is ensured by selecting different similarity measures, setting different cluster numbers and running randomly; a weighted consistency matrix together with spectral clustering is used as the cluster-ensemble strategy; the Davies-Bouldin index (DBI) or the new index MDBI is used as the clustering evaluation index in the ensemble process, the reciprocal of the DBI or MDBI serves as the basis for adaptive weight setting when computing the consistency matrix, and the final ensemble partition is obtained by spectral clustering. The method has excellent clustering validity and robustness, and avoids the drawback that a single spectral clustering method requires parameter tuning for each different data set.

Description

Load curve integrated spectrum clustering method considering double-scale similarity
Technical Field
The invention relates to a load curve integrated spectral clustering method considering double-scale similarity, and belongs to the field of power system load characteristic analysis.
Background
Against the background of the urban energy Internet, the increasingly complete electricity consumption information acquisition systems and the dispatching, operation-and-maintenance and marketing business systems are driving a rapid accumulation of power data resources. Valuable information such as energy-use characteristics is hidden in these power data, and data analysis techniques are needed to mine it. Clustering, as an unsupervised learning technique, is well suited to classifying unlabeled load curves. It provides power enterprises with classification results based on differences in load characteristics, helps them accurately grasp users' energy-use behavior patterns, and offers strong support for applications such as demand-side response, load forecasting and abnormal electricity-use detection.
Load characteristic analysis by clustering already has a solid research foundation. Research on load clustering mainly focuses on the following three aspects: 1) load clustering methods: the validity of the clustering result is the key to its application value, and how to design a suitable load clustering method to improve clustering quality is one of the research hotspots; 2) load similarity measures: choosing a reasonable distance measure according to the purpose of load clustering, so that the similarity of different users' load characteristics is measured properly, makes the clustering result more accurate and effective; 3) load data feature extraction: extracting low-dimensional features that effectively reflect differences in load characteristics from high-dimensional load curve data can improve both the quality and the efficiency of load clustering.
The Euclidean distance is a classical similarity measure in load clustering. In cluster analysis of substation load characteristics, substation load characteristics have been represented by substation user composition and the Euclidean distance between curves. Some studies apply the clustering-by-fast-search-and-find-of-density-peaks method to load clustering and introduce histogram equalization to improve the clustering effect. Others combine partitional clustering and hierarchical clustering into a two-layer clustering scheme so that the two methods complement each other and the validity of load clustering is optimized. These load clustering studies adopt the Euclidean distance as the basis for measuring load similarity, but the Euclidean distance focuses on the distance between curves and is limited in mining the similarity of load curve shape changes. To improve the similarity measure, some studies measure load similarity from both distance and morphological characteristics and classify loads with spectral clustering; others introduce dynamic time warping distance and cross-correlation methods to improve the computation of morphological similarity between load sequences; still others introduce the concept of a typical time warping distance and combine it with a Gaussian kernel function to characterize load-sequence similarity on spatio-temporal dual scales, improving the spectral multi-manifold clustering method. At present, load clustering research mostly adopts a single clustering method or a two-layer clustering method, but such methods generally have limitations, for example parameters must be re-tuned for different data sets, adaptability to different data structures varies, and multiple parameters must be tuned, and these problems can adversely affect load clustering quality.
Disclosure of Invention
To address the problems summarized in the background section, the invention provides a load curve integrated spectral clustering method considering double-scale similarity.
In order to achieve the purpose, the invention adopts the following technical scheme:
a load curve integrated spectral clustering method considering double-scale similarity is characterized in that a double-scale similarity measurement mode is constructed by combining difference cosine distance and Euclidean distance, a differential basis clustering model is constructed by adopting a spectral clustering method based on double-scale similarity measurement, and clustering integration is realized by using a consistency matrix based on self-adaptive weighting of intra-cluster evaluation indexes and spectral clustering.
The specific method comprises the following steps:
firstly, the similarity of load morphological change is calculated through the cosine distance of the load difference vectors, and a double-scale similarity measure for load similarity is constructed to make up for the deficiency of the Euclidean distance in measuring load-characteristic similarity; then, spectral clustering is taken as the method for generating base clustering models, and differentiated base clustering models are constructed by selecting different similarity measures, setting different cluster numbers and running randomly, which ensures the diversity of the base clustering models; finally, the weighted consistency matrix and spectral clustering are taken as the cluster-ensemble strategy, the Davies-Bouldin index (DBI) or the new index MDBI is taken as the clustering evaluation index in the ensemble process, the reciprocal of the DBI or MDBI is taken as the basis for adaptive weight setting when computing the weighted consistency matrix, the final ensemble partition is then obtained by spectral clustering, and the improvement of clustering performance and the effective combination of the two measures are realized through ensemble clustering.
In the above technical solution, further, in order to avoid the influence of load curve amplitude differences on the load-shape similarity calculation, the load curves are normalized by the maximum value before the method is run. The specific method is as follows:
Different users consume different amounts of electricity, so their daily load curves differ in amplitude, sometimes greatly; however, load clustering is based on the similarity of load shapes, and the curve amplitude is irrelevant to the similarity calculation. To avoid the influence of amplitude differences on the similarity result, the load curves are normalized first.
Assume that the load data set contains m load curves, each of dimension n, and that all load samples are to be divided into k clusters. The load data are processed by maximum-value normalization, defined as follows:
x_ij = x̂_ij / x̂_i,max

where x_ij is the normalized value of the j-th dimension of the i-th load curve, x̂_ij is the j-th dimension of the raw data of the i-th load curve, and x̂_i,max is the maximum value over all dimensions of the raw data of the i-th load curve.
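As an illustrative, non-limiting sketch, this normalization step can be expressed in Python as below, assuming the m×n load data are held in a NumPy array; the function name is illustrative and not part of the invention.

```python
import numpy as np

def max_normalize(X_raw: np.ndarray) -> np.ndarray:
    """Divide each load curve by its own maximum so that curve amplitude is removed.

    X_raw has shape (m, n): m load curves sampled at n points; strictly positive
    row maxima are assumed (an all-zero curve would need special handling).
    """
    row_max = X_raw.max(axis=1, keepdims=True)  # x̂_i,max: maximum over all n dimensions
    return X_raw / row_max                      # x_ij = x̂_ij / x̂_i,max
```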
Furthermore, a first-order difference operation is performed on the normalized load curve data, and the cosine distance of the first-order load difference vectors, namely the difference cosine distance, is calculated to reflect the consistency of the morphological changes of two load curves. The specific method is as follows:
Applying a first-order difference to the normalized load curve data extracts power-change vectors that reflect the morphological change characteristics of each load curve, such as rising, falling and staying stable. The cosine distance is derived from the cosine similarity, which measures the similarity of two vectors through the cosine of the angle between them in vector space. The cosine distance represents the relative difference in vector direction, so the cosine distance of the first-order load difference vectors can be used to reflect the consistency of the shape changes of two load curves. The value range of the difference cosine distance is [0, 2]; the smaller the value, the more similar the shape changes of the two load curves.
The first-order difference operation on the load is defined as:

Δx_ij = x_i,j+1 − x_ij,  j = 1, 2, …, n−1

where Δx_ij denotes the j-th dimension of the i-th load difference vector obtained by the first-order difference operation.
The differential cosine distance of the load is defined as:
dc_ii′ = 1 − c_ii′

c_ii′ = (Δx_i · Δx_i′) / (‖Δx_i‖_2 · ‖Δx_i′‖_2)

where dc_ii′ is the cosine distance between the i-th and i′-th load difference vectors, i.e. the difference cosine distance of the i-th and i′-th load curves; c_ii′ is the cosine similarity of the i-th and i′-th load difference vectors; Δx_i is the i-th load difference vector; ‖Δx_i‖_2 is its 2-norm. In the second expression the product in the numerator is a vector dot product and the product in the denominator is a scalar multiplication.
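A hedged sketch of the two formulas above, computing the pairwise difference cosine distance for all normalized curves at once; names and the vectorized layout are illustrative assumptions.

```python
import numpy as np

def diff_cosine_distance(X: np.ndarray) -> np.ndarray:
    """Pairwise difference cosine distance dc_ii' in [0, 2] for normalized load curves X of shape (m, n)."""
    D = np.diff(X, axis=1)                                  # first-order difference vectors Δx_i, shape (m, n-1)
    D_unit = D / np.linalg.norm(D, axis=1, keepdims=True)   # divide each Δx_i by its 2-norm
    cos_sim = D_unit @ D_unit.T                             # cosine similarity c_ii'
    return 1.0 - cos_sim                                    # dc_ii' = 1 - c_ii'
```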
Further, a composite load-curve distance based on double-scale similarity is constructed by combining the difference cosine distance with the Euclidean distance; it accounts for both the distance between loads and the similarity of their shape changes, and can be obtained by a linear combination. The specific method is as follows:
the composite distance is defined as:
ds_ii′ = a_e·de_ii′ + a_c·dc_ii′·r

where ds_ii′ is the composite distance of the i-th and i′-th load curves; de_ii′ is their Euclidean distance; dc_ii′ is their difference cosine distance; a_e and a_c are the weight coefficients of the Euclidean distance and the difference cosine distance in the composite distance. Since both similarities are considered effective measures, a_e and a_c are both set to 0.5. Because the value ranges of the Euclidean distance and the difference cosine distance differ, the difference cosine distance is amplified by a factor r, where r is a proportionality coefficient.
Since the lower bounds of the difference cosine distance and the Euclidean distance are both 0 while their upper bounds differ, the proportionality coefficient r is calculated as:

r = de_max / dc_max

where de_max and dc_max are, respectively, the maximum Euclidean distance and the maximum difference cosine distance over all load curves in the data set.
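The composite distance can then be assembled as in the sketch below, which reuses diff_cosine_distance from the previous snippet and assumes the equal weights a_e = a_c = 0.5 stated above.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def composite_distance(X: np.ndarray, a_e: float = 0.5, a_c: float = 0.5) -> np.ndarray:
    """Double-scale composite distance ds = a_e*de + a_c*dc*r over normalized curves X."""
    de = squareform(pdist(X, metric="euclidean"))  # Euclidean distance matrix de_ii'
    dc = diff_cosine_distance(X)                   # difference cosine distance matrix dc_ii'
    r = de.max() / dc.max()                        # proportionality coefficient r = de_max / dc_max
    return a_e * de + a_c * dc * r
```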
The Davies-Bouldin index (DBI) and the Adjusted Rand Index (ARI) are selected as the internal and external evaluation indexes of the clustering effect of the method. Considering that the classic DBI formula uses the Euclidean distance to measure the distance between data samples and therefore cannot accurately evaluate the validity of clustering results obtained with other similarity measures, the composite distance is applied to the distance calculation of the DBI. The specific method is as follows:
The Davies-Bouldin Index (DBI), proposed by Davies and Bouldin to evaluate clustering validity, is also called a classification accuracy index. The DBI jointly considers the similarity of samples within clusters and the separation of samples between clusters; the smaller its value, the higher the clustering validity. It is defined as:

DBI = (1/k) Σ_{i=1,…,k} max_{j≠i} [ (de_i + de_j) / de(C_i, C_j) ]

where de_i is the mean Euclidean distance from the samples of class i to its class center, de_j is the corresponding quantity for class j, and de(C_i, C_j) is the Euclidean distance between the class centers of classes i and j.
The Adjusted Rand Index (ARI) is a common external evaluation index for clustering; it evaluates clustering validity by counting the pairs of samples that are assigned to the same or to different clusters in the true labels and in the clustering result. It is defined as:

RI = (TP + TN) / C(m, 2)

ARI = (RI − E(RI)) / (max(RI) − E(RI))

where RI is the Rand index; TP is the number of sample pairs that belong to the same class in the true labels and are also placed in the same cluster in the clustering result; TN is the number of sample pairs that belong to different classes in the true labels and are also placed in different clusters in the clustering result; C(m, 2) is the number of ways to choose two samples from the m load samples; E(RI) is the expected value of RI, and max(RI) is the maximum value of RI. The ARI ranges over [−1, 1]; the larger the value, the closer the clustering result is to the true partition, and ARI = 1 indicates that the clustering result coincides with the true labels.
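Where true labels are available, the ARI need not be computed by hand from the RI formula; scikit-learn's implementation can be used as a check. The labels below are purely illustrative.

```python
from sklearn.metrics import adjusted_rand_score

y_true = [0, 0, 1, 1, 2, 2]   # illustrative true labels
y_pred = [1, 1, 0, 0, 2, 2]   # illustrative clustering result (same partition, labels permuted)
print(adjusted_rand_score(y_true, y_pred))  # 1.0: ARI is invariant to relabelling of clusters
```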
Since the Davies-Bouldin index (DBI) and the Adjusted Rand Index (ARI) are selected as the internal and external evaluation indexes, and the classic DBI formula uses the Euclidean distance to measure the distance between data samples and therefore cannot accurately evaluate the validity of clustering results obtained with other similarity measures, the composite distance is applied to the distance calculation of the DBI to construct a new index (Modified DBI, MDBI), namely:

MDBI = (1/k) Σ_{i=1,…,k} max_{j≠i} [ (ds_i + ds_j) / ds(C_i, C_j) ]

where MDBI is the new index used to evaluate the validity of load clustering results considering double-scale similarity; ds_i is the average composite distance from the samples of class i to its class center (and ds_j the corresponding quantity for class j); ds(C_i, C_j) is the composite distance between the class centers of classes i and j.
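A sketch of the MDBI (and, when given a Euclidean distance matrix, the classic DBI) computed from a precomputed pairwise distance matrix. Since only pairwise distances are assumed available here, class centers are approximated by medoids; that approximation is an implementation assumption of this sketch, not part of the index definition.

```python
import numpy as np

def modified_dbi(dist: np.ndarray, labels: np.ndarray) -> float:
    """DBI-style validity index on an arbitrary precomputed distance matrix (smaller is better)."""
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    centers, scatter = [], []
    for c in clusters:
        idx = np.where(labels == c)[0]
        sub = dist[np.ix_(idx, idx)]
        medoid = idx[np.argmin(sub.sum(axis=1))]   # medoid used as the class center C_i
        centers.append(medoid)
        scatter.append(dist[medoid, idx].mean())   # ds_i: mean distance of class samples to the center
    k = len(clusters)
    total = 0.0
    for i in range(k):
        total += max((scatter[i] + scatter[j]) / dist[centers[i], centers[j]]
                     for j in range(k) if j != i)
    return total / k
```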
Further, the construction method of the differentiated base clustering model comprises the following steps:
the spectral clustering method is evolved from graph theory, a data sample is regarded as a distribution point in space, the points are connected by edges with weights, and the weight values of the edges are in direct proportion to the similarity between the points of the data sample. The method is characterized in that a nondirectional weight graph formed by a space inner point and a weighted edge is subjected to graph cutting by spectral clustering, and the main aim is to enable the weight values of the edges among different subgraphs to be as low as possible after graph cutting and enable the weight values of the edges in the subgraphs to be as high as possible. The spectral clustering performance is excellent, and the adaptability to data distribution is strong.
In spectral clustering, the edge weight of an undirected graph is represented by a similarity matrix, and in most spectral clustering methods, a Gaussian kernel function is adopted to calculate the similarity matrix, namely:
s_ii′ = exp( −d_ii′ / (2σ²) )

where s_ii′ is the element in row i and column i′ of the similarity matrix, i.e. the weight of the edge between the i-th and i′-th data sample points; d_ii′ is the distance between the i-th and i′-th load curves; σ is the scale parameter of the kernel function.
In the spectral clustering method, the similarity measure between different load curves enters mainly through d_ii′ in the similarity matrix. In general d_ii′ is taken as the squared Euclidean distance, in which case spectral clustering optimizes within-class and between-class squared Euclidean distances when partitioning the load clusters. For a spectral clustering method that measures load similarity by the difference cosine distance, the similarity matrix is computed with the difference cosine distance in place of the squared Euclidean distance and is defined as:

s_ii′ = exp( −dc_ii′ / (2σ²) )
the base clustering result can be generated by adopting different clustering methods, setting different cluster numbers, randomly running for multiple times and the like. Selecting spectral clustering as a base clustering method, taking a fixed value for a scale parameter (an empirical value taken from an experimental result, specifically, evaluating the quality of a result according to an evaluation index, and then selecting a scale parameter which is good in performance in a plurality of data sets as a fixed value according to the result), and ensuring the diversity of a base clustering model through the following three aspects: 1) the similarity measurement mode adopts Euclidean distance or difference cosine distance; 2) setting different cluster numbers with the value range of [ k ]min,kmax]Each of which is an integer; 3) the method for setting each pair of parameter combinations of the first two parameters randomly runs for many times, and the number of times is p. In an undirected graph segmentation mode of spectral clustering, an NCut graph segmentation method is selected to process an undirected weight graph obtained by a similar matrix, and a feature matrix obtained after a dimension reduction in the graph segmentation process is clustered by k-means.
Further, the method for integrating the base clustering model by adopting the weighted consistency matrix method comprises the following steps:
the consistency matrix method is a widely used classical clustering integration strategy, which converts a base clustering model into an m × m consistency matrix by calculating the probability that different samples are divided into the same type of clusters in all the base clustering models:
Figure GDA0003212852940000082
in the formula, conijThe value of the ith row and the ith' column of the consistency matrix; b represents the number of the base clustering models; i { } is an indicator function, and when the formula is established in brackets, the value is 1, otherwise, the value is 0; l isb(i) A class cluster label representing the ith sample in the b-th basis clustering model.
If the validity of the individual base clustering models is not taken into account and they are simply integrated, base models of low validity in the ensemble will adversely affect the performance of the integrated clustering method. Therefore, the clustering evaluation indexes of the different base clustering models are combined, their clustering validity is taken into account in the consistency-matrix calculation to set adaptive weights, and the influence of different base clustering models on the ensemble is adjusted accordingly.
When only the distance difference between curves is considered and clustering performance is optimized through ensembling, the DBI can be used to compute the base-clustering weights; when both the distance and the shape-change difference of the load curves are considered, the MDBI can be used. Since smaller DBI and MDBI values indicate higher clustering validity, the weight of each base clustering model is the reciprocal of its DBI or MDBI. The weighted consistency matrix is defined as follows:

con_ii′ = Σ_{b=1,…,B} w_b · I{ L_b(i) = L_b(i′) }

w_b = (1/in_b) / Σ_{b′=1,…,B} (1/in_b′)

where w_b is the weight of the b-th base clustering model in the weighted consistency matrix; in_b is the clustering evaluation index of the b-th base clustering model, which can be the DBI or the MDBI. The second equation rescales the base-model weights so that they sum to 1, which keeps the elements of the weighted consistency matrix in the range [0, 1].
The weighted consistency matrix can be regarded as a similarity matrix reflecting sample similarity and is processed by spectral clustering. As in the base clustering algorithm, the spectral clustering in the ensemble step also uses the NCut graph-cut and clusters the feature matrix with k-means.
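A sketch of the adaptive weighting and final cut, assuming the base labelings and their DBI/MDBI values are available from the previous steps; the weighted consistency matrix is fed back into spectral clustering as a precomputed affinity. All names are illustrative.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def weighted_consensus(base_labels, base_indices):
    """Weighted consistency matrix: weights are reciprocals of DBI/MDBI, rescaled to sum to 1."""
    w = 1.0 / np.asarray(base_indices, dtype=float)
    w /= w.sum()
    m = len(base_labels[0])
    con = np.zeros((m, m))
    for wb, lab in zip(w, base_labels):
        lab = np.asarray(lab)
        con += wb * (lab[:, None] == lab[None, :])   # w_b * I{L_b(i) == L_b(i')}
    return con

def ensemble_partition(con, k):
    """Cut the weighted consistency matrix (treated as a similarity matrix) with spectral clustering."""
    sc = SpectralClustering(n_clusters=k, affinity="precomputed", assign_labels="kmeans")
    return sc.fit_predict(con)
```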
The invention has the beneficial effects that:
The load curve integrated spectral clustering method considering double-scale similarity improves the spectral clustering algorithm through the idea of ensemble learning and improves the cluster quality of load clustering; its clustering validity is excellent, the integrated spectral clustering performs more stably across different data sets with excellent robustness, and it overcomes the drawback that a single spectral clustering algorithm must re-tune its scale parameter for each data set. By integrating differentiated base clustering models, the integrated spectral clustering method effectively combines the Euclidean distance and the difference cosine distance, considers the double-scale similarity of loads, and can mine the load shape-change information reflecting energy-consumption patterns more accurately and effectively. The validity and robustness of load clustering are further optimized through the effective weighting of base clusterings in the ensemble process.
Drawings
FIG. 1 is a frame diagram of a load curve integrated spectral clustering method considering two-scale similarity;
FIG. 2 is a schematic diagram of the load curve integrated spectral clustering result considering double-scale similarity for the autumn load data set;
FIG. 3 is a schematic diagram of a data set D1;
fig. 4 is a schematic diagram of a data set D2.
Detailed Description
The invention is further described with reference to the accompanying drawings and examples.
The framework of the load curve integrated spectral clustering method considering the two-scale similarity is shown in FIG. 1.
(1) Firstly, processing load data by adopting a maximum value normalization method, wherein the definition is shown as the following formula:
x_ij = x̂_ij / x̂_i,max

where x_ij is the normalized value of the j-th dimension of the i-th load curve, x̂_ij is the j-th dimension of the raw data of the i-th load curve, and x̂_i,max is the maximum value over all dimensions of the raw data of the i-th load curve.
And performing first-order difference operation on the normalized load curve data, and calculating the cosine distance of the first-order difference vector of the load, namely the difference cosine distance, so as to reflect the consistency of the morphological changes of the two load curves. And then, a load curve comprehensive distance based on double-scale similarity is constructed by combining the difference cosine distance and the Euclidean distance, the load distance and the similarity degree of morphological change are considered, and the comprehensive distance can be obtained through a linear function.
(2) Selecting spectral clustering as a base clustering method, taking a fixed value for a scale parameter of the spectral clustering, selecting an NCut graph cutting method to process a undirected weight graph obtained by a similar matrix in an undirected graph cutting mode of the spectral clustering, and selecting k-means to cluster a feature matrix obtained after dimension reduction in the graph cutting process.
The diversity of the base clustering model is ensured by the following three aspects:
1) The similarity measure is either the Euclidean distance or the difference cosine distance; the similarity matrix is calculated with a Gaussian kernel function.
2) Different cluster numbers are set, taking every integer in the range [k_min, k_max].
3) Each combination of the two parameters above is run randomly several times, p times in total.
(3) And calculating a weighted consistency matrix by combining the clustering evaluation indexes of different base clustering models.
When only the distance difference of the curves is considered and the clustering performance is optimized through integrated clustering, the DBI can be adopted to calculate the weight of the base clustering model; and when the distance of the load curve and the form change difference are comprehensively considered, the MDBI can be adopted to calculate the base clustering weight. And the weight value of the base clustering model is the reciprocal of the DBI or the MDBI of the corresponding base clustering model. The weighted consistency matrix is defined as follows:
con_ii′ = Σ_{b=1,…,B} w_b · I{ L_b(i) = L_b(i′) }

w_b = (1/in_b) / Σ_{b′=1,…,B} (1/in_b′)

where w_b is the weight of the b-th base clustering model in the weighted consistency matrix; in_b is the clustering evaluation index of the b-th base clustering model, which can be the DBI or the MDBI. The second equation rescales the base-model weights so that they sum to 1.
(4) The similarity matrix is processed by spectral clustering. As in the base clustering algorithm, the spectral clustering in the ensemble step also uses the NCut graph-cut and clusters the feature matrix with k-means.
(5) The integrated spectral clustering model is evaluated with clustering evaluation indexes, including the internal indexes DBI and MDBI and the external index ARI. The number of clusters is selected by taking the index-optimal result.
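As an illustrative end-to-end sketch under the assumptions above, the steps (1)-(5) can be chained together by reusing the helper functions sketched earlier (max_normalize, diff_cosine_distance, composite_distance, generate_base_clusterings, modified_dbi, weighted_consensus, ensemble_partition); all names are illustrative, and the cluster number is chosen here by minimizing the MDBI of the final partition.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def ensemble_spectral_clustering(X_raw, k_min=3, k_max=9, p=5):
    X = max_normalize(X_raw)
    de = squareform(pdist(X, metric="euclidean"))
    dc = diff_cosine_distance(X)
    ds = composite_distance(X)                                   # double-scale composite distance
    base = generate_base_clusterings(de, dc, k_min, k_max, p)    # diversified base models
    indices = [modified_dbi(ds, lab) for lab in base]            # MDBI of each base model
    con = weighted_consensus(base, indices)                      # adaptive weights = 1/MDBI
    # pick the cluster number whose final partition minimizes the MDBI
    candidates = {k: ensemble_partition(con, k) for k in range(k_min, k_max + 1)}
    best_k = min(candidates, key=lambda k: modified_dbi(ds, candidates[k]))
    return best_k, candidates[best_k]
```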
A case study is constructed using measured one-day user load data for the four seasons from a city in southern China, with a sampling interval of 15 min. After data preprocessing the data set contains 1565 users and covers industrial, commercial, residential and other load types.
(1) Integrated spectral clustering and internal evaluation index verification considering only distance difference
The performance of several classes of load clustering algorithms is compared with that of the integrated spectral clustering algorithm considering DBI weighting on the four seasonal load data sets. The comparison algorithms are: 1) the k-means algorithm measuring similarity by Euclidean distance, abbreviated kmeu; 2) the spectral clustering algorithm measuring similarity by Euclidean distance with a fixed scale parameter, abbreviated speu; 3) the spectral clustering algorithm measuring similarity by Euclidean distance with an optimized scale parameter, abbreviated speu-gamma; 4) the two-layer clustering algorithm, abbreviated km-ag; 5) the integrated spectral clustering algorithm without index weighting, abbreviated ESC-1.
The specific algorithm parameter settings are shown in Table 1. The cluster number is taken as every integer in [k_min, k_max]. Considering that too small a cluster number makes the clusters meaningless, the minimum value k_min is set to 3 in all cases; to ensure the diversity of the base clustering models, and considering that the optimal cluster number of loads in most studies is a single digit, the maximum value k_max is set to 9 in all cases. In the speu algorithm the scale parameter σ is fixed, chosen through experiments so that the algorithm performs well on most data sets: γ = 1/(2σ²) = 1.0. Each algorithm is run randomly 20 times with the set parameter combinations, and the DBI-optimal result is taken.
TABLE 1 Algorithm parameter set
Table 2 shows the DBI of the various load clustering algorithms. It can be seen from Table 2 that, on the four data sets: 1) the integrated spectral clustering considering DBI weighting is better than the speu and kmeu algorithms; compared with speu, the index of the invention improves by 0.62%, 0.78%, 2.75% and 0.43%, respectively, and compared with kmeu it improves by 30.2%, 41.3%, 27.7% and 9.67%, respectively, showing that integrated spectral clustering can improve clustering validity by relying on the ensemble-learning idea; 2) the indexes of the speu-gamma algorithm are mostly better than those of the speu algorithm, but its optimal scale parameters are inconsistent across data sets, which confirms that spectral clustering needs to re-tune the scale parameter for different load data sets; 3) the indexes of the kmeu and km-ag algorithms are inferior to those of the spectral clustering algorithms, with mean differences over all data sets of -0.293 and -0.223, respectively; 4) on the spring, summer and autumn data sets the index of the invention is better than that of the speu-gamma algorithm, while on the winter data set it is inferior; 5) the integrated spectral clustering algorithm ESC considering DBI weighting is better than the unweighted integrated spectral clustering algorithm ESC-1, and ESC-1 is inferior to speu on the summer and winter data sets because the poor DBI of the spectral clustering results based on the difference cosine distance in the base clustering models degrades the ensemble performance; this confirms that integration without considering the validity of the base clustering models harms the validity and robustness of the ensemble clustering method.
TABLE 2 clustering result DBI comparison of six classes of algorithms
(2) Integrated spectral clustering and internal evaluation index verification considering double-scale similarity
The performance of several classes of load clustering algorithms is compared with that of the integrated spectral clustering method considering double-scale similarity on the four seasonal load data sets. The comparison algorithms are: 1) the k-means algorithm measuring similarity by Euclidean distance, abbreviated kmeu; 2) the k-means algorithm measuring similarity by difference cosine distance, abbreviated kmco; 3) the spectral clustering algorithm measuring similarity by Euclidean distance with a fixed scale parameter, abbreviated speu; 4) the spectral clustering algorithm measuring similarity by difference cosine distance with a fixed scale parameter, abbreviated spco; 5) the spectral clustering algorithm measuring similarity by the composite distance with an optimized scale parameter, abbreviated spec-gamma; 6) the two-layer clustering algorithm, abbreviated km-ag.
The algorithm parameters are shown in Table 3. Each algorithm is run randomly 20 times with the set parameter combinations, and the MDBI-optimal result is taken.
TABLE 3 Algorithm parameter set
Table 4 shows the MDBI of each class of load clustering algorithm. It can be seen from Table 4 that, on the four data sets: 1) the integrated spectral clustering algorithm ESC considering MDBI weighting is better than the other algorithms; its MDBI improves by 0.45%, 18.68%, 4.42% and 0.43% compared with the spec-gamma algorithm, and by 0.23%, 1.84%, 9.32% and 2.33% compared with the speu algorithm, respectively, showing that when double-scale load similarity is considered, the validity of integrated spectral clustering is better than that of a single spectral clustering algorithm and its robustness is superior; 2) the spec-gamma algorithm outperforms the speu algorithm in MDBI only in autumn and winter, confirming the robustness deficiency of a single spectral clustering algorithm; 3) the optimal scale parameters of the spec-gamma algorithm are inconsistent over the four data sets, again verifying that the scale parameter of spectral clustering must be re-tuned for different load data sets; 4) the classic k-means algorithms and the two-layer clustering algorithm km-ag are inferior to the spectral clustering algorithms.
TABLE 4 seven-class Algorithm clustering results MDBI comparison
Fig. 2 shows the load curve integrated spectral clustering result considering double-scale similarity for the autumn load data set. It can be seen that the integrated spectral clustering method classifies the autumn loads into three clusters, whose typical load shapes can be summarized as single-peak, peak-avoiding type I and peak-avoiding type II. The first type of load climbs in the morning, stays relatively stable during the day, and falls in the evening and early morning; the second type mainly shows a rapid drop in the early morning and is relatively stable in other periods; the third type falls rapidly in the early morning and rises rapidly in the evening. The three types of loads differ considerably in both distance and shape change, so the load curve integrated spectral clustering result considering double-scale similarity is reasonable and effective.
(3) Integrated spectral clustering and external evaluation index verification considering double-scale similarity
Two new example data sets are constructed as follows: 1) data set D1, with k_1 = 6 load clusters, each containing between 5 and 30 curves (the cluster sizes differ), for a total of 105 load curves; 2) data set D2, in which each of the k_2 load clusters contains about 20 curves, for a total of 160 load curves. The true classification labels of the two data sets are shown in Fig. 3 and Fig. 4.
The performance of several classes of load clustering algorithms is compared with that of the integrated spectral clustering method considering double-scale similarity on the new data sets D1 and D2. The comparison algorithms are: 1) the k-means algorithm measuring similarity by Euclidean distance, abbreviated kmeu; 2) the k-means algorithm measuring similarity by difference cosine distance, abbreviated kmco; 3) the spectral clustering algorithm measuring similarity by Euclidean distance with a fixed scale parameter, abbreviated speu; 4) the spectral clustering algorithm measuring similarity by difference cosine distance with a fixed scale parameter, abbreviated spco; 5) the spectral clustering algorithm measuring similarity by the composite distance with an optimized scale parameter, abbreviated spec-gamma; 6) the two-layer clustering algorithm, abbreviated km-ag.
The specific algorithm parameter settings are shown in Table 5. Each algorithm is run randomly 20 times with the set parameter combinations, and the ARI-optimal result is taken.
TABLE 5 Algorithm parameter set
Table 6 gives the ARI of each class of load clustering algorithm. It can be seen from Table 6 that, on the two data sets: 1) the ARI of the integrated spectral clustering method considering double-scale similarity is better than or equal to that of the spec-gamma algorithm, with an improvement of 1.52-24.7% on data set D2, which shows that when double-scale load similarity is considered, the ability of integrated spectral clustering to distinguish load shape characteristics is better than that of a single spectral clustering algorithm and its robustness is superior; 2) the spectral clustering algorithms speu and spco, which measure similarity by a single Euclidean distance or difference cosine distance, show large ARI fluctuations over the two data sets, confirming that a single distance measure is deficient in capturing load shape characteristics; 3) the two-layer clustering algorithm km-ag performs well on data set D1 but is inferior to the ESC, speu and spec-gamma algorithms on data set D2, and the ARI of the classic k-means algorithms is inferior to that of the ESC method.
TABLE 6 ARI comparison of clustering results of seven classes of algorithms
The above description of the embodiments of the invention, given with reference to the accompanying drawings, is not intended to limit the scope of the invention. All equivalent models or equivalent algorithm flows obtained from the contents of this specification and the drawings, whether applied directly or indirectly in other related technical fields, fall within the scope of protection of the invention.

Claims (4)

1. A load curve integrated spectral clustering method considering double-scale similarity, characterized by comprising the following steps: firstly, calculating the similarity of load morphological change through the cosine distance of the load difference vectors, and constructing a double-scale similarity measure for measuring load similarity; then, taking spectral clustering as the algorithm for generating base clustering models, and constructing differentiated base clustering models by selecting different similarity measures, setting different cluster numbers and running randomly, thereby ensuring the diversity of the base clustering models; finally, taking the weighted consistency matrix and spectral clustering as the cluster-ensemble strategy, taking the Davies-Bouldin index DBI or the new index MDBI as the clustering evaluation index in the ensemble process, taking the reciprocal of the DBI or MDBI as the basis for adaptive weight setting to calculate the weighted consistency matrix, realizing the final ensemble partition by spectral clustering, and realizing the improvement of clustering performance and the effective combination of the two measures through ensemble clustering;
selecting a Davison baud index DBI and adjusting a landed index ARI as internal and external evaluation indexes of a clustering effect of the method, and considering that Euclidean distances are adopted in a classic DBI formula to measure distances of different data samples, the result effectiveness of the clustering method adopting other similarity measurement modes cannot be accurately evaluated, so that the comprehensive distance is applied to distance calculation of the DBI to construct a new index MDBI, namely:
Figure FDA0003212852930000011
in the formula, MDBI is a new index used for evaluating the effectiveness of the load clustering result considering the double-scale similarity; dsiAnd dsjRespectively the average integrated distance from the sample in the ith class to the class center of the sample and the average integrated distance from the sample in the jth class to the class center of the sample; ds (C)i,Cj) Representing the combined distance of class centers of the ith and jth classes;
the load curve comprehensive distance based on the double-scale similarity is constructed by combining the difference cosine distance and the Euclidean distance, the load distance and the similarity degree of the morphological change are considered, and the comprehensive distance can be obtained by a linear function:
ds_ii′ = a_e·de_ii′ + a_c·dc_ii′·r

where ds_ii′ is the composite distance of the i-th and i′-th load curves; de_ii′ is their Euclidean distance; dc_ii′ is their difference cosine distance; a_e and a_c are the weight coefficients of the Euclidean distance and the difference cosine distance in the composite distance, and, since both similarities are considered effective measures, a_e and a_c are both set to 0.5; because the value ranges of the Euclidean distance and the difference cosine distance differ, the difference cosine distance is amplified by a factor r, where r is a proportionality coefficient calculated as:

r = de_max / dc_max

where de_max and dc_max are, respectively, the maximum Euclidean distance and the maximum difference cosine distance over all load curves in the data set;
the method for integrating the base clustering model by adopting a weighted consistency matrix method comprises the following steps:
and (3) combining the cluster evaluation indexes of different base cluster models, and taking the cluster effectiveness into consideration in the weighted consistency matrix calculation process to perform self-adaptive weight setting: when only the distance difference of the curves is considered and the clustering performance is optimized through integrated clustering, the DBI can be adopted to calculate the weight of the base clustering model; when the distance of the load curve and the form change difference are comprehensively considered, the MDBI can be adopted to calculate the base clustering weight; since the smaller the DBI and the MDBI are, the higher the effectiveness of the clustering result is represented, the weight value of the base clustering model is the reciprocal of the DBI or the MDBI of the corresponding base clustering model, and the weighting consistency matrix is defined as follows:
con_ii′ = Σ_{b=1,…,B} w_b · I{ L_b(i) = L_b(i′) }

w_b = (1/in_b) / Σ_{b′=1,…,B} (1/in_b′)

where w_b is the weight of the b-th base clustering model in the weighted consistency matrix; B is the number of base clustering models; in_b is the clustering evaluation index of the b-th base clustering model, which can be the DBI or the MDBI; I{·} is the indicator function, equal to 1 when the expression in braces holds and 0 otherwise; L_b(i) and L_b(i′) are, respectively, the cluster labels of the i-th sample and of the i′-th sample in the b-th base clustering model; the second equation rescales the base-model weights so that they sum to 1, which keeps the elements of the weighted consistency matrix in the range [0, 1];
The weighted consistency matrix can be regarded as a similarity matrix reflecting the sample similarity, the similarity matrix is processed by adopting spectral clustering, and the spectral clustering in the integration process also adopts a graph cutting mode of NCut and selects k-means to cluster the feature matrix.
2. The method for load curve ensemble spectral clustering considering dual-scale similarity according to claim 1, wherein, in order to avoid the influence of the load curve amplitude difference on the load morphology similarity calculation result, the load curve is subjected to maximum normalization before the method is run.
3. The method for clustering load curves with consideration of double-scale similarity according to claim 2, wherein the normalized load curve data is subjected to first-order difference operation, and then the cosine distance of the first-order difference vector of the load, i.e. the difference cosine distance, is calculated to reflect the consistency of the morphological changes of the two load curves.
4. The load curve integrated spectral clustering method considering double-scale similarity according to claim 1, wherein the differentiated base clustering models are constructed as follows: spectral clustering is selected as the base clustering method with its scale parameter fixed, and the diversity of the base clustering models is ensured in three ways: 1) the similarity measure is either the Euclidean distance or the difference cosine distance; 2) different cluster numbers are set, taking every integer in the range [k_min, k_max]; 3) each combination of similarity measure and cluster number is run randomly several times, p times in total;
for the undirected-graph cut in spectral clustering, the NCut graph-cut method is selected to process the undirected weighted graph obtained from the similarity matrix, and the feature matrix obtained after dimensionality reduction in the graph-cut process is clustered by k-means.
CN202010699981.9A 2020-07-20 2020-07-20 Load curve integrated spectrum clustering method considering double-scale similarity Active CN111723876B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010699981.9A CN111723876B (en) 2020-07-20 2020-07-20 Load curve integrated spectrum clustering method considering double-scale similarity

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010699981.9A CN111723876B (en) 2020-07-20 2020-07-20 Load curve integrated spectrum clustering method considering double-scale similarity

Publications (2)

Publication Number Publication Date
CN111723876A CN111723876A (en) 2020-09-29
CN111723876B (en) 2021-09-28

Family

ID=72572914

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010699981.9A Active CN111723876B (en) 2020-07-20 2020-07-20 Load curve integrated spectrum clustering method considering double-scale similarity

Country Status (1)

Country Link
CN (1) CN111723876B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112819299A (en) * 2021-01-21 2021-05-18 上海电力大学 Differential K-means load clustering method based on center optimization
CN112837188A (en) * 2021-03-10 2021-05-25 科豆(福州)教育科技有限公司 Research and travel intelligent planning method based on transfer learning and clustering algorithm
CN113489008A (en) * 2021-09-07 2021-10-08 国网江西省电力有限公司电力科学研究院 Multi-type energy supply and utilization system equivalence method based on real-time dynamic correction
CN117131397A (en) * 2023-09-04 2023-11-28 北京航空航天大学 Load spectrum clustering method and system based on DTW distance
CN117236803B (en) * 2023-11-16 2024-01-23 中铁二十二局集团电气化工程有限公司 Traction substation grading and evaluating method, system and electronic equipment

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107657266A (en) * 2017-08-03 2018-02-02 华北电力大学(保定) A kind of load curve clustering method based on improvement spectrum multiple manifold cluster
CN108199404A (en) * 2017-12-22 2018-06-22 国网安徽省电力有限公司电力科学研究院 The spectral clustering assemblage classification method of high permeability distributed energy resource system
CN108805213A (en) * 2018-06-15 2018-11-13 山东大学 The electric load curve bilayer Spectral Clustering of meter and Wavelet Entropy dimensionality reduction

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9639642B2 (en) * 2013-10-09 2017-05-02 Fujitsu Limited Time series forecasting ensemble

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107657266A (en) * 2017-08-03 2018-02-02 华北电力大学(保定) A kind of load curve clustering method based on improvement spectrum multiple manifold cluster
CN108199404A (en) * 2017-12-22 2018-06-22 国网安徽省电力有限公司电力科学研究院 The spectral clustering assemblage classification method of high permeability distributed energy resource system
CN108805213A (en) * 2018-06-15 2018-11-13 山东大学 The electric load curve bilayer Spectral Clustering of meter and Wavelet Entropy dimensionality reduction

Also Published As

Publication number Publication date
CN111723876A (en) 2020-09-29

Similar Documents

Publication Publication Date Title
CN111723876B (en) Load curve integrated spectrum clustering method considering double-scale similarity
CN108805213B (en) Power load curve double-layer spectral clustering method considering wavelet entropy dimensionality reduction
CN110738232A (en) grid voltage out-of-limit cause diagnosis method based on data mining technology
CN110364264A (en) Medical data collection feature dimension reduction method based on sub-space learning
Liu et al. An unsupervised feature selection algorithm: Laplacian score combined with distance-based entropy measure
CN105046323A (en) Regularization-based RBF network multi-label classification method
Xueli et al. An improved KNN algorithm based on kernel methods and attribute reduction
CN113159160B (en) Semi-supervised node classification method based on node attention
CN110929761A (en) Balance method for collecting samples in situation awareness framework of intelligent system security system
CN104573726B (en) Facial image recognition method based on the quartering and each ingredient reconstructed error optimum combination
CN113780343A (en) Bilateral slope DTW distance load spectrum clustering method based on LTTB dimension reduction
CN117726939A (en) Hyperspectral image classification method based on multi-feature fusion
Mao et al. Naive Bayesian algorithm classification model with local attribute weighted based on KNN
Du et al. An Improved Algorithm Based on Fast Search and Find of Density Peak Clustering for High‐Dimensional Data
CN103761433A (en) Network service resource classifying method
CN114358207A (en) Improved k-means abnormal load detection method and system
Qin Software reliability prediction model based on PSO and SVM
Wang et al. Analysis of user’s power consumption behavior based on k-means
CN113159132A (en) Hypertension grading method based on multi-model fusion
Ding et al. Time-varying Gaussian Markov random fields learning for multivariate time series clustering
Feng Analysis on algorithm and application of cluster in data mining
CN113723835B (en) Water consumption evaluation method and terminal equipment for thermal power plant
CN103226710B (en) Based on the method for classifying modes differentiating linear expression
Wang et al. Research on the Urban Construction Status of Prefecture-Level Cities in Heilongjiang Province Based on SPSS Analysis
CN116452910B (en) scRNA-seq data characteristic representation and cell type identification method based on graph neural network

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant