CN115687955A

CN115687955A - Voting based resident user load curve clustering method and device

Info

Publication number: CN115687955A
Application number: CN202310000646.9A
Authority: CN
Inventors: 丁贵立; 韩威; 章彧涵; 许志浩; 王宗耀; 康兵; 张兴旺; 程巧; 戴永熙; 郑芯蕊; 杨勇; 曹昆峰
Original assignee: Nanchang Institute of Technology
Current assignee: Nanchang Institute of Technology
Priority date: 2023-01-03
Filing date: 2023-01-03
Publication date: 2023-02-03

Abstract

The invention belongs to the technical field of power load monitoring and discloses a voting-based method and device for clustering residential user load curves. The method reduces the dimensionality of high-dimensional data by fitting an ensemble tree model and determines the optimal number of clusters using the silhouette coefficient. A reference clustering algorithm is then selected according to the Calinski-Harabasz (CH) criterion, and the clustering results are finally unified and integrated through a consistency function matrix. The invention combines the advantages of the member clustering algorithms, improves clustering accuracy, clustering quality and robustness, and can accurately identify the energy-use characteristics of users.

Description

Voting based resident user load curve clustering method and device

Technical Field

The invention belongs to the technical field of power load monitoring, and particularly relates to a residential user load curve clustering method and device based on voting.

Background

Power consumers are an important component of the power system. Mining the intrinsic patterns in consumer load data in depth is of great significance for power system planning and operation, and has therefore attracted wide attention and research.

Ensemble clustering is an unsupervised learning method whose objective is to aggregate several different clustering results (called base clustering results) into a single clustering result by means of some combination method. The aim is for the ensemble to inherit the advantages of each clustering algorithm, through a suitable method or relation, and thereby obtain an effective clustering result.

The idea of the voting method is to make maximal use of the clustering algorithms' ability to discriminate between data. Each clustering algorithm partitions the data according to its own clustering result, but the results do not share a uniform set of class identifiers, so the voting method needs a unifying function that puts the class labels produced by different clustering algorithms into correspondence. Put simply, the procedure resembles voting in everyday life: voters vote for candidates, and the candidate with the most votes wins. Correspondingly, the samples correspond to the candidates, the clustering algorithms correspond to the voters, and the class label of each sample is obtained by voting.
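The voting idea can be illustrated with a minimal Python sketch. It assumes the class labels of the different algorithms have already been unified, and the function name `majority_vote` and the toy label lists are hypothetical, not taken from the patent.

```python
from collections import Counter


def majority_vote(labels_per_algorithm):
    """Return, for each sample, the label chosen by most algorithms.

    labels_per_algorithm holds one label list per clustering algorithm;
    the labels are assumed to be already aligned across algorithms.
    """
    n_samples = len(labels_per_algorithm[0])
    result = []
    for i in range(n_samples):
        votes = Counter(labels[i] for labels in labels_per_algorithm)
        # most_common(1) returns the label with the highest vote count;
        # on a tie, the label seen first wins.
        label, _count = votes.most_common(1)[0]
        result.append(label)
    return result


# Toy example: three "voters" (algorithms) labelling four samples.
votes = [
    [0, 0, 1, 2],
    [0, 1, 1, 2],
    [0, 0, 1, 1],
]
final_labels = majority_vote(votes)
```

The tie-breaking rule here (first label seen) is only a placeholder choice; a real ensemble would need an explicit rule for ties.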

Disclosure of Invention

In order to understand the electricity consumption behavior of residential users, formulate a scientific demand response regulation strategy, guide users to use electricity rationally, achieve peak shaving and valley filling, and realize rational allocation of power resources, the invention provides a voting-based residential user load curve clustering method and device. The method reduces the dimensionality of high-dimensional data by fitting an ensemble tree model, overcomes the correlation between clustering index dimensions by adopting the Mahalanobis distance, and thereby determines the effective number of clusters. A reference clustering algorithm is then selected according to the CH criterion, and the clustering results are finally unified and integrated through a consistency function matrix. The invention can be used to screen out power users with different electricity-use characteristics and to identify customers at different levels who are suitable for demand response activities. A grid company can then apply different demand response measures to users with different characteristics, which helps reduce activity costs, improve electricity-saving efficiency, guide users to use electricity scientifically and rationally, and achieve rational allocation of power resources.

The invention is realized by the following technical scheme. A resident user load curve clustering method based on voting comprises the following steps:

step 1, establishing an original sample data set, analyzing it by empirical mode decomposition, and constructing the overall features and local features of the user load curve data to obtain a feature data set;

step 2, fitting the feature data set with an ensemble tree model, extracting feature indexes of the feature data set to achieve dimensionality reduction, obtaining feature quantities that reflect the users' energy-use characteristics, and constructing a dimension-reduced data set;

step 3, determining one member clustering algorithm as the reference algorithm based on the Calinski-Harabasz (CH) index, and taking the clustering result of the reference algorithm as the reference clustering result;

step 3.1, evaluating the quality of the clustering result by the silhouette coefficient so as to select the optimal number of clusters;

step 3.2, clustering the dimension-reduced data set obtained in step 2 with each member clustering algorithm, using the optimal number of clusters obtained in step 3.1, to obtain clustering results;

step 3.3, measuring the validity of each member clustering algorithm's result by the Calinski-Harabasz index, and selecting the reference clustering algorithm;

step 4, constructing a consistency function and unifying the class labels of the different member clustering algorithms; dividing the samples into unanimous-vote samples and non-unanimous-vote samples, and outputting the clustering result.
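The final part of step 4, separating samples on which all member algorithms agree from samples on which they disagree, might look like the sketch below. It assumes the labels have already been unified. The steps above do not spell out how the disagreeing samples are finally assigned, so the sketch only separates the two groups, and the name `split_by_unanimity` is hypothetical.

```python
def split_by_unanimity(aligned_labels):
    """Separate samples with unanimous votes from contested samples.

    aligned_labels holds one label list per member algorithm, with class
    labels already unified. Returns two dicts keyed by sample index:
    unanimous maps index -> agreed label, contested maps index -> votes.
    """
    unanimous, contested = {}, {}
    n_samples = len(aligned_labels[0])
    for i in range(n_samples):
        votes = [labels[i] for labels in aligned_labels]
        if len(set(votes)) == 1:
            unanimous[i] = votes[0]
        else:
            contested[i] = votes
    return unanimous, contested


aligned = [
    [0, 0, 1, 2],
    [0, 1, 1, 2],
    [0, 0, 1, 1],
]
unanimous, contested = split_by_unanimity(aligned)
```

Contested samples could then be resolved by plurality vote or by distance to the class centers; either choice would be an assumption rather than something stated in the text.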

In step 1, missing electricity consumption values are fitted by the least squares method to complete the user load curve data, thereby constructing the original sample data set; the user load curve data comprise a daily electricity load curve, a monthly electricity load curve and an annual electricity load curve; the original sample data set is analyzed by empirical mode decomposition to construct the overall features and local features of the user load curve data.
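As an illustration of filling gaps by least squares, the sketch below fits a straight line to the known readings and evaluates it at the missing positions. The text does not state which model is fitted, so the straight-line model, the use of `None` for a missing reading, and the function name are all assumptions.

```python
def fill_missing_least_squares(readings):
    """Fill None entries using a straight line fitted by least squares.

    readings is a list of consumption values indexed by time step, with
    None marking a missing value. A linear model is an assumption; a
    polynomial or a local window could be used in the same way.
    """
    known = [(t, v) for t, v in enumerate(readings) if v is not None]
    if len(known) < 2:
        raise ValueError("need at least two known readings to fit a line")
    n = len(known)
    mean_t = sum(t for t, _ in known) / n
    mean_v = sum(v for _, v in known) / n
    # Closed-form solution of the normal equations for slope and intercept.
    s_tt = sum((t - mean_t) ** 2 for t, _ in known)
    s_tv = sum((t - mean_t) * (v - mean_v) for t, v in known)
    slope = s_tv / s_tt
    intercept = mean_v - slope * mean_t
    return [v if v is not None else slope * t + intercept
            for t, v in enumerate(readings)]


filled = fill_missing_least_squares([1.0, 2.0, None, 4.0, 5.0])
```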

The overall features comprise the average electricity consumption of residents, the standard deviation of the residents' electricity consumption series, the kurtosis of historical electricity consumption, and the long-term and short-term trends of residents' electricity consumption.
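Three of these overall features can be computed directly from a consumption series, as in the following sketch. It uses population statistics and reports excess kurtosis (fourth standardized moment minus 3); both conventions are assumptions, and the trend features, which the method derives through empirical mode decomposition, are left out.

```python
import statistics


def overall_features(consumption):
    """Compute mean, standard deviation and excess kurtosis of a series."""
    n = len(consumption)
    mean = statistics.fmean(consumption)
    std = statistics.pstdev(consumption)
    # Second and fourth central moments (population form).
    m2 = sum((x - mean) ** 2 for x in consumption) / n
    m4 = sum((x - mean) ** 4 for x in consumption) / n
    excess_kurtosis = m4 / m2 ** 2 - 3.0 if m2 > 0 else 0.0
    return {"mean": mean, "std": std, "excess_kurtosis": excess_kurtosis}


features = overall_features([1.0, 2.0, 3.0, 4.0, 5.0])
```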

The local features include approximate entropies and quantiles of periodicity and volatility of the time series.
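For reference, approximate entropy in its usual definition (template length `m`, tolerance `r`, Chebyshev distance between templates) and quartiles can be sketched as follows. The defaults `m=2` and `r = 0.2 × standard deviation` are common conventions and are assumptions here; the text does not give the parameters it uses.

```python
import math
import statistics


def _phi(series, m, r):
    """Average log-frequency of templates of length m matching within r."""
    n = len(series) - m + 1
    templates = [series[i:i + m] for i in range(n)]
    total = 0.0
    for a in templates:
        # Chebyshev distance; every template matches itself, so matches >= 1.
        matches = sum(
            1 for b in templates
            if max(abs(x - y) for x, y in zip(a, b)) <= r
        )
        total += math.log(matches / n)
    return total / n


def approximate_entropy(series, m=2, r=None):
    """Approximate entropy: low for regular series, higher for irregular ones."""
    if r is None:
        r = 0.2 * statistics.pstdev(series)
    return _phi(series, m, r) - _phi(series, m + 1, r)


periodic = [1.0, 2.0] * 10
apen_periodic = approximate_entropy(periodic)
apen_constant = approximate_entropy([3.0] * 12)
quartiles = statistics.quantiles([1.0, 2.0, 3.0, 4.0, 5.0], n=4)
```

A strictly periodic series gives a value close to zero, which is why approximate entropy is often used as a regularity measure for load curves.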

The optimal number of clusters is selected using the silhouette coefficient $S_i$, which is calculated as follows:

$$S_i = \frac{b_i - a_i}{\max(a_i, b_i)}$$

where $b_i$ is the minimum, over the other classes, of the average distance from sample $i$ to the samples belonging to that class, and $a_i$ is the average distance from sample $i$ to the other samples of the class to which it belongs, calculated as follows:

$$a_i = \frac{1}{n - 1} \sum_{j \neq i} \mathrm{dis}(i, j)$$

where $\mathrm{dis}(i, j)$ is the distance between sample $i$ and sample $j$ in the same class and $n$ is the number of samples. The distance is the Euclidean distance:

$$\mathrm{dis}(x, y) = \sqrt{\sum_{p} (x_p - y_p)^2}$$

where $x_p$ is the $p$-th index value of sample $x$ and $y_p$ is the $p$-th index value of sample $y$.

The Calinski-Harabasz index is calculated as follows:

$$CH(k) = \frac{SS_B / (k - 1)}{SS_W / (N - k)}$$

$$SS_B = \mathrm{tr}(B_k), \qquad SS_W = \mathrm{tr}(W_k)$$

$$B_k = \sum_{q=1}^{k} n_q (c_q - c_E)(c_q - c_E)^T$$

$$W_k = \sum_{q=1}^{k} \sum_{x \in C_q} (x - c_q)(x - c_q)^T$$

where $k$ is the number of clusters; $N$ is the number of samples in the dimension-reduced data set; $SS_B$ is the between-class variance and $SS_W$ is the within-class variance; $B_k$ is the between-class distance and $W_k$ is the within-class distance; $n_q$ is the number of data samples in class $q$; $c_q$ is the cluster center of class $q$; $c_E$ is the overall center of all classes; $C_q$ is the set of data samples in class $q$; and $T$ denotes the matrix transpose.

In step 4, the member clustering algorithm selected in step 3 is used as the reference algorithm, and the remaining member clustering algorithms are compared with it. Taking $C_1$ as the reference clustering algorithm and dividing the dimension-reduced data set into $k$ classes, a unified result matrix is built between the reference algorithm and each other member clustering algorithm $C_o$ ($o = 2, 3, 4, \ldots$):

$$S_{o1} = \begin{bmatrix} s_{11} & s_{12} & \cdots & s_{1k} \\ s_{21} & s_{22} & \cdots & s_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ s_{k1} & s_{k2} & \cdots & s_{kk} \end{bmatrix}$$

where $S_{o1}$ is the unified result matrix of the reference clustering algorithm $C_1$ and the member clustering algorithm $C_o$, and $s_{mw}$ is the number of samples shared by class $m$ of the reference clustering algorithm $C_1$ and class $w$ of the member clustering algorithm $C_o$.

The invention also provides a voting-based residential user load curve clustering device, comprising:

a data feature extraction module, which has a built-in empirical mode decomposition method and is used for extracting features from the original sample data set;

a data fitting module, which fits the feature data set with the ensemble tree model, extracts feature indexes of the feature data set to achieve dimensionality reduction, and obtains feature quantities that reflect the users' energy-use characteristics;

a reference clustering algorithm selection module, which determines one member clustering algorithm as the reference algorithm based on the Calinski-Harabasz (CH) index;

and a consistency unification module, which unifies the class labels of the different member clustering algorithms based on a consistency function.

The invention provides a nonvolatile computer storage medium, which stores computer executable instructions, wherein the computer executable instructions can execute the resident user load curve clustering method based on voting.

The present invention also provides a computer program product comprising a computer program stored on a non-volatile computer storage medium, the computer program comprising program instructions which, when executed by a computer, cause the computer to perform the above-mentioned voting-based resident user load curve clustering method.

The beneficial effects of the invention are as follows. The method is an important means of providing timely feedback on the relationship between power supply and demand, and is of great significance for improving the operating efficiency of the power system and maintaining its stable operation. Specifically, it can reduce users' electricity consumption during peak periods and guide users to run their equipment during off-peak (valley) periods; users can change their electricity-use patterns according to their own consumption characteristics; and the method can help grid enterprises establish a more complete service mechanism for demand response.

Considering the characteristics of user load curve data, and addressing the difficulty of achieving good clustering quality, clustering accuracy and robustness at the same time, the invention provides a voting-based ensemble clustering algorithm that combines the advantages of the member clustering algorithms. It improves clustering accuracy, clustering quality and robustness, and can accurately identify the energy-use characteristics of users. This helps the grid company send demand response invitations precisely to users who are likely to participate, saves cost, and effectively improves the outcome of demand response activities.

Drawings

FIG. 1 is a flow chart of the resident user load curve clustering method based on voting according to the invention.

Detailed Description

The invention is explained in more detail below with reference to the figures and examples.

Referring to fig. 1, a method for clustering a load curve of a resident user based on voting comprises the following steps:

step 1, establishing an original sample data set, analyzing the original sample data set through an empirical mode decomposition method, and constructing the overall characteristics and the local characteristics of user load curve data to obtain a characteristic data set;

Missing electricity consumption values are fitted by the least squares method to complete the user load curve data, thereby constructing the original sample data set. The user load curve data comprise a daily electricity load curve (sampled every 15 min by an HPLC smart electricity meter), a monthly electricity load curve and an annual electricity load curve. The original sample data set is analyzed by empirical mode decomposition to construct the overall features and local features of the user load curve data. The overall features include the average electricity consumption of residents, the standard deviation of the residents' electricity consumption series, the kurtosis of historical electricity consumption, the long-term and short-term trends of residents' electricity consumption, and so on. The local features include approximate entropies and quantiles of periodicity and volatility of the time series.

Step 2, fitting the feature data set based on the integrated tree model, extracting characteristic indexes of the feature data set to realize dimension reduction, obtaining feature quantity reflecting user energy features, and constructing a dimension reduction data set:

The feature data set is input to the ensemble tree model as the independent variables, with the label set to 1 for users who participate in demand response and 0 for users who do not, and the ensemble tree model is trained. Training is performed using a cross-validated grid search, and the ensemble tree model parameters are set as follows:

The information-gain evaluation criterion (criterion) is set to entropy, the maximum number of features considered when building a single tree (max_features) is set to 16, the minimum number of samples required to split a node (min_samples_split) is set to 6, the number of trees in the random forest (n_estimators) is set to 100, and the maximum depth of a single tree (max_depth) is set to 21. Fitting the ensemble tree model yields 6 representative electricity-use dimension indexes, whose physical meanings are shown in the following table:

When only the users' electricity consumption is known, steps 1 and 2 make it possible to mine more representative energy-use feature quantities, construct the user energy-use profile more accurately, and improve the analysis.
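The parameter names quoted in step 2 match those of scikit-learn's `RandomForestClassifier`, so the configuration could be written as in the sketch below. Using scikit-learn at all, and ranking features by `feature_importances_` to pick the 6 indexes, are assumptions; the description does not say how the indexes are extracted from the fitted model. The helper name `rank_features` is hypothetical.

```python
# Parameter values as stated in step 2; the keys follow scikit-learn naming.
RF_PARAMS = {
    "criterion": "entropy",
    "max_features": 16,
    "min_samples_split": 6,
    "n_estimators": 100,
    "max_depth": 21,
}


def rank_features(X, y, feature_names, top_n=6):
    """Fit a random forest and return the top_n most important feature names.

    X is the feature matrix (assumed to have at least 16 columns, since
    max_features is 16) and y is 1 for users who take part in demand
    response and 0 otherwise.
    """
    # Imported lazily so the parameter mapping above can be inspected
    # without scikit-learn installed.
    from sklearn.ensemble import RandomForestClassifier

    model = RandomForestClassifier(random_state=0, **RF_PARAMS)
    model.fit(X, y)
    ranked = sorted(
        zip(feature_names, model.feature_importances_),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return [name for name, _importance in ranked[:top_n]]
```

The cross-validated grid search mentioned in the text could be layered on top with scikit-learn's `GridSearchCV`, searching over candidate values for these same keys.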

Step 3.1, the quality of the clustering result is evaluated by the silhouette coefficient $S_i$ so as to select the optimal number of clusters:

$$S_i = \frac{b_i - a_i}{\max(a_i, b_i)}$$

where $b_i$ is the minimum, over the other classes, of the average distance from sample $i$ to the samples belonging to that class, and $a_i$ is the average distance from sample $i$ to the other samples of the class to which it belongs, calculated as follows:

$$a_i = \frac{1}{n - 1} \sum_{j \neq i} \mathrm{dis}(i, j)$$

$$\mathrm{dis}(x, y) = \sqrt{\sum_{p} (x_p - y_p)^2}$$

The distance used is the Euclidean distance, that is, the distance between two elements in Euclidean space, which is widely used to measure the dissimilarity of two elements because it is intuitive and highly interpretable. Here $n$ is the number of samples, $x_p$ is the $p$-th index value of sample $x$, and $y_p$ is the $p$-th index value of sample $y$.
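A plain-Python sketch of the silhouette coefficient with Euclidean distance is given below. The function names and the toy points are hypothetical, and assigning 0 to a sample that is alone in its class is a common convention rather than something stated in the text.

```python
import math


def euclidean(x, y):
    """Euclidean distance between two samples given as index-value tuples."""
    return math.sqrt(sum((xp - yp) ** 2 for xp, yp in zip(x, y)))


def silhouette_values(points, labels):
    """Return S_i = (b_i - a_i) / max(a_i, b_i) for every sample."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    values = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own or len(clusters) < 2:
            values.append(0.0)  # convention for singleton classes
            continue
        # a_i: mean distance to the other samples of the same class.
        a_i = sum(euclidean(points[i], points[j]) for j in own) / len(own)
        # b_i: smallest mean distance to the samples of any other class.
        b_i = min(
            sum(euclidean(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items() if other != lab
        )
        denom = max(a_i, b_i)
        values.append((b_i - a_i) / denom if denom > 0 else 0.0)
    return values


def mean_silhouette(points, labels):
    vals = silhouette_values(points, labels)
    return sum(vals) / len(vals)


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = mean_silhouette(points, [0, 0, 1, 1])
bad = mean_silhouette(points, [0, 1, 0, 1])
```

To select the number of clusters, one would run each member algorithm for several candidate values of k and keep the k with the highest mean silhouette.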

Step 3.3, the validity of each member clustering algorithm's result is measured by the Calinski-Harabasz (CH) index, and the reference clustering algorithm is selected. The CH index is a score computed from the between-class variance and the within-class variance; the larger the value, the more compact each class is internally and the more dispersed the classes are from one another, i.e. the better the clustering result.

$$CH(k) = \frac{SS_B / (k - 1)}{SS_W / (N - k)}$$

$$SS_B = \mathrm{tr}(B_k), \qquad SS_W = \mathrm{tr}(W_k)$$

$$B_k = \sum_{q=1}^{k} n_q (c_q - c_E)(c_q - c_E)^T$$

$$W_k = \sum_{q=1}^{k} \sum_{x \in C_q} (x - c_q)(x - c_q)^T$$

where $k$ is the number of clusters; $N$ is the number of samples in the dimension-reduced data set; $SS_B$ is the between-class variance and $SS_W$ is the within-class variance; $B_k$ is the between-class distance and $W_k$ is the within-class distance; $n_q$ is the number of data samples in class $q$; $c_q$ is the cluster center of class $q$; $c_E$ is the overall center of all classes; $C_q$ is the set of data samples in class $q$; and $T$ denotes the matrix transpose.
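The CH index can be computed without forming the matrices explicitly, because the trace of each scatter matrix equals a sum of squared distances. The sketch below uses that identity; the function name and the toy data are hypothetical.

```python
def calinski_harabasz(points, labels):
    """CH = (SS_B / (k - 1)) / (SS_W / (N - k)) for a labelled data set."""
    n_samples = len(points)
    dim = len(points[0])
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    k = len(clusters)
    if k < 2 or k >= n_samples:
        raise ValueError("CH index needs 2 <= k < number of samples")

    def centroid(rows):
        return [sum(r[d] for r in rows) / len(rows) for d in range(dim)]

    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    c_all = centroid(points)  # center of the whole data set
    ss_b = 0.0  # trace of the between-class scatter matrix
    ss_w = 0.0  # trace of the within-class scatter matrix
    for members in clusters.values():
        c_q = centroid(members)
        ss_b += len(members) * sqdist(c_q, c_all)
        ss_w += sum(sqdist(x, c_q) for x in members)
    if ss_w == 0:
        return float("inf")  # perfectly compact classes
    return (ss_b / (k - 1)) / (ss_w / (n_samples - k))


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
ch_good = calinski_harabasz(points, [0, 0, 1, 1])
ch_bad = calinski_harabasz(points, [0, 1, 0, 1])
```

Selecting the reference algorithm then amounts to computing this score for each member algorithm's labels and taking the one with the largest value.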

Step 4, the member clustering algorithm selected in step 3 is used as the reference algorithm, and the remaining member clustering algorithms are compared with it. Suppose $C_1$ is selected as the reference clustering algorithm and the dimension-reduced data set is divided into $k$ classes; a unified result matrix is built between it and each other member clustering algorithm $C_o$ ($o = 2, 3, 4, \ldots$):

$$S_{o1} = \begin{bmatrix} s_{11} & s_{12} & \cdots & s_{1k} \\ s_{21} & s_{22} & \cdots & s_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ s_{k1} & s_{k2} & \cdots & s_{kk} \end{bmatrix}$$

where $S_{o1}$ is the unified result matrix of the reference clustering algorithm $C_1$ and the member clustering algorithm $C_o$, and $s_{mw}$ is the number of samples shared by class $m$ of the reference clustering algorithm $C_1$ and class $w$ of the member clustering algorithm $C_o$. The column index of the maximum element in each row is taken as the class-matching label: if the $w$-th element of row $m$ is the maximum of that row, then class $m$ of the reference clustering algorithm $C_1$ and class $w$ of the member clustering algorithm $C_o$ are corresponding classes. In this way the class labels of the different clustering algorithms can be unified.
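The row-maximum matching rule can be sketched as follows, assuming labels are integers from 0 to k − 1. The function names are hypothetical. The sketch does not resolve the case where two reference classes pick the same member class; the text does not address that case either, and a one-to-one assignment method would be an additional assumption.

```python
def overlap_matrix(reference, member, k):
    """s[m][w] = number of samples in reference class m and member class w."""
    s = [[0] * k for _ in range(k)]
    for m, w in zip(reference, member):
        s[m][w] += 1
    return s


def align_labels(reference, member, k):
    """Relabel the member result so its classes use the reference labels."""
    s = overlap_matrix(reference, member, k)
    mapping = {}
    for m in range(k):
        # Column of the row maximum: member class w that best matches class m.
        w = max(range(k), key=lambda col: s[m][col])
        mapping[w] = m
    # Member classes that no row selected keep their original label.
    return [mapping.get(w, w) for w in member]


reference = [0, 0, 0, 1, 1, 1, 2, 2, 2]
member = [2, 2, 2, 0, 0, 1, 1, 1, 1]
matrix = overlap_matrix(reference, member, 3)
aligned = align_labels(reference, member, 3)
```

In this toy example the member algorithm's class 2 maps to reference class 0, class 0 to class 1, and class 1 to class 2, so after alignment the two results disagree on a single sample.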

Application example: the research data come from the residential smart energy-use service sample library accumulated during residential user demand response trials carried out in Jiangxi Province from 2019 to 2021, and include 96-point daily load data collected by HPLC (high-speed power line carrier) smart electricity meters. After screening, 1694 users were selected as the research objects, and ensemble clustering analysis was carried out on their 96-point daily load curve data (covering working days and rest days).

Processing the research data according to the step 1;

According to step 2, the feature indexes obtained by processing the 96-point daily load curves of the 1694 residential users are used as the input of each algorithm;

According to step 3, one clustering algorithm is selected as the reference algorithm and the remaining clustering algorithms are compared with it. The daily load data in the data set are clustered with each member clustering algorithm. Four member clustering algorithms are selected: k-means, grey wolf optimized fuzzy C-means (GWO-FCM), Gaussian fuzzy clustering and the self-organizing map (SOM) algorithm. Table 3 shows the silhouette coefficient and the total score of the clustering result when each member clustering algorithm runs independently with different numbers of clusters. When the number of clusters is 3, the total score over the 4 algorithms is the highest and all 4 algorithms obtain relatively high silhouette scores, so the optimal number of clusters is chosen as k = 3.

According to steps 3 and 4, Table 4 shows that, by the CH criterion, the 4 clustering algorithms perform fairly stably on the sample-library residential user data set. The feature indexes obtained by reducing the dimensionality of the 96-point daily load curves of the 1694 residential users are used as the input of each algorithm, and the member clustering algorithms and the voting ensemble algorithm are tested. The following table shows the CH values of the clustering results of the member clustering algorithms and of the voting ensemble.

Among the 4 member clustering algorithms, the traditional k-means algorithm and the self-organizing map (SOM) algorithm are more stable than the other two member clustering algorithms, and their clustering quality stays at a good level. Fuzzy C-means clustering optimized by the grey wolf algorithm (GWO-FCM) ranks first among the 4 member clustering algorithms in clustering score, but is less stable than k-means and SOM. With CH as the measure of clustering validity, the ensemble clustering algorithm stays near the top of the CH ranking on both working days and rest days. Compared with the average CH values of the k-means, GWO-FCM, Gaussian fuzzy and SOM algorithms, it improves by 34.61%, 7.38%, 57.72% and 24.30% respectively, an average improvement in clustering validity of about 31%. This verifies that the proposed voting ensemble algorithm significantly improves clustering validity on the residential data of the provincial sample library.
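The 31% figure is consistent with the arithmetic mean of the four per-algorithm improvements. That it was obtained this way is an inference from the numbers, not something the text states:

```latex
\frac{34.61\% + 7.38\% + 57.72\% + 24.30\%}{4}
  = \frac{124.01\%}{4}
  \approx 31.00\%
```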

The embodiment provides a resident user load curve clustering device based on voting, which comprises:

and the consistency unification module unifies different member clustering algorithm class labels based on a consistency function.

In still other embodiments, a non-transitory computer storage medium is provided, the computer storage medium storing computer-executable instructions that can perform the voting-based resident user load curve clustering method in any of the above embodiments.

The present embodiment also provides a computer program product comprising a computer program stored on a non-volatile computer storage medium, the computer program comprising program instructions which, when executed by a computer, cause the computer to execute the resident user load curve clustering method based on voting by the above-mentioned embodiment.

The present embodiment provides an electronic device, including: one or more processors, and a memory. The electronic device may further include: an input device and an output device. The processor, memory, input device, and output device may be connected by a bus or other means. The memory is the non-volatile computer-readable storage medium described above. The processor executes various functional applications and data processing of the server by running the nonvolatile software program, instructions and modules stored in the memory, namely, the voting-based resident user load curve clustering method in the above embodiment is realized. The input means may receive input numerical or character information and generate key signal inputs related to user settings and function control of the voting-based resident user load curve clustering method. The output device may include a display device such as a display screen.

As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein. The scheme in the embodiments of the application can be implemented in various computer languages, such as the object-oriented programming language Java and the interpreted scripting language JavaScript.

While the preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all alterations and modifications as fall within the scope of the application.

It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims

1. A resident user load curve clustering method based on voting is characterized by comprising the following steps:

step 3, determining one member clustering algorithm as a reference algorithm based on the Calinski-Harabasz index, and taking a clustering result of the reference algorithm as a reference clustering result;

2. A voting based resident user load curve clustering method according to claim 1, wherein in step 1, the user load curve data is supplemented by least square fitting of the missing power consumption, so as to construct an original sample data set; the user load curve data comprises a daily power load curve, a monthly power load curve and an annual power load curve; and analyzing the original sample data set by an empirical mode decomposition method, and constructing the overall characteristics and the local characteristics of the user load curve data.

3. A voting-based resident user load curve clustering method according to claim 2, wherein the overall features comprise the average electricity consumption of residents, the standard deviation of the residents' electricity consumption series, the kurtosis of historical electricity consumption, and the long-term and short-term trends of residents' electricity consumption.

4. A voting based resident user load curve clustering method according to claim 2, wherein the local features include approximate entropies and quantiles of periodicity and volatility of the time series.

5. A voting based resident user load curve clustering method according to claim 1, wherein the optimal number of clusters is selected using the silhouette coefficient $S_i$, which is calculated as follows:

$$S_i = \frac{b_i - a_i}{\max(a_i, b_i)}$$

6. A voting-based resident user load curve clustering method according to claim 1, wherein the Calinski-Harabasz index is calculated as follows:

$$CH(k) = \frac{SS_B / (k - 1)}{SS_W / (N - k)}$$

$$SS_B = \mathrm{tr}(B_k), \qquad SS_W = \mathrm{tr}(W_k)$$

$$B_k = \sum_{q=1}^{k} n_q (c_q - c_E)(c_q - c_E)^T$$

$$W_k = \sum_{q=1}^{k} \sum_{x \in C_q} (x - c_q)(x - c_q)^T$$

where $k$ is the number of clusters; $N$ is the number of samples in the dimension-reduced data set; $SS_B$ is the between-class variance and $SS_W$ is the within-class variance; $B_k$ is the between-class distance and $W_k$ is the within-class distance; $n_q$ is the number of data samples in class $q$; $c_q$ is the cluster center of class $q$; $c_E$ is the overall center of all classes; $C_q$ is the set of data samples in class $q$; and $T$ denotes the matrix transpose.

7. A voting based resident user load curve clustering method according to claim 1, wherein in step 4, the member clustering algorithm selected in step 3 is used as the reference algorithm, and the remaining member clustering algorithms are compared with the reference algorithm; taking $C_1$ as the reference clustering algorithm and dividing the dimension-reduced data set into $k$ classes, a unified result matrix is built between the reference algorithm and each other member clustering algorithm $C_o$:

$$S_{o1} = \begin{bmatrix} s_{11} & s_{12} & \cdots & s_{1k} \\ s_{21} & s_{22} & \cdots & s_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ s_{k1} & s_{k2} & \cdots & s_{kk} \end{bmatrix}$$

8. A resident user load curve clustering device based on voting is characterized by comprising the following components:

a reference clustering algorithm selecting module, which determines one member clustering algorithm as a reference algorithm based on the Calinski-Harabasz index;

9. A non-volatile computer storage medium, wherein the computer storage medium stores computer-executable instructions for performing the voting based residential user load curve clustering method according to any one of claims 1 to 7.

10. A computer program product, characterized in that the computer program product comprises a computer program stored on a non-volatile computer storage medium, the computer program comprising program instructions which, when executed by a computer, cause the computer to perform the method for clustering voting-based residential user load curves according to any one of claims 1 to 7.