CN110781332A - Clustering method of daily load curve of electric residential users based on compound clustering algorithm - Google Patents
- Publication number
- CN110781332A (application CN201910983879.9A)
- Authority
- CN
- China
- Prior art keywords
- clustering
- data
- algorithm
- daily load
- cluster
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/50—Information retrieval; Database structures therefor; File system structures therefor of still image data
- G06F16/55—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/06—Energy or water supply
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
A method for clustering the daily load curves of electric residential users based on a compound clustering algorithm. Daily load data of electric residential users are acquired as a data set matrix of P samples, each with Q time-point attributes. The daily load data are preprocessed to obtain an initial cluster; dimensionality reduction is applied to the initial cluster to obtain a reduced cluster; clustering algorithm 1 performs preliminary clustering on the reduced cluster to obtain initial cluster centers; clustering algorithm 2 clusters the initial cluster centers obtained by clustering algorithm 1, the clustering results are evaluated with a clustering validity index, and M cluster centers are finally obtained; these M cluster centers are then used as the initial cluster centers of clustering algorithm 2 to cluster the data, yielding user groups with similar behavior. The invention clusters large, scattered daily load data into user groups with similar behavior. By analyzing these user groups, power enterprise managers can better predict peaks and troughs in electricity consumption, which gives them a more reliable method for managing the power business.
Description
Technical Field
The invention relates to the technical field of electricity consumption by electric residential users, and in particular to a method for clustering the daily load curves of electric residential users based on a compound clustering algorithm.
Background Art
With the rapid development of the power industry and the spread of smart meters, it has become much easier to obtain the electricity consumption of residential users, and power companies now hold ever larger and more detailed consumption data.
Faced with this volume of data, existing data mining and analysis techniques can be used to find regularities in, and extract features from, users' daily load data, so that power companies can offer higher-quality supply services in line with electricity pricing policy. Segmenting residential consumption is an important part of such service. As the share of total load attributable to residential users keeps growing, analyzing users with a sound and efficient clustering algorithm helps power companies design more reasonable, personalized supply plans based on user characteristics, and so improves the user experience.
However, a single, unmodified clustering algorithm tends to be inefficient and to cluster poorly. K-means, for example, chooses its initial cluster centers at random, so on data sets with many samples its result easily falls into a local optimum. It also cannot determine the optimal number of clusters by itself: researchers must test candidate values one by one, which makes clustering inefficient. As a result, the latent patterns and consumption characteristics in the data are not well reflected, and the clustering gives power companies little useful support for segmenting residential users.
Summary of the Invention
The invention provides a method for clustering the daily load curves of electric residential users based on a compound clustering algorithm. Using data from smart meters that record electricity load in real time, the method clusters the load curves and thereby groups together users with the same consumption behavior.
The technical solution adopted by the invention is as follows:
A method for clustering the daily load curves of electric residential users based on a compound clustering algorithm comprises the following steps:
Step 1: Acquire the daily load data of electric residential users as a data set matrix of P samples, each with Q time-point attributes;
Step 2: Preprocess the daily load data to obtain an initial cluster;
Step 3: Apply dimensionality reduction to the initial cluster to obtain a reduced cluster;
Step 4: Use clustering algorithm 1 to perform preliminary clustering on the reduced cluster and obtain initial cluster centers;
Step 5: Use clustering algorithm 2 to cluster the initial cluster centers obtained by clustering algorithm 1, evaluate the clustering results with a clustering validity index, and finally obtain M cluster centers;
Step 6: Use the M cluster centers from Step 5 as the initial cluster centers of clustering algorithm 2, cluster the data to obtain user groups with similar behavior, and analyze the behavioral characteristics of those groups.
In Step 1, the daily load data set of electric residential users, with P samples and Q time-point attributes per sample, is specified as follows:
The P samples are residential users. Residential consumption is mainly affected by seasonal changes, temperature changes, income level, and the ownership rates of air conditioners and electric cookers, and different factors lead to different daily load curves. Each of the Q attributes is the power consumption recorded by the smart meter at one time point of the day, so the value of Q is determined by the meter's data-collection interval.
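As a rough illustration of how Q follows from the collection interval, the sketch below assumes evenly spaced readings that cover a full 24-hour day; the function name `points_per_day` is hypothetical and not taken from the source.

```python
def points_per_day(interval_minutes: int) -> int:
    """Number of time-point attributes Q in one daily load curve.

    Assumes (not stated in the source) evenly spaced readings over 24 hours.
    """
    minutes_per_day = 24 * 60
    if interval_minutes <= 0 or minutes_per_day % interval_minutes != 0:
        raise ValueError("interval must be a positive divisor of 1440 minutes")
    return minutes_per_day // interval_minutes


print(points_per_day(15))  # 96
print(points_per_day(60))  # 24
```

Under this assumption a meter reporting every 15 minutes yields a P x 96 matrix, and an hourly meter a P x 24 matrix.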
In Step 2, preprocessing comprises missing-value handling, data normalization, and data standardization:
Missing-value handling: data with many missing values are deleted, and data with few missing values are completed;
Data normalization: the original data are linearly rescaled to the range [0, 1];
Data standardization: the mean of each attribute is subtracted from that attribute, and the result is divided by the attribute's standard deviation.
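In Step 3, dimensionality reduction uses PCA (Principal Component Analysis); the resulting reduced cluster is a data set matrix of p samples, each with q attributes.
In Step 4, clustering algorithm 1 performs preliminary clustering on the reduced cluster to obtain user groups with similar behavior. Specifically, the Mean-shift algorithm groups the p samples in the data set into N classes, where N is a positive integer.
In Step 5, clustering algorithm 2 clusters the cluster centers obtained by clustering algorithm 1, and a clustering validity index is used to evaluate the results. Specifically, the K-means algorithm clusters the N cluster centers produced by Mean-shift once for every cluster count in [2, N]; each result is evaluated with the Calinski-Harabasz (CH) index, the result with the largest CH value is selected, and M cluster centers are finally obtained, where M is a positive integer in [2, N].
In Step 6, the M cluster centers are used as the initial cluster centers of the K-means algorithm to cluster every sample in the data set (that is, every user or every record), finally yielding users in M classes.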
The invention takes the daily load data of electric residential users as its object of analysis and processes them through several algorithmic stages: data preprocessing, dimensionality reduction, and feature clustering, where the feature clustering preferably combines the Mean-shift and K-means algorithms. Large, scattered daily load data are thereby clustered into user groups with similar behavior. By analyzing these groups, power enterprise managers can better predict peaks and troughs in electricity consumption, manage the power business more reliably, and provide better service to power customers.
Description of Drawings
FIG. 1 is a flow chart of Embodiment 1 of the method of the invention.
FIG. 2 is a flow chart of Embodiment 2 of the method of the invention.
Detailed Description of the Embodiments
Embodiment 1:
A method for clustering the daily load curves of electric residential users based on a compound clustering algorithm comprises the following steps:
Step 1: Acquire the daily load data of electric residential users as a data set matrix of P samples, each with Q time-point attributes;
Step 2: Preprocess the daily load data to obtain an initial cluster;
Step 3: Apply dimensionality reduction to the initial cluster to obtain a reduced cluster;
Step 4: Use clustering algorithm 1 to perform preliminary clustering on the reduced cluster and obtain N initial cluster centers;
Step 5: Use clustering algorithm 2 to cluster the initial cluster centers obtained by clustering algorithm 1, evaluate the clustering results with a clustering validity index, and finally obtain M cluster centers;
Step 6: Use the M cluster centers from Step 5 as the initial cluster centers of clustering algorithm 2, cluster the data to obtain user groups with similar behavior, and analyze the behavioral characteristics of those groups.
Embodiment 2:
A method for clustering the daily load curves of electric residential users based on a compound clustering algorithm comprises the following steps:
First, acquire the daily load data of electric residential users as a data set matrix of P samples, each with Q time-point attributes.
In general, the data set handled by a power grid company's marketing system contains tens of thousands of samples or more, each sample being one residential electricity user. With the spread of smart meters, collecting the daily load data of every residential user has become very easy.
Next, the acquired daily load data are preprocessed to obtain the initial cluster. In this embodiment, preprocessing comprises missing-value handling, data normalization, data standardization, and dimensionality reduction; after this processing, the initial cluster is a data set matrix of p samples, each with q attributes.
Missing-value handling deletes samples with few valid values and fills in the missing values of samples with many valid values. When attributes with few valid values are deleted, redundant attributes may be deleted at the same time.
If n samples are deleted, p = P - n samples remain. Missing values can be filled in several ways; in this application, the mean of the existing valid values is used as the fill value. Those skilled in the art may choose another imputation method as needed; the choice does not affect the subsequent analysis.
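A minimal NumPy sketch of this step follows, assuming missing readings are stored as NaN. The source gives no cut-off for "few valid values", so `max_missing_frac` is a hypothetical threshold, and "the mean of the existing valid values" is interpreted here as the mean of the sample's own valid readings; both are assumptions.

```python
import numpy as np


def handle_missing(X, max_missing_frac=0.2):
    """Drop samples with too many NaNs, then fill remaining NaNs with the row mean.

    max_missing_frac is a hypothetical threshold (the source does not give one).
    Filling with the sample's own mean is one reading of the source's wording.
    """
    X = np.asarray(X, dtype=float)
    missing_frac = np.isnan(X).mean(axis=1)            # share of missing readings per sample
    kept = X[missing_frac <= max_missing_frac].copy()  # p = P - n samples remain
    row_means = np.nanmean(kept, axis=1)               # mean of each sample's valid values
    rows, cols = np.where(np.isnan(kept))
    kept[rows, cols] = row_means[rows]                 # impute each gap with its row mean
    return kept
```

A column-wise (per time point) mean would be an equally plausible fill rule; swapping it in only changes the `axis` used for `np.nanmean` and the index used for filling.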
Data normalization linearly rescales the original data to the range [0, 1] using min-max normalization:
$$X_{norm} = \frac{X - X_{min}}{X_{max} - X_{min}}$$
This rescales the original data proportionally, where X_norm is the normalized data, X is the original data, and X_max and X_min are the maximum and minimum values of the original data set, respectively.
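The formula translates directly into NumPy. This sketch follows the text in using the global maximum and minimum of the whole data set; the constant-data guard is an added assumption, since the formula is undefined when X_max = X_min.

```python
import numpy as np


def min_max_normalize(X):
    """X_norm = (X - X_min) / (X_max - X_min) with the data set's global min and max."""
    X = np.asarray(X, dtype=float)
    x_min, x_max = X.min(), X.max()
    if x_max == x_min:
        return np.zeros_like(X)  # constant data: avoid dividing by zero (added guard)
    return (X - x_min) / (x_max - x_min)
```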
Data standardization subtracts from each attribute the mean of that attribute and then divides the result by the attribute's standard deviation. After normalization and standardization, the data of every attribute are centered at 0 with variance 1; that is, the sample data have zero mean and unit variance.
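The following sketch shows per-attribute (column-wise) z-score standardization in NumPy. Dividing by the standard deviation, not the variance, is what gives each attribute unit variance; the population standard deviation (NumPy's default `ddof=0`) and the handling of constant columns are assumptions.

```python
import numpy as np


def standardize(X):
    """Column-wise z-score: subtract each attribute's mean, divide by its standard deviation."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0)        # population standard deviation (ddof=0)
    std[std == 0] = 1.0        # constant attributes become all zeros instead of NaN
    return (X - mean) / std
```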
Dimensionality reduction applies PCA (Principal Component Analysis) to the data set to obtain the processed reduced cluster R.
If the data are reduced to q dimensions, the reduced cluster R is a data set matrix of p samples, each with q attributes.
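The source does not say how PCA is computed or how q is chosen. A minimal sketch, assuming PCA via the singular value decomposition of the centered p x Q matrix, is shown below; it also returns each retained component's share of the variance, which is one common (assumed) way to pick q. `sklearn.decomposition.PCA` is an off-the-shelf alternative.

```python
import numpy as np


def pca_reduce(X, q):
    """Project p x Q data onto its first q principal components.

    Returns the p x q reduced matrix R and the explained-variance ratio
    of each retained component (largest first).
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                      # PCA operates on centered data
    # Rows of Vt are the principal axes, sorted by decreasing singular value.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)
    R = Xc @ Vt[:q].T                            # coordinates on the first q axes
    return R, explained[:q]
```

The columns of R are mutually uncorrelated, so the reduced attributes carry no redundant linear information.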
A clustering algorithm is then applied to the data in data set R to obtain user groups with similar consumption behavior. Specifically: first, the Mean-shift algorithm groups the p samples in the data set into N classes, where N is a positive integer; next, the K-means algorithm clusters the N cluster centers produced by Mean-shift once for every cluster count in [2, N], each result is evaluated with the Calinski-Harabasz (CH) index, and the result with the largest CH value is selected, giving M cluster centers, where M is a positive integer in [2, N]; finally, the M cluster centers are used as the initial cluster centers of K-means to cluster every sample in the data set (that is, every user or every record), yielding users in M classes.
In this embodiment, the Mean-shift algorithm proceeds as follows:
First, pick any sample i in the data set, compute the mean-shift vector at that sample point, and move the current center point accordingly; then shift the window and recompute the probability density, repeating until the center converges to a local maximum of the probability density. Mean-shift then processes the next object in data set R.
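A minimal sketch of this procedure follows. The source specifies neither the kernel nor the window size, so a flat kernel with a user-chosen `bandwidth` is assumed, and merging converged modes closer than `bandwidth / 2` into one class is likewise an assumption. `sklearn.cluster.MeanShift` is a library alternative.

```python
import numpy as np


def mean_shift(R, bandwidth, tol=1e-6, max_iter=300):
    """Flat-kernel mean shift. Returns (class centers, class label per sample)."""
    R = np.asarray(R, dtype=float)
    modes = []
    for x in R:                                  # start a window at every sample i
        center = x.copy()
        for _ in range(max_iter):
            # Samples inside the window; never empty, since the window mean
            # always lies within one bandwidth of some sample.
            in_window = np.linalg.norm(R - center, axis=1) <= bandwidth
            new_center = R[in_window].mean(axis=0)   # center + mean-shift vector
            moved = np.linalg.norm(new_center - center)
            center = new_center
            if moved < tol:                      # reached a local density maximum
                break
        modes.append(center)
    modes = np.array(modes)
    # Merge modes that converged to (nearly) the same peak: one class per peak.
    centers = []
    for m in modes:
        if all(np.linalg.norm(m - c) > bandwidth / 2 for c in centers):
            centers.append(m)
    centers = np.array(centers)
    # Each sample belongs to the class whose center is nearest to its mode.
    dists = np.linalg.norm(modes[:, None, :] - centers[None, :, :], axis=2)
    return centers, dists.argmin(axis=1)
```

The number of returned centers plays the role of N. It depends strongly on `bandwidth`: a smaller window finds more density peaks.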
The more similar or identical the attribute values of the data in a class, the higher the sample density of that class. Each user group with similar behavior is called a class; the process yields multiple classes, each with its own central sample point, and the user groups are named class 1, class 2, ..., class N.
In this embodiment, the K-means algorithm proceeds as follows:
First step: denote the N central sample points as X = {x_1, x_2, ..., x_N}, and arbitrarily choose k points Y = {y_1, y_2, ..., y_k} from X as cluster centers, where k ∈ [2, N];
Second step: compute the distance from each point in X to each of the k cluster centers in Y, and assign the point to the class of the nearest center;
Third step: recompute each cluster center;
Fourth step: repeat the second and third steps until the cluster centers no longer move;
Fifth step: compute the Calinski-Harabasz (CH) index for this value of k;
Sixth step: repeat the first to fifth steps for every k from 2 to N; denote the k with the largest CH index as K and its cluster centers as Z = {Z_1, Z_2, ..., Z_K};
Seventh step: compute the distance from each point in data set R to each of the K cluster centers in Z, and assign the point to the class of the nearest center;
Eighth step: repeat the seventh and third steps until the cluster centers no longer move;
Ninth step: output the clustering result.
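The nine steps above can be sketched in NumPy as follows. The function names are hypothetical, Euclidean distance is assumed, and the "arbitrary" initial centers of the first step are drawn at random with a fixed seed. The scan stops at k = N - 1 rather than N: with k = N every center is its own cluster, so both tr W and g - h are zero and the CH index is undefined.

```python
import numpy as np


def lloyd(points, centers, max_iter=100):
    """Second to fourth steps: assign to the nearest center, recompute, repeat."""
    centers = np.array(centers, dtype=float)
    for _ in range(max_iter):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)  # nearest center (Euclidean, assumed)
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(len(centers))
        ])
        if np.allclose(new_centers, centers):  # centers no longer move
            break
        centers = new_centers
    return centers, labels


def ch_index(points, labels):
    """Calinski-Harabasz index: [tr B / (h - 1)] / [tr W / (g - h)]."""
    g, clusters = len(points), np.unique(labels)
    h = len(clusters)
    overall = points.mean(axis=0)
    tr_b = tr_w = 0.0
    for c in clusters:
        members = points[labels == c]
        centroid = members.mean(axis=0)
        tr_b += len(members) * np.sum((centroid - overall) ** 2)
        tr_w += np.sum((members - centroid) ** 2)
    return (tr_b / (h - 1)) / (tr_w / (g - h))


def composite_kmeans(R, ms_centers, seed=0):
    """Choose K over the Mean-shift centers by CH, then cluster R from those centers."""
    R = np.asarray(R, dtype=float)
    X = np.asarray(ms_centers, dtype=float)
    N = len(X)
    if N < 3:
        raise ValueError("need at least 3 Mean-shift centers to compare k values")
    rng = np.random.default_rng(seed)
    best_k, best_ch, best_Z = None, -np.inf, None
    for k in range(2, N):  # k = N skipped: CH is undefined there
        init = X[rng.choice(N, size=k, replace=False)]  # first step
        Z, labels = lloyd(X, init)                      # second to fourth steps
        if len(np.unique(labels)) < 2:
            continue
        ch = ch_index(X, labels)                        # fifth step
        if ch > best_ch:                                # sixth step
            best_k, best_ch, best_Z = k, ch, Z
    centers, labels = lloyd(R, best_Z)                  # seventh and eighth steps
    return best_k, centers, labels                      # ninth step
```

Because the CH scan runs only over the N Mean-shift centers, not all p samples, it stays cheap even when p is in the tens of thousands. Only the final pass touches the full data set R.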
In this embodiment, the Calinski-Harabasz (CH) index is computed as:
$$CH(h) = \frac{\operatorname{tr}B(h)/(h-1)}{\operatorname{tr}W(h)/(g-h)}$$
where g is the number of samples being clustered, h is the number of classes in the current clustering, trB(h) is the trace of the between-class dispersion matrix, and trW(h) is the trace of the within-class dispersion matrix. A larger CH means the classes are more compact internally and more widely separated from one another, i.e., a better clustering result.
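To make the trace terms concrete, the sketch below builds the between-class and within-class dispersion matrices explicitly and takes their traces. It follows the standard Calinski-Harabasz definition with g samples and h classes; the function name is hypothetical. `sklearn.metrics.calinski_harabasz_score` computes the same index.

```python
import numpy as np


def calinski_harabasz(points, labels):
    """CH(h) = [tr B(h) / (h - 1)] / [tr W(h) / (g - h)]."""
    points = np.asarray(points, dtype=float)
    labels = np.asarray(labels)
    g = len(points)                      # number of samples being clustered
    clusters = np.unique(labels)
    h = len(clusters)                    # number of classes in this clustering
    if not 1 < h < g:
        raise ValueError("CH needs 2 <= h <= g - 1")
    overall_mean = points.mean(axis=0)
    dim = points.shape[1]
    B = np.zeros((dim, dim))             # between-class dispersion matrix
    W = np.zeros((dim, dim))             # within-class dispersion matrix
    for c in clusters:
        members = points[labels == c]
        centroid = members.mean(axis=0)
        diff = (centroid - overall_mean)[:, None]
        B += len(members) * diff @ diff.T
        dev = members - centroid
        W += dev.T @ dev
    return (np.trace(B) / (h - 1)) / (np.trace(W) / (g - h))
```

For example, two tight, well-separated pairs such as {(0, 0), (0.1, 0)} and {(10, 10), (10.1, 10)} give tr B = 200 and tr W = 0.01, so CH = (200 / 1) / (0.01 / 2) = 40000.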
Claims (7)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910983879.9A CN110781332A (en) | 2019-10-16 | 2019-10-16 | Clustering method of daily load curve of electric residential users based on compound clustering algorithm |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910983879.9A CN110781332A (en) | 2019-10-16 | 2019-10-16 | Clustering method of daily load curve of electric residential users based on compound clustering algorithm |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110781332A true CN110781332A (en) | 2020-02-11 |
Family
ID=69385688
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910983879.9A Pending CN110781332A (en) | 2019-10-16 | 2019-10-16 | Clustering method of daily load curve of electric residential users based on compound clustering algorithm |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110781332A (en) |
Cited By (20)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111523794A (en) * | 2020-04-21 | 2020-08-11 | 国网四川省电力公司电力科学研究院 | Environment-friendly management and control measure response studying and judging method based on power utilization characteristics of pollution emission enterprises |
CN111612033A (en) * | 2020-04-15 | 2020-09-01 | 广东电网有限责任公司 | Distribution transformer fault diagnosis method based on gravity search and density peak clustering |
CN111724278A (en) * | 2020-06-11 | 2020-09-29 | 国网吉林省电力有限公司 | A fine classification method and system for power multi-load users |
CN111784093A (en) * | 2020-03-27 | 2020-10-16 | 国网浙江省电力有限公司 | An auxiliary judgment method for enterprise resumption of work based on power big data analysis |
CN111860634A (en) * | 2020-07-15 | 2020-10-30 | 国网福建省电力有限公司 | Load Clustering Method Based on OCHNN-Kmeans Algorithm |
CN111898857A (en) * | 2020-04-07 | 2020-11-06 | 沈阳工业大学 | BEMD and kmeans-based power user characteristic analysis method and system |
CN112270338A (en) * | 2020-09-27 | 2021-01-26 | 西安理工大学 | Power load curve clustering method |
CN112381137A (en) * | 2020-11-10 | 2021-02-19 | 重庆大学 | New energy power system reliability assessment method, device, equipment and storage medium |
CN112836769A (en) * | 2021-03-10 | 2021-05-25 | 广东电网有限责任公司电力调度控制中心 | Demand response user classification method and system based on principal component analysis |
CN113011702A (en) * | 2021-02-07 | 2021-06-22 | 国网浙江省电力有限公司金华供电公司 | User energy utilization characteristic mining method based on curve clustering algorithm |
CN113449793A (en) * | 2021-06-28 | 2021-09-28 | 国网北京市电力公司 | Method and device for determining power utilization state |
CN113515593A (en) * | 2021-04-23 | 2021-10-19 | 平安科技(深圳)有限公司 | Topic detection method and device based on clustering model and computer equipment |
CN113743977A (en) * | 2021-06-28 | 2021-12-03 | 国网上海市电力公司 | User behavior-based electricity consumption data feature extraction method and system |
CN113837311A (en) * | 2021-09-30 | 2021-12-24 | 南昌工程学院 | Resident customer clustering method and device based on demand response data |
CN114336651A (en) * | 2022-01-04 | 2022-04-12 | 国网四川省电力公司营销服务中心 | Power dispatching method and device based on peak clipping potential |
CN114386485A (en) * | 2021-12-21 | 2022-04-22 | 桂林航天工业学院 | A Stress Curve Clustering Method for Architectural Fiber Bragg Grating Stress Sensors |
CN115049021A (en) * | 2022-08-11 | 2022-09-13 | 江西合一云数据科技股份有限公司 | Data processing method and device applied to public cluster management and equipment thereof |
CN115221728A (en) * | 2022-08-09 | 2022-10-21 | 国网福建省电力有限公司 | A typical daily generation method of integrated energy system based on GMM clustering |
CN115687955A (en) * | 2023-01-03 | 2023-02-03 | 南昌工程学院 | Voting based resident user load curve clustering method and device |
CN115964650A (en) * | 2022-12-28 | 2023-04-14 | 福州大学 | K-medoids cluster analysis-based method for mining installed capacity of residential air conditioner |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20090142029A1 (en) * | 2007-12-03 | 2009-06-04 | Institute For Information Industry | Motion transition method and system for dynamic images |
CN103942606A (en) * | 2014-03-13 | 2014-07-23 | 国家电网公司 | Residential electricity consumption customer segmentation method based on fruit fly intelligent optimization algorithm |
CN104992297A (en) * | 2015-07-10 | 2015-10-21 | 国家电网公司 | Electricity fee collection risk assessment device based on big data platform clustering algorithm and method thereof |
CN109284851A (en) * | 2018-06-11 | 2019-01-29 | 西安交通大学 | A classification method of consumer electricity behavior suitable for demand-side response |
CN109522934A (en) * | 2018-10-22 | 2019-03-26 | 云南电网有限责任公司 | A kind of power consumer clustering method based on clustering algorithm |
2019
- 2019-10-16: CN201910983879.9A (CN110781332A), active, Pending
Cited By (26)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111784093A (en) * | 2020-03-27 | 2020-10-16 | State Grid Zhejiang Electric Power Co., Ltd. | An auxiliary judgment method for enterprise resumption of work based on power big data analysis |
CN111784093B (en) * | 2020-03-27 | 2023-07-11 | State Grid Zhejiang Electric Power Co., Ltd. | Auxiliary judgment method for enterprise resumption of work based on power big data analysis |
CN111898857A (en) * | 2020-04-07 | 2020-11-06 | Shenyang University of Technology | BEMD and k-means based power user characteristic analysis method and system |
CN111612033A (en) * | 2020-04-15 | 2020-09-01 | Guangdong Power Grid Co., Ltd. | Distribution transformer fault diagnosis method based on gravitational search and density peak clustering |
CN111523794A (en) * | 2020-04-21 | 2020-08-11 | Electric Power Research Institute of State Grid Sichuan Electric Power Company | Method for assessing the response to environmental management and control measures based on the electricity consumption characteristics of pollutant-emitting enterprises |
CN111724278A (en) * | 2020-06-11 | 2020-09-29 | State Grid Jilin Electric Power Co., Ltd. | Fine-grained classification method and system for multi-load power users |
CN111860634A (en) * | 2020-07-15 | 2020-10-30 | State Grid Fujian Electric Power Co., Ltd. | Load Clustering Method Based on OCHNN-Kmeans Algorithm |
CN112270338A (en) * | 2020-09-27 | 2021-01-26 | Xi'an University of Technology | Power load curve clustering method |
CN112381137B (en) * | 2020-11-10 | 2024-06-07 | Chongqing University | Reliability assessment method, device, equipment and storage medium for new energy power system |
CN112381137A (en) * | 2020-11-10 | 2021-02-19 | Chongqing University | New energy power system reliability assessment method, device, equipment and storage medium |
CN113011702A (en) * | 2021-02-07 | 2021-06-22 | Jinhua Power Supply Company of State Grid Zhejiang Electric Power Co., Ltd. | User energy use characteristic mining method based on curve clustering algorithm |
CN112836769A (en) * | 2021-03-10 | 2021-05-25 | Electric Power Dispatching and Control Center of Guangdong Power Grid Co., Ltd. | Demand response user classification method and system based on principal component analysis |
CN112836769B (en) * | 2021-03-10 | 2024-02-02 | Electric Power Dispatching and Control Center of Guangdong Power Grid Co., Ltd. | Demand response user classification method and system based on principal component analysis |
CN113515593A (en) * | 2021-04-23 | 2021-10-19 | Ping An Technology (Shenzhen) Co., Ltd. | Topic detection method and device based on clustering model, and computer equipment |
CN113743977A (en) * | 2021-06-28 | 2021-12-03 | State Grid Shanghai Municipal Electric Power Company | Electricity consumption data feature extraction method and system based on user behavior |
CN113449793A (en) * | 2021-06-28 | 2021-09-28 | State Grid Beijing Electric Power Company | Method and device for determining power consumption state |
CN113837311B (en) * | 2021-09-30 | 2023-10-10 | Nanchang Institute of Technology | A method and device for clustering residential customers based on demand response data |
CN113837311A (en) * | 2021-09-30 | 2021-12-24 | Nanchang Institute of Technology | Residential customer clustering method and device based on demand response data |
CN114386485A (en) * | 2021-12-21 | 2022-04-22 | Guilin University of Aerospace Technology | A Stress Curve Clustering Method for Architectural Fiber Bragg Grating Stress Sensors |
CN114336651A (en) * | 2022-01-04 | 2022-04-12 | Marketing Service Center of State Grid Sichuan Electric Power Company | Power dispatching method and device based on peak-clipping potential |
CN114336651B (en) * | 2022-01-04 | 2023-08-01 | Marketing Service Center of State Grid Sichuan Electric Power Company | Power scheduling method and device based on peak-clipping potential |
CN115221728A (en) * | 2022-08-09 | 2022-10-21 | State Grid Fujian Electric Power Co., Ltd. | Typical day generation method for integrated energy systems based on GMM clustering |
CN115049021A (en) * | 2022-08-11 | 2022-09-13 | Jiangxi Heyi Cloud Data Technology Co., Ltd. | Data processing method and device applied to public cluster management, and equipment thereof |
CN115964650A (en) * | 2022-12-28 | 2023-04-14 | Fuzhou University | K-medoids cluster analysis-based method for mining installed capacity of residential air conditioners |
CN115964650B (en) * | 2022-12-28 | 2025-05-09 | Fuzhou University | A method for mining installed capacity of residential air conditioners based on K-medoids cluster analysis |
CN115687955A (en) * | 2023-01-03 | 2023-02-03 | Nanchang Institute of Technology | Voting-based residential user load curve clustering method and device |
Similar Documents
Publication | Title |
---|---|
CN110781332A (en) | Clustering method of daily load curve of electric residential users based on compound clustering algorithm |
CN104063480B (en) | Load curve parallel clustering method based on power big data |
CN111724278A (en) | Fine-grained classification method and system for multi-load power users |
CN108805213B (en) | Power load curve double-layer spectral clustering method considering wavelet entropy dimensionality reduction |
CN112819299A (en) | Differential K-means load clustering method based on center optimization |
CN103440539B (en) | A method for processing user electricity data |
CN107133652A (en) | Electricity customer valuation method and system based on K-means clustering algorithm |
CN110163675A (en) | Real estate price evaluation method and system |
CN109558467B (en) | Method and system for identifying electricity user categories |
CN110738232A (en) | A method for diagnosing the causes of grid voltage over-limit based on data mining technology |
CN111553444A (en) | A load identification method based on non-intrusive load terminal data |
CN113591899A (en) | Power customer portrait recognition method and device, and terminal equipment |
CN111612228A (en) | Method for analyzing user electricity consumption behavior based on electricity consumption information |
CN116842405A (en) | Power load data clustering method, system, equipment and storage medium |
CN104850612B (en) | Distribution network user load characteristic classification method based on enhanced aggregation hierarchical clustering |
CN115526277A (en) | Daily load curve clustering method based on convolutional variational autoencoder |
CN108197425A (en) | Smart grid data parsing method based on non-negative matrix factorization |
CN114897097A (en) | Power consumer portrait method, device, equipment and medium |
CN116894744A (en) | Power grid user data analysis method based on improved k-means clustering algorithm |
CN111898857A (en) | BEMD and k-means based power user characteristic analysis method and system |
CN118861536A (en) | A method for processing abnormal power consumption behavior based on power consumption in substations |
CN111915116A (en) | A Classification Method of Electricity Residential Users Based on K-means Clustering |
CN110516849A (en) | A load classification result evaluation method based on typical daily load curve |
CN114611738A (en) | A Load Forecasting Method Based on User's Electricity Behavior Analysis |
Kumar et al. | A deep clustering framework for load pattern segmentation |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 2020-02-11 |