CN111191687A - Power communication data clustering method based on improved K-means algorithm - Google Patents

Power communication data clustering method based on improved K-means algorithm

Info

Publication number
CN111191687A
Authority
CN
China
Prior art keywords
initial
classification
distance
clustering
value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911286973.5A
Other languages
Chinese (zh)
Other versions
CN111191687B (en)
Inventor
刘晴
刘旭
汤玮
金海�
姜海
董武
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guizhou Power Grid Co Ltd
Original Assignee
Guizhou Power Grid Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guizhou Power Grid Co Ltd
Priority to CN201911286973.5A
Publication of CN111191687A
Application granted
Publication of CN111191687B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Energy or water supply

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Economics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Public Health (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Evolutionary Biology (AREA)
  • Water Supply & Treatment (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a power communication data clustering method based on an improved K-means algorithm, comprising the following steps: S101, normalizing the power communication data; S102, manually selecting an initial classification number K for the normalized data, determining an element distance matrix according to the K value, and determining K initial clustering centers; S103, selecting an element and determining the classification group to which it belongs by calculating the distance between the element and each initial clustering center; S104, updating the clustering center of each classification group to determine its actual clustering center; S105, repeating step S103 until the classification groups no longer change, thereby obtaining the classification of the power communication data. Compared with the traditional K-means clustering algorithm, the initial classification number K can be adjusted dynamically according to the clustering effect, and the initial elements can be selected more reasonably according to the element distance matrix, which improves both the clustering effect and the rationality of the classification and gives the method strong practicability.

Description

Power communication data clustering method based on improved K-means algorithm
Technical Field
The invention belongs to the technical field of power communication, and particularly relates to a power communication data clustering method based on an improved K-means algorithm.
Background
The power communication network holds a huge amount of redundant data, and processing that redundant data is an important part of power communication data management. Data clustering is a preliminary step of redundant data processing: the large volume of power communication data is classified, the type of redundant data in each class is analyzed according to the actual data in that class, and a suitable redundant data processing method is then chosen for each case.
The K-means algorithm is currently the main method for clustering power communication network data. The implementation flow of the traditional K-means algorithm is shown in FIG. 1 and mainly comprises the following steps:
(1) giving a K value and randomly selecting initial elements; the K value is the number of classes produced by the clustering. In the traditional K-means algorithm the classification number K is given manually, and the initial element of each initial class is selected manually from the elements to be clustered;
(2) judging element classification; the class to which each element belongs is determined one by one according to the distance between the element and each classification center position;
(3) updating the classification center positions; after each element has been classified, the positions of all classification centers are updated to account for the newly added element.
The K value and the initial elements are key factors in K-means clustering. In the traditional K-means algorithm both are given manually and lack scientific support, so the clustering effect is difficult to guarantee.
Disclosure of Invention
The technical problem solved by the invention is to overcome the defects of the prior art by providing a power communication data clustering method based on an improved K-means algorithm in which the initial classification number K and the initial clustering centers can be adjusted.
In order to solve this technical problem, the invention adopts the following technical scheme: a power communication data clustering method based on an improved K-means algorithm, comprising the following steps: S101, normalizing the power communication data; S102, manually selecting an initial classification number K for the normalized data, determining an element distance matrix according to the K value, and determining K initial clustering centers; S103, selecting an element and determining the classification group to which it belongs by calculating the distance between the element and each initial clustering center; S104, updating the clustering center of each classification group to determine its actual clustering center; and S105, repeating step S103 until the classification groups no longer change, thereby obtaining the classification of the power communication data.
Further, the method also includes:
S106, judging whether the initial classification number K meets the optimal classification value.
Preferably, normalizing the power communication data specifically means converting the character-type values, continuous values and discrete values in the power communication data into values that are easy to process;
the character-type values are converted as follows: each value of a character-type attribute in the power communication data is assigned a number, and the conversion formula can be expressed as:
[Formula (1): equation image not reproduced in the text]
in formula (1), $x_i$ and $\tilde{x}_i$ are the values of character-type attribute i of the power communication data before and after processing, respectively, and $Cha_1$, $Cha_2$, ... are the N character values of the attribute, each of which is converted into a corresponding number between 0 and 1 according to its value category;
the continuous values are converted as follows: the continuous values in the power communication data are processed by a normalization method, and the processing formula can be expressed as:
$$\tilde{x}_i = \frac{x_i - x_i^{\min}}{x_i^{\max} - x_i^{\min}} \tag{2}$$
in formula (2), $x_i$ and $\tilde{x}_i$ are the values of continuous attribute i of the power communication data before and after processing, respectively, and $x_i^{\max}$ and $x_i^{\min}$ are the upper and lower limits of the values of the continuous attribute.
Preferably, manually selecting an initial classification number K for the normalized data, determining an element distance matrix according to the K value, and determining K initial clustering centers specifically includes:
S1021, manually selecting an initial classification number K;
S1022, calculating the distance between every pair of elements according to the Euclidean distance formula:
$$d(x_i, x_j) = \sqrt{\sum_{m=1}^{M} (x_{i,m} - x_{j,m})^2} \tag{3}$$
assuming that the normalized power communication data to be analyzed has N items and each item has M attributes, in formula (3) $x_i$ denotes the i-th item of data, $x_{i,m}$ denotes the m-th attribute value of the i-th item of data, m is the dimension index, and $d(x_i, x_j)$ is the distance between data $x_i$ and data $x_j$;
S1023, obtaining an element distance matrix from the distances between the elements, and determining the average value of each row of the matrix, namely the average distance between the data corresponding to that row and all other data;
S1024, selecting the element with the largest average distance as the first initial clustering center, the remaining initial clustering centers being selected so that the average distance between each newly selected center and the already selected initial elements is maximal, namely:
$$\max_{i} \; \frac{1}{J} \sum_{j=1}^{J} d(x_i, x_{t_j}) \tag{4}$$
in formula (4), J is the number of initial elements already selected and $x_{t_j}$ denotes the j-th of them; initial elements are added one by one until their total number equals the initial classification number K. Letting t denote an initial clustering center, its position is $(x_{t,1}, x_{t,2}, \ldots, x_{t,M})$.
Preferably, selecting an element and determining the classification group to which it belongs by calculating the distance between the element and each initial clustering center specifically includes: calculating the distance between the selected element and each initial clustering center by formula (5), namely:
$$d(x_i, x_t) = \sqrt{\sum_{m=1}^{M} (x_{i,m} - x_{t,m})^2} \tag{5}$$
in formula (5), $d(x_i, x_t)$ is the distance between element i and initial clustering center t; according to its distance to each initial clustering center, element i is assigned to the class with the smallest distance.
Preferably, updating the clustering center of each classification group and determining its actual clustering center specifically includes: when an element is added, the update formula for the j-th attribute value of the center position of the classification group can be expressed as:
$$x_{t,j} = \frac{(N_t - 1)\,x_{t,j}' + x_{i,j}}{N_t} \tag{6}$$
in formula (6), $x_{t,j}$ is the j-th attribute value of the actual clustering center t' of the classification group after the element is added, $x_{t,j}'$ is the j-th attribute value of the actual clustering center t' before the element is added, $N_t$ is the number of elements in the group after the element is added, and $x_{i,j}$ is the j-th attribute value of the added element.
Preferably, judging whether the initial classification number K meets the optimal classification value specifically includes:
S1061, calculating the distance between the actual clustering centers t' according to the Euclidean distance formula, namely:
$$d(x_{t1}, x_{t2}) = \sqrt{\sum_{m=1}^{M} (x_{t1,m} - x_{t2,m})^2} \tag{7}$$
in formula (7), $d(x_{t1}, x_{t2})$ is the inter-class distance between actual clustering center t1 and actual clustering center t2;
S1062, calculating the minimum of the inter-class distances among all actual clustering centers t', namely the minimum inter-class distance $TD_{\min}$;
S1063, calculating the average of the inter-class distances among all actual clustering centers t', namely the average inter-class distance $TD_{ave}$;
S1064, calculating the maximum of the distances between elements in the same class, namely the maximum intra-class distance $ITD_{\max}$;
S1065, judging whether the minimum inter-class distance $TD_{\min}$ is much smaller than the average inter-class distance $TD_{ave}$; if so, returning to step S102, otherwise executing step S1066;
S1066, judging whether the maximum intra-class distance $ITD_{\max}$ is much larger than the average inter-class distance $TD_{ave}$; if so, returning to step S102, otherwise executing step S1067;
S1067, the initial classification number K meets the optimal classification value, and the classification of the power communication data is output.
Preferably, the converted character-type values, continuous values and discrete values all lie between 0 and 1.
Compared with the prior art, the invention has the following beneficial effects:
the invention provides a power communication data clustering method based on an improved K-means algorithm which, on the basis of the traditional K-means clustering algorithm, improves on the manually given initial classification number K and initial clustering centers: the initial classification number K can be adjusted dynamically according to the clustering effect, which improves the clustering effect, and the initial elements can be selected more reasonably according to the element distance matrix, which improves the rationality of the classification.
Drawings
The present invention is described in further detail below with reference to the accompanying drawings:
FIG. 1 is a flow chart of the traditional K-means algorithm;
FIG. 2 is a schematic flow chart of a power communication data clustering method based on an improved K-means algorithm according to a first embodiment of the present invention;
FIG. 3 is a schematic flow chart of a power communication data clustering method based on an improved K-means algorithm according to a second embodiment of the present invention;
FIG. 4 is a schematic flow chart of a power communication data clustering method based on an improved K-means algorithm according to a third embodiment of the present invention;
FIG. 5 is a schematic flow chart of a power communication data clustering method based on an improved K-means algorithm according to a fourth embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some embodiments, but not all embodiments, of the present invention; all other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
FIG. 2 is a schematic flow chart of a power communication data clustering method based on an improved K-means algorithm according to a first embodiment of the present invention. As shown in FIG. 2, the method includes:
S101, normalizing the power communication data;
S102, manually selecting an initial classification number K for the normalized data, determining an element distance matrix according to the K value, and determining K initial clustering centers;
S103, selecting an element and determining the classification group to which it belongs by calculating the distance between the element and each initial clustering center;
S104, updating the clustering center of each classification group to determine its actual clustering center;
S105, repeating step S103 until the classification groups no longer change, thereby obtaining the classification of the power communication data.
Specifically, this embodiment improves on the manually given initial classification number K and initial clustering centers of the traditional K-means clustering algorithm. An element distance matrix is determined for the given initial classification number K, and the element with the largest average distance is selected as the first initial clustering center so that the initial elements are well dispersed; the remaining initial clustering centers are then selected according to the element distance matrix. This makes the choice of initial centers more reasonable and improves both the clustering effect and the rationality of the classification.
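Steps S103 to S105 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the names `euclidean` and `cluster` and the `max_iter` safeguard are hypothetical, and for brevity the centers are recomputed as class means after each full pass over the data rather than element by element.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length attribute vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def cluster(data, initial_centers, max_iter=100):
    """Assign elements to the nearest center and update centers until stable.

    data            -- list of normalized attribute vectors
    initial_centers -- list of K starting center positions
    max_iter        -- assumed safety limit, not part of the source text
    """
    centers = [list(c) for c in initial_centers]
    labels = None
    for _ in range(max_iter):
        # S103: each element joins the class whose center is closest.
        new_labels = [
            min(range(len(centers)), key=lambda t: euclidean(x, centers[t]))
            for x in data
        ]
        # S105: stop when the classification groups no longer change.
        if new_labels == labels:
            break
        labels = new_labels
        # S104: the actual center is the per-attribute mean of the members.
        for t in range(len(centers)):
            members = [x for x, lab in zip(data, labels) if lab == t]
            if members:
                centers[t] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers
```

Calling `cluster(data, centers)` returns one class index per element together with the final center positions.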
FIG. 3 is a schematic flow chart of a power communication data clustering method based on an improved K-means algorithm according to a second embodiment of the present invention. As shown in FIG. 3, on the basis of the first embodiment, the method further includes:
S106, judging whether the initial classification number K meets the optimal classification value.
In this embodiment, the manually selected initial classification number K can be adjusted dynamically according to the clustering effect, which improves the clustering effect and the rationality of the clustering and, in turn, the efficiency of processing redundant data in the power communication network.
Further, in step S101, normalizing the power communication data specifically means converting the character-type values, continuous values and discrete values in the power communication data into values that are easy to process; after conversion, the character-type values, continuous values and discrete values all lie between 0 and 1.
The character-type values are converted as follows: the range of character values can be obtained by counting the character-type values; without loss of generality, each value of a character-type attribute in the power communication data is assigned a number, and the conversion formula can be expressed as:
[Formula (1): equation image not reproduced in the text]
in formula (1), $x_i$ and $\tilde{x}_i$ are the values of character-type attribute i of the power communication data before and after processing, respectively, and $Cha_1$, $Cha_2$, ... are the N character values of the attribute, each of which is converted into a corresponding number between 0 and 1 according to its value category;
the continuous values are converted as follows: the continuous values in the power communication data are processed by a normalization method, and the processing formula can be expressed as:
$$\tilde{x}_i = \frac{x_i - x_i^{\min}}{x_i^{\max} - x_i^{\min}} \tag{2}$$
in formula (2), $x_i$ and $\tilde{x}_i$ are the values of continuous attribute i of the power communication data before and after processing, respectively, and $x_i^{\max}$ and $x_i^{\min}$ are the upper and lower limits of the values of the continuous attribute;
discrete values are processed in a similar way to character-type values: they are likewise converted according to the possible values of the attribute.
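The normalization in step S101 can be illustrated with a short Python sketch. Because formula (1) is not reproduced in the text, the even spacing of character values over [0, 1] used here is an assumption, as are the function names and the example category names.

```python
def encode_character_value(value, categories):
    """Map a character-type value to a number in [0, 1].

    Assumption: the N known values are spaced evenly over [0, 1] in the
    order given; the source does not state the exact mapping.
    """
    if len(categories) == 1:
        return 0.0
    return categories.index(value) / (len(categories) - 1)


def normalize_continuous(value, lower, upper):
    """Scale a continuous value into [0, 1] using its lower and upper limits."""
    if upper == lower:
        return 0.0  # degenerate range: every value maps to the same point
    return (value - lower) / (upper - lower)


# Hypothetical example attributes, not taken from the source.
link_types = ["fiber", "microwave", "carrier"]
record = [
    encode_character_value("microwave", link_types),
    normalize_continuous(50.0, 0.0, 200.0),
]
```

Discrete attributes could reuse `encode_character_value` by passing the list of possible discrete values as `categories`.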
FIG. 4 is a schematic flow chart of a power communication data clustering method based on an improved K-means algorithm according to a third embodiment of the present invention. As shown in FIG. 4, on the basis of the second embodiment, manually selecting an initial classification number K for the normalized data, determining an element distance matrix according to the K value, and determining K initial clustering centers specifically includes:
S1021, manually selecting an initial classification number K;
S1022, calculating the distance between every pair of elements according to the Euclidean distance formula:
$$d(x_i, x_j) = \sqrt{\sum_{m=1}^{M} (x_{i,m} - x_{j,m})^2} \tag{3}$$
assuming that the normalized power communication data to be analyzed has N items and each item has M attributes, in formula (3) $x_i$ denotes the i-th item of data, $x_{i,m}$ denotes the m-th attribute value of the i-th item of data, and m is the dimension index; the distance between data is defined as the Euclidean distance over the attribute values, so that $d(x_i, x_j)$ is the distance between data $x_i$ and data $x_j$;
S1023, obtaining an element distance matrix from the distances between the elements, the matrix being of order N×N with the element in row i and column j equal to the distance $d(x_i, x_j)$ between data $x_i$ and data $x_j$, and determining the average value of each row of the matrix, namely the average distance between the data corresponding to that row and all other data;
S1024, selecting the element with the largest average distance as the first initial clustering center, the remaining initial clustering centers being selected so that the average distance between each newly selected center and the already selected initial elements is maximal, namely:
$$\max_{i} \; \frac{1}{J} \sum_{j=1}^{J} d(x_i, x_{t_j}) \tag{4}$$
in formula (4), J is the number of initial elements already selected and $x_{t_j}$ denotes the j-th of them; initial elements are added one by one until their total number equals the initial classification number K. The initial clustering centers obtained in this way have the largest average distance, which is most favorable for clustering. The classification center positions are determined from the selected initial clustering centers, the initial classification center position being the attribute values of the corresponding initial element; letting t denote an initial clustering center, its position is $(x_{t,1}, x_{t,2}, \ldots, x_{t,M})$.
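Steps S1022 to S1024 can be sketched as follows. This is one reading of the selection rule, written as a greedy loop; the function names are hypothetical, and ties between candidates are broken by the lowest index, which the source does not specify.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length attribute vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def distance_matrix(data):
    """N x N matrix whose entry [i][j] is the distance between items i and j."""
    return [[euclidean(a, b) for b in data] for a in data]


def select_initial_centers(data, k):
    """Return the indices of K initial clustering centers."""
    n = len(data)
    if n < 2 or not 1 <= k <= n:
        raise ValueError("need at least two items and 1 <= k <= len(data)")
    dist = distance_matrix(data)
    # Row average = mean distance from one item to all other items
    # (the diagonal entry is zero, so it does not affect the sum).
    row_avg = [sum(row) / (n - 1) for row in dist]
    chosen = [max(range(n), key=row_avg.__getitem__)]
    while len(chosen) < k:
        candidates = [i for i in range(n) if i not in chosen]
        # Pick the candidate farthest, on average, from the chosen elements.
        best = max(
            candidates,
            key=lambda i: sum(dist[i][t] for t in chosen) / len(chosen),
        )
        chosen.append(best)
    return chosen
```

The attribute values of the returned items, `[data[i] for i in select_initial_centers(data, k)]`, serve as the initial center positions.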
Further, in step S103, selecting an element and determining the classification group to which it belongs by calculating the distance between the element and each initial clustering center specifically includes the following. The distance between an element and a class is defined as the Euclidean distance between the element and the initial clustering center of that class, and the distance between the selected element and each initial clustering center is calculated, namely:
$$d(x_i, x_t) = \sqrt{\sum_{m=1}^{M} (x_{i,m} - x_{t,m})^2} \tag{5}$$
in formula (5), $d(x_i, x_t)$ is the distance between element i and initial clustering center t; according to its distance to each initial clustering center, the element is assigned to the class with the smallest distance.
Further, in step S104, the clustering center of each classification group is updated to determine its actual clustering center, where the actual clustering center is the average, attribute by attribute, of all the elements belonging to the class. Specifically, when an element is added, the update formula for the j-th attribute value of the center position of the classification group can be expressed as:
$$x_{t,j} = \frac{(N_t - 1)\,x_{t,j}' + x_{i,j}}{N_t} \tag{6}$$
in formula (6), $x_{t,j}$ is the j-th attribute value of the actual clustering center t' of the classification group after the element is added, $x_{t,j}'$ is the j-th attribute value of the actual clustering center t' before the element is added, $N_t$ is the number of elements in the group after the element is added, and $x_{i,j}$ is the j-th attribute value of the added element.
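The assignment rule of step S103 and the running-mean update of step S104 can be sketched together. The function names are hypothetical; the update assumes that the center stored for a group is the mean of its current members, so that adding one element only needs the previous center and the member count.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length attribute vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def nearest_center(element, centers):
    """Index of the center with the smallest distance to the element."""
    return min(range(len(centers)), key=lambda t: euclidean(element, centers[t]))


def add_to_center(center, count_before, element):
    """Update a group's center after one element joins the group.

    center       -- center position before the element is added
    count_before -- number of elements in the group before the addition
    Returns the new center and the new element count N_t.
    """
    n_t = count_before + 1
    new_center = [((n_t - 1) * c + x) / n_t for c, x in zip(center, element)]
    return new_center, n_t
```

Because each update is a running mean, the result equals the plain average of all members, which matches the definition of the actual clustering center given above.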
FIG. 5 is a schematic flow chart of a power communication data clustering method based on an improved K-means algorithm according to a fourth embodiment of the present invention. As shown in FIG. 5, on the basis of the third embodiment, judging whether the initial classification number K meets the optimal classification value specifically includes:
S1061, calculating the distance between the actual clustering centers t' according to the Euclidean distance formula, namely:
$$d(x_{t1}, x_{t2}) = \sqrt{\sum_{m=1}^{M} (x_{t1,m} - x_{t2,m})^2} \tag{7}$$
in formula (7), $d(x_{t1}, x_{t2})$ is the inter-class distance between actual clustering center t1 and actual clustering center t2;
S1062, calculating the minimum of the inter-class distances among all actual clustering centers t', namely the minimum inter-class distance $TD_{\min}$;
S1063, calculating the average of the inter-class distances among all actual clustering centers t', namely the average inter-class distance $TD_{ave}$;
S1064, calculating the maximum of the distances between elements in the same class, namely the maximum intra-class distance $ITD_{\max}$;
S1065, judging whether the minimum inter-class distance $TD_{\min}$ is much smaller than the average inter-class distance $TD_{ave}$; if so, returning to step S102, otherwise executing step S1066;
S1066, judging whether the maximum intra-class distance $ITD_{\max}$ is much larger than the average inter-class distance $TD_{ave}$; if so, returning to step S102, otherwise executing step S1067;
S1067, the initial classification number K meets the optimal classification value, and the classification of the power communication data is output.
Specifically, if the manually selected initial classification number K is too large, the classification is finer than actually required, and the minimum inter-class distance $TD_{\min}$ is much smaller than the average inter-class distance $TD_{ave}$; conversely, if the initial classification number K is too small, the classification is insufficient and the actual requirement is not met, and the maximum intra-class distance $ITD_{\max}$ of some group is much larger than the average inter-class distance $TD_{ave}$.
If the following relationship holds:
$$TD_{\min} < mm \cdot TD_{ave} \tag{8}$$
where mm in formula (8) is a given small number that can generally be taken as 0.2, the K value is considered too large; K-1 replaces the original K value and the method returns to step S102 to cluster again;
if the following relationship holds:
$$ITD_{\max} > MM \cdot TD_{ave} \tag{9}$$
where MM in formula (9) is a given large number that can generally be taken as 8, the K value is considered too small; K+1 replaces the original K value and the method returns to step S102 to cluster again.
When neither relation (8) nor relation (9) holds, the initial classification number K is considered reasonable, and the result is output.
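The check in step S106 can be sketched as a function that proposes the next K value. The function name `suggest_k` is hypothetical, and two points are assumptions: the maximum intra-class distance is taken as the largest pairwise distance between two elements of the same group, and the comparison in relation (8) is read as "less than", in line with the "much smaller" wording in the text.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length attribute vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def suggest_k(centers, groups, k, mm=0.2, MM=8.0):
    """Return k-1, k+1 or k depending on relations (8) and (9).

    centers -- actual clustering centers, one per group
    groups  -- list of groups, each a list of member attribute vectors
    mm, MM  -- thresholds; 0.2 and 8 are the typical values in the text
    """
    inter = [euclidean(a, b) for a, b in combinations(centers, 2)]
    if not inter:
        return k  # a single class has no inter-class distance to compare
    td_min = min(inter)
    td_ave = sum(inter) / len(inter)
    # Assumed definition: largest distance between two members of one group.
    itd_max = max(
        (euclidean(a, b) for g in groups for a, b in combinations(g, 2)),
        default=0.0,
    )
    if td_min < mm * td_ave:
        return k - 1  # two centers nearly coincide: K is too large
    if itd_max > MM * td_ave:
        return k + 1  # some group is very spread out: K is too small
    return k
```

A driver loop would re-run steps S102 to S105 with the returned value until `suggest_k` returns the K it was given.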
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (8)

1. A power communication data clustering method based on an improved K-means algorithm, characterized in that the method comprises the following steps:
S101, normalizing the power communication data;
S102, manually selecting an initial classification number K for the normalized data, determining an element distance matrix according to the K value, and determining K initial clustering centers;
S103, selecting an element and determining the classification group to which it belongs by calculating the distance between the element and each initial clustering center;
S104, updating the clustering center of each classification group to determine its actual clustering center;
S105, repeating step S103 until the classification groups no longer change, thereby obtaining the classification of the power communication data.
2. The power communication data clustering method based on the improved K-means algorithm as claimed in claim 1, characterized by further comprising:
S106, judging whether the initial classification number K meets the optimal classification value.
3. The power communication data clustering method based on the improved K-means algorithm as claimed in claim 1, characterized in that: normalizing the power communication data specifically means converting the character-type values, continuous values and discrete values in the power communication data into values that are easy to process;
the character-type values are converted as follows: each value of a character-type attribute in the power communication data is assigned a number, and the conversion formula can be expressed as:
[Formula (1): equation image not reproduced in the text]
in formula (1), $x_i$ and $\tilde{x}_i$ are the values of character-type attribute i of the power communication data before and after processing, respectively, and $Cha_1$, $Cha_2$, ... are the N character values of the attribute, each of which is converted into a corresponding number between 0 and 1 according to its value category;
the continuous values are converted as follows: the continuous values in the power communication data are processed by a normalization method, and the processing formula can be expressed as:
$$\tilde{x}_i = \frac{x_i - x_i^{\min}}{x_i^{\max} - x_i^{\min}} \tag{2}$$
in formula (2), $x_i$ and $\tilde{x}_i$ are the values of continuous attribute i of the power communication data before and after processing, respectively, and $x_i^{\max}$ and $x_i^{\min}$ are the upper and lower limits of the values of the continuous attribute.
4. The power communication data clustering method based on the improved K-means algorithm as claimed in claim 1, wherein: manually selecting the initial classification number K for the normalized data, determining the element distance matrix according to the K value, and determining the K initial clustering centers specifically comprises the following steps:
S1021, manually selecting an initial classification number K;
S1022, calculating the distance between every pair of elements according to the Euclidean distance formula:
Figure FDA0002318271650000023
assuming that the power communication data to be analyzed after normalization contain N items, each with M attributes; in formula (3), $x_i$ denotes the i-th item of data, $x_{i,j}$ denotes the j-th attribute value of the i-th item of data, m denotes the dimension, and $d(x_i, x_j)$ denotes the distance between data $x_i$ and data $x_j$;
S1023, obtaining the element distance matrix from the distances between the elements, and determining the average value of each row of the matrix, namely the average distance between the data item corresponding to that row and all other data items;
S1024, selecting the element with the maximum average distance as the first initial clustering center, and selecting each remaining initial clustering center so that its average distance to the already selected initial elements is maximal, namely:
Figure FDA0002318271650000024
in formula (4), J is the number of initial elements already selected, which increases one by one until the total number of initial elements equals the initial classification number K, and t denotes an initial clustering center, so that the set of initial clustering centers is $(x_{t,1}, x_{t,2}, \ldots, x_{t,M})$.
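Steps S1022 to S1024 can be sketched as follows. This is an illustration under stated assumptions rather than the patented implementation: formula (3) is taken to be the standard Euclidean distance, formula (4) is read as "pick the unselected element whose mean distance to the already selected centers is largest", and ties are broken by lowest index. The name `select_initial_centers` is hypothetical.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def euclidean(a: Point, b: Point) -> float:
    """Standard Euclidean distance (assumed form of formula (3))."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def select_initial_centers(data: Sequence[Point], k: int) -> List[int]:
    """Return the indices of K initial clustering centers."""
    n = len(data)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of items")
    # S1022/S1023: full element distance matrix and the mean of each row,
    # taken over the other n - 1 items.
    dist = [[euclidean(data[i], data[j]) for j in range(n)] for i in range(n)]
    row_mean = [sum(row) / max(n - 1, 1) for row in dist]
    # S1024: the element with the largest average distance is the first center.
    centers = [max(range(n), key=row_mean.__getitem__)]
    while len(centers) < k:
        candidates = [i for i in range(n) if i not in centers]
        # Assumed reading of formula (4): maximize the mean distance
        # to the centers selected so far.
        best = max(
            candidates,
            key=lambda i: sum(dist[i][c] for c in centers) / len(centers),
        )
        centers.append(best)
    return centers
```

Building the full distance matrix costs O(N²) time and memory, so a sketch like this is only practical for moderately sized data sets.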
5. The power communication data clustering method based on the improved K-means algorithm as claimed in claim 1, wherein: selecting an element and determining the classification group to which it belongs by calculating the distance between the element and each initial clustering center specifically comprises: calculating the distance between each selected element and each initial clustering center by formula (5), namely:
Figure FDA0002318271650000025
in the formula (5), d (x)i,xt) And clustering the element i into the classification with the minimum distance according to the distance value of the element and each initial clustering center.
6. The power communication data clustering method based on the improved K-means algorithm as claimed in claim 1, wherein: updating the clustering center of the classification group and determining the actual clustering center of the classification group specifically comprises: when an element is added, the update formula for the j-th attribute value of the center position of the classification group can be expressed as:
Figure FDA0002318271650000031
in formula (6), $x_{t,j}$ is the j-th attribute value of the actual clustering center t' of the classification group after the element is added, $x_{t,j}'$ is the j-th attribute value of the actual clustering center t' of the classification group before the element is added, $N_t$ is the number of elements in the group after the element is added, and $x_{i,j}$ is the j-th attribute value of the added element.
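A center update of this kind is commonly written as a running mean. The sketch below assumes formula (6) has that form, i.e. new value = ((N_t - 1) × old value + added value) / N_t; the formula image is not reproduced here, so this is an assumption, and `update_center` is a hypothetical name.

```python
from typing import List, Sequence


def update_center(
    center_before: Sequence[float],
    added_element: Sequence[float],
    n_after: int,
) -> List[float]:
    """Update every attribute of a cluster center after one element joins.

    n_after is N_t, the number of elements in the group after the addition.
    Assumed running-mean form of formula (6).
    """
    if n_after < 1:
        raise ValueError("n_after must be at least 1")
    return [
        ((n_after - 1) * old + new) / n_after
        for old, new in zip(center_before, added_element)
    ]
```

Updating incrementally in this way avoids recomputing the mean over all members each time an element is assigned.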
7. The power communication data clustering method based on the improved K-means algorithm as claimed in claim 2, wherein: judging whether the initial classification number K meets the optimal classification value specifically comprises:
S1061, calculating the distances between the actual clustering centers t' according to the Euclidean distance formula, namely:
Figure FDA0002318271650000032
in the formula (7), d (x)t1,xt2) The inter-class distance between the actual clustering center t1 and the actual clustering center t 2;
s1061, calculating the minimum value of the inter-class distances among all the actual clustering centers t', namely the minimum inter-class distance TDmin
S1062, calculating the average value of the inter-class distances among all the actual clustering centers t', namely the average inter-class distance TDave
S1063, calculating the maximum value of the distances of all elements in the same classification, namely the maximum intra-class distance ITDmax
S1064, judging the minimum inter-class distance TDminWhether much less than the mean inter-class distance TDaveReturning to step S102, otherwise, executing step S1065;
s1065, judging the ITD of the maximum intra-class distance by the rootmaxWhether much larger than the average inter-class distance TDaveIf so, returning to the step S102, otherwise, executing the step S1066;
and S1066, if the initial classification number K meets the optimal classification value, the classification of the power communication data can be output.
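The check in claim 7 can be sketched as a single predicate. The claim does not quantify "much less" or "much greater", so the thresholds `small_ratio` and `large_ratio` below are hypothetical parameters chosen only for illustration, as is the function name `k_is_acceptable`. The intra-class distance is read here as the largest pairwise distance between two elements of the same classification.

```python
import math
from itertools import combinations
from typing import Sequence

Point = Sequence[float]


def euclidean(a: Point, b: Point) -> float:
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def k_is_acceptable(
    clusters: Sequence[Sequence[Point]],
    centers: Sequence[Point],
    small_ratio: float = 0.2,   # hypothetical cut-off for "much less"
    large_ratio: float = 2.0,   # hypothetical cut-off for "much greater"
) -> bool:
    """Return True when K passes both tests, False when K should be reselected."""
    inter = [euclidean(a, b) for a, b in combinations(centers, 2)]
    if not inter:
        raise ValueError("at least two clustering centers are required")
    td_min = min(inter)                 # minimum inter-class distance
    td_ave = sum(inter) / len(inter)    # average inter-class distance
    # Maximum intra-class distance over all pairs within each classification.
    itd_max = max(
        (euclidean(a, b) for members in clusters for a, b in combinations(members, 2)),
        default=0.0,
    )
    if td_min < small_ratio * td_ave:
        return False  # S1064: two centers nearly coincide, reselect K
    if itd_max > large_ratio * td_ave:
        return False  # S1065: some classification is too spread out, reselect K
    return True       # S1066: K is accepted and the classification can be output
```

A `False` result corresponds to returning to step S102 with a different K; how K is adjusted on each retry is left to the caller.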
8. The power communication data clustering method based on the improved K-means algorithm as claimed in claim 3, wherein: after conversion, the character-type, continuous-type and discrete-type numerical values all lie between 0 and 1.
CN201911286973.5A 2019-12-14 2019-12-14 Power communication data clustering method based on improved K-means algorithm Active CN111191687B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911286973.5A CN111191687B (en) 2019-12-14 2019-12-14 Power communication data clustering method based on improved K-means algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911286973.5A CN111191687B (en) 2019-12-14 2019-12-14 Power communication data clustering method based on improved K-means algorithm

Publications (2)

Publication Number Publication Date
CN111191687A true CN111191687A (en) 2020-05-22
CN111191687B CN111191687B (en) 2023-02-10

Family

ID=70709187

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911286973.5A Active CN111191687B (en) 2019-12-14 2019-12-14 Power communication data clustering method based on improved K-means algorithm

Country Status (1)

Country Link
CN (1) CN111191687B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111680764A (en) * 2020-08-13 2020-09-18 国网浙江省电力有限公司 Industry reworking and production-resuming degree monitoring method
CN111680937A (en) * 2020-08-13 2020-09-18 国网浙江省电力有限公司营销服务中心 Small and micro enterprise rework rate evaluation method based on power data grading and empowerment
CN112507607A (en) * 2020-11-12 2021-03-16 中国电建集团中南勘测设计研究院有限公司 Method for correcting pressure intensity calculation result of water-proof curtain wall
CN116360352A (en) * 2022-12-02 2023-06-30 山东和信智能科技有限公司 Intelligent control method and system for power plant

Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101329683A (en) * 2008-07-25 2008-12-24 华为技术有限公司 Recommendation system and method
CN103440566A (en) * 2013-08-27 2013-12-11 北京京东尚科信息技术有限公司 Method and device for generating order picking collection lists and method for optimizing order picking route
CN105095516A (en) * 2015-09-16 2015-11-25 中国传媒大学 Broadcast television subscriber grouping system and method based on spectral clustering integration
CN106202335A (en) * 2016-06-28 2016-12-07 银江股份有限公司 A kind of big Data Cleaning Method of traffic based on cloud computing framework
CN106682079A (en) * 2016-11-21 2017-05-17 云南电网有限责任公司电力科学研究院 Detection method of user's electricity consumption behavior of user based on clustering analysis
WO2018157286A1 (en) * 2017-02-28 2018-09-07 深圳市大疆创新科技有限公司 Recognition method and device, and movable platform
CN108629375A (en) * 2018-05-08 2018-10-09 广东工业大学 Power customer sorting technique, system, terminal and computer readable storage medium
CN108898154A (en) * 2018-09-29 2018-11-27 华北电力大学 A kind of electric load SOM-FCM Hierarchical clustering methods
CN109034231A (en) * 2018-07-17 2018-12-18 辽宁大学 The deficiency of data fuzzy clustering method of information feedback RBF network valuation
CN109271427A (en) * 2018-10-17 2019-01-25 辽宁大学 A kind of clustering method based on neighbour's density and manifold distance
US20190073416A1 (en) * 2016-11-14 2019-03-07 Ping An Technology (Shenzhen) Co., Ltd. Method and device for processing question clustering in automatic question and answering system
CN109685128A (en) * 2018-12-18 2019-04-26 电子科技大学 A kind of MB-kmeans++ clustering method and the user conversation clustering method based on it
CN109934301A (en) * 2019-03-22 2019-06-25 广东电网有限责任公司 A kind of power load aggregation analysis method, device and equipment
CN110263837A (en) * 2019-06-13 2019-09-20 河海大学 A kind of circuit breaker failure diagnostic method based on multilayer DBN model

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
李秀馨等: "基于改进FCM算法的卫星云图聚类方法研究", 《红外技术》 *
邹臣嵩等: "基于最大距离积与最小距离和协同K 聚类算法", 《计算机应用与软件》 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111680764A (en) * 2020-08-13 2020-09-18 国网浙江省电力有限公司 Industry reworking and production-resuming degree monitoring method
CN111680937A (en) * 2020-08-13 2020-09-18 国网浙江省电力有限公司营销服务中心 Small and micro enterprise rework rate evaluation method based on power data grading and empowerment
CN111680937B (en) * 2020-08-13 2020-11-13 国网浙江省电力有限公司营销服务中心 Small and micro enterprise rework rate evaluation method based on power data grading and empowerment
CN112507607A (en) * 2020-11-12 2021-03-16 中国电建集团中南勘测设计研究院有限公司 Method for correcting pressure intensity calculation result of water-proof curtain wall
CN112507607B (en) * 2020-11-12 2023-02-10 中国电建集团中南勘测设计研究院有限公司 Method for correcting pressure intensity calculation result of water-proof curtain wall
CN116360352A (en) * 2022-12-02 2023-06-30 山东和信智能科技有限公司 Intelligent control method and system for power plant
CN116360352B (en) * 2022-12-02 2024-04-02 山东和信智能科技有限公司 Intelligent control method and system for power plant

Also Published As

Publication number Publication date
CN111191687B (en) 2023-02-10

Similar Documents

Publication Publication Date Title
CN111191687B (en) Power communication data clustering method based on improved K-means algorithm
CN107578288B (en) Non-invasive load decomposition method considering user power consumption mode difference
CN106446967A (en) Novel power system load curve clustering method
CN108932557A (en) A kind of Short-term Load Forecasting Model based on temperature cumulative effect and grey relational grade
CN106408008A (en) Load curve distance and shape-based load classification method
CN111489188B (en) Resident adjustable load potential mining method and system
CN112367675B (en) Wireless sensor network data fusion method and network system based on self-encoder
CN114040272B (en) Path determination method, device and storage medium
CN110705685A (en) Neural network quantitative classification method and system
CN115696690B (en) Distributed intelligent building illumination self-adaptive energy-saving control method
CN111541628A (en) Power communication network service resource allocation method and related device
CN109272058A (en) Integrated power load curve clustering method
CN114781717A (en) Network point equipment recommendation method, device, equipment and storage medium
CN114358378A (en) User side energy storage optimal configuration system and method for considering demand management
CN113676357A (en) Decision method for edge data processing in power internet of things and application thereof
Gong et al. Adaptive interactive genetic algorithms with individual interval fitness
CN113112177A (en) Transformer area line loss processing method and system based on mixed indexes
Lin et al. Deployment method of power terminal edge control center based on cloud-edge cooperative mode
CN117034046A (en) Flexible load adjustable potential evaluation method based on ISODATA clustering
CN111080164A (en) Power load clustering result evaluation method based on daily load curve
CN110689452A (en) Clustering algorithm-based power market business center service center planning method
CN108205721B (en) Spline interpolation typical daily load curve selecting device based on clustering
CN114781703A (en) Hierarchical multi-objective optimization method, terminal equipment and storage medium
CN115186882A (en) Clustering-based controllable load spatial density prediction method
CN106777298A (en) A kind of distributed clustering method based on fractal technology

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant