CN109063769B - Clustering method, system and medium for automatically determining cluster number based on coefficient of variation - Google Patents
- Publication number
- CN109063769B (application CN201810864958.3A)
- Authority
- CN
- China
- Prior art keywords: cluster, clustering, paper, clusters, calculating
- Legal status: Active (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
Landscapes
- Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Probability & Statistics with Applications (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a clustering method, system, and medium for automatically confirming the number of clusters based on the coefficient of variation. The density value of each data point in a data set is calculated, a density index is computed from the density values, and the data point with the maximum density index is selected as the first cluster center. The shortest distance between each data point and the currently existing cluster centers is calculated, the probability of each data point being selected as a cluster center is then computed from that distance, and cluster centers are preselected by the roulette method. Once the set number of cluster centers has been selected, k-means clustering is performed from the selected initial centers to generate the corresponding number of clusters. The average intra-cluster coefficient of variation and the minimum inter-cluster coefficient of variation are then calculated, their difference is computed and compared with a set value, and if the difference is smaller than the set value, the two clusters with the minimum inter-cluster coefficient of variation are merged; once the difference is greater than or equal to the set value, the clustering result is output.
Description
Technical Field
The invention relates to a clustering method, system, and medium for automatically confirming the number of clusters based on the coefficient of variation.
Background
With the rapid development of information technology, businesses, enterprises, scientific research institutions, government departments, and many other sectors have accumulated large amounts of data stored in different forms. Much useful information lies hidden in this data and is difficult to obtain with only the query and retrieval mechanisms and statistical methods of a database, which has driven the rapid development of data mining technology. Cluster analysis is an important research field within data mining and has been widely used in many applications, including pattern recognition, data analysis, image processing, and market research.
Cluster analysis is an unsupervised learning method. Partition-based clustering algorithms are simple and applicable to various data types, but the number of clusters must be set in advance and the result is sensitive to the initial cluster centers. The k-means++ algorithm is an improvement over the conventional k-means algorithm, but the defect that the number of clusters is set manually still remains.
Disclosure of Invention
To overcome the defects of the prior art, the invention provides a clustering method, system, and medium for automatically confirming the number of clusters based on the coefficient of variation. It addresses the defects of the traditional k-means++ clustering algorithm, namely the manual setting of the number of clusters and improper selection of the initial centroid, by improving the partition-based k-means++ algorithm with the concepts of the coefficient of variation and the density index; the number of clusters no longer needs to be set manually, and the accuracy of the clustering result is ensured.
in order to solve the technical problems, the invention adopts the following technical scheme:
as a first aspect of the present invention, there is provided a clustering method of automatically confirming the number of clusters based on a coefficient of variation;
the clustering method for automatically confirming the cluster number based on the coefficient of variation comprises the following steps:
step (1): calculating the density value of each data point in the data set, calculating a density index according to the density value, and selecting the data point with the maximum density index as a first clustering center;
step (2): calculating the shortest distance between each data point and the current existing clustering center, then calculating the probability of selecting each data point as the clustering center according to the shortest distance, and finally preselecting the clustering center according to a wheel disc method; the density index of the preselected clustering center is greater than a set threshold;
and (3): repeating the step (2) until a set number of clustering centers are selected, and then performing k-means clustering according to the selected initial clustering centers to generate clusters with corresponding numbers;
and (4): calculating the average intra-cluster variation coefficient and the minimum inter-cluster variation coefficient, then calculating the difference value between the average intra-cluster variation coefficient and the minimum inter-cluster variation coefficient, comparing the difference value with a set value, and merging the two clusters with the minimum inter-cluster variation coefficient if the difference value is smaller than the set value; and (5) repeating the step (4) until the difference value is larger than or equal to the set value, and outputting a clustering result.
Further, the step of calculating the density value of each data point in the data set comprises:
Assume the data set has d-dimensional attributes (S_1, S_2, …, S_d), so that the data space S = S_1 × S_2 × … × S_d is a d-dimensional data space, and x = (x_1, x_2, …, x_d) represents a data point of the data set in this space.
First, the initial number of clusters k* is set, with k_1 < k* < k_2, where k_1 and k_2 are each greater than the number of target clusters.
Then, the density value ρ_x of data point x is calculated, as expressed by equations (1) and (2):

ρ_x = Σ_{y=1}^{num} f(d_xy)  (1)

f(t) = 1 if t ≤ R, otherwise 0  (2)

where num is the number of data points, d_xy is the distance between data point y and data point x, and f is the function that judges whether that distance is less than or equal to the density range R.
further, calculating a density index according to the density value, and selecting a data point with the maximum density index as a first clustering center; comprises the following steps:
according to density value rhoxCalculate the data density index di (density index) and take the data point with the highest density index as the first cluster center:
further, the step of calculating the shortest distance between each data point and the current existing clustering center is as follows:
according to the mode of selecting the initial clustering center in the k-means + + algorithm, for the rest data points in the data set, sequentially calculating the distance between the data point and the selected initial clustering center, and comparing and selecting the shortest distance as the shortest distance D (x) between the data point and the current existing clustering center.
Further, the step of calculating the probability of each data point being selected as a cluster center according to the shortest distance comprises:

P(x) = D(x)² / Σ_{x∈S} D(x)²  (4)

where D(x) denotes the shortest distance between a data point and the currently existing cluster centers, and P(x) the probability of that data point being selected as a cluster center.
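The shortest distance D(x) and the selection probability P(x) can be computed as below; a sketch assuming the standard k-means++ weighting P(x) = D(x)² / Σ D(x)², which matches the text's reference to the k-means++ selection rule.

```python
import numpy as np

def selection_probabilities(X, centers):
    """P(x) = D(x)^2 / sum_y D(y)^2, where D(x) is the shortest distance
    from x to any currently existing cluster center."""
    centers = np.asarray(centers, dtype=float)
    # D(x): for each point, the minimum distance over all existing centers
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2).min(axis=1)
    sq = d ** 2
    return sq / sq.sum()
```

Points already chosen as centers have D(x) = 0 and hence zero probability, so they can never be drawn again.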
further, the step of preselecting the clustering center according to the roulette method comprises the following steps:
setting a threshold value tau, wherein only when the density index of the preselected clustering center reaches tau, the preselected clustering center can be used as a formal clustering center, otherwise, a new data point is reselected as the clustering center; repeating the roulette method until k is selected*And (4) clustering centers.
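The roulette preselection with the threshold τ might look like the following sketch, where `rng` is a numpy random Generator and, as in the sketches above, the density index of a candidate is assumed to be its density value (the source's density-index formula is not reproduced as text):

```python
import numpy as np

def roulette_select(probs, densities, tau, rng):
    """Draw candidate centers with probability P(x) (roulette wheel),
    accepting a candidate only when its density reaches the threshold tau;
    otherwise a new candidate is drawn, as the text describes."""
    while True:
        idx = rng.choice(len(probs), p=probs)
        if densities[idx] >= tau:
            return idx
```

Repeating this draw (and updating the probabilities after each accepted center) until k* centers have been accepted completes the set of initial centers.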
Further, the step of calculating the average intra-cluster coefficient of variation is:
first, the intra-cluster coefficient of variation CV for each cluster is calculatedi:
Wherein, muiIs the centroid of cluster i, miNumber of data points for cluster i, xjIs the jth data point, k, in cluster i*Indicating the number of preselected cluster centers.
Since a larger coefficient of variation indicates more dispersed data points, the intra-cluster coefficient of variation reflects each cluster's degree of cohesion.
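One consistent reading of the intra-cluster formulas (which appear only as images in the source) treats σ as the root-mean-square distance of a cluster's points to its centroid and μ as the centroid's norm; under that assumption:

```python
import numpy as np

def intra_cluster_cv(cluster):
    """CV_i: RMS distance of the cluster's points to its centroid, divided
    by the centroid's norm (assumed reading of CV = sigma / mu for vector
    data; the centroid must be nonzero)."""
    pts = np.asarray(cluster, dtype=float)
    mu = pts.mean(axis=0)
    sigma = np.sqrt(((pts - mu) ** 2).sum(axis=1).mean())
    return sigma / np.linalg.norm(mu)

def average_intra_cv(clusters):
    """Average of CV_i over all k* clusters."""
    return float(np.mean([intra_cluster_cv(c) for c in clusters]))
```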
Further, the step of calculating the minimum inter-cluster variation coefficient is:
first, the inter-cluster variation coefficient CV is calculatedij:
Then, the minimum inter-cluster variation coefficient D is calculatedmin:
Dmin=min{CVij,i=1,2,…,k*,j=1,2,…,k*} (8)
Wherein m isijNumber of data points, μ, for clusters i and jijIs the centroid of cluster i and cluster j, xlThe ith data point in cluster i and cluster j.
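Under the same assumed reading of CV for vector data (RMS distance to the centroid over the centroid's norm), the inter-cluster coefficient and D_min can be sketched as:

```python
import numpy as np
from itertools import combinations

def inter_cluster_cv(ci, cj):
    """CV_ij: coefficient of variation over the union of clusters i and j,
    using their joint centroid mu_ij and joint size m_ij."""
    merged = np.vstack([np.asarray(ci, float), np.asarray(cj, float)])
    mu = merged.mean(axis=0)
    sigma = np.sqrt(((merged - mu) ** 2).sum(axis=1).mean())
    return sigma / np.linalg.norm(mu)

def min_inter_cv(clusters):
    """D_min and the pair (i, j) of clusters that attains it."""
    return min(
        (inter_cluster_cv(clusters[i], clusters[j]), (i, j))
        for i, j in combinations(range(len(clusters)), 2)
    )
```

Restricting the minimum to pairs i ≠ j (via `combinations`) avoids the degenerate CV_ii terms.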
Further, the difference between the average intra-cluster coefficient of variation and the minimum inter-cluster coefficient of variation is calculated and compared with a set value; if the difference is smaller than the set value, the two clusters with the minimum inter-cluster coefficient of variation are merged, and if the difference is greater than or equal to the set value, the clustering result is output. Specifically:
The difference T = D_min − CV̄ between the minimum inter-cluster coefficient of variation and the average intra-cluster coefficient of variation is calculated, and whether clusters need to be merged is judged from T:
when 0 ≤ T < ε, the two clusters with the minimum inter-cluster coefficient of variation are merged;
when ε ≤ T, the number of clusters and the data points corresponding to each cluster are output.
As a second aspect of the present invention, there is provided a clustering system that automatically confirms the number of clusters based on a coefficient of variation;
a clustering system for automatically determining the number of clusters based on the coefficient of variation, comprising: the computer program product comprises a memory, a processor, and computer instructions stored on the memory and executed on the processor, wherein the computer instructions, when executed by the processor, perform the steps of any of the above methods.
As a third aspect of the present invention, there is provided a computer-readable storage medium;
a computer readable storage medium having stored thereon computer instructions which, when executed by a processor, perform the steps of any of the above methods.
Compared with the prior art, the invention has the beneficial effects that:
the k-means + + clustering algorithm based on division is improved by using the concepts of the variation coefficient and the density index, the number of clusters does not need to be manually set, and the accuracy of a clustering result is also ensured.
Because partition-based clustering algorithms are sensitive to the selection of the initial centroid, choosing the data point with the maximum density index as the first cluster center effectively avoids the influence of outliers in the data set.
The improved clustering algorithm for automatically confirming the cluster number optimizes the confirmation of the cluster number and the selection of the initial centroid by utilizing the concept of the coefficient of variation, greatly improves the clustering quality, and can be effectively applied to the clustering analysis of data.
And the intra-cluster variation coefficient is used for representing the intra-cluster cohesion degree of the clusters, the inter-cluster variation coefficient is used for representing the inter-cluster separation degree of the clusters, and when the cohesion degree and the separation degree reach the maximum, the clustering effect is optimal.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate embodiments of the application and, together with the description, serve to explain the application and are not intended to limit the application.
Fig. 1 is a flow chart of a clustering algorithm for automatically determining the number of clusters based on the coefficient of variation.
Detailed Description
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments according to the present application. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, and it should be understood that when the terms "comprises" and/or "comprising" are used in this specification, they specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof, unless the context clearly indicates otherwise.
As shown in fig. 1, the clustering method for automatically determining the number of clusters based on the coefficient of variation includes:
step 1: calculating each data point in the data setDensity value ρxThe density index DI is calculated from the density values and the data point with the highest density index is selected as the first cluster center.
Step 2: calculating the shortest distance D (x) between each data point and the current existing clustering center, then calculating the probability P (x) of each data point being selected as the next clustering center according to the distance, finally selecting the preselected clustering center according to a roulette method, and when the density index of the preselected clustering center reaches a threshold value tau, using the preselected clustering center as a new clustering center, or recalculating and selecting.
Step 3: step2 is repeated until k is selected*(k1<k*<k2) Individual clustering centers and performing k-means clustering to generate k*And (4) clustering.
Step 4: calculating the mean intra-cluster coefficient of variationAnd minimum inter-cluster coefficient of variation DminObtaining a difference value T if T<0, i.e.Merging the two clusters with smaller separation degree; if T is greater than or equal to 0, that isWhen 0 is less than or equal to T<And when epsilon is less than or equal to T, the clustering effect is optimal.
Step 5: Execute Step 4 in a loop until the clustering effect is optimal.
First, the initial cluster centers are selected using the concept of the density index, which improves clustering quality. The density value of each data point is calculated, the density index is computed from the density values, and the data point with the maximum density index is selected as the first cluster center. The probability of each remaining data point being selected as the next cluster center is then calculated from its distance to the existing cluster centers, and further cluster centers are confirmed whenever their density index reaches a certain threshold. Finally, the k-means algorithm is run to form the initial clustering.
Conference papers cover diverse topics, so cluster analysis is needed to gather papers with similar topics. The specific number of categories is unknown in advance, so to obtain a high-quality clustering the proposed algorithm for automatically confirming the number of clusters is applied here. The NIPS conference papers from 1987 to 2015 serve as the experimental data set, and the papers are clustered mainly according to the number of times English words are used in each paper. The data set has 11463-dimensional attributes and 5811 sample data; the data space S = S_1 × S_2 × … × S_11463 is a data space of 11463 dimensions, and x = (x_1, x_2, …, x_11463) represents the number of occurrences of each word in one NIPS conference paper.
The initial number of conference-paper categories is confirmed by randomly choosing a value k* with k_1 < k* < k_2, where k_1 and k_2 are both significantly larger than the number of target paper categories.
The density value ρ_x of each conference paper x in the conference-paper data set (S_1, S_2, …, S_11463) is calculated, i.e. the number of conference papers whose degree of difference from paper x is less than or equal to the density range, as expressed by equations (1) and (2):

ρ_x = Σ_{y=1}^{num} f(d_xy)  (1)

f(t) = 1 if t ≤ R, otherwise 0  (2)

where num is the number of conference papers, d_xy is the degree of difference between conference paper y and conference paper x in the data set, R is the density range, and f is the function that judges whether the difference between paper y and paper x is less than or equal to the density range R.
From the density value ρ_x of each conference paper, the density index DI is calculated, and the paper with the largest density index, DI_max, is taken as the first cluster center, as expressed by formula (3).
the meeting discussion with the largest density index is selected as the first clustering center, because the clustering algorithm based on division is sensitive to the selection of the initial centroid, and abnormal discussion data can be effectively avoided by selecting the meeting discussion with the larger density as the clustering center, so that the clustering quality is improved.
The minimum degree of difference D(x) between each conference paper and the currently existing cluster centers is calculated, and the probability of each paper being selected as the next cluster center is then computed from this degree of difference:

P(x) = D(x)² / Σ D(x)²  (4)
for the selection of the initial clustering center, the conference papers with larger mutual difference should be selected as the clustering center, so that the probability that each conference paper is selected as the clustering center is calculated, and the larger the difference with the existing clustering center is, the larger the probability that the conference paper is selected as the clustering center is, so that the selected clustering center is relatively discrete.
A preselected cluster center is chosen according to these probabilities by the roulette method. Because partition-based clustering is sensitive to outliers, a threshold τ is set: only when the density index of the preselected center reaches τ is it accepted as a formal cluster center; otherwise a new conference paper is reselected. This process repeats until k* cluster centers have been selected, after which the conventional k-means algorithm is executed from these k* initial centers to form k* clusters.
Because the initially selected number of paper categories k* is clearly larger than the target k, clusters must be merged to reduce their number to k; but the target number of paper categories is unknown at first, so the concept of the coefficient of variation is introduced to decide when to stop merging. Whether the number of categories is optimal is determined from the relation between the average intra-cluster coefficient of variation of the k* clusters and the minimum inter-cluster coefficient of variation: the intra-cluster coefficient of variation represents the cohesion within clusters, the inter-cluster coefficient of variation represents the separation between clusters, and when both cohesion and separation are maximized, the clustering effect is optimal.
The concept of the coefficient of variation is introduced. The coefficient of variation is a statistic that characterizes the distribution of data and reflects its degree of dispersion. Its advantage is that it does not depend on the scale of the data's mean: it is a dimensionless quantity, so when two groups of data with different dimensions or different means are compared, the coefficient of variation rather than the standard deviation is used as the basis of comparison. A threshold on the number of clusters computed with the coefficient of variation is therefore suitable for all types of data sets.
It is the ratio of a group of data's dispersion measure to its average measure, i.e. the ratio of the standard deviation σ to the mean μ, as expressed by formulas (5) and (6):

CV = σ / μ  (5)

σ = sqrt( (1/n) Σ_{i=1}^{n} (x_i − μ)² )  (6)
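The dimensionless property described here is easy to verify numerically; a small sketch:

```python
import numpy as np

def coefficient_of_variation(values):
    """CV = sigma / mu: the standard deviation relative to the mean.
    Being a ratio, it is dimensionless, so data on different scales
    can be compared directly."""
    values = np.asarray(values, dtype=float)
    return values.std() / values.mean()
```

Scaling a data set by any positive constant leaves its CV unchanged, which is exactly why a merging threshold based on CV is suitable across data sets of different dimensions and means.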
calculating the intra-cluster variation coefficient of each cluster according to the variation coefficient, and then averaging the intra-cluster variation coefficientsAnd expressed by equations (7) and (8),
wherein, muiIs the centroid of cluster i, miNumber of meeting papers for cluster i, xjIs the jth meeting paper in cluster i. Since a larger coefficient of variation indicates a more dispersed distribution of the conference paper, how well the degree of aggregation of each cluster is reflected by calculating the intra-cluster coefficient of variation.
The inter-cluster coefficient of variation between any two clusters is calculated from the coefficient of variation, and the minimum value D_min of the inter-cluster coefficients of variation is then obtained, as expressed by formulas (9) and (10):

CV_ij = sqrt( (1/m_ij) Σ_{l=1}^{m_ij} ||x_l − μ_ij||² ) / ||μ_ij||  (9)

D_min = min{ CV_ij, i = 1, 2, …, k*, j = 1, 2, …, k*, i ≠ j }  (10)

where m_ij is the total number of conference papers in clusters i and j, μ_ij is the centroid of the union of clusters i and j, and x_l is the lth conference paper in that union. The degree of separation between two clusters is reflected by calculating the inter-cluster coefficient of variation.
The difference T = D_min − CV̄ between the minimum inter-cluster coefficient of variation and the average intra-cluster coefficient of variation is calculated, and whether cluster merging is needed is judged from it.
If T < 0, i.e. D_min < CV̄, there exist two clusters with a small inter-cluster coefficient of variation. The smaller the inter-cluster coefficient of variation, the more compactly the papers of those two clusters are distributed together and the lower their degree of separation. Because the initially set number of clusters is larger than the target number, the average intra-cluster coefficient of variation is small and varies little, i.e. each cluster is already highly cohesive, so only a merge needs to be performed. The merging strategy is to merge the two clusters with the least separation, i.e. the pair whose inter-cluster coefficient of variation equals D_min.
If T ≥ 0 and 0 ≤ T < ε, the difference is still small: there exist two clusters whose inter-cluster coefficient of variation is close to the intra-cluster level, meaning the papers of those two clusters are distributed compactly together and their separation is low, so cluster merging must continue. When ε ≤ T, a clear gap exists: the inter-cluster coefficients of variation are all large relative to the intra-cluster ones, meaning the papers of different clusters are dispersed from one another and well separated while each cluster remains highly cohesive. When the separation between all clusters reaches this level, a good clustering effect has been achieved and the optimal number of paper categories is obtained.
If clusters are merged, the average intra-cluster coefficient of variation CV̄ and the minimum inter-cluster coefficient of variation D_min must be recalculated, and whether the optimal clustering effect has been reached is again judged from their difference; otherwise cluster merging continues, and this process loops until the termination condition is met.
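The merge loop described in this paragraph can be put together as follows. This is a minimal sketch under the assumptions used throughout (CV of a point set read as RMS distance to its centroid over the centroid's norm, T = D_min − CV̄, nonzero centroids), not the patent's reference implementation.

```python
import numpy as np
from itertools import combinations

def _cv(points):
    """CV of a point set: RMS distance to the centroid over the centroid's
    norm (assumed vector-data reading of sigma / mu)."""
    pts = np.asarray(points, dtype=float)
    mu = pts.mean(axis=0)
    sigma = np.sqrt(((pts - mu) ** 2).sum(axis=1).mean())
    return sigma / np.linalg.norm(mu)

def merge_until_stable(clusters, eps):
    """Repeatedly merge the pair of clusters with the smallest inter-cluster
    CV while T = D_min - average intra-cluster CV stays below eps."""
    clusters = [np.asarray(c, dtype=float) for c in clusters]
    while len(clusters) > 1:
        avg_intra = np.mean([_cv(c) for c in clusters])
        d_min, (i, j) = min(
            (_cv(np.vstack([clusters[a], clusters[b]])), (a, b))
            for a, b in combinations(range(len(clusters)), 2)
        )
        if d_min - avg_intra >= eps:  # epsilon <= T: clustering is stable
            break
        merged = np.vstack([clusters[i], clusters[j]])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters
```

Each merge recomputes CV̄ and D_min before the next test, matching the loop described in the text.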
The above description is only a preferred embodiment of the present application and is not intended to limit the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.
Claims (3)
1. A clustering method for automatically confirming the number of conference-paper clusters based on the coefficient of variation, characterized by comprising the following steps:
taking the NIPS conference papers from 1987 to 2015 as the experimental data set, cluster analysis is performed on the conference papers according to the number of times English words are used in each paper; the data set has 11463-dimensional attributes and 5811 sample data, the data space S = S_1 × S_2 × … × S_11463 is a data space of 11463 dimensions, and x = (x_1, x_2, …, x_11463) represents the number of occurrences of each word in one NIPS conference paper;
step (1): confirming the initial number of conference-paper categories by randomly choosing a value k* with k_1 < k* < k_2, where k_1 and k_2 are each larger than the number of target paper categories;
calculating the density value ρ_x of each conference paper x in the conference-paper data set (S_1, S_2, …, S_11463), i.e. the number of papers whose degree of difference from paper x is less than or equal to the density range:
ρ_x = Σ_{y=1}^{num} f(d_xy), where f(t) = 1 if t ≤ R and 0 otherwise,
where num is the number of conference papers, d_xy is the degree of difference between conference paper y and conference paper x, and R is the density range; from the density value ρ_x of each paper, calculating the density index DI and taking the paper with the largest density index, DI_max, as the first cluster center;
step (2): calculating the minimum degree of difference D(x) between each conference paper and the currently existing cluster centers, then calculating the probability of each paper being selected as the next cluster center according to this degree of difference, P(x) = D(x)² / Σ D(x)²;
step (3): selecting a preselected cluster center according to the probabilities by the roulette method and setting a threshold τ; only when the density index of the preselected cluster center reaches τ is it taken as a formal cluster center, otherwise a new conference paper is reselected as the cluster center; the roulette method is repeated until k* cluster centers have been selected, and k-means clustering is then performed from the obtained k* initial cluster centers to form k* clusters;
step (4): calculating the intra-cluster coefficient of variation of each cluster and then their average CV̄:
CV_i = sqrt( (1/m_i) Σ_{j=1}^{m_i} ||x_j − μ_i||² ) / ||μ_i||,  CV̄ = (1/k*) Σ_{i=1}^{k*} CV_i,
where μ_i is the centroid of cluster i, m_i is the number of conference papers in cluster i, and x_j is the jth conference paper in cluster i;
calculating the inter-cluster coefficient of variation between any two clusters and then its minimum D_min:
CV_ij = sqrt( (1/m_ij) Σ_{l=1}^{m_ij} ||x_l − μ_ij||² ) / ||μ_ij||,  D_min = min{ CV_ij, i = 1, 2, …, k*, j = 1, 2, …, k*, i ≠ j },
where m_ij is the total number of conference papers in clusters i and j, μ_ij is the centroid of the union of clusters i and j, and x_l is the lth conference paper in that union; calculating the difference T = D_min − CV̄ between the average intra-cluster coefficient of variation and the minimum inter-cluster coefficient of variation, and judging from T whether clusters need to be merged:
when 0 ≤ T < ε, merging the two clusters with the minimum inter-cluster coefficient of variation;
when ε ≤ T, the optimal clustering effect has been reached, and the number of clusters and the data points corresponding to each cluster are output;
if clusters are merged, the average intra-cluster coefficient of variation CV̄ and the minimum inter-cluster coefficient of variation D_min are recalculated, and whether the optimal clustering effect has been reached is again judged from their difference; otherwise cluster merging continues, and this process loops until the termination condition is met.
2. A clustering system for automatically confirming the number of conference-paper clusters based on the coefficient of variation, comprising: a memory, a processor, and computer instructions stored in the memory and runnable on the processor; when the computer instructions are executed by the processor, the steps of claim 1 are performed.
3. A computer readable storage medium having stored thereon computer instructions which, when executed by a processor, perform the steps of claim 1.
Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN201810864958.3A (CN109063769B) | 2018-08-01 | 2018-08-01 | Clustering method, system and medium for automatically determining cluster number based on coefficient of variation
Publications (2)
Publication Number | Publication Date |
---|---|
CN109063769A CN109063769A (en) | 2018-12-21 |
CN109063769B true CN109063769B (en) | 2021-04-09 |
Family
ID=64832407
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810864958.3A Active CN109063769B (en) | 2018-08-01 | 2018-08-01 | Clustering method, system and medium for automatically determining cluster number based on coefficient of variation |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109063769B (en) |
Families Citing this family (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111027585B (en) * | 2019-10-25 | 2023-04-07 | 南京大学 | K-means algorithm hardware realization method and system based on k-means + + centroid initialization |
CN111368876A (en) * | 2020-02-11 | 2020-07-03 | 广东工业大学 | Double-threshold sequential clustering method |
CN111476270B (en) * | 2020-03-04 | 2024-04-30 | 中国平安人寿保险股份有限公司 | Course information determining method, device, equipment and storage medium based on K-means algorithm |
CN111833171B (en) * | 2020-03-06 | 2021-06-25 | 北京芯盾时代科技有限公司 | Abnormal operation detection and model training method, device and readable storage medium |
CN111507428B (en) * | 2020-05-29 | 2024-01-05 | 深圳市商汤科技有限公司 | Data processing method and device, processor, electronic equipment and storage medium |
CN112070387B (en) * | 2020-09-04 | 2023-09-26 | 北京交通大学 | Method for evaluating multipath component clustering performance of complex propagation environment |
CN112053063B (en) * | 2020-09-08 | 2023-12-19 | 山东大学 | Load partitioning method and system for planning and designing energy system |
CN113378682B (en) * | 2021-06-03 | 2023-04-07 | 山东省科学院自动化研究所 | Millimeter wave radar fall detection method and system based on improved clustering algorithm |
CN113301600A (en) * | 2021-07-27 | 2021-08-24 | 南京中网卫星通信股份有限公司 | Abnormal data detection method and device for performance of satellite and wireless communication converged network |
CN116109933B (en) * | 2023-04-13 | 2023-06-23 | 山东省土地发展集团有限公司 | Dynamic identification method for ecological restoration of abandoned mine |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105139282A (en) * | 2015-08-20 | 2015-12-09 | 国家电网公司 | Power grid index data processing method, device and calculation device |
CN105488589A (en) * | 2015-11-27 | 2016-04-13 | 江苏省电力公司电力科学研究院 | Genetic simulated annealing algorithm based power grid line loss management evaluation method |
CN106570729A (en) * | 2016-11-14 | 2017-04-19 | 南昌航空大学 | Air conditioner reliability influence factor-based regional clustering method |
CN107133652A (en) * | 2017-05-17 | 2017-09-05 | 国网山东省电力公司烟台供电公司 | Electricity customers Valuation Method and system based on K means clustering algorithms |
CN107229751A (en) * | 2017-06-28 | 2017-10-03 | 济南大学 | A kind of concurrent incremental formula association rule mining method towards stream data |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8473215B2 (en) * | 2003-04-25 | 2013-06-25 | Leland Stanford Junior University | Method for clustering data items through distance-merging and density-merging techniques |
- 2018-08-01: CN 201810864958.3A patent CN109063769B (en), status Active
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105139282A (en) * | 2015-08-20 | 2015-12-09 | 国家电网公司 | Power grid index data processing method, device and calculation device |
CN105488589A (en) * | 2015-11-27 | 2016-04-13 | 江苏省电力公司电力科学研究院 | Genetic simulated annealing algorithm based power grid line loss management evaluation method |
CN106570729A (en) * | 2016-11-14 | 2017-04-19 | 南昌航空大学 | Air conditioner reliability influence factor-based regional clustering method |
CN107133652A (en) * | 2017-05-17 | 2017-09-05 | 国网山东省电力公司烟台供电公司 | Electricity customers Valuation Method and system based on K means clustering algorithms |
CN107229751A (en) * | 2017-06-28 | 2017-10-03 | 济南大学 | A kind of concurrent incremental formula association rule mining method towards stream data |
Non-Patent Citations (2)
Title |
---|
Detecting cluster numbers based on density changes using density-index enhanced Scale-invariant density-based clustering initialization algorithm; Onapa Limwattanapibool et al.; 2017 9th International Conference on Information Technology and Electrical Engineering; 2018-01-11; pp. 1-5 *
Research on the Application of the K-means Clustering Algorithm; Shi Yunping; Theory and Method (《理论与方法》); 2009-08-31; pp. 28-31 *
Also Published As
Publication number | Publication date |
---|---|
CN109063769A (en) | 2018-12-21 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109063769B (en) | Clustering method, system and medium for automatically determining cluster number based on coefficient of variation | |
CN109615014B (en) | KL divergence optimization-based 3D object data classification system and method | |
CN101853389A (en) | Detection device and method for multi-class targets | |
WO2009099448A1 (en) | Methods and systems for score consistency | |
US7818322B2 (en) | Efficient method for clustering nodes | |
CN110111113B (en) | Abnormal transaction node detection method and device | |
CN112639842A (en) | Suppression of deviation data using machine learning models | |
CN115454779A (en) | Cloud monitoring stream data detection method and device based on cluster analysis and storage medium | |
CN111339247B (en) | Microblog subtopic user comment emotional tendency analysis method | |
CN115686432B (en) | Document evaluation method for retrieval sorting, storage medium and terminal | |
CN113111063A (en) | Medical patient main index discovery method applied to multiple data sources | |
WO2023050652A1 (en) | Text recognition-based method for determining esg index in region, and related product | |
CN111625578B (en) | Feature extraction method suitable for time series data in cultural science and technology fusion field | |
CN115544257B (en) | Method and device for quickly classifying network disk documents, network disk and storage medium | |
CN111914930A (en) | Density peak value clustering method based on self-adaptive micro-cluster fusion | |
CN110991517A (en) | Classification method and system for unbalanced data set in stroke | |
CN111652733A (en) | Financial information management system based on cloud computing and block chain | |
Qi et al. | Object retrieval with image graph traversal-based re-ranking | |
Ren et al. | Multivariate functional data clustering using adaptive density peak detection | |
CN111861706A (en) | Data discretization regulation and control method and system and risk control model establishing method and system | |
Choo et al. | Automatic folder allocation system for electronic text document repositories using enhanced Bayesian classification approach | |
Gonçalves et al. | Approaching authorship attribution as a multi-view supervised learning task | |
Li | Text Classification Retrieval Based on Complex Network and ICA Algorithm. | |
AlSaif | Large scale data mining for banking credit risk prediction | |
Zhou et al. | Information fusion for combining visual and textual image retrieval in imageclef@ icpr |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||