CN109063769A - Clustering method, system and medium for automatically determining the number of clusters based on the coefficient of variation - Google Patents
Clustering method, system and medium for automatically determining the number of clusters based on the coefficient of variation
- Publication number
- CN109063769A (application number CN201810864958.3A)
- Authority
- CN
- China
- Prior art keywords
- cluster
- coefficient
- variation
- data point
- clusters
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
Abstract
The invention discloses a clustering method, system and medium for automatically determining the number of clusters based on the coefficient of variation. The density value of each data point in the data set is calculated; a density index is calculated from the density values, and the data point with the largest density index is selected as the first cluster center. The shortest distance between each data point and the existing cluster centers is calculated; from this shortest distance, the probability that each data point is chosen as a cluster center is calculated, and cluster centers are pre-selected by the roulette wheel method. Once the set number of cluster centers has been selected, k-means clustering is performed on the selected initial cluster centers to generate the corresponding number of clusters. The average intra-cluster coefficient of variation and the minimum inter-cluster coefficient of variation are then calculated, and their difference is compared with a set value; if the difference is less than the set value, the two clusters with the smallest inter-cluster coefficient of variation are merged. When the difference is greater than or equal to the set value, the clustering result is output.
Description
Technical field
The present invention relates to a clustering method, system and medium for automatically determining the number of clusters based on the coefficient of variation.
Background art
With the rapid development of information technology, many industries such as business, enterprise, scientific research institutions and government departments have accumulated massive amounts of data stored in diverse forms. Useful information is often hidden in these mass data, and it is difficult to obtain it relying only on the query, retrieval and statistical mechanisms of databases; hence the rapid development of data mining technology. Cluster analysis is an important research field in data mining and has been widely used in many applications, including pattern recognition, data analysis, image processing and market research.
Cluster analysis is an unsupervised learning method. Among clustering algorithms, partition-based algorithms are simple and applicable to various data types, but they require the number of clusters to be set in advance and are sensitive to the initial cluster centers. The k-means++ algorithm improves on the traditional k-means algorithm, but still has the defect that the number of clusters must be set manually.
Summary of the invention
In order to overcome the deficiencies of the prior art, the present invention provides a clustering method, system and medium for automatically determining the number of clusters based on the coefficient of variation. It solves the defects of the traditional k-means++ clustering algorithm, namely that the number of clusters must be set manually and that the initial centroids may be chosen improperly. Using the concepts of the coefficient of variation and the density index, the partition-based k-means++ clustering algorithm is improved so that the number of clusters need not be set manually while the accuracy of the clustering result is preserved.
In order to solve the above-mentioned technical problem, the present invention adopts the following technical scheme:
As the first aspect of the present invention, a clustering method for automatically determining the number of clusters based on the coefficient of variation is provided.
The clustering method for automatically determining the number of clusters based on the coefficient of variation comprises:
Step (1): calculate the density value of each data point in the data set, calculate the density index from the density values, and select the data point with the largest density index as the first cluster center.
Step (2): calculate the shortest distance between each data point and the existing cluster centers, then from this shortest distance calculate the probability that each data point is chosen as a cluster center, and finally pre-select a cluster center by the roulette wheel method; the density index of the pre-selected cluster center must be greater than a set threshold.
Step (3): repeat step (2) until the set number of cluster centers has been selected, then perform k-means clustering on the selected initial cluster centers to generate the corresponding number of clusters.
Step (4): calculate the average intra-cluster coefficient of variation and the minimum inter-cluster coefficient of variation, then calculate the difference between them and compare the difference with a set value; if the difference is less than the set value, merge the two clusters with the smallest inter-cluster coefficient of variation. Repeat step (4) until the difference is greater than or equal to the set value, then output the clustering result.
Further, the step of calculating the density value of each data point in the data set is as follows.
Assume the data set (S1, S2, ..., Sd) has d-dimensional attributes, the data space S = S1 × S2 × ... × Sd is a d-dimensional data space, and x ∈ (x1, x2, ..., xd) denotes a data point of the data set in the d-dimensional data space.
First, set the value of the initial number of clusters k* (k1 < k* < k2), where k1 and k2 are both greater than the number of target clusters.
Then, calculate the density value ρx of data point x, expressed by formulas (1) and (2). The formula images are not reproduced in this text; from the surrounding definitions they amount to:
ρx = Σy f(dxy)   (1)
f(X) = 1 if X ≤ R, and 0 otherwise   (2)
where num is the number of data points (the sum runs over all num points y), dxy is the distance from data point y in the data set to data point x, R is the density range, and f(X) is the function that judges whether the distance between data point y and data point x is less than or equal to the density range R.
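As a minimal sketch (not part of the patent text), the density value defined by formulas (1) and (2) can be computed as follows, assuming Euclidean distance and an example density range R:

```python
import math

def density_values(points, R):
    """rho_x = number of points y with d_xy <= R (formulas (1)-(2))."""
    def dist(a, b):
        return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))
    # f(X) = 1 if X <= R else 0; rho_x sums f(d_xy) over all points y.
    return [sum(1 for y in points if dist(x, y) <= R) for x in points]

# Example: three close points and one far outlier.
pts = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
rho = density_values(pts, R=1.0)  # -> [3, 3, 3, 1]
```

Note that each point counts itself (its self-distance is 0 ≤ R), so the outlier has the minimum density of 1.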
Further, the step of calculating the density index from the density values and selecting the data point with the largest density index as the first cluster center is as follows.
Calculate the density index DI (Density Index) from the density value ρx, and take the data point with the largest density index as the first cluster center (formula (3); the formula image is not reproduced in this text).
Further, the step of calculating the shortest distance between each data point and the existing cluster centers is as follows.
Following the way initial cluster centers are selected in the k-means++ algorithm, for each remaining data point in the data set, calculate in turn its distance to each initial cluster center already selected, and take the smallest of these distances as the shortest distance D(x) between that data point and the existing cluster centers.
Further, the step of calculating, from the shortest distance, the probability that each data point is chosen as a cluster center is as follows (the formula image is not reproduced in this text; in k-means++ this weighting is P(x) = D(x)² / Σx D(x)²),
where D(x) denotes the shortest distance between each data point and the existing cluster centers, and P(x) denotes the probability that each data point is chosen as a cluster center.
Further, the step of pre-selecting a cluster center by the roulette wheel method is as follows.
Set a threshold τ; only when the density index of a pre-selected cluster center reaches τ can it serve as a formal cluster center, otherwise a new data point is re-selected as the cluster center. The roulette wheel method is repeated until k* cluster centers have been selected.
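A sketch of the pre-selection step follows, under the assumption (the patent's formula image is not reproduced) that P(x) is the standard k-means++ weighting D(x)² / Σ D(x)²:

```python
import math
import random

def preselect_center(points, centers, density_index, tau, rng=None):
    """Roulette-wheel pre-selection of one new cluster center.

    D(x): shortest distance from x to the existing centers.
    P(x): assumed here to be D(x)^2 / sum D(x)^2 (standard k-means++).
    Candidates whose density index is below the threshold tau are
    rejected and the wheel is spun again.
    """
    rng = rng or random.Random(0)
    def dist(a, b):
        return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))
    while True:
        d = [min(dist(x, c) for c in centers) for x in points]  # D(x)
        total = sum(v ** 2 for v in d)
        weights = [v ** 2 / total for v in d]                   # P(x)
        idx = rng.choices(range(len(points)), weights=weights)[0]
        if density_index[idx] >= tau:   # density threshold from this step
            return points[idx]
```

In this sketch the loop simply re-spins when a low-density point is drawn, matching "otherwise a new data point is re-selected as the cluster center".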
Further, the step of calculating the average intra-cluster coefficient of variation is as follows.
First, calculate the intra-cluster coefficient of variation CVi of each cluster (formula (5); the formula image is not reproduced in this text).
Then, calculate the average intra-cluster coefficient of variation (formula (6)),
where μi is the centroid of cluster i, mi is the number of data points of cluster i, xj is the j-th data point in cluster i, and k* denotes the number of pre-selected cluster centers.
Because a larger coefficient of variation indicates more dispersed data points, the quality of a cluster's cohesion is reflected by calculating its intra-cluster coefficient of variation.
Further, the step of calculating the minimum inter-cluster coefficient of variation is as follows.
First, calculate the inter-cluster coefficient of variation CVij of each pair of clusters (formula (7); the formula image is not reproduced in this text).
Then, calculate the minimum inter-cluster coefficient of variation Dmin:
Dmin = min{CVij}, i = 1, 2, ..., k*, j = 1, 2, ..., k*   (8)
where mij is the number of data points of clusters i and j, μij is the centroid of clusters i and j, and xl is the l-th data point in clusters i and j.
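The two coefficients can be sketched as follows. Since the patent's formula images are not reproduced, this sketch assumes one plausible reading: σ is the root-mean-square distance of a cluster's points to its centroid, μ is the norm of the centroid, and CVij is the coefficient of variation of clusters i and j taken together:

```python
import math

def intra_cv(cluster):
    """Coefficient of variation sigma/mu of one cluster.

    Assumed reading: sigma = RMS distance of points to the centroid,
    mu = norm of the centroid (formula images not reproduced).
    """
    dim = len(cluster[0])
    centroid = [sum(p[i] for p in cluster) / len(cluster) for i in range(dim)]
    sigma = math.sqrt(
        sum(sum((pi - ci) ** 2 for pi, ci in zip(p, centroid)) for p in cluster)
        / len(cluster)
    )
    mu = math.sqrt(sum(c ** 2 for c in centroid))
    return sigma / mu

def min_inter_cv(clusters):
    """D_min = min over cluster pairs of CV_ij, where CV_ij is the CV of
    clusters i and j merged (formula (8))."""
    k = len(clusters)
    return min(
        intra_cv(clusters[i] + clusters[j])
        for i in range(k) for j in range(i + 1, k)
    )

# Two adjacent clusters and one distant cluster: the adjacent pair
# yields the minimum inter-cluster coefficient of variation.
A = [(1.0, 0.0), (1.1, 0.0)]
B = [(1.2, 0.0), (1.3, 0.0)]
C = [(10.0, 0.0), (10.1, 0.0)]
```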
Further, the step of calculating the difference between the average intra-cluster coefficient of variation and the minimum inter-cluster coefficient of variation, comparing the difference with the set value, merging the two clusters with the smallest inter-cluster coefficient of variation if the difference is less than the set value, and outputting the clustering result if the difference is greater than or equal to the set value, is as follows.
Calculate the difference T between the average intra-cluster coefficient of variation and the minimum inter-cluster coefficient of variation, and judge from the difference whether clusters need to be merged (the formula images are not reproduced; T is read here as Dmin minus the average intra-cluster coefficient of variation):
If T < 0, merge the two clusters with the smallest inter-cluster coefficient of variation.
If T ≥ 0: when 0 ≤ T < ε, merge the two clusters with the smallest inter-cluster coefficient of variation; when ε ≤ T, output the number of clusters and the data points corresponding to each cluster.
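The stopping test can be sketched in a few lines. As above, this assumes T = Dmin − (average intra-cluster CV), since the formula images are not reproduced in the text:

```python
def should_merge(avg_intra_cv, d_min, eps):
    """Merging criterion of step (4), under the assumed reading
    T = d_min - avg_intra_cv: merge the closest pair of clusters while
    T < eps; stop and output the clustering once eps <= T."""
    return (d_min - avg_intra_cv) < eps
```

Both the T < 0 and the 0 ≤ T < ε branches of the patent text collapse into the single comparison T < ε.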
As the second aspect of the present invention, a clustering system for automatically determining the number of clusters based on the coefficient of variation is provided.
The clustering system for automatically determining the number of clusters based on the coefficient of variation comprises: a memory, a processor, and computer instructions stored on the memory and running on the processor; when the computer instructions are run by the processor, the steps of any of the above methods are completed.
As the third aspect of the present invention, a computer readable storage medium is provided.
A computer readable storage medium has computer instructions stored thereon; when the computer instructions are run by a processor, the steps of any of the above methods are completed.
Compared with the prior art, the beneficial effects of the present invention are as follows.
Using the concepts of the coefficient of variation and the density index, the partition-based k-means++ clustering algorithm is improved so that the number of clusters need not be set manually while the accuracy of the clustering result is preserved.
The data point with the largest density index is selected as the first cluster center because partition-based clustering algorithms are sensitive to the choice of initial centroids; this choice effectively avoids outliers in the data set.
The improved clustering algorithm that automatically determines the number of clusters, optimized both in determining the number of clusters with the coefficient-of-variation concept and in selecting the initial centroids, can greatly improve clustering quality and can be effectively applied to data cluster analysis.
The intra-cluster coefficient of variation expresses the cohesion within a cluster, and the inter-cluster coefficient of variation expresses the separation between clusters; when cohesion and separation are maximized, the clustering effect is optimal.
Description of the drawings
The accompanying drawings constituting a part of this application are used to provide a further understanding of the application; the illustrative embodiments of the application and their descriptions are used to explain the application and do not constitute an undue limitation on the application.
Fig. 1 is the flow chart of the clustering algorithm that automatically determines the number of clusters based on the coefficient of variation.
Specific embodiment
It should be noted that the following detailed description is illustrative and is intended to provide further explanation of the application. Unless otherwise indicated, all technical and scientific terms used herein have the same meanings as commonly understood by a person of ordinary skill in the technical field to which the application belongs.
It should be noted that the terms used herein are merely for describing specific embodiments and are not intended to limit the illustrative embodiments of the application. As used herein, unless the context clearly indicates otherwise, the singular forms are also intended to include the plural forms; in addition, it should be understood that when the terms "comprising" and/or "including" are used in this specification, they indicate the presence of features, steps, operations, devices, components and/or combinations thereof.
As shown in Fig. 1, the clustering method for automatically determining the number of clusters based on the coefficient of variation comprises:
Step 1: calculate the density value ρx of each data point in the data set, calculate the density index DI from the density values, and select the data point with the largest density index as the first cluster center.
Step 2: calculate the shortest distance D(x) between each data point and the existing cluster centers, then from the distance calculate the probability P(x) that each data point is chosen as the next cluster center, and finally pre-select a cluster center by the roulette wheel method; only when the density index of the pre-selected cluster center reaches the threshold τ can it serve as a new cluster center, otherwise the selection is recalculated.
Step 3: repeat Step 2 until k* (k1 < k* < k2) cluster centers have been selected, then perform k-means clustering to generate k* clusters.
Step 4: calculate the average intra-cluster coefficient of variation and the minimum inter-cluster coefficient of variation Dmin, and obtain their difference T. If T < 0, merge the two clusters with the smallest separation; if T ≥ 0: when 0 ≤ T < ε, merge the two clusters with the smallest separation; when ε ≤ T, the clustering effect is optimal.
Step 5: execute Step 4 in a loop until the clustering effect is optimal.
First, initial cluster centers are selected with the concept of the density index, which improves clustering quality. The initial cluster centers are selected by calculating the density value of each data point, calculating the density index from the density values, and selecting the data point with the largest density index as the first cluster center; then, from the distances of the data points to the existing cluster centers, the probability of each data point being chosen as the next cluster center is calculated, and the remaining cluster centers are determined accordingly, with the density index of each cluster center required to reach a certain threshold. Finally, the k-means algorithm is run to form the initial clusters.
The topics of conference papers are varied, so cluster analysis is performed on conference papers to bring together papers with similar topics. However, the specific number of categories is unknown at the outset; in order to obtain a high-quality clustering effect, the proposed clustering algorithm that automatically determines the number of clusters is applied here. The NIPS conference papers from 1987 to 2015 are taken as the experimental data set, and the conference papers are clustered mainly according to the usage counts of English words in each paper in the data set. The data set has 11463-dimensional attributes and 5811 sample data, the data space S = S1 × S2 × ... × S11463 is an 11463-dimensional data space, and x ∈ (x1, x2, ..., x5811) indicates the occurrence counts of each word in a NIPS conference paper.
To determine the initial number of conference paper categories, the value k* (k1 < k* < k2) is chosen at random, where k1 and k2 are both clearly greater than the target number of paper categories.
Calculate the density value ρx of conference paper x in the conference paper data set (S1, S2, ..., S11463), i.e., the number of conference papers whose dissimilarity to paper x is less than or equal to the density range,
where num is the number of conference papers, dxy is the dissimilarity between conference paper y and conference paper x in the data set, R is the density range, and f(X) is the function that judges whether the dissimilarity between papers y and x is less than or equal to the density range R.
Calculate the density index DI (Density Index) of each conference paper from its density value ρx, and take the paper with the largest density index, i.e. DImax, as the first cluster center, expressed by formula (3).
The conference paper with the largest density index is selected as the first cluster center because partition-based clustering algorithms are sensitive to the choice of initial centroids; selecting a paper with a high density as the cluster center effectively avoids abnormal paper data, thereby improving the quality of clustering.
Calculate the smallest dissimilarity D(x) between each conference paper and the existing cluster centers, then from the dissimilarity calculate the probability that each paper is chosen as the next cluster center.
For the selection of initial cluster centers, conference papers that are highly dissimilar to one another should be chosen as cluster centers. Therefore, the probability of each paper being chosen as a cluster center is calculated: the larger its dissimilarity to the existing cluster centers, the larger the probability of being selected as a cluster center, so that the selected cluster centers are relatively dispersed.
Pre-select a cluster center by the roulette wheel method according to these probabilities. Since partition-based clustering algorithms are sensitive to outliers, a threshold τ is set: only when the density index of a pre-selected cluster center reaches τ can it serve as a formal cluster center, otherwise a new conference paper is re-selected as the cluster center. This process is repeated until k* cluster centers have been selected; then the traditional k-means algorithm is run on the k* initial cluster centers to form k* clusters.
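The final k-means step named above is the traditional algorithm; a compact, self-contained sketch (not the patent's own code) of its assign/update loop from pre-selected initial centers is:

```python
import math

def _dist(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def kmeans(points, centers, iters=20):
    """Traditional k-means refinement starting from the k* pre-selected
    initial cluster centers, producing k* clusters."""
    centers = [list(c) for c in centers]
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        groups = [[] for _ in centers]
        for p in points:
            i = min(range(len(centers)), key=lambda i: _dist(p, centers[i]))
            groups[i].append(p)
        # Update step: move each center to its cluster centroid.
        for i, g in enumerate(groups):
            if g:
                centers[i] = [sum(p[d] for p in g) / len(g)
                              for d in range(len(g[0]))]
    return groups, centers
```

With well-separated initial centers this converges in a single pass on simple data; the fixed iteration count stands in for a proper convergence check.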
Since the initially selected number of paper categories k* is clearly greater than the target value k, clusters need to be merged to reduce the number of clusters to k. But the target number of paper categories is unknown at the outset, so the concept of the coefficient of variation is introduced to decide when to stop merging clusters. The relation between the average intra-cluster coefficient of variation of the k* clusters and the minimum inter-cluster coefficient of variation determines whether the number of paper categories is optimal: the intra-cluster coefficient of variation expresses the cohesion within a cluster, the inter-cluster coefficient of variation expresses the separation between clusters, and when cohesion and separation are maximized, the clustering effect is optimal.
The coefficient of variation is a statistic describing the distribution of data, used to reflect the degree of dispersion of the data. Its advantage is that it does not need to refer to the average value of the data as a characteristic; when comparing two groups of data with different dimensions or different means, the coefficient of variation rather than the standard deviation should be used as the reference for comparison. Therefore, using the coefficient of variation to calculate the threshold for the number of clusters is suitable for all types of data sets.
The coefficient of variation means the ratio of the variation indicator of a group of data to its average indicator, i.e., the ratio of the standard deviation σ to the mean μ, expressed by formulas (5) and (6).
Calculate the intra-cluster coefficient of variation of each cluster accordingly, then take the average of the intra-cluster coefficients of variation, expressed by formulas (7) and (8),
where μi is the centroid of cluster i, mi is the number of conference papers of cluster i, and xj is the j-th conference paper in cluster i. Because a larger coefficient of variation indicates a more dispersed distribution of papers, the cohesion of each cluster is reflected by calculating its intra-cluster coefficient of variation.
Calculate the inter-cluster coefficient of variation between every two clusters, then take the minimum of the inter-cluster coefficients of variation Dmin, expressed by formulas (9) and (10),
Dmin = min{CVij}, i = 1, 2, ..., k*, j = 1, 2, ..., k*   (10)
where mij is the number of conference papers of clusters i and j, μij is the centroid of clusters i and j, and xl is the l-th conference paper in clusters i and j. The separation of two clusters is reflected by calculating the inter-cluster coefficient of variation.
Calculate the difference T between the average intra-cluster coefficient of variation and the minimum inter-cluster coefficient of variation, and judge from the difference whether clusters need to be merged (the formula images are not reproduced; T is read here as Dmin minus the average intra-cluster coefficient of variation).
If T < 0, there exist two clusters with a small inter-cluster coefficient of variation. The smaller the inter-cluster coefficient of variation, the more concentrated the distribution of the conference papers in the two clusters and the lower their separation. Since the number of initially set clusters is greater than the number of target clusters, the average intra-cluster coefficient of variation is small and varies little, and the cohesion of each cluster is high, so clusters only need to be merged. The merging strategy is to merge the two clusters with the smallest separation, i.e., the two clusters whose inter-cluster coefficient of variation is Dmin.
If T ≥ 0: when 0 ≤ T < ε, the difference is small, meaning there still exist two clusters with a small inter-cluster coefficient of variation; the closer the inter-cluster coefficient of variation is to the intra-cluster coefficient of variation, the more concentrated the distribution of papers in the two clusters and the lower their separation, while each cluster's cohesion remains high, so clusters still need to be merged. When ε ≤ T, there is a certain difference, meaning the inter-cluster coefficients of variation are larger and differ more from the intra-cluster coefficients of variation: the paper distributions of any two clusters are more dispersed and the separation is larger, while the cohesion of each cluster remains high. When the separation between all clusters reaches this level, the clustering effect is good and the optimal number of conference paper categories is obtained.
If clusters are merged, the average intra-cluster coefficient of variation and the minimum inter-cluster coefficient of variation Dmin are recalculated, and whether the optimal clustering effect has been reached is judged again from their difference; otherwise cluster merging continues. This process is executed in a loop until the termination condition is reached.
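The merging loop described above can be sketched end to end. As before, this is a self-contained illustration under assumed readings (σ/μ of a cluster with σ the RMS distance to the centroid and μ the centroid norm; T = Dmin − average intra-cluster CV), not the patent's own implementation:

```python
import math

def _cv(cluster):
    """sigma/mu of a cluster (assumed reading of the patent's CV)."""
    n, dim = len(cluster), len(cluster[0])
    centroid = [sum(p[i] for p in cluster) / n for i in range(dim)]
    sigma = math.sqrt(sum(
        sum((pi - ci) ** 2 for pi, ci in zip(p, centroid)) for p in cluster
    ) / n)
    return sigma / math.sqrt(sum(c ** 2 for c in centroid))

def merge_clusters(clusters, eps):
    """Merge the pair with the smallest inter-cluster CV until
    T = D_min - (average intra-cluster CV) satisfies eps <= T."""
    clusters = [list(c) for c in clusters]
    while len(clusters) > 1:
        avg = sum(_cv(c) for c in clusters) / len(clusters)
        k = len(clusters)
        # Find the pair (i, j) with the smallest inter-cluster CV.
        d_min, (i, j) = min(
            (_cv(clusters[i] + clusters[j]), (i, j))
            for i in range(k) for j in range(i + 1, k)
        )
        if d_min - avg >= eps:          # separation reached: stop
            break
        clusters[i] = clusters[i] + clusters.pop(j)  # merge the pair
    return clusters
```

On two adjacent groups and one distant group, the loop merges the adjacent pair and then stops, so the number of clusters is found without being set in advance.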
The above is merely a preferred embodiment of the present application and is not intended to limit the application; for those skilled in the art, various changes and modifications of the application are possible. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the application shall be included within the scope of protection of the application.
Claims (10)
1. A clustering method for automatically determining the number of clusters based on the coefficient of variation, characterized by comprising:
Step (1): calculating the density value of each data point in the data set, calculating the density index from the density values, and selecting the data point with the largest density index as the first cluster center;
Step (2): calculating the shortest distance between each data point and the existing cluster centers, then calculating from the shortest distance the probability that each data point is chosen as a cluster center, and finally pre-selecting a cluster center by the roulette wheel method, the density index of the pre-selected cluster center being greater than a set threshold;
Step (3): repeating step (2) until the set number of cluster centers has been selected, then performing k-means clustering on the selected initial cluster centers to generate the corresponding number of clusters;
Step (4): calculating the average intra-cluster coefficient of variation and the minimum inter-cluster coefficient of variation, then calculating the difference between them and comparing the difference with a set value, and, if the difference is less than the set value, merging the two clusters with the smallest inter-cluster coefficient of variation; repeating step (4) until the difference is greater than or equal to the set value, then outputting the clustering result.
2. The clustering method for automatically determining the number of clusters based on the coefficient of variation according to claim 1, characterized in that the step of calculating the density value of each data point in the data set is as follows:
assuming the data set (S1, S2, ..., Sd) has d-dimensional attributes, the data space S = S1 × S2 × ... × Sd is a d-dimensional data space, and x ∈ (x1, x2, ..., xd) denotes a data point of the data set in the d-dimensional data space;
first, setting the value of the initial number of clusters k*, where k1 < k* < k2, and k1 and k2 are both greater than the number of target clusters;
then, calculating the density value ρx of data point x, expressed by formulas (1) and (2):
where num is the number of data points, dxy is the distance from data point y in the data set to data point x, R is the density range, and f(X) is the function that judges whether the distance between data point y and data point x is less than or equal to the density range R.
3. The clustering method for automatically determining the number of clusters based on the coefficient of variation according to claim 1, characterized in that the step of calculating the density index from the density values and selecting the data point with the largest density index as the first cluster center is as follows:
calculating the density index DI from the density value ρx, and taking the data point with the largest density index as the first cluster center.
4. The clustering method for automatically determining the number of clusters based on the coefficient of variation according to claim 1, characterized in that the step of calculating the shortest distance between each data point and the existing cluster centers is as follows:
following the way initial cluster centers are selected in the k-means++ algorithm, for each remaining data point in the data set, calculating in turn its distance to each initial cluster center already selected, and taking the smallest of these distances as the shortest distance D(x) between that data point and the existing cluster centers.
5. The clustering method for automatically determining the number of clusters based on the coefficient of variation according to claim 1, characterized in that the step of calculating, from the shortest distance, the probability that each data point is chosen as a cluster center is as follows:
where D(x) denotes the shortest distance between each data point and the existing cluster centers, and P(x) denotes the probability that each data point is chosen as a cluster center;
and the step of pre-selecting a cluster center by the roulette wheel method is as follows:
setting a threshold τ; only when the density index of a pre-selected cluster center reaches τ can it serve as a formal cluster center, otherwise a new data point is re-selected as the cluster center; the roulette wheel method is repeated until k* cluster centers have been selected.
6. The clustering method for automatically determining the number of clusters based on the coefficient of variation according to claim 1, characterized in that the step of calculating the average intra-cluster coefficient of variation is as follows:
first, calculating the intra-cluster coefficient of variation CVi of each cluster;
then, calculating the average intra-cluster coefficient of variation,
where μi is the centroid of cluster i, mi is the number of data points of cluster i, xj is the j-th data point in cluster i, and k* denotes the number of pre-selected cluster centers.
7. The clustering method for automatically determining the number of clusters based on the coefficient of variation according to claim 1, characterized in that the step of calculating the minimum inter-cluster coefficient of variation is as follows:
first, calculating the inter-cluster coefficient of variation CVij;
then, calculating the minimum inter-cluster coefficient of variation Dmin:
Dmin = min{CVij}, i = 1, 2, ..., k*, j = 1, 2, ..., k*   (8)
where mij is the number of data points of clusters i and j, μij is the centroid of clusters i and j, and xl is the l-th data point in clusters i and j.
8. The clustering method for automatically determining the number of clusters based on the coefficient of variation according to claim 1, wherein the step of calculating the difference between the average within-cluster coefficient of variation and the minimum between-cluster coefficient of variation, comparing the difference with a set value, merging the two clusters with the smallest between-cluster coefficient of variation if the difference is less than the set value, and outputting the clustering result if the difference is greater than or equal to the set value comprises:
Calculating the difference T = CV̄ − Dmin between the average within-cluster coefficient of variation and the minimum between-cluster coefficient of variation, and judging from T whether clusters need to be merged:
If T < 0, i.e. CV̄ < Dmin, merging the two clusters with the smallest between-cluster coefficient of variation;
If T ≥ 0, i.e. CV̄ ≥ Dmin: when 0 ≤ T < ε, merging the two clusters with the smallest between-cluster coefficient of variation; when ε ≤ T, outputting the number of clusters and the data points corresponding to each cluster.
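The merge test in this claim reduces to a single comparison: with T = CV̄ − Dmin, both the T < 0 branch and the 0 ≤ T < ε branch merge the closest pair, so merging continues exactly while T < ε. A minimal sketch:

```python
def should_merge(avg_within_cv, d_min, epsilon):
    # T = average within-cluster CV minus minimum between-cluster CV.
    # The claim merges both when T < 0 and when 0 <= T < epsilon,
    # so merging continues exactly while T < epsilon.
    t = avg_within_cv - d_min
    return t < epsilon
```

In the full method this predicate is evaluated after every merge, and the loop stops (outputting the clusters) the first time it returns False.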
9. A clustering system for automatically determining the number of clusters based on the coefficient of variation, comprising: a memory, a processor, and computer instructions stored in the memory and run on the processor, the computer instructions, when run by the processor, performing the steps of the method of any one of the preceding claims.
10. A computer-readable storage medium having computer instructions stored thereon, the computer instructions, when run by a processor, performing the steps of the method of any one of the preceding claims.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810864958.3A CN109063769B (en) | 2018-08-01 | 2018-08-01 | Clustering method, system and medium for automatically determining cluster number based on coefficient of variation |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109063769A true CN109063769A (en) | 2018-12-21 |
CN109063769B CN109063769B (en) | 2021-04-09 |
Family
ID=64832407
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810864958.3A Active CN109063769B (en) | 2018-08-01 | 2018-08-01 | Clustering method, system and medium for automatically determining cluster number based on coefficient of variation |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109063769B (en) |
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20170091282A1 (en) * | 2003-04-25 | 2017-03-30 | The Board Of Trustees Of The Leland Stanford Junior University | A method for identifying clusters of fluorescence-activated cell sorting data points |
CN105139282A (en) * | 2015-08-20 | 2015-12-09 | 国家电网公司 | Power grid index data processing method, device and calculation device |
CN105488589A (en) * | 2015-11-27 | 2016-04-13 | 江苏省电力公司电力科学研究院 | Genetic simulated annealing algorithm based power grid line loss management evaluation method |
CN106570729A (en) * | 2016-11-14 | 2017-04-19 | 南昌航空大学 | Air conditioner reliability influence factor-based regional clustering method |
CN107133652A (en) * | 2017-05-17 | 2017-09-05 | 国网山东省电力公司烟台供电公司 | Electricity customer valuation method and system based on the K-means clustering algorithm |
CN107229751A (en) * | 2017-06-28 | 2017-10-03 | 济南大学 | A concurrent incremental association rule mining method for streaming data |
Non-Patent Citations (2)
Title |
---|
ONAPA LIMWATTANAPIBOOL et al.: "Detecting cluster numbers based on density changes using density-index enhanced scale-invariant density-based clustering initialization algorithm", 2017 9th International Conference on Information Technology and Electrical Engineering * |
SHI Yunping: "Research on the application of the K-means clustering algorithm", Theory and Method * |
Cited By (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111027585A (en) * | 2019-10-25 | 2020-04-17 | 南京大学 | K-means algorithm hardware realization method and system based on k-means + + centroid initialization |
CN111368876A (en) * | 2020-02-11 | 2020-07-03 | 广东工业大学 | Double-threshold sequential clustering method |
CN111476270A (en) * | 2020-03-04 | 2020-07-31 | 中国平安人寿保险股份有限公司 | Course information determining method, device, equipment and storage medium based on K-means algorithm |
CN111476270B (en) * | 2020-03-04 | 2024-04-30 | 中国平安人寿保险股份有限公司 | Course information determining method, device, equipment and storage medium based on K-means algorithm |
CN111833171A (en) * | 2020-03-06 | 2020-10-27 | 北京芯盾时代科技有限公司 | Abnormal operation detection and model training method, device and readable storage medium |
CN111507428A (en) * | 2020-05-29 | 2020-08-07 | 深圳市商汤科技有限公司 | Data processing method and device, processor, electronic equipment and storage medium |
CN111507428B (en) * | 2020-05-29 | 2024-01-05 | 深圳市商汤科技有限公司 | Data processing method and device, processor, electronic equipment and storage medium |
CN112070387B (en) * | 2020-09-04 | 2023-09-26 | 北京交通大学 | Method for evaluating multipath component clustering performance of complex propagation environment |
CN112070387A (en) * | 2020-09-04 | 2020-12-11 | 北京交通大学 | Multipath component clustering performance evaluation method in complex propagation environment |
CN112053063B (en) * | 2020-09-08 | 2023-12-19 | 山东大学 | Load partitioning method and system for planning and designing energy system |
CN112053063A (en) * | 2020-09-08 | 2020-12-08 | 山东大学 | Load partitioning method and system for energy system planning design |
CN113378682A (en) * | 2021-06-03 | 2021-09-10 | 山东省科学院自动化研究所 | Millimeter wave radar fall detection method and system based on improved clustering algorithm |
WO2023004899A1 (en) * | 2021-07-27 | 2023-02-02 | 南京中网卫星通信股份有限公司 | Method and apparatus for detecting abnormal data of satellite and wireless communication convergence network performance |
CN113301600A (en) * | 2021-07-27 | 2021-08-24 | 南京中网卫星通信股份有限公司 | Abnormal data detection method and device for performance of satellite and wireless communication converged network |
CN116109933A (en) * | 2023-04-13 | 2023-05-12 | 山东省土地发展集团有限公司 | Dynamic identification method for ecological restoration of abandoned mine |
CN116109933B (en) * | 2023-04-13 | 2023-06-23 | 山东省土地发展集团有限公司 | Dynamic identification method for ecological restoration of abandoned mine |
Also Published As
Publication number | Publication date |
---|---|
CN109063769B (en) | 2021-04-09 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109063769A (en) | Clustering method, system and medium for automatically determining the number of clusters based on the coefficient of variation | |
He et al. | A two-stage genetic algorithm for automatic clustering | |
CN109873501B (en) | Automatic identification method for low-voltage distribution network topology | |
CN109063945A (en) | A 360-degree customer-portrait construction method for electricity-sales companies based on a value accounting system | |
Chou et al. | Identifying prospective customers | |
CN107220337B (en) | Cross-media retrieval method based on hybrid migration network | |
CN101853389A (en) | Detection device and method for multi-class targets | |
CN111931505A (en) | Cross-language entity alignment method based on subgraph embedding | |
CN111401785A (en) | Power system equipment fault early warning method based on fuzzy association rule | |
CN107545360A (en) | A decision-tree-based intelligent risk-control rule derivation method and system | |
CN103324939A (en) | Deviation classification and parameter optimization method based on least square support vector machine technology | |
CN112580742A (en) | Graph neural network rapid training method based on label propagation | |
Hruschka et al. | Improving the efficiency of a clustering genetic algorithm | |
Sun et al. | Does Every Data Instance Matter? Enhancing Sequential Recommendation by Eliminating Unreliable Data. | |
CN110427365A (en) | Address merging method and system for improving order-merging accuracy | |
CN111625578B (en) | Feature extraction method suitable for time series data in cultural science and technology fusion field | |
CN109977131A (en) | A floor-plan matching system | |
CN112836750A (en) | System resource allocation method, device and equipment | |
CN102262682A (en) | Rapid attribute reduction method based on rough classification knowledge discovery | |
CN109543712B (en) | Method for identifying entities on temporal data set | |
CN107423759B (en) | Comprehensive evaluation method, device and application of low-dimensional successive projection pursuit clustering model | |
Kaewwichian | Multiclass classification with imbalanced datasets for car ownership demand model–Cost-sensitive learning | |
CN110009024A (en) | A data classification method based on the ID3 algorithm | |
CN109344320A (en) | A book recommendation method based on the Apriori algorithm | |
CN110084376B (en) | Method and device for automatically separating data into boxes |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||