CN108985318A - A kind of global optimization K mean cluster method and system based on sample rate - Google Patents
A kind of global optimization K mean cluster method and system based on sample rate Download PDFInfo
- Publication number
- CN108985318A CN108985318A CN201810525709.1A CN201810525709A CN108985318A CN 108985318 A CN108985318 A CN 108985318A CN 201810525709 A CN201810525709 A CN 201810525709A CN 108985318 A CN108985318 A CN 108985318A
- Authority
- CN
- China
- Prior art keywords
- density value
- submanifold
- cluster
- sample point
- initial
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
Landscapes
- Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Probability & Statistics with Applications (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
Aiming at the problem that present invention cluster result present in traditional K mean cluster method easily falls into local optimum dependent on initial cluster center, a kind of global optimization K mean cluster KMS-GOSD method and system based on sample rate are proposed.In an iterative process, KMS-GOSD method passes through Gauss model first and obtains the pre-estimation density value of all cluster centres, and actual density value is then carried out offset operation lower than the maximum cluster centre of pre-estimation density value.By optimizing cluster centre position, KMS-GOSD method can not only promote global exploring ability, and can overcome the dependence to cluster initial center point.Comparison of the present invention is carried out using the UCI data set of standard, it is found that improved method has higher accuracy rate and stability compared to traditional method.
Description
Technical field
The present invention relates to the fast density peak values in machine learning to cluster field, more particularly to a kind of based on sample rate
Global optimization K mean cluster method and system.
Background technique
Traditional K- means clustering method has many advantages, such as simple and effective, fast convergence rate, convenient for processing large data collection, mesh
It is preceding to be widely used in many fields such as scientific research and industrial applicability.But the overall situation present in complex data collection explores energy
Power is weak, the problems such as easily falling into local optimum, is still the heavy difficulties improved in the research of K mean value.
Using the core concept of density peaks clustering method, domestic and foreign scholars carry out with different angle K Mean Method
Analysis and improvement.The country such as Xing Changzheng proposes a kind of method based on averag density clustering of optimizing initial centers.Using clustering
The attribute and structure for analyzing data object before, choose suitable initial cluster center, instead of traditional random initial center of K mean value
Point, and keep the iterative process of tradition K mean value constant.Foreign scholar proposes to pass through kernel function, adaptive neural network in iterative process
The method that the methods of network method, differential evolution method auxiliary K mean value find the high sample point of density in global scope.Such as text
Offer " Approximate Normalized Cuts without Eigen-decomposition Information
Sciences " in propose by using approximation weight kernel optimization objective function.
Since traditional K mean value is sensitive to cluster centre, so that the selection of cluster centre will directly affect cluster accuracy rate
Just.A kind of document " optimization of the initial cluster center of K-means algorithm " " Distributed Cluster excavation calculation based on local density
Method " point out that cluster centre should be at the relatively high point of sample rate in cluster." a kind of improved k-means initially gathers document
Class Research on Center Selection Algorithm " a kind of " new k-means cluster centre Algorithms of Selecting " " K- of minimum variance optimization cluster centre
Means algorithm " by showing theory analysis and effect of the present invention: when cluster centre is located at sample rate higher, gather
Class accuracy rate can be obviously improved.
Summary of the invention
Part is easily fallen into most dependent on initial cluster center for cluster result present in traditional K mean cluster method
Excellent problem, to avoid the speed explored to the excessive analysis of hash object and the quickening method overall situation, the present invention proposes base
In global optimization K mean cluster method (the Global Optimized K-means Clustering of sample rate
Algorithm based on Sample Density, abbreviation KMS-GOSD).In the process of traditional K mean cluster method iteration
In, KMS-GOSD method is by the way that actual density value to be displaced in such lower than the maximum cluster centre of pre-estimation density value
Greater than the point of pre-estimation density value, realization avoids falling into local optimum, so overcome cluster result to initial cluster center according to
Lai Xing.Simultaneously before offset, being added gradually decreases pre-estimation density value with the decay factor that the number of iterations is inversely proportional, and then drops
The deflection probability of low cluster centre.It can guarantee that there is KMS-GOSD method early period stronger global exploring ability, later period in this way
Also there is stronger stability.
A kind of global optimization K mean cluster method based on sample rate comprises the steps of:
S1, raw data set X, submanifold number K and scale parameter Ra comprising N number of sample point are obtained, wherein N is greater than 1;
S2, K sample point is randomly selected as initial cluster center in the raw data set X, be denoted as wi, wherein i
=1,2,3 ..., K;
S3, calculate separately initial data concentrate initial cluster center other than all sample points apart from each initial clustering
Center wiDistance, and by the initial data concentrate initial cluster center other than all sample points be assigned to away from nearest
Initial cluster center formed K submanifold;
S4, the mass center of all submanifolds is denoted as W respectivelyi, according to formulaCalculate Wi's
Pre-estimation density value Fi,t, and calculate WiActual density value Fi,c;
Wherein m is the maximum number of iterations of the number of iterations and is preset value, and t indicates current the number of iterations, 2 φ (3
σ × Ra) value can be tabled look-up and obtain according to standard normal distribution function;
S5, the mass center W by each submanifoldiAs new cluster centre, where judging each new cluster centre respectively
It whether there is actual density value F in submanifoldi,cLess than pre-estimation density value Fi,tSample point, if it does not, jumping to S10;
If it does, jumping to S6;
S6, actual density value F is obtainedi,cWith pre-estimation density value Fi,tThe maximum sample point of absolute difference where son
Cluster;
S7, in the submanifold that S6 is obtained, obtain several sample points at random, and calculate separately several sample points
Actual density value Fi,c;
S8, judge in each and every one several sample points with the presence or absence of actual density value Fi,cGreater than pre-estimation density value Fi,t's
Sample point;If it is present jumping to S10;Otherwise S9 is jumped to;
S9, by actual density value Fi,cWith pre-estimation density value Fi,tThe maximum sample point of absolute difference is as new cluster
Then center executes step S10;
S10, judge cluster centre WiWhether no longer change, jumps to S11 if meeting;Otherwise the number of iterations t is updated to t
+ 1, using new cluster centre as new initial cluster center, jump to S3;
S11, output cluster result.
In a kind of global optimization K mean cluster method based on sample rate of the invention, the actual density value Fi,c
According to formulaIt calculates, whereindijFor WiJ-th of sample into i-th of submanifold
This nijEuclidean distance, SiIndicate the number of sample point in i-th of submanifold, j is j-th of sample point, c ∈ [1, cmax], cmax
For preset peak excursion number, r=R × Ra;R is most long distance of the cluster centre from sample point in the submanifold of place in any submanifold
From.
The global optimization K mean cluster system based on sample rate that the present invention also provides a kind of, comprising with lower module:
Initialization module, for obtaining raw data set X, submanifold number K and scale parameter Ra comprising N number of sample point,
Wherein N is greater than 1;
Initial cluster center obtains module, for randomly selecting K sample point in the raw data set X as initial
Cluster centre is denoted as wi, wherein i=1,2,3 ..., K;
Submanifold forms module, for calculating separately all sample point distances other than initial data concentration initial cluster center
Each initial cluster center wiDistance, and the initial data is concentrated into all sample points point other than initial cluster center
It is fitted on away from nearest initial cluster center wiForm K submanifold;
Pre-estimation density value and actual density value computing module, for the mass center of all submanifolds to be denoted as W respectivelyi, pass through
FormulaCalculate WiPre-estimation density value Fi,t, and calculate WiActual density value Fi,c;
Wherein m is the maximum number of iterations of the number of iterations and is preset value, and t indicates current the number of iterations, 2 φ (3
σ × Ra) value can be tabled look-up and obtain according to standard normal distribution function;
First judgment module, for by the mass center W of each submanifoldiAs new cluster centre, judge respectively each new
It whether there is actual density value F in submanifold where cluster centrei,cLess than pre-estimation density value Fi,tSample point, if do not deposited
Jumping to third judgment module;If it does, jumping to submanifold obtains module;
Submanifold obtains module, for obtaining actual density value Fi,cWith pre-estimation density value Fi,tAbsolute difference it is maximum
Submanifold where sample point;
Actual density value computing module, for obtaining several samples at random in the submanifold that submanifold obtains that module obtains
Point, and calculate separately the actual density value F of several sample pointsi,c;
Second judgment module, for judging in each and every one several sample points with the presence or absence of actual density value Fi,cGreater than pre-
Estimate density value Fi,tSample point;If it is present jumping to third judgment module;Otherwise new cluster centre is jumped to obtain
Modulus block;
New cluster centre obtains module, is used for actual density value Fi,cWith pre-estimation density value Fi,tAbsolute difference is most
Then big sample point executes third judgment module as new cluster centre;
Third judgment module, for judging cluster centre WiWhether no longer change, output result mould is jumped to if meeting
Block;Otherwise the number of iterations t is updated to t+1, using new cluster centre as new initial cluster center, jumps to submanifold and forms mould
Block;
Object module is exported, for exporting cluster result.
In a kind of global optimization K mean cluster system based on sample rate of the invention, the pre-estimation density value
With the actual density value F in actual density value computing modulei,cAccording to formulaIt calculates, whereindijFor WiJ-th of sample point n into i-th of submanifoldijEuclidean distance, SiIt indicates in i-th of submanifold
The number of sample point, j are j-th of sample point, c ∈ [1, cmax], cmaxFor preset peak excursion number, r=R × Ra;R is to appoint
Maximum distance of the cluster centre from sample point in the submanifold of place in one submanifold.
The method of the present invention compares traditional K mean cluster method, and in an iterative process, KMS-GOSD method passes through Gauss first
Model obtains the pre-estimation density value of all cluster centres, and actual density value is then lower than the maximum cluster of pre-estimation density value
Center carries out offset operation.By optimizing cluster centre position, KMS-GOSD method can not only promote global exploring ability, and
The dependence to cluster initial center point can be overcome.Comparison of the present invention is carried out using the UCI data set of standard, after discovery improves
Method have higher accuracy rate and stability compared to traditional method.
Detailed description of the invention
Present invention will be further explained below with reference to the attached drawings and examples, in attached drawing:
Fig. 1 is flow chart of the embodiment of the present invention;
Fig. 2 is Gaussian distribution model distribution map corresponding with subclass sample dot density.
Specific embodiment
In order to which the technical features, objects and effects of invention are more clearly understood, now compare attached drawing and this is described in detail
The specific embodiment of invention.
Since traditional K mean value is sensitive to cluster centre, so that the selection of cluster centre will directly affect cluster accuracy rate
Just.A kind of document " optimization of the initial cluster center of K-means algorithm " " Distributed Cluster excavation calculation based on local density
Method " point out that cluster centre should be at the relatively high point of sample rate in cluster." a kind of improved k-means initially gathers document
Class Research on Center Selection Algorithm " a kind of " new k-means cluster centre Algorithms of Selecting " " K- of minimum variance optimization cluster centre
Means algorithm " by showing theory analysis and effect of the present invention: when cluster centre is located at sample rate higher, gather
Class accuracy rate can be obviously improved.The core concept of global optimization K mean cluster algorithm proposed by the present invention is by iteration
Dynamic finds the higher cluster centre replacement of density and chooses the higher initial cluster center of density directly before iteration in the process.For
Judge the height of cluster centre density, the present invention passes through Gauss model first and obtains the initial pre-estimation density value of iteration, then sharp
It is calculated with Euclidean distance by the actual density value of cluster centre, will be finally gradually reduced in actual density value and iterative process pre-
Estimation is compared, and obtains the higher cluster centre of density.
What is proposed in document " a kind of Distributed Cluster mining algorithm based on local density " mixes ideal cluster
Made of sample set, monotone variation trend thought is presented from intra-cluster to edge in the local density of sample point in each ideal cluster
On the basis of, the Density Distribution of subclass sample point is estimated using Gaussian distribution model.Assuming that in the data set X for having N number of sample point
In, subclass number is K, initial cluster center wi(i=1,2,3 ..., K), cluster centre is W in iterative processi, submanifold radius
R is WiThe distance of farthest sample point in the cluster.For convenience of the estimation density of statistics submanifold, one is chosen with wiFor the center of circle, half
The circle of diameter r=R × Ra, the sample areas as statistics.
As shown in Fig. 2, under decay factor Ra effect, x1=3 σ × Ra.Pass through calculating [- x1,x1] distribution letter in range
Number estimation WiDensity in r circle.Calculate pre-estimation density value:
Fi- 1 (1)=2 φ (3 σ × Ra)
The actual density value of cluster centre:
Wherein dijFor WiTo sample point nijEuclidean distance, SiIndicate the number of sample point in i-th of submanifold, c ∈ [1,
cmax], cmaxFor peak excursion number,
Traditional K mean value, can be again using the mass center of new subclass as the cluster centre W of new class in iterative process each timei。
Local distance and the smallest point are easily trapped into when mass center is as cluster centre, to easily fall into local optimum, and mass center is simultaneously
Non- globe optimum.For the appearance for avoiding such case, offset operation is added in KMS-GOSD algorithm in an iterative process, by density
Wherein m indicates maximum the number of iterations, and t indicates current the number of iterations.It can gradually be dropped by the way that decay factor is added
Low pre-estimation density value, accelerates the convergence of migration process.For the complexity for reducing KMS-GOSD algorithm, migration process is only occurred in
WiOn that minimum cluster centre of middle density, other cluster centres are remained unchanged.Then KMS-GOSD algorithm early period can
There is biggish exploring ability, the later period is gradually consistent with traditional K mean value and has stronger development ability and constringency performance.
A kind of global optimization K mean cluster method based on sample rate, detailed process are shown in Fig. 1, include following step
It is rapid:
S1, raw data set X, submanifold number K and scale parameter Ra comprising N number of sample point are obtained, wherein N is greater than 1;
S2, K sample point is randomly selected as initial cluster center in the raw data set X, be denoted as wi, wherein i
=1,2,3 ..., K;
S3, calculate separately initial data concentrate initial cluster center other than all sample points apart from each initial clustering
Center wiDistance, and by the initial data concentrate initial cluster center other than all sample points be assigned to away from nearest
Initial cluster center formed K submanifold;
S4, the mass center of all submanifolds is denoted as W respectivelyi, according to formulaCalculate Wi's
Pre-estimation density value Fi,t, and calculate WiActual density value Fi,c;
Wherein m is the maximum number of iterations of the number of iterations and is preset value, and t indicates current the number of iterations, 2 φ (3
σ × Ra) value can be tabled look-up and obtain according to standard normal distribution function;
S5, the mass center W by each submanifoldiAs new cluster centre, where judging each new cluster centre respectively
It whether there is actual density value F in submanifoldi,cLess than pre-estimation density value Fi,tSample point, if it does not, jumping to S10;
If it does, jumping to S6;
S6, actual density value F is obtainedi,cWith pre-estimation density value Fi,tThe maximum sample point of absolute difference where son
Cluster;
S7, in the submanifold that S6 is obtained, obtain several sample points at random, and calculate separately several sample points
Actual density value Fi,c;
S8, judge in each and every one several sample points with the presence or absence of actual density value Fi,cGreater than pre-estimation density value Fi,t's
Sample point;If it is present jumping to S10;Otherwise S9 is jumped to;
S9, by actual density value Fi,cWith pre-estimation density value Fi,tThe maximum sample point of absolute difference is as new cluster
Then center executes step S10;
S10, judge cluster centre WiWhether no longer change, jumps to S11 if meeting;Otherwise the number of iterations t is updated to t
+ 1, using new cluster centre as new initial cluster center, jump to S3;
S11, output cluster result.
Wherein the loop termination Rule of judgment in S10 is cluster centre WiWhether no longer change, specifically, in initial clustering
The heart obtains second cluster centre after executing a complete process, by second cluster centre compared with initial cluster center
Compared with if changed, second of circulation of execution obtains third cluster centre.If second cluster centre and initial clustering
Center is identical, then jumps out circulation output category result.Third cluster centre is compared with second cluster centre, if changed
Become, then executes third time and recycle, obtain the 4th cluster centre.If second cluster centre and second cluster centre phase
Together, then circulation output category result is jumped out, and so on, circulation is executed until obtaining final cluster centre.
In a kind of global optimization K mean cluster method based on sample rate of the invention, the actual density value Fi,c
According to formulaIt calculates, whereindijFor WiJ-th of sample into i-th of submanifold
This nijEuclidean distance, SiIndicate the number of sample point in i-th of submanifold, j is j-th of sample point, c ∈ [1, cmax], cmax
For preset peak excursion number, r=R × Ra;R is most long distance of the cluster centre from sample point in the submanifold of place in any submanifold
From.
In order to verify the validity of KMS-GOSD method, Balance, Wine in selection standard UCI data set of the present invention,
For this five groups of data of Zoo, Iris and Diabetes as test data, essential information is as shown in table 1.Every one kind data are used respectively
Traditional K mean value and improved K mean value are tested 50 times.The present invention uses Windows 7, Matlab 2013a programmed environment, host
It is configured to 2 double-core P8600@2.4GHz processor of Intel Duo, running memory 4GB.
Table 1 selects data set explanation
In guaranteeing the present invention every time in the initial cluster center situation identical with tradition K mean value of KMS-GOSD method,
Result of the present invention such as the following table 2, table 3, table 4.
Test result of 2 two methods of table to Balance data set
It can be obtained by table 2, for Balance data set, KMS-GOSD method highest accuracy rate reaches 73.15%, minimum
Also average up to 70.49% up to 68.93%, it is little compared with the fluctuation range of high-accuracy, maintain essentially in 71% or so.
Test result of 3 two methods of table to Wine data set
As shown in Table 3, for Wine data set, the accuracy rate of traditional K mean value is generally relatively low.But KMS-GOSD method highest
Reach 78.14%, minimum 69.60%, average up to 74.76%, more traditional K mean value is improved nearly 10 percentage point.
It is obtained by table 4, for Zoo data set, KMS-GOSD method highest accuracy rate reaches 82.20%, minimum
77.92%, it is average up to 80.54%, it is all higher in the case of remaining wherein only accuracy rate once is identical with tradition K mean value.
Test result of 4 two methods of table to Zoo data set
Test result of 5 two methods of table to Iris data set
As shown in Table 5, for Iris data set, KMS-GOSD method highest accuracy rate reaches 88.72%, minimum to be also
87.32%, average up to 88.23%, accuracy rate substantially remains in average value or so, has stability preferable.
Test result of 6 two methods of table to Diabetes number data set
As shown in Table 6, for Diabetes data set, KMS-GOSD method highest accuracy rate has 68.10%, minimum to be
65.17%, average 66.83%, more traditional K mean value can stablize raising 3% or so.
Using identical random initial center point, KMS- can be seen that from above result of the present invention
GOSD method can press down to a certain extent compared to traditional K mean cluster method accuracy rate with higher and stability
Dependence of the cluster result processed to initial cluster center.
It can be seen that by table 2- table 6, traditional K mean value has the shortcomings that initially to the dependence of cluster centre, causing to be easy
Fall into local optimum.Text propose KMS-GOSD method pass through in an iterative process Gauss model obtain cluster centre pre-estimation it is close
Angle value, and actual density value is displaced to the higher point of density lower than the maximum cluster centre of pre-estimation density value.By above
Operation not only reduces the calculation amount in energy data analysis, but also cluster result can be overcome to the dependence of initial cluster center
And cluster centre is enhanced to global exploring ability.The present invention the result shows that, the KMS-GOSD in UCI typical case's test data set
Method can promote 20.68% for different data set accuracy rate highests.
The embodiment of the present invention is described with above attached drawing, but the invention is not limited to above-mentioned specific
Embodiment, the above mentioned embodiment is only schematical, rather than restrictive, those skilled in the art
Under the inspiration of the present invention, without breaking away from the scope protected by the purposes and claims of the present invention, it can also make very much
Form, all of these belong to the protection of the present invention.
Claims (4)
1. a kind of global optimization K mean cluster method based on sample rate, which is characterized in that comprise the steps of:
S1, raw data set X, submanifold number K and scale parameter Ra comprising N number of sample point are obtained, wherein N is greater than 1;
S2, K sample point is randomly selected as initial cluster center in the raw data set X, be denoted as wi, wherein i=1,2,
3,…,K;
S3, calculate separately initial data concentrate initial cluster center other than all sample points apart from each initial cluster center
wiDistance, and by the initial data concentrate all sample points other than initial cluster center be assigned to away from it is nearest just
Beginning cluster centre forms K submanifold;
S4, the mass center of all submanifolds is denoted as W respectivelyi, according to formulaCalculate WiEstimate
Count density value Fi,t, and calculate WiActual density value Fi,c;
Wherein m is the maximum number of iterations of the number of iterations and is preset value, and t indicates current the number of iterations, 2 (3 φ σ ×
Ra value) is tabled look-up according to standard normal distribution function to be obtained;
S5, the mass center W by each submanifoldiAs new cluster centre, judged in the submanifold where each new cluster centre respectively
With the presence or absence of actual density value Fi,cLess than pre-estimation density value Fi,tSample point, if it does not, jumping to S10;If deposited
Jumping to S6;
S6, actual density value F is obtainedi,cWith pre-estimation density value Fi,tThe maximum sample point of absolute difference where submanifold;
S7, in the submanifold that S6 is obtained, obtain several sample points at random, and calculate separately the reality of several sample points
Density value Fi,c;
S8, judge in each and every one several sample points with the presence or absence of actual density value Fi,cGreater than pre-estimation density value Fi,tSample
Point;If it is present jumping to S10;Otherwise S9 is jumped to;
S9, by actual density value Fi,cWith pre-estimation density value Fi,tThe maximum sample point of absolute difference is as in new cluster
Then the heart executes step S10;
S10, judge cluster centre WiWhether no longer change, jumps to S11 if meeting;Otherwise the number of iterations t is updated to t+1, will
New cluster centre jumps to S3 as new initial cluster center;
S11, output cluster result.
2. a kind of global optimization K mean cluster method based on sample rate according to claim 1, which is characterized in that institute
State actual density value Fi,cAccording to formulaIt calculates, whereindijFor WiTo i-th
J-th of sample point n in a submanifoldijEuclidean distance, SiIndicating the number of sample point in i-th of submanifold, j is j-th of sample point,
c∈[1,cmax], cmaxFor preset peak excursion number, r=R × Ra;R is cluster centre in any submanifold from the submanifold of place
The maximum distance of sample point.
3. a kind of global optimization K mean cluster system based on sample rate, which is characterized in that comprising with lower module:
Initialization module, for obtaining raw data set X, submanifold number K and scale parameter Ra comprising N number of sample point, wherein N
Greater than 1;
Initial cluster center obtains module, for randomly selecting K sample point as initial clustering in the raw data set X
Center is denoted as wi, wherein i=1,2,3 ..., K;
Submanifold forms module, concentrates all sample points other than initial cluster center apart from each for calculating separately initial data
A initial cluster center wiDistance, and by the initial data concentrate initial cluster center other than all sample points be assigned to
Away from nearest initial cluster center wiForm K submanifold;
Pre-estimation density value and actual density value computing module, for the mass center of all submanifolds to be denoted as W respectivelyi, pass through formulaCalculate WiPre-estimation density value Fi,t, and calculate WiActual density value Fi,c;
Wherein m is the maximum number of iterations of the number of iterations and is preset value, and t indicates current the number of iterations, 2 φ (3 σ ×
Ra value) can table look-up according to standard normal distribution function and obtain;
First judgment module, for by the mass center W of each submanifoldiAs new cluster centre, judged in each new cluster respectively
It whether there is actual density value F in submanifold where the hearti,cLess than pre-estimation density value Fi,tSample point, if it does not, jump
Go to third judgment module;If it does, jumping to submanifold obtains module;
Submanifold obtains module, for obtaining actual density value Fi,cWith pre-estimation density value Fi,tThe maximum sample of absolute difference
Submanifold where point;
Actual density value computing module, for obtaining several sample points at random in the submanifold that submanifold obtains that module obtains, and
Calculate separately the actual density value F of several sample pointsi,c;
Second judgment module, for judging in each and every one several sample points with the presence or absence of actual density value Fi,cGreater than pre-estimation
Density value Fi,tSample point;If it is present jumping to third judgment module;Otherwise it jumps to new cluster centre and obtains mould
Block;
New cluster centre obtains module, is used for actual density value Fi,cWith pre-estimation density value Fi,tAbsolute difference is maximum
Then sample point executes third judgment module as new cluster centre;
Third judgment module, for judging cluster centre WiWhether no longer change, jumps to output object module if meeting;It is no
Then the number of iterations t is updated to t+1, using new cluster centre as new initial cluster center, jumps to submanifold and forms module;
Object module is exported, for exporting cluster result.
4. a kind of global optimization K mean cluster system based on sample rate according to claim 1, which is characterized in that institute
State the actual density value F in pre-estimation density value and actual density value computing modulei,cAccording to formulaCome
It calculates, whereindijFor WiJ-th of sample point n into i-th of submanifoldijEuclidean distance, SiIndicate the
The number of sample point in i submanifold, j are j-th of sample point, c ∈ [1, cmax], cmaxFor preset peak excursion number, r=R
×Ra;R is maximum distance of the cluster centre from sample point in the submanifold of place in any submanifold.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810525709.1A CN108985318A (en) | 2018-05-28 | 2018-05-28 | A kind of global optimization K mean cluster method and system based on sample rate |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810525709.1A CN108985318A (en) | 2018-05-28 | 2018-05-28 | A kind of global optimization K mean cluster method and system based on sample rate |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108985318A true CN108985318A (en) | 2018-12-11 |
Family
ID=64542224
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810525709.1A Pending CN108985318A (en) | 2018-05-28 | 2018-05-28 | A kind of global optimization K mean cluster method and system based on sample rate |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108985318A (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110046801A (en) * | 2019-03-25 | 2019-07-23 | 国网江苏省电力有限公司经济技术研究院 | A kind of typical scene generation method of power distribution network electric system |
CN112101210A (en) * | 2020-09-15 | 2020-12-18 | 贵州电网有限责任公司 | Low-voltage distribution network fault diagnosis method based on multi-source information fusion |
WO2021044251A1 (en) * | 2019-09-06 | 2021-03-11 | International Business Machines Corporation | Elastic-centroid based clustering |
CN113378954A (en) * | 2021-06-23 | 2021-09-10 | 云南电网有限责任公司电力科学研究院 | Load curve clustering method and system based on particle swarm improved K-means algorithm |
CN113850281A (en) * | 2021-02-05 | 2021-12-28 | 天翼智慧家庭科技有限公司 | Data processing method and device based on MEANSHIFT optimization |
-
2018
- 2018-05-28 CN CN201810525709.1A patent/CN108985318A/en active Pending
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110046801A (en) * | 2019-03-25 | 2019-07-23 | 国网江苏省电力有限公司经济技术研究院 | A kind of typical scene generation method of power distribution network electric system |
CN110046801B (en) * | 2019-03-25 | 2023-07-21 | 国网江苏省电力有限公司经济技术研究院 | Typical scene generation method of power distribution network power system |
WO2021044251A1 (en) * | 2019-09-06 | 2021-03-11 | International Business Machines Corporation | Elastic-centroid based clustering |
US11727250B2 (en) | 2019-09-06 | 2023-08-15 | International Business Machines Corporation | Elastic-centroid based clustering |
CN112101210A (en) * | 2020-09-15 | 2020-12-18 | 贵州电网有限责任公司 | Low-voltage distribution network fault diagnosis method based on multi-source information fusion |
CN113850281A (en) * | 2021-02-05 | 2021-12-28 | 天翼智慧家庭科技有限公司 | Data processing method and device based on MEANSHIFT optimization |
WO2022166380A1 (en) * | 2021-02-05 | 2022-08-11 | 天翼数字生活科技有限公司 | Data processing method and apparatus based on meanshift optimization |
CN113850281B (en) * | 2021-02-05 | 2024-03-12 | 天翼数字生活科技有限公司 | MEANSHIFT optimization-based data processing method and device |
CN113378954A (en) * | 2021-06-23 | 2021-09-10 | 云南电网有限责任公司电力科学研究院 | Load curve clustering method and system based on particle swarm improved K-means algorithm |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108985318A (en) | A kind of global optimization K mean cluster method and system based on sample rate | |
CN106682682A (en) | Method for optimizing support vector machine based on Particle Swarm Optimization | |
CN105868775A (en) | Imbalance sample classification method based on PSO (Particle Swarm Optimization) algorithm | |
CN111986811A (en) | Disease prediction system based on big data | |
CN110083665A (en) | Data classification method based on the detection of improved local outlier factor | |
CN110059852A (en) | A kind of stock yield prediction technique based on improvement random forests algorithm | |
CN109086412A (en) | A kind of unbalanced data classification method based on adaptive weighted Bagging-GBDT | |
CN104731916A (en) | Optimizing initial center K-means clustering method based on density in data mining | |
De Amorim | Constrained clustering with minkowski weighted k-means | |
CN109145965A (en) | Cell recognition method and device based on random forest disaggregated model | |
CN104573708A (en) | Ensemble-of-under-sampled extreme learning machine | |
CN111062425B (en) | Unbalanced data set processing method based on C-K-SMOTE algorithm | |
CN108564592A (en) | Based on a variety of image partition methods for being clustered to differential evolution algorithm of dynamic | |
CN109444840B (en) | Radar clutter suppression method based on machine learning | |
CN109271427A (en) | A kind of clustering method based on neighbour's density and manifold distance | |
CN107045717A (en) | The detection method of leucocyte based on artificial bee colony algorithm | |
CN113435108B (en) | Battlefield target grouping method based on improved whale optimization algorithm | |
CN109150830A (en) | A kind of multilevel intrusion detection method based on support vector machines and probabilistic neural network | |
CN115310554A (en) | Item allocation strategy, system, storage medium and device based on deep clustering | |
CN116821715A (en) | Artificial bee colony optimization clustering method based on semi-supervision constraint | |
CN114841241A (en) | Unbalanced data classification method based on clustering and distance weighting | |
CN110032973A (en) | A kind of unsupervised helminth classification method and system based on artificial intelligence | |
CN105913085A (en) | Tensor model-based multi-source data classification optimizing method and system | |
CN109934344B (en) | Improved multi-target distribution estimation method based on rule model | |
CN117035983A (en) | Method and device for determining credit risk level, storage medium and electronic equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20181211 |
|
RJ01 | Rejection of invention patent application after publication |