CN108549913A - Improvement K-means clustering algorithms based on density radius - Google Patents

Improvement K-means clustering algorithms based on density radius

Info

Publication number
CN108549913A
Authority
CN
China
Prior art keywords
barycenter
sample
data set
classification
bic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810354305.0A
Other languages
Chinese (zh)
Inventor
万思思
刘丹
王永松
伍功宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Kang Qiao Electronic LLC
University of Electronic Science and Technology of China
Original Assignee
Chengdu Kang Qiao Electronic LLC
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu Kang Qiao Electronic LLC, University of Electronic Science and Technology of China filed Critical Chengdu Kang Qiao Electronic LLC
Priority to CN201810354305.0A priority Critical patent/CN108549913A/en
Publication of CN108549913A publication Critical patent/CN108549913A/en
Pending legal-status Critical Current

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • G06F18/232Non-hierarchical techniques
    • G06F18/2321Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention relates to the field of clustering algorithms and discloses an improved K-means clustering algorithm based on density radius, solving the problems of existing K-means clustering algorithms: convergence to local optima, sensitivity to noise and outliers, and inaccurate selection of the k value. The present invention first sorts all sample points by density radius, selects the sample point with the largest density radius count as an initial value, and repeats this step until all initial points and the number of categories k have been selected; the clustering operation then begins. From the class centroids obtained after clustering, the two closest centroids are selected; the two classes they belong to are taken out and treated as a two-cluster model, and its Bayesian (BIC) score is computed; the two classes are then merged into a single class and the BIC score after merging is computed; whether the two classes should be merged is decided by comparing the scores. These steps are repeated until no further merging is needed. The present invention is suitable for clustering large data sets.

Description

Improved K-means clustering algorithm based on density radius
Technical field
The present invention relates to the field of clustering algorithms, and more particularly to an improved K-means clustering algorithm based on density radius.
Background technology
Clustering divides physical or abstract objects into several clusters according to the degree of similarity between the objects, so that data within the same cluster are highly similar while data in different clusters have low similarity. Clustering is an unsupervised learning method: it classifies unlabeled data without any prior information. The K-means algorithm is the most commonly used partitioning algorithm in cluster analysis. It partitions the data according to some similarity measure, keeping each data point as close as possible to the centroid of the cluster it belongs to, and is widely used because it is simple and efficient. At the same time it has some defects: the number of clusters (the k value) must be preset, and an inaccurate choice of k may lead to inaccurate classification; the initial values are chosen randomly, which easily leads to local optima; and it is rather sensitive to noise and outliers, which also affects the final clustering result.
Invention content
The technical problem to be solved by the present invention is to provide an improved K-means clustering algorithm based on density radius that overcomes the local optima of existing K-means clustering algorithms, their sensitivity to noise and outliers, and the inaccurate selection of the k value.
To solve the above problems, the technical solution adopted by the present invention is as follows. The improved K-means clustering algorithm based on density radius includes the following steps:
A. Calculate the pairwise distances between all sample points in the sample data set T;
B. Specify a density radius d; according to d and the pairwise distances, find, for each sample point, all sample points within its density radius d;
C. Sort the sample points in the sample data set by the number of sample points within each point's density radius d, to obtain the sorted data set T';
D. Define an empty set S; put the first sample point of T' into S, and delete from T' the first sample point together with all sample points within its density radius d;
E. Repeat step D until the set T' = ∅. When T' = ∅, the number of samples in set S is the candidate k value for the K-means clustering algorithm, and the values in S are the candidate initial values;
F. Treat S as the centroid set, each initial value being the centroid of a different class; compute the distance from every sample point in T to each class centroid in the centroid set, and label each sample point in T with the class of the centroid nearest to it;
G. Recompute the new centroid of each class from all sample points in that class, thereby updating the centroid set;
H. Judge whether the error-sum-of-squares criterion function between the centroids in the updated centroid set and the sample points in T has converged; if it has converged and the centroid set did not change in the update, go directly to step I; otherwise repeat steps F and G until the criterion function converges and the centroid set no longer changes, then go to step I;
I. Compute the pairwise distances between all centroids in the centroid set, select the two closest centroids, and take out the two classes to which they belong;
J. Judge whether the two classes taken out in step I need to be merged. If not, the algorithm terminates. If so, merge the two classes, compute the centroid of the merged class, delete the two centroids selected in step I from the centroid set, add the centroid of the merged class to the centroid set, and jump back to step I.
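Steps A through E amount to a greedy, density-ordered selection of the initial centroids. The sketch below illustrates the idea in Python; the function name, the toy data, and the stable-sort tie-breaking are illustrative choices, not fixed by the patent:

```python
import math

def density_radius_init(T, d):
    """Greedy initial-centroid selection per steps A-E: sort points by the
    number of neighbours within density radius d, then repeatedly keep the
    densest remaining point and delete its d-neighbourhood until none remain."""
    def dist(p, q):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))
    # steps A-B: neighbour count within radius d for every point
    counts = [sum(1 for q in T if q is not p and dist(p, q) < d) for p in T]
    # step C: the sorted data set T' (descending neighbour count)
    remaining = [p for _, p in sorted(zip(counts, T), key=lambda cp: -cp[0])]
    S = []                                  # step D: candidate initial values
    while remaining:                        # step E: repeat until T' is empty
        p = remaining[0]
        S.append(p)
        remaining = [q for q in remaining if dist(p, q) >= d]
    return S                                # len(S) is the candidate k value
```

Run on two well-separated pairs of points with d = 1, it keeps one representative per pair, so the candidate k is 2.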
Further, the method by which step J judges whether two classes need to be merged includes:
Treating the two classes to be judged as the two classes of a two-cluster model, and computing the BIC value of the two-cluster model, denoted BIC score2;
Treating the two classes to be judged as a single whole class, and computing the BIC value of that whole class, denoted BIC score1;
If |BIC score1| >= |BIC score2|, the two classes to be judged need to be merged; if |BIC score1| < |BIC score2|, the two classes to be judged do not need to be merged.
Further, the calculation formula of the BIC value is:
BIC = -2 × ln(L) + ln(s) × t
Wherein, s denotes the number of sample points in the data set; L denotes the likelihood function; t denotes the number of features: t = 2 when computing the BIC value of the two-cluster model, and t = 1 when computing the BIC value of the whole class.
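The formula and the merge rule can be written down directly. In the sketch below the likelihood values are taken as inputs, since the patent does not specify how ln(L) is computed; the function names are illustrative:

```python
import math

def bic(L, s, t):
    """BIC = -2*ln(L) + ln(s)*t, with L the likelihood, s the number of
    sample points and t the number of features (t = 2 for the two-cluster
    model, t = 1 for the merged whole class)."""
    return -2.0 * math.log(L) + math.log(s) * t

def should_merge(L_merged, L_split, s):
    """The patent's rule: merge when |BIC score1| (merged, t = 1) is at
    least |BIC score2| (two-cluster split, t = 2)."""
    return abs(bic(L_merged, s, 1)) >= abs(bic(L_split, s, 2))
```

For example, with s = e the penalty term ln(s) × t reduces to t, so bic(1.0, math.e, 2) is exactly 2.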
Further, before step A the method also includes: removing noise and outliers from the sample data set. After step J the method further includes: computing the distance between each outlier and the class centroids, and labeling each outlier with the class of its nearest centroid.
Further, the LOF (local outlier factor) method is used to remove noise and outliers from the sample data set.
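The patent only names the lof method; a compact version of the standard Local Outlier Factor (k-distance, local reachability density, then the ratio of densities) might look as follows, with k and the cut-off threshold as illustrative defaults:

```python
import math

def lof_scores(X, k=2):
    """Minimal Local Outlier Factor sketch; the patent only names the
    method, so the details here are the textbook defaults."""
    n = len(X)
    def dist(i, j):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(X[i], X[j])))
    D = [[dist(i, j) for j in range(n)] for i in range(n)]
    # k nearest neighbours and k-distance of every point
    knn = [sorted((j for j in range(n) if j != i), key=lambda j: D[i][j])[:k]
           for i in range(n)]
    kdist = [D[i][knn[i][-1]] for i in range(n)]
    # local reachability density: inverse mean reachability distance
    lrd = [1.0 / (sum(max(kdist[j], D[i][j]) for j in knn[i]) / k)
           for i in range(n)]
    # LOF: average ratio of the neighbours' lrd to the point's own lrd
    return [sum(lrd[j] for j in knn[i]) / k / lrd[i] for i in range(n)]

def remove_outliers(X, k=2, threshold=1.5):
    scores = lof_scores(X, k)
    kept = [p for p, s in zip(X, scores) if s <= threshold]
    removed = [p for p, s in zip(X, scores) if s > threshold]
    return kept, removed
```

Points inside a cluster score near 1; a far-away point scores well above the threshold and is removed.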
Further, after noise and outliers are removed from the sample data set, and before step A, the method also includes normalizing the sample data set; the sample coordinates after normalization are x_{i,j} ∈ [0, 1], computed as
x_{i,j} = x_{i,j} / max_{1≤p≤m} x_{p,j}
Wherein, m denotes the number of sample points and v denotes the dimension.
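The normalization is a per-dimension max scaling; a direct sketch (assuming non-negative coordinates, so the result lands in [0, 1]):

```python
def normalize(T):
    """Per-dimension max scaling: every coordinate is divided by the
    maximum of its dimension over all sample points."""
    v = len(T[0])
    maxima = [max(row[j] for row in T) for j in range(v)]
    return [[row[j] / maxima[j] for j in range(v)] for row in T]
```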
Further, when steps A, F and I calculate distances, the Euclidean distance formula is used:
d(n_i, n_j) = sqrt(Σ_{l=1}^{v} (x_{i,l} - x_{j,l})²)
Further, the formula by which step G calculates the centroid coordinates is:
Z_i = (1/|C_i|) Σ_{n_j ∈ C_i} n_j
Wherein, Z_i denotes the coordinates of the i-th centroid and C_i denotes the set of sample points in the i-th class.
Further, in step H, the formula of the error-sum-of-squares criterion function between the centroids in the centroid set and the sample points in the sample data set T is:
J = Σ_{j=1}^{k} Σ_{n_i ∈ C_j} ||n_i - Z_j||²
Wherein, k is the number of centroids in the centroid set, C_j is the set of sample points in the j-th class, and Z_j denotes the j-th centroid.
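The criterion function is the usual within-cluster sum of squared errors; as a short sketch (labels[i] gives the class index of sample T[i], a representation not fixed by the patent):

```python
def sse(T, centroids, labels):
    """Error-sum-of-squares criterion J = Σ_j Σ_{n_i in C_j} ||n_i - Z_j||²."""
    return sum(sum((a - b) ** 2 for a, b in zip(p, centroids[labels[i]]))
               for i, p in enumerate(T))
```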
The beneficial effects of the invention are as follows. The present invention chooses the initial values for clustering by density radius: the larger the density of a sample point, the more likely it is to be a class centroid. All sample points are therefore first sorted by density radius; the sample point with the largest density is chosen as an initial value and all sample points within its radius are deleted; these steps are repeated until all initial points and the number of categories k have been selected, after which the clustering operation is carried out. Since the initial values found in this way cover all points of higher density, the resulting k value is greater than or equal to the true k value, and k must be determined further. From the class centroids obtained after clustering, the two closest centroids are selected; the two classes they belong to are taken out and treated as a two-cluster model, and its Bayesian (BIC) score is computed; the two classes are then merged into a single class and the BIC score after merging is computed; whether the two classes should be merged is decided by comparing the scores. These steps are repeated until no merging is needed. Because the method first covers all points of higher density, it effectively avoids the local optima found in existing clustering algorithms.
Description of the drawings
Fig. 1 is the flow chart of processing the initial data set in the present invention;
Fig. 2 is the flow chart of preliminarily determining the k value and initial values according to density radius;
Fig. 3 is the flow chart of preliminary clustering;
Fig. 4 is the flow chart of optimizing the classification (k value);
Fig. 5 is the overall flow chart of the present invention.
Specific implementation mode
The present invention is mainly intended to overcome the shortcomings of some existing clustering algorithms. It proposes a new clustering algorithm that solves the local optima of existing clustering algorithms, their sensitivity to noise and outliers, and the inaccurate selection of the k value. The invention mainly comprises the following steps: processing the initial data set; preliminarily determining the number of classes k and the initial values according to density radius; clustering; and determining the number of centroids. The technical solution adopted by the present invention is described in detail below in conjunction with Figs. 1-4.
One. Processing the initial data set.
Since the initial data set may contain noise and outliers, and these points strongly influence the selection of the initial values and the k value, the outliers should be removed before clustering; the remaining sample points are clustered, and finally the outliers are assigned to the appropriate classes according to the clustering result. Outlier removal is performed with the LOF algorithm, which judges whether a point is an outlier by comparing its density with that of its neighborhood points: the lower a point's density, the more likely it is to be an outlier. The density is computed from the point's k-distance neighborhood, i.e. the k-distance of the point p and the set of all points within that k-distance. The closer the neighbors, the higher the density; the farther the neighbors, the lower the density. Points of low density are then identified and removed as outliers.
Then, to restrict all data to a fixed range and simplify subsequent calculations, the remaining sample points are normalized so that the coordinates of all sample points in every dimension lie within [0, 1].
After normalization the pairwise distances between sample points are computed, and for each sample point the number of points within a given distance, the density radius, is counted. The sample points are then sorted in descending order of this count. The processing flow of the initial data set in the present invention is shown in Fig. 1.
Two. Preliminarily determining the number of classes k and the initial values according to density radius.
The basic idea of the present invention is to select initial points according to density, i.e. the number of points within the density radius of each point computed in the previous step: the larger the density, the more likely the point is a class centroid. Following this idea, with the sample points already sorted as above, the point ranked first (the point with the largest density) is set as one of the initial values, and all points within its density radius are deleted. After deletion the operation is repeated: the sample point with the largest density is chosen from the remaining points and all points within its density radius are deleted, until the data set is empty. The points chosen in this way are preliminarily set as initial points, and their number is set as the candidate number of classes k. The flow of preliminarily determining the k value and initial values is shown in Fig. 2.
Three. Clustering and determining the number of centroids.
The k points computed in the above steps cover all points of higher density, i.e. a sample set that should belong to one class may have been split into several classes. The k value found is therefore greater than or equal to the true k value (the number of categories), and the range of k must be narrowed.
Preliminary clustering is first carried out on the sample points with the k value and initial values obtained in the previous step, and new class centroids are found after the preliminary clustering. The flow of preliminary clustering, shown in Fig. 3, is: (1) compute the distance from every sample point in the sample data set T to each class centroid in the centroid set; (2) label each sample point in T with the class of its nearest centroid; (3) recompute the new centroid of each class from all sample points in that class, thereby updating the centroid set; (4) judge whether the error-sum-of-squares criterion function between the updated centroids and the sample points in T has converged; if it has converged and the centroid set did not change in the update, the preliminary clustering ends; otherwise steps (2) and (3) are repeated until the criterion function converges and the centroid set no longer changes.
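The preliminary-clustering flow of Fig. 3 is a plain Lloyd iteration; a minimal sketch (the names and the max_iter safeguard are illustrative):

```python
def preliminary_cluster(T, centroids, max_iter=100):
    """Lloyd iteration for steps (1)-(4) of the preliminary clustering:
    assign each point to its nearest centroid, recompute the centroids,
    and stop once the centroid set no longer changes."""
    def dist2(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))
    Z = [list(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # steps (1)-(2): label every point with its nearest centroid
        labels = [min(range(len(Z)), key=lambda j: dist2(p, Z[j])) for p in T]
        # step (3): recompute each class centroid from its members
        newZ = []
        for j in range(len(Z)):
            members = [p for p, l in zip(T, labels) if l == j]
            newZ.append([sum(c) / len(members) for c in zip(*members)]
                        if members else Z[j])
        if newZ == Z:            # step (4): centroid set unchanged, converged
            break
        Z = newZ
    return Z, labels
```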
If a sample set belonging to one class has been split into several classes, their centroids will also be close to one another, so the classification must be optimized. As shown in Fig. 4, the present invention first computes the distances between all centroids, then takes out the sample sets of the two closest centroids, treats them as a two-cluster model, and computes its BIC score. The two classes are then merged into one class and its BIC score is computed. According to the two BIC scores, it is judged whether the two classes should be merged; if they are merged the above steps are repeated, otherwise the current class division is the optimal division. Since the initial value points cover all points of higher density and are then gradually optimized, this method avoids the local optima found in some existing clustering methods.
Finally, the previously removed outliers are added back: the distance between each outlier and the class centroids is computed, and the outlier is labeled with the class of its nearest centroid.
Combining the above, the overall flow chart shown in Fig. 5 is obtained.
Embodiment
The embodiment provides an improved K-means clustering algorithm based on density radius, comprising the following steps:
1. Data set preparation: suppose the data set contains m sample points, each of dimension v, where v ∈ Z*. The data set is denoted T = {n_1, n_2, …, n_m}, where n_i denotes a sample point and m the number of sample points; the coordinates of sample point n_i are denoted (x_{i,1}, x_{i,2}, …, x_{i,v}), where v denotes the dimension;
2. Data preprocessing: noise and outliers are removed with the LOF method;
3. Data normalization: every coordinate of each sample point is divided by the maximum value of that coordinate's dimension over all sample points, as shown in formula (1), so that the normalized sample coordinates satisfy x_{i,j} ∈ [0, 1]:
x_{i,j} = x_{i,j} / max_{1≤p≤m} x_{p,j}   (1)
4. After normalization, the Euclidean distances between all pairs of sample points are computed, where the distance d(n_i, n_j) between the i-th sample point n_i and the j-th sample point n_j is calculated as shown in formula (2):
d(n_i, n_j) = sqrt(Σ_{l=1}^{v} (x_{i,l} - x_{j,l})²)   (2)
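Formula (2) in code form, as an illustrative sketch returning the full pairwise-distance matrix used by the following steps:

```python
import math

def pairwise_distances(T):
    """Euclidean distances of formula (2):
    d(n_i, n_j) = sqrt(sum_l (x_{i,l} - x_{j,l})^2)."""
    n = len(T)
    return [[math.sqrt(sum((a - b) ** 2 for a, b in zip(T[i], T[j])))
             for j in range(n)] for i in range(n)]
```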
5. A density radius d is specified. According to the density radius d and the pairwise distances between sample points, all sample points within the density radius d of each sample point are found, including their values and their number; the number of all points within the density radius of sample point n_i is recorded;
6. The sample points in the sample data set are sorted from high to low by the number of sample points within each point's density radius d, obtaining the sorted data set T', denoted T' = {n'_1, n'_2, …, n'_m}, which contains the same points as T;
7. An empty set S is defined to store the candidate initial values. The first element n'_i of the data set T' (n'_1 on the first execution) is put into S, and the element n'_i together with all sample points within its density radius d, i.e. all samples n'_j, j ∈ [2, m], whose distance to n'_i is less than d, are deleted from T';
8. The first element n'_i of the reduced data set T' is chosen again, and step 7 is repeated in this way until the set T' = ∅. When T' = ∅, the number of samples in set S is the candidate k value for the K-means clustering algorithm, and the values in S are the candidate initial values; the set is denoted S = {s_1, s_2, …, s_{m'}}, where m' ≤ m;
9. The set S = {s_1, s_2, …, s_{m'}} is treated as the centroid set, each initial value being the centroid of a different class. Using the Euclidean distance formula, the distance from every sample point n_i in the sample data set T to each class centroid in the centroid set is computed, and each sample point in T is labeled with the class of the centroid nearest to it;
10. The new centroid of each class is recomputed from all sample points in that class, updating the centroid set S. The formula for the centroid coordinates is:
Z_i = (1/|C_i|) Σ_{n_j ∈ C_i} n_j
Wherein, Z_i denotes the coordinates of the i-th centroid and C_i denotes the set of sample points in the i-th class;
11. It is judged whether the error-sum-of-squares criterion function between the centroids in the updated centroid set S and the sample points n_i in the sample data set T has converged. If it has converged and the centroid set S did not change in the update, go directly to step 12; otherwise steps 9 and 10 are repeated until the criterion function converges and the centroid set S no longer changes, then go to step 12.
Wherein, the formula of the error-sum-of-squares criterion function between the centroids in the centroid set and the sample points in the sample data set T is:
J = Σ_{j=1}^{k} Σ_{n_i ∈ C_j} ||n_i - Z_j||²
In the formula, k is the number of centroids in the centroid set, C_j is the set of sample points in the j-th class, and Z_j denotes the j-th centroid.
12. Using the Euclidean distance formula, the distances between all pairs of centroids in the centroid set are computed, the two closest centroids are selected, and the two classes to which the two closest centroids belong are taken out;
13. It is judged whether the two classes taken out in step 12 need to be merged. If not, the number of elements in the centroid set S is the number of classes, the values in S are the class centroids, and the procedure continues to step 14. If they need to be merged, the two classes taken out in step 12 are merged, the centroid of the merged class is computed, the two centroids selected in step 12 are deleted from the centroid set S, the centroid of the merged class is added to S, and the procedure jumps back to step 12;
In this step, the method of judging whether the two classes taken out in step 12 should be merged is as follows:
i. The two classes to be judged are treated as the two classes of a two-cluster model, and the BIC value (Bayesian score) of the two-cluster model is computed, denoted BIC score2;
ii. The two classes to be judged are treated as a single whole class, and the BIC value of that whole class is computed, denoted BIC score1;
When computing the BIC values in steps i and ii, the formula is BIC = -2 × ln(L) + ln(s) × t, where s denotes the number of sample points in the data set, L denotes the likelihood function, and t denotes the number of features: t = 2 when computing the BIC value of the two-cluster model, and t = 1 when computing the BIC value of the whole class;
iii. If |BIC score1| >= |BIC score2|, the two classes taken out in step 12 need to be merged; if |BIC score1| < |BIC score2|, the two classes taken out in step 12 do not need to be merged;
14. Finally, the previously removed outliers are added back: the Euclidean distance between each outlier and the class centroids is computed, and each outlier is labeled with the class of its nearest centroid.
The above describes the basic principles and main features of the present invention. The description in the specification only illustrates the principles of the invention; various changes and improvements may be made to the invention without departing from its spirit and scope, and all such changes and improvements fall within the protection scope of the claimed invention.

Claims (9)

1. An improved K-means clustering algorithm based on density radius, which is characterized by comprising the following steps:
A. calculating the pairwise distances between all sample points in a sample data set T;
B. specifying a density radius d, and finding, according to d and the pairwise distances between sample points, all sample points within the density radius d of each sample point;
C. sorting the sample points in the sample data set by the number of sample points within each point's density radius d, to obtain the sorted data set T';
D. defining an empty set S, putting the first sample point of T' into S, and deleting from T' the first sample point together with all sample points within its density radius d;
E. repeating step D until the set T' = ∅; when T' = ∅, the number of samples in set S is the candidate k value for the K-means clustering algorithm, and the values in S are the candidate initial values;
F. treating S as the centroid set, each initial value being the centroid of a different class, calculating the distance from every sample point in T to each class centroid in the centroid set, and labeling each sample point in T with the class of the centroid nearest to it;
G. recomputing the new centroid of each class from all sample points in that class, thereby updating the centroid set;
H. judging whether the error-sum-of-squares criterion function between the centroids in the updated centroid set and the sample points in T has converged; if it has converged and the centroid set did not change in the update, going directly to step I; otherwise repeating steps F and G until the criterion function converges and the centroid set no longer changes, then going to step I;
I. calculating the pairwise distances between all centroids in the centroid set, selecting the two closest centroids, and taking out the two classes to which they belong;
J. judging whether the two classes taken out in step I need to be merged; if not, the algorithm terminates; if so, merging the two classes taken out in step I, calculating the centroid of the merged class, deleting the two centroids selected in step I from the centroid set, adding the centroid of the merged class to the centroid set, and jumping back to step I.
2. The improved K-means clustering algorithm based on density radius as claimed in claim 1, which is characterized in that the method by which step J judges whether two classes need to be merged includes:
treating the two classes to be judged as the two classes of a two-cluster model, and computing the BIC value of the two-cluster model, denoted BIC score2;
treating the two classes to be judged as a single whole class, and computing the BIC value of that whole class, denoted BIC score1;
if |BIC score1| >= |BIC score2|, the two classes to be judged need to be merged; if |BIC score1| < |BIC score2|, the two classes to be judged do not need to be merged.
3. The improved K-means clustering algorithm based on density radius as claimed in claim 2, which is characterized in that the calculation formula of the BIC value is:
BIC = -2 × ln(L) + ln(s) × t
wherein s denotes the number of sample points in the data set; L denotes the likelihood function; t denotes the number of features, t = 2 when computing the BIC value of the two-cluster model, and t = 1 when computing the BIC value of the whole class.
4. The improved K-means clustering algorithm based on density radius as claimed in claim 1 or 2, which is characterized in that before step A the method further includes: removing noise and outliers from the sample data set; and after step J the method further includes: calculating the distance between each outlier and the class centroids, and labeling each outlier with the class of its nearest centroid.
5. The improved K-means clustering algorithm based on density radius as claimed in claim 4, which is characterized in that the LOF method is used to remove noise and outliers from the sample data set.
6. The improved K-means clustering algorithm based on density radius as claimed in claim 4, which is characterized in that after noise and outliers are removed from the sample data set, and before step A, the method further includes normalizing the sample data set, the sample coordinates after normalization being x_{i,j} ∈ [0, 1], computed as
x_{i,j} = x_{i,j} / max_{1≤p≤m} x_{p,j}
wherein m denotes the number of sample points and v denotes the dimension.
7. The improved K-means clustering algorithm based on density radius as claimed in claim 6, which is characterized in that when steps A, F and I calculate distances, the Euclidean distance formula is used:
d(n_i, n_j) = sqrt(Σ_{l=1}^{v} (x_{i,l} - x_{j,l})²)
8. The improved K-means clustering algorithm based on density radius as claimed in claim 6, which is characterized in that the formula by which step G calculates the centroid coordinates is:
Z_i = (1/|C_i|) Σ_{n_j ∈ C_i} n_j
wherein Z_i denotes the coordinates of the i-th centroid and C_i denotes the set of sample points in the i-th class.
9. The improved K-means clustering algorithm based on density radius as claimed in claim 6, which is characterized in that in step H the formula of the error-sum-of-squares criterion function between the centroids in the centroid set and the sample points in the sample data set T is:
J = Σ_{j=1}^{k} Σ_{n_i ∈ C_j} ||n_i - Z_j||²
wherein k is the number of centroids in the centroid set, C_j is the set of sample points in the j-th class, and Z_j denotes the j-th centroid.
CN201810354305.0A 2018-04-19 2018-04-19 Improvement K-means clustering algorithms based on density radius Pending CN108549913A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810354305.0A CN108549913A (en) 2018-04-19 2018-04-19 Improvement K-means clustering algorithms based on density radius

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810354305.0A CN108549913A (en) 2018-04-19 2018-04-19 Improvement K-means clustering algorithms based on density radius

Publications (1)

Publication Number Publication Date
CN108549913A true CN108549913A (en) 2018-09-18

Family

ID=63515608

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810354305.0A Pending CN108549913A (en) 2018-04-19 2018-04-19 Improvement K-means clustering algorithms based on density radius

Country Status (1)

Country Link
CN (1) CN108549913A (en)


Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111382765A (en) * 2018-12-29 2020-07-07 中国移动通信集团四川有限公司 Complaint hot spot region clustering method, device, equipment and medium
CN111382765B (en) * 2018-12-29 2023-07-04 中国移动通信集团四川有限公司 Complaint hot spot area clustering method, device, equipment and medium
CN111966951A (en) * 2020-07-06 2020-11-20 东南数字经济发展研究院 User group hierarchy dividing method based on social e-commerce transaction data

Similar Documents

Publication Publication Date Title
CN109273096B (en) Medicine risk grading evaluation method based on machine learning
CN108846259A (en) A kind of gene sorting method and system based on cluster and random forests algorithm
CN105930862A (en) Density peak clustering algorithm based on density adaptive distance
CN110222744A (en) A kind of Naive Bayes Classification Model improved method based on attribute weight
CN108280472A (en) A kind of density peak clustering method optimized based on local density and cluster centre
CN113344019A (en) K-means algorithm for improving decision value selection initial clustering center
CN110569982A (en) Active sampling method based on meta-learning
CN110837884B (en) Effective mixed characteristic selection method based on improved binary krill swarm algorithm and information gain algorithm
CN111104398B (en) Detection method and elimination method for intelligent ship approximate repeated record
CN108280236A (en) A kind of random forest visualization data analysing method based on LargeVis
WO2018006631A1 (en) User level automatic segmentation method and system
CN108549913A (en) Improvement K-means clustering algorithms based on density radius
CN114119966A (en) Small sample target detection method based on multi-view learning and meta-learning
CN110399493B (en) Author disambiguation method based on incremental learning
CN110909785B (en) Multitask Triplet loss function learning method based on semantic hierarchy
CN111339385A (en) CART-based public opinion type identification method and system, storage medium and electronic equipment
CN111797267A (en) Medical image retrieval method and system, electronic device and storage medium
CN114882531A (en) Cross-domain pedestrian re-identification method based on deep learning
CN117407732A (en) Unconventional reservoir gas well yield prediction method based on antagonistic neural network
CN105760471B (en) Based on the two class text classification methods for combining convex linear perceptron
CN117195027A (en) Cluster weighted clustering integration method based on member selection
CN116662832A (en) Training sample selection method based on clustering and active learning
CN113780378B (en) Disease high risk crowd prediction device
CN107423319B (en) Junk web page detection method
CN111753084B (en) Short text feature extraction and classification method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20180918)