CN104199853A - Clustering method - Google Patents

Clustering method

Info

Publication number
CN104199853A
CN104199853A (application CN201410394502.7A)
Authority
CN
China
Prior art keywords
cluster
text
class
clustering method
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201410394502.7A
Other languages
Chinese (zh)
Inventor
侯荣涛
王琴
周彬
路郁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Information Science and Technology
Original Assignee
Nanjing University of Information Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Information Science and Technology filed Critical Nanjing University of Information Science and Technology
Priority to CN201410394502.7A priority Critical patent/CN104199853A/en
Publication of CN104199853A publication Critical patent/CN104199853A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification

Abstract

The invention discloses a clustering method. First, a density-based pre-classification step produces high-density core classes and a class hierarchy tree that captures the structure of the dataset. K-MEANS clustering is then run using the highly representative subclass centers from the hierarchy tree as initial centers, yielding fine-grained clusters. Finally, the fine clusters are merged according to the class attributes recorded in the hierarchy tree, giving an accurate and stable clustering result. Addressing the sensitivity of K-MEANS to its initial cluster centers, the method provides a stable algorithm based on fine clusters: it can partition the convex classes in a dataset and also finds an optimal partition for irregularly shaped classes.

Description

A clustering method
Technical field
The present invention relates to a clustering method, in particular to a novel K-MEANS clustering method, and belongs to the field of data mining.
Background art
With the growth of the Internet, data are shared and accumulated on a massive scale, and the problem of data overload combined with knowledge scarcity is increasingly acute. Ever-expanding data become a "data tomb" if left unused; if they can be fully accessed and mined, the latent information they contain will create great value. The task of data mining is to discover knowledge in mass data. It mainly targets structured data, yet in practice a large share of data is stored in databases as text, which makes text mining an important branch of data mining.
Clustering is a key technique in data mining. Its task is to group texts with similar subject matter into the same class while keeping texts with different content apart. K-MEANS is one of the most classical clustering algorithms; being simple, fast, and easy to implement, it is the most commonly used algorithm in text mining. However, K-MEANS is overly sensitive to the choice of initial cluster centers, and its execution efficiency can fall short of requirements. Text mining applications call for an unsupervised text clustering method that can deliver high-precision results stably, so traditional clustering methods need further improvement before they can be applied well to text mining.
Summary of the invention
The technical problem to be solved by this invention is to provide a clustering method that remedies the traditional K-MEANS algorithm's excessive sensitivity to initial cluster centers and therefore clusters more accurately.
To solve the above technical problem, the present invention adopts the following technical solution:
A clustering method comprising the following steps:
Step 1: Run the density-based OPTICS clustering method on the dataset to perform a preliminary clustering and obtain a reachability plot.
Step 2: Extract all clusters contained in the reachability plot from Step 1 and sort them in descending order of the number of data objects they contain. Take the dataset as the root node of a hierarchy tree, then insert the sorted clusters one by one by breadth-first traversal to build the tree, defining a node that can contain a cluster as that cluster's parent node, and a node that cannot contain it as its sibling node. Every node of the tree other than the root is one of the clusters, and each subtree of the hierarchy tree is assigned a distinct id.
Step 3: Use the number of leaf nodes of the hierarchy tree from Step 2 as the initial number of categories for K-MEANS clustering, and use the mean of the data objects contained in each leaf node as the initial cluster center of the corresponding category; the id assigned in Step 2 is the initial id of each leaf node's initial cluster center and of the data objects it contains. Run K-MEANS on the dataset; after each iteration, the new cluster center keeps the id of the center it replaces, and every data object assigned to a center takes that center's id. This yields K-MEANS clusters labeled with ids.
Step 4: Merge the id-labeled K-MEANS clusters from Step 3: clusters with the same id are merged into the same class, giving the final clustering result for the dataset.
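The four steps above can be sketched end to end. The sketch below is an illustration only, not the patented implementation: it assumes scikit-learn's OPTICS (with `cluster_method='xi'`) to obtain both the reachability ordering and a cluster hierarchy, uses Euclidean geometry rather than the modified cosine distance introduced later, and represents hierarchy-tree nodes as index intervals over the OPTICS ordering.

```python
import numpy as np
from sklearn.cluster import OPTICS, KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.60, random_state=0)

# Step 1: preliminary density-based clustering. With cluster_method='xi',
# scikit-learn also extracts a cluster hierarchy from the reachability plot.
optics = OPTICS(min_samples=10, cluster_method='xi').fit(X)
hier = [tuple(r) for r in optics.cluster_hierarchy_]  # (start, end) in ordering_
order = optics.ordering_

def contains(a, b):
    """Does interval a strictly contain interval b?"""
    return a != b and a[0] <= b[0] and b[1] <= a[1]

# Step 2: leaves of the hierarchy (clusters containing no smaller cluster),
# each tagged with its top-level ancestor, which plays the role of a subtree id.
leaves = [c for c in hier if not any(contains(c, d) for d in hier)]

def subtree_id(c):
    anc = [d for d in hier if contains(d, c) or d == c]
    return max(anc, key=lambda d: d[1] - d[0])     # outermost container

ids = [subtree_id(c) for c in leaves]

# Step 3: fine K-MEANS seeded with one center per leaf (mean of its members).
centers = np.array([X[order[s:e + 1]].mean(axis=0) for s, e in leaves])
km = KMeans(n_clusters=len(leaves), init=centers, n_init=1).fit(X)

# Step 4: merge fine clusters whose seeds share a subtree id.
unique_ids = sorted(set(ids))
final = np.array([unique_ids.index(ids[l]) for l in km.labels_])
print(len(leaves), "fine clusters ->", len(unique_ids), "merged classes")
```

The interval-containment test stands in for the "node can contain this cluster" check of Step 2; on real text data the distance of formula (1) below would replace the Euclidean metric.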
Preferably, the distance formula in the density-based OPTICS clustering method is Distance(x_i, x_j) = 1 / (cos(x_i, x_j) + 0.001), where Distance(x_i, x_j) is the distance between any two data objects x_i and x_j, cos(x_i, x_j) is their cosine similarity, x_i and x_j denote the i-th and j-th text objects, and i ≠ j.
A text clustering method comprising the following steps:
Step 1: Choose at least two text categories arbitrarily, and select at least one text object from each category to form a text dataset.
Step 2: Apply the clustering method of claim 1 to the text dataset to obtain the text clustering result.
Preferably, the distance formula between text objects in the K-MEANS clustering is Distance(x_i, x_j) = 1 / (cos(x_i, x_j) + 0.001), where Distance(x_i, x_j) is the distance between any two text objects x_i and x_j, cos(x_i, x_j) is their cosine similarity, x_i and x_j are text objects, and i ≠ j.
Preferably, the convergence criterion of the K-MEANS clustering is the squared-error criterion E = Σ_{i=1}^{k} Σ_{x ∈ C_i} dis(x, m_i), where x is a vector in class i of the dataset, m_i is the centroid of class i, C_i is the cluster, k is the number of classes, and dis(x, m_i) is the distance from vector x to centroid m_i.
Compared with the prior art, the above technical solution of the present invention has the following technical effects:
1. The clustering method of the present invention can partition the convex classes in a dataset and also finds an optimal partition for irregularly shaped classes.
2. The clustering method of the present invention overcomes the traditional K-MEANS algorithm's excessive sensitivity to initial cluster centers, yielding a stable algorithm based on fine clusters with better clustering performance.
3. The algorithm of the clustering method of the present invention is concise, easy to understand, and easy to implement.
Brief description of the drawings
Fig. 1(a) and (b) show clustering results of the traditional K-MEANS algorithm.
Fig. 2(a), (b), and (c) show clustering results of the present invention.
Fig. 3(a) and (b) show OPTICS clustering results (reachability plots) of the present invention.
Fig. 4 shows the structure of the hierarchy tree of the present invention.
Fig. 5 is the flow chart of the clustering method of the present invention.
Fig. 6 compares the clustering accuracy of the algorithm of the present invention with that of the traditional K-MEANS algorithm.
Embodiments
Embodiments of the present invention are described in detail below. Examples of the embodiments are shown in the accompanying drawings, where identical or similar reference labels throughout denote identical or similar elements or elements with identical or similar functions. The embodiments described below with reference to the drawings are exemplary, serve only to explain the present invention, and are not to be construed as limiting it.
Besides being concise, easy to understand, and easy to implement, the traditional K-MEANS algorithm performs well on sparse matrices. However, the algorithm requires the number of clusters K to be specified manually before it runs. For an unknown dataset, the true class distribution cannot be predicted before clustering, so determining the correct K depends on the user's practical experience and carries great uncertainty. Second, because the initial cluster centers are generated at random, the algorithm easily falls into a local optimum and its results vary widely. Finally, K-MEANS is limited to finding convex clusters: it partitions the classes with hyperplanes formed between the cluster centers, so the resulting classes are convex. In practice, classes are not necessarily convex, and for irregularly shaped classes K-MEANS cannot obtain the optimal partition. Fig. 1(a) and (b) show K-MEANS results on irregularly shaped clusters; the hollow dots are the data distribution and the solid black dots are the cluster centers at K-MEANS convergence. Under the rule that each data object is assigned to the class of its nearest center, the partition boundary is the bisector of the two cluster centers (the dotted line in the figure). Because the clusters in the dataset are not fully convex, some data points are grouped into the wrong class, and the dataset cannot be partitioned optimally.
Because the true number of classes K cannot be predicted and random initial centers lack any basis, the chance that the K-MEANS algorithm falls into a local optimum grows, and the number of iterations grows with it. The convexity of the resulting clusters also prevents objects in irregularly shaped clusters from being assigned to the correct class. To address these shortcomings, the present invention proposes fine clustering: the real clusters are partitioned more finely, and the fine clusters are then merged to obtain higher clustering precision. The main idea of K-MEANS clustering with fine clusters is shown in Fig. 2. If the value of K is increased to 4, the cluster centers after K-MEANS clustering are as shown in Fig. 2(a), and the bisectors formed by these centers partition the dataset more finely, as shown in Fig. 2(b). If the two fine clusters on the left of Fig. 2(b) are merged into one cluster and the two on the right into another, the dataset is partitioned more accurately, as shown in Fig. 2(c).
Introducing fine clusters thus overcomes, to some extent, the K-MEANS algorithm's restriction to convex cluster structures and clusters more accurately. The crux is how to obtain and merge the fine clusters, which requires the hierarchy between sub-clusters and their parent clusters. The present invention introduces density-based OPTICS clustering; for this clustering method see Hou Rongtao et al., "Application of the OPTICS algorithm in lightning nowcasting", Journal of Computer Applications (计算机应用), 2014, No. 01. The algorithm does not produce clusters explicitly; instead, based on the concepts of core distance and reachability distance, it scans the dataset and produces a reachability plot that indicates how the objects of the dataset are distributed. The reachability plot reflects the internal structure of the dataset, and the final clusters are extracted from the valley regions of the plot. Fig. 3(a) and (b) show the reachability plots that OPTICS generates in the cluster analysis of a text dataset. As the figures show, because the algorithm always advances toward high-density regions, objects in sparse regions end up in the slowly rising tails after the valley regions of the plot, and these parts cannot be classified correctly. In real text clustering applications in particular, text clusters are usually not compactly distributed, so the slowly rising regions of the reachability plot become very wide and the objects in them cannot be sorted correctly.
Although OPTICS has the advantage of identifying the number of clusters automatically, using OPTICS alone for text clustering cannot classify all text objects correctly. The reachability plot reflects the density structure of the whole dataset; the reachability plots for the dataset of Fig. 1 after OPTICS clustering are shown in Fig. 3(a) and (b), where cluster C1 contains sub-clusters a and b, and C2 contains sub-clusters c and d. When fine-cluster K-MEANS clustering is performed, this hierarchy effectively guides both the choice of initial centers for the fine clusters and their merging. Therefore, exploiting the cluster hierarchy contained in the OPTICS reachability plot provides an initialization basis for the K-MEANS algorithm, and combined with the global partitioning of K-MEANS it yields better clustering performance.
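As an aside on reachability plots (illustrative code, not part of the patent): scikit-learn's OPTICS exposes the reachability distances and the visit ordering directly, and the slowly rising tails described above show up as growing reachability values after each valley.

```python
import numpy as np
from sklearn.cluster import OPTICS
from sklearn.datasets import make_blobs

# Two dense blobs plus uniform noise: the noise produces the slowly
# rising, hard-to-classify tail regions described in the text.
rng = np.random.default_rng(0)
blobs, _ = make_blobs(n_samples=200, centers=2, cluster_std=0.5, random_state=0)
noise = rng.uniform(-10, 10, size=(40, 2))
X = np.vstack([blobs, noise])

optics = OPTICS(min_samples=10).fit(X)
# Reachability values in visit order: low runs are valleys (dense clusters),
# high or infinite values mark sparse points and cluster boundaries.
reach = optics.reachability_[optics.ordering_]
finite = reach[np.isfinite(reach)]
print("median reachability:", round(float(np.median(finite)), 3))
```

Plotting `reach` as a bar chart reproduces the kind of reachability plot shown in Fig. 3.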
On this data model, the present invention measures the similarity between two text objects with cosine similarity. By the properties of cosine similarity, the more similar two objects are, the larger the cosine similarity value. Because clustering proceeds by dissimilarity, grouping objects that lie close together (small distance values) into one class, the cosine formula is modified as in formula (1) to represent the distance between text objects.
Distance(x_i, x_j) = 1 / (cos(x_i, x_j) + 0.001)    (1)
where Distance(x_i, x_j) is the distance, i.e. the dissimilarity, between any two text objects x_i and x_j; cos(x_i, x_j) is the cosine similarity of the two text objects; x_i and x_j are text objects; and i ≠ j.
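Formula (1) is straightforward to implement. A minimal sketch, assuming the text objects are already vectorized (e.g. as TF-IDF vectors) into NumPy arrays:

```python
import numpy as np

def cosine(x, y):
    """Cosine similarity of two non-zero vectors."""
    return float(np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y)))

def distance(x, y):
    """Modified cosine distance of formula (1): small for similar texts."""
    return 1.0 / (cosine(x, y) + 0.001)

a = np.array([1.0, 0.0])
b = np.array([1.0, 0.0])
c = np.array([0.0, 1.0])
print(distance(a, b))  # identical vectors: 1/1.001 ≈ 0.999
print(distance(a, c))  # orthogonal vectors: 1/0.001 = 1000.0
```

With non-negative term-frequency vectors the cosine lies in [0, 1], so the 0.001 term both keeps the distance finite for orthogonal vectors and keeps it positive.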
The OPTICS algorithm is used to cluster the dataset preliminarily, producing a reachability plot that reflects the internal structure of the dataset. An existing method can extract all possible clusters from the steep-descent and steep-ascent regions of the reachability plot; see Mihael Ankerst, Markus M. Breunig, Hans-Peter Kriegel and Jörg Sander, "OPTICS: Ordering Points To Identify the Clustering Structure", ACM SIGMOD Record, 1999. These clusters include sub-clusters and the parent clusters that contain them; a hierarchy tree of the containment structure is built from the cluster boundaries in the reachability plot, with the structure shown in Fig. 4.
The hierarchy tree is constructed from all the possible clusters as follows. First, all clusters obtained from the reachability plot are sorted in descending order of the number of objects they contain. Then the sorted clusters are taken out one by one and added to the hierarchy tree: a breadth-first traversal searches the tree for a node that can contain each cluster; if a node can contain the cluster, the cluster becomes a child of that node, and if it cannot, the cluster becomes a sibling of that node. Starting from the second cluster, this traversal is repeated for each cluster until the last one; when all clusters have been added to the hierarchy tree, its construction is complete.
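A minimal sketch of this construction. It assumes each cluster is represented as an inclusive index interval over the reachability ordering, so "node A can contain cluster B" becomes interval inclusion; siblings arise implicitly as other children of the same parent.

```python
from collections import deque

class Node:
    def __init__(self, start, end):
        self.start, self.end = start, end   # interval over the OPTICS ordering
        self.children = []

    def contains(self, other):
        return self.start <= other.start and other.end <= self.end

def build_hierarchy(clusters, n_objects):
    """clusters: (start, end) intervals; returns the root (whole dataset)."""
    root = Node(0, n_objects - 1)
    # Sort descending by size, so parents are inserted before their children.
    for s, e in sorted(clusters, key=lambda c: c[1] - c[0], reverse=True):
        node, parent, queue = Node(s, e), root, deque([root])
        while queue:                        # breadth-first search for a container
            cand = queue.popleft()
            if cand.contains(node):
                parent = cand               # remember the deepest container seen
                queue.extend(cand.children)
        parent.children.append(node)
    return root

def tree_leaves(node):
    if not node.children:
        return [node]
    return sum((tree_leaves(c) for c in node.children), [])

root = build_hierarchy([(0, 9), (0, 4), (5, 9), (10, 19)], 20)
print([(l.start, l.end) for l in tree_leaves(root)])   # [(0, 4), (5, 9), (10, 19)]
```

The leaves of the tree are exactly the fine clusters whose means seed the K-MEANS step, and each leaf's subtree (its topmost non-root ancestor) determines the merge id.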
The center of the cluster represented by each leaf node of the hierarchy tree is computed, and these centers are the input to the K-MEANS algorithm. Iteration starts from these initial center points; during iteration, the same distance formula Distance(x_i, x_j) as in OPTICS measures the distance between text objects, and the centroid serves as the cluster center. The convergence criterion is the squared-error criterion, shown in formula (2).
E = Σ_{i=1}^{k} Σ_{x ∈ C_i} dis(x, m_i)    (2)
where x is a vector in class i of the dataset, m_i is the mean (centroid) of class i, C_i is the cluster, k is the number of classes, and dis(x, m_i) is the distance from vector x to the centroid. During iteration the squared error E keeps decreasing; when E no longer changes, iteration stops, all text objects have been assigned to fine clusters, and fine clusters carrying the same class label are merged. At this point the clustering of the whole dataset is complete. The flow of the fine-cluster K-MEANS clustering algorithm is shown in Fig. 5.
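The labeled fine-cluster K-MEANS loop can be sketched as follows (a plain NumPy illustration with Euclidean distance for readability; the patent uses the modified cosine distance of formula (1)). Each seed center carries a subtree id, and after convergence the fine clusters sharing an id are merged.

```python
import numpy as np

def fine_kmeans(X, seeds, ids, n_iter=100):
    """seeds: initial centers, one per hierarchy-tree leaf; ids: subtree id
    of each seed. Returns the merged class id of every point in X."""
    centers = seeds.copy()
    for _ in range(n_iter):
        # Assign each point to its nearest center; the center's id travels
        # with it, so fine-cluster membership implies a subtree id.
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        new_centers = np.array([X[labels == j].mean(axis=0)
                                if np.any(labels == j) else centers[j]
                                for j in range(len(centers))])
        if np.allclose(new_centers, centers):   # E no longer changes
            break
        centers = new_centers
    return np.array([ids[j] for j in labels])   # merge fine clusters by id

# Four tight blobs; the two left blobs share id 0, the two right share id 1.
rng = np.random.default_rng(1)
blobs = [rng.normal(c, 0.1, size=(30, 2)) for c in [(0, 0), (0, 2), (5, 0), (5, 2)]]
X = np.vstack(blobs)
seeds = np.array([[0.0, 0.0], [0.0, 2.0], [5.0, 0.0], [5.0, 2.0]])
final = fine_kmeans(X, seeds, ids=[0, 0, 1, 1])
print(sorted(set(final)))   # [0, 1]
```

Four fine clusters collapse to two merged classes, mirroring the Fig. 2(b) to Fig. 2(c) merge.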
The quality of the clustering results is evaluated below. Methods for checking cluster quality include precision, recall, and F-Measure. Suppose there are K document classes and cluster C has topic T. Let C_T be the number of objects in C whose topic class is T, C_F the number in C whose topic class is not T, C_OT the number of objects with topic T in other clusters, and C_OF the number of objects with topics other than T in other clusters. The correspondence between the variables and the classes is shown in Table 1.
Table 1 Relation between variable names and classes

Classification    Relevant to T    Irrelevant to T
Topic-T class C   C_T              C_F
Other clusters    C_OT             C_OF
Precision (sometimes called accuracy) is, after all text objects have been classified, the ratio of the number of objects whose assigned class agrees with the true class to the total number of objects in the class, as shown in formula (3).
P(C, T) = C_T / (C_T + C_F)    (3)
Recall is the ratio of the number of documents in cluster C relevant to the topic to the total number of documents assigned to that topic by the manual classification, as shown in formula (4).
R(C, T) = C_T / (C_T + C_OT)    (4)
F-Measure is an aggregate measure built on precision and recall; the F-Measure value of class i is defined as shown in formula (5).
F(i) = 2PR / (P + R)    (5)
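Formulas (3)-(5) translate directly into code; the counts below are illustrative, not taken from the experiments.

```python
def precision(c_t, c_f):
    return c_t / (c_t + c_f)                 # formula (3)

def recall(c_t, c_ot):
    return c_t / (c_t + c_ot)                # formula (4)

def f_measure(p, r):
    return 2 * p * r / (p + r)               # formula (5)

# Example: cluster C holds 180 topic-T documents and 20 others,
# while 20 more topic-T documents landed in other clusters.
p = precision(180, 20)    # 0.9
r = recall(180, 20)       # 0.9
print(round(f_measure(p, r), 3))   # 0.9
```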
To test the clustering effect, the clustering method of the present invention was run on the categorized corpus produced by the Sogou Lab: four classes, automobile, finance, IT, and health, with 200 documents drawn arbitrarily from each class, 800 documents in total. After repeated experiments, the OPTICS parameters were set to MinPts = 18 and ε = 20. The present invention is compared with the traditional K-MEANS algorithm; for consistency, the traditional K-MEANS also uses cosine similarity and the squared-error convergence criterion, and its K is set to 4. In the experiments, the method of the present invention and the traditional K-MEANS algorithm were each run 10 times, and the classification accuracies, collected statistically, are shown in Fig. 6.
As Fig. 6 shows, even when the traditional K-MEANS algorithm is given the exact cluster number K, the random choice of initial seeds still makes its precision unstable (fluctuation range 0.7064-0.8953), with large swings. Because the present invention combines the OPTICS algorithm with the K-MEANS algorithm, the initial class number K is determined automatically for K-MEANS and high-quality initial cluster centers are provided, so the clustering results are comparatively stable (fluctuation range 0.9114-0.9207); the introduction of fine clustering also contributes to the improvement in accuracy. The recall and F-Measure values computed for each experiment show the same fluctuation trend as the precision; the statistics are listed in Table 2. The experimental results indicate that the average performance of the algorithm of the present invention is about 7% higher than that of the traditional K-MEANS algorithm.
Table 2 Comparison of mean clustering quality between the algorithm of the present invention and the traditional K-MEANS algorithm

Algorithm                            Precision   Recall   F-Measure
Algorithm of the present invention   0.918       0.915    0.916
Traditional K-MEANS                  0.851       0.845    0.852
The above embodiments serve only to illustrate the technical idea of the present invention and do not limit its scope of protection. Any change made on the basis of the technical solution, in accordance with the technical idea proposed by the present invention, falls within the scope of protection of the present invention.

Claims (5)

1. A clustering method, characterized by comprising the following steps:
Step 1: Run the density-based OPTICS clustering method on the dataset to perform a preliminary clustering and obtain a reachability plot.
Step 2: Extract all clusters contained in the reachability plot of Step 1 and sort them in descending order of the number of data objects they contain. Take the dataset as the root node of a hierarchy tree, then insert the sorted clusters one by one by breadth-first traversal to build the tree, defining a node that can contain a cluster as that cluster's parent node and a node that cannot contain it as its sibling node; every node of the tree other than the root is one of the clusters, and each subtree of the hierarchy tree is assigned a distinct id.
Step 3: Use the number of leaf nodes of the hierarchy tree of Step 2 as the initial number of categories for K-MEANS clustering, and the mean of the data objects contained in each leaf node as the initial cluster center of the corresponding category; the id assigned in Step 2 is the initial id of each leaf node's initial cluster center and of the data objects it contains. Perform K-MEANS clustering on the dataset, keeping, at each iteration, the id of the new cluster center identical to the id of the center before the iteration and the ids of the data objects grouped with a new cluster center consistent with that center's id, thereby obtaining K-MEANS clusters labeled with ids.
Step 4: Merge the id-labeled K-MEANS clusters of Step 3: clusters with the same id are merged into the same class, obtaining the final clustering result of the dataset.
2. The clustering method of claim 1, characterized in that the distance formula in the density-based OPTICS clustering method is Distance(x_i, x_j) = 1 / (cos(x_i, x_j) + 0.001), where Distance(x_i, x_j) is the distance between any two data objects x_i and x_j, cos(x_i, x_j) is their cosine similarity, x_i and x_j denote the i-th and j-th text objects, and i ≠ j.
3. A text clustering method, characterized by comprising the following steps:
Step 1: Choose at least two text categories arbitrarily, and select at least one text object from each category to form a text dataset.
Step 2: Apply the clustering method of claim 1 to the text dataset to obtain the text clustering result.
4. The text clustering method of claim 3, characterized in that the distance formula between text objects in the K-MEANS clustering is Distance(x_i, x_j) = 1 / (cos(x_i, x_j) + 0.001), where Distance(x_i, x_j) is the distance between any two text objects x_i and x_j, cos(x_i, x_j) is their cosine similarity, x_i and x_j denote the i-th and j-th text objects, and i ≠ j.
5. The text clustering method of claim 3, characterized in that the convergence criterion of the K-MEANS clustering is the squared-error criterion E = Σ_{i=1}^{k} Σ_{x ∈ C_i} dis(x, m_i), where x is a vector in class i of the dataset, m_i is the centroid of class i, C_i is the cluster, k is the number of classes, and dis(x, m_i) is the distance from vector x to centroid m_i.
CN201410394502.7A 2014-08-12 2014-08-12 Clustering method Pending CN104199853A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410394502.7A CN104199853A (en) 2014-08-12 2014-08-12 Clustering method


Publications (1)

Publication Number Publication Date
CN104199853A true CN104199853A (en) 2014-12-10

Family

ID=52085146

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410394502.7A Pending CN104199853A (en) 2014-08-12 2014-08-12 Clustering method

Country Status (1)

Country Link
CN (1) CN104199853A (en)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104820775A (en) * 2015-04-17 2015-08-05 南京大学 Discovery method of core drug of traditional Chinese medicine prescription
CN106776600A (en) * 2015-11-19 2017-05-31 北京国双科技有限公司 The method and device of text cluster
CN107239434A (en) * 2015-11-19 2017-10-10 英特尔公司 Technology for the automatic rearrangement of sparse matrix
CN107480708A (en) * 2017-07-31 2017-12-15 微梦创科网络科技(中国)有限公司 The clustering method and system of a kind of complex model
CN107644233A (en) * 2017-10-11 2018-01-30 上海电力学院 FILTERSIM analogy methods based on Cluster Classification
CN108369638A (en) * 2015-12-16 2018-08-03 三星电子株式会社 The image management based on event carried out using cluster
CN109033084A (en) * 2018-07-26 2018-12-18 国信优易数据有限公司 A kind of semantic hierarchies tree constructing method and device
CN109685092A (en) * 2018-08-21 2019-04-26 中国平安人寿保险股份有限公司 Clustering method, equipment, storage medium and device based on big data
CN110597719A (en) * 2019-09-05 2019-12-20 腾讯科技(深圳)有限公司 Image clustering method, device and medium for adaptation test
CN112465034A (en) * 2020-11-30 2021-03-09 中国长江电力股份有限公司 Method and system for establishing T-S fuzzy model based on hydraulic generator
CN113570004A (en) * 2021-09-24 2021-10-29 西南交通大学 Riding hot spot area prediction method, device, equipment and readable storage medium

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
SANDER J., ET AL.: "Automatic Extraction of Clusters from Hierarchical Clustering Representations", PAKDD 2003: Proceedings of the 7th Pacific-Asia Conference on Knowledge Discovery and Data Mining *
DANG QIUYUE: "Automatic cluster identification method based on OPTICS reachability plots", Journal of Computer Applications (计算机应用) *
HUANG ZHIHONG: "Research on the K-means algorithm based on hierarchical clustering", Computer Development & Applications (电脑开发与应用) *


Similar Documents

Publication Publication Date Title
CN104199853A (en) Clustering method
Huang et al. Revealing density-based clustering structure from the core-connected tree of a network
Popat et al. Hierarchical document clustering based on cosine similarity measure
CN103927302A (en) Text classification method and system
Rani A Survey on STING and CLIQUE Grid Based Clustering Methods.
CN109345007A (en) A kind of Favorable Reservoir development area prediction technique based on XGBoost feature selecting
Wu et al. $ K $-Ary Tree Hashing for Fast Graph Classification
CN102360436B (en) Identification method for on-line handwritten Tibetan characters based on components
Suyal et al. Text clustering algorithms: a review
CN104699666B (en) Based on neighbour&#39;s propagation model from the method for library catalogue learning hierarchical structure
CN105956012A (en) Database mode abstract method based on graphical partition strategy
Sundari et al. A study of various text mining techniques
Al-Mukhtar et al. Greedy modularity graph clustering for community detection of large co-authorship network
Li Glowworm swarm optimization algorithm-and K-prototypes algorithm-based metadata tree clustering
CN106294652A (en) Web page information search method
CN106168982A (en) Data retrieval method for particular topic
Yao et al. Applying an improved DBSCAN clustering algorithm to network intrusion detection
Mahajan et al. Various approaches of community detection in complex networks: a glance
Chauhan Clustering Techniques: A Comprehensive Study of Various Clustering Techniques.
Vadgasiya et al. An enhanced algorithm for improved cluster generation to remove outlier’s ratio for large datasets in data mining
Manikandan et al. The study on clustering analysis on data mining
Kaur Dhillon et al. A Study on Clustering Based Methods.
Hamsathvani et al. Survey on Infrequent Weighted Item set Mining Using FP Growth
Feng et al. Research on Faceted Search Method for Water Data Catalogue Service
Ingale et al. Review of algorithms for clustering random data

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20141210