CN103207896A - Method and system for stable and efficient self-adaptive clustering - Google Patents

Method and system for stable and efficient self-adaptive clustering

Publication number
CN103207896A
Authority
CN
China
Prior art keywords
input data
cluster
candidate
selecting
probability
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2013100826712A
Other languages
Chinese (zh)
Other versions
CN103207896B (en)
Inventor
张兰
刘云浩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
WUXI QINGHUA INFORMATION SCIENCE AND TECHNOLOGY NATIONAL LABORATORY INTERNET OF THINGS TECHNOLOGY CENTER
Original Assignee
WUXI QINGHUA INFORMATION SCIENCE AND TECHNOLOGY NATIONAL LABORATORY INTERNET OF THINGS TECHNOLOGY CENTER
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by WUXI QINGHUA INFORMATION SCIENCE AND TECHNOLOGY NATIONAL LABORATORY INTERNET OF THINGS TECHNOLOGY CENTER
Priority to CN201310082671.2A
Publication of CN103207896A
Application granted
Publication of CN103207896B
Expired - Fee Related
Anticipated expiration

Abstract

The invention discloses a method and a system for stable and efficient self-adaptive clustering. The method comprises the following steps: a, obtaining a set $p = \{p_1, \ldots, p_n\}$ of n input data and a threshold θ for the cluster radius; b, adding $p_i$, together with every input datum in the set whose distance from $p_i$ is smaller than the threshold θ, to the candidate cluster $C_{p_i}$ corresponding to $p_i$, where $p_i$ denotes the i-th input datum in the set; and c, letting m be the number of input data in the candidate cluster $C_{p_i}$ and the function $d(p_i, p_j)$ be the distance between two input data $p_i$ and $p_j$, calculating the probability of $p_i$ being a cluster center. The method and the system provide stable and efficient self-adaptive clustering: the number of final clusters does not need to be preset, the computation is efficient with a complexity of $O(n^2)$, and the method and the system are suitable for current mobile intelligent terminals.

Description

Adaptive, stable, and efficient clustering method and system
Technical field
The present invention relates to the technical field of computer information processing, and in particular to an adaptive, stable, and efficient clustering method and system.
Background art
With the rapid growth of computerized information, the demand for processing all kinds of such information keeps increasing. Clustering algorithms are an important class of information-processing algorithms: they provide the basic clustering function for data management, artificial intelligence, and machine learning, and play an important role in many kinds of information processing.
Now that intelligent mobile terminals are in widespread use, a variety of information services based on intelligent mobile devices have appeared. These services must serve intelligent terminals efficiently and stably, and many of them rely on clustering algorithms, for example clustering friends in a mobile social network or clustering goods in a shopping application. Many mobile terminals can now determine their position by means of GPS, base stations, WAP, and the like, which has given rise to many location-based services; clustering methods can add richer and more useful functions to such services, for example hot-zone clustering by category. As a simple example, users often add geographical labels such as shopping, food, and sights to today's electronic maps, and these labels are scattered over the whole map. When a smartphone user travels or goes shopping, he or she usually wants to find a popular commercial district of interest, i.e. a place where labels of a certain class are dense, such as a concentrated shopping district, and to obtain navigation to it.
With current mobile map applications, however, a query for "shopping" only returns the "shopping" labels scattered over the whole map, which makes it hard for the user to choose a destination. By effectively clustering these "shopping" labels, that is, dividing them into several dense partitions (clusters), popular "shopping" districts can be found quickly. By integrating the clustering results of several labels, such as "shopping" and "food", the user can be helped to find popular districts that satisfy several requirements at once. Clustering methods can thus bring a wealth of applications to new mobile devices, but the diversity of mobile-terminal applications and their limited computational resources place new demands on clustering methods: they must be adaptive, stable, and efficient.
Several clustering methods exist at present, such as the commonly used k-means and expectation-maximization methods. Although they are fast and simple to implement, they require the number of final partitions to be set in advance, which prevents them from scaling across applications, because in most applications the user cannot know the number of partitions beforehand, for example how many food districts a city has. In addition, both methods are unstable: repeated runs may produce inconsistent clustering results. Another method, called QT, needs no preset number of partitions and yields stable clustering results, but it requires $O(n^3)$ computation; faced with a huge amount of information, a mobile device with limited computational resources often cannot bear such an overhead.
Summary of the invention
The object of the present invention is to propose an adaptive, stable, and efficient clustering method and system that solve the problem of high computational overhead.
To this end, the present invention adopts the following technical solutions:
An adaptive, stable, and efficient clustering method comprises:
a. obtaining a set $p = \{p_1, \ldots, p_n\}$ of input data, the set containing n input data, and obtaining a threshold θ for the cluster radius;
b. adding $p_i$, together with every input datum in the set whose distance from $p_i$ is less than the threshold θ, to the candidate cluster $C_{p_i}$ corresponding to $p_i$, where $p_i$ denotes the i-th input datum in the set;
c. letting m be the number of input data in the candidate cluster $C_{p_i}$ and the function $d(p_i, p_j)$ be the distance between two input data $p_i$ and $p_j$, calculating the probability of $p_i$ being a cluster center as $q_i = \sum_{j=1}^{m} \frac{1}{1 + d(p_i, p_j)}$, $1 \le j \le m$;
d. selecting, from the input data of the set, the input datum with the largest probability of being a cluster center, and adding the candidate cluster corresponding to the selected input datum to the final clusters.
Further, after the candidate cluster corresponding to the selected input datum is added to the final clusters, the method further comprises:
e. deleting from the input data set the input data that have been added to the final clusters, selecting again, from the current input data set, the input datum with the largest probability of being a cluster center, and adding the candidate cluster corresponding to the selected input datum to the final clusters;
judging whether the number of input data in the set is zero; if so, ending; otherwise, continuing with step e.
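As a small worked illustration of steps a to e, consider four points on a line. The point values, the threshold θ = 1.5, and the use of the absolute difference as the distance d are assumptions chosen here for illustration only; they do not come from the specification.

```latex
\begin{aligned}
&p = \{p_1, p_2, p_3, p_4\} = \{0, 1, 2, 10\}, \quad \theta = 1.5, \quad d(p_i, p_j) = |p_i - p_j| \\
&C_{p_1} = \{p_1, p_2\}, \quad q_1 = \tfrac{1}{1+0} + \tfrac{1}{1+1} = 1.5 \\
&C_{p_2} = \{p_1, p_2, p_3\}, \quad q_2 = \tfrac{1}{1+1} + \tfrac{1}{1+0} + \tfrac{1}{1+1} = 2 \\
&C_{p_3} = \{p_2, p_3\}, \quad q_3 = \tfrac{1}{1+1} + \tfrac{1}{1+0} = 1.5 \\
&C_{p_4} = \{p_4\}, \quad q_4 = \tfrac{1}{1+0} = 1
\end{aligned}
```

Here $p_2$ has the largest value, so its candidate cluster $\{p_1, p_2, p_3\}$ becomes the first final cluster in step d. After those three points are deleted in step e, only $p_4$ remains and forms the second final cluster, at which point the set is empty and the method ends. Note that each $p_i$ belongs to its own candidate cluster and therefore always contributes the term $1/(1+0) = 1$.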
An adaptive, stable, and efficient clustering system comprises:
an initialization module, configured to obtain a set $p = \{p_1, \ldots, p_n\}$ of input data, the set containing n input data, and to obtain a threshold θ for the cluster radius;
a candidate cluster establishing module, configured to add $p_i$, together with every input datum in the set whose distance from $p_i$ is less than the threshold θ, to the candidate cluster $C_{p_i}$ corresponding to $p_i$, where $p_i$ denotes the i-th input datum in the set;
a probability calculation module, configured to calculate, letting m be the number of input data in the candidate cluster $C_{p_i}$ and the function $d(p_i, p_j)$ be the distance between two input data $p_i$ and $p_j$, the probability of $p_i$ being a cluster center as $q_i = \sum_{j=1}^{m} \frac{1}{1 + d(p_i, p_j)}$, $1 \le j \le m$;
a cluster screening module, configured to select, from the input data of the set, the input datum with the largest probability of being a cluster center, and to add the candidate cluster corresponding to the selected input datum to the final clusters.
Further, the system also comprises:
a deletion module, configured to delete from the input data set the input data that have been added to the final clusters, to select again, from the current input data set, the input datum with the largest probability of being a cluster center, and to add the candidate cluster corresponding to the selected input datum to the final clusters;
a first detection module, configured to judge whether the number of input data in the set is zero; if so, to end; otherwise, to continue with step e.
The beneficial effects of the present invention are as follows. By providing an adaptive, stable, and efficient clustering method and system, the present invention proposes a new clustering method. The method does not require the number of partitions to be set in advance and adapts to the input data to produce an appropriate partitioning. The method yields a stable clustering result for identical input. Compared with traditional methods, it is also computationally efficient: it achieves a computational complexity of $O(n^2)$ and is applicable to current mobile intelligent terminals.
Description of drawings
Fig. 1 is a flowchart of the first embodiment of the adaptive, stable, and efficient clustering method of the present invention;
Fig. 2 is a flowchart of the second embodiment of the adaptive, stable, and efficient clustering method of the present invention;
Fig. 3 is a block diagram of the first embodiment of the adaptive, stable, and efficient clustering system of the present invention;
Fig. 4 is a block diagram of the second embodiment of the adaptive, stable, and efficient clustering system of the present invention.
Embodiments
The technical solution of the present invention is further described below by means of embodiments and with reference to the accompanying drawings.
The flow of the first embodiment of the adaptive, stable, and efficient clustering method of the present invention is shown in Fig. 1:
Step 101: obtain a set $p = \{p_1, \ldots, p_n\}$ of input data, the set containing n input data, and obtain a threshold θ for the cluster radius;
Step 102: add $p_i$, together with every input datum in the set whose distance from $p_i$ is less than the threshold θ, to the candidate cluster $C_{p_i}$ corresponding to $p_i$, where $p_i$ denotes the i-th input datum in the set;
Step 103: letting m be the number of input data in the candidate cluster $C_{p_i}$ and the function $d(p_i, p_j)$ be the distance between two input data $p_i$ and $p_j$, calculate the probability of $p_i$ being a cluster center as $q_i = \sum_{j=1}^{m} \frac{1}{1 + d(p_i, p_j)}$, $1 \le j \le m$;
Step 104: select, from the input data of the set, the input datum with the largest probability of being a cluster center, and add the candidate cluster corresponding to the selected input datum to the final clusters.
This embodiment proposes an adaptive, stable, and efficient clustering method. The user does not need to set the number of clusters in advance; the clustering result is generated adaptively and is stable for identical input data. Compared with traditional algorithms the method is efficient, with a computational complexity of $O(n^2)$.
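Steps 101 to 104 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names `candidate_clusters`, `center_scores`, and `first_cluster` are hypothetical, points are referred to by their index in the input sequence, and ties in the largest score are broken in favor of the lowest index, a rule the specification does not state.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")
Distance = Callable[[T, T], float]


def candidate_clusters(points: Sequence[T], theta: float,
                       d: Distance) -> List[List[int]]:
    """Step 102: for each p_i, collect the indices j with d(p_i, p_j) < theta.

    p_i itself is always a member of its own candidate cluster.
    """
    n = len(points)
    return [[j for j in range(n) if j == i or d(points[i], points[j]) < theta]
            for i in range(n)]


def center_scores(points: Sequence[T], clusters: List[List[int]],
                  d: Distance) -> List[float]:
    """Step 103: q_i = sum over members p_j of C_{p_i} of 1 / (1 + d(p_i, p_j))."""
    return [sum(1.0 / (1.0 + d(points[i], points[j])) for j in members)
            for i, members in enumerate(clusters)]


def first_cluster(points: Sequence[T], theta: float,
                  d: Distance) -> Tuple[int, List[int]]:
    """Step 104: return (center index, member indices) of the best candidate."""
    clusters = candidate_clusters(points, theta, d)
    scores = center_scores(points, clusters, d)
    # max() returns the first maximal item, so ties go to the lowest index
    # (an assumption made for this sketch).
    best = max(range(len(points)), key=lambda i: scores[i])
    return best, clusters[best]
```

Because the distance function `d` is passed in as a parameter, the same sketch applies to one-dimensional values, planar coordinates, or any other data for which a distance is defined.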
The flow of the second embodiment of the adaptive, stable, and efficient clustering method of the present invention is shown in Fig. 2:
Step 201: obtain a set $p = \{p_1, \ldots, p_n\}$ of input data, the set containing n input data, and obtain a threshold θ for the cluster radius.
Let $q_i$ be the probability that input datum $p_i$ becomes a cluster center, and initialize every $q_i$ to 0. Let $C_{p_i}$ be the cluster corresponding to $p_i$, and initialize it [formula image 004 not recovered].
Initialize the clustering result [formula image 005 not recovered].
Let the function $d(p_i, p_j)$ be the measure of distance between two input data $p_i$ and $p_j$.
Step 202: add $p_i$, together with every input datum in the set whose distance from $p_i$ is less than the threshold θ, to the candidate cluster $C_{p_i}$ corresponding to $p_i$, where $p_i$ denotes the i-th input datum in the set.
Step 203: letting m be the number of input data in the candidate cluster $C_{p_i}$ and the function $d(p_i, p_j)$ be the distance between two input data $p_i$ and $p_j$, calculate the probability of $p_i$ being a cluster center as $q_i = \sum_{j=1}^{m} \frac{1}{1 + d(p_i, p_j)}$, $1 \le j \le m$.
Step 204: select, from the input data of the set, the input datum with the largest probability of being a cluster center, and add the candidate cluster corresponding to the selected input datum to the final clusters.
Step 205: delete from the input data set the input data that have been added to the final clusters, select again, from the current input data set, the input datum with the largest probability of being a cluster center, and add the candidate cluster corresponding to the selected input datum to the final clusters.
Step 206: judge whether the number of input data in the set is zero; if so, end; otherwise, continue with step 205.
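The loop formed by steps 201 to 206 can be sketched as follows. This is one possible reading, not a definitive implementation. The function name `adaptive_cluster` and the output layout are hypothetical. Two points the text leaves open are resolved here by assumption: the values $q_i$ are computed once and not recomputed after deletions (which is consistent with the stated $O(n^2)$ complexity), and when a candidate cluster is added to the final clusters only its still-unassigned members are included, so that each input datum ends up in exactly one cluster.

```python
from typing import Callable, Dict, List, Sequence, TypeVar

T = TypeVar("T")


def adaptive_cluster(points: Sequence[T], theta: float,
                     d: Callable[[T, T], float]) -> List[Dict[str, object]]:
    """Cluster `points` without a preset cluster count; returns center and members per cluster."""
    n = len(points)
    # Steps 202-203: candidate clusters and scores, computed once up front.
    candidates = [[j for j in range(n)
                   if j == i or d(points[i], points[j]) < theta]
                  for i in range(n)]
    q = [sum(1.0 / (1.0 + d(points[i], points[j])) for j in members)
         for i, members in enumerate(candidates)]

    remaining = set(range(n))
    result: List[Dict[str, object]] = []
    # Steps 204-206: repeat until no input data remain in the set.
    while remaining:
        # Largest score among the remaining points; ties go to the
        # lowest index (assumption, keeps the output deterministic).
        center = min(remaining, key=lambda i: (-q[i], i))
        # Assumption: points already assigned to an earlier cluster are skipped.
        members = [j for j in candidates[center] if j in remaining]
        result.append({"center": center, "members": members})
        remaining.difference_update(members)
    return result
```

Each round removes at least the chosen center from the remaining set, so the loop terminates after at most n rounds. Since nothing in the sketch is randomized, repeated calls on the same input return the same clusters, which matches the stability property the embodiment describes.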
The block diagram of the first embodiment of the adaptive, stable, and efficient clustering system of the present invention is shown in Fig. 3. The system comprises an initialization module 310, a candidate cluster establishing module 320, a probability calculation module 330, and a cluster screening module 340.
The initialization module 310 is configured to obtain a set $p = \{p_1, \ldots, p_n\}$ of input data, the set containing n input data, and to obtain a threshold θ for the cluster radius. The candidate cluster establishing module 320 is configured to add $p_i$, together with every input datum in the set whose distance from $p_i$ is less than the threshold θ, to the candidate cluster $C_{p_i}$ corresponding to $p_i$, where $p_i$ denotes the i-th input datum in the set. The probability calculation module 330 is configured to calculate, letting m be the number of input data in the candidate cluster $C_{p_i}$ and the function $d(p_i, p_j)$ be the distance between two input data $p_i$ and $p_j$, the probability of $p_i$ being a cluster center as $q_i = \sum_{j=1}^{m} \frac{1}{1 + d(p_i, p_j)}$, $1 \le j \le m$. The cluster screening module 340 is configured to select, from the input data of the set, the input datum with the largest probability of being a cluster center, and to add the candidate cluster corresponding to the selected input datum to the final clusters.
The block diagram of the second embodiment of the adaptive, stable, and efficient clustering system of the present invention is shown in Fig. 4. The system comprises an initialization module 310, a candidate cluster establishing module 320, a probability calculation module 330, a cluster screening module 340, a deletion module 350, and a first detection module 360.
The initialization module 310 is configured to obtain a set $p = \{p_1, \ldots, p_n\}$ of input data, the set containing n input data, and to obtain a threshold θ for the cluster radius. The candidate cluster establishing module 320 is configured to add $p_i$, together with every input datum in the set whose distance from $p_i$ is less than the threshold θ, to the candidate cluster $C_{p_i}$ corresponding to $p_i$, where $p_i$ denotes the i-th input datum in the set. The probability calculation module 330 is configured to calculate, letting m be the number of input data in the candidate cluster $C_{p_i}$ and the function $d(p_i, p_j)$ be the distance between two input data $p_i$ and $p_j$, the probability of $p_i$ being a cluster center as $q_i = \sum_{j=1}^{m} \frac{1}{1 + d(p_i, p_j)}$, $1 \le j \le m$. The cluster screening module 340 is configured to select, from the input data of the set, the input datum with the largest probability of being a cluster center, and to add the candidate cluster corresponding to the selected input datum to the final clusters. The deletion module 350 is configured to delete from the input data set the input data that have been added to the final clusters, to select again, from the current input data set, the input datum with the largest probability of being a cluster center, and to add the candidate cluster corresponding to the selected input datum to the final clusters. The first detection module 360 is configured to judge whether the number of input data in the set is zero; if so, to end; otherwise, to continue with step e.
The advantages of the adaptive, stable, and efficient clustering method proposed by this invention include the following. First, the method does not require the number of final clusters to be set in advance; it adapts to the input data to produce appropriate clusters and can be applied in a wide range of scenarios. Second, the method yields a stable clustering result for identical input, so it can be run repeatedly while keeping the service consistent. Third, compared with traditional methods, the method is computationally efficient: it achieves a computational complexity of $O(n^2)$ and is applicable to current mobile intelligent terminals. Fourth, when the method obtains the final clusters, it also obtains the center point of each cluster.
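The $O(n^2)$ figure can be made plausible with a rough operation count. The accounting below is a sketch under assumptions that the text does not state explicitly: one evaluation of d takes constant time, the values $q_i$ are computed once and are not recomputed after deletions, and each selection round scans the remaining data linearly. The symbols $T_{\text{candidates}}$, $T_{\text{scores}}$, and $T_{\text{selection}}$ are introduced here for illustration.

```latex
\begin{aligned}
T_{\text{candidates}} &\le n^2 && \text{(one evaluation of } d \text{ per ordered pair } (p_i, p_j)\text{)} \\
T_{\text{scores}} &= \sum_{i=1}^{n} |C_{p_i}| \le n^2 && \text{(one term of } q_i \text{ per member of } C_{p_i}\text{)} \\
T_{\text{selection}} &\le n \cdot (n + n) = 2n^2 && \text{(at most } n \text{ rounds, each scanning } \le n \text{ scores and } \le n \text{ members)} \\
T_{\text{total}} &\le n^2 + n^2 + 2n^2 = O(n^2)
\end{aligned}
```

The bound of n rounds holds because every round removes at least the selected center from the set. If the scores were instead recomputed on the remaining data in every round, the same counting would give up to $n \cdot n^2$ operations in the worst case.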
A person of ordinary skill in the art will appreciate that all or part of the flows in the methods of the above embodiments can be carried out by a computer program instructing the relevant hardware. The program can be stored in a computer-readable storage medium and, when executed, can include the flows of the embodiments of the above methods. The storage medium can be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.
The technical principle of the present invention has been described above with reference to specific embodiments. These descriptions are intended only to explain the principle of the present invention and must not be interpreted in any way as limiting its scope of protection. Based on the explanation herein, a person skilled in the art can conceive of other embodiments of the present invention without creative effort, and all such embodiments fall within the scope of protection of the present invention.

Claims (4)

1. An adaptive, stable, and efficient clustering method, characterized by comprising:
a. obtaining a set $p = \{p_1, \ldots, p_n\}$ of input data, the set containing n input data, and obtaining a threshold θ for the cluster radius;
b. adding $p_i$, together with every input datum in the set whose distance from $p_i$ is less than the threshold θ, to the candidate cluster $C_{p_i}$ corresponding to $p_i$, where $p_i$ denotes the i-th input datum in the set;
c. letting m be the number of input data in the candidate cluster $C_{p_i}$ and the function $d(p_i, p_j)$ be the distance between two input data $p_i$ and $p_j$, calculating the probability of $p_i$ being a cluster center as $q_i = \sum_{j=1}^{m} \frac{1}{1 + d(p_i, p_j)}$, $1 \le j \le m$;
d. selecting, from the input data of the set, the input datum with the largest probability of being a cluster center, and adding the candidate cluster corresponding to the selected input datum to the final clusters.
2. The adaptive, stable, and efficient clustering method according to claim 1, characterized in that, after the candidate cluster corresponding to the selected input datum is added to the final clusters, the method further comprises:
e. deleting from the input data set the input data that have been added to the final clusters, selecting again, from the current input data set, the input datum with the largest probability of being a cluster center, and adding the candidate cluster corresponding to the selected input datum to the final clusters;
judging whether the number of input data in the set is zero; if so, ending; otherwise, continuing with step e.
3. An adaptive, stable, and efficient clustering system, characterized by comprising:
an initialization module, configured to obtain a set $p = \{p_1, \ldots, p_n\}$ of input data, the set containing n input data, and to obtain a threshold θ for the cluster radius;
a candidate cluster establishing module, configured to add $p_i$, together with every input datum in the set whose distance from $p_i$ is less than the threshold θ, to the candidate cluster $C_{p_i}$ corresponding to $p_i$, where $p_i$ denotes the i-th input datum in the set;
a probability calculation module, configured to calculate, letting m be the number of input data in the candidate cluster $C_{p_i}$ and the function $d(p_i, p_j)$ be the distance between two input data $p_i$ and $p_j$, the probability of $p_i$ being a cluster center as $q_i = \sum_{j=1}^{m} \frac{1}{1 + d(p_i, p_j)}$, $1 \le j \le m$;
a cluster screening module, configured to select, from the input data of the set, the input datum with the largest probability of being a cluster center, and to add the candidate cluster corresponding to the selected input datum to the final clusters.
4. The system according to claim 3, characterized by further comprising:
a deletion module, configured to delete from the input data set the input data that have been added to the final clusters, to select again, from the current input data set, the input datum with the largest probability of being a cluster center, and to add the candidate cluster corresponding to the selected input datum to the final clusters;
a first detection module, configured to judge whether the number of input data in the set is zero; if so, to end; otherwise, to continue with step e.
CN201310082671.2A 2013-03-14 2013-03-14 Method and system for stable and efficient self-adaptive clustering Expired - Fee Related CN103207896B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310082671.2A CN103207896B (en) 2013-03-14 2013-03-14 Method and system for stable and efficient self-adaptive clustering

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310082671.2A CN103207896B (en) 2013-03-14 2013-03-14 Method and system for stable and efficient self-adaptive clustering

Publications (2)

Publication Number Publication Date
CN103207896A true CN103207896A (en) 2013-07-17
CN103207896B CN103207896B (en) 2017-02-01

Family

ID=48755118

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310082671.2A Expired - Fee Related CN103207896B (en) 2013-03-14 2013-03-14 Method and system for stable and efficient self-adaptive clustering

Country Status (1)

Country Link
CN (1) CN103207896B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103559502A (en) * 2013-10-25 2014-02-05 华南理工大学 Pedestrian detection system and method based on adaptive clustering analysis
CN104702432A (en) * 2014-01-15 2015-06-10 杭州海康威视系统技术有限公司 Alarm method based on position area division and server

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101308496A (en) * 2008-07-04 2008-11-19 沈阳格微软件有限责任公司 Large scale text data external clustering method and system
EP2184692A1 (en) * 2008-10-28 2010-05-12 Sony Corporation Information processing
CN101989281A (en) * 2009-08-03 2011-03-23 中国移动通信集团公司 Clustering method and device
CN102289478A (en) * 2011-08-01 2011-12-21 江苏广播电视大学 System and method for recommending video on demand based on fuzzy clustering

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
段明秀: ""结合SOFM的改进CLARA聚类算法"", 《计算机工程》 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103559502A (en) * 2013-10-25 2014-02-05 华南理工大学 Pedestrian detection system and method based on adaptive clustering analysis
CN104702432A (en) * 2014-01-15 2015-06-10 杭州海康威视系统技术有限公司 Alarm method based on position area division and server
CN104702432B (en) * 2014-01-15 2018-03-30 杭州海康威视系统技术有限公司 The method and server alerted based on band of position division

Also Published As

Publication number Publication date
CN103207896B (en) 2017-02-01

Similar Documents

Publication Publication Date Title
CN105427129A (en) Information delivery method and system
CN104796481A (en) Intelligent audio and video selection method
Mo et al. A two-stage clustering approach for multi-region segmentation
CN106162544B (en) A kind of generation method and equipment of geography fence
CN106651213A (en) Processing method and device for service orders
CN109902250A (en) Sharing method, sharing means, computer equipment and the storage medium of questionnaire survey
CN109408522A (en) A kind of update method and device of user characteristic data
Qu et al. Fl-sec: Privacy-preserving decentralized federated learning using signsgd for the internet of artificially intelligent things
CN112950119A (en) Method, device, equipment and storage medium for splitting instant logistics order
CN109492891A (en) Customer churn prediction technique and device
CN112597399B (en) Graph data processing method and device, computer equipment and storage medium
CN103207896A (en) Method and system for stable and efficient self-adaptive clustering
CN112307247B (en) Distributed face retrieval system and method
CN109451334A (en) User, which draws a portrait, generates processing method, device and electronic equipment
CN108932525A (en) A kind of behavior prediction method and device
CN111932302A (en) Method, device, equipment and system for determining number of service sites in area
CN108830298A (en) A kind of method and device of determining user characteristics label
CN104156475A (en) Geographic information reading method and device
CN105512914A (en) Information processing method and electronic device
CN112200644B (en) Method and device for identifying fraudulent user, computer equipment and storage medium
CN109933679A (en) Object type recognition methods, device and equipment in image
CN115292475A (en) Cloud computing service information processing method and system based on smart city
CN111882421B (en) Information processing method, wind control method, device, equipment and storage medium
Gelda et al. Forecasting supply in Voronoi regions for app-based taxi hailing services
CN114219581A (en) Personalized interest point recommendation method and system based on heteromorphic graph

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20170201
