CN103207896A

CN103207896A - Method and system for stable and efficient self-adaptive clustering

Info

Publication number: CN103207896A
Application number: CN2013100826712A
Authority: CN
Inventors: 张兰; 刘云浩
Original assignee: WUXI QINGHUA INFORMATION SCIENCE AND TECHNOLOGY NATIONAL LABORATORY INTERNET OF THINGS TECHNOLOGY CENTER
Current assignee: WUXI QINGHUA INFORMATION SCIENCE AND TECHNOLOGY NATIONAL LABORATORY INTERNET OF THINGS TECHNOLOGY CENTER
Priority date: 2013-03-14
Filing date: 2013-03-14
Publication date: 2013-07-17
Anticipated expiration: 2033-03-14
Also published as: CN103207896B

Abstract

The invention discloses a method and a system for stable and efficient self-adaptive clustering. The method comprises the following steps: a, obtaining a set p = {p_1, ..., p_n} of n input data, and obtaining a threshold value θ of the cluster radius; b, adding p_i, together with every input datum in the set whose distance from p_i is smaller than θ, to a candidate cluster C_{p_i} corresponding to p_i, where p_i denotes the i-th input datum in the set; and c, letting m be the number of input data in the candidate cluster C_{p_i} and d(p_i, p_j) be the distance between two input data p_i and p_j, calculating the probability that p_i is a cluster center. The method and the system provide stable and efficient self-adaptive clustering: the number of final clusters does not need to be preset, the computation is efficient with a computational complexity of O(n^2), and the method and the system are suitable for the various mobile intelligent terminals in current use.

Description

A stable and efficient adaptive clustering method and system

Technical field

The present invention relates to the technical field of computer information processing, and in particular to a stable and efficient adaptive clustering method and system.

Background technology

With the rapid growth of computerized information, the demand for processing all kinds of such information keeps increasing. Clustering algorithms are a very important class of algorithms in information processing: they provide the basic clustering functions used in data management, artificial intelligence, and machine learning, and they play an important role in a wide range of information processing tasks.

Now that intelligent mobile terminals are in widespread use, a variety of information services based on intelligent mobile devices have appeared. These services must be delivered to all kinds of intelligent terminals efficiently and stably, and many of them rely on clustering algorithms, for example clustering friends in a mobile social network or clustering goods in a shopping application. A large number of mobile terminals can now locate themselves through GPS, base stations, WAP, and similar means, so many location-based services have emerged as well. Clustering can give such services richer and more useful functions, such as clustering hot zones by category. As a simple example, users of today's electronic maps often add geographic labels of various kinds, such as shopping, food, and sights, and these labels are scattered over the whole map. When smartphone users travel or go shopping, they usually need to find a popular commercial district that interests them, that is, a place where labels of a certain category are dense, such as a concentrated shopping district, and to obtain navigation to it. However, querying "shopping" on a current mobile map only returns "shopping" labels scattered over the whole map, which makes it hard for the user to choose a destination. By clustering these "shopping" labels effectively, that is, by dividing the labels into several dense partitions (clusters), popular "shopping" districts can be found quickly. Moreover, integrating the clustering results of several labels, such as "shopping" and "food", can effectively help the user find a popular district that meets several requirements at once. Clustering can thus enable many rich applications on new mobile devices, but the diversity of mobile applications and the limited computational resources of mobile terminals place new demands on clustering methods: they must be adaptive, stable, and efficient.
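The source does not say which distance function d should be used for geographic labels. As one possible choice, assumed here purely for illustration, the great-circle (haversine) distance between two latitude/longitude pairs could serve as d(p_i, p_j). The function name `haversine_km` and the mean Earth radius of 6371 km are assumptions, not part of the original text.

```python
import math


def haversine_km(a, b, radius_km=6371.0):
    """Great-circle distance in km between two (latitude, longitude) pairs in degrees.

    Hypothetical choice of d(p_i, p_j) for geographic labels; the source
    leaves the distance function unspecified.
    """
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    # Clamp to 1.0 to guard against rounding slightly above the valid asin range.
    return 2 * radius_km * math.asin(math.sqrt(min(1.0, h)))
```

With such a function, the cluster radius threshold θ would be expressed in kilometres.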

Many clustering methods exist today, such as the commonly used k-means and expectation-maximization methods. Although they are quick and simple to implement, they require the number of final partitions to be set in advance, which clearly limits their applicability, because in most applications the user cannot know the number of partitions beforehand, for example how many food centers a city actually has. In addition, both methods are unstable: repeated runs may yield inconsistent clustering results. Another method, called QT, does not need the number of partitions to be set in advance and produces stable clustering results, but it requires O(n^3) computation. Given large volumes of information, such a cost is often unaffordable for mobile devices with limited computational resources.

Summary of the invention

The object of the present invention is to provide a stable and efficient adaptive clustering method and system that solve the problem of high computational cost.

To achieve this object, the present invention adopts the following technical solutions:

A stable and efficient adaptive clustering method comprises:

a. obtaining a set p = {p_1, ..., p_n} of input data, the set containing n input data, and obtaining a threshold value θ of the cluster radius;

b. adding p_i, together with every input datum in the set whose distance from p_i is less than the threshold θ, to the candidate cluster C_{p_i} corresponding to p_i, where p_i denotes the i-th input datum in the set;

c. letting m be the number of input data in the candidate cluster C_{p_i} and the function d(p_i, p_j) be the distance between two input data p_i and p_j, calculating the probability of p_i being a cluster center as

q_i = \sum_{j=1}^{m} \frac{1}{1 + d(p_i, p_j)}, \quad 1 \leq j \leq m;

d. selecting, from the input data of the set, the input datum with the highest probability of being a cluster center, and adding the candidate cluster corresponding to the selected input datum to the final clusters.
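Steps a to d can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names are hypothetical, the distance function is passed in by the caller, and ties between equal q values are broken by the smallest index, which the source does not specify.

```python
from typing import Callable, Dict, List, Sequence, Tuple, TypeVar

P = TypeVar("P")


def candidate_clusters(points: Sequence[P], theta: float,
                       dist: Callable[[P, P], float]) -> Dict[int, List[int]]:
    """Step b: for each index i, collect i itself and every j with d(p_i, p_j) < theta."""
    n = len(points)
    return {
        i: [j for j in range(n) if j == i or dist(points[i], points[j]) < theta]
        for i in range(n)
    }


def center_scores(points: Sequence[P], clusters: Dict[int, List[int]],
                  dist: Callable[[P, P], float]) -> Dict[int, float]:
    """Step c: q_i is the sum of 1 / (1 + d(p_i, p_j)) over the members p_j of C_{p_i}."""
    return {
        i: sum(1.0 / (1.0 + dist(points[i], points[j])) for j in members)
        for i, members in clusters.items()
    }


def select_first_cluster(points: Sequence[P], theta: float,
                         dist: Callable[[P, P], float]) -> Tuple[int, List[int]]:
    """Step d: return the index with the largest q_i and its candidate cluster."""
    clusters = candidate_clusters(points, theta, dist)
    scores = center_scores(points, clusters, dist)
    # Assumption: ties are broken in favour of the smallest index.
    best = max(scores, key=lambda i: (scores[i], -i))
    return best, clusters[best]
```

For example, with points `[0.0, 1.0, 3.0, 10.0]`, θ = 2.5, and absolute difference as the distance, the point 1.0 has the densest neighbourhood, so its candidate cluster (indices 0, 1, and 2) is selected.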

Further, after the candidate cluster corresponding to the selected input datum is added to the final clusters, the method further comprises:

e. deleting from the input data set the input data that have been added to a final cluster, selecting again, from the current input data set, the input datum with the highest probability of being a cluster center, and adding the candidate cluster corresponding to the selected input datum to the final clusters;

determining whether the number of input data in the set is zero; if so, ending; otherwise, continuing with step e.

A stable and efficient adaptive clustering system comprises:

an initialization module, configured to obtain a set p = {p_1, ..., p_n} of input data, the set containing n input data, and to obtain a threshold value θ of the cluster radius;

a candidate cluster building module, configured to add p_i, together with every input datum in the set whose distance from p_i is less than the threshold θ, to the candidate cluster C_{p_i} corresponding to p_i, where p_i denotes the i-th input datum in the set;

a probability calculation module, configured to let m be the number of input data in the candidate cluster C_{p_i} and the function d(p_i, p_j) be the distance between two input data p_i and p_j, and to calculate the probability of p_i being a cluster center as

q_i = \sum_{j=1}^{m} \frac{1}{1 + d(p_i, p_j)}, \quad 1 \leq j \leq m;

a cluster screening module, configured to select, from the input data of the set, the input datum with the highest probability of being a cluster center, and to add the candidate cluster corresponding to the selected input datum to the final clusters.

Further, the system also comprises:

a deletion module, configured to delete from the input data set the input data that have been added to a final cluster, to select again, from the current input data set, the input datum with the highest probability of being a cluster center, and to add the candidate cluster corresponding to the selected input datum to the final clusters;

a first detection module, configured to determine whether the number of input data in the set is zero; if so, ending; otherwise, continuing with step e.

The beneficial effects of the present invention are as follows. By providing a stable and efficient adaptive clustering method and system, the invention proposes a new adaptive clustering method that is both stable and efficient. The method does not require the number of partitions to be set in advance and adapts to the input data to produce an appropriate partitioning. It also yields a stable clustering result for identical input. Compared with conventional methods, it is computationally efficient, achieving a computational complexity of O(n^2), and is suitable for the various mobile intelligent terminals in current use.
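A brief counting argument, offered only as a sketch, suggests where an O(n^2) bound could come from. It assumes that d is symmetric, that each distance evaluation takes constant time, and that candidate clusters and probabilities are computed once rather than after every deletion. The source states none of these assumptions explicitly.

```latex
\begin{align*}
\text{distance evaluations (steps a to c)} &= \binom{n}{2} = \frac{n(n-1)}{2} = O(n^2), \\
\text{selection scans (steps d and e)} &\le \sum_{k=1}^{n} O(k) = O(n^2).
\end{align*}
```

The second line reflects that at most n selection rounds occur, each scanning at most the input data still remaining in the set.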

Description of drawings

Fig. 1 is a flowchart of the first embodiment of the stable and efficient adaptive clustering method of the present invention;

Fig. 2 is a flowchart of the second embodiment of the stable and efficient adaptive clustering method of the present invention;

Fig. 3 is a block diagram of the first embodiment of the stable and efficient adaptive clustering system of the present invention;

Fig. 4 is a block diagram of the second embodiment of the stable and efficient adaptive clustering system of the present invention.

Embodiment

The technical solutions of the present invention are further described below through embodiments and with reference to the accompanying drawings.

The flow of the first embodiment of the stable and efficient adaptive clustering method of the present invention is shown in Fig. 1:

Step 101: obtain a set p = {p_1, ..., p_n} of input data, the set containing n input data, and obtain a threshold value θ of the cluster radius;

Step 102: add p_i, together with every input datum in the set whose distance from p_i is less than the threshold θ, to the candidate cluster C_{p_i} corresponding to p_i, where p_i denotes the i-th input datum in the set;

Step 103: let m be the number of input data in the candidate cluster C_{p_i} and the function d(p_i, p_j) be the distance between two input data p_i and p_j, and calculate the probability of p_i being a cluster center as

q_i = \sum_{j=1}^{m} \frac{1}{1 + d(p_i, p_j)}, \quad 1 \leq j \leq m;

Step 104: from the input data of the set, select the input datum with the highest probability of being a cluster center, and add the candidate cluster corresponding to the selected input datum to the final clusters.
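As a small worked example of steps 101 to 104, consider three hypothetical points on the real line, p_1 = 0, p_2 = 1, p_3 = 3, with d(p_i, p_j) = |p_i - p_j| and θ = 2.5. These values are chosen for illustration only and do not appear in the source.

```latex
\begin{align*}
C_{p_1} &= \{p_1, p_2\}, & q_1 &= \frac{1}{1+0} + \frac{1}{1+1} = \frac{3}{2}, \\
C_{p_2} &= \{p_1, p_2, p_3\}, & q_2 &= \frac{1}{1+1} + \frac{1}{1+0} + \frac{1}{1+2} = \frac{11}{6}, \\
C_{p_3} &= \{p_2, p_3\}, & q_3 &= \frac{1}{1+2} + \frac{1}{1+0} = \frac{4}{3}.
\end{align*}
```

Since q_2 is the largest, p_2 is selected in step 104 and its candidate cluster {p_1, p_2, p_3} is added to the final clusters. Because each candidate cluster contains p_i itself, which contributes 1/(1+0) = 1, the value q_i is at least 1 and is not normalized; it acts as a score for ranking rather than as a probability in the strict sense.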

This embodiment proposes a stable and efficient adaptive clustering method. The method does not require the user to set the number of clusters in advance and generates the clustering result adaptively. It produces a stable clustering result for identical input data and, compared with conventional algorithms, is efficient, with a computational complexity of O(n^2).

The flow of the second embodiment of the stable and efficient adaptive clustering method of the present invention is shown in Fig. 2:

Step 201: obtain a set p = {p_1, ..., p_n} of input data, the set containing n input data, and obtain a threshold value θ of the cluster radius.

Let q_i be the probability that input datum p_i becomes a cluster center, and initialize every q_i to 0. Let C_{p_i} be the cluster corresponding to p_i, and initialize it as

[equation image not available]

Initialize the clustering result.

Let the function d(p_i, p_j) be a measure of the distance between two input data p_i and p_j.

Step 202: add p_i, together with every input datum in the set whose distance from p_i is less than the threshold θ, to the candidate cluster C_{p_i} corresponding to p_i, where p_i denotes the i-th input datum in the set.

Step 203: let m be the number of input data in the candidate cluster C_{p_i} and the function d(p_i, p_j) be the distance between two input data p_i and p_j, and calculate the probability of p_i being a cluster center as

q_i = \sum_{j=1}^{m} \frac{1}{1 + d(p_i, p_j)}, \quad 1 \leq j \leq m.

Step 204: from the input data of the set, select the input datum with the highest probability of being a cluster center, and add the candidate cluster corresponding to the selected input datum to the final clusters.

Step 205: delete from the input data set the input data that have been added to a final cluster, select again, from the current input data set, the input datum with the highest probability of being a cluster center, and add the candidate cluster corresponding to the selected input datum to the final clusters.

Step 206: determine whether the number of input data in the set is zero; if so, end; otherwise, continue with step 205.
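The complete loop of steps 201 to 206 might look like the following Python sketch. Several points are assumptions rather than statements from the source: the function name `adaptive_cluster` is hypothetical, candidate clusters and q values are computed once up front and not recomputed after deletions, a selected candidate cluster contributes only those members that are still in the set, and ties in q are broken by the smallest index.

```python
from typing import Callable, List, Sequence, TypeVar

P = TypeVar("P")


def adaptive_cluster(points: Sequence[P], theta: float,
                     dist: Callable[[P, P], float]) -> List[List[P]]:
    """Greedy clustering sketch; each returned cluster lists its center first."""
    n = len(points)
    # Steps 201-202: pairwise distances once, then the candidate clusters C_{p_i}.
    d = [[dist(points[i], points[j]) for j in range(n)] for i in range(n)]
    cand = [[j for j in range(n) if j == i or d[i][j] < theta] for i in range(n)]
    # Step 203: every q_i starts at 0 and accumulates 1 / (1 + d(p_i, p_j)).
    q = [0.0] * n
    for i in range(n):
        for j in cand[i]:
            q[i] += 1.0 / (1.0 + d[i][j])
    remaining = set(range(n))
    result: List[List[P]] = []
    # Steps 204-206: select, add to the final clusters, delete, until the set is empty.
    while remaining:
        # Assumption: ties are broken in favour of the smallest index.
        center = max(remaining, key=lambda i: (q[i], -i))
        members = [j for j in cand[center] if j in remaining]
        result.append([points[center]] + [points[j] for j in members if j != center])
        remaining.difference_update(members)
    return result
```

Under these assumptions the sketch performs O(n^2) distance evaluations and at most n selection rounds of O(n) work each, at the cost of O(n^2) memory for the distance matrix. No random initialization is involved, so repeated runs on the same input return the same clusters.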

The block diagram of the first embodiment of the stable and efficient adaptive clustering system of the present invention is shown in Fig. 3. The system comprises an initialization module 310, a candidate cluster building module 320, a probability calculation module 330, and a cluster screening module 340.

The initialization module 310 is configured to obtain a set p = {p_1, ..., p_n} of input data, the set containing n input data, and to obtain a threshold value θ of the cluster radius. The candidate cluster building module 320 is configured to add p_i, together with every input datum in the set whose distance from p_i is less than the threshold θ, to the candidate cluster C_{p_i} corresponding to p_i, where p_i denotes the i-th input datum in the set. The probability calculation module 330 is configured to let m be the number of input data in the candidate cluster C_{p_i} and the function d(p_i, p_j) be the distance between two input data p_i and p_j, and to calculate the probability of p_i being a cluster center as

q_i = \sum_{j=1}^{m} \frac{1}{1 + d(p_i, p_j)}, \quad 1 \leq j \leq m;

The cluster screening module 340 is configured to select, from the input data of the set, the input datum with the highest probability of being a cluster center, and to add the candidate cluster corresponding to the selected input datum to the final clusters.

The block diagram of the second embodiment of the stable and efficient adaptive clustering system of the present invention is shown in Fig. 4. The system comprises an initialization module 310, a candidate cluster building module 320, a probability calculation module 330, a cluster screening module 340, a deletion module 350, and a first detection module 360.

The probability calculation module 330 calculates the probability of p_i being a cluster center as

q_i = \sum_{j=1}^{m} \frac{1}{1 + d(p_i, p_j)}, \quad 1 \leq j \leq m;

The cluster screening module 340 is configured to select, from the input data of the set, the input datum with the highest probability of being a cluster center, and to add the candidate cluster corresponding to the selected input datum to the final clusters. The deletion module 350 is configured to delete from the input data set the input data that have been added to a final cluster, to select again, from the current input data set, the input datum with the highest probability of being a cluster center, and to add the candidate cluster corresponding to the selected input datum to the final clusters. The first detection module 360 is configured to determine whether the number of input data in the set is zero; if so, ending; otherwise, continuing with step e.
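One way to picture how the six modules could cooperate is the Python sketch below, in which each module becomes a small function acting on shared state. This is a hypothetical arrangement, not the structure disclosed in the figures: the `State` class and all function names are invented for illustration, points are taken to be floats for simplicity, reselection is delegated to the screening function so that the deletion function only deletes, and ties in q are broken by the smallest index.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class State:
    points: List[float]
    theta: float
    dist: Callable[[float, float], float]
    remaining: List[int] = field(default_factory=list)
    candidates: List[List[int]] = field(default_factory=list)
    q: List[float] = field(default_factory=list)
    final: List[List[int]] = field(default_factory=list)


def initialization_module(points, theta, dist) -> State:        # module 310
    return State(list(points), theta, dist, remaining=list(range(len(points))))


def candidate_cluster_module(s: State) -> None:                  # module 320
    n = len(s.points)
    s.candidates = [[j for j in range(n)
                     if j == i or s.dist(s.points[i], s.points[j]) < s.theta]
                    for i in range(n)]


def probability_module(s: State) -> None:                        # module 330
    s.q = [sum(1.0 / (1.0 + s.dist(s.points[i], s.points[j])) for j in c)
           for i, c in enumerate(s.candidates)]


def screening_module(s: State) -> List[int]:                     # module 340
    # Assumption: ties are broken in favour of the smallest index.
    center = max(s.remaining, key=lambda i: (s.q[i], -i))
    cluster = [j for j in s.candidates[center] if j in s.remaining]
    s.final.append(cluster)
    return cluster


def deletion_module(s: State, cluster: List[int]) -> None:       # module 350
    s.remaining = [i for i in s.remaining if i not in cluster]


def detection_module(s: State) -> bool:                          # module 360
    return len(s.remaining) == 0


def run_system(points, theta, dist) -> List[List[int]]:
    """Wire the modules together; returns final clusters as lists of indices."""
    s = initialization_module(points, theta, dist)
    candidate_cluster_module(s)
    probability_module(s)
    while not detection_module(s):
        deletion_module(s, screening_module(s))
    return s.final
```

Keeping the modules as separate functions over one state object makes each responsibility testable on its own, at the cost of some list scanning that a production implementation would likely replace with sets.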

The advantages of the stable and efficient adaptive clustering method proposed by this invention include the following. First, the method does not require the number of final clusters to be set in advance; it adapts to the input data to produce an appropriate clustering and can be used in a wide range of application scenarios. Second, the method yields a stable clustering result for identical input, so it can be repeated many times while keeping the service consistent. Third, compared with conventional methods, the method is computationally efficient, achieving a computational complexity of O(n^2), and is suitable for the various mobile intelligent terminals in current use. Fourth, in obtaining the final clusters, the method also obtains the center point of each cluster.

A person of ordinary skill in the art will understand that all or part of the flows in the method embodiments above can be carried out by a computer program instructing the relevant hardware. The program can be stored in a computer-readable storage medium and, when executed, can include the flows of the method embodiments described above. The storage medium can be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.

The technical principles of the present invention have been described above with reference to specific embodiments. These descriptions serve only to explain the principles of the invention and must not be interpreted as limiting its scope of protection in any way. Based on the explanations herein, those skilled in the art can conceive of other embodiments of the invention without creative effort, and all such embodiments fall within the scope of protection of the present invention.

Claims

1. A stable and efficient adaptive clustering method, characterized by comprising:

1≤j≤m;

2. The stable and efficient adaptive clustering method according to claim 1, characterized in that, after the candidate cluster corresponding to the selected input datum is added to the final clusters, the method further comprises:

3. A stable and efficient adaptive clustering system, characterized by comprising:

4. The system according to claim 3, characterized by further comprising:

a first detection module, configured to determine whether the number of input data in the set is zero; if so, ending; otherwise, continuing with step e.