CN103207896B

CN103207896B - Method and system for stable and efficient self-adaptive clustering

Info

Publication number: CN103207896B
Application number: CN201310082671.2A
Authority: CN
Inventors: 张兰; 刘云浩
Original assignee: WUXI QINGHUA INFORMATION SCIENCE AND TECHNOLOGY NATIONAL LABORATORY INTERNET OF THINGS TECHNOLOGY CENTER
Current assignee: WUXI QINGHUA INFORMATION SCIENCE AND TECHNOLOGY NATIONAL LABORATORY INTERNET OF THINGS TECHNOLOGY CENTER
Priority date: 2013-03-14
Filing date: 2013-03-14
Publication date: 2017-02-01
Anticipated expiration: 2033-03-14
Also published as: CN103207896A

Abstract

The invention discloses an adaptive, stable and efficient clustering method and system. The method comprises the following steps: a. obtaining a set p = {p₁, ..., p_n} of n input data items and a cluster-radius threshold θ; b. adding p_i, together with every input data item in the set whose distance from p_i is less than θ, to the candidate cluster c_{p_i} corresponding to p_i, where p_i denotes the i-th input data item in the set; and c. letting m be the number of input data items in the candidate cluster c_{p_i}, taking the function d(p_i, p_j) as the distance between two input data items p_i and p_j, and calculating the probability that p_i is a cluster center. The method and system provide stable and efficient adaptive clustering in which the number of final clusters does not need to be set in advance. The scheme is computationally efficient, achieving a computational complexity of O(n²), which makes it suitable for today's mobile intelligent terminals.

Description

An adaptive, stable and efficient clustering method and system

Technical field

The present invention relates to the technical field of computer information processing, and more particularly to an adaptive, stable and efficient clustering method and system.

Background

With the rapid growth of computerized information, the demand for processing all kinds of information keeps increasing. Clustering algorithms are a very important class of information-processing algorithms. They provide basic grouping functionality for data management, artificial intelligence and machine learning, and play an important role in many kinds of information processing.

Now that intelligent mobile terminals are in widespread use, many information services based on smart mobile devices have appeared. These services must deliver efficient, stable service across a wide variety of terminals, and many of them rely on clustering algorithms. Examples include clustering social contacts in mobile social networks and clustering products in shopping applications.

Most mobile terminals can now determine their position through GPS, cellular base stations, wireless access points and similar means. This has given rise to many location-based services, and clustering can provide richer and more useful functions for such services, such as clustering hot zones by category.

As a simple example, users often annotate electronic maps with geographic labels of various kinds, such as shopping, food and sightseeing, and these labels are scattered across the whole map. A smartphone user who is travelling or shopping often wants to find a popular area of interest and get navigation to it. Such an area is a place where labels of a particular kind are dense, such as a district with many shops. However, a query for "shopping" on a current mobile map only returns "shopping" labels scattered over the whole map, which makes it hard to choose a destination.

If the "shopping" labels are clustered effectively into several dense sub-regions (clusters), popular shopping districts can be found quickly. Combining the clustering results of several labels, such as "shopping" and "food", can further help the user find popular districts that meet several requirements at once. Clustering can thus enable many rich applications on new mobile devices. However, the diversity of mobile applications and the limited computing resources of mobile terminals place new requirements on clustering: it must be adaptive, stable and efficient.

Among existing clustering methods, the commonly used k-means and expectation-maximization (EM) algorithms are simple and fast to implement. However, they require the number of final partitions to be set in advance, which prevents them from being used in a wider range of applications. In most applications the user cannot know the number of partitions beforehand, for example how many food clusters a city actually has. Moreover, both methods are unstable: running them several times on the same input may yield inconsistent clustering results. Another method, QT (quality threshold) clustering, does not require the number of partitions to be set in advance and produces stable clustering results. However, it requires O(n³) computation. Given the huge amount of information involved, such overhead is often unaffordable for mobile devices with limited computing resources.

Summary of the invention

The object of the present invention is to provide an adaptive, stable and efficient clustering method and system that solves the problem of high computational overhead.

To achieve this object, the present invention adopts the following technical solutions:

An adaptive, stable and efficient clustering method, comprising:

a. Obtain the set of input data p = {p₁, ..., p_n}, which contains n input data items, and obtain the cluster-radius threshold θ;

b. Add p_i, together with every input data item in the set whose distance from p_i is less than the threshold θ, to the candidate cluster c_{p_i} corresponding to p_i, where p_i denotes the i-th input data item in the set;

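c. Let m be the number of input data items in the candidate cluster c_{p_i}, let p_j (1 ≤ j ≤ m) denote those items, and let the function d(p_i, p_j) be the distance between two input data items p_i and p_j. Calculate the probability that p_i is a cluster center as

q_{i} = \sum_{j=1}^{m} \frac{1}{1 + d(p_{i}, p_{j})};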

d. From the input data in the set, select the item with the highest probability of being a cluster center, and add the candidate cluster corresponding to the selected item to the final clustering result.
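Steps a to d can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation. The function names `candidate_clusters`, `center_scores` and `best_center` are hypothetical. The distance function d is supplied by the caller and assumed symmetric. Ties in step d are assumed to go to the lowest index, since the text does not say how ties are broken. Note also that q_i is an unnormalized score rather than a probability in the strict sense: it grows with both the number of neighbours within θ and their closeness.

```python
from typing import Callable, Sequence, TypeVar

P = TypeVar("P")  # any point type the distance function accepts


def candidate_clusters(points: Sequence[P], theta: float,
                       d: Callable[[P, P], float]) -> list[list[int]]:
    """Step b: c_{p_i} holds i itself plus every j with d(p_i, p_j) < theta."""
    n = len(points)
    clusters = [[i] for i in range(n)]      # p_i is always in its own candidate cluster
    for i in range(n):
        for j in range(i + 1, n):           # each pair measured once (d assumed symmetric)
            if d(points[i], points[j]) < theta:
                clusters[i].append(j)
                clusters[j].append(i)
    return clusters


def center_scores(points: Sequence[P], clusters: Sequence[Sequence[int]],
                  d: Callable[[P, P], float]) -> list[float]:
    """Step c: q_i = sum over p_j in c_{p_i} of 1 / (1 + d(p_i, p_j))."""
    return [sum(1.0 / (1.0 + d(points[i], points[j])) for j in members)
            for i, members in enumerate(clusters)]


def best_center(scores: Sequence[float]) -> int:
    """Step d: index of the largest q_i; max() keeps the first maximum, so ties go to the lowest index."""
    return max(range(len(scores)), key=lambda i: scores[i])


def dist(a: float, b: float) -> float:
    """Illustrative 1-D distance d(a, b) = |a - b|."""
    return abs(a - b)


# Illustrative 1-D data with theta = 2.0.
pts = [0.0, 1.0, 1.5, 10.0]
cands = candidate_clusters(pts, 2.0, dist)    # [[0, 1, 2], [1, 0, 2], [2, 0, 1], [3]]
q = center_scores(pts, cands, dist)           # approx. [1.9, 2.167, 2.067, 1.0]
print(best_center(q), cands[best_center(q)])  # 1 [1, 0, 2]
```

Measuring each pair once in step b is what keeps candidate-cluster construction at n(n-1)/2 distance evaluations.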

Further, after the candidate cluster corresponding to the selected input data item has been added to the final clustering result, the method further comprises:

e. Delete the input data items that have been added to the final clustering result from the input data set. Then select again, from the current input data set, the item with the highest probability of being a cluster center, and add its candidate cluster to the final clustering result;

Determine whether the number of input data items remaining in the set is zero; if so, terminate; otherwise, continue with step e.
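The deletion loop of step e and the termination check might look like the sketch below, which consumes candidate clusters and scores already produced by steps b and c. The text leaves open whether candidate clusters and scores are updated after points are deleted. This sketch assumes the scores q_i are computed once, and that only points not yet clustered are taken from the selected candidate cluster, so the final clusters are disjoint. `select_final_clusters` is a hypothetical name.

```python
from typing import Sequence


def select_final_clusters(candidates: Sequence[Sequence[int]],
                          scores: Sequence[float]) -> list[tuple[int, list[int]]]:
    """Repeat steps d and e until the input data set is empty.

    candidates[i] holds the indices of c_{p_i} (including i itself) and
    scores[i] holds q_i; both are assumed to be computed once, up front.
    Returns (center index, member indices) pairs in selection order.
    """
    n = len(scores)
    remaining = set(range(n))                    # the current input data set
    result: list[tuple[int, list[int]]] = []     # the final clustering result
    while remaining:                             # terminate once no input data remain
        # Scan in index order so that ties go to the lowest index (assumed rule).
        center = max((i for i in range(n) if i in remaining),
                     key=lambda i: scores[i])
        # Assumption: already-clustered points are skipped, keeping clusters disjoint.
        members = [j for j in candidates[center] if j in remaining]
        result.append((center, members))
        remaining.difference_update(members)     # step e: delete the clustered points
    return result


# Five points on a line at 0, 1, 2, 3, 4 with theta = 1.5 and d(a, b) = |a - b|.
cands = [[0, 1], [1, 0, 2], [2, 1, 3], [3, 2, 4], [4, 3]]
q = [1.5, 2.0, 2.0, 2.0, 1.5]
print(select_final_clusters(cands, q))  # [(1, [1, 0, 2]), (3, [3, 4])]
```

Point 2 lies within θ of the second center (point 3) but was already claimed by the first cluster. That is why the second cluster is [3, 4] rather than the full candidate cluster [3, 2, 4]. The loop also returns each cluster's center point. A variant that recomputes the scores after every deletion would be another reasonable reading of the text, at a higher cost.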

An adaptive, stable and efficient clustering system, comprising:

An initialization module, configured to obtain the set of input data p = {p₁, ..., p_n}, which contains n input data items, and to obtain the cluster-radius threshold θ;

A candidate cluster establishing module, configured to add p_i, together with every input data item in the set whose distance from p_i is less than the threshold θ, to the candidate cluster c_{p_i} corresponding to p_i, where p_i denotes the i-th input data item in the set;

A probability calculation module, configured to let m be the number of input data items in the candidate cluster c_{p_i}, p_j (1 ≤ j ≤ m) denote those items, and d(p_i, p_j) be the distance between two input data items p_i and p_j, and to calculate the probability that p_i is a cluster center as

q_{i} = \sum_{j=1}^{m} \frac{1}{1 + d(p_{i}, p_{j})};

A cluster screening module, configured to select, from the input data in the set, the item with the highest probability of being a cluster center, and to add the candidate cluster corresponding to the selected item to the final clustering result.

Further, the system also comprises:

A deletion module, configured to delete the input data items that have been added to the final clustering result from the input data set, to select again, from the current input data set, the item with the highest probability of being a cluster center, and to add its candidate cluster to the final clustering result;

A first detection module, configured to determine whether the number of input data items in the set is zero. If so, the process terminates; otherwise, step e (the operation of the deletion module) is performed again.

The beneficial effects of the present invention are as follows. The invention provides a new adaptive, stable and efficient clustering method and system. The method does not require the number of partitions to be set in advance; it adapts to the input data to produce an appropriate partitioning. It also produces the same clustering result for the same input. Compared with traditional methods it is computationally efficient, achieving a computational complexity of O(n²), which makes it suitable for today's mobile intelligent terminals.
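The text states the O(n²) bound without deriving it. One way to account for it rests on three assumptions: each pair of points is measured once in step b, the scores q_i are computed once, and each selection round scans the remaining points and filters the chosen candidate cluster linearly. With m_i = |c_{p_i}| ≤ n and k ≤ n final clusters, the count is:

```latex
T(n) \;=\; \underbrace{\frac{n(n-1)}{2}}_{\text{step b: pairwise distances}}
\;+\; \underbrace{\sum_{i=1}^{n} m_i}_{\text{step c: scores } q_i}
\;+\; \underbrace{2kn}_{\text{steps d, e: } k \text{ selection rounds}}
\;\le\; \frac{n(n-1)}{2} + n^2 + 2n^2 \;=\; O(n^2)
```

If the candidate clusters and scores were instead rebuilt after every deletion, each of up to n rounds could cost O(n²). That gives O(n³) in the worst case, the same order as the QT method discussed in the background.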

Brief description of the drawings

Fig. 1 is a flowchart of a first embodiment of the adaptive, stable and efficient clustering method of the present invention;

Fig. 2 is a flowchart of a second embodiment of the adaptive, stable and efficient clustering method of the present invention;

Fig. 3 is a block diagram of a first embodiment of the adaptive, stable and efficient clustering system of the present invention;

Fig. 4 is a block diagram of a second embodiment of the adaptive, stable and efficient clustering system of the present invention.

Detailed description of the embodiments

The technical solution of the present invention is further described below with reference to the accompanying drawings and specific embodiments.

The flow of a first embodiment of the adaptive, stable and efficient clustering method of the present invention is shown in Fig. 1:

Step 101: obtain the set of input data p = {p₁, ..., p_n}, which contains n input data items, and obtain the cluster-radius threshold θ;

Step 102: add p_i, together with every input data item in the set whose distance from p_i is less than the threshold θ, to the candidate cluster c_{p_i} corresponding to p_i, where p_i denotes the i-th input data item in the set;

Step 103: let m be the number of input data items in the candidate cluster c_{p_i}, p_j (1 ≤ j ≤ m) denote those items, and d(p_i, p_j) be the distance between two input data items p_i and p_j; calculate the probability that p_i is a cluster center as

q_{i} = \sum_{j=1}^{m} \frac{1}{1 + d(p_{i}, p_{j})};

Step 104: from the input data in the set, select the item with the highest probability of being a cluster center, and add the candidate cluster corresponding to the selected item to the final clustering result.
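Here is a small worked illustration of the score in step 103, using hypothetical distances that are not taken from the text. Suppose θ > 3 and c_{p_1} = {p_1, p_2, p_3}, with d(p_1, p_2) = 1 and d(p_1, p_3) = 3. Because p_1 belongs to its own candidate cluster and d(p_1, p_1) = 0, it contributes a term of 1:

```latex
q_{1} = \frac{1}{1 + 0} + \frac{1}{1 + 1} + \frac{1}{1 + 3} = 1 + 0.5 + 0.25 = 1.75
```

If both neighbours were instead at distance 3, the same cluster size would give q_1 = 1 + 0.25 + 0.25 = 1.5. The score therefore rewards candidate clusters that are both large and tight.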

This embodiment provides an adaptive, stable and efficient clustering method. The user does not need to set the number of clusters in advance; the method generates the clustering result adaptively and produces the same result for the same input data. Compared with traditional algorithms it is efficient, with a computational complexity of O(n²).

The flow of a second embodiment of the adaptive, stable and efficient clustering method of the present invention is shown in Fig. 2:

Step 201: obtain the set of input data p = {p₁, ..., p_n}, which contains n input data items, and obtain the cluster-radius threshold θ.

Let q_i be the probability that input data item p_i becomes a cluster center, and initialize all q_i to 0. Let c_{p_i} be the candidate cluster corresponding to p_i and initialize it to the empty set. Initialize the final clustering result to the empty set. Let the function d(p_i, p_j) be the distance measure between two input data items p_i and p_j.

Step 202: add p_i, together with every input data item in the set whose distance from p_i is less than the threshold θ, to the candidate cluster c_{p_i} corresponding to p_i, where p_i denotes the i-th input data item in the set.

Step 203: let m be the number of input data items in the candidate cluster c_{p_i}, p_j (1 ≤ j ≤ m) denote those items, and d(p_i, p_j) be the distance between two input data items p_i and p_j; calculate the probability that p_i is a cluster center as

q_{i} = \sum_{j=1}^{m} \frac{1}{1 + d(p_{i}, p_{j})}.

Step 204: from the input data in the set, select the item with the highest probability of being a cluster center, and add the candidate cluster corresponding to the selected item to the final clustering result.

Step 205: delete the input data items that have been added to the final clustering result from the input data set. Then select again, from the current input data set, the item with the highest probability of being a cluster center, and add its candidate cluster to the final clustering result.

Step 206: determine whether the number of input data items in the set is zero; if so, terminate; otherwise, continue with step 205.
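Step 201 leaves the distance measure d(p_i, p_j) open. For the map-label scenario described in the background, one plausible choice is the great-circle distance between latitude/longitude points, computed here with the haversine formula. This is an assumption; the text does not specify it. `haversine_m` and `EARTH_RADIUS_M` are hypothetical names, and θ would then be expressed in metres.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres (assumed spherical Earth)


def haversine_m(a: tuple[float, float], b: tuple[float, float]) -> float:
    """Great-circle distance in metres between two (latitude, longitude) points given in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    # Haversine term: sin^2(dlat/2) + cos(lat1) cos(lat2) sin^2(dlon/2).
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


# One degree of longitude along the equator: about 111195 m.
print(round(haversine_m((0.0, 0.0), (0.0, 1.0))))
```

Because q_i sums 1/(1 + d), the unit of d matters. In metres, a neighbour 100 m away contributes about 0.01; in kilometres it contributes about 0.91. Rescaling d therefore changes how strongly the score favours close neighbours.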

A first embodiment of the adaptive, stable and efficient clustering system of the present invention is shown in the block diagram of Fig. 3. The system includes an initialization module 310, a candidate cluster establishing module 320, a probability calculation module 330 and a cluster screening module 340.

The initialization module 310 is configured to obtain the set of input data p = {p₁, ..., p_n}, which contains n input data items, and to obtain the cluster-radius threshold θ. The candidate cluster establishing module 320 is configured to add p_i, together with every input data item in the set whose distance from p_i is less than θ, to the candidate cluster c_{p_i} corresponding to p_i, where p_i denotes the i-th input data item in the set. The probability calculation module 330 is configured to let m be the number of input data items in c_{p_i}, p_j (1 ≤ j ≤ m) denote those items, and d(p_i, p_j) be the distance between two input data items p_i and p_j, and to calculate the probability that p_i is a cluster center as

q_{i} = \sum_{j=1}^{m} \frac{1}{1 + d(p_{i}, p_{j})}.

The cluster screening module 340 is configured to select, from the input data in the set, the item with the highest probability of being a cluster center, and to add the candidate cluster corresponding to the selected item to the final clustering result.

A second embodiment of the adaptive, stable and efficient clustering system of the present invention is shown in the block diagram of Fig. 4. The system includes an initialization module 310, a candidate cluster establishing module 320, a probability calculation module 330, a cluster screening module 340, a deletion module 350 and a first detection module 360.

The initialization module 310 is configured to obtain the set of input data p = {p₁, ..., p_n}, which contains n input data items, and to obtain the cluster-radius threshold θ. The candidate cluster establishing module 320 is configured to add p_i, together with every input data item in the set whose distance from p_i is less than θ, to the candidate cluster c_{p_i} corresponding to p_i, where p_i denotes the i-th input data item in the set. The probability calculation module 330 is configured to let m be the number of input data items in c_{p_i}, p_j (1 ≤ j ≤ m) denote those items, and d(p_i, p_j) be the distance between two input data items p_i and p_j, and to calculate the probability that p_i is a cluster center as

q_{i} = \sum_{j=1}^{m} \frac{1}{1 + d(p_{i}, p_{j})}.

The cluster screening module 340 is configured to select, from the input data in the set, the item with the highest probability of being a cluster center, and to add the candidate cluster corresponding to the selected item to the final clustering result. The deletion module 350 is configured to delete the input data items that have been added to the final clustering result from the input data set, to select again, from the current input data set, the item with the highest probability of being a cluster center, and to add its candidate cluster to the final clustering result. The first detection module 360 is configured to determine whether the number of input data items in the set is zero. If so, the process terminates; otherwise, step e (the operation of the deletion module 350) is performed again.

The advantages of the adaptive, stable and efficient clustering method proposed by this invention are as follows. First, the number of final clusters does not need to be set in advance; the method adapts to the input data to produce an appropriate clustering, so it can be used in a wide range of application scenarios. Second, the method produces the same clustering result for the same input, so it can be executed repeatedly while keeping the service consistent. Third, compared with traditional methods it is computationally efficient, with a computational complexity of O(n²), which makes it suitable for today's mobile intelligent terminals. Fourth, in addition to the final clusters, the method also yields the center point of each cluster.

Those of ordinary skill in the art will understand that all or part of the flow of the methods in the above embodiments may be implemented by a computer program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, may include the flows of the embodiments of the above methods. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM) or the like.

The technical principles of the present invention have been described above with reference to specific embodiments. These descriptions are intended only to explain the principles of the present invention and shall not be construed in any way as limiting its scope of protection. Based on the explanation herein, those skilled in the art can conceive of other specific embodiments of the present invention without creative effort, and such embodiments also fall within the scope of protection of the present invention.

Claims

1. An adaptive, stable and efficient clustering system, characterized by comprising:

an initialization module, configured to obtain the set of input data p = {p₁, ..., p_n}, which contains n input data items, and to obtain the cluster-radius threshold θ;

a candidate cluster establishing module, configured to add p_i, together with every input data item in the set whose distance from p_i is less than the threshold θ, to the candidate cluster c_{p_i} corresponding to p_i, where p_i denotes the i-th input data item in the set;

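a probability calculation module, configured to let m be the number of input data items in the candidate cluster c_{p_i}, p_j (1 ≤ j ≤ m) denote those items, and d(p_i, p_j) be the distance between two input data items p_i and p_j, and to calculate the probability that p_i is a cluster center as

q_{i} = \sum_{j=1}^{m} \frac{1}{1 + d(p_{i}, p_{j})};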

and a cluster screening module, configured to select, from the input data in the set, the item with the highest probability of being a cluster center, and to add the candidate cluster corresponding to the selected item to the final clustering result.

2. The system according to claim 1, characterized by further comprising:

a deletion module, configured to delete the input data items that have been added to the final clustering result from the input data set, to select again, from the current input data set, the item with the highest probability of being a cluster center, and to add its candidate cluster to the final clustering result;

and a first detection module, configured to determine whether the number of input data items in the set is zero. If so, the process terminates; otherwise, step e (the operation of the deletion module) is performed again.