CN103207896B - Method and system for stable and efficient self-adaptive clustering - Google Patents


Info

Publication number
CN103207896B
Authority
CN
China
Prior art keywords
input data
cluster
module
input
probability
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201310082671.2A
Other languages
Chinese (zh)
Other versions
CN103207896A (en)
Inventor
张兰
刘云浩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
WUXI QINGHUA INFORMATION SCIENCE AND TECHNOLOGY NATIONAL LABORATORY INTERNET OF THINGS TECHNOLOGY CENTER
Original Assignee
WUXI QINGHUA INFORMATION SCIENCE AND TECHNOLOGY NATIONAL LABORATORY INTERNET OF THINGS TECHNOLOGY CENTER
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a stable and efficient self-adaptive clustering method and system. The method comprises the following steps: a, obtaining a set of input data $p = \{p_1, \ldots, p_n\}$ containing $n$ input data, and obtaining a cluster-radius threshold $\theta$; b, adding $p_i$, together with every input data item in the set whose distance from $p_i$ is less than $\theta$, to the candidate cluster $c_{p_i}$ corresponding to $p_i$, where $p_i$ denotes the $i$-th input data item in the set; and c, letting $m$ be the number of input data in the candidate cluster $c_{p_i}$ and $d(p_i, p_j)$ be the distance between two input data $p_i$ and $p_j$, and calculating the probability that $p_i$ is a cluster center. The method and system establish a stable and efficient self-adaptive clustering system: the number of final clusters need not be preset, and the computation is efficient, with a computational complexity of $O(n^2)$, so the method and system are suitable for current mobile intelligent terminals.

Description

A stable and efficient self-adaptive clustering method and system
Technical field
The present invention relates to the technical field of computer information processing, and in particular to a stable and efficient self-adaptive clustering method and system.
Background
With the rapid growth of computerized information, the demand for processing all kinds of such information keeps increasing. Clustering algorithms are an important class of algorithms in information processing: they provide basic aggregation functions for data management, artificial intelligence, and machine learning, and play an important role in many kinds of information processing.
Now that intelligent mobile terminal applications are widespread, many information services based on smart mobile devices have appeared. They must provide efficient and stable service to a variety of smart terminals, and many of them rely on clustering algorithms, for example clustering friends in a mobile social network or clustering commodities in a shopping application. Many mobile terminals can now determine their position through GPS, base stations, wireless access points, and similar means, which has given rise to many location-based services. Clustering can provide richer and more useful functions for such services, such as clustering hot areas by category. As a simple example, users of current electronic maps often add geographic tags such as shopping, food, and scenic spots, and these tags are scattered over the whole map. When a smartphone user travels or goes shopping, they usually need to find a popular business district of interest, that is, a place where tags of a certain class are dense, such as a district where shopping is concentrated, and then obtain navigation to it. However, querying "shopping" on a current mobile map only returns the "shopping" tags scattered over the whole map, which makes it hard for the user to choose a destination. If the "shopping" tags are clustered effectively, so that they are divided into several dense partitions (clusters), popular "shopping" districts can be found quickly. Combining the clustering results of several tags, such as "shopping" and "food", can further help the user find popular districts that meet several requirements at once. Clustering methods can thus bring many rich applications to new mobile devices, but the diversity of mobile applications and the limited computing resources of mobile terminals place new requirements on clustering methods: they must be self-adaptive, stable, and efficient.
Several clustering methods already exist. The commonly used k-means and expectation-maximization methods are simple and fast to implement, but they require the number of final partitions to be set in advance, which prevents them from suiting a wide range of applications, because in most applications the user cannot know the number of partitions beforehand, for example how many food districts a city actually has. In addition, both methods are unstable: running them several times may give inconsistent clustering results. Another method, called QT, does not require the number of partitions to be preset and yields stable clustering results, but it needs $O(n^3)$ computation. Given a huge amount of information, such overhead is often unaffordable for mobile devices with limited computing resources.
Summary of the invention
The object of the present invention is to provide a stable and efficient self-adaptive clustering method and system that solve the problem of high computational overhead.
To this end, the present invention adopts the following technical solutions:
A stable and efficient self-adaptive clustering method, comprising:
a. obtaining a set of input data $p = \{p_1, \ldots, p_n\}$ containing $n$ input data, and obtaining a cluster-radius threshold $\theta$;
b. adding $p_i$, together with every input data item in the set whose distance from $p_i$ is less than $\theta$, to the candidate cluster $c_{p_i}$ corresponding to $p_i$, where $p_i$ denotes the $i$-th input data item in the set;
c. letting $m$ be the number of input data in the candidate cluster $c_{p_i}$ and $d(p_i, p_j)$ be the distance between two input data $p_i$ and $p_j$, calculating the probability that $p_i$ is a cluster center as $q_i = \sum_{j=1}^{m} \frac{1}{1 + d(p_i, p_j)}$, $1 \le j \le m$;
d. selecting, from the input data of the set, the input data item with the largest probability of being a cluster center, and adding the candidate cluster corresponding to the selected item to the final clustering.
Further, after the candidate cluster corresponding to the selected input data item is added to the final clustering, the method further comprises:
e. deleting from the input data set the input data that have been added to the final clustering, again selecting from the current input data set the input data item with the largest probability of being a cluster center, and adding the candidate cluster corresponding to the selected item to the final clustering;
judging whether the number of input data in the set is zero; if so, ending; otherwise, continuing with step e.
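As a worked illustration of steps a to e, consider four points on the real line. The points, the threshold, and the choice of $d$ as the absolute difference are hypothetical values chosen only for this example; the source does not fix a particular distance function. It is also assumed here that, after deletion, candidate clusters are formed only from the data still in the set.

```latex
\[
p = \{p_1, p_2, p_3, p_4\} = \{0, 1, 2, 10\}, \qquad \theta = 1.5, \qquad d(p_i, p_j) = |p_i - p_j|
\]
\begin{align*}
c_{p_1} &= \{p_1, p_2\},      & q_1 &= \tfrac{1}{1+0} + \tfrac{1}{1+1} = 1.5 \\
c_{p_2} &= \{p_1, p_2, p_3\}, & q_2 &= \tfrac{1}{1+1} + \tfrac{1}{1+0} + \tfrac{1}{1+1} = 2 \\
c_{p_3} &= \{p_2, p_3\},      & q_3 &= \tfrac{1}{1+1} + \tfrac{1}{1+0} = 1.5 \\
c_{p_4} &= \{p_4\},           & q_4 &= \tfrac{1}{1+0} = 1
\end{align*}
% Step d: q_2 is largest, so c_{p_2} = {p_1, p_2, p_3} joins the final clustering.
% Step e: p_1, p_2, p_3 are deleted; only p_4 remains, with c_{p_4} = {p_4} and q_4 = 1,
% so {p_4} becomes the second cluster and the set is then empty.
```

Note that $q_i$ as defined is a sum of $m$ terms and can exceed 1, so it behaves as an unnormalized score that is only compared across candidates.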
A stable and efficient self-adaptive clustering system, comprising:
an initialization module, configured to obtain a set of input data $p = \{p_1, \ldots, p_n\}$ containing $n$ input data, and to obtain a cluster-radius threshold $\theta$;
a candidate cluster building module, configured to add $p_i$, together with every input data item in the set whose distance from $p_i$ is less than $\theta$, to the candidate cluster $c_{p_i}$ corresponding to $p_i$, where $p_i$ denotes the $i$-th input data item in the set;
a probability calculation module, configured to let $m$ be the number of input data in the candidate cluster $c_{p_i}$ and $d(p_i, p_j)$ be the distance between two input data $p_i$ and $p_j$, and to calculate the probability that $p_i$ is a cluster center as $q_i = \sum_{j=1}^{m} \frac{1}{1 + d(p_i, p_j)}$, $1 \le j \le m$;
a cluster screening module, configured to select, from the input data of the set, the input data item with the largest probability of being a cluster center, and to add the candidate cluster corresponding to the selected item to the final clustering.
Further, the system also comprises:
a deletion module, configured to delete from the input data set the input data that have been added to the final clustering, to select again from the current input data set the input data item with the largest probability of being a cluster center, and to add the candidate cluster corresponding to the selected item to the final clustering;
a first detection module, configured to judge whether the number of input data in the set is zero; if so, ending; otherwise, continuing with step e.
The beneficial effects of the invention are as follows. The invention provides a new stable and efficient self-adaptive clustering method and system. The method does not require the number of partitions to be preset and adapts to the input data to produce an appropriate partition. It yields a stable clustering result for identical input. Compared with traditional methods, it is also computationally efficient, achieving a computational complexity of $O(n^2)$, which makes it suitable for current mobile intelligent terminals.
Brief description of the drawings
Fig. 1 is a flow chart of a first embodiment of the stable and efficient self-adaptive clustering method of the present invention;
Fig. 2 is a flow chart of a second embodiment of the stable and efficient self-adaptive clustering method of the present invention;
Fig. 3 is a block diagram of a first embodiment of the stable and efficient self-adaptive clustering system of the present invention;
Fig. 4 is a block diagram of a second embodiment of the stable and efficient self-adaptive clustering system of the present invention.
Detailed description of the embodiments
The technical solution of the invention is further explained below with reference to the drawings and specific embodiments.
The flow of the first embodiment of the stable and efficient self-adaptive clustering method of the present invention is shown in Fig. 1:
Step 101: obtain a set of input data $p = \{p_1, \ldots, p_n\}$ containing $n$ input data, and obtain a cluster-radius threshold $\theta$;
Step 102: add $p_i$, together with every input data item in the set whose distance from $p_i$ is less than $\theta$, to the candidate cluster $c_{p_i}$ corresponding to $p_i$, where $p_i$ denotes the $i$-th input data item in the set;
Step 103: let $m$ be the number of input data in the candidate cluster $c_{p_i}$ and $d(p_i, p_j)$ be the distance between two input data $p_i$ and $p_j$, and calculate the probability that $p_i$ is a cluster center as $q_i = \sum_{j=1}^{m} \frac{1}{1 + d(p_i, p_j)}$, $1 \le j \le m$;
Step 104: from the input data of the set, select the input data item with the largest probability of being a cluster center, and add the candidate cluster corresponding to the selected item to the final clustering.
This embodiment provides a stable and efficient self-adaptive clustering method. The user does not need to preset the number of clusters; the method generates the clustering result adaptively and gives a stable result for identical input data. Compared with traditional algorithms, the method is efficient, with a computational complexity of $O(n^2)$.
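Steps 101 to 104 can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the function names are hypothetical, the distance $d$ is assumed to be Euclidean (`math.dist`), $\theta$ is assumed to be positive so that each point belongs to its own candidate cluster, and ties in $q_i$ are assumed to be broken by the lowest index, which the source does not specify.

```python
import math

def candidate_cluster(points, i, theta, dist=math.dist):
    """Step 102: indices of p_i and of every point closer than theta to p_i."""
    return [j for j, p in enumerate(points) if dist(points[i], p) < theta]

def center_probability(points, i, members, dist=math.dist):
    """Step 103: q_i = sum over the candidate cluster of 1 / (1 + d(p_i, p_j))."""
    return sum(1.0 / (1.0 + dist(points[i], points[j])) for j in members)

def select_first_cluster(points, theta, dist=math.dist):
    """Steps 101-104: return (center index, member indices) of the best candidate."""
    n = len(points)
    clusters = [candidate_cluster(points, i, theta, dist) for i in range(n)]
    scores = [center_probability(points, i, c, dist) for i, c in enumerate(clusters)]
    # max() returns the first maximal element, so ties go to the lowest index
    # (an assumption made here to keep the result deterministic).
    best = max(range(n), key=lambda i: scores[i])
    return best, clusters[best]

# Hypothetical one-dimensional data, written as 1-tuples for math.dist.
points = [(0.0,), (1.0,), (2.0,), (10.0,)]
center, members = select_first_cluster(points, theta=1.5)
print(center, members)
```

With these hypothetical inputs the middle point of the dense group scores highest, so its candidate cluster is the one added to the final clustering.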
The flow of the second embodiment of the stable and efficient self-adaptive clustering method of the present invention is shown in Fig. 2:
Step 201: obtain a set of input data $p = \{p_1, \ldots, p_n\}$ containing $n$ input data, and obtain a cluster-radius threshold $\theta$.
Let the probability that each input data item $p_i$ becomes a cluster center be $q_i$, and initialize all $q_i$ to 0. Let $c_{p_i}$ be the cluster corresponding to $p_i$, and initialize it as empty. Initialize the clustering result as empty. Let the function $d(p_i, p_j)$ be the distance metric between two input data $p_i$ and $p_j$.
Step 202: add $p_i$, together with every input data item in the set whose distance from $p_i$ is less than $\theta$, to the candidate cluster $c_{p_i}$ corresponding to $p_i$, where $p_i$ denotes the $i$-th input data item in the set.
Step 203: let $m$ be the number of input data in the candidate cluster $c_{p_i}$ and $d(p_i, p_j)$ be the distance between two input data $p_i$ and $p_j$, and calculate the probability that $p_i$ is a cluster center as $q_i = \sum_{j=1}^{m} \frac{1}{1 + d(p_i, p_j)}$, $1 \le j \le m$.
Step 204: from the input data of the set, select the input data item with the largest probability of being a cluster center, and add the candidate cluster corresponding to the selected item to the final clustering.
Step 205: delete from the input data set the input data that have been added to the final clustering, select again from the current input data set the input data item with the largest probability of being a cluster center, and add the candidate cluster corresponding to the selected item to the final clustering.
Step 206: judge whether the number of input data in the set is zero; if so, end; otherwise, continue with step 205.
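The loop of steps 201 to 206 might look like the sketch below. Several points are assumptions rather than statements of the source: the name `adaptive_cluster` is hypothetical, the distance is Euclidean, candidate clusters and $q_i$ are recomputed over the data still in the set after each deletion, and ties go to the lowest index. Because this sketch rescans the remaining data every round, it does not by itself demonstrate the $O(n^2)$ bound stated in the description; only the pairwise distances are computed once.

```python
import math

def adaptive_cluster(points, theta, dist=math.dist):
    """Greedy clustering sketch; returns a list of (center_index, member_indices)."""
    if theta <= 0:
        # With theta <= 0 no point lies within its own radius and the loop
        # below could never shrink the set, so reject it up front.
        raise ValueError("theta must be positive")
    n = len(points)
    # Step 201: pairwise distances, computed once (n * n calls to dist).
    d = [[dist(points[i], points[j]) for j in range(n)] for i in range(n)]
    remaining = set(range(n))
    result = []
    while remaining:  # step 206: stop once no input data is left
        order = sorted(remaining)
        best, best_q, best_members = None, -1.0, []
        for i in order:
            # Step 202: candidate cluster of p_i among the remaining data.
            members = [j for j in order if d[i][j] < theta]
            # Step 203: q_i = sum of 1 / (1 + d(p_i, p_j)) over the candidate cluster.
            q = sum(1.0 / (1.0 + d[i][j]) for j in members)
            if q > best_q:  # strict '>' keeps the lowest index on ties
                best, best_q, best_members = i, q, members
        # Steps 204-205: record the winning cluster and delete its members.
        result.append((best, best_members))
        remaining -= set(best_members)
    return result

points = [(0.0,), (1.0,), (2.0,), (10.0,)]  # hypothetical data
clusters = adaptive_cluster(points, theta=1.5)
print(clusters)
```

Each returned pair carries the index of the cluster center along with its members, and repeated runs on the same input return the same list because no random initialization is involved.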
A block diagram of the first embodiment of the stable and efficient self-adaptive clustering system of the present invention is shown in Fig. 3. The system comprises an initialization module 310, a candidate cluster building module 320, a probability calculation module 330, and a cluster screening module 340.
The initialization module 310 is configured to obtain a set of input data $p = \{p_1, \ldots, p_n\}$ containing $n$ input data, and to obtain a cluster-radius threshold $\theta$. The candidate cluster building module 320 is configured to add $p_i$, together with every input data item in the set whose distance from $p_i$ is less than $\theta$, to the candidate cluster $c_{p_i}$ corresponding to $p_i$, where $p_i$ denotes the $i$-th input data item in the set. The probability calculation module 330 is configured to let $m$ be the number of input data in the candidate cluster $c_{p_i}$ and $d(p_i, p_j)$ be the distance between two input data $p_i$ and $p_j$, and to calculate the probability that $p_i$ is a cluster center as $q_i = \sum_{j=1}^{m} \frac{1}{1 + d(p_i, p_j)}$, $1 \le j \le m$. The cluster screening module 340 is configured to select, from the input data of the set, the input data item with the largest probability of being a cluster center, and to add the candidate cluster corresponding to the selected item to the final clustering.
A block diagram of the second embodiment of the stable and efficient self-adaptive clustering system of the present invention is shown in Fig. 4. The system comprises an initialization module 310, a candidate cluster building module 320, a probability calculation module 330, a cluster screening module 340, a deletion module 350, and a first detection module 360.
The initialization module 310 is configured to obtain a set of input data $p = \{p_1, \ldots, p_n\}$ containing $n$ input data, and to obtain a cluster-radius threshold $\theta$. The candidate cluster building module 320 is configured to add $p_i$, together with every input data item in the set whose distance from $p_i$ is less than $\theta$, to the candidate cluster $c_{p_i}$ corresponding to $p_i$, where $p_i$ denotes the $i$-th input data item in the set. The probability calculation module 330 is configured to let $m$ be the number of input data in the candidate cluster $c_{p_i}$ and $d(p_i, p_j)$ be the distance between two input data $p_i$ and $p_j$, and to calculate the probability that $p_i$ is a cluster center as $q_i = \sum_{j=1}^{m} \frac{1}{1 + d(p_i, p_j)}$, $1 \le j \le m$. The cluster screening module 340 is configured to select, from the input data of the set, the input data item with the largest probability of being a cluster center, and to add the candidate cluster corresponding to the selected item to the final clustering. The deletion module 350 is configured to delete from the input data set the input data that have been added to the final clustering, to select again from the current input data set the input data item with the largest probability of being a cluster center, and to add the candidate cluster corresponding to the selected item to the final clustering. The first detection module 360 is configured to judge whether the number of input data in the set is zero; if so, ending; otherwise, continuing with step e.
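One way to picture how modules 310 to 360 fit together is a single class with one method per module. This is only an illustrative sketch: the class name, method names, and the `run` driver are hypothetical, the Euclidean distance is an assumed default, candidate clusters are assumed to be rebuilt from the data still in the set on every pass, and ties are assumed to go to the lowest index.

```python
import math
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class ClusteringSystem:
    points: list                     # module 310: the input data set p
    theta: float                     # module 310: cluster-radius threshold
    dist: Callable = math.dist       # distance d (Euclidean is an assumption)
    final: list = field(default_factory=list)  # final clustering

    def __post_init__(self):
        if self.theta <= 0:
            raise ValueError("theta must be positive")
        self.remaining = list(range(len(self.points)))

    def build_candidates(self):      # module 320: candidate cluster building
        return {i: [j for j in self.remaining
                    if self.dist(self.points[i], self.points[j]) < self.theta]
                for i in self.remaining}

    def probabilities(self, candidates):  # module 330: probability calculation
        return {i: sum(1.0 / (1.0 + self.dist(self.points[i], self.points[j]))
                       for j in members)
                for i, members in candidates.items()}

    def screen(self, candidates, probs):  # module 340: cluster screening
        center = max(self.remaining, key=lambda i: probs[i])  # ties: lowest index
        self.final.append((center, candidates[center]))
        return candidates[center]

    def delete(self, members):       # module 350: deletion
        gone = set(members)
        self.remaining = [i for i in self.remaining if i not in gone]

    def is_done(self):               # module 360: first detection
        return not self.remaining

    def run(self):
        while not self.is_done():
            candidates = self.build_candidates()
            members = self.screen(candidates, self.probabilities(candidates))
            self.delete(members)
        return self.final

# Hypothetical two-dimensional data.
system = ClusteringSystem([(0.0, 0.0), (0.0, 1.0), (5.0, 5.0)], theta=2.0)
result = system.run()
print(result)
```

Keeping each module as a separate method mirrors the block diagrams of Fig. 3 and Fig. 4; a production design might instead cache distances between passes.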
The advantages of the stable and efficient self-adaptive clustering method proposed by this invention include the following. First, the method does not require the number of final clusters to be preset; it adapts to the input data to produce an appropriate clustering, so it can be applied widely across many scenarios. Second, the method yields a stable clustering result for identical input and can be executed repeatedly, which guarantees consistent service. Third, compared with traditional methods, the method is computationally efficient, achieving a computational complexity of $O(n^2)$, which makes it suitable for current mobile intelligent terminals. Fourth, in addition to the final clustering, the method also yields the center point of each cluster.
A person of ordinary skill in the art will understand that all or part of the flow of the method embodiments above can be carried out by a computer program instructing the relevant hardware. The program can be stored in a computer-readable storage medium and, when executed, may include the flows of the method embodiments above. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.
The technical principle of the present invention has been described above with reference to specific embodiments. These descriptions are intended only to explain the principle of the invention and must not be construed in any way as limiting its scope of protection. Based on the explanation herein, a person skilled in the art can conceive of other specific embodiments of the invention without creative effort, and these embodiments also fall within the scope of protection of the invention.

Claims (2)

1. A stable and efficient self-adaptive clustering system, characterized by comprising:
an initialization module, configured to obtain a set of input data $p = \{p_1, \ldots, p_n\}$ containing $n$ input data, and to obtain a cluster-radius threshold $\theta$;
a candidate cluster building module, configured to add $p_i$, together with every input data item in the set whose distance from $p_i$ is less than $\theta$, to the candidate cluster $c_{p_i}$ corresponding to $p_i$, where $p_i$ denotes the $i$-th input data item in the set;
a probability calculation module, configured to let $m$ be the number of input data in the candidate cluster $c_{p_i}$ and $d(p_i, p_j)$ be the distance between two input data $p_i$ and $p_j$, and to calculate the probability that $p_i$ is a cluster center as $q_i = \sum_{j=1}^{m} \frac{1}{1 + d(p_i, p_j)}$, $1 \le j \le m$;
a cluster screening module, configured to select, from the input data of the set, the input data item with the largest probability of being a cluster center, and to add the candidate cluster corresponding to the selected item to the final clustering.
2. The system as claimed in claim 1, characterized by further comprising:
a deletion module, configured to delete from the input data set the input data that have been added to the final clustering, to select again from the current input data set the input data item with the largest probability of being a cluster center, and to add the candidate cluster corresponding to the selected item to the final clustering;
a first detection module, configured to judge whether the number of input data in the set is zero; if so, ending; otherwise, continuing with step e.
CN201310082671.2A 2013-03-14 2013-03-14 Method and system for stable and efficient self-adaptive clustering Expired - Fee Related CN103207896B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310082671.2A CN103207896B (en) 2013-03-14 2013-03-14 Method and system for stable and efficient self-adaptive clustering

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310082671.2A CN103207896B (en) 2013-03-14 2013-03-14 Method and system for stable and efficient self-adaptive clustering

Publications (2)

Publication Number Publication Date
CN103207896A CN103207896A (en) 2013-07-17
CN103207896B true CN103207896B (en) 2017-02-01

Family

ID=48755118

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310082671.2A Expired - Fee Related CN103207896B (en) 2013-03-14 2013-03-14 Method and system for stable and efficient self-adaptive clustering

Country Status (1)

Country Link
CN (1) CN103207896B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103559502A (en) * 2013-10-25 2014-02-05 华南理工大学 Pedestrian detection system and method based on adaptive clustering analysis
CN104702432B (en) * 2014-01-15 2018-03-30 杭州海康威视系统技术有限公司 The method and server alerted based on band of position division

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101308496A (en) * 2008-07-04 2008-11-19 沈阳格微软件有限责任公司 Large scale text data external clustering method and system
EP2184692A1 (en) * 2008-10-28 2010-05-12 Sony Corporation Information processing
CN101989281A (en) * 2009-08-03 2011-03-23 中国移动通信集团公司 Clustering method and device
CN102289478A (en) * 2011-08-01 2011-12-21 江苏广播电视大学 System and method for recommending video on demand based on fuzzy clustering

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
"结合SOFM的改进CLARA聚类算法";段明秀;《计算机工程》;20101130;第46卷(第22期);第210-212页 *

Also Published As

Publication number Publication date
CN103207896A (en) 2013-07-17

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20170201
