CN103207896B - Method and system for stable and efficient self-adaptive clustering - Google Patents
- Publication number
- CN103207896B (application CN201310082671.2A)
- Authority
- CN
- China
- Prior art keywords
- input data
- cluster
- module
- input
- probability
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Abstract
The invention discloses a method and a system for stable and efficient self-adaptive clustering. The method comprises the following steps: (a) obtaining a set $P = \{p_1, \ldots, p_n\}$ of $n$ input data and a cluster-radius threshold θ; (b) adding $p_i$, together with every input datum in the set whose distance from $p_i$ is smaller than θ, to the candidate cluster $C_{p_i}$ corresponding to $p_i$, where $p_i$ denotes the i-th input datum in the set; and (c) letting the candidate cluster $C_{p_i}$ contain $m$ input data and the function $d(p_i, p_j)$ be the distance between two input data $p_i$ and $p_j$, calculating the probability that $p_i$ is a cluster center. The method and system provide stable and efficient self-adaptive clustering: the number of final clusters need not be preset, and the computation is efficient, with a complexity of $O(n^2)$, so the method and system suit a wide range of current mobile intelligent terminals.
Description
Technical field
The present invention relates to the technical field of computer information processing, and more particularly to an adaptive, stable, and efficient clustering method and system.
Background
With the rapid growth of computerized information, the demand for processing all kinds of such information keeps increasing. Clustering algorithms are an important class of algorithms in information processing: they provide the basic grouping function for data management, artificial intelligence, and machine learning, and play an important role in many information-processing tasks.
Now that intelligent mobile terminal applications are widespread, many information services based on intelligent mobile devices have appeared. These services must run efficiently and stably on a variety of intelligent terminals, and a large share of them rely on clustering algorithms, for example clustering friends in mobile social networking or clustering commodities in shopping applications. Many mobile terminals can currently determine their position through GPS, base stations, wireless access points, and similar means, which has given rise to many location-based services. Clustering methods can give such services richer and more useful functions, such as clustering hot zones by category.
As a simple example, users of current electronic maps often add geographical labels of various kinds, such as shopping, food, and scenic spots, and these labels are scattered over the whole map. A smartphone user who is travelling or window-shopping usually wants to find a popular commercial area of interest, that is, a place where labels of a certain class are dense, such as an area where shopping is concentrated, and then obtain navigation to it. Querying "shopping" on a current mobile map, however, only returns the "shopping" labels scattered over the whole map, which makes it hard for the user to choose a destination. If the "shopping" labels are clustered effectively, that is, divided into several dense partitions (clusters), popular "shopping" areas can be found quickly. Integrating the clustering results for several labels, such as "shopping" and "food", can further help the user discover popular areas that meet several requirements at once.
Clustering methods can bring a large number of rich applications to new mobile devices, but the variety of mobile applications and the limited computing resources of mobile terminals place new demands on clustering methods: they must be adaptive, stable, and efficient.
Many clustering methods exist, such as the commonly used k-means and expectation-maximization methods. Although they are simple and fast to implement, they require the number of final partitions to be set in advance, which prevents them from adapting to a wide range of applications, because in most applications the user cannot know the number of partitions beforehand, for example how many food districts a city actually has. In addition, both methods are unstable: running them several times may produce inconsistent clustering results. Another method, called QT, needs no preset number of partitions and yields stable clustering results, but it requires $O(n^3)$ computation. Facing a huge amount of information, a mobile device with limited computing resources often cannot bear such a cost.
Summary of the invention
The object of the invention is to propose an adaptive, stable, and efficient clustering method and system that solve the problem of high computational cost.
To this end, the invention adopts the following technical solutions:
An adaptive, stable, and efficient clustering method, comprising:
a. obtaining a set of input data $P = \{p_1, \ldots, p_n\}$ containing $n$ input data, and obtaining a cluster-radius threshold θ;
b. adding $p_i$ and every input datum in the set whose distance from $p_i$ is less than θ to the candidate cluster $C_{p_i}$ corresponding to $p_i$, where $p_i$ denotes the i-th input datum in the set;
c. letting the candidate cluster $C_{p_i}$ contain $m$ input data and the function $d(p_i, p_j)$ be the distance between two input data $p_i$ and $p_j$, calculating the probability that $p_i$ is a cluster center as [formula missing in source], $1 \le j \le m$;
d. selecting, from the input data of the set, the input datum with the largest probability of being a cluster center, and adding the candidate cluster corresponding to the selected input datum to the final clustering.
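Steps a to d can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the input data are taken to be 2-D points, `d` is assumed to be Euclidean distance, and because the center-probability formula of step c is not reproduced in this text, `stand_in_score` is a hypothetical placeholder that merely rewards many close neighbours. All function names are illustrative.

```python
import math

Point = tuple[float, float]


def candidate_cluster(i: int, points: list[Point], theta: float) -> list[int]:
    """Step b: indices of p_i and of every point closer than theta to p_i."""
    return [
        j for j, q in enumerate(points)
        if j == i or math.dist(points[i], q) < theta
    ]


def stand_in_score(i: int, members: list[int], points: list[Point], theta: float) -> float:
    """Step c placeholder (hypothetical, NOT the patent's formula).

    Sums theta - d(p_i, p_j) over the candidate cluster, so a point with
    many near neighbours scores higher.
    """
    return sum(theta - math.dist(points[i], points[j]) for j in members)


def select_first_cluster(points: list[Point], theta: float) -> tuple[int, list[int]]:
    """Steps b to d: build candidates, score them, return the best one."""
    n = len(points)
    candidates = [candidate_cluster(i, points, theta) for i in range(n)]
    scores = [stand_in_score(i, candidates[i], points, theta) for i in range(n)]
    # max() returns the first maximal element, so ties go to the lowest index.
    center = max(range(n), key=lambda i: scores[i])
    return center, candidates[center]
```

With the points `(0, 0)`, `(1, 0)`, `(0, 1)`, `(10, 10)` and θ = 2, the candidate cluster of the first point contains the first three points, and under the stand-in score that point is selected as the center.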
Further, after the candidate cluster corresponding to the selected input datum is added to the final clustering, the method further comprises:
e. deleting from the input data set the input data that have been added to the final clustering, selecting again, from the current input data set, the input datum with the largest probability of being a cluster center, and adding the candidate cluster corresponding to the selected input datum to the final clustering;
judging whether the number of input data in the set is zero; if so, terminating; otherwise, continuing with step e.
An adaptive, stable, and efficient clustering system, comprising:
an initialization module, for obtaining a set of input data $P = \{p_1, \ldots, p_n\}$ containing $n$ input data, and obtaining a cluster-radius threshold θ;
a candidate cluster establishing module, for adding $p_i$ and every input datum in the set whose distance from $p_i$ is less than θ to the candidate cluster $C_{p_i}$ corresponding to $p_i$, where $p_i$ denotes the i-th input datum in the set;
a probability evaluation module, for letting the candidate cluster $C_{p_i}$ contain $m$ input data and the function $d(p_i, p_j)$ be the distance between two input data $p_i$ and $p_j$, and calculating the probability that $p_i$ is a cluster center as [formula missing in source];
a cluster screening module, for selecting, from the input data of the set, the input datum with the largest probability of being a cluster center, and adding the candidate cluster corresponding to the selected input datum to the final clustering.
Further, the system also comprises:
a deletion module, for deleting from the input data set the input data that have been added to the final clustering, selecting again, from the current input data set, the input datum with the largest probability of being a cluster center, and adding the candidate cluster corresponding to the selected input datum to the final clustering;
a first detection module, for judging whether the number of input data in the set is zero; if so, terminating; otherwise, continuing with step e.
The beneficial effects of the invention are as follows. The invention proposes a new adaptive, stable, and efficient clustering method and system. The method does not require the number of partitions to be set in advance and adapts to the input data to produce an appropriate partition. It yields a stable clustering result for identical input. Compared with traditional methods, it is also computationally efficient, achieving $O(n^2)$ computational complexity, so it is suitable for a wide range of current mobile intelligent terminals.
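As a rough illustration of where a quadratic cost can come from (a count made here under the assumption that the distance function is symmetric, not an analysis taken from the patent), building all candidate clusters needs one distance evaluation per unordered pair of input data:

$$
\sum_{i=1}^{n-1} (n - i) = \frac{n(n-1)}{2} = O(n^2).
$$

For $n = 1000$ this is 499,500 distance evaluations, whereas a cubic-cost method such as the QT method mentioned in the background would be on the order of $10^9$ basic operations. The count covers only the candidate-cluster construction; the cost of the selection rounds depends on how they are implemented.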
Brief description of the drawings
Fig. 1 is a flow chart of a first embodiment of the adaptive, stable, and efficient clustering method of the invention;
Fig. 2 is a flow chart of a second embodiment of the adaptive, stable, and efficient clustering method of the invention;
Fig. 3 is a block diagram of a first embodiment of the adaptive, stable, and efficient clustering system of the invention;
Fig. 4 is a block diagram of a second embodiment of the adaptive, stable, and efficient clustering system of the invention.
Detailed description of the embodiments
The technical solution of the invention is further described below with reference to the accompanying drawings and specific embodiments.
The flow of the first embodiment of the adaptive, stable, and efficient clustering method of the invention is shown in Fig. 1:
Step 101: obtain a set of input data $P = \{p_1, \ldots, p_n\}$ containing $n$ input data, and obtain a cluster-radius threshold θ;
Step 102: add $p_i$ and every input datum in the set whose distance from $p_i$ is less than θ to the candidate cluster $C_{p_i}$ corresponding to $p_i$, where $p_i$ denotes the i-th input datum in the set;
Step 103: let the candidate cluster $C_{p_i}$ contain $m$ input data and the function $d(p_i, p_j)$ be the distance between two input data $p_i$ and $p_j$, and calculate the probability that $p_i$ is a cluster center as [formula missing in source];
Step 104: from the input data of the set, select the input datum with the largest probability of being a cluster center, and add the candidate cluster corresponding to the selected input datum to the final clustering.
This embodiment proposes an adaptive, stable, and efficient clustering method. The user does not need to preset the number of clusters, and the method generates the clustering result adaptively. It yields a stable clustering result for identical input data. Compared with traditional algorithms, the method is efficient, with a computational complexity of $O(n^2)$.
The flow of the second embodiment of the adaptive, stable, and efficient clustering method of the invention is shown in Fig. 2:
Step 201: obtain a set of input data $P = \{p_1, \ldots, p_n\}$ containing $n$ input data, and obtain a cluster-radius threshold θ.
Let $q_i$ be the probability that input datum $p_i$ becomes a cluster center, and initialize every $q_i$ to 0. Let $C_{p_i}$ be the cluster corresponding to $p_i$ and initialize it [formula missing in source]. Initialize the clustering result [formula missing in source]. Let the function $d(p_i, p_j)$ be the measure of the distance between two input data $p_i$ and $p_j$.
Step 202: add $p_i$ and every input datum in the set whose distance from $p_i$ is less than θ to the candidate cluster $C_{p_i}$ corresponding to $p_i$, where $p_i$ denotes the i-th input datum in the set.
Step 203: let the candidate cluster $C_{p_i}$ contain $m$ input data and the function $d(p_i, p_j)$ be the distance between two input data $p_i$ and $p_j$, and calculate the probability that $p_i$ is a cluster center as [formula missing in source].
Step 204: from the input data of the set, select the input datum with the largest probability of being a cluster center, and add the candidate cluster corresponding to the selected input datum to the final clustering.
Step 205: delete from the input data set the input data that have been added to the final clustering, select again, from the current input data set, the input datum with the largest probability of being a cluster center, and add the candidate cluster corresponding to the selected input datum to the final clustering.
Step 206: judge whether the number of input data in the set is zero; if so, terminate; otherwise, continue with step 205.
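Steps 201 to 206 can be sketched as a single loop. The following Python is a hedged illustration, not the patent's reference implementation. Several details are assumptions made here: the probabilities $q_i$ are computed once and reused in later rounds, a selected candidate cluster is restricted to the input data still in the set, ties are broken by the lowest index, and `stand_in_score` is a hypothetical placeholder because the formula of step 203 is not reproduced in this text.

```python
import math


def adaptive_cluster(points, theta, score, d=math.dist):
    """Greedy clustering loop; returns a list of (center_index, member_indices)."""
    n = len(points)
    # Step 202: pairwise distances and candidate clusters.
    dist = [[d(points[i], points[j]) for j in range(n)] for i in range(n)]
    candidates = [
        [j for j in range(n) if j == i or dist[i][j] < theta]
        for i in range(n)
    ]
    # Step 203: center scores q_i, computed once (assumption).
    q = [score([dist[i][j] for j in candidates[i]], theta) for i in range(n)]

    remaining = set(range(n))
    result = []
    while remaining:  # Step 206: stop when no input data are left.
        # Steps 204/205: best-scoring remaining point, lowest index on ties.
        center = max(sorted(remaining), key=lambda i: q[i])
        # Keep only members that have not been assigned yet (assumption).
        members = [j for j in candidates[center] if j in remaining]
        result.append((center, members))
        remaining.difference_update(members)
    return result


def stand_in_score(distances, theta):
    """Hypothetical placeholder score, NOT the patent's formula."""
    return sum(theta - x for x in distances)
```

Because the selected center always belongs to its own candidate cluster, every round removes at least one input datum, so the loop terminates. The loop contains no random choices, which matches the stability property described in the text, and each returned pair carries the center index together with the cluster members.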
Fig. 3 shows the block diagram of the first embodiment of the adaptive, stable, and efficient clustering system of the invention. The system includes an initialization module 310, a candidate cluster establishing module 320, a probability evaluation module 330, and a cluster screening module 340.
The initialization module 310 obtains a set of input data $P = \{p_1, \ldots, p_n\}$ containing $n$ input data and obtains a cluster-radius threshold θ. The candidate cluster establishing module 320 adds $p_i$ and every input datum in the set whose distance from $p_i$ is less than θ to the candidate cluster $C_{p_i}$ corresponding to $p_i$, where $p_i$ denotes the i-th input datum in the set. The probability evaluation module 330 lets the candidate cluster $C_{p_i}$ contain $m$ input data and the function $d(p_i, p_j)$ be the distance between two input data $p_i$ and $p_j$, and calculates the probability that $p_i$ is a cluster center as [formula missing in source]. The cluster screening module 340 selects, from the input data of the set, the input datum with the largest probability of being a cluster center, and adds the candidate cluster corresponding to the selected input datum to the final clustering.
Fig. 4 shows the block diagram of the second embodiment of the adaptive, stable, and efficient clustering system of the invention. The system includes an initialization module 310, a candidate cluster establishing module 320, a probability evaluation module 330, a cluster screening module 340, a deletion module 350, and a first detection module 360.
The initialization module 310 obtains a set of input data $P = \{p_1, \ldots, p_n\}$ containing $n$ input data and obtains a cluster-radius threshold θ. The candidate cluster establishing module 320 adds $p_i$ and every input datum in the set whose distance from $p_i$ is less than θ to the candidate cluster $C_{p_i}$ corresponding to $p_i$, where $p_i$ denotes the i-th input datum in the set. The probability evaluation module 330 lets the candidate cluster $C_{p_i}$ contain $m$ input data and the function $d(p_i, p_j)$ be the distance between two input data $p_i$ and $p_j$, and calculates the probability that $p_i$ is a cluster center as [formula missing in source], $1 \le j \le m$. The cluster screening module 340 selects, from the input data of the set, the input datum with the largest probability of being a cluster center, and adds the candidate cluster corresponding to the selected input datum to the final clustering. The deletion module 350 deletes from the input data set the input data that have been added to the final clustering, selects again, from the current input data set, the input datum with the largest probability of being a cluster center, and adds the candidate cluster corresponding to the selected input datum to the final clustering. The first detection module 360 judges whether the number of input data in the set is zero; if so, processing terminates; otherwise, it continues with step e.
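The module structure of Fig. 4 can be mirrored in code by giving each module one method. The following Python sketch is only one possible mapping, written under assumptions: the class and method names are hypothetical, distances are Euclidean over 2-D points, the evaluation result is computed once, and `score` is a caller-supplied stand-in because the formula used by module 330 is not reproduced in this text.

```python
import math
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ClusteringSystem:
    points: list[tuple[float, float]]      # module 310: input set P
    theta: float                           # module 310: radius threshold
    score: Callable[[list[float]], float]  # stand-in used by module 330
    final_clusters: list[list[int]] = field(default_factory=list)

    def __post_init__(self):
        self.remaining = set(range(len(self.points)))
        self.candidates = self._build_candidates()  # module 320
        self.q = self._evaluate()                   # module 330

    def _build_candidates(self):
        pts, n = self.points, len(self.points)
        return {
            i: [j for j in range(n)
                if j == i or math.dist(pts[i], pts[j]) < self.theta]
            for i in range(n)
        }

    def _evaluate(self):
        pts = self.points
        return {
            i: self.score([math.dist(pts[i], pts[j]) for j in members])
            for i, members in self.candidates.items()
        }

    def screen(self):  # module 340: pick the best remaining center
        center = max(sorted(self.remaining), key=self.q.__getitem__)
        cluster = [j for j in self.candidates[center] if j in self.remaining]
        self.final_clusters.append(cluster)
        return cluster

    def delete(self, cluster):  # module 350: remove clustered input data
        self.remaining -= set(cluster)

    def is_done(self):  # module 360: is the input set empty?
        return not self.remaining

    def run(self):
        while not self.is_done():
            self.delete(self.screen())
        return self.final_clusters
```

For example, `ClusteringSystem(points=[(0.0, 0.0), (0.0, 1.0), (5.0, 5.0)], theta=1.5, score=len).run()` uses the member count as a stand-in score and returns `[[0, 1], [2]]`.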
The advantages of the adaptive, stable, and efficient clustering method proposed by the invention include the following. First, the method does not require the number of final clusters to be set in advance and adapts to the input data to produce appropriate clusters, so it can be applied widely in many scenarios. Second, the method yields a stable clustering result for identical input, so it can be executed repeatedly while keeping the service consistent. Third, compared with traditional methods, the method is computationally efficient, achieving $O(n^2)$ computational complexity, so it is suitable for a wide range of current mobile intelligent terminals. Fourth, the method obtains the center point of each cluster at the same time as the final clustering.
A person of ordinary skill in the art will understand that all or part of the processes in the above method embodiments can be implemented by a computer program instructing the relevant hardware. The program can be stored in a computer-readable storage medium and, when executed, can include the processes of the above method embodiments. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.
The technical principle of the invention has been described above with reference to specific embodiments. These descriptions are intended only to explain the principle of the invention and shall not be construed in any way as limiting its scope of protection. Based on the explanation herein, a person skilled in the art can conceive of other specific embodiments of the invention without creative effort, and these embodiments also fall within the scope of protection of the invention.
Claims (2)
1. An adaptive, stable, and efficient clustering system, characterized by comprising:
an initialization module, for obtaining a set of input data $P = \{p_1, \ldots, p_n\}$ containing $n$ input data, and obtaining a cluster-radius threshold θ;
a candidate cluster establishing module, for adding $p_i$ and every input datum in the set whose distance from $p_i$ is less than θ to the candidate cluster $C_{p_i}$ corresponding to $p_i$, where $p_i$ denotes the i-th input datum in the set;
a probability evaluation module, for letting the candidate cluster $C_{p_i}$ contain $m$ input data and the function $d(p_i, p_j)$ be the distance between two input data $p_i$ and $p_j$, and calculating the probability that $p_i$ is a cluster center as [formula missing in source];
a cluster screening module, for selecting, from the input data of the set, the input datum with the largest probability of being a cluster center, and adding the candidate cluster corresponding to the selected input datum to the final clustering.
2. The system as claimed in claim 1, characterized by further comprising:
a deletion module, for deleting from the input data set the input data that have been added to the final clustering, selecting again, from the current input data set, the input datum with the largest probability of being a cluster center, and adding the candidate cluster corresponding to the selected input datum to the final clustering;
a first detection module, for judging whether the number of input data in the set is zero; if so, terminating; otherwise, continuing with step e.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310082671.2A CN103207896B (en) | 2013-03-14 | 2013-03-14 | Method and system for stable and efficient self-adaptive clustering |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103207896A CN103207896A (en) | 2013-07-17 |
CN103207896B true CN103207896B (en) | 2017-02-01 |
Family
ID=48755118
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201310082671.2A Expired - Fee Related CN103207896B (en) | 2013-03-14 | 2013-03-14 | Method and system for stable and efficient self-adaptive clustering |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103207896B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103559502A (en) * | 2013-10-25 | 2014-02-05 | 华南理工大学 | Pedestrian detection system and method based on adaptive clustering analysis |
CN104702432B (en) * | 2014-01-15 | 2018-03-30 | 杭州海康威视系统技术有限公司 | The method and server alerted based on band of position division |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101308496A (en) * | 2008-07-04 | 2008-11-19 | 沈阳格微软件有限责任公司 | Large scale text data external clustering method and system |
EP2184692A1 (en) * | 2008-10-28 | 2010-05-12 | Sony Corporation | Information processing |
CN101989281A (en) * | 2009-08-03 | 2011-03-23 | 中国移动通信集团公司 | Clustering method and device |
CN102289478A (en) * | 2011-08-01 | 2011-12-21 | 江苏广播电视大学 | System and method for recommending video on demand based on fuzzy clustering |
Non-Patent Citations (1)
Title |
---|
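"Improved CLARA clustering algorithm combined with SOFM" (结合SOFM的改进CLARA聚类算法); Duan Mingxiu (段明秀); Computer Engineering (《计算机工程》); 2010-11-30; Vol. 46, No. 22; pp. 210-212 * |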
Also Published As
Publication number | Publication date |
---|---|
CN103207896A (en) | 2013-07-17 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Kourtit et al. | Smart cities in perspective–a comparative European study by means of self-organizing maps | |
Chen et al. | Exploiting spatio-temporal user behaviors for user linkage | |
CN106681996B (en) | The method and apparatus for determining interest region in geographic range, point of interest | |
Le Falher et al. | Where is the Soho of Rome? Measures and algorithms for finding similar neighborhoods in cities | |
CN106162544B (en) | A kind of generation method and equipment of geography fence | |
CN107395680B (en) | Shop group's information push and output method and device, equipment | |
CN106844407A (en) | Label network production method and system based on data set correlation | |
CN109657063A (en) | A kind of processing method and storage medium of magnanimity environment-protection artificial reported event data | |
CN110148032A (en) | Products Show method, apparatus, storage medium and server based on geographical location | |
CN113837635A (en) | Risk detection processing method, device and equipment | |
CN106488401B (en) | Generate the method and device of seamless adjacent geography fence | |
Colantonio et al. | Smart regions in Italy: a comparative study through self–organizing maps | |
CN116993555A (en) | Partition method, system and storage medium for identifying territory space planning key region | |
CN106649380A (en) | Hot spot recommendation method and system based on tag | |
CN103207896B (en) | Method and system for stable and efficient self-adaptive clustering | |
Domagala | Internet of Things and Big Data technologises as an opportunity for organizations based on Knowledge Management | |
CN109670873A (en) | Real estate opens up objective method, apparatus and server | |
CN106021245A (en) | Visualization method and visualization device for data | |
Li et al. | Annotating semantic tags of locations in location-based social networks | |
CN112597399A (en) | Graph data processing method and device, computer equipment and storage medium | |
Jiang et al. | From social community to spatio-temporal information: A new method for mobile data exploration | |
Rai et al. | Top-k community similarity search over large road-network graphs | |
CN108830298A (en) | A kind of method and device of determining user characteristics label | |
CN113988594A (en) | Multi-target site selection method and system for disaster backup data center | |
CN113792206A (en) | Data processing method and device, computer readable storage medium and computer equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee |
Granted publication date: 20170201 |