CN105224958A

CN105224958A - Clustering method and device

Info

Publication number: CN105224958A
Application number: CN201510727800.8A
Authority: CN
Inventors: 冯研
Original assignee: TCL Corp
Current assignee: TCL Corp
Priority date: 2015-10-29
Filing date: 2015-10-29
Publication date: 2016-01-06

Abstract

The invention provides a clustering method and device. The method comprises: calculating the similarity between the data carried by an ant and the adjacent data in a square neighborhood; if that similarity is not greater than a first threshold, moving the data carried by the ant until, for any data item in the square neighborhood, the similarity between that data item and its adjacent data is greater than the first threshold; increasing the area of the square neighborhood and then repeating the similarity calculation and data movement; and, once the area of the square neighborhood has been increased to a preset value and the similarity calculation and data movement have been repeated a preset number of times, iteratively clustering the data in the plane, the clustering process ending when the cost of exchanging any non-center data item in each cluster with the center of that cluster is not less than zero. The technical solution reduces the time cost of computation on the one hand and, on the other hand, improves the clustering result and the effectiveness, efficiency, and robustness of the clustering.

Description

Clustering method and device

Technical field

The invention belongs to the field of data mining, and in particular relates to a clustering method and device.

Background art

Clustering refers to the process of dividing a set of physical or abstract objects into multiple classes composed of similar objects. A cluster generated by clustering is a set of data objects that are similar to the other objects in the same cluster and dissimilar to the objects in other clusters. Cluster analysis (or a clustering algorithm) is a statistical analysis method for studying the classification of samples or indicators, and is also an important algorithm in data mining. With the explosive growth in the volume of all kinds of information, existing cluster analysis techniques face greater challenges. Research on clustering algorithms has therefore become one of the frontier and hot topics in several related research fields such as data mining, pattern recognition, and statistics.

Among the many existing clustering algorithms, the ant colony clustering algorithm is a clustering algorithm based on swarm intelligence, inspired by the way ants pile up corpses. Ants stack the corpses scattered around the nest into separate piles according to differences in corpse size, and a larger pile attracts more ants to stack more corpses there. The ant colony algorithm can cluster autonomously without any prior knowledge or human intervention, is relatively robust, and is easy to combine with other algorithms.

In practical applications, however, the time cost of the above ant colony clustering algorithm is high. In particular, as the amount of data grows, the number of ants, the size of the space, and the number of iterations must all grow accordingly, so the running time increases many times over. Further study shows that this running time is mainly consumed by the aimless random movement of unloaded ants and by repeated movement within a small range. Moreover, when the ant colony clustering algorithm ends, some discrete data has still not been moved to its designated pile and appears as isolated points and small piles, so the clustering result is poor.

Summary of the invention

The object of the present invention is to provide a clustering method and device that reduce the running-time cost of cluster analysis and improve the clustering result of the clustering algorithm.

A first aspect of the present invention provides a clustering method, the method comprising:

calculating the similarity between the data carried by an ant and the adjacent data in a square neighborhood, the ant being bound to the data;

if the similarity between the data carried by the ant and the adjacent data in the square neighborhood is not greater than a first threshold, moving the data carried by the ant until, for any data item in the square neighborhood, the similarity between that data item and its adjacent data is greater than the first threshold;

increasing the area of the square neighborhood and then repeating the similarity calculation and data movement;

once the area of the square neighborhood has been increased to a preset value and the similarity calculation and data movement have been repeated a preset number of times, iteratively clustering the data in a plane, the clustering process ending when the cost of exchanging any non-center data item in each cluster with the center of that cluster is not less than zero, wherein the plane contains the square neighborhood after its area has been increased.

A second aspect of the present invention provides a clustering apparatus, the apparatus comprising:

a similarity calculation module, configured to calculate the similarity between the data carried by an ant and the adjacent data in a square neighborhood, the ant being bound to the data;

a data movement module, configured to, when the similarity between the data carried by the ant and the adjacent data in the square neighborhood is not greater than a first threshold, move the data carried by the ant until, for any data item in the square neighborhood, the similarity between that data item and its adjacent data is greater than the first threshold;

an area increasing module, configured to increase the area of the square neighborhood;

wherein the similarity calculation module and the data movement module are further configured to repeat the similarity calculation and data movement after the area increasing module has increased the area of the square neighborhood; and

a clustering module, configured to, once the area of the square neighborhood has been increased to a preset value and the similarity calculation and data movement have been repeated a preset number of times, iteratively cluster the data in a plane, the clustering process ending when the cost of exchanging any non-center data item in each cluster with the center of that cluster is not less than zero, wherein the plane contains the square neighborhood after its area has been increased.

It can be seen from the above technical solution that, on the one hand, because each ant is bound to the data it carries, the data can move freely and the unloaded-ant situation of the existing ant colony clustering algorithm does not arise, so the time cost of computation is reduced. On the other hand, the clustering is iterative and ends when the cost of exchanging any non-center data item in each cluster with the center of that cluster is not less than zero, so the situation in which isolated data is not moved to its designated pile is avoided, and the clustering result and the effectiveness, efficiency, and robustness of the clustering are improved.

Brief description of the drawings

Fig. 1 is a schematic flowchart of the clustering method provided by Embodiment 1 of the present invention;

Fig. 2 is a schematic structural diagram of the clustering apparatus provided by Embodiment 2 of the present invention;

Fig. 3 is a schematic structural diagram of the clustering apparatus provided by Embodiment 3 of the present invention;

Fig. 4 is a schematic structural diagram of the clustering apparatus provided by Embodiment 4 of the present invention;

Fig. 5 is a schematic structural diagram of the clustering apparatus provided by Embodiment 5 of the present invention;

Fig. 6-a is a schematic structural diagram of the clustering apparatus provided by Embodiment 6 of the present invention;

Fig. 6-b is a schematic structural diagram of the clustering apparatus provided by Embodiment 7 of the present invention;

Fig. 6-c is a schematic structural diagram of the clustering apparatus provided by Embodiment 8 of the present invention;

Fig. 6-d is a schematic structural diagram of the clustering apparatus provided by Embodiment 9 of the present invention.

Detailed description

To make the object, technical solution, and beneficial effects of the present invention clearer, the invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described herein serve only to explain the invention and are not intended to limit it.

An embodiment of the present invention provides a clustering method, the method comprising: calculating the similarity between the data carried by an ant and the adjacent data in a square neighborhood, the ant being bound to the data;

if the similarity between the data carried by the ant and the adjacent data in the square neighborhood is not greater than a first threshold, moving the data carried by the ant until, for any data item in the square neighborhood, the similarity between that data item and its adjacent data is greater than the first threshold; increasing the area of the square neighborhood and then repeating the similarity calculation and data movement; and, once the area of the square neighborhood has been increased to a preset value and the similarity calculation and data movement have been repeated a preset number of times, iteratively clustering the data in a plane, the clustering process ending when the cost of exchanging any non-center data item in each cluster with the center of that cluster is not less than zero, wherein the plane contains the square neighborhood after its area has been increased. The invention also provides a corresponding clustering apparatus. Both are described in detail below.

Referring to Fig. 1, which is a schematic flowchart of the clustering method provided by Embodiment 1 of the present invention, the method mainly comprises the following steps S101 to S105:

S101: calculate the similarity between the data carried by an ant and the adjacent data in a square neighborhood, the ant being bound to the data.

In this embodiment, during the initialization phase the data is randomly projected onto a two-dimensional plane, and the square neighborhood is a region delimited in that plane. During initialization, the area of the square neighborhood can be set relatively small as required. Unlike the existing ant colony clustering algorithm, in this embodiment each ant is bound to the data it carries. That is, one ant corresponds to one data item and the number of ants equals the number of data items, so an ant is never unloaded, i.e. it never moves without carrying data. In this embodiment, an ant can be understood as a data element that moves intelligently.

In one embodiment of the invention, the similarity between the data carried by an ant and the adjacent data in the square neighborhood is calculated according to the following formula (1):

f(o_i) = \max\left\{0, \frac{1}{s^2} \sum_{o_j \in Neigh_{s \times s}(r)} \left[1 - \frac{d(o_i, o_j)}{\alpha}\right]\right\}

Formula (1)

where α is a scale factor that adjusts the similarity between data, Neigh_{s×s}(r) denotes the square neighborhood of area s × s centered at position r, d(o_i, o_j) is the Euclidean distance between data o_i and data o_j, and o_i and o_j are adjacent data in the square neighborhood of area s × s centered at position r. It should be noted that the data carried by an ant may be data already present in the square neighborhood, or data moved in from outside the square neighborhood.
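Formula (1) can be sketched in a few lines of Python. This is only an illustration: the function name `similarity`, the argument names, and the choice to pass the neighborhood as a ready-made list of feature vectors are assumptions, and `math.dist` requires Python 3.8 or later.

```python
import math

def similarity(o_i, neighbors, s, alpha):
    """Formula (1): similarity of o_i to the data in its s x s neighborhood.

    o_i       -- feature vector of the data item carried by the ant
    neighbors -- feature vectors of the data found in the square neighborhood
                 (how they are collected is left to the caller; assumption)
    s         -- side length of the square neighborhood
    alpha     -- scale factor regulating similarity between data
    """
    total = sum(1.0 - math.dist(o_i, o_j) / alpha for o_j in neighbors)
    # Negative averages are clipped to zero, as the max{0, ...} in formula (1) requires.
    return max(0.0, total / (s * s))
```

Identical neighbors each contribute 1 to the sum, while neighbors farther than α away contribute negative terms, so a neighborhood dominated by dissimilar data yields a similarity of zero.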

For an ant responsible for moving data, the probability P_pick that it picks up a data item o_i on encountering it, and the probability P_drop that it puts the data item down, are obtained from the similarity calculated by formula (1). P_pick and P_drop are calculated as follows:

P_{pick} = \frac{2 \exp(-c f(o_i))}{1 + \exp(-c f(o_i))}

Formula (2)

P_{drop} = \frac{1 - \exp(-c f(o_i))}{1 + \exp(-c f(o_i))}

Formula (3).

In formulas (2) and (3), c is a constant used to adjust the convergence speed of the algorithm. If an ant is unloaded, the pick-up probability P_pick is calculated when it encounters data. If P_pick is greater than a set threshold, the ant picks up the data and moves randomly. If the put-down probability P_drop at the place to which a loaded ant has moved is greater than another set threshold, the ant puts the data down, marks itself as unloaded, and proceeds to the next cycle.
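Formulas (2) and (3) translate directly into code. The sketch below uses the hypothetical names `p_pick` and `p_drop` and takes the similarity f(o_i) as an already computed number.

```python
import math

def p_pick(f, c):
    """Formula (2): probability of picking up a data item with similarity f."""
    e = math.exp(-c * f)
    return 2.0 * e / (1.0 + e)

def p_drop(f, c):
    """Formula (3): probability of putting down a data item with similarity f."""
    e = math.exp(-c * f)
    return (1.0 - e) / (1.0 + e)
```

The two probabilities sum to one for any f and c, and P_drop equals tanh(c·f/2), so a low similarity favors picking up and a high similarity favors putting down.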

S102: if the similarity between the data carried by the ant and the adjacent data in the square neighborhood is not greater than the first threshold, move the data carried by the ant until, for any data item in the square neighborhood, the similarity between that data item and its adjacent data is greater than the first threshold.

It should be noted that step S102 is a loop that iterates while the area of the square neighborhood is held fixed. This loop converges, or ends, when for any data item o_i in the square neighborhood the similarity f(o_i) between o_i and its adjacent data o_j is greater than the first threshold, that is, when

f(o_i) > \epsilon + \beta \exp\left(\frac{1 - n}{\gamma}\right)

Formula (4)

holds, where the similarity f(o_i) between any data item o_i in the square neighborhood and its adjacent data o_j can be calculated according to formula (1) above, and ε, β and γ are constants. Once the similarity between any data item in the square neighborhood and its adjacent data satisfies formula (4), i.e. is greater than the first threshold, the flow proceeds to step S103.
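The convergence test of formula (4) can be sketched as follows. The text does not define n, so the sketch simply takes it as a caller-supplied number; the function names are likewise hypothetical.

```python
import math

def first_threshold(n, eps, beta, gamma):
    """Right-hand side of formula (4): eps + beta * exp((1 - n) / gamma).

    n is not defined in the text; it is passed in as-is (assumption).
    """
    return eps + beta * math.exp((1.0 - n) / gamma)

def satisfies_formula_4(f, n, eps, beta, gamma):
    """True when the similarity f exceeds the first threshold."""
    return f > first_threshold(n, eps, beta, gamma)
```

For positive β and γ the threshold falls from ε + β at n = 1 toward ε as n grows, so the condition becomes easier to satisfy for larger n.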

S103: determine whether the similarity calculation and data movement have been repeated the preset number of times and whether the area of the square neighborhood has reached the preset value.

It should be noted that, in this embodiment, the similarity calculation and data movement are repeated with the area of the square neighborhood held fixed. Each time the area of the square neighborhood is increased, it is determined, during or after the repetition of the similarity calculation and data movement, whether the area of the square neighborhood has been increased to the preset value and, provided it has, whether the similarity calculation and data movement have been repeated the preset number of times.

S104: increase the area of the square neighborhood.

In this embodiment, increasing the area S × S of the square neighborhood may be a self-increment or a stepwise increase of S.

S105: once the area of the square neighborhood has been increased to the preset value and the similarity calculation and data movement have been repeated the preset number of times, iteratively cluster the data in the plane, the clustering process ending when the cost of exchanging any non-center data item in each cluster with the center of that cluster is not less than zero, wherein the plane contains the square neighborhood after its area has been increased.

In this embodiment, the preset number of times and the preset value of the area of the square neighborhood together form the convergence or termination condition for repeating steps S101 to S102. That is, when the area of the square neighborhood has been increased to the preset value, if the similarity calculation and data movement have been repeated the preset number of times, the movement of data stops and iterative clustering of the data in the plane begins.
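The interplay of steps S101 to S104 can be sketched as a small control loop. This is one possible reading of the flow, not a definitive one: the callback `move_until_converged`, the step size `s_step`, and the decision to count every pass toward the preset number of times are all assumptions made for illustration.

```python
def movement_phase(move_until_converged, s_start, s_max, s_step, preset_times):
    """Run the data-movement phase and return (final side length, passes done).

    move_until_converged -- hypothetical callback performing S101-S102 for a
                            fixed neighborhood side length s
    s_max                -- side length corresponding to the preset area
    preset_times         -- preset number of repetitions
    """
    s = s_start
    passes = 0
    while True:
        move_until_converged(s)          # S101-S102 with the area held fixed
        passes += 1
        # S103: stop once the area is at its preset value AND enough passes ran.
        if s >= s_max and passes >= preset_times:
            return s, passes
        if s < s_max:
            s = min(s + s_step, s_max)   # S104: enlarge the square neighborhood
```

For example, with `s_start=1`, `s_max=3`, `s_step=1` and `preset_times=5`, the callback is invoked with side lengths 1, 2, 3, 3, 3 before the movement phase ends and step S105 begins.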

Clustering the data in the plane is carried out by an iterative algorithm, which converges, or ends, when the cost of exchanging any non-center data item in each cluster with the center of that cluster is not less than zero.

As one embodiment of the invention, iteratively clustering the data in the plane until the cost of exchanging any non-center data item in each cluster with the center of that cluster is not less than zero comprises the following steps S1051 to S1054:

S1051: cluster the data in the plane into a predetermined number of clusters.

In this embodiment, the plane referred to in step S1051 is the two-dimensional plane onto which the data was projected during the initialization phase. It evidently contains the square neighborhood after its area has been increased.

Clustering the data in the plane into the predetermined number of clusters is also an iterative process, realized by the following steps S1 and S2:

S1: take the data contained in a first square neighborhood, namely the square neighborhood in the plane that contains the most data, as the first cluster.

S2: in the area remaining after excluding the regions occupied by the clusters already formed, take a second square neighborhood that contains the most data; when the average similarity between the data contained in the second square neighborhood and the data contained in the most recently formed cluster is less than a second threshold, take the data contained in the second square neighborhood as a new cluster.

Specifically, in the area remaining after excluding the region occupied by the first cluster, the second square neighborhood containing the most data is taken; when the average similarity between the data contained in the second square neighborhood and the data contained in the first cluster is less than the second threshold, the second square neighborhood is taken as the second cluster. Similarly, in the area remaining after excluding the regions occupied by the first and second clusters, a third square neighborhood containing the most data is taken; when the average similarity between the data contained in the third square neighborhood and the data contained in the second cluster is less than the second threshold, the third square neighborhood is taken as the third cluster. And so on: the process of excluding the regions occupied by the clusters already formed and obtaining a new cluster is repeated until the predetermined number of clusters has been formed in the plane.
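Steps S1 and S2 can be sketched under some simplifying assumptions. The sketch assumes the candidate square neighborhoods have already been enumerated as non-overlapping lists of data, so that "excluding the regions already occupied" reduces to never revisiting a square; `avg_similarity` is a caller-supplied hypothetical function, and skipping a square that fails the second-threshold test is also an assumption, since the text does not say what happens in that case.

```python
def form_clusters(squares, avg_similarity, second_threshold, k):
    """Form up to k clusters from pre-enumerated square neighborhoods.

    squares        -- non-empty list of lists, one list of data per candidate
                      square (assumed non-overlapping)
    avg_similarity -- hypothetical function(square_data, cluster_data) -> float
    k              -- predetermined number of clusters
    """
    remaining = sorted(squares, key=len, reverse=True)  # densest square first
    clusters = [remaining.pop(0)]                       # S1: first cluster
    for square in remaining:                            # S2: next densest square
        if len(clusters) == k:
            break
        # Compare only with the most recently formed cluster, as in S2.
        if avg_similarity(square, clusters[-1]) < second_threshold:
            clusters.append(square)
    return clusters
```

Any data left outside the returned clusters is handled by the nearest-center assignment described next.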

After the predetermined number of clusters has been formed in the plane, not all data in the plane has necessarily been assigned to a cluster; for various reasons, a few data items may still belong to no cluster. To improve the clustering result, in this embodiment, after the predetermined number of clusters has been formed in the plane, the method further comprises assigning the data in the plane that has not been clustered into any cluster to the cluster whose center is nearest. Specifically, the distance between the point where an unclustered data item D_j is located and the center of each cluster already formed in the plane is calculated; if, among these clusters, the distance to the center of cluster C_j is the smallest, D_j is assigned to cluster C_j.
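The nearest-center assignment just described amounts to an argmin over the cluster centers. A minimal sketch, with the hypothetical name `assign_leftovers` and points given as coordinate tuples in the plane:

```python
import math

def assign_leftovers(points, centers):
    """Return, for each unclustered point, the index of the nearest center."""
    return [
        min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
        for p in points
    ]
```

For instance, with centers at (0, 0) and (10, 10), the points (0, 1) and (9, 9) are assigned to clusters 0 and 1 respectively.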

S1052: extract the center Oc of each cluster formed in step S1051.

Specifically, for the region occupied by any formed cluster C_i, a point m_i can be chosen from that region such that the sum of the distances from m_i to the points where all the data of cluster C_i is located is minimal; the point m_i is then the center Oc of cluster C_i.
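Center extraction can be sketched as a search for the point with the smallest total distance to the cluster's data. The text allows any point in the cluster's region; restricting the candidates to the data points themselves, as below, is a simplifying assumption, and `cluster_center` is a hypothetical name.

```python
import math

def cluster_center(points):
    """Return the data point whose summed distance to all points is minimal."""
    return min(points, key=lambda m: sum(math.dist(m, p) for p in points))
```

For the collinear points (0, 0), (1, 0) and (5, 0), the summed distances are 6, 5 and 9, so (1, 0) is chosen as the center.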

S1053: calculate the cost of exchanging any non-center data item Orandom in each cluster obtained by the clustering with the center Oc of that cluster.

Specifically, the cost E of exchanging any non-center data item Orandom in each cluster with the center Oc of that cluster is calculated by the following formula (5):

E = \sum_{i=1}^{k} \sum_{p \in C_i} |p - m_i|^2

Formula (5)

where p denotes any point of an object in space, which in the above embodiment is any non-center data item Orandom in cluster C_i, and m_i is the center point of cluster C_i, which in the above embodiment is the center Oc of cluster C_i.
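Formula (5) is a sum of squared distances between every data point and the center of its cluster. A minimal sketch, assuming clusters are lists of coordinate tuples and `centers[i]` is the center m_i of `clusters[i]` (both names are hypothetical):

```python
def total_cost(clusters, centers):
    """Formula (5): sum over clusters of squared distances to each center."""
    return sum(
        sum((a - b) ** 2 for a, b in zip(p, m))
        for points, m in zip(clusters, centers)
        for p in points
    )
```

With clusters [(0, 0), (2, 0)] and [(5, 5)] and centers (1, 0) and (5, 5), the cost is 1 + 1 + 0 = 2.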

S1054: if the cost of exchanging data item Orandom with the center Oc of the cluster is less than zero, replace the center Oc with the position of Orandom, which becomes the new center of the cluster.

That is, if the cost E of exchanging data item Orandom with the center Oc of the cluster is less than zero, the position of Orandom replaces the center Oc and becomes the new center of the cluster.

The cost calculation of step S1053 and the center replacement of step S1054 are repeated until the cost of exchanging any non-center data item in each cluster with the center of that cluster is not less than zero. Once this convergence or termination condition is met, the centers finally obtained are the true centers of the clusters.
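The repetition of steps S1053 and S1054 can be sketched for a single cluster. Because formula (5) is a sum of squares and can never be negative on its own, the sketch interprets the "cost of exchanging" as the change in E caused by the swap (new E minus current E); this interpretation, the fixed cluster membership, and the name `refine_center` are assumptions.

```python
def sq_dist(p, m):
    """Squared Euclidean distance between two coordinate tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, m))

def refine_center(points, center):
    """Swap in non-center points until no swap has a negative cost."""
    current = sum(sq_dist(p, center) for p in points)
    improved = True
    while improved:
        improved = False
        for candidate in points:
            if candidate == center:
                continue
            new = sum(sq_dist(p, candidate) for p in points)
            cost = new - current          # S1053: cost of the exchange (assumed delta)
            if cost < 0:                  # S1054: candidate becomes the new center
                center, current = candidate, new
                improved = True
    return center
```

Each accepted swap strictly lowers the cluster's contribution to E, so the loop terminates; starting from center (5, 0) with points (0, 0), (1, 0) and (5, 0), it ends at (1, 0).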

It should be noted that, to improve computation speed, in the above embodiment, after the similarity between the data carried by the ant and the adjacent data in the square neighborhood has been calculated, all data in the plane is arranged in order of similarity. This greatly speeds up the comparison of whether the similarity between the data carried by the ant and the adjacent data in the square neighborhood is greater than the first threshold.

It can be seen from the clustering method illustrated in Fig. 1 that, on the one hand, because each ant is bound to the data it carries, the data can move freely and the unloaded-ant situation of the existing ant colony clustering algorithm does not arise, so the time cost of computation is reduced. On the other hand, the clustering is iterative and ends when the cost of exchanging any non-center data item in each cluster with the center of that cluster is not less than zero, so the situation in which isolated data is not moved to its designated pile is avoided, and the clustering result and the effectiveness, efficiency, and robustness of the clustering are improved.

Referring to Fig. 2, which is a schematic structural diagram of the clustering apparatus provided by Embodiment 2 of the present invention. For ease of description, Fig. 2 shows only the parts relevant to the embodiment. The clustering apparatus illustrated in Fig. 2 may be the entity that executes the clustering method illustrated in Fig. 1, and may be a server. The clustering apparatus illustrated in Fig. 2 mainly comprises a similarity calculation module 201, a data movement module 202, an area increasing module 203, and a clustering module 204, wherein:

the similarity calculation module 201 is configured to calculate the similarity between the data carried by an ant and the adjacent data in a square neighborhood, the ant being bound to the data;

the data movement module 202 is configured to, when the similarity calculated by the similarity calculation module 201 between the data carried by the ant and the adjacent data in the square neighborhood is not greater than a first threshold, move the data carried by the ant until, for any data item in the square neighborhood, the similarity between that data item and its adjacent data is greater than the first threshold;

the area increasing module 203 is configured to increase the area of the square neighborhood;

the similarity calculation module 201 and the data movement module 202 are further configured to repeat the similarity calculation and data movement after the area increasing module 203 has increased the area of the square neighborhood;

the clustering module 204 is configured to, once the area increasing module 203 has increased the area of the square neighborhood to a preset value and the similarity calculation and data movement have been repeated a preset number of times, iteratively cluster the data in a plane, the clustering process ending when the cost of exchanging any non-center data item in each cluster with the center of that cluster is not less than zero, wherein the plane contains the square neighborhood after its area has been increased.

It should be noted that, in the embodiment of the clustering apparatus illustrated in Fig. 2, the division into functional modules is only an example. In practical applications, the above functions may be allocated to different functional modules as required, for example in view of the configuration requirements of the corresponding hardware or the convenience of software implementation; that is, the internal structure of the clustering apparatus may be divided into different functional modules to perform all or part of the functions described above. Moreover, in practical applications, a functional module in this embodiment may be implemented by corresponding hardware, or by corresponding hardware executing corresponding software. For example, the aforesaid similarity calculation module may be hardware that performs the aforesaid calculation of the similarity between the data carried by an ant and the adjacent data in the square neighborhood, such as a similarity calculator, or may be a general-purpose processor or other hardware device capable of executing a corresponding computer program to perform that function. Likewise, the aforesaid clustering module may be hardware that, once the area of the square neighborhood has been increased to the preset value and the similarity calculation and data movement have been repeated the preset number of times, iteratively clusters the data in the plane until the cost of exchanging any non-center data item in each cluster with the center of that cluster is not less than zero, such as a clusterer, or may be a general-purpose processor or other hardware device capable of executing a corresponding computer program to perform that function (the principle described above applies to every embodiment provided in this specification).

The clustering module 204 illustrated in Fig. 2 may comprise a cluster forming unit 301, a center extraction unit 302, a cost calculation unit 303, and a center replacement unit 304, as in the clustering apparatus provided by Embodiment 3 of the present invention shown in Fig. 3, wherein:

the cluster forming unit 301 is configured to cluster the data in the plane into a predetermined number of clusters;

the center extraction unit 302 is configured to extract the center Oc of each cluster formed by the cluster forming unit 301;

the cost calculation unit 303 is configured to calculate the cost of exchanging any non-center data item Orandom in each cluster formed by the cluster forming unit 301 with the center Oc of that cluster;

the center replacement unit 304 is configured to, when the cost of exchanging data item Orandom with the center Oc of the cluster is less than zero, replace the center Oc of the cluster with the position of Orandom, which becomes the new center of the cluster;

the cost calculation unit and the center replacement unit are further configured to repeat the cost calculation and center replacement until the cost of exchanging any non-center data item in each cluster with the center of that cluster is not less than zero.

The cluster forming unit 301 illustrated in Fig. 3 may comprise a first cluster forming unit 401 and a remaining cluster forming unit 402, as in the clustering apparatus provided by Embodiment 4 of the present invention shown in Fig. 4, wherein:

the first cluster forming unit 401 is configured to take the data contained in a first square neighborhood, namely the square neighborhood in the plane that contains the most data, as the first cluster;

the remaining cluster forming unit 402 is configured to take, in the area remaining after excluding the regions occupied by the clusters already formed, a second square neighborhood that contains the most data and, when the average similarity between the data contained in the second square neighborhood and the data contained in the most recently formed cluster is less than a second threshold, take the data contained in the second square neighborhood as a new cluster;

wherein the remaining cluster forming unit 402 is further configured to repeat the process of excluding the regions occupied by the existing clusters and obtaining a new cluster until the predetermined number of clusters has been formed in the plane.

The clustering apparatus illustrated in Fig. 4 may further comprise a dividing module 501, as in the clustering apparatus provided by Embodiment 5 of the present invention shown in Fig. 5. The dividing module 501 is configured to, after the cluster forming unit 301 has formed the predetermined number of clusters in the plane, assign the data in the plane that has not been clustered into any cluster to the cluster whose center is nearest.

The clustering apparatus illustrated in any of Figs. 2 to 5 may further comprise an arrangement module 601, as in the clustering apparatuses provided by Embodiments 6 to 9 of the present invention shown in Figs. 6-a to 6-d. The arrangement module 601 is configured to arrange all data in the plane in order of similarity after the similarity calculation module 201 has calculated the similarity between the data carried by the ant and the adjacent data in the square neighborhood.

It should be noted that the information exchange between the modules and units of the above apparatus, their execution processes, and similar matters are based on the same concept as the method embodiments of the present invention and bring the same technical effects. For details, refer to the description of the method embodiments, which is not repeated here.

A person of ordinary skill in the art will understand that all or some of the steps of the methods of the above embodiments can be completed by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium, which may include read-only memory (ROM), random access memory (RAM), a magnetic disk, an optical disc, and the like.

The clustering method and device provided by the embodiments of the present invention have been described in detail above. Specific examples are used herein to explain the principle and implementation of the invention, and the description of the above embodiments is intended only to help in understanding the method of the invention and its core idea. Meanwhile, a person of ordinary skill in the art may, following the idea of the invention, make changes to the specific implementation and the scope of application. In summary, the content of this specification should not be construed as limiting the present invention.

Claims

1. A clustering method, characterized in that the method comprises:

2. The method of claim 1, characterized in that iteratively clustering the data in the plane, the clustering process ending when the cost of exchanging any non-center data item in each cluster with the center of that cluster is not less than zero, comprises:

clustering the data in the plane into a predetermined number of clusters;

extracting the center Oc of each cluster;

calculating the cost of exchanging any non-center data item Orandom in each cluster obtained by the clustering with the center Oc of that cluster;

if the cost of exchanging data item Orandom with the center Oc of the cluster is less than zero, replacing the center Oc with the position of Orandom, which becomes the new center of the cluster;

repeating the cost calculation and center replacement until the cost of exchanging any non-center data item in each cluster with the center of that cluster is not less than zero.

3. The method of claim 2, characterized in that clustering the data in the plane into a predetermined number of clusters comprises:

taking the data contained in a first square neighborhood, namely the square neighborhood in the plane that contains the most data, as the first cluster;

taking, in the area remaining after excluding the regions occupied by the clusters already formed, a second square neighborhood that contains the most data and, when the average similarity between the data contained in the second square neighborhood and the data contained in the most recently formed cluster is less than a second threshold, taking the data contained in the second square neighborhood as a new cluster;

repeating the process of excluding the regions occupied by the clusters already formed and obtaining a new cluster until the predetermined number of clusters has been formed in the plane.

4. The method of claim 3, characterized in that, after the predetermined number of clusters has been formed in the plane, the method further comprises:

assigning the data in the plane that has not been clustered into any cluster to the cluster whose center is nearest.

5. The method of any one of claims 1 to 4, characterized in that, after calculating the similarity between the data carried by the ant and the adjacent data in the square neighborhood, the method further comprises:

arranging all data in the plane in order of similarity.

6. A clustering apparatus, characterized in that the apparatus comprises:

7. The apparatus of claim 6, characterized in that the clustering module comprises:

a cluster forming unit, configured to cluster the data in the plane into a predetermined number of clusters;

a center extraction unit, configured to extract the center Oc of each cluster formed by the cluster forming unit;

a cost calculation unit, configured to calculate the cost of exchanging any non-center data item Orandom in each cluster obtained by the clustering with the center Oc of that cluster;

a center replacement unit, configured to, when the cost of exchanging data item Orandom with the center Oc of the cluster is less than zero, replace the center Oc with the position of Orandom, which becomes the new center of the cluster;

wherein the cost calculation unit and the center replacement unit are further configured to repeat the cost calculation and center replacement until the cost of exchanging any non-center data item in each cluster with the center of that cluster is not less than zero.

8. The apparatus of claim 7, characterized in that the cluster forming unit comprises:

a first cluster forming unit, configured to take the data contained in a first square neighborhood, namely the square neighborhood in the plane that contains the most data, as the first cluster;

a remaining cluster forming unit, configured to take, in the area remaining after excluding the regions occupied by the clusters already formed, a second square neighborhood that contains the most data and, when the average similarity between the data contained in the second square neighborhood and the data contained in the most recently formed cluster is less than a second threshold, take the data contained in the second square neighborhood as a new cluster;

wherein the remaining cluster forming unit is further configured to repeat the process of excluding the regions occupied by the existing clusters and obtaining a new cluster until the predetermined number of clusters has been formed in the plane.

9. The apparatus of claim 8, characterized in that the apparatus further comprises:

a dividing module, configured to, after the cluster forming unit has formed the predetermined number of clusters in the plane, assign the data in the plane that has not been clustered into any cluster to the cluster whose center is nearest.

10. The apparatus of any one of claims 6 to 9, characterized in that the apparatus further comprises:

an arrangement module, configured to arrange all data in the plane in order of similarity after the similarity calculation module has calculated the similarity between the data carried by the ant and the adjacent data in the square neighborhood.