CN109086831A

CN109086831A - Hybrid Clustering Algorithm based on Fuzzy C-Means Algorithm and artificial bee colony clustering algorithm

Info

Publication number: CN109086831A
Application number: CN201810935647.1A
Authority: CN
Inventors: 李宏伟; 卫建华; 田智慧; 赫晓慧; 郭恒亮; 王晓蕾; 赵姗
Original assignee: Individual
Current assignee: Individual
Priority date: 2018-08-16
Filing date: 2018-08-16
Publication date: 2018-12-25

Abstract

The present invention relates to the technical field of artificial bee colony algorithms, and more particularly to a hybrid clustering algorithm based on the fuzzy C-means algorithm and the artificial bee colony clustering algorithm. The algorithm includes an initialization stage, a leading bee stage, a following bee stage and a scout bee stage, and further includes the following steps. Step 1: after the following bee stage, judge whether the algorithm is in its first cycle; if so, execute Step 2; if not, execute Step 3. Step 2: optimize the current optimal solution by using it as the initial cluster centers of the fuzzy C-means clustering algorithm; if the quality of the optimized solution is higher than that of the current optimal solution, replace the current optimal solution with the optimized solution, otherwise discard the optimized solution; at the same time, add 1 to the iteration count of the corresponding nectar source, and then enter the scout bee stage. Step 3: judge whether the optimal solution changed during the following bee stage; if so, execute Step 2; if not, enter the scout bee stage. The algorithm provided by the present invention has high clustering accuracy, fast convergence and high optimization accuracy.

Description

Hybrid Clustering Algorithm based on Fuzzy C-Means Algorithm and artificial bee colony clustering algorithm

Technical field

The present invention relates to the technical field of artificial bee colony algorithms, and in particular to a hybrid clustering algorithm based on the fuzzy C-means algorithm and the artificial bee colony clustering algorithm.

Background art

About the fuzzy C-means algorithm:

In 1974, Dunn proposed the fuzzy C-means (FCM) clustering algorithm on the basis of Bezdek's research. It is widely used in many fields such as geospatial information, image processing and data mining. The biggest difference between the fuzzy C-means algorithm and the hard C-means algorithm lies in the membership degree of an object: hard C-means requires the membership degree of an object to be either 0 or 1, whereas fuzzy C-means allows it to take any value in [0, 1], including 0 and 1. This feature gives fuzzy C-means greater flexibility: an object may belong to class C₁ and also to class C₂, only to different degrees.

The basic process of the fuzzy C-means clustering algorithm is as follows. First, the distribution of the objects in the data set is analyzed, and a suitable number of clusters c and fuzzy exponent m are set accordingly. Then c objects are randomly selected from the data set as the initial cluster centers. Next, the algorithm iterates: it obtains the partition matrix, which contains the membership degree of each object in every class, and determines a new generation of cluster centers from the partition matrix and the data set. Finally, when the objective function converges to the required precision or the membership degrees of the objects remain stable, iteration stops, the final cluster centers are obtained, and the data set is fuzzily partitioned according to the partition matrix.

The objective function of the fuzzy C-means algorithm is defined as follows:

$$F(X, U, C) = \sum_{i=1}^{c} \sum_{j=1}^{n} u_{ij}^{m} d_{ij}^{2}$$ (1.1)

$$d_{ij} = \lVert x_j - v_i \rVert$$ (1.2)

where C = {C₁, C₂, ..., C_c} denotes the set of clusters, d_ij is the distance from object x_j to the cluster center of the i-th subclass, and U is an n × c partition matrix, the set of all u_ij. u_ij denotes the membership degree of the j-th object x_j in the i-th class, with u_ij ∈ [0, 1]. u_ij satisfies the following constraint condition:

Meanwhile, the sum of the membership degrees of each object over all classes is 1, i.e.,

In the formula, m is the fuzziness parameter; m ∈ [1, +∞) controls the degree of fuzziness of the algorithm:

When m → 1⁺, u_ij → 1 or 0, and the FCM algorithm degenerates into the hard C-means (HCM) algorithm;

When m → +∞, u_ij → 1/c, and the fuzziness of the FCM clustering result is at its maximum. That is, the larger the value of m, the fuzzier the algorithm. m is usually set to 2.

F(X, U, C) is the within-class weighted sum of squared errors, and the FCM algorithm minimizes the objective function F(X, U, C) by continuous iteration.

The specific steps of the fuzzy C-means algorithm are as follows:

Step1: Parameter initialization. Set the number of clusters c and the fuzzy exponent m (1 < m < +∞), which is usually set to 2. Initialize the cluster centers to obtain V⁽⁰⁾ = {v₁, v₂, ..., v_c}. Set the convergence precision ε (ε > 0) and the iteration counter k = 0.

Step2: Calculate the membership matrix U. According to the current cluster center set V^(k) (initially V⁽⁰⁾), calculate the distance from every object in the data set to each cluster center, and then update the membership matrix U according to formula (1.5), i.e.,

$$u_{ij} = \frac{1}{\sum_{l=1}^{c} \left( d_{ij} / d_{lj} \right)^{2/(m-1)}}$$ (1.5)

Step3: Update the cluster center set V^(k). Let k = k + 1, calculate the weighted average of all objects in each class according to the membership matrix U, and take it as the new cluster center, i.e.,

$$v_i = \frac{\sum_{j=1}^{n} u_{ij}^{m} x_j}{\sum_{j=1}^{n} u_{ij}^{m}}$$ (1.6)

Step4: Repeat Step2 and Step3 until the cluster center sets of the last two iterations satisfy the following condition:

$$\lVert V^{(k+1)} - V^{(k)} \rVert < \varepsilon$$ (1.7)
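The four steps above can be sketched in pure Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in the invention: the function names, the `max_iter` safeguard and the handling of an object that coincides with a cluster center are all hypothetical choices, and the Euclidean norm is assumed for the distance and for the convergence test.

```python
import math


def distance(x, v):
    """Euclidean distance d_ij = ||x_j - v_i|| between an object and a center."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, v)))


def update_memberships(data, centers, m=2.0):
    """Membership update; u[i][j] is the degree of object j in class i (m > 1)."""
    c, n = len(centers), len(data)
    u = [[0.0] * n for _ in range(c)]
    for j, x in enumerate(data):
        d = [distance(x, v) for v in centers]
        zero = [i for i, dij in enumerate(d) if dij == 0.0]
        if zero:
            # Assumption: an object lying exactly on a center belongs fully to it.
            for i in zero:
                u[i][j] = 1.0 / len(zero)
            continue
        for i in range(c):
            u[i][j] = 1.0 / sum((d[i] / d[k]) ** (2.0 / (m - 1.0)) for k in range(c))
    return u


def update_centers(data, u, m=2.0):
    """Center update: membership-weighted mean of all objects for each class."""
    dim = len(data[0])
    centers = []
    for row in u:
        weights = [uij ** m for uij in row]
        total = sum(weights)
        centers.append([sum(w * x[t] for w, x in zip(weights, data)) / total
                        for t in range(dim)])
    return centers


def objective(data, centers, u, m=2.0):
    """Objective F(X, U, C): membership-weighted sum of squared distances."""
    return sum(u[i][j] ** m * distance(x, v) ** 2
               for i, v in enumerate(centers) for j, x in enumerate(data))


def fcm(data, centers, m=2.0, eps=1e-3, max_iter=100):
    """Repeat the membership and center updates until the centers move less than eps."""
    for _ in range(max_iter):  # max_iter is a safeguard added for this sketch
        u = update_memberships(data, centers, m)
        new_centers = update_centers(data, u, m)
        shift = math.sqrt(sum(distance(a, b) ** 2 for a, b in zip(centers, new_centers)))
        centers = new_centers
        if shift < eps:
            break
    return centers, update_memberships(data, centers, m)
```

For example, `fcm([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]], [[0.0, 0.0], [5.0, 5.0]])` returns two centers close to the midpoints of the two point pairs, together with the membership matrix.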

The standard artificial bee colony algorithm:

As shown in Fig. 2, the standard artificial bee colony algorithm includes four stages: the initialization stage, the leading bee stage, the following bee stage and the scout bee stage.

(1) Initialization stage

The initialization stage includes parameter initialization and the generation of the initial nectar sources. The artificial bee colony algorithm has three important parameters: the number of nectar sources SN, the maximum number of cycles of the algorithm MaxCycle, and the maximum number of iterations of a nectar source, limit. In the initialization stage, the algorithm randomly generates SN initial nectar sources by formula (2.1) and then calculates the fitness value of each nectar source.

$$x_{ij} = x_j^{\min} + \mathrm{rand}(0, 1) \times (x_j^{\max} - x_j^{\min})$$ (2.1)

where i ∈ {1, 2, ..., SN} is the index of the nectar source; j ∈ {1, 2, ..., D} is the dimension index of the nectar source; x_ij is the value of the j-th dimension of solution x_i; and x_j^min and x_j^max are the lower and upper bounds of the j-th dimension variable.
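The initialization formula can be sketched as follows. The function name and the `rng` parameter are hypothetical; note also that Python's `random.random()` draws from [0, 1), which differs from the open interval (0, 1) only at the endpoint.

```python
import random


def init_nectar_sources(sn, lower, upper, rng=random):
    """Generate SN random nectar sources inside the per-dimension bounds.

    lower[j] and upper[j] play the role of x_j^min and x_j^max.
    """
    return [[lo + rng.random() * (hi - lo) for lo, hi in zip(lower, upper)]
            for _ in range(sn)]
```

Passing a seeded `random.Random` instance as `rng` makes a run reproducible.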

(2) Leading bee stage

The number of leading bees equals the number of nectar sources. Starting from the initial nectar sources, the leading bees look for nectar sources of higher quality by carrying out a neighborhood search near each nectar source with formula (2.2) to generate a new nectar source.

v_ij=x_ij+r×(x_ij-x_kj) (2.2)

where v_ij denotes the new nectar source. It can be seen from formula (2.2) that the new nectar source is obtained from the current nectar source x_ij and a neighboring nectar source x_kj by changing the value of the j-th dimension of the current nectar source. r is a random number in [-1, 1]; k ∈ {1, 2, ..., SN} and j ∈ {1, 2, ..., D} are both chosen at random, with k ≠ i. j is the dimension being updated: in the leading bee stage the artificial bee colony algorithm updates one randomly chosen dimension to obtain the new nectar source. For the new nectar source v_ij, if v_ij > x_j^max, then let v_ij = x_j^max; if v_ij < x_j^min, then let v_ij = x_j^min. If the fitness value of the new nectar source is greater than that of the old nectar source, the old nectar source is replaced by the new one; otherwise the leading bee keeps the old nectar source.
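The neighborhood search, the clamping to the value range and the greedy replacement described above can be sketched as follows. The function names and the `rng` parameter are hypothetical, and `fitness` is any caller-supplied function that scores a nectar source.

```python
import random


def neighbour_search(sources, i, lower, upper, rng=random):
    """Return a candidate built from sources[i] by changing one random dimension."""
    sn, dim = len(sources), len(sources[i])
    j = rng.randrange(dim)                              # dimension to update
    k = rng.choice([s for s in range(sn) if s != i])    # neighbor index, k != i
    r = rng.uniform(-1.0, 1.0)                          # random factor in [-1, 1]
    candidate = list(sources[i])
    value = sources[i][j] + r * (sources[i][j] - sources[k][j])
    candidate[j] = min(max(value, lower[j]), upper[j])  # clamp to the value range
    return candidate


def greedy_select(old, new, fitness):
    """Keep the new nectar source only if its fitness is strictly higher."""
    return new if fitness(new) > fitness(old) else old
```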

(3) Following bee stage

After the leading bees have searched their nectar sources they return to the hive, and the proportion of each nectar source's fitness value in the sum of the fitness values of all nectar sources is calculated according to formula (2.3). A following bee decides whether to search the nectar source of a given leading bee according to a random number generated by the system: if the fitness proportion of a nectar source is greater than the random number, the following bee selects that nectar source. This selection strategy is called the roulette selection strategy.

$$p_i = \frac{fit_i}{\sum_{k=1}^{SN} fit_k}$$ (2.3)

where fit_i denotes the fitness value of the i-th nectar source. In this stage a following bee selects a nectar source and carries out a neighborhood search. As in the leading bee stage, a new nectar source is generated by formula (2.2); if the fitness value of the new nectar source is higher, the new nectar source is retained, otherwise the old nectar source is retained.
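A minimal sketch of fitness-proportional selection is given below. It uses the common cumulative form of roulette selection, which is an assumption: the text above describes comparing the proportion of a nectar source with a random number, and the two procedures are not identical. The function names are hypothetical.

```python
import random


def selection_probabilities(fitness_values):
    """Proportion of each fitness value in the sum of all fitness values."""
    total = sum(fitness_values)
    return [f / total for f in fitness_values]


def roulette_select(fitness_values, rng=random):
    """Pick one nectar source index with probability proportional to its fitness."""
    probs = selection_probabilities(fitness_values)
    threshold = rng.random()
    cumulative = 0.0
    for i, p in enumerate(probs):
        cumulative += p
        if threshold < cumulative:
            return i
    return len(probs) - 1  # guard against floating-point round-off
```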

(4) Scout bee stage

If the fitness value of a nectar source has still not improved after limit neighborhood searches, the current nectar source is regarded as a local optimum. The leading bee corresponding to this nectar source abandons it and becomes a scout bee. The scout bee finds a new nectar source by random search according to formula (2.1), starts searching this new nectar source, and turns back into a leading bee.
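The scout bee stage can be sketched as follows. The names are hypothetical, and resetting the trial counter of a replaced nectar source to 0 is an assumption that follows common artificial bee colony implementations; the text does not state it.

```python
import random


def scout_phase(sources, trials, limit, lower, upper, rng=random):
    """Replace, in place, every nectar source that failed to improve limit times."""
    for i, count in enumerate(trials):
        if count >= limit:
            # Random re-initialization inside the per-dimension bounds.
            sources[i] = [lo + rng.random() * (hi - lo)
                          for lo, hi in zip(lower, upper)]
            trials[i] = 0  # assumed reset for the fresh nectar source
    return sources, trials
```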

Finally, judge whether the number of cycles of the algorithm has reached the maximum number of cycles MaxCycle. If it has, terminate the program; if not, return to the second stage, where the leading bees continue the neighborhood search to update the nectar sources.

Summary of the invention

The purpose of the present invention is to provide a hybrid clustering algorithm based on the fuzzy C-means algorithm and the artificial bee colony clustering algorithm, which can accelerate the convergence of the algorithm.

In order to achieve the above technical purpose, the technical solution adopted by the present invention is as follows:

A hybrid clustering algorithm based on the fuzzy C-means algorithm and the artificial bee colony clustering algorithm, including an initialization stage, a leading bee stage, a following bee stage and a scout bee stage, characterized in that the following steps are further included between the following bee stage and the scout bee stage:

Step 1: after the following bee stage, judge whether the algorithm is in its first cycle; if it is the first cycle, execute Step 2; if it is not the first cycle, execute Step 3;

Step 2: optimize the current optimal solution by using it as the initial cluster centers of the fuzzy C-means clustering algorithm; if the quality of the optimized solution is higher than that of the current optimal solution, replace the current optimal solution with the optimized solution, otherwise discard the optimized solution; at the same time, add 1 to the iteration count of the corresponding nectar source, and then enter the scout bee stage;

Step 3: judge whether the optimal solution changed during the following bee stage; if it changed, execute Step 2; if it did not change, enter the scout bee stage.
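The branching in Step 1 and Step 3 reduces to a single decision about whether the fuzzy C-means refinement of Step 2 runs in a given cycle. A minimal sketch, with a hypothetical function name and a 1-based cycle counter assumed:

```python
def should_run_fcm(cycle, best_changed):
    """Decide whether the FCM refinement (Step 2) runs in this cycle.

    cycle is the 1-based cycle counter; best_changed tells whether the
    following bee stage changed the current optimal solution.
    """
    if cycle == 1:          # Step 1: the first cycle always goes to Step 2
        return True
    return best_changed     # Step 3: later cycles refine only after a change
```

Skipping the refinement when the optimal solution is unchanged avoids repeating an FCM run from the same starting centers.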

Further, the formula for generating a new nectar source in the leading bee stage and/or the following bee stage is:

v_ij=x_ij+θ×(x_ij-x_kj)

where θ is the nonlinear variation factor, v_ij denotes the new nectar source, x_ij denotes the current nectar source, x_kj denotes a neighboring nectar source, k ∈ {1, 2, ..., SN} and j ∈ {1, 2, ..., D} are both chosen at random, and k ≠ i; j is the dimension being updated.

Further, the nonlinear variation factor θ is:

where m and n are coefficients, Cycle denotes the current cycle number, MaxCycle denotes the maximum number of cycles, and rand is a random function.

Further, the value ranges of m and n are respectively m ∈ [1, 1.5] and n ∈ [0, 0.2].

Further, the following bee stage includes the following steps:

The nectar sources are sorted from low to high according to the fitness values of the leading bees' nectar sources, and a weight is assigned to each nectar source;

According to the weighted fitness values, the following bees select nectar sources by roulette selection and carry out a neighborhood search to generate new nectar sources.

Further, the weight of a nectar source is calculated as:

where w(i) denotes the weight of the nectar source, with values in [0, 1], and SN denotes the number of leading bees.

Further, after a new nectar source is generated in the leading bee stage and/or the following bee stage, if the fitness value of the new nectar source is greater than that of the old nectar source, the old nectar source is replaced by the new one; otherwise the old nectar source is retained.

Further, the following step is included after the scout bee stage: judge whether the number of cycles of the algorithm has reached the maximum number of cycles MaxCycle; if it has, terminate the program; if not, return to the leading bee stage and continue the neighborhood search to update the nectar sources.

The present invention has the following beneficial effects:

1. The present invention adds to the original clustering algorithm a step that judges whether the optimal solution has improved, which further accelerates the convergence of the algorithm.

2. The present invention replaces the random number r in the nectar source update formula with the nonlinear variation factor θ, which changes nonlinearly as the algorithm runs. In the early stage of the algorithm θ is relatively large, so the update step of a nectar source is large, the bees search a wider range and the diversity of the population is better. In the later stage, as the colony gradually approaches the optimal nectar source and a narrower search is needed, θ decreases slowly and the update step shrinks with it, which favors a careful search for better nectar sources near the current one. Combining the improved nectar source update formula with the improved hybrid clustering algorithm further improves the optimization accuracy of the algorithm.

3. The present invention assigns a weight to each nectar source in order to avoid the drawbacks of roulette selection, namely high randomness, low efficiency and the possibility of missing higher-quality nectar sources. When nectar sources are selected in the following bee stage, the nectar sources of the leading bees are sorted by quality from low to high and a weight is assigned to each. The higher the quality of a nectar source, the larger its i value, the higher its assigned weight and the higher its probability of being selected. In the later stage of the algorithm, although the fitness values of all nectar sources tend to be the same, the weight of a high-quality nectar source is still higher than that of a low-quality one, so high-quality nectar sources can still stand out and obtain more optimization opportunities.

Description of the drawings

Fig. 1 is a flow chart of the hybrid clustering algorithm;

Fig. 2 is a run chart of clustering on the IRIS data set.

Detailed description of the embodiments

The present invention is described in detail below through specific embodiments with reference to the drawings. It should be noted that, where no conflict arises, the embodiments of the present invention and the features in the embodiments may be combined with one another, and the scope of protection of the present invention is not limited to them.

Embodiment 1

The honey-gathering behavior of bees in the artificial bee colony algorithm corresponds one-to-one with the search for optimal cluster centers in a clustering algorithm; Table 1 lists this correspondence. In the artificial bee colony algorithm, the position of a nectar source corresponds to a possible set of cluster centers in the clustering process, the quality of a nectar source corresponds to the value of the evaluation function, the speed at which the colony searches for and gathers honey corresponds to the speed of finding the optimal cluster centers, and the nectar source of best quality corresponds to the optimal cluster centers.

Table 1 Correspondence between the search for optimal cluster centers and the honey-gathering behavior of bees

Let the sample space be X = {x₁, x₂, ..., x_n}, where x_i is a d-dimensional vector. Each nectar source in the artificial bee colony algorithm corresponds to one set of cluster centers V = {v₁, v₂, ..., v_c}, where v_j is a vector of the same dimension as x_i; the higher the quality of a nectar source, the better the cluster centers it represents. To evaluate the quality of each nectar source (each set of cluster centers), the fitness function of the artificial bee colony algorithm is defined as:

fit_i=1/ [1+F (X, U, C)] (3.1)

where F(X, U, C) is the objective function defined in formula (1.1), that is, the objective function of the fuzzy C-means clustering algorithm. The higher the quality of a nectar source, the better the set of cluster centers, the smaller the value of F(X, U, C), the better the clustering effect and the higher the value of fit_i. The artificial bee colony algorithm is extended below into the artificial bee colony clustering algorithm (Artificial Bee Colony Clustering algorithm, CABC).
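Formula (3.1) maps a non-negative objective value to a fitness in (0, 1], so that a smaller objective value gives a higher fitness. A one-function sketch (the function name is hypothetical):

```python
def fitness(objective_value):
    """Fitness of a nectar source: fit_i = 1 / (1 + F(X, U, C))."""
    return 1.0 / (1.0 + objective_value)
```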

As shown in Fig. 1, the specific steps of the artificial bee colony clustering algorithm are as follows:

Step1: Set the number of clusters c; the number of nectar sources, the number of leading bees and the number of following bees are all SN. If the attribute dimension of a sample is d, the dimension of a nectar source is set to D = c*d, the maximum number of iterations of each nectar source is set to Limit = SN*D, the maximum number of cycles of the algorithm is set to MaxCycle, and the current cycle number Cycle is set to 0.
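Because a nectar source has dimension D = c*d, it can be stored as one flat vector that holds c cluster centers of d attributes each. The sketch below assumes a center-by-center layout, which the text does not specify; the function names are hypothetical.

```python
def decode_nectar_source(source, c, d):
    """Split a flat nectar source of dimension D = c * d into c cluster centers."""
    if len(source) != c * d:
        raise ValueError("nectar source must have c * d components")
    return [source[i * d:(i + 1) * d] for i in range(c)]


def max_trials(sn, c, d):
    """Limit = SN * D, the per-source iteration limit set in Step1."""
    return sn * c * d
```

For a data set with c = 3 clusters and d = 4 attributes, each nectar source therefore has D = 12 components.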

Step2: SN initial nectar sources are randomly generated as initial cluster centers according to the initial nectar source formula; the partition membership matrix U and the fitness value of each nectar source are then calculated, and the nectar source with the highest fitness value is recorded.

The initial nectar source formula is:

$$x_{ij} = x_j^{\min} + \mathrm{rand}(0, 1) \times (x_j^{\max} - x_j^{\min})$$

where i ∈ {1, 2, ..., SN} denotes the i-th nectar source; j ∈ {1, 2, ..., D} is the dimension index of the nectar source; x_ij is the value of the j-th dimension of solution x_i; x_j^min and x_j^max are the lower and upper bounds of the j-th dimension variable; and rand(0, 1) is a random function with range (0, 1).

Step3: Add 1 to the current cycle number Cycle.

Step4: The leading bees carry out a neighborhood search to generate a new nectar source v_ij; the algorithm then updates the membership matrix U according to formula (1.5) and calculates the fitness value of the new nectar source. If the fitness value of the new nectar source is greater than that of the old nectar source, the old nectar source is replaced by the new one; otherwise the old nectar source is retained.

The formula for generating a new nectar source is:

v_ij=x_ij+r×(x_ij-x_kj) (1.1)

where v_ij denotes the new nectar source, x_ij denotes the current nectar source, x_kj denotes a neighboring nectar source, r is a random number in [-1, 1], k ∈ {1, 2, ..., SN} and j ∈ {1, 2, ..., D} are both chosen at random, and k ≠ i. j is the dimension being updated: in the leading bee stage the artificial bee colony algorithm updates one randomly chosen dimension to obtain the new nectar source. For the new nectar source v_ij, if v_ij > x_j^max, then let v_ij = x_j^max; if v_ij < x_j^min, then let v_ij = x_j^min.

Step5: Calculate the proportion of the fitness value of each nectar source in the sum of the fitness values of all nectar sources.

Step6: The following bees select nectar sources by roulette selection and then carry out a neighborhood search to generate new nectar sources; the algorithm then updates the membership matrix U and calculates the fitness value of each new nectar source. If the fitness value of the new nectar source is greater than that of the old nectar source, the old nectar source is replaced by the new one; otherwise the old nectar source is retained;

The formula by which the following bees generate a new nectar source is the same as that used by the leading bees, namely formula (1.1).

The membership matrix U is updated according to the following formula, where U is an n × c partition matrix, the set of all u_ij:

$$u_{ij} = \frac{1}{\sum_{k=1}^{c} \left( \lVert x_j - v_i \rVert / \lVert x_j - v_k \rVert \right)^{2/(m-1)}}$$

where c is the set number of clusters, k ∈ {1, 2, ..., c} indexes the clusters, x_j denotes an object, V denotes the set of cluster centers, v_i denotes the cluster center of the i-th subclass, v_k denotes the cluster center of the k-th subclass, and m is the fuzziness parameter. m ∈ [1, +∞) controls the degree of fuzziness of the algorithm: when m → 1⁺, u_ij → 1 or 0; when m → +∞, u_ij → 1/c; the larger the value of m, the fuzzier the algorithm. m is usually set to 2.

Step7: Judge whether the current cycle of the algorithm is the first cycle. If it is the first cycle, then after the following bee stage the current optimal solution is optimized by using it as the initial cluster centers of the fuzzy C-means clustering algorithm; if the quality of the optimized solution is higher than that of the current optimal solution, the current optimal solution is replaced by the optimized solution, otherwise the optimized solution is discarded; at the same time, the iteration count of the corresponding nectar source is increased by 1. If it is not the first cycle, there are two cases: (1) if the optimal solution changed during the following bee stage, the current optimal solution is optimized by using it as the initial cluster centers of the fuzzy C-means clustering algorithm; if the quality of the optimized solution is higher than that of the current optimal solution, the current optimal solution is replaced by the optimized solution, otherwise the optimized solution is discarded; at the same time, the iteration count of the corresponding nectar source is increased by 1; (2) if the optimal solution did not change during the following bee stage, the fuzzy C-means algorithm is not executed.
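Step7 can be sketched as a single function. The names `refine_best`, `fcm_optimize` and `fitness` are hypothetical, with the last two supplied by the caller. The sketch increments the iteration count whenever the fuzzy C-means optimization runs, which is one reading of the text; incrementing only when the optimized solution is discarded would be another.

```python
def refine_best(best, trials, cycle, best_changed, fcm_optimize, fitness):
    """Apply Step7 to the current optimal solution.

    best is the current optimal nectar source and trials its iteration count.
    Returns the possibly replaced optimal solution and the updated count.
    """
    if cycle != 1 and not best_changed:
        return best, trials                 # case (2): FCM is not executed
    candidate = fcm_optimize(best)          # best serves as the initial centers
    if fitness(candidate) > fitness(best):
        best = candidate                    # keep the optimized solution
    return best, trials + 1                 # assumed: count grows whenever FCM runs
```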

Step8: If the fitness value of a nectar source has still not improved after the maximum number of iterations, the leading bee corresponding to that nectar source becomes a scout bee, the algorithm randomly generates a new nectar source according to the initial nectar source formula, and the scout bee turns back into a leading bee.

Step9: Add 1 to the current cycle number Cycle and judge whether Cycle is greater than the maximum number of cycles MaxCycle. If it is greater, the algorithm has reached the maximum number of cycles: iteration stops, the algorithm ends, and the optimal cluster centers, the membership matrix and the maximum fitness value are output. If it is less than the maximum number of cycles, go to Step4 and continue the loop.

Embodiment 2

This embodiment differs from Embodiment 1 in that the following steps are further included in Step6 before the following bees select nectar sources by roulette selection:

After the leading bees have searched their nectar sources they return to the hive. The nectar sources are first sorted from low to high by fitness value, a weight is then assigned to each nectar source according to formula (2.4), the following bees select nectar sources, and the nectar sources are then updated according to formula (1.1).

The weight of a nectar source is calculated as follows:

where SN denotes the number of leading bees and w(i) denotes the weight of the i-th nectar source.

As can be seen from the above formula, the value of w(i) lies in [0, 1]. The higher the quality of a nectar source, the larger its i value, the higher its assigned weight and the higher its probability of being selected. In the later stage of the algorithm, although the fitness values of all nectar sources tend to be the same, the weight of a high-quality nectar source is higher than that of a low-quality one, so high-quality nectar sources can still stand out and obtain more optimization opportunities.

According to the weighted fitness values, the following bees select nectar sources by roulette selection and then carry out a neighborhood search according to formula (1.1) to generate new nectar sources. If the fitness value of the new nectar source is greater than that of the old nectar source, the following bee keeps the new nectar source; otherwise it gives up the new nectar source and keeps the old one.
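The rank-based weighting can be sketched as follows. The weight function used here, rank / SN, is a stand-in assumed purely for illustration: it is not formula (2.4), it merely shares the stated properties that weights lie in [0, 1] and grow with the rank i. Multiplying each fitness value by its weight before normalizing is likewise an assumed reading of "weighted fitness values". All function names are hypothetical.

```python
def rank_weights(fitness_values, weight=lambda rank, sn: rank / sn):
    """Assign a weight to each nectar source from its ascending fitness rank.

    The default weight function rank / SN is an assumption for illustration;
    it is not formula (2.4).
    """
    sn = len(fitness_values)
    order = sorted(range(sn), key=lambda i: fitness_values[i])  # low to high
    weights = [0.0] * sn
    for rank, i in enumerate(order, start=1):
        weights[i] = weight(rank, sn)
    return weights


def weighted_probabilities(fitness_values):
    """Selection probabilities from the weighted fitness values w(i) * fit_i."""
    weights = rank_weights(fitness_values)
    weighted = [w * f for w, f in zip(weights, fitness_values)]
    total = sum(weighted)
    return [v / total for v in weighted]
```

With nearly equal fitness values such as 0.50, 0.49 and 0.51, plain roulette selection gives almost uniform probabilities, whereas the weighted version clearly favors the best-ranked nectar source.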

Embodiment 3

This embodiment differs from Embodiment 1 in that the formula for the new nectar source v_ij, used when the leading bees carry out a neighborhood search in Step4 and/or when the following bees carry out a neighborhood search in Step6, is further improved as follows:

v_ij=x_ij+θ×(x_ij-x_kj) (2.5)

where v_ij denotes the new nectar source, x_ij denotes the current nectar source, x_kj denotes a neighboring nectar source, θ denotes the nonlinear variation factor, k ∈ {1, 2, ..., SN} and j ∈ {1, 2, ..., D} are both chosen at random, and k ≠ i. j is the dimension being updated.

The new nectar source v_ij is obtained from the current nectar source x_ij and the neighboring nectar source x_kj by changing the value of the j-th dimension of the current nectar source x_ij. In the leading bee stage the artificial bee colony algorithm updates one randomly chosen dimension to obtain the new nectar source.

For the new nectar source v_ij, if v_ij > x_j^max, then let v_ij = x_j^max; if v_ij < x_j^min, then let v_ij = x_j^min. That is, if the new nectar source exceeds the maximum value, the maximum value is taken as the updated nectar source; if it is below the minimum value, the minimum value is taken as the updated nectar source. If the fitness value of the new nectar source is greater than that of the old nectar source, the old nectar source is replaced by the new one; otherwise the leading bee keeps the old nectar source.

The nonlinear variation factor θ in formula (2.5) is:

where m and n are dimensionless coefficients, Cycle is the current cycle number, and MaxCycle is the maximum number of cycles.

The value of the parameter α is:

$$\alpha = \begin{cases} -1, & \mathrm{rand} < 0.5 \\ 1, & \mathrm{rand} \ge 0.5 \end{cases}$$

where rand is the random function used in the initial nectar source formula, with range (0, 1): when rand is less than 0.5, α takes the value -1; when rand is greater than or equal to 0.5, α takes the value 1.
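The sign parameter α can be sketched as below. Only α is shown, since the expression for θ itself is not reproduced in this text; the function name and the `rng` parameter are hypothetical, and `random.random()` draws from [0, 1).

```python
import random


def alpha(rng=random):
    """Return -1 when rand < 0.5 and 1 otherwise."""
    return -1 if rng.random() < 0.5 else 1
```

Each sign occurs with probability 0.5, so the update can move a nectar source in either direction along the chosen dimension.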

In the following bee stage, the standard artificial bee colony algorithm updates nectar sources according to formula (1.1), whose update step is controlled by a random factor r with range [-1, 1]. This way of searching is too random and cannot effectively ensure that the search range of a nectar source changes appropriately as the algorithm proceeds.

The nonlinear variation factor θ proposed by the present invention, based on the characteristics of the artificial bee colony algorithm, therefore changes the update step of a nectar source nonlinearly with the progress of the iteration (Cycle). In the improved artificial bee colony algorithm, nectar sources are updated in the following bee stage according to formula (2.5), and θ changes nonlinearly as the algorithm runs. In the early stage of the algorithm θ is relatively large, so the update step of a nectar source is large, the bees search a wider range and the diversity of the population is better. In the later stage, as the colony gradually approaches the optimal nectar source and a narrower search is needed, θ decreases slowly and the update step shrinks with it, which favors a careful search for better new nectar sources near the current one and improves the optimization accuracy of the algorithm.

Preferably, the effect is better when m takes a value in [1, 1.5] and n takes a value in [0, 0.2].

Embodiment 4

This embodiment differs from Embodiment 3 in that, before the following bees select nectar sources by roulette selection in Step6, the steps described in Embodiment 2 are further included:

After the leading bees have searched their nectar sources they return to the hive. The nectar sources are first sorted from low to high by fitness value, a weight is then assigned to each nectar source according to formula (2.4), the following bees select nectar sources, and the nectar sources are then updated according to formula (2.5).

The weight of a nectar source is calculated as follows:

where SN denotes the number of leading bees and w(i) denotes the weight of the i-th nectar source.

According to the weighted fitness values, the following bees select nectar sources by roulette selection and then carry out a neighborhood search according to formula (2.5) to generate new nectar sources. If the fitness value of the new nectar source is greater than that of the old nectar source, the following bee keeps the new nectar source; otherwise it gives up the new nectar source and keeps the old one.

Experimental result and analysis

(1) experimental result:

In order to compare the performance of the artificial bee colony clustering algorithm (CABC algorithm) with that of the hybrid clustering algorithm of this embodiment (ICABC_FCM algorithm), experiments were carried out on five common data sets from the UCI database: the IRIS, BUPA, WDBC, Wine and Thyroid data sets. The composition of the experimental samples is shown in Table 4:

Table 4 Composition of the experimental data sets

Dataset name	Number of samples	Number of classes	Dimension
IRIS	150	3	4
BUPA	345	2	6
WDBC	569	2	30
Wine	178	3	13
Thyroid	215	3	5

IRIS data set: consists of attribute data for samples of three kinds of iris plants. The data set contains 150 samples in total, 50 in each class, and each sample has four attributes: sepal length, sepal width, petal length and petal width.

BUPA data set: records of male patients with liver disease. The data set contains 345 samples in total, each with 6 attributes, of which the first 5 are blood test results. The samples are divided into two classes: the first class has 114 samples and the second class has 231.

WDBC data set: contains 569 samples, each with 30 attributes. The samples are divided into two classes, Malignant and Benign: the Benign class has 357 samples and the Malignant class has 212.

Wine data set: contains 178 samples, each with 13 attributes representing 13 chemical characteristics of wine from one place of origin. The samples are divided into three classes: the first class has 59 samples, the second class has 71 and the third class has 48.

Thyroid data set: a thyroid data set of 215 samples, each with 5 attributes. The samples are divided into three classes: the first class has 150 samples, the second class has 35 and the third class has 30.

The CABC, ICABC and ICABC_FCM algorithms were each used to cluster the IRIS, BUPA, WDBC, Wine and Thyroid data sets.

The fuzzy weighting exponent of each algorithm is m = 2, and the minimum error allowed in the FCM stage of the ICABC_FCM algorithm is ε = 10^-3. The number of bees is 20, the maximum number of cycles MaxCycle is 2000, and Limit is dimension * (SN/2), where the dimension of a nectar source equals the attribute dimension of a sample multiplied by the number of clusters. Each algorithm was run 20 times and the average was taken as the final result.

Fig. 2 shows the run plot for clustering the IRIS data.

The clustering accuracy for each data set is shown in Table 5:

Table 5: Average clustering accuracy for each data set

The clustering results for each data set are shown in Table 6 below:

Table 6: Objective function values for each data set

Analysis of results:

Fig. 2 is the run plot for clustering the IRIS data. It shows that the objective function value of the CABC algorithm only begins to stabilize after about 180 cycles, whereas the hybrid algorithm already has a good solution after a single cycle and its objective function value stabilizes after a few cycles, so the algorithm converges quickly.

As can be seen from Table 4, the clustering accuracy of the hybrid algorithm is higher than that of the CABC algorithm, which shows that the ICABC_FCM algorithm is better than the CABC algorithm in clustering accuracy.

As can be seen from Table 6, both the mean and the standard deviation of the clustering results of the ICABC_FCM algorithm are smaller than those of the CABC algorithm, which shows that the ICABC_FCM algorithm is superior to the CABC algorithm in overall optimization accuracy and stability. Comparing the clustering results of the ICABC_FCM and ICABC algorithms, the ICABC_FCM algorithm is likewise better than the ICABC algorithm in overall optimization accuracy and stability.

The above analysis shows that the ICABC_FCM algorithm is superior to the CABC algorithm in clustering accuracy, convergence speed, optimization accuracy and algorithm stability.

Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. A hybrid clustering algorithm based on the fuzzy C-means algorithm and the artificial bee colony clustering algorithm, comprising an initialization stage, a lead bee stage, a follow bee stage and an investigation bee stage, characterized in that it further comprises the following steps:

Step 1: after the follow bee stage, judge whether the algorithm is in its first cycle; if it is the first cycle, execute step 2; if it is not the first cycle, execute step 3;

Step 2: optimize the current optimal solution by using it as the initial cluster centers of the fuzzy C-means clustering algorithm; if the quality of the optimized solution is higher than that of the current optimal solution, replace the current optimal solution with the optimized solution; otherwise discard it and, at the same time, add 1 to the iteration count of the corresponding nectar source; then enter the investigation bee stage;

Step 3: judge whether the optimal solution has changed after the follow bee stage; if it has changed, execute step 2; if it has not changed, enter the investigation bee stage.
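The decision logic of steps 1 to 3 can be sketched in Python as follows. This is a minimal illustration rather than the claimed implementation: `refine` and `cost` are hypothetical callables standing in for the fuzzy C-means optimization and for the quality measure, a lower `cost` is assumed to mean higher quality, and the iteration count is assumed to be incremented only when the optimized solution is discarded.

```python
def should_refine(cycle, best_changed):
    """Steps 1 and 3: refine in the first cycle, or whenever the best solution changed."""
    return cycle == 1 or best_changed


def hybrid_step(best, trials, cycle, best_changed, refine, cost):
    """Apply step 2 when steps 1 and 3 call for it.

    best         -- current optimal solution (cluster centers)
    trials       -- iteration count of the nectar source holding `best`
    cycle        -- 1-based index of the current cycle
    best_changed -- whether the follow bee stage changed the optimal solution
    refine       -- hypothetical callable running FCM from the given centers
    cost         -- hypothetical callable; lower cost = higher quality (assumption)
    Returns the (possibly replaced) optimal solution and the updated count.
    """
    if not should_refine(cycle, best_changed):
        return best, trials            # go straight to the investigation bee stage
    candidate = refine(best)
    if cost(candidate) < cost(best):
        return candidate, trials       # optimized solution replaces the optimum
    return best, trials + 1            # discarded: count incremented (assumption)
```

For example, with a toy `refine` that halves every coordinate and `cost=sum`, `hybrid_step([4.0], 0, 1, False, refine, sum)` accepts the refined solution because the first cycle always triggers step 2.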

2. The hybrid clustering algorithm based on the fuzzy C-means algorithm and the artificial bee colony clustering algorithm according to claim 1, characterized in that the formula for generating a new nectar source in the lead bee stage and/or the follow bee stage is:

v_ij = x_ij + θ × (x_ij - x_kj)

where θ is the nonlinear change factor, v_ij denotes the new nectar source, x_ij denotes the current nectar source and x_kj denotes a neighboring nectar source; i, k ∈ {1, 2, ..., SN}, where SN is the number of nectar sources, and k ≠ i; j ∈ {1, 2, ..., D} denotes the dimension of the nectar source that is updated.
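The update formula of claim 2 can be illustrated with a short Python sketch. The function name is hypothetical, θ is passed in as a plain number because its formula is defined separately, and choosing the neighbor k and the dimension j uniformly at random is an assumption borrowed from the usual artificial bee colony scheme.

```python
import random


def new_nectar_source(sources, i, theta, rng=random):
    """Return (v_i, j, k), where v_i differs from x_i only in dimension j.

    sources -- list of SN nectar sources, each a list of D floats
    i       -- index of the current nectar source
    theta   -- value of the nonlinear change factor for this update
    rng     -- random number source (module `random` or a random.Random instance)
    """
    sn = len(sources)
    d = len(sources[i])
    k = rng.choice([idx for idx in range(sn) if idx != i])  # neighbor, k != i
    j = rng.randrange(d)                                    # dimension to update
    v = list(sources[i])                                    # copy, x_i is not mutated
    v[j] = sources[i][j] + theta * (sources[i][j] - sources[k][j])
    return v, j, k
```

Returning j and k alongside the candidate makes it easy to check a single update by hand against the formula.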

3. The hybrid clustering algorithm based on the fuzzy C-means algorithm and the artificial bee colony clustering algorithm according to claim 2, characterized in that the nonlinear change factor θ is:

4. The hybrid clustering algorithm based on the fuzzy C-means algorithm and the artificial bee colony clustering algorithm according to claim 3, characterized in that the value ranges of m and n are respectively: m ∈ [1, 1.5], n ∈ [0, 0.2].

5. The hybrid clustering algorithm based on the fuzzy C-means algorithm and the artificial bee colony clustering algorithm according to claim 1, characterized in that the follow bee stage comprises the following step:

According to the weighted fitness values, the follow bees select nectar sources by roulette wheel selection and perform a neighborhood search to generate new nectar sources.
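Roulette wheel selection as used in the follow bee stage can be sketched as follows. The function name is hypothetical, and the weighted fitness values are taken as given inputs, since the weight formula itself is defined in a separate claim.

```python
import random


def roulette_select(weighted_fitness, rng=random):
    """Pick an index with probability proportional to its weighted fitness.

    weighted_fitness -- non-negative values, one per nectar source
    rng              -- random number source (module `random` or random.Random)
    """
    total = sum(weighted_fitness)
    if total <= 0:
        raise ValueError("at least one weighted fitness value must be positive")
    pick = rng.random() * total      # point on the wheel, in [0, total)
    acc = 0.0
    for idx, value in enumerate(weighted_fitness):
        acc += value
        if pick < acc:
            return idx
    return len(weighted_fitness) - 1  # guard against floating-point round-off
```

A nectar source whose weighted fitness is zero is never selected, while one holding three quarters of the total is selected about three times out of four.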

6. The hybrid clustering algorithm based on the fuzzy C-means algorithm and the artificial bee colony clustering algorithm according to claim 5, characterized in that the formula for calculating the weight of a nectar source is:

7. The hybrid clustering algorithm based on the fuzzy C-means algorithm and the artificial bee colony clustering algorithm according to claim 1, characterized in that, after a new nectar source is generated in the lead bee stage and/or the follow bee stage, if the fitness value of the new nectar source is greater than that of the old nectar source, the old nectar source is replaced with the new nectar source; otherwise the old nectar source is retained.
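The greedy replacement rule of claim 7 amounts to a single comparison, sketched here in Python with a hypothetical function name. The fitness values are assumed to have been computed beforehand.

```python
def greedy_select(old_source, old_fitness, new_source, new_fitness):
    """Keep the new nectar source only if its fitness is strictly greater."""
    if new_fitness > old_fitness:
        return new_source, new_fitness
    return old_source, old_fitness
```

Because the comparison is strict, a new nectar source with equal fitness does not displace the old one.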

8. The hybrid clustering algorithm based on the fuzzy C-means algorithm and the artificial bee colony clustering algorithm according to claim 1, characterized in that the following step is further included after the investigation bee stage: judge whether the number of cycles of the algorithm has reached the maximum number of cycles; if it has, terminate the program; if it has not, return to the lead bee stage and continue the neighborhood search to update the nectar sources.
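The overall cycle implied by claims 1 and 8 can be outlined as a loop skeleton. This is only a sketch: the four stage callables are hypothetical placeholders, and `follow_stage` is assumed to return whether the optimal solution changed during that stage.

```python
def run_hybrid(max_cycle, lead_stage, follow_stage, fcm_step, investigation_stage):
    """Repeat the stages until the maximum number of cycles is reached.

    lead_stage          -- hypothetical callable for the lead bee stage
    follow_stage        -- hypothetical callable; returns True if the optimum changed
    fcm_step            -- hypothetical callable for the FCM optimization step
    investigation_stage -- hypothetical callable for the investigation bee stage
    Returns the number of cycles executed.
    """
    cycle = 1
    while cycle <= max_cycle:
        lead_stage()
        best_changed = follow_stage()
        if cycle == 1 or best_changed:   # first cycle, or optimum changed
            fcm_step()
        investigation_stage()
        cycle += 1                       # then check against the maximum
    return cycle - 1
```

Passing the stages in as callables keeps the skeleton independent of how each stage is implemented, so the termination test of claim 8 can be checked in isolation.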