CN109711373A - A big data feature selection method based on an improved bat algorithm
- Publication number: CN109711373A
- Application number: CN201811642556.5A
- Authority: CN (China)
- Prior art keywords: bat, formula, algorithm, individual, population
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abstract
The invention discloses a big data feature selection method based on an improved bat algorithm. Feature selection maps or transforms samples from a high-dimensional space into a lower-dimensional space and then removes redundant and irrelevant features to reduce the dimensionality further. The goal is to obtain a feature subset that is as small as possible without significantly reducing classification accuracy or altering the class distribution. After analyzing the strengths and weaknesses of classical feature selection methods, the invention uses a swarm intelligence algorithm to optimize feature selection. The bat algorithm offers parallelism, strong robustness, and fast convergence, but it easily falls into local optima; to address this, the invention introduces a sub-population division mechanism based on the K-means algorithm and a binary differential mutation mechanism. The improved algorithm strengthens mutual learning and efficient information transfer within the population, increases the diversity and search capability of individuals, and avoids premature convergence. Finally, the improved bat algorithm is applied to optimize feature selection and achieves excellent results.
Description
Technical field
The present invention relates to a bat algorithm and a feature selection method, and belongs to the field of artificial intelligence and machine learning.
Background art
With the rapid rise in the country's overall level of informatization, information technology for smart livelihood services has been widely adopted, the information resources for these services have grown substantially, and the capacity to safeguard public cultural information has improved markedly. Smart livelihood services have entered an information age of comprehensive coverage, multi-level advancement, and professional development. Supporting technology for smart livelihood service systems refers to the technical means, service platforms, and management methods that, underpinned by new information technology, transform conventional livelihood services and raise service quality across the board. It is the technical vehicle for continually meeting public demand and making government-provided smart livelihood services more economical, efficient, and beneficial; it is the infrastructure that better guarantees the public's cultural rights; and it is an important means of strengthening the "cultural power" of smart livelihood services.
As multimedia content spreads widely across networks, audio and video content of many types, granularities, and formats is disrupting traditional broadcast media channels and causing confusion and difficulty in storing, managing, querying, analyzing, and mining massive data. Against this background, researching techniques for integrating and fusing data from multiple application ports under three-screen convergence has become an urgent task in providing technical support for a comprehensive smart livelihood service platform based on three-screen convergence.
Research on integrating and fusing multi-port application data under three-screen convergence focuses on the efficient classification and integration of heterogeneous data in terms of syntax, semantics, and big data exchange. The main research topics include extracting, transforming, integrating, and finally fusing data from sources in a distributed storage environment. The goal is to establish a relatively static, unified environment for data integration, data management, and data analysis. Big data reduction is an approach in which, when analyzing massive data, the raw data set is not analyzed directly; instead, a subset of it is used as the object of analysis to obtain approximately the same analytical results. Feature selection is one such big data reduction method.
In practical applications the number of features is often large; some features may be irrelevant, and some may depend on one another. This tends to lengthen model training time, make models overly complex, and cause the curse of dimensionality. Feature selection can remove irrelevant or redundant features, thereby reducing the number of features, improving classification learning efficiency and model accuracy, and shortening run time. In addition, selecting only the truly relevant features simplifies the model and makes it easier for researchers to understand the process that generated the data. Many algorithms are currently used for feature selection. Some rank feature importance with an evaluation function, judging a feature subset by analyzing the features it contains; common evaluation criteria are based on information gain, distance, or correlation. Others use the selected feature subset to classify a sample set and take the classification accuracy as the measure of subset quality. However, these algorithms evaluate each feature in isolation and do not account for how different feature combinations behave or how features influence one another.
Swarm intelligence algorithms have achieved good results on feature selection problems. Xie Chunping et al. combined a genetic algorithm (GA) with correlation-based feature selection (CFS) to select features for the automatic classification of Chinese web pages. The drawbacks are that the operations are cumbersome, the mutation mechanism reduces stability, the computational cost increases, and training takes longer. N. Cleetus et al. used an improved particle swarm optimization (PSO) algorithm to select an optimized feature combination for intrusion detection. Compared with GA, PSO has no relatively complex operations such as crossover and mutation; it adjusts adaptively using only individual experience and swarm information, so its rules are simpler and it converges faster. However, it is prone to falling into local optima, which leads to low convergence accuracy and difficulty converging. T. Mehmood et al. used ant colony optimization (ACO) for feature selection and a support vector machine (SVM) for traffic classification to detect network anomalies, and also obtained good experimental results. Nevertheless, these algorithms still have many parameters, converge slowly, and are complex to implement, so further improvement is needed.
The bat algorithm (BA) is a heuristic search algorithm for finding global optima, proposed by Xin-She Yang in 2010. It is inspired by the behavior of bats in nature; its main idea is to simulate the echolocation that bats use when hunting. The bats in a population fly randomly, sense their surroundings through echolocation, and search for prey. The position of a bat is a solution to the optimization problem. For feature selection problems, an improved variant, the binary bat algorithm, is usually used. The present invention treats feature selection as a process in which individuals in the population iteratively move and search for a target. In general, a fitness function can be used to measure the quality of a solution. The bat algorithm has a simple model, strong robustness, and a high degree of parallelism. It has attracted keen attention from many scholars since its introduction, its theory and applications have advanced considerably in recent years, and researchers have applied it widely in natural science and engineering practice, in fields such as engineering optimization, pattern recognition, K-means clustering, feature selection, and data mining.
The bat algorithm has the strengths of swarm intelligence algorithms, such as a wide global search range and a short convergence time, but it also shares some of their common weaknesses, such as a tendency toward premature convergence and falling into local optima. The reason is that each bat is influenced solely by the global best individual and cannot exchange information efficiently with its neighbors. In addition, the algorithm itself lacks a mutation mechanism, so the positions of individuals in the swarm lack diversity.
Summary of the invention
The technical problem to be solved by the present invention is to provide a big data feature selection method based on an improved bat algorithm that strengthens mutual learning and efficient information transfer within the population, increases the diversity and search capability of individuals, and avoids premature convergence.
The inventive concept is to select a better feature combination for smart livelihood big data under three-screen convergence by improving an intelligent optimization algorithm, the bat algorithm. To address the bat algorithm's tendency to fall into local optima, the method introduces a sub-population division mechanism based on the K-means algorithm and a binary differential mutation mechanism. The improved algorithm strengthens mutual learning and efficient information transfer within the population, increases the diversity and search capability of individuals, and avoids premature convergence.
To solve the technical problem, the present invention adopts the following technical solution:
The big data feature selection method based on an improved bat algorithm comprises the following steps:
(1) Initialize the parameters of the bat algorithm, including: the number of bats in the population $N$, the maximum pulse loudness $A_0$, the maximum pulse emission rate $R_0$, the search pulse frequency range $[f_{\min}, f_{\max}]$, the loudness attenuation coefficient $\alpha$, the pulse rate enhancement coefficient $\gamma$, and the maximum number of iterations $N_t$;
Randomly initialize the bat positions as follows to generate $N$ candidate feature combinations:
For the $i$-th bat, with spatial position $x_i = (x_{i,1}, x_{i,2}, \ldots, x_{i,d})$ and velocity $v_i = (v_{i,1}, v_{i,2}, \ldots, v_{i,d})$, the position is abstracted as a binary string in a $d$-dimensional space. This binary string is one candidate feature combination, where $d$ is the number of candidate features. A bit with value 1 indicates that the feature at that position is selected, and a bit with value 0 indicates that it is not selected;
(2) Compute the fitness value $f(x_i)$ of each bat according to the fitness function of formula (1), and find the position $G_0$ of the current best bat among all bats;
F=0.6 × R+0.4 × e-F×A (1)
In formula (1), R, F, and A denote the recall, F-score, and accuracy, respectively, of the classification performed with the feature combination selected in the current iteration as input;
Update the inertia coefficient and the self-learning factor for the current iteration according to formulas (2) and (3);
In formulas (2) and (3), $W_t$ is the inertia coefficient of each bat at iteration $t$, $C_t$ is the self-learning factor of each bat at iteration $t$, $W_{\max}$ and $W_{\min}$ are the maximum and minimum values of the inertia coefficient, $C_{\max}$ and $C_{\min}$ are the maximum and minimum values of the self-learning factor, and $N_t$ is the maximum number of iterations;
Compute the mutation probability $P_t$, which controls whether a bat performs the mutation operation, according to formula (4):
In formula (4), $N_t$ is the maximum number of iterations and $t$ is the current iteration number;
Compute the contraction factor $F_t$, a randomly generated value between 0 and 1, according to formula (5):
In formula (5), $F_{\max}$ and $F_{\min}$ are the maximum and minimum values of the contraction factor, and $N_t$ is the maximum number of iterations;
(3) Use the K-means algorithm to divide the bat population into sub-populations according to the distances between bats;
(4) Update the internal variables of each bat in every sub-population, namely the search pulse frequency $f_i$, the flight velocity $v_i$, and the spatial position $x_i$, according to formulas (6)-(9). Within each sub-population, individuals that are not the local best update their velocity according to formula (7), and the local best individual updates its velocity according to formula (8):
$$f_i = f_{\min} + (f_{\max} - f_{\min}) \cdot \beta \tag{6}$$
$$v_i^t = W_t \cdot v_i^{t-1} + (x_i^{t-1} - M_n^{t-1}) \cdot f_i + (x_i^{t-1} - P_i^{t-1}) \cdot C_t \tag{7}$$
$$v_i^t = W_t \cdot v_i^{t-1} + (x_i^{t-1} - G^{t-1}) \cdot f_i + (x_i^{t-1} - P_i^{t-1}) \cdot C_t \tag{8}$$
In formula (6), $f_{\max}$ and $f_{\min}$ are the maximum and minimum pulse frequency, and $\beta$ is a uniformly distributed random variable with $\beta \in [0,1]$. In formulas (7) and (8), $v_i^t$ and $v_i^{t-1}$ are the flight velocities of bat $i$ at times $t$ and $t-1$; $x_i^t$ and $x_i^{t-1}$ are the positions of bat $i$ at times $t$ and $t-1$; $M_n^{t-1}$ is the position of the local best individual of sub-population $n$ at time $t-1$; $G^{t-1}$ is the global best bat position of the whole swarm at time $t-1$; $P_i^{t-1}$ is the best historical position retained by bat $i$ at time $t-1$; $W_t$ and $C_t$ are the inertia coefficient and self-learning factor of the bat at iteration $t$. In formula (9), $\delta$ is a random variable with $\delta \in [0,1]$, and $S$ is the sigmoid function;
(5) Generate a random number rand1. If rand1 > $r_i$, apply a random perturbation to the current best bat position of the swarm according to formula (10) to obtain a new position $x_{new}$ for the bat, then perform step (6); if rand1 ≤ $r_i$, perform step (7). Here $r_i$ is the pulse emission rate of the $i$-th bat in the current iteration:
$$x_{new} = x_{old} + \delta \cdot A_*^t \tag{10}$$
where $x_{old}$ is the original position of the bat, $\delta$ is a random number between -1 and 1, and $A_*^t$ is the mean loudness of all bats in iteration $t$;
(6) Generate a random number rand2. If rand2 < $A_i$ and $f(x_i) < f(x_{new})$, replace the bat's original position with the new position $x_{new}$ and update the bat's loudness $A_i^t$ and pulse emission rate $r_i^t$ for the current iteration $t$ according to formulas (11) and (12); otherwise, perform step (7). Here $f(x_i)$ is the fitness value of the bat's original position and $f(x_{new})$ is the fitness value of its new position;
$$A_i^t = \alpha A_i^{t-1} \tag{11}$$
where $A_i^{t-1}$ is the loudness of bat $i$ at time $t-1$ and $\alpha$ is a constant with $\alpha \in (0,1)$; $r_i^0$ is the initial pulse emission rate of bat $i$, and $\gamma$ is a constant with $\gamma > 0$;
(7) Generate a random number rand3 for each bat and, according to formula (13), apply mutation to the current velocity of every bat that satisfies rand3 < $P_t$;
where $r_1$, $r_2$, and $r_5$ are individuals randomly selected from the same sub-population as the target bat, and $r_3$ and $r_4$ are bats randomly selected from different sub-populations. "+" denotes the logical XOR operation, and a second operator denotes the logical OR operation. rand is a random number generated between 0 and 1. The bracketed operation is executed only if the condition rand < $F_t$ holds;
(8) Sort the fitness values of all bats and take the position of the bat with the highest fitness value as the current candidate feature combination. Determine whether the current candidate feature combination satisfies the preset optimization condition; if so, take it as the selected feature combination; if not, return to step (2).
Compared with the prior art, the advantages of the present invention are as follows:
In the improved bat algorithm, the inertia coefficient and learning factor vary over time, so the search and optimization capabilities of individuals adapt as the iterations proceed. With the sub-population division mechanism, individuals learn from and move toward best positions both within sub-populations and between them, which ensures that optimization information is passed among individuals while avoiding the excessive influence of any single local best individual. Binary differential mutation increases the diversity of individual positions and injects new vitality into a population that tends to become homogeneous during iteration. Compared with common models, the improved bat algorithm converges faster and optimizes better. The features selected with the algorithm contribute strongly to subsequent classification decisions and improve classification accuracy and performance.
Brief description of the drawings
Fig. 1 is a schematic diagram of the sub-population division mechanism;
Fig. 2 compares the performance of different swarm intelligence algorithms;
Fig. 3 compares algorithm performance for different numbers of individuals and iterations;
Fig. 4 compares the performance of different population division methods.
Specific embodiments
The present invention is a feature selection method for smart livelihood big data under three-screen convergence, based on an improved bat algorithm; it uses an improved swarm intelligence algorithm, the bat algorithm, to select better features. A candidate feature combination is treated as the position of an individual in the bat algorithm, and feature selection is treated as the process in which bat individuals iteratively move and search for a target; the global best position finally found is the selected feature combination. The original bat algorithm is improved as follows. A sub-population division mechanism based on the K-means algorithm is introduced, which strengthens efficient learning among neighboring individuals within a sub-population and the transfer of optimization information between sub-populations. A binary differential mutation mechanism is introduced. In addition, a linearly time-varying inertia factor and self-learning coefficient are introduced into the velocity update formula, and a mutation probability and contraction coefficient are introduced into the mutation, so that the search capability of bat individuals adapts to the iteration count, premature trapping in local optima is avoided, and convergence is accelerated. Performing feature selection with this improved bat algorithm reduces the time needed to screen features, and the selected features are better suited to subsequent classification and yield better performance.
Specifically, the present invention comprises the following steps:
(1) Initialize the parameters of the bat algorithm, including: the number of bats in the population $N$, the maximum pulse loudness $A_0$, the maximum pulse emission rate $R_0$, the search pulse frequency range $[f_{\min}, f_{\max}]$, the loudness attenuation coefficient $\alpha$, the pulse rate enhancement coefficient $\gamma$, and the maximum number of iterations $N_t$;
Randomly initialize the bat positions as follows to generate $N$ candidate feature combinations:
For the $i$-th bat, with spatial position $x_i = (x_{i,1}, x_{i,2}, \ldots, x_{i,d})$ and velocity $v_i = (v_{i,1}, v_{i,2}, \ldots, v_{i,d})$, the position is abstracted as a binary string in a $d$-dimensional space. This binary string is one candidate feature combination, where $d$ is the number of candidate features. A bit with value 1 indicates that the feature at that position is selected, and a bit with value 0 indicates that it is not selected.
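The binary encoding in step (1) can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names `init_population` and `selected_features` are hypothetical, and starting every velocity at zero is an assumption, since the text does not state how velocities are initialized.

```python
import random

def init_population(n_bats, n_features, seed=None):
    """Create n_bats random binary positions of length n_features.

    Each position is one candidate feature combination (1 = selected).
    Zero initial velocities are an assumption made for this sketch.
    """
    rng = random.Random(seed)
    positions = [[rng.randint(0, 1) for _ in range(n_features)]
                 for _ in range(n_bats)]
    velocities = [[0.0] * n_features for _ in range(n_bats)]
    return positions, velocities

def selected_features(position):
    """Return the indices of the features whose bit is 1."""
    return [j for j, bit in enumerate(position) if bit == 1]

positions, velocities = init_population(n_bats=5, n_features=8, seed=0)
print(selected_features([1, 0, 0, 1, 0, 1, 0, 0]))  # [0, 3, 5]
```

Decoding a position into feature indices is the step that would hand a candidate feature combination to the classifier used for fitness evaluation.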
(2) Compute the fitness value $f(x_i)$ of each bat according to the fitness function of formula (1), and find the position $G_0$ of the current best bat among all bats.
F=0.6 × R+0.4 × e-F×A (1)
In formula (1), R, F, and A denote the recall, F-score, and accuracy, respectively, of the classification performed with the feature combination selected in the current iteration as input. Because the selected feature combination strongly affects the final classification result, the fitness function defined here is determined by evaluation metrics of the classification obtained after training a classifier on the selected features.
Update the inertia coefficient and the self-learning factor for the current iteration according to formulas (2) and (3);
In formulas (2) and (3), $W_t$ is the inertia coefficient of each bat at iteration $t$, $C_t$ is the self-learning factor of each bat at iteration $t$, $W_{\max}$ and $W_{\min}$ are the maximum and minimum values of the inertia coefficient, $C_{\max}$ and $C_{\min}$ are the maximum and minimum values of the self-learning factor, and $N_t$ is the maximum number of iterations.
The present invention improves the optimization capability of the algorithm by introducing an inertia weight $W_t$ that decreases linearly as the iteration count increases. In early iterations, a bat has a larger inertia coefficient and a higher velocity and therefore stronger global search capability. In later iterations, a smaller inertia weight enables more precise local search, which speeds up convergence. The parameter $C_t$ exploits the benefit of learning from a bat's own best historical position and improves the velocity update. This dynamically adapting parameter represents how strongly the historical best position influences the current velocity. At first, a bat with a larger $C_t$ flies near its current position and has good local exploration capability. $C_t$ then gradually decreases, and the bat's position is influenced mainly by the global best individual, which improves its exploitation capability.
Compute the mutation probability $P_t$, which controls whether a bat performs the mutation operation, according to formula (4):
In formula (4), $N_t$ is the maximum number of iterations and $t$ is the current iteration number. In early iterations, each bat has a small mutation probability and can make full use of its search capability to explore a large space. As the iteration count increases, bats become more likely to mutate, which breaks the hold of local optima and avoids premature convergence.
Compute the contraction factor $F_t$, a randomly generated value between 0 and 1, according to formula (5):
In formula (5), $F_{\max}$ and $F_{\min}$ are the maximum and minimum values of the contraction factor, and $N_t$ is the maximum number of iterations. The contraction factor controls the mutation operation by controlling the influence of the difference vectors. A larger F value helps maintain the diversity of the population, while a smaller F value gives individuals better local search capability.
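Formulas (2)-(5) themselves are not reproduced in this text, so the sketch below only illustrates the behavior described in prose: $W_t$ and $C_t$ decrease over the iterations and $P_t$ increases. It assumes plain linear interpolation between the stated bounds. The decreasing direction for $F_t$, the function names, and every default numeric bound are assumptions made for illustration, not values from the patent.

```python
def linear_schedule(t, n_iter, start, end):
    """Move linearly from `start` at iteration 0 to `end` at iteration n_iter."""
    return start + (end - start) * t / n_iter

def adaptive_parameters(t, n_iter,
                        w_max=0.9, w_min=0.4,    # assumed inertia bounds
                        c_max=2.0, c_min=0.5,    # assumed self-learning bounds
                        p_min=0.05, p_max=0.5,   # assumed mutation-probability bounds
                        f_max=0.9, f_min=0.1):   # assumed contraction-factor bounds
    """Return the time-varying parameters for iteration t (0 <= t <= n_iter)."""
    return {
        "W_t": linear_schedule(t, n_iter, w_max, w_min),  # decreases (stated)
        "C_t": linear_schedule(t, n_iter, c_max, c_min),  # decreases (stated)
        "P_t": linear_schedule(t, n_iter, p_min, p_max),  # increases (stated)
        "F_t": linear_schedule(t, n_iter, f_max, f_min),  # decreasing is assumed
    }

early = adaptive_parameters(0, 100)
late = adaptive_parameters(100, 100)
print(early["W_t"] > late["W_t"], early["P_t"] < late["P_t"])
```

Keeping all four schedules behind one helper makes it easy to swap in the exact formulas once they are available.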
(3) Use the K-means algorithm to divide the bat population into sub-populations according to the distances between bats.
Specifically, in each iteration the K-means algorithm divides the population into a fixed number of sub-populations according to the distances between individuals. The algorithm as a whole works on two levels. On the first level, the individuals within each sub-population learn from and move toward only their current local best individual; an individual's velocity change is influenced only by its own historical best position and the local best solution. On the second level, the local best individual of each sub-population likewise learns from and moves toward the global best individual, so that global information spreads among the sub-populations. In this way, bat individuals move toward better individuals in a layered fashion and iteratively reach better positions, avoiding the excessive influence of any single individual. After each iteration all bats are regrouped into sub-populations, until the algorithm terminates.
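The two-level structure described above can be sketched with a small K-means written in plain Python. This is an illustrative sketch under stated assumptions: squared Euclidean distance on the binary position vectors is assumed as the distance measure, the number of refinement rounds is arbitrary, and the names `kmeans_subpopulations` and `local_best` are hypothetical.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance (assumed distance measure)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_subpopulations(positions, k, n_rounds=10, seed=None):
    """Assign each bat to one of k sub-populations; returns one label per bat."""
    rng = random.Random(seed)
    centroids = [[float(b) for b in p] for p in rng.sample(positions, k)]
    labels = [0] * len(positions)
    for _ in range(n_rounds):
        # Assignment step: each bat joins its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                  for p in positions]
        # Update step: recompute centroids; an empty cluster keeps its old one.
        for c in range(k):
            members = [p for p, lab in zip(positions, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def local_best(fitness, labels):
    """Map each sub-population label to the index of its fittest bat."""
    best = {}
    for i, lab in enumerate(labels):
        if lab not in best or fitness[i] > fitness[best[lab]]:
            best[lab] = i
    return best

bats = [[0, 0, 0, 0, 0], [0, 0, 0, 0, 1], [1, 1, 1, 0, 0], [1, 1, 1, 1, 0]]
labels = kmeans_subpopulations(bats, k=2, seed=0)
print(labels, local_best([0.2, 0.9, 0.5, 0.4], labels))
```

The local best of each sub-population is the guide for its other members, while the local bests themselves follow the global best, which matches the two levels described in the text.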
(4) inside that each bat in every sub- population is accordingly updated according to the method as shown in formula (6)-(9) becomes
Measure search pulse frequency fi, flying speed vi, spatial position xi;For the non local optimum individual drawn game inside every sub- population
Portion's optimum individual updates flying speed according to formula (7) and (8) respectively:
fi=fmin+(fmax-fmin)·β (6)
vi t=Wt·vi t-1+(xi t-1-Mn t-1)·fi+(xi t-1-Pi t-1)·Ct (7)
vi t=Wt·vi t-1+(xi t-1-Gt-1)·fi+(xi t-1-Pi t-1)·Ct (8)
In formula (6), fmaxAnd fminIt is the maximum value of pulse frequency and the minimum value of pulse frequency respectively, β is one equal
The stochastic variable of even distribution, and β ∈ [0,1];In formula (7) and (8), vi tAnd vi t-1Bat individual i is respectively indicated in t and t-1
The flying speed formula at moment;xi tAnd xi t-1Bat individual i is respectively indicated at the location of t and t-1 moment;Mn t-1For son kind
Group n is in the position of the more excellent individual in t-1 moment part;Gt-1It is entire group in t-1 moment global best bat position;Pi t-1For
Each bat i is in the history optimum position that the t-1 moment retains;WtAnd CtThe inertia coeffeicent of bat and self study when indicating iteration t times
The factor;Formula (9) δ is a stochastic variable, and δ ∈ [0,1], S are Sigmoid functions.
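A minimal sketch of one update per formulas (6)-(8) follows. Formula (9) is not reproduced in this text; the sketch assumes the rule commonly used in binary bat algorithms, where a bit becomes 1 when $S(v) > \delta$ and 0 otherwise, with a fresh $\delta$ drawn per dimension. The name `update_bat`, the `guide` argument (pass $M_n^{t-1}$ for formula (7) or $G^{t-1}$ for formula (8)), and the clamping inside `sigmoid` are assumptions of this sketch.

```python
import math
import random

def sigmoid(v):
    """Sigmoid S(v); the input is clamped to avoid overflow in exp (assumption)."""
    v = max(-60.0, min(60.0, v))
    return 1.0 / (1.0 + math.exp(-v))

def update_bat(x, v, p_best, guide, w_t, c_t, f_min, f_max, rng=random):
    """One frequency, velocity, and position update for a single bat.

    x, v    : current binary position and real-valued velocity
    p_best  : the bat's own best historical position P_i
    guide   : local best M_n (formula 7) or global best G (formula 8)
    """
    beta = rng.random()
    f_i = f_min + (f_max - f_min) * beta                      # formula (6)
    new_v = [w_t * vj + (xj - gj) * f_i + (xj - pj) * c_t     # formulas (7)/(8)
             for xj, vj, gj, pj in zip(x, v, guide, p_best)]
    # Assumed form of formula (9): threshold the sigmoid against a random delta.
    new_x = [1 if sigmoid(vj) > rng.random() else 0 for vj in new_v]
    return new_x, new_v, f_i

rng = random.Random(0)
x, v, f_i = update_bat([1, 0, 1], [0.0, 0.0, 0.0], [1, 1, 0], [0, 1, 1],
                       w_t=0.7, c_t=1.5, f_min=0.0, f_max=2.0, rng=rng)
print(x, v, f_i)
```

Passing the guide position as an argument lets the same function serve both the ordinary members of a sub-population and its local best individual.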
(5) Generate a random number rand1. If rand1 > $r_i$, apply a random perturbation to the current best bat position of the swarm according to formula (10) to obtain a new position $x_{new}$ for the bat, then perform step (6); if rand1 ≤ $r_i$, perform step (7). Here $r_i$ is the pulse emission rate of the $i$-th bat in the current iteration:
$$x_{new} = x_{old} + \delta \cdot A_*^t \tag{10}$$
where $x_{old}$ is the original position of the bat, $\delta$ is a random number between -1 and 1, and $A_*^t$ is the mean loudness of all bats in iteration $t$.
(6) Generate a random number rand2. If rand2 < $A_i$ and $f(x_i) < f(x_{new})$, replace the bat's original position with the new position $x_{new}$ and update the bat's loudness $A_i^t$ and pulse emission rate $r_i^t$ for the current iteration $t$ according to formulas (11) and (12); otherwise, perform step (7). Here $f(x_i)$ is the fitness value of the bat's original position and $f(x_{new})$ is the fitness value of its new position.
$$A_i^t = \alpha A_i^{t-1} \tag{11}$$
where $A_i^{t-1}$ is the loudness of bat $i$ at time $t-1$ and $\alpha$ is a constant with $\alpha \in (0,1)$; $r_i^0$ is the initial pulse emission rate of bat $i$, and $\gamma$ is a constant with $\gamma > 0$. When a bat approaches its target, it decreases $A_i$ and increases $r_i$ in the manner described above.
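The acceptance test and the loudness update of step (6) can be sketched as below. Formula (12) is not reproduced in this text; the sketch assumes the form used in the standard bat algorithm, $r_i^t = r_i^0 \, [1 - e^{-\gamma t}]$, which is consistent with the symbols $r_i^0$ and $\gamma$ defined above. The function names and the default values of `alpha` and `gamma` are illustrative assumptions.

```python
import math

def accept_new_position(rand2, loudness, fit_old, fit_new):
    """Step (6) acceptance test: rand2 < A_i and f(x_i) < f(x_new)."""
    return rand2 < loudness and fit_old < fit_new

def update_loudness_and_rate(a_prev, r0, t, alpha=0.9, gamma=0.9):
    """Return (A_i^t, r_i^t) after an accepted move at iteration t."""
    a_t = alpha * a_prev                        # formula (11)
    r_t = r0 * (1.0 - math.exp(-gamma * t))     # assumed standard form of (12)
    return a_t, r_t

if accept_new_position(rand2=0.3, loudness=0.8, fit_old=0.90, fit_new=0.93):
    print(update_loudness_and_rate(a_prev=0.8, r0=0.5, t=3))
```

With these updates the loudness shrinks geometrically while the pulse emission rate rises toward $r_i^0$, so accepted moves become rarer and local perturbation in step (5) becomes less frequent as the search settles.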
(7) Generate a random number rand3 for each bat and, according to formula (13), apply mutation to the current velocity of every bat that satisfies rand3 < $P_t$.
where $r_1$, $r_2$, and $r_5$ are individuals randomly selected from the same sub-population as the target bat, and $r_3$ and $r_4$ are bats randomly selected from different sub-populations. Because the position and velocity of each bat are represented by binary strings, the mutation can be implemented with logical operations. "+" denotes the logical XOR operation, and a second operator denotes the logical OR operation. rand is a random number generated between 0 and 1. The bracketed operation is executed only if the condition rand < $F_t$ holds.
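Formula (13) is not reproduced in this text, so the exact way the five individuals are combined is unknown. The sketch below is a hypothetical arrangement that only illustrates the ingredients the text names: XOR as the binary "difference", OR to merge differences, and a gate that applies a bracketed term only when rand < $F_t$. The pairing of $r_2$ with $r_5$ and of $r_3$ with $r_4$, and the function name `binary_de_mutation`, are assumptions and should not be read as the patented formula.

```python
import random

def xor_bits(a, b):
    """Bitwise XOR: marks the positions where two binary strings differ."""
    return [x ^ y for x, y in zip(a, b)]

def or_bits(a, b):
    """Bitwise OR: merges two difference masks."""
    return [x | y for x, y in zip(a, b)]

def gate(bits, f_t, rng):
    """Apply the bracketed term only if rand < F_t, else contribute nothing."""
    return list(bits) if rng.random() < f_t else [0] * len(bits)

def binary_de_mutation(r1, r2, r3, r4, r5, f_t, rng=random):
    """Hypothetical binary differential mutation built from XOR and OR.

    r1, r2, r5: bats from the target's own sub-population
    r3, r4    : bats from other sub-populations
    """
    within = gate(xor_bits(r2, r5), f_t, rng)    # difference inside the sub-population
    between = gate(xor_bits(r3, r4), f_t, rng)   # difference across sub-populations
    return xor_bits(r1, or_bits(within, between))

mutant = binary_de_mutation([1, 0, 1, 0], [1, 1, 0, 0], [0, 0, 1, 1],
                            [0, 0, 0, 1], [1, 0, 0, 0], f_t=1.0)
print(mutant)  # [1, 1, 0, 0]
```

With `f_t=1.0` both gated terms always apply, and with `f_t=0.0` the base individual is returned unchanged, which shows how the contraction factor scales the strength of the disturbance.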
(8) Sort the fitness values of all bats and take the position of the bat with the highest fitness value as the current candidate feature combination. Determine whether the current candidate feature combination satisfies the preset optimization condition; if so, take it as the selected feature combination; if not, return to step (2).
The present invention further introduces the mutation mechanism of differential evolution, which increases population diversity and improves the ability of bat individuals to escape local optima. Individuals are selected at random from the population, and the differences between them are used to perturb the target. This perturbation draws on differences between individuals both within the same sub-population and across different sub-populations. The advantage is that it preserves the quality of the original individual's own position and avoids the performance loss caused by unnecessary mutation, while also introducing differences between groups to drive the evolution of the whole population.
The technical solution of the present invention is further illustrated below with a specific embodiment. This embodiment uses the KDD CUP 1999 network intrusion detection data set: the selected network traffic features serve as input, a random forest (RF) classifier classifies the network traffic, and the quality of the classification results is used to assess the quality of the feature selection. Details of the data set are as follows:
Table 1. Experimental data details
Several improved bat algorithm models already exist. A. C. Enache et al. proposed using Lévy flights to increase the randomness of candidate solutions. T. Kanungo et al. proposed using an increment derived from the Euclidean distance between the current candidate solution and a better solution to randomly generate a new solution, preventing the algorithm from falling into a local optimum. The proposed improved algorithm is compared with these existing improved bat algorithms and with common swarm intelligence algorithms, all applied to feature selection. The classification accuracy and error rate are shown in Table 2. As the table shows, the present invention achieves a classification accuracy of 96.03% and an error rate of 1.18%, outperforming the other methods compared, which indicates that the proposed improved algorithm screens features more effectively, raising classification accuracy and lowering the error rate.
Table 2. Performance comparison of different feature selection algorithms
Algorithm | Accuracy (%) | Error rate (%) |
---|---|---|
ACO | 94.25 | 4.78 |
PSO | 94.52 | 3.99 |
BA | 94.93 | 3.68 |
Algorithm proposed by A.C.Enache | 95.35 | 2.74 |
Algorithm proposed by T.Kanungo | 95.64 | 2.06 |
Proposed improved BA algorithm | 96.03 | 1.18 |
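The size of the gaps in Table 2 can be made explicit with a little arithmetic. The Python sketch below copies the values from the table and derives, for each baseline, the accuracy gain in percentage points and the relative reduction in error rate. The dictionary keys and the helper name `compare` are illustrative labels, not identifiers from the patent.

```python
# Values copied from Table 2: (accuracy %, error rate %).
table2 = {
    "ACO": (94.25, 4.78),
    "PSO": (94.52, 3.99),
    "BA": (94.93, 3.68),
    "A.C.Enache": (95.35, 2.74),
    "T.Kanungo": (95.64, 2.06),
    "Improved BA": (96.03, 1.18),
}

def compare(table, proposed="Improved BA"):
    """Return {baseline: (accuracy gain in points, relative error cut in %)}."""
    acc_p, err_p = table[proposed]
    result = {}
    for name, (acc, err) in table.items():
        if name == proposed:
            continue
        gain = round(acc_p - acc, 2)
        cut = round(100.0 * (err - err_p) / err, 1)
        result[name] = (gain, cut)
    return result

gains = compare(table2)
for name, (gain, cut) in gains.items():
    print(f"{name}: +{gain:.2f} accuracy points, {cut:.1f}% lower error rate")
```

Read this way, the difference between the methods is more visible in the error rate than in the accuracy, since the accuracy values all lie within about two percentage points of one another.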
In addition, the present invention also verifies its improvements in optimization performance and convergence speed during the search for the optimal feature combination. As shown in Figure 2, the improved BA algorithm converges at around 40 iterations, far earlier than the other algorithms, which lowers the time cost. Meanwhile, judging from the heights of the curves, the fitness of the optimal solution obtained at convergence is also higher than that of the other algorithms. From the simulation results it can be concluded that the introduction of sub-population division and the binary differential mutation mechanism makes it easier for the algorithm to escape local optima and obtain better feature combinations, and that the linearly time-varying parameters also enhance the dynamic search ability of individuals, meeting the needs of the different search stages.
Since the population size and the number of search iterations are two very important parameters in solving an optimization problem, Figure 3 shows the influence of different parameter values on the optimization result. As the figure shows, when the number of individuals in the population is fixed, classification accuracy rises as the number of iterations increases. It can be inferred that, within a certain number of iterations, the population keeps evolving and finding better solutions, but eventually finds an approximately optimal solution and converges after a certain number of rounds. When the number of iterations is fixed, a larger population containing more individuals performs better than a small one. This is because the individuals in a larger population differ more from one another, can exchange information and interact effectively, and can search a wider range, which avoids convergence to a local optimum.
In the improved algorithm, an important optimization is the introduction of the sub-population division mechanism. There are many ways to divide a population. Some methods simply assign individuals to different clusters at random, while others cluster according to various evaluation indices. The present invention therefore examines the influence of different division methods on the BA algorithm and gives the final convergence results in Figure 4. Compared with the other curves, the curve representing the K-means clustering algorithm converges first, with a steeper slope, and reaches the highest fitness at convergence. This shows that dividing sub-populations with the K-means clustering algorithm achieves better performance, higher fitness and faster convergence, because neighboring individuals are grouped into the same sub-population on the basis of distance. On the one hand, the whole population moves toward the current best position through learning inside each sub-population and knowledge sharing between populations. On the other hand, each individual is influenced only by the current local optimum of its own population and moves slowly, which largely shields it from interference.
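The distance-based division described above can be sketched with a plain K-means loop. The Python code below is a minimal illustration rather than the patented implementation: it assumes bat positions are binary feature masks, uses squared Euclidean distance, and seeds the centroids by sampling existing bats. The names `sq_dist`, `kmeans_labels` and the sample data are hypothetical.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def kmeans_labels(points, k, rng, max_iter=50):
    """Plain Lloyd's K-means. Returns one sub-population index per point."""
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each bat joins the nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Six hypothetical bats over six candidate features, forming two clear groups.
bats = [
    (1, 1, 1, 0, 0, 0), (1, 1, 0, 0, 0, 0), (1, 0, 1, 0, 0, 0),
    (0, 0, 0, 1, 1, 1), (0, 0, 0, 0, 1, 1), (0, 0, 0, 1, 0, 1),
]
labels = kmeans_labels(bats, k=2, rng=random.Random(0))
print(labels)
```

Bats that select similar features end up in the same sub-population, so each group's local best is a meaningful guide for its members.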
Claims (1)
1. A big data feature selection method based on an improved bat algorithm, characterized by comprising the following steps:
(1) Initialize the relevant parameters of the bat algorithm, the relevant parameters including: the number of individuals in the bat population $N$, the maximum pulse loudness $A_0$, the maximum pulse rate $R_0$, the search pulse frequency range $[f_{min}, f_{max}]$, the attenuation coefficient $\alpha$ of the loudness, the enhancement coefficient $\gamma$ of the search rate, and the maximum number of iterations $N_t$;
Randomly initialize the positions of the bats by the following method to generate N candidate feature combinations:
For the i-th bat, according to the spatial position $x_i = (x_{i,1}, x_{i,2}, \ldots, x_{i,d})$ and the velocity $v_i = (v_{i,1}, v_{i,2}, \ldots, v_{i,d})$ of the bat, abstract the spatial position of the bat into a binary string in d-dimensional space; this d-dimensional binary string is one candidate feature combination, where d is the number of candidate features; a value of 1 at a position of the binary string indicates that the feature at that position is selected, and a value of 0 indicates that the feature at that position is not selected;
(2) the fitness value f (x of each bat is calculated according to the fitness function of formula (1)i), and find out and work as from all bats
The position G of preceding optimal bat0;
F=0.6 × R+0.4 × e-F×A (1)
In formula (1), R, F and A respectively indicate using feature selected by current iteration combination as input classify recall rate,
F score and accuracy rate;
Update the inertia coefficient and the self-learning factor of the current iteration according to formula (2) and formula (3);
In formula (2) and formula (3), $W_t$ denotes the inertia coefficient of each bat at iteration t, $C_t$ denotes the self-learning factor of each bat at iteration t, $W_{max}$ is the maximum value of the inertia coefficient, $W_{min}$ is the minimum value of the inertia coefficient, $C_{max}$ is the maximum value of the self-learning factor, $C_{min}$ is the minimum value of the self-learning factor, and $N_t$ is the maximum number of iterations;
Calculate, according to formula (4), the mutation probability $P_t$ that controls whether a bat performs the mutation operation:
In formula (4), $N_t$ is the maximum number of iterations and t denotes the iteration number;
Calculate, according to formula (5), the contraction factor $F_t$, which is randomly generated between 0 and 1:
In formula (5), $F_{max}$ is the maximum value of the contraction factor, $F_{min}$ is the minimum value of the contraction factor, and $N_t$ is the maximum number of iterations;
(3) Use the K-means algorithm to divide the bat population into sub-populations according to the distances between bats;
(4) Update the internal variables of each bat in every sub-population, namely the search pulse frequency $f_i$, the flight velocity $v_i$ and the spatial position $x_i$, according to formulas (6)-(9); for the non-locally-optimal individuals and the locally optimal individual inside each sub-population, update the flight velocity according to formula (7) and formula (8), respectively:
$$f_i = f_{min} + (f_{max} - f_{min}) \cdot \beta \quad (6)$$
$$v_i^t = W_t \cdot v_i^{t-1} + (x_i^{t-1} - M_n^{t-1}) \cdot f_i + (x_i^{t-1} - P_i^{t-1}) \cdot C_t \quad (7)$$
$$v_i^t = W_t \cdot v_i^{t-1} + (x_i^{t-1} - G^{t-1}) \cdot f_i + (x_i^{t-1} - P_i^{t-1}) \cdot C_t \quad (8)$$
In formula (6), $f_{max}$ and $f_{min}$ are the maximum and minimum values of the pulse frequency, respectively, and $\beta$ is a uniformly distributed random variable with $\beta \in [0,1]$; in formulas (7) and (8), $v_i^t$ and $v_i^{t-1}$ denote the flight velocity of bat individual i at times t and t-1, respectively; $x_i^t$ and $x_i^{t-1}$ denote the position of bat individual i at times t and t-1, respectively; $M_n^{t-1}$ is the position of the locally best individual in sub-population n at time t-1; $G^{t-1}$ is the position of the globally best bat in the whole population at time t-1; $P_i^{t-1}$ is the historical best position retained by bat i at time t-1; $W_t$ and $C_t$ denote the inertia coefficient and the self-learning factor of the bats at iteration t; in formula (9), $\delta$ is a random variable with $\delta \in [0,1]$, and S is the Sigmoid function;
(5) Generate a random number rand1; if rand1 > $r_i$, apply a random perturbation to the position of the current best bat of the bat population according to formula (10) to obtain the new position $x_{new}$ of that bat, and then execute step (6); if rand1 ≤ $r_i$, execute step (7); $r_i$ is the pulse rate of the i-th bat in the current iteration:
$$x_{new} = x_{old} + \delta \cdot A_t^{*} \quad (10)$$
where $x_{old}$ is the original position of the bat, $\delta$ is a random number between -1 and 1, and $A_t^{*}$ is the average loudness of all bats in iteration t;
(6) Generate a random number rand2; if rand2 < $A_i$ and $f(x_i) < f(x_{new})$, replace the original position of the bat with the new position $x_{new}$, and update the loudness $A_i^t$ and the pulse rate $r_i^t$ of the bat at the current iteration t according to formulas (11) and (12); otherwise, execute step (7); where $f(x_i)$ denotes the fitness value of the original position of the bat and $f(x_{new})$ denotes the fitness value of the new position of the bat;
$$A_i^t = \alpha A_i^{t-1} \quad (11)$$
$$r_i^t = r_i^0 \cdot [1 - e^{-\gamma t}] \quad (12)$$
where $A_i^{t-1}$ denotes the loudness of bat individual i at time t-1, $\alpha$ is a constant with $\alpha \in (0,1)$; $r_i^0$ denotes the initial pulse rate of bat individual i, and $\gamma$ is a constant with $\gamma > 0$;
(7) Generate a random number rand3 for each bat, and perform mutation according to formula (13) on the current velocity of each bat that satisfies rand3 < $P_t$;
where r1, r2 and r5 are individuals randomly selected from the same sub-population as the target bat, and r3 and r4 are bats randomly selected from different sub-populations; "+" is the logical XOR operation and " " is the logical OR operation; rand is a random number generated between 0 and 1; " " indicates that the operation in the brackets is executed if the condition rand < $F_t$ is satisfied;
(8) Sort the fitness values of all bats, and take the position of the bat with the highest fitness value as the current candidate feature combination; judge whether the current candidate feature combination satisfies the preset optimization condition; if so, take the current candidate feature combination as the selected feature combination; if not, return to step (2).
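The closed-form update rules of claim 1 that are fully stated in the text, namely formulas (6), (7), (8), (11) and (12), can be sketched in Python as follows. This is a hedged illustration only: the function names are hypothetical, the sample parameter values are arbitrary, and formulas (2)-(5), (9) and (13) are not covered, so the inertia coefficient `w_t` and the self-learning factor `c_t` are simply passed in as inputs.

```python
import math
import random

def pulse_frequency(f_min, f_max, rng):
    """Formula (6): f_i = f_min + (f_max - f_min) * beta, beta uniform in [0, 1]."""
    return f_min + (f_max - f_min) * rng.random()

def velocity(v_prev, x_prev, leader, p_best, f_i, w_t, c_t):
    """Formulas (7) and (8), applied per dimension.

    `leader` is the sub-population best M_n^{t-1} for a non-locally-optimal
    bat (formula 7) or the global best G^{t-1} for the locally optimal bat
    (formula 8). `p_best` is the bat's own historical best position.
    """
    return [
        w_t * v + (x - m) * f_i + (x - p) * c_t
        for v, x, m, p in zip(v_prev, x_prev, leader, p_best)
    ]

def loudness(a_prev, alpha):
    """Formula (11): A_i^t = alpha * A_i^{t-1}, with 0 < alpha < 1."""
    return alpha * a_prev

def pulse_rate(r0, gamma, t):
    """Formula (12): r_i^t = r_i^0 * (1 - exp(-gamma * t)), with gamma > 0."""
    return r0 * (1.0 - math.exp(-gamma * t))

# Arbitrary sample values, chosen only to exercise the functions.
rng = random.Random(1)
f_i = pulse_frequency(0.0, 2.0, rng)
v_new = velocity([0.0, 0.0], [1, 0], [0, 0], [1, 1], f_i=0.5, w_t=0.9, c_t=2.0)
print(f_i, v_new, loudness(1.0, 0.9), pulse_rate(0.5, 0.1, 10))
```

Under formulas (11) and (12), the loudness shrinks geometrically each time a new position is accepted while the pulse rate rises toward its initial value $r_i^0$, so the random perturbation of step (5), which is triggered when rand1 > $r_i$, becomes less frequent as the iterations proceed.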
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811642556.5A CN109711373A (en) | 2018-12-29 | 2018-12-29 | A kind of big data feature selection approach based on improvement bat algorithm |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811642556.5A CN109711373A (en) | 2018-12-29 | 2018-12-29 | A kind of big data feature selection approach based on improvement bat algorithm |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109711373A true CN109711373A (en) | 2019-05-03 |
Family
ID=66259616
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811642556.5A Pending CN109711373A (en) | 2018-12-29 | 2018-12-29 | A kind of big data feature selection approach based on improvement bat algorithm |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109711373A (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111080031A (en) * | 2019-12-27 | 2020-04-28 | 圆通速递有限公司 | Vehicle path optimization method and system based on improved dragonfly algorithm |
CN111368900A (en) * | 2020-02-28 | 2020-07-03 | 桂林电子科技大学 | Image target object identification method |
CN112308168A (en) * | 2020-11-09 | 2021-02-02 | 国家电网有限公司 | Method for detecting voltage data abnormity in power grid |
CN112800224A (en) * | 2021-01-28 | 2021-05-14 | 中南大学 | Text feature selection method and device based on improved bat algorithm and storage medium |
CN113076695A (en) * | 2021-04-12 | 2021-07-06 | 湖北民族大学 | Ionosphere high-dimensional data feature selection method based on improved BBA algorithm |
CN113076695B (en) * | 2021-04-12 | 2022-06-17 | 湖北民族大学 | Ionosphere high-dimensional data feature selection method based on improved BBA algorithm |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109711373A (en) | A kind of big data feature selection approach based on improvement bat algorithm | |
CN107590436B (en) | Radar emitter signal feature selection approach based on peplomer subgroup multi-objective Algorithm | |
He et al. | A discrete multi-objective fireworks algorithm for flowshop scheduling with sequence-dependent setup times | |
Sikora | A modified stacking ensemble machine learning algorithm using genetic algorithms | |
Zeng et al. | Accurately clustering single-cell RNA-seq data by capturing structural relations between cells through graph convolutional network | |
CN107392919B (en) | Adaptive genetic algorithm-based gray threshold acquisition method and image segmentation method | |
CN108875816A (en) | Merge the Active Learning samples selection strategy of Reliability Code and diversity criterion | |
CN102214213A (en) | Method and system for classifying data by adopting decision tree | |
CN107992887A (en) | Classifier generation method, sorting technique, device, electronic equipment and storage medium | |
CN111105045A (en) | Method for constructing prediction model based on improved locust optimization algorithm | |
CN110287985B (en) | Depth neural network image identification method based on variable topology structure with variation particle swarm optimization | |
Carvalho et al. | Tree-Based Methods: Concepts, Uses and Limitations under the Framework of Resource Selection Models. | |
CN110909158B (en) | Text classification method based on improved firefly algorithm and K nearest neighbor | |
CN107798379A (en) | Improve the method for quantum particle swarm optimization and the application based on innovatory algorithm | |
CN106971091A (en) | A kind of tumour recognition methods based on certainty particle group optimizing and SVMs | |
CN109344956A (en) | Based on the SVM parameter optimization for improving Lay dimension flight particle swarm algorithm | |
CN108629400A (en) | A kind of chaos artificial bee colony algorithm based on Levy search | |
CN111832135A (en) | Pressure container structure optimization method based on improved Harris eagle optimization algorithm | |
CN110059756A (en) | A kind of multi-tag categorizing system based on multiple-objection optimization | |
CN107195297A (en) | A kind of normalized TSP question flock of birds speech recognition system of fused data | |
CN109978023A (en) | Feature selection approach and computer storage medium towards higher-dimension big data analysis | |
CN110796198A (en) | High-dimensional feature screening method based on hybrid ant colony optimization algorithm | |
CN115688097A (en) | Industrial control system intrusion detection method based on improved genetic algorithm feature selection | |
CN115101118A (en) | Method for predicting serum-free medium component concentration based on machine learning | |
CN107995027A (en) | Improved quantum particle swarm optimization and the method applied to prediction network traffics |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20190503 |