CN108830370B - Feature selection method based on reinforced learning type flora foraging algorithm - Google Patents

Feature selection method based on reinforced learning type flora foraging algorithm

Info

Publication number
CN108830370B
Authority
CN
China
Prior art keywords
value
thallus
bacterial
feature
behavior
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810508479.8A
Other languages
Chinese (zh)
Other versions
CN108830370A (en)
Inventor
姜慧研
董万鹏
马连博
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northeastern University China
Original Assignee
Northeastern University China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northeastern University China filed Critical Northeastern University China
Priority to CN201810508479.8A priority Critical patent/CN108830370B/en
Publication of CN108830370A publication Critical patent/CN108830370A/en
Application granted granted Critical
Publication of CN108830370B publication Critical patent/CN108830370B/en


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/004Artificial life, i.e. computing arrangements simulating life
    • G06N3/006Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]


Abstract

The invention discloses a feature selection method based on a reinforcement learning type flora foraging algorithm, which comprises the following steps: initializing the positions of the bacterial community, the maximum cycle value, and the initial value of the iteration count; each bacterium in the bacterial community represents a weight vector of the feature vector to be selected; selecting and executing a movement behavior for each bacterium according to the maximum-historical-experience-value strategy in RL, and obtaining the updated position of each bacterium and its fitness value after the update; based on the RL rule, obtaining a feedback value from the change in each bacterium's fitness value; updating the historical experience value accumulated by each bacterium according to the feedback value, increasing the iteration count by 1, and repeating the process until the iteration count is greater than the maximum cycle value, then outputting the bacterial community. By replacing the traditional probabilistic optimization scheme with a reinforcement learning one, the method obtains better recognition results and consumes less time.

Description

Feature selection method based on reinforced learning type flora foraging algorithm
Technical Field
The invention belongs to the field of feature selection, and in particular relates to a feature selection method based on a reinforcement learning type flora foraging algorithm.
Background
In recent years, bio-inspired computing has developed rapidly. Inspired by the robustness and adaptivity that biological systems display in complex environments, researchers have proposed many computational models and algorithms that simulate biological foraging behavior to solve complex optimization problems in engineering; these can conveniently be applied to fields such as networked engineering computation and image processing.
Swarm intelligence algorithms belong to the class of bio-inspired optimization algorithms. Such heuristic optimization algorithms have characteristics of potential parallelism, distribution, and reconfigurability, and they describe the optimization problem to be solved in the form of an objective function through a mathematical model built by simulating the behavior of natural biological populations. The bacterial foraging optimization algorithm (BFO) is an optimization model that simulates the foraging behavior of a bacterial community and is one of the swarm intelligence algorithms. Although BFO exhibits fine-grained search and global optimization ability on low-dimensional continuous optimization problems, it tends to fall into locally optimal solutions and therefore converge prematurely when faced with high-dimensional discrete problems. How BFO can overcome these problems has therefore become a research hotspot in the field of swarm intelligence.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides a feature selection method based on a reinforcement learning type flora foraging algorithm, which does not get trapped in locally optimal solutions when faced with high-dimensional discrete problems.
In a first aspect, the invention provides a feature selection method based on a reinforcement learning type flora foraging algorithm, comprising the following steps:
step S1, initializing the positions of the bacterial community, and setting the maximum cycle value and the initial value of the iteration count; each bacterium in the bacterial community represents a weight vector of the feature vector to be selected;
step S2, selecting a movement behavior for each bacterium in the bacterial community according to the maximum-historical-experience-value strategy in reinforcement learning (RL);
step S3, after each bacterium executes the movement behavior, obtaining its updated position;
step S4, acquiring the fitness value of each bacterium at its updated position;
step S5, based on the RL rule, obtaining a feedback value according to the change of each bacterium's fitness value between the positions before and after the update;
step S6, updating the historical experience value accumulated by each bacterium according to the feedback value, and outputting a bacterial community;
and step S7, increasing the iteration count by 1 and repeating steps S2 to S6 until the iteration count is greater than or equal to the maximum cycle value, then outputting the bacterial community.
Optionally, the movement behavior includes one or more of:
adaptive chemotactic behavior;
replication behavior;
enhanced dispersal behavior;
confluent cross behavior.
Optionally, the step S2 includes:
step 21, setting an initial state and an initial action for each bacterium; setting a Q-matrix for each bacterium to store the accumulated historical experience, and initializing the Q-matrix to 0;
step 22, for the current state s_t of each bacterial cell, selecting the optimal action a_t according to the content of the Q-matrix;
Step 23, each bacterial cell executing its respective action a_t.
Accordingly, the step S3 of obtaining the updated position of each bacterial cell includes:
step S31, updating the data item (s_t, a_t) of the Q-matrix, and updating the state to s_(t+1).
Accordingly, the obtaining of the feedback value in step S5 includes:
obtaining the immediate feedback of each bacterial cell, i.e. the feedback value r_(t+1).
Optionally, the step 22 comprises:
selecting the optimal action a_t according to the following equation (1):
a_t = Max[Q(state, actions)]    (1)
wherein the optimal action a_t is selected according to the maximum value of the current state in the Q-matrix.
Optionally, the step S5 includes:
step S51: comparing the fitness value of each bacterial thallus before and after the position update, and obtaining the feedback value r according to the following formula (2):
r = 1, if the fitness value of the bacterial thallus improves; r = -1, otherwise    (2)
that is, when the fitness value of the bacterial thallus improves, the feedback value r = 1 is obtained; in the opposite case, the feedback value r = -1.
Optionally, the Q-matrix in step S31 is calculated as follows:
Q(s_t, a_t) = Q(s_t, a_t) + α · [ r_(t+1) + γ · max_a Q(s_(t+1), a) - Q(s_t, a_t) ]
wherein γ represents a discount factor belonging to [0, 1]; r_(t+1) represents the immediate feedback obtained after the agent in the current state s_t executes the action a_t; α denotes a learning rate, computed from the current iteration number iter and the total iteration number MaxCycle, which balances the search process and the exploitation process.
In a second aspect, the present invention further provides an electronic device, comprising a memory, a processor, a bus, and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps described above when executing the program.
The invention has the following beneficial effects:
aiming at the balance performance of global search and local search and poor convergence performance in the face of high-dimensional and discrete optimization problems in the traditional BFO, the invention adopts a mechanism of reinforcement learning to solve the two problems. The improved optimization algorithm mainly comprises several actions: adaptive chemotactic behavior, replicative behavior, enhanced migration behavior, cross-behavior. Since different behaviors can search for different solutions, when to invoke which behavior becomes a core issue. According to the mechanism of reinforcement learning, the learner agent (i.e. thallus) will select the next action according to the historical experience to obtain the learning environment and give the maximum reward. Thus, the critical issue of when to invoke a certain behavior is addressed in conjunction with a reinforcement learning mechanism.
In addition, a set of benchmark functions can be adopted in specific use to verify the convergence performance and effectiveness of the reinforcement learning type flora algorithm.
In swarm intelligence, the feature selection problem can be regarded as a discrete, high-dimensional, complex problem. Traditional feature selection algorithms that combine an optimization model with a classification criterion therefore have difficulty achieving both high classification accuracy and low time consumption. The invention improves the optimization method by replacing the traditional probabilistic optimization scheme with a reinforcement learning one, so that better recognition results are obtained with less time consumption.
Further, as a bio-inspired algorithm, the invention can strike an appropriate balance between exploration and exploitation. The RL mechanism improves the foraging efficiency of the bacteria, so that the objective function reaches its convergence state as early as possible. An RL-based optimization algorithm is used, with the Fisher criterion as the judgment criterion for feature selection, thereby improving the classification accuracy.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without inventive exercise.
Fig. 1 and fig. 2 are schematic flow diagrams of a feature selection method based on a reinforcement learning type flora foraging algorithm according to an embodiment of the present invention.
Detailed Description
For the purpose of better explaining the present invention and to facilitate understanding, the present invention will be described in detail by way of specific embodiments with reference to the accompanying drawings.
In the following description, various aspects of the invention will be described, however, it will be apparent to those skilled in the art that the invention may be practiced with only some or all of the structures or processes of the present invention. Specific numbers, configurations and sequences are set forth in order to provide clarity of explanation, but it will be apparent that the invention may be practiced without these specific details. In other instances, well-known features have not been set forth in detail in order not to obscure the invention.
Currently, the key to designing a robust swarm intelligence algorithm is how to balance exploration and exploitation during the optimization process. Theoretically, solving this problem comes down to the efficient management of local search, i.e. when and how often it is invoked. In addition, the traditional flora foraging algorithm has two defects, namely premature convergence and high computational cost. The innovations of the reinforcement learning type flora optimization model (RBCFO for short) lie in the following aspects:
(1) The reinforcement learning type flora foraging algorithm makes the community intelligent. That is, intelligent actions can be selected among the multi-level behaviors (chemotaxis, replication, elimination-dispersal, and crossing) based on RL rules. This flora model provides adaptivity, cooperation, and intelligence to the flora foraging optimization method.
(2) The exploration process and the exploitation process are dynamically balanced during optimization. That is, the bacteria can adaptively adjust their movement step size.
(3) An information-crossing mechanism among the bacteria is adopted to share information within the bacterial community.
Referring to fig. 1 and 2, the method of the present embodiment includes the following steps:
step S1: the position of the bacterial community is initialized, the maximum cycle value is set, and the number of iterations is initially set to 0.
In the present embodiment, each bacterial cell in the bacterial community represents a weight vector of a feature vector to be selected.
For a better understanding of this embodiment, the weights of the community, the solution, and the features are described below. Assume the initialized community/population is represented by a 50 x 100 matrix:
50 represents the number of bacteria; 100 denotes the dimension of the feature vector before feature selection.
Each 1 x 100 row vector holds the weights of the feature vector and represents one solution.
The initialized community is updated a number of times to generate the final community. The update process is implemented using the flora foraging algorithm, which is improved by adding a reinforcement learning mechanism. After the procedure stops, a set of solutions, i.e. a set of weight vectors, is obtained.
One solution can then be randomly selected from the set of solutions; the selected solution represents the weights of the feature vector. For example, if the final 1 x 100 solution is (0.2, 0.3, 0.8, 0.96, 0.2, 0.1, ...), a threshold is set, the columns whose weights are smaller than the threshold are deleted, and the remaining columns correspond to the finally selected features.
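For illustration, this thresholding step can be sketched as follows in Python; the function name select_features and the default threshold of 0.5 are illustrative assumptions, since the embodiment only requires that columns whose weight falls below a chosen threshold be discarded.

import numpy as np

def select_features(weight_vector, features, threshold=0.5):
    # Keep only the feature columns whose learned weight reaches the threshold.
    # weight_vector: (D,) weights output by the foraging algorithm (one solution)
    # features:      (n_samples, D) matrix of original feature vectors
    mask = np.asarray(weight_vector) >= threshold
    return features[:, mask], mask

# With the example weights (0.2, 0.3, 0.8, 0.96, 0.2, 0.1, ...) and a threshold of 0.5,
# only the third and fourth columns would be kept.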
Step S2: according to the maximum-historical-experience-value strategy of reinforcement learning, a suitable behavior (i.e. adaptive chemotactic behavior, replication behavior, enhanced dispersal behavior, or cross behavior) is selected for each bacterium and executed.
For example, the reinforcement learning in this embodiment may be composed of 3 parts: 1) policy: a mapping from states to actions, which is the core of reinforcement learning; 2) reward (feedback): the immediate reward brought by the environmental state change caused by a single action, e.g. the feedback value r_(t+1); 3) value function: the long-term reward, i.e. the cumulative reward Q(s_t, a_t).
Step S3: for the updated community positions, calculating the fitness value, i.e. the Fisher function value f(ω):

m_i = (1/n_i) · Σ_j F_i,j
m = (1/N) · Σ_i Σ_j F_i,j, with N = Σ_i n_i
S_W = Σ_i Σ_j || ω ∘ (F_i,j - m_i) ||^2
S_B = Σ_i n_i · || ω ∘ (m_i - m) ||^2
f(ω) = S_B / S_W

wherein F_i,j represents the feature vector of the j-th sample in the i-th category; n_i represents the number of samples in the i-th category; c represents the number of categories; m_i represents the mean of the feature vectors of the i-th category; m represents the mean of all feature vectors; S_W represents the average weighted distance between the feature vectors and the mean of their own class; S_B represents the average weighted distance of the feature vectors between classes; ∘ denotes element-wise weighting by ω; and ω is the weight vector obtained by the feature selection method in the present embodiment.
That is, after each bacterium executes its movement behavior, its updated position is acquired, and the fitness value at the updated position is computed.
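For illustration, a possible implementation of the Fisher-criterion fitness is sketched below in Python. The element-wise weighting of the features by ω and the scalar ratio of between-class to within-class scatter are assumptions consistent with the variable definitions above, not a verbatim reproduction of the embodiment's formulas.

import numpy as np

def fisher_fitness(omega, X, y):
    # omega: (D,) candidate feature weights (one bacterium / one solution)
    # X:     (n_samples, D) feature matrix, y: (n_samples,) class labels
    Xw = X * omega                                 # apply the candidate weights per dimension
    m = Xw.mean(axis=0)                            # mean of all weighted feature vectors
    s_w, s_b = 0.0, 0.0
    for c in np.unique(y):
        Xc = Xw[y == c]
        m_c = Xc.mean(axis=0)
        s_w += np.sum((Xc - m_c) ** 2)             # within-class scatter
        s_b += len(Xc) * np.sum((m_c - m) ** 2)    # between-class scatter
    return s_b / (s_w + 1e-12)                     # larger value = better class separation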
Step S4: according to the RL rule (i.e. the reinforcement learning mechanism), a feedback value is obtained from the change in each bacterium's fitness value before and after the update.
That is, changes in the fitness value of the bacterial cells are observed and feedback is given accordingly.
Step S5: updating the historical experience value accumulated by each bacterium according to the feedback value, and outputting the bacterial community.
That is, the accumulated reward, i.e. the historical experience value accumulated by each bacterium, is updated according to the RL rule.
Step S6: adding 1 to the iteration count and judging whether the current iteration count is greater than the maximum cycle value; if it is, the procedure stops and the bacterial community is output; if it is not, the method returns to step S2 and repeats steps S2 to S6.
In this embodiment, after the procedure stops, a set of solutions, i.e. a set of weight vectors, is obtained, and one solution is randomly selected from the set; the selected solution represents the weights of the feature vector. For example, if the final solution is (0.2, 0.3, 0.8, 0.96, 0.2, 0.1, ...), a threshold is set, the columns whose weights are smaller than the threshold are deleted, and the remaining columns correspond to the finally selected features.
In the iterative process of this embodiment, the optimal solution output by the bacterial community is used as the input of logistic regression (LR) to improve the classification accuracy.
The flora foraging algorithm of this embodiment is always updated around a set of solutions: before it is replaced, the intermediate solution set is always a bacterial community, and one bacterial community comprises multiple bacteria.
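For illustration, the overall iterative structure of steps S1 to S7 can be sketched as follows in Python. The helper apply_behavior (standing for the four movement behaviors described later), the use of the previously executed behavior as the RL state, the linearly decaying learning rate, and γ = 0.9 are illustrative assumptions rather than the exact formulas of the embodiment.

import numpy as np

def rbcfo(fitness_fn, apply_behavior, n_bacteria=50, dim=100, max_cycle=200, n_actions=4, gamma=0.9):
    pop = np.random.rand(n_bacteria, dim)                 # S1: one weight vector per bacterium
    q = np.zeros((n_bacteria, n_actions, n_actions))      # one Q-matrix per bacterium, initialized to 0
    state = np.zeros(n_bacteria, dtype=int)               # initial RL state of every bacterium
    fit = np.array([fitness_fn(w) for w in pop])
    for it in range(max_cycle):                           # S7: stop after the maximum cycle value
        alpha = 0.9 - 0.8 * it / max_cycle                # assumed decaying learning rate
        for i in range(n_bacteria):
            a = int(np.argmax(q[i, state[i]]))            # S2: action with the largest historical experience
            pop[i] = apply_behavior(pop[i], a, pop, it, max_cycle)  # S3: execute the chosen behavior
            new_fit = fitness_fn(pop[i])                  # S4: fitness at the updated position
            r = 1 if new_fit > fit[i] else -1             # S5: RL feedback (larger fitness assumed better)
            q[i, state[i], a] += alpha * (r + gamma * q[i, a].max() - q[i, state[i], a])  # S6
            state[i], fit[i] = a, new_fit
    return pop                                            # final community: a set of weight vectors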
The method of this embodiment provides a very effective approach for improving classification problems and can also be used in other image processing applications.
Further, for example, the aforementioned step S2 may include the following sub-steps:
substep S21: setting an initial state and an initial action for each learner agent (i.e. each learner corresponds to one bacterium); a Q-matrix is set for each learner to store its accumulated historical experience and is initialized to 0.
In this embodiment, one reinforcement learning agent corresponds to each bacterium of the flora foraging algorithm. In addition, the actions and states of the reinforcement learning correspond to the adaptive chemotactic, replication, enhanced dispersal, and cross behaviors of the flora foraging algorithm.
Step S22: for the current state s_t of a learner, the optimal action a_t is selected based on the Q-matrix content (i.e. the data in the Q-matrix, the previously described accumulated historical experience):
a_t = Max[Q(state, actions)]    (1)
wherein the optimal action a_t is selected according to the maximum value of the current state in the Q-matrix.
Step S23: the agent performs the action a_t and is given the immediate feedback r_(t+1).
The "immediate feedback" in this embodiment is the feedback value and may be obtained, for example, as follows:
calculating the fitness value of the individual's new position and comparing the fitness values before and after the position update;
r = 1, if the fitness value of the individual improves; r = -1, otherwise
wherein, when the fitness value of an individual is improved (e.g. the fitness value increases), the immediate feedback r = 1 is obtained; in the opposite case, the immediate feedback r = -1.
Step S24: updating the data item (s_t, a_t) of the Q-matrix, and updating the state to s_(t+1).
The Q-matrix calculation process comprises the following steps:

Q(s_t, a_t) = Q(s_t, a_t) + α(t) · [ r_(t+1) + γ · max_a Q(s_(t+1), a) - Q(s_t, a_t) ]    (8)

wherein γ represents a discount factor belonging to [0, 1]; r_(t+1) represents the immediate feedback obtained after the agent in the current state s_t executes the action a_t; α denotes the learning rate, given by formula (9) as a function of iter and MaxCycle, which balances the search process and the exploitation process; iter and MaxCycle denote the current iteration number and the total iteration number, respectively.
In the above formula (9), α(t) is not a constant value: when the iteration number iter is small, α should be large, and this stage focuses on searching; when iter becomes larger, α becomes smaller, and more attention is paid to the existing experience (the Q values).
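For illustration, the greedy action selection of formula (1) and the Q-matrix update of formula (8) can be sketched as follows in Python. Because the exact expression (9) for α(t) is not reproduced above, a simple linear decay between assumed bounds is used in its place.

import numpy as np

def choose_action(q_matrix, state):
    # Formula (1): the action with the largest Q value for the current state.
    return int(np.argmax(q_matrix[state]))

def learning_rate(it, max_cycle, a_max=0.9, a_min=0.1):
    # Assumed stand-in for formula (9): large early (favor searching), small late (trust experience).
    return a_max - (a_max - a_min) * it / max_cycle

def q_update(q_matrix, s, a, reward, s_next, it, max_cycle, gamma=0.9):
    # Formula (8): standard Q-learning update with discount factor gamma and immediate feedback r_(t+1).
    alpha = learning_rate(it, max_cycle)
    td_target = reward + gamma * q_matrix[s_next].max()
    q_matrix[s, a] += alpha * (td_target - q_matrix[s, a])
    return q_matrix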
The behaviors mentioned in step S2 are exemplified as follows:
1) adaptive chemotactic behavior:
E. coli is simulated moving, by rotation of its flagella, toward regions where it is better able to survive:

P_i(t) = P_i(t-1) + C_i(t-1) · φ(t-1)    (2)

wherein φ(t) = Δ / sqrt(Δ^T · Δ) denotes the random tumble (flip) direction, with Δ a random vector whose elements are drawn from [-1, 1]; C_i is the step size of the i-th bacterium; and P_i(t) is the position of the i-th bacterium at time t. In the feature selection method, the position corresponds to the weights of each dimension of the feature vector, and an optimal set of weight combinations (a weight matrix) is found by the optimization algorithm.
In general, an important task in designing a swarm intelligence algorithm is how to adaptively balance the two major processes of exploration and exploitation during the search. A population in the exploration state moves toward unvisited areas in search of a potential globally optimal solution, while a population in the exploitation state searches near promising areas. A dynamic step size can balance the two processes: the step size C_i(t) is computed from a constant coefficient a, the current iteration number iter, and the total iteration number MaxCycle, so that it shrinks as the search proceeds.
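For illustration, the adaptive chemotaxis step can be sketched as follows in Python. The random unit tumble direction follows the standard BFO construction assumed above, and the linearly decaying step size stands in for the dynamic-step formula, whose exact expression is not reproduced here.

import numpy as np

def chemotaxis_step(position, it, max_cycle, a=0.1):
    # Formula (2): move along a random unit direction with a step that shrinks over the iterations.
    delta = np.random.uniform(-1.0, 1.0, size=position.shape)
    phi = delta / np.linalg.norm(delta)          # random tumble direction (unit vector)
    step = a * (1.0 - it / max_cycle)            # assumed decaying step size C_i(t)
    return position + step * phi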
2) Replication behavior:
Step 1: sorting the whole population in descending order of fitness value;
Step 2: according to the sorting result of step 1, removing the N/2 lower-ranked individuals and duplicating the N/2 higher-ranked individuals (N represents the size of the population).
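For illustration, the replication behavior can be sketched as follows in Python; an even population size N and "larger fitness is better" are assumed.

import numpy as np

def reproduce(pop, fitness):
    # Drop the worse half of the community and duplicate the better half (population size unchanged).
    order = np.argsort(fitness)[::-1]            # indices sorted by descending fitness
    best_half = pop[order[: len(pop) // 2]]
    return np.vstack([best_half, best_half.copy()])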
3) Enhanced dispersal behavior:
The bacterial population is likely to be dispersed into new areas because of nutrient depletion or other unknown causes. The position of individual i is updated with reference to the optimal position:

P_id = rand1 · (X_max - X_min) + rand2 · (gbest_d - P_id)    (4)

wherein P_id represents the position of individual i in dimension d; X_max and X_min represent the upper and lower boundaries of the search space, respectively; rand1 and rand2 each obey a normal distribution with mean 0 and standard deviation 1; gbest_d represents the optimal position of the community in dimension d.
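For illustration, the enhanced dispersal behavior of formula (4) can be sketched as follows in Python; the [0, 1] default search bounds and the final clipping to those bounds are added assumptions.

import numpy as np

def disperse(position, gbest, x_min=0.0, x_max=1.0):
    # Formula (4): relocate an individual using the search-space width and its distance to the global best.
    rand1 = np.random.normal(0.0, 1.0, size=position.shape)
    rand2 = np.random.normal(0.0, 1.0, size=position.shape)
    new_pos = rand1 * (x_max - x_min) + rand2 * (gbest - position)
    return np.clip(new_pos, x_min, x_max)        # keep the new position inside the search space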
4) Confluent cross behavior:
Exchanging information between an individual bacterium and its neighbors is expected to combine beneficial information, defined as:

P_ij = v_ij, if rand <= L_CR or j = k; P_ij otherwise    (5)
v_ij = gbest_j + beta · (P_aj - P_bj)    (6)

wherein P_ij represents the position of individual i in dimension j; L_CR represents a crossover probability within [0, 1]; k is a randomly chosen dimension of the individual; P_aj and P_bj represent the positions of individual a and individual b in dimension j, respectively; gbest_j represents the optimal position of the whole population in dimension j; and beta represents a scale factor within [0, 1].
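For illustration, the cross behavior of formulas (5) and (6) can be sketched as follows in Python. The binomial-crossover form of formula (5) is reconstructed from the description of L_CR and the random dimension k, and the default values of l_cr and beta are illustrative, since the text only bounds them to [0, 1].

import numpy as np

def crossover(position, pop, gbest, l_cr=0.5, beta=0.5):
    dim = position.size
    a, b = np.random.choice(len(pop), size=2, replace=False)   # two random neighbors
    v = gbest + beta * (pop[a] - pop[b])                       # formula (6): trial vector
    k = np.random.randint(dim)                                 # random dimension that is always crossed
    mask = np.random.rand(dim) <= l_cr
    mask[k] = True
    return np.where(mask, v, position)                         # formula (5): binomial crossover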
The application scenario of the method of the present embodiment is illustrated as follows:
application scenario 1:
Classification of pulmonary bronchitis: the following features are extracted from lung CT images:
7 texture features, namely entropy, mean, variance, gray level co-occurrence matrix GLCM, Local Binary Pattern (LBP), Haralick texture feature, Local Phase Quantization (LPQ);
5 geometric characteristics, namely area, perimeter, circumscribed circularity, rectangularity and elongation;
in total, 315-D feature vectors are extracted.
By adopting the feature selection method of the embodiment, a feature vector subset is selected from 315-D feature vectors and used as the input of a Support Vector Machine (SVM) classifier, so as to improve the classification accuracy.
Application scenario 2:
brain tumor classification (5 classes: background, edema, necrosis, enhanced tumor and non-enhanced tumor):
data: multi-modality brain MRI images;
For each pixel of each slice, a 25 × 25 neighborhood is selected, and neighborhood Gabor features and the gray-level mean are extracted for that pixel.
That is, each neighborhood yields a 164-D feature vector (the image has 4 modalities; for each modality the gray-level mean is extracted, together with Gabor features at 5 scales and 8 orientations).
Note: 4 * (5*8 + 1) = 164.
By adopting the feature selection method of the embodiment, a feature vector subset is selected from 164-D feature vectors and used as the input of a Support Vector Machine (SVM) classifier, so that the classification accuracy is improved.
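For illustration, feeding the selected feature subset to an SVM classifier can be sketched as follows in Python using scikit-learn. The RBF kernel, the train/test split, and the threshold value are illustrative assumptions; the embodiment only specifies that the selected subset is used as the SVM input.

import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

def classify_with_selected_features(X, y, weights, threshold=0.5):
    # X: (n_samples, D) extracted features (e.g. the 315-D or 164-D vectors)
    # weights: weight vector returned by the foraging algorithm
    mask = np.asarray(weights) >= threshold          # keep features with sufficiently large weight
    X_sel = X[:, mask]
    X_tr, X_te, y_tr, y_te = train_test_split(X_sel, y, test_size=0.3, random_state=0)
    clf = SVC(kernel='rbf').fit(X_tr, y_tr)
    return clf.score(X_te, y_te)                     # classification accuracy on held-out data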
According to another aspect of embodiments of the present invention, there is also provided an electronic device, which includes a memory, a processor, a bus, and a computer program stored on the memory and executable on the processor, and when the processor executes the program, the method steps of any of the embodiments described above are implemented. The electronic device of the embodiment may be a mobile terminal, a fixed terminal, or the like.
Further, the present embodiment also provides a computer storage medium having stored thereon a computer program which, when being executed by a processor, carries out the method steps of any of the embodiments as described above.
Finally, it should be noted that: the above-mentioned embodiments are only used for illustrating the technical solution of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (4)

1. A feature selection method based on a reinforced learning type flora foraging algorithm is applied to global search and local search, and the feature selection method is used for feature selection of lung CT images or multi-modal brain MRI images so as to improve classification accuracy, and the method comprises the following steps:
step S1, initializing the position of the bacterial community, and setting the maximum cycle value and the initial value of the iteration times; each bacterial thallus in the bacterial community represents a weight vector of a feature vector to be selected;
step S2, selecting a motion behavior for each bacterial thallus in the bacterial community according to a maximum historical experience value strategy in the reinforced learning RL;
step 21, setting an initial state and an initial action for each bacterial thallus; setting a Q-matrix for each bacterial thallus to save accumulated historical experience, and initializing the Q-matrix to be 0;
step 22, for the current state s_t of each bacterial cell, selecting the optimal action a_t according to the content of the Q-matrix, as in the following formula (1);
a_t = Max[Q(state, actions)]    (1)
wherein the optimal action a_t is selected according to the maximum value of the current state in the Q-matrix;
step 23, each bacterial cell executing its respective action a_t;
Step S3, after each bacterial thallus executes the movement behavior, the updated update position of each bacterial thallus is obtained;
step S31, updating the data item (s_t, a_t) of the Q-matrix, and updating the state to s_(t+1);
Step S4, acquiring the fitness value of each bacterial thallus after the position is updated;
step S5, based on the RL rule, obtaining a feedback value according to the change of the fitness value of each bacterial thallus between the positions before and after the update; specifically, obtaining the immediate feedback of each bacterial thallus, i.e. the feedback value r_(t+1);
Step S51: comparing the fitness value of each bacterial thallus before and after the updated position, and acquiring a feedback value r according to the following formula (2);
Figure FDA0002678757310000021
when the fitness value of the bacterial thallus is optimized, obtaining a feedback value r which is 1; when the conditions are opposite, the feedback value r is equal to-1;
step S6, updating the historical experience value accumulated by each thallus according to the feedback value, and outputting a bacterial community;
step S7, increasing the iteration number by 1, and repeating the steps S2 to S6 until the iteration number is larger than or equal to the maximum cycle value, and outputting a bacterial community;
wherein, in the feature selection aiming at the lung CT image,
7 texture features, namely entropy, mean, variance, gray level co-occurrence matrix GLCM, local binary pattern LBP, Haralick texture feature and local phase quantization LPQ;
5 geometric characteristics, namely area, perimeter, circumscribed circularity, rectangularity and elongation;
extracting 315-D characteristic vectors in total;
selecting a feature vector subset from the 315-D feature vectors by adopting the feature selection method of steps S1 to S7, wherein the feature vector subset is used as the input of a support vector machine (SVM) classifier;
in the feature selection for multi-modality brain MRI images,
brain tumors are classified into 5 types: background, edema, necrosis, enhanced and non-enhanced tumors:
data: multi-modality brain MRI images;
selecting a 25 × 25 neighborhood for each pixel of each slice, and extracting neighborhood Gabor features and the gray-level mean to form a feature vector; each neighborhood yields a 164-D feature vector: the multi-modal brain MRI image has 4 modalities, and for each modality the gray-level mean is extracted together with Gabor features at 5 scales and 8 orientations;
and selecting a feature vector subset from the 164-D feature vectors by adopting the feature selection method of steps S1 to S7, wherein the feature vector subset is used as the input of a support vector machine (SVM) classifier to improve the classification accuracy.
2. The method of claim 1, wherein the movement behavior includes one or more of:
adaptive chemotactic behavior;
replication behavior;
enhanced dispersal behavior;
confluent cross behavior.
3. The method of claim 2,
the Q-matrix in step S31 is calculated as follows:
Q(s_t, a_t) = Q(s_t, a_t) + α · [ r_(t+1) + γ · max_a Q(s_(t+1), a) - Q(s_t, a_t) ]
wherein γ represents a discount factor belonging to [0, 1]; r_(t+1) represents the immediate feedback obtained after the agent in the current state s_t executes the action a_t; α denotes a learning rate, computed from the current iteration number iter and the total iteration number MaxCycle, which balances the search process and the exploitation process.
4. An electronic device, comprising a memory, a processor, a bus, and a computer program stored on the memory and executable on the processor, the processor implementing the steps of any of claims 1-3 when executing the program.
CN201810508479.8A 2018-05-24 2018-05-24 Feature selection method based on reinforced learning type flora foraging algorithm Active CN108830370B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810508479.8A CN108830370B (en) 2018-05-24 2018-05-24 Feature selection method based on reinforced learning type flora foraging algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810508479.8A CN108830370B (en) 2018-05-24 2018-05-24 Feature selection method based on reinforced learning type flora foraging algorithm

Publications (2)

Publication Number Publication Date
CN108830370A CN108830370A (en) 2018-11-16
CN108830370B true CN108830370B (en) 2020-11-10

Family

ID=64148590

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810508479.8A Active CN108830370B (en) 2018-05-24 2018-05-24 Feature selection method based on reinforced learning type flora foraging algorithm

Country Status (1)

Country Link
CN (1) CN108830370B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110232971B (en) * 2019-05-24 2022-04-12 深圳市翩翩科技有限公司 Doctor recommendation method and device


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101136080A (en) * 2007-09-13 2008-03-05 北京航空航天大学 Intelligent unmanned operational aircraft self-adapting fairway planning method based on ant colony satisfactory decision-making
US20170278018A1 (en) * 2013-10-08 2017-09-28 Google Inc. Methods and apparatus for reinforcement learning
US20180129648A1 (en) * 2016-09-12 2018-05-10 Sriram Chakravarthy Methods and systems of automated assistant implementation and management
CN107480702A (en) * 2017-07-20 2017-12-15 东北大学 Towards the feature selecting and Feature fusion of the identification of HCC pathological images
CN108038538A (en) * 2017-12-06 2018-05-15 西安电子科技大学 Multi-objective Evolutionary Algorithm based on intensified learning

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
D.H. Kim et al. A hybrid genetic algorithm and bacterial foraging approach for global optimization. Information Sciences, 2007, vol. 177, no. 18. *
Mechanical design optimization based on a multi-population bacterial foraging algorithm; Yan Xiaohui et al.; Modern Manufacturing Engineering; 2017-12-31 (no. 9); pp. 144-148 *
Liang Xiaodan. Research and application of intelligent optimization algorithms based on foraging behavior. China Doctoral Dissertations Full-text Database, Information Science and Technology, 2018. *

Also Published As

Publication number Publication date
CN108830370A (en) 2018-11-16

Similar Documents

Publication Publication Date Title
Zhang et al. A return-cost-based binary firefly algorithm for feature selection
Junior et al. Particle swarm optimization of deep neural networks architectures for image classification
Fannjiang et al. Autofocused oracles for model-based design
Xing An improved emperor penguin optimization based multilevel thresholding for color image segmentation
Ma et al. An improved whale optimization algorithm based on multilevel threshold image segmentation using the Otsu method
CN110363282B (en) Network node label active learning method and system based on graph convolution network
Hammouche et al. A comparative study of various meta-heuristic techniques applied to the multilevel thresholding problem
Termritthikun et al. EEEA-Net: An early exit evolutionary neural architecture search
CN110969086B (en) Handwritten image recognition method based on multi-scale CNN (CNN) features and quantum flora optimization KELM
Ji et al. A robust modified Gaussian mixture model with rough set for image segmentation
Cai et al. An optimal construction and training of second order RBF network for approximation and illumination invariant image segmentation
Huang et al. Deep and wide multiscale recursive networks for robust image labeling
Wang et al. Energy based competitive learning
Zhao et al. Optimizing widths with PSO for center selection of Gaussian radial basis function networks
Zhai et al. Rectified meta-learning from noisy labels for robust image-based plant disease classification
CN108830370B (en) Feature selection method based on reinforced learning type flora foraging algorithm
CN110533151A (en) A kind of firefly optimization algorithm based on the law of universal gravitation
CN112509017B (en) Remote sensing image change detection method based on learnable differential algorithm
Termritthikun et al. Evolutionary neural architecture search based on efficient CNN models population for image classification
Bai et al. Learning high-level image representation for image retrieval via multi-task dnn using clickthrough data
Zhu et al. Kapur’s entropy underwater image segmentation based on multi-strategy Manta ray foraging optimization
Miguez et al. G-PNN: A genetically engineered probabilistic neural network
CN110717402B (en) Pedestrian re-identification method based on hierarchical optimization metric learning
Zhou et al. Effective vision transformer training: A data-centric perspective
Zheng et al. Functional gradient ascent for Probit regression

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant