CN110782950A

CN110782950A - Tumor key gene identification method based on preference grid and Levy flight multi-target particle swarm algorithm

Info

Publication number: CN110782950A
Application number: CN201910903327.2A
Authority: CN
Inventors: 韩飞; 管天华; 孙郁闻天; 方升
Original assignee: Jiangsu University
Current assignee: Jiangsu University
Priority date: 2019-09-23
Filing date: 2019-09-23
Publication date: 2020-02-11
Anticipated expiration: 2039-09-23
Also published as: CN110782950B

Abstract

The invention discloses a tumor key gene identification method based on a preference grid and Levy flight multi-target particle swarm algorithm. The method comprises: filtering an original gene expression profile data set by using a classification information index to obtain an initial gene pool; calculating the gene-to-class sensitivity (GCS) value of each gene in the initial gene pool, and encoding the particles according to the GCS values; constructing a multi-objective optimization model whose objectives are the classification accuracy of a gene subset on an extreme learning machine (ELM) and the size of the gene subset; and searching for the final gene subset through the established multi-objective model, thereby identifying the key genes of the tumor. Through this multi-objective model, the method can quickly and efficiently identify key gene subsets in the initial gene pool that are small in size and have good classification performance.

Description

Tumor key gene identification method based on preference grid and Levy flight multi-target particle swarm algorithm

Technical Field

The invention belongs to the field of application of computer analysis technology of tumor gene expression profile data, and particularly relates to a multi-target particle swarm optimization tumor key gene identification method based on preference grids and Levy flight.

Background

Microarray technology has been widely used for disease diagnosis since the 1980s. It helps medical and research staff assess the expression levels of thousands of genes simultaneously, producing microarray data. By using gene expression profiles to classify and predict the diagnostic classes of samples, these data have been successfully applied to cancer classification. However, developing effective classifiers on complex gene expression profile data still faces several challenges. First, the dimensionality of gene expression profile data is high, and the relationships among the dimensions (genes) are complex and unknown. Second, gene expression profile data sets contain a large number of irrelevant samples. Third, the sample size of gene expression profile data sets is small, which leads to higher computational complexity and larger prediction error.

Key gene identification, i.e. gene selection (also known as feature selection), is an effective way to improve the predictive performance of a model. It is a key preprocessing step in data mining that identifies the optimal gene subset of an expression data set by removing redundant, irrelevant or noisy genes. Depending on how the correlation of each gene with the target class is evaluated, gene selection methods can be broadly divided into filter methods, wrapper methods and hybrid methods. Filter methods do not use a classifier to evaluate gene subsets, and most of them do not consider correlations between genes. Wrapper methods couple a predetermined learning algorithm with a classifier and choose the optimal gene subset according to prediction accuracy. Although filter methods are more efficient than wrapper methods, the classification performance of the latter is much better. Hybrid methods combine filter and wrapper methods so that their advantages complement each other. However, these methods generally treat gene selection as a single-objective problem. Their main drawback is that it is difficult to explore the different potential trade-offs between classification accuracy and the size of the selected gene subset.

Particle swarm optimization (PSO) has strong global search capability and a high convergence rate. Compared with a genetic algorithm, PSO needs no complex genetic operators, has fewer adjustable parameters and is easy to implement, so in recent years it has been widely applied to key gene identification on tumor expression profile data. In general, tumor key gene identification is a multi-objective problem that involves minimizing the size of the gene subset and maximizing predictive performance. Speed-constrained multi-objective particle swarm optimization (SMPSO) adds a velocity constraint mechanism: when the velocity of a particle becomes too high, the mechanism limits it, preventing the swarm explosion phenomenon that excessive velocities would cause. A multi-objective particle swarm algorithm based on a competition mechanism (CMOPSO) updates particles through pairwise competition rather than through the conventional individual-best and global-best update. These methods improve the convergence and diversity of the algorithm to a certain extent, but their performance often degrades on complex multi-objective problems, such as non-convex or multi-modal problems. Furthermore, these multi-objective optimization algorithms aim to find all Pareto-optimal solutions, on the assumption that every non-dominated solution is desirable. In practice, the main purpose of key gene identification is to enhance the classification performance of the classifier. Key gene identification may therefore prefer to search the regions of the Pareto front where solutions show better predictive performance, rather than the regions with fewer genes. From this perspective, these methods waste computation on searching for solutions that are not needed.

Disclosure of Invention

The purpose of the invention is as follows: the method can identify gene subsets highly related to the tumor types, selects fewer genes, and has stronger interpretability than traditional methods.

The technical scheme is as follows: a tumor key gene identification method based on a multi-target particle swarm algorithm of preference grids and Levy flight comprises the steps of carrying out primary selection on original genes by utilizing classification information indexes, then utilizing GCS information to encode particles, and utilizing the multi-target particle swarm algorithm based on the preference grids and the Levy flight to search key tumor genes, and comprises the following steps:

step 1, preprocessing gene expression profile data, including dividing an original data set into a training set and a testing set, and filtering the original gene expression profile data set by using a classification information index to obtain an initial gene pool;

step 2, calculating the gene category sensitivity information GCS value of each gene in the initial gene pool, and then coding the particles through the GCS value;

step 3, establishing a multi-objective optimization model by taking the classification accuracy of the gene subsets on the extreme learning machine ELM and the scale of the gene subsets as targets;

step 4, providing a multi-target particle swarm algorithm (MOPSO-PAG-LF) based on a preference grid and Levy flight, and continuously searching, evaluating and updating particles and maintaining an external archive by using the MOPSO-PAG-LF so as to obtain a gene subset with higher classification accuracy and smaller scale;

step 5, outputting the finally identified tumor key genes if the termination condition is met, otherwise, turning to step 4;

further, the step 1 comprises the following steps:

step 1.1, loading the original gene data set and dividing it into a training set and a test set in a 2:1 ratio;

step 1.2 according to the formula (1), calculating the classification information index of each gene, arranging the classification information index in a descending order, and selecting the first 400 genes to add into an initial gene pool.

wherein the mean terms in formula (1) are the means of the expression levels of gene g in the positive (+) and negative (-) classes, and the standard deviation terms are the standard deviations of the expression levels of gene g in the positive (+) and negative (-) classes, respectively.
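The filtering in step 1.2 can be sketched as follows. Formula (1) is not legible in this text, so the score below assumes a signal-to-noise form, |mean(+) - mean(-)| / (std(+) + std(-)), built only from the class means and standard deviations named above. The function names `class_info_index` and `initial_gene_pool`, the `eps` guard and the toy data are illustrative assumptions, not part of the patent.

```python
import numpy as np

def class_info_index(X, y, eps=1e-12):
    """Score each gene (column of X) from its class means and standard deviations.

    ASSUMPTION: signal-to-noise form |mu+ - mu-| / (sigma+ + sigma-);
    the patent's formula (1) may differ.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    pos, neg = X[y == 1], X[y == 0]
    mu_diff = np.abs(pos.mean(axis=0) - neg.mean(axis=0))
    sigma_sum = pos.std(axis=0) + neg.std(axis=0)
    return mu_diff / (sigma_sum + eps)  # eps avoids division by zero

def initial_gene_pool(X, y, pool_size=400):
    """Return the indices of the top-scoring genes, best first."""
    scores = class_info_index(X, y)
    order = np.argsort(-scores, kind="stable")  # descending order of score
    return order[:pool_size]

# Toy data: 40 training samples, 1000 genes, gene 7 made class-dependent.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 1000))
y = np.array([1] * 20 + [0] * 20)
X[y == 1, 7] += 5.0
pool = initial_gene_pool(X, y, pool_size=400)
```

Here `pool` holds 400 gene indices ranked by the assumed score; a different formula (1) would only change `class_info_index`.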

Further, the step 2 comprises the following steps:

step 2.1, calculating the GCS value of each gene in the initial gene pool according to formula (2) and formula (3), wherein the larger the GCS value of a gene, the greater its contribution to classification, and the smaller the GCS value, the smaller its contribution;

wherein $X_{Training}$ is the training sample set; $\beta_{sq}$ is the weight between the s-th hidden layer node and the q-th output node of the ELM; $w_{js}$ is the weight between the j-th input node and the s-th hidden layer node; hid(s) is the input to the s-th hidden layer node; $N_{gnl}$ is the number of genes in the initial gene pool; and g is the activation function of the ELM, taken to be the sigmoid function.

step 2.2, encoding the particles: all genes are first sorted in descending order of GCS value; the dimensions corresponding to the first 20% of the genes are initialized to random numbers in [0, 1], and the dimensions corresponding to the remaining 80% are initialized to 0. When the position of a particle in a given dimension is greater than 0.5, the gene corresponding to that dimension is selected; when it is less than 0.5, the gene is not selected.
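A minimal sketch of the encoding in step 2.2. It assumes that a particle keeps one dimension per gene in gene-pool order and that the 20% cut-off is rounded up; `init_particle` and `decode` are illustrative names, and the GCS values are made up for the example.

```python
import numpy as np

def init_particle(gcs, rng, top_fraction=0.2):
    """Initialize one particle from the GCS values of the gene pool."""
    gcs = np.asarray(gcs, dtype=float)
    order = np.argsort(-gcs, kind="stable")         # gene indices, highest GCS first
    n_top = int(np.ceil(top_fraction * gcs.size))   # ASSUMPTION: round the 20% up
    position = np.zeros(gcs.size)                   # remaining 80% start at 0
    position[order[:n_top]] = rng.random(n_top)     # top 20% get random values
    return position

def decode(position, threshold=0.5):
    """Return the indices of the genes selected by a particle."""
    return np.flatnonzero(np.asarray(position) > threshold)

# Ten hypothetical GCS values; genes 1 and 3 have the two largest.
gcs = [0.1, 0.9, 0.3, 0.8, 0.2, 0.05, 0.4, 0.6, 0.7, 0.5]
rng = np.random.default_rng(0)
position = init_particle(gcs, rng)
selected = decode(position)
```

Only genes 1 and 3 can be selected at initialization; the other dimensions start at 0 and can only cross 0.5 through later position updates.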

Further, the step 3 comprises the following steps:

step 3.1, setting the evaluation indexes of the multi-target particle swarm algorithm, which comprise two indexes: accuracy and gene scale. $f_1$ is the accuracy acc(i), i.e. the ELM classification accuracy of the i-th particle on the validation set; $f_2$ is based on the gene scale genenum(i), i.e. the number of genes selected by particle i. To unify the two indexes into a maximization problem, genenum(i) is transformed by a formula in which D is the dimension of the sample.

Step 3.2, taking $f = (f_1, f_2)$ as the optimization objective of the multi-target particle swarm algorithm.
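The two objectives of step 3 can be sketched as below. The transformation of genenum(i) is not legible in this text, so the sketch assumes $f_2 = 1 - genenum(i)/D$, which turns "fewer genes" into a quantity to maximize. The tiny ELM (random hidden layer, least-squares output weights), the hidden layer size, the handling of an empty gene subset, and all function names are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def elm_fit_predict(X_train, y_train, X_val, n_hidden=20, seed=0):
    """Minimal single-hidden-layer ELM for binary labels in {0, 1}."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(-1.0, 1.0, size=(X_train.shape[1], n_hidden))  # random input weights
    b = rng.uniform(-1.0, 1.0, size=n_hidden)                      # random hidden biases
    H = sigmoid(X_train @ W + b)                                   # hidden layer output
    targets = np.eye(2)[np.asarray(y_train, dtype=int)]            # one-hot labels
    beta = np.linalg.pinv(H) @ targets                             # least-squares output weights
    return np.argmax(sigmoid(X_val @ W + b) @ beta, axis=1)

def objectives(position, X_train, y_train, X_val, y_val):
    """Return (f1, f2), both to be maximized."""
    position = np.asarray(position)
    selected = np.flatnonzero(position > 0.5)
    if selected.size == 0:
        return 0.0, 0.0  # ASSUMPTION: an empty gene subset gets the worst score
    pred = elm_fit_predict(X_train[:, selected], y_train, X_val[:, selected])
    f1 = float(np.mean(pred == np.asarray(y_val)))   # validation accuracy
    f2 = 1.0 - selected.size / position.size         # ASSUMED form of the size objective
    return f1, f2

# Toy problem: 10 genes, the label depends on gene 0 only.
rng = np.random.default_rng(1)
X = rng.normal(size=(60, 10))
y = (X[:, 0] > 0).astype(int)
position = np.zeros(10)
position[[0, 1]] = 0.9
f1, f2 = objectives(position, X[:40], y[:40], X[40:], y[40:])
```

With two of ten genes selected, `f2` is 0.8 under the assumed formula; `f1` depends on the random hidden layer and the data split.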

Further, the step 4 comprises the following steps:

step 4.1, randomly initializing population particles, and adding a parameter flag to each particle, wherein the parameter is used for judging how long each particle has not evolved into a better particle;

step 4.2, judging whether the parameter flag of each particle is smaller than a preset threshold T;

step 4.3, if the flag is smaller than T, evolving the particle according to formulas (4) and (5), i.e. the conventional particle swarm update formulas; if the flag is greater than T, evolving the particle according to formulas (6), (7) and (8) by applying an improved Levy flight strategy to the particle, after which the flag value of the particle is reset to 0;

Here u and v follow normal distributions. In the above formulas, $v_i^{t+1}$ is the velocity of particle i at the (t+1)-th iteration; $x_i^{t}$ is the position of particle i at the t-th iteration; $x_{pb,i}$ is the individual historical optimal position of particle i; $x_{gb,i}$ is the global optimal position of particle i; w is the inertia weight, which typically changes adaptively within [0.4, 0.9]; $c_1$, $c_2$ are acceleration constants; and $r_1$, $r_2$ are two random numbers in [0, 1]. The parameter α is usually set to 0.01 to prevent the step from being so aggressive that the particle easily jumps out of the decision boundary, and β is set to 1.5. Note that, when the step length S is updated, the invention perturbs the conventional Levy flight formula: with a certain probability, S is multiplied by the difference obtained by subtracting the position of the current particle from the global optimal particle $x_{gb,i}$. The purpose of this is that, when the position of the particle is updated with the Levy flight, the particle can move appropriately toward the global optimal particle $x_{gb,i}$, making a directed random jump rather than one that follows the Levy distribution exactly.
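The two update modes of step 4.3 can be sketched as follows. Formulas (4) to (8) are not legible in this text, so the sketch assumes the standard PSO velocity and position update for the first mode and Mantegna's algorithm for the Levy step, $S = \alpha \, u / |v|^{1/\beta}$ with normally distributed u and v. The probability `p_dir` of applying the directed perturbation, the clipping of positions to [0, 1], the treatment of `flag == T` as stagnation, and the function names are all illustrative assumptions.

```python
import math
import numpy as np

def levy_step(dim, rng, alpha=0.01, beta=1.5):
    """Draw a Levy-flight step vector with Mantegna's algorithm (assumed form)."""
    sigma_u = (
        math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
        / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))
    ) ** (1 / beta)
    u = rng.normal(0.0, sigma_u, dim)  # u ~ N(0, sigma_u^2)
    v = rng.normal(0.0, 1.0, dim)      # v ~ N(0, 1)
    return alpha * u / np.abs(v) ** (1 / beta)

def update_particle(x, v, pbest, gbest, flag, T, w, rng, c1=1.5, c2=1.5, p_dir=0.5):
    """Return (new_position, new_velocity, new_flag) for one particle."""
    if flag < T:
        # Conventional PSO update.
        r1, r2 = rng.random(), rng.random()
        v_new = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        return np.clip(x + v_new, 0.0, 1.0), v_new, flag
    # Stagnated particle: Levy flight, flag reset to 0.
    s = levy_step(x.size, rng)
    if rng.random() < p_dir:   # ASSUMPTION: perturbation applied with probability p_dir
        s = s * (gbest - x)    # bias the jump toward the global best
    return np.clip(x + s, 0.0, 1.0), v, 0

rng = np.random.default_rng(0)
x = np.array([0.2, 0.6, 0.9])
v = np.zeros(3)
gbest = np.array([1.0, 0.0, 1.0])
x_pso, v_pso, flag_pso = update_particle(x, v, x, x, flag=3, T=10, w=0.9, rng=rng)
x_levy, v_levy, flag_levy = update_particle(x, v, x, gbest, flag=10, T=10, w=0.9, rng=rng)
```

In the first call the particle already sits on both its personal and global best with zero velocity, so it does not move; in the second call the flag has reached T, so the Levy branch runs and the flag is reset.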

Step 4.4, with $f = (f_1, f_2)$ as the objective function, evaluating whether the particle has evolved into a better solution by judging the dominance relationship between the newly generated solution and the individual optimal particle: if the new particle dominates the individual optimal particle, the individual optimal information of the particle is updated and the flag of the particle is set to 0; if the new particle is dominated by the individual optimal particle, the flag of the particle is increased by 1; if the new particle and the individual optimal particle do not dominate each other, then with a certain probability (50%) the individual optimal information of the particle is updated and the flag is set to 0, and otherwise the flag of the particle is increased by 1.
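The personal-best rule of step 4.4 can be sketched as follows, with both objectives maximized. Treating two identical objective vectors as mutually non-dominated is an assumption, and `dominates` and `update_personal_best` are illustrative names.

```python
import random

def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all objectives maximized)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def update_personal_best(new_pos, new_fit, pbest_pos, pbest_fit, flag, rng, p_accept=0.5):
    """Return (pbest_pos, pbest_fit, flag) after comparing the new solution."""
    if dominates(new_fit, pbest_fit):
        return new_pos, new_fit, 0              # improved: replace and reset flag
    if dominates(pbest_fit, new_fit):
        return pbest_pos, pbest_fit, flag + 1   # worse: keep and count stagnation
    # Neither dominates the other: accept the new solution with probability 50%.
    if rng.random() < p_accept:
        return new_pos, new_fit, 0
    return pbest_pos, pbest_fit, flag + 1

rng = random.Random(0)
better = update_personal_best("new", (0.9, 0.8), "old", (0.8, 0.8), flag=4, rng=rng)
worse = update_personal_best("new", (0.7, 0.8), "old", (0.8, 0.8), flag=4, rng=rng)
tie = update_personal_best("new", (0.9, 0.5), "old", (0.8, 0.8), flag=4, rng=rng)
```

`better` resets the flag to 0, `worse` raises it to 5, and `tie` ends in one of those two states depending on the random draw.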

Step 4.5, performing dominance comparison on the particles, adding the non-dominated solutions to an external archive, and maintaining the external archive. When maintaining the external archive and selecting leader particles, the invention uses a preference grid, specifically as follows: a grid as shown in FIG. 1 is first created from the objective function values of the non-dominated solutions in the external archive, each of which is represented by a black point $Q_i$ in the grid, so that $Q = \{Q_1, Q_2, \ldots, Q_i, \ldots, Q_n\}$ denotes the set of non-dominated solutions and n is the number of non-dominated solutions. A grid cell that contains at least one particle is referred to herein as an active grid.

For each $Q_i \in Q$, the weighted fitness value $\lambda_i$ of $Q_i$ is calculated according to formula (9), where $F_1$ and $F_2$ are the fitness values of the two objectives; α is a preference weight in [0, 1] chosen by the decision maker according to the importance of $F_1$ and $F_2$ to the problem (the invention sets α = 0.7); β = 1 - α; num is the number of particles in the grid cell containing $Q_i$; and θ is the penalty coefficient, set to 0.05.

$$\lambda_i = \alpha F_1 + \beta F_2 - \theta \cdot num \tag{9}$$
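A minimal sketch of formula (9) on a small archive. The grid construction is an assumption: the objective space spanned by the archive is cut into `n_div` equal-width intervals per objective, and `num` counts every archive member in the same cell, including $Q_i$ itself. The function name `weighted_fitness` and the three archive points are illustrative.

```python
from collections import Counter
import numpy as np

def weighted_fitness(F, n_div=10, alpha=0.7, theta=0.05):
    """Return (lambda, num) for archive objective values F of shape (n, 2)."""
    F = np.asarray(F, dtype=float)
    lo, hi = F.min(axis=0), F.max(axis=0)
    width = np.where(hi > lo, (hi - lo) / n_div, 1.0)   # guard against a flat objective
    cells = np.minimum(((F - lo) / width).astype(int), n_div - 1)
    keys = [tuple(int(k) for k in c) for c in cells]    # grid cell of each solution
    counts = Counter(keys)
    num = np.array([counts[k] for k in keys])           # crowding of each cell
    beta = 1.0 - alpha
    lam = alpha * F[:, 0] + beta * F[:, 1] - theta * num
    return lam, num

# Two accurate solutions crowded in one cell, one small-subset solution alone.
F = [[0.90, 0.10], [0.91, 0.11], [0.50, 0.90]]
lam, num = weighted_fitness(F, n_div=4)
```

The two crowded solutions each pay a penalty of 2θ = 0.1, so the isolated solution ends up with a weighted fitness (0.57) equal to the better of the crowded pair despite its lower accuracy.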

When the leader particle is selected, the probability $P_i$ that $Q_i$ is selected is calculated according to formula (10); when a particle is to be deleted during maintenance of the external archive, the probability $P_i$ that $Q_i$ is selected is calculated according to formula (11), where n is the total number of non-dominated solutions. A particle is then selected as the leader, or deleted from the archive, by roulette-wheel selection. Note that each $\lambda_i$ is used as an exponent of e; the purpose is to give particles with larger $\lambda_i$ a larger probability of being selected and to further widen the gap in selection probability between particles with large $\lambda_i$ and particles with small $\lambda_i$. From the expression for $\lambda_i$ it can be seen that when the grid cell containing $Q_i$ holds many particles, the penalty term makes the resulting fitness value $\lambda_i$ smaller. Solutions selected in this way therefore not only have higher classification accuracy but are also sparsely distributed over the grid, which improves the decision efficiency of the algorithm and saves computing resources.
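The roulette-wheel step can be sketched as below. Formulas (10) and (11) are not legible in this text; the sketch assumes $P_i = e^{\lambda_i} / \sum_j e^{\lambda_j}$ for leader selection and, as a further assumption, the same form with $-\lambda_i$ for deletion so that low-fitness, crowded solutions are removed first. `selection_probs` and `roulette` are illustrative names.

```python
import numpy as np

def selection_probs(lam, for_deletion=False):
    """Exponentially weighted selection probabilities (assumed form)."""
    lam = np.asarray(lam, dtype=float)
    z = -lam if for_deletion else lam
    weights = np.exp(z - z.max())   # subtracting the max leaves the ratios unchanged
    return weights / weights.sum()

def roulette(probs, rng):
    """Draw one index with the given probabilities."""
    return int(rng.choice(len(probs), p=probs))

lam = [0.56, 0.57, 0.30]
p_leader = selection_probs(lam)
p_delete = selection_probs(lam, for_deletion=True)
rng = np.random.default_rng(0)
leader_idx = roulette(p_leader, rng)
delete_idx = roulette(p_delete, rng)
```

The solution with the largest $\lambda_i$ gets the largest leader probability and the smallest deletion probability, which matches the stated intent of preferring accurate, uncrowded solutions.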

Step 4.6, judging whether the multi-target particle swarm algorithm meets the termination condition; if so, outputting the result; if not, returning to step 4.2.

Further, the step 5 comprises the following steps:

step 5.1, repeating the above operations until the fitness function reaches a certain threshold or reaches a preset maximum iteration number, otherwise, returning to the step 4;

step 5.2 the non-dominant particles in the archive at this point may each represent the final selected subset of critical genes identifying the tumor.

Beneficial effects: tumor gene expression profile data, which are high-dimensional with few samples, contain variation and noise, and a large amount of useful information is hidden in them. The PSO algorithm used in traditional methods easily falls into local minima, so that the selected gene subset is not optimal. The invention constructs, by a weighting method, a grid that describes decision preference and uses it to maintain the archive and select leader particles, thereby improving the decision efficiency of the algorithm and saving computing resources; meanwhile, an improved Levy flight strategy is combined with the multi-target particle swarm algorithm, which improves the convergence performance of the algorithm on complex multi-objective optimization problems.

A multi-target particle swarm algorithm based on a preference grid and Levy flight (MOPSO-PAG-LF) is provided. The algorithm continuously searches, evaluates and updates particles and maintains an external archive, so that gene subsets with high classification accuracy and small size can be obtained. Compared with traditional tumor key gene identification methods, the method can identify, through the improved multi-objective model, the key genes in the initial gene pool that distinguish two specific tumor subtypes.

Drawings

FIG. 1 is a schematic diagram of a preference grid of the present invention;

FIG. 2 is a block diagram of the architecture of the present invention;

Detailed Description

Existing methods use a fitness function with only a single-objective optimization scheme, lack good interpretability, and select genes that do not identify tumors accurately enough. To address these problems, the invention combines Levy flight with a preference-grid multi-target particle swarm algorithm to identify tumor key gene subsets, so as to obtain more effective tumor key gene subsets and thereby improve the accuracy of tumor identification.

The implementation of the invention is briefly described below, taking tumor gene expression profile data as an example. This example uses the Brain cancer tumor expression profile data set, which contains 60 samples in two subtypes: 46 classic brain cancer samples (Patients with classic brain cancer) and 14 desmoplastic brain cancer samples (Patients with proliferative brain cancer). Each sample contains 7219 genes, and the data set was obtained from http://linus.nci.nih.gov/~brb/DataArchive_New.html. Although the brain cancer tumor expression profile data set has only two classes, the expression levels of all genes in the data set are relatively close, so the key genes for identifying the tumor are difficult to obtain; as a result, classifiers trained on gene subsets selected by traditional gene identification methods do not predict the samples with high accuracy. On this data set, the specific implementation steps of the invention are as follows:

As shown in FIG. 2, a tumor key gene identification method based on a multi-target particle swarm algorithm with a preference grid and Levy flight comprises preliminarily selecting the original genes by using a classification information index, then encoding particles by using GCS information, and searching for key tumor genes by using the multi-target particle swarm algorithm based on the preference grid and Levy flight, and comprises the following steps:

(1) The raw data are loaded and the data set is divided into a training set and a test set in a 2:1 ratio, giving 40 training samples and 20 test samples. On the training set, 400 genes are preliminarily screened out by the improved classification information index method (Han F, Sun W, Ling Q-H (2014) A Novel Strategy for Gene Selection of Microarray Data Based on Gene-to-Class Sensitivity Information. PLoS ONE 9(5): e97530. doi:10.1371/journal.pone.0097530) to form the initial candidate gene pool.

(2) The GCS value of each gene in the initial gene pool is calculated (Han F, Sun W, Ling Q-H (2014) A Novel Strategy for Gene Selection of Microarray Data Based on Gene-to-Class Sensitivity Information. PLoS ONE 9(5): e97530. doi:10.1371/journal.pone.0097530) and the genes are sorted in descending order of GCS value. The dimensions corresponding to the first 20% of the genes are initialized to random numbers in [0, 1], and those corresponding to the remaining 80% are initialized to 0. A particle position greater than 0.5 in a given dimension indicates that the gene corresponding to that dimension is selected; a value less than 0.5 indicates that the gene is not selected.

(3) The evaluation indexes of the multi-target particle swarm algorithm are set, comprising two indexes: accuracy and gene scale. $f_1$ is the accuracy acc(i), i.e. the ELM classification accuracy of the i-th particle on the validation set; $f_2$ is based on the gene scale genenum(i), i.e. the number of genes selected by particle i. To unify the two indexes into a maximization problem, genenum(i) is transformed by a formula in which D is the dimension of the sample.

(4) Selecting a key tumor gene from an initial gene pool by using a multi-target particle swarm algorithm based on preference grids and Levy flight, and specifically comprising the following steps of:

① The population is initialized according to step 2, and the parameter flag of each particle is set to 0. The threshold T is set to 10, the population size to 50, the maximum number of iterations to 50, and the external archive size to 50, consistent with the population size. The preference weight α is set to 0.7, the inertia weight w decreases linearly from 0.9 to 0.4, and the acceleration constants $c_1$ and $c_2$ are both 1.5.

② If the parameter flag of a particle is less than T, the particle is evolved according to formulas (4) and (5); if it is greater than T, the particle is evolved according to formulas (6), (7) and (8) with the Levy flight strategy.

③ The fitness values of each particle are calculated according to the evaluation objectives of step 3, and the historical optimal position, the global optimal position and the parameter flag of each particle are updated.

④ Dominance comparison is performed on the particles, the non-dominated solutions are added to the external archive, and the external archive is maintained with the preference grid strategy according to step 4.5.

⑤ If the preset maximum number of iterations (50 in this example) has not been reached, the process returns to step ②; otherwise the result is output, and all non-dominated particles in the archive represent the finally identified key gene sets of the brain cancer tumor.
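The settings listed in step ① can be collected in one place, as in the sketch below. The `Settings` class and the `inertia_weight` function are illustrative names; the exact interpolation endpoints (w = 0.9 at the first iteration, w = 0.4 at the last) are an assumption, since the text only states that w decreases linearly from 0.9 to 0.4.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Settings:
    pop_size: int = 50         # population size
    max_iter: int = 50         # maximum number of iterations
    archive_size: int = 50     # external archive size, equal to pop_size
    flag_threshold: int = 10   # stagnation threshold T
    alpha: float = 0.7         # preference weight in formula (9)
    theta: float = 0.05        # penalty coefficient in formula (9)
    w_start: float = 0.9       # inertia weight at the first iteration
    w_end: float = 0.4         # inertia weight at the last iteration
    c1: float = 1.5            # acceleration constant
    c2: float = 1.5            # acceleration constant

def inertia_weight(t, settings):
    """Linearly interpolate w for iteration t = 0 .. max_iter - 1."""
    frac = t / (settings.max_iter - 1)
    return settings.w_start + (settings.w_end - settings.w_start) * frac

settings = Settings()
schedule = [inertia_weight(t, settings) for t in range(settings.max_iter)]
```

`schedule` starts at 0.9 and falls by the same amount each iteration until it reaches 0.4 at iteration 49.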

Table 1 shows the classification accuracy of the ELM on the gene sets identified in this embodiment. With 3 key genes, the 5-fold cross-validation accuracy and the test accuracy of the ELM reach 86.97% and 81.22%, respectively. By comparison, on the optimal 6-gene subset selected by the Kmeans-GCSI-MBPSO-ELM method (Han F, Sun W, Ling Q-H (2014) A Novel Strategy for Gene Selection of Microarray Data Based on Gene-to-Class Sensitivity Information. PLoS ONE 9(5): e97530. doi:10.1371/journal.pone.0097530), the 5-fold cross-validation accuracy and the test accuracy of the ELM are 88.63% and 80.40%, respectively. This further illustrates that the invention can identify key genes associated with the tumor, and can find a smaller set of key genes that is more helpful to classification performance.

TABLE 1 Classification accuracy of ELMs in different subsets of genes selected on brain cancer data sets according to the invention

Table 2 lists the 10 key genes for identifying brain cancer that were selected most frequently when the method of the invention was run 1000 times on the brain cancer tumor expression profile data. From Tables 1 and 2 it can be seen that, on the brain cancer data set (Brain), the gene sets selected by the method are small, and the genes numbered 5931, 4413 and 18 not only have high selection frequencies but also appear repeatedly in the selected key gene subsets.

TABLE 2 identification of the 30 genes with the highest frequency on the brain cancer tumor expression profile dataset according to the invention

In terms of the multi-objective optimization model, the invention constructs, by a weighting method, a grid that describes decision preference and uses it to maintain the archive and select leader particles, which improves the decision efficiency of the algorithm and saves computing resources; meanwhile, an improved Levy flight strategy is combined with the multi-target particle swarm algorithm, which improves the convergence performance of the algorithm on complex multi-objective optimization problems. Compared with traditional tumor key gene identification methods, the method can quickly and efficiently identify, through the multi-objective model, key gene subsets in the initial gene pool that are smaller and have better classification performance.

In the description herein, references to the description of the term "one embodiment," "some embodiments," "an illustrative embodiment," "an example," "a specific example," or "some examples" or the like mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.

While embodiments of the invention have been shown and described, it will be understood by those of ordinary skill in the art that: various changes, modifications, substitutions and alterations can be made to the embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the claims and their equivalents.

Claims

1. A tumor key gene identification method based on a multi-target particle swarm algorithm of preference grids and Levy flight is characterized by comprising the following steps:

step 4, providing a multi-target particle swarm algorithm (MOPSO-PAG-LF) based on a preference grid and Levy flight, and continuously searching, evaluating and updating particles and maintaining an external archive by using the MOPSO-PAG-LF to obtain a gene subset;

step 5, outputting the finally identified tumor key genes if the termination condition is met, otherwise returning to step 4.

2. The method for identifying key genes of tumors based on the multi-target particle swarm algorithm of the preference grid and the Levy flight according to claim 1, wherein the step 1 comprises the following steps:

wherein the mean terms represent the means of the expression levels of gene g in the positive (+) and negative (-) classes;

3. The method for identifying key genes of tumors based on the multi-target particle swarm algorithm of the preference grid and the Levy flight according to claim 1, wherein the step 2 comprises the following steps:

wherein $X_{Training}$ is the training sample set; $\beta_{sq}$ is the weight between the s-th hidden layer node and the q-th output node of the ELM; $w_{js}$ is the weight between the j-th input node and the s-th hidden layer node; hid(s) is the input to the s-th hidden layer node; $N_{gnl}$ is the number of genes in the initial gene pool; and g is the activation function of the ELM, taken to be the sigmoid function;

4. The method for identifying key genes of tumors based on the multi-target particle swarm algorithm of the preference grid and the Levy flight according to claim 1, wherein the step 3 comprises the following steps:

d is the dimension of the sample;

5. The method for identifying key genes of tumors based on the multi-target particle swarm algorithm of the preference grid and the Levy flight according to claim 1, wherein the step 4 comprises the following steps:

step 4.2, determining whether the parameter flag of each particle is smaller than a preset threshold value T;

where u and v are random variables that follow normal distributions:

and

wherein ,

is the velocity of particle i at the (t+1)-th iteration,

is the position of particle i at the t-th iteration; $x_{pb,i}$ is the individual historical optimal position of particle i; $x_{gb,i}$ is the global optimal position for particle i; w is the inertia weight; $c_1, c_2$ are acceleration constants; $r_1, r_2$ are two random numbers varying in the range [0, 1]; S is the update step length of the Levy flight; and α and β are parameters. When the step length S is updated, the method perturbs the conventional Levy flight formula: with a certain probability, S is multiplied by the difference obtained by subtracting the position of the current particle from the global optimal particle $x_{gb,i}$.

The purpose of this is that, when the position of a particle is updated by Levy flight, the particle can move appropriately toward the global optimal particle $x_{gb,i}$, making a random jump that has a direction rather than one that strictly follows the Levy distribution;
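The directed Levy flight update described above can be sketched as follows. This is a minimal illustration and not the patented formula: it assumes Mantegna's algorithm for drawing the step from the two normal variables u and v, treats β as the Levy index and α as a step scale, and uses a hypothetical probability `p_directed` for applying the perturbation toward the global optimal position. The function names and default values are illustrative only.

```python
import math
import random


def levy_step(beta=1.5, rng=random):
    """Draw one Levy-distributed step with Mantegna's algorithm (assumed here)."""
    # Standard deviation of u; v is drawn with unit standard deviation.
    sigma_u = (
        math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
        / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))
    ) ** (1 / beta)
    u = rng.gauss(0.0, sigma_u)
    v = rng.gauss(0.0, 1.0)
    while v == 0.0:  # avoid division by zero in the extremely rare v == 0 case
        v = rng.gauss(0.0, 1.0)
    return u / abs(v) ** (1 / beta)


def levy_update(x, x_gb, alpha=0.01, beta=1.5, p_directed=0.5, rng=random):
    """Return a new position for particle x.

    With probability p_directed (a hypothetical parameter) each step S is
    multiplied by (x_gb - x), so the jump is oriented relative to the global
    optimal position; otherwise a plain Levy step is applied.
    """
    directed = rng.random() < p_directed
    new_x = []
    for xi, gi in zip(x, x_gb):
        s = levy_step(beta, rng)
        if directed:
            s *= gi - xi
        new_x.append(xi + alpha * s)
    return new_x
```

With the directed variant, a particle that already sits on the global optimal position does not move, while distant particles take proportionally longer jumps.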

step 4.4, taking $f = (f_1, f_2)$ as the objective function, evaluating whether the particle has evolved into a better solution by judging the dominance relationship between the newly generated solution and the individual optimal particle: if the new particle dominates the individual optimal particle, updating the individual optimal information of the particle and setting the parameter flag of the particle to 0; if the new particle is dominated by the individual optimal particle, increasing the parameter flag of the particle by 1; if neither the new particle nor the individual optimal particle dominates the other, updating the individual optimal information of the particle with a certain probability and setting the parameter flag of the particle to 0, and otherwise increasing the parameter flag of the particle by 1;
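The personal-best update in step 4.4 can be sketched as below. The sketch assumes that both objectives are expressed as quantities to be minimized (for example, classification error and subset size), that a particle is stored as a dictionary with hypothetical keys `pb_pos`, `pb_obj` and `flag`, and that the unspecified "certain probability" is a parameter `p_accept`.

```python
import random


def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all objectives minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def update_personal_best(particle, new_pos, new_obj, p_accept=0.5, rng=random):
    """Update the personal best and the stagnation flag of a particle in place."""
    if dominates(new_obj, particle["pb_obj"]):
        accept = True                      # new solution is strictly better
    elif dominates(particle["pb_obj"], new_obj):
        accept = False                     # personal best is strictly better
    else:
        accept = rng.random() < p_accept   # mutually non-dominated: accept by chance
    if accept:
        particle["pb_pos"] = list(new_pos)
        particle["pb_obj"] = tuple(new_obj)
        particle["flag"] = 0               # progress was made, reset the counter
    else:
        particle["flag"] += 1              # no progress, count one more stagnant step
    return particle
```

The flag therefore counts consecutive iterations without an accepted improvement, which is the quantity compared against the threshold T in step 4.2.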

step 4.5, performing dominance comparison among the particles, adding the non-dominated solutions to the external archive, and maintaining the external archive, wherein the maintenance of the external archive and the selection of the leader particle are performed by means of a preference grid, specifically: first, a grid is created based on the objective function values of the non-dominated solutions in the external archive; each non-dominated solution is represented by a black point $Q_i$ in the grid, so that $Q = \{Q_1, Q_2, \ldots, Q_i, \ldots, Q_n\}$ denotes the set of all non-dominated solutions, where n is the number of non-dominated solutions; a grid cell containing at least one particle is referred to herein as an active grid;

for each $Q_i \in Q$, the weighted fitness value $\lambda_i$ of $Q_i$ is calculated according to formula (9), wherein $F_1, F_2$ are the two objective fitness values; α is a preference weight in [0, 1], which the decision maker sets according to the importance of $F_1$ and $F_2$ to the problem; β = 1 - α; num is the number of particles in the grid where $Q_i$ is located; and θ is a penalty coefficient, set to 0.05;

$$\lambda_i = \alpha F_1 + \beta F_2 - \theta \cdot num \tag{9}$$

when the leader particle is selected, the probability $P_i$ that $Q_i$ is selected is calculated according to equation (10); when a particle to be deleted is selected during maintenance of the external archive, the probability $P_i$ that $Q_i$ is selected is calculated according to equation (11), where n is the total number of non-dominated solutions; roulette-wheel selection is then used to choose the particle that becomes the leader particle or is deleted from the archive. Note that each $\lambda_i$ is used as an exponent of e; the purpose is to give particles with larger $\lambda_i$ a larger probability of being selected and to further widen the gap in selection probability between particles with large $\lambda_i$ and particles with small $\lambda_i$. From the formula for $\lambda_i$ it can be seen that when the grid containing $Q_i$ holds many particles, the resulting fitness value $\lambda_i$ becomes smaller because of the penalty term, so that the selected solution not only has higher classification accuracy but also lies in a sparse grid, which greatly improves the decision efficiency of the algorithm and saves computing resources;
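The weighted fitness of formula (9) and the roulette-wheel choice of a leader can be illustrated with the sketch below. Equations (10) and (11) are not reproduced in this text, so the selection probability used here, proportional to $e^{\lambda_i}$, is an assumption based only on the statement that each $\lambda_i$ is used as an exponent of e. The grid cell of each archive member is passed in as a hypothetical precomputed label, and $F_1$, $F_2$ are assumed to be scaled so that larger values are better.

```python
import math
import random
from collections import Counter


def weighted_fitness(archive, cells, alpha=0.5, theta=0.05):
    """Formula (9): lambda_i = alpha*F1 + beta*F2 - theta*num, with beta = 1 - alpha.

    archive: list of (F1, F2) fitness pairs of the non-dominated solutions.
    cells:   grid-cell label of each solution (hypothetical, computed elsewhere).
    """
    beta = 1.0 - alpha
    occupancy = Counter(cells)  # num: how many archive members share each cell
    return [
        alpha * f1 + beta * f2 - theta * occupancy[c]
        for (f1, f2), c in zip(archive, cells)
    ]


def roulette_leader(lambdas, rng=random):
    """Pick an index with probability proportional to exp(lambda_i) (assumed form)."""
    m = max(lambdas)  # subtracting the maximum keeps exp() numerically stable
    weights = [math.exp(l - m) for l in lambdas]
    return rng.choices(range(len(lambdas)), weights=weights, k=1)[0]
```

Because of the penalty term, two solutions with identical fitness values receive a lower $\lambda_i$ when they share a crowded cell than a solution alone in its cell, which biases the leader choice toward sparse regions of the grid.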

step 4.6, judging whether the multi-target particle swarm algorithm meets the termination condition, and if so, outputting the result; if not, returning to step 4.2.

6. The method for identifying key genes of tumors based on the multi-target particle swarm algorithm of the preference grid and the Levy flight according to claim 1, wherein w is an inertia weight and is adaptively changed within the range [0.4, 0.9].
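The claim does not state how w is adapted. As one common possibility, and purely as an assumption, the sketch below decreases w linearly from 0.9 to 0.4 over the run; the function name and the iteration-based schedule are illustrative.

```python
def inertia_weight(t, t_max, w_min=0.4, w_max=0.9):
    """Linearly decrease w from w_max to w_min over t_max iterations (assumed schedule)."""
    frac = min(max(t / t_max, 0.0), 1.0)  # clamp progress to [0, 1]
    return w_max - (w_max - w_min) * frac
```

A larger w early in the run favors exploration, and a smaller w later favors refinement around the solutions already found.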

7. The method for identifying key genes of tumors based on the multi-target particle swarm algorithm of the preference grid and the Levy flight according to claim 1, wherein the step 5 comprises the following steps: