CN106971091A - A tumour recognition method based on deterministic particle swarm optimization and support vector machines - Google Patents
A tumour recognition method based on deterministic particle swarm optimization and support vector machines
- Publication number
- CN106971091A CN201710122492.5A
- Authority
- CN
- China
- Prior art keywords
- particle
- gene
- certainty
- value
- sigma
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2411—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/004—Artificial life, i.e. computing arrangements simulating life
- G06N3/006—Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B25/00—ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Data Mining & Analysis (AREA)
- Biophysics (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- Medical Informatics (AREA)
- General Health & Medical Sciences (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Software Systems (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Molecular Biology (AREA)
- Biotechnology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Computational Linguistics (AREA)
- Bioethics (AREA)
- Databases & Information Systems (AREA)
- Epidemiology (AREA)
- Mathematical Physics (AREA)
- Public Health (AREA)
- Computing Systems (AREA)
- Biomedical Technology (AREA)
- Genetics & Genomics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention discloses a tumour recognition method based on deterministic particle swarm optimization and support vector machines. The method comprises: preprocessing the tumour gene expression profile data; performing a preliminary selection of informative genes on the training set with the classification information index method, then removing redundant genes by pairwise redundancy analysis to obtain a candidate gene pool; further obtaining the key gene subset on the training set with the classification information index method; and optimizing the parameters of the support vector machine on the training set with the deterministic particle swarm optimization algorithm, after which the tumour gene expression profile data to be identified are classified. Building on the suitability of support vector machines for small-sample data, the invention optimizes the support vector machine with deterministic particle swarm optimization, further improving its performance and thereby the accuracy of tumour recognition.
Description
Technical field
The invention belongs to the field of computational analysis of tumour gene expression profile data, and in particular relates to a tumour recognition method based on deterministic particle swarm optimization and support vector machines.
Background
DNA microarray technology has brought great opportunities to biology, but the large and complex microarray data it produces pose major challenges to researchers in related fields, for four main reasons. First, microarray data contain a large amount of noise and outliers. Noise and outliers commonly arise during experiments, and data processing can introduce further errors or mislabelled sample classes, so robust processing methods are needed. Second, gene expression profile data sets are very large, and handling large-scale data sets is one of the difficulties to be solved; designing efficient algorithms with low time and space complexity is therefore highly worthwhile. Third, microarray data are high-dimensional with few samples. For a gene expression profile data set, the cost of classification grows exponentially with the number of genes, so coping with the curse of dimensionality is another difficulty. Fourth, microarray data exhibit non-linear behaviour and conceal a large amount of useful information. It is therefore important to extend classical statistical analysis methods into non-linear analysis methods that can handle non-linear data sets, and to use these methods to mine and derive the underlying biological information.
Since Golub et al. pioneered tumour classification based on gene expression profiles in 1999, researchers have proposed many classification methods based on gene expression profiles, some of which are now in common use. Different classification algorithms yield different classifiers, such as the classical Bayes, support vector machine (SVM), and artificial neural network classifiers, which learn from samples with known class labels in order to extract the information needed for classification. Experimental results obtained with these classifiers in tumour classification show that different classifiers perform differently on the same data set; in other words, even a good classifier rarely achieves high classification performance on every data set. The advantages of SVM are that it handles high-dimensional sample data, offers high classification accuracy and strong noise resistance, and does not require tuning or supplying a large number of parameters. It also scales well: the number of support vectors after training is typically small, which is very effective for gene expression profiles whose matrix dimension keeps growing. Although SVM is suited to small-sample recognition, selecting its parameters is time-consuming, and there is as yet no effective theory to guide that selection, which affects SVM classification performance.
Particle swarm optimization (PSO) has good global search ability. Compared with genetic algorithms, PSO requires no complex genetic operators, has few adjustable parameters, and is easy to implement, so it has been widely applied in recent years. In conventional PSO, however, the randomness of the particle search leads to many blind search steps, so search time is long and search performance leaves room for improvement. Deterministic search based on gradient search is therefore introduced into the particle swarm optimization algorithm, combining random search with deterministic search to improve the search speed and precision of the swarm.
Summary of the invention
Object of the invention: to optimize the parameters of a support vector machine with an improved particle swarm optimization algorithm (IGPSO), thereby improving the performance of the support vector machine, and to apply it to the recognition of tumour expression profile data so as to improve the accuracy of tumour recognition. Compared with conventional tumour expression profile recognition methods, this method effectively improves tumour recognition accuracy.
Technical solution: a tumour recognition method based on deterministic particle swarm optimization and support vector machines, comprising gene subset screening based on the classification information index and pairwise redundancy analysis, and using the deterministic particle swarm optimization algorithm (improved particle swarm optimization based on gradient search, IGPSO) to optimize a support vector machine for the recognition of tumour gene expression profile data. The method comprises the following steps:
Step 1: Preprocess the tumour gene expression profile data set. The data set is first divided into a training set and a test set and then normalized, and the final key gene subset is obtained;
Step 2: Propose the deterministic particle swarm optimization algorithm (IGPSO) and, on the training set, use it to optimize the support vector machine (SVM);
Step 3: On the test set, use the support vector machine optimized in Step 2 to classify the tumour gene expression profile data set.
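Step 1 states only that the data set is normalized after the split. As a minimal sketch, assuming per-gene z-score normalization with statistics taken from the training set alone (the z-score choice and the names `fit_gene_stats` and `normalize` are illustrative assumptions, not taken from the patent):

```python
import math

def fit_gene_stats(train_rows):
    """Per-gene mean and standard deviation; rows are samples, columns are genes."""
    n = len(train_rows)
    n_genes = len(train_rows[0])
    means = [sum(row[j] for row in train_rows) / n for j in range(n_genes)]
    stds = []
    for j in range(n_genes):
        var = sum((row[j] - means[j]) ** 2 for row in train_rows) / n
        # A constant gene has zero spread; fall back to 1.0 to avoid dividing by zero.
        stds.append(math.sqrt(var) or 1.0)
    return means, stds

def normalize(rows, means, stds):
    """Apply the training-set statistics to any set of samples."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

# Toy data: 2 training samples, 1 test sample, 2 genes (values are made up).
train = [[1.0, 10.0], [3.0, 10.0]]
test = [[2.0, 12.0]]
means, stds = fit_gene_stats(train)
train_z = normalize(train, means, stds)
test_z = normalize(test, means, stds)
```

Fitting the statistics on the training set and reusing them for the test set keeps test information out of the preprocessing, which matters when the sample count is as small as it is here.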
Further, Step 1 comprises the following steps:
Step 1.1: Divide the tumour gene expression profile data set into a training set and a test set;
Step 1.2: Calculate the "classification information index" of each gene in the training set according to formula (1),
where d(g) is the classification information index of gene g, and the remaining symbols denote, respectively, the means and the standard deviations of the expression level of gene g in the positive and negative sample classes;
Step 1.3: Select all genes whose classification information index exceeds a given threshold as the preliminarily filtered gene set;
Step 1.4: After the preliminary filtering with the classification information index method, calculate the Pearson correlation coefficient between the expression levels of each pair of genes; genes whose pairwise coefficient exceeds a given value are treated as redundant, which further reduces the size of the candidate gene pool;
Step 1.5: Apply the classification information index method again within the candidate gene pool, and select all genes above a given threshold as the final key gene subset.
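Formula (1) is not reproduced in this text, so the preliminary filter of Steps 1.2 and 1.3 can only be sketched with a stand-in score. The sketch below assumes the plain signal-to-noise ratio |mean+ - mean-| / (std+ + std-), which uses the same four quantities the text defines; the patent's own index may differ, and all function names are hypothetical.

```python
import math

def mean_std(values):
    """Population mean and standard deviation of a list of numbers."""
    m = sum(values) / len(values)
    var = sum((v - m) ** 2 for v in values) / len(values)
    return m, math.sqrt(var)

def snr_index(pos_values, neg_values):
    """Stand-in for d(g): |mean+ - mean-| / (std+ + std-). An assumption, not formula (1)."""
    mp, sp = mean_std(pos_values)
    mn, sn = mean_std(neg_values)
    denom = sp + sn
    return abs(mp - mn) / denom if denom > 0 else float("inf")

def select_genes(expr, labels, threshold):
    """expr[g] holds gene g's expression across samples; labels are +1 / -1."""
    kept = []
    for g, values in enumerate(expr):
        pos = [v for v, y in zip(values, labels) if y == 1]
        neg = [v for v, y in zip(values, labels) if y == -1]
        if snr_index(pos, neg) > threshold:
            kept.append(g)
    return kept

labels = [1, 1, -1, -1]
expr = [
    [5.0, 7.0, 1.0, 3.0],  # class means differ: informative
    [2.0, 4.0, 3.0, 3.0],  # class means equal: uninformative
]
selected = select_genes(expr, labels, threshold=0.5)
```

Here only the first gene survives, because its class means are far apart relative to the within-class spread.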
Further, proposing the deterministic particle swarm optimization algorithm in Step 2 comprises the following steps:
Step 2.1: Randomly initialize, within the initial range, the positions (x) and velocities (v) of the swarm and the swarm diversity threshold (σ) for each function;
Step 2.2: Calculate the fitness value of each particle and the gradient of the fitness function with respect to the particle's position;
Step 2.3: For each particle, compare its fitness value with that of the best position it has visited; if it is better, take the current position as the particle's best position;
Step 2.4: For each particle, compare its fitness value with that of the best position visited by the swarm; if it is better, take it as the swarm's best position;
Step 2.5: When the swarm diversity value is greater than the set threshold, update the velocity of each particle according to formula (2); otherwise update it according to formula (5); then update the particle's position;
The deterministic particle swarm optimization algorithm is divided into two phases. The first phase is the mutual attraction of the particles, which can be divided into two steps. First, when the swarm diversity value is greater than an appropriate threshold, each particle moves along the negative gradient direction of the fitness function with respect to its position and gathers towards the globally best particle. Then, once the neighbourhood of some optimum has been reached, a gradual-descent strategy is adopted: the particle's velocity is continually reduced to carry out a line search. These two steps are described by formula (2) and formula (3), respectively.
v_{ij}(t+1) = w \cdot gra(i, j) + c_2 \cdot rand() \cdot (p_g - x_{ij}(t))    (2)
v_{ij}(t+1) = k \cdot v_{ij}(t)    (3)
where V_i = (v_{i1}, v_{i2}, ..., v_{in}) is the current velocity of particle i, X_i = (x_{i1}, x_{i2}, ..., x_{in}) is the current position of particle i, w is the inertia weight, p_g is the global best position, and k is a constant in (0, 1). For the fitness function f(x), the corresponding negative gradient gra(i, j) is as follows:
The second phase is the mutual repulsion of the particles. When the swarm diversity value is less than the predetermined threshold, the particles are adaptively repelled to increase swarm diversity, while each particle searches along the gradient direction and approaches other local optima. Clearly, the larger the swarm diversity, the smaller the dispersal speed, and the smaller the swarm diversity, the larger the dispersal speed. The particle velocity update formula is as follows:
where diversity is the swarm diversity calculated by formula (6),
in which S is the swarm, |S| is the number of particles in the swarm, |L| is the maximum radius of the search space, N is the dimension of the problem, and p_{ij} is the j-th component of the i-th particle.
Step 2.6: If the termination condition has not been met, go to Step 2.2; otherwise output the fitness value.
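The attraction-phase updates of formulas (2) and (3) can be sketched as follows. This is an illustrative sketch only: the gradient is estimated by central differences (the source does not say how it is obtained), the fitness is treated as a quantity to minimize, the toy `sphere` function and all parameter values are made up, and the repulsion update of formula (5) is left out because it is not reproduced in this text.

```python
import random

def neg_gradient(f, x, h=1e-6):
    """Central-difference estimate of -df/dx_j for each component j (an assumption)."""
    grad = []
    for j in range(len(x)):
        up = x[:j] + [x[j] + h] + x[j + 1:]
        down = x[:j] + [x[j] - h] + x[j + 1:]
        grad.append(-(f(up) - f(down)) / (2 * h))
    return grad

def attract_velocity(f, x, p_g, w, c2, rng):
    """Formula (2): v_ij = w * gra(i, j) + c2 * rand() * (p_g - x_ij)."""
    gra = neg_gradient(f, x)
    return [w * gra[j] + c2 * rng.random() * (p_g[j] - x[j]) for j in range(len(x))]

def slow_down(v, k):
    """Formula (3): v_ij = k * v_ij with 0 < k < 1, used near an optimum."""
    return [k * vj for vj in v]

def sphere(x):
    """Toy fitness to minimize; its optimum is the origin."""
    return sum(xj * xj for xj in x)

rng = random.Random(0)
x = [2.0, -1.0]      # current particle position
p_g = [0.0, 0.0]     # global best position (known here only because the toy is trivial)
v = attract_velocity(sphere, x, p_g, w=0.1, c2=1.0, rng=rng)
x_next = [xj + vj for xj, vj in zip(x, v)]
v_slow = slow_down(v, k=0.5)
```

Both terms of formula (2) pull the particle downhill in this toy case: the gradient term follows the local slope, and the random term moves it towards the swarm's best position.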
Further, optimizing the support vector machine with the deterministic particle swarm optimization algorithm in Step 3 comprises the following steps:
Step 3.1: Set the search space for the SVM parameters C and σ, x_{i,min} ≤ x_i ≤ x_{i,max}, where C is the penalty factor, σ is the kernel function parameter, x_i is a parameter value, and i indexes the parameters, of which there are 2 here. When the algorithm starts, a parameter value x is chosen at random in the search space;
The SVM classification rule is given by formula (7):
Training set T = {(x_i, y_i); x_i ∈ R^n; y_i = ±1; i = 1, 2, ..., r}, where x_i is a training sample, x is the sample to be classified, b is the threshold, α_i is a Lagrange multiplier, and K(x_i, x) is the kernel function;
The optimization problem solved by the support vector machine and the resulting classification decision function are as follows:
where K(x, x_i) is the kernel function, whose role is to map the feature space to a higher-dimensional space, x_i is a training sample, b is the threshold, and α_i is a Lagrange multiplier. In practical applications the number of feature genes is small, so an SVM classifier based on the RBF kernel is used to classify the tumour samples. The RBF is expressed as follows:
Step 3.2: Set the swarm size to N, the required classification accuracy to F, the expansion factor to Ex, the local region size to w = [w_1, w_2], and the maximum number of retries to T_max; the retry count t and the expansion factor both start at 0;
Step 3.3: Starting from the search space p = [p_1, p_2] initialized when the algorithm starts, expand the search space by the expansion factor Ex, and calculate the position of the local region so that it falls within the search space p + 0.6Ex*w;
Step 3.4: Calculate the classification performance function f_p corresponding to x;
Step 3.5: Find the optimum with the IGPSO algorithm and obtain the corresponding classification performance function f_c;
Step 3.6: If a better classification rate has been found (f_p < f_c), set t and Ex to 0; otherwise set t = t + 1;
Step 3.7: If t ≥ T_max, set t to 0 and Ex = Ex + 1; the search may have become trapped in a local optimum at this point, so the search range is enlarged to escape the current local region;
Step 3.8: If the classification accuracy requirement is met (f_p ≤ F), output the values of {C, σ} and the classification accuracy, and the algorithm terminates; otherwise go to Step 3.3.
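The retry-and-expand control flow of Steps 3.2 to 3.8 can be sketched as below. Several points are assumptions made for illustration: the score is treated as an accuracy to maximize (the source states the stopping test as f_p ≤ F), the local region is read as growing by a factor (1 + 0.6·Ex), a single random candidate stands in for a full IGPSO run, and `toy_score` replaces real SVM training.

```python
import random

def expanding_search(score, p, w, target, t_max, max_rounds, rng):
    """Retry/expand loop loosely following Steps 3.2-3.8.

    score: function (C, sigma) -> value in [0, 1], higher is better
    p:     starting point [p1, p2] in the (C, sigma) plane
    w:     local region size [w1, w2]
    """
    t, ex = 0, 0
    best_x, best_f = list(p), score(*p)
    for _ in range(max_rounds):
        if best_f >= target:            # Step 3.8: requirement met, stop
            break
        # Step 3.3 (one reading): the local region widens with the expansion factor.
        half = [(1 + 0.6 * ex) * wi for wi in w]
        # Step 3.5 stand-in: one random candidate instead of an IGPSO run.
        cand = [best_x[i] + rng.uniform(-half[i], half[i]) for i in range(2)]
        f_c = score(*cand)
        if f_c > best_f:                # Step 3.6: improvement, reset counters
            best_x, best_f, t, ex = cand, f_c, 0, 0
        else:
            t += 1
        if t >= t_max:                  # Step 3.7: widen the search to escape
            t, ex = 0, ex + 1
    return best_x, best_f

def toy_score(c, sigma):
    """Made-up smooth score that peaks at C = 8, sigma = 1.5."""
    return 1.0 / (1.0 + (c - 8.0) ** 2 + (sigma - 1.5) ** 2)

rng = random.Random(1)
best_x, best_f = expanding_search(
    toy_score, p=[1.0, 1.0], w=[0.3, 0.1],
    target=0.99, t_max=10, max_rounds=500, rng=rng,
)
```

Resetting both counters on every improvement keeps the search local while progress continues, and the expansion only kicks in after `t_max` consecutive failures.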
Beneficial effects: High-dimensional, small-sample tumour gene expression profile data sets contain a great deal of noisy data. Support vector machines generalize well and have long been used for data classification. However, the classification performance of a support vector machine depends on parameter selection, a problem that has never been well solved and that greatly limits the application of SVM. The deterministic-search particle swarm optimization algorithm of the invention performs local search using gradient information: when a particle reaches the neighbourhood of some optimum, its velocity is continually reduced so that its step size during the line search does not become too large. Global behaviour is controlled through swarm diversity and the attraction and repulsion principle: when the swarm enters a local optimum, the particles are adaptively repelled to preserve diversity, so the algorithm can finally converge quickly to a solution of higher precision. Optimizing the SVM with the particle swarm optimization algorithm based on deterministic search therefore tunes the RBF kernel parameter and the penalty factor and improves the classification performance of the SVM, which helps improve the recognition accuracy of tumour gene expression profile data.
Brief description of the drawings
Fig. 1 is a structural block diagram of the invention;
Fig. 2 is a flow chart of the deterministic particle swarm optimization algorithm of the invention.
Detailed description
A tumour recognition method based on deterministic particle swarm optimization and support vector machines comprises gene screening based on the classification information index and pairwise redundancy analysis, and using the deterministic particle swarm optimization algorithm (IGPSO) to optimize a support vector machine for tumour gene recognition. The method comprises the following steps:
Step 1: Preprocess the tumour gene expression profile data set. The data set is first divided into a training set and a test set and then normalized, and the final key gene subset is obtained;
Step 2: Propose the deterministic particle swarm optimization algorithm (IGPSO);
Step 3: On the training set, use the deterministic particle swarm optimization algorithm to optimize the support vector machine (SVM);
Step 4: On the test set, use the support vector machine optimized in Step 3 to classify the tumour gene expression profile data set.
Further, Step 1 comprises the following steps:
Step 1.1: Divide the tumour gene expression profile data set into a training set and a test set;
Step 1.2: Calculate the "classification information index" of each gene in the training set according to formula (1),
where d(g) is the classification information index of gene g, and the remaining symbols denote, respectively, the means and the standard deviations of the expression level of gene g in the positive and negative sample classes;
Step 1.3: Select all genes whose classification information index exceeds a given threshold as the preliminarily filtered gene set;
Step 1.4: After the preliminary filtering with the classification information index method, calculate the Pearson correlation coefficient between the expression levels of each pair of genes; genes whose pairwise coefficient exceeds a given value are treated as redundant, which further reduces the size of the candidate gene pool;
Step 1.5: Apply the classification information index method again within the candidate gene pool, and select all genes above a given threshold as the final key gene subset.
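The pairwise redundancy filter of Step 1.4 can be sketched as a greedy pass over the genes. The text does not say which gene of a highly correlated pair is dropped, so this sketch assumes the genes arrive ranked from best to worst score and keeps a gene only if it is not strongly correlated with any gene already kept; the gene names, the data, and the 0.9 cut-off are made up.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0  # a constant profile carries no correlation information
    return cov / math.sqrt(va * vb)

def remove_redundant(expr, ranked_genes, max_abs_r):
    """Greedy filter: keep a gene only if |r| <= max_abs_r against every kept gene."""
    kept = []
    for g in ranked_genes:
        if all(abs(pearson(expr[g], expr[k])) <= max_abs_r for k in kept):
            kept.append(g)
    return kept

expr = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.0],    # a scaled copy of g1, so redundant
    "g3": [1.0, -1.0, -1.0, 1.0],  # uncorrelated with g1
}
kept = remove_redundant(expr, ["g1", "g2", "g3"], max_abs_r=0.9)
```

Walking the genes in score order means that when two genes carry the same signal, the higher-scoring one is the one that stays.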
Further, Step 2 comprises the following steps:
Step 2.1: Randomly initialize, within the initial range, the positions (x) and velocities (v) of the swarm and the swarm diversity threshold (σ) for each function;
Step 2.2: Calculate the fitness value of each particle and the gradient of the fitness function with respect to the particle's position;
Step 2.3: For each particle, compare its fitness value with that of the best position it has visited; if it is better, take the current position as the particle's best position;
Step 2.4: For each particle, compare its fitness value with that of the best position visited by the swarm; if it is better, take it as the swarm's best position;
Step 2.5: When the swarm diversity value is greater than the set threshold, update the velocity of each particle according to formula (2); otherwise update it according to formula (5); then update the particle's position;
The deterministic particle swarm optimization algorithm is divided into two phases. The first phase is the mutual attraction of the particles, which can be divided into two steps. First, when the swarm diversity value is greater than an appropriate threshold, each particle moves along the negative gradient direction of the fitness function with respect to its position and gathers towards the globally best particle. Then, once the neighbourhood of some optimum has been reached, a gradual-descent strategy is adopted: the particle's velocity is continually reduced to carry out a line search. These two steps are described by formula (2) and formula (3), respectively.
v_{ij}(t+1) = w \cdot gra(i, j) + c_2 \cdot rand() \cdot (p_g - x_{ij}(t))    (2)
v_{ij}(t+1) = k \cdot v_{ij}(t)    (3)
where V_i = (v_{i1}, v_{i2}, ..., v_{in}) is the current velocity of particle i, X_i = (x_{i1}, x_{i2}, ..., x_{in}) is the current position of particle i, w is the inertia weight, p_g is the global best position, and k is a constant in (0, 1). For the fitness function f(x), the corresponding negative gradient gra(i, j) is as follows:
The second phase is the mutual repulsion of the particles. When the swarm diversity value is less than the predetermined threshold, the particles are adaptively repelled to increase swarm diversity, while each particle searches along the gradient direction and approaches other local optima. Clearly, the larger the swarm diversity, the smaller the dispersal speed, and the smaller the swarm diversity, the larger the dispersal speed. The particle velocity update formula is as follows:
where diversity is the swarm diversity calculated by formula (6),
in which S is the swarm, |S| is the number of particles in the swarm, |L| is the maximum radius of the search space, N is the dimension of the problem, and p_{ij} is the j-th component of the i-th particle.
Step 2.6: If the termination condition has not been met, go to Step 2.2; otherwise output the fitness value.
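Formula (6) is not reproduced in this text. The symbols defined for it (S, |S|, |L|, N, p_ij) match a diversity measure commonly used in attractive-repulsive PSO variants, namely the mean distance of the particles from the swarm centroid divided by |L|, so the sketch below assumes that form. The function names, the toy swarms, and the threshold value are illustrative only.

```python
import math

def swarm_diversity(positions, search_radius):
    """Assumed form of formula (6): mean distance to the centroid, scaled by |L|."""
    size = len(positions)
    dim = len(positions[0])
    centroid = [sum(p[j] for p in positions) / size for j in range(dim)]
    total = sum(
        math.sqrt(sum((p[j] - centroid[j]) ** 2 for j in range(dim)))
        for p in positions
    )
    return total / (size * search_radius)

def phase(positions, search_radius, threshold):
    """Step 2.5 switch: attract while diversity exceeds the threshold, else repel."""
    if swarm_diversity(positions, search_radius) > threshold:
        return "attract"
    return "repel"

spread = [[0.0, 0.0], [4.0, 0.0], [0.0, 4.0], [4.0, 4.0]]   # particles far apart
clumped = [[2.0, 2.0], [2.0, 2.0], [2.0, 2.0], [2.0, 2.0]]  # fully collapsed swarm
d_spread = swarm_diversity(spread, search_radius=4.0)
d_clumped = swarm_diversity(clumped, search_radius=4.0)
```

Dividing by |L| makes the threshold independent of the size of the search space, so the same value can be reused across problems with different parameter ranges.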
Further, Step 3 comprises the following steps:
Step 3.1: Set the search space for the SVM parameters C and σ, x_{i,min} ≤ x_i ≤ x_{i,max}, where C is the penalty factor, σ is the kernel function parameter, x_i is a parameter value, and i indexes the parameters, of which there are 2 here. When the algorithm starts, a parameter value x is chosen at random in the search space;
The SVM classification rule is given by formula (7):
Training set T = {(x_i, y_i); x_i ∈ R^n; y_i = ±1; i = 1, 2, ..., r}, where x_i is a training sample, x is the sample to be classified, b is the threshold, α_i is a Lagrange multiplier, and K(x_i, x) is the kernel function;
The optimization problem solved by the support vector machine and the resulting classification decision function are as follows:
where K(x, x_i) is the kernel function, whose role is to map the feature space to a higher-dimensional space, x_i is a training sample, b is the threshold, and α_i is a Lagrange multiplier. In practical applications the number of feature genes is small, so an SVM classifier based on the RBF kernel is used to classify the tumour samples. The RBF is expressed as follows:
Step 3.2: Set the swarm size to N, the required classification accuracy to F, the expansion factor to Ex, the local region size to w = [w_1, w_2], and the maximum number of retries to T_max; the retry count t and the expansion factor both start at 0;
Step 3.3: Starting from the search space p = [p_1, p_2] initialized when the algorithm starts, expand the search space by the expansion factor Ex, and calculate the position of the local region so that it falls within the search space p + 0.6Ex*w;
Step 3.4: Calculate the classification performance function f_p corresponding to x;
Step 3.5: Find the optimum with the IGPSO algorithm and obtain the corresponding classification performance function f_c;
Step 3.6: If a better classification rate has been found (f_p < f_c), set t and Ex to 0; otherwise set t = t + 1;
Step 3.7: If t ≥ T_max, set t to 0 and Ex = Ex + 1; the search may have become trapped in a local optimum at this point, so the search range is enlarged to escape the current local region;
Step 3.8: If the classification accuracy requirement is met (f_p ≤ F), output the values of {C, σ} and the classification accuracy, and the algorithm terminates; otherwise go to Step 3.3.
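Formula (7) and the RBF expression are not reproduced in this text. The sketch below assumes the standard forms: the decision rule sign(Σ α_i y_i K(x_i, x) + b) and the Gaussian kernel exp(-||x - x_i||² / (2σ²)). The exact placement of σ in the patent's kernel is not known, and the support vectors, multipliers, and threshold here are hypothetical values rather than the result of training.

```python
import math

def rbf_kernel(a, b, sigma):
    """Gaussian RBF kernel; the 2*sigma^2 scaling is an assumed parameterization."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def svm_decide(x, support_vectors, labels, alphas, b, sigma):
    """Standard kernel SVM decision rule: sign(sum_i alpha_i * y_i * K(x_i, x) + b)."""
    s = sum(
        alpha * y * rbf_kernel(sv, x, sigma)
        for sv, y, alpha in zip(support_vectors, labels, alphas)
    )
    return 1 if s + b >= 0 else -1

# Hypothetical "trained" model with one support vector per class.
support_vectors = [[0.0, 0.0], [2.0, 2.0]]
labels = [-1, 1]
alphas = [1.0, 1.0]

pred_near_pos = svm_decide([1.9, 2.1], support_vectors, labels, alphas, b=0.0, sigma=1.0)
pred_near_neg = svm_decide([0.1, -0.2], support_vectors, labels, alphas, b=0.0, sigma=1.0)
```

A smaller σ makes each support vector's influence more local, which is why σ and the penalty factor C are the two parameters handed to the optimizer.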
The implementation of the invention is briefly illustrated below using tumour gene expression profile data as an example. This example uses the colon cancer tumour expression profile data set, which contains 62 samples in total, each represented by the expression values of 2000 genes. These 62 samples comprise 22 normal samples and 40 tumour samples. On this data set, the specific steps of the invention are as follows:
As shown in Fig. 1, a tumour recognition method based on deterministic particle swarm optimization and support vector machines comprises gene screening based on the classification information index and pairwise redundancy analysis, and using the deterministic particle swarm optimization algorithm (IGPSO) to optimize a support vector machine for tumour gene recognition, through the following steps:
(1) Divide the data set into a training set and a test set. On the training set, score each gene with the improved signal-to-noise ratio formula of the classification information index method. The larger a gene's information index, the more sample classification information it carries and the stronger its ability to classify samples. Table 1 shows the distribution of the classification information index for the colon cancer data set. The 173 genes of the colon cancer data set with an information index greater than 0.5 are selected as the gene subset for the subsequent analysis.
(2) Redundant genes are excluded by calculating the Pearson correlation coefficient between the expression levels of pairs of genes. The colon cancer data are analysed using the 173 genes selected above. After the pairwise redundancy calculation and comparison, 59 genes remain.
(3) The gene subset obtained above is scored again with the improved signal-to-noise ratio formula of the classification information index, and 11 genes of the colon cancer data set are selected by information index as the final key gene subset. Table 2 shows the key gene subset finally selected for colon cancer.
(4) The two parameters of the support vector machine are given initial settings: the search range is {0 < C < 16, 0 < σ < 6}, the maximum number of retries is set to 10, the expansion step for C is 0.3, and the expansion step for σ is 0.1. Using the 11 key genes obtained above, the two parameters of the support vector machine are optimized with the IGPSO algorithm. Within the local region, the particles of the IGPSO algorithm search along the gradient direction of the classification rate (the performance function) with respect to the parameters {C, σ}; if the maximum number of retries is reached without finding a better classification rate, the search range is expanded.
(5) On the test set, the samples are classified with the SVM optimized by the IGPSO algorithm. Table 3 shows the classification results on the colon cancer samples.
Table 1 gives the distribution of the colon cancer classification information index.
Table 1. Distribution of the classification information index for colon cancer in the invention
Gene information index | Number of genes | Proportion of the 2000 genes (%) |
0.0~0.3 | 1524 | 76.2 |
0.3~0.5 | 303 | 15.15 |
0.5~1.897 | 173 | 8.65 |
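The proportions in Table 1 follow directly from the gene counts. A short check, using only the numbers given in the table:

```python
# Gene counts per information-index bin, copied from Table 1.
total_genes = 2000
bins = {"0.0~0.3": 1524, "0.3~0.5": 303, "0.5~1.897": 173}

# Share of the 2000 genes in each bin, in percent.
percent = {name: 100.0 * count / total_genes for name, count in bins.items()}

# The three bins should account for every gene in the data set.
covered = sum(bins.values())
```

The bins cover all 2000 genes, and the top bin (index above 0.5) corresponds to the 173 genes carried forward to the redundancy analysis.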
Table 2 gives the colon cancer gene subset used for classification.
Table 2. Colon cancer gene subset in the invention
Table 3 gives the classification results on the colon cancer samples in the invention. When the penalty factor C is small, the classification error rate is high; as C increases, the error rate drops sharply, that is, classification performance improves rapidly. As C continues to increase, the change in classification performance is no longer pronounced, and once C reaches a certain value performance no longer changes with C; in other words, the SVM is insensitive to C over a wide range. In the experiments, classification accuracy was high for C in the range (6, 15), which means the SVM is insensitive to C in this region. The experiments also show that, in the state of best classification, appropriately reducing the value of σ, that is, making a suitable correction to it, can noticeably improve classification accuracy. The final experimental results show that σ values in (0.9, 1.88) give good classification.
Table 3. Classification by the invention on the colon cancer samples
Table 4 compares the method proposed by the invention with related SVM methods.
Table 4. Comparison of the method of the invention with related SVM methods
In this specification, descriptions referring to the terms "one embodiment", "some embodiments", "illustrative example", "example", "specific example", or "some examples" mean that the specific features, structures, materials, or characteristics described in connection with that embodiment or example are included in at least one embodiment or example of the invention. Schematic uses of these terms in this specification do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
Although embodiments of the invention have been shown and described, those of ordinary skill in the art will understand that various changes, modifications, substitutions, and variations can be made to these embodiments without departing from the principle and purpose of the invention, the scope of which is defined by the claims and their equivalents.
Claims (4)
1. A tumour recognition method based on deterministic particle swarm optimization and support vector machines, characterized in that it comprises the following steps:
Step 1: preprocess the tumour gene expression profile data set: first divide the data set into a training set and a test set, then normalize it, and obtain the final key gene subset;
Step 2: propose the deterministic particle swarm optimization algorithm IGPSO and, on the training set, use it to optimize the support vector machine SVM;
Step 3: on the test set, use the support vector machine optimized in Step 2 to classify the tumour gene expression profile data set.
2. The tumor identification method based on deterministic particle swarm optimization and a support vector machine according to claim 1, characterized in that Step 1 comprises the following steps:
Step 1.1: divide the tumor gene expression profile dataset into a training set and a test set;
Step 1.2: calculate the "classification information index" of each gene in the training set according to formula (1);
where d(g) is the classification information index of gene g, computed from the mean expression level of gene g in each of the two sample classes (positive and negative) and the standard deviation of the expression level of gene g in each of the two classes;
Step 1.3: select all genes whose classification information index exceeds a given threshold as the gene set after preliminary filtering;
Step 1.4: after the preliminary filtering by the classification information index method, calculate the Pearson correlation coefficient between the expression levels of each pair of genes and choose the gene set exceeding a given value, which again reduces the size of the candidate gene pool;
Step 1.5: to narrow the key gene set further, apply the classification information index method again within the candidate gene pool and select all genes exceeding a given threshold as the final key gene subset.
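The filtering in Steps 1.2 to 1.5 can be illustrated with a small standard-library sketch. Two points are assumptions, not taken from the text: formula (1) is not reproduced here, so a signal-to-noise form |mean+ - mean-| / (std+ + std-) is assumed for the index, and Step 1.4 is read as dropping a gene that is highly correlated with a better-scoring gene already kept. The names `class_index`, `pearson`, and `select_genes` are hypothetical.

```python
import math
from statistics import mean, pstdev

def class_index(pos, neg, eps=1e-12):
    """Assumed index: |mean+ - mean-| / (std+ + std-)."""
    return abs(mean(pos) - mean(neg)) / (pstdev(pos) + pstdev(neg) + eps)

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    denom = math.sqrt(var_a * var_b)
    return cov / denom if denom > 0 else 0.0

def select_genes(rows, labels, index_threshold, corr_threshold, final_threshold):
    """rows: samples x genes, labels: +1 / -1. Returns the kept gene indices."""
    genes = list(zip(*rows))  # one tuple of expression values per gene
    scores = {}
    for g, values in enumerate(genes):
        pos = [v for v, y in zip(values, labels) if y == 1]
        neg = [v for v, y in zip(values, labels) if y == -1]
        scores[g] = class_index(pos, neg)
    # Step 1.3: preliminary filter, best-scoring genes first.
    candidates = sorted((g for g in scores if scores[g] > index_threshold),
                        key=lambda g: scores[g], reverse=True)
    # Step 1.4 (assumed reading): skip genes redundant with a kept gene.
    kept = []
    for g in candidates:
        if all(abs(pearson(genes[g], genes[k])) <= corr_threshold for k in kept):
            kept.append(g)
    # Step 1.5: apply the index threshold once more on the reduced pool.
    return [g for g in kept if scores[g] > final_threshold]
```

With this reading, the correlation filter removes redundancy while the two index thresholds control how discriminative the surviving genes must be.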
3. The tumor identification method based on deterministic particle swarm optimization and a support vector machine according to claim 1, characterized in that proposing the deterministic particle swarm optimization algorithm IGPSO in Step 2 comprises the following steps:
Step 2.1: randomly initialize the positions x and velocities v of the particle swarm within the initial range, and set the population diversity threshold σ for each function;
Step 2.2: calculate the fitness value of each particle and the gradient of the fitness function with respect to the particle's position;
Step 2.3: for each particle, compare its fitness value with that of the best position it has individually experienced; if it is better, take it as the current best position;
Step 2.4: for each particle, compare its fitness value with that of the best position experienced by the swarm; if it is better, take it as the swarm's best position;
Step 2.5: when the population diversity value is greater than the set threshold, update the velocity of each particle according to formula (2); otherwise update it according to formula (5); then update the position of each particle;
The deterministic particle swarm optimization algorithm is divided into two stages. The first stage is the mutual attraction of the particles and has two steps: first, when the population diversity value is greater than an appropriate threshold, each particle moves along the negative gradient direction of the fitness function with respect to its position and gathers toward the globally best particle; then, when the neighborhood of some optimum has been reached, a gradual-descent strategy is used, continually reducing the particle's velocity to perform a line search. These two steps are described by formula (2) and formula (3), respectively;
$$v_{ij}(t+1) = w \cdot \mathrm{gra}(i, j) + c_2 \cdot \mathrm{rand}() \cdot \bigl(p_g - x_{ij}(t)\bigr) \qquad (2)$$
$$v_{ij}(t+1) = k \cdot v_{ij}(t) \qquad (3)$$
where $V_i = (v_{i1}, v_{i2}, \ldots, v_{in})$ is the current flight velocity of particle i, $X_i = (x_{i1}, x_{i2}, \ldots, x_{in})$ is the current position of particle i, w is the inertia weight, $p_g$ is the global best position, and k is a constant in (0, 1); for the fitness function f(x), the corresponding negative gradient gra(i, j) is as follows:
The second stage is the mutual repulsion of the particles: when the population diversity value is less than the preset threshold, the particles are adaptively repelled to increase population diversity, while each particle searches along the gradient direction and approaches other local optima. The larger the population diversity, the smaller the scattering speed; the smaller the population diversity, the larger the scattering speed. The particle velocity update formula is as follows:
where diversity is the population diversity calculated by formula (6);
where S is the swarm, |S| is the number of particles in the swarm, |L| is the maximum radius of the search space, N is the dimension of the problem, and $p_{ij}$ is the j-th component of the i-th particle;
Step 2.6: if the termination condition has not been reached, go to Step 2.2; otherwise output the fitness value.
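The loop of claim 3 can be sketched as below, with several labeled assumptions. Formulas (4), (5), and (6) are not reproduced in this text, so the sketch assumes a central-difference estimate for the negative gradient, the widely used "mean distance to the swarm centroid divided by |S|·|L|" measure for diversity, and a purely hypothetical repulsion rule that reverses the pull toward the global best and scales it up as diversity shrinks. The damping step of formula (3) is omitted because its trigger is not specified, and the personal best of Step 2.3 is not tracked because formula (2) does not use it. The sketch minimises f; all function and parameter names are illustrative.

```python
import math
import random

def numerical_neg_gradient(f, x, h=1e-6):
    """Central-difference estimate of -grad f at x (assumed form)."""
    g = []
    for j in range(len(x)):
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        g.append(-(f(xp) - f(xm)) / (2 * h))
    return g

def diversity(positions, radius):
    """Mean distance to the swarm centroid, scaled by the search-space radius."""
    n, dim = len(positions), len(positions[0])
    centroid = [sum(p[j] for p in positions) / n for j in range(dim)]
    total = sum(math.sqrt(sum((p[j] - centroid[j]) ** 2 for j in range(dim)))
                for p in positions)
    return total / (n * radius)

def igpso_sketch(f, bounds, n_particles=20, iters=200, w=0.1, c2=1.5,
                 div_threshold=1e-3, eps=1e-3, seed=0):
    """Minimise f over the box `bounds`; returns (best_position, best_value)."""
    rng = random.Random(seed)
    dim = len(bounds)
    radius = math.sqrt(sum((hi - lo) ** 2 for lo, hi in bounds))
    xs = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    start = min(xs, key=f)
    best_x, best_val = list(start), f(start)
    for _ in range(iters):
        div = diversity(xs, radius)
        for x in xs:
            gra = numerical_neg_gradient(f, x)
            for j in range(dim):
                pull = c2 * rng.random() * (best_x[j] - x[j])
                if div > div_threshold:
                    v = w * gra[j] + pull               # formula (2): attraction
                else:
                    v = w * gra[j] - pull / (div + eps)  # hypothetical repulsion
                lo, hi = bounds[j]
                x[j] = min(hi, max(lo, x[j] + v))        # keep particle in the box
            val = f(x)
            if val < best_val:                           # Step 2.4
                best_x, best_val = list(x), val
    return best_x, best_val
```

For example, `igpso_sketch(lambda x: sum(v * v for v in x), [(-5.0, 5.0), (-5.0, 5.0)])` searches a two-dimensional sphere function. When a classification accuracy is the fitness, the objective would be negated or the comparisons reversed.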
4. The tumor identification method based on deterministic particle swarm optimization and a support vector machine according to claim 1, characterized in that using the deterministic particle swarm optimization algorithm to optimize the support vector machine (SVM) in Step 2 comprises the following steps:
Step 3.1: set the search space of the SVM parameters C and σ, $x_{i,\min} \le x_i \le x_{i,\max}$, where C is the penalty factor, σ is the kernel function parameter, $x_i$ is a parameter value, and i indexes the parameters, whose number is set to 2 here; when the algorithm starts, a parameter value x is chosen at random in the search space. The SVM classification rule is given by formula (7):
for the training set $T = \{(x_i, y_i);\ x_i \in R^n;\ y_i = \pm 1;\ i = 1, 2, \ldots, r\}$, where $x_i$ is a training sample, x is the sample to be classified, b is the threshold, $\alpha_i$ is a Lagrange multiplier, and $K(x_i, x)$ is the kernel function;
The optimization problem solved by the support vector machine and the resulting classification decision function are as follows:
where $K(x, x_i)$ is the kernel function, whose role is to map the feature space to a higher-dimensional space, $x_i$ is a training sample, b is the threshold, and $\alpha_i$ is a Lagrange multiplier. In practical applications the number of characteristic genes is small, so an SVM classifier based on the RBF kernel is used to classify the tumor samples; the RBF is expressed as follows:
Step 3.2: set the swarm size to N, the required classification accuracy to F, the expansion factor to Ex, the local window size to $w = [w_1, w_2]$, and the maximum number of retries to $T_{max}$; the retry count t and the expansion factor both start at 0;
Step 3.3: starting from the search space $p = [p_1, p_2]$ initialized when the algorithm starts, expand the search space by the expansion factor Ex and calculate the local position as follows, so that the local window falls within the search space p + 0.6Ex*w;
Step 3.4: calculate the classification performance function $f_p$ corresponding to x;
Step 3.5: find the optimum with the IGPSO algorithm and obtain the classification performance function $f_c$ corresponding to the optimum;
Step 3.6: if a better classification rate is found ($f_p < f_c$), set t and Ex to 0; otherwise set t = t + 1;
Step 3.7: if $t \ge T_{max}$, set t to 0 and Ex = Ex + 1; at this point the search may be trapped in a local optimum, so the search range is enlarged to escape the current local region;
Step 3.8: if the classification accuracy requirement is met ($f_p \le F$), output the values of {C, σ} and the classification accuracy, and the algorithm ends; otherwise go to Step 3.3.
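The control flow of Steps 3.2 to 3.8 can be sketched as below. Several parts are assumptions rather than the claimed method: the classification performance is treated as an accuracy where higher is better, so the loop stops once it reaches the target; the window formula is not reproduced in the text, so a half-width of w·(1 + 0.6·Ex) around the current point is assumed; and plain random sampling stands in for the IGPSO search of Step 3.5. `evaluate` is a hypothetical callback that would train and score an RBF SVM for a given (C, σ).

```python
import random

def tune_svm_params(evaluate, p, w, target, t_max=3, max_rounds=50,
                    n_candidates=20, seed=0):
    """Search {C, sigma} in a window around p that widens when progress stalls.

    evaluate(C, sigma) -> accuracy (higher is better, an assumption).
    p: starting point [C0, sigma0]; w: base window half-widths [w1, w2].
    """
    rng = random.Random(seed)
    x = list(p)
    f_p = evaluate(*x)                        # Step 3.4
    t, ex = 0, 0                              # Step 3.2
    for _ in range(max_rounds):
        if f_p >= target:                     # Step 3.8 (direction assumed)
            break
        # Step 3.3: the window grows with the expansion factor (assumed form).
        half = [wi * (1 + 0.6 * ex) for wi in w]
        # Step 3.5 stand-in: random candidates instead of an IGPSO run.
        cands = [[max(1e-6, x[i] + rng.uniform(-half[i], half[i]))
                  for i in range(2)]
                 for _ in range(n_candidates)]
        best = max(cands, key=lambda c: evaluate(*c))
        f_c = evaluate(*best)
        if f_c > f_p:                         # Step 3.6: improvement found
            x, f_p, t, ex = best, f_c, 0, 0
        else:
            t += 1
        if t >= t_max:                        # Step 3.7: widen the search
            t, ex = 0, ex + 1
    return x, f_p
```

The retry counter keeps the window small while the search is still improving and enlarges it only after `t_max` consecutive failures, which mirrors the escape from a local region described in Step 3.7.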
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710122492.5A CN106971091B (en) | 2017-03-03 | 2017-03-03 | Tumor identification method based on deterministic particle swarm optimization and support vector machine |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710122492.5A CN106971091B (en) | 2017-03-03 | 2017-03-03 | Tumor identification method based on deterministic particle swarm optimization and support vector machine |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106971091A (en) | 2017-07-21 |
CN106971091B (en) | 2020-08-28 |
Family
ID=59328372
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710122492.5A Active CN106971091B (en) | 2017-03-03 | 2017-03-03 | Tumor identification method based on deterministic particle swarm optimization and support vector machine |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106971091B (en) |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107643948A (en) * | 2017-09-30 | 2018-01-30 | 广东欧珀移动通信有限公司 | Application program management-control method, device, medium and electronic equipment |
CN108629158A (en) * | 2018-05-14 | 2018-10-09 | 浙江大学 | A kind of intelligent Lung Cancer cancer cell detector |
CN108875305A (en) * | 2018-05-14 | 2018-11-23 | 浙江大学 | A kind of leukaemia cancer cell detector of colony intelligence optimizing |
CN109947941A (en) * | 2019-03-05 | 2019-06-28 | 永大电梯设备(中国)有限公司 | A kind of method and system based on elevator customer service text classification |
CN110060740A (en) * | 2019-04-16 | 2019-07-26 | 中国科学院深圳先进技术研究院 | A kind of nonredundancy gene set clustering method, system and electronic equipment |
CN111383710A (en) * | 2020-03-13 | 2020-07-07 | 闽江学院 | Gene splice site recognition model construction method based on particle swarm optimization gemini support vector machine |
CN111582370A (en) * | 2020-05-08 | 2020-08-25 | 重庆工贸职业技术学院 | Brain metastasis tumor prognostic index reduction and classification method based on rough set optimization |
CN113707216A (en) * | 2021-08-05 | 2021-11-26 | 北京科技大学 | Infiltration immune cell proportion counting method |
CN113808659A (en) * | 2021-08-26 | 2021-12-17 | 四川大学 | Feedback phase regulation and control method based on gene gradient particle swarm optimization |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103258244A (en) * | 2013-04-28 | 2013-08-21 | 西北师范大学 | Method for predicting inhibiting concentration of pyridazine HCV NS5B polymerase inhibitor based on particle swarm optimization support vector machine |
CN105372202A (en) * | 2015-10-27 | 2016-03-02 | 九江学院 | Genetically modified cotton variety recognition method |
Non-Patent Citations (3)
Title |
---|
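Tong Yan et al.: "An improved SVM training algorithm based on particle swarm optimization", Computer Engineering and Applications * |
Hao Aili: "Research on tumor gene identification under the support vector machine classification model", China Master's Theses Full-text Database, Information Science and Technology * |
Han Fei et al.: "An improved particle swarm optimization algorithm based on gradient search", Journal of Nanjing University (Natural Sciences) * |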
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107643948A (en) * | 2017-09-30 | 2018-01-30 | 广东欧珀移动通信有限公司 | Application program management-control method, device, medium and electronic equipment |
CN107643948B (en) * | 2017-09-30 | 2020-06-02 | Oppo广东移动通信有限公司 | Application program control method, device, medium and electronic equipment |
CN108629158A (en) * | 2018-05-14 | 2018-10-09 | 浙江大学 | A kind of intelligent Lung Cancer cancer cell detector |
CN108875305A (en) * | 2018-05-14 | 2018-11-23 | 浙江大学 | A kind of leukaemia cancer cell detector of colony intelligence optimizing |
CN109947941A (en) * | 2019-03-05 | 2019-06-28 | 永大电梯设备(中国)有限公司 | A kind of method and system based on elevator customer service text classification |
CN110060740A (en) * | 2019-04-16 | 2019-07-26 | 中国科学院深圳先进技术研究院 | A kind of nonredundancy gene set clustering method, system and electronic equipment |
CN111383710A (en) * | 2020-03-13 | 2020-07-07 | 闽江学院 | Gene splice site recognition model construction method based on particle swarm optimization gemini support vector machine |
CN111582370A (en) * | 2020-05-08 | 2020-08-25 | 重庆工贸职业技术学院 | Brain metastasis tumor prognostic index reduction and classification method based on rough set optimization |
CN113707216A (en) * | 2021-08-05 | 2021-11-26 | 北京科技大学 | Infiltration immune cell proportion counting method |
CN113808659A (en) * | 2021-08-26 | 2021-12-17 | 四川大学 | Feedback phase regulation and control method based on gene gradient particle swarm optimization |
CN113808659B (en) * | 2021-08-26 | 2023-06-13 | 四川大学 | Feedback phase regulation and control method based on gene gradient particle swarm algorithm |
Also Published As
Publication number | Publication date |
---|---|
CN106971091B (en) | 2020-08-28 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106971091A (en) | A kind of tumour recognition methods based on certainty particle group optimizing and SVMs | |
Ghareb et al. | Hybrid feature selection based on enhanced genetic algorithm for text categorization | |
Song et al. | Feature selection using bare-bones particle swarm optimization with mutual information | |
CN105426426B (en) | A kind of KNN file classification methods based on improved K-Medoids | |
Li et al. | An ant colony optimization based dimension reduction method for high-dimensional datasets | |
Huang et al. | Using glowworm swarm optimization algorithm for clustering analysis | |
CN108363810A (en) | A kind of file classification method and device | |
Alomari et al. | A hybrid filter-wrapper gene selection method for cancer classification | |
CN106778853A (en) | Unbalanced data sorting technique based on weight cluster and sub- sampling | |
CN104750844A (en) | Method and device for generating text characteristic vectors based on TF-IGM, method and device for classifying texts | |
CN111062425B (en) | Unbalanced data set processing method based on C-K-SMOTE algorithm | |
CN110909158B (en) | Text classification method based on improved firefly algorithm and K nearest neighbor | |
CN105045913B (en) | File classification method based on WordNet and latent semantic analysis | |
CN106548041A (en) | A kind of tumour key gene recognition methods based on prior information and parallel binary particle swarm optimization | |
Arowolo et al. | A survey of dimension reduction and classification methods for RNA-Seq data on malaria vector | |
CN108171012A (en) | A kind of gene sorting method and device | |
CN113436684A (en) | Cancer classification and characteristic gene selection method | |
CN106951728B (en) | Tumor key gene identification method based on particle swarm optimization and scoring criterion | |
Das et al. | Group incremental adaptive clustering based on neural network and rough set theory for crime report categorization | |
Chen et al. | A new particle swarm feature selection method for classification | |
CN105512675A (en) | Memory multi-point crossover gravitational search-based feature selection method | |
Wang et al. | Bayesian penalized method for streaming feature selection | |
CN115098690B (en) | Multi-data document classification method and system based on cluster analysis | |
Afif et al. | Genetic algorithm rule based categorization method for textual data mining | |
Sahu | Multi filter ensemble method for cancer prognosis and Diagnosis |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||