CN108664763A - Lung cancer cell detector with optimal parameters - Google Patents
- Publication number
- CN108664763A (application number CN201810458000.4A)
- Authority
- CN
- China
- Prior art keywords
- gene
- value
- algorithm
- representing
- parameter
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
Landscapes
- Bioinformatics & Cheminformatics (AREA)
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Physics & Mathematics (AREA)
- Engineering & Computer Science (AREA)
- Genetics & Genomics (AREA)
- Biotechnology (AREA)
- Biophysics (AREA)
- Chemical & Material Sciences (AREA)
- Molecular Biology (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Bioinformatics & Computational Biology (AREA)
- Analytical Chemistry (AREA)
- Evolutionary Biology (AREA)
- General Health & Medical Sciences (AREA)
- Medical Informatics (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Theoretical Computer Science (AREA)
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
Abstract
The invention discloses a lung cancer cell detector with optimal parameters. The system consists of a gene microarray reading module, a data preprocessing and feature sorting module, a parameter optimization module and a model output module. The system first preprocesses the input gene microarray data and then ranks the importance of each remaining gene: correlation is computed from a statistical score, the contribution degree is computed with the classifier criterion function, and all genes are sorted by importance. The improved optimization method adds fitness monitoring and population perturbation to the original intelligent optimization algorithm, which prevents the loss of population diversity and keeps the optimization process from becoming trapped in a local optimum. The optimal parameters found are then used as the classifier parameters to complete model construction and output the result. The system is fast and suitable for online detection.
Description
Technical Field
The invention relates to the technical field of gene microarray data application, in particular to a lung cancer cell detector with optimal parameters.
Background
The 21st century is the century of the life sciences, and the data on a DNA microarray (also called a gene microarray) has great research value. In basic medicine, a microarray can rapidly measure the expression values of a very large number of genes, compare expression differences between typical samples, and support the discovery of disease-causing genes, gene detection and related research. Clinically, changes in the intracellular gene expression profile already appear at the early stage of a tumor, so microarray data research enables early detection and early treatment and can guide clinical practice. Lung cancer is one of the fastest-growing malignancies threatening human health and life. Many countries have reported that the incidence and mortality of lung cancer have risen markedly over the last 50 years: in men, lung cancer ranks first among all malignancies in both incidence and mortality, and in women it ranks second in both. The etiology of lung cancer is still not completely clear. According to a 1985 United States estimate, 80% of lung cancer in men and 79% of lung cancer in women is attributable to smoking. Nicotine, benzopyrene, nitrosamines and the small amount of the radioactive element polonium in tobacco smoke are carcinogenic and are especially likely to cause squamous cell carcinoma and small cell carcinoma. Lung tumors have many risk factors, and the medical community regards them as a multifactorial disease: low immunity, endocrine disorders, depressed mood and family inheritance may all contribute. Studies show that the incidence and mortality of lung cancer have increased year by year in recent years, so finding the pathogenic genes of lung cancer is significant work.
Gene microarray data are typically high-dimensional with few samples: the number of genes observed in each sample is usually in the thousands or even tens of thousands, whereas only a few dozen samples are observed in one experiment. In pattern recognition, too large a dimension leads to the curse of dimensionality: on the one hand the running time of an algorithm grows exponentially with the dimension, and on the other hand traditional algorithms that rely on probability density estimation cannot be applied when the number of samples is too small. The most urgent need is therefore a feasible method that reduces the feature dimension and selects the optimal feature gene subset so that the data are as separable as possible in the feature subspace. Given the optimal feature subset, how to choose the classifier parameters while avoiding the inefficiency and randomness of manual tuning is also a major research focus.
Disclosure of Invention
In order to overcome the defects that the optimal characteristic subset of gene microarray data and the optimal parameters for classification are difficult to search at present, the invention aims to provide a lung cancer cell detector with optimal parameters.
The technical scheme adopted by the invention for solving the technical problems is as follows: the lung cancer cell detector with optimal parameters consists of a gene microarray reading module, a data preprocessing and feature sorting module, a parameter optimizing module and a model output module; wherein:
The gene microarray reading module reads in the class labels of all gene microarray samples, $Y = [y_1, y_2, \ldots, y_m]$, where $y_i = k$, $k \in \{-1, 1\}$, together with the gene microarray expression values of all samples:

$$X = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1n} \\ x_{21} & x_{22} & \cdots & x_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ x_{m1} & x_{m2} & \cdots & x_{mn} \end{bmatrix}$$

where each row $x_i$ holds the expression values of all genes in one sample and each column $x_j$ holds the expression values of one gene in all samples; the index $i$ denotes the $i$-th sample ($m$ in total) and the index $j$ denotes the $j$-th gene ($n$ in total).
The data preprocessing and feature sorting module normalizes the raw microarray data that were read in and ranks the features. The normalization operation is:

$$x'_{ij} = \frac{x_{ij} - \mathrm{Min}}{\mathrm{Max} - \mathrm{Min}}$$

where Min and Max are respectively the minimum and maximum gene expression values of the sample. Feature ranking is achieved by scoring the contribution of each gene to classification accuracy, which is done by defining a contribution function:

$$J(\alpha) = \frac{1}{2}\alpha^{T} H \alpha - \alpha^{T}\mathbf{1}$$

where $\alpha = [\alpha_1, \ldots, \alpha_n]$ and $H_{ij} = y_i y_j K(x_i, x_j)$. The quadratic term in fact represents the squared size of the classification boundary, so that:

$$\alpha^{T} H \alpha = \lVert w \rVert^{2}$$

with the definition

$$w^{*} = \sum_{k} \alpha_k^{*}\, y_k\, x_k$$

where $w$ is the normal vector of the classification hyperplane, $w^{*}$ is the optimal normal vector, $\alpha$ is the coefficient vector corresponding to the normal vector, and $\alpha^{*}$ is the coefficient vector corresponding to the optimal normal vector. From the above, the importance of each feature is determined by its contribution to the cost function, that is, the contribution value of each feature is:

$$\delta_i = \left(w_i^{*}\right)^{2}$$

where $\delta$ denotes the degree of contribution.
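The normalization and the linear-kernel contribution score described above can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: the function names are hypothetical, the coefficients `alpha` are assumed to come from an already trained classifier, and the score is assumed to be the squared component of the normal vector $w = \sum_k \alpha_k y_k x_k$.

```python
from typing import List, Sequence


def min_max_normalize(sample: Sequence[float]) -> List[float]:
    """Scale one sample's expression values to [0, 1] using its own Min and Max."""
    lo, hi = min(sample), max(sample)
    if hi == lo:  # constant sample: avoid division by zero (assumed convention)
        return [0.0 for _ in sample]
    return [(v - lo) / (hi - lo) for v in sample]


def linear_contributions(X: Sequence[Sequence[float]],
                         y: Sequence[int],
                         alpha: Sequence[float]) -> List[float]:
    """Return delta_j = w_j**2 with w = sum_k alpha_k * y_k * x_k (linear kernel)."""
    n_genes = len(X[0])
    w = [sum(a * yk * xk[j] for a, yk, xk in zip(alpha, y, X))
         for j in range(n_genes)]
    return [wj * wj for wj in w]


def rank_genes(delta: Sequence[float]) -> List[int]:
    """Gene indices sorted from the largest contribution to the smallest."""
    return sorted(range(len(delta)), key=lambda j: delta[j], reverse=True)


if __name__ == "__main__":
    raw = [[2.0, 4.0, 6.0], [1.0, 1.0, 3.0]]   # 2 samples x 3 genes (toy data)
    X = [min_max_normalize(s) for s in raw]
    labels = [1, -1]
    alpha = [0.5, 0.5]                          # assumed classifier coefficients
    print(rank_genes(linear_contributions(X, labels, alpha)))
```

In practice the ranking would be used to keep only the top-scoring genes before the classifier parameters are optimized.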
When a non-linear kernel is used as the kernel function, the contribution can generally be approximated as:

$$\delta_i = \frac{1}{2}\alpha^{T} H \alpha - \frac{1}{2}\alpha^{T} H(-i)\,\alpha$$

Here it is reasonable to assume that the value of $\alpha$ is unchanged after a feature is eliminated; $H(-i)$ denotes the $H$ matrix computed after the $i$-th feature is eliminated.

The classifier is an extreme learning machine (ELM) trained on $N$ samples $(x_i, t_i)$, where $x_i$ is an $n \times 1$ input feature vector and $t_i$ is an $m \times 1$ target vector. Given an activation function $g(x)$ and the number of hidden-layer nodes $\tilde{N}$, the ELM gene detection system is:

$$\sum_{i=1}^{\tilde{N}} \beta_i\, g(\omega_i \cdot x_j + b_i) = o_j, \quad j = 1, 2, \ldots, N$$

where $\omega_i$ is the weight vector between the $i$-th hidden-layer node and the input layer, $b_i$ is the bias of the $i$-th hidden-layer node, $\beta_i$ is the weight vector between the $i$-th hidden-layer node and the output layer, and $o_j$ is the network output for the $j$-th input. In addition, $\omega_i \cdot x_j$ denotes the inner product of $\omega_i$ and $x_j$.
The network output can approximate the $N$ input samples with zero error, i.e.:

$$\sum_{j=1}^{N} \lVert o_j - t_j \rVert = 0$$

from which:

$$\sum_{i=1}^{\tilde{N}} \beta_i\, g(\omega_i \cdot x_j + b_i) = t_j, \quad j = 1, 2, \ldots, N$$

In matrix form this is

$$H\beta = T$$

where $H$ is the output matrix of the hidden layer and the $i$-th column of $H$ is the output of the $i$-th hidden-layer node for the $N$ inputs $x_1, x_2, \ldots, x_N$. The input weights of a single-hidden-layer feedforward neural network (SLFN) and the hidden-layer biases need not be adjusted during training and can be assigned arbitrarily, which gives:

$$\lVert H\hat{\beta} - T \rVert = \min_{\beta} \lVert H\beta - T \rVert$$

The solution of this equation can be obtained quickly by a linear method:

$$\hat{\beta} = H^{\dagger} T$$

where $H^{\dagger}$ is the Moore-Penrose generalized inverse of $H$ and $\hat{\beta}$ is the minimum-norm least-squares solution, i.e. the least-squares solution with the smallest norm. Compared with many existing gene detection systems, the extreme learning machine achieves a good training result very quickly by solving with the Moore-Penrose generalized inverse.
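The ELM training step described above, fixed hidden-layer weights followed by a single linear solve for the output weights, can be sketched as follows. This is a hedged illustration using only the Python standard library. All function names are hypothetical, a sigmoid is assumed for $g$, targets are assumed to be scalar labels, and the output weights are obtained from the normal equations $(H^{T}H)\beta = H^{T}T$, which coincides with the Moore-Penrose solution only when $H$ has full column rank.

```python
import math
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def random_hidden_layer(n_inputs: int, n_hidden: int,
                        rng: random.Random) -> Tuple[Matrix, List[float]]:
    """Arbitrarily assigned input weights and biases (never trained)."""
    W = [[rng.uniform(-1.0, 1.0) for _ in range(n_inputs)] for _ in range(n_hidden)]
    b = [rng.uniform(-1.0, 1.0) for _ in range(n_hidden)]
    return W, b


def hidden_output(X: Matrix, W: Matrix, b: Sequence[float]) -> Matrix:
    """H[j][i] = g(w_i . x_j + b_i): one row per sample, one column per node."""
    return [[sigmoid(sum(wk * xk for wk, xk in zip(w, x)) + bi)
             for w, bi in zip(W, b)] for x in X]


def solve(A: Matrix, y: Sequence[float]) -> List[float]:
    """Solve A z = y by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [y[r]] for r, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][k] * z[k] for k in range(r + 1, n))) / M[r][r]
    return z


def train_elm(X: Matrix, t: Sequence[float], W: Matrix, b: Sequence[float]) -> List[float]:
    """Least-squares output weights beta from the normal equations."""
    H = hidden_output(X, W, b)
    n_hidden = len(W)
    HtH = [[sum(row[i] * row[k] for row in H) for k in range(n_hidden)]
           for i in range(n_hidden)]
    Htt = [sum(row[i] * tj for row, tj in zip(H, t)) for i in range(n_hidden)]
    return solve(HtH, Htt)


def predict(X: Matrix, W: Matrix, b: Sequence[float], beta: Sequence[float]) -> List[int]:
    """Label +1 or -1 from the sign of the network output."""
    return [1 if sum(h * bt for h, bt in zip(row, beta)) >= 0 else -1
            for row in hidden_output(X, W, b)]


if __name__ == "__main__":
    X = [[0.0], [1.0], [2.0], [3.0]]      # toy one-gene data set
    t = [-1.0, -1.0, 1.0, 1.0]
    W, b = [[4.0], [1.0]], [-6.0, 0.0]    # hand-picked weights for a repeatable demo
    beta = train_elm(X, t, W, b)
    print(predict(X, W, b, beta))
```

For real microarray data, `random_hidden_layer` would supply the weights, and a numerical library with a true pseudo-inverse would be the more robust choice when $H$ is rank deficient.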
The parameter optimization module uses an improved parameter optimization algorithm that increases population diversity. It is designed as follows:
1) Initialize the population of the differential evolution (DE) algorithm. The individuals are generated at random within the search bounds:

$$\{x_i(0) \mid x_{j,i}^{L} \le x_{j,i}(0) \le x_{j,i}^{U},\ i = 1, 2, \ldots, NP\}$$

$$x_{j,i}(0) = x_{j,i}^{L} + \mathrm{rand}(0,1)\cdot\left(x_{j,i}^{U} - x_{j,i}^{L}\right)$$

where $x_i(0)$ is the $i$-th individual (chromosome) of the initial generation, $x_{j,i}(0)$ is the $j$-th gene of the $i$-th individual of the initial generation, $\mathrm{rand}(0,1)$ is a uniform random number in the interval $(0,1)$, $NP$ is the population size, and the superscripts $L$ and $U$ denote the lower and upper bounds respectively.
2) Mutation: the DE algorithm differs from the genetic algorithm (GA) in that it mutates with a differential strategy: the difference between two randomly chosen individuals is scaled and added as a vector to the individual to be mutated, i.e.

$$v_i(g+1) = x_{r1}(g) + F\cdot\left(x_{r2}(g) - x_{r3}(g)\right), \quad i \ne r1 \ne r2 \ne r3$$

where $g$ denotes the $g$-th generation, $F$ is the scaling factor applied to the difference of two random vectors, $v_i(g+1)$ is the mutation intermediate variable, and $x_{r1}(g)$, $x_{r2}(g)$, $x_{r3}(g)$ are three distinct, randomly selected individuals of the $g$-th generation.
3) Crossover: the $g$-th generation population $x_i(g)$ and the intermediate variable $v_i(g+1)$ produced in step 2) are crossed to give

$$u_{j,i}(g+1) = \begin{cases} v_{j,i}(g+1), & \mathrm{rand}(0,1) \le CR \\ x_{j,i}(g), & \text{otherwise} \end{cases}$$

where $CR$ is the preset crossover rate and $u_{j,i}(g+1)$ is the crossover intermediate variable.
4) Selection: the DE algorithm uses the usual greedy rule. The individual produced by crossover is kept for the next generation if its fitness $f(u_i(g+1))$ is greater than the fitness $f(x_i(g))$ of the previous generation; otherwise the individual is unchanged, i.e.

$$x_i(g+1) = \begin{cases} u_i(g+1), & f(u_i(g+1)) > f(x_i(g)) \\ x_i(g), & \text{otherwise} \end{cases}$$

To avoid premature convergence, an adaptive operator $\lambda$ is designed:

$$\lambda = e^{\,1 - \frac{G_{max}}{G_{max} + 1 - G}}, \qquad F = F_0 \cdot 2^{\lambda}$$

where $G_{max}$ is the maximum number of iterations, $G$ is the current iteration number and $F_0$ is the mutation operator. The value of $F$ is large in the initial stage, which ensures sample diversity, and gradually decreases later, which protects the good information gathered during evolution. In the DE algorithm, if the fitness fails to exceed the historical best after a certain number of iterations, the search is considered trapped in a local optimum, and a swarm intelligence algorithm is then used to escape from it:
5) Initialize the ant colony algorithm with the current position information; the number of ants is $m$ and the initial pheromone concentration is $\tau_{ij} = c$ ($c > 0$).
6) Simulate all ants $1, 2, \ldots, m$ moving toward the end point, where each ant $k$ moves from its current position $i$ to the next position $j$ with transition probability $p_{ij}^{k}$.
7) When one iteration is finished, that is, when all ants have completed their paths, update the current pheromone concentration:

$$\tau_{ij} \leftarrow (1 - \rho)\,\tau_{ij} + \sum_{k=1}^{m} \Delta\tau_{ij}^{k}$$

where $\rho$ is the pheromone evaporation coefficient and $\Delta\tau_{ij}^{k}$ is the pheromone concentration left by ant $k$ on path $ij$. Since the pheromone concentration is inversely proportional to the path length, it can be defined as:

$$\Delta\tau_{ij}^{k} = \frac{C}{L}$$

where $C$ is a proportionality constant and $L$ is the path length.
8) After a new candidate solution is obtained, it is compared with the historical best, and the historical best is updated if the candidate is better.
9) The above process is run iteratively until the maximum number of generations is reached. The historical best parameters are then passed to the model output module as the final result of the parameter optimization.
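Steps 1) to 4) together with the adaptive scaling factor can be sketched as a short Python routine. This is an illustrative sketch under stated assumptions, not the patented implementation: the function names are hypothetical, the adaptive rule is taken as $\lambda = e^{1 - G_{max}/(G_{max}+1-G)}$ with $F = F_0 \cdot 2^{\lambda}$, mutants are clamped to the bounds, and the ant colony escape of steps 5) to 8) is represented only by an optional `on_stall` callback that the caller must supply.

```python
import math
import random
from typing import Callable, List, Optional, Sequence, Tuple

Vector = List[float]


def adaptive_F(F0: float, G: int, G_max: int) -> float:
    """Scaling factor that starts near 2*F0 and decays toward F0 (assumed form)."""
    lam = math.exp(1.0 - G_max / (G_max + 1.0 - G))
    return F0 * 2.0 ** lam


def differential_evolution(
    fitness: Callable[[Vector], float],
    bounds: Sequence[Tuple[float, float]],
    NP: int = 20, F0: float = 0.5, CR: float = 0.9, G_max: int = 100,
    stall_limit: int = 15,
    on_stall: Optional[Callable[[List[Vector], random.Random], List[Vector]]] = None,
    seed: int = 0,
) -> Tuple[Vector, float]:
    """Maximize `fitness`; call `on_stall` when the best value stops improving."""
    rng = random.Random(seed)
    D = len(bounds)
    # 1) random initialization inside the bounds
    pop = [[lo + rng.random() * (hi - lo) for lo, hi in bounds] for _ in range(NP)]
    fit = [fitness(x) for x in pop]
    best_fit = max(fit)
    best = pop[fit.index(best_fit)][:]
    stall = 0
    for G in range(1, G_max + 1):
        F = adaptive_F(F0, G, G_max)
        nxt, nxt_fit = [], []
        for i in range(NP):
            # 2) mutation from three distinct individuals other than i
            r1, r2, r3 = rng.sample([k for k in range(NP) if k != i], 3)
            v = [pop[r1][j] + F * (pop[r2][j] - pop[r3][j]) for j in range(D)]
            v = [min(max(vj, lo), hi) for vj, (lo, hi) in zip(v, bounds)]  # clamp (assumption)
            # 3) crossover, gene by gene
            u = [v[j] if rng.random() <= CR else pop[i][j] for j in range(D)]
            # 4) greedy selection
            fu = fitness(u)
            keep = fu > fit[i]
            nxt.append(u if keep else pop[i])
            nxt_fit.append(fu if keep else fit[i])
        pop, fit = nxt, nxt_fit
        if max(fit) > best_fit:
            best_fit = max(fit)
            best = pop[fit.index(best_fit)][:]
            stall = 0
        else:
            stall += 1
            if on_stall is not None and stall >= stall_limit:
                pop = on_stall(pop, rng)       # hypothetical escape hook
                fit = [fitness(x) for x in pop]
                stall = 0
    return best, best_fit


if __name__ == "__main__":
    # Toy fitness with its maximum (0.0) at the point (1, -2)
    f = lambda x: -((x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2)
    print(differential_evolution(f, [(-5.0, 5.0), (-5.0, 5.0)]))
```

In the detector, `fitness` would be the classification accuracy obtained with a candidate set of classifier parameters, and `on_stall` would rebuild the population from the candidates found by the ant colony search.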
The model output module uses the model obtained by the above process: patient data are input directly, and the detection result is read from the predicted label value.
The invention has the following beneficial effects: during intelligent optimization, monitoring variables are set to increase population diversity, which raises the probability of finding the optimal parameters; the system is fast and suitable for online detection.
Drawings
FIG. 1 is a schematic structural view of the present invention;
fig. 2 is a flow chart of the present invention.
Detailed Description
The present invention will be described in detail below with reference to the accompanying drawings.
Referring to fig. 1, a lung cancer cell detector with optimal parameters comprises a gene microarray reading module 1, a data preprocessing and feature sorting module 2, a parameter optimizing module 3 and a model output module 4; wherein:
The gene microarray reading module 1 reads in the class labels of all gene microarray samples, $Y = [y_1, y_2, \ldots, y_m]$, where $y_i = k$, $k \in \{-1, 1\}$, together with the gene microarray expression values of all samples:

$$X = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1n} \\ x_{21} & x_{22} & \cdots & x_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ x_{m1} & x_{m2} & \cdots & x_{mn} \end{bmatrix}$$

where each row $x_i$ holds the expression values of all genes in one sample and each column $x_j$ holds the expression values of one gene in all samples; the index $i$ denotes the $i$-th sample ($m$ in total) and the index $j$ denotes the $j$-th gene ($n$ in total).
The data preprocessing and feature sorting module 2 normalizes the raw microarray data that were read in and ranks the features. The normalization operation is:

$$x'_{ij} = \frac{x_{ij} - \mathrm{Min}}{\mathrm{Max} - \mathrm{Min}}$$

where Min and Max are respectively the minimum and maximum gene expression values of the sample. Feature ranking is achieved by scoring the contribution of each gene to classification accuracy, which is done by defining a contribution function:

$$J(\alpha) = \frac{1}{2}\alpha^{T} H \alpha - \alpha^{T}\mathbf{1}$$

where $\alpha = [\alpha_1, \ldots, \alpha_n]$ and $H_{ij} = y_i y_j K(x_i, x_j)$. The quadratic term in fact represents the squared size of the classification boundary, so that:

$$\alpha^{T} H \alpha = \lVert w \rVert^{2}$$

with the definition

$$w^{*} = \sum_{k} \alpha_k^{*}\, y_k\, x_k$$

where $w$ is the normal vector of the classification hyperplane, $w^{*}$ is the optimal normal vector, $\alpha$ is the coefficient vector corresponding to the normal vector, and $\alpha^{*}$ is the coefficient vector corresponding to the optimal normal vector. From the above, the importance of each feature is determined by its contribution to the cost function, that is, the contribution value of each feature is:

$$\delta_i = \left(w_i^{*}\right)^{2}$$

where $\delta$ denotes the degree of contribution.
When a non-linear kernel is used as the kernel function, the contribution can generally be approximated as:

$$\delta_i = \frac{1}{2}\alpha^{T} H \alpha - \frac{1}{2}\alpha^{T} H(-i)\,\alpha$$

Here it is reasonable to assume that the value of $\alpha$ is unchanged after a feature is eliminated; $H(-i)$ denotes the $H$ matrix computed after the $i$-th feature is eliminated.

The classifier is an extreme learning machine (ELM) trained on $N$ samples $(x_i, t_i)$, where $x_i$ is an $n \times 1$ input feature vector and $t_i$ is an $m \times 1$ target vector. Given an activation function $g(x)$ and the number of hidden-layer nodes $\tilde{N}$, the ELM gene detection system is:

$$\sum_{i=1}^{\tilde{N}} \beta_i\, g(\omega_i \cdot x_j + b_i) = o_j, \quad j = 1, 2, \ldots, N$$

where $\omega_i$ is the weight vector between the $i$-th hidden-layer node and the input layer, $b_i$ is the bias of the $i$-th hidden-layer node, $\beta_i$ is the weight vector between the $i$-th hidden-layer node and the output layer, and $o_j$ is the network output for the $j$-th input. In addition, $\omega_i \cdot x_j$ denotes the inner product of $\omega_i$ and $x_j$.
The network output can approximate the $N$ input samples with zero error, i.e.:

$$\sum_{j=1}^{N} \lVert o_j - t_j \rVert = 0$$

from which:

$$\sum_{i=1}^{\tilde{N}} \beta_i\, g(\omega_i \cdot x_j + b_i) = t_j, \quad j = 1, 2, \ldots, N$$

In matrix form this is

$$H\beta = T$$

where $H$ is the output matrix of the hidden layer and the $i$-th column of $H$ is the output of the $i$-th hidden-layer node for the $N$ inputs $x_1, x_2, \ldots, x_N$. The input weights of a single-hidden-layer feedforward neural network (SLFN) and the hidden-layer biases need not be adjusted during training and can be assigned arbitrarily, which gives:

$$\lVert H\hat{\beta} - T \rVert = \min_{\beta} \lVert H\beta - T \rVert$$

The solution of this equation can be obtained quickly by a linear method:

$$\hat{\beta} = H^{\dagger} T$$

where $H^{\dagger}$ is the Moore-Penrose generalized inverse of $H$ and $\hat{\beta}$ is the minimum-norm least-squares solution, i.e. the least-squares solution with the smallest norm. Compared with many existing gene detection systems, the extreme learning machine achieves a good training result very quickly by solving with the Moore-Penrose generalized inverse.
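The kernel-based contribution score mentioned above for the feature sorting module, where $\alpha$ is held fixed and the $H$ matrix is recomputed with one feature removed, can be sketched as follows. This is a hedged illustration: the function names are hypothetical, an RBF kernel is assumed because the text does not name one, and the score is assumed to be $\delta_i = \frac{1}{2}\alpha^{T}H\alpha - \frac{1}{2}\alpha^{T}H(-i)\alpha$.

```python
import math
from typing import Callable, List, Sequence

Kernel = Callable[[Sequence[float], Sequence[float]], float]


def rbf_kernel(a: Sequence[float], b: Sequence[float], gamma: float = 1.0) -> float:
    """K(a, b) = exp(-gamma * ||a - b||^2); the kernel choice is an assumption."""
    return math.exp(-gamma * sum((u - v) ** 2 for u, v in zip(a, b)))


def quad_form(X: Sequence[Sequence[float]], y: Sequence[int],
              alpha: Sequence[float], kernel: Kernel) -> float:
    """alpha^T H alpha with H_pq = y_p * y_q * K(x_p, x_q)."""
    m = len(X)
    return sum(alpha[p] * alpha[q] * y[p] * y[q] * kernel(X[p], X[q])
               for p in range(m) for q in range(m))


def kernel_contributions(X: Sequence[Sequence[float]], y: Sequence[int],
                         alpha: Sequence[float],
                         kernel: Kernel = rbf_kernel) -> List[float]:
    """delta_i for every feature i, keeping alpha fixed when a feature is removed."""
    full = quad_form(X, y, alpha, kernel)
    deltas = []
    for i in range(len(X[0])):
        X_minus_i = [list(row[:i]) + list(row[i + 1:]) for row in X]  # drop feature i
        deltas.append(0.5 * full - 0.5 * quad_form(X_minus_i, y, alpha, kernel))
    return deltas


if __name__ == "__main__":
    # Toy data: gene 0 separates the two samples, gene 1 is constant
    X = [[0.0, 0.3], [1.0, 0.3]]
    y = [1, -1]
    alpha = [1.0, 1.0]   # assumed coefficients of a trained classifier
    print(kernel_contributions(X, y, alpha))
```

A constant gene changes no kernel value when it is removed, so its score is zero, while a gene that separates the classes receives a positive score.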
The parameter optimizing module 3 uses an improved parameter optimization algorithm that increases population diversity. It is designed as follows:
1) Initialize the population of the differential evolution (DE) algorithm. The individuals are generated at random within the search bounds:

$$\{x_i(0) \mid x_{j,i}^{L} \le x_{j,i}(0) \le x_{j,i}^{U},\ i = 1, 2, \ldots, NP\}$$

$$x_{j,i}(0) = x_{j,i}^{L} + \mathrm{rand}(0,1)\cdot\left(x_{j,i}^{U} - x_{j,i}^{L}\right)$$

where $x_i(0)$ is the $i$-th individual (chromosome) of the initial generation, $x_{j,i}(0)$ is the $j$-th gene of the $i$-th individual of the initial generation, $\mathrm{rand}(0,1)$ is a uniform random number in the interval $(0,1)$, $NP$ is the population size, and the superscripts $L$ and $U$ denote the lower and upper bounds respectively.
2) Mutation: the DE algorithm differs from the genetic algorithm (GA) in that it mutates with a differential strategy: the difference between two randomly chosen individuals is scaled and added as a vector to the individual to be mutated, i.e.

$$v_i(g+1) = x_{r1}(g) + F\cdot\left(x_{r2}(g) - x_{r3}(g)\right), \quad i \ne r1 \ne r2 \ne r3$$

where $g$ denotes the $g$-th generation, $F$ is the scaling factor applied to the difference of two random vectors, $v_i(g+1)$ is the mutation intermediate variable, and $x_{r1}(g)$, $x_{r2}(g)$, $x_{r3}(g)$ are three distinct, randomly selected individuals of the $g$-th generation.
3) Crossover: the $g$-th generation population $x_i(g)$ and the intermediate variable $v_i(g+1)$ produced in step 2) are crossed to give

$$u_{j,i}(g+1) = \begin{cases} v_{j,i}(g+1), & \mathrm{rand}(0,1) \le CR \\ x_{j,i}(g), & \text{otherwise} \end{cases}$$

where $CR$ is the preset crossover rate and $u_{j,i}(g+1)$ is the crossover intermediate variable.
4) Selection: the DE algorithm uses the usual greedy rule. The individual produced by crossover is kept for the next generation if its fitness $f(u_i(g+1))$ is greater than the fitness $f(x_i(g))$ of the previous generation; otherwise the individual is unchanged, i.e.

$$x_i(g+1) = \begin{cases} u_i(g+1), & f(u_i(g+1)) > f(x_i(g)) \\ x_i(g), & \text{otherwise} \end{cases}$$

To avoid premature convergence, an adaptive operator $\lambda$ is designed:

$$\lambda = e^{\,1 - \frac{G_{max}}{G_{max} + 1 - G}}, \qquad F = F_0 \cdot 2^{\lambda}$$

where $G_{max}$ is the maximum number of iterations, $G$ is the current iteration number and $F_0$ is the mutation operator. The value of $F$ is large in the initial stage, which ensures sample diversity, and gradually decreases later, which protects the good information gathered during evolution. In the DE algorithm, if the fitness fails to exceed the historical best after a certain number of iterations, the search is considered trapped in a local optimum, and a swarm intelligence algorithm is then used to escape from it:
5) Initialize the ant colony algorithm with the current position information; the number of ants is $m$ and the initial pheromone concentration is $\tau_{ij} = c$ ($c > 0$).
6) Simulate all ants $1, 2, \ldots, m$ moving toward the end point, where each ant $k$ moves from its current position $i$ to the next position $j$ with transition probability $p_{ij}^{k}$.
7) When one iteration is finished, that is, when all ants have completed their paths, update the current pheromone concentration:

$$\tau_{ij} \leftarrow (1 - \rho)\,\tau_{ij} + \sum_{k=1}^{m} \Delta\tau_{ij}^{k}$$

where $\rho$ is the pheromone evaporation coefficient and $\Delta\tau_{ij}^{k}$ is the pheromone concentration left by ant $k$ on path $ij$. Since the pheromone concentration is inversely proportional to the path length, it can be defined as:

$$\Delta\tau_{ij}^{k} = \frac{C}{L}$$

where $C$ is a proportionality constant and $L$ is the path length.
8) After a new candidate solution is obtained, it is compared with the historical best, and the historical best is updated if the candidate is better.
9) The above process is run iteratively until the maximum number of generations is reached. The historical best parameters are then passed to the model output module 4 as the final result of the parameter optimization.
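The pheromone bookkeeping of steps 5) to 7) can be illustrated with a small Python sketch. It is not the patented implementation: the function names are hypothetical, the evaporation update is assumed to be $\tau_{ij} \leftarrow (1-\rho)\tau_{ij} + \sum_k C/L_k$, and because the transition-probability formula is not reproduced in the text, a plain pheromone-proportional rule is assumed for it.

```python
from typing import Dict, List, Sequence, Tuple

Edge = Tuple[int, int]


def init_pheromone(edges: Sequence[Edge], c: float = 1.0) -> Dict[Edge, float]:
    """Step 5: every edge starts with the same concentration c > 0."""
    return {e: c for e in edges}


def transition_probabilities(tau: Dict[Edge, float], i: int,
                             candidates: Sequence[int]) -> Dict[int, float]:
    """Step 6 (assumed rule): probability of moving i -> j proportional to tau_ij."""
    total = sum(tau[(i, j)] for j in candidates)
    return {j: tau[(i, j)] / total for j in candidates}


def update_pheromone(tau: Dict[Edge, float],
                     ant_paths: Sequence[Sequence[int]],
                     path_lengths: Sequence[float],
                     rho: float = 0.1, C: float = 1.0) -> Dict[Edge, float]:
    """Step 7: evaporate, then let each ant deposit C / L on the edges it used."""
    new_tau = {e: (1.0 - rho) * t for e, t in tau.items()}
    for path, L in zip(ant_paths, path_lengths):
        for edge in zip(path, path[1:]):
            new_tau[edge] += C / L
    return new_tau


if __name__ == "__main__":
    tau = init_pheromone([(0, 1), (0, 2), (1, 3), (2, 3)])
    paths: List[List[int]] = [[0, 1, 3], [0, 2, 3]]   # two ants, two routes
    lengths = [2.0, 4.0]                              # the first route is shorter
    tau = update_pheromone(tau, paths, lengths, rho=0.5, C=1.0)
    print(transition_probabilities(tau, 0, [1, 2]))
```

After one update the shorter route carries more pheromone, so the next ant leaving position 0 is more likely to choose it.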
The parameters output by the parameter optimizing module 3 enter the model output module 4 and are used as the parameters of the model. The model output module 4 then analyzes the gene microarray data of actual lung cancer patients that are input subsequently.
The above-described embodiments are intended to illustrate rather than to limit the invention, and any modifications and variations of the present invention are within the spirit of the invention and the scope of the appended claims.
Claims (5)
1. An optimal parameter lung cancer cell detector is characterized in that: the system consists of a gene microarray reading module, a data preprocessing and feature sorting module, a parameter optimizing module and a model output module.
2. The parameter optimized lung cancer cell detector of claim 1, wherein: the gene microarray reading module reads in the class labels of all gene microarray samples, $Y = [y_1, y_2, \ldots, y_m]$, where $y_i = k$, $k \in \{-1, 1\}$, together with the gene microarray expression values of all samples:

$$X = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1n} \\ x_{21} & x_{22} & \cdots & x_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ x_{m1} & x_{m2} & \cdots & x_{mn} \end{bmatrix}$$

where each row $x_i$ holds the expression values of all genes in one sample and each column $x_j$ holds the expression values of one gene in all samples; the index $i$ denotes the $i$-th sample ($m$ in total) and the index $j$ denotes the $j$-th gene ($n$ in total).
3. The parameter optimized lung cancer cell detector of claim 1, wherein: the data preprocessing and feature sorting module normalizes the raw microarray data read in by the gene microarray reading module and ranks the features. The normalization operation is:

$$x'_{ij} = \frac{x_{ij} - \mathrm{Min}}{\mathrm{Max} - \mathrm{Min}}$$

where Min and Max are respectively the minimum and maximum gene expression values of the sample. Feature ranking is achieved by scoring the contribution of each gene to classification accuracy, which is done by defining a contribution function:

$$J(\alpha) = \frac{1}{2}\alpha^{T} H \alpha - \alpha^{T}\mathbf{1}$$

where $\alpha = [\alpha_1, \ldots, \alpha_n]$ and $H_{ij} = y_i y_j K(x_i, x_j)$. The quadratic term in fact represents the squared size of the classification boundary, so that:

$$\alpha^{T} H \alpha = \lVert w \rVert^{2}$$

with the definition

$$w^{*} = \sum_{k} \alpha_k^{*}\, y_k\, x_k$$

where $w$ is the normal vector of the classification hyperplane, $w^{*}$ is the optimal normal vector, $\alpha$ is the coefficient vector corresponding to the normal vector, and $\alpha^{*}$ is the coefficient vector corresponding to the optimal normal vector. From the above, the importance of each feature is determined by its contribution to the cost function, that is, the contribution value of each feature is:

$$\delta_i = \left(w_i^{*}\right)^{2}$$

where $\delta$ denotes the degree of contribution.
When a non-linear kernel is used as the kernel function, the contribution can generally be approximated as:

$$\delta_i = \frac{1}{2}\alpha^{T} H \alpha - \frac{1}{2}\alpha^{T} H(-i)\,\alpha$$

Here it is reasonable to assume that the value of $\alpha$ is unchanged after a feature is eliminated; $H(-i)$ denotes the $H$ matrix computed after the $i$-th feature is eliminated. Under this assumption the result differs little from that of the linear kernel.

The classifier is an extreme learning machine (ELM) trained on $N$ samples $(x_i, t_i)$, where $x_i$ is an $n \times 1$ input feature vector and $t_i$ is an $m \times 1$ target vector. Given an activation function $g(x)$ and the number of hidden-layer nodes $\tilde{N}$, the ELM gene detection system is:

$$\sum_{i=1}^{\tilde{N}} \beta_i\, g(\omega_i \cdot x_j + b_i) = o_j, \quad j = 1, 2, \ldots, N$$

where $\omega_i$ is the weight vector between the $i$-th hidden-layer node and the input layer, $b_i$ is the bias of the $i$-th hidden-layer node, $\beta_i$ is the weight vector between the $i$-th hidden-layer node and the output layer, and $o_j$ is the network output for the $j$-th input. In addition, $\omega_i \cdot x_j$ denotes the inner product of $\omega_i$ and $x_j$.
The network output can approximate the $N$ input samples with zero error, i.e.:

$$\sum_{j=1}^{N} \lVert o_j - t_j \rVert = 0$$

from which:

$$\sum_{i=1}^{\tilde{N}} \beta_i\, g(\omega_i \cdot x_j + b_i) = t_j, \quad j = 1, 2, \ldots, N$$

In matrix form this is

$$H\beta = T$$

where $H$ is the output matrix of the hidden layer and the $i$-th column of $H$ is the output of the $i$-th hidden-layer node for the $N$ inputs $x_1, x_2, \ldots, x_N$. The input weights of a single-hidden-layer feedforward neural network (SLFN) and the hidden-layer biases need not be adjusted during training and can be assigned arbitrarily, which gives:

$$\lVert H\hat{\beta} - T \rVert = \min_{\beta} \lVert H\beta - T \rVert$$

The solution of this equation can be obtained quickly by a linear method:

$$\hat{\beta} = H^{\dagger} T$$

where $H^{\dagger}$ is the Moore-Penrose generalized inverse of $H$ and $\hat{\beta}$ is the minimum-norm least-squares solution, i.e. the least-squares solution with the smallest norm. Compared with many existing gene detection systems, the extreme learning machine achieves a good training result very quickly by solving with the Moore-Penrose generalized inverse.
4. The parameter optimized lung cancer cell detector of claim 1, wherein: the parameter optimizing module increases the diversity of the population by using an improved parameter optimizing algorithm, which is as follows:
1) initializing the population information of the DE algorithm:
in the population, random generation:
in the above formula xi(0) Represents the expression value of the i-th individual chromosomal gene of the first generation, xj,i(0) The expression value of the jth chromosomal gene in the ith individual of the initial generation, rand (0,1) is a uniform random number in the interval (0,1), NP is the population size, and superscript L, U represents the lower and upper bound values, respectively.
2) Mutation operation (Mutation): the DE algorithm differs from the genetic algorithm (GA) in that it mutates using a differential strategy: the difference between two randomly selected individuals is scaled and then added to the individual to be mutated, i.e.

$$v_i(g+1) = x_{r_1}(g) + F\cdot\left(x_{r_2}(g) - x_{r_3}(g)\right), \quad i \neq r_1 \neq r_2 \neq r_3$$

In the above formula, $g$ is the generation index, $F$ is the scaling factor applied to the difference of two random vectors, $v_i(g+1)$ is the mutation intermediate variable, and $x_{r_1}(g)$, $x_{r_2}(g)$, $x_{r_3}(g)$ are the chromosomal gene expression values of the randomly selected individuals $r_1$, $r_2$ and $r_3$ of the $g$-th generation.
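3) Crossover operation (Crossover): the $g$-th generation population $x_i(g)$ and the intermediate variable $v_i(g+1)$ generated in step 2) are crossed to give

$$u_{j,i}(g+1) = \begin{cases} v_{j,i}(g+1), & \text{if } \mathrm{rand}(0,1) \le CR \\ x_{j,i}(g), & \text{otherwise} \end{cases}$$

In the above formula, $CR$ is the preset crossover rate and $u_{j,i}(g+1)$ is the crossover intermediate variable.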
4) Selection operation (Selection): the differential evolution algorithm uses a standard greedy rule to choose the next generation. If the fitness $f(u_i(g+1))$ of the individual produced by crossover is greater than the fitness $f(x_i(g))$ of the previous generation, the new individual is kept; otherwise the individual is unchanged, i.e.

$$x_i(g+1) = \begin{cases} u_i(g+1), & \text{if } f(u_i(g+1)) > f(x_i(g)) \\ x_i(g), & \text{otherwise} \end{cases}$$
To avoid premature convergence, an adaptive operator $\lambda$ is designed:
In the above formula, $G_{max}$ is the maximum number of iterations, $G$ is the current iteration number, and $F_0$ is the mutation operator. The mutation operator takes a large value in the early stage, which preserves the diversity of the samples, and decreases gradually in the later stage, which protects the good information found during evolution. In the differential evolution algorithm, if the fitness fails to exceed the historical best after a certain number of iterations, the search is considered to be trapped in a local optimum; at that point a swarm intelligence algorithm is used to jump out of the differential evolution algorithm:
5) Pass the current position information to the ant colony algorithm as its initial state, with the number of ants set to $m$ and the pheromone concentration set to $\tau_{ij} = c$ ($c > 0$).
6) Simulate all ants $1, 2, \ldots, m$ moving toward the end point; the probability that each ant moves from its current position $i$ to the next position $j$ is:
7) When one iteration is finished, that is, when all ants have completed their paths, update the current pheromone concentration:

$$\tau_{ij} \leftarrow (1-\rho)\,\tau_{ij} + \sum_{k=1}^{m} \Delta\tau_{ij}^{k}$$
In the above formula, $\rho$ is the pheromone volatilization coefficient and $\Delta\tau_{ij}^{k}$ is the pheromone concentration left by ant $k$ on path $ij$. Because pheromone concentration is inversely proportional to path length, it can be defined as:

$$\Delta\tau_{ij}^{k} = \frac{C}{L}$$

In the above formula, $C$ is a proportionality constant and $L$ is the path length.
8) After a new candidate solution is obtained, compare it with the historical best and update the historical best if the candidate is better.
9) Repeat the above process until the maximum number of generations is reached. The historical best parameters are then passed to the model output module as the final result of parameter optimization.
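Steps 1) to 4) and the pheromone rules of steps 5) to 7) can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the claimed implementation. The adaptive operator is assumed to take the form λ = exp(1 − G_max / (G_max + 1 − G)) with F = F_0 · 2^λ, which matches the described behaviour (large early, smaller later) but is not confirmed by the text. The transition probability is assumed to be proportional to pheromone concentration alone. Clipping to the bounds, the toy fitness function and all function names are hypothetical, and the hand-over from DE to the ant colony on stagnation is not implemented.

```python
import numpy as np

def adaptive_f(f0, g, g_max):
    # Assumed adaptive operator: lambda = exp(1 - g_max / (g_max + 1 - g)), F = f0 * 2**lambda.
    # F equals 2*f0 at g = 1 and approaches f0 at g = g_max.
    lam = np.exp(1.0 - g_max / (g_max + 1.0 - g))
    return f0 * 2.0 ** lam

def de_generation(pop, fit, fitness, f, cr, lower, upper, rng):
    """One generation of mutation, crossover and greedy selection (maximising fitness)."""
    n_pop, dim = pop.shape
    new_pop, new_fit = pop.copy(), fit.copy()
    for i in range(n_pop):
        # Mutation: three distinct individuals, all different from i.
        others = [k for k in range(n_pop) if k != i]
        r1, r2, r3 = rng.choice(others, size=3, replace=False)
        v = np.clip(pop[r1] + f * (pop[r2] - pop[r3]), lower, upper)  # clipping is an assumption
        # Crossover: take each gene from v with probability cr, otherwise keep the parent's gene.
        u = np.where(rng.random(dim) <= cr, v, pop[i])
        # Selection: keep the trial vector only if its fitness is strictly higher.
        fu = fitness(u)
        if fu > fit[i]:
            new_pop[i], new_fit[i] = u, fu
    return new_pop, new_fit

def transition_probabilities(tau_row, allowed):
    # Assumed rule: the probability of moving to j is proportional to the pheromone on edge (i, j).
    weights = np.where(allowed, tau_row, 0.0)
    return weights / weights.sum()

def update_pheromone(tau, paths, lengths, rho, c):
    # Evaporate by (1 - rho), then let each ant deposit c / L on every edge of its path.
    tau = (1.0 - rho) * tau
    for path, length in zip(paths, lengths):
        for i, j in zip(path[:-1], path[1:]):
            tau[i, j] += c / length
    return tau

rng = np.random.default_rng(1)
lower, upper, n_pop, dim, g_max = 0.0, 1.0, 20, 2, 60
fitness = lambda x: -np.sum((x - 0.3) ** 2)                 # toy objective with its maximum at x = 0.3
pop = lower + rng.random((n_pop, dim)) * (upper - lower)    # step 1: random initial population
fit = np.array([fitness(x) for x in pop])
initial_best = fit.max()
for g in range(1, g_max + 1):
    pop, fit = de_generation(pop, fit, fitness, adaptive_f(0.4, g, g_max), 0.9, lower, upper, rng)
best = pop[fit.argmax()]                                    # historical best parameters

tau = np.full((3, 3), 0.5)                                  # step 5: tau_ij = c > 0
probs = transition_probabilities(tau[0], np.array([False, True, True]))
tau = update_pheromone(tau, paths=[[0, 1, 2]], lengths=[2.0], rho=0.1, c=1.0)
```

Because selection is greedy, the best fitness in the population can never decrease from one generation to the next, so the last population always contains the historical best.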
5. The parameter optimized lung cancer cell detector of claim 1, wherein: the model output module feeds patient data directly into the model obtained by the parameter optimizing module, and the detection result is obtained from the output label value.
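A minimal sketch of how such a model output step might map a network output to a label is shown below. The claim does not specify the label coding or a decision threshold, so the 0/1 coding, the threshold of 0.5, the sigmoid hidden layer, the function name `predict_label` and the hand-picked parameters are all assumptions made for illustration.

```python
import numpy as np

def predict_label(x, W, b, beta, threshold=0.5):
    """Return (label, score): label is 1 if the network output exceeds the threshold, else 0."""
    h = 1.0 / (1.0 + np.exp(-(x @ W + b)))   # hidden-layer response for one sample
    score = (h @ beta).item()                # scalar network output
    return int(score > threshold), score

# Hypothetical hand-picked parameters standing in for the optimised model.
W = np.array([[4.0, -4.0], [0.0, 0.0]])
b = np.array([0.0, 0.0])
beta = np.array([[1.0], [0.0]])

label_high, score_high = predict_label(np.array([2.0, 0.0]), W, b, beta)
label_low, score_low = predict_label(np.array([-2.0, 0.0]), W, b, beta)
```

In this toy setup the first sample drives the first hidden node close to 1 and is labelled 1, while the second drives it close to 0 and is labelled 0.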
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810458000.4A CN108664763A (en) | 2018-05-14 | 2018-05-14 | A kind of lung cancer carcinoma cell detection instrument that parameter is optimal |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108664763A true CN108664763A (en) | 2018-10-16 |
Family
ID=63779515
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810458000.4A Pending CN108664763A (en) | 2018-05-14 | 2018-05-14 | A kind of lung cancer carcinoma cell detection instrument that parameter is optimal |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108664763A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101145171A (en) * | 2007-09-15 | 2008-03-19 | 中国科学院合肥物质科学研究院 | Gene microarray data predication method based on independent component integrated study |
CN103793600A (en) * | 2014-01-16 | 2014-05-14 | 西安电子科技大学 | Isolated component analysis and linear discriminant analysis combined cancer forecasting method |
CN105825081A (en) * | 2016-04-20 | 2016-08-03 | 苏州大学 | Gene expression data classification method and system |
Non-Patent Citations (4)
Title |
---|
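XINTENG GAO et al.: "A novel effective diagnosis model based on optimized least squares support machine for gene microarray", 《APPLIED SOFT COMPUTING》 * |
刘志斌 et al.: 《Optimization Methods and Application Cases》, 30 November 2013, Petroleum Industry Press * |
熊伟丽 et al.: "Parameter identification of nonlinear system models based on an improved differential evolution algorithm", 《Application Research of Computers》 * |
贺二公子: "Simple and easy-to-learn machine learning algorithms: Extreme Learning Machine (ELM)", 《HTTPS://BLOG.CSDN.NET/HELI200482128/ARTICLE/DETAILS/79149829》 * |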
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111241957A (en) * | 2020-01-04 | 2020-06-05 | 圣点世纪科技股份有限公司 | Finger vein in-vivo detection method based on multi-feature fusion and DE-ELM |
CN113270144A (en) * | 2021-06-23 | 2021-08-17 | 北京易奇科技有限公司 | Phenotype-based gene priority ordering method and electronic equipment |
CN113270144B (en) * | 2021-06-23 | 2022-02-11 | 北京易奇科技有限公司 | Phenotype-based gene priority ordering method and electronic equipment |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20181016 |