CN113838519A - Gene selection method and system based on adaptive gene interaction regularization elastic network model - Google Patents


Info

Publication number
CN113838519A
Authority
CN
China
Prior art keywords
gene
adaptive
network model
interaction
regularization
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110959928.2A
Other languages
Chinese (zh)
Other versions
CN113838519B (en)
Inventor
王雅娣
朱海红
刘荣
王芳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Henan University
Original Assignee
Henan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Henan University
Priority to CN202110959928.2A (patent CN113838519B)
Publication of CN113838519A
Application granted
Publication of CN113838519B
Legal status: Active

Classifications

    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B5/00: ICT specially adapted for modelling or simulations in systems biology, e.g. gene-regulatory networks, protein interaction networks or metabolic networks
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B25/00: ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
    • G16B25/30: Microarray design
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00: ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G16B40/20: Supervised data analysis
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A: TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00: Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10: Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Medical Informatics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biophysics (AREA)
  • Theoretical Computer Science (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Biotechnology (AREA)
  • Molecular Biology (AREA)
  • Data Mining & Analysis (AREA)
  • Epidemiology (AREA)
  • Public Health (AREA)
  • Bioethics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physiology (AREA)
  • Genetics & Genomics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention discloses a gene selection method and system based on an adaptive gene interaction regularization elastic network model. The method comprises: assessing the importance of each measured gene based on the Wilcoxon rank sum test; quantifying the importance of each measured gene, adding an adaptive penalty weight, and deleting noise genes to obtain characteristic genes; introducing the penalty weight into a least-squares loss function to construct an adaptive elastic network model; constructing an adjacency matrix of the gene interaction network; constructing a gene interaction network penalty based on the adjacency matrix; combining the adaptive elastic network model with the gene interaction network penalty to construct the adaptive gene interaction regularization elastic network model; and solving for the optimal solution of the regularized elastic network model with a gradient descent algorithm and selecting genes based on that solution. The invention can adaptively select important genes that are highly related to tumorigenesis and remove redundant, irrelevant, and noise genes.

Description

Gene selection method and system based on adaptive gene interaction regularization elastic network model
Technical Field
The invention belongs to the technical field of bioinformatics, and particularly relates to a gene selection method and system based on an adaptive gene interaction regularization elastic network model.
Background
Tumors have become one of the main diseases threatening human life and health. According to the 2018 global cancer statistics report, there were an estimated 18.1 million new cancer cases and 9.6 million cancer deaths in 2018, and the number of cancer diagnoses is increasing rapidly every year; nevertheless, research on cancer treatment and prevention remains incomplete. With the development and wide application of gene chip technology, the normal gene expression information of various tissues can be obtained continuously. Finding, among the large number of genes measured on a gene chip, the small number that are differentially expressed across disease categories is the key to accurate disease assessment and to a reliable basis for diagnosis. It also facilitates the further development of drugs against these diseases.
Methods based on machine learning and statistical analysis help provide important references for tumor diagnosis, cancer classification, clinical outcome prediction and the like, and have therefore attracted wide attention from scholars. In biomedicine, microarray data is widely used in the study of cancer classification and prognosis. DNA microarray data is typically represented as a matrix whose elements are gene expression levels. The specific format is shown in Table 1, where each row represents a sample, each column represents a gene, and the rightmost column holds the class label of the sample.
Table 1. Matrix form of DNA microarray data
Figure BDA0003221693710000011
Figure BDA0003221693710000021
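As a minimal illustration of the layout described for Table 1, the following Python sketch builds a toy data set in that shape. The expression values, the number of genes, and the variable names (`table`, `X`, `y`) are invented for illustration and do not come from the patent.

```python
# Toy microarray data in the layout described for Table 1 (values are invented).
# Each row is one sample; the first p columns are genes; the last column is
# the class label of the sample (0 or 1).
table = [
    # g1,  g2,  g3, label
    [2.1, 0.3, 5.0, 0],
    [1.9, 0.4, 4.8, 0],
    [3.5, 0.2, 5.1, 1],
    [3.8, 0.5, 4.9, 1],
]

X = [row[:-1] for row in table]      # n x p expression matrix
y = [int(row[-1]) for row in table]  # n class labels

n, p = len(X), len(X[0])
print(n, p, y)
```

In real microarray data p is typically far larger than n, which is the setting the rest of the description addresses.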
Gene selection is an important step in the study and analysis of gene expression profiles: a small subset containing the key gene information is selected from the high-dimensional expression profile. Gene selection does not change the gene values in the original samples; it removes redundant genes and retains genes relevant to classification.
The process of gene selection is as follows: (1) acquire microarray data, (2) preprocess the acquired data, (3) extract characteristic genes, (4) build a classification model, and (5) analyze the classification result. The specific classification process is shown in FIG. 1.
Many current studies use microarray data to classify diseases according to gene expression levels. Numerous studies have shown that gene expression levels are an important means of finding characteristic genes and classes. Logistic regression is a commonly used classification method, but for microarray data, where the number of predictor variables p is much greater than the number of samples n, it may produce unstable estimates. Furthermore, maximum likelihood estimation also produces unstable results when multicollinearity exists among the predictor variables. Therefore, existing logistic regression methods are not suitable for disease classification based on gene expression levels. Penalized logistic regression based on the $L_2$ norm, and various $L_1$-norm penalty and regularization methods, have been successfully applied to disease classification. However, existing $L_1$-type penalty methods apply the same penalty to all features without considering the importance of each feature, which may lead to low estimation efficiency and inconsistent variable selection in linear regression modeling. Some highly correlated genes in disease classification should be selected or eliminated together as a group. From a learning point of view, this can be regarded as a grouping effect, i.e., the estimated coefficients of highly correlated genes are relatively close. As a newer regularization method, the elastic network model and its various generalizations can produce a grouping effect when building classifiers, but such models rarely have a biological interpretation and fail to adequately consider gene interactions.
With the rise of large-scale cancer genome research and personalized medicine, comprehensive prediction of clinical outcomes from multi-omics data has become an emerging research topic. Since the relative advantages of DNA methylation and gene expression in predicting cancer stage and patient survival are not pronounced in many cancers, prediction performance can be improved by combining gene expression profiling data, methylation profiling data and other gene measurements to predict clinical outcome. However, this requires collecting and integrating genomic data from a large number of patients, which is a considerable workload.
Disclosure of Invention
To address the low estimation efficiency, insufficient consideration of gene interactions, and heavy workload of existing gene selection methods, the invention provides a gene selection method and system based on an adaptive gene interaction regularization elastic network model, which can adaptively select important genes highly related to tumorigenesis and remove redundant, irrelevant, and noise genes.
To achieve this purpose, the invention provides a gene selection method based on an adaptive gene interaction regularization elastic network model. The importance of each gene is evaluated with the Wilcoxon rank sum test (WRST), and an adaptive penalty is then applied to each gene according to its measured importance, so that noise genes are deleted from the model and characteristic genes are identified. Gene measurements and gene-gene interaction information are integrated into an adaptive elastic network model, which enhances the sparsity of the structure and uses the grouping effect to select characteristic genes and reduce redundancy. The regularized elastic network model is solved with an iterative gradient descent algorithm. The method specifically comprises the following steps:
Step 1: assessing the importance of each measured gene based on the Wilcoxon rank sum test;
Step 2: quantifying the importance of each measured gene;
Step 3: adding an adaptive penalty weight to each measured gene according to its quantified importance, and deleting noise genes based on the adaptive penalty weight to obtain characteristic genes;
Step 4: introducing the adaptive penalty weight into a least-squares loss function to construct an adaptive elastic network model;
Step 5: constructing an adjacency matrix of the gene interaction network;
Step 6: constructing a gene interaction network penalty based on the adjacency matrix;
Step 7: combining the adaptive elastic network model with the gene interaction network penalty to construct an adaptive gene interaction regularization elastic network model;
Step 8: solving for the optimal solution of the adaptive gene interaction regularization elastic network model with a gradient descent algorithm, and selecting genes based on the optimal solution.
Further, step 1 comprises:
based on the Wilcoxon rank sum test, the importance of each measured gene is evaluated according to the following formula:
Figure BDA0003221693710000031
where $I(\cdot)$ is the indicator function;
Figure BDA0003221693710000032
represents the i-th expression value of the j-th gene; $p$ is the total number of measured genes; $N_0$ and $N_1$ are the index sets of the two sample classes, and $n_0$ and $n_1$ are the numbers of samples in $N_0$ and $N_1$, respectively; $s(g_j)$ measures how differently the j-th gene is expressed in the two classes, with $0 \le s(g_j) \le n_0 n_1$; if $s(g_j)$ is close to $0$ or $n_0 n_1$, the j-th gene is an important characteristic gene for classification.
Further, step 2 comprises:
the genes are ranked according to the following formula:
$R(g_j) = \max\{s(g_j),\ n_0 n_1 - s(g_j)\}$
The closer $s(g_j)$ is to $0$ or $n_0 n_1$, the larger $R(g_j)$ is and the more important the j-th gene is in the classification problem.
Further, in step 3, the expression of the adaptive penalty weight is:
Figure BDA0003221693710000041
where n is the number of samples.
Further, the expression of the adaptive elastic network model is as follows:
Figure BDA0003221693710000042
where $O_2$ denotes the adaptive elastic network model, $y$ is the sample class, $\beta$ is the vector of estimated coefficients of all genes, $\beta_j$ is the estimated coefficient of the j-th gene, $x_i$ is the input vector, and $\lambda$ and $\alpha$ are regularization parameters with $\lambda > 0$ and $\alpha \in [0, 1]$.
Further, in the step 5, an adjacency matrix of the gene interaction network is constructed according to the following formula:
$A = [a_{ij}] \in \mathbb{R}^{p \times p}$
where $\mathbb{R}$ denotes the set of real numbers, $A$ is the adjacency matrix of the gene interaction network, and each $a_{ij}$ takes the value 0 or 1.
Further, in the step 6, a gene interaction network penalty is constructed according to the following formula:
Figure BDA0003221693710000043
where $O_3$ denotes the gene interaction network penalty, $\beta_i$ is the estimated coefficient of the i-th gene, and $\mathrm{Tr}(\cdot)$ denotes the trace of a matrix.
Further, the expression of the adaptive gene interaction regularization elastic network model is as follows:
Figure BDA0003221693710000051
where $F(X, A, \beta)$ denotes the adaptive gene interaction regularization elastic network model, $X$ is the input matrix,
Figure BDA0003221693710000052
is the penalty term, and $\gamma$ is a regularization parameter.
The invention also provides a gene selection system based on the adaptive gene interaction regularization elastic network model, which comprises:
a gene importance assessment module for assessing the importance of each measured gene based on the Wilcoxon rank sum test;
a gene importance quantification module for quantifying the importance of each measured gene;
a weighting module for adding an adaptive penalty weight to each measured gene according to its quantified importance, and deleting noise genes based on the adaptive penalty weight to obtain characteristic genes;
a first construction module for introducing the gene weight into a least-squares loss function to construct an adaptive elastic network model;
a second construction module for constructing an adjacency matrix of the gene interaction network;
a third construction module for constructing a gene interaction network penalty based on the adjacency matrix;
a fourth construction module for combining the adaptive elastic network model with the gene interaction network penalty to construct an adaptive gene interaction regularization elastic network model;
and a gene obtaining module for solving for the optimal solution of the adaptive gene interaction regularization elastic network model with a gradient descent algorithm and selecting genes based on the optimal solution.
Compared with the prior art, the invention has the following beneficial effects:
The adaptive gene interaction regularization elastic network model extends the adaptive elastic network model by integrating gene interaction network information, so as to achieve better classification. The ordinary elastic network model does not consider information about interactions between genes, whereas the proposed model does. By integrating the adaptive elastic network model with the gene interaction network and solving for the optimal solution with a gradient descent algorithm, the invention combines gene importance and gene interaction information to identify characteristic genes and reduce redundancy; it can also adaptively select important genes highly related to tumorigenesis and remove redundant, irrelevant, and noise genes.
Drawings
FIG. 1 is a schematic diagram of a gene selection process;
FIG. 2 is a basic flowchart of a gene selection method based on an adaptive gene interaction regularization elastic network model according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a gene selection system based on an adaptive gene interaction regularization elastic network model according to an embodiment of the present invention.
Detailed Description
The invention is further illustrated by the following embodiments in conjunction with the accompanying drawings:
As shown in FIG. 2, a gene selection method based on an adaptive gene interaction regularization elastic network model includes:
Step 1: assessing the importance of each measured gene based on the Wilcoxon rank sum test;
Step 2: quantifying the importance of each measured gene;
Step 3: adding an adaptive penalty weight to each measured gene according to its quantified importance, and deleting noise genes based on the adaptive penalty weight to obtain characteristic genes;
Step 4: introducing the adaptive penalty weight into a least-squares loss function to construct an adaptive elastic network model;
Step 5: constructing an adjacency matrix of the gene interaction network;
Step 6: constructing a gene interaction network penalty based on the adjacency matrix;
Step 7: combining the adaptive elastic network model with the gene interaction network penalty to construct an adaptive gene interaction regularization elastic network model (AGIREN);
Step 8: solving for the optimal solution of the adaptive gene interaction regularization elastic network model with a gradient descent algorithm, and selecting genes based on the optimal solution.
Specifically, in order to efficiently pick out the genes that are important for classification, an adaptive $L_1$-type penalty is applied, i.e., the importance of each gene in the classification is incorporated into the penalized regression model.
Further, step 1 comprises:
based on the Wilcoxon rank sum test, the importance of each measured gene is evaluated according to the following formula:
Figure BDA0003221693710000061
where $I(\cdot)$ is the indicator function;
Figure BDA0003221693710000062
represents the i-th expression value of the j-th gene; $p$ is the total number of measured genes; $N_0$ and $N_1$ are the index sets of the two sample classes, and $n_0$ and $n_1$ are the numbers of samples in $N_0$ and $N_1$, respectively; $s(g_j)$ measures how differently the j-th gene is expressed in the two classes, with $0 \le s(g_j) \le n_0 n_1$; if $s(g_j)$ is close to $0$ or $n_0 n_1$, the j-th gene is an important characteristic gene for classification.
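The formula for $s(g_j)$ survives here only as an image placeholder, so the following Python sketch rests on an assumption: it uses the standard pair-counting (Mann-Whitney) form of the Wilcoxon rank sum statistic, counting the pairs (i in $N_0$, k in $N_1$) for which the expression value of gene j in sample i does not exceed that in sample k. This form is consistent with the stated range $0 \le s(g_j) \le n_0 n_1$, but the patent's exact expression may differ. The function name `wrst_statistic` and the toy data are hypothetical.

```python
def wrst_statistic(X, y, j):
    """Assumed form of s(g_j): number of pairs (i in N0, k in N1)
    with X[i][j] <= X[k][j]. The patent's exact formula is not recoverable here."""
    class0 = [row[j] for row, label in zip(X, y) if label == 0]
    class1 = [row[j] for row, label in zip(X, y) if label == 1]
    return sum(1 for a in class0 for b in class1 if a <= b)


# Toy data: 4 samples (rows), 3 genes (columns); values are invented.
X = [
    [1.0, 5.0, 2.0],
    [2.0, 6.0, 9.0],
    [8.0, 1.0, 3.0],
    [9.0, 2.0, 8.0],
]
y = [0, 0, 1, 1]

s = [wrst_statistic(X, y, j) for j in range(3)]
print(s)  # [4, 0, 2]
```

With $n_0 n_1 = 4$, the first two genes sit at the extremes (4 and 0) and separate the classes perfectly, while the third gene sits in the middle (2) and carries no class information.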
Further, step 2 comprises:
The Wilcoxon rank sum test measures the importance of each gene, but the statistic cannot be used directly as an adaptive penalty weight. To quantify the importance of each gene, the genes are therefore ranked according to the following formula:
$R(g_j) = \max\{s(g_j),\ n_0 n_1 - s(g_j)\}$
where $s(g_j)$ close to $0$ or $n_0 n_1$ indicates that the j-th gene is an important characteristic gene. The value of $R(g_j)$ is large for important genes and small for noise genes: the closer $s(g_j)$ is to $0$ or $n_0 n_1$, the larger $R(g_j)$ is and the more important the j-th gene is in the classification.
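The ranking formula can be sketched directly in Python. The input statistics below are invented toy values for three genes with $n_0 = n_1 = 2$; the function name `rank_score` is hypothetical.

```python
def rank_score(s_j, n0, n1):
    """R(g_j) = max{s(g_j), n0*n1 - s(g_j)}: large when s(g_j) is near 0 or n0*n1."""
    return max(s_j, n0 * n1 - s_j)


s = [4, 0, 2]  # toy WRST statistics for three genes
R = [rank_score(v, n0=2, n1=2) for v in s]

# Rank genes from most to least important (ties keep their original order).
order = sorted(range(len(R)), key=lambda j: R[j], reverse=True)
print(R, order)  # [4, 4, 2] [0, 1, 2]
```

Note that $R(g_j)$ always lies between $n_0 n_1 / 2$ and $n_0 n_1$, so genes with $s(g_j) = 0$ and $s(g_j) = n_0 n_1$ are treated as equally important.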
Further, in order to penalize each gene differently according to its importance in the classification, in step 3 the adaptive penalty weight is expressed as:
Figure BDA0003221693710000071
where n is the number of samples. Noise genes get relatively large penalty weights, while key feature genes get smaller penalty weights.
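The weight formula itself is only available as an image placeholder, so the Python sketch below does not reproduce it. It uses a hypothetical stand-in, $w_j = n_0 n_1 / R(g_j)$, chosen only because it has the qualitative property stated above: noise genes (small $R$) receive larger weights than key characteristic genes (large $R$). The patent's actual expression also involves the sample count n and may look quite different.

```python
def penalty_weights(R, n0, n1):
    """Hypothetical stand-in, NOT the patent's formula:
    w_j = n0*n1 / R(g_j), which decreases as the importance R(g_j) grows."""
    upper = n0 * n1
    # R(g_j) >= n0*n1/2 > 0 by construction, so the division is safe.
    return [upper / r for r in R]


R = [4, 4, 2]  # toy importance scores: genes 0 and 1 important, gene 2 noise
w = penalty_weights(R, n0=2, n1=2)
print(w)  # [1.0, 1.0, 2.0]
```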
Further, the importance of each gene for classification, i.e., the gene weight $w_j$, is introduced into the ordinary least-squares loss function to construct an adaptive elastic network model, which is expressed as:
Figure BDA0003221693710000072
where $O_2$ denotes the adaptive elastic network model, $y$ is the sample class, $\beta$ is the vector of estimated coefficients of all genes, $\beta_j$ is the estimated coefficient of the j-th gene, $x_i$ is the input vector, $n$ is the number of samples, and $\lambda$ and $\alpha$ are regularization parameters with $\lambda > 0$ and $\alpha \in [0, 1]$.
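Because the expression for $O_2$ is an image placeholder, the following Python sketch assumes a common parameterization of a weighted elastic net: a least-squares loss scaled by $1/(2n)$, plus $\lambda[\alpha \sum_j w_j |\beta_j| + \frac{1-\alpha}{2} \sum_j \beta_j^2]$. The scaling constants and the function name `adaptive_enet_objective` are assumptions, not the patent's stated formula.

```python
def adaptive_enet_objective(X, y, beta, w, lam, alpha):
    """Assumed form of O_2:
    (1/2n) * sum_i (y_i - x_i . beta)^2
      + lam * (alpha * sum_j w_j*|beta_j| + (1 - alpha)/2 * sum_j beta_j^2)."""
    n = len(X)
    residuals = [yi - sum(x * b for x, b in zip(xi, beta)) for xi, yi in zip(X, y)]
    loss = sum(r * r for r in residuals) / (2 * n)
    l1 = sum(wj * abs(bj) for wj, bj in zip(w, beta))  # weighted L1 part
    l2 = sum(bj * bj for bj in beta)                   # ridge part
    return loss + lam * (alpha * l1 + 0.5 * (1 - alpha) * l2)


X = [[1.0, 0.0], [0.0, 1.0]]
y = [1.0, 0.0]
w = [1.0, 2.0]
value = adaptive_enet_objective(X, y, beta=[1.0, 0.0], w=w, lam=0.1, alpha=0.5)
print(value)
```

Setting `alpha=1` leaves only the weighted $L_1$ term, and `alpha=0` leaves only the ridge term, matching the stated range $\alpha \in [0, 1]$.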
Further, in the step 5, an adjacency matrix of the gene interaction network is constructed according to the following formula:
$A = [a_{ij}] \in \mathbb{R}^{p \times p}$
where $\mathbb{R}$ denotes the set of real numbers, $A$ is the adjacency matrix of the gene interaction network, and each $a_{ij}$ takes the value 0 or 1. $a_{ij} = 0$ means that the interaction between the i-th gene and the j-th gene is weak, and $a_{ij} = 1$ means that it is strong. It is worth mentioning that the gene interaction matrix $A$ constructed in the present invention can be further refined according to the degree of interaction between genes, and can also contain various types of interaction information, such as interactions between transcription factor targets and proteins. For example, proteins that interact more strongly may be assigned more weight than proteins that interact less strongly.
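A minimal Python sketch of building such a 0/1 adjacency matrix from a list of interacting gene pairs follows. The edge list is invented, the function name `adjacency_matrix` is hypothetical, and treating the interaction as symmetric ($a_{ij} = a_{ji}$) is an assumption that the text does not state explicitly.

```python
def adjacency_matrix(p, edges):
    """Build a p x p 0/1 adjacency matrix from pairs of interacting gene indices.
    Symmetry (a_ij == a_ji) is assumed here."""
    A = [[0] * p for _ in range(p)]
    for i, j in edges:
        A[i][j] = 1
        A[j][i] = 1
    return A


# Toy network over 4 genes: gene 0 interacts with 1, gene 1 with 2; gene 3 is isolated.
edges = [(0, 1), (1, 2)]
A = adjacency_matrix(4, edges)
for row in A:
    print(row)
```

A weighted variant, as hinted at in the text, could store an interaction strength instead of 1 for each pair.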
Further, to ensure that genes known to interact have similar coefficients and are therefore more likely to be grouped together, the overall grouping effect in the gene interaction network should be maximized; accordingly, a gene interaction network penalty is constructed according to the following formula:
Figure BDA0003221693710000081
where $O_3$ denotes the gene interaction network penalty, $\beta_i$ is the estimated coefficient of the i-th gene, and $\mathrm{Tr}(\cdot)$ denotes the trace of a matrix.
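The expression for $O_3$ is an image placeholder. A common network penalty with the stated purpose (pulling the coefficients of interacting genes together) is the graph-Laplacian form $\frac{1}{2}\sum_{i,j} a_{ij}(\beta_i - \beta_j)^2 = \beta^\top (D - A)\beta$, where $D$ is the diagonal degree matrix; the right-hand side is a scalar and so equals its own trace. The Python sketch below assumes this form and checks the identity numerically; it is not confirmed to be the patent's exact formula.

```python
def pairwise_penalty(A, beta):
    """0.5 * sum_ij a_ij * (beta_i - beta_j)^2 (assumed form of the network penalty)."""
    p = len(beta)
    return 0.5 * sum(
        A[i][j] * (beta[i] - beta[j]) ** 2 for i in range(p) for j in range(p)
    )


def laplacian_penalty(A, beta):
    """beta^T (D - A) beta, with D the diagonal matrix of node degrees."""
    p = len(beta)
    degree = [sum(row) for row in A]
    total = 0.0
    for i in range(p):
        for j in range(p):
            l_ij = (degree[i] if i == j else 0) - A[i][j]
            total += beta[i] * l_ij * beta[j]
    return total


# Chain network 0 - 1 - 2; genes 0 and 1 have equal coefficients, gene 2 differs.
A = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
beta = [1.0, 1.0, 3.0]
print(pairwise_penalty(A, beta), laplacian_penalty(A, beta))  # 4.0 4.0
```

Only the edge between genes 1 and 2 contributes, because the coefficients of the interacting genes 0 and 1 already agree.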
Further, in step 7, the adaptive gene interaction regularization elastic network model is expressed as:
Figure BDA0003221693710000082
where $F(X, A, \beta)$ denotes the adaptive gene interaction regularization elastic network model, $X$ is the input matrix,
Figure BDA0003221693710000083
is the penalty term, and $\gamma$ is a regularization parameter.
Further, in step 8, the expression of the adaptive gene interaction regularization elastic network model solved based on the gradient descent algorithm is as follows:
Figure BDA0003221693710000091
where
Figure BDA0003221693710000092
is the penalty term,
Figure BDA0003221693710000093
denotes the adaptive gene interaction regularization elastic network model solved by the gradient descent algorithm, and the optimal solution is
Figure BDA0003221693710000094
Genes are selected based on the optimal solution
Figure BDA0003221693710000095
Specifically, a gene with a non-zero regression coefficient is an important gene closely related to cancer, and the larger the absolute value of its regression coefficient, the stronger the correlation between the gene and cancer.
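The solver and selection rule can be sketched in Python under several assumptions, since the model and update equations are image placeholders. The text only says "gradient descent algorithm"; this sketch swaps in proximal gradient descent (a gradient step on the smooth terms followed by soft-thresholding for the weighted $L_1$ term), which is a common way to obtain exact zeros. The objective is assumed to be $\frac{1}{2n}\|y - X\beta\|^2 + \lambda[\alpha\sum_j w_j|\beta_j| + \frac{1-\alpha}{2}\|\beta\|^2] + \gamma\,\beta^\top(D - A)\beta$. The names `fit_agiren` and `soft_threshold`, the step size, the -1/+1 label coding (used to avoid an intercept), and the toy data are all hypothetical.

```python
def soft_threshold(z, t):
    """Proximal operator of t*|.|: shrink z toward zero by t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def fit_agiren(X, y, A, w, lam, alpha, gamma, step=0.1, iters=500):
    """Proximal gradient descent on the assumed objective (see lead-in)."""
    n, p = len(X), len(X[0])
    degree = [sum(row) for row in A]
    beta = [0.0] * p
    for _ in range(iters):
        resid = [yi - sum(x * b for x, b in zip(xi, beta)) for xi, yi in zip(X, y)]
        new_beta = []
        for j in range(p):
            grad_loss = -sum(X[i][j] * resid[i] for i in range(n)) / n
            grad_ridge = lam * (1 - alpha) * beta[j]
            # j-th entry of (D - A) beta, the graph-Laplacian term
            lap_j = degree[j] * beta[j] - sum(A[j][k] * beta[k] for k in range(p))
            grad = grad_loss + grad_ridge + 2 * gamma * lap_j
            new_beta.append(
                soft_threshold(beta[j] - step * grad, step * lam * alpha * w[j])
            )
        beta = new_beta
    return beta


# Toy data: genes 0 and 1 track the class and interact; gene 2 is noise.
X = [[-1.0, -0.8, 1.0], [-1.0, -0.8, -1.0], [1.0, 1.2, 1.0], [1.0, 1.2, -1.0]]
y = [-1.0, -1.0, 1.0, 1.0]          # classes coded as -1/+1 (an assumption)
A = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
w = [1.0, 1.0, 2.0]                 # larger penalty weight on the noise gene

beta = fit_agiren(X, y, A, w, lam=0.1, alpha=0.5, gamma=0.1)
selected = [j for j, b in enumerate(beta) if b != 0.0]  # non-zero coefficients
print(beta, selected)
```

In this toy run the noise gene keeps a coefficient of exactly zero and is dropped, while the two interacting genes are both selected with similar coefficients, which is the grouping behaviour the network penalty is meant to encourage.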
Further, after the step 8, the method further comprises:
classification is performed based on the selected genes, and the classification results are analyzed.
Specifically, some highly correlated genes in disease classification should be selected or eliminated together as a group. As a newer regularization method, the elastic network model and its various generalizations can produce a grouping effect when building a classifier. In order to efficiently pick out the genes that are important for classification, an adaptive $L_1$-type penalty is applied, i.e., the importance of each gene in the classification is incorporated into the penalized regression model. The importance of each gene is expressed through a gene ranking method based on the Wilcoxon rank sum test; adaptive weights are then proposed that quantify this importance and penalize each gene differently according to its importance in the classification. Noise genes receive relatively large weights, while key characteristic genes receive smaller weights. Thus, the importance of each gene for classification can be incorporated directly into the logistic regression model, i.e., the adaptive elastic network model:
Figure BDA0003221693710000096
Gene-gene interactions are fundamental to understanding complex diseases, and phenotypes are thought to result from interactions among multiple key genes. Gene interactions must therefore be considered in cancer classification. When several genes interact, not all of them should be treated as characteristic genes, because the information they carry is inevitably correlated. To avoid redundancy, a network constraint based on gene interactions can be defined so that connected variables in the network tend to be placed in the same group. To ensure that genes known to interact have similar coefficients and are therefore more likely to be grouped together, the overall grouping effect in the gene interaction network should be maximized, which gives the gene interaction regularization model:
Figure BDA0003221693710000101
The adaptive gene interaction regularization elastic network model (AGIREN) is constructed from the adaptive elastic network model and the gene interaction regularization model:
Figure BDA0003221693710000102
On the basis of the above embodiments, as shown in FIG. 3, the present invention further provides a gene selection system based on an adaptive gene interaction regularization elastic network model, which includes:
a gene importance assessment module for assessing the importance of each measured gene based on the Wilcoxon rank sum test;
a gene importance quantification module for quantifying the importance of each measured gene;
a weighting module for adding an adaptive penalty weight to each measured gene according to its quantified importance, and deleting noise genes based on the adaptive penalty weight to obtain characteristic genes;
a first construction module for introducing the gene weight into a least-squares loss function to construct an adaptive elastic network model;
a second construction module for constructing an adjacency matrix of the gene interaction network;
a third construction module for constructing a gene interaction network penalty based on the adjacency matrix;
a fourth construction module for combining the adaptive elastic network model with the gene interaction network penalty to construct an adaptive gene interaction regularization elastic network model;
and a gene obtaining module for solving for the optimal solution of the adaptive gene interaction regularization elastic network model with a gradient descent algorithm and selecting genes based on the optimal solution.
In summary:
(1) the invention introduces gene importance into the classification method through gene ranking based on the Wilcoxon rank sum test, so as to better select characteristic genes that contribute substantially to classification;
(2) the invention applies an adaptive penalty weight to each gene, so that noise genes receive a larger penalty and are removed by the model, while characteristic genes receive a smaller penalty and are retained;
(3) because a large amount of redundant information exists between genes, the invention constructs a gene-gene interaction network penalty in order to remove redundant genes effectively;
(4) combining the three points above, the adaptive gene interaction regularization elastic network model provided by the invention has two notable characteristics. First, it is built on the adaptive elastic network model and is therefore sparse: relatively few characteristic genes are selected according to the regression coefficients, and the selected genes play a key role in cancer classification, clinical outcome prediction and similar tasks. Second, constructing a gene interaction network model reduces redundant information between genes and is applicable to a wide variety of data types, such as DNA methylation data, gene expression profiling data, and protein interaction data.
The above describes only the preferred embodiments of the present invention. It should be noted that those skilled in the art can make various modifications and improvements without departing from the principle of the present invention, and these modifications and improvements also fall within the protection scope of the present invention.

Claims (9)

1. A gene selection method based on an adaptive gene interaction regularization elastic network model, characterized by comprising the following steps:
Step 1: assessing the importance of each measured gene based on the Wilcoxon rank sum test;
Step 2: quantifying the importance of each measured gene;
Step 3: adding an adaptive penalty weight to each measured gene according to its quantified importance, and deleting noise genes based on the adaptive penalty weight to obtain characteristic genes;
Step 4: introducing the adaptive penalty weight into a least-squares loss function to construct an adaptive elastic network model;
Step 5: constructing an adjacency matrix of the gene interaction network;
Step 6: constructing a gene interaction network penalty based on the adjacency matrix;
Step 7: combining the adaptive elastic network model with the gene interaction network penalty to construct an adaptive gene interaction regularization elastic network model;
Step 8: solving for the optimal solution of the adaptive gene interaction regularization elastic network model with a gradient descent algorithm, and selecting genes based on the optimal solution.
2. The method for selecting genes based on an adaptive gene interaction regularization elastic network model according to claim 1, wherein the step 1 comprises:
based on the Wilcoxon rank sum test, the importance of each measured gene is evaluated according to the following formula:
Figure FDA0003221693700000011
wherein I(·) is an indicator function;
Figure FDA0003221693700000012
represents the i-th expression value of the j-th gene; p represents the total number of genes measured; N_0 and N_1 represent the index sets of the two sample classes, and n_0 and n_1 represent the numbers of samples in N_0 and N_1, respectively; s(g_j) measures the difference in expression level of the j-th gene between the two classes, with 0 ≤ s(g_j) ≤ n_0 n_1; if s(g_j) is close to 0 or to n_0 n_1, the j-th gene is an important characteristic gene for the classification.
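The statistic in claim 2 can be illustrated with a short sketch. The formula itself is only available as a figure, so the code below assumes the common pairwise form in which s(g_j) counts the pairs (i in N_0, k in N_1) whose expression values satisfy x_i ≤ x_k; this is the Mann-Whitney style count and always lies between 0 and n_0 n_1. The function name `wilcoxon_scores` and the toy data are hypothetical.

```python
import numpy as np

def wilcoxon_scores(X, y):
    """Return s(g_j) for every gene (column) of X.

    X: (n_samples, p) expression matrix; y: 0/1 class labels.
    Assumed form (not confirmed by the source figure): s(g_j) counts the
    pairs (i in N0, k in N1) with x_i^(j) <= x_k^(j).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    X0 = X[y == 0]                            # samples indexed by N0, shape (n0, p)
    X1 = X[y == 1]                            # samples indexed by N1, shape (n1, p)
    # Compare every N0 sample with every N1 sample, gene by gene.
    diff = X0[:, None, :] - X1[None, :, :]    # shape (n0, n1, p)
    return (diff <= 0).sum(axis=(0, 1))

# Toy data: 4 samples (2 per class), 3 genes.
X = np.array([[1.0, 5.0, 2.0],
              [2.0, 6.0, 9.0],
              [8.0, 1.0, 3.0],
              [9.0, 2.0, 8.0]])
y = np.array([0, 0, 1, 1])
s = wilcoxon_scores(X, y)
print(s)  # [4 0 2]
```

In this toy example the first two genes separate the classes perfectly (s equal to n_0 n_1 = 4 or to 0), while the third gene sits in the middle of the range and carries little class information.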
3. The method for selecting genes based on an adaptive gene interaction regularization elastic network model according to claim 2, wherein said step 2 comprises:
the genes are ranked according to the following formula:
R(g_j) = max{s(g_j), n_0 n_1 − s(g_j)}
The closer s(g_j) is to 0 or to n_0 n_1, the larger the value of R(g_j), and the greater the importance of the j-th gene in the classification problem.
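As a small illustration of the ranking in claim 3, the sketch below folds the statistic around the midpoint of its range, reading the second argument of the maximum as n_0 n_1 − s(g_j). The function name `importance` and the input values are hypothetical.

```python
import numpy as np

def importance(s, n0, n1):
    """R(g_j) = max{ s(g_j), n0*n1 - s(g_j) } for each gene."""
    s = np.asarray(s)
    return np.maximum(s, n0 * n1 - s)

s = np.array([4, 0, 2])          # example statistics for three genes, n0 = n1 = 2
R = importance(s, 2, 2)
# Rank genes from most to least important (stable sort keeps ties in input order).
ranking = np.argsort(-R, kind="stable")
print(R)        # [4 4 2]
print(ranking)  # [0 1 2]
```

Genes with s near either extreme receive the same high score, so the ranking does not depend on which class has the higher expression.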
4. The method for selecting genes based on an adaptive gene interaction regularization elastic network model according to claim 3, wherein in the step 3, the expression of the adaptive penalty weight is as follows:
Figure FDA0003221693700000021
where n is the number of samples.
5. The method for selecting genes based on an adaptive gene interaction regularization elastic network model according to claim 4, wherein the adaptive elastic network model is expressed as:
Figure FDA0003221693700000022
wherein O_2 represents the adaptive elastic network model, y is the sample class, β is the vector of estimated coefficients of all genes, β_j is the estimated coefficient of the j-th gene, x_i is the input vector, and λ and α are regularization parameters with λ > 0 and α ∈ [0, 1].
6. The method for selecting genes based on an adaptive gene interaction regularization elastic network model according to claim 2, wherein in said step 5, an adjacency matrix of the gene interaction network is constructed according to the following formula:
A = [a_ij] ∈ R^(p×p)
wherein R represents the set of real numbers; A represents the adjacency matrix of the gene interaction network; and a_ij takes the value 0 or 1.
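A minimal sketch of the adjacency matrix in claim 6 follows. The claim only states that A is a p × p matrix with entries 0 or 1; how the interacting pairs are obtained is not specified here, so the code assumes they are supplied as a list of index pairs, treats interactions as undirected, and ignores self-interactions. The function name `adjacency_from_edges` is hypothetical.

```python
import numpy as np

def adjacency_from_edges(p, edges):
    """Build a p x p 0/1 adjacency matrix from a list of gene index pairs."""
    A = np.zeros((p, p), dtype=int)
    for i, j in edges:
        if i == j:
            continue          # assumption: no self-interaction
        A[i, j] = 1
        A[j, i] = 1           # assumption: interactions are undirected
    return A

# Four genes; gene 0 interacts with gene 1, gene 1 with gene 2.
A = adjacency_from_edges(4, [(0, 1), (1, 2)])
print(A)
```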
7. The method for selecting genes based on an adaptive gene interaction regularization elastic network model according to claim 6, wherein in said step 6, a gene interaction network penalty is constructed according to the following formula:
Figure FDA0003221693700000023
wherein O_3 represents the gene interaction network penalty, β_i is the estimated coefficient of the i-th gene, and Tr(·) represents the trace of a matrix.
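The penalty formula in claim 7 is only available as a figure. A common form that matches the symbols named there (the coefficients β_i, the adjacency matrix, and a trace) is the graph Laplacian penalty, in which the sum of a_ij (β_i − β_j)² over interacting pairs equals β^T L β = Tr(β^T L β) with L = D − A and D the diagonal degree matrix. The sketch below assumes that form and checks the identity numerically; it is an illustration, not a confirmed reading of the patent's expression, and both function names are hypothetical.

```python
import numpy as np

def network_penalty(A, beta):
    """Assumed penalty: beta^T L beta with graph Laplacian L = D - A."""
    A = np.asarray(A, dtype=float)
    beta = np.asarray(beta, dtype=float)
    L = np.diag(A.sum(axis=1)) - A
    return float(beta @ L @ beta)

def pairwise_penalty(A, beta):
    """Same quantity written as a sum over interacting gene pairs."""
    A = np.asarray(A, dtype=float)
    beta = np.asarray(beta, dtype=float)
    diff = beta[:, None] - beta[None, :]
    # Each undirected edge appears twice in A, hence the factor 0.5.
    return float(0.5 * (A * diff ** 2).sum())

A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])
beta = np.array([1.0, 3.0, 0.0])
print(network_penalty(A, beta), pairwise_penalty(A, beta))  # 13.0 13.0
```

Under this assumed form, the penalty is small when interacting genes have similar coefficients, which is one way a network term can discourage selecting redundant genes independently.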
8. The method for selecting genes based on an adaptive gene interaction regularization elastic network model according to claim 7, wherein in the step 7, the expression of the adaptive gene interaction regularization elastic network model is:
Figure FDA0003221693700000031
wherein F(X, A, β) represents the adaptive gene interaction regularization elastic network model, X is the input matrix,
Figure FDA0003221693700000032
is the penalty term, and γ is a regularization parameter.
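To make the combination in claims 1 and 8 concrete, the sketch below minimizes an assumed objective: least squares loss, plus λ times a weighted elastic net term (α times the weighted L1 norm plus (1 − α) times the squared L2 norm), plus γ times a Laplacian network penalty. The exact expression in the patent is only available as a figure, so this form, the weights `w`, and all function names are assumptions. Because the L1 term is not differentiable, the sketch uses proximal gradient descent (a gradient step on the smooth part followed by soft-thresholding) rather than plain gradient descent.

```python
import numpy as np

def objective(beta, X, y, L, w, lam, alpha, gamma):
    """Assumed objective: loss + adaptive elastic net + network penalty."""
    resid = y - X @ beta
    enet = alpha * np.sum(w * np.abs(beta)) + (1 - alpha) * (beta @ beta)
    return resid @ resid + lam * enet + gamma * (beta @ L @ beta)

def soft_threshold(z, t):
    """Proximal operator of the weighted L1 norm."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def fit(X, y, A, w, lam=0.1, alpha=0.5, gamma=0.1, n_iter=500):
    """Proximal gradient descent on the assumed objective."""
    p = X.shape[1]
    L = np.diag(A.sum(axis=1)) - A             # graph Laplacian of A
    # Constant Hessian of the smooth (differentiable) part.
    H = 2 * (X.T @ X + lam * (1 - alpha) * np.eye(p) + gamma * L)
    step = 1.0 / np.linalg.eigvalsh(H).max()   # 1 / Lipschitz constant
    beta = np.zeros(p)
    for _ in range(n_iter):
        grad = H @ beta - 2 * X.T @ y          # gradient of the smooth part
        beta = soft_threshold(beta - step * grad, step * lam * alpha * w)
    return beta, L

# Synthetic data: only genes 0 and 1 drive the response.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 5))
y = 2.0 * X[:, 0] + 2.0 * X[:, 1]
A = np.zeros((5, 5))
A[0, 1] = A[1, 0] = 1.0                        # genes 0 and 1 interact
w = np.array([1.0, 1.0, 1.0, 1.0, 1e6])        # gene 4 treated as noise (huge weight)
beta_hat, L = fit(X, y, A, w)
selected = np.flatnonzero(beta_hat)            # genes with non-zero coefficients
```

Genes whose coefficients are driven exactly to zero are dropped, and the remaining indices in `selected` play the role of the selected characteristic genes. A very large penalty weight, as used for gene 4 here, forces the corresponding coefficient to zero.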
9. A gene selection system based on an adaptive gene interaction regularization elastic network model, comprising:
a gene importance assessment module for assessing the importance of each measured gene based on Wilcoxon rank sum test;
a gene importance quantification module for quantifying the importance of each measured gene;
a weighting module for adding an adaptive penalty weight to each measured gene according to its quantified degree of importance, and deleting noise genes based on the adaptive penalty weights to obtain characteristic genes;
a first construction module for introducing the adaptive penalty weights into a least squares loss function to construct an adaptive elastic network model;
a second construction module for constructing an adjacency matrix of the gene interaction network;
a third construction module for constructing a gene interaction network penalty based on the adjacency matrix;
a fourth construction module for combining the adaptive elastic network model with the gene interaction network penalty to construct the adaptive gene interaction regularization elastic network model;
and a gene obtaining module for solving for the optimal solution of the adaptive gene interaction regularization elastic network model based on a gradient descent algorithm and selecting genes based on the optimal solution.
CN202110959928.2A 2021-08-20 2021-08-20 Gene selection method and system based on adaptive gene interaction regularization elastic network model Active CN113838519B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110959928.2A CN113838519B (en) 2021-08-20 2021-08-20 Gene selection method and system based on adaptive gene interaction regularization elastic network model

Publications (2)

Publication Number Publication Date
CN113838519A true CN113838519A (en) 2021-12-24
CN113838519B CN113838519B (en) 2022-07-05

Family

ID=78961000

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110959928.2A Active CN113838519B (en) 2021-08-20 2021-08-20 Gene selection method and system based on adaptive gene interaction regularization elastic network model

Country Status (1)

Country Link
CN (1) CN113838519B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140270455A1 (en) * 2013-03-15 2014-09-18 International Business Machines Corporation Using RNAi Imaging Data For Gene Interaction Network Construction
CN113241122A (en) * 2021-06-11 2021-08-10 长春工业大学 Gene data variable selection and classification method based on fusion of adaptive elastic network and deep neural network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
刘匆提 et al.: "Application of penalized logistic regression methods to variable screening in SNP data", 《实用预防医学》 (Practical Preventive Medicine) *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117727372A (en) * 2023-12-25 2024-03-19 韶关学院 Data integration method and system based on regularization model
CN117727372B (en) * 2023-12-25 2024-05-17 韶关学院 Data integration method and system based on regularization model

Also Published As

Publication number Publication date
CN113838519B (en) 2022-07-05

Similar Documents

Publication Publication Date Title
CN111933212B (en) Clinical histology data processing method and device based on machine learning
CN112270666A (en) Non-small cell lung cancer pathological section identification method based on deep convolutional neural network
CN107025384A (en) A kind of construction method of complex data forecast model
CN114927162A (en) Multi-set correlation phenotype prediction method based on hypergraph representation and Dirichlet distribution
JP2007513391A (en) How to identify a subset of multiple components of a system
CN114093515A (en) Age prediction method based on intestinal flora prediction model ensemble learning
CN103793600B (en) Classifier model generating method for gene microarray data
Vengatesan et al. The performance analysis of microarray data using occurrence clustering
CN107480441B (en) Modeling method and system for children septic shock prognosis prediction
Schachtner et al. Knowledge-based gene expression classification via matrix factorization
CN115881232A (en) ScRNA-seq cell type annotation method based on graph neural network and feature fusion
CN113838519B (en) Gene selection method and system based on adaptive gene interaction regularization elastic network model
CN117520914A (en) Single cell classification method, system, equipment and computer readable storage medium
CN117409962B (en) Screening method of microbial markers based on gene regulation network
TWI709904B (en) Methods for training an artificial neural network to predict whether a subject will exhibit a characteristic gene expression and systems for executing the same
CN117423391A (en) Method, system and equipment for establishing gene regulation network database
DeTomaso et al. Identifying informative gene modules across modalities of single cell genomics
Marshall et al. Discriminant analysis for longitudinal data with multiple continuous responses and possibly missing data
US12060578B2 (en) Systems and methods for associating compounds with physiological conditions using fingerprint analysis
EP2710152A1 (en) Computer-implemented method and system for detecting interacting dna loci
CN115662504A (en) Multi-angle fusion-based biological omics data analysis method
JP5852902B2 (en) Gene interaction analysis system, method and program thereof
CN113971984A (en) Classification model construction method and device, electronic equipment and storage medium
Priyadharshini et al. An Optimized Feature Selection Method for High Dimensional Data
Joshi et al. Classification and prediction of disease classes using gene microarray data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant