CN109817337B - Method for evaluating channel activation degree of single disease sample and method for distinguishing similar diseases - Google Patents

Method for evaluating channel activation degree of single disease sample and method for distinguishing similar diseases

Info

Publication number
CN109817337B
Authority
CN
China
Prior art keywords
genes
gene
degree
disease sample
sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910091441.XA
Other languages
Chinese (zh)
Other versions
CN109817337A (en)
Inventor
李敏
李幸一
王建新
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Central South University
Original Assignee
Central South University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Central South University filed Critical Central South University
Priority to CN201910091441.XA priority Critical patent/CN109817337B/en
Publication of CN109817337A publication Critical patent/CN109817337A/en
Application granted granted Critical
Publication of CN109817337B publication Critical patent/CN109817337B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Abstract

The invention discloses a method for evaluating the pathway activation degree of a single disease sample and a method for distinguishing similar diseases. A fully connected network is constructed for each pathway: the edges originally present in the pathway are taken as its important edges, and the added edges as its background edges; genes present in the pathway are taken as important genes, and all other genes as background genes. For each edge in the fully connected network, the difference value between the disease sample and the normal samples is calculated, together with the significance of that difference value. For each gene, the fold change of its expression value between the disease sample and the normal samples is calculated. The enrichment of the important genes and important edges in the gene and edge rankings of each fully connected network is then calculated and taken as the activation degree of the corresponding pathway. Similar diseases are distinguished by these pathway activation degrees. The invention can effectively calculate the activation degree of each pathway in a single disease sample, converting the high-dimensional, small-sample gene expression matrix of the disease samples into a matrix of pathway activation degrees, which distinguishes similar diseases with high accuracy.

Description

Method for evaluating channel activation degree of single disease sample and method for distinguishing similar diseases
Technical Field
The invention belongs to the field of bioinformatics and relates to a method for evaluating the pathway activation degree of a single disease sample and a method for distinguishing similar diseases.
Background
Studies have shown that genes and gene products do not act in isolation but cooperate within complex, interrelated networks. Common network-structured biological data include pathways, gene transcription regulatory networks, and protein interaction networks. Pathways reflect biological processes in cells, such as metabolism, signal transduction, and the growth cycle, so mining effective biological information in combination with pathway data is important for revealing the molecular mechanisms of organisms at the functional level.
The occurrence and development of diseases are often closely related to the dysregulation of important pathways, so identifying dysregulated pathways and quantifying the extent of their dysregulation is of great significance for disease research.
Pathway activity can be used to measure the degree of dysregulation of a pathway. Furthermore, although similar complex diseases share similar clinical symptoms, their underlying mechanisms differ, so the activation state of pathways can serve as an indicator for distinguishing them. Several models and methods exist for assessing pathway activation during disease development; they differ in how they define and calculate it. For example, Han et al. [1] proposed PROPS, which calculates the degree of pathway activation using a Gaussian Bayesian network. Young and Craft [2] provided three methods: PCA, NTC, and GED. PCA applies principal component analysis to the gene expression data of each pathway and takes the extracted principal components as the activation degree of the pathway. NTC calculates, from the gene expression data of each pathway, the Euclidean distance between a disease sample and the normal samples and takes it as the activation degree of the pathway. GED scores the genes of each pathway whose expression is distributed differently between normal samples and disease samples, and defines the pathway activation feature from these gene scores. Considering the specific state of a single disease sample from the pathway perspective is crucial for revealing the molecular mechanisms of complex diseases at the system level, yet none of the current models and methods does so.
In addition, several models and methods are available for distinguishing similar diseases. For example, Winter et al. [3] proposed an improved PageRank method (NetRank), in which genes are ranked according to the ranking of their neighbor nodes in a network and the top-ranked genes are extracted as features for distinguishing similar diseases. Cun and Fröhlich [4] proposed stSVM, a feature selection method based on a support vector machine, which extracts effective gene markers as features for distinguishing similar diseases. Zhang et al. [5] proposed CNS, a framework for extracting functional features, which uses a flow balance model to aggregate genes enriched for the same function, thereby obtaining the functional modules that best separate two similar diseases and extracting them as features. However, the classification accuracy achieved with the features extracted by these methods still needs to be improved.
Therefore, there is a need for a method that evaluates the pathway activation degree of a single disease sample and effectively distinguishes similar diseases.
[1] Han, L. et al. A probabilistic pathway score (PROPS) for classification with applications to inflammatory bowel disease. Bioinformatics, 2017; 34(6): 985-993.
[2] Young, M.R. and Craft, D.L. Pathway-informed classification system (PICS) for cancer analysis using gene expression data. Cancer Informatics, 2016; 15: 151-161.
[3] Winter, C., Kristiansen, G., Kersting, S., et al. Google goes cancer: improving outcome prediction for cancer patients by network-based ranking of marker genes. PLoS Computational Biology, 2012; 8(5): e1002511.
[4] Cun, Y. and Fröhlich, H. Network and data integration for biomarker signature discovery via network smoothed t-statistics. PLoS ONE, 2013; 8(9): e73074.
[5] Zhang, C., Liu, J., Shi, Q., et al. Comparative network stratification analysis for identifying functional interpretable network biomarkers. BMC Bioinformatics, 2017; 18(3): 48.
Disclosure of Invention
The technical problem to be solved by the invention is, in view of the shortcomings of the prior art, to provide a method for evaluating the pathway activation degree of a single disease sample and a method for distinguishing similar diseases. The pathway activation degrees of a disease sample serve as features that effectively distinguish similar diseases, and classifying similar diseases on the basis of these features achieves high accuracy.
In order to solve the technical problems, the technical scheme adopted by the invention is as follows:
A method for evaluating the pathway activation degree of a single disease sample, wherein the activation degree of each pathway comprises an edge activation degree and a gene activation degree, and wherein, for each pathway in the disease sample, the evaluation comprises the following steps:
Step 1: for all genes in the pathway, if two genes have no edge between them, add an edge, so that the pathway becomes a fully connected network (i.e., a network with an edge between every pair of nodes);
take the edges originally present in the pathway as important edges, and the added edges as background edges;
take genes present in the pathway as important genes, and genes not present in the pathway (genes present in other pathways in the disease sample) as background genes;
Step 2: for each edge in the fully connected network, calculate, based on n normal samples, the Pearson correlation coefficient of the expression values of the two genes it connects across the n samples, denoted PCC_n; add the single disease sample to the n normal samples and calculate the Pearson correlation coefficient of the expression values of the two genes across the n+1 samples, denoted PCC_{n+1}; take the difference ΔPCC_n = PCC_{n+1} - PCC_n as the difference value of the edge between the disease sample and the normal samples; and evaluate the significance of the difference value;
Step 3: calculate, for each gene in the fully connected network, the fold change of its expression value between the disease sample and the n normal samples;
Step 4: rank all edges in the fully connected network by the significance of their difference values, and rank all genes in the fully connected network by their fold changes;
Step 5: from the rankings, calculate the enrichment of the important edges/genes (positive labels) in the edge/gene ranking of the fully connected network, and take this enrichment as the edge/gene activation degree of the corresponding pathway.
Further, in step 2, the Pearson correlation coefficient PCC_n of the expression values, across the n samples, of the two genes connected by each edge is calculated as:
$$PCC_n = \frac{\mathrm{cov}_n(x_1, x_2)}{\sigma_{x_1}\,\sigma_{x_2}}$$
where x_1 and x_2 denote the expression values of the two genes connected by the edge across the n samples, cov_n(x_1, x_2) denotes the covariance of x_1 and x_2, and σ_{x_1} and σ_{x_2} denote the standard deviations of x_1 and x_2, respectively.
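The correlation step can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions rather than the patented implementation: the function name `pearson`, the toy expression vectors, and the choice to return 0 for a gene with constant expression are all hypothetical.

```python
import math

def pearson(x1, x2):
    """Pearson correlation of two equal-length expression vectors."""
    n = len(x1)
    if n != len(x2) or n < 2:
        raise ValueError("need two vectors of equal length >= 2")
    m1 = sum(x1) / n
    m2 = sum(x2) / n
    # Covariance and standard deviations use the same 1/n normalisation,
    # so the normalisation cancels in the ratio.
    cov = sum((a - m1) * (b - m2) for a, b in zip(x1, x2)) / n
    sd1 = math.sqrt(sum((a - m1) ** 2 for a in x1) / n)
    sd2 = math.sqrt(sum((b - m2) ** 2 for b in x2) / n)
    if sd1 == 0 or sd2 == 0:
        return 0.0  # assumption: a constant gene is treated as uncorrelated
    return cov / (sd1 * sd2)

# Hypothetical expression values of two genes in n = 4 normal samples.
gene_a = [1.0, 2.0, 3.0, 4.0]
gene_b = [2.0, 4.0, 6.0, 8.0]
pcc_n = pearson(gene_a, gene_b)
```

Because covariance and standard deviations share the same normalisation, the result does not depend on whether the population (1/n) or sample (1/(n-1)) convention is used.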
Further, in step 2, the significance of the difference value is evaluated with a Z-test:
[Formula image GDA0002570841090000042: Z statistic of ΔPCC_n]
where the Z value represents the significance of ΔPCC_n.
Further, in step 3, for any gene, the fold change FC between its expression value in the disease sample and its expression values in the n normal samples is calculated as:
$$FC = \frac{b}{\bar{a}}$$
where b denotes the expression value of the gene in the disease sample, and \bar{a} denotes the mean of the expression values of the gene in the n normal samples.
Further, in step 5, for any pathway, the activation degree of its edges/genes is calculated as:
$$AUC = \frac{\sum_{i \in I} rank_i - \frac{M(M+1)}{2}}{M \times N}$$
where I denotes the set of all important edges/genes in the fully connected network, rank_i denotes the position of the i-th edge/gene of I in the ascending ordering of step 4, M denotes the total number of important edges/genes in the fully connected network, and N denotes the total number of background edges/genes. The formula uses the AUC to measure, separately for edges and for genes, how strongly the important edges/genes (positive labels) are enriched in the edge/gene ranking of the fully connected network, and this enrichment is taken as the activation degree of the pathway.
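As a worked illustration, assume that the enrichment is computed in the usual rank-sum form of the AUC (an assumption about the exact form of the formula) and take hypothetical numbers: a network with M = 2 important edges and N = 3 background edges, where the two important edges occupy positions 4 and 5 of the ascending ordering.

```latex
\[
\mathrm{AUC}
  = \frac{\sum_{i \in I} rank_i - \frac{M(M+1)}{2}}{M \times N}
  = \frac{(4 + 5) - \frac{2 \cdot 3}{2}}{2 \times 3}
  = \frac{9 - 3}{6}
  = 1
\]
```

If the two important edges instead occupied positions 1 and 2, the numerator would be 3 - 3 = 0 and the value would be 0; a value near 0.5 corresponds to important edges scattered evenly through the ranking.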
A method for distinguishing similar diseases, comprising the following steps:
first, calculating the pathway activation degrees of each disease sample according to the above method for evaluating the pathway activation degree of a single disease sample, and concatenating the edge activation degrees and gene activation degrees of all pathways of a single disease sample into one vector, which serves as the feature vector of that disease sample; the same dimension carries the same feature in the feature vectors of all disease samples, namely the edge activation degree or the gene activation degree of the same pathway;
second, training a classifier with the feature vectors of known disease samples as input and the classification labels of the known disease samples as output;
finally, inputting the feature vector of an unknown disease sample into the trained classifier to obtain the classification label of the unknown disease sample.
Further, the classifier is a random forest classifier.
Beneficial effects:
The method can effectively calculate the activation degree of each pathway in a single disease sample and converts the high-dimensional, small-sample gene expression matrix of the disease samples into a matrix of pathway activation degrees, which solves the problem that other feature extraction methods do not consider the specificity of a single disease sample. The calculated pathway activation degrees can be used to distinguish similar diseases with high accuracy.
Drawings
FIG. 1 is a block diagram of the present invention (PASS);
FIG. 2 is a graph comparing ROC curves and the area under them (AUC) for the methods of the present invention (PASS) and NetRank, stSVM, CNS, PCA, NTC, GED, PROPS;
FIG. 3 is a significance analysis of the differences in pathway activation between the samples of two similar diseases, based on the pathway activation degrees extracted by the present invention;
FIG. 4 is an enrichment analysis of known disease genes in the significantly differentially expressed pathways, based on the pathway activation degrees extracted by the present invention.
Detailed Description
As shown in FIG. 1, the invention provides a method for evaluating the pathway activation degree of a single disease sample, wherein the activation degree of each pathway comprises an edge activation degree and a gene activation degree. For each pathway in the disease sample, the evaluation proceeds as follows:
First, preprocessing of the pathway data
For all genes in a pathway, if two genes have no edge between them, an edge is added, so that the pathway becomes a fully connected network (i.e., a network with an edge between every pair of nodes);
the edges originally present in the pathway are taken as important edges, and the added edges as background edges;
genes present in the pathway are taken as important genes, and genes not present in the pathway (genes present in other pathways in the disease sample) as background genes.
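The preprocessing step can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the function name `build_full_network` and the toy gene names are hypothetical, and edges are treated as undirected, which is an assumption.

```python
from itertools import combinations

def build_full_network(pathway_genes, pathway_edges):
    """Split the edges of the fully connected network into two classes.

    Returns (important, background): edges originally in the pathway,
    and edges added only to make the network fully connected.
    """
    # Undirected edges are stored as frozensets so (A, B) equals (B, A).
    original = {frozenset(edge) for edge in pathway_edges}
    genes = sorted(set(pathway_genes))
    all_edges = {frozenset(pair) for pair in combinations(genes, 2)}
    important = all_edges & original   # self-loops drop out here
    background = all_edges - original
    return important, background

# Hypothetical pathway with four genes and two annotated interactions.
genes = ["A", "B", "C", "D"]
edges = [("A", "B"), ("B", "C")]
important_edges, background_edges = build_full_network(genes, edges)
```

For a pathway with g genes the fully connected network has g(g-1)/2 edges, so the toy pathway yields 6 edges, 2 important and 4 background.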
Second, calculating the significance of the edge differences
For each edge in the fully connected network, the Pearson correlation coefficient of the expression values of the two genes it connects is calculated across the n normal samples and denoted PCC_n. The single disease sample is then added to the n normal samples, and the Pearson correlation coefficient of the expression values of the two genes across the n+1 samples is calculated and denoted PCC_{n+1}. The difference ΔPCC_n = PCC_{n+1} - PCC_n is taken as the difference value of the edge between the disease sample and the normal samples, and the significance of ΔPCC_n is evaluated.
The Pearson correlation coefficient PCC_n of the expression values, across the n samples, of the two genes connected by each edge is calculated as:
$$PCC_n = \frac{\mathrm{cov}_n(x_1, x_2)}{\sigma_{x_1}\,\sigma_{x_2}}$$
where x_1 and x_2 denote the expression values of the two genes connected by the edge across the n samples, cov_n(x_1, x_2) denotes the covariance of x_1 and x_2, and σ_{x_1} and σ_{x_2} denote the standard deviations of x_1 and x_2, respectively;
The significance of ΔPCC_n is evaluated with a Z-test:
[Formula image GDA0002570841090000071: Z statistic of ΔPCC_n]
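The edge perturbation step can be sketched in Python as follows. The Z formula itself survives only as an image reference in this text, so the sketch assumes the form commonly used in sample-specific network analysis, Z = ΔPCC_n / ((1 - PCC_n^2) / (n - 1)); that form, the function names, and the toy data are assumptions, not statements of the patented method.

```python
import math

def pearson(x1, x2):
    """Pearson correlation of two equal-length expression vectors."""
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    ss1 = sum((a - m1) ** 2 for a in x1)
    ss2 = sum((b - m2) ** 2 for b in x2)
    if ss1 == 0 or ss2 == 0:
        return 0.0  # assumption: constant genes count as uncorrelated
    return cov / math.sqrt(ss1 * ss2)

def edge_perturbation(normal_a, normal_b, disease_a, disease_b):
    """Return (delta_pcc, z) for one edge and one disease sample."""
    n = len(normal_a)
    pcc_n = pearson(normal_a, normal_b)
    pcc_n1 = pearson(normal_a + [disease_a], normal_b + [disease_b])
    delta = pcc_n1 - pcc_n
    # ASSUMED Z form: delta divided by (1 - PCC_n^2) / (n - 1).
    denom = (1.0 - pcc_n ** 2) / (n - 1)
    if denom == 0:
        z = 0.0 if delta == 0 else math.copysign(math.inf, delta)
    else:
        z = delta / denom
    return delta, z

# Hypothetical expression of two genes in n = 5 normal samples.
normal_a = [1.0, 2.0, 3.0, 4.0, 5.0]
normal_b = [2.0, 1.0, 4.0, 3.0, 5.0]
outlier = edge_perturbation(normal_a, normal_b, 6.0, 0.0)   # breaks the trend
on_trend = edge_perturbation(normal_a, normal_b, 6.0, 6.0)  # follows the trend
```

A disease sample that breaks the correlation seen in the normal samples produces a large |Z|, while one that follows the normal trend produces a small |Z|.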
Third, calculating the differences of the nodes
For each gene, the fold change of its expression value between the single disease sample and the normal samples is calculated as:
$$FC = \frac{b}{\bar{a}}$$
where b denotes the expression value of the gene in the disease sample, and \bar{a} denotes the mean of the expression values of the gene in the n normal samples.
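The node step can be sketched in Python as follows. The formula survives only as an image reference in this text, so the sketch assumes the plain ratio of the disease value to the normal mean; a log-transformed or absolute variant would also fit the description. The function name and the toy data are hypothetical.

```python
def fold_change(disease_value, normal_values):
    """Ratio of a gene's disease-sample expression to its normal mean.

    ASSUMPTION: plain ratio b / mean(normal); the source text does not
    spell out the exact form of the formula.
    """
    mean_normal = sum(normal_values) / len(normal_values)
    if mean_normal == 0:
        raise ZeroDivisionError("mean normal expression is zero")
    return disease_value / mean_normal

# Hypothetical expression: gene -> (value in disease sample, normal values).
expression = {
    "G1": (8.0, [2.0, 4.0, 6.0]),
    "G2": (3.0, [3.0, 3.0, 3.0]),
    "G3": (1.0, [4.0, 4.0, 4.0]),
}
fc = {gene: fold_change(b, normals) for gene, (b, normals) in expression.items()}
# Genes in ascending order of fold change, ready for the ranking step.
ranked_genes = sorted(fc, key=fc.get)
```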
Fourth, evaluation of pathway activation
The activation degree of a pathway is calculated as:
$$AUC = \frac{\sum_{i \in I} rank_i - \frac{M(M+1)}{2}}{M \times N}$$
where I denotes the set of all important edges/genes in the fully connected network, rank_i denotes the position of the i-th edge/gene of I after all edges/genes are sorted in ascending order of the significance of the difference value/the fold change, M denotes the total number of important edges/genes in the fully connected network, and N denotes the total number of background edges/genes. The formula uses the AUC to measure, separately for edges and for genes (nodes), how strongly the important edges/genes are enriched in the edge and gene rankings of each fully connected network, and this enrichment is taken as the activation degree of the pathway.
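The enrichment calculation can be sketched in Python as follows. This is an illustration that assumes the rank-sum form of the AUC, (sum of ranks of important items - M(M+1)/2) / (M × N); the function name, the toy scores, and the way ties are broken (by sort order) are assumptions.

```python
def enrichment_auc(scores, important):
    """AUC-style enrichment of important items in an ascending ranking.

    scores: dict mapping an edge or gene to its score (for example a Z
        value for an edge or a fold change for a gene).
    important: set of items labelled as important (positive labels).
    """
    ordered = sorted(scores, key=scores.get)            # ascending order
    rank = {item: pos for pos, item in enumerate(ordered, start=1)}
    positives = [item for item in scores if item in important]
    m = len(positives)
    n = len(scores) - m
    if m == 0 or n == 0:
        raise ValueError("need at least one important and one background item")
    rank_sum = sum(rank[item] for item in positives)
    return (rank_sum - m * (m + 1) / 2) / (m * n)

# Hypothetical edge scores for one fully connected network.
edge_scores = {"e1": 0.1, "e2": 0.5, "e3": 0.9, "e4": 2.0}
top = enrichment_auc(edge_scores, {"e3", "e4"})     # important edges rank last
bottom = enrichment_auc(edge_scores, {"e1", "e2"})  # important edges rank first
mixed = enrichment_auc(edge_scores, {"e1", "e4"})
```

The same function serves both rankings: called with edge scores it gives the edge activation degree, and called with gene fold changes it gives the gene activation degree.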
Because the pathway activation degree is evaluated for each disease sample individually, the activation degree of every pathway in a single disease sample can be calculated, which solves the problem that other feature extraction methods do not consider the specificity of individual disease samples.
The invention also provides a method for distinguishing similar diseases, which comprises the following steps:
first, calculating the pathway activation degrees of each disease sample, and concatenating the edge activation degrees and gene activation degrees of all pathways of a single disease sample into one vector, which serves as the feature vector of that disease sample;
second, training a classifier with the feature vectors of known disease samples as input and the classification labels of the known disease samples as output;
finally, inputting the feature vector of an unknown disease sample into the trained classifier to obtain the classification label of the unknown disease sample.
The classifier may be a random forest classifier.
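Building the feature vectors can be sketched in Python as follows. The function name, the pathway labels, and the activation values are hypothetical; the only point illustrated is that every sample must use the same pathway order so that each dimension means the same thing across samples.

```python
def feature_vector(activations, pathway_order):
    """Concatenate (edge, gene) activation degrees in a fixed pathway order.

    activations: dict mapping pathway -> (edge_activation, gene_activation)
    pathway_order: list of pathways shared by all samples
    """
    vector = []
    for pathway in pathway_order:
        edge_activation, gene_activation = activations[pathway]
        vector.extend([edge_activation, gene_activation])
    return vector

# Hypothetical activation degrees for two samples and two pathways.
pathway_order = ["pathway_A", "pathway_B"]
sample_1 = {"pathway_A": (0.8, 0.6), "pathway_B": (0.4, 0.9)}
sample_2 = {"pathway_B": (0.5, 0.5), "pathway_A": (0.7, 0.2)}
features = [feature_vector(s, pathway_order) for s in (sample_1, sample_2)]
```

With P pathways each sample yields a vector of length 2P; the resulting rows, together with the disease labels, can then be passed to any classifier, for example a random forest.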
Fifth, experimental verification
To verify the effectiveness of the method, validation was performed on four data sets covering two similar forms of inflammatory bowel disease: Crohn's disease (regional enteritis) and ulcerative colitis. The four data sets, GSE9686, GSE3365, GSE36807, and GSE71730, were obtained from the GEO database (https://www.ncbi.nlm.nih.gov/geo/) and contain a total of 61 ulcerative colitis samples and 105 Crohn's disease samples. The human pathway data come from the KEGG database (https://www.kegg.jp/), with 294 pathways in total.
To evaluate the classification accuracy and functional interpretability of the method, the following three analyses were performed:
(1) Analysis of classification accuracy
This part analyzes all samples of the four data sets together. For each of the methods compared, namely the present invention (PASS), NetRank, stSVM, CNS, PCA, NTC, GED, and PROPS, a random forest classifier was built on the features extracted by that method and evaluated by three-fold cross-validation: the sample set was divided into 3 subsets, each subset served once as the validation set while the remaining 2 subsets formed the training set, giving 3 classifiers, each of which classified the samples of its validation set. The three-fold cross-validation was repeated 500 times, with a different partition of the sample set each time. The true positive rate (TPR) and false positive rate (FPR) were calculated from all classification results, and the ROC curve was drawn. The classification results were evaluated by the ROC curve and the AUC, which is the area under the ROC curve. The results are shown in FIG. 2, from which it can be seen that the AUC of the present invention is higher than that of the other methods.
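The repeated three-fold partitioning can be sketched in Python as follows. This is only an illustration of the splitting scheme: the function name, the seed, and the unstratified shuffling are assumptions, and the classifier training itself is left out.

```python
import random

def repeated_kfold(n_samples, k=3, repeats=500, seed=0):
    """Yield (training, validation) index lists for repeated k-fold CV.

    Each repeat reshuffles the samples, so the partition differs
    from one repeat to the next.
    """
    rng = random.Random(seed)
    indices = list(range(n_samples))
    for _ in range(repeats):
        rng.shuffle(indices)
        # Slicing copies, so later shuffles do not alter these folds.
        folds = [indices[i::k] for i in range(k)]
        for i in range(k):
            validation = sorted(folds[i])
            training = sorted(
                idx for j, fold in enumerate(folds) if j != i for idx in fold
            )
            yield training, validation

# 61 + 105 = 166 samples, as in the data sets described above;
# only 2 repeats are generated here to keep the example small.
splits = list(repeated_kfold(166, k=3, repeats=2, seed=1))
```

Each repeat contributes three (training, validation) pairs, and within a repeat every sample appears in exactly one validation set.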
(2) Analysis of the significance of pathway differences between the samples of two similar diseases
This part analyzes the samples of each of the four data sets separately. For each pathway, a t-test was used to determine whether its activation degree differs significantly between the samples of the two similar diseases in each data set. Specifically, the activation degree of the pathway in each disease sample is first calculated by the method of the invention; a t value representing the difference in the activation degree of the pathway between the two groups of disease samples is then calculated with the t-statistic formula; finally, the t critical value table is consulted, where the row (degrees of freedom) is the total number of samples of the two diseases in the data set minus 2, and the column heading P is read from the cell matching the t value. If P ≤ 0.05, the activation degree of the pathway differs significantly between the two similar diseases. The P values of all pathways are summarized in FIG. 3, which shows that most pathways have P ≤ 0.05, indicating that the activation degrees of most pathways differ significantly between the samples of the two similar diseases.
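The t value can be sketched in Python as follows. The text does not give the t formula; the sketch assumes the pooled-variance two-sample Student's t statistic, which is the variant whose degrees of freedom equal the total sample count minus 2. The function name and the toy activation values are hypothetical, and the P value lookup is left to a t table or a statistics library.

```python
import math
from statistics import mean, variance

def pooled_t(group_a, group_b):
    """Two-sample t statistic with pooled variance, plus degrees of freedom.

    ASSUMPTION: equal-variance Student's t test, inferred from the
    degrees of freedom n1 + n2 - 2 described in the text.
    """
    n1, n2 = len(group_a), len(group_b)
    df = n1 + n2 - 2
    pooled_var = (
        (n1 - 1) * variance(group_a) + (n2 - 1) * variance(group_b)
    ) / df
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (mean(group_a) - mean(group_b)) / standard_error
    return t, df

# Hypothetical activation degrees of one pathway in the two disease groups.
disease_1 = [0.6, 0.7, 0.8]
disease_2 = [0.3, 0.4, 0.5]
t_value, degrees_of_freedom = pooled_t(disease_1, disease_2)
```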
(3) Enrichment of known disease genes in the pathways differentially expressed between two similar diseases
This part analyzes the samples of each of the four data sets separately. The pathways with P ≤ 0.05 obtained in (2) are taken as differentially expressed pathways, and the enrichment of the known disease genes of the two similar diseases in these pathways is determined.
The P value for the enrichment of known disease genes in the differentially expressed pathways is calculated by a hypergeometric test:
$$P = 1 - \sum_{i=0}^{m-1} \frac{\binom{M}{i}\binom{N-M}{n-i}}{\binom{N}{n}}$$
where N is the number of genes in the pathways, M is the number of known disease genes, n is the number of genes in the differentially expressed pathways, and m is the number of known disease genes in the differentially expressed pathways. The smaller the P value, the higher the enrichment of the known disease genes in the differentially expressed pathways. The -log10 P results obtained on the four data sets are shown in FIG. 4, from which it can be seen that the -log10 P values are all greater than or equal to 1.3, i.e., the P values are all less than or equal to 0.05, which indicates that the known disease genes are highly enriched in the differentially expressed pathways.
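The hypergeometric test can be sketched in Python as follows. The sketch assumes the standard upper-tail form, i.e. the probability of observing at least m known disease genes among n drawn genes; the function name and the toy counts are hypothetical.

```python
from math import comb, log10

def hypergeom_pvalue(N, M, n, m):
    """Upper-tail hypergeometric P value, P(X >= m).

    N: number of genes in the pathways
    M: number of known disease genes among them
    n: number of genes in the differentially expressed pathways
    m: number of known disease genes in those pathways
    """
    total = comb(N, n)
    upper = min(M, n)
    # comb returns 0 when n - i exceeds N - M, so impossible terms vanish.
    tail = sum(comb(M, i) * comb(N - M, n - i) for i in range(m, upper + 1))
    return tail / total

# Hypothetical counts: 10 genes, 4 disease genes, 3 drawn, 2 of them disease genes.
p_value = hypergeom_pvalue(10, 4, 3, 2)
score = -log10(p_value)
```

Summing the upper tail directly is equivalent to one minus the lower-tail sum and avoids the cancellation that the subtraction can cause when P is very small.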
The results in FIG. 3 and FIG. 4 show that the pathway activation degrees of a single disease sample extracted by the method of the present invention effectively reflect the differences between similar diseases, so the two similar diseases can be effectively distinguished with the pathway activation degree calculation provided by the present invention.
The experimental results show that the method has good classification accuracy and stability.

Claims (4)

1. A method for evaluating the pathway activation degree of a single disease sample, characterized in that the activation degree of each pathway comprises an edge activation degree and a gene activation degree, and for each pathway in the disease sample, the evaluation comprises the following steps:
step 1, for all genes in the pathway, if two genes have no edge between them, adding an edge, so that the pathway is constructed into a fully connected network;
taking the edges originally present in the pathway as important edges, and taking the added edges as background edges;
taking genes present in the pathway as important genes and genes not present in the pathway as background genes;
step 2, for each edge in the fully connected network, calculating, based on n normal samples, the Pearson correlation coefficient of the expression values of the two genes it connects across the n samples, denoted PCC_n; adding the single disease sample to the n normal samples and calculating the Pearson correlation coefficient of the expression values of the two genes across the n+1 samples, denoted PCC_{n+1}; taking the difference ΔPCC_n = PCC_{n+1} - PCC_n as the difference value of the edge between the disease sample and the normal samples; and evaluating the significance of the difference value;
in step 2, the significance of the difference value is evaluated with a Z-test:
[Formula image FDA0002570841080000011: Z statistic of ΔPCC_n]
wherein the Z value represents the significance of ΔPCC_n;
step 3, calculating, for each gene in the fully connected network, the fold change of its expression value between the disease sample and the n normal samples;
in step 3, for any gene, the fold change FC between its expression value in the disease sample and its expression values in the n normal samples is calculated as:
$$FC = \frac{b}{\bar{a}}$$
wherein b denotes the expression value of the gene in the disease sample, and \bar{a} denotes the mean of the expression values of the gene in the n normal samples;
step 4, ranking all edges in the fully connected network by the significance of their difference values, and ranking all genes in the fully connected network by their fold changes;
step 5, calculating, from the rankings, the enrichment of the important edges/genes in the edge/gene ranking of the fully connected network, and taking this enrichment as the edge/gene activation degree of the corresponding pathway;
in step 5, for any pathway, the enrichment of the important edges/genes in the edge/gene ranking of the fully connected network is calculated as:
$$AUC = \frac{\sum_{i \in I} rank_i - \frac{M(M+1)}{2}}{M \times N}$$
wherein I denotes the set of all important edges/genes in the fully connected network, rank_i denotes the position of the i-th edge/gene of I in the ascending ordering of step 4, M denotes the total number of important edges/genes in the fully connected network, and N denotes the total number of background edges/genes; the formula uses the AUC to measure, separately for edges and for genes, how strongly the important edges/genes are enriched in the edge/gene ranking of the fully connected network, and this enrichment is taken as the activation degree of the pathway.
2. The method for evaluating the pathway activation degree of a single disease sample according to claim 1, wherein in step 2, the Pearson correlation coefficient PCC_n of the expression values, across the n samples, of the two genes connected by each edge is calculated as:
$$PCC_n = \frac{\mathrm{cov}_n(x_1, x_2)}{\sigma_{x_1}\,\sigma_{x_2}}$$
wherein x_1 and x_2 denote the expression values of the two genes connected by the edge across the n samples, cov_n(x_1, x_2) denotes the covariance of x_1 and x_2, and σ_{x_1} and σ_{x_2} denote the standard deviations of x_1 and x_2, respectively.
3. A method for distinguishing similar diseases, comprising the following steps:
first, calculating the pathway activation degrees of each disease sample according to the method for evaluating the pathway activation degree of a single disease sample as claimed in claim 1, and concatenating the edge activation degrees and gene activation degrees of all pathways of a single disease sample into one vector as the feature vector of the disease sample;
second, training a classifier with the feature vectors of known disease samples as input and the classification labels of the known disease samples as output;
finally, inputting the feature vector of an unknown disease sample into the trained classifier to obtain the classification label of the unknown disease sample.
4. The method for distinguishing similar diseases according to claim 3, wherein the classifier is a random forest classifier.
CN201910091441.XA 2019-01-30 2019-01-30 Method for evaluating channel activation degree of single disease sample and method for distinguishing similar diseases Active CN109817337B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910091441.XA CN109817337B (en) 2019-01-30 2019-01-30 Method for evaluating channel activation degree of single disease sample and method for distinguishing similar diseases

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910091441.XA CN109817337B (en) 2019-01-30 2019-01-30 Method for evaluating channel activation degree of single disease sample and method for distinguishing similar diseases

Publications (2)

Publication Number Publication Date
CN109817337A CN109817337A (en) 2019-05-28
CN109817337B true CN109817337B (en) 2020-09-08

Family

ID=66605807

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910091441.XA Active CN109817337B (en) 2019-01-30 2019-01-30 Method for evaluating channel activation degree of single disease sample and method for distinguishing similar diseases

Country Status (1)

Country Link
CN (1) CN109817337B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107203704A (en) * 2017-05-02 2017-09-26 温州大学 A kind of method that gene pathway is recognized based on GSA
CN107220526A (en) * 2017-05-02 2017-09-29 温州大学 A kind of method that gene pathway is recognized based on PADOG

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2439282A1 (en) * 2010-10-06 2012-04-11 bioMérieux Method for determining a biological pathway activity
SG11201502106SA (en) * 2012-09-18 2015-05-28 Agency Science Tech & Res Grouping for classifying gastric cancer
WO2014071316A1 (en) * 2012-11-02 2014-05-08 H. Lee Moffitt Cancer Center And Research Institute, Inc. In silico identification of cancer molecular signaling pathways and drug candidates
CN103093119A (en) * 2013-01-24 2013-05-08 南京大学 Method for recognizing significant biologic pathway through utilization of network structural information
KR101945093B1 (en) * 2014-05-30 2019-02-07 난토믹스, 엘엘씨 Systems and methods for comprehensive analysis of molecular profiles across multiple tumor and germline exomes
US10546019B2 (en) * 2015-03-23 2020-01-28 International Business Machines Corporation Simplified visualization and relevancy assessment of biological pathways
CN107133492B (en) * 2017-05-02 2020-08-25 温州大学 Method for identifying gene pathway based on PAGES
CN108763864B (en) * 2018-05-04 2021-06-29 温州大学 Method for evaluating state of biological pathway sample
CN108763862B (en) * 2018-05-04 2021-06-29 温州大学 Method for deducing gene pathway activity

Also Published As

Publication number Publication date
CN109817337A (en) 2019-05-28

Similar Documents

Publication Publication Date Title
Koo et al. A review for detecting gene-gene interactions using machine learning methods in genetic epidemiology
Wang et al. Guidelines for bioinformatics of single-cell sequencing data analysis in Alzheimer’s disease: review, recommendation, implementation and application
RU2517286C2 (en) Classification of samples data
CN111276252B (en) Construction method and device of tumor benign and malignant identification model
CN107025384A (en) Method for constructing a complex data prediction model
CN106203483A (en) Zero-shot image classification method based on a semantically correlated multi-modal mapping method
CN106202999A (en) Microbial high-throughput sequencing data analysis protocol based on tuple word frequencies at different scales
CN107463799B (en) Method for identifying DNA binding protein by interactive fusion feature representation and selective integration
Zhou et al. scAdapt: virtual adversarial domain adaptation network for single cell RNA-seq data classification across platforms and species
CN110010204B (en) Fusion network and multi-scoring strategy based prognostic biomarker identification method
CN108197431A (en) Analysis method and system for chromatin interaction differences
CN108537003B (en) Marker screening method based on univariate and paravariable
Waegeman et al. On the scalability of ordered multi-class ROC analysis
CN106874705A (en) Method for determining tumor markers based on transcriptome data
CN109817337B (en) Method for evaluating channel activation degree of single disease sample and method for distinguishing similar diseases
KR102376212B1 (en) Gene expression marker screening method using neural network based on gene selection algorithm
CN113192562B (en) Pathogenic gene identification method and system fusing multi-scale module structure information
CN111739581B (en) Comprehensive screening method for genome variables
Zhang et al. Predicting patient survival from longitudinal gene expression
JP3936851B2 (en) Clustering result evaluation method and clustering result display method
Gong et al. Interpretable single-cell transcription factor prediction based on deep learning with attention mechanism
CN107798217B (en) Data analysis method based on linear relation of feature pairs
KR102659915B1 (en) Method of gene selection for predicting medical information of patients and uses thereof
CN115497563B (en) Cancer driver gene identification method, system, storage medium and equipment
CN115881218B (en) Gene automatic selection method for whole genome association analysis

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant