CN114360642A - Cancer transcriptome data processing method based on gene co-expression network analysis

Info

Publication number
CN114360642A
Authority
CN
China
Prior art keywords
gene
genes
analysis
expression
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210040488.5A
Other languages
Chinese (zh)
Inventor
付聪
梁磊
张彦
易星丞
许彤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jilin Puchuan Bio Medicine Co ltd
Original Assignee
Jilin Puchuan Bio Medicine Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jilin Puchuan Bio Medicine Co ltd
Priority to CN202210040488.5A
Publication of CN114360642A

Abstract

A cancer transcriptome data processing method based on gene co-expression network analysis relates to the field of data processing and comprises: acquiring a raw data set; preprocessing the raw data; identifying differentially expressed genes; constructing a gene co-expression network; mining gene modules; analyzing the correlation between the gene modules and clinical indices; performing enrichment analysis on the gene modules; identifying key genes; exploring the functions of the key genes; and performing survival analysis on the key genes. The enrichment analysis results show that the gene modules obtained with the method are biologically meaningful, and verification of the key genes against the DisGeNET database shows that most of the key genes identified by the method are related to tumor diseases. The method therefore performs well in both gene module mining and key gene identification. It can serve as a tool for analyzing cancer transcriptome data, and its application offers a new direction for further understanding the pathogenesis of cancer.

Description

Cancer transcriptome data processing method based on gene co-expression network analysis
Technical Field
The invention relates to a gene data processing method, in particular to a cancer transcriptome data processing method based on gene co-expression network analysis.
Background
In recent years the prevalence of cancer has risen, and because such diseases are difficult to treat and highly prone to relapse, research on cancer has become increasingly important. If the functional gene modules of a cancer can be mined by bioinformatics methods and its key genes identified, the pathogenesis of the cancer can be better understood and clinical treatment can be supported.
With the rapid development of next-generation sequencing technology, gene expression data has grown explosively, and extracting hidden knowledge from this large volume of data has become one of the important tasks of the post-genomic era. At the same time, as research has progressed, it has gradually been recognized that biological factors do not act individually but cooperate with one another to carry out complex biological functions in the cellular environment. Converting biological data into biological networks with a suitable method, and then analyzing and mining those networks using graph theory and complex network theory, has therefore become an effective way to process massive biological data. A biological network is a network built from the biological elements involved in a scientific problem in the biological field: the nodes represent biological elements such as proteins and genes, and the edges represent biochemical, physical or functional interactions between those elements. The gene co-expression network is a common type of biological network, and its emergence has opened a new direction for the development of genomics.
Disclosure of Invention
In order to process cancer transcriptome data effectively, the invention provides a cancer transcriptome data processing method based on gene co-expression network analysis.
The technical scheme adopted by the invention to solve the technical problem is as follows:
The cancer transcriptome data processing method based on gene co-expression network analysis of the invention mainly comprises the following steps:
step one, acquiring a raw data set;
step two, preprocessing the raw data;
step three, identifying differentially expressed genes;
step four, constructing a gene co-expression network;
step five, mining gene modules;
step six, analyzing the correlation between the gene modules and clinical indices;
step seven, performing enrichment analysis on the gene modules;
step eight, identifying key genes;
step nine, exploring the functions of the key genes;
step ten, performing survival analysis on the key genes.
Further, in step one, the raw data set is derived from the TCGA database or the GEO database; the raw data set includes gene expression data from cancer tissue samples, gene expression data from paracancerous tissue samples, and the clinical data corresponding to each sample.
Further, in step two, low-expression genes are first filtered out, the samples are then hierarchically clustered, and outlier samples are deleted.
Further, in step three, all differentially expressed genes satisfying the defined condition are identified using the FC-t algorithm.
Further, in step four, pairwise Pearson correlation analysis is carried out on the genes based on the expression data of the differentially expressed genes in the samples; a limiting condition is set to screen all the relationships obtained, and two genes meeting the limiting condition are regarded as having a co-expression relationship; all genes having co-expression relationships, together with those relationships, are represented as a graph to obtain the gene co-expression network.
Further, in step five, the nodes of the gene co-expression network are clustered using 4 community detection algorithms to obtain communities composed of genes with similar functions, i.e. gene modules; the best module mining result is selected using "modularity" as the evaluation criterion.
Further, in step six, principal component analysis is performed on all gene expression data in a gene module, and the first principal component is defined as the module eigengene (ME) of that gene module; Pearson correlation analysis is carried out between the module eigengene of each gene module and the different clinical indices to obtain a correlation matrix of gene modules and clinical indices.
Further, in step seven, the genes in each gene module of interest are subjected to enrichment analysis against the biological processes, cellular components and molecular functions provided by the GO database, and also against the signaling pathways provided by the Reactome database.
Further, in step eight, the importance of every node in the gene co-expression network is scored using the PageRank algorithm, with the score based on network topology; the more important nodes in the gene co-expression network are thereby identified, and the genes corresponding to these nodes are the key genes.
Further, in step nine, diseases related to the key genes are searched in the DisGeNET database to explore the functions of the key genes.
Further, in step ten, survival analysis is carried out on the key genes using the online tool OncoLnc, and survival curves are plotted.
The invention has the beneficial effects that:
Complex network theory plays an important role in many disciplines, and in recent years its application in computer science, physics, sociology and other fields has been widely studied. An organism is a highly complex system: each of its biological processes requires the joint participation of many substances, and studying a single gene or protein rarely reveals the underlying molecular mechanism as a whole. Because of the complexity of cancer, existing bioinformatics analysis methods have difficulty analyzing and mining its transcriptome data effectively. The invention therefore applies complex network theory to biological research, specifically to the processing and analysis of cancer transcriptome data.
The invention provides a cancer transcriptome data processing method based on gene co-expression network analysis, which mainly comprises the following steps: acquiring a raw data set; preprocessing the raw data; identifying differentially expressed genes; constructing a gene co-expression network; mining gene modules; analyzing the correlation between the gene modules and clinical indices; performing enrichment analysis on the gene modules; identifying key genes; exploring the functions of the key genes; and performing survival analysis on the key genes. The GO/Reactome enrichment analysis results show that the gene modules obtained with the method are biologically meaningful, and verification of the key genes against the DisGeNET database shows that most of the key genes identified by the method are related to tumor diseases. The method therefore performs well in both gene module mining and key gene identification. It can serve as a tool for analyzing cancer transcriptome data, and its application offers a new direction for further understanding the pathogenesis of cancer.
Drawings
FIG. 1 is a flow chart of a cancer transcriptome data processing method based on gene co-expression network analysis according to the present invention.
FIG. 2 is a flow chart of data acquisition and preprocessing in example 1.
FIG. 3 is a hierarchical clustering tree of cancer tissue samples in example 1.
FIG. 4 is a flowchart of the identification of differentially expressed genes in example 1.
FIG. 5 is a volcano plot of differentially expressed genes in example 1.
FIG. 6 is a flow chart of the construction of the gene co-expression network in example 1.
FIG. 7 shows the gene co-expression network and several small sub-networks in example 1.
FIG. 8 is a flowchart of gene module mining in example 1.
FIG. 9 shows the results of the module mining of the eigenvector algorithm in example 1.
FIG. 10 is a flowchart of the correlation analysis between the gene modules and clinical markers in example 1.
FIG. 11 is a correlation matrix of gene modules and clinical indices in example 1.
FIG. 12 is a flow chart of GO/Reactome enrichment analysis of the gene module in example 1.
FIG. 13 shows the result of BP enrichment of the gene module m1 in example 1.
FIG. 14 shows the result of CC enrichment of gene module m1 in example 1.
FIG. 15 shows the MF enrichment result of gene module m1 in example 1.
FIG. 16 is a flowchart showing the identification of key genes in example 1.
FIG. 17 is a survival curve of the gene NAA40 in example 1.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings.
The invention relates to a cancer transcriptome data processing method based on gene coexpression network analysis, which specifically comprises the following steps as shown in figure 1:
Step one, acquiring the raw data set
The raw data set is obtained from the TCGA database (https://cancergenome.nih.gov/) or the GEO database (https://www.ncbi.nlm.nih.gov/geo/) and mainly includes gene expression data from cancer tissue samples, gene expression data from paracancerous tissue samples, and the clinical data corresponding to each sample.
Step two, preprocessing the raw data
First, low-expression genes are filtered out, i.e. genes whose maximum expression level (FPKM) in cancer tissues or paracancerous tissues is below a set threshold are deleted. The expression levels of the remaining genes in cancer tissues or paracancerous tissues are then used to hierarchically cluster the samples, and outlier samples are deleted, giving the data set used for further mining.
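The low-expression filter described above can be sketched as follows. This is a minimal illustration only: the function name `filter_low_expression`, the toy gene names and values, and the default cutoff of 1 FPKM (the value used in example 1) are assumptions, and the hierarchical clustering used to find outlier samples is not shown.

```python
def filter_low_expression(expr, min_max_fpkm=1.0):
    """Keep genes whose maximum FPKM across all samples reaches the cutoff.

    expr maps a gene ID to its list of FPKM values (one value per sample).
    """
    return {gene: values for gene, values in expr.items()
            if max(values) >= min_max_fpkm}


# Toy data (hypothetical gene IDs and expression values).
expr = {
    "geneA": [0.0, 0.3, 0.9],   # maximum 0.9, below the cutoff, removed
    "geneB": [2.9, 2.4, 2.1],   # maximum 2.9, kept
    "geneC": [0.0, 0.2, 20.1],  # maximum 20.1, kept
}
kept = filter_low_expression(expr)
print(sorted(kept))
```

A gene is kept as long as it is expressed in at least one sample, which is why geneC survives even though most of its values are near zero.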
Step three, identifying the differentially expressed genes
All differentially expressed genes that meet the defined condition are identified using the FC-t algorithm. The condition is: (FC ≥ threshold || FC ≤ threshold) && P ≤ threshold, where FC denotes the fold change in expression between the two groups, each threshold denotes the cutoff set for the corresponding quantity, and P denotes the statistical significance from the t-test.
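The screening condition can be illustrated with a small sketch. The patent does not spell out the internals of the FC-t algorithm, so the following assumes that FC is the ratio of group means and that P comes from a two-sided Welch t-test; the function names and toy values are hypothetical, and the P value is obtained by numerically integrating the Student t density, which is adequate for illustration but loses precision for extremely small P values.

```python
import math
from statistics import mean, variance


def t_two_sided_p(t, df, steps=2000):
    """Two-sided P value for a t statistic, by Simpson integration of the t density."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    upper = abs(t)
    h = upper / steps
    area = pdf(0.0) + pdf(upper)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * pdf(i * h)
    area *= h / 3  # probability mass between 0 and |t|
    return max(0.0, 1.0 - 2.0 * area)


def fc_t(cancer, normal):
    """Return (fold change, P value) for one gene; Welch's t-test is an assumption."""
    fc = mean(cancer) / mean(normal)
    va = variance(cancer) / len(cancer)
    vb = variance(normal) / len(normal)
    t = (mean(cancer) - mean(normal)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(cancer) - 1) + vb ** 2 / (len(normal) - 1))
    return fc, t_two_sided_p(t, df)


def is_differential(fc, p, fc_up=2.0, fc_down=0.5, p_max=0.05):
    """Apply the condition (FC >= fc_up or FC <= fc_down) and P <= p_max."""
    return (fc >= fc_up or fc <= fc_down) and p <= p_max


# Toy expression values for one gene (hypothetical).
cancer = [10.0, 12.0, 11.0, 13.0]
normal = [4.0, 5.0, 6.0, 5.0]
fc, p = fc_t(cancer, normal)
print(fc, p, is_differential(fc, p))
```

The sketch assumes the normal-group mean is non-zero and that both groups have non-zero variance; real data would need guards for those cases.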
Step four, constructing a gene co-expression network
Based on the expression data of the differentially expressed genes in the samples, Pearson correlation analysis is carried out on every pair of genes to obtain a Pearson correlation coefficient (PCC) and a P value. The limiting condition |PCC| ≥ threshold && P < threshold is then set to screen all the relationships obtained, and two genes meeting the limiting condition are regarded as having a co-expression relationship. Finally, all genes having co-expression relationships, together with those relationships, are represented as a graph, which is the gene co-expression network.
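A minimal sketch of the edge screening is given below. The function names and toy data are hypothetical, the default cutoff of 0.65 is the value used in example 1, and for brevity only the |PCC| condition is applied; the P value filter is omitted.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equally long numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def coexpression_edges(expr, pcc_min=0.65):
    """Return (gene1, gene2, PCC) for every gene pair with |PCC| >= pcc_min."""
    edges = []
    for g1, g2 in combinations(sorted(expr), 2):
        r = pearson(expr[g1], expr[g2])
        if abs(r) >= pcc_min:
            edges.append((g1, g2, r))
    return edges


# Toy expression profiles over five samples (hypothetical).
expr = {
    "g1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "g2": [2.0, 4.0, 6.0, 8.0, 10.0],  # rises with g1
    "g3": [5.0, 4.0, 3.0, 2.0, 1.0],   # falls as g1 rises
    "g4": [2.0, 5.0, 1.0, 4.0, 3.0],   # weakly related to the others
}
edges = coexpression_edges(expr)
print([(a, b) for a, b, _ in edges])
```

Because the condition uses the absolute value of the PCC, strongly negative correlations such as g1 and g3 also produce an edge.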
Step five, mining gene modules
The nodes (genes) of the gene co-expression network are clustered using 4 community detection algorithms (eigenvector, label-propagation, map-equation and edge-betweenness) to obtain communities composed of genes with similar functions, i.e. gene modules; the best module mining result is then selected using "modularity" as the evaluation criterion.
The 4 community detection algorithms (eigenvector, label-propagation, map-equation, edge-betweenness) are implemented using the functions leading.eigenvector.community(), label.propagation.community(), infomap.community() and edge.betweenness.community() of the R package "igraph", respectively.
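The selection by modularity can be sketched in a few lines. The example below does not run the community detection algorithms themselves; it assumes their outputs are already available as node-to-community mappings (the two candidate partitions and the toy graph are hypothetical) and computes the standard Newman modularity Q for an undirected, unweighted graph.

```python
def modularity(edges, membership):
    """Newman modularity Q of a partition of an undirected, unweighted graph."""
    m = len(edges)
    degree = {}
    internal = {}  # number of edges with both ends in the same community
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
        if membership[u] == membership[v]:
            c = membership[u]
            internal[c] = internal.get(c, 0) + 1
    degree_sum = {}  # total degree of the nodes in each community
    for node, d in degree.items():
        c = membership[node]
        degree_sum[c] = degree_sum.get(c, 0) + d
    return sum(internal.get(c, 0) / m - (ds / (2 * m)) ** 2
               for c, ds in degree_sum.items())


# Two triangles joined by a single edge (toy network).
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]

# Hypothetical results from two different community detection runs.
candidates = {
    "split": {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1},
    "merged": {"a": 0, "b": 0, "c": 0, "d": 0, "e": 0, "f": 0},
}
scores = {name: modularity(edges, part) for name, part in candidates.items()}
best = max(scores, key=scores.get)
print(scores, best)
```

The partition that separates the two triangles scores higher than the one that merges everything, so it would be the result carried forward.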
Step six, correlation analysis of the gene modules and clinical indices
Principal component analysis is performed on all gene expression data in a gene module, and the first principal component is defined as the module eigengene (ME) of that gene module. Pearson correlation analysis is then carried out between the module eigengene (ME) of each gene module and the different clinical indices, and the absolute value of the Pearson correlation coefficient (PCC) is taken to obtain the correlation matrix of gene modules and clinical indices.
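The module eigengene computation can be illustrated as follows. This is a sketch under stated assumptions: each gene is centered but not rescaled before the PCA, the leading component is found by power iteration starting from an all-ones vector (which works for the positively correlated toy genes used here but is not a general-purpose PCA), and all names and values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def module_eigengene(expr_rows, iterations=200):
    """First principal component scores (one per sample) of a gene module.

    expr_rows holds one list of expression values per gene.
    """
    # Center each gene so the PCA works on deviations from the gene mean.
    centered = [[v - sum(row) / len(row) for v in row] for row in expr_rows]
    n_genes, n_samples = len(centered), len(centered[0])
    # Gene-by-gene covariance matrix (up to a constant factor).
    cov = [[sum(a * b for a, b in zip(gi, gj)) for gj in centered]
           for gi in centered]
    # Power iteration converges to the leading eigenvector (gene loadings).
    v = [1.0] * n_genes
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(n_genes)) for i in range(n_genes)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Project every sample onto the loadings to obtain the eigengene.
    return [sum(v[g] * centered[g][s] for g in range(n_genes))
            for s in range(n_samples)]


# Toy module of three genes over five samples, and one clinical index.
module = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.0, 4.0, 6.0, 8.0, 11.0],
    [1.0, 3.0, 2.0, 5.0, 4.0],
]
clinical = [0, 0, 1, 1, 1]
me = module_eigengene(module)
abs_pcc = abs(pearson(me, clinical))
print(me, abs_pcc)
```

Repeating the last two lines for every module and every clinical index would fill the correlation matrix; taking the absolute value also removes the arbitrary sign of a principal component.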
Step seven, GO/Reactome enrichment analysis of the gene modules
The genes in each gene module of interest are subjected to enrichment analysis against the Biological Process (BP), Cellular Component (CC) and Molecular Function (MF) terms provided by the GO database (http://geneontology.org/), and also against the signaling pathways provided by the Reactome database (https://reactome.org/).
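The patent does not state which statistical test underlies the enrichment P values. A common choice for this kind of over-representation analysis is the one-sided hypergeometric test, sketched below as an assumption; the function name and the toy counts are hypothetical.

```python
from math import comb


def enrichment_p(module_size, term_size, overlap, background_size):
    """P(X >= overlap) when module_size genes are drawn from the background
    and term_size of the background genes are annotated to the term."""
    total = comb(background_size, module_size)
    upper = min(module_size, term_size)
    tail = sum(comb(term_size, k)
               * comb(background_size - term_size, module_size - k)
               for k in range(overlap, upper + 1))
    return tail / total


# Toy example: 10 background genes, 4 annotated to a term,
# a module of 3 genes of which all 3 carry the annotation.
p_value = enrichment_p(module_size=3, term_size=4, overlap=3, background_size=10)
print(p_value)
```

Terms or pathways would then be ranked by this P value, and the ones with the smallest values reported.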
Step eight, identifying key genes
The importance of every node in the gene co-expression network is scored using the PageRank algorithm, with the score based on network topology, so that the more important nodes in the gene co-expression network are identified; the genes corresponding to these nodes are the key genes. The top n genes by score can be selected as the key genes of the disease, with n chosen according to the situation.
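PageRank scoring on an undirected co-expression network can be sketched with a plain power iteration. The damping factor of 0.85, the fixed iteration count, the function names and the toy star-shaped network are assumptions; the patent does not give these parameters.

```python
def pagerank(edges, damping=0.85, iterations=100):
    """PageRank scores for an undirected graph given as a list of (u, v) edges."""
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    n = len(neighbors)
    rank = {node: 1.0 / n for node in neighbors}
    for _ in range(iterations):
        # Each node shares its current score equally among its neighbors.
        rank = {node: (1 - damping) / n
                + damping * sum(rank[nb] / len(neighbors[nb])
                                for nb in neighbors[node])
                for node in neighbors}
    return rank


def top_genes(rank, n):
    """Return the n highest-scoring genes in descending order of score."""
    return sorted(rank, key=rank.get, reverse=True)[:n]


# Toy network: one hub gene connected to four peripheral genes.
edges = [("hub", "a"), ("hub", "b"), ("hub", "c"), ("hub", "d")]
rank = pagerank(edges)
print(top_genes(rank, 1), rank["hub"])
```

Every node that appears in an edge has at least one neighbor, so no special handling of dangling nodes is needed in this undirected setting.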
Step nine, exploring the functions of the key genes
The functions of the key genes are explored by searching for diseases associated with each key gene in the DisGeNET database (http://www.disgenet.org/).
Step ten, survival analysis of the key genes
Survival analysis is performed on the key genes using the online tool OncoLnc (http://www.oncolnc.org/), and survival curves are plotted.
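The method relies on an online tool for this step and does not describe how the curve is computed. As an illustration of how a survival curve is typically estimated, the following sketch implements the Kaplan-Meier estimator; it is an assumption about the kind of curve plotted, and the function name and toy follow-up data are hypothetical. In practice patients would first be split into groups, for example by high and low expression of a key gene, and one curve estimated per group.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each time with a death.

    events[i] is 1 if the patient died at times[i] and 0 if censored.
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        leaving = sum(1 for ti in times if ti == t)
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # deaths and censored patients leave the risk set
    return curve


# Toy follow-up data in months (hypothetical); 0 marks a censored patient.
times = [5, 8, 8, 12, 15]
events = [1, 1, 0, 1, 0]
curve = kaplan_meier(times, events)
print(curve)
```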
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example 1 Breast invasive carcinoma transcriptome data analysis
The cancer transcriptome data processing method based on gene co-expression network analysis of the invention is used to analyze transcriptome data of breast invasive carcinoma, and specifically comprises the following steps:
(1) Acquisition and preprocessing of the raw data set
As shown in FIG. 2, this specifically includes the following steps:
① Expression profile data of breast invasive carcinoma (BRCA) tissues and paracancerous tissues are downloaded from the TCGA database (https:// cancer.
② Low-expression genes whose maximum expression level (FPKM) in breast invasive carcinoma tissues or paracancerous tissues is less than 1 are deleted, leaving 14,129 genes.
③ The expression levels of all genes remaining after filtering in breast invasive carcinoma tissues or paracancerous tissues are used to hierarchically cluster the samples; the hierarchical clustering tree is shown in FIG. 3. As can be seen from FIG. 3, there are 3 outlier samples: TCGA-DD-AAEB, TCGA-CC-5259, TCGA-FV-A4ZP. These are deleted to give the data set used for further analysis. Part of the preprocessed data (gene expression levels) is shown in Table 1.
TABLE 1 Gene expression data of breast invasive carcinoma tissue after preprocessing
Gene ID Sample 1 Sample 2 Sample 3 Sample 4 Sample 5
ENSG00000167578 2.982962631 2.426924178 2.180554626 1.704487843 1.574196206
ENSG00000078237 1.511416409 2.962567928 3.496769794 2.590545901 0.977175405
ENSG00000146083 15.42361467 34.18583752 7.12327477 6.727115362 4.062698315
ENSG00000198242 105.7124415 207.8535728 193.8028654 113.9189313 112.3564048
ENSG00000134108 17.91677888 19.23785333 34.42522038 26.4881835 14.67476604
ENSG00000167700 25.30139043 15.73939839 12.30061718 5.993611251 58.71591007
ENSG00000060642 5.208718456 6.704560923 6.293990173 6.504866534 11.32913478
ENSG00000166391 0 0.268748468 0.194918684 0.157939948 20.1794968
ENSG00000070087 2.812958236 6.293035032 17.21057879 11.38258759 0.345172284
ENSG00000153561 8.757826693 6.304640789 15.13777725 11.0352232 15.45512232
(2) Identification of differentially expressed genes
As shown in FIG. 4, this specifically includes the following steps:
① The FC-t algorithm is used to calculate the FC values and P values of all genes; part of the calculation results are shown in Table 2.
② The limiting condition (FC ≥ 2 || FC ≤ 0.5) && P ≤ 0.05 is set to identify differentially expressed genes; in total, 4130 up-regulated genes and 471 down-regulated genes are identified.
③ A volcano plot of the differentially expressed genes is drawn using the R package "ggplot2" to display the screening result visually; the volcano plot is shown in FIG. 5.
As can be seen from Table 2 and FIG. 5, a large number of genes are significantly differentially expressed in breast invasive carcinoma tissue compared with normal tissue.
TABLE 2 FC-t Algorithm calculation results
Gene ID FC P
ENSG00000146083 2.700996521 1.13E-40
ENSG00000198242 2.360743008 3.45E-33
ENSG00000167700 2.481208784 4.38E-28
ENSG00000166391 0.302473358 1.03E-17
ENSG00000127511 2.283066186 1.44E-46
ENSG00000064601 2.93550767 3.52E-58
ENSG00000227766 2.444028443 3.37E-09
ENSG00000008517 2.397033874 2.05E-13
ENSG00000070081 2.184524979 2.20E-27
ENSG00000275479 3.451166828 1.91E-23
(3) Construction of Gene Co-expression networks
As shown in FIG. 6, this specifically includes the following steps:
① For each differentially expressed gene, the Pearson correlation coefficient (PCC) and P value with every other differentially expressed gene are calculated; part of the calculation results are shown in Table 3.
② The limiting condition |PCC| ≥ 0.65 && P < 0.05 is set to screen the relationships obtained, and two genes meeting the limiting condition are regarded as having a co-expression relationship.
③ All genes having co-expression relationships, together with those relationships, are imported into Cytoscape for visualization, as shown in FIG. 7.
④ Based on the visualization result, the genes belonging to small sub-networks are deleted; the remaining large network is the gene co-expression network.
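In the embodiment the small sub-networks are removed by inspecting the Cytoscape visualization. One way to automate step ④, assuming that the "large network" corresponds to the largest connected component of the graph, is sketched below; the function name and toy edges are hypothetical.

```python
from collections import deque


def largest_component(edges):
    """Return the edges that belong to the largest connected component."""
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    seen = set()
    best = set()
    for start in neighbors:
        if start in seen:
            continue
        # Breadth-first search collects every node reachable from start.
        component = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nb in neighbors[node]:
                if nb not in component:
                    component.add(nb)
                    queue.append(nb)
        seen |= component
        if len(component) > len(best):
            best = component
    return [(u, v) for u, v in edges if u in best and v in best]


# Toy network: a chain of four genes plus one isolated pair.
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("x", "y")]
main_network = largest_component(edges)
print(main_network)
```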
TABLE 3 Pearson correlation analysis results
(Table 3 is provided as images in the original publication: Figure BDA0003470009370000081, Figure BDA0003470009370000091.)
(4) Mining of gene modules
As shown in FIG. 8, this specifically includes the following steps:
① The functions leading.eigenvector.community(), label.propagation.community(), infomap.community() and edge.betweenness.community() of the R package "igraph" are used to cluster the nodes (genes) of the gene co-expression network, yielding communities composed of genes with similar functions, i.e. gene modules.
② The modularity of the clustering results of the 4 community detection algorithms (eigenvector, label-propagation, map-equation and edge-betweenness) is calculated, and the result with the highest modularity is selected for further study. In this embodiment the eigenvector algorithm has the highest modularity, so its module mining result is selected for further study.
③ Communities containing too few genes are deleted (in this embodiment, communities with fewer than 50 genes), leaving 9 communities, which correspond to 9 gene modules. The module mining result of the eigenvector algorithm is shown in FIG. 9.
(5) Correlation analysis of the gene modules and clinical indices
As shown in FIG. 10, this specifically includes the following steps:
① Principal component analysis is performed on all gene expression data in each gene module to obtain the module eigengene (ME) of each gene module.
② Pearson correlation analysis is carried out between the module eigengene (ME) of each gene module and 4 clinical indices: event, T, N and M (where event represents the survival status of the patient, and T, N and M represent the tumor stage). The absolute value of the Pearson correlation coefficient (PCC) is taken to obtain the correlation matrix of gene modules and clinical indices, as shown in FIG. 11. As can be seen from FIG. 11, the gene modules m1, m2, m3 and m7 are highly correlated with the clinical indices.
(6) GO/Reactome enrichment analysis of gene modules
As shown in FIG. 12, this specifically includes the following steps:
① The genes contained in gene modules m1, m2, m3 and m6 are each subjected to enrichment analysis against the Biological Process (BP), Cellular Component (CC) and Molecular Function (MF) terms provided by the GO database, and the 10 terms with the smallest P values are selected for study. The enrichment analysis results for gene module m1 are shown in FIGS. 13-15.
② The genes contained in gene modules m1, m2, m3 and m6 are each subjected to enrichment analysis against the signaling pathways provided by the Reactome database, and the 10 pathways with the smallest P values are selected for study. The enrichment analysis results for gene module m1 are shown in Table 4.
The enrichment results of the gene modules are highly specific and mostly related to tumor diseases, which supports the reliability of the module mining results.
TABLE 4 Reactome enrichment results for Gene Module m1
Pathway ID Pathway name Gene count P
R-HSA-69278 Cell Cycle,Mitotic 64 1.11E-16
R-HSA-1640170 Cell Cycle 71 1.11E-16
R-HSA-453279 Mitotic G1 phase and G1/S transition 24 2.57E-12
R-HSA-73886 Chromosome Maintenance 19 6.22E-12
R-HSA-69205 G1/S-Specific Transcription 13 3.12E-11
R-HSA-69206 G1/S Transition 20 3.50E-10
R-HSA-68886 M Phase 32 4.48E-10
R-HSA-69190 DNA strand elongation 11 1.62E-09
R-HSA-73894 DNA Repair 28 2.67E-08
R-HSA-69242 S Phase 19 3.58E-08
(7) Identification of key genes
As shown in FIG. 16, this specifically includes the following steps:
① The importance of every gene in the gene co-expression network is scored on the basis of network topology using the PageRank algorithm.
② All genes are sorted in descending order of score.
③ The top 20 genes are selected as the key genes of breast invasive carcinoma. The 20 key genes in this embodiment are: FABP7, CXCL3, LOC284578, CAPN6, NRG2, HCFC1, ILF3, KANSL1, NAA40, NCOA6, PCDHB2, GRIK2, FRMD7, CCSER1, PCDHGA1, PCDHA1, LRRC37A6P, PCDHGA12, ZNF486 and PCDHGB5.
(8) Functional study of key genes
Each key gene is entered in turn into the DisGeNET database (http://www.disgenet.org/) to search for related diseases. The search results for gene NAA40 are shown in Table 5.
The DisGeNET search results show that most of the 20 key (hub) genes are related to tumor diseases, which supports the reliability of the cancer transcriptome data processing method based on gene co-expression network analysis provided by the invention.
TABLE 5 search results for Gene NAA40
(Table 5 is provided as an image in the original publication: Figure BDA0003470009370000111.)
(9) Survival analysis of Key genes
Survival analysis is performed on all key genes using the online tool OncoLnc (http://www.oncolnc.org/), with "BRCA" selected as the cancer type, and survival curves are plotted. The survival curve of gene NAA40 is shown in FIG. 17.
The survival analysis shows that the 20 key genes are significantly correlated with patient survival, which further supports the effectiveness of the cancer transcriptome data processing method based on gene co-expression network analysis provided by the invention in identifying key genes.
The foregoing is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make various modifications and refinements without departing from the principle of the present invention, and these modifications and refinements should also be regarded as falling within the protection scope of the present invention.

Claims (10)

1. A cancer transcriptome data processing method based on gene co-expression network analysis, characterized by comprising the following steps:
step one, acquiring a raw data set;
step two, preprocessing the raw data;
step three, identifying differentially expressed genes;
step four, constructing a gene co-expression network;
step five, mining gene modules;
step six, analyzing the correlation between the gene modules and clinical indices;
step seven, performing enrichment analysis on the gene modules;
step eight, identifying key genes;
step nine, exploring the functions of the key genes;
step ten, performing survival analysis on the key genes.
2. The cancer transcriptome data processing method based on gene co-expression network analysis as claimed in claim 1, wherein in step one, the raw data set is derived from the TCGA database or the GEO database; the raw data set includes gene expression data from cancer tissue samples, gene expression data from paracancerous tissue samples, and the clinical data corresponding to each sample.
3. The cancer transcriptome data processing method based on gene co-expression network analysis as claimed in claim 2, wherein in step two, low-expression genes are first filtered out, the samples are then hierarchically clustered, and outlier samples are deleted.
4. The cancer transcriptome data processing method based on gene co-expression network analysis as claimed in claim 3, wherein in step three, all differentially expressed genes satisfying the defined condition are identified using the FC-t algorithm.
5. The cancer transcriptome data processing method based on gene co-expression network analysis as claimed in claim 4, wherein in step four, pairwise Pearson correlation analysis is carried out on the genes based on the expression data of the differentially expressed genes in the samples; a limiting condition is set to screen all the relationships obtained, and two genes meeting the limiting condition are regarded as having a co-expression relationship; all genes having co-expression relationships, together with those relationships, are represented as a graph to obtain the gene co-expression network.
6. The cancer transcriptome data processing method based on gene co-expression network analysis as claimed in claim 5, wherein in step five, the nodes of the gene co-expression network are clustered using 4 community detection algorithms to obtain communities composed of genes with similar functions, i.e. gene modules; and the best module mining result is selected using "modularity" as the evaluation criterion.
7. The cancer transcriptome data processing method based on gene co-expression network analysis as claimed in claim 6, wherein in step six, principal component analysis is performed on all gene expression data in a gene module, and the first principal component is defined as the module eigengene of that gene module; Pearson correlation analysis is carried out between the module eigengene of each gene module and the different clinical indices to obtain a correlation matrix of gene modules and clinical indices.
8. The cancer transcriptome data processing method based on gene co-expression network analysis as claimed in claim 7, wherein in step seven, the genes in each gene module of interest are subjected to enrichment analysis against the biological processes, cellular components and molecular functions provided by the GO database, and also against the signaling pathways provided by the Reactome database.
9. The cancer transcriptome data processing method based on gene co-expression network analysis as claimed in claim 8, wherein in step eight, the importance of every node in the gene co-expression network is scored using the PageRank algorithm, with the score based on network topology; the more important nodes in the gene co-expression network are thereby identified, and the genes corresponding to these nodes are the key genes.
10. The cancer transcriptome data processing method based on gene co-expression network analysis as claimed in claim 9, wherein in step nine, diseases related to the key genes are searched in the DisGeNET database to explore the functions of the key genes; and in step ten, survival analysis is carried out on the key genes using the online tool OncoLnc, and survival curves are plotted.
CN202210040488.5A 2022-01-14 2022-01-14 Cancer transcriptome data processing method based on gene co-expression network analysis Pending CN114360642A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210040488.5A CN114360642A (en) 2022-01-14 2022-01-14 Cancer transcriptome data processing method based on gene co-expression network analysis

Publications (1)

Publication Number Publication Date
CN114360642A (en) 2022-04-15

Family

ID=81109997

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210040488.5A Pending CN114360642A (en) 2022-01-14 2022-01-14 Cancer transcriptome data processing method based on gene co-expression network analysis

Country Status (1)

Country Link
CN (1) CN114360642A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115064213A (en) * 2022-08-18 2022-09-16 神州医疗科技股份有限公司 Multi-omics joint analysis method and system based on tumor samples
CN115064213B (en) * 2022-08-18 2022-11-01 神州医疗科技股份有限公司 Multi-omics joint analysis method and system based on tumor samples

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination