CN112233796A - Research method of molecular subtype for enhancing immunity in early liver cancer - Google Patents

Research method of molecular subtype for enhancing immunity in early liver cancer

Publication number
CN112233796A
Authority
CN
China
Prior art keywords
immune
samples
genes
expression
subtypes
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011101709.2A
Other languages
Chinese (zh)
Inventor
祝让飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Mugu Technology Co ltd
Original Assignee
Hangzhou Mugu Technology Co ltd
Application filed by Hangzhou Mugu Technology Co ltd
Priority to CN202011101709.2A
Publication of CN112233796A

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/30 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for calculating health indices; for individual health risk assessment
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding

Landscapes

  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Medical Informatics (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Public Health (AREA)
  • General Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Epidemiology (AREA)
  • Biophysics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Biotechnology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Bioethics (AREA)
  • Artificial Intelligence (AREA)
  • Chemical & Material Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Biomedical Technology (AREA)
  • Pathology (AREA)
  • Primary Health Care (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention relates to the technical field of molecular immunity and discloses a research method for immune-enhanced molecular subtypes in early liver cancer, comprising the following steps: S1, downloading data; S2, preprocessing data: for the TCGA transcriptome data, converting the expression profile into a TPM expression profile; for the metagene score, taking the median expression level of the genes of each metagene in a sample as the score of that sample for the immune metagene; for the MCP count of immune-related cells, calculating the immune cell scores of the samples using the R software package MCPcounter; and scoring the immune cells of the samples. The method collects data from the TCGA transcriptome, performs data preprocessing and immune gene screening, and finally uses R software packages to analyze the relationship between the molecular subtypes and clinical characteristics, immunity, mutation and the like, thereby enabling study of the expression pattern and potential clinical relevance of immune checkpoint molecules in liver cancer.

Description

Research method of molecular subtype for enhancing immunity in early liver cancer
Technical Field
The invention relates to the technical field of molecular immunity, in particular to a research method of an immune-enhanced molecular subtype in early liver cancer.
Background
Immune processes play a key role in the carcinogenesis and progression of solid tumors. It is believed that newly transformed cells can initially be eliminated by the host immune system through innate and adaptive immunity, for example inflammation and immune surveillance in cancer and memory responses of natural killer cells. Destroyed cells release various tumor antigens, which further stimulate adaptive immunity, e.g., activated T cells and B cells. In later stages, cancer cells develop mechanisms to escape immune surveillance. For example, cancer cells secrete soluble cytokines or chemokines that recruit suppressor cells, such as regulatory T cells (Tregs) and myeloid-derived suppressor cells, into the tumor microenvironment. Tumors also exploit the co-stimulatory and co-inhibitory signals that regulate T cell activation: PD1 signaling contributes to the suppression of T cells, and the ligands CD86/CD80 can act as co-stimulatory or co-inhibitory effectors depending on whether they bind CD28 or CTLA4, an essential immune checkpoint for T cell activation. Upregulation of these checkpoint molecules leads to an immunosuppressive tumor microenvironment. Targeting the co-inhibitory receptors (PD1 and CTLA4) with immunotherapeutic agents can increase immune responses by blocking immunosuppressive mechanisms and provides superior therapeutic benefit in several cancers.
To date, the expression pattern and potential clinical relevance of immune checkpoint molecules in liver cancer remain unclear, and it is therefore necessary to study them.
Disclosure of Invention
(I) Technical problem to be solved
In view of the deficiencies of the prior art, the invention provides a research method for immune-enhanced molecular subtypes in early liver cancer. The method screens molecular subtypes using immune genes and analyzes them using R software packages, thereby enabling study of the expression pattern and potential clinical relevance of immune checkpoint molecules in liver cancer, and solves the problem raised in the Background.
(II) Technical scheme
To screen molecular subtypes using immune genes and analyze them using R software packages, and thereby study the expression pattern and potential clinical relevance of immune checkpoint molecules in liver cancer, the invention provides the following technical scheme: a research method for immune-enhanced molecular subtypes in early liver cancer, comprising the following steps:
S1, downloading data;
S2, preprocessing data:
transcriptome data of TCGA:
converting the expression profile into a TPM expression profile;
metagene score:
for each immune metagene, taking the median expression level of its genes in a sample as the score of that sample for the metagene;
MCP count of immune-related cells:
calculating the immune cell scores of the samples using the R software package MCPcounter;
immune cell scores of the samples:
downloading the immune cell scores corresponding to the liver cancer samples from the TIMER database;
immune and stromal scores of the samples:
calculating the immune and stromal scores of each sample using the R software package estimate;
S3, screening of expression profiles of immune genes:
first, the expression profile samples are matched with the clinical follow-up samples and only samples present in both are kept; Stage I and Stage II samples are selected as the sample set included in the study; the immune genes with detectable expression are extracted from the expression profile, finally yielding 778 genes and 257 samples;
S4, screening molecular subtypes:
consensus clustering based on the immune gene expression profiles is performed using the R software package ConsensusClusterPlus to screen molecular subtypes; the similarity distance between samples is calculated using the Euclidean distance and clustering is performed using K-means; the optimal number of clusters is determined using the cumulative distribution function (CDF); the clustering result is stable at 5 clusters, and after inspecting the CDF delta area curve, k = 5 is finally selected, giving 5 molecular subtypes (C1 to C5); the clustering significance of the five subtypes is analyzed using the R software package sigclust, which finds significant differences in expression distribution (p < 0.05) for C1-vs-C5, C4-vs-C5 and C2-vs-C3, and marginal significance for C1-vs-C2, C1-vs-C3, C2-vs-C4 and C3-vs-C5;
S5, expression profile clustering analysis:
according to the consensus clustering result, the stable clustering with k = 5 is selected, in which the 257 tumor samples are assigned to 5 classes; subtype difference analysis is then performed on the expression profiles of the 778 immune genes, using the Kolmogorov-Smirnov test to screen, for each subtype, genes that are highly expressed in that subtype relative to the samples of the other subtypes; the expression profiles of the top 100 most significantly highly expressed genes of each subtype (all differential genes where there are fewer than 100) are used for principal component analysis (PCA), and a scatter plot of the first two principal components shows that the five subtypes are clearly separated; a heat map drawn from the expression profiles of these genes shows clear boundaries between the subtypes and distinct expression patterns;
S6, relationship to clinical characteristics:
the relationships between the five subtypes and Age, Gender, T, N, M and Stage are analyzed respectively; N and M each have only one category and cannot be compared; comparing Age across the five subtypes shows that the age distribution differs between classes, with the mean age lowest in C2 and highest in C5; the age difference between the groups is analyzed with 60 years as the cutoff (chi-square test, p = 0.00019); for Stage, the proportion of Stage I is significantly lower in C2 (chi-square test, p < 0.001) and the proportion of Stage II is significantly lower in C3 (chi-square test, p < 0.001); for Grade, the proportion of G1 is significantly higher in C3 and the proportion of G3+G4 is significantly higher in C2 (chi-square test, p < 0.001); for T stage, the proportion of T2 is significantly higher in C2 (p < 0.001); the proportion of women is significantly higher in C1 and the proportion of men is significantly higher in C2 (p < 0.001); the relationships between the five subtypes and HBV/HCV status and the molecular subtypes (iCluster) reported in a previous comprehensive genomic analysis of liver cancer are also analyzed: the proportion of iCluster1 is significantly higher in C2 and the proportion of iCluster3 is significantly higher in C5 (p < 0.001), while the proportions of HBV and HCV do not differ significantly between classes;
S7, relationship to immunity:
13 immune metagenes, tumor immune component scores (stromal, immune, tumor purity), scores of six types of immune infiltrating cells, and MCP counts of 10 types of immune-related cells are collected from previous studies, and the relationships between these four groups of immune-related scores and the five subtypes are analyzed respectively; most of the 13 immune metagenes are highly expressed in C4 and lowly expressed in C5; the immune score of the C4 samples is significantly higher than that of the other groups, the stromal score of the C3 group is significantly higher than that of the other groups, and the immune score of the C5 group is the lowest; among the 10 types of immune-related cells, T cells and CD8 T cells are significantly higher in the C4 group than in the other groups, and the immune cell scores are lowest in the C5 group; B cells, CD8 T cells, neutrophils, dendritic cells and macrophages are significantly higher in the C4 group and lowest in the C5 group; in summary, most immune-related signatures are consistently up-regulated in C4 and down-regulated in C5 relative to the other groups;
S8, analyzing prognosis difference:
Kaplan-Meier analysis is used to compare disease-free survival and progression-free survival among the five subtypes; the five subtypes differ significantly in prognosis, with C5 having the worst prognosis and C3 the best; notably, C4 has the highest immune score but not the best prognosis; the C3 and C4 groups, which have higher immune scores, are combined and disease-free survival and progression-free survival are analyzed further, revealing significant prognostic differences; disease-free survival and progression-free survival differ significantly between the high-immune-score subtypes C3 and C4 and the low-immune-score subtype C5, and the immune-enhanced subtypes C3 and C4 have better clinical outcomes;
S9, relationship to mutation:
the relationship between mutations of the three genes TP53, CTNNB1 and AXIN1 and the five subtypes is analyzed: first, the mutation data of TP53, CTNNB1 and AXIN1 are extracted from the MuTect-processed SNP data of TCGA; the proportions of mutant and non-mutant samples of TP53, CTNNB1 and AXIN1 in each of the five subtypes are analyzed respectively, and the distribution of the total number of mutated genes across the five subtypes is analyzed.
S10, expression relationship to immune checkpoint genes:
the expression of 8 immune checkpoint genes in the five subtypes is analyzed, and the expression distribution of the 8 immune checkpoint genes is summarized.
S11, WGCNA analysis and mining:
expression profile data of the immune-related genes that differ between the five subtypes, 492 in total, are obtained; the distance between transcripts is calculated using the Pearson correlation coefficient, and a weighted co-expression network is constructed using the R software package WGCNA with a soft threshold of 3 to screen co-expression modules; studies show that a co-expression network conforms to a scale-free network, that is, the logarithm log(k) of the connectivity k of a node is negatively correlated with the logarithm log(P(k)) of the probability of a node having that connectivity; β = 3 is selected, the expression matrix is converted into an adjacency matrix, and the adjacency matrix is then converted into a topological overlap matrix (TOM); based on the TOM, the genes are clustered using average-linkage hierarchical clustering, and, following the hybrid dynamic tree cut criterion, the minimum number of genes in each gene (lncRNA) network module is set to 30; after the gene modules are determined by the dynamic tree cut method, the eigengene of each module is calculated in turn, the modules are then clustered, and modules that are close to each other are merged into new modules, with height = 0.25, deepSplit = 2 and minModuleSize = 30, giving 7 modules in total; the transcripts in each module are counted, the correlations between the eigengenes of the 6 modules and the five subtypes are calculated respectively, the functions of the genes in the five modules are analyzed, KEGG enrichment analysis is performed using the R software package clusterProfiler with a significance threshold of FDR < 0.05, and the relationships between the pathways enriched in the modules are analyzed.
S12, external data set verification:
genes in the gene co-expression modules (blue, brown) closely related to subtypes C3 and C4 are selected, and the correlation between each gene and its module is calculated; 73 genes with correlation coefficients greater than 0.8 are selected as feature genes, their expression profiles are extracted as the training set, and a classification model is built using a support vector machine (SVM), which classifies the samples with an accuracy of 91.1%; the GSE14520 standardized data, comprising 445 samples, are downloaded from the GEO database; the expression profiles of the feature genes are extracted for the Stage I and Stage II samples, giving 170 samples, which are classified with the model, predicting 39 C1, 40 C2, 18 C3, 29 C4 and 44 C5 samples; the expression distribution of the 13 immune metagenes in each of the five subtypes is analyzed, the immune scores of the samples are analyzed, the distribution of the 10 types of immune-related cells in the five subtypes is analyzed, and finally the age distribution of the five subtypes is analyzed to reach a conclusion.
Preferably, the data in S1 are derived from The Cancer Genome Atlas (TCGA) transcriptome sequencing (RNA-Seq) data, clinical follow-up information data, single nucleotide polymorphism (SNP) data, the TIMER database, a metagene database, and an immune cell database.
Preferably, the number of metagenes in S2 is 13, the number of types of immune-related cells in S2 is 10, and the number of types of sample immune cells in S2 is 6.
Preferably, in S3, genes with an expression level greater than 0 in more than 30% of the samples are selected as the immune genes included in the study.
Preferably, in S4, a resampling scheme is used in which 80% of the samples are drawn each time, and the resampling is repeated 100 times.
Preferably, in S5, with FDR < 0.05 as the threshold, 230 significantly highly expressed genes are finally screened out in C1, 64 in C2, 118 in C3, 125 in C4 and 72 in C5; C3 and C4 share 54 genes, C1 and C4 share 54 genes, and the intersections among the other classes are smaller.
(III) Advantageous effects
Compared with the prior art, the invention has the following beneficial effects:
The research method for immune-enhanced molecular subtypes in early liver cancer collects data from the TCGA transcriptome, performs data preprocessing and immune gene screening, and finally uses R software packages to analyze the relationship between the molecular subtypes and clinical characteristics, immunity, mutation and the like, thereby enabling study of the expression pattern and potential clinical relevance of immune checkpoint molecules in liver cancer.
Drawings
FIG. 1 is a flow chart of the research method of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to FIG. 1, a research method for immune-enhanced molecular subtypes in early liver cancer includes the following steps:
S1, downloading data;
further, the data sources in S1 are The Cancer Genome Atlas (TCGA) transcriptome sequencing (RNA-Seq) data, clinical follow-up information data, single nucleotide polymorphism (SNP) data, the TIMER database, a metagene database, and an immune cell database;
S2, preprocessing data:
transcriptome data of TCGA:
converting the expression profile into a TPM expression profile;
metagene score:
for each immune metagene, according to the expression levels of the genes in the metagene, taking the median expression level of those genes in a sample as the score of that sample for the metagene;
MCP count of immune-related cells:
calculating the immune cell scores of the samples using the R software package MCPcounter;
immune cell scores of the samples:
downloading the immune cell scores corresponding to the liver cancer samples from the TIMER database;
immune and stromal scores of the samples:
calculating the immune and stromal scores of each sample using the R software package estimate;
further, the number of metagenes in S2 is 13; the metagenes correspond to various immune cell types and reflect various immune functions; there are 10 types of immune-related cells in S2 and 6 types of sample immune cells in S2;
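The metagene scoring rule in S2 can be illustrated with a minimal Python sketch. It assumes the expression profile is held as a nested dictionary (gene, then sample, then TPM value); the function name `metagene_scores`, the metagene name `T_cell_like`, its gene membership and all numbers are hypothetical and serve only to show the median rule.

```python
from statistics import median


def metagene_scores(expression, metagenes):
    """Return {metagene: {sample: median expression of its member genes}}."""
    scores = {}
    for name, genes in metagenes.items():
        # Keep only member genes that are present in the expression profile.
        present = [g for g in genes if g in expression]
        if not present:
            continue
        samples = expression[present[0]].keys()
        scores[name] = {
            s: median(expression[g][s] for g in present) for s in samples
        }
    return scores


# Hypothetical toy data: gene -> sample -> TPM value.
expression = {
    "CD8A": {"S1": 4.0, "S2": 1.0},
    "CD8B": {"S1": 6.0, "S2": 3.0},
    "GZMK": {"S1": 5.0, "S2": 2.0},
}
# Hypothetical metagene definition (membership chosen for illustration only).
metagenes = {"T_cell_like": ["CD8A", "CD8B", "GZMK"]}
scores = metagene_scores(expression, metagenes)
```

Using the median rather than the mean keeps a single strongly expressed member gene from dominating the score of a metagene.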
S3, screening of expression profiles of immune genes:
first, the expression profile samples are matched with the clinical follow-up samples and only samples present in both are kept; Stage I and Stage II samples are selected as the sample set included in the study, because, after analyzing all Stage combinations, the Stage I + II result is the best and the prognosis is consistent with the immune level; the immune genes with detectable expression are extracted from the expression profile, finally yielding 778 genes and 257 samples;
further, in S3, genes with an expression level greater than 0 in more than 30% of the samples are selected as the immune genes included in the study;
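The gene inclusion rule of S3 (expression greater than 0 in more than 30% of the samples) can be sketched as follows. This is an illustration under assumptions: the function name `filter_expressed_genes`, the gene labels and the values are hypothetical, and the expression profile is assumed to be a dictionary mapping each gene to a list of per-sample values.

```python
def filter_expressed_genes(expression, min_fraction=0.30):
    """Keep genes whose expression is > 0 in more than min_fraction of samples."""
    kept = {}
    for gene, values in expression.items():
        n_expressed = sum(1 for v in values if v > 0)
        if n_expressed / len(values) > min_fraction:
            kept[gene] = values
    return kept


# Hypothetical toy profile with 10 samples per gene.
expression = {
    "GENE_A": [0.0, 0.0, 0.0, 0.0, 2.5, 0.0, 0.0, 0.0, 0.0, 0.0],  # 10% expressed
    "GENE_B": [1.2, 0.0, 3.4, 0.0, 0.8, 0.0, 2.2, 0.0, 0.0, 0.0],  # 40% expressed
    "GENE_C": [0.5, 0.0, 0.0, 0.9, 0.0, 0.0, 1.1, 0.0, 0.0, 0.0],  # exactly 30%
}
kept = filter_expressed_genes(expression)
```

Because the rule is "more than 30%", a gene expressed in exactly 30% of the samples (GENE_C here) is dropped.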
S4, screening molecular subtypes:
consensus clustering based on the immune gene expression profiles is performed using the R software package ConsensusClusterPlus to screen molecular subtypes; the similarity distance between samples is calculated using the Euclidean distance and clustering is performed using K-means; the optimal number of clusters is determined using the cumulative distribution function (CDF); the clustering result is stable at 5 clusters, and after inspecting the CDF delta area curve, k = 5 is finally selected, giving 5 molecular subtypes (C1 to C5); the clustering significance of the five subtypes is analyzed using the R software package sigclust, which finds significant differences in expression distribution (p < 0.05) for C1-vs-C5, C4-vs-C5 and C2-vs-C3, and marginal significance for C1-vs-C2, C1-vs-C3, C2-vs-C4 and C3-vs-C5;
further, in S4, a resampling scheme is used in which 80% of the samples are drawn each time, and the resampling is repeated 100 times;
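The resampling idea behind S4 can be sketched in plain Python: repeatedly draw 80% of the samples, cluster them with K-means on Euclidean distance, and record how often each pair of samples lands in the same cluster. This is a simplified illustration, not the ConsensusClusterPlus implementation; the function names, the seed and the toy points are assumptions.

```python
import random


def kmeans(points, k, rng, max_iter=50):
    """Plain Lloyd's k-means with squared Euclidean distance; returns labels."""
    centers = [list(points[i]) for i in rng.sample(range(len(points)), k)]
    labels = None
    for _ in range(max_iter):
        new_labels = []
        for p in points:
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
            new_labels.append(dists.index(min(dists)))
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def consensus_matrix(points, k, n_resamples=100, fraction=0.8, seed=0):
    """Fraction of resamples in which samples i and j share a cluster."""
    rng = random.Random(seed)
    n = len(points)
    together = [[0] * n for _ in range(n)]  # times i and j shared a cluster
    sampled = [[0] * n for _ in range(n)]   # times i and j were both drawn
    size = max(k, int(round(fraction * n)))
    for _ in range(n_resamples):
        idx = rng.sample(range(n), size)
        labels = kmeans([points[i] for i in idx], k, rng)
        for a, i in enumerate(idx):
            for b, j in enumerate(idx):
                sampled[i][j] += 1
                if labels[a] == labels[b]:
                    together[i][j] += 1
    return [[together[i][j] / sampled[i][j] if sampled[i][j] else 0.0
             for j in range(n)] for i in range(n)]


# Hypothetical toy data: two well-separated groups of four samples each.
points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (0.3, 0.2),
          (10.0, 10.1), (10.2, 10.0), (10.1, 10.3), (10.3, 10.2)]
cm = consensus_matrix(points, k=2)
```

For a well-chosen k the consensus values concentrate near 0 and 1, which is what the CDF and delta-area curves mentioned in S4 are used to assess across candidate values of k.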
S5, expression profile clustering analysis:
according to the consensus clustering result, the stable clustering with k = 5 is selected, in which the 257 tumor samples are assigned to 5 classes; subtype difference analysis is then performed on the expression profiles of the 778 immune genes, using the Kolmogorov-Smirnov test to screen, for each subtype, genes that are highly expressed in that subtype relative to the samples of the other subtypes; the expression profiles of the top 100 most significantly highly expressed genes of each subtype (all differential genes where there are fewer than 100) are used for principal component analysis (PCA), and a scatter plot of the first two principal components shows that the five subtypes are clearly separated; a heat map drawn from the expression profiles of these genes shows clear boundaries between the subtypes and distinct expression patterns;
further, in S5, with FDR < 0.05 as the threshold, 230 significantly highly expressed genes are finally screened out in C1, 64 in C2, 118 in C3, 125 in C4 and 72 in C5; C3 and C4 share 54 genes, C1 and C4 share 54 genes, and the intersections among the other classes are smaller;
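The two statistical ingredients of S5 can be sketched in Python: a two-sample Kolmogorov-Smirnov statistic comparing one subtype with the remaining samples, and a false discovery rate adjustment. The text does not say which FDR procedure was used, so the Benjamini-Hochberg procedure here is an assumption, as is the two-sided form of the statistic; converting the statistic to a p-value is left to a statistics library.

```python
def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: max gap between empirical CDFs."""
    d = 0.0
    for x in sorted(set(a) | set(b)):
        cdf_a = sum(1 for v in a if v <= x) / len(a)
        cdf_b = sum(1 for v in b if v <= x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (FDR), returned in input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical expression of one gene in one subtype versus all other samples.
in_subtype = [5.1, 6.3, 5.8, 7.0]
others = [1.2, 2.0, 1.7, 2.4, 1.9]
d = ks_statistic(in_subtype, others)

# Hypothetical per-gene p-values; genes with adjusted value < 0.05 would be kept.
fdr = benjamini_hochberg([0.001, 0.02, 0.04, 0.30])
```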
S6, relationship to clinical characteristics:
the relationships between the five subtypes and Age, Gender, T, N, M and Stage are analyzed respectively; N and M each have only one category and cannot be compared; comparing Age across the five subtypes shows that the age distribution differs between classes, with the mean age lowest in C2 and highest in C5; the age difference between the groups is analyzed with 60 years as the cutoff (chi-square test, p = 0.00019); for Stage, the proportion of Stage I is significantly lower in C2 (chi-square test, p < 0.001) and the proportion of Stage II is significantly lower in C3 (chi-square test, p < 0.001); for Grade, the proportion of G1 is significantly higher in C3 and the proportion of G3+G4 is significantly higher in C2 (chi-square test, p < 0.001); for T stage, the proportion of T2 is significantly higher in C2 (p < 0.001); the proportion of women is significantly higher in C1 and the proportion of men is significantly higher in C2 (p < 0.001); the relationships between the five subtypes and HBV/HCV status and the molecular subtypes (iCluster) reported in a previous comprehensive genomic analysis of liver cancer are also analyzed: the proportion of iCluster1 is significantly higher in C2 and the proportion of iCluster3 is significantly higher in C5 (p < 0.001), while the proportions of HBV and HCV do not differ significantly between classes;
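The chi-square comparisons in S6 test whether a clinical category (for example age at or below 60 versus above 60) is distributed independently of subtype. The sketch below computes the Pearson chi-square statistic and its degrees of freedom for a contingency table; the counts are invented for illustration and are not the patent's data, and the p-value lookup is left to a statistics library.

```python
def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of row and column variables.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Hypothetical counts: rows = age <= 60 / age > 60, columns = subtypes C1..C5.
table = [
    [30, 35, 20, 15, 10],
    [25, 10, 30, 20, 40],
]
stat, dof = chi_square_independence(table)
```

With five subtypes and a two-level clinical variable the test has 4 degrees of freedom; a larger statistic corresponds to a smaller p-value.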
S7, relationship to immunity:
to analyze the relationship between the five subtypes and immunity, 13 immune metagenes, tumor immune component scores (stromal, immune, tumor purity), scores of six types of immune infiltrating cells, and MCP counts of 10 types of immune-related cells are collected from previous studies, and the relationships between these four groups of immune-related scores and the five subtypes are analyzed respectively; most of the 13 immune metagenes are highly expressed in C4 and lowly expressed in C5; the immune score of the C4 samples is significantly higher than that of the other groups, the stromal score of the C3 group is significantly higher than that of the other groups, and the immune score of the C5 group is the lowest; among the 10 types of immune-related cells, T cells and CD8 T cells are higher in the C4 group than in the other groups, and the immune cell scores are lowest in the C5 group; B cells, CD8 T cells, neutrophils, dendritic cells and macrophages are significantly higher in the C4 group and lowest in the C5 group; in summary, most immune-related signatures are up-regulated in C4 and down-regulated in C5 relative to the other groups, suggesting an enhanced immune microenvironment in subtype C4 and a weakened immune microenvironment in subtype C5;
S8, analyzing prognosis difference:
the poor prognosis of liver cancer is mainly due to high rates of disease progression and recurrence; to examine the relationship between the five subtypes and prognosis, Kaplan-Meier analysis is used to compare disease-free survival and progression-free survival among the five subtypes; the five subtypes differ significantly in prognosis, with C5 having the worst prognosis and C3 the best; C4 has the highest immune score but not the best prognosis, which is probably caused by the small number of C4 samples; the C3 and C4 groups, which have higher immune scores, are therefore combined, and further analysis shows an even more significant prognostic difference in disease-free survival and progression-free survival, suggesting that a high immune score is a protective factor in early liver cancer; disease-free survival and progression-free survival differ very significantly between the high-immune-score subtypes C3 and C4 and the low-immune-score subtype C5, and the immune-enhanced subtypes C3 and C4 have better clinical outcomes;
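The Kaplan-Meier curves compared in S8 are built from follow-up times and event indicators. The sketch below shows the product-limit estimator for a single group in plain Python; the follow-up times are hypothetical, and the comparison between subtypes (for example with a log-rank test) is not shown.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    times  : follow-up time per sample
    events : 1 if the event (recurrence or progression) was observed,
             0 if the sample was censored
    """
    survival = 1.0
    curve = []
    for t in sorted({t for t, e in zip(times, events) if e == 1}):
        at_risk = sum(1 for u in times if u >= t)
        observed = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        # Product-limit step: multiply by the fraction surviving this time.
        survival *= 1.0 - observed / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical follow-up (months) for one subtype; 0 marks a censored sample.
times = [5, 8, 8, 12, 15]
events = [1, 1, 0, 0, 1]
curve = kaplan_meier(times, events)
```

Censored samples contribute to the number at risk until their last follow-up but never trigger a drop in the curve, which is why the estimator suits disease-free and progression-free survival data with incomplete follow-up.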
S9, relationship to mutation:
many reports (PMID: 22561517, 22634756 (no AXIN1), 23728943 (no AXIN1)) indicate that mutations of TP53, CTNNB1 and AXIN1 are closely related to the occurrence and development of liver cancer, so the relationship between mutations of these three genes and the five subtypes is analyzed; first, the mutation data of TP53, CTNNB1 and AXIN1 are extracted from the MuTect-processed SNP data of TCGA, and the proportions of mutant and non-mutant samples of TP53, CTNNB1 and AXIN1 in each of the five subtypes are analyzed respectively; the proportion of mutated samples for these genes is markedly lower in some subtypes and markedly higher in others than in the remaining subtypes [...]; the distribution of the total number of mutated genes across the five subtypes is also analyzed, showing a significant difference in gene mutation frequency among the five subtypes, with C3 significantly lower than the other groups (p = 0.02).
S10, expression relationship to immune checkpoint genes:
the expression of 8 immune checkpoint genes in the five subtypes is analyzed; the expression of PDCD1, CD274, PDCD1LG2, CTLA4, CD86 and CD80 in C4 is significantly higher than in the other subtypes, and the expression of CD276 in C2 is higher than in the other subtypes; the expression distributions of the 8 immune checkpoint genes show that all of them except VTCN1 differ significantly among the five subtypes.
S11, WGCNA analysis and mining:
To further mine prognostic markers related to the liver cancer immune microenvironment, expression profile data are obtained for the immune-related genes that differ among the five subtypes, 492 in total. The distance between transcripts is calculated with the Pearson correlation coefficient, and a weighted co-expression network is constructed with the R software package WGCNA, with a soft threshold of 3, to screen co-expression modules. The co-expression network conforms to a scale-free network, that is, the logarithm log(k) of the connectivity k of a node is negatively correlated with the logarithm log(P(k)) of the probability of that node, with a correlation coefficient greater than 0.8. To ensure that the network is scale-free, β = 3 is selected. The expression matrix is converted into an adjacency matrix, and the adjacency matrix is then converted into a topological overlap matrix (TOM). Based on the TOM, the genes are clustered by average-linkage hierarchical clustering and, following the hybrid dynamic tree-cut criterion, the minimum number of genes in each gene (lncRNA) network module is set to 30. After the gene modules are determined by the dynamic tree-cut method, the eigengene of each module is calculated in turn, the modules are then clustered, and modules that are close to each other are merged into a new module, with height = 0.25, deepSplit = 2 and minModuleSize = 30, giving 7 modules in total; the grey module is the set of genes that cannot be assigned to any other module. The transcripts of each module are counted: 371 transcripts are assigned to 5 co-expression modules. The correlations between the eigengenes of the 6 modules and the five subtypes are then calculated. The results show that the blue module is positively correlated with C3 and C4 and is correlated with C5; the yellow module is positively correlated with C3 and negatively correlated with C5; the turquoise module is negatively correlated with C1, C7 and C5, and negatively correlated with C3; and the brown and red modules are positively correlated with C1 and negatively correlated with C2. The functions of the genes in the five modules are analyzed by KEGG enrichment analysis with the R software package clusterProfiler, with significance threshold FDR < 0.05. The brown module is enriched in the B cell receptor signaling pathway. The yellow module is enriched in 8 KEGG pathways closely related to tumors, such as EGFR tyrosine kinase inhibitor resistance, Small cell lung cancer and Focal adhesion. The blue module is enriched in 24 pathways, including inflammatory-response-related pathways such as collagen, insulin and IgA production. The red module is enriched in cell receptor immune pathways, the Th2 immune response and other pathways related to the cellular immune response. The relationships among the pathways enriched in each module are then analyzed: the modules are enriched in 44 pathways in total, the blue and turquoise modules share 12 enriched KEGG pathways, and the pathways enriched in the other modules overlap little, which suggests that the genes in the blue and turquoise modules undergo similar regulatory processes in the five subtypes.
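The conversion from expression matrix to adjacency matrix to topological overlap matrix described in S11 can be sketched as follows. This is a minimal illustration in plain Python, not the WGCNA package itself: it assumes an unsigned network (adjacency = |Pearson r|^β, which the source does not state explicitly), uses the commonly cited unsigned TOM formula, and runs on a made-up three-gene matrix `expr`. The function names are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def adjacency_matrix(expr, beta=3):
    """Soft-thresholded adjacency a_ij = |cor(i, j)| ** beta (unsigned network assumed)."""
    n = len(expr)
    adj = [[0.0] * n for _ in range(n)]  # diagonal kept at 0
    for i in range(n):
        for j in range(i + 1, n):
            adj[i][j] = adj[j][i] = abs(pearson(expr[i], expr[j])) ** beta
    return adj

def topological_overlap(adj):
    """TOM_ij = (sum_u a_iu * a_uj + a_ij) / (min(k_i, k_j) + 1 - a_ij)."""
    n = len(adj)
    connectivity = [sum(row) for row in adj]
    tom = [[1.0] * n for _ in range(n)]  # a gene fully overlaps with itself
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            shared = sum(adj[i][u] * adj[u][j] for u in range(n) if u not in (i, j))
            denom = min(connectivity[i], connectivity[j]) + 1.0 - adj[i][j]
            tom[i][j] = (shared + adj[i][j]) / denom
    return tom

# Hypothetical toy data: rows are genes, columns are samples.
expr = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],    # perfectly correlated with the first gene
    [1.0, -1.0, 1.0, -1.0],  # weakly (negatively) correlated with both
]
adj = adjacency_matrix(expr, beta=3)
tom = topological_overlap(adj)
# 1 - TOM is the dissimilarity that hierarchical clustering would operate on.
dissimilarity = [[1.0 - t for t in row] for row in tom]
```

Raising the correlation to the power β = 3 suppresses weak correlations (here |r| of about 0.45 becomes about 0.09) while leaving strong ones nearly unchanged, which is what pushes the network toward a scale-free topology.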
S12, external data set verification:
Genes in the co-expression modules (blue, brown) closely related to subtypes C3 and C4 are selected, the correlation between each gene and its module is calculated, and the 73 genes with correlation coefficients greater than 0.8 are selected as characteristic genes. The expression profiles of these genes are extracted as a training set, and a classification model is established with a support vector machine (SVM); the classification accuracy on the samples is 91.1%. To verify the five subtypes, the GSE14520 standardized data, comprising 445 samples, are downloaded from the GEO database. The expression profiles of the characteristic genes in the Stage I and Stage II samples are extracted, giving 170 samples, and these samples are classified with the model: 39 C1, 40 C2, 18 C3, 29 C4 and 44 C5 samples are predicted. The expression distribution of the 13 immune metagenes in each subtype is analyzed: most of the metagenes are highly expressed in C4, and more broadly in C3 and C4, which is consistent with the training set. The immune scores of the samples are analyzed: the immune score of the C4 group is significantly higher than that of the other groups, and the stromal score of the C3 group is significantly higher than that of the other groups, which is consistent with the training set. The distribution of the 10 types of immune-related cells in the five classes of samples is analyzed: most of the immune-related cells are higher in C than in the other groups, which is consistent with the training set. Finally, the age distribution of the five subtypes is also analyzed. In conclusion, early liver cancer contains immune-enhanced and immune-attenuated subtypes, and their prognoses differ significantly.
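The characteristic-gene selection in S12 (keeping genes whose correlation with their module exceeds 0.8) can be sketched as below. This is an illustrative assumption-laden example: the module is represented by a precomputed eigengene vector, the gene names and values are invented, the function names are hypothetical, and the signed Pearson correlation is compared against the threshold because the source does not say whether an absolute value is used. The SVM training step itself is not shown.

```python
import math
from collections import Counter

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def select_feature_genes(expr_by_gene, eigengene, threshold=0.8):
    """Keep genes whose correlation with the module eigengene exceeds the threshold."""
    return [gene for gene, profile in expr_by_gene.items()
            if pearson(profile, eigengene) > threshold]

def subtype_counts(predicted_labels):
    """Tally how many samples a classifier assigned to each subtype."""
    return dict(Counter(predicted_labels))

# Hypothetical module eigengene over five samples and four invented genes.
eigengene = [1.0, 2.0, 3.0, 4.0, 5.0]
expr_by_gene = {
    "GENE_A": [2.0, 4.0, 6.0, 8.0, 10.0],  # r = 1.0
    "GENE_B": [5.0, 4.0, 3.0, 2.0, 1.0],   # r = -1.0
    "GENE_C": [1.0, 2.0, 4.0, 3.0, 5.0],   # r = 0.9
    "GENE_D": [3.0, 1.0, 4.0, 2.0, 5.0],   # r = 0.5
}
features = select_feature_genes(expr_by_gene, eigengene)

# Hypothetical predicted labels, tallied the way the subtype counts are reported.
counts = subtype_counts(["C1", "C4", "C4", "C5", "C1", "C3"])
```

The expression profiles restricted to `features` would then form the training matrix for whichever classifier is used; the source names a support vector machine.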
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (6)

1. A method for researching an immune-enhancing molecular subtype in early liver cancer, characterized by comprising the following steps:
S1, downloading data;
S2, preprocessing data:
transcriptome data of TCGA:
converting the expression profile into a TPM expression profile;
metagene score:
taking, for each immune metagene, the median expression level of its genes in a sample as the score of that sample for the metagene;
MCP count of immune-related cells:
calculating the scores of immune cells in each sample using the R software package MCPcounter;
immune cell scores of the samples:
downloading the immune cell scores corresponding to the liver cancer samples from the TIMER database;
immune and stromal scores of the samples:
calculating the immune and stromal scores of each sample using the R software package ESTIMATE;
S3, screening of expression profiles of immune genes:
first matching the expression profile samples with the clinical follow-up samples and keeping the samples present in both, then selecting the Stage I and Stage II samples as the sample set included in the study, and extracting from the expression profiles the immune gene set with detectable expression, finally including 778 genes and 257 samples;
S4, screening molecular subtypes:
performing consensus clustering on the immune gene expression profiles with the R software package ConsensusClusterPlus to screen molecular subtypes, calculating the similarity distance between samples with the Euclidean distance and clustering with K-means; determining the optimal number of clusters with the cumulative distribution function (CDF): the clustering result is stable when the number of clusters is 5, and, after observing the CDF delta area curve, k = 5 is finally selected to obtain 5 molecular subtypes; analyzing the clustering significance of the five subtypes with the R software package sigclust, which shows that C1-vs-C5, C4-vs-C5 and C2-vs-C3 have significantly different expression distributions (p < 0.05), while C1-vs-C2, C1-vs-C3, C2-vs-C4 and C3-vs-C5 are marginally significant;
S5, expression profile clustering analysis:
according to the consensus clustering result, selecting the stable clustering result with k = 5, in which the 257 tumor samples are assigned to 5 classes; performing subtype difference analysis on the expression profiles of the 778-gene immune gene set, using the Kolmogorov-Smirnov test to screen, for each subtype, the genes highly expressed in that subtype relative to the samples of the other subtypes; selecting the expression profiles of the 100 most significantly highly expressed genes in each subtype (all differential genes where there are fewer than 100) for principal component analysis (PCA), and drawing a scatter diagram of the first two principal components, in which the five subtypes are clearly separated; drawing a heat map from the expression profiles of these genes, in which the subtypes have clear boundaries and distinct expression patterns;
S6, relationship to clinical characteristics:
analyzing the relationship between the five subtypes and Age, Gender, T, N, M and Stage respectively, where N and M each have only one category and cannot be compared; comparing Age across the five subtypes: the age distribution differs among the classes of samples, with the average age lowest in C2 and highest in C5, and, taking 60 years as the boundary, the age difference among the groups of samples gives a chi-square test p = 0.00019; analyzing the relationship between the five subtypes and Stage: the proportion of Stage I in C2 is significantly lower (chi-square: p < 0.001) and the proportion of Stage II in C3 is significantly lower (chi-square: p < 0.001); analyzing the relationship between the subtypes and Grade: the proportion of G1 in C3 is significantly higher and the proportion of G3+G4 in C2 is significantly higher (chi-square: p < 0.001); analyzing the relationship with T stage: the proportion of T2 in C2 is significantly higher (p < 0.001); the proportion of women in C1 is significantly higher and the proportion of men in C2 is significantly higher (p < 0.001); analyzing the relationship between the five subtypes and HBV/HCV and the molecular subtypes reported in a previous comprehensive genomic analysis of liver cancer: the proportion of iCluster1 in C2 is significantly higher and the proportion of iCluster3 in C5 is significantly higher (p < 0.001), while the proportions of HBV and HCV do not differ significantly among the classes;
S7, relationship to immunity:
collecting from previous studies 13 immune metagenes, tumor immune component scores (stroma, immune, tumor purity), scores of six types of immune infiltrating cells, and MCP counts of 10 types of immune-related cells, and analyzing the relationship of these four groups of immune-related scores to the five subtypes respectively: most of the 13 immune metagenes are highly expressed in C4 and lowly expressed in C5; the immune score of the C4 group is significantly higher than that of the other groups, the stromal score of the C3 group is significantly higher than that of the other groups, and the immune score of the C5 group is the lowest; among the 10 types of immune-related cells, T cells and CD8 T cells are significantly higher in the C4 group than in the other groups, and the immune cell scores of the C5 group are the lowest; B cells, CD8 T cells, neutrophils, dendritic cells and macrophages are significantly higher in the C4 group and lowest in the C5 group; most immune-related features are consistently up-regulated in C4 and down-regulated in C5 relative to the other groups;
S8, analyzing prognosis differences:
using the Kaplan-Meier method to analyze the differences in disease-free survival and progression-free survival among the five classes of samples: the five classes differ significantly in prognosis, with C5 having the worst prognosis and C3 the best; notably, C4 has the highest immune score but not the best prognosis; combining the C3 and C4 groups, which have higher immune scores, and further analyzing the differences in disease-free survival and progression-free survival: disease-free survival and progression-free survival differ significantly between the high-immune-score C3 and C4 groups and the low-immune-score C5 group, and the immune-enhanced subtypes C3 and C4 have better clinical outcomes;
S9, relationship to mutation:
analyzing the mutations of the three genes TP53, CTNNB1 and AXIN1 in the five classes of samples: first extracting the mutation data of TP53, CTNNB1 and AXIN1 from the TCGA SNP data processed by MuTect, then analyzing the proportions of mutated and non-mutated samples for TP53, CTNNB1 and AXIN1 in each of the five subtypes, and analyzing the distribution of the number of all mutated genes in the five classes of samples;
S10, relationship to the expression of immune checkpoint genes:
analyzing the expression of 8 immune checkpoint genes in the five subtypes and summarizing the expression distribution of the 8 immune checkpoint genes;
S11, WGCNA analysis and mining:
acquiring the expression profile data of the immune-related genes that differ among the five subtypes, 492 in total; calculating the distance between transcripts with the Pearson correlation coefficient, constructing a weighted co-expression network with the R software package WGCNA, selecting a soft threshold of 3, and screening co-expression modules, the co-expression network conforming to a scale-free network, that is, the logarithm log(k) of the connectivity k of a node is negatively correlated with the logarithm log(P(k)) of the probability of that node; selecting β = 3, converting the expression matrix into an adjacency matrix, and converting the adjacency matrix into a topological overlap matrix (TOM); based on the TOM, clustering the genes by average-linkage hierarchical clustering and, following the hybrid dynamic tree-cut criterion, setting the minimum number of genes in each gene (lncRNA) network module to 30; after determining the gene modules by the dynamic tree-cut method, calculating the eigengene of each module in turn, then clustering the modules and merging modules that are close to each other into a new module, with height = 0.25, deepSplit = 2 and minModuleSize = 30, to obtain 7 modules in total; counting the transcripts of each module and calculating the correlations between the eigengenes of the 6 modules and the five subtypes; analyzing the functions of the genes in the five modules by KEGG enrichment analysis with the R software package clusterProfiler, with significance threshold FDR < 0.05, and analyzing the relationships among the pathways enriched in each module;
S12, external data set verification:
selecting the genes in the co-expression modules (blue, brown) closely related to subtypes C3 and C4, calculating the correlation between each gene and its module, and selecting the 73 genes with correlation coefficients greater than 0.8 as characteristic genes; extracting the expression profiles of these genes as a training set and establishing a classification model with a support vector machine (SVM), the classification accuracy on the samples being 91.1%; downloading the GSE14520 standardized data, comprising 445 samples, from the GEO database, and extracting the expression profiles of the characteristic genes in the Stage I and Stage II samples to obtain 170 samples; classifying these samples with the model, predicting 39 C1, 40 C2, 18 C3, 29 C4 and 44 C5 samples; analyzing the expression distribution of the 13 immune metagenes in each subtype, analyzing the immune scores of the samples, analyzing the distribution of the 10 types of immune-related cells in the five classes of samples, and finally analyzing the age distribution of the five subtypes to reach a conclusion.
2. The method for researching an immune-enhancing molecular subtype in early liver cancer according to claim 1, characterized in that: the data sources in S1 are The Cancer Genome Atlas (TCGA) transcriptome sequencing (RNA-Seq) data, clinical follow-up information data, single nucleotide polymorphism (SNP) data, the TIMER database, a metagene database and an immune cell database.
3. The method for researching an immune-enhancing molecular subtype in early liver cancer according to claim 1, characterized in that: in S2, there are 13 metagenes, 10 types of immune-related cells, and 6 types of immune cells in the samples.
4. The method for researching an immune-enhancing molecular subtype in early liver cancer according to claim 1, characterized in that: in S3, genes with an expression level greater than 0 in more than 30% of the samples are selected as the immune genes included in the study.
5. The method for researching an immune-enhancing molecular subtype in early liver cancer according to claim 1, characterized in that: in S4, 80% of the samples are drawn in each resampling, and the resampling is repeated 100 times.
6. The method for researching an immune-enhancing molecular subtype in early liver cancer according to claim 1, characterized in that: in S5, with FDR < 0.05 as the threshold, 230 genes significantly highly expressed in C1, 64 in C2, 118 in C3, 125 in C4 and 72 in C5 are finally screened respectively; 54 of these genes are shared between C3 and C4 and 54 between C1 and C4, and the intersections among the other classes are small.
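The resampling scheme of claim 5 (80% of the samples per draw, 100 draws) combined with K-means on Euclidean distance, as in S4, can be illustrated with the rough sketch below. It is a plain-Python stand-in for the idea behind consensus clustering, not the ConsensusClusterPlus implementation; the function names, the random seed and the one-dimensional toy profiles are all assumptions made for illustration.

```python
import math
import random

def kmeans(points, k, rng, max_iter=50):
    """Plain Lloyd's k-means with Euclidean distance; returns one label per point."""
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def consensus_matrix(points, k, n_resamples=100, fraction=0.8, seed=0):
    """Fraction of resamples in which two co-sampled items share a cluster."""
    rng = random.Random(seed)
    n = len(points)
    together = [[0] * n for _ in range(n)]
    sampled = [[0] * n for _ in range(n)]
    size = int(fraction * n)
    for _ in range(n_resamples):
        idx = rng.sample(range(n), size)
        labels = kmeans([points[i] for i in idx], k, rng)
        for a, i in enumerate(idx):
            for b, j in enumerate(idx):
                sampled[i][j] += 1
                if labels[a] == labels[b]:
                    together[i][j] += 1
    return [[together[i][j] / sampled[i][j] if sampled[i][j] else 0.0
             for j in range(n)] for i in range(n)]

# Hypothetical toy profiles: two well-separated groups of four samples each.
points = [(0.0,), (1.0,), (2.0,), (3.0,), (100.0,), (101.0,), (102.0,), (103.0,)]
consensus = consensus_matrix(points, k=2)
```

In a stable clustering the consensus values sit near 0 or 1, so their cumulative distribution function is flat in the middle; comparing that CDF across candidate values of k is the basis for the choice of k described in S4.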
CN202011101709.2A 2020-10-15 2020-10-15 Research method of molecular subtype for enhancing immunity in early liver cancer Pending CN112233796A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011101709.2A CN112233796A (en) 2020-10-15 2020-10-15 Research method of molecular subtype for enhancing immunity in early liver cancer

Publications (1)

Publication Number Publication Date
CN112233796A true CN112233796A (en) 2021-01-15

Family

ID=74113124

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011101709.2A Pending CN112233796A (en) 2020-10-15 2020-10-15 Research method of molecular subtype for enhancing immunity in early liver cancer

Country Status (1)

Country Link
CN (1) CN112233796A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113035274A (en) * 2021-04-22 2021-06-25 广东技术师范大学 NMF-based tumor gene point mutation characteristic map extraction algorithm
CN113539360A (en) * 2021-07-21 2021-10-22 西北工业大学 IncRNA characteristic recognition method based on correlation optimization and immune enrichment
CN114334012A (en) * 2021-12-16 2022-04-12 天津大学 Method for identifying cancer subtypes based on multigroup data
CN115862876A (en) * 2023-03-02 2023-03-28 北京师范大学 Device and computer readable storage medium for predicting lung adenocarcinoma patient prognosis and medication guidance based on immune microenvironment gene group
CN118098378A (en) * 2024-04-28 2024-05-28 浙江大学医学院附属邵逸夫医院 Gene model construction method for identifying new subtype of liver cell liver cancer and application

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
WENLI LI等: "Multi-omics Analysis of Microenvironment Characteristics and Immune Escape Mechanisms of Hepatocellular Carcinoma", FRONTIERS IN ONCOLOGY, no. 9, 15 October 2019 (2019-10-15), pages 2 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination