CN113707223A - Gene set system and method for predicting activity state and treatment sensitivity of tumor inflammasome

Publication number
CN113707223A
Authority
CN
China
Prior art keywords
activity
gene
inflammasome
data
il1b
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110689083.XA
Other languages
Chinese (zh)
Inventor
梁庆昱
吴建奇
程文
吴安华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual

Classifications

    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B25/00: ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
    • G16B25/10: Gene or protein expression profiling; Expression-ratio estimation or normalisation
    • G16B35/00: ICT specially adapted for in silico combinatorial libraries of nucleic acids, proteins or peptides
    • G16B35/20: Screening of libraries
    • G16B40/00: ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G16B40/30: Unsupervised data analysis
    • G16B5/00: ICT specially adapted for modelling or simulations in systems biology, e.g. gene-regulatory networks, protein interaction networks or metabolic networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Medical Informatics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Theoretical Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biotechnology (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Molecular Biology (AREA)
  • Genetics & Genomics (AREA)
  • Data Mining & Analysis (AREA)
  • Library & Information Science (AREA)
  • Bioethics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Epidemiology (AREA)
  • Evolutionary Computation (AREA)
  • Public Health (AREA)
  • Software Systems (AREA)
  • Physiology (AREA)
  • Artificial Intelligence (AREA)
  • Chemical & Material Sciences (AREA)
  • Biochemistry (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
  • Apparatus Associated With Microorganisms And Enzymes (AREA)

Abstract

A gene set, a system and a method for predicting the activity state and treatment sensitivity of the tumor inflammasome, belonging to the technical field of inflammasome activity-state analysis. The system establishes a prediction model based on a training sample data set and five gene sets, and divides tumors into six classes: Class I (inflammasome activity^low-IL1B^low), Class II (inflammasome activity^low-IL1B^high), Class III (inflammasome activity^medium-CASP1^high), Class IV (inflammasome activity^medium-IL18^high), Class V (inflammasome activity^high-IL18^low) and Class VI (inflammasome activity^high-IL18^high). Analysis of targeted-therapy sensitivity and immunotherapy sensitivity determines that Class II (inflammasome activity^low-IL1B^high) tends toward resistance to BRAF-targeted drugs, and that classes with stronger inflammasome activity, such as Class V (inflammasome activity^high-IL18^low) and Class VI (inflammasome activity^high-IL18^high), are resistant to immunotherapy.

Description

Gene set system and method for predicting activity state and treatment sensitivity of tumor inflammasome
Technical Field
The invention relates to a gene set system and a method for predicting the activity state and treatment sensitivity of tumor inflammasome, belonging to the technical field of analysis of the activity state of the inflammasome.
Background
The inflammasome is a multiprotein complex with a molecular weight of about 700 kDa. During innate immune defense, the inflammasome regulates the activation of caspase-1 and thereby promotes the maturation and secretion of the cytokine precursors pro-IL-1β and pro-IL-18. It also regulates caspase-1-dependent pyroptosis, inducing cell death under inflammatory and stress-related pathological conditions. Tumors are one of the leading causes of death in modern medicine, and most tumors show activation of the inflammasome during their development. Enhanced inflammasome activity can promote tumor proliferation, angiogenesis, metastasis and immune escape. Clinically, inflammasome activation is often closely associated with tumor recurrence, drug resistance and poor prognosis. Recent studies have shown that inflammasome activation can reinforce the immunosuppressive state of the tumor by remodeling the tumor microenvironment, leading to drug resistance or immune escape. Analyzing the activity state of the tumor inflammasome is therefore of great significance for studying the behavior of tumor cells; however, there is currently no method for assessing the activity of the tumor inflammasome.
Disclosure of Invention
In order to solve the problems in the prior art, the invention provides a gene set system and a method for predicting the activity state and treatment sensitivity of tumor inflammasome.
In order to realize the purpose of the invention, the invention adopts the following technical scheme: a gene set for predicting the activity state and treatment sensitivity of tumor inflammasome, wherein the gene set is an inflammasome activity related gene set IRGs which comprises five gene sets of 15 inflammasome core genes, 34 CASP1 regulatory genes, 92 IL1B regulatory genes, 8 IL18 regulatory genes and 13 GSDMD regulatory genes.
The 15 inflammasome core genes comprise fifteen genes of NLRP1, NLRP3, CASP4, CASP5, NLRC5, NLRP6, NLRP12, NLRP7, NAIP, NLRC4, AIM2, IFI16, MEFV, NLRP2 and PYCARD.
The 34 CASP1 regulatory genes comprise TMEM260, TIFA, LSM4, CERS2, PLCG2, FAM219A, LAMTOR3, KLF11, CTSH, CAPZA2, ACTL6A, C10orf11, KLRB1, UBR1, PPP1R12B, ZNF512B, FARP2, PPIP5K1, PHKA1, BACE1, MRPS27, TRAPPC12, TAS1R3, CD247, CYBRD1, PCED1A, AP5M1, RMDN1, FBXO31, BLOC1S6, ATP10D, FMO1, FKBP5 and NNT thirty-four genes.
The 92 IL1B regulatory genes include ADAM17, SLC11A2, EREG, CSF1, ESR1, BID, BACH1, TNFRSF1B, MAPKAPK3, IFNAR2, NFKB2, OSMR, BTN1A1, IRF5, GNAQ, SULF2, SLC7A11, SLAMF1, SELP, NPY1R, PROKR1, SOD2, SERPINB2, RAB32, PTGER4, STX11, IGSF3, ARHGAP27, TBC1D9, TNIP1, SLAMF8, ACTR3B, PIM1, MAP3K8, POLR3K, BCL3, HOXD13, CDCA2, RND1, CERS5, GPR171, PTGS2, NFATC2, B3GAT1, CH25H, IL33, SLC5A1, NFKBIE, SOCS3, ELF3, SELE, TNFAIP2, TLR2, CHL1, CCL2, STEAP4, ZC3H12A, IL6, VCAM1, NFKBIZ, CXCL1, CCL20, LACC1, ZNF608, GBP6, LIM2, HS3ST2, COX5B, PDE4B, ASCL2, EHD2 and SCN3A.
The 8 IL18 regulatory genes comprise the eight genes SC5D, DYNC2H1, BCL9L, ALG9, TMEM25, NXPE4, IFT46 and RNF214.
The 13 GSDMD regulatory genes comprise the thirteen genes CD226, KLRB1, CXCR6, THY1, SAMD3, TXK, C5orf28, FASLG, RGS11, CTSW, MID1, CAMK2B and CNBD2.
The system comprises a data input module, a data analysis module and an output module. The data analysis module is in data communication with the data input module and the output module. The data input module is used to input the gene expression data of the training sample data set; the gene expression data and related drug sensitivity screening data of cell lines obtained from the tumor drug sensitivity databases GDSC and CCLE; the gene expression and immunotherapy response data of the corresponding samples obtained from the immunotherapy database IMvigor210CoreBiologies; and the gene expression data of the sample to be tested.
The training sample data set comprises 9881 samples of 33 tumor types obtained from The Cancer Genome Atlas (TCGA) database.
The data analysis module comprises an inflammasome activity state classification module, a targeted therapy analysis module and an immunotherapy analysis module.
The inflammasome activity state classification module first takes the gene expression data of the training sample data set as input and calculates, for each sample, the scores of the inflammasome activity-related gene sets (IRGs) using the ssGSEA algorithm. It then classifies the inflammasome activity state by unsupervised K-means clustering on these scores and, according to the clustering result, defines the states as Class I (inflammasome activity^low-IL1B^low), Class II (inflammasome activity^low-IL1B^high), Class III (inflammasome activity^medium-CASP1^high), Class IV (inflammasome activity^medium-IL18^high), Class V (inflammasome activity^high-IL18^low) and Class VI (inflammasome activity^high-IL18^high). Finally, it takes the gene expression data of the sample to be tested as input and obtains the inflammasome activity state class of that sample.
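The scoring step can be illustrated with a simplified single-sample enrichment score. This is only a sketch of the general ssGSEA idea (rank the genes of one sample, then accumulate the gap between a weighted running sum over gene-set members and a uniform running sum over non-members); the patent does not state which implementation was used, and the function name `ssgsea_score`, the exponent `alpha=0.25` and the toy expression values are assumptions made for illustration.

```python
def ssgsea_score(expression, gene_set, alpha=0.25):
    """Simplified single-sample enrichment score for one sample.

    expression: dict mapping gene symbol -> expression value (one sample).
    gene_set:   iterable of gene symbols.
    alpha:      rank-weighting exponent (assumed value, not from the source).
    """
    members = set(gene_set)
    # Order genes from highest to lowest expression.
    ranked = sorted(expression, key=expression.get, reverse=True)
    n = len(ranked)
    n_hits = sum(1 for g in ranked if g in members)
    if n_hits == 0 or n_hits == n:
        raise ValueError("gene set must overlap the sample only partially")
    # The top gene gets rank n, the bottom gene rank 1; only members carry weight.
    weights = [(n - i) ** alpha if g in members else 0.0
               for i, g in enumerate(ranked)]
    total_weight = sum(weights)
    miss_step = 1.0 / (n - n_hits)
    p_hit = p_miss = score = 0.0
    for g, w in zip(ranked, weights):
        if g in members:
            p_hit += w / total_weight
        else:
            p_miss += miss_step
        # Accumulate the gap between the two running distributions.
        score += p_hit - p_miss
    return score


# Hypothetical toy sample: two inflammasome core genes are highly expressed.
sample = {"NLRP3": 9.0, "PYCARD": 8.5, "GENE_A": 2.0, "GENE_B": 1.0}
core_score = ssgsea_score(sample, ["NLRP3", "PYCARD"])
```

A gene set whose members sit near the top of the ranking gets a positive score, and one whose members sit near the bottom gets a negative score. Repeating the call for each of the five gene sets would give the five-number score vector per sample that the clustering step uses.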
The targeted therapy analysis module first takes the gene expression data of the cell lines and the related drug sensitivity screening data as input and calculates the IRG scores of the five gene sets for each cell line using the ssGSEA algorithm. Based on these five IRG scores, it classifies the cell lines into the six inflammasome activity state classes with an SVM machine learning algorithm, compares the IC50 value of each targeted drug across the six classes, and determines that Class II (inflammasome activity^low-IL1B^high) tends toward resistance to BRAF-targeted drugs. It then determines whether the sample to be tested is resistant to BRAF-targeted drugs according to the inflammasome activity state class of that sample.
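The IC50 comparison across classes can be sketched as a simple group summary. The patent does not say which statistic or test was used, so the per-class median below is an assumption, as are the function names and the toy IC50 values; a higher IC50 means the drug is less effective on that group.

```python
from collections import defaultdict
from statistics import median


def median_ic50_by_class(records):
    """records: iterable of (class_label, ic50) pairs for one drug."""
    grouped = defaultdict(list)
    for label, ic50 in records:
        grouped[label].append(ic50)
    return {label: median(values) for label, values in grouped.items()}


def most_resistant_class(records):
    """Return the class with the highest median IC50 (least sensitive)."""
    medians = median_ic50_by_class(records)
    return max(medians, key=medians.get)


# Hypothetical IC50 values (arbitrary units) for one BRAF-targeted drug.
toy_records = [
    ("Class I", 0.8), ("Class I", 1.1),
    ("Class II", 6.0), ("Class II", 7.5),
    ("Class III", 1.4), ("Class III", 0.9),
]
medians = median_ic50_by_class(toy_records)
```

In this toy data `most_resistant_class(toy_records)` returns "Class II", mirroring the kind of pattern the module is described as detecting.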
The immunotherapy analysis module first takes the gene expression and immunotherapy response data of the corresponding samples as input and calculates the IRG scores of the five gene sets for each sample using the ssGSEA algorithm. Based on these IRG scores, it predicts the inflammasome activity state class of each patient in the IMvigor210CoreBiologies database with an SVM machine learning algorithm, compares the immunotherapy response across the classes, and determines that enhanced inflammasome activity is associated with immunotherapy resistance. It then determines whether the sample to be tested is resistant to immunotherapy according to the inflammasome activity state class of that sample.
The output module is used to output the inflammasome activity state class of the sample to be tested, whether it is resistant to BRAF-targeted drugs, and whether it is resistant to immunotherapy.
A method for predicting tumor inflammasome activity status and treatment sensitivity, comprising the steps of:
Step one: acquire a training sample data set. 9881 samples of 33 tumor types are obtained from The Cancer Genome Atlas (TCGA) database as the training sample data set; the data for each sample comprise gene mutation data, gene copy number variation data, gene expression data and clinical information.
Step two: construction of an inflammasome activity-related gene set IRGs comprising five gene sets of 15 inflammasome core genes, 34 CASP1 regulatory genes, 92 IL1B regulatory genes, 8 IL18 regulatory genes, and 13 GSDMD regulatory genes.
The 15 inflammasome core genes comprise fifteen genes of NLRP1, NLRP3, CASP4, CASP5, NLRC5, NLRP6, NLRP12, NLRP7, NAIP, NLRC4, AIM2, IFI16, MEFV, NLRP2 and PYCARD.
The 34 CASP1 regulatory genes comprise TMEM260, TIFA, LSM4, CERS2, PLCG2, FAM219A, LAMTOR3, KLF11, CTSH, CAPZA2, ACTL6A, C10orf11, KLRB1, UBR1, PPP1R12B, ZNF512B, FARP2, PPIP5K1, PHKA1, BACE1, MRPS27, TRAPPC12, TAS1R3, CD247, CYBRD1, PCED1A, AP5M1, RMDN1, FBXO31, BLOC1S6, ATP10D, FMO1, FKBP5 and NNT thirty-four genes.
The 92 IL1B regulatory genes include ADAM17, SLC11A2, EREG, CSF1, ESR1, BID, BACH1, TNFRSF1B, MAPKAPK3, IFNAR2, NFKB2, OSMR, BTN1A1, IRF5, GNAQ, SULF2, SLC7A11, SLAMF1, SELP, NPY1R, PROKR1, SOD2, SERPINB2, RAB32, PTGER4, STX11, IGSF3, ARHGAP27, TBC1D9, TNIP1, SLAMF8, ACTR3B, PIM1, MAP3K8, POLR3K, BCL3, HOXD13, CDCA2, RND1, CERS5, GPR171, PTGS2, NFATC2, B3GAT1, CH25H, IL33, SLC5A1, NFKBIE, SOCS3, ELF3, SELE, TNFAIP2, TLR2, CHL1, CCL2, STEAP4, ZC3H12A, IL6, VCAM1, NFKBIZ, CXCL1, CCL20, LACC1, ZNF608, GBP6, LIM2, HS3ST2, COX5B, PDE4B, ASCL2, EHD2 and SCN3A.
The 8 IL18 regulatory genes comprise the eight genes SC5D, DYNC2H1, BCL9L, ALG9, TMEM25, NXPE4, IFT46 and RNF214.
The 13 GSDMD regulatory genes comprise the thirteen genes CD226, KLRB1, CXCR6, THY1, SAMD3, TXK, C5orf28, FASLG, RGS11, CTSW, MID1, CAMK2B and CNBD2.
Step three: calculate the IRG scores of the five gene sets in the inflammasome activity-related gene sets (IRGs). Taking the gene expression data of the training sample data set as input, the IRG scores of the five gene sets are calculated for each sample using the ssGSEA (single-sample gene set enrichment analysis) algorithm. The training sample data set is then classified by unsupervised K-means clustering on the five IRG scores of each sample, with the following parameters: number of repetitions = 100, distance = Euclidean distance, and number of clusters = 6, chosen according to the clustering consistency results.
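The clustering in step three can be sketched in plain Python. This is a minimal Lloyd's-algorithm K-means with Euclidean distance; reading "number of repetitions = 100" as the number of random restarts is an assumption, as are the function name `kmeans`, the `seed` parameter and the toy points. The source works on five scores per sample with six clusters; the toy data below uses two features and two clusters to stay readable.

```python
import math
import random


def kmeans(points, k, n_runs=100, max_iter=100, seed=0):
    """K-means with Euclidean distance and repeated random starts.

    Returns (labels, centroids) from the run with the lowest
    within-cluster sum of squared distances.
    """
    rng = random.Random(seed)
    best = None
    for _run in range(n_runs):
        # Start from k distinct samples chosen at random.
        centroids = [list(p) for p in rng.sample(points, k)]
        labels = None
        for _step in range(max_iter):
            new_labels = [
                min(range(k), key=lambda c: math.dist(p, centroids[c]))
                for p in points
            ]
            if new_labels == labels:
                break  # assignments are stable, so the run has converged
            labels = new_labels
            for c in range(k):
                members = [p for p, lab in zip(points, labels) if lab == c]
                if members:  # keep the old centroid if a cluster empties
                    centroids[c] = [sum(col) / len(members)
                                    for col in zip(*members)]
        inertia = sum(math.dist(p, centroids[lab]) ** 2
                      for p, lab in zip(points, labels))
        if best is None or inertia < best[0]:
            best = (inertia, labels, centroids)
    return best[1], best[2]


# Hypothetical score vectors forming two well-separated groups.
points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2),
          (5.0, 5.1), (5.2, 5.0), (5.1, 5.2)]
labels, centroids = kmeans(points, k=2)
```

Cluster indices are arbitrary, so only the grouping is meaningful: the first three points share one label and the last three share the other.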
Step four: determine the activity intensity and activation mode of the six inflammasome classes. According to the distribution of the IRG scores of the five gene sets from step three across the clusters, the samples in the training sample data set are defined as Class I (inflammasome activity^low-IL1B^low), Class II (inflammasome activity^low-IL1B^high), Class III (inflammasome activity^medium-CASP1^high), Class IV (inflammasome activity^medium-IL18^high), Class V (inflammasome activity^high-IL18^low) and Class VI (inflammasome activity^high-IL18^high).
Step five: perform the targeted-therapy sensitivity analysis. Gene expression data and related drug sensitivity screening data of cell lines are obtained from the tumor drug sensitivity databases GDSC and CCLE, and the IRG scores of the five gene sets are calculated for each cell line using the ssGSEA algorithm. Based on these five IRG scores, the cell lines are classified into the six inflammasome activity state classes of step four with an SVM machine learning algorithm. The IC50 value of each targeted drug is compared across the six classes, and Class II (inflammasome activity^low-IL1B^high) is determined to be prone to BRAF-targeted drug resistance.
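The classification step can be illustrated with a small linear SVM. The patent does not specify the kernel, the library or how the six classes are handled, so everything below is an assumption: a binary linear SVM (labels +1 and -1) trained by subgradient descent on the regularised hinge loss, with hypothetical function names, hyperparameters and toy data. A six-class problem would need an extension such as one-vs-rest, which is not shown.

```python
def train_linear_svm(X, y, lr=0.01, lam=0.01, epochs=200):
    """Binary linear SVM fitted by subgradient descent on the
    L2-regularised hinge loss. Labels must be +1 or -1.
    Returns (weights, bias)."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:
                # Sample is inside the margin: step towards classifying it.
                w = [wj + lr * (yi * xj - lam * wj) for wj, xj in zip(w, xi)]
                b += lr * yi
            else:
                # Only the regularisation term contributes.
                w = [wj - lr * lam * wj for wj in w]
    return w, b


def predict(model, x):
    """Return +1 or -1 depending on the side of the separating hyperplane."""
    w, b = model
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


# Hypothetical two-feature score vectors for two classes.
X_train = [(2.0, 2.0), (3.0, 2.0), (2.0, 3.0),
           (-2.0, -2.0), (-3.0, -2.0), (-2.0, -3.0)]
y_train = [1, 1, 1, -1, -1, -1]
model = train_linear_svm(X_train, y_train)
```

Here the labels learned on the training samples play the role of the cluster labels from the TCGA training set, and `predict` plays the role of assigning a class to a new cell line from its score vector.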
Step six: perform the immunotherapy sensitivity analysis. Gene expression and immunotherapy response data of the corresponding samples are obtained from the immunotherapy database IMvigor210CoreBiologies, and the IRG scores of the five gene sets are calculated for each sample using the ssGSEA algorithm. Based on these five IRG scores, the inflammasome activity state class of each patient in the IMvigor210CoreBiologies database is predicted with an SVM machine learning algorithm. The immunotherapy response is compared across the classes, and enhanced inflammasome activity is determined to be associated with immunotherapy resistance.
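Comparing immunotherapy response between classes can be sketched as a per-class response rate. The patent does not describe the exact comparison, so the response-rate summary, the function name and the toy records below are assumptions for illustration only.

```python
from collections import Counter


def response_rate_by_class(records):
    """records: iterable of (class_label, responded) pairs,
    where responded is True for a responder and False otherwise."""
    totals, responders = Counter(), Counter()
    for label, responded in records:
        totals[label] += 1
        responders[label] += int(responded)
    return {label: responders[label] / totals[label] for label in totals}


# Hypothetical patients: (predicted class, responded to immunotherapy).
toy_patients = [
    ("Class I", True), ("Class I", True), ("Class I", False),
    ("Class VI", False), ("Class VI", False),
    ("Class VI", True), ("Class VI", False),
]
rates = response_rate_by_class(toy_patients)
```

A clearly lower rate in the high-activity classes than in the low-activity classes is the kind of pattern that would support the resistance conclusion drawn in this step.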
Step seven: analyze the inflammasome activity state of the sample to be tested. The gene expression data of the sample are obtained, and its IRG scores for the five gene sets are calculated using the ssGSEA algorithm. Based on these five IRG scores and the six inflammasome activity state class labels of the training sample data set, the inflammasome activity state class of the sample is determined, together with whether it is resistant to BRAF-targeted drugs and whether it is resistant to immunotherapy.
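Once a class has been assigned, the final report described in step seven reduces to a lookup. The sketch below encodes the associations stated in the text (Class II with BRAF-targeted drug resistance, Classes V and VI with immunotherapy resistance); the function name, the Roman-numeral labels and the output keys are hypothetical.

```python
VALID_CLASSES = {"I", "II", "III", "IV", "V", "VI"}
# Associations taken from the description; labels are illustrative.
BRAF_RESISTANT_CLASSES = {"II"}
IMMUNOTHERAPY_RESISTANT_CLASSES = {"V", "VI"}


def treatment_report(inflammasome_class):
    """Map an inflammasome activity state class to the two resistance flags."""
    if inflammasome_class not in VALID_CLASSES:
        raise ValueError(f"unknown class: {inflammasome_class!r}")
    return {
        "class": inflammasome_class,
        "braf_targeted_resistance":
            inflammasome_class in BRAF_RESISTANT_CLASSES,
        "immunotherapy_resistance":
            inflammasome_class in IMMUNOTHERAPY_RESISTANT_CLASSES,
    }


report = treatment_report("II")
```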
The beneficial effects of the invention are as follows. A gene set capable of predicting the activity state and treatment sensitivity of the tumor inflammasome is established by analysis, and a prediction system and a prediction method are built on this gene set. Based on a training sample data set and the five gene sets, a prediction model is established using the ssGSEA algorithm, unsupervised K-means clustering and an SVM machine learning algorithm, and samples are divided into six classes: Class I (inflammasome activity^low-IL1B^low), Class II (inflammasome activity^low-IL1B^high), Class III (inflammasome activity^medium-CASP1^high), Class IV (inflammasome activity^medium-IL18^high), Class V (inflammasome activity^high-IL18^low) and Class VI (inflammasome activity^high-IL18^high). Analysis of targeted-therapy sensitivity and immunotherapy sensitivity predicts that Class II (inflammasome activity^low-IL1B^high) tends toward BRAF-targeted drug resistance, and that classes with stronger inflammasome activity, such as Class V (inflammasome activity^high-IL18^low) and Class VI (inflammasome activity^high-IL18^high), are resistant to immunotherapy.
The prediction model can efficiently evaluate and identify the activity states of the inflammasome in different samples, and provides a new gene set, a prediction system and a prediction method for analyzing the activity states of the inflammasome and the treatment sensitivity of tumors with different activity states of the inflammasome.
Drawings
FIG. 1 is a schematic diagram of a system for predicting tumor inflammasome activity status and treatment sensitivity.
FIG. 2 is a schematic flow chart of a method for predicting tumor inflammasome activity status and treatment sensitivity.
FIG. 3 is a graph of the therapeutic response of murine primary glioma to immune checkpoint blockade combined with an inhibitor of inflammasome activity.
FIG. 4 is a graph of the predicted inflammasome activity state classes of breast cancer patients from external data.
Detailed Description
In order to make the technical solutions of the present invention clearer, the technical solutions in the following embodiments will be clearly and completely described with reference to the embodiments of the present invention, which are used for illustrating the present invention and are not intended to limit the scope of the present invention.
The gene set for predicting the activity state and treatment sensitivity of the tumor inflammasome is constructed as follows. Inflammasome activity signaling involves three main phases: the initial phase (regulated by the inflammasome complex), the processing phase (regulated by caspase-1) and the final phase (regulated by GSDMD, IL1B and IL18). Fifteen genes associated with the inflammasome complex (called the inflammasome core genes) were collected from the articles "NOD-like receptors and inflammasomes: A review of their canonical and non-canonical signaling pathways" (Archives of Biochemistry and Biophysics 670 (2019) 4–14) and "The emerging roles of inflammasome-dependent cytokines in cancer development" (EMBO Reports (2019) 20: e47575). The article "Analyses of caspase-1-regulated transcriptomes in various tissues lead to identification of novel IL-1β-, IL-18- and sirtuin-1-independent pathways" (Li et al., Journal of Hematology & Oncology (2017) 10: 40) identified 34 CASP1 regulatory genes and 92 IL1B regulatory genes in a meta-analysis based on gene expression profiles from GEO data sets. Eight genes regulated by IL18 were identified by applying the same meta-analysis procedure to the GEO data sets GSE64308, GSE64309 and GSE64310. The same meta-analysis failed to find GSDMD-regulated genes, so 13 GSDMD-regulated genes were identified instead by differential gene analysis of the GEO data set GSE126289. Together, these five gene sets constitute the inflammasome activity-related gene sets (IRGs).
The genetic makeup of the five gene sets in the inflammasome activity Related Gene Set (IRGs) is listed in table 1:
TABLE 1 Gene set composition
Gene set    Gene name    Gene set subcategory
CASP1 TMEM260 CASP1_up_regulated_gene
CASP1 TIFA CASP1_up_regulated_gene
CASP1 LSM4 CASP1_up_regulated_gene
CASP1 CERS2 CASP1_up_regulated_gene
CASP1 PLCG2 CASP1_up_regulated_gene
CASP1 FAM219A CASP1_up_regulated_gene
CASP1 LAMTOR3 CASP1_up_regulated_gene
CASP1 KLF11 CASP1_up_regulated_gene
CASP1 CTSH CASP1_up_regulated_gene
CASP1 CAPZA2 CASP1_up_regulated_gene
CASP1 ACTL6A CASP1_up_regulated_gene
CASP1 C10orf11 CASP1_up_regulated_gene
CASP1 KLRB1 CASP1_up_regulated_gene
CASP1 UBR1 CASP1_down_regulated_gene
CASP1 PPP1R12B CASP1_down_regulated_gene
CASP1 ZNF512B CASP1_down_regulated_gene
CASP1 FARP2 CASP1_down_regulated_gene
CASP1 PPIP5K1 CASP1_down_regulated_gene
CASP1 PHKA1 CASP1_down_regulated_gene
CASP1 BACE1 CASP1_down_regulated_gene
CASP1 MRPS27 CASP1_down_regulated_gene
CASP1 TRAPPC12 CASP1_down_regulated_gene
CASP1 TAS1R3 CASP1_down_regulated_gene
CASP1 CD247 CASP1_down_regulated_gene
CASP1 CYBRD1 CASP1_down_regulated_gene
CASP1 PCED1A CASP1_down_regulated_gene
CASP1 AP5M1 CASP1_down_regulated_gene
CASP1 RMDN1 CASP1_down_regulated_gene
CASP1 FBXO31 CASP1_down_regulated_gene
CASP1 BLOC1S6 CASP1_down_regulated_gene
CASP1 ATP10D CASP1_down_regulated_gene
CASP1 FMO1 CASP1_down_regulated_gene
CASP1 FKBP5 CASP1_down_regulated_gene
CASP1 NNT CASP1_down_regulated_gene
IL1B ADAM17 IL1B_up_regulated_gene
IL1B SLC11A2 IL1B_up_regulated_gene
IL1B EREG IL1B_up_regulated_gene
IL1B CSF1 IL1B_up_regulated_gene
IL1B ESR1 IL1B_up_regulated_gene
IL1B BID IL1B_up_regulated_gene
IL1B BACH1 IL1B_up_regulated_gene
IL1B TNFRSF1B IL1B_up_regulated_gene
IL1B MAPKAPK3 IL1B_up_regulated_gene
IL1B IFNAR2 IL1B_up_regulated_gene
IL1B NFKB2 IL1B_up_regulated_gene
IL1B OSMR IL1B_up_regulated_gene
IL1B BTN1A1 IL1B_up_regulated_gene
IL1B IRF5 IL1B_up_regulated_gene
IL1B GNAQ IL1B_up_regulated_gene
IL1B SULF2 IL1B_up_regulated_gene
IL1B SLC7A11 IL1B_up_regulated_gene
IL1B SLAMF1 IL1B_up_regulated_gene
IL1B SELP IL1B_up_regulated_gene
IL1B NPY1R IL1B_up_regulated_gene
IL1B PROKR1 IL1B_up_regulated_gene
IL1B SOD2 IL1B_up_regulated_gene
IL1B SERPINB2 IL1B_up_regulated_gene
IL1B RAB32 IL1B_up_regulated_gene
IL1B PTGER4 IL1B_up_regulated_gene
IL1B STX11 IL1B_up_regulated_gene
IL1B IGSF3 IL1B_up_regulated_gene
IL1B ARHGAP27 IL1B_up_regulated_gene
IL1B TBC1D9 IL1B_up_regulated_gene
IL1B TNIP1 IL1B_up_regulated_gene
IL1B SLAMF8 IL1B_up_regulated_gene
IL1B ACTR3B IL1B_up_regulated_gene
IL1B PIM1 IL1B_up_regulated_gene
IL1B MAP3K8 IL1B_up_regulated_gene
IL1B POLR3K IL1B_up_regulated_gene
IL1B BCL3 IL1B_up_regulated_gene
IL1B HOXD13 IL1B_up_regulated_gene
IL1B CDCA2 IL1B_up_regulated_gene
IL1B RND1 IL1B_up_regulated_gene
IL1B CERS5 IL1B_up_regulated_gene
IL1B GPR171 IL1B_up_regulated_gene
IL1B PTGS2 IL1B_up_regulated_gene
IL1B NFATC2 IL1B_up_regulated_gene
IL1B B3GAT1 IL1B_up_regulated_gene
IL1B CH25H IL1B_up_regulated_gene
IL1B IL33 IL1B_up_regulated_gene
IL1B SLC5A1 IL1B_up_regulated_gene
IL1B NFKBIE IL1B_up_regulated_gene
IL1B SOCS3 IL1B_up_regulated_gene
IL1B ELF3 IL1B_up_regulated_gene
IL1B SELE IL1B_up_regulated_gene
IL1B TNFAIP2 IL1B_up_regulated_gene
IL1B TLR2 IL1B_up_regulated_gene
IL1B CHL1 IL1B_up_regulated_gene
IL1B CCL2 IL1B_up_regulated_gene
IL1B STEAP4 IL1B_up_regulated_gene
IL1B ZC3H12A IL1B_up_regulated_gene
IL1B IL6 IL1B_up_regulated_gene
IL1B VCAM1 IL1B_up_regulated_gene
IL1B NFKBIZ IL1B_up_regulated_gene
IL1B CXCL1 IL1B_up_regulated_gene
IL1B CCL20 IL1B_up_regulated_gene
IL1B LACC1 IL1B_up_regulated_gene
IL1B ZNF608 IL1B_up_regulated_gene
IL1B GBP6 IL1B_up_regulated_gene
IL1B LIM2 IL1B_down_regulated_gene
IL1B HS3ST2 IL1B_down_regulated_gene
IL1B COX5B IL1B_down_regulated_gene
IL1B PDE4B IL1B_down_regulated_gene
IL1B ASCL2 IL1B_down_regulated_gene
IL1B EHD2 IL1B_down_regulated_gene
IL1B SCN3A IL1B_down_regulated_gene
IL18 SC5D IL18_up_regulated_gene
IL18 DYNC2H1 IL18_up_regulated_gene
IL18 BCL9L IL18_up_regulated_gene
IL18 ALG9 IL18_up_regulated_gene
IL18 TMEM25 IL18_up_regulated_gene
IL18 NXPE4 IL18_up_regulated_gene
IL18 IFT46 IL18_down_regulated_gene
IL18 RNF214 IL18_down_regulated_gene
GSDMD CD226 GSDMD_up_regulated_gene
GSDMD KLRB1 GSDMD_up_regulated_gene
GSDMD CXCR6 GSDMD_up_regulated_gene
GSDMD THY1 GSDMD_up_regulated_gene
GSDMD SAMD3 GSDMD_up_regulated_gene
GSDMD TXK GSDMD_up_regulated_gene
GSDMD C5orf28 GSDMD_up_regulated_gene
GSDMD FASLG GSDMD_up_regulated_gene
GSDMD RGS11 GSDMD_up_regulated_gene
GSDMD CTSW GSDMD_up_regulated_gene
GSDMD MID1 GSDMD_up_regulated_gene
GSDMD CAMK2B GSDMD_down_regulated_gene
GSDMD CNBD2 GSDMD_down_regulated_gene
Inflammasome NLRP1 Inflammasome_hubgene
Inflammasome NLRP3 Inflammasome_hubgene
Inflammasome CASP4 Inflammasome_hubgene
Inflammasome CASP5 Inflammasome_hubgene
Inflammasome NLRC5 Inflammasome_hubgene
Inflammasome NLRP6 Inflammasome_hubgene
Inflammasome NLRP12 Inflammasome_hubgene
Inflammasome NLRP7 Inflammasome_hubgene
Inflammasome NAIP Inflammasome_hubgene
Inflammasome NLRC4 Inflammasome_hubgene
Inflammasome AIM2 Inflammasome_hubgene
Inflammasome IFI16 Inflammasome_hubgene
Inflammasome MEFV Inflammasome_hubgene
Inflammasome NLRP2 Inflammasome_hubgene
Inflammasome PYCARD Inflammasome_hubgene
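The rows of Table 1 follow a regular "gene set, gene name, subcategory" pattern, so they can be loaded into a lookup structure. The sketch below is one possible way to do that; the function name `parse_gene_table` and the short direction tags are hypothetical, and only a few rows copied from the table are used as input.

```python
from collections import defaultdict


def parse_gene_table(lines):
    """Group genes by gene set from rows of 'set gene subcategory'."""
    gene_sets = defaultdict(list)
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip header lines or malformed rows
        gene_set, gene, subcategory = parts
        if "_up_" in subcategory:
            direction = "up"
        elif "_down_" in subcategory:
            direction = "down"
        else:
            direction = "hub"  # e.g. Inflammasome_hubgene
        gene_sets[gene_set].append((gene, direction))
    return dict(gene_sets)


# A few rows copied from Table 1.
rows = [
    "IL18 SC5D IL18_up_regulated_gene",
    "IL18 IFT46 IL18_down_regulated_gene",
    "GSDMD CNBD2 GSDMD_down_regulated_gene",
    "Inflammasome NLRP3 Inflammasome_hubgene",
]
parsed = parse_gene_table(rows)
```

Keeping the up/down tag alongside each gene preserves the subcategory column, which a scoring step could use if up- and down-regulated members were to be treated differently.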
The meta-analysis proceeds as follows. For each collected GEO data set with grouping information, a differential analysis is first performed with the R package limma. Then, for each gene in each GEO data set, the effect size is calculated using the effectsize function of the R package metaMA. Next, the unbiased effect sizes and their variances from the multiple GEO data sets are combined using the directEScombi function of the metaMA package, and the P values are corrected with the Benjamini-Hochberg method. Genes with a corrected P value below 0.05 are included in the final study.
The differential gene analysis proceeds as follows. A contrast matrix is constructed from the grouping information of the GEO data set samples; the contrast matrix and the expression profile data are passed in turn to the lmFit and eBayes functions of the limma package; and the differential analysis results are output with the topTable function.
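The combination and correction steps of the meta-analysis can be illustrated in Python. This is not a port of the metaMA functions: the sketch uses a plain fixed-effect inverse-variance combination, which may differ from the model that package applies, followed by a standard Benjamini-Hochberg adjustment. Function names and input values are hypothetical.

```python
import math


def combine_effect_sizes(effects, variances):
    """Fixed-effect inverse-variance combination of one gene's effect
    sizes across data sets. Returns (effect, variance, two_sided_p)."""
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    combined = sum(w * e for w, e in zip(weights, effects)) / total
    variance = 1.0 / total
    z = combined / math.sqrt(variance)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return combined, variance, p


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical inputs: one gene measured in two data sets, then four p-values.
effect, variance, p_value = combine_effect_sizes([1.0, 1.0], [0.5, 0.5])
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

Genes would then be kept when their adjusted value falls below 0.05, matching the threshold given in the text.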
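As a rough illustration of a two-group differential analysis, the sketch below computes a per-gene difference in means and Welch's t-statistic and ranks genes by the absolute statistic. This is a simpler stand-in, not limma's moderated statistic, and the function names, group labels and expression values are all hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Difference in means (a - b) and Welch's t-statistic for one gene."""
    diff = mean(group_a) - mean(group_b)
    se = math.sqrt(variance(group_a) / len(group_a)
                   + variance(group_b) / len(group_b))
    return diff, diff / se


def rank_genes(expr, groups, case="case", control="control"):
    """expr: gene -> list of log-expression values, aligned with groups.
    Returns (gene, diff, t) tuples sorted by |t|, largest first."""
    results = []
    for gene, values in expr.items():
        a = [v for v, g in zip(values, groups) if g == case]
        b = [v for v, g in zip(values, groups) if g == control]
        diff, t = welch_t(a, b)
        results.append((gene, diff, t))
    return sorted(results, key=lambda r: abs(r[2]), reverse=True)


# Hypothetical log-expression values: three case and three control samples.
groups = ["case"] * 3 + ["control"] * 3
expr = {
    "GENE_X": [5.0, 6.0, 7.0, 1.0, 2.0, 3.0],
    "GENE_Y": [2.0, 2.5, 3.0, 2.1, 2.4, 3.1],
}
ranked = rank_genes(expr, groups)
```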
FIG. 1 is a schematic structural diagram of the system for predicting the activity state and treatment sensitivity of the tumor inflammasome. The system, which uses the gene set described above, comprises a data input module, a data analysis module and an output module; the data analysis module is in data communication with the data input module and the output module. The data input module is used to input the gene expression data of the training sample data set; the gene expression data and related drug sensitivity screening data of cell lines obtained from the tumor drug sensitivity databases GDSC and CCLE; the gene expression and immunotherapy response data of the corresponding samples obtained from the immunotherapy database IMvigor210CoreBiologies; and the gene expression data of the sample to be tested.
The training sample data set contains a total of 9881 samples of 33 tumor types obtained from The Cancer Genome Atlas (TCGA) database. The data analysis module comprises an inflammasome activity state classification module, a targeted therapy analysis module and an immunotherapy analysis module.
The inflammasome activity state classification module first takes the gene expression data of the training sample data set as input and calculates, for each sample, the scores of the inflammasome activity-related gene sets (IRGs) using the ssGSEA algorithm. It then classifies the inflammasome activity state by unsupervised K-means clustering on these scores and, according to the clustering result, defines the states as Class I (inflammasome activity^low-IL1B^low), Class II (inflammasome activity^low-IL1B^high), Class III (inflammasome activity^medium-CASP1^high), Class IV (inflammasome activity^medium-IL18^high), Class V (inflammasome activity^high-IL18^low) and Class VI (inflammasome activity^high-IL18^high). Finally, it takes the gene expression data of the sample to be tested as input and obtains the inflammasome activity state class of that sample.
The targeted therapy analysis module first takes the gene expression data of the cell lines and the related drug sensitivity screening data as input and calculates the IRG scores of the five gene sets for each cell line using the ssGSEA algorithm. Based on these five IRG scores, it classifies the cell lines into the six inflammasome activity state classes with an SVM machine learning algorithm, compares the IC50 value of each targeted drug across the six classes, and determines that Class II (inflammasome activity^low-IL1B^high) tends toward resistance to BRAF-targeted drugs. It then determines whether the sample to be tested is resistant to BRAF-targeted drugs according to the inflammasome activity state class of that sample.
The immunotherapy analysis module first takes the gene expression and immunotherapy response data of the corresponding samples as input, calculates the IRG scores of the five gene sets for each sample using the ssGSEA algorithm, and predicts the inflammasome activity state type of the patients in the IMvigor210CoreBiologies database based on these scores using an SVM machine learning algorithm. It then compares the immunotherapy response across the inflammasome activity state types and determines that enhanced inflammasome activity is associated with immunotherapy resistance. Finally, it determines whether the sample to be tested is resistant to immunotherapy according to that sample's inflammasome activity state type.
The output module outputs the inflammasome activity state type of the sample to be tested, whether the sample is resistant to BRAF-targeted drugs, and whether it is resistant to immunotherapy.
The data input module, the data analysis module and the output module are all implemented with existing devices for input, analysis and output.
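As an illustration of what the output module reports, the following Python sketch maps a predicted class to the three outputs. The names `PredictionReport` and `build_report` are hypothetical. Treating Class 2 as BRAF-resistant follows the description above; treating Classes 5 and 6 (the high-activity classes) as immunotherapy-resistant is an assumption made here, since the description only states that enhanced inflammasome activity indicates resistance, so the set is left configurable.

```python
from dataclasses import dataclass
from typing import FrozenSet

# Class labels as defined by the classification module.
CLASS_NAMES = {
    1: "inflammasome activity(low)-IL1B(low)",
    2: "inflammasome activity(low)-IL1B(high)",
    3: "inflammasome activity(medium)-CASP1(high)",
    4: "inflammasome activity(medium)-IL18(high)",
    5: "inflammasome activity(high)-IL18(low)",
    6: "inflammasome activity(high)-IL18(high)",
}

# From the description: Class 2 tends toward BRAF-targeted drug resistance.
BRAF_RESISTANT_CLASSES: FrozenSet[int] = frozenset({2})

# Assumption for illustration: only the high-activity classes are flagged.
IMMUNOTHERAPY_RESISTANT_CLASSES: FrozenSet[int] = frozenset({5, 6})


@dataclass(frozen=True)
class PredictionReport:
    class_id: int
    class_name: str
    braf_drug_resistant: bool
    immunotherapy_resistant: bool


def build_report(
    class_id: int,
    immunotherapy_resistant_classes: FrozenSet[int] = IMMUNOTHERAPY_RESISTANT_CLASSES,
) -> PredictionReport:
    """Translate a predicted class into the three reported outputs."""
    if class_id not in CLASS_NAMES:
        raise ValueError(f"unknown inflammasome activity class: {class_id}")
    return PredictionReport(
        class_id=class_id,
        class_name=CLASS_NAMES[class_id],
        braf_drug_resistant=class_id in BRAF_RESISTANT_CLASSES,
        immunotherapy_resistant=class_id in immunotherapy_resistant_classes,
    )
```

A caller who prefers to flag the moderately enhanced classes as well could pass `frozenset({3, 4, 5, 6})` as the second argument.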
Figure 2 shows a flow diagram of a method for predicting tumor inflammasome activity status and treatment sensitivity. In the figure, the method for predicting the activity state and treatment sensitivity of the tumor inflammasome comprises the following steps:
Step one: Obtain a training sample data set. The training sample data set is obtained from The Cancer Genome Atlas (TCGA) database and comprises 9,881 cases covering 33 tumor types; the data for each case include gene mutation data, gene copy number variation data, gene expression data and clinical information data.
Step two: Construct the inflammasome activity-related gene sets. The previously constructed inflammasome activity-related gene sets (IRGs) are used.
Step three: Taking the TCGA gene expression data of the five IRG gene sets from step two as input, calculate the IRG scores of the five gene sets using the ssGSEA algorithm, and classify the training sample data set by unsupervised K-means clustering based on these five scores. The parameters are: number of simulations = 100; distance = Euclidean distance. After clustering, the number of clusters is determined to be 6 according to the consistency data. Tumorap analysis of the five scores found that, with 6 clusters, the patient classes are well separated.
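The clustering in step three can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the procedure actually run on the TCGA data: the function name `kmeans` is hypothetical, the "number of simulations = 100" is interpreted here as 100 random restarts of which the lowest-inertia run is kept, and the toy points stand in for five-dimensional IRG score vectors.

```python
import math
import random
from typing import List, Sequence, Tuple


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points: Sequence[Sequence[float]], k: int, n_init: int = 100,
           max_iter: int = 100, seed: int = 0) -> Tuple[List[int], float]:
    """Lloyd's K-means with Euclidean distance and random restarts.

    Returns the labels and inertia (within-cluster sum of squares) of the
    best of n_init runs.
    """
    rng = random.Random(seed)
    best_labels: List[int] = []
    best_inertia = math.inf
    for _ in range(n_init):
        # Initialise centroids from k distinct data points.
        centroids = [list(p) for p in rng.sample(list(points), k)]
        labels = [-1] * len(points)
        for _ in range(max_iter):
            new_labels = [
                min(range(k), key=lambda c: euclidean(p, centroids[c]))
                for p in points
            ]
            if new_labels == labels:
                break  # assignments are stable
            labels = new_labels
            for c in range(k):
                members = [p for p, lab in zip(points, labels) if lab == c]
                if members:  # an empty cluster keeps its previous centroid
                    centroids[c] = [sum(col) / len(members)
                                    for col in zip(*members)]
        inertia = sum(euclidean(p, centroids[lab]) ** 2
                      for p, lab in zip(points, labels))
        if inertia < best_inertia:
            best_inertia, best_labels = inertia, labels
    return best_labels, best_inertia


# Toy data: three well-separated groups of five-dimensional score vectors.
toy_points = [[centre + 0.1 * j] * 5 for centre in (0.0, 10.0, 20.0)
              for j in range(5)]
toy_labels, toy_inertia = kmeans(toy_points, k=3)
```

For the training data described above, the same call would be made with `k=6` on the five IRG scores of each sample. Choosing the number of clusters from consistency data is a separate step that this sketch does not cover.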
Step four: Determine the activity intensity and activation pattern of the six inflammasome classes. According to the distribution of the five IRG scores across the classes, the samples are defined as Class 1, inflammasome activity(low)-IL1B(low); Class 2, inflammasome activity(low)-IL1B(high); Class 3, inflammasome activity(medium)-CASP1(high); Class 4, inflammasome activity(medium)-IL18(high); Class 5, inflammasome activity(high)-IL18(low); and Class 6, inflammasome activity(high)-IL18(high).
Step five: Targeted therapy sensitivity analysis. Gene expression data and related drug sensitivity screening data of cell lines are obtained from the tumor drug sensitivity database GDSC and the CCLE database. The IRG scores of the five gene sets are calculated for each cell line using the ssGSEA algorithm, and the cell lines are classified into the six inflammasome activity state types based on these scores using an SVM machine learning algorithm. The IC50 values of each targeted drug are compared across the six types, and Class 2, inflammasome activity(low)-IL1B(high), is determined to be more prone to BRAF-targeted drug resistance.
Step six: Immunotherapy sensitivity analysis. Gene expression and immunotherapy response data are obtained from the immunotherapy database IMvigor210CoreBiologies. The IRG scores of the five gene sets are calculated for each sample using the ssGSEA algorithm, and the inflammasome activity state type of each patient in the IMvigor210CoreBiologies database is predicted from these scores using an SVM machine learning algorithm. The immunotherapy response is compared across the inflammasome activity state types, and enhanced inflammasome activity is determined to indicate immunotherapy resistance.
Step seven: Analysis of the inflammasome activity state of the sample to be tested. Gene expression data of the sample to be tested are obtained, and the IRG scores of its five gene sets are calculated using the ssGSEA algorithm. Using the SVM machine learning algorithm of steps five and six, the inflammasome activity state type of the sample is predicted from its five IRG scores and the six inflammasome activity state type labels of the TCGA samples, and it is then predicted whether the sample is resistant to BRAF-targeted drugs and to immunotherapy.
To predict the inflammasome activity state type of samples in external data sets from the five inflammasome activity-related scores, we used a two-layer validation strategy to compare the prediction accuracy of six machine learning algorithms: classification and regression trees (CART), logistic regression (LR), linear discriminant analysis (LDA), the k-nearest neighbors classifier (KNN), Gaussian naive Bayes (NB) and support vector machines (SVM). Briefly, the TCGA samples were randomly divided into a training set (80%) and a validation set (20%). The prediction accuracy of the six algorithms was first compared by five-fold cross-validation on the training set to guard against overfitting, and was then further evaluated on the validation set as the outer layer. The SVM algorithm, which had the highest prediction accuracy in the two-layer validation strategy, was finally selected.
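The two-layer validation strategy can be sketched as follows. To keep the example self-contained it evaluates only a small k-nearest-neighbours classifier (one of the six algorithms compared) rather than the SVM that was finally selected; the function names, the value `k = 3`, the random seed and the synthetic six-class data are all assumptions for illustration.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]  # (five IRG scores, class label)


def knn_predict(train: Sequence[Sample], x: Sequence[float], k: int = 3) -> int:
    """Majority vote among the k nearest training samples (Euclidean)."""
    nearest = sorted(
        train, key=lambda s: sum((a - b) ** 2 for a, b in zip(s[0], x))
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy(train: Sequence[Sample], test: Sequence[Sample]) -> float:
    return sum(knn_predict(train, x) == y for x, y in test) / len(test)


def two_layer_validation(samples: Sequence[Sample], n_folds: int = 5,
                         holdout: float = 0.2, seed: int = 0) -> Tuple[float, float]:
    """Return (mean inner cross-validation accuracy, outer validation accuracy)."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_val = int(len(shuffled) * holdout)
    validation, training = shuffled[:n_val], shuffled[n_val:]

    # Inner layer: n-fold cross-validation on the 80% training split.
    folds = [training[i::n_folds] for i in range(n_folds)]
    cv_scores: List[float] = []
    for i, held_out in enumerate(folds):
        rest = [s for j, fold in enumerate(folds) if j != i for s in fold]
        cv_scores.append(accuracy(rest, held_out))
    inner = sum(cv_scores) / n_folds

    # Outer layer: fit on the whole training split, score on the 20% holdout.
    outer = accuracy(training, validation)
    return inner, outer


def make_toy_samples() -> List[Sample]:
    """Six well-separated synthetic classes with ten samples each."""
    return [([c * 10 + 0.1 * j + 0.01 * d for d in range(5)], c)
            for c in range(1, 7) for j in range(10)]


inner_acc, outer_acc = two_layer_validation(make_toy_samples())
```

In practice the same loop would be repeated for each candidate algorithm, and the one with the best inner and outer accuracy would be kept.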
Example 1
Primary mouse glioma cells (SB1) were extracted from a spontaneous mouse tumor model (SB model) and sequenced to obtain a gene expression profile. The profile was analyzed according to step seven of the detailed description, and SB1 was determined to belong to Class 5, inflammasome activity(high)-IL18(low). SB1 cells were implanted orthotopically into mouse brain using a stereotactic technique. The mice were then treated with immune checkpoint blockade, alone or together with MB, which was used to inhibit inflammasome activity in this Class 5 model. In one group, the survival of mice under the different treatments was compared. In another group, the mice were sacrificed at day 15; brain tissue was removed, fixed in paraformaldehyde, embedded in paraffin and sectioned, and tumor size was examined by hematoxylin-eosin staining.
FIG. 3 compares the therapeutic effect of immune checkpoint blockade combined with an inflammasome activity inhibitor in murine primary glioma. Panel A compares survival after treatment and panel B compares tumor size after treatment; +PBS is the PBS control group, +anti-PD-L1 Ab is the immune checkpoint treatment group, and +anti-PD-L1 Ab & MB is the group treated with immune checkpoint blockade combined with the inflammasome activity inhibitor. As the figure shows, after mouse primary glioma of Class 5, inflammasome activity(high)-IL18(low), was treated with the combination of immune checkpoint blockade and the inflammasome activity inhibitor, the tumors were markedly reduced and the survival of the mice was prolonged. Inhibiting inflammasome activity in Class 5 tumors with MB may therefore enhance the effect of immune checkpoint therapy. This indicates that the method for predicting the activity state and treatment sensitivity of tumor inflammasome can accurately predict the inflammasome activity state.
Example 2
We downloaded gene expression data for 168 breast cancers from the cBioPortal database (http://www.cbioportal.org/datasets) and analyzed them according to step seven of the detailed description; the 168 breast cancers were predicted to fall into four of the six inflammasome activity state types. The four predicted types were then analyzed according to step four.
Fig. 4 shows the classification and prediction results of inflammasome activity state for the external breast cancer data. Panel A shows the predicted inflammasome activity state classes and the scores of biological function-related processes and pathways in the different classes; panel B shows the distribution of the five inflammasome activity-related scores in the different classes; and panel C shows the distribution of immune microenvironment-related scores in the different classes. According to Fig. 4, the external breast cancer patients were classified into four of the six inflammasome activity state classes (Classes 2, 3, 4 and 5), and we found that Classes 3 to 5 are inflammasome-enhanced classes with a stronger immunosuppressive state. The expression profiles of the 168 breast cancer patients are thus consistent with weak inflammasome activity in Classes 1 and 2, moderately enhanced activity in Classes 3 and 4, and highly enhanced activity in Classes 5 and 6, with immunosuppression becoming stronger as inflammasome activity increases.
GDSC is a drug sensitivity database. From the expression profiles of its cell lines, the corresponding five IRG scores are calculated, the cell lines are divided into the six inflammasome activity state types, and the sensitivity to each drug is compared across the six types, giving the treatment sensitivity of each type. IMvigor210CoreBiologies is an immunotherapy database. From the expression profiles of patients receiving immunotherapy, the corresponding five IRG scores are calculated and the patients are classified; the immunotherapy response rates of the classes are then compared, giving the sensitivity or resistance of each type to immunotherapy (immune checkpoint blockade therapy). Data from any sample can then be fed into the model to obtain its class, and the characteristics of that sample are predicted from the class characteristics established in the GDSC and IMvigor210CoreBiologies analyses.
The patient information in the IMvigor210CoreBiologies database includes the response to anti-PD-L1 treatment. The distribution of anti-PD-L1 response or resistance across the inflammasome activity state types was tested with the chi-square test, and the types with stronger inflammasome activity were found to include more patients who did not respond to the treatment.
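The chi-square comparison can be illustrated with a short Python sketch that computes the test statistic and degrees of freedom for a class-by-response contingency table. The function name `chi_square_independence` and the counts in `toy_table` are made up for illustration and are not taken from the IMvigor210CoreBiologies data; converting the statistic to a p value requires the chi-square distribution, which is left out here.

```python
from typing import Sequence, Tuple


def chi_square_independence(table: Sequence[Sequence[float]]) -> Tuple[float, int]:
    """Pearson chi-square statistic and degrees of freedom.

    Rows are inflammasome activity classes; columns are response categories
    (for example responder / non-responder). Every row and column total is
    assumed to be non-zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected

    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof


# Hypothetical counts: six classes x (responders, non-responders).
toy_table = [
    [12, 18],
    [10, 20],
    [9, 26],
    [8, 27],
    [5, 30],
    [4, 31],
]
toy_statistic, toy_dof = chi_square_independence(toy_table)
```

With six classes and two response categories the test has five degrees of freedom; a large statistic relative to that distribution indicates that response rates differ between classes.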
IC50 values are provided in the GDSC database and are used to evaluate the sensitivity of a cell line to a drug. The Wilcoxon test is used to compare the IC50 of each drug between the cell lines of a given class and all other cell lines; a higher median IC50 for a drug in a given class, together with a corrected p value below 0.05, indicates that the cell lines of that class are more resistant to the drug. No drugs with significant resistance or sensitivity were found for the other classes.
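The IC50 comparison can be sketched in Python as below. The sketch uses a Wilcoxon rank-sum test with a normal approximation (no tie or continuity correction) and a Benjamini-Hochberg adjustment; the description does not say which correction was applied, so the adjustment method, the function names and the toy IC50 values are assumptions for illustration.

```python
import math
from statistics import median
from typing import List, Sequence, Tuple


def rank_sum_test(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Two-sided Wilcoxon rank-sum test; returns (U statistic of a, p value)."""
    pooled = sorted([(v, 0) for v in a] + [(v, 1) for v in b])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1  # tied values share the average rank
        for t in range(i, j + 1):
            ranks[t] = average_rank
        i = j + 1

    n_a, n_b = len(a), len(b)
    rank_sum_a = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    mean_u = n_a * n_b / 2
    sd_u = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    z = (u_a - mean_u) / sd_u
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return u_a, p_value


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda idx: p_values[idx])
    adjusted = [0.0] * n
    running_min = 1.0
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


def more_resistant(class_ic50: Sequence[float], other_ic50: Sequence[float],
                   adjusted_p: float, threshold: float = 0.05) -> bool:
    """Higher median IC50 in the class plus a significant corrected p value."""
    return median(class_ic50) > median(other_ic50) and adjusted_p < threshold


# Hypothetical IC50 values for one drug: one class versus all other cell lines.
toy_class = [6.0, 7.0, 8.0, 9.0, 10.0]
toy_other = [1.0, 2.0, 3.0, 4.0, 5.0]
toy_u, toy_p = rank_sum_test(toy_class, toy_other)
```

In a full analysis, `rank_sum_test` would be run once per drug and class, the resulting p values adjusted together with `benjamini_hochberg`, and `more_resistant` applied to each adjusted value.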

Claims (3)

1. A gene set for predicting the activity state and treatment sensitivity of tumor inflammasome, the gene set being the inflammasome activity-related gene sets (IRGs), wherein the IRGs comprise five gene sets: 15 inflammasome core genes, 34 CASP1 regulatory genes, 92 IL1B regulatory genes, 8 IL18 regulatory genes and 13 GSDMD regulatory genes;
the 15 inflammasome core genes comprise fifteen genes of NLRP1, NLRP3, CASP4, CASP5, NLRC5, NLRP6, NLRP12, NLRP7, NAIP, NLRC4, AIM2, IFI16, MEFV, NLRP2 and PYCARD;
the 34 CASP1 regulatory genes comprise TMEM260, TIFA, LSM4, CERS2, PLCG2, FAM219A, LAMTOR3, KLF11, CTSH, CAPZA2, ACTL6A, C10orf11, KLRB1, UBR1, PPP1R12B, ZNF512B, FARP2, PPIP5K1, PHKA1, BACE1, MRPS27, TRAPPC12, TAS1R3, CD247, CYBRD1, PCED1A, AP5M1, RMDN1, FBXO31, BLOC1S6, ATP10D, FMO1, FKBP5 and NNT thirty-four genes;
the 92 IL1B regulatory genes comprise ADAM, SLC11A, EREG, CSF, ESR, BID, BACH, TNFRSF1, MAPKAPK, IFNAR, NFKB, OSMR, BTN1A, IRF, GNAQ, SULF, SLC7A, SLAMF, SELP, NPY1, PROKR, SOD, SERPINB, RAB, PTGER, STX, IGSF, ARHGAP, TBC1D, TNIP, SLAMF, ACTR3, PIM, MAP, 3K, POLR3, BCL, HOXD, CDCA, RND, CERS, GPR171, PTGS, NFATC, B3GAT, CH25, IL, SLC5A, NFKBIE, SOELF, SELE, TNFP, TLR, CHL, CCL, STEAP, ZC3H12, IL, CXVCAM, NFKCCL, LACCL, LACOX, SHFC, SALC, SCLC, SCHS 3, SCHS, SASC 3, SACK 608, SACK;
the 8 IL18 regulatory genes include the eight genes CD226, KLRB1, CXCR6, THY1, SAMD3, TXK, C5orf28, FASLG, RGS11, CTSW, MID1, CAMK2B, and CNBD2;
the 13 GSDMD regulatory genes comprise eight genes including NLRP1, NLRP3, CASP4, CASP5, NLRC5, NLRP6, NLRP12, NLRP7, NAIP, NLRC4, AIM2, IFI16, MEFV, NLRP2 and PYCARD.
2. A system for predicting the activity state and treatment sensitivity of tumor inflammasome using the gene set according to claim 1, wherein the system comprises a data input module, a data analysis module and an output module; the data analysis module is in data communication with the data input module and the output module; the data input module is used for inputting the gene expression data of a training sample data set, the gene expression data and related drug sensitivity screening data of cell lines obtained from the tumor drug sensitivity database GDSC and the CCLE database, the gene expression and immunotherapy response data of corresponding samples obtained from the immunotherapy database IMvigor210CoreBiologies, and the gene expression data of the sample to be tested;
the training sample data set comprises 9,881 cases covering 33 tumor types, obtained from The Cancer Genome Atlas (TCGA) database;
the data analysis module comprises an inflammasome activity state classification module, a targeted therapy analysis module and an immunotherapy analysis module;
the inflammasome activity state classification module first takes the gene expression data of the training sample data set as input, calculates the scores of the inflammasome activity-related gene sets (IRGs) of each sample in the training sample data set using the ssGSEA algorithm, classifies the inflammasome activity state by unsupervised K-means clustering according to the scores, and, according to the classification result, defines the inflammasome activity state as Class 1, inflammasome activity(low)-IL1B(low); Class 2, inflammasome activity(low)-IL1B(high); Class 3, inflammasome activity(medium)-CASP1(high); Class 4, inflammasome activity(medium)-IL18(high); Class 5, inflammasome activity(high)-IL18(low); or Class 6, inflammasome activity(high)-IL18(high); and then takes the gene expression data of the sample to be tested as input to obtain the inflammasome activity state type of the sample to be tested;
the targeted therapy analysis module first takes the gene expression data and related drug sensitivity screening data of cell lines as input, calculates the IRG scores of the five gene sets of each cell line using the ssGSEA algorithm, classifies the cell lines into the six inflammasome activity state types based on the IRG scores of the five gene sets using an SVM machine learning algorithm, compares the IC50 values of each targeted drug across the six inflammasome activity state types, determines that Class 2, inflammasome activity(low)-IL1B(high), tends toward BRAF-targeted drug resistance, and then determines whether the sample to be tested is resistant to BRAF-targeted drugs according to the inflammasome activity state type of the sample to be tested;
the immunotherapy analysis module first takes the gene expression and immunotherapy response data of the corresponding samples as input, calculates the IRG scores of the five gene sets of each sample using the ssGSEA algorithm, predicts the inflammasome activity state type of the patients in the IMvigor210CoreBiologies database based on the IRG scores using an SVM machine learning algorithm, compares the immunotherapy response across the inflammasome activity state types, determines that enhanced inflammasome activity indicates immunotherapy resistance, and then determines whether the sample to be tested is resistant to immunotherapy according to the inflammasome activity state type of the sample to be tested;
the output module is used for outputting the inflammasome activity state type of the sample to be tested, whether the sample is resistant to BRAF-targeted drugs, and whether it is resistant to immunotherapy.
3. A method for predicting the activity status and treatment sensitivity of tumor inflammasome, comprising the steps of:
step one: acquiring a training sample data set, wherein 9,881 cases covering 33 tumor types are acquired from The Cancer Genome Atlas (TCGA) database as the training sample data set, the data comprising gene mutation data, gene copy number variation data, gene expression data and clinical information data of each sample;
step two: constructing an inflammasome activity-related gene set IRGs, wherein the inflammasome activity-related gene set IRGs comprises five gene sets including 15 inflammasome core genes, 34 CASP1 regulatory genes, 92 IL1B regulatory genes, 8 IL18 regulatory genes and 13 GSDMD regulatory genes;
the 15 inflammasome core genes comprise fifteen genes of NLRP1, NLRP3, CASP4, CASP5, NLRC5, NLRP6, NLRP12, NLRP7, NAIP, NLRC4, AIM2, IFI16, MEFV, NLRP2 and PYCARD;
the 34 CASP1 regulatory genes comprise TMEM260, TIFA, LSM4, CERS2, PLCG2, FAM219A, LAMTOR3, KLF11, CTSH, CAPZA2, ACTL6A, C10orf11, KLRB1, UBR1, PPP1R12B, ZNF512B, FARP2, PPIP5K1, PHKA1, BACE1, MRPS27, TRAPPC12, TAS1R3, CD247, CYBRD1, PCED1A, AP5M1, RMDN1, FBXO31, BLOC1S6, ATP10D, FMO1, FKBP5 and NNT thirty-four genes;
the 92 IL1B regulatory genes comprise ADAM, SLC11A, EREG, CSF, ESR, BID, BACH, TNFRSF1, MAPKAPK, IFNAR, NFKB, OSMR, BTN1A, IRF, GNAQ, SULF, SLC7A, SLAMF, SELP, NPY1, PROKR, SOD, SERPINB, RAB, PTGER, STX, IGSF, ARHGAP, TBC1D, TNIP, SLAMF, ACTR3, PIM, MAP, 3K, POLR3, BCL, HOXD, CDCA, RND, CERS, GPR171, PTGS, NFATC, B3GAT, CH25, IL, SLC5A, NFKBIE, SOELF, SELE, TNFP, TLR, CHL, CCL, STEAP, ZC3H12, IL, CXVCAM, NFKCCL, LACCL, LACOX, SHFC, SALC, SCLC, SCHS 3, SCHS, SASC 3, SACK 608, SACK;
the 8 IL18 regulatory genes include the eight genes CD226, KLRB1, CXCR6, THY1, SAMD3, TXK, C5orf28, FASLG, RGS11, CTSW, MID1, CAMK2B, and CNBD2;
the 13 GSDMD regulatory genes comprise eight genes including NLRP1, NLRP3, CASP4, CASP5, NLRC5, NLRP6, NLRP12, NLRP7, NAIP, NLRC4, AIM2, IFI16, MEFV, NLRP2 and PYCARD;
step three: calculating the IRG scores of the five gene sets in the inflammasome activity-related gene sets (IRGs), wherein, taking the gene expression data of the training sample data set as input, the IRG scores of the five gene sets of each sample are calculated using the ssGSEA (single-sample gene set enrichment analysis) algorithm, and the training sample data set is classified by unsupervised K-means clustering based on the IRG scores of the five gene sets of each sample, the parameters of the K-means clustering being set as follows: number of simulations = 100, distance = Euclidean distance, and the number of clusters being determined to be 6 according to the consistency data after clustering;
step four: determining the activity intensity and activation pattern of the six inflammasome classes, wherein, according to the distribution of the IRG scores of the five gene sets from step three across the classes, the samples in the training sample data set are defined as Class 1, inflammasome activity(low)-IL1B(low); Class 2, inflammasome activity(low)-IL1B(high); Class 3, inflammasome activity(medium)-CASP1(high); Class 4, inflammasome activity(medium)-IL18(high); Class 5, inflammasome activity(high)-IL18(low); or Class 6, inflammasome activity(high)-IL18(high);
step five: targeted therapy sensitivity analysis, wherein gene expression data and related drug sensitivity screening data of cell lines are obtained from the tumor drug sensitivity database GDSC and the CCLE database, the IRG scores of the five gene sets of each cell line are calculated using the ssGSEA algorithm, the cell lines are classified into the six inflammasome activity state types of step four based on the IRG scores of the five gene sets using an SVM machine learning algorithm, the IC50 values of each targeted drug are compared across the six inflammasome activity state types, and Class 2, inflammasome activity(low)-IL1B(high), is determined to be predisposed to BRAF-targeted drug resistance;
step six: immunotherapy sensitivity analysis, wherein gene expression and immunotherapy response data of corresponding samples are obtained from the immunotherapy database IMvigor210CoreBiologies, the IRG scores of the five gene sets of each sample are calculated using the ssGSEA algorithm, the inflammasome activity state type of each patient in the IMvigor210CoreBiologies database is predicted based on the IRG scores of the five gene sets using an SVM machine learning algorithm, the immunotherapy response is compared across the inflammasome activity state types, and enhanced inflammasome activity is determined to indicate immunotherapy resistance;
step seven: analysis of the inflammasome activity state of the sample to be tested, wherein gene expression data of the sample to be tested are obtained, the IRG scores of the five gene sets of the sample to be tested are calculated using the ssGSEA algorithm, and the inflammasome activity state type of the sample to be tested, whether it is resistant to BRAF-targeted drugs and whether it is resistant to immunotherapy are determined based on the five IRG scores of the sample to be tested and the six inflammasome activity state type labels of the training sample data set.
CN202110689083.XA 2021-04-21 2021-06-22 Gene set system and method for predicting activity state and treatment sensitivity of tumor inflammasome Pending CN113707223A (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN2021104307536 2021-04-21
CN202110430753 2021-04-21

Publications (1)

Publication Number Publication Date
CN113707223A true CN113707223A (en) 2021-11-26

Family

ID=78648157

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110689083.XA Pending CN113707223A (en) 2021-04-21 2021-06-22 Gene set system and method for predicting activity state and treatment sensitivity of tumor inflammasome

Country Status (1)

Country Link
CN (1) CN113707223A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108883131A (en) * 2015-08-06 2018-11-23 明尼苏达大学董事会 Adjusting marrow source property inhibits the inflammatory corpusculum of cell to activate for treating GvHD or tumour
CN111640508A (en) * 2020-05-28 2020-09-08 上海生物信息技术研究中心 Method for constructing pan-tumor targeted drug susceptibility state evaluation model based on high-throughput sequencing data and clinical phenotype and application
CN112011616A (en) * 2020-09-02 2020-12-01 复旦大学附属中山医院 Immune gene prognosis model for predicting hepatocellular carcinoma tumor immune infiltration and postoperative survival time
CN112435714A (en) * 2020-11-03 2021-03-02 北京科技大学 Tumor immune subtype classification method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination