CN114927231B - Method and device for predicting early lung adenocarcinoma progress based on gene expression information - Google Patents

Info

Publication number
CN114927231B
Authority
CN
China
Prior art keywords
adenocarcinoma
tumor
expression
genes
immune
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210391575.5A
Other languages
Chinese (zh)
Other versions
CN114927231A (en
Inventor
赵悦
高健
李媛
曹志伟
陈海泉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fudan University Shanghai Cancer Center
Original Assignee
Fudan University Shanghai Cancer Center
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fudan University Shanghai Cancer Center
Priority to CN202210391575.5A
Publication of CN114927231A
Application granted
Publication of CN114927231B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B35/00 ICT specially adapted for in silico combinatorial libraries of nucleic acids, proteins or peptides
    • G16B35/20 Screening of libraries
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/30 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for calculating health indices; for individual health risk assessment

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Public Health (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Primary Health Care (AREA)
  • Library & Information Science (AREA)
  • Epidemiology (AREA)
  • Pathology (AREA)
  • Databases & Information Systems (AREA)
  • Biomedical Technology (AREA)
  • Biochemistry (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Biology (AREA)
  • Biotechnology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Chemical & Material Sciences (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention discloses a method and a device for predicting early lung adenocarcinoma progress based on gene expression information, wherein the method comprises the following steps: the tumor-related genes and immune-related genes are screened out by using a statistical method, the tumor growth index and the immune function index are calculated based on the expression of the two groups of genes, and finally, the difference value between the tumor growth index and the immune function index is used as the tumor progress index of the lung adenocarcinoma for predicting the early lung adenocarcinoma progress. The method of the invention can predict the tumor-immune system balance state of a lung adenocarcinoma patient and evaluate the progress level of the tumor according to the tumor-immune system balance state.

Description

Method and device for predicting early lung adenocarcinoma progress based on gene expression information
Technical Field
The invention relates to the technical field of bioinformatics, in particular to a method and a device for predicting early lung adenocarcinoma progress based on gene expression information.
Background
Among all pathological subtypes of lung cancer, lung adenocarcinoma (LUAD) is the most common. If surgical resection can be performed at the stage of a precancerous lesion, adenocarcinoma in situ (AIS) or minimally invasive adenocarcinoma (MIA), the five-year postoperative survival rate can reach or approach 100%. Once the disease has advanced to invasive adenocarcinoma, the prognosis worsens significantly, so it is necessary to study the evolution of lung adenocarcinoma in order to discover new targets and develop new treatments. Although genomic and immunological analyses of AIS, MIA and LUAD patients have been reported, systematic studies of the key molecular events that drive the evolution of lung adenocarcinoma are lacking.
Because of the limited number of pre-invasive tumor samples of lung adenocarcinoma, there is currently very little international genomics research on pre-cancerous lesions of lung adenocarcinoma, and no technology is currently available to predict the progression of early lung adenocarcinoma.
Accordingly, the prior art is still in need of improvement and development.
Disclosure of Invention
In view of the above shortcomings of the prior art, the present invention provides a method and a device for predicting early lung adenocarcinoma progress based on gene expression information, in order to solve the problem that the prior art lacks a method and a device capable of accurately predicting early lung adenocarcinoma progress.
The technical scheme of the invention is as follows:
A method of predicting early lung adenocarcinoma progression based on gene expression information, comprising the steps of:
dividing the obtained lung tissue into four types, namely normal tissue, in-situ adenocarcinoma, micro-invasive adenocarcinoma and invasive adenocarcinoma, according to tumor pathological features, and performing full transcriptome sequencing on each of the four types of tissue to generate a sequencing library;
forming three control groups from the full transcriptome sequencing results of the normal tissue, the in-situ adenocarcinoma, the micro-invasive adenocarcinoma and the invasive adenocarcinoma, wherein the three control groups comprise a normal tissue and in-situ adenocarcinoma control group, an in-situ adenocarcinoma and micro-invasive adenocarcinoma control group, and a micro-invasive adenocarcinoma and invasive adenocarcinoma control group;
performing analysis of variance on each gene in the sequencing library among the three control groups to determine differentially expressed genes;
screening, from the differentially expressed genes, a group of tumor-related genes whose expression levels show a significant rising trend in the three control groups, and calculating a tumor growth index from the tumor-related genes;
screening, from the differentially expressed genes, a group of immune-related genes whose expression levels show a significant declining trend in the normal tissue and in-situ adenocarcinoma control group and in the micro-invasive adenocarcinoma and invasive adenocarcinoma control group, and calculating an immune function index from the immune-related genes;
using the difference between the tumor growth index and the immune function index as the tumor progression index of the lung adenocarcinoma for predicting early lung adenocarcinoma progression.
The method for predicting early lung adenocarcinoma progression based on gene expression information, wherein the step of performing analysis of variance on each gene in the sequencing library among the three control groups to determine the differentially expressed genes comprises:
performing analysis of variance on each gene in the sequencing library in each of the three control groups, and taking genes with p < 0.0001 and an inter-group |log2 fold change| of 2 or more as the differentially expressed genes.
The method for predicting early lung adenocarcinoma progression based on gene expression information, wherein the group of tumor-related genes screened from the differentially expressed genes, whose expression levels show a significant rising trend in the three control groups, comprises BCL2L15, COMP, CST1, FAM83A, SLC A5, PGLYRP4, CLPSL2, ARSH, CDH17, COL10A1, SPP1, MMP3, DDX4, FGF11 and CASR.
The method for predicting early lung adenocarcinoma progression based on gene expression information, wherein the step of calculating a tumor growth index from the tumor-related genes comprises:
log2-transforming the expression levels of the tumor-related genes, and then, for each sample, calculating the tumor growth index as the average of the log2-transformed expression levels of the tumor-related genes, the calculation formula of the tumor growth index being: $\text{tumor growth index} = \frac{1}{N}\sum_{i=1}^{N}\log_2(\mathrm{TPM}_i)$, wherein $\mathrm{TPM}_i$ is the expression level of the i-th tumor-related gene and N is the number of tumor-related genes.
The method for predicting early lung adenocarcinoma progression based on gene expression information, wherein the group of immune-related genes screened from the differentially expressed genes, whose expression levels show a significant declining trend in the normal tissue and in-situ adenocarcinoma control group and in the micro-invasive adenocarcinoma and invasive adenocarcinoma control group, comprises ITLN2, MARCO, C8B, MASP1, CD36, TAL1, PPBP, CDH5, MSR1, TBX21, C6, MCAM, GZMH, GZMB, CXCL12, LILRB2, CXCR1, CXCR2, LAMP3 and IL1RL1.
The method for predicting early lung adenocarcinoma progression based on gene expression information, wherein the step of calculating an immune function index from the immune-related genes comprises:
log2-transforming the expression levels of the immune-related genes, and then, for each sample, calculating the immune function index as the average of the log2-transformed expression levels of the immune-related genes, the calculation formula of the immune function index being: $\text{immune function index} = \frac{1}{n}\sum_{i=1}^{n}\log_2(\mathrm{TPM}_i)$, wherein $\mathrm{TPM}_i$ is the expression level of the i-th immune-related gene and n is the number of immune-related genes.
An apparatus for predicting early lung adenocarcinoma progression based on gene expression information, comprising:
a sequencing module for dividing the acquired lung tissue into four types, namely normal tissue, in-situ adenocarcinoma, micro-invasive adenocarcinoma and invasive adenocarcinoma, according to tumor pathological features, and performing full transcriptome sequencing on each of the four types of tissue to generate a sequencing library;
a grouping module for forming three control groups from the full transcriptome sequencing results of the normal tissue, the in-situ adenocarcinoma, the micro-invasive adenocarcinoma and the invasive adenocarcinoma, wherein the three control groups comprise a normal tissue and in-situ adenocarcinoma control group, an in-situ adenocarcinoma and micro-invasive adenocarcinoma control group, and a micro-invasive adenocarcinoma and invasive adenocarcinoma control group;
a differentially expressed gene determining module for performing analysis of variance on each gene in the sequencing library among the three control groups to determine differentially expressed genes;
a tumor growth index calculation module for screening, from the differentially expressed genes, a group of tumor-related genes whose expression levels show a significant rising trend in the three control groups, and calculating a tumor growth index from the tumor-related genes;
an immune function index calculation module for screening, from the differentially expressed genes, a group of immune-related genes whose expression levels show a significant declining trend in the normal tissue and in-situ adenocarcinoma control group and in the micro-invasive adenocarcinoma and invasive adenocarcinoma control group, and calculating an immune function index from the immune-related genes;
a tumor progression index calculation module for taking the difference between the tumor growth index and the immune function index as the tumor progression index of the lung adenocarcinoma for predicting early lung adenocarcinoma progression.
The device for predicting early lung adenocarcinoma progression based on gene expression information, wherein the tumor growth index calculation module comprises:
a tumor-related gene screening unit for screening, from the differentially expressed genes, a group of tumor-related genes whose expression levels show a significant rising trend in the three control groups, including BCL2L15, COMP, CST1, FAM83A, SLC A5, PGLYRP4, CLPSL2, ARSH, CDH17, COL10A1, SPP1, MMP3, DDX4, FGF11 and CASR;
a tumor growth index calculation unit for log2-transforming the expression levels of the tumor-related genes and then, for each sample, calculating the tumor growth index as the average of the log2-transformed expression levels of the tumor-related genes, the calculation formula of the tumor growth index being: $\text{tumor growth index} = \frac{1}{N}\sum_{i=1}^{N}\log_2(\mathrm{TPM}_i)$, wherein $\mathrm{TPM}_i$ is the expression level of the i-th tumor-related gene and N is the number of tumor-related genes.
The device for predicting early lung adenocarcinoma progression based on gene expression information, wherein the immune function index calculation module comprises:
an immune-related gene screening unit for screening, from the differentially expressed genes, a group of immune-related genes whose expression levels show a significant declining trend in the normal tissue and in-situ adenocarcinoma control group and in the micro-invasive adenocarcinoma and invasive adenocarcinoma control group, including ITLN2, MARCO, C8B, MASP1, CD36, TAL1, PPBP, CDH5, MSR1, TBX21, C6, MCAM, GZMH, GZMB, CXCL12, LILRB2, CXCR1, CXCR2, LAMP3 and IL1RL1;
an immune function index calculation unit for log2-transforming the expression levels of the immune-related genes and then, for each sample, calculating the immune function index as the average of the log2-transformed expression levels of the immune-related genes, the calculation formula of the immune function index being: $\text{immune function index} = \frac{1}{n}\sum_{i=1}^{n}\log_2(\mathrm{TPM}_i)$, wherein $\mathrm{TPM}_i$ is the expression level of the i-th immune-related gene and n is the number of immune-related genes.
Beneficial effects: based on full transcriptome sequencing data of lung adenocarcinoma at different developmental stages, the invention screens key genes capable of predicting the evolution from adenocarcinoma in situ (AIS) to minimally invasive adenocarcinoma (MIA) and then to invasive adenocarcinoma (LUAD). The key genes comprise a group of tumor-related genes related to the growth potential of the tumor and a group of immune-related genes related to the functional level of the surrounding immune system. The normalized expression levels of the key genes are used for modeling, the corresponding tumor growth index and immune function index are calculated, and finally the difference between the tumor growth index and the immune function index is used as the tumor progress index of the lung adenocarcinoma to predict early lung adenocarcinoma progress. The invention can reflect the tumor-immune balance state of lung adenocarcinoma at different developmental stages, thereby enabling prediction of the progress of lung adenocarcinoma and the prognosis of patients.
Drawings
FIG. 1 is a flow chart of a method for predicting early lung adenocarcinoma progression based on gene expression information according to the present invention.
FIG. 2 shows 12 expression patterns of 2023 genes during lung adenocarcinoma progression.
FIG. 3 is a graph showing the results of comparing tumor progression index in A) the data set of the present application, B) the external validation set, and C) the TCGA-LUAD data set for different stages of lung adenocarcinoma progression.
FIG. 4 is a graph showing the comparison of prognosis survival for lung adenocarcinoma patients with different tumor progression indices in the data set of the present application and the TCGA-LUAD data set.
Detailed Description
The invention provides a method and a device for predicting early lung adenocarcinoma progress based on gene expression information. To make the purposes, technical schemes and effects of the invention clearer and more definite, the invention is described in further detail below. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
Referring to fig. 1, fig. 1 is a flowchart of a method for predicting early lung adenocarcinoma progress based on gene expression information, which includes the steps of:
S10, dividing the acquired lung tissue into four types, namely normal tissue, in-situ adenocarcinoma, micro-invasive adenocarcinoma and invasive adenocarcinoma, according to tumor pathological features, and performing full transcriptome sequencing on each of the four types of tissue to generate a sequencing library;
S20, forming three control groups from the full transcriptome sequencing results of the normal tissue, the in-situ adenocarcinoma, the micro-invasive adenocarcinoma and the invasive adenocarcinoma, wherein the three control groups comprise a normal tissue and in-situ adenocarcinoma control group, an in-situ adenocarcinoma and micro-invasive adenocarcinoma control group, and a micro-invasive adenocarcinoma and invasive adenocarcinoma control group;
S30, performing analysis of variance on each gene in the sequencing library among the three control groups to determine differentially expressed genes;
S40, screening, from the differentially expressed genes, a group of tumor-related genes whose expression levels show a significant rising trend in the three control groups, and calculating a tumor growth index from the tumor-related genes;
S50, screening, from the differentially expressed genes, a group of immune-related genes whose expression levels show a significant declining trend in the normal tissue and in-situ adenocarcinoma control group and in the micro-invasive adenocarcinoma and invasive adenocarcinoma control group, and calculating an immune function index from the immune-related genes;
S60, using the difference between the tumor growth index and the immune function index as the tumor progress index of the lung adenocarcinoma to predict early lung adenocarcinoma progress.
Specifically, by analyzing large-scale full transcriptome sequencing data from Chinese lung adenocarcinoma patients, the invention investigates the changes in gene expression levels over the whole course of lung adenocarcinoma, from normal tissue to precancerous lesions and then to invasive lung adenocarcinoma. Two groups of representative genes are screened out by statistical methods: one group reflects the inherent growth potential of the tumor, and the other reflects the functional state of the immune system. A tumor progression index designed from the expression levels of these two groups of genes can predict the tumor-immune system balance state of a lung adenocarcinoma patient, and the progression level of the tumor is evaluated according to that balance state. The tumor-immune system balance state predicted by the tumor progression index can be verified with external lung adenocarcinoma data, and the technology of the invention provides a theoretical basis and technical support for predicting the progression of lung adenocarcinoma.
The tumor progress index designed by the invention can predict the prognosis of the lung adenocarcinoma patient, and the result can be verified by external data, thereby providing a new index for predicting the prognosis of the lung adenocarcinoma patient.
The invention screens out a group of gene combinations capable of reflecting the inherent growth potential of tumors and the functional state of an immune system through full transcriptome sequencing, and can design a specific detection mode aiming at the 2 groups of genes according to the gene combinations in the future, thereby providing a new thought and method for the development of a detection kit.
The invention is further illustrated by the following examples:
First, fresh lung tumor tissue obtained by surgical resection or biopsy, together with paired paracancerous normal tissue, was collected from 150 cases and divided into 4 groups according to the pathological characteristics of the tumor: 150 normal tissues, 16 in-situ adenocarcinomas (AIS), 52 micro-invasive adenocarcinomas (MIA) and 82 invasive adenocarcinomas (LUAD). After sampling, total RNA was extracted using an RNA extraction kit from Macherey-Nagel (Germany), ribosomal RNA was removed, a sequencing library was generated, and paired-end sequencing was performed on the Illumina HiSeq X Ten platform with a read length of 150 bp.
Next, the full transcriptome sequencing results of the normal tissue, the in-situ adenocarcinoma, the micro-invasive adenocarcinoma and the invasive adenocarcinoma are combined into three control groups: a normal tissue and in-situ adenocarcinoma control group (normal tissue vs AIS), an in-situ adenocarcinoma and micro-invasive adenocarcinoma control group (AIS vs MIA), and a micro-invasive adenocarcinoma and invasive adenocarcinoma control group (MIA vs LUAD). Analysis of variance is performed on each gene in the sequencing library in each of the three control groups, and genes with p < 0.0001 and an inter-group |log2 fold change| of 2 or more are taken as differentially expressed genes. Genes that exhibited significant differential expression in at least 1 of the 3 comparisons were then selected for downstream analysis, and 12 expression patterns were determined based on their up- or down-regulation between adjacent sample groups, as shown in FIG. 2; pathway analysis was performed to reveal the biological function of each pattern. Specifically, a total of 2023 statistically different genes were selected in this example, and the 12 expression patterns were determined according to the change in expression level between adjacent stages of tumor development. For example, in expression pattern 1 the second group is higher than the first, the third is higher than the second, and the fourth is higher than the third, giving a gradually rising trend; in expression pattern 2 the second group is higher than the first, the third is not significantly different from the second, and the fourth is higher than the third; and so on. The example then selects biologically significant genes from the rising and the declining trends as representative genes to determine the final candidate genes. FIG. 2 shows the 12 expression patterns: the leftmost line graph represents the trend of gene expression levels between adjacent groups, and the notes in the graph indicate the main biological functions of the genes.
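The pattern assignment described above can be sketched in a few lines of Python. This is a minimal illustration only: it assumes the p-value and log2 fold change for each of the three comparisons have already been computed elsewhere, and the names `Comparison`, `direction`, `expression_pattern` and the label "flat" are hypothetical, not taken from the patent. Only the two thresholds come from the text.

```python
from typing import NamedTuple, Sequence, Tuple

P_CUTOFF = 1e-4     # p < 0.0001, as stated in the text
LFC_CUTOFF = 2.0    # |log2 fold change| >= 2, as stated in the text


class Comparison(NamedTuple):
    """Result for one gene in one control group (hypothetical container)."""
    p_value: float
    log2_fold_change: float  # later stage relative to earlier stage


def direction(c: Comparison) -> str:
    """Label one comparison as 'up', 'down', or 'flat' (not significant)."""
    if c.p_value < P_CUTOFF and abs(c.log2_fold_change) >= LFC_CUTOFF:
        return "up" if c.log2_fold_change > 0 else "down"
    return "flat"


def expression_pattern(comparisons: Sequence[Comparison]) -> Tuple[str, ...]:
    """Pattern across normal vs AIS, AIS vs MIA, MIA vs LUAD, in that order."""
    return tuple(direction(c) for c in comparisons)


def is_differentially_expressed(comparisons: Sequence[Comparison]) -> bool:
    """A gene is kept if it is significant in at least one comparison."""
    return any(d != "flat" for d in expression_pattern(comparisons))


# Made-up statistics for one gene that rises in every comparison.
rising = [Comparison(1e-6, 2.5), Comparison(1e-5, 3.0), Comparison(1e-7, 2.0)]
print(expression_pattern(rising))
```

A gene with the pattern ("up", "up", "up") corresponds to expression pattern 1 in the text, and ("up", "flat", "up") to expression pattern 2.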
Next, in order to minimize possible confounding effects introduced by low-expression genes, the present invention filters out genes with an average expression level (TPM) < 1.0 across all samples, and screens out from the differentially expressed genes a group of tumor-related genes whose expression levels rise significantly in the three control groups, including BCL2L15, COMP, CST1, FAM83A, SLC A5, PGLYRP4, CLPSL2, ARSH, CDH17, COL10A1, SPP1, MMP3, DDX4, FGF11 and CASR. The expression levels of the tumor-related genes are log2-transformed, and then, for each sample, the tumor growth index is calculated as the average of the log2-transformed expression levels of the tumor-related genes. The calculation formula of the tumor growth index is: $\text{tumor growth index} = \frac{1}{N}\sum_{i=1}^{N}\log_2(\mathrm{TPM}_i)$, wherein $\mathrm{TPM}_i$ is the expression level of the i-th tumor-related gene and N is the number of tumor-related genes.
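As a rough illustration of the low-expression filter and the per-sample averaging described above, the following Python sketch may help. The function names, the dictionary layout and the `pseudocount` parameter are assumptions made for the example; the text does not state whether a pseudocount is added before the log2 transform, so the default of 0.0 follows the verbal definition literally. The TPM values are made up.

```python
import math
from statistics import mean
from typing import Dict, List, Sequence


def filter_low_expression(tpm_by_gene: Dict[str, List[float]],
                          min_mean_tpm: float = 1.0) -> Dict[str, List[float]]:
    """Keep genes whose mean TPM across all samples is at least min_mean_tpm."""
    return {gene: values for gene, values in tpm_by_gene.items()
            if mean(values) >= min_mean_tpm}


def mean_log2_index(sample_tpm: Dict[str, float],
                    genes: Sequence[str],
                    pseudocount: float = 0.0) -> float:
    """Average of log2(TPM + pseudocount) over the given genes for one sample.

    pseudocount is an assumption of this sketch: with the default 0.0 a gene
    with TPM == 0 raises ValueError, so callers may prefer a small offset.
    """
    return mean(math.log2(sample_tpm[g] + pseudocount) for g in genes)


# Made-up expression matrix: gene -> TPM in each of three samples.
expr = {"SPP1": [8.0, 2.0, 5.0], "LOWGENE": [0.2, 0.5, 0.1]}
kept = filter_low_expression(expr)

# Made-up TPM values for one sample and three of the listed tumor-related genes.
sample = {"SPP1": 1.0, "MMP3": 4.0, "COL10A1": 16.0}
tumor_growth_index = mean_log2_index(sample, ["SPP1", "MMP3", "COL10A1"])
print(sorted(kept), tumor_growth_index)
```

The same `mean_log2_index` helper applies unchanged to the immune-related gene list, since both indices are defined as an average of log2-transformed expression levels.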
Further, a group of immune-related genes whose expression levels decrease significantly in the normal tissue and in-situ adenocarcinoma control group and in the micro-invasive adenocarcinoma and invasive adenocarcinoma control group is screened out from the differentially expressed genes, including ITLN2, MARCO, C8B, MASP1, CD36, TAL1, PPBP, CDH5, MSR1, TBX21, C6, MCAM, GZMH, GZMB, CXCL12, LILRB2, CXCR1, CXCR2, LAMP3 and IL1RL1. The expression levels of the immune-related genes are log2-transformed, and then, for each sample, the immune function index is calculated as the average of the log2-transformed expression levels of the immune-related genes. The calculation formula of the immune function index is: $\text{immune function index} = \frac{1}{n}\sum_{i=1}^{n}\log_2(\mathrm{TPM}_i)$, wherein $\mathrm{TPM}_i$ is the expression level of the i-th immune-related gene and n is the number of immune-related genes.
Finally, the present invention defines the difference between the tumor growth index and the immune function index as the tumor progression index of lung adenocarcinoma, i.e. tumor progression index = tumor growth index - immune function index. The tumor progression index calculated with this formula gradually increases as the tumor progresses, as shown in FIG. 3, so the present invention considers that the evolution and progression of lung adenocarcinoma can be predicted from the expression levels of these two groups of genes. In the present invention, a negative tumor progression index indicates that the immune system has sufficient capacity to inhibit tumor progression, while a positive tumor progression index indicates that the immune system is no longer capable of inhibiting tumor cell growth and that the balance between the tumor and the immune system is broken. In our study cohort, the tumor progression index was negative in normal tissue but positive in AIS and later stages, indicating that immune escape was already present at the pre-invasive AIS stage of lung adenocarcinoma and became more severe as the disease progressed, as shown in FIG. 3A.
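The subtraction and the sign-based reading of the result can be illustrated with a short Python sketch. The function names and the wording of the returned labels are hypothetical, the TPM values are made up, and only two genes from each list are used to keep the arithmetic easy to follow; the text does not say how an index of exactly zero should be read, so the "balanced" label is an assumption.

```python
import math
from typing import Dict, Sequence


def mean_log2(sample_tpm: Dict[str, float], genes: Sequence[str]) -> float:
    """Average log2 expression of the given genes in one sample."""
    return sum(math.log2(sample_tpm[g]) for g in genes) / len(genes)


def tumor_progression_index(sample_tpm: Dict[str, float],
                            tumor_genes: Sequence[str],
                            immune_genes: Sequence[str]) -> float:
    """Tumor growth index minus immune function index for one sample."""
    tumor_growth_index = mean_log2(sample_tpm, tumor_genes)
    immune_function_index = mean_log2(sample_tpm, immune_genes)
    return tumor_growth_index - immune_function_index


def interpret(tpi: float) -> str:
    """Sign-based reading of the index as described in the text."""
    if tpi < 0:
        return "immune system restrains the tumor"
    if tpi > 0:
        return "tumor-immune balance is broken"
    return "balanced"  # assumption: the zero case is not discussed in the text


# Made-up TPM values for a single sample.
sample = {"SPP1": 8.0, "MMP3": 2.0, "MARCO": 16.0, "CD36": 4.0}
tpi = tumor_progression_index(sample, ["SPP1", "MMP3"], ["MARCO", "CD36"])
print(tpi, interpret(tpi))  # prints: -1.0 immune system restrains the tumor
```

Here the tumor growth index is (3 + 1) / 2 = 2 and the immune function index is (4 + 2) / 2 = 3, so the index is negative, which the text associates with an immune system that can still hold the tumor in check.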
To verify the tumor progression index of the present invention, we observed the same trend of increase in another dataset showing a significant increase in tumor progression index from normal tissue to Atypical Adenomatous Hyperplasia (AAH) and then to LUAD, as shown in FIG. 3B. The present invention further found that in the AAH stage, the tumor progression index was negative, at which time the tumor could not overcome the immune system to metastasize further. Although the TCGA-LUAD dataset did not contain the pre-invasive stage of lung adenocarcinoma, we calculated the tumor progression index for each sample and made a comparison between the normal and tumor samples. In this partial comparison, we observed that the tumor progression index of the tumor samples was significantly higher than that of the normal samples, as shown in FIG. 3C.
In the invention, 2 groups of representative genes are screened for calculating the tumor progression index. Among the up-regulated genes, the expression product of the BCL2L15 gene is involved in regulating apoptosis. The expression product of the COMP gene is a non-collagenous extracellular matrix protein whose high expression has been reported to promote epithelial-mesenchymal transition of cancer cells and to be associated with a poor prognosis in patients. The expression product of the CST1 gene is serum cystatin, whose high expression is associated with poor prognosis in a variety of cancers. FAM83A is a possible proto-oncogene that plays a role in the epidermal growth factor receptor (EGFR) pathway, activating the downstream RAS/MAPK and PI3K/AKT/TOR signaling pathways and promoting cell growth. Among the down-regulated genes, the expression product of the ITLN gene is involved in the body's defense against pathogens. MARCO is a receptor on the surface of macrophages and plays an important role in innate immunity. The C8B gene encodes the beta chain of complement C8. The MASP1 gene encodes a serine protease that functions in both innate and adaptive immunity as a component of the lectin pathway of complement activation. The CD36 gene encodes a major glycoprotein on the surface of platelets, which serves as a receptor for thrombospondin in platelets and various other cell lines and plays an important role in tumor immunity. TAL1 is related to the origin of hematopoietic malignancies and has been reported to be associated with pre-T cell acute lymphoblastic leukemia and childhood T cell acute lymphoblastic leukemia. The PPBP gene encodes a platelet-derived growth factor belonging to the CXC chemokine family, which activates neutrophils. The CDH5 gene encodes a classical cadherin of the cadherin superfamily.
Furthermore, to investigate whether the tumor progression index designed according to the present application has prognostic value for lung adenocarcinoma patients, we performed survival analysis on the data set of the present application and on the TCGA-LUAD data set. The results showed that, in the data set of the present application, patients with a high tumor progression index had significantly worse relapse-free survival (RFS) and overall survival (OS), as shown in FIG. 4A and B, whereas in the TCGA-LUAD data set patients with a higher tumor progression index had worse OS but the two groups had comparable progression-free survival (PFS), as shown in FIG. 4C and D.
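The text does not state how patients were divided into high and low tumor progression index groups before the survival comparison. Purely as an illustration, the sketch below assumes a median split, which is a common convention but may not be the cutoff the inventors used; the function name, patient identifiers and index values are all made up.

```python
from statistics import median
from typing import Dict, List, Tuple


def split_by_median(tpi_by_patient: Dict[str, float]) -> Tuple[float, List[str], List[str]]:
    """Split patients into high and low groups at the cohort median.

    Assumption of this sketch: patients strictly above the median form the
    'high' group; the actual cutoff used in the patent is not given.
    """
    cutoff = median(tpi_by_patient.values())
    high = sorted(p for p, v in tpi_by_patient.items() if v > cutoff)
    low = sorted(p for p, v in tpi_by_patient.items() if v <= cutoff)
    return cutoff, high, low


# Made-up tumor progression indices for four hypothetical patients.
cohort = {"P1": -1.0, "P2": 0.5, "P3": 2.0, "P4": -0.5}
cutoff, high_group, low_group = split_by_median(cohort)
print(cutoff, high_group, low_group)
```

The two groups returned here would then be passed to a survival comparison such as a Kaplan-Meier estimate with a log-rank test, which is outside the scope of this sketch.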
Based on the above method, the invention also provides a device for predicting early lung adenocarcinoma progress based on gene expression information, which comprises:
a sequencing module for dividing the acquired lung tissue into four types, namely normal tissue, in-situ adenocarcinoma, micro-invasive adenocarcinoma and invasive adenocarcinoma, according to tumor pathological features, and performing full transcriptome sequencing on each of the four types of tissue to generate a sequencing library;
a grouping module for forming three control groups from the full transcriptome sequencing results of the normal tissue, the in-situ adenocarcinoma, the micro-invasive adenocarcinoma and the invasive adenocarcinoma, wherein the three control groups comprise a normal tissue and in-situ adenocarcinoma control group, an in-situ adenocarcinoma and micro-invasive adenocarcinoma control group, and a micro-invasive adenocarcinoma and invasive adenocarcinoma control group;
a differentially expressed gene determining module for performing analysis of variance on each gene in the sequencing library among the three control groups to determine differentially expressed genes;
a tumor growth index calculation module for screening, from the differentially expressed genes, a group of tumor-related genes whose expression levels show a significant rising trend in the three control groups, and calculating a tumor growth index from the tumor-related genes;
an immune function index calculation module for screening, from the differentially expressed genes, a group of immune-related genes whose expression levels show a significant declining trend in the normal tissue and in-situ adenocarcinoma control group and in the micro-invasive adenocarcinoma and invasive adenocarcinoma control group, and calculating an immune function index from the immune-related genes;
a tumor progress index calculation module for taking the difference between the tumor growth index and the immune function index as the tumor progress index of the lung adenocarcinoma for predicting early lung adenocarcinoma progress.
The device provided by the invention measures the degree of imbalance between the intrinsic growth potential of the tumor and the immune microenvironment by designing and calculating the tumor progression index. The index is shown to differ significantly between the developmental stages of lung adenocarcinoma, can predict the postoperative survival time of lung adenocarcinoma patients, and has been verified on an external data set.
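The differential-expression and screening steps performed by the modules above can be sketched in a few lines. This is a minimal illustration rather than the patented implementation: it applies the thresholds stated in claim 2 (p < 0.0001 and |log2 fold change| of 2 or more) to precomputed statistics. The comparison labels, the function names, the sign convention (a positive log2 fold change means higher expression in the later pathological stage) and the reading of a "significant trend" as passing the filter with the expected sign in every listed comparison are assumptions made for the example.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical labels for the three control groups (not from the source).
NORMAL_VS_IN_SITU = "normal_vs_in_situ"
IN_SITU_VS_MICRO = "in_situ_vs_micro_invasive"
MICRO_VS_INVASIVE = "micro_invasive_vs_invasive"

# Per-gene statistics: comparison label -> (p_value, log2_fold_change).
GeneStats = Dict[str, Tuple[float, float]]


def is_differentially_expressed(p_value: float, log2_fc: float,
                                p_cutoff: float = 1e-4,
                                lfc_cutoff: float = 2.0) -> bool:
    """Thresholds from claim 2: p < 0.0001 and |log2 fold change| >= 2."""
    return p_value < p_cutoff and abs(log2_fc) >= lfc_cutoff


def has_trend(stats: GeneStats, comparisons: Iterable[str], rising: bool) -> bool:
    """Assumed reading of 'significant trend': the gene passes the filter
    with the expected sign in every listed comparison."""
    for name in comparisons:
        p_value, log2_fc = stats[name]
        if not is_differentially_expressed(p_value, log2_fc):
            return False
        if (log2_fc > 0) != rising:
            return False
    return True


# Made-up statistics for two placeholder genes, for illustration only.
toy_stats = {
    "GENE_UP": {NORMAL_VS_IN_SITU: (1e-6, 2.5),
                IN_SITU_VS_MICRO: (1e-5, 2.1),
                MICRO_VS_INVASIVE: (1e-7, 3.0)},
    "GENE_DOWN": {NORMAL_VS_IN_SITU: (1e-6, -2.4),
                  IN_SITU_VS_MICRO: (0.2, -0.3),
                  MICRO_VS_INVASIVE: (1e-5, -2.2)},
}

all_three = [NORMAL_VS_IN_SITU, IN_SITU_VS_MICRO, MICRO_VS_INVASIVE]
first_and_last = [NORMAL_VS_IN_SITU, MICRO_VS_INVASIVE]

# Tumor-related candidates rise in all three comparisons; immune-related
# candidates fall in the first and the last comparison.
tumor_candidates = [g for g, s in toy_stats.items()
                    if has_trend(s, all_three, rising=True)]
immune_candidates = [g for g, s in toy_stats.items()
                     if has_trend(s, first_and_last, rising=False)]
print(tumor_candidates, immune_candidates)
```

The p-values and fold changes would come from whatever variance analysis is run upstream; the sketch only shows the filtering logic applied to them.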
In some embodiments, the tumor growth index calculation module comprises:
a tumor-associated gene screening unit for screening out, from the differentially expressed genes, a group of tumor-associated genes whose expression levels are significantly increased in the three control groups, including BCL2L15, COMP, CST1, FAM83A, SLC A5, PGLYRP4, CLPSL2, ARSH, CDH17, COL10A1, SPP1, MMP3, DDX4, FGF11, CASR;
a tumor growth index calculation unit for log2-transforming the expression levels of the tumor-associated genes and then, for each sample, calculating the tumor growth index as the average of the log2-transformed expression levels of the tumor-associated genes, the calculation formula of the tumor growth index being: $\text{tumor growth index} = \frac{1}{N}\sum_{i=1}^{N}\log_2(\mathrm{TPM}_i)$, wherein $\mathrm{TPM}_i$ is the expression level of the $i$-th tumor-associated gene and $N$ is the number of tumor-associated genes.
In some embodiments, the immune function index calculation module comprises:
an immune-related gene screening unit for screening out, from the differentially expressed genes, a group of immune-related genes whose expression levels are significantly reduced in the normal tissue and in-situ adenocarcinoma control group and in the micro-invasive adenocarcinoma and invasive adenocarcinoma control group, including ITLN2, MARCO, C8B, MASP1, CD36, TAL1, PPBP, CDH5, MSR1, TBX21, C6, MCAM, GZMH, GZMB, CXCL12, LILRB2, CXCR1, CXCR2, LAMP3, IL1RL1;
an immune function index calculation unit for log2-transforming the expression levels of the immune-related genes and then, for each sample, calculating the immune function index as the average of the log2-transformed expression levels of the immune-related genes, the calculation formula of the immune function index being: $\text{immune function index} = \frac{1}{n}\sum_{i=1}^{n}\log_2(\mathrm{TPM}_i)$, wherein $\mathrm{TPM}_i$ is the expression level of the $i$-th immune-related gene and $n$ is the number of immune-related genes.
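The two index calculation units and the final difference can be sketched as follows. This is a minimal illustration under stated assumptions: each index is the mean of the log2-transformed TPM values of its gene set, as the text describes, and the tumor progression index is the tumor growth index minus the immune function index. The function names, the optional `pseudocount` parameter (defaulting to 0 so that the text's definition is reproduced; the source does not say how zero TPM values are handled), the shortened gene subsets and the TPM values are hypothetical.

```python
import math
from typing import Dict, Sequence


def mean_log2_expression(tpm_by_gene: Dict[str, float],
                         genes: Sequence[str],
                         pseudocount: float = 0.0) -> float:
    """Mean of log2(TPM + pseudocount) over the given genes for one sample.

    With the default pseudocount of 0 a TPM of exactly 0 raises ValueError;
    passing a positive pseudocount is an assumption, not part of the source.
    """
    values = [math.log2(tpm_by_gene[g] + pseudocount) for g in genes]
    return sum(values) / len(values)


def tumor_progression_index(tpm_by_gene: Dict[str, float],
                            tumor_genes: Sequence[str],
                            immune_genes: Sequence[str],
                            pseudocount: float = 0.0) -> float:
    """Tumor growth index minus immune function index for one sample."""
    growth = mean_log2_expression(tpm_by_gene, tumor_genes, pseudocount)
    immune = mean_log2_expression(tpm_by_gene, immune_genes, pseudocount)
    return growth - immune


# Shortened gene subsets and made-up TPM values, for illustration only.
TUMOR_GENES = ["COMP", "CST1", "FAM83A"]
IMMUNE_GENES = ["MARCO", "CD36", "CDH5"]
sample = {"COMP": 8.0, "CST1": 16.0, "FAM83A": 4.0,
          "MARCO": 2.0, "CD36": 4.0, "CDH5": 2.0}

growth_index = mean_log2_expression(sample, TUMOR_GENES)
immune_index = mean_log2_expression(sample, IMMUNE_GENES)
progression_index = tumor_progression_index(sample, TUMOR_GENES, IMMUNE_GENES)
print(growth_index, immune_index, progression_index)
```

A larger positive difference corresponds to stronger expression of the tumor-related genes relative to the immune-related genes, which is the imbalance the index is meant to capture.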
In summary, the results of the invention show that increased intrinsic growth potential of the tumor and an impaired immune response to the tumor jointly drive the progression of lung adenocarcinoma. The tumor progression index, built from two gene sets that respectively represent the intrinsic growth potential of the tumor and immune function, measures the level of imbalance between the intrinsic growth potential of the tumor and the tumor immune microenvironment, and therefore has prognostic value for lung adenocarcinoma patients.
It is to be understood that the invention is not limited in its application to the examples described above, but is capable of modification and variation in light of the above teachings by those skilled in the art, and that all such modifications and variations are intended to be included within the scope of the appended claims.

Claims (9)

1. A method for predicting early lung adenocarcinoma progression based on gene expression information, comprising the steps of:
dividing the obtained lung tissue, in sequence according to tumor pathological features, into four types, namely normal tissue, in-situ adenocarcinoma, micro-invasive adenocarcinoma and invasive adenocarcinoma, and performing whole-transcriptome sequencing on each of the four tissue types to generate a sequencing library;
forming three control groups from the whole-transcriptome sequencing results of the normal tissue, the in-situ adenocarcinoma, the micro-invasive adenocarcinoma and the invasive adenocarcinoma, wherein the three control groups comprise a normal tissue and in-situ adenocarcinoma control group, an in-situ adenocarcinoma and micro-invasive adenocarcinoma control group, and a micro-invasive adenocarcinoma and invasive adenocarcinoma control group;
performing variance analysis on each gene in the sequencing library in each of the three control groups to determine differentially expressed genes;
screening, from the differentially expressed genes, a group of tumor-related genes whose expression levels show a significant rising trend in the three control groups, and calculating a tumor growth index from the tumor-related genes;
screening, from the differentially expressed genes, a group of immune-related genes whose expression levels show a significant declining trend in the normal tissue and in-situ adenocarcinoma control group and in the micro-invasive adenocarcinoma and invasive adenocarcinoma control group, and calculating an immune function index from the immune-related genes;
taking the difference between the tumor growth index and the immune function index as the tumor progression index of the lung adenocarcinoma, for predicting early lung adenocarcinoma progression.
2. The method for predicting early lung adenocarcinoma progression based on gene expression information of claim 1, wherein the step of performing variance analysis on each gene in the sequencing library in each of the three control groups to determine differentially expressed genes comprises:
performing variance analysis on each gene in the sequencing library in each of the three control groups, and taking genes with p < 0.0001 and an inter-group |log2 fold change| of 2 or more as the differentially expressed genes.
3. The method for predicting early lung cancer progression according to claim 1, wherein the group of tumor-associated genes screened from the differentially expressed genes, whose expression levels show a significant rising trend in the three control groups, comprises BCL2L15, COMP, CST1, FAM83A, SLC A5, PGLYRP4, CLPSL2, ARSH, CDH17, COL10A1, SPP1, MMP3, DDX4, FGF11, CASR.
4. The method for predicting early lung cancer progression based on gene expression information of claim 3, wherein the step of calculating a tumor growth index from the tumor-associated genes comprises:
log2-transforming the expression levels of the tumor-associated genes and then, for each sample, calculating the tumor growth index as the average of the log2-transformed expression levels of the tumor-associated genes, the calculation formula of the tumor growth index being: $\text{tumor growth index} = \frac{1}{N}\sum_{i=1}^{N}\log_2(\mathrm{TPM}_i)$, wherein $\mathrm{TPM}_i$ is the expression level of the $i$-th tumor-associated gene and $N$ is the number of tumor-associated genes.
5. The method for predicting early lung cancer progression based on gene expression information of claim 1, wherein the group of immune-related genes screened from the differentially expressed genes, whose expression levels are significantly reduced in the normal tissue and in-situ adenocarcinoma control group and in the micro-invasive adenocarcinoma and invasive adenocarcinoma control group, comprises ITLN2, MARCO, C8B, MASP1, CD36, TAL1, PPBP, CDH5, MSR1, TBX21, C6, MCAM, GZMH, GZMB, CXCL12, LILRB2, CXCR1, CXCR2, LAMP3, IL1RL1.
6. The method of predicting early lung cancer progression based on gene expression information of claim 5, wherein the step of calculating an immune function index from the immune-related genes comprises:
log2-transforming the expression levels of the immune-related genes and then, for each sample, calculating the immune function index as the average of the log2-transformed expression levels of the immune-related genes, the calculation formula of the immune function index being: $\text{immune function index} = \frac{1}{n}\sum_{i=1}^{n}\log_2(\mathrm{TPM}_i)$, wherein $\mathrm{TPM}_i$ is the expression level of the $i$-th immune-related gene and $n$ is the number of immune-related genes.
7. An apparatus for predicting early lung cancer progression based on gene expression information, comprising:
a sequencing module, used for dividing the acquired lung tissue, in sequence according to tumor pathological features, into four types, namely normal tissue, in-situ adenocarcinoma, micro-invasive adenocarcinoma and invasive adenocarcinoma, and for performing whole-transcriptome sequencing on each of the four tissue types to generate a sequencing library;
a grouping module, used for forming three control groups from the whole-transcriptome sequencing results of the normal tissue, the in-situ adenocarcinoma, the micro-invasive adenocarcinoma and the invasive adenocarcinoma, wherein the three control groups comprise a normal tissue and in-situ adenocarcinoma control group, an in-situ adenocarcinoma and micro-invasive adenocarcinoma control group, and a micro-invasive adenocarcinoma and invasive adenocarcinoma control group;
a differentially expressed gene determining module, used for performing variance analysis on each gene in the sequencing library in each of the three control groups to determine differentially expressed genes;
a tumor growth index calculation module, used for screening, from the differentially expressed genes, a group of tumor-related genes whose expression levels show a significant rising trend in the three control groups, and for calculating a tumor growth index from the tumor-related genes;
an immune function index calculation module, used for screening, from the differentially expressed genes, a group of immune-related genes whose expression levels show a significant declining trend in the normal tissue and in-situ adenocarcinoma control group and in the micro-invasive adenocarcinoma and invasive adenocarcinoma control group, and for calculating an immune function index from the immune-related genes;
a tumor progression index calculation module, used for taking the difference between the tumor growth index and the immune function index as the tumor progression index of the lung adenocarcinoma, for predicting early lung adenocarcinoma progression.
8. The apparatus for predicting early lung cancer progression based on expression information of claim 7, wherein the tumor growth index calculation module comprises:
a tumor-associated gene screening unit for screening out, from the differentially expressed genes, a group of tumor-associated genes whose expression levels are significantly increased in the three control groups, including BCL2L15, COMP, CST1, FAM83A, SLC A5, PGLYRP4, CLPSL2, ARSH, CDH17, COL10A1, SPP1, MMP3, DDX4, FGF11, CASR;
a tumor growth index calculation unit for log2-transforming the expression levels of the tumor-associated genes and then, for each sample, calculating the tumor growth index as the average of the log2-transformed expression levels of the tumor-associated genes, the calculation formula of the tumor growth index being: $\text{tumor growth index} = \frac{1}{N}\sum_{i=1}^{N}\log_2(\mathrm{TPM}_i)$, wherein $\mathrm{TPM}_i$ is the expression level of the $i$-th tumor-associated gene and $N$ is the number of tumor-associated genes.
9. The apparatus for predicting early lung cancer progression based on expression information of claim 7, wherein the immune function index calculation module comprises:
an immune-related gene screening unit for screening out, from the differentially expressed genes, a group of immune-related genes whose expression levels are significantly reduced in the normal tissue and in-situ adenocarcinoma control group and in the micro-invasive adenocarcinoma and invasive adenocarcinoma control group, including ITLN2, MARCO, C8B, MASP1, CD36, TAL1, PPBP, CDH5, MSR1, TBX21, C6, MCAM, GZMH, GZMB, CXCL12, LILRB2, CXCR1, CXCR2, LAMP3, IL1RL1;
an immune function index calculation unit for log2-transforming the expression levels of the immune-related genes and then, for each sample, calculating the immune function index as the average of the log2-transformed expression levels of the immune-related genes, the calculation formula of the immune function index being: $\text{immune function index} = \frac{1}{n}\sum_{i=1}^{n}\log_2(\mathrm{TPM}_i)$, wherein $\mathrm{TPM}_i$ is the expression level of the $i$-th immune-related gene and $n$ is the number of immune-related genes.
CN202210391575.5A 2022-04-14 2022-04-14 Method and device for predicting early lung adenocarcinoma progress based on gene expression information Active CN114927231B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210391575.5A CN114927231B (en) 2022-04-14 2022-04-14 Method and device for predicting early lung adenocarcinoma progress based on gene expression information

Publications (2)

Publication Number Publication Date
CN114927231A (en) 2022-08-19
CN114927231B (en) 2024-07-09

Family

ID=82807432

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210391575.5A Active CN114927231B (en) 2022-04-14 2022-04-14 Method and device for predicting early lung adenocarcinoma progress based on gene expression information

Country Status (1)

Country Link
CN (1) CN114927231B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109841281A (en) * 2017-11-29 2019-06-04 郑州大学第一附属医院 Construction method based on coexpression similitude identification adenocarcinoma of lung early diagnosis mark and risk forecast model
CN112582028A (en) * 2020-12-30 2021-03-30 华南理工大学 Lung cancer prognosis prediction model, construction method and device

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1892303A1 (en) * 2006-08-22 2008-02-27 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Methods for identifying therapeutical targets in tumors and for determining and targeting angiogenesis and hemostasis related to adenocarcinomas of the lung
CN113140258B (en) * 2021-04-28 2024-03-19 上海海事大学 Method for screening potential prognosis biomarkers of lung adenocarcinoma based on tumor invasive immune cells

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant