CN113444795A

CN113444795A - Biomarker related to lung cancer survival time and application of biomarker in prediction of lung cancer prognosis

Info

Publication number: CN113444795A
Application number: CN202110726875.XA
Authority: CN
Inventors: 杨承刚; 李雨晨; 王丹
Original assignee: Beijing Medintell Bioinformatic Technology Co Ltd
Current assignee: Beijing Medintell Bioinformatic Technology Co Ltd
Priority date: 2021-06-29
Filing date: 2021-06-29
Publication date: 2021-09-28

Abstract

The invention discloses a biomarker related to the survival time of lung cancer and application thereof in predicting the prognosis of lung cancer. Markers of the invention include ANLN, ARNTL2, BMP5, GRIA1, and/or MAL. The biomarker can effectively judge the prognosis of a patient, and further guide clinical medication.

Description

Biomarker related to lung cancer survival time and application of biomarker in prediction of lung cancer prognosis

Technical Field

The invention relates to the field of disease diagnosis, in particular to biomarkers related to the survival time of lung cancer and application of the biomarkers in predicting the prognosis of the lung cancer.

Background

Lung cancer is the most common malignancy worldwide, with the highest morbidity and mortality among both men and women (Bray F, Ferlay J, et al. Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA: A Cancer Journal for Clinicians, 68(6), 394 (2018)). Lung cancer is classified into small cell lung cancer (SCLC) and non-small cell lung cancer (NSCLC), and 80-85% of lung cancer patients have NSCLC. NSCLC is broadly classified into three histological types: lung adenocarcinoma, lung squamous cell carcinoma and large cell carcinoma, with lung adenocarcinoma being the predominant histological type, accounting for about 40% (Bender E. Epidemiology: The dominant malignancy. Nature, 513(7517), S2-3 (2014)). Different histological types respond differently to chemotherapy. The development of lung cancer is very complex. Research over the past decades shows that mutations in certain genes (KRAS, EGFR, HER2, MET, PIK3CA) and rearrangements of the ROS1 and ALK genes play an important role in the pathogenesis of lung cancer; these alterations have become key targets of current lung cancer treatment and have laid a foundation for the era of personalized medicine (Bergethon K, Shaw AT, Ou SH et al. ROS1 rearrangements define a unique molecular class of lung cancers. Journal of Clinical Oncology: official journal of the American Society of Clinical Oncology, 30(8), 863-). Significant progress has been made in the diagnosis and treatment of lung cancer in recent years owing to the spread of early tumor screening, advances in medical technology, and improvements in lifestyle. However, epidemiological data show that the 5-year overall survival rate across all stages of lung cancer is as low as 15.9% (Ettinger DS, Akerley W, Borghaei H et al. Non-small cell lung cancer, version 2.2013. Journal of the National Comprehensive Cancer Network: JNCCN, 11(6), 645-653; quiz 653 (2013)), and the main factors affecting the survival time of lung cancer patients are relapse and metastasis.

Currently, the TNM (tumor, node, metastasis) staging system is commonly used clinically as an index for judging the prognosis of lung cancer patients. The lung cancer TNM staging standard is issued and implemented by the Union for International Cancer Control (UICC) and is the most widely applied tumor staging system in current lung cancer diagnosis and treatment. The TNM staging system divides disease into four stages (stage I, stage II, stage III and stage IV) according to three indexes: the state of the primary tumor (T), the regional lymph node condition (N) and the distant metastasis condition (M). However, the TNM staging system has limited predictive capability, and there is a strong clinical need for novel markers that can accurately predict the prognosis of patients with lung cancer (Shi X, Li R, Dong X et al. IRGS: an immune-related gene classifier for lung adenocarcinoma prognosis. Journal of Translational Medicine, 18(1), 55 (2020)).

Disclosure of Invention

The invention aims to provide application of biomarkers in predicting lung cancer prognosis and a product and a system/device for predicting lung cancer prognosis by using molecular markers.

In order to achieve the above objects, the present invention provides, in a first aspect, use of a reagent for detecting biomarkers including ANLN, ARNTL2, BMP5, GRIA1 and/or MAL in the manufacture of a product for predicting prognosis of lung cancer.

Further, the biomarkers are ANLN, ARNTL2, BMP5, GRIA1, and MAL.

Further, the reagent comprises a reagent for detecting the expression level of the biomarker in the sample by a digital imaging technology, a protein immunoassay technology, a staining technology, a nucleic acid sequencing technology, a nucleic acid hybridization technology, a chromatographic technology or a mass spectrometry technology.

Further, the sample comprises tissue or body fluid.

In a second aspect, the invention provides a product for predicting the prognosis of lung cancer, the product comprising reagents for detecting biomarkers comprising ANLN, ARNTL2, BMP5, GRIA1 and/or MAL.

Further, the product comprises a chip and a kit.

Further, the kit comprises a qPCR kit, an immunoblotting detection kit, an immunochromatography detection kit, a flow cytometry kit, an immunohistochemical detection kit, an ELISA kit and an electrochemiluminescence detection kit.

Further, the kit also includes instructions for predicting a prognosis for lung cancer.

Further, the reagents comprise primers or probes that specifically bind to the biomarker genes; an antibody, peptide, aptamer, or compound that specifically binds to the marker protein.

In a third aspect, the present invention provides a system/apparatus for predicting lung cancer prognosis, comprising:

the acquisition unit is used for acquiring data of biomarkers in a sample to be detected, wherein the biomarkers comprise ANLN, ARNTL2, BMP5, GRIA1 and/or MAL;

and the processing unit is used for inputting the data of the biomarkers into a lung cancer prognosis prediction model to obtain a prediction result of the lung cancer progress of the sample to be detected.

Further, the prognostic prediction model is a Cox regression model.

Further, the Cox regression model is a LASSO Cox regression model.

Further, the formula of the prognostic prediction model is: risk score = C1 × ExpANLN + C2 × ExpARNTL2 + C3 × ExpBMP5 + C4 × ExpGRIA1 + C5 × ExpMAL; wherein ExpANLN, ExpARNTL2, ExpBMP5, ExpGRIA1 and ExpMAL respectively represent the expression levels of ANLN, ARNTL2, BMP5, GRIA1 and MAL.

Further, C1, C2, C3, C4 and C5 are 0.1157, 0.1034, -0.0592, -0.0470 and -0.0822, respectively.

A fourth aspect of the present invention provides a computer-readable storage medium storing a program for executing a lung cancer prognosis prediction model constructed from the biomarkers ANLN, ARNTL2, BMP5, GRIA1, and/or MAL.

Further, the prognostic prediction model is a Cox regression model.

Further, the Cox regression model is a LASSO Cox regression model.

Further, the formula of the prognostic prediction model is: risk score = C1 × ExpANLN + C2 × ExpARNTL2 + C3 × ExpBMP5 + C4 × ExpGRIA1 + C5 × ExpMAL.

A fifth aspect of the present invention provides an electronic apparatus, comprising:

a client component, wherein the client component comprises a user interface;

a server component, wherein the server component comprises at least one memory unit configured to receive data input comprising sequencing data for biomarkers generated from a sample, the biomarkers comprising ANLN, ARNTL2, BMP5, GRIA1, and/or MAL; the user interface operatively coupled with the server component; and

a computer processor operatively coupled to the at least one memory unit, wherein the computer processor is programmed as an executable program for running a lung cancer prognostic prediction model constructed from biomarkers.

Further, the prognostic prediction model is a Cox regression model.

Further, the Cox regression model is a LASSO Cox regression model.

Further, the formula of the prognostic prediction model is: risk score = C1 × ExpANLN + C2 × ExpARNTL2 + C3 × ExpBMP5 + C4 × ExpGRIA1 + C5 × ExpMAL.

In a sixth aspect, the present invention provides the use of a reagent for detecting biomarkers including ANLN, ARNTL2, BMP5, GRIA1 and/or MAL in the manufacture of a product for evaluating the effect of a medicament on the treatment of lung cancer.

The invention has the advantages and beneficial effects that:

According to the invention, ANLN, ARNTL2, BMP5, GRIA1 and/or MAL are selected as biomarkers, so that the prognosis of a lung cancer patient can be effectively predicted and early intervention and early treatment can be realized.

Drawings

FIG. 1 is a survival curve for the combination of ANLN, ARNTL2, BMP5, GRIA1 and MAL predicting the prognosis of lung adenocarcinoma in the training set;

FIG. 2 is a survival curve for the combination of ANLN, ARNTL2, BMP5, GRIA1 and MAL predicting the prognosis of lung adenocarcinoma in the validation set;

FIG. 3 is a ROC curve for the combination of ANLN, ARNTL2, BMP5, GRIA1 and MAL predicting the prognosis of lung adenocarcinoma in the training set;

FIG. 4 is a ROC curve for the combination of ANLN, ARNTL2, BMP5, GRIA1 and MAL predicting the prognosis of lung adenocarcinoma in the validation set.

Detailed Description

Some aspects and embodiments of the invention will now be discussed with reference to the figures. Other aspects and embodiments will become apparent to those skilled in the art. All documents mentioned herein are incorporated herein by reference.

Sample(s)

As used herein, a "sample" may be a cell or tissue sample (e.g., a biopsy), a biological fluid, an extract (e.g., a protein or DNA extract obtained from a subject). In particular, the sample may be a tumor sample, e.g. a solid tumor, e.g. lung adenocarcinoma. The sample may be a sample freshly obtained from the subject, or may be a sample that has been processed and/or stored (e.g., frozen, fixed, or subjected to one or more purification, enrichment, or extraction steps) prior to making the determination.

As used herein, "and/or" should be viewed as specifically disclosing each of the two specified features or components, with or without the other. For example, "A and/or B" will be considered a specific disclosure of each of (i) A, (ii) B, and (iii) A and B, as if each were individually listed herein.

Biomarkers

As used herein, "biomarker" refers to a biomolecule that is present in an individual at different concentrations that can be used to predict the cancer status of the individual. Biomarkers can include, but are not limited to, nucleic acids, proteins, and variants and fragments thereof. A biomarker may be DNA comprising all or part of a nucleic acid sequence encoding the biomarker, or the complement of such a sequence. Biomarker nucleic acids useful in the present invention are considered to include DNA and RNA comprising all or part of any nucleic acid sequence of interest.

In a particular embodiment of the invention, the biomarker comprises ANLN, ARNTL2, BMP5, GRIA1 and/or MAL. The biomarkers ANLN (anillin actin binding protein, gene ID: 54443), ARNTL2 (aryl hydrocarbon receptor nuclear translocator like 2, gene ID: 56938), BMP5 (bone morphogenetic protein 5, gene ID: 653), GRIA1 (glutamate ionotropic receptor AMPA type subunit 1, gene ID: 2890) and MAL (mal, T cell differentiation protein, gene ID: 4118) include the genes and their encoded proteins, as well as homologs, mutations, and isoforms thereof. The term encompasses full-length, unprocessed biomarkers, as well as any form of biomarker that results from processing in a cell. The term encompasses naturally occurring variants (e.g., splice variants or allelic variants) of the biomarkers. The gene IDs are available at https://www.ncbi.nlm.nih.gov/gene/. The nucleotide sequence of each gene identified by its NCBI gene ID as of 23 June 2021 is expressly incorporated herein by reference.

Gene expression

Reference to determining an expression level refers to determining the expression level of an expression product of a gene. The expression level can be determined at the nucleic acid level or at the protein level.

The determined gene expression level can be considered to provide an expression profile. By "expression profile" is meant a set of data relating to the expression levels of one or more related genes in an individual in a form that allows comparison with comparable expression profiles (e.g., from individuals with known prognoses) to help determine the prognosis and select an appropriate treatment for the individual patient.

Determination of the gene expression level may involve determining the presence or amount of mRNA in a cancer cell sample. Methods for doing so are well known to the skilled person. Gene expression levels can be determined in cancer cell samples using any conventional method, for example using nucleic acid microarrays or using nucleic acid synthesis (e.g., quantitative PCR).

Alternatively or additionally, the determination of the level of gene expression may involve determining the level of protein expressed from the gene in a sample comprising cancer cells obtained from the individual. Protein expression levels can be determined by any useful means, including the use of immunoassays. For example, expression levels can be determined by Immunohistochemistry (IHC), western blotting, ELISA, immunoelectrophoresis, immunoprecipitation, flow cytometry, mass cytometry, and immunostaining. Using any of these methods, the relative expression levels of the proteins of the biomarkers disclosed herein can be determined.

As an alternative embodiment, the expression level of the gene may also be detected using advanced sequencing methods. For example, biomarkers can be detected using Illumina next-generation sequencing (e.g., sequencing-by-synthesis or TruSeq methods using, for example, the HiSeq, HiScan, Genome Analyzer, or MiSeq systems (Illumina, Inc., San Diego, CA)). Biomarkers can also be detected using Ion Torrent sequencing (Ion Torrent Systems, Inc., Guilford, Connecticut) or other suitable semiconductor sequencing methods.

As an alternative embodiment, RNase profiling (mapping) can be used to quantify biomarkers using mass spectrometry. The isolated RNA may be enzymatically digested with an RNA endonuclease (RNase) having high specificity (e.g., RNase T1, which cleaves 3' to all unmodified guanosine residues) prior to analysis of the isolated RNA by MS or tandem MS (MS/MS) methods. The first method developed used reverse phase HPLC coupled directly to ESI-MS to perform on-line chromatographic separation of endonuclease digests. The presence of post-transcriptional modifications can be revealed by mass shifts from those expected based on the RNA sequence. Ions of abnormal mass/charge values can then be isolated for tandem MS sequencing, thereby locating the sequence position of the post-transcriptionally modified nucleoside.

Matrix-assisted laser desorption/ionization mass spectrometry (MALDI-MS) has also been used as an analytical method to obtain information about post-transcriptionally modified nucleosides. MALDI-based methods can be distinguished from ESI-based methods by separation steps. In MALDI-MS, mass spectrometry is used to separate biomarkers.

The term "primer" as used herein refers to a nucleic acid sequence having a short free 3' -hydroxyl group, which is a short nucleic acid that can form a base pair with a complementary template and serves as an origin of replication for the template strand. The primers can prime DNA synthesis in the presence of reagents for polymerization (i.e., DNA polymerase or reverse transcriptase) and four different nucleoside triphosphates in appropriate buffer solutions and temperatures. The PCR conditions and the lengths of the sense and antisense primers can be appropriately selected according to the techniques known in the art.

The term "probe" as used herein refers to a nucleic acid fragment (e.g., RNA or DNA) corresponding to several bases to several hundred bases that can specifically bind to mRNA, and the presence or absence and expression level of a particular mRNA can be confirmed by a tag. The probe may be prepared in the form of an oligonucleotide probe, a single-stranded DNA probe, a double-stranded DNA probe, or an RNA probe. Suitable probes and hybridization conditions may be appropriately selected according to techniques known in the art.

The term "antibody" as used herein is well known in the art and refers to a specific immunoglobulin directed against an antigenic site. The antibody of the present invention refers to an antibody that specifically binds to the biomarker protein of the present invention, and can be produced according to a conventional method in the art. Forms of antibodies include polyclonal or monoclonal antibodies, antibody fragments (such as Fab, Fab', F(ab')2, and Fv fragments), single chain Fv (scFv) antibodies, multispecific antibodies (such as bispecific antibodies), monospecific antibodies, monovalent antibodies, chimeric antibodies, humanized antibodies, human antibodies, fusion proteins comprising an antigen binding site of an antibody, and any other modified immunoglobulin molecule comprising an antigen binding site, so long as the antibody exhibits the desired biological binding activity.

The term "peptide" as used herein refers to a molecule that binds a target substance with high affinity and does not denature during heat or chemical treatment. Owing to its small size, it can also be attached to other proteins and used as a fusion protein. In particular, since it can be specifically attached to a high-molecular-weight protein chain, it can be used in a diagnostic kit and as a drug delivery substance.

The term "aptamer" as used herein refers to a polynucleotide composed of a specific type of single-stranded nucleic acid (DNA, RNA or modified nucleic acid) which itself has a stable tertiary structure and has the property of being able to bind with high affinity and specificity to a target molecule. As described above, since the aptamer can specifically bind to an antigenic substance like an antibody, but is more stable and has a simple structure than a protein, and is composed of a polynucleotide that is easily synthesized, it can be used instead of an antibody.

In addition, the kit of the present invention may comprise an antibody that specifically binds to the marker component; a secondary antibody conjugate conjugated to a marker developed by reaction with a substrate; a chromogenic substrate solution that undergoes a chromogenic reaction with the marker, a washing solution, an enzyme reaction termination solution, and the like, and may be prepared as a plurality of separate packages or compartments containing the reagent components used.

Prognosis

Whether the prognosis is considered good or poor can vary between cancer types and disease stages. In general, a good prognosis is one in which overall survival (OS) and/or progression-free survival (PFS) is longer than the mean for that stage and cancer type. If PFS and/or OS is below the mean for the stage and type of cancer, the prognosis may be considered poor. The mean may be the median OS or PFS.

In general, a "good prognosis" is a prognosis in which the survival (OS and/or PFS) of an individual patient may be favorable compared to the expectation for a population of patients in a comparable disease setting. This can be defined as better than median survival (i.e., survival longer than that of 50% of patients in the population).

Chip/kit

In the present invention, "chip", also referred to as "array", refers to a solid support comprising attached nucleic acid or peptide probes. Arrays typically comprise a plurality of different nucleic acid or peptide probes attached to the surface of a substrate at different known locations. These arrays, also known as "microarrays," can generally be produced using either mechanical synthesis methods or light-directed synthesis methods that combine photolithography with solid-phase synthesis. The array may comprise a flat surface, or may be nucleic acids or peptides on beads, gels, polymer surfaces, fibers such as optical fibers, glass, or any other suitable substrate. The array may be packaged in a manner that allows for diagnostic use or other manipulation of the fully functional device.

A "microarray" is an ordered arrangement of hybridization array elements, such as polynucleotide probes (e.g., oligonucleotides) or binding agents (e.g., antibodies), on a substrate. The substrate may be a solid substrate, for example, a glass or silica slide, beads, or a fiber-optic bundle, or a semi-solid substrate, for example, a nitrocellulose membrane. The nucleotide sequence may be DNA, RNA or any combination thereof.

In the present invention, the components of the kit may be packaged in the form of an aqueous medium or in a lyophilized form. Suitable containers in the kit generally include at least one vial, test tube, flask, pet bottle, syringe, or other container in which a component may be placed and, preferably, suitably aliquoted. Where more than one component is present in the kit, the kit will also typically comprise a second, third or other additional container in which the additional components are separately disposed. However, different combinations of components may be contained in one vial. The kit of the invention will also typically include a container for holding the reactants, sealed for commercial sale. Such containers may include injection molded or blow molded plastic containers in which the desired vials may be retained.

Classification method based on gene expression

The present invention provides methods for classifying, predicting or monitoring cancer in a subject. In particular, one or more pattern recognition algorithms may be used to evaluate data obtained from gene expression analysis. Such analytical methods may be used to form predictive models that may be used to classify test data. For example, one convenient and particularly effective classification method employs multivariate statistical analysis modeling, first using data from samples from known subgroups (e.g., from subjects known to have a particular cancer prognosis subgroup: high risk and low risk) ("modeled data") to form a model ("predictive model"), and second classifying unknown samples (e.g., "test samples") according to subgroups.

Pattern recognition methods have been widely used to characterize many different types of problems in fields such as linguistics, fingerprinting, chemistry, and psychology. In the case of the methods described herein, pattern recognition is the use of multivariate statistics (both parametric and non-parametric) to analyze the data and thereby classify the samples based on a series of observed measurements and predict the values of some dependent variables. There are two main approaches. One group of methods is referred to as "unsupervised"; these simply reduce the data complexity in a reasonable manner and also produce a display map that can be interpreted by the human eye. However, this type of approach may not be suitable for developing clinical assays that can be used to classify samples derived from a subject without relying on an initial sample population for training a predictive algorithm.

The other approach is referred to as "supervised", in which a mathematical model is generated using a training set of samples with known classes or outcomes and is then evaluated using a separate validation dataset. Here, a "training set" of gene expression data is used to construct a statistical model that correctly predicts the "subgroup" of each sample. The model is then tested with independent data (called a test or validation set) to determine the robustness of the computer-based model. These models are sometimes referred to as "expert systems," but may be based on a range of different mathematical procedures, such as support vector machines, decision trees, k-nearest neighbor and naive Bayes. Supervised methods may use datasets with reduced dimensionality (e.g., the first few principal components), but typically use unreduced data with all dimensions. In all cases, these methods allow for the quantitative delineation of the multivariate boundaries that characterize and separate each subtype according to its intrinsic gene expression profile. A confidence limit for any prediction, e.g., a probability level on goodness of fit, may also be obtained. The robustness of the predictive model can also be checked using cross-validation, by omitting selected samples from the analysis.

System/apparatus

A device as applied herein shall at least comprise the above-mentioned units. The units of the device are operatively connected to each other. How the units are operatively linked will depend on the type of unit contained in the device. For example, in case a tool for automatic quantitative measurement of biomarkers is applied in the acquisition unit, the data obtained by said automatic operation unit may be processed by a processing unit, e.g. by a computer program running on a computer as data processor, in order to facilitate the diagnosis. In one embodiment, the data processor performs a comparison of the amount of the biomarker to a reference.

Further, in this case, the units may be comprised in a single device. However, the acquisition unit and the processing unit may also be physically separate. In that case, the operational connection may be realized via wired or wireless connections between the units that allow data transmission. The wireless connection may use a wireless LAN (WLAN) or the internet. The wired connection may be achieved by optical or non-optical cable connections between the units. The cable used for the wired connection is preferably suitable for high-throughput data transmission.

Readable storage medium

The present invention provides a computer-readable storage medium storing a program for executing a lung cancer prognosis prediction model constructed from the biomarkers ANLN, ARNTL2, BMP5, GRIA1, and/or MAL. The computer readable storage medium, such as computer executable code, may take many forms, including but not limited to tangible storage media, carrier wave media, or physical transmission media. Non-volatile storage media include, for example, optical or magnetic disks, any storage device such as in any computer or the like, volatile storage media include dynamic memory, such as the main memory of such computer platforms. Tangible transmission media include coaxial cables; copper wire and fiber optics, including the wires that comprise a bus within a computer system. Carrier-wave transmission media can take the form of electrical or electromagnetic signals, or acoustic or light waves, such as those generated during radio frequency and infrared data communications. Thus, common forms of computer-readable media include, for example: a floppy disk, a flexible disk, hard disk, magnetic tape, any other magnetic medium, a CD-ROM, DVD or DVD-ROM, any other optical medium, punch cards, paper tape, any other physical storage medium with patterns of holes, a RAM, a ROM, a PROM, and EPROM, a FLASH-EPROM, any other memory chip or cartridge, a carrier wave transporting data or instructions, cables or links transporting such a carrier wave, or any other medium from which a computer can read programming code and/or data. Many of these computer readable media may take the form of one or more sequences of one or more instructions that are executable by a processor to perform operations.

The following detailed description of embodiments of the present application will be made with reference to the accompanying drawings. It should be understood that the detailed description and specific examples, while indicating the present application, are given by way of illustration and explanation only, and are not intended to limit the present application.

Example Gene markers associated with diagnosis and prognosis of Lung cancer

1. Data download

RNA-seq data and clinical information for lung adenocarcinoma were obtained from The Cancer Genome Atlas (TCGA) database. Samples with missing survival information or a survival time of 0 were removed, and the remaining 496 samples were used as the training set. Chip (microarray) data and clinical information for lung adenocarcinoma were obtained from the GEO database. Samples with missing survival information or a survival time of 0 were removed, and the remaining 226 samples were used as the validation set.

2. Data normalization

RNA-seq data of TCGA was normalized by the Voom method, and chip data of GEO was normalized by the RMA method.
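As a rough illustration of the RNA-seq normalization step, the following Python sketch computes log2 counts per million using the offsets of the voom log-CPM transform. It is only a simplified sketch under stated assumptions: the function name `log2_cpm` and the example counts are hypothetical, the precision weights that voom additionally estimates are omitted, and the RMA procedure applied to the GEO chip data is not shown.

```python
import math


def log2_cpm(counts):
    """Convert one sample's raw read counts to log2 counts per million.

    Uses the offsets of the voom log-CPM transform:
    log2((count + 0.5) / (library_size + 1) * 1e6).
    """
    library_size = sum(counts.values())
    return {
        gene: math.log2((count + 0.5) / (library_size + 1.0) * 1e6)
        for gene, count in counts.items()
    }


# Hypothetical raw counts for a single sample (illustration only).
raw = {"ANLN": 150, "BMP5": 0, "MAL": 849}
normalized = log2_cpm(raw)
for gene, value in normalized.items():
    print(gene, round(value, 3))
```

The 0.5 offset keeps genes with zero reads finite on the log scale, which is why BMP5 in the toy sample still receives a value.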

3. Univariate Cox analysis

Univariate (single-factor) Cox analysis was carried out on the genes in the training set and in the validation set, and genes associated with the survival of lung cancer patients in both data sets were screened; a gene with P < 0.05 was considered to influence the survival of lung cancer patients.
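The single-factor screening step can be sketched as follows. This is a minimal pure-Python illustration and not the procedure actually used in the example: it fits a Cox model with one covariate by Newton-Raphson on the Breslow partial likelihood and reports a Wald P value. The function name `univariate_cox` and the toy data are hypothetical.

```python
import math


def univariate_cox(times, events, x, max_iter=50, tol=1e-9):
    """Fit a one-covariate Cox model; return (beta, wald_p_value).

    `events` holds 1 for an observed death and 0 for censoring.
    Ties are handled with the Breslow approximation.
    """
    beta = 0.0
    information = 0.0
    for _ in range(max_iter):
        score = 0.0
        information = 0.0
        for t_i, e_i, x_i in zip(times, events, x):
            if e_i != 1:
                continue
            # Risk set: everyone still under observation at time t_i.
            risk = [x_j for t_j, x_j in zip(times, x) if t_j >= t_i]
            weights = [math.exp(beta * x_j) for x_j in risk]
            s0 = sum(weights)
            s1 = sum(w * x_j for w, x_j in zip(weights, risk))
            s2 = sum(w * x_j * x_j for w, x_j in zip(weights, risk))
            mean = s1 / s0
            score += x_i - mean                    # first derivative
            information += s2 / s0 - mean * mean   # minus second derivative
        if information <= 0.0:
            raise ValueError("covariate does not vary within the risk sets")
        step = score / information
        beta += step
        if abs(step) < tol:
            break
    z = beta * math.sqrt(information)              # Wald statistic beta / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))   # two-sided normal P value
    return beta, p_value


# Hypothetical toy data: survival times, event flags, one gene's expression.
beta, p_value = univariate_cox(
    times=[1, 2, 3, 4, 5, 6],
    events=[1, 1, 1, 1, 1, 1],
    x=[3.0, 1.0, 2.5, 0.5, 2.0, 1.5],
)
print(beta > 0, p_value < 0.05)
```

A gene would pass the screen described above only if its P value is below 0.05 in both data sets; a positive coefficient indicates that higher expression goes with shorter survival.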

4. LASSO Cox regression analysis

LASSO Cox regression analysis was performed to construct a LASSO regression model, with the TCGA data as the training set and the GEO data as the test set. A prognostic gene signature was constructed as a linear combination of the LASSO Cox regression coefficients and the mRNA expression levels, giving a risk scoring formula.

For validation on the GEO set, the risk score of each sample was calculated using the same formula. All samples were divided into a high-risk group and a low-risk group according to the median risk score, and survival analysis and receiver operating characteristic (ROC) curve analysis were then performed.

5. Survival Curve analysis

Survival analysis was performed and survival curves were drawn for the lung cancer patients in the high-risk and low-risk groups of the training set and the validation set using the R packages "survival", "survminer" and "ggplot2", and differences between the groups were compared by the log-rank test.
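For readers who want to see what the survival curve and the log-rank comparison compute, here is a small pure-Python sketch. It is not the R code used in the example; the function names `kaplan_meier` and `logrank_test` and the toy follow-up data are hypothetical, and the P value uses the chi-square distribution with one degree of freedom.

```python
import math


def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct death time."""
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        leaving = sum(1 for ti in times if ti == t)
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # deaths and censored subjects leave the risk set
    return curve


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; return (chi_square, p_value)."""
    pooled = [(t, e, 0) for t, e in zip(times_a, events_a)]
    pooled += [(t, e, 1) for t, e in zip(times_b, events_b)]
    observed_a = expected_a = variance = 0.0
    for t in sorted({ti for ti, ei, _ in pooled if ei == 1}):
        n = sum(1 for ti, _, _ in pooled if ti >= t)
        n_a = sum(1 for ti, _, g in pooled if ti >= t and g == 0)
        d = sum(1 for ti, ei, _ in pooled if ti == t and ei == 1)
        d_a = sum(1 for ti, ei, g in pooled if ti == t and ei == 1 and g == 0)
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (1.0 - n_a / n) * (n - d) / (n - 1)
    if variance == 0.0:
        raise ValueError("log-rank variance is zero; groups cannot be compared")
    chi_square = (observed_a - expected_a) ** 2 / variance
    p_value = math.erfc(math.sqrt(chi_square / 2.0))  # chi-square, 1 df
    return chi_square, p_value


# Hypothetical follow-up times (all deaths observed) for two risk groups.
high_times, high_events = [1, 2, 3, 4, 5], [1, 1, 1, 1, 1]
low_times, low_events = [6, 7, 8, 9, 10], [1, 1, 1, 1, 1]

curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
chi_square, p_value = logrank_test(high_times, high_events, low_times, low_events)
print(curve)
print(round(chi_square, 2), p_value < 0.05)
```

In the `kaplan_meier` call, the subject censored at time 2 lowers the number at risk without producing a step in the curve, which is the behavior that distinguishes the estimator from a simple proportion of survivors.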

6. ROC curve analysis

To evaluate the accuracy of the prognostic model in predicting lung cancer prognosis, the R packages "survival" and "timeROC" were used to assess the 1-year, 3-year and 5-year prognostic performance of the biomarkers with time-dependent ROC curves. The significance of differences between ROC curves was tested using the bootstrap method, and a difference with P < 0.05 was considered statistically significant.
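The idea behind a time-dependent ROC analysis can be illustrated with a deliberately simplified Python sketch. It is an assumption-laden stand-in for the R procedure named above: patients who died by the chosen horizon are treated as cases, patients still under observation after it as controls, and patients censored before the horizon are simply excluded rather than reweighted. The function name `auc_at_horizon` and the data are hypothetical.

```python
def auc_at_horizon(times, events, scores, horizon):
    """Naive cumulative/dynamic AUC of a risk score at one time horizon.

    Cases: death observed at or before `horizon`.
    Controls: follow-up longer than `horizon`.
    Subjects censored at or before `horizon` are dropped (a simplification).
    """
    cases = [s for t, e, s in zip(times, events, scores)
             if e == 1 and t <= horizon]
    controls = [s for t, s in zip(times, scores) if t > horizon]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    # Fraction of case-control pairs ranked correctly; ties count one half.
    wins = sum(1.0 if c > k else 0.5 if c == k else 0.0
               for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


# Hypothetical follow-up in years, event flags and risk scores.
times = [1.0, 2.0, 4.0, 5.0, 6.0]
events = [1, 1, 0, 1, 0]
scores = [0.9, 0.3, 0.2, 0.4, 0.1]

auc_3y = auc_at_horizon(times, events, scores, horizon=3.0)
print(round(auc_3y, 4))
```

Calling the function with horizons of 1, 3 and 5 years would mirror the three time points evaluated in the text; an AUC near 0.5 indicates no discrimination and values closer to 1 indicate better separation of early deaths from longer survivors.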

7. Results

With the TCGA data as the training set, a prognostic gene signature was constructed as a linear combination of the LASSO Cox regression model coefficients and the gene expression levels:

risk score = 0.1157 × ExpANLN + 0.1034 × ExpARNTL2 - 0.0592 × ExpBMP5 - 0.0470 × ExpGRIA1 - 0.0822 × ExpMAL.
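The risk scoring formula is a plain weighted sum, which the following Python sketch makes concrete. The coefficients are those stated in the text; the function name `risk_score` and the expression values of the example sample are hypothetical and chosen only for illustration.

```python
# Coefficients of the five-gene signature as stated in the text.
COEFFICIENTS = {
    "ANLN": 0.1157,
    "ARNTL2": 0.1034,
    "BMP5": -0.0592,
    "GRIA1": -0.0470,
    "MAL": -0.0822,
}


def risk_score(expression):
    """Return the linear risk score for one sample.

    `expression` maps gene symbol -> normalized expression level.
    A KeyError is raised if any of the five genes is missing.
    """
    return sum(coef * expression[gene] for gene, coef in COEFFICIENTS.items())


# Hypothetical normalized expression levels for one sample.
sample = {"ANLN": 6.0, "ARNTL2": 4.0, "BMP5": 2.0, "GRIA1": 1.0, "MAL": 5.0}
score = risk_score(sample)
print(round(score, 4))  # 0.5314
```

Because ANLN and ARNTL2 carry positive coefficients and BMP5, GRIA1 and MAL negative ones, higher expression of the first two raises the score while higher expression of the last three lowers it.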

The lung cancer patients were divided into two groups, a high-risk group (high score) and a low-risk group (low score), according to the median risk score. The survival times of the two groups were compared by Kaplan-Meier (KM) survival analysis, and the cumulative survival rate of patients in the high-risk group was found to be significantly lower than that of patients in the low-risk group. The same formula was used to calculate the risk score in the GEO data. Consistent with the results for the TCGA training set, the cumulative survival rate of patients in the high-risk group was significantly lower than that of patients in the low-risk group (FIG. 1 and FIG. 2).
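The median-based grouping can be sketched in a few lines of Python. The function name `split_by_median`, the sample identifiers and the scores are hypothetical. Assigning samples that fall exactly on the median to the low-risk group is an assumption, since the text does not say how ties are handled, nor whether the validation set uses its own median or that of the training set.

```python
from statistics import median


def split_by_median(scores):
    """Label each sample 'high' or 'low' risk relative to the median score.

    Scores strictly above the median are 'high'; ties go to 'low'
    (an assumption, not stated in the text).
    """
    cutoff = median(scores.values())
    return {sample: ("high" if value > cutoff else "low")
            for sample, value in scores.items()}


# Hypothetical risk scores for four samples.
scores = {"S1": 0.53, "S2": -0.12, "S3": 0.87, "S4": 0.05}
groups = split_by_median(scores)
print(groups)
```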

Prognostic ROC curve analysis was performed for the lung cancer patients in the training set and the validation set, and the results showed that the risk score prognostic model discriminates the prognosis of lung cancer patients well (Fig. 3 and Fig. 4).

In conclusion, the gene signature based on the five genes of the present invention can predict the prognosis of lung cancer.

The preferred embodiments of the present application have been described in detail above with reference to the accompanying drawings. However, the present application is not limited to the details of the above embodiments. Various simple modifications can be made to the technical solution of the present application within its technical idea, and all such simple modifications fall within the protection scope of the present application.

It should be noted that the various features described in the above embodiments may be combined in any suitable manner. To avoid unnecessary repetition, the various possible combinations are not described separately in the present application.

In addition, the various embodiments of the present application may also be combined in any manner, and such combinations should likewise be regarded as disclosed in the present application as long as they do not depart from its idea.

Claims

1. Use of a reagent for the detection of a biomarker comprising ANLN, ARNTL2, BMP5, GRIA1 and/or MAL in the manufacture of a product for predicting the prognosis of lung cancer.

2. The use of claim 1, wherein the biomarkers are ANLN, ARNTL2, BMP5, GRIA1, and MAL.

3. The use of claim 2, wherein the reagent comprises a reagent for detecting the expression level of the biomarkers in a sample by digital imaging techniques, protein immunization techniques, dye techniques, nucleic acid sequencing techniques, nucleic acid hybridization techniques, chromatography techniques, or mass spectrometry techniques.

4. The use of claim 3, wherein the sample comprises tissue or body fluid.

5. A product for predicting the prognosis of lung cancer, said product comprising reagents for detecting biomarkers comprising ANLN, ARNTL2, BMP5, GRIA1 and/or MAL;

preferably, the product comprises a chip or a kit;

preferably, the kit comprises a qPCR kit, an immunoblotting detection kit, an immunochromatography detection kit, a flow cytometry kit, an immunohistochemical detection kit, an ELISA kit and an electrochemiluminescence detection kit;

preferably, the kit further comprises instructions for predicting the prognosis of lung cancer.

6. The product of claim 5, wherein the reagents comprise primers or probes that specifically bind to the biomarker genes; an antibody, peptide, aptamer, or compound that specifically binds to the marker protein.

7. A system/apparatus for predicting lung cancer prognosis, comprising:

preferably, the prognostic prediction model is a Cox regression model;

preferably, the Cox regression model is a LASSO Cox regression model;

preferably, the prognostic prediction model is formulated as: risk score = C1 × ExpANLN + C2 × ExpARNTL2 + C3 × ExpBMP5 + C4 × ExpGRIA1 + C5 × ExpMAL;

preferably, C1, C2, C3, C4 and C5 are 0.1157, 0.1034, -0.0592, -0.0470 and -0.0822, respectively.

8. A computer-readable storage medium characterized by storing a program for executing a lung cancer prognosis prediction model constructed from biomarkers ANLN, ARNTL2, BMP5, GRIA1, and/or MAL;

preferably, the prognostic prediction model is a Cox regression model;

preferably, the Cox regression model is a LASSO Cox regression model;

9. An electronic device, comprising:

a client component, wherein the client component comprises a user interface;

a server component, wherein the server component comprises at least one memory unit configured to receive data input comprising sequencing data for biomarkers generated from a sample, the biomarkers comprising ANLN, ARNTL2, BMP5, GRIA1, and/or MAL;

the user interface operatively coupled with the server component; and

a computer processor operatively coupled to the at least one memory unit, wherein the computer processor is programmed to execute a program that runs a lung cancer prognosis prediction model constructed from the biomarkers;

preferably, the prognostic prediction model is a Cox regression model;

preferably, the Cox regression model is a LASSO Cox regression model;

preferably, the prognostic prediction model is formulated as: risk score = C1 × ExpANLN + C2 × ExpARNTL2 + C3 × ExpBMP5 + C4 × ExpGRIA1 + C5 × ExpMAL;

10. Use of a reagent for detecting a biomarker for the manufacture of a product for evaluating the efficacy of a drug for the treatment of lung cancer, wherein the biomarker comprises ANLN, ARNTL2, BMP5, GRIA1 and/or MAL;

preferably, the reagents comprise primers or probes that specifically bind to the biomarker genes; an antibody, peptide, aptamer, or compound that specifically binds to the marker protein.