KR101825369B1

KR101825369B1 - Genetic Bio-marker for Predicting Prognosis in Cancer and Use thereof

Info

Publication number: KR101825369B1
Application number: KR1020150084070A
Authority: KR
Inventors: 최선심; 민재웅
Original assignee: 강원대학교산학협력단
Priority date: 2015-06-15
Filing date: 2015-06-15
Publication date: 2018-02-06
Also published as: KR20160147392A

Abstract

The present invention relates to a genetic marker for determining the classification or prognosis of cancer, and to a method of classifying cancer and determining its prognosis using the same. The biomarker for determining cancer classification or prognosis according to the present invention is one or more genes selected from NCAPH, CDCA5, CDCA8, SPC25, DLGAP5, NUSAP1, NEK2, CCNB1, KIAA1524 and ZWINT. By measuring the expression levels of the genes of the present invention, cancer patients can be classified and their prognosis determined. In addition, the genes of the present invention are useful for selecting a therapeutic agent effective for a cancer patient and for establishing a treatment plan.

Description

(Genetic Bio-marker for Predictive Prognosis in Cancer and Use thereof)

The present invention relates to gene biomarkers for cancer prognosis determination and uses thereof.

Tumors from different patients are not identical at the molecular, morphological, or pathological level. This characteristic is usually referred to as inter-tumor heterogeneity (1-3). Tumors from different patients differ for the following reasons: (1) the site or organ of origin of the tumor cells; (2) a variety of biological factors that may affect the development or progression of cancer, such as sex, age, or hormonal status; and (3) genetic diversity between hosts (4-6). Recently, evidence has also been reported for a deeper level of tumor heterogeneity within a single patient, i.e., intra-tumor heterogeneity (7-10). Thus, the study of intra-tumor heterogeneity has received attention as a new research field in cancer genomics (11-14). Because tumors composed of distinct subpopulations can vary in their sensitivity to cancer treatments, it is important to understand the degree and extent of heterogeneity within tumors (15-17). The presence of a resistant subpopulation carrying different genetic mutations can determine whether a specific treatment is effective for a subgroup of patients. Therefore, even for a single patient, tumor cells require in-depth temporal and spatial examination. Indeed, increased diversity of subclones within a tumor is associated with an increased risk of progression from Barrett's esophagus to malignancy (18-19).

The remarkable and rapid development of massively parallel sequencing, also referred to as next-generation sequencing (NGS), has dramatically clarified the extent of tumor heterogeneity and has revealed hundreds of cancer-causing somatic mutations and structural variants (20-22). Several researchers have studied tumor heterogeneity using multi-region sequencing approaches based on NGS techniques (7, 23-24). However, Hiley et al. (25) recently suggested that single-cell sequencing is indispensable for completely identifying the actual number of subclones present in a tumor. This is because multi-region sequencing cannot resolve the subclones that make up a tumor, since each sampled region of the tumor is itself a collection of numerous cells with different genetic states.

The patent documents and references cited herein are hereby incorporated by reference to the same extent as if each reference was individually and clearly identified by reference.

Korean Patent Publication No. 10-2012-7013385

1. Bedard PL, et al., Nature, 2013; 501 (7467): 355-64.
2. Lyng H, et al., International Journal of Cancer, 2001; 96 (3): 182-90.
3. Cusnir M, Human Vaccines & Immunotherapeutics, 2012; 8 (8): 1143-5.
4. Pinto N and Dolan ME, Curr Drug Metab, 2011 Jun; 12 (5): 487-97.
5. Yasuda S, et al., Clinical Pharmacology & Therapeutics, 2008; 84 (3): 417-23.
6. Jiang Y, et al., The American Journal of Human Genetics, 2013; 93 (2): 249-63.
7. Sottoriva A, et al., Proc Natl Acad Sci U S A, 2013 Mar 5; 110 (10): 4009-14.
8. Michor F, et al., Cancer Prev Res (Phila), 2010 Nov; 3 (11): 1361-4.
9. Gerlinger M, et al., N Engl J Med, 2012; 366 (10): 883-92.
10. Swanton C, Cancer Res, Oct 1; 72 (19): 4875-82.
11. Zhang J, et al., Science, 2014; 346 (6206): 256-9.
12. Hajirasouliha I, et al., Bioinformatics, 2014 Jun 15; 30 (12): i78-86.
13. de Bruin EC, et al., Genome Med, 2013; 5 (11): 101.
14. Aparicio S and Mardis E, Genome Biol, 2014 Oct 1; 15 (9): 463.
15. Fisher R, et al., Br J Cancer, 2013; 108 (3): 479-85.
16. Denisov EV, et al., Scientific Reports, 2014.
17. Heppner GH, et al., Cancer Res, 1978 Nov; 38 (11 Pt 1): 3758-63.
18. Leedham SJ, et al., Gut, 2008 Aug; 57 (8): 1041-8.
19. Owonikoko T, et al., Am J Clin Pathol, 2002 Apr; 117 (4): 558-66.
20. Kandoth C, et al., Nature, 2013; 502 (7471): 333-9.
21. Roukos D, et al., Assessing tumor heterogeneity and emergence mutations using next-generation sequencing for cancer drugs resistance. 2012.
22. Timmermann B, et al., PLoS One, 2010; 5 (12): e15661.
23. Nik-Zainal S, et al., Cell, 2012; 149 (5): 979-93.
24. Yachida S, et al., Nature, 2010; 467 (7319): 1114-7.
25. Hiley C, et al., Genome Biol, 2014; 15: 453.
26. Patel AP, Tirosh I, Trombetta JJ, Shalek AK, Gillespie SM, Wakimoto H, et al., Science, 2014 Jun 20; 344 (6190): 1396-401.
27. Prat A, et al., Scientific Reports, 2013.
28. Heller ER, et al., Int J Oncol, 2013; 42 (2): 583-96.
29. West RB, et al., Proc Natl Acad Sci U S A, 2006 Jan 17; 103 (3): 690-5.
30. Miyagi A, et al., Metabolomics, 2010; 6 (1): 146-55.
31. Shapiro E, et al., Nature Reviews Genetics, 2013; 14 (9): 618-30.
32. Bhargava V, et al., Scientific Reports, 2014.
33. Saliba AE, et al., Nucleic Acids Res, 2014 Aug; 42 (14): 8845-60.
34. Tu Y, et al., Proc Natl Acad Sci U S A, 2002 Oct 29; 99 (22): 14031-6.
35. Lundberg E, et al., Molecular Systems Biology, 2010; 6 (1).
36. Whitfield ML, et al., Mol Biol Cell, 2002 Jun; 13 (6): 1977-2000.

The present inventors have sought to identify biomarkers capable of predicting the therapeutic prognosis of cancer patients through the study of tumor heterogeneity in cancer. As a result, the inventors identified genes whose expression is correlated across 34 single cells from a patient-derived xenograft cancer, and confirmed that the therapeutic prognosis of cancer can be predicted by measuring the expression levels of these correlated genes.

Accordingly, an object of the present invention is to provide a composition for judging the classification or prognosis of cancer.

Another object of the present invention is to provide a kit for judging the classification or prognosis of cancer.

It is still another object of the present invention to provide a biomarker for judging the classification or prognosis of cancer.

It is still another object of the present invention to provide a method of measuring, in a biological sample, the expression state of the biomarker genes of the present invention, so as to provide information necessary for the classification or prognosis of cancer.

Other objects and technical features of the present invention will be described in more detail with reference to the following detailed description, claims and drawings.

According to one aspect of the present invention, there is provided a composition for determining the classification or prognosis of cancer, comprising an agent capable of measuring the expression state of at least one gene selected from the group consisting of NCAPH, CDCA5, CDCA8, SPC25, DLGAP5, NUSAP1, NEK2, CCNB1, KIAA1524 and ZWINT.

As used herein, the term "expression state of a gene" is meant to include the expression level of the gene (e.g., mRNA level or protein level), the activity level of the protein expressed from the gene, a mutation of the gene, and the methylation state of the gene; preferably the expression level of the gene or the activity level of the protein expressed from the gene; and most preferably the expression level of the gene.

The term "classifying" as used herein is meant to include determining clinically relevant characteristics of a cancer or determining a particular prognosis of a cancer patient. Classification of such cancers includes, but is not limited to: (i) assessing the likelihood of metastasis to another organ, or the risk of recurrence; (ii) evaluating the stage of the cancer; (iii) assessing the prognosis of cancer patients without treatment; (iv) evaluating the responsiveness of cancer patients to treatment methods (chemotherapy, radiation therapy, surgical procedures); (v) diagnosing the actual response of a cancer patient to current or past treatment methods; (vi) selecting the preferred method for the treatment of cancer patients; and (vii) predicting the life expectancy of cancer patients (e.g., prognosis for overall survival).

Therefore, in the present invention, the term "cancer classification" partly overlaps in meaning with "cancer prognosis determination", and the two terms can be used interchangeably within the overlapping meaning.

The genes used as cancer biomarkers in the present invention are one or more genes selected from the group consisting of the following genes: CCNB1 (cyclin B1), CDCA8 (cell division cycle associated 8), CDCA5 (cell division cycle associated 5), DLGAP5 (discs, large (Drosophila) homolog-associated protein 5), KIAA1524, NCAPH (non-SMC condensin I complex, subunit H), NEK2 (NIMA-related kinase 2), NUSAP1 (nucleolar and spindle-associated protein 1), SPC25 (SPC25, NDC80 kinetochore complex component), and ZWINT (ZW10 interacting kinetochore protein).

According to one embodiment of the present invention, the cancer is lung cancer, kidney cancer, or liver cancer.

According to another embodiment of the present invention, the cancer is lung cancer, and the lung cancer includes small cell lung cancer and non-small cell lung cancer, preferably non-small cell lung cancer.

In the present invention, non-small cell lung cancer includes, but is not limited to, lung adenocarcinoma (LADC), lung squamous cell carcinoma (LUSC), and bronchopulmonary cancer.

In the present invention, the expression level of the genes can be measured to determine the classification of cancer or the prognosis of cancer.

According to another embodiment of the present invention, when the expression level of the genes is up-regulated, the prognosis of the cancer is judged to be poor. With respect to the classification of cancer, the likelihood of metastasis to other organs or the risk of recurrence is evaluated as high, the cancer is evaluated as being at a highly advanced stage, and the life expectancy of the patient is considered to be short.

According to another embodiment of the present invention, when the expression level of the genes is down-regulated, the prognosis of the cancer is judged to be good. With respect to the classification of cancer, the likelihood of metastasis to other organs or the risk of recurrence is evaluated as low, the cancer is evaluated as being at an early stage, and the life expectancy of the patient is considered to be long.

As used herein, the term "upregulation" means an expression level of the genes of the present invention that is above the normal expression level, and "downregulation" means an expression level that is below the normal expression level.

In the present invention, the term "agent capable of measuring the expression state of a gene" includes, but is not limited to, (i) a nucleic acid probe capable of hybridizing to the nucleic acid sequence encoding the gene or to a sequence complementary thereto, (ii) a primer or primer set capable of amplifying part or all of the gene, and (iii) an antibody capable of immunologically binding to a protein, polypeptide, or fragment thereof encoded by the gene, or a fragment of such an antibody.

Methods for measuring the expression level of a particular gene are known to those skilled in the art and include, for example, microarray analysis (e.g., mRNA or microRNA expression analysis, or copy number analysis), quantitative real-time PCR ("qRT-PCR"; e.g., TaqMan(TM)), and immunological assays (e.g., ELISA, immunohistochemistry).

In the present invention, the activity level of a protein or a polypeptide encoded by the genes is also used in a sense equivalent to the expression level of the gene. For example, a high activity level of a protein or polypeptide means a high expression level, and a low activity level means a low expression level.

According to another aspect of the present invention, there is provided a kit for determining the classification or prognosis of cancer, comprising an agent capable of measuring the expression state of one or more genes selected from the group consisting of NCAPH, CDCA5, CDCA8, SPC25, DLGAP5, NUSAP1, NEK2, CCNB1, KIAA1524 and ZWINT.

According to one embodiment of the present invention, in the kit of the present invention, the cancer is lung cancer, kidney cancer or liver cancer.

In the present invention, non-small cell lung cancer includes, but is not limited to, lung adenocarcinoma, squamous cell carcinoma of the lung, and bronchopulmonary cancer.

The kit of the present invention may comprise a carrier for its various components, including, but not limited to, a container such as a bag, box, tube, or rack, or a support.

As described above, the kit of the present invention comprises, as the agent capable of measuring the expression state of the gene, (i) a nucleic acid probe capable of hybridizing to the nucleic acid sequence encoding the gene or to a sequence complementary thereto, (ii) a primer or primer set capable of amplifying part or all of the gene, or (iii) an antibody capable of immunologically binding to a protein, polypeptide, or fragment thereof encoded by the gene, or a fragment of such an antibody.

The nucleic acid probe or primer may be labeled with a suitable detection marker. The detection markers include, but are not limited to, radioactive isotopes, fluorophores, biotin, enzymes (e.g., alkaline phosphatase), enzyme substrates, ligands and antibodies. For further details, see Jablonski et al., Nucleic Acids Res., 14: 6115-6128 (1986); Nguyen et al., Biotechniques, 13: 116-123 (1992); Rigby et al., J. Mol. Biol., 113: 237-251 (1977), the disclosures of which are incorporated herein by reference.

The kit of the present invention may comprise various additional components, including, but not limited to, Taq polymerase, deoxyribonucleotides, dideoxyribonucleotides, other suitable primers capable of amplifying the target DNA sequence, and RNase A.

The kit of the present invention may preferably include instructions for using the kit of the present invention to determine the classification of cancer or prognosis of cancer.

According to another aspect of the present invention, there is provided a biomarker for cancer classification or cancer prognosis determination, comprising one or more genes selected from the group consisting of NCAPH, CDCA5, CDCA8, SPC25, DLGAP5, NUSAP1, NEK2, CCNB1, KIAA1524 and ZWINT.

According to another aspect of the present invention, there is provided a method of measuring, in a biological sample, the expression level of at least one gene selected from the group consisting of NCAPH, CDCA5, CDCA8, SPC25, DLGAP5, NUSAP1, NEK2, CCNB1, KIAA1524 and ZWINT, in order to provide information necessary for the classification or prognosis of cancer.

According to an embodiment of the present invention, the cancer is lung cancer, kidney cancer or liver cancer.

According to one embodiment of the present invention, in the method of the present invention, the biological sample is tissue, whole blood, serum, plasma, a body fluid, urine, cells, a cell lysate or a cell culture supernatant.

The features and advantages of the present invention are summarized as follows.

(i) The present invention relates to a gene marker for judging the classification of cancer or prognosis of cancer, and a method of classifying cancer and determining prognosis using the same.

(ii) The biomarker for the classification or prognosis of cancer in the present invention is at least one gene selected from NCAPH, CDCA5, CDCA8, SPC25, DLGAP5, NUSAP1, NEK2, CCNB1, KIAA1524 and ZWINT.

(iii) The genes of the present invention show co-regulated expression levels in cancer patients, and cancer patients can be grouped according to these expression levels.

(iv) Cancer patients grouped according to the expression levels of the genes of the present invention differ clearly in prognosis.

(v) Therefore, by measuring the expression levels of the genes of the present invention, cancer patients can be classified and their prognosis determined, which is useful for selecting therapeutic agents effective for cancer patients.

(vi) Furthermore, the genes of the present invention are useful for selecting a therapeutic agent effective for a cancer patient and for establishing a treatment plan.

The present invention relates to a gene marker for determining the classification or prognosis of cancer, and to a method of classifying cancer and determining its prognosis using the same. The biomarker for the classification or prognosis of cancer of the present invention is one or more genes selected from NCAPH, CDCA5, CDCA8, SPC25, DLGAP5, NUSAP1, NEK2, CCNB1, KIAA1524 and ZWINT. By measuring the expression levels of the genes of the present invention, cancer patients can be classified and their prognosis determined. In addition, the genes of the present invention are useful for selecting a therapeutic agent effective for a cancer patient and for establishing a treatment plan.

FIG. 1 schematically shows the entire analysis process of the experiment of the present invention. The upper part shows the preparation of single cells and the lower part shows the process of cleaning the transcriptome of PDX-LADC single cells.
FIG. 2 shows the expression correlation between tumor tissue types at each experimental step. Panel A shows the Pearson correlation coefficients of gene expression between tumor tissue types. Panel B shows a box plot analysis of the Pearson correlation coefficients between groups.
FIG. 3 shows the expression correlation between pT and bulkT, which was improved by data cleaning. Panel A shows the results before data cleaning. Panel B shows the results after data cleaning.
FIG. 4 shows two different types of gene expression pattern: simple expression and housekeeping expression. Panel A shows simple expression, in which a gene is expressed in only one or two single cells; the FPKM values of the remaining cells are "0". Panel B shows housekeeping expression, in which the expression of a gene varies within a log2 fold change of -3 to 3 across the 34 single cells.
FIG. 5 shows the expression of G64 across the 34 single cells. The log2 fold change of each G64 gene in each single cell, relative to the mean expression of that gene across the 34 single cells, is shown. The orange dotted line marks the boundary between the single cells with a positive correlation and those with a negative correlation.
FIG. 6 shows a hierarchical clustering analysis combined with a heat map of the 34 single cells using the G64 gene set, and the corresponding PCA results. Panel A shows the hierarchical clustering analysis combined with a heat map and Panel B shows the results of the PCA analysis.
FIG. 7 shows a hierarchical clustering analysis combined with a heat map of 488 LADC tumor samples using the G64 gene set, and the corresponding PCA results. Panel A shows the hierarchical clustering analysis combined with a heat map and Panel B shows the results of the PCA analysis.
FIG. 8 shows a hierarchical clustering analysis combined with a heat map of 488 LADC tumor samples using the G10 gene set, and the corresponding PCA results. Panel A shows the hierarchical clustering analysis combined with a heat map and Panel B shows the results of the PCA analysis.
FIG. 9 shows the survival analysis results, according to the degree of G10 expression, of the patients who provided the 488 LADC samples. Panel A shows the survival rates of the C1 and C3 groups. Panel B shows the survival rates of C1 and C2, C1 and C3, and C2 and C3.
FIG. 10 shows a hierarchical clustering analysis combined with a heat map of 289 kidney cancer (KIRP) tumor samples using the G10 gene set, the corresponding PCA analysis, and a comparison of the survival rates of the C1 group and the C3 group. Panel A shows the hierarchical clustering analysis combined with a heat map. Panel B shows the results of the PCA analysis. Panel C shows the survival rates of groups C1 and C3.
FIG. 11 shows a hierarchical clustering analysis combined with a heat map of 351 liver cancer (LIHC) tumor samples using the G10 gene set, the corresponding PCA analysis, and a comparison of the survival rates of the C1 group and the C3 group. Panel A shows the hierarchical clustering analysis combined with a heat map. Panel B shows the results of the PCA analysis. Panel C shows the survival rates of groups C1 and C3.

Example

Materials and Methods

1. Data Cleaning

Of the 23,284 genes in the initial transcriptome, genes whose FPKM values were "0" across all 34 patient-derived xenograft lung adenocarcinoma single cells (PDX-LADC single cells) were removed, and 9,455 genes were selected (34-36). To avoid the infinity problem that arises when calculating changes in expression level for the 9,455 genes, all FPKM values between 0 and 0.1 were set to 0.1.
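The flooring step can be sketched as follows. This is a minimal illustration in Python rather than the R and Perl code used in the experiment; the function names and the sample FPKM values are hypothetical, and only the 0.1 floor comes from the text.

```python
import math

FPKM_FLOOR = 0.1  # floor value stated in the text


def floor_fpkm(value, floor=FPKM_FLOOR):
    """Raise any FPKM value below the floor (including 0) to the floor."""
    return max(value, floor)


def log2_fold_change(a, b):
    """log2(a / b) after flooring both values, so the result is always finite."""
    return math.log2(floor_fpkm(a) / floor_fpkm(b))


# Hypothetical FPKM values for one gene in two samples.
# Without the floor, a zero in either sample would give an infinite
# or undefined log ratio.
print(log2_fold_change(0.0, 8.0))
print(log2_fold_change(12.8, 0.05))
```

Flooring both the numerator and the denominator caps the largest possible fold change for genes that are silent in one sample, at the cost of slightly understating it.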

2. Identification of correlated genes

After the data were cleaned as described above, cell-cell correlations in the expression of each gene were estimated by Pearson correlation analysis using R (http://www.r-project.org/). Genes showing simple-expression or housekeeping-expression patterns in the PDX-LADC single cells were first removed, and then genes whose expression patterns had a Pearson correlation coefficient (r) > 0.9 were selected as "seed genes" and analyzed. The seed genes were used to expand the set of highly correlated genes. The expansion was performed through a correlation test against the remaining genes using the threshold r > 0.75. As a result, 64 genes with highly correlated expression across the 34 PDX-LADC single cells were found and named 'G64' (Table 1). In addition, G64 was narrowed down further to find the minimal group of genes, among the 64 genes constituting G64, that can clearly divide the patient group. To do this, gene-gene correlation tests were performed within the up-regulated group and within the down-regulated group, and correlation coefficients were calculated. The correlation coefficients calculated for each gene were summed, the top 10 genes were selected, and this set was named 'G10' (Table 2).
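The seed-and-expand procedure can be illustrated with a small sketch. The text does not state exactly how a seed gene is defined, so this example assumes that a gene is a seed if its expression profile correlates with that of at least one other gene at r > 0.9, and that a remaining gene joins the extended set if it correlates with any seed at r > 0.75. The gene names and expression values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # assumption: a constant profile counts as uncorrelated
    return sxy / math.sqrt(sxx * syy)


def find_seeds(profiles, threshold=0.9):
    """Genes that correlate above the threshold with at least one other gene."""
    genes = sorted(profiles)
    seeds = set()
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            if pearson(profiles[g], profiles[h]) > threshold:
                seeds.update((g, h))
    return seeds


def expand(profiles, seeds, threshold=0.75):
    """Add every non-seed gene that correlates with any seed above the threshold."""
    extended = set(seeds)
    for g in profiles:
        if g not in seeds and any(
            pearson(profiles[g], profiles[s]) > threshold for s in seeds
        ):
            extended.add(g)
    return extended


# Hypothetical expression profiles (one value per single cell).
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [1.1, 2.0, 3.2, 3.9, 5.1],
    "geneC": [1.0, 3.0, 2.5, 4.5, 4.0],
    "geneD": [5.0, 1.0, 4.0, 2.0, 3.0],
}
seeds = find_seeds(profiles)
extended = expand(profiles, seeds)
print(sorted(seeds), sorted(extended))
```

In this toy data, geneC is not tight enough to be a seed but is close enough to geneA to be picked up in the expansion, while the anti-correlated geneD is left out.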

3. Removal of Experimental DEGs (Differentially Expressed Genes)

The single cells used to produce the single-cell transcriptomes were prepared by a patient-derived single-cell approach that involves xenotransplantation and cell culture, and each of these processes induces additional expression changes that are independent of tumorigenesis. Therefore, it is necessary to remove differentially expressed genes (DEGs) that arise from the experimental procedure rather than from tumorigenesis. RNA-seq transcriptomes from three different tissues were used: a primary tumor-derived transcriptome (pT); a xenograft tumor-derived transcriptome (xenoT); and a transcriptome derived from tumor cells resected from the xenograft tumor tissue and cultured (bulkT). Three types of experimental DEGs were removed: (1) DEGs between pT and xenoT; (2) DEGs between xenoT and bulkT; and (3) DEGs between pT and bulkT. In each comparison, DEGs were estimated from the log2 fold change of the FPKM values. A gene identified as a DEG in any of these comparisons (i.e., with a log2 fold change greater than 2) was excluded from the further analysis below.
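The three-way filter described above might look like the following sketch. It assumes that the cutoff applies to the absolute log2 fold change (the text only says "greater than 2") and that the 0.1 FPKM floor from the data-cleaning step is applied before taking the ratio; the gene names and FPKM values are hypothetical.

```python
import math


def log2_fc(a, b, floor=0.1):
    """log2 fold change of two FPKM values, floored to avoid division by zero."""
    return math.log2(max(a, floor) / max(b, floor))


def is_experimental_deg(pt, xeno, bulk, cutoff=2.0):
    """True if any of the three pairwise comparisons exceeds the cutoff.

    Assumption: the cutoff is applied to the absolute log2 fold change.
    """
    comparisons = [(pt, xeno), (xeno, bulk), (pt, bulk)]
    return any(abs(log2_fc(a, b)) > cutoff for a, b in comparisons)


def remove_experimental_degs(expr):
    """Keep only genes that are not DEGs in any comparison.

    expr maps gene -> (pT, xenoT, bulkT) FPKM values.
    """
    return {g: v for g, v in expr.items() if not is_experimental_deg(*v)}


# Hypothetical FPKM values: (pT, xenoT, bulkT).
expr = {
    "geneA": (10.0, 12.0, 9.0),   # stable across all three tissues
    "geneB": (10.0, 50.0, 45.0),  # changed by xenotransplantation
    "geneC": (4.0, 4.0, 0.0),     # lost during cell culture
    "geneD": (8.0, 3.0, 2.5),     # changed, but below the cutoff
}
kept = remove_experimental_degs(expr)
print(sorted(kept))
```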

4. Data for TCGA LADC

RNA-seq data for 488 LADC patients and post-treatment survival information for LADC patients were downloaded from the TCGA website (https://tcga-data.nci.nih.gov/tcga/tcgaHome2.jsp).

5. Statistical analysis

All statistical tests and analyses were performed using RStudio (RStudio; Racine, 2012). The 'cor.test' function was used to calculate the Pearson correlation coefficient (http://www.r-project.org/) to determine how FPKM values are related among different cells. The 'hclust' function was used for hierarchical clustering (https://stat.ethz.ch/R-manual/R-patched/library/stats/html/hclust.html). The packages 'FactoMineR' (http://factominer.free.fr) and 'rgl' (https://r-forge.r-project.org/projects/rgl/) were used to perform principal component analysis (PCA) and to visualize the results (30). The 'survival' package was used to estimate the survival rate of patients with LADC with Kaplan-Meier survival curves (http://r-forge.r-project.org). Cox proportional hazards were calculated using the 'coxph' function of the 'survival' package (27, 28). Other scripting tasks required for multiple batch jobs were performed using Perl scripts.
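To make the survival estimate concrete, the Kaplan-Meier product-limit estimator can be written in a few lines. This is an illustrative Python sketch, not the R 'survival' package used in the experiment; the function name and the follow-up data are hypothetical.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival curve.

    times  : follow-up time for each patient
    events : 1 if the patient died at that time, 0 if censored
    Returns a list of (time, survival_probability) at each death time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Collect everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Both deaths and censored patients leave the risk set.
        at_risk -= removed
    return curve


# Hypothetical follow-up data (months) for five patients.
times = [5, 8, 8, 12, 20]
events = [1, 1, 0, 1, 0]
curve = kaplan_meier(times, events)
for t, s in curve:
    print(t, round(s, 3))
```

Censored patients do not lower the curve, but they shrink the risk set, so later deaths cause larger drops.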

Experimental Results

1. Cleaning of the transcriptome of PDX-LADC single cells

The entire analysis procedure of this experiment is shown in FIG. 1. In FIG. 1, the upper part describes the preparation of single cells, and the lower part describes the process of cleaning the transcriptome of PDX-LADC single cells. Insufficient sequencing depth is a known technical limitation of single-cell sequencing approaches (26, 27, 28, 29, 31-33). In addition, many of the genes that are differentially expressed in the 34 LADC single cells compared with normal cells are differentially expressed not because of tumor formation but because of the experimental procedures used to prepare the single cells (for example, xenotransplantation or cell culture). Here, mN, pT, xenoT and bulkT denote 'matched normal', 'primary tumor', 'xenograft tumor' and 'bulk tumor' (a mass of collected tumor cells), respectively. Before the genes altered by the experimental procedures were removed, the expression correlation between pT and mN was compared with that between pT and xenoT and between pT and bulkT (panel B of FIG. 2). To eliminate the errors described above, genes whose expression was changed by xenotransplantation or cell culture were excluded from the later analysis. As a result of this extensive cleaning, only 9,455 of the 23,284 genes remained, and the expression correlation between pT and bulkT improved markedly, from a Pearson correlation coefficient (r) of 0.44 to 0.92 (FIG. 3).

2. Identification of highly correlated gene sets in 34 single cells

The purpose of this experiment is to understand the degree of intra-tumor heterogeneity through transcriptome profiling. Hierarchical clustering of the expression of all 9,455 genes was not useful for drawing conclusions about single-cell heterogeneity, because the Pearson correlation coefficients of gene expression between cells ranged from 0.55 to 0.89; that is, expression was very similar between cells, and grouping into clusters was very noisy. For this reason, the degree of single-cell heterogeneity based on gene expression must be determined from a group of genes rather than from a single gene, which shows noisy expression across the 34 single cells. High expression correlations should be estimated from genes with complex expression patterns among single cells. This is because high cell-cell correlations determined from the expression of grouped genes can arise merely by chance when genes are expressed in only one or two single cells or when genes are expressed everywhere (FIG. 4). The genes were therefore further screened for the reasons described in the experimental methods, and a total of 5,586 genes were selected for cell-cell correlation analysis in the 34 single cells. As a result, a total of 20 genes were identified as seed genes, i.e., genes expressed in a highly correlated manner among the 34 single cells at a threshold of Pearson correlation coefficient ≥ 0.9 (FIG. 5). Subsequently, genes positively correlated with each of the 20 seed genes with a Pearson correlation coefficient of at least 0.75 were searched for in order to collect an extended set of correlated genes. As shown in FIG. 6, a total of 64 genes were identified as the extended set of correlated genes and named 'G64' (Table 1).

Table 1. G64 gene set

Symbol | Description | Symbol | Description
ANLN | anillin, actin binding protein | KIF14 | kinesin family member 14
ASF1B | anti-silencing function 1B histone chaperone | KIF11 | kinesin family member 11
ATAD2 | ATPase family, AAA domain containing 2 | KIF22 | kinesin family member 22
AURKB | aurora kinase B | KIF20B | kinesin family member 20B
BARD1 | BRCA1 associated RING domain 1 | MCM10 | minichromosome maintenance complex component 10
BRCA2 | breast cancer 2, early onset | MCM3 | minichromosome maintenance complex component 3
C9orf100 | Rho Guanine Nucleotide Exchange Factor (GEF) 39 | MCM4 | minichromosome maintenance complex component 4
CAV1 | caveolin 1, caveolae protein, 22 kDa | MLF1IP | MLF1-Interacting Protein
CCNB1 | cyclin B1 | MTBP | MDM2 binding protein
CDCA2 | cell division cycle associated 2 | NCAPG2 | non-SMC condensin II complex, subunit G2
CDCA8 | cell division cycle associated 8 | NCAPH | non-SMC condensin I complex, subunit H
CDCA5 | cell division cycle associated 5 | NDC80 | NDC80 kinetochore complex component
CDKN3 | cyclin-dependent kinase inhibitor 3 | NEK2 | NIMA-related kinase 2
CENPE | centromere protein E, 312 kDa | NRM | nuclear envelope membrane protein
CENPM | centromere protein M | NUSAP1 | nucleolar and spindle associated protein 1
CENPK | centromere protein K | PARPBP | PARP1 binding protein
CENPN | centromere protein N | PBK | PDZ binding kinase
CENPW | centromere protein W | PTMA | prothymosin, alpha
CEP55 | centrosomal protein 55 kDa | RAD51AP1 | RAD51 associated protein 1
CKS1B | CDC28 protein kinase regulatory subunit 1B | RAD54B | RAD54 homolog B (S. cerevisiae)
DDX39A | DEAD (Asp-Glu-Ala-Asp) box polypeptide 39A | RFC2 | replication factor C (activator 1) 2, 40 kDa
DLGAP5 | discs, large (Drosophila) homolog-associated protein 5 | RRM2 | ribonucleotide reductase M2
ESPL1 | extra spindle pole bodies homolog 1 (S. cerevisiae) | SHCBP1 | SHC SH2-domain binding protein 1
FAM111B | family with sequence similarity 111, member B | SPC25 | SPC25, NDC80 kinetochore complex component
GAS2L3 | growth arrest-specific 2 like 3 | TACC3 | transforming, acidic coiled-coil containing protein 3
GEN1 | GEN1 Holliday junction 5' flap endonuclease | TRIP13 | thyroid hormone receptor interactor 13
GMNN | geminin, DNA replication inhibitor | TROAP | trophinin associated protein
HIST1H3B | histone cluster 1, H3b | TUBA1B | tubulin, alpha 1b
HMGB1 | high mobility group box 1 | UHRF1 | ubiquitin-like with PHD and ring finger domains 1
HMGN2 | high mobility group nucleosomal binding domain 2 | USP1 | ubiquitin specific peptidase 1
ITGB3BP | integrin beta 3 binding protein (beta3-endonexin) | VRK1 | vaccinia related kinase 1
KIAA1524 | KIAA1524 | ZWINT | ZW10 interacting kinetochore protein

3. Identification of tumor heterogeneity by a set of correlation genes

Hierarchical clustering analysis was performed to identify intra-tumor heterogeneity in the 34 single cells using the G64 gene set. The analysis examined how the single cells separate according to the expression of the G64 gene set. The expression level of each gene in each single cell was calculated by dividing the FPKM value in that single cell by the average FPKM value across the 34 single cells and then applying a log2 transformation. Interestingly, the 34 single cells derived from a single LADC tumor site were divided into two subpopulations according to the degree of G64 expression (FIG. 6): a down-regulated subgroup and an up-regulated subgroup. These results were also confirmed by principal component analysis (PCA) (FIG. 6). The present inventors then examined whether the classification by G64 also applies at the inter-tumor level. For this purpose, transcriptome information for 488 LADC tumor samples prepared by the RNA-seq approach was retrieved from the TCGA database. The change in the expression level of each gene in each tumor sample was calculated in the same manner as in the analysis of the 34 single cells. Hierarchical clustering analysis combined with a heat map was then performed on the 488 tumor samples. Surprisingly, G64, which classified the single cells into two subgroups, showed the same pattern of classification for the LADC samples (FIG. 7). The classification by G64 into two groups was also reproduced in an analysis of the transcriptomes of 88 Korean lung cancer samples prepared using the RNA-seq approach. These results imply that the classification into subgroups by G64 is not limited to the intra-tumor level but may extend to the inter-tumor level.
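The relative expression value used for clustering can be sketched as below. The function `relative_expression` follows the description in the text (FPKM divided by the mean across cells, then log2); applying the 0.1 floor before averaging is an assumption. The function `split_cells` is only a crude stand-in for the hierarchical clustering actually used: it labels a cell "up" or "down" by the sign of its mean relative expression over the gene set. All names and values are hypothetical.

```python
import math


def relative_expression(fpkm_by_cell, floor=0.1):
    """log2 of each cell's FPKM relative to the mean FPKM across cells."""
    floored = [max(v, floor) for v in fpkm_by_cell]
    mean = sum(floored) / len(floored)
    return [math.log2(v / mean) for v in floored]


def split_cells(gene_set_expr):
    """Label each cell by the sign of its mean relative expression.

    gene_set_expr maps gene -> list of log2 relative expression per cell.
    This is a simplification, not hierarchical clustering.
    """
    profiles = list(gene_set_expr.values())
    n_cells = len(profiles[0])
    labels = []
    for c in range(n_cells):
        score = sum(p[c] for p in profiles) / len(profiles)
        labels.append("up" if score > 0 else "down")
    return labels


# Hypothetical FPKM values for two genes across four single cells.
fpkm = {"g1": [2.0, 4.0, 8.0, 2.0], "g2": [1.0, 1.0, 4.0, 2.0]}
rel = {g: relative_expression(v) for g, v in fpkm.items()}
labels = split_cells(rel)
print(rel["g1"], labels)
```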

4. Identification of the minimal correlated gene set G10

The G64 gene set was then analyzed to find a minimal set of genes correlated with cancer classification. For this purpose, gene-gene correlation tests were performed separately in the G64-upregulated group and the G64-downregulated group, and the correlation coefficients of each gene were calculated. The correlation coefficients of each gene were summed, and the 10 genes with the highest sums were selected. As a result, 10 genes, NCAPH, CDCA5, CDCA8, SPC25, DLGAP5, NUSAP1, NEK2, CCNB1, KIAA1524 and ZWINT, were identified as a gene set that retains the lung cancer classification function and were named 'G10' (Table 2). Hierarchical clustering analysis of the 488 LADC tumor samples was then performed using G10 in the same manner as for G64. G10 classified the 488 LADC samples into three subgroups (panel A of FIG. 8). According to the degree of G10 expression, the subgroups are a downregulated C1 population (no expression or very low expression), a slightly downregulated C2 population (low expression), and an upregulated C3 population (high expression). These results were also confirmed by PCA (panel B of FIG. 8).

Symbol      Description
CCNB1       cyclin B1
CDCA8       cell division cycle associated 8
CDCA5       cell division cycle associated 5
DLGAP5      discs, large (Drosophila) homolog-associated protein 5
KIAA1524    KIAA1524
NCAPH       non-SMC condensin I complex, subunit H
NEK2        NIMA-related kinase 2
NUSAP1      nucleolar and spindle associated protein 1
SPC25       SPC25, NDC80 kinetochore complex component
ZWINT       ZW10 interacting kinetochore protein
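
The selection of G10 by summed correlation coefficients can be sketched as follows. This is an illustration only: the use of Pearson correlation, the exclusion of each gene's correlation with itself, the summation across the two groups, and the function names and toy data are assumptions not specified in the text.

```python
import numpy as np


def summed_correlation_scores(expr_by_group):
    """Sum each gene's correlation coefficients with all other genes.

    expr_by_group: list of genes x samples arrays, one per expression group
    (for example the G64-upregulated and G64-downregulated groups).
    Pearson correlation and dropping the diagonal are assumed choices.
    """
    total = None
    for expr in expr_by_group:
        corr = np.corrcoef(np.asarray(expr, dtype=float))  # gene-gene matrix
        np.fill_diagonal(corr, 0.0)  # ignore self-correlation
        score = corr.sum(axis=1)
        total = score if total is None else total + score
    return total


def top_genes(gene_names, scores, k=10):
    """Return the k gene names with the highest summed correlation."""
    order = np.argsort(scores)[::-1][:k]
    return [gene_names[i] for i in order]


# Toy data (hypothetical): genes A-C follow a shared signal, D-E are noise.
rng = np.random.default_rng(1)
names = ["A", "B", "C", "D", "E"]
groups = []
for _ in range(2):  # two expression groups
    signal = rng.normal(size=200)
    coregulated = [signal + 0.1 * rng.normal(size=200) for _ in range(3)]
    noise = [rng.normal(size=200) for _ in range(2)]
    groups.append(np.vstack(coregulated + noise))

scores = summed_correlation_scores(groups)
print(top_genes(names, scores, k=3))
```

In this toy case the three co-regulated genes receive the highest scores; applied to G64, `k=10` would return the ten most strongly co-expressed genes.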

5. Association between the correlated gene sets and lung cancer prognosis in LADC

5.1. Relationship between G64 gene set and lung cancer prognosis

We examined whether down-regulation or up-regulation of G64 is related to the clinical characteristics of LADC patients, such as prognosis and survival. For this purpose, clinical data for the 488 LADC samples retrieved from TCGA were analyzed by chi-squared test and univariate logistic regression. The results showed that down-regulation or up-regulation of G64 was significantly associated with several clinical factors, including age at initial diagnosis, cancer stage, smoking, and vital status. Four variables related to smoking habits (smoking status, tobacco year, tobacco reformed year, and tobacco pack per year) were all associated with the expression pattern of G64. For example, smokers, and in particular long-term and heavy smokers, tended to show up-regulation of G64. Multinomial logistic regression analysis was performed to analyze the independent effects associated with G64 expression changes. The analysis showed that up-regulation of G64 was closely related to a poor prognosis compared with down-regulation of G64: the odds ratio for vital status, relative to the G64 down-regulated group, was 2.286. For the smoking cessation variable, the odds ratio was 3.508, the strongest association with up-regulation of G64 (Table 3). These results suggest that G64, identified as a classifier in 34 LADC single cells, may be an excellent tumor-classifying factor for lung adenocarcinoma in particular.
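
To make the odds ratios quoted above easier to interpret, the following sketch computes an odds ratio and an approximate 95% Wald confidence interval from a 2x2 table. For a single binary predictor, this equals the odds ratio that a univariate logistic regression would report. The counts are invented for illustration and are not taken from Table 3, and the function names are hypothetical.

```python
import math


def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table.

    a, b: events / non-events in the group of interest (e.g. G64 up-regulated)
    c, d: events / non-events in the reference group (e.g. G64 down-regulated)
    """
    return (a * d) / (b * c)


def wald_confidence_interval(a, b, c, d, z=1.96):
    """Approximate 95% confidence interval on the odds ratio (Wald method)."""
    log_or = math.log(odds_ratio(a, b, c, d))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or - z * se), math.exp(log_or + z * se)


# Invented counts (NOT from the patent): deceased / alive in each group.
up_deceased, up_alive = 30, 70
down_deceased, down_alive = 15, 85

or_value = odds_ratio(up_deceased, up_alive, down_deceased, down_alive)
ci_low, ci_high = wald_confidence_interval(
    up_deceased, up_alive, down_deceased, down_alive
)
print(f"odds ratio = {or_value:.3f}, 95% CI = ({ci_low:.3f}, {ci_high:.3f})")
```

An odds ratio above 1 whose confidence interval excludes 1 indicates that the event is more likely in the group of interest than in the reference group.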

5.2. Association of G10 gene set with lung cancer prognosis

We also analyzed how the expression level of G10, a gene set that can classify lung cancer into subgroups according to gene expression level, relates to the treatment outcome, survival rate, and clinical characteristics of patients. Kaplan-Meier survival analysis showed that the survival rate of the down-regulated group (C1, n = 105) was higher than that of the up-regulated group (C3, n = 191) (P = 0.002) (FIG. 9). That is, patients in the G10 down-regulated C1 population survive better than those in the G10 up-regulated C3 population. Univariate and multivariate logistic regression analyses were performed to determine the relationship between the expression pattern of G10 and the clinical characteristics of LADC patients. Univariate logistic regression analysis showed that all clinical features of LADC patients except sex (p = 0.054) were significantly associated with G10 expression (Table 4). In addition, the multivariate logistic regression analysis showed that smoking cessation had the most significant association with G10 expression (reported as p = 0), even after accounting for the intrinsic correlation among the clinical characteristics of LADC patients (Table 5).
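
The Kaplan-Meier estimate used for the survival comparison above can be sketched in a few lines. This is a generic product-limit estimator written for illustration, not the software used by the inventors. The function name and the toy follow-up times are assumptions, and the test that yields the P value for comparing groups is not shown.

```python
import numpy as np


def kaplan_meier(times, events):
    """Product-limit (Kaplan-Meier) survival estimate.

    times:  follow-up time for each patient
    events: 1 if death was observed at that time, 0 if the patient was censored
    Returns the distinct event times and the survival probability after each.
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)
    event_times = np.unique(times[events])
    survival = []
    s = 1.0
    for t in event_times:
        at_risk = np.sum(times >= t)              # still under observation at t
        deaths = np.sum((times == t) & events)    # deaths occurring exactly at t
        s *= 1.0 - deaths / at_risk
        survival.append(s)
    return event_times, np.array(survival)


# Toy follow-up data (hypothetical): months and event indicators.
toy_times = [2, 3, 3, 5, 8]
toy_events = [1, 1, 0, 1, 0]
km_times, km_survival = kaplan_meier(toy_times, toy_events)
print(km_times)     # [2. 3. 5.]
print(km_survival)  # [0.8 0.6 0.3]
```

Running the estimator separately on the C1 and C3 patients would give the two curves that are compared in a figure such as FIG. 9.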

6. Clinical relevance of correlated gene sets and cancer

Co-expression of the genes that make up G64 may also offer an interesting perspective on cancers not yet examined here, including other types of lung cancer. Therefore, we examined how G64 was differentially expressed in 501 lung squamous cell carcinoma (LUSC) samples retrieved from TCGA. Interestingly, the 501 LUSC samples were also classified by G64, although less distinctly than the LADC samples. Of the 501 LUSC samples, only 21% (n = 107) were classified into the G64 down-regulated group, whereas approximately 50% of the LADC samples fell into the G64 down-regulated group. These results indicate that the proportion of G64 down-regulated samples differs significantly between LADC and LUSC patients (P < 0.0001, chi-square test). The difference between LADC and LUSC was also confirmed in the 88 Korean lung cancer samples. Interestingly, although the ratio of G64 up-regulated to down-regulated samples varies with the type of carcinoma, as described above, the co-regulated pattern of the G64 genes was confirmed to appear in common in other cancers in TCGA, such as breast, kidney, and hepatocellular carcinoma. We then analyzed other cancers retrieved from TCGA to determine whether the G10 gene set can classify them or predict their prognosis. As a result, it was confirmed that kidney renal papillary cell carcinoma (KIRP) and liver hepatocellular carcinoma (LIHC) were classified according to the expression pattern of G10. In renal cancer, samples were classified by G10 expression into a down-regulated C1 group and an up-regulated C3 group, which was also confirmed by PCA (panels A and B of FIG. 10). The survival rate of renal cancer patients in the C1 group was significantly higher than that of patients in the C3 group (p = 0.001) (panel C of FIG. 10). In liver cancer, samples were classified by G10 expression into a down-regulated C1 group, a weakly down-regulated C2 group, and an up-regulated C3 group, which was also confirmed by PCA (panels A and B of FIG. 11). The survival rate of liver cancer patients in the C1 group was significantly higher than that of patients in the C3 group (p = 0.000) (panel C of FIG. 11).
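
The comparison of G64 down-regulated proportions between LADC and LUSC can be checked with a chi-square test on a 2x2 table, as sketched below. The LUSC counts (107 of 501) come from the text. The LADC count is an assumption derived from "approximately 50%" of 488 samples, so the resulting statistic is only indicative.

```python
from scipy.stats import chi2_contingency

# LUSC counts are stated in the text; the LADC split is an ASSUMPTION
# derived from "approximately 50%" of the 488 LADC samples.
lusc_down, lusc_total = 107, 501
ladc_total = 488
ladc_down = ladc_total // 2  # assumed: 244 samples in the down-regulated group

table = [
    [ladc_down, ladc_total - ladc_down],  # LADC: down-regulated, other
    [lusc_down, lusc_total - lusc_down],  # LUSC: down-regulated, other
]

chi2, p_value, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.1f}, dof = {dof}, p = {p_value:.2e}")
```

Under this assumed split the p-value is far below 0.0001, in line with the significance level reported in the text.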

The specific embodiments described herein are representative of preferred embodiments or examples of the present invention, and thus the scope of the present invention is not limited thereto. It will be apparent to those skilled in the art that modifications and other uses of the invention do not depart from the scope of the invention described in the claims.

Claims

A composition for determining the classification or prognosis of lung cancer, renal cancer or liver cancer, which comprises an agent capable of measuring the expression status of G10 gene consisting of NCAPH, CDCA5, CDCA8, SPC25, DLGAP5, NUSAP1, NEK2, CCNB1, KIAA1524 and ZWINT.

The composition according to claim 1, wherein the expression status of the gene is an expression level of the gene.

delete

The composition according to claim 1, wherein the agent is a probe or a primer capable of hybridizing to a nucleic acid sequence encoding the G10 gene consisting of NCAPH, CDCA5, CDCA8, SPC25, DLGAP5, NUSAP1, NEK2, CCNB1, KIAA1524 and ZWINT, or to a sequence complementary thereto.

A kit for determining the classification or prognosis of a lung cancer, kidney cancer or liver cancer comprising the composition of any one of claims 1, 2, and 5 as an active ingredient.

A biomarker composition for determining the classification or prognosis of lung cancer, kidney cancer or liver cancer comprising the G10 gene consisting of NCAPH, CDCA5, CDCA8, SPC25, DLGAP5, NUSAP1, NEK2, CCNB1, KIAA1524 and ZWINT as an active ingredient.

delete

A method of measuring the expression status of the G10 gene consisting of NCAPH, CDCA5, CDCA8, SPC25, DLGAP5, NUSAP1, NEK2, CCNB1, KIAA1524 and ZWINT in a biological sample in order to provide information necessary for the classification or prognosis of lung cancer, kidney cancer or liver cancer.

delete

10. The method according to claim 9, wherein the biological sample is tissue, whole blood, serum, plasma, body fluid, urine, cells, a cell lysate, or a cell culture supernatant.