WO2022097844A1

WO2022097844A1 - Method for predicting survival prognosis of pancreatic cancer patients by using gene copy number variation information

Info

Publication number: WO2022097844A1
Application number: PCT/KR2021/001162
Authority: WO
Inventors: 공선영; 한성식; 우상명; 김민경; 기창석; 조은해; 이태림
Original assignee: National Cancer Center (국립암센터); Green Cross Genome Corporation (주식회사 녹십자지놈)
Priority date: 2020-11-04
Filing date: 2021-01-28
Publication date: 2022-05-12
Also published as: KR20220060198A

Abstract

The present invention relates to a method of providing information for predicting the survival prognosis of pancreatic cancer patients. Specifically, the present invention relates to a method of providing information for predicting the survival prognosis of pancreatic cancer patients, characterized by the use of a gene variation specific to pancreatic cancer, in particular a gene copy number variation (CNV), and to a use thereof. Because the method predicts survival prognosis on the basis of copy number variations in genes specific to pancreatic cancer survival prognosis, it is highly accurate and can improve the prediction of treatment response and survival prognosis; and because it does not require whole-genome sequencing, it is fast and therefore useful.

Description

A Method for Predicting the Survival Prognosis of Pancreatic Cancer Patients Using Gene Copy Number Variation Information

The present invention relates to a method of providing information for predicting the survival prognosis of pancreatic cancer patients, characterized by using a gene variation specific to pancreatic cancer, in particular a gene copy number variation (CNV), and to a use thereof.

The pancreas lies behind the stomach, near the middle of the body, and is surrounded by organs such as the stomach, duodenum, small intestine, large intestine, liver, gallbladder, and spleen. It is about 15 to 20 cm long, weighs about 100 g, and is divided into a head, a body, and a tail. The pancreas has an exocrine function, secreting digestive enzymes that break down the carbohydrates, fats, and proteins in ingested food, and an endocrine function, secreting hormones such as insulin and glucagon that regulate blood sugar.

Pancreatic cancer is a mass (tumor) of cancer cells arising in the pancreas. There are several types of pancreatic cancer; pancreatic ductal adenocarcinoma, which arises from pancreatic duct cells, accounts for about 90% of cases. Other types include cystic cancer (cystic adenocarcinoma) and endocrine tumors.

Pancreatic cancer is difficult to detect early because it has no specific early symptoms. Loss of appetite and weight loss may occur, but they are not characteristic of pancreatic cancer and can equally arise from other diseases.

In addition, the pancreas is thin (about 2 cm thick) and surrounded only by a capsule, and it lies in close contact with the superior mesenteric artery, which supplies oxygen to the small intestine, and the portal vein, which carries nutrients absorbed from the intestine to the liver, so cancer readily invades surrounding structures. Pancreatic cancer also tends to metastasize early to the nerve bundles and lymph nodes behind the pancreas, and pancreatic cancer cells grow rapidly.

Pancreatic cancer is the 14th most common cancer worldwide, its incidence is rising markedly, and it is the fourth leading cause of cancer death in the United States. Because its initial symptoms are nonspecific and clinical symptoms such as weakness, loss of appetite, and weight loss appear only after systemic metastasis has already occurred, regular screening is necessary.

Pancreatic cancer has a poor prognosis: the 5-year survival rate after surgery is only 1 to 4%, and the median survival is 5 months. Because 80 to 90% of patients are diagnosed at a stage when curative resection is no longer possible, treatment relies mainly on anticancer therapy. Anticancer drugs known to be effective against pancreatic cancer include 5-fluorouracil, gemcitabine, and erlotinib (Tarceva), but their efficacy is extremely low, with response rates of only about 15%. There is therefore an urgent need for more effective early diagnosis and treatment methods to improve the prognosis of pancreatic cancer patients.

Meanwhile, various tests, such as karyotyping, fluorescence in situ hybridization, chromosomal microarray, and NGS-based screening, are performed to detect chromosomal abnormalities, including DNA copy number variations (CNVs) caused by the deletion or duplication of part of a chromosome (Capalbo A, et al., Hum Reprod. Vol. 32(3), pp. 492-498, 2017). Karyotyping has a lower resolution (about 5 Mb) than the other tests and cannot detect smaller chromosomal deletions or duplications. Chromosomal deletions and duplications smaller than 5 Mb are referred to as microdeletions/microduplications, and they account for 15% of all mutations in single-gene diseases (Vissers LE, et al., Hum Mol Genet. Vol. 14, Spec No. 2, pp. R215-R223, 2005).

To detect such microdeletions/duplications, fluorescence in situ hybridization (FISH) with probes complementary to specific nucleotide sequences and chromosomal microarray tests are used. FISH confirms the presence of a specific nucleotide sequence in a chromosome by attaching a fluorescent label to a probe complementary to that sequence. With a resolution of 100 kb to 1 Mb, it can detect microdeletions/duplications, but it can detect only known variants, because only the regions complementary to the probe sequence are interrogated.

Currently, microarray-based comparative genomic hybridization (aCGH) is being used as the most common test method to check chromosomal microdeletion/duplication (Russo CD, et al., Cancer Discov. Vol. 4(1), pp. 19-21, 2014). The size of the CNV detectable through the microarray is determined by the density of the probe, and it is possible to detect CNVs with a size of approximately 50 kb. However, chromosomal abnormalities due to chromosomal rearrangement, such as translocation or inversion, cannot be detected.

Next-generation sequencing (NGS) is a sequencing method that fragments chromosomes into small pieces and analyzes the genetic information of each piece in parallel. With advances in genetic analysis technology, NGS is being used as a screening test for genetic diseases in newborns because of its relatively short turnaround time, low cost, and high resolution, which can detect single nucleotide polymorphisms (SNPs) and insertions/deletions (indels). However, because NGS by its nature analyzes chromosomes as small fragments, it has technical limitations in detecting large-scale structural variations or CNVs (Yohe S, Thyagarajan B., Arch Pathol Lab Med. Vol. 141(11), pp. 1544-1557, 2017).

Nevertheless, NGS can detect chromosomal abnormalities caused by chromosomal rearrangements, which probe-based microarrays cannot detect, as well as novel, previously unknown CNVs (Talkowski ME, et al., Am J Hum Genet. Vol. 88(4), pp. 469-481, 2011). In addition, because it fragments the chromosome into small pieces and analyzes their nucleotide sequences, NGS offers higher coverage and resolution than microarrays and can detect the breakpoints at which chromosomal abnormalities begin (Zhao M, et al., BMC Bioinformatics. Vol. 14, Suppl 11:S1, 2013).

Nucleic acid copy number variation (DNA copy number variation, CNV) refers to a phenomenon in which a specific region of the genome is deleted or amplified. For example, relative to a reference sequence ABCDEFG, copy number variations may take the following forms:

1. ABEFG (deletion of the C-D region)

2. ABCDDDDDDEFG (amplification of the D region)

When the nucleic acid fragment data of a person with a deletion are aligned to the human reference chromosome, fewer nucleic acid fragments map to the deleted region than for a person without the variation (decreased copy number). By the same logic, when the nucleic acid fragment data of a person with an amplification are aligned to the reference chromosome, more fragments map to the amplified region than for a person without the variation (increased copy number).
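The read-depth logic above can be illustrated with a minimal Python sketch. It assumes hypothetical per-region read counts for regions A to G of the reference sequence ABCDEFG, normalizes each region by the sample's total read count, and compares the result with a mutation-free control; the 0.75/1.25 ratio thresholds and the function names are illustrative assumptions, not values or methods from this specification.

```python
def normalized_depth(counts):
    """Divide each region's read count by the sample's total read count."""
    total = sum(counts.values())
    return {region: n / total for region, n in counts.items()}


def depth_ratios(sample_counts, normal_counts):
    """Sample-to-control ratio of normalized depth for each region."""
    s = normalized_depth(sample_counts)
    n = normalized_depth(normal_counts)
    return {region: s[region] / n[region] for region in n}


def call_region(ratio, low=0.75, high=1.25):
    """Label a region; the thresholds are illustrative assumptions."""
    if ratio < low:
        return "deletion"
    if ratio > high:
        return "amplification"
    return "no change"


# Hypothetical read counts for regions A-G of the reference sequence ABCDEFG.
normal = {region: 100 for region in "ABCDEFG"}
sample = dict(normal, C=50, D=200)  # fewer reads over C, extra reads over D

calls = {r: call_region(x) for r, x in depth_ratios(sample, normal).items()}
print(calls["A"], calls["C"], calls["D"])  # no change deletion amplification
```

Because each sample is normalized by its own total read count, an amplification slightly lowers the normalized depth of every other region (region A's ratio here is 700/750, about 0.93), which is one reason a tolerance band is used rather than an exact ratio of 1.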

Such copy number variations can affect the prognosis of pancreatic cancer in various ways. For example, an increased copy number of oncogenes or proto-oncogenes can raise their expression, a decreased copy number of tumor suppressor genes can lower their expression, and copy number changes in other genes can alter their expression; each of these effects is known to influence the prognosis of pancreatic cancer favorably or unfavorably.

Recently, it was reported that the expression patterns of the LYRM1, KNTC1, IGF2BP2, and CDC6 genes are associated with the survival prognosis of pancreatic cancer (Xiaokai Yan et al., Cancer Manag Res., Vol. 11, pp. 273-283, 2019).

However, no method is known for predicting the survival prognosis of pancreatic cancer patients with high accuracy and sensitivity on the basis of pancreatic cancer-specific genetic variations, in particular gene copy number variations, and such a technology is urgently needed.

Against this technical background, the present inventors made intensive efforts to develop a method for predicting the survival prognosis of pancreatic cancer patients based on copy number variation. As a result, they found that the presence or absence of copy number variation in specific genes is closely related to the survival prognosis of pancreatic cancer patients, and confirmed that this relationship can be used to accurately predict the prognosis, in particular the survival prognosis, of pancreatic cancer patients.

Summary of the Invention

An object of the present invention is to provide a method of providing information for predicting the survival prognosis of pancreatic cancer patients.

Another object of the present invention is to provide an apparatus for providing information for predicting survival prognosis of pancreatic cancer patients.

Another object of the present invention is to provide a computer-readable recording medium comprising instructions configured to be executed by a processor to provide information for predicting the survival prognosis of a pancreatic cancer patient by the above method.

Another object of the present invention is to provide a kit for amplifying a target nucleic acid used in the method.

Another object of the present invention is to provide a method for predicting survival prognosis of pancreatic cancer patients.

Another object of the present invention is to provide an apparatus for predicting the survival prognosis of a pancreatic cancer patient by the above method.

Another object of the present invention is to provide a computer-readable recording medium comprising instructions configured to be executed by a processor to predict the survival prognosis of a pancreatic cancer patient by the above method.

To achieve the above objects, the present invention provides an information providing method for predicting the survival prognosis of pancreatic cancer patients, the method using copy number variation (CNV) information of one or more genes selected from the group consisting of ABHD6, ACVR2B, ADCY8, ARHGEF10, ATF6, ATP13A4, BCAT1, BCL2, BMP1, C8orf12, C9orf92, CASC1, CCBE1, CDCP1, CDKN2A, CSGALNACT1, DLGAP2, DMRT1, DOCK5, DPYSL2, ERICH1-AS1, FAM135B, FAM49B, FER1L6, FLNB, GATA4, GLDC, GLIS3, GSDMC, IFLTD1, ISPD-AS1, ITPR2, KANK1, KCNMB2, KHDRBS3, KIAA0196, KRAS, LARS2-AS1, LINC00477, LINC00578, LINC00639, LMLN, LOC100128993, LINC02052, LRMP, LRRC6, LTF, MAP4, MCPH1, MFHAS1, NAALADL2, NIN, NXPH1, OPA1, PEBP4, PHF20L1, PHLPP1, PSD3, RASSF8, RPA3-AS1, SERPINB5, SFMBT1, SGK223, SLC38A3, SMARCA2, SOX5, SQLE, TATDN1, TBL1XR1, THSD7A, TMEM110, TMEM110-MUSTN1, TMEM196, TMEM65, TMEM71, ZFP30, ZNF569, ZNF577 and ZNF583, and specifically comprising the step of detecting a copy number variation of the gene(s).

The present invention also provides an information providing apparatus used in the information providing method for predicting the survival prognosis of the pancreatic cancer patient, and a computer readable recording medium including instructions for performing the information providing method.

The present invention also provides a kit for amplifying a target nucleic acid, comprising a probe that specifically binds to, or a primer for amplifying, one or more pancreatic cancer-specific genes selected from the group consisting of ABHD6, ACVR2B, ADCY8, ARHGEF10, ATF6, ATP13A4, BCAT1, BCL2, BMP1, C8orf12, C9orf92, CASC1, CCBE1, CDCP1, CDKN2A, CSGALNACT1, DLGAP2, DMRT1, DOCK5, DPYSL2, ERICH1-AS1, FAM135B, FAM49B, FER1L6, FLNB, GATA4, GLDC, GLIS3, GSDMC, IFLTD1, ISPD-AS1, ITPR2, KANK1, KCNMB2, KHDRBS3, KIAA0196, KRAS, LARS2-AS1, LINC00477, LINC00578, LINC00639, LMLN, LOC100128993, LINC02052, LRMP, LRRC6, LTF, MAP4, MCPH1, MFHAS1, NAALADL2, NIN, NXPH1, OPA1, PEBP4, PHF20L1, PHLPP1, PSD3, RASSF8, RPA3-AS1, SERPINB5, SFMBT1, SGK223, SLC38A3, SMARCA2, SOX5, SQLE, TATDN1, TBL1XR1, THSD7A, TMEM110, TMEM110-MUSTN1, TMEM196, TMEM65, TMEM71, ZFP30, ZNF569, ZNF577 and ZNF583.

In addition, the present invention provides the use of the kit for amplifying the target nucleic acid to predict the survival prognosis of pancreatic cancer patients.

In addition, the present invention provides a method for predicting the survival prognosis of a pancreatic cancer patient, the method using copy number variation (CNV) information of one or more genes selected from the group consisting of ABHD6, ACVR2B, ADCY8, ARHGEF10, ATF6, ATP13A4, BCAT1, BCL2, BMP1, C8orf12, C9orf92, CASC1, CCBE1, CDCP1, CDKN2A, CSGALNACT1, DLGAP2, DMRT1, DOCK5, DPYSL2, ERICH1-AS1, FAM135B, FAM49B, FER1L6, FLNB, GATA4, GLDC, GLIS3, GSDMC, IFLTD1, ISPD-AS1, ITPR2, KANK1, KCNMB2, KHDRBS3, KIAA0196, KRAS, LARS2-AS1, LINC00477, LINC00578, LINC00639, LMLN, LOC100128993, LINC02052, LRMP, LRRC6, LTF, MAP4, MCPH1, MFHAS1, NAALADL2, NIN, NXPH1, OPA1, PEBP4, PHF20L1, PHLPP1, PSD3, RASSF8, RPA3-AS1, SERPINB5, SFMBT1, SGK223, SLC38A3, SMARCA2, SOX5, SQLE, TATDN1, TBL1XR1, THSD7A, TMEM110, TMEM110-MUSTN1, TMEM196, TMEM65, TMEM71, ZFP30, ZNF569, ZNF577 and ZNF583, and specifically comprising the step of detecting a copy number variation of the gene(s).

The present invention also provides an apparatus for predicting survival prognosis of a pancreatic cancer patient used in the method for predicting survival prognosis of a pancreatic cancer patient, and a computer-readable recording medium including instructions for performing the method.

FIG. 1 is an overall flowchart of a method of providing information for predicting the survival prognosis of a pancreatic cancer patient according to the present invention.

FIG. 2 shows an example result of detecting copy number variation by applying the circular binary segmentation (CBS) algorithm according to the present invention; each line in the figure represents one segment.

FIG. 3 shows the amplified segments identified by GISTIC analysis according to the present invention. The lower x-axis shows the false discovery rate (FDR)-adjusted p-value (Q value), and the upper x-axis shows the G-score calculated in the GISTIC analysis (a score reflecting the frequency and amplitude of the CNVs observed in 315 pancreatic cancer patients); the y-axis indicates the chromosome number.

FIG. 4 shows the deleted segments identified by GISTIC analysis according to the present invention. The lower x-axis shows the false discovery rate (FDR)-adjusted p-value (Q value), and the upper x-axis shows the G-score calculated in the GISTIC analysis (a score reflecting the frequency and amplitude of the CNVs observed in 315 pancreatic cancer patients); the y-axis indicates the chromosome number.

FIG. 5 illustrates how genes are grouped based on the GISTIC analysis according to the present invention: (A) shows the Z value of each gene for each sample, and (B) shows the result of grouping the genes for each sample according to the criteria of the present invention.

FIG. 6 is a graph showing the number of genes identified by Kaplan-Meier (K-M) analysis according to an embodiment of the present invention.

FIG. 7 is a graph comparing the p-values from K-M survival analysis of GSS_All and GSS_TopN for each set according to an embodiment of the present invention.

FIG. 8 is a Venn diagram of the TopN genes for each set according to an embodiment of the present invention.

FIG. 9 shows the results of GSS_79 analysis of the survival prognosis of 183 pancreatic cancer patients from the TCGA database, using the 79 genes selected according to an embodiment of the present invention.

FIG. 10 shows the survival prognosis prediction performance of GSS_10 on pancreatic cancer patient data from the TCGA database according to an embodiment of the present invention.

FIG. 11 shows the survival prognosis prediction performance of GSS_8 on pancreatic cancer patient data from the TCGA database according to an embodiment of the present invention.

FIG. 12 summarizes the prognostic prediction performance of GSS_79, GSS_10, and GSS_8 on pancreatic cancer patient data from the TCGA database according to an embodiment of the present invention.

DETAILED DESCRIPTION OF THE INVENTION AND PREFERRED EMBODIMENTS

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. In general, the nomenclature used herein and the experimental methods described below are well known and commonly used in the art.

As used herein, singular expressions include plural expressions unless the context clearly dictates otherwise, and terms such as "comprises" are to be understood as specifying the presence of the stated features, numbers, steps, operations, elements, parts, or combinations thereof, without excluding the presence or addition of one or more other features, numbers, steps, operations, elements, parts, or combinations thereof.

Also, in performing the method according to the present invention, the processes constituting the method may occur in an order different from the specified order unless a specific order is clearly described in context. That is, the processes may be performed in the specified order, substantially simultaneously, or in the reverse order.

The present invention relates to a method of providing information for predicting the prognosis, in particular the survival prognosis, of pancreatic cancer patients, characterized by using a gene variation specific to pancreatic cancer, in particular a copy number variation (CNV) of the gene, and to a use thereof.

Specifically, the present invention relates to a method of providing information for predicting the prognosis, in particular the survival prognosis, of pancreatic cancer patients, comprising the step of detecting copy number variations of at least one, preferably 2 or more, more preferably 5 or more, and most preferably 8 or more genes selected from the group consisting of ABHD6, ACVR2B, ADCY8, ARHGEF10, ATF6, ATP13A4, BCAT1, BCL2, BMP1, C8orf12, C9orf92, CASC1, CCBE1, CDCP1, CDKN2A, CSGALNACT1, DLGAP2, DMRT1, DOCK5, DPYSL2, ERICH1-AS1, FAM135B, FAM49B, FER1L6, FLNB, GATA4, GLDC, GLIS3, GSDMC, IFLTD1, ISPD-AS1, ITPR2, KANK1, KCNMB2, KHDRBS3, KIAA0196, KRAS, LARS2-AS1, LINC00477, LINC00578, LINC00639, LMLN, LOC100128993, LINC02052, LRMP, LRRC6, LTF, MAP4, MCPH1, MFHAS1, NAALADL2, NIN, NXPH1, OPA1, PEBP4, PHF20L1, PHLPP1, PSD3, RASSF8, RPA3-AS1, SERPINB5, SFMBT1, SGK223, SLC38A3, SMARCA2, SOX5, SQLE, TATDN1, TBL1XR1, THSD7A, TMEM110, TMEM110-MUSTN1, TMEM196, TMEM65, TMEM71, ZFP30, ZNF569, ZNF577 and ZNF583.

Specific information on the genes used in the method of providing information for predicting the prognosis, in particular the survival prognosis, of pancreatic cancer patients through gene copy number variation according to the present invention is shown in Table 1.

Preferably, in the method of providing information for predicting the prognosis, in particular the survival prognosis, of a pancreatic cancer patient through the detection of gene copy number variation according to the present invention, the genes whose copy number variation is detected comprise at least one, preferably 2 or more, more preferably 5 or more, and most preferably all 8 of the genes selected from the group consisting of ABHD6, CASC1, FAM49B, KANK1, LINC00477, MCPH1, SOX5 and TATDN1, and additionally comprise one or more genes selected from the group consisting of ACVR2B, ADCY8, ARHGEF10, ATF6, ATP13A4, BCAT1, BCL2, BMP1, C8orf12, C9orf92, CCBE1, CDCP1, CDKN2A, CSGALNACT1, DLGAP2, DMRT1, DOCK5, DPYSL2, ERICH1-AS1, FAM135B, FER1L6, FLNB, GATA4, GLDC, GLIS3, GSDMC, IFLTD1, ISPD-AS1, ITPR2, KCNMB2, KHDRBS3, KIAA0196, KRAS, LARS2-AS1, LINC00578, LINC00639, LMLN, LOC100128993, LINC02052, LRMP, LRRC6, LTF, MAP4, MFHAS1, NAALADL2, NIN, NXPH1, OPA1, PEBP4, PHF20L1, PHLPP1, PSD3, RASSF8, RPA3-AS1, SERPINB5, SFMBT1, SGK223, SLC38A3, SMARCA2, SQLE, TBL1XR1, THSD7A, TMEM110, TMEM110-MUSTN1, TMEM196, TMEM65, TMEM71, ZFP30, ZNF569, ZNF577 and ZNF583.

Most preferably, the genes may comprise the eight genes ABHD6, CASC1, FAM49B, KANK1, LINC00477, MCPH1, SOX5 and TATDN1 together with KRAS and CDKN2A, but are not limited thereto.

In one aspect, the present invention relates to a method of providing information for predicting the survival prognosis of pancreatic cancer patients, comprising:

(1) detecting copy number variation (CNV) of one or more genes selected from the group consisting of ABHD6, ACVR2B, ADCY8, ARHGEF10, ATF6, ATP13A4, BCAT1, BCL2, BMP1, C8orf12, C9orf92, CASC1, CCBE1, CDCP1, CDKN2A, CSGALNACT1, DLGAP2, DMRT1, DOCK5, DPYSL2, ERICH1-AS1, FAM135B, FAM49B, FER1L6, FLNB, GATA4, GLDC, GLIS3, GSDMC, IFLTD1, ISPD-AS1, ITPR2, KANK1, KCNMB2, KHDRBS3, KIAA0196, KRAS, LARS2-AS1, LINC00477, LINC00578, LINC00639, LMLN, LOC100128993, LINC02052, LRMP, LRRC6, LTF, MAP4, MCPH1, MFHAS1, NAALADL2, NIN, NXPH1, OPA1, PEBP4, PHF20L1, PHLPP1, PSD3, RASSF8, RPA3-AS1, SERPINB5, SFMBT1, SGK223, SLC38A3, SMARCA2, SOX5, SQLE, TATDN1, TBL1XR1, THSD7A, TMEM110, TMEM110-MUSTN1, TMEM196, TMEM65, TMEM71, ZFP30, ZNF569, ZNF577 and ZNF583, and quantifying the degree of copy number variation of the detected genes; and

(2) determining that the survival prognosis of the pancreatic cancer patient is poor when the number of genes showing copy number variation, as quantified in step (1), exceeds a cut-off value.

Preferably, step (1) may be performed by a method comprising the following steps, but is not limited thereto.

(1-1) obtaining DNA sequence information (reads) of a target sample obtained from a biological sample;

(1-2) aligning the sequence information (reads) to a reference genome database of a reference group;

(1-3) checking the quality of the aligned sequence information (reads); and

(1-4) detecting the copy number variation of the gene and quantifying the degree of copy number variation.

For example, in step (1), the copy number variation may be detected by droplet digital polymerase chain reaction (ddPCR) or multiplex ligation-dependent probe amplification (MLPA).

ddPCR is an experimental method that amplifies and quantifies target DNA by partitioning a PCR reaction mixture (about 20 μl) into about 20,000 fine droplets. Each droplet is read as a digital signal of 1 (amplified) or 0 (not amplified), the positive droplets are counted, and the number of copies of the target DNA can be calculated using the Poisson distribution.

For example, step (1) may preferably be performed by a method comprising the following steps, but is not limited thereto.

(1-1) amplifying the DNA of the target sample obtained from the biological sample by ddPCR;

(1-2) counting the droplets in which the gene is amplified; and

(1-3) detecting the copy number variation of the gene using the Poisson distribution, and quantifying the degree of copy number variation.
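The Poisson step can be sketched as follows. This is a minimal sketch, assuming a duplex assay in which the target gene and a reference gene (assumed to be present at two copies) are read from the same set of droplets; the function names and droplet counts are hypothetical. The fraction of positive droplets p is converted to the mean number of copies per droplet by λ = -ln(1 - p), which accounts for droplets that received more than one copy.

```python
import math


def copies_per_droplet(positive: int, total: int) -> float:
    """Poisson estimate of mean copies per droplet: lambda = -ln(1 - positive/total)."""
    if total <= 0 or not 0 <= positive < total:
        raise ValueError("need total > 0 and 0 <= positive < total")
    return -math.log(1.0 - positive / total)


def copy_number(target_pos: int, ref_pos: int, total: int, ref_copies: int = 2) -> float:
    """Target copy number relative to a reference gene assumed to have ref_copies copies."""
    lam_ref = copies_per_droplet(ref_pos, total)
    if lam_ref == 0:
        raise ValueError("reference gene produced no positive droplets")
    return ref_copies * copies_per_droplet(target_pos, total) / lam_ref


# Hypothetical counts from about 20,000 droplets.
print(round(copy_number(4000, 4000, 20000), 3))  # 2.0 (same as the reference)
print(round(copy_number(7200, 4000, 20000), 3))  # 4.0 (twice the reference)
```

Note that 7,200 positive droplets is less than twice 4,000; the Poisson correction is what turns this into a doubling of λ and hence four copies.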

In addition, multiplex ligation-dependent probe amplification (MLPA) determines the presence or concentration of a target site by hybridizing a probe to the target site, ligating it, and amplifying the ligation product by PCR. It can be used to detect copy number changes such as deletions and duplications.

For example, the step (1) may be characterized in that it is performed by a method including the following steps, but is not limited thereto.

(1-1) treating the DNA of the target sample obtained from the biological sample with a probe capable of specifically binding to the gene, and ligating the probe;

(1-2) amplifying the ligation product of the gene and the probe by PCR; and

(1-3) analyzing the amplification product, detecting a copy number variation of the gene, and quantifying the degree of copy number variation.
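One common way to analyze MLPA amplification products, shown here as a minimal sketch rather than the method of this specification, is to compute a dosage quotient for each probe: the probe's peak height is first normalized to reference probes within the same sample and then compared with the same ratio averaged over normal control samples. The probe names, peak heights, and the 0.7/1.3 thresholds are hypothetical; a quotient near 0.5 suggests a one-copy loss, and one near 1.5 a one-copy gain, in a diploid region.

```python
from statistics import mean


def intra_normalize(peaks, reference_probes):
    """Divide every probe's peak height by the mean height of the reference probes."""
    ref = mean(peaks[p] for p in reference_probes)
    return {probe: height / ref for probe, height in peaks.items()}


def dosage_quotients(sample_peaks, control_peaks_list, reference_probes):
    """Sample's intra-normalized value divided by the mean over normal controls."""
    sample = intra_normalize(sample_peaks, reference_probes)
    controls = [intra_normalize(c, reference_probes) for c in control_peaks_list]
    return {probe: sample[probe] / mean(c[probe] for c in controls) for probe in sample}


def classify(dq, low=0.7, high=1.3):
    """Assumed cut-offs for calling a copy number change from a dosage quotient."""
    if dq < low:
        return "deletion"
    if dq > high:
        return "duplication"
    return "normal"


# Hypothetical peak heights; REF1/REF2 are reference probes, TARGET1/2 gene probes.
controls = [
    {"REF1": 1000, "REF2": 1000, "TARGET1": 800, "TARGET2": 600},
    {"REF1": 900, "REF2": 1100, "TARGET1": 800, "TARGET2": 600},
]
sample = {"REF1": 1000, "REF2": 1000, "TARGET1": 1200, "TARGET2": 300}

dq = dosage_quotients(sample, controls, ["REF1", "REF2"])
print({probe: classify(q) for probe, q in dq.items()})
```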

In addition, the step (1-4) may be characterized in that it is performed by a method including the following steps, but is not limited thereto.

(a) counting the number of reads for each gene interval in reference samples without gene copy number variation, dividing the number of reads aligned to each gene interval by the total number of reads in the sample, and correcting the depth bias due to GC content, thereby calculating the depth mean (Reference_Mean_Depth_gene) and standard deviation (Reference_SD_gene) of the reference samples for each gene interval;

(b) counting the reads for each gene interval of the aligned target sample obtained in step (1-3), and dividing the number of reads aligned to each gene interval by the total number of reads in the sample, thereby calculating a normalized depth of the target sample for each gene interval; and

(c) calculating the Z value (Z_gene) for each normalized gene interval of the aligned sequence information using Formula 1 below, based on the depth mean (Reference_Mean_Depth_gene) and standard deviation (Reference_SD_gene) of the reference samples obtained in step (a) and the normalized depth of the target sample obtained in step (b).

Formula 1:

$$Z_{\text{gene}} = \frac{\text{Normalized\_Depth}_{\text{gene}} - \text{Reference\_Mean\_Depth}_{\text{gene}}}{\text{Reference\_SD}_{\text{gene}}}$$
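Steps (a) to (c) and Formula 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the reference depths are taken to be already normalized and GC-corrected as described in step (a), the standard deviation is the sample standard deviation (the specification does not say which estimator is used), and all gene names and numbers are hypothetical.

```python
from statistics import mean, stdev


def normalize(gene_counts, total_reads):
    """Step (b): reads aligned to each gene interval divided by the sample's total reads."""
    return {gene: n / total_reads for gene, n in gene_counts.items()}


def reference_stats(reference_depths):
    """Step (a): per-gene (Reference_Mean_Depth, Reference_SD) over reference samples.

    reference_depths is a list of {gene: normalized, GC-corrected depth} dicts,
    one per reference sample without copy number variation.
    """
    return {
        gene: (mean(r[gene] for r in reference_depths),
               stdev(r[gene] for r in reference_depths))
        for gene in reference_depths[0]
    }


def z_scores(target_depths, ref_stats):
    """Step (c), Formula 1: Z = (Normalized_Depth - Reference_Mean_Depth) / Reference_SD."""
    return {gene: (depth - ref_stats[gene][0]) / ref_stats[gene][1]
            for gene, depth in target_depths.items()}


# Hypothetical normalized depths of three reference samples for two gene intervals.
reference = [
    {"GENE_A": 0.10, "GENE_B": 0.20},
    {"GENE_A": 0.12, "GENE_B": 0.22},
    {"GENE_A": 0.14, "GENE_B": 0.18},
]
target = normalize({"GENE_A": 180, "GENE_B": 200}, total_reads=1000)
z = z_scores(target, reference_stats(reference))
print({gene: round(value, 2) for gene, value in z.items()})  # {'GENE_A': 3.0, 'GENE_B': 0.0}
```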

In the present invention, any method known to those skilled in the art may be used to correct the depth bias due to GC content.

In the present invention, GC content refers to the proportion of G and C bases among the A, T, G, and C bases constituting a specific region (gene, bin, etc.). For example, the 20 bases of the sequence ATTCGCACATCCCGCACACT comprise 5 A, 4 T, 2 G, and 9 C bases, so the proportion of G and C bases, (2+9)/20 = 55%, is the GC content of this sequence.
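The GC content calculation in the example above can be written as a short Python function. This is a sketch that counts only A, T, G, and C; skipping ambiguous bases such as N is an assumption of the sketch, not something the specification states.

```python
def gc_content(seq: str) -> float:
    """Proportion of G and C among the A, T, G and C bases of a sequence."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ATGC")
    if acgt == 0:
        raise ValueError("sequence contains no A, T, G or C bases")
    return (seq.count("G") + seq.count("C")) / acgt


print(gc_content("ATTCGCACATCCCGCACACT"))  # 0.55, i.e. 55%
```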

In general, when read depth is analyzed in units of bins, the read depth is known to vary with the GC content of each bin; that is, the depth value follows a systematic trend as the GC content increases.

The following method can be applied to correct the depth bias due to GC content.

First, when the GC content of every bin to be analyzed is rounded to one decimal place, several bins share each GC content value. The median depth of these bins is taken as the representative depth for that GC content value.

For example, if the depth values of Bin1, Bin2, Bin3, Bin4, and Bin5 are 10, 20, 30, 40, and 50, respectively, and their GC contents are 31.5, 31.5, 31.5, 28.4, and 28.4, respectively, then the representative depth for a GC content of 31.5 in this sample is median(10, 20, 30) = 20, and the representative depth for a GC content of 28.4 is median(40, 50) = 45.
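The grouping step can be sketched as follows, reproducing the five-bin example above. The function name and the dictionary output are illustrative choices, not part of the specification.

```python
from collections import defaultdict
from statistics import median


def representative_depths(depths, gc_values, ndigits=1):
    """Median depth of all bins sharing the same GC content (rounded to ndigits)."""
    groups = defaultdict(list)
    for depth, gc in zip(depths, gc_values):
        groups[round(gc, ndigits)].append(depth)
    return {gc: median(values) for gc, values in groups.items()}


print(representative_depths([10, 20, 30, 40, 50], [31.5, 31.5, 31.5, 28.4, 28.4]))
# {31.5: 20, 28.4: 45.0}
```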

After the representative depth has been calculated in this way for every GC content value observed in a sample, a regression model is built with the LOESS (locally estimated scatterplot smoothing) algorithm, taking GC content as the input (independent variable) and predicting the representative depth (dependent variable).

The depth predicted by the fitted regression model can be regarded as the depth bias due to GC content, and the depth is corrected for GC content by subtracting this bias from the depth calculated for each bin (Equation 2).

Equation 2:

$$\text{GC-corrected Depth}_{\text{bin}} = \text{Depth}_{\text{bin}} - \text{LOESS Predicted Depth}_{\text{bin}}$$
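The LOESS fit and Equation 2 can be sketched without external libraries as a degree-1 local regression with tricube weights. This is a minimal sketch under assumptions: the span fraction (frac), the small bandwidth inflation, and the function names are illustrative choices, and a production pipeline would more likely rely on an established LOESS/LOWESS implementation.

```python
def loess_predict(xs, ys, x0, frac=0.5):
    """Degree-1 LOESS prediction at x0 using tricube weights over the nearest points."""
    k = min(len(xs), max(2, round(frac * len(xs))))  # number of nearest neighbours used
    h = sorted(abs(x - x0) for x in xs)[k - 1]        # distance to the k-th nearest point
    h = h * (1 + 1e-6) + 1e-12                        # keep the k-th point's weight above 0
    w = [(1 - min(abs(x - x0) / h, 1.0) ** 3) ** 3 for x in xs]
    sw = sum(w)
    xbar = sum(wi * x for wi, x in zip(w, xs)) / sw
    ybar = sum(wi * y for wi, y in zip(w, ys)) / sw
    sxx = sum(wi * (x - xbar) ** 2 for wi, x in zip(w, xs))
    sxy = sum(wi * (x - xbar) * (y - ybar) for wi, x, y in zip(w, xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0             # weighted least-squares line
    return ybar + slope * (x0 - xbar)


def gc_correct(bin_depths, bin_gc, rep_gc, rep_depth, frac=0.5):
    """Equation 2: GC-corrected depth = bin depth - LOESS-predicted depth at the bin's GC."""
    return [depth - loess_predict(rep_gc, rep_depth, gc, frac)
            for depth, gc in zip(bin_depths, bin_gc)]


# Hypothetical representative depths that rise linearly with GC content.
rep_gc = [30, 35, 40, 45, 50, 55, 60]
rep_depth = [2 * gc + 41 for gc in rep_gc]
print(gc_correct([121.0, 136.5], [40, 47.5], rep_gc, rep_depth))  # approximately [0.0, 0.5]
```

With this linear toy trend the local fit reproduces the trend, so a bin whose depth matches it is corrected to about 0. Equation 2 therefore yields a residual centred near zero; the sketch follows the subtraction as written rather than a ratio-based correction.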

As used herein, the term "reads" refers to nucleic acid fragments whose sequence information has been obtained by various methods known in the art. Therefore, in the present specification, the terms "sequence information" and "reads" have the same meaning, in that both are the result of obtaining sequence information through a sequencing process.

In the present invention, the term "bin" is used synonymously with a section or interval and refers to a portion of the entire chromosome sequence having a specific size.

In the present invention, the size of a bin may be 10 to 100,000 kbp, preferably 50 to 50,000 kbp, more preferably 100 to 10,000 kbp, and most preferably 500 to 5,000 kbp, but is not limited thereto.

In the present invention, the term "reference sample" refers to a sample of a reference group that serves as a basis for comparison, like a standard sequence database, and is obtained from people who do not currently have the specific disease or condition. In the present invention, the standard nucleotide sequence in the standard chromosomal sequence database of the reference sample may be a reference chromosome registered with a public institution such as NCBI.

As used herein, the term "biological sample" refers to a sample obtained from the body of an animal such as a human, and may preferably be one or more selected from blood, ascites, tissue, saliva, urine, hair, feces, spinal fluid, cerebrospinal fluid, and bile, but is not limited thereto.

In the present invention, any nucleic acid fragment extracted from the biological sample may be used, without limitation, as the DNA of the target sample; preferably, it may be cell-free DNA, exosomal DNA, or fragments of intracellular nucleic acid, but is not limited thereto.

In the step of determining that the survival prognosis of the pancreatic cancer patient is poor when the number of genes showing copy number variation exceeds a cut-off value, the degree of copy number variation of each gene can be quantified and judged on the basis of its Z value (Z_gene). The normal range of the Z value may be -1 to 1, preferably -1.5 to 1.5, and more preferably -2 to 2, but is not limited thereto and may be set flexibly according to the purpose or required accuracy of the diagnosis.

In addition, the reference value (cut-off) may be set to 10% or more, preferably 20% or more, more preferably 30% or more, and most preferably 40% of all target genes. For example, when the degree of copy number variation is determined for 40 genes, the cut-off may be set to 4 (10%), preferably 8 (20%), and more preferably 12 (30%), but is not limited thereto.

In particular, when the 10 genes ABHD6, CASC1, FAM49B, KANK1, LINC00477, MCPH1, SOX5, TATDN1, KRAS and CDKN2A are used, the cut-off may be set to 1 gene (10%), preferably 2 genes (20%), and more preferably 3 genes (30%), but is not limited thereto.
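The percentage-based cut-offs in the two preceding paragraphs reduce to simple arithmetic. The following Python sketch, with a hypothetical helper name, converts a percentage of the gene panel into a whole number of genes; rounding fractional results down is an assumption, since the text only gives cases that divide evenly.

```python
def cutoff_from_percent(n_genes: int, percent: int) -> int:
    """Return the cut-off (a number of genes) as a percentage of the gene panel.

    Integer arithmetic avoids floating-point rounding; fractional results are
    rounded down (an assumption, not stated in the text).
    """
    if n_genes < 0 or not 0 <= percent <= 100:
        raise ValueError("n_genes must be >= 0 and percent must be in [0, 100]")
    return n_genes * percent // 100


if __name__ == "__main__":
    for panel in (40, 10):
        print(panel, [cutoff_from_percent(panel, p) for p in (10, 20, 30)])
    # 40 [4, 8, 12]
    # 10 [1, 2, 3]
```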

In the present invention, reads can be obtained by, but not limited to, massively parallel sequencing methods. The massively parallel sequencing method is preferably performed as a next-generation sequencing (NGS) method, but is not limited thereto.

In the present invention, next-generation sequencing may be performed by any method known in the art using a next-generation sequencer. Next-generation sequencing includes any sequencing method that determines the nucleotide sequence of either individual nucleic acid molecules or clonally expanded proxies for individual nucleic acid molecules in a highly parallel fashion (e.g., 10^5 or more molecules are sequenced simultaneously). In one embodiment, the relative abundance of a nucleic acid species in a library can be estimated by counting the relative number of occurrences of its cognate sequence in the data generated by a sequencing experiment. Next-generation sequencing methods are known in the art and are described, for example, in Metzker, M. (2010) Nature Reviews Genetics 11:31-46, which is incorporated herein by reference.

In one embodiment, next-generation sequencing determines the nucleotide sequence of individual nucleic acid molecules (e.g., the HeliScope Gene Sequencing system of Helicos BioSciences and the PacBio RS system of Pacific Biosciences). In other embodiments, sequencing, e.g., massively parallel short-read sequencing that yields more bases of sequence per sequencing unit than other methods yielding fewer but longer reads (e.g., the Solexa sequencer of Illumina Inc., San Diego, CA), determines the nucleotide sequence of clonally expanded proxies for individual nucleic acid molecules (e.g., the Solexa sequencer of Illumina Inc., San Diego, CA; 454 Life Sciences (Branford, Conn.); and Ion Torrent). Other methods or machines for next-generation sequencing include, but are not limited to, those provided by 454 Life Sciences (Branford, Conn.), Applied Biosystems (Foster City, CA; SOLiD sequencer), and Helicos BioSciences Corporation (Cambridge, MA), as well as emulsion and microfluidic nanodroplet sequencing technologies (e.g., GnuBio droplets).

Platforms for next-generation sequencing include, but are not limited to, Roche/454's Genome Sequencer (GS) FLX System, the Illumina/Solexa Genome Analyzer (GA), Life/APG's Support Oligonucleotide Ligation Detection (SOLiD) system, Polonator's G.007 system, Helicos BioSciences' HeliScope Gene Sequencing system, and Pacific Biosciences' PacBio RS system.

In the present invention, the alignment step may be performed using, but is not limited to, the BWA algorithm and the hg19 reference sequence. The alignment algorithm may include, but is not limited to, BWA-MEM, BWA-ALN, BWA-SW, or Bowtie2.

In the present invention, the step of obtaining DNA sequence information (reads) of the target sample obtained from the biological sample according to step (1-1) may comprise the following steps, but is not limited thereto:

(i) obtaining purified nucleic acid by removing proteins, fats, and other residues from the isolated DNA using a salting-out method, a column chromatography method, or a bead method;

(ii) preparing a single-end or paired-end sequencing library from the purified nucleic acid;

(iii) running the prepared library on a next-generation sequencer; and

(iv) obtaining sequence information (reads) of the nucleic acids from the next-generation sequencer.

In addition, in the present invention, the step of checking the quality of the aligned sequence information (reads) according to step (1-3) may be performed by a method comprising selecting reads that satisfy a quality reference value of the mapping quality score, but is not limited thereto.

In addition, in the present invention, the quality reference value may vary depending on the desired standard; it is preferably 15 to 70, more preferably 50 to 70, and most preferably 60, but is not limited thereto.
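A minimal Python sketch of the mapping-quality filter is shown below. It assumes plain-text SAM input, keeps header lines, keeps alignments whose MAPQ (the fifth SAM field) is at least the reference value, and additionally drops MAPQ 255, which the SAM specification reserves for "not available"; treating the reference value as an inclusive lower bound and the handling of 255 are assumptions, and the helper name is hypothetical.

```python
from typing import Iterable, Iterator

MAPQ_FIELD = 4          # MAPQ is the 5th tab-separated field of a SAM alignment line
MAPQ_UNAVAILABLE = 255  # the SAM specification reserves 255 for "not available"


def filter_by_mapq(sam_lines: Iterable[str], min_mapq: int = 60) -> Iterator[str]:
    """Yield header lines unchanged and alignment lines whose MAPQ >= min_mapq."""
    for line in sam_lines:
        if line.startswith("@"):
            yield line  # keep the SAM header
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= MAPQ_FIELD:
            continue  # malformed line: skip it
        mapq = int(fields[MAPQ_FIELD])
        if mapq != MAPQ_UNAVAILABLE and mapq >= min_mapq:
            yield line
```

On BAM files, a comparable threshold is commonly applied with the `-q` option of `samtools view`.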

The information providing method for predicting the survival prognosis of a pancreatic cancer patient according to the present invention may include the following steps in one specific form, but is not limited thereto (see FIG. 1 ).

(1) Extracting cell-free DNA (cfDNA) from the plasma of peripheral blood

(2) Obtaining nucleic acid fragment data by massively parallel sequencing

(3) Aligning the nucleic acid fragment data to a human reference genome

(4) Checking the quality of the aligned data

(5) Detecting gene copy number variations related to pancreatic cancer

(6) Deriving a variation score

(7) Counting the number of genes whose variation score is outside the normal range

(8) Predicting the survival prognosis of the pancreatic cancer patient

In another aspect, the present invention relates to an information providing device used in the method for providing information for predicting the survival prognosis of a pancreatic cancer patient according to the present invention, the device comprising:

(1) a gene copy number variation detection unit that detects copy number variation of the genes in which pancreatic cancer-specific copy number variation occurs according to the present invention, as listed in Table 1 and elsewhere herein;

(2) a calculation unit that quantifies the degree of copy number variation based on the detected gene copy number variation information and calculates the number of genes whose quantified copy number variation is outside a normal range; and

(3) a survival prognosis determining unit that determines that the survival prognosis is poor when the number of genes whose degree of copy number variation is outside the normal range exceeds a reference value.

In another aspect, the present invention relates to a computer-readable medium used in the method for providing information for predicting the survival prognosis of a pancreatic cancer patient according to the present invention, the medium comprising instructions configured to be executed by a processor that provides information for predicting the survival prognosis of a pancreatic cancer patient, the instructions performing:

(1) detecting copy number variation of the genes in which pancreatic cancer-specific copy number variation occurs according to the present invention, as listed in Table 1 and elsewhere herein;

(2) quantifying the degree of copy number variation based on the detected gene copy number variation information, and calculating the number of genes whose quantified copy number variation is outside a normal range; and

(3) determining that the survival prognosis is poor when the number of genes whose degree of copy number variation is outside the normal range exceeds a reference value.

In another aspect, the present invention relates to a kit for amplifying a target nucleic acid used in the method for providing information for predicting the survival prognosis of a pancreatic cancer patient according to the present invention, the kit comprising:

a probe that specifically binds to a pancreatic cancer-specific gene according to the present invention, as listed in Table 1 and elsewhere herein, or a primer for amplifying such a gene.

In the present invention, the kit may optionally include reagents necessary for carrying out a nucleic acid amplification reaction (e.g., polymerase chain reaction), such as a buffer, DNA polymerase, DNA polymerase cofactors, and deoxyribonucleoside 5'-triphosphates (dNTPs). Optionally, the kit of the present invention may also include various oligonucleotide molecules, reverse transcriptase, various buffers and reagents, and antibodies that inhibit DNA polymerase activity. The optimal amount of each reagent used in a specific reaction of the kit can readily be determined by a person skilled in the art having the benefit of this description. Typically, the kit of the present invention may be manufactured as a separate package or compartment containing the aforementioned components.

In one embodiment, the kit may include a compartmentalized carrier means for holding a sample, a container for containing reagents, and a container for containing primers or probes.

The carrier means is suitable for containing one or more containers, such as bottles and tubes, each container containing independent components for use in the method of the present invention. In the context of the present invention, one of ordinary skill in the art can readily dispense the required formulation in a container.

In another aspect, the present invention provides ABHD6, ACVR2B, ADCY8, ARHGEF10, ATF6, ATP13A4, BCAT1, BCL2, BMP1, C8orf12, C9orf92, CASC1, CCBE1, CDCP1, CDKN2A, CSGALNACT1, DLGAP1, DMRT1, DOCK5, DPYSL2 AS1, FAM135B, FAM49B, FER1L6, FLNB, GATA4, GLDC, GLIS3, GSDMC, IFLTD1, ISPD-AS1, ITPR2, KANK1, KCNMB2, KHDRBS3, KIAA0196, KRAS, LARS2-AS1, LINC00477, LINC00INC00578, LINCOC1001200INC00578 LINC02052, LRMP, LRRC6, LTF, MAP4, MCPH1, MFHAS1, NAALADL2, NIN, NXPH1, OPA1, PEBP4, PHF20L1, PHLPP1, PSD3, RASSF8, RPA3-AS1, SERPINB5, SFMBT1, SGK223, SFMOXLC38A3, SMK223 Using copy number variation (CNV) information of one or more genes selected from the group consisting of TATDN1, TBL1XR1, THSD7A, TMEM110, TMEM110-MUSTN1, TMEM196, TMEM65, TMEM71, ZFP30, ZNF569, ZNF577 and ZNF583; Specifically, it relates to a method for predicting survival prognosis of a pancreatic cancer patient, comprising the step of detecting a copy number mutation of the gene.

In another aspect, the present invention relates to a device for predicting the survival prognosis of a pancreatic cancer patient, used in the method for predicting the survival prognosis of a pancreatic cancer patient according to the present invention, the device comprising:

(1) a gene copy number variation detection unit that detects copy number variation of the genes in which pancreatic cancer-specific copy number variation occurs according to the present invention, as listed in Table 1 and elsewhere herein;

(2) a calculation unit that quantifies the degree of copy number variation based on the detected gene copy number variation information and calculates the number of genes whose quantified copy number variation is outside a normal range; and

(3) a survival prognosis determining unit that determines that the survival prognosis is poor when the number of genes whose degree of copy number variation is outside the normal range exceeds the reference value.

In another aspect, the present invention relates to a computer-readable medium used in the method for predicting the survival prognosis of a pancreatic cancer patient according to the present invention, the medium comprising instructions configured to be executed by a processor for predicting the survival prognosis of a pancreatic cancer patient, the instructions performing:

(1) detecting copy number variation of the genes in which pancreatic cancer-specific copy number variation occurs according to the present invention, as listed in Table 1 and elsewhere herein;

(2) quantifying the degree of copy number variation based on the detected gene copy number variation information, and calculating the number of genes whose quantified copy number variation is outside a normal range; and

(3) determining that the survival prognosis is poor when the number of genes whose degree of copy number variation is outside the normal range exceeds a reference value.

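In another aspect, the present invention provides a kit for amplifying a target nucleic acid used in the method for predicting the survival prognosis of a pancreatic cancer patient according to the present invention, the kit comprising a probe that specifically binds to a pancreatic cancer-specific gene according to the present invention, as listed in Table 1 and elsewhere herein, or a primer for amplifying such a gene.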


Example

Hereinafter, the present invention will be described in more detail through examples. These examples are only for illustrating the present invention, and it will be apparent to those of ordinary skill in the art that the scope of the present invention is not to be construed as being limited by these examples.

Example 1. Detection of copy number variation in pancreatic cancer patients

DNA was extracted from 315 pancreatic cancer patients, and whole-genome libraries covering all chromosomes were prepared. The completed libraries were sequenced on a NextSeq instrument (Illumina, USA), producing an average of 18.4 million reads of sequence data per sample.

After the BCL files (containing the sequencing information) produced by the next-generation sequencing (NGS) instrument were converted to FASTQ format, the reads in the FASTQ files were aligned to the hg19 reference genome using the BWA-MEM algorithm. The sequencing data were confirmed to have at least 80% of bases at Q30 or higher and a mapping quality of 60.

The chromosomes were divided into fixed-size sections (bins) of 1,000,000 bp, and the number of reads aligned to each bin was counted. The read count of each bin was divided by the total number of reads in the sample, and the depth bias due to GC content was corrected using the loess function of the stats package, R's base statistics package.
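The binning and normalization step can be sketched in Python with NumPy as follows. This is a simplified, hypothetical illustration: read start positions on one chromosome are counted in 1,000,000-bp bins and divided by the sample's total read count, and the GC correction fits a low-degree polynomial with `numpy.polyfit` as a stand-in for the R `loess` fit described above, so its output will not match loess exactly. The function names and the quadratic degree are assumptions.

```python
import numpy as np

BIN_SIZE = 1_000_000  # bin width in bp, as in Example 1


def binned_read_fraction(read_starts, chrom_length, total_reads=None, bin_size=BIN_SIZE):
    """Count reads per fixed-width bin on one chromosome, then divide by the sample's total read count.

    read_starts: 0-based alignment start positions on the chromosome.
    total_reads: genome-wide read total for the sample; defaults to the reads given here.
    """
    starts = np.asarray(read_starts, dtype=np.int64)
    n_bins = -(-chrom_length // bin_size)  # ceiling division
    counts = np.bincount(starts // bin_size, minlength=n_bins)
    total = counts.sum() if total_reads is None else total_reads
    return counts / total


def gc_correct(depth, gc, degree=2):
    """Remove GC-dependent depth bias by dividing out a fitted GC trend.

    A low-degree polynomial (numpy.polyfit) stands in for the loess fit used in
    the example; corrected values are rescaled to keep the median depth.
    """
    depth = np.asarray(depth, dtype=float)
    gc = np.asarray(gc, dtype=float)
    trend = np.polyval(np.polyfit(gc, depth, degree), gc)
    return depth / trend * np.median(depth)


if __name__ == "__main__":
    print(binned_read_fraction([0, 500_000, 1_500_000, 2_100_000, 2_900_000], 3_000_000))
    # [0.4 0.2 0.4]
```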

The above procedure was performed on a group of normal samples without copy number variation (CNV) to calculate the mean and standard deviation of each bin. The same procedure was then performed on the group of pancreatic cancer patient samples to calculate the normalized depth of each bin, and a standardized Z_bin value was obtained using Equation 3 below.

Formula 3:

$$Z_{\mathrm{bin}} = \frac{\mathrm{Normalized\_Depth}_{\mathrm{bin}} - \mathrm{Reference\_Mean\_Depth}_{\mathrm{bin}}}{\mathrm{Reference\_SD}_{\mathrm{bin}}}$$
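Formula 3 can be applied to all bins at once with NumPy, as in the sketch below. The array layout (reference samples as rows, bins as columns) and the use of the sample standard deviation (ddof=1) are assumptions not stated in the text, and the function names are hypothetical; the same two functions apply unchanged to gene sections for Z_gene.

```python
import numpy as np


def reference_stats(reference_depths):
    """Per-bin mean and standard deviation across reference samples.

    reference_depths: 2-D array, rows = normal reference samples, columns = bins.
    ddof=1 (sample standard deviation) is an assumption; the text does not specify it.
    """
    ref = np.asarray(reference_depths, dtype=float)
    return ref.mean(axis=0), ref.std(axis=0, ddof=1)


def z_scores(normalized_depth, ref_mean, ref_sd):
    """Formula 3: Z_bin = (Normalized_Depth_bin - Reference_Mean_Depth_bin) / Reference_SD_bin."""
    return (np.asarray(normalized_depth, dtype=float) - ref_mean) / ref_sd
```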

The Circular Binary Segmentation (CBS) algorithm was applied to the per-bin Z values to detect (segment) regions of the whole genome whose copy number differs from that of their surroundings (see FIG. 2). In FIG. 2, A is an example of an amplification segment, in which the copy number is increased relative to its surroundings, and B is an example of a deletion segment, in which the copy number is decreased relative to its surroundings; each continuous red line indicates one segment.
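The sketch below illustrates the idea behind circular binary segmentation in plain Python. It searches every arc x[i:j] for the largest standardized difference between the mean inside and outside the arc (assuming unit variance, which is reasonable only because the inputs are already Z values) and splits recursively while that statistic reaches a fixed threshold. This is a simplification: published CBS judges significance by permutation rather than a fixed threshold and uses a faster search, so the threshold of 5.0 and the function names here are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def best_arc(x: Sequence[float]) -> Tuple[float, int, int]:
    """Return (t, i, j) maximizing the CBS arc statistic for x[i:j] versus the rest of x.

    Unit variance is assumed (inputs are Z values), so no SD term is estimated.
    """
    n = len(x)
    prefix = [0.0]
    for v in x:
        prefix.append(prefix[-1] + v)
    total = prefix[-1]
    best = (0.0, 0, n)
    for i in range(n):
        for j in range(i + 1, n + 1):
            k = j - i
            if k == n:
                continue  # the arc must leave at least one point outside it
            inside = (prefix[j] - prefix[i]) / k
            outside = (total - (prefix[j] - prefix[i])) / (n - k)
            t = abs(inside - outside) * math.sqrt(k * (n - k) / n)
            if t > best[0]:
                best = (t, i, j)
    return best


def segment(x: Sequence[float], threshold: float = 5.0, start: int = 0) -> List[Tuple[int, int]]:
    """Recursively split x into half-open segments while the best arc statistic reaches threshold."""
    n = len(x)
    if n < 2:
        return [(start, start + n)]
    t, i, j = best_arc(x)
    if t < threshold or t <= 0.0:
        return [(start, start + n)]
    result: List[Tuple[int, int]] = []
    for a, b in ((0, i), (i, j), (j, n)):
        if b > a:  # skip empty pieces when the arc touches an end of x
            result.extend(segment(x[a:b], threshold, start + a))
    return result


if __name__ == "__main__":
    profile = [0.0] * 20 + [5.0] * 10 + [0.0] * 20
    print(segment(profile))  # [(0, 20), (20, 30), (30, 50)]
```

For the toy profile in the usage example, the three segments correspond to a neutral region, an amplification segment, and another neutral region.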

Example 2. Pancreatic cancer-specific gene region selection

2-1. Primary screening of pancreatic cancer-specific genomic regions

Segment analysis was performed on DNA samples obtained from 315 pancreatic cancer patients.

For the copy number variation regions obtained in Example 1, the Genomic Identification of Significant Targets in Cancer (GISTIC) algorithm was used to perform a primary selection of amplification and deletion regions recurring across the 315 pancreatic cancer patients.

As a result, as shown in FIGS. 3 and 4 , a total of 9 amplification regions and 6 deletion regions were selected.

The red plot on the left of FIGS. 3 and 4 shows the amplified segment regions recurrently observed in the 315 pancreatic cancer patients, and the blue plot on the right shows the deleted segment regions. In FIGS. 3 and 4, the lower x-axis represents the false discovery rate (FDR)-adjusted p-value (q-value), the upper x-axis represents the G-score calculated by the GISTIC analysis (reflecting the frequency and amplitude of the CNVs observed in the 315 pancreatic cancer patients), and the y-axis represents the chromosome number.

The coordinates of each derived region are shown in Table 2.

2-2. Secondary Screening of Gene Regions Relevant to Pancreatic Cancer Survival Prognosis

Among the pancreatic cancer-specific CNV regions selected through GISTIC analysis, regions related to pancreatic cancer survival prognosis were secondarily selected at the gene level.

Specifically, gene-level Z values were calculated for the 2,272 genes included in the coordinate regions first selected in step 2-1. For reference samples without gene copy number variation, the reads in each gene section were counted, the read count aligned to each gene section was divided by the total number of reads in the sample, and the depth bias due to GC content was corrected; the depth average (Reference_Mean_Depth_gene) and standard deviation (Reference_SD_gene) of the reference samples were then calculated for each gene section. The reads in each gene section of the aligned target samples were counted, divided by the total number of reads in the sample, and corrected for GC-content depth bias in the same way to obtain the normalized depth of each target sample for each of the 2,272 gene sections. Finally, the Z value (Z_gene) of each normalized gene section of the aligned sequence information was calculated using Equation 1:

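Formula 1:

$$Z_{\mathrm{gene}} = \frac{\mathrm{Normalized\_Depth}_{\mathrm{gene}} - \mathrm{Reference\_Mean\_Depth}_{\mathrm{gene}}}{\mathrm{Reference\_SD}_{\mathrm{gene}}}$$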

Then, as illustrated in FIG. 5, for a gene in a GISTIC amplification region, a sample's value for that gene was assigned to group 1 (poor prognosis group) when the gene-level Z value calculated for that sample satisfied Z > 2; for a gene in a GISTIC deletion region, the value was assigned to group 1 when Z < -2. When neither condition was satisfied, the sample's value for that gene was assigned to group 0 (good prognosis group).

For example, as shown in FIG. 5, when Samples 1 to 315 are grouped by the Z value of Gene1, which lies in a GISTIC deletion region, Sample_2 and Sample_4 (Z < -2) are assigned to group 1 and the remaining samples to group 0; when they are grouped by the Z value of Gene3, which lies in a GISTIC amplification region, Sample_1 and Sample_4 (Z > 2) are assigned to group 1 and the remaining samples to group 0.
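The grouping rule of FIG. 5 can be expressed as a small Python function. The sketch assumes that each gene is already labeled as lying in a GISTIC amplification or deletion region; the function name and the example Z values are hypothetical, chosen only to reproduce the Gene1 pattern described above.

```python
from typing import Literal

RegionType = Literal["amplification", "deletion"]


def assign_group(z: float, region: RegionType, threshold: float = 2.0) -> int:
    """Return 1 (poor prognosis group) or 0 (good prognosis group) for one gene in one sample."""
    if region == "amplification":
        return 1 if z > threshold else 0
    if region == "deletion":
        return 1 if z < -threshold else 0
    raise ValueError(f"unknown region type: {region!r}")


if __name__ == "__main__":
    # Hypothetical Z values of a deletion-region gene ("Gene1") in four samples.
    z_gene1 = {"Sample_1": 0.3, "Sample_2": -2.6, "Sample_3": 1.1, "Sample_4": -3.0}
    print({s: assign_group(z, "deletion") for s, z in z_gene1.items()})
    # {'Sample_1': 0, 'Sample_2': 1, 'Sample_3': 0, 'Sample_4': 1}
```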

Next, the difference in survival prognosis between the poor prognosis group (group 1) and the good prognosis group (group 0) was compared for each of the 2,272 genes included in the GISTIC peak regions. To prevent overfitting, a gene was excluded from the analysis if either of the two groups contained fewer than 20 samples. In addition, five-fold cross-validation (5-F CV) was used: the data of all 315 patients were divided into five parts, so that genes could be selected while avoiding overfitting on limited data and the prognostic performance of the GSS could be verified.
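The gene-exclusion rule and the five-fold split can be sketched in Python as follows. The text does not state how patients were assigned to folds, so the seeded random shuffle below (without stratification) is an assumption, as are the function names.

```python
import random
from typing import List, Sequence, Tuple

MIN_GROUP_SIZE = 20  # genes with fewer than 20 samples in either group are excluded


def passes_group_size(groups: Sequence[int], min_size: int = MIN_GROUP_SIZE) -> bool:
    """True when both group 1 and group 0 contain at least min_size samples."""
    n_poor = sum(groups)
    n_good = len(groups) - n_poor
    return min(n_poor, n_good) >= min_size


def five_fold_splits(sample_ids: Sequence[str], k: int = 5, seed: int = 0) -> List[Tuple[List[str], List[str]]]:
    """Shuffle once, deal samples into k folds, and return (train, test) pairs, one per fold."""
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)
    folds = [ids[f::k] for f in range(k)]
    splits = []
    for f in range(k):
        train = [s for g, fold in enumerate(folds) if g != f for s in fold]
        splits.append((train, folds[f]))
    return splits
```

With 315 patients, each test fold holds 63 patients and each training set 252.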

Kaplan-Meier survival analysis was performed, and the log-rank test was used to assess whether the survival time differed significantly between the two groups (i.e., whether group 1 survived for a shorter period than group 0). Based on the raw p-values from the log-rank test, either all genes satisfying p < 0.05 or the top N genes among them were selected.
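The log-rank comparison can be sketched in plain Python. The function below implements the standard two-group log-rank statistic (observed minus expected deaths in group 1, divided by its hypergeometric variance, summed over distinct event times) and converts it to a p-value using the chi-square distribution with one degree of freedom; it is an illustrative sketch, not the software used in the examples, and the function name is hypothetical. Established survival libraries (for example, `logrank_test` in the Python package lifelines) provide tested equivalents.

```python
import math
from typing import Sequence, Tuple


def logrank_test(times: Sequence[float], events: Sequence[int], groups: Sequence[int]) -> Tuple[float, float]:
    """Two-group log-rank test.

    events: 1 = death observed, 0 = censored; groups: 1 or 0.
    Returns (chi-square statistic with 1 degree of freedom, p-value).
    """
    data = list(zip(times, events, groups))
    event_times = sorted({t for t, e, _ in data if e})
    o_minus_e = 0.0
    variance = 0.0
    for t in event_times:
        n = sum(1 for tt, _, _ in data if tt >= t)                  # at risk overall
        n1 = sum(1 for tt, _, g in data if tt >= t and g == 1)      # at risk in group 1
        d = sum(1 for tt, e, _ in data if tt == t and e)            # deaths at time t
        d1 = sum(1 for tt, e, g in data if tt == t and e and g == 1)
        o_minus_e += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = o_minus_e ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p_value
```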

For example, in CV_1, 229 of the 2,272 genes satisfied the K-M raw p-value < 0.05. These 229 genes were ordered from the smallest K-M p-value, and the top 2, 3, 4, ..., 50 genes were used to calculate GSS_2 to GSS_50 (N = 2 to 50). Checking the prognostic performance showed that the difference in survival between groups 0 and 1 was largest when GSS was calculated using the top 36 genes (best N = 36); the top-N value for CV_1 was therefore 36.
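Selecting the top N genes and computing a GSS value for one patient can be sketched in Python as follows. Consistent with the cut-off ranges described below (GSS_Top36 values from 0 to 36), the sketch treats GSS as the sum of a patient's 0/1 group values over the selected genes; the function names, p-values and group values are hypothetical.

```python
from typing import Dict, List, Sequence


def select_top_genes(p_values: Dict[str, float], n: int, alpha: float = 0.05) -> List[str]:
    """Keep genes with raw log-rank p < alpha, sort by ascending p (ties by name), return the first n."""
    significant = [gene for gene, p in p_values.items() if p < alpha]
    return sorted(significant, key=lambda gene: (p_values[gene], gene))[:n]


def gss(group_values: Dict[str, int], genes: Sequence[str]) -> int:
    """Sum one patient's 0/1 group values over the selected genes (0 to len(genes))."""
    return sum(group_values[gene] for gene in genes)


if __name__ == "__main__":
    # Hypothetical raw p-values and one patient's group values.
    p = {"KANK1": 0.001, "ABHD6": 0.03, "LTF": 0.2, "CASC1": 0.01}
    top2 = select_top_genes(p, n=2)
    print(top2)                                                         # ['KANK1', 'CASC1']
    print(gss({"KANK1": 1, "ABHD6": 1, "LTF": 1, "CASC1": 0}, top2))  # 1
```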

K-M analysis in the five cross-validation folds showed that the number of genes satisfying the raw p-value < 0.05 criterion was 229 in CV_1, 269 in CV_2, 301 in CV_3, 213 in CV_4, and 246 in CV_5 (FIG. 6).

In addition, for each CV, GSS_All was calculated by summing the gene-level group values (1 or 0) of all genes satisfying raw p-value < 0.05. 40 in CV_3, 45 in CV_4, 38 in CV_4, and 47 in CV_5. GSS_TopN was calculated by summing the values of the N genes with the smallest raw p-values in each CV, and the optimal cut-off was then determined; the results were N = 36 and cut-off = 6 for CV_1, N = 35 and cut-off = 7 for CV_2, N = 15 and cut-off = 2 for CV_3, N = 43 and cut-off = 7 for CV_4, and N = 33 and cut-off = 4 for CV_5 (Table 3).

To find the optimal cut-off in CV_1, for example, where N for GSS_TopN was 36, every integer from 1 to 35 (N - 1 = 36 - 1) was tried as the cut-off between groups 0 and 1, the survival difference was checked for each, and the cut-off giving the largest difference was selected. That is, applying all 35 cut-offs (1, 2, 3, ..., 35) to GSS_Top36 showed that the difference in survival time between the two groups was largest when patients with GSS_Top36 values of 0 to 6 were classified into the good prognosis group (0) and patients with values of 7 to 36 into the poor prognosis group (1); the cut-off was therefore set to 6.
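The exhaustive cut-off search can be sketched in Python as below. The survival-difference measure is passed in as a function; in practice this would be a survival statistic such as the log-rank chi-square, whereas the usage example substitutes a toy difference in mean survival time. The function names and data are hypothetical.

```python
from typing import Callable, List, Optional, Sequence


def choose_cutoff(
    gss_values: Sequence[int],
    n_genes: int,
    separation: Callable[[List[int]], float],
) -> Optional[int]:
    """Try every integer cut-off 1..n_genes-1 and return the one with the largest separation.

    A patient is placed in group 1 (poor prognosis) when GSS > cut-off, else in group 0.
    Cut-offs that leave either group empty are skipped; ties keep the smaller cut-off.
    """
    best_cutoff: Optional[int] = None
    best_score = float("-inf")
    for cutoff in range(1, n_genes):
        groups = [1 if value > cutoff else 0 for value in gss_values]
        if 0 < sum(groups) < len(groups):
            score = separation(groups)
            if score > best_score:
                best_cutoff, best_score = cutoff, score
    return best_cutoff


if __name__ == "__main__":
    # Hypothetical GSS values and survival times (months) for six patients.
    gss_values = [0, 0, 1, 5, 6, 7]
    survival = [30, 28, 25, 10, 8, 5]

    def mean_survival_gap(groups: List[int]) -> float:
        # Toy stand-in for a log-rank statistic: mean survival of group 0 minus group 1.
        good = [s for s, g in zip(survival, groups) if g == 0]
        poor = [s for s, g in zip(survival, groups) if g == 1]
        return sum(good) / len(good) - sum(poor) / len(poor)

    print(choose_cutoff(gss_values, n_genes=8, separation=mean_survival_gap))  # 1
```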

Example 3. Confirmation of pancreatic cancer-specific gene region and survival prognosis prediction performance

3-1. GSS-Based Pancreatic Cancer Survival Prognosis Prediction

The GSS values derived in Example 2 were used to divide patients into good and poor pancreatic cancer survival prognosis groups according to the cut-off criteria in Table 3, and the two groups were compared by K-M survival analysis. As shown in FIG., there was overall a statistically significant difference in survival prognosis between the two groups (raw p-value < 0.05), and GSS_TopN yielded better p-values than GSS_All.

In the test data, GSS_All showed a statistically significant difference in survival prognosis between the two groups in 4 of the 5 CVs, and GSS_TopN in 3. In most cases, GSS_TopN also yielded better p-values than GSS_All in the test data.

As shown in FIG. 8, across the Top N gene sets selected in the five CVs, a total of 79 genes were selected at least once (union; see Table 1). Of these, eight genes were selected in all five CVs (intersection): KANK1, ABHD6, CASC1, TATDN1, SOX5, FAM49B, LINC00477 and MCPH1.

[Table 1]
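The union and intersection of the per-CV Top N gene sets (FIG. 8) can be computed with Python set operations, as in the sketch below. The function name is hypothetical, and the gene memberships in the example are hypothetical and much smaller than the actual sets.

```python
from typing import Dict, Set, Tuple


def union_and_intersection(top_genes_by_cv: Dict[str, Set[str]]) -> Tuple[Set[str], Set[str]]:
    """Return (genes selected in at least one CV, genes selected in every CV)."""
    if not top_genes_by_cv:
        raise ValueError("need at least one CV gene set")
    sets = list(top_genes_by_cv.values())
    return set().union(*sets), set(sets[0]).intersection(*sets[1:])


if __name__ == "__main__":
    # Hypothetical, heavily truncated Top N sets for three CVs.
    by_cv = {
        "CV_1": {"KANK1", "ABHD6", "KRAS"},
        "CV_2": {"KANK1", "ABHD6", "CDKN2A"},
        "CV_3": {"KANK1", "ABHD6"},
    }
    union, common = union_and_intersection(by_cv)
    print(len(union), sorted(common))  # 4 ['ABHD6', 'KANK1']
```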

3-2. GSS_79-based pancreatic cancer survival prognosis prediction performance verification

Gene-level copy number variation and survival data of 183 pancreatic cancer patients, released publicly by The Cancer Genome Atlas (TCGA) Research Network led by the National Cancer Institute (NCI) (https://www.cbioportal.org/study/summary?id=paad_tcga_pan_can_atlas_2018), were used as external validation data. GSS_79 was calculated using the 79 genes selected in Example 3-1, and prognostic performance was checked with a cut-off of 8. As shown in FIG. 9, GSS_79 showed a statistically significant difference in survival prognosis.

3-3. GSS_8+KRAS+CDKN2A-based pancreatic cancer survival prognosis prediction performance verification

KRAS and CDKN2A, which had been judged meaningful in the earlier bin-level analysis, passed the statistical criteria in only some CVs in the gene-level analysis (KRAS: 3 times; CDKN2A: 2 times). Because they were important genes in the bin-level analysis, KRAS and CDKN2A were nevertheless added to the 8 genes above, GSS_10 was calculated from the resulting 10 genes, and prognostic performance was checked with a cut-off of 1. As shown in FIG. 10, the p-value for GSS_10 was 0.059.

3-4. GSS_8-based Pancreatic Cancer Survival Prognosis Prediction

GSS_8 was calculated using the 8 genes selected in all five CVs (intersection) in Example 3-1, and prognostic performance was checked with a cut-off of 1. As shown in FIG., GSS_8 likewise showed a statistically significant difference in survival prognosis. FIG. 12 summarizes the prognostic performance of GSS_79, GSS_10, and GSS_8 in the TCGA data.

While specific aspects of the present invention have been described in detail above, it will be clear to those of ordinary skill in the art that this specific description is merely a preferred embodiment and does not limit the scope of the present invention. Accordingly, the substantial scope of the present invention is defined by the appended claims and their equivalents.

The method of providing information for predicting the survival prognosis of pancreatic cancer patients according to the present invention predicts survival prognosis based on copy number variation of genes specific to pancreatic cancer survival prognosis. It can therefore increase utility in predicting treatment effects and survival prognosis, and it is useful because it does not require whole-genome sequencing.

Claims

A method of providing information for predicting the survival prognosis of a pancreatic cancer patient, the method comprising the step of detecting a copy number variation (CNV) of at least one gene selected from the group consisting of ABHD6, ACVR2B, ADCY8, ARHGEF10, ATF6, ATP13A4, BCAT1, BCL2, BMP1, C8orf12, C9orf92, CASC1, CCBE1, CDCP1, CDKN2A, CSGALNACT1, DLGAP2, DMRT1, DOCK5, FLAMDPYSL2, FAMER1, FRICH6,BYSL2 FLNB, GATA4, GLDC, GLIS3, GSDMC, IFLTD1, ISPD-AS1, ITPR2, KANK1, KCNMB2, KHDRBS3, KIAA0196, KRAS, LARS2-AS1, LINC00477, LINC00578, LINC00639, LMLN, LRRC02 LMLN, LINC02, LINC6 MAP4, MCPH1, MFHAS1, NAALADL2, NIN, NXPH1, OPA1, PEBP4, PHF20L1, PHLPP1, PSD3, RASSF8, RPA3-AS1, SERPINB5, SFMBT1, SGK223, SLC38A3, SMARCA2, TMATOXDN1, SQLE, TBLXR38A3, SMARCA2, TMATOXDN1, SQLE, TMEM110-MUSTN1, TMEM196, TMEM65, TMEM71, ZFP30, ZNF569, ZNF577 and ZNF583.
The method of claim 1, wherein the gene is at least one selected from the group consisting of ABHD6, CASC1, FAM49B, KANK1, LINC00477, MCPH1, SOX5 and TATDN1.
The method of claim 2, wherein the genes are ABHD6, CASC1, FAM49B, KANK1, LINC00477, MCPH1, SOX5, TATDN1, KRAS and CDKN2A.
The method according to any one of claims 1 to 3, comprising the following steps:

(1) detecting the copy number variation of the gene and quantifying the degree of copy number variation of the detected gene; and

(2) determining that the survival prognosis of the pancreatic cancer patient is poor when the number of genes whose degree of copy number variation, quantified in step (1), is outside a normal range exceeds a cut-off value.
The method of claim 4, wherein step (1) is performed by a method comprising the following steps:

(1-1) obtaining DNA sequence information (reads) of a target sample obtained from a biological sample;

(1-2) aligning the sequence information (reads) to a reference genome database of a reference group;

(1-3) checking the quality of the aligned sequence information (reads); and

(1-4) detecting the copy number variation of the gene and quantifying the degree of copy number variation.
The method of claim 5, wherein step (1-4) is performed by a method comprising the following steps:

(a) calculating a depth average (Reference_Mean_Depth_gene) and a standard deviation (Reference_SD_gene) of reference samples for each gene section by counting the reads in each gene section of reference samples without gene copy number variation, dividing the read count aligned to each gene section by the total number of reads in the sample, and correcting the depth bias due to GC content;

(b) calculating a normalized depth of the target sample for each gene section by counting the reads in each gene section of the aligned target sample obtained in step (1-3), dividing the read count aligned to each gene section by the total number of reads in the sample, and correcting the depth bias due to GC content; and

(c) calculating a Z value (Z_gene) for each normalized gene section of the aligned sequence information using Equation 1 below, based on the depth average (Reference_Mean_Depth_gene) and standard deviation (Reference_SD_gene) of the reference samples obtained in step (a) and the normalized depth of the target sample obtained in step (b):

Formula 1:

$$Z_{\mathrm{gene}} = \frac{\mathrm{Normalized\_Depth}_{\mathrm{gene}} - \mathrm{Reference\_Mean\_Depth}_{\mathrm{gene}}}{\mathrm{Reference\_SD}_{\mathrm{gene}}}$$
The method of claim 6, wherein the normal range of the Z value is -2 to 2.
The method of claim 7, wherein the cut-off of the number of genes outside the normal range of the Z value is 10% or more of the total number of genes.
The method of claim 4, wherein in step (1), the copy number variation is detected using droplet digital polymerase chain reaction (ddPCR) or multiplex ligation-dependent probe amplification (MLPA).
The method of claim 5, wherein the biological sample is at least one selected from the group consisting of blood, abdominal fluid, tissue, saliva, urine, hair, feces, spinal fluid, brain fluid and bile.
The method of claim 5, wherein the DNA of the target sample obtained from the biological sample is cell-free DNA or exosomal DNA.
The method of claim 5, wherein step (1-1) comprises the following steps:

(i) obtaining purified nucleic acid by removing proteins, fats, and other residues from the isolated DNA using a salting-out method, a column chromatography method, or a bead method;

(ii) preparing a single-end or paired-end sequencing library from the purified nucleic acid;

(iii) running the prepared library on a next-generation sequencer; and

(iv) obtaining sequence information (reads) of the nucleic acids from the next-generation sequencer.
The method of claim 5, wherein step (1-3) is performed by a method comprising selecting reads that satisfy a quality reference value of the mapping quality score.
The method of claim 13, wherein the quality reference value is a mapping quality score of 15 to 70.
An information providing device used in a method of providing information for predicting survival prognosis of a pancreatic cancer patient according to any one of claims 1 to 14, the device comprising:

(1) a copy number mutation detection unit for detecting a copy number mutation of the gene;

(2) a calculation unit that quantifies the degree of copy number variation based on the detected gene copy number variation information, and calculates the number of genes whose quantified gene copy number variation is outside a normal range; and

(3) a survival prognosis determining unit that determines that the survival prognosis is poor when the number of genes whose degree of gene copy number variation is outside the normal range exceeds a reference value.
A computer-readable recording medium used in the method of providing information for predicting the survival prognosis of a pancreatic cancer patient according to any one of claims 1 to 14, the medium comprising instructions configured to be executed by a processor that provides information for predicting the survival prognosis of a pancreatic cancer patient, the instructions performing:

(1) detecting a copy number variation of the gene;

(2) quantifying the degree of copy number variation based on the detected gene copy number variation information, and calculating the number of genes whose quantified gene copy number variation is outside a normal range; and

(3) determining that the survival prognosis is poor when the number of genes whose degree of gene copy number variation is outside the normal range exceeds a reference value.
A kit for amplifying a target nucleic acid used in the method of providing information for predicting the survival prognosis of a pancreatic cancer patient according to any one of claims 1 to 14, the kit comprising:

a probe that specifically binds to the gene, or a primer for amplifying the gene.