WO2021107232A1 - Method for forming biomarker panel for diagnosing ovarian cancer and biomarker panel for diagnosing ovarian cancer


Info

Publication number
WO2021107232A1
Authority
WO
WIPO (PCT)
Prior art keywords
ovarian cancer
biomarker
biomarker panel
data
prognosis
Prior art date
Application number
PCT/KR2019/016773
Other languages
French (fr)
Korean (ko)
Inventor
황소현
강혜윤
안희정
최민철
주원덕
김세화
김태헌
Original Assignee
의료법인 성광의료재단
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 의료법인 성광의료재단
Priority to PCT/KR2019/016773
Publication of WO2021107232A1

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • C12Q1/6886 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material for cancer
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B15/00 ICT specially adapted for analysing two-dimensional or three-dimensional molecular structures, e.g. structural or functional relations or structure alignment
    • G16B15/30 Drug targeting using structural data; Docking or binding prediction
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B35/00 ICT specially adapted for in silico combinatorial libraries of nucleic acids, proteins or peptides
    • G16B35/10 Design of libraries
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G16B40/20 Supervised data analysis

Definitions

  • The present invention relates to a method of constructing a biomarker for ovarian cancer diagnosis.
  • One aspect is to provide a method of constructing a biomarker panel for ovarian cancer diagnosis, the method comprising: obtaining gene expression data of a plurality of individual biomarkers from a sample isolated from a subject; and selecting, from the data, a gene related to the prognosis of ovarian cancer.
  • Another aspect is to provide a biomarker panel for ovarian cancer diagnosis constructed by the method.
  • Another aspect is to provide a biomarker panel for ovarian cancer diagnosis comprising an agent for measuring the expression level of one or more biomarkers selected from the group consisting of ARAF, ARFGAP2, IFIT1, IRF9, LMO7, MTA2, POLR2A, RAB12, RARG, RPL23P8, RPL23, RTCA, SCAP, SELENON, SERP1, SIAH2, TES2, TIMM17B, TLN1, USP19 and ZFAND6.
  • Another aspect is to provide a method of predicting the prognosis of ovarian cancer, the method comprising: measuring, in a sample isolated from a subject, the expression level of one or more biomarkers selected from the group consisting of ARAF, ARFGAP2, IFIT1, IRF9, LMO7, MTA2, POLR2A, RAB12, RARG, RPL23P8, RPL23, RTCA, SCAP, SELENON, SERP1, SIAH2, TES2, TIMM17B, TLN1, USP19 and ZFAND6; and comparing the level of the biomarker with the corresponding result for the same marker in a control sample.
  • One aspect provides a method of constructing a biomarker panel for ovarian cancer diagnosis, the method comprising: obtaining gene expression data of a plurality of individual biomarkers from a sample isolated from a subject; and selecting, from the data, a gene related to the prognosis of ovarian cancer.
  • The method includes acquiring gene expression data of a plurality of individual biomarkers from a sample isolated from a subject.
  • The subject is a subject for diagnosing ovarian cancer, for example, a subject for predicting the likelihood of ovarian cancer, a subject for diagnosing the state of ovarian cancer, a subject for determining prognosis, a subject for determining the dosage of a drug for preventing or treating ovarian cancer, or a subject for determining a treatment method according to the progression of ovarian cancer.
  • The subject may be a vertebrate, specifically a mammal, amphibian, reptile, or bird, and more specifically a mammal, for example a human (Homo sapiens), and may be Korean.
  • The sample may include a tissue, cell, whole blood, serum, plasma, saliva, sputum, cerebrospinal fluid, or urine sample isolated from the subject.
  • The data may be transcript expression data of a patient with ovarian cancer.
  • The data may be obtained, for example, by using the median absolute deviation (MAD) to select the genes in the top 30% by variability of gene expression, in order to select genes related to the prognosis of refractory ovarian cancer.
  • The ovarian cancer patient may be Korean.
  • The method includes selecting a gene related to the prognosis of ovarian cancer from the data.
  • The step may first select, from the data, genes showing a large difference in the treatment prognosis of ovarian cancer (primary selection).
  • The difference in the treatment prognosis of ovarian cancer may be confirmed, for example, through an AUC analysis or a log-rank test of disease-free survival.
  • The step may include selecting secondary genes from among the primary selected genes through a literature search.
  • Specifically, the literature may be PMID (PubMed identifier) 28539603, 23160375, 27517492, and 26988033, and the secondary genes can be selected by confirming whether each primary selected gene is related to a molecular biological pathway that causes or prevents cancer.
  • Thereafter, genes related to the prognosis of ovarian cancer may be finally selected from the secondary genes through machine learning analysis.
  • The machine learning analysis may be performed, for example, through a random forest (RF) or decision tree model.
  • Random forest is a method proposed by Leo Breiman and Adele Cutler; it is a kind of bagging algorithm built from a combination of CART decision trees.
  • The nodes of each tree are structured so that high-dimensional data can be divided into small pieces of lower dimension and classified quickly.
  • The trees together produce the final classification by ensembling and voting.
  • Trees generated by random vectors with the same probability distribution are each constructed independently, and as the number of trees is taken to infinity the misclassification (generalization) error converges. RF relies on randomness and the out-of-bag (random selection without replacement) technique.
  • A decision tree model is a data mining analysis technique that finds decision rules based on a tree structure. It is a powerful and widely used technique that classifies a group of interest into several subgroups or makes predictions.
  • Decision tree algorithms differ in how the tree is formed, for example in their stopping rules and pruning.
  • Child nodes are formed by determining which predictor variable best separates the distribution of the target variable. Purity or another classification criterion is used to measure how well the distribution of the target variable is separated.
  • Stopping criterion: a rule that specifies that the current node becomes a terminal node without further splitting.
  • Pruning: decision trees with too many nodes are likely to have very large prediction errors when applied to new data. Therefore, it is desirable to remove inappropriate nodes from the formed decision tree and select a sub-tree of appropriate size as the final model.
  • The method according to an embodiment may further include verifying the prognostic value of the selected genes using The Cancer Genome Atlas (TCGA) data.
  • The term "TCGA (The Cancer Genome Atlas) data" refers to cancer-related genetic mutation data, a very large data set used to comprehensively describe the molecular-level changes that occur in cancer cells. Using the data, it is possible to list specific genomic or molecular-level changes in cancer, as well as to define a more meaningful classification system for cancer types and subtypes.
  • Specifically, the prognosis is predicted for the TCGA data using the random forest model built from the selected genes, and the patients are classified into a group predicted to have a good prognosis and a group predicted to have a poor prognosis. The selection can then be verified by checking whether disease-free survival differs between the two groups as predicted by the random forest model.
  • Another aspect provides a biomarker panel for diagnosing ovarian cancer constructed by the method. Another aspect provides a biomarker panel for ovarian cancer diagnosis comprising an agent for measuring the expression level of one or more biomarkers selected from the group consisting of ARAF, ARFGAP2, IFIT1, IRF9, LMO7, MTA2, POLR2A, RAB12, RARG, RPL23P8, RPL23, RTCA, SCAP, SELENON, SERP1, SIAH2, TES2, TIMM17B, TLN1, USP19 and ZFAND6.
  • In one embodiment, the biomarker panel may include an agent for measuring the expression level of a biomarker of LMO7, RAB12, RPL23, TES, USP19, or a combination thereof. In another embodiment, the biomarker panel may include an agent for measuring the expression level of a biomarker of RPL23, USP19, or a combination thereof.
  • The term "biomarker panel" refers to a panel constructed using any combination of biomarkers for ovarian cancer diagnosis, and the combination may refer to the entire set or to any subset or subcombination thereof. That is, the biomarker panel may refer to one set of biomarkers and may refer to any form of the biomarker to be measured. Thus, when RPL23 is part of a biomarker panel, for example, either RPL23 mRNA or RPL23 protein can be considered part of that panel. While individual biomarkers are useful as diagnostic agents, combinations of biomarkers can sometimes provide greater value than a single biomarker in determining a particular condition. Specifically, detecting a plurality of biomarkers in a sample may increase the sensitivity and/or specificity of the test.
  • In one embodiment, the biomarker panel comprises 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21 or more biomarker types.
  • In one embodiment, the biomarker panel is composed of the minimum number of biomarkers that yields the maximum amount of information.
  • In another embodiment, the biomarker panel consists of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21 or more biomarker types.
  • When a biomarker panel consists of "a set of biomarkers", no biomarkers are present other than those that make up the set.
  • In separate embodiments, the biomarker panel consists of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, or 21 of the biomarkers disclosed herein.
  • The biomarkers of the present invention show a statistically significant difference in the diagnosis of ovarian cancer.
  • In one embodiment, a diagnostic test using such biomarkers alone or in combination exhibits a sensitivity and specificity of about 85% or greater, about 90% or greater, about 95% or greater, about 98% or greater, or about 100%.
  • The biomarker may be obtained from transcript data of cells. Details of the data are the same as described above.
  • The agent for measuring the expression level of the biomarker may be a primer pair, a probe, or an antisense nucleotide. Specifically, it may be an agent for measuring the mRNA level of the biomarker gene, and may be a primer pair, a probe, or an antisense nucleotide that specifically binds to the gene.
  • The biomarker panel may include at least two primer pairs, probes or antisense nucleotides, each of which is directed to one of ARAF, ARFGAP2, IFIT1, IRF9, LMO7, MTA2, POLR2A, RAB12, RARG, RPL23P8, RPL23, RTCA, SCAP, SELENON, SERP1, SIAH2, TES2, TIMM17B, TLN1, USP19 and ZFAND6.
  • The agent for measuring the expression level of the biomarker may be an antibody.
  • The antibody may be a monoclonal antibody, for example a monoclonal antibody that specifically binds to any of the above biomarkers.
  • The biomarker panel comprises at least 21 antibodies, each of which is directed to one of ARAF, ARFGAP2, IFIT1, IRF9, LMO7, MTA2, POLR2A, RAB12, RARG, RPL23P8, RPL23, RTCA, SCAP, SELENON, SERP1, SIAH2, TES2, TIMM17B, TLN1, USP19 and ZFAND6.
  • Another aspect provides a method of predicting the prognosis of ovarian cancer, the method comprising: measuring, in a sample isolated from a subject, the expression level of one or more biomarkers selected from the group consisting of ARAF, ARFGAP2, IFIT1, IRF9, LMO7, MTA2, POLR2A, RAB12, RARG, RPL23P8, RPL23, RTCA, SCAP, SELENON, SERP1, SIAH2, TES2, TIMM17B, TLN1, USP19 and ZFAND6; and comparing the expression level of the biomarker with the corresponding result for the same marker in a control sample.
  • The method according to one embodiment includes measuring, in a sample isolated from an individual, the expression level of one or more biomarkers selected from the group consisting of ARAF, ARFGAP2, IFIT1, IRF9, LMO7, MTA2, POLR2A, RAB12, RARG, RPL23P8, RPL23, RTCA, SCAP, SELENON, SERP1, SIAH2, TES2, TIMM17B, TLN1, USP19 and ZFAND6.
  • The biomarker level in the sample can be measured by measuring the mRNA level or protein level of one or more biomarker genes selected from the group consisting of ARAF, ARFGAP2, IFIT1, IRF9, LMO7, MTA2, POLR2A, RAB12, RARG, RPL23P8, RPL23, RTCA, SCAP, SELENON, SERP1, SIAH2, TES2, TIMM17B, TLN1, USP19 and ZFAND6. Specifically, measuring the mRNA level is a process of determining the presence and expression level of the mRNA of the genes in a sample from an individual in order to diagnose cancer, by measuring the amount of mRNA.
  • Analysis methods for this include RT-PCR (reverse transcription polymerase chain reaction), competitive RT-PCR, real-time RT-PCR, RNase protection assay, Northern blotting, and a DNA chip.
  • Protein level measurement is a process of confirming the presence and expression level of a cancer diagnostic marker protein in a sample from an individual in order to diagnose cancer. The amount of the protein can be determined using an antibody that specifically binds to the marker protein, or the protein expression level itself can be measured without using an antibody.
  • Available protein level measurement or comparative analysis methods include protein chip analysis, immunoassay, ligand binding assay, MALDI-TOF (Matrix-Assisted Laser Desorption/Ionization Time of Flight Mass Spectrometry) analysis, SELDI-TOF (Surface Enhanced Laser Desorption/Ionization Time of Flight Mass Spectrometry) analysis, radioimmunoassay, radial immunodiffusion, Ouchterlony immunodiffusion, rocket immunoelectrophoresis, tissue immunostaining, complement fixation assay, two-dimensional electrophoresis analysis, liquid chromatography-mass spectrometry (LC-MS), LC-MS/MS (liquid chromatography-mass spectrometry/mass spectrometry), Western blot, and ELISA (enzyme-linked immunosorbent assay).
  • The method comprises comparing the expression level of the biomarker with the corresponding result for the same marker in a control sample. For example, when the expression level of the biomarker is significantly increased compared to a control group, it may be determined that the prognosis of ovarian cancer is poor.
  • The method comprises analyzing transcript expression data of ovarian cancer patients with clear differences in treatment prognosis and performing machine learning analysis, and can construct a composite biomarker with higher diagnostic ability for ovarian cancer than a single biomarker.
  • The composite biomarker constructed according to the method can help provide more appropriate treatment to patients by predicting the prognosis of patients with refractory ovarian cancer.
  • FIG. 1 is a schematic diagram of a method for selecting a composite biomarker by selecting prognosis-related genes and constructing a model.
  • FIG. 2A is a graph of gene importance from a random forest prediction model constructed with five intractable ovarian cancer genes.
  • FIG. 2B shows a decision tree model constructed with five intractable ovarian cancer genes.
  • FIG. 2C shows a random forest model constructed with five intractable ovarian cancer genes.
  • FIG. 2D is a graph showing the results of verification with TCGA data after constructing a random forest model with two genes estimated to have good predictive power for intractable ovarian cancer.
  • RNA samples were prepared from ovarian cancer tissues of 54 patients in their 30s to 70s who underwent ovarian cancer surgery at CHA Bundang Hospital. Specifically, RNA was extracted from ovarian cancer tissue using Trizol according to Invitrogen's protocol, and the RNA was then quantified using NanoDrop200. Thereafter, the integrity of the RNA was confirmed using the Agilent 200 Bioanalyzer.
  • Most of the ovarian cancer patients were stage 3, but patients from stage 1 to stage 4 were included. Ages ranged from the 30s to the 70s, with the 50s being the most common.
  • Patients at stage 1 or 2 were excluded from the analysis.
  • The patient groups included 6 patients in the No recurrence (NR) group, who had no recurrence.
  • Example 2: Gene selection related to the prognosis of intractable ovarian cancer through expression data analysis
  • The genes in the top 30% by variability of expression level were selected using the median absolute deviation (MAD). At this stage, the genes are selected irrespective of the groups shown in Table 1.
  • Gene expression analysis was performed using the mRNA sequencing pipeline of The Cancer Genome Atlas (TCGA). MapSplice v2.1.6 was used to map the transcripts, and the mRNA expression of each gene was quantified using RSEM-1.2.5 and NCBI hg19 RefSeq.
  • Genes with an AUC value of 0.75 or more that were also statistically significant in the log-rank analysis of disease-free survival were selected and are shown in Table 2 below.
  • Genes that predicted the patient's prognosis group well were selected using the ROCR R package, and the survival R package was used for the disease-free survival analysis.
  • RPL23 and USP19 have the largest values of MeanDecreaseAccuracy and MeanDecreaseGini. These measures quantify how much the performance of the random forest prediction model decreases when a gene is excluded; the larger the value, the more important the gene is to the predictive power of the model. Therefore, as shown in FIG. 2A, when gene importance was assessed through the random forest model, RPL23 and USP19 were the most important genes by both the MeanDecreaseAccuracy and MeanDecreaseGini measures.
  • FIG. 2B shows a decision tree model constructed with five intractable ovarian cancer genes.
  • In the decision tree, USP19 was used the most: the groups were first split by USP19, and the next split was made using the TES gene. Therefore, it can be seen that USP19 is the most important gene among the intractable ovarian cancer genes.
  • TCGA data were used for verification.
  • In the TCGA data, there were 205 patients with stage 3 or higher. Since the clinical characteristics of the 205 patients are not clearly known from the TCGA data, the random forest model was used to divide them into a group predicted to have a good prognosis and a group predicted to have a poor prognosis, and it was verified whether the two groups actually differed in prognosis.
  • FIG. 2C shows a random forest model constructed with five intractable ovarian cancer genes. As shown in FIG. 2C, when the random forest model constructed with the five ovarian cancer genes was applied to the TCGA test set, the 16 patients predicted to have a good prognosis and the 189 patients predicted to have a poor prognosis differed in disease-free survival as predicted.
  • FIG. 2D is a graph showing the results of verification with TCGA data after constructing a random forest model with two genes estimated to have good predictive power for intractable ovarian cancer.
  • As shown in FIG. 2D, the disease-free survival of the 53 patients predicted to have a good prognosis and the 152 patients predicted to have a poor prognosis differed at a statistically significant level, as predicted.
  • The p-value for disease-free survival was better for the random forest model built with two genes than for the model built with five genes. Therefore, it can be seen that the two-gene random forest model is a suitable model for predicting the prognosis of refractory ovarian cancer.
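The validation step above compares disease-free survival between the group predicted to have a good prognosis and the group predicted to have a poor prognosis. A minimal sketch of a two-group log-rank test is given below. The source reports using the survival R package, so this standard-library Python version, the function name `logrank_test`, and the toy follow-up times are assumptions for illustration only, not the authors' analysis.

```python
import math

def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test.

    times_*  : follow-up time for each patient
    events_* : 1 if recurrence/death was observed, 0 if censored
    Returns (chi-square statistic with 1 degree of freedom, p-value).
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})

    obs_minus_exp = 0.0  # observed minus expected events in group A
    variance = 0.0
    for t in event_times:
        n_a = sum(1 for ti, _, g in data if ti >= t and g == 0)  # at risk, A
        n_b = sum(1 for ti, _, g in data if ti >= t and g == 1)  # at risk, B
        d_a = sum(1 for ti, e, g in data if ti == t and e and g == 0)
        d_b = sum(1 for ti, e, g in data if ti == t and e and g == 1)
        n, d = n_a + n_b, d_a + d_b
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)

    if variance == 0:
        return 0.0, 1.0
    chi2 = obs_minus_exp ** 2 / variance
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical disease-free survival times (months); all events observed.
poor = [1, 2, 3, 4, 5]
good = [10, 11, 12, 13, 14]
chi2, p = logrank_test(poor, [1] * 5, good, [1] * 5)
print(f"chi2={chi2:.2f}, p={p:.4f}")
```

A small p-value here would indicate that the two predicted groups differ in disease-free survival, which is the check described for FIG. 2C and FIG. 2D.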

Abstract

The present invention relates to a method for forming a biomarker panel for diagnosing ovarian cancer and to a biomarker panel for diagnosing ovarian cancer. The biomarker panel is formed through transcript expression data analysis and machine learning analysis of ovarian cancer patients with clear differences in treatment prognosis, so that the prognosis of ovarian cancer patients can be predicted and more appropriate treatment can be provided to patients.

Description

Method of constructing a biomarker panel for ovarian cancer diagnosis and biomarker panel for ovarian cancer diagnosis
The present invention relates to a method of constructing a biomarker for ovarian cancer diagnosis.
About 70% of patients with intractable ovarian cancer (high-grade serous ovarian cancer) eventually die of the disease. Chemotherapy after surgery is the standard treatment, but approximately 25% of patients relapse within 6 months after chemotherapy and the overall probability of surviving 5 years or more is only 31%, so new treatments beyond the standard treatment need to be developed. To support the development of such treatments, The Cancer Genome Atlas (TCGA) research consortium generated and analyzed mutation data (exome data), expression data (mRNA data), and epigenetics data from about 300 ovarian cancer patients to characterize the patients. However, this earlier TCGA study did not find patient clusters or a list of gene expression markers showing a large difference in prognosis.
Therefore, there is a need to secure a list of expressed-gene markers for predicting the prognosis of patients with refractory ovarian cancer.
One aspect is to provide a method of constructing a biomarker panel for ovarian cancer diagnosis, the method comprising: obtaining gene expression data of a plurality of individual biomarkers from a sample isolated from a subject; and selecting, from the data, a gene related to the prognosis of ovarian cancer.
Another aspect is to provide a biomarker panel for ovarian cancer diagnosis constructed by the method.
Another aspect is to provide a biomarker panel for ovarian cancer diagnosis comprising an agent for measuring the expression level of one or more biomarkers selected from the group consisting of ARAF, ARFGAP2, IFIT1, IRF9, LMO7, MTA2, POLR2A, RAB12, RARG, RPL23P8, RPL23, RTCA, SCAP, SELENON, SERP1, SIAH2, TES2, TIMM17B, TLN1, USP19 and ZFAND6.
Another aspect is to provide a method of predicting the prognosis of ovarian cancer, the method comprising: measuring, in a sample isolated from a subject, the expression level of one or more biomarkers selected from the group consisting of ARAF, ARFGAP2, IFIT1, IRF9, LMO7, MTA2, POLR2A, RAB12, RARG, RPL23P8, RPL23, RTCA, SCAP, SELENON, SERP1, SIAH2, TES2, TIMM17B, TLN1, USP19 and ZFAND6; and comparing the level of the biomarker with the corresponding result for the same marker in a control sample.
One aspect provides a method of constructing a biomarker panel for ovarian cancer diagnosis, the method comprising: obtaining gene expression data of a plurality of individual biomarkers from a sample isolated from a subject; and selecting, from the data, a gene related to the prognosis of ovarian cancer.
The method according to one embodiment includes acquiring gene expression data of a plurality of individual biomarkers from a sample isolated from a subject. The subject is a subject for diagnosing ovarian cancer, for example, a subject for predicting the likelihood of ovarian cancer, a subject for diagnosing the state of ovarian cancer, a subject for determining prognosis, a subject for determining the dosage of a drug for preventing or treating ovarian cancer, or a subject for determining a treatment method according to the progression of ovarian cancer. The subject may be a vertebrate, specifically a mammal, amphibian, reptile, or bird, and more specifically a mammal, for example a human (Homo sapiens), and may be Korean. The sample may include a tissue, cell, whole blood, serum, plasma, saliva, sputum, cerebrospinal fluid, or urine sample isolated from the subject.
The data may be transcript expression data of a patient with ovarian cancer. The data may be obtained, for example, by using the median absolute deviation (MAD) to select the genes in the top 30% by variability of gene expression, in order to select genes related to the prognosis of refractory ovarian cancer. The ovarian cancer patient may be Korean.
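The MAD-based filter described above can be sketched as follows. This is a minimal illustration in Python using only the standard library, not the authors' pipeline. The function names `mad` and `top_variable_genes`, the dictionary layout, and the toy gene names and values are assumptions made for the example. MAD is taken here as the median of the absolute deviations from each gene's median expression.

```python
from statistics import median

def mad(values):
    """Median absolute deviation of one gene's expression values."""
    center = median(values)
    return median([abs(v - center) for v in values])

def top_variable_genes(expression, fraction=0.30):
    """Return the genes whose MAD falls in the top `fraction` of all genes.

    expression: dict mapping gene name -> list of expression values,
                one value per patient (hypothetical layout).
    """
    scores = {gene: mad(values) for gene, values in expression.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    keep = max(1, int(len(ranked) * fraction))
    return ranked[:keep]

# Toy data: ten made-up genes whose spread grows with the index.
expression = {f"g{i}": [0, i, 2 * i, 3 * i, 4 * i] for i in range(10)}
print(top_variable_genes(expression))  # the three most variable genes
```

The filter is unsupervised: it looks only at the spread of each gene across patients and does not use the prognosis groups.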
The method according to one embodiment includes selecting a gene related to the prognosis of ovarian cancer from the data. The step may first select, from the data, genes showing a large difference in the treatment prognosis of ovarian cancer (primary selection). The difference in the treatment prognosis of ovarian cancer may be confirmed, for example, through an AUC analysis or a log-rank test of disease-free survival. The step may include selecting secondary genes from among the primary selected genes through a literature search. Specifically, the literature may be PMID (PubMed identifier) 28539603, 23160375, 27517492, and 26988033, and the secondary genes can be selected by confirming whether each primary selected gene is related to a molecular biological pathway that causes or prevents cancer. Thereafter, genes related to the prognosis of ovarian cancer may be finally selected from the secondary genes through machine learning analysis. The machine learning analysis may be performed, for example, through a random forest (RF) or decision tree model.
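The AUC criterion used in the primary selection can be illustrated with a short sketch. The source reports using R packages for this step, so the Python code below, the function name `auc`, the 0.75 threshold applied as a simple filter, and the toy expression values are assumptions for illustration. The sketch computes AUC as the probability that a randomly chosen patient from one prognosis group has a higher expression value than a randomly chosen patient from the other group, with ties counted as one half. It also assumes that higher expression points to the first group.

```python
def auc(scores_pos, scores_neg):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# Hypothetical expression values of two made-up genes in two prognosis groups.
genes = {
    "geneA": {"poor": [5.1, 6.0, 6.4, 7.2], "good": [3.0, 4.1, 4.8, 5.5]},
    "geneB": {"poor": [2.0, 3.5, 4.0, 5.0], "good": [2.5, 3.0, 4.5, 5.5]},
}

# Keep genes whose AUC reaches the 0.75 cutoff mentioned in the text.
selected = [
    name
    for name, groups in genes.items()
    if auc(groups["poor"], groups["good"]) >= 0.75
]
print(selected)
```

In practice a gene that is lower in the poor-prognosis group would give an AUC below 0.5 under this convention, so a real analysis would need to decide how to treat direction; that choice is not specified in the text.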
As used herein, the term "random forest (RF)" refers to a bagging-type algorithm built from a combination of CART decision trees, proposed by Leo Breiman and Adele Cutler. The nodes of each tree are arranged so that high-dimensional data can be split into small pieces of lower dimension and classified quickly. The trees are combined as an ensemble, and the final classification is made by voting. Trees generated from random vectors with the same probability distribution are constructed independently of one another, and as the number of trees approaches infinity the generalization error converges. By using randomness and the out-of-bag (random selection without replacement) technique, RF achieves accuracy comparable to AdaBoost, is robust to decision boundaries and noise, and converges faster than bagging and boosting.
The RF algorithm itself generates a plurality of (training data set, test data set) pairs from the given data (for example 50; the number is a user-adjustable option) and grows a decision tree from each, which yields 50 independent decision trees. When a test set is then supplied, each test sample receives 50 decisions (cancer/normal), one from each decision tree, and the final result is the majority vote among them. For example, if 45 decision trees classify person A as cancer and 5 classify A as normal, the average score (the proportion of the 50 decisions that are cancer) is 45/50 = 0.9. If the cut-off value separating cancer from normal is 0.5, A's average score of 0.9 exceeds 0.5, so A is classified as "cancer".
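The voting arithmetic in the example above can be written out directly. The sketch below only reproduces the majority-vote step; it does not train any trees, and the function names are hypothetical.

```python
def average_score(votes, positive="cancer"):
    """Fraction of per-tree decisions that equal the positive class."""
    return sum(1 for v in votes if v == positive) / len(votes)


def classify(votes, cutoff=0.5, positive="cancer", negative="normal"):
    """Call the sample positive when the average score exceeds the cut-off."""
    return positive if average_score(votes, positive) > cutoff else negative


# Person A from the text: 45 trees vote cancer, 5 vote normal.
votes_a = ["cancer"] * 45 + ["normal"] * 5
score_a = average_score(votes_a)
label_a = classify(votes_a, cutoff=0.5)
print(score_a, label_a)
```

Raising the cut-off above 0.5 would trade sensitivity for specificity; the value 0.5 is simply the one assumed in the text.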
As used herein, the term "decision tree model" refers to a data mining analysis technique that finds decision rules based on a tree structure. A decision tree charts the decision rules in order to classify a population of interest into several subgroups or to make predictions, and it is a powerful and widely used analysis technique. Common decision tree algorithms differ in how they form the tree, for example in their stopping rules and pruning. The rules used in a decision tree are as follows.
1. Splitting criterion: Child nodes are determined by finding which predictor variable, split in which way, best separates the distribution of the target variable. The degree of separation is measured using purity or another classification criterion.
2. Stopping criterion: A rule specifying that no further split is made and that the current node becomes a terminal node.
3. Pruning: A decision tree with too many nodes is likely to have a very large prediction error when applied to new data. It is therefore desirable to remove inappropriate nodes from the grown decision tree and to select, as the final model, a decision tree with a subtree structure of appropriate size.
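The splitting and stopping rules listed above can be illustrated with a small sketch. It assumes Gini impurity as the purity measure and a minimum leaf size as the stopping rule; both are common choices but are assumptions here, not details stated in the specification. Pruning is not shown, and all names and data are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(values, labels, min_leaf=1):
    """Find the threshold on one predictor that minimizes weighted Gini impurity.

    Returns (threshold, impurity), or None when no split satisfies the
    stopping rule (every candidate leaves a child smaller than `min_leaf`,
    or all predictor values are identical).
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = None
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between equal predictor values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        if len(left) < min_leaf or len(right) < min_leaf:
            continue  # stopping rule: child node would be too small
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        if best is None or impurity < best[1]:
            best = (threshold, impurity)
    return best


# Hypothetical expression values of one gene and the prognosis of each patient.
expr = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
prognosis = ["good", "good", "good", "poor", "poor", "poor"]
split = best_split(expr, prognosis)
print(split)
```

A full tree would apply `best_split` over every predictor at every node and recurse into the two children until the stopping rule fires.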
The method according to one embodiment may further include validating the prognostic value of the selected genes using The Cancer Genome Atlas (TCGA) data. As used herein, the term "TCGA (The Cancer Genome Atlas) data" refers to genetic variation data on cancer: a very large data set that can comprehensively describe the molecular-level changes occurring in cancer cells and that is used for bioinformatic analysis. With these data it is possible not only to catalogue specific genome-level or molecular-level changes found in cancer but also to define more meaningful classification systems for cancer types and subtypes.
In this validation, the random forest model is first applied to the TCGA data to predict prognosis, and the subjects are classified, based on the selected genes, into a group predicted to have a good prognosis and a group predicted to have a poor prognosis. The selection is then validated by checking whether disease-free survival actually differs between the two groups as the random forest model predicted.
Another aspect provides a biomarker panel for diagnosing ovarian cancer constructed by the above method. Still another aspect provides a biomarker panel for diagnosing ovarian cancer that includes an agent for measuring the expression level of one or more biomarkers selected from the group consisting of ARAF, ARFGAP2, IFIT1, IRF9, LMO7, MTA2, POLR2A, RAB12, RARG, RPL23P8, RPL23, RTCA, SCAP, SELENON, SERP1, SIAH2, TES2, TIMM17B, TLN1, USP19, and ZFAND6. In one embodiment, the biomarker panel may include an agent for measuring the expression level of LMO7, RAB12, RPL23, TES, USP19, or a combination thereof. In another embodiment, the biomarker panel may include an agent for measuring the expression level of RPL23, USP19, or a combination thereof.
As used herein, the term "biomarker panel" refers to a panel constructed from any combination of biomarkers for diagnosing ovarian cancer, where the combination may be the entire set or any subset or subcombination thereof. That is, a biomarker panel may refer to a set of biomarkers, and it may refer to any form of the biomarker that is measured. Thus, when RPL23 is part of a biomarker panel, either RPL23 mRNA or RPL23 protein, for example, may be regarded as part of the panel. Although individual biomarkers are useful as diagnostics, a combination of biomarkers can sometimes be of greater value in determining a particular condition than any single biomarker alone. Specifically, detecting a plurality of biomarkers in a sample may increase the sensitivity and/or specificity of the test. Thus, in one embodiment, the biomarker panel may include 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, or more biomarker types. In another embodiment, the biomarker panel consists of the minimum number of biomarkers that yields the maximum amount of information. Thus, in various embodiments, the biomarker panel consists of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, or more biomarker types. When a biomarker panel "consists of a set of biomarkers", no biomarkers other than those making up the set are present. In separate embodiments, the biomarker panel consists of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, or 21 of the biomarkers disclosed herein. The biomarkers of the present invention show statistically significant differences in the diagnosis of ovarian cancer.
In one embodiment, a diagnostic test using these biomarkers alone or in combination exhibits a sensitivity and specificity of about 85% or more, about 90% or more, about 95% or more, about 98% or more, or about 100%.
The biomarkers may be obtained from transcriptome data of cells. The details of the data are as described above.
The agent for measuring the expression level of the biomarker may be a primer pair, a probe, or an antisense nucleotide. Specifically, it may be an agent for measuring the mRNA level of the biomarker gene, and it may be a primer pair, probe, or antisense nucleotide that specifically binds to the gene. In one embodiment, the biomarker panel may include at least two primer pairs, probes, or antisense nucleotides, each of which may specifically bind to ARAF, ARFGAP2, IFIT1, IRF9, LMO7, MTA2, POLR2A, RAB12, RARG, RPL23P8, RPL23, RTCA, SCAP, SELENON, SERP1, SIAH2, TES2, TIMM17B, TLN1, USP19, or ZFAND6.
The agent for measuring the expression level of the biomarker may be an antibody. The antibody may be a monoclonal antibody, for example a monoclonal antibody that specifically binds to any of the above biomarkers. In one embodiment, the biomarker panel includes at least 21 antibodies, which may specifically bind to ARAF, ARFGAP2, IFIT1, IRF9, LMO7, MTA2, POLR2A, RAB12, RARG, RPL23P8, RPL23, RTCA, SCAP, SELENON, SERP1, SIAH2, TES2, TIMM17B, TLN1, USP19, and ZFAND6, respectively.
Another aspect provides a method of predicting the prognosis of ovarian cancer, the method including: measuring, in a sample isolated from a subject, the expression level of one or more biomarkers selected from the group consisting of ARAF, ARFGAP2, IFIT1, IRF9, LMO7, MTA2, POLR2A, RAB12, RARG, RPL23P8, RPL23, RTCA, SCAP, SELENON, SERP1, SIAH2, TES2, TIMM17B, TLN1, USP19, and ZFAND6; and comparing the expression level of the biomarker with the corresponding result for the same marker in a control sample.
The method according to one embodiment includes measuring, in a sample isolated from a subject, the expression level of one or more biomarkers selected from the group consisting of ARAF, ARFGAP2, IFIT1, IRF9, LMO7, MTA2, POLR2A, RAB12, RARG, RPL23P8, RPL23, RTCA, SCAP, SELENON, SERP1, SIAH2, TES2, TIMM17B, TLN1, USP19, and ZFAND6. The biomarker level may be measured by measuring the mRNA level or the protein level, in the sample, of one or more biomarker genes selected from the same group. Specifically, mRNA level measurement is a process of determining the presence and expression level of the mRNA of the genes in a sample from the subject in order to diagnose cancer, and it measures the amount of mRNA. Analysis methods for this purpose include reverse transcription polymerase chain reaction (RT-PCR), competitive RT-PCR, real-time RT-PCR, RNase protection assay (RPA), Northern blotting, and DNA chips. Protein level measurement is a process of determining the presence and expression level of a cancer diagnostic marker protein in a sample from the subject in order to diagnose cancer. The amount of protein may be determined using an antibody that specifically binds to the marker protein, or the protein expression level itself may be measured without using an antibody. Methods for measuring or comparatively analyzing protein levels include protein chip analysis, immunoassay, ligand binding assay, MALDI-TOF (matrix-assisted laser desorption/ionization time-of-flight mass spectrometry) analysis, SELDI-TOF (surface-enhanced laser desorption/ionization time-of-flight mass spectrometry) analysis, radioimmunoassay, radial immunodiffusion, Ouchterlony immunodiffusion, rocket immunoelectrophoresis, immunohistochemical staining, complement fixation assay, two-dimensional electrophoresis, liquid chromatography-mass spectrometry (LC-MS), liquid chromatography-tandem mass spectrometry (LC-MS/MS), Western blotting, and enzyme-linked immunosorbent assay (ELISA).
The method according to one embodiment includes comparing the expression level of the biomarker with the corresponding result for the same marker in a control sample. For example, when the expression level of the biomarker is significantly increased relative to the control, the prognosis of ovarian cancer may be judged to be poor.
The method according to one aspect includes analyzing transcriptome expression data of ovarian cancer patients with clearly different treatment prognoses and performing machine learning analysis, and it can thereby construct a composite biomarker with a higher diagnostic ability for ovarian cancer than a single biomarker. In addition, a composite biomarker constructed by the method can predict the prognosis of patients with refractory ovarian cancer and thus allow more appropriate treatment to be provided to them.
FIG. 1 is a schematic diagram of a method of selecting a composite biomarker by selecting genes related to prognosis and constructing a model.
FIG. 2a is a graph ranking five refractory ovarian cancer genes by importance in a random forest prediction model constructed from them.
FIG. 2b shows a decision tree model constructed from the five refractory ovarian cancer genes.
FIG. 2c shows a random forest model constructed from the five refractory ovarian cancer genes.
FIG. 2d is a graph showing the result of validating, with TCGA data, a random forest model constructed from the two genes estimated to have good predictive power for refractory ovarian cancer.
Hereinafter, preferred examples are presented to aid understanding of the present invention. The following examples are provided only to make the present invention easier to understand, and they do not limit its scope.
[Examples]
Example 1. Preparation of Patient Samples
This study was approved by the Institutional Review Board of CHA Bundang Hospital. Samples were prepared by isolating RNA from the ovarian cancer tissue of 54 patients, aged from their 30s to their 70s, who underwent ovarian cancer surgery at CHA Bundang Hospital. Specifically, RNA was extracted from the ovarian cancer tissue with Trizol according to the Invitrogen protocol and quantified with a NanoDrop200. An Agilent 200 Bioanalyzer was then used to confirm that the RNA showed no abnormalities.
[Table 1]
Characteristic | Category | Number of patients (n = 45)
Age | <40 | 2
Age | [40, 50) | 12
Age | [50, 60) | 20 (44.4%)
Age | [60, 70) | 6
Age | [70, 80) | 5
Stage | Stage 3 | 40 (88.9%)
Stage | Stage 4 | 5
Group | No recurrence | 6
Group | Recurrence: platinum sensitive | 27
Group | Recurrence: platinum resistant | 6
Group | Refractory | 6
Most of the ovarian cancer patients in Table 1 were at stage 3, although the patients enrolled ranged from stage 1 to stage 4. Ages ranged from the 30s to the 70s, with patients in their 50s being the most numerous. Analysis confirmed that stage is strongly related to prognosis prediction and that expression patterns are also strongly related to stage and differ greatly between stages. Patients at stage 1 or 2 were therefore excluded from the analysis. A total of 45 patients had refractory ovarian cancer detected at stage 3 or higher. Of these 45 patients, 6 had a good prognosis and 39 had a poor prognosis, having relapsed or failed to respond to treatment. The patients comprised 6 in the no recurrence (NR) group, 27 in the platinum-sensitive (PS) group, who relapsed after 6 months, 6 in the platinum-resistant (PR) group, who relapsed within 6 months, and 6 in the refractory (RF) group, who responded very poorly. The 6 patients without recurrence form a group with a very good prognosis, whereas the platinum-resistant group, which relapsed within 6 months, and the refractory group have a very poor prognosis. Depending on the timing of recurrence, the groups can therefore be divided into three (NR vs. PS vs. PR+RF) or into two (NR vs. the other groups, or PR+RF vs. the other groups).
Example 2. Selection of Genes Related to the Prognosis of Refractory Ovarian Cancer by Expression Data Analysis
To select genes related to the prognosis of refractory ovarian cancer, the median absolute deviation (MAD) was used to select the genes whose expression varied most, specifically those in the top 30%. This selection did not take the groups shown in Table 1 into account. Gene expression was analyzed with the mRNA sequencing pipeline of The Cancer Genome Atlas (TCGA): reads were mapped to transcripts with MapSplice v2.1.6, and the amount of mRNA expressed from each gene was measured with RSEM-1.2.5 and NCBI hg19 RefSeq. The AUC (area under the ROC curve) was then analyzed by group, and genes with an AUC of 0.75 or higher that were also statistically significant in the log-rank analysis of disease-free survival (DFS) were selected; they are shown in Table 2 below. Genes that predicted the group well according to patient prognosis were selected with the ROCR statistical package, and the disease-free survival analysis used the survival R package.
[Table 2]
No. | Gene | MAD | AUC | DFS p-value | Full gene name
1 | ARAF | 18.8 | 0.88 | 0.038 | A-Raf proto-oncogene
2 | ARFGAP2 | 22.5 | 0.88 | 0.025 | ADP ribosylation factor GTPase activating protein 2
3 | IFIT1 | 91.6 | 0.79 | 0.039 | Interferon induced protein with tetratricopeptide repeats
4 | IRF9 | 48.2 | 0.82 | 0.010 | Interferon regulatory factor 9
5 | LMO7 | 15.1 | 0.88 | 0.006 | LIM domain 7
6 | MTA2 | 19.8 | 0.90 | 0.04 | Metastasis associated 1 family member 2
7 | POLR2A | 23.3 | 0.90 | 0.045 | RNA polymerase II subunit A
8 | RAB12 | 13.5 | 0.79 | 0.013 | RAB12, member RAS oncogene family
9 | RARG | 17.2 | 0.93 | 0.043 | Retinoic acid receptor gamma
10 | RPL23P8 | 135 | 0.85 | 0.014 | Ribosomal protein L23 pseudogene 8
11 | RPL23 | 769 | 0.86 | 0.042 | Ribosomal protein L23
12 | RTCA | 11 | 0.79 | 0.047 | RNA 3'-terminal phosphate cyclase
13 | SCAP | 14.1 | 0.91 | 0.005 | SREBP chaperone
14 | SELENON | 20.2 | 0.90 | 0.022 | Selenoprotein N
15 | SERP1 | 48.9 | 0.88 | 0.010 | Stress associated endoplasmic reticulum protein 1
16 | SIAH2 | 21.6 | 0.88 | 8.0e-4 | Siah E3 ubiquitin protein ligase 2
17 | TES2 | 36.6 | 0.87 | 0.037 | Testin LIM domain protein
18 | TIMM17B | 26.6 | 0.88 | 0.010 | Translocase of inner mitochondrial membrane 17B
19 | TLN1 | 28.2 | 0.86 | 0.044 | Talin 1
20 | USP19 | 12.2 | 0.88 | 0.009 | Ubiquitin specific peptidase 19
21 | ZFAND6 | 40.2 | 0.76 | 0.004 | Zinc finger AN1-type containing 6
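The two selection criteria of Example 2 (AUC of 0.75 or higher and a significant log-rank result) can be sketched as follows. The AUC here is computed as the fraction of correctly ordered (poor, good) patient pairs, which equals the area under the ROC curve; the specification used the ROCR package, so this is an equivalent hand-written calculation rather than the original code. The significance level of 0.05 is an assumption (the text says only "statistically significant"), and the function names and toy expression values are hypothetical.

```python
def auc(scores_pos, scores_neg):
    """Area under the ROC curve as a pairwise ranking probability.

    Counts the fraction of (positive, negative) pairs in which the positive
    sample has the higher score; ties count as half.
    """
    wins = 0.0
    for p in scores_pos:
        for q in scores_neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def passes_filter(auc_value, dfs_p_value, auc_min=0.75, alpha=0.05):
    """Keep a gene when AUC >= auc_min and the DFS log-rank p-value < alpha.

    alpha = 0.05 is an assumed significance level, not stated in the source.
    """
    return auc_value >= auc_min and dfs_p_value < alpha


# Hypothetical expression of one gene in poor-prognosis and good-prognosis patients.
poor = [3.0, 4.0, 5.0]
good = [1.0, 2.0, 3.0]
toy_auc = auc(poor, good)
print(toy_auc)

# AUC and DFS p-value reported for ZFAND6 in Table 2.
print(passes_filter(0.76, 0.004))
```

A gene whose expression is lower in the poor-prognosis group would give an AUC below 0.5 under this orientation, so in practice the direction of comparison has to be fixed per gene before the threshold is applied.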
Example 3. Construction of Random Forest and Decision Tree Models
Of the 21 genes shown in Table 2, five (LMO7, RAB12, RPL23, TES, and USP19) were selected in view of their relevance to cancer as reported in the literature. A random forest model and a decision tree model, both machine learning modeling methods, were then constructed for these five genes.
FIG. 2a is a graph ranking the five refractory ovarian cancer genes by importance in the random forest prediction model. RPL23 and USP19 have the largest MeanDecreaseAccuracy and MeanDecreaseGini values. These values quantify how much accuracy decreases when the random forest prediction model is determined without the gene in question; the larger the value, the more important the gene is to the predictive power of the model. As shown in FIG. 2a, when gene importance was assessed with the random forest model, RPL23 and USP19 were the most important genes by both the MeanDecreaseAccuracy and the MeanDecreaseGini measures.
FIG. 2b shows the decision tree model constructed from the five refractory ovarian cancer genes. Of the five genes, USP19 was the most important: the decision tree first splits the groups on USP19 and then uses the TES gene to determine the next grouping. USP19 is therefore the most important of the refractory ovarian cancer genes.
유전자 5개와 2개 각각 랜덤 포레스트 모델을 구축한 후, 검증을 위하여 TCGA 데이터를 사용하였다. TCGA 데이터에서 병기가 3기 이상인 사람은 모두 205명이었다. TCGA 데이터에서 205명 환자들의 임상적인 특징을 명확히 알 수 없기 때문에, 랜덤 포레스트 모델에서 예후가 좋을 것이라 예측되는 군과 예후가 좋지 않을 것이라 예측되는 군으로 나누어 두 군에서 무병생존이 통계적으로 유의미한 차이를 보이는지 여부를 검증하였다. After constructing a random forest model for each of 5 and 2 genes, TCGA data were used for verification. In the TCGA data, there were 205 patients with stage 3 or higher. Since the clinical characteristics of the 205 patients are not clearly known from the TCGA data, the random forest model was divided into a group predicted to have a good prognosis and a group predicted to have a poor prognosis. It was verified whether it was visible or not.
FIG. 2c shows the random forest model built with the five intractable ovarian cancer genes. As shown in FIG. 2c, when this model was applied to the TCGA patients, which served as the test set, disease-free survival differed as predicted between the 16 patients predicted to have a good prognosis and the 189 patients predicted to have a poor prognosis.
In addition, FIG. 2d is a graph showing the result of validating, with TCGA data, a random forest model built with the two genes estimated to have good predictive power for intractable ovarian cancer. As shown in FIG. 2d, disease-free survival differed as predicted, at a statistically significant level, between the 53 patients predicted to have a good prognosis and the 152 patients predicted to have a poor prognosis. Moreover, the disease-free survival p-value was also better for the two-gene random forest model than for the five-gene model. The two-gene random forest model is therefore a suitable model for predicting the prognosis of intractable ovarian cancer.

Claims (12)

  1. A method of constructing a biomarker panel for diagnosing ovarian cancer, the method comprising: acquiring gene expression data of a plurality of individual biomarkers from a sample isolated from a subject; and
    selecting, from the data, genes related to the prognosis of ovarian cancer.
  2. The method of claim 1, wherein the selecting of the genes is performed using the median absolute deviation (MAD).
  3. The method of claim 1, further comprising verifying the prognosis of the selected genes using TCGA (The Cancer Genome Atlas) data.
  4. The method of claim 1, wherein the data is information on the expression levels of the individual biomarker genes.
  5. The method of claim 1, wherein genes related to the prognosis of ovarian cancer are selected by generating expression data information of the individual biomarker genes using a random forest model and a decision tree model.
  6. The method of claim 1, wherein the ovarian cancer is stage 3 or stage 4.
  7. A biomarker panel for diagnosing ovarian cancer, constructed by the method of claim 1.
  8. A biomarker panel for diagnosing ovarian cancer, comprising an agent for measuring the expression level of one or more biomarkers selected from the group consisting of ARAF, ARFGAP2, IFIT1, IRF9, LMO7, MTA2, POLR2A, RAB12, RARG, RPL23P8, RPL23, RTCA, SCAP, SELENON, SERP1, SIAH2, TES2, TIMM17B, TLN1, USP19 and ZFAND6.
  9. The biomarker panel of claim 8, wherein the biomarker is obtained from transcriptome data of a cell.
  10. The biomarker panel of claim 8, wherein the agent for measuring the expression level of the biomarker is a primer pair, a probe, or an antisense nucleotide.
  11. The biomarker panel of claim 8, wherein the agent for measuring the expression level of the biomarker is an antibody.
  12. A method of predicting the prognosis of ovarian cancer, comprising: measuring, in a sample isolated from a subject, the expression level of one or more biomarkers selected from the group consisting of ARAF, ARFGAP2, IFIT1, IRF9, LMO7, MTA2, POLR2A, RAB12, RARG, RPL23P8, RPL23, RTCA, SCAP, SELENON, SERP1, SIAH2, TES2, TIMM17B, TLN1, USP19 and ZFAND6; and comparing the level of the biomarker with the corresponding result for that marker in a control sample.
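
As an illustration of the median absolute deviation (MAD) selection recited in claim 2, the sketch below ranks genes by the MAD of their expression values and keeps the most variable ones. How the MAD is applied (a top-k cut versus a fixed threshold) is not stated in the claims, so the top-k rule, the function names, and the placeholder gene names and values are all assumptions.

```python
from statistics import median

def mad(values):
    """Median absolute deviation: median of |x - median(x)|."""
    m = median(values)
    return median([abs(v - m) for v in values])

def select_by_mad(expression, top_k):
    """Return the names of the top_k genes with the largest MAD across samples."""
    ranked = sorted(expression, key=lambda gene: mad(expression[gene]), reverse=True)
    return ranked[:top_k]

# Hypothetical expression values per gene across five samples.
expression = {
    "GENE_A": [1, 2, 3, 4, 5],
    "GENE_B": [10, 10, 10, 10, 10],
    "GENE_C": [0, 5, 10, 15, 20],
}

selected = select_by_mad(expression, top_k=2)
print(selected)
```

Because the MAD is built from medians, a single outlying sample influences it less than it would influence the variance, which is one reason it is used to screen expression data for variable genes.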

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/KR2019/016773 WO2021107232A1 (en) 2019-11-29 2019-11-29 Method for forming biomarker panel for diagnosing ovarian cancer and biomarker panel for diagnosing ovarian cancer

Publications (1)

Publication Number Publication Date
WO2021107232A1 true WO2021107232A1 (en) 2021-06-03

Family

ID=76128792

Country Status (1)

Country Link
WO (1) WO2021107232A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20120087885A (en) * 2007-06-29 2012-08-07 안국약품 주식회사 Predictive markers for ovarian cancer
KR20170032892A (en) * 2017-03-13 2017-03-23 순천대학교 산학협력단 Selection method of predicting genes for ovarian cancer prognosis
KR20180036622A (en) * 2016-09-30 2018-04-09 서울대학교산학협력단 Apparatus and method for integrated analysis of gene expression omnibus's gene expression data
KR20190000168A (en) * 2017-06-22 2019-01-02 한국과학기술원 System and method for selecting multi-marker panels
KR20200055590A (en) * 2018-11-13 2020-05-21 의료법인 성광의료재단 Method for constructing a biomarker panel for diagnosing ovarian cancer and a biomarker panel for diagnosing ovarian cancer

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LUO N, GUO J, CHEN L, YANG W, QU X, CHENG Z: "ARHGAP10, downregulated in ovarian cancer, suppresses tumorigenicity of ovarian cancer cells", CELL DEATH AND DISEASE, vol. 7, 2016, pages e2157, XP055832616 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115232874A (en) * 2021-09-24 2022-10-25 四川大学华西第二医院 Application of long-chain non-coding RNA in regulation and control of ovarian cancer progression
CN115232874B (en) * 2021-09-24 2023-04-07 四川大学华西第二医院 Application of long-chain non-coding RNA in regulation and control of ovarian cancer progression

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 19954438; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 EP: PCT application non-entry in European phase (Ref document number: 19954438; Country of ref document: EP; Kind code of ref document: A1)