KR102042242B1

KR102042242B1 - Target gene screening method and apparatus based multi-omics data and survival analysis

Info

Publication number: KR102042242B1
Application number: KR1020180075201A
Authority: KR
Inventors: 공구; 김형용; 이정연
Original assignee: (주)인실리코젠
Priority date: 2018-06-29
Filing date: 2018-06-29
Publication date: 2019-11-07

Abstract

According to the present invention, a target gene screening method using survival analysis comprises the steps of: receiving whole-genome multi-omics data; selecting a plurality of genes amplified above a reference value based on gene expression data included in the multi-omics data, and determining a degree of association with a specific disease for each of the selected genes; and, for each of the selected genes, mathematically processing the degree of association together with the correlation between the expression of the selected gene and its gene copy number to determine a target score. A maximum significance cutoff for each gene is determined based on survival data included in the multi-omics data, and target genes are selected from the genes filtered based on the maximum significance cutoff.

Description

TARGET GENE SCREENING METHOD AND APPARATUS BASED MULTI-OMICS DATA AND SURVIVAL ANALYSIS

The technology described below relates to techniques for screening target genes for a specific disease using multi-omics data.

With the development of next-generation sequencing (NGS) technology, cancer (tumor) research is being conducted across a variety of omics layers, including the genome, transcriptome, epigenome, and proteome.

Multi-omics data contain multiple layers of omics data for the same patients, linked to clinical information. Integrated analysis of these data is important not only for understanding the causes of breast cancer but also for discovering new cancer target genes. For example, major public multi-omics datasets for breast cancer include METABRIC (Molecular Taxonomy of Breast Cancer International Consortium) and TCGA (The Cancer Genome Atlas).

Various software tools are known that help search for cancer target genes through multi-omics data analysis. cBioPortal (http://cbioportal.org) facilitates access to the major datasets of many publicly available cancer types, including TCGA, and provides various integrated analysis functions for each dataset, such as multi-omics data exploration, visualization, genetic variant analysis, inter-gene correlation analysis, survival analysis, and network analysis.

The recently developed breast cancer integrated analysis platform BCIP (Breast Cancer Integrated Platform, http://www.omicsnet.org/bcancer/) uses the breast cancer multi-omics data contained in TCGA, METABRIC, GEO, and other datasets to provide various functions, such as differential gene expression by clinical factor, correlation analysis, survival analysis, and gene function network analysis.

Wang XS, Prensner JR, Chen G, Cao Q, Han B, Dhanasekaran SM, Ponnala R, Cao X, Varambally S, Thomas DG, Giordano TJ, Beer DG, Palanisamy N, Sartor MA, Omenn GS, Chinnaiyan AM. An integrative approach to reveal driver gene fusions from paired-end sequencing data in cancer. Nat Biotechnol. 2009 Nov;27(11):1005-11.
Kim JA, Tan Y, Wang X, Cao X, Veeraraghavan J, Liang Y, Edwards DP, Huang S, Pan X, Li K, Schiff R, Wang XS. Comprehensive functional analysis of the tousled-like kinase 2 frequently amplified in aggressive luminal breast cancers. Nat Commun. 2016.

The prior art could not assess the clinical impact of individual gene abnormalities, and it was difficult to select target genes at the genome-wide level. Selecting breast cancer target genes at the genome-wide level requires jointly considering the ConSig score, which quantifies the association between a genetic variation and cancer, and the synthetic lethality network between genes, but existing web application tools do not provide these functions.

The technology described below aims to screen target genes at the whole-genome level using multi-omics data. It further aims to screen target genes based on a low-complexity survival analysis.

A target gene screening method using survival analysis includes: receiving multi-omics data for the whole genome; selecting a plurality of genes amplified above a reference value based on the gene expression data included in the multi-omics data, and determining the degree of association with a specific disease for each selected gene; and, for each selected gene, mathematically processing the degree of association together with the correlation between the expression of the selected gene and its gene copy number to determine a target score.

A target gene screening apparatus using survival analysis includes: an input device that receives multi-omics data for the whole genome; a storage device that stores a program which, using the multi-omics data, calculates the degree of association with a specific disease for each of a plurality of genes and calculates a target score by mathematically combining that degree of association with the correlation between the expression of each gene and its genetic state; and a computing device that uses the program to select, from the plurality of genes, those amplified above a reference value and determines the target score for the selected genes.

The technology described below screens target genes across the whole genome while reflecting clinical information, so genes can be selected with low complexity. Furthermore, it can select target genes that account for clinical impact, based on drug information for the target genes and a function that provides synthetic lethal targets.

FIG. 1 is an example of a flowchart of a target gene screening method using survival analysis.
FIG. 2 is another example of a flowchart of the target gene screening method using survival analysis.
FIG. 3 is another example of a flowchart of the target gene screening method using survival analysis.
FIG. 4 is another example of a flowchart of the target gene screening method using survival analysis.
FIG. 5 is an example of the process of determining the maximum significance cutoff for a gene.
FIG. 6 is an example of an analysis apparatus for screening target genes.
FIG. 7 is an example of the operation of the target gene screening apparatus.
FIG. 8 is an example of target gene screening results.
FIG. 9 is another example of the operation of the target gene screening apparatus.
FIG. 10 is another example of target gene screening results.
FIG. 11 is an example of the datasets and main functions used in the target gene screening system.
FIG. 12 is an example of an analysis screen of the target gene screening system.
FIG. 13 is another example of an analysis screen of the target gene screening system.
FIG. 14 is an example of survival analysis using the target gene screening system.
FIG. 15 is an example of a target gene screening procedure using the target gene screening system.
FIG. 16 is another example of a target gene screening procedure using the target gene screening system.

The technology described below may be modified in various ways and may have various embodiments; specific embodiments are illustrated in the drawings and described in detail. However, this is not intended to limit the technology described below to specific embodiments, and it should be understood to include all modifications, equivalents, and substitutes falling within its spirit and scope.

In the terms used in this specification, singular expressions should be understood to include plural expressions unless the context clearly indicates otherwise. Terms such as "include" indicate the presence of the stated features, numbers, steps, operations, components, parts, or combinations thereof, and do not preclude the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Before describing the drawings in detail, it should be made clear that the components in this specification are distinguished only by the main function each is responsible for. That is, two or more components described below may be combined into one component, or one component may be divided into two or more components by more specific function. In addition to its main function, each component described below may additionally perform some or all of the functions of other components, and some of the main functions of a component may be performed exclusively by another component.

In addition, when a method or operating method is performed, the steps constituting the method may occur in an order different from the stated order unless the context clearly specifies a particular order. That is, the steps may be performed in the stated order, substantially simultaneously, or in reverse order.

The technology described below is a technique for screening target genes associated with a specific disease. It screens target genes by applying the ConSig-Amp technique.

Target gene screening is performed by a computer device. The computer device may run a specific program to screen target genes: it may receive gene expression data and perform certain operations to determine target gene candidates. The computer device may be a PC, a smart device, a server, or the like. Target gene screening may also be performed through a platform running on a server. Hereinafter, the computer device that performs target gene screening is referred to as an analysis apparatus.

First, the conventional ConSig-Amp technique is briefly described. No existing technique supports target gene selection at the genome-wide level, but a similar technique exists, called ConSig (Concept Signature)-Amp analysis. The ConSig-Amp technique uses a multi-omics dataset. For genes amplified above a certain frequency in the ER+ subtype of the TCGA dataset, ConSig-Amp selects gene-amplification target genes using the ConSig-Amp score, which combines the ConSig score (a measure of cancer association) with the correlation coefficient between linear CNAs and gene expression.

The ConSig-Amp technique uses the notion of a concept. Molecular concepts include molecular interactions, gene annotations, and metabolic pathways. A signature concept places cancer-related genes within molecular concepts to indicate that particular genes are involved in cancer development and progression. In short, the ConSig-Amp technique is a model that analyzes genes and the various biological processes they take part in to determine whether a particular gene is involved in cancer development.

The process of determining the ConSig score is briefly described. Genes associated with cancer induction are identified in the input gene dataset, and a ConSig score for fusion or point mutations is calculated for the input genes. The ConSig score may be determined by Equation 1 below.

k is the number of concepts associated with a particular gene, n_i is the total number of genes, and x_i is the number of variant genes participating in concept i.

The ConSig score corresponds to the sum of the signals measured for the variant genes related to each concept i, and k also serves as the normalization factor of the score. The ConSig score indicates the degree to which a particular gene affects the development and progression of a particular cancer: a high ConSig score means that the gene is strongly involved in the variation. A two-dimensional ConSig score plot can show the association between genes and cancer.
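Equation 1 itself is not reproduced in this text, so the following is only a hypothetical sketch of "a k-normalized sum of per-concept signals" as described above. It assumes that the signal of concept i is the fraction x_i / n_i and that normalization is division by k; the actual ConSig formula may weight concepts differently, and the function name is illustrative.

```python
from typing import Sequence


def consig_score_sketch(x: Sequence[int], n: Sequence[int]) -> float:
    """Hypothetical ConSig-style score for one gene.

    x[i]: number of variant genes participating in concept i
    n[i]: gene count used to normalize concept i (the n_i of the text)
    ASSUMPTION: per-concept signal = x[i] / n[i], normalized by k.
    """
    if len(x) != len(n):
        raise ValueError("x and n must have one entry per concept")
    k = len(x)  # number of concepts associated with the gene
    if k == 0:
        return 0.0
    signal = sum(xi / ni for xi, ni in zip(x, n))  # summed per-concept signal
    return signal / k  # normalize by the number of concepts
```

For a gene linked to two concepts with x = [5, 2] and n = [10, 20], this sketch gives (0.5 + 0.1) / 2 = 0.3.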

The ConSig-Amp score may be determined by multiplying the ConSig score described above by a correlation coefficient. Here the correlation coefficient represents the correlation between gene expression and gene copy number. For example, DNA copy number variation (CNV) data, also described as copy number alteration, may be used for the gene copy number, and the Spearman correlation coefficient may be used as the correlation coefficient.
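A minimal sketch of the ConSig-Amp score described above, assuming that the per-sample copy-number values and expression values of one gene are given as two equal-length lists. Spearman's coefficient is computed as the Pearson correlation of average ranks (tied values share the mean of their ranks); all function names are illustrative, not taken from the original.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # 1-based mean rank of the tie run
        for t in range(i, j + 1):
            ranks[order[t]] = mean_rank
        i = j + 1
    return ranks


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_a * var_b)


def spearman(a: Sequence[float], b: Sequence[float]) -> float:
    """Spearman rank correlation (Pearson correlation of average ranks)."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples of size >= 2")
    return pearson(average_ranks(a), average_ranks(b))


def consig_amp_score(consig: float, copy_number: Sequence[float],
                     expression: Sequence[float]) -> float:
    """ConSig-Amp score: ConSig score x Spearman(copy number, expression)."""
    return consig * spearman(copy_number, expression)
```

A gene whose expression rises monotonically with copy number keeps its full ConSig score, while weaker or inverse relationships shrink or flip it.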

Based on the ConSig-Amp score, the ConSig-Amp technique can screen target genes closely related to a specific disease such as cancer. ConSig-Amp is an application of multi-omics analysis: a correlation analysis between genomic amplification and transcript (RNA) expression. Although ConSig-Amp is a useful way to find significant candidate targets, it does not include a step that considers clinical impact.

The technology described below is a new target gene screening technique that improves on the ConSig-Amp technique. It uses clinical information to screen target genes rapidly, even across the whole genome. FIG. 1 is an example of a flowchart of a target gene screening method 100 using survival analysis. FIG. 1 corresponds to a method based on the conventional ConSig-Amp technique.

First, the analysis apparatus acquires multi-omics data (110). Multi-omics data include data generated at various molecular levels, such as the genome, transcriptome, proteome, metabolome, epigenome, and lipidome. Multi-omics data may be obtained from existing public databases (DBs), may be assembled by collecting data from various DBs, and may include a researcher's experimental data. Genomic expression data may be generated through NGS analysis or received from the server of an analysis institution.

The analysis apparatus determines ConSig scores using the multi-omics data (120). It also uses the multi-omics data to analyze the correlation between gene expression and copy number. The analysis apparatus determines the target score by applying the correlation to the ConSig score (130); for example, it may multiply the ConSig score by the correlation, in which case the target score is the ConSig-Amp score. In some cases, the analysis apparatus may combine the ConSig score and the correlation through a different mathematical operation to produce a different value. The analysis apparatus screens target genes based on the final target scores (140). For example, it may select genes whose target score is at or above a predetermined reference value as candidate genes.
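Steps 130 and 140 can be sketched as follows, assuming each gene's ConSig score and expression/copy-number correlation have already been computed. The default combination is multiplication (giving the ConSig-Amp score); any other operation can be passed in, mirroring the "different mathematical operation" mentioned above. The gene names and the function name are hypothetical.

```python
from typing import Callable, Dict, List, Tuple


def screen_targets(
    genes: Dict[str, Tuple[float, float]],
    reference: float,
    combine: Callable[[float, float], float] = lambda consig, corr: consig * corr,
) -> List[Tuple[str, float]]:
    """Combine each gene's (ConSig score, correlation) into a target score
    (step 130) and keep genes at or above the reference value, best first
    (step 140)."""
    scored = [(name, combine(consig, corr)) for name, (consig, corr) in genes.items()]
    hits = [(name, score) for name, score in scored if score >= reference]
    return sorted(hits, key=lambda item: item[1], reverse=True)


# Usage with made-up values: target scores are 0.72, 0.10 and 0.63.
candidates = screen_targets({"A": (0.9, 0.8), "B": (0.5, 0.2), "C": (0.7, 0.9)}, 0.5)
```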

The per-gene quantitative information provided by multi-omics data includes copy-number variation (CNV); linear CNA information obtainable from microarrays or DNA-seq mapping depth; gene expression information obtainable from expression microarrays or mRNA-/miRNA-seq; and DNA methylation information obtainable from DNA methylation microarrays. The correlation used to compute the ConSig-Amp score is the relationship between gene expression and gene copy number. CNAs can therefore be used to determine gene copy number or copy number variation, and the correlation may be the CNA-gene expression relationship.

At this point, the analysis apparatus may extract survival information (survival data) from the multi-omics data or from separate data and perform a survival analysis for a particular gene. Based on that survival analysis, it may determine the maximum significance cutoff for the gene and filter the gene based on that cutoff. For example, the analysis apparatus may select target genes only from genes that are overexpressed relative to the maximum significance cutoff. The process of determining the maximum significance cutoff is described later. The analysis apparatus may determine a maximum significance cutoff for each gene.
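The text does not name the survival test. The sketch below assumes a two-group log-rank test, a common choice for comparing survival between an over-expression group and the remaining patients at a given expression cutoff; taking -log10 of the returned P-value gives the significance axis plotted in FIG. 5. Function names are illustrative.

```python
import math
from typing import List, Sequence


def split_by_cutoff(expression: Sequence[float], cutoff: float) -> List[bool]:
    """True marks the over-expression group (expression above the cutoff)."""
    return [x > cutoff for x in expression]


def logrank_p(times: Sequence[float], events: Sequence[int],
              high: Sequence[bool]) -> float:
    """Two-group log-rank test; returns the P-value (chi-square, 1 df).

    times:  follow-up time per patient
    events: 1 if the event (e.g. death) was observed, 0 if censored
    high:   True if the patient is in the over-expression group
    """
    if not (len(times) == len(events) == len(high)):
        raise ValueError("times, events and high must have equal length")
    observed_minus_expected = 0.0
    variance = 0.0
    for t in sorted({ti for ti, e in zip(times, events) if e}):
        at_risk = [i for i, ti in enumerate(times) if ti >= t]
        n = len(at_risk)
        n_high = sum(1 for i in at_risk if high[i])
        deaths = [i for i in at_risk if times[i] == t and events[i]]
        d = len(deaths)
        d_high = sum(1 for i in deaths if high[i])
        observed_minus_expected += d_high - d * n_high / n
        if n > 1:  # hypergeometric variance of d_high at time t
            variance += d * (n_high / n) * (1 - n_high / n) * (n - d) / (n - 1)
    if variance == 0:
        return 1.0  # groups indistinguishable (e.g. one group empty)
    chi2 = observed_minus_expected ** 2 / variance
    return math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square, 1 df
```

Each call costs one pass over the event times, which is why evaluating it at every possible cutoff for every gene becomes expensive in a genome-wide screen.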

The analysis apparatus may filter the genes to be analyzed at several stages. (1) One is during the ConSig score calculation: the analysis apparatus selects genes amplified above the reference value based on the gene expression data, and before or after this selection it may filter genes based on each gene's maximum significance cutoff (150). (2) Another is during the target score calculation: the analysis apparatus may filter genes based on the maximum significance cutoff and calculate target scores only for the genes that remain. (3) Alternatively, the analysis apparatus may filter candidate genes based on the maximum significance cutoff after all target scores have been calculated. In each case, the analysis apparatus filters genes by the maximum significance cutoff and selects target genes based on the target scores of the remaining genes.
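Options (2) and (3) above can be contrasted with a small sketch. `passes_cutoff` is a hypothetical per-gene flag, for example whether the survival P-value at that gene's maximum significance cutoff falls below a chosen level. Because the survival filter and the target-score threshold are independent conditions, both orders select the same genes in this sketch; filtering first simply avoids scoring genes that would be discarded anyway.

```python
from typing import Dict, Set


def filter_then_score(consig: Dict[str, float], corr: Dict[str, float],
                      passes_cutoff: Dict[str, bool], reference: float) -> Set[str]:
    """Option (2): drop genes failing the survival filter, then score the rest."""
    kept = [g for g in consig if passes_cutoff[g]]
    return {g for g in kept if consig[g] * corr[g] >= reference}


def score_then_filter(consig: Dict[str, float], corr: Dict[str, float],
                      passes_cutoff: Dict[str, bool], reference: float) -> Set[str]:
    """Option (3): score every gene, then apply the survival filter."""
    scores = {g: consig[g] * corr[g] for g in consig}
    return {g for g, s in scores.items() if s >= reference and passes_cutoff[g]}
```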

FIG. 2 is another example of a flowchart of the target gene screening method 100 using survival analysis. FIG. 2 corresponds to another embodiment of the target gene screening method 100 of FIG. 1.

Using the clinical information in the multi-omics data, the analysis apparatus keeps only the patients of a particular subtype or with particular clinical features (115). This enables target gene selection tailored to that subtype or clinical feature. Subsequent steps are performed only on the data filtered by that subtype or clinical feature.
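Step 115 amounts to selecting patients by clinical attributes and then restricting every omics matrix to those patients. A minimal sketch, assuming clinical records are dictionaries keyed by patient ID and each omics layer is a gene-to-{patient: value} mapping; the field name `er_status` and the function names are hypothetical.

```python
from typing import Dict, List


def filter_patients(clinical: Dict[str, Dict[str, str]], **criteria: str) -> List[str]:
    """Return IDs of patients whose clinical record matches every criterion."""
    return sorted(pid for pid, rec in clinical.items()
                  if all(rec.get(key) == value for key, value in criteria.items()))


def subset_samples(matrix: Dict[str, Dict[str, float]],
                   patients: List[str]) -> Dict[str, Dict[str, float]]:
    """Keep only the selected patients in a gene -> {patient: value} matrix."""
    keep = set(patients)
    return {gene: {p: v for p, v in row.items() if p in keep}
            for gene, row in matrix.items()}


clinical = {"P1": {"er_status": "positive"}, "P2": {"er_status": "negative"},
            "P3": {"er_status": "positive"}}
er_pos = filter_patients(clinical, er_status="positive")  # ["P1", "P3"]
```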

The conventional ConSig-Amp method considers only gene amplification among the various types of genetic variation. DNA methylation is also an important epigenetic alteration in cancer and can be an important target. Just as ConSig-Amp is based on a CNA-gene expression correlation analysis, a DNA methylation-gene expression correlation analysis can be used. The technique that uses the correlation between gene expression and DNA methylation is hereinafter called the ConSig-Met technique. The analysis apparatus may analyze this correlation using the DNA methylation information in the multi-omics data.

FIG. 3 is another example of a flowchart of a target gene screening method 200 using survival analysis. FIG. 3 illustrates the ConSig-Met technique.

The analysis apparatus acquires multi-omics data (210). It determines ConSig scores using the multi-omics data (220) and uses the multi-omics data to analyze the correlation between gene expression and DNA methylation. The analysis apparatus determines the target score by applying the correlation to the ConSig score (230); for example, it may multiply the ConSig score by the correlation. The target score calculated from the correlation between gene expression and DNA methylation is called the ConSig-Met score. In some cases, the analysis apparatus may combine the ConSig score and the correlation through a different mathematical operation to produce a different value. The analysis apparatus screens target genes based on the final target scores (240). For example, it may select genes whose target score is at or above a predetermined reference value as candidate genes.
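A minimal sketch of the ConSig-Met score for one gene, assuming per-sample methylation and expression values are given as equal-length lists. The text does not specify which correlation measure ConSig-Met uses or how the sign of the correlation is treated; this sketch assumes a Pearson correlation and keeps the sign as-is. Function names are illustrative.

```python
import math
from typing import Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples of size >= 2")
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_a * var_b)


def consig_met_score(consig: float, methylation: Sequence[float],
                     expression: Sequence[float]) -> float:
    """ConSig-Met score: ConSig score x methylation/expression correlation.
    ASSUMPTION: Pearson correlation, sign kept as-is."""
    return consig * pearson(methylation, expression)
```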

At this point, the analysis apparatus may extract survival information (survival data) from the multi-omics data or from separate data and perform a survival analysis for a particular gene. Based on that survival analysis, it may determine the maximum significance cutoff for the gene and filter the gene based on that cutoff (250). For example, the analysis apparatus may select target genes only from genes that are overexpressed relative to the maximum significance cutoff. The analysis apparatus may determine a maximum significance cutoff for each gene.

The analysis apparatus may filter the genes to be analyzed at several stages. (1) One is during the ConSig score calculation: the analysis apparatus selects genes amplified above the reference value based on the gene expression data, and before or after this selection it may filter genes based on each gene's maximum significance cutoff. (2) Another is during the target score calculation: the analysis apparatus may filter genes based on the maximum significance cutoff and calculate target scores only for the genes that remain. (3) Alternatively, the analysis apparatus may filter candidate genes based on the maximum significance cutoff after all target scores have been calculated. In each case, the analysis apparatus filters genes by the maximum significance cutoff and selects target genes based on the target scores of the remaining genes.

FIG. 4 is another example of a flowchart of the target gene screening method 200 using survival analysis. FIG. 4 is another embodiment of the target gene screening method of FIG. 3.

The analysis apparatus acquires multi-omics data (210). Using the clinical information in the multi-omics data, it keeps only the patients of a particular subtype or with particular clinical features (215). This enables target gene selection tailored to that subtype or clinical feature. Subsequent steps are performed only on the data filtered by that subtype or clinical feature.

The analysis apparatus determines ConSig scores using the multi-omics data (220) and uses the multi-omics data to analyze the correlation between gene expression and DNA methylation. The analysis apparatus determines the target score by applying the correlation to the ConSig score (230); for example, it may multiply the ConSig score by the correlation. The target score calculated from the correlation between gene expression and DNA methylation is called the ConSig-Met score. In some cases, the analysis apparatus may combine the ConSig score and the correlation through a different mathematical operation to produce a different value. The analysis apparatus screens target genes based on the final target scores (240). For example, it may select genes whose target score is at or above a predetermined reference value as candidate genes.

Quantitative information such as gene overexpression requires a cutoff (boundary value) that separates an overexpression group from a normal group. The cutoff can differ from gene to gene, and the cutoff that yields the highest survival-analysis significance may be used. However, finding this maximum significance cutoff directly requires computing the significance at every possible cutoff, which is computationally expensive. To address this, the analysis apparatus may determine the maximum significance cutoff in two steps.

FIG. 5 shows an example of the process of determining the maximum significance cutoff for a gene. The analysis apparatus may determine a maximum significance cutoff for each gene, based on the gene expression level (expression amount) and the significance of the survival analysis. The significance may be defined by a P value; here, significance refers to the significance of the survival analysis. In FIG. 5, the horizontal axis is the quantile (expression level) and the vertical axis is the significance (-log P). FIG. 5(A) shows an example in which the analysis apparatus first determines a significance region. The analysis apparatus first sets a broad range that contains the maximum significance cutoff. It may determine a primary significance region based on the significance level; for example, it may take as the primary significance region the range of quantiles whose significance falls within a specific range. The region marked A in FIG. 5(A) is an example of a primary significance region. Alternatively, the analysis apparatus may set the primary significance region as a predetermined range centered on the quantile with the maximum significance.

The analysis apparatus then computes the significance for every cutoff within the primary significance region, and selects the cutoff with the highest survival-analysis significance as the maximum significance cutoff. FIG. 5(B) shows an example in which the maximum significance cutoff is determined within the primary significance region; the selected cutoff is marked B.

The analysis apparatus thus determines the maximum significance cutoff through this two-step significance analysis. Because it does not compute the significance at every possible cutoff, it can find the maximum significance cutoff at low computational cost.
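The two-step search can be sketched as below. Several details are assumptions rather than statements from the text: the coarse first pass evaluates a quantile grid from 10% to 90% in steps of 0.1, the primary region is one grid step either side of the best grid point (the "fixed range around the maximum" variant), patients with expression above the cutoff form the high group, and significance is the two-group log-rank -log10 p. lifelines provides `lifelines.statistics.logrank_test`, but a small hand-written log-rank test is used here so the sketch has no dependencies.

```python
import math
from typing import List, Sequence, Tuple


def logrank_p(times: Sequence[float], events: Sequence[int],
              high: Sequence[int]) -> float:
    """Two-group log-rank test (1 degree of freedom); returns the p-value."""
    event_times = sorted({t for t, e in zip(times, events) if e})
    o_minus_e, var = 0.0, 0.0
    for t in event_times:
        n = n1 = d = d1 = 0
        for ti, ei, gi in zip(times, events, high):
            if ti >= t:                   # still at risk at time t
                n += 1
                n1 += gi
                if ti == t and ei:        # event observed at time t
                    d += 1
                    d1 += gi
        if n < 2:
            continue
        o_minus_e += d1 - d * n1 / n
        var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if var == 0:
        return 1.0
    chi2 = o_minus_e ** 2 / var
    return math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square(1)


def quantile(sorted_vals: List[float], q: float) -> float:
    """Linear-interpolation quantile of an already sorted list."""
    pos = q * (len(sorted_vals) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def significance(expr, times, events, cutoff) -> float:
    """-log10 log-rank p for 'expression > cutoff' versus the rest."""
    high = [1 if x > cutoff else 0 for x in expr]
    if sum(high) in (0, len(high)):
        return 0.0                        # degenerate split, nothing to test
    return -math.log10(max(logrank_p(times, events, high), 1e-300))


def two_step_cutoff(expr, times, events, q_lo=0.1, q_hi=0.9,
                    step=0.1) -> Tuple[float, float]:
    """Return (-log10 p, cutoff): coarse quantile scan, then local exhaustive scan."""
    vals = sorted(expr)
    # Step 1: evaluate a coarse quantile grid and keep the best grid point.
    grid = [q_lo + i * step for i in range(round((q_hi - q_lo) / step) + 1)]
    _, best_q = max((significance(expr, times, events, quantile(vals, q)), q)
                    for q in grid)
    # Primary significance region: one grid step either side of the best point.
    lo = quantile(vals, max(0.0, best_q - step))
    hi = quantile(vals, min(1.0, best_q + step))
    # Step 2: test every observed expression value inside the region.
    candidates = sorted({x for x in expr if lo <= x <= hi}) or [quantile(vals, best_q)]
    return max((significance(expr, times, events, c), c) for c in candidates)
```

With n patients, an exhaustive search runs about n log-rank tests per gene, whereas this sketch runs the coarse grid plus only the observed values inside the region.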

FIG. 6 shows examples of an analysis apparatus for screening target genes and of systems that include the analysis apparatus.

FIG. 6(A) shows an example in which the analysis apparatus 350 is a device such as a PC, in a system including a DB 310 and the analysis apparatus 350. The analysis apparatus 350 receives multi-omics data from the DB 310, which holds the multi-omics data and clinical information described above. The analysis apparatus 350 analyzes the genome-wide multi-omics data and screens target genes using the ConSig-Amp or ConSig-Met technique described above. As described above, the analysis apparatus 350 filters genes based on each gene's maximum significance cutoff, so that target genes are selected at low computational cost.

FIG. 6(B) shows an example in which the analysis apparatus 450 is a device such as a server on a network, in a system including a DB 410, the analysis apparatus 450, and a client device 50. The analysis apparatus 450 receives multi-omics data from the DB 410, which holds the multi-omics data and clinical information described above. In some cases, the analysis apparatus 450 may instead receive multi-omics data from the client device 50, which may be a personal computer, a smart device, or the like. The client device 50 may request the analysis apparatus 450 to screen target genes for a particular disease. The analysis apparatus 450 analyzes the genome-wide multi-omics data and screens target genes using the ConSig-Amp or ConSig-Met technique described above. As described above, it filters genes based on each gene's maximum significance cutoff, so that target genes are selected at low computational cost. The analysis apparatus 450 may then deliver the screening result (candidate target genes) to the client device 50.

FIG. 6(C) is an example block diagram of the configuration of an analysis apparatus 500, which corresponds to the analysis apparatus 350 or the analysis apparatus 450 described above. The analysis apparatus 500 includes an input device 510, a storage device 520, a computing device 530, and an output device 540.

The input device 510 receives genome-wide multi-omics data. The input device 510 may be a physical interface such as a keyboard, a mouse, or a touch pad; a device that reads stored multi-omics data from an external storage medium (such as a USB drive); or a communication device that receives multi-omics data over an external network.

The storage device 520 stores a program for target gene screening. Using the multi-omics data, the program computes, for each of a plurality of genes, the degree of association with a particular disease, and computes the correlation between the gene's expression and a gene feature. The gene feature is quantitative information associated with gene expression and includes at least one of gene expression level, gene copy number (copy-number variation), and DNA methylation. The program combines the degree of association and the correlation mathematically to compute a target score, and selects target genes based on the target score.

The storage device 520 may store the received multi-omics data, data generated during target gene selection, and information on the selected candidate target genes.

The computing device 530 screens target genes using the program. It selects, from a plurality of genes, those amplified above a reference value, and computes a target score for each selected gene. The computing device 530 may also determine a maximum significance cutoff for each gene based on the survival data in the multi-omics data, and select target genes from among the genes filtered by that cutoff. The computing device 530 is a processor, such as a CPU or an application processor (AP), that performs specific operations by executing a program.

The output device 540 outputs the target gene analysis results. It may be a display device that outputs images, a printer that outputs text, or the like. It may also be a communication device that transmits the analysis results to another device.

The technique described here screens target genes for a particular disease. For convenience, the experimental results of the screening process are described using breast cancer as the example; however, the target gene screening technique described below is not limited to any particular cancer or disease.

FIG. 7 shows an example of a process 600 by which the target gene screening apparatus operates, in which target genes are selected based on gene amplification; that is, FIG. 7 is based on the ConSig-Amp technique.

The DBs 310 and 410 are the databases described with reference to FIG. 6. They store multi-omics datasets and clinical information, and may also store custom datasets. A custom dataset is unpublished data containing information derived from the research and experiments of a particular laboratory.

For example, METABRIC and TCGA may be used as breast cancer multi-omics datasets. The linear CNA information in the METABRIC dataset can be supplemented with data provided by Synapse (https://www.synapse.org/#!Synapse:syn1688369/wiki/27311).

Breast cancer clinical information can be obtained from GEO (https://www.ncbi.nlm.nih.gov/geo/) using the GEOparse program (https://github.com/guma44/GEOparse).

As supplementary resources for cancer target gene discovery, the following may be used: NCBI (National Center for Biotechnology Information) Entrez Gene human gene information, MSigDB (Molecular Signatures Database) gene set information, TTD (Therapeutic Target Database) target drug information, and DAISY (Data mining synthetic lethality identification pipeline) synthetic lethality network information.

The data stored in the DBs 310 and 410 may be in different formats. In that case, the data must first be preprocessed into a uniform form that the analysis apparatus can process. The data structure used by the analysis apparatus is called the common data structure.

The breast cancer multi-omics datasets can be structured using the pandas library (https://pandas.pydata.org). For each dataset, clinical information can be organized into a patient data frame, and per-gene information for each omics layer can be organized into linear CNA, CNA, expression, and DNA methylation data frames, all following the common data structure.

The data can be processed so that each dataset can be accessed using the sample (patient) identifier as the key. GEO series obtained with the GEOparse program can be converted into the common data structure through a series of programming steps.

Clinical information can be obtained by parsing the contents of the "Characteristics" field of each GSM (GEO sample) record. In particular, patient follow-up information can be converted into a form suitable for survival analysis.
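A minimal sketch of this parsing step follows. It assumes, as in GEO SOFT records, that each characteristic arrives as a "key: value" string (in GEOparse these are typically found in a GSM object's `metadata` dictionary under `characteristics_ch1`); the sketch takes such lists directly rather than calling GEOparse. The keys `os months` and `os event` and the `EVENT_CODES` mapping are hypothetical, since every GEO series names its follow-up fields differently.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical mapping of vital-status strings to event indicators (1 = event).
EVENT_CODES = {"1": 1, "0": 0, "dead": 1, "deceased": 1, "alive": 0}


def parse_characteristics(lines: List[str]) -> Dict[str, str]:
    """Turn ['key: value', ...] entries into a {key: value} dict."""
    fields = {}
    for line in lines:
        key, sep, value = line.partition(":")
        if sep:                                  # skip entries without a colon
            fields[key.strip().lower()] = value.strip()
    return fields


def to_survival(fields: Dict[str, str], time_key: str = "os months",
                event_key: str = "os event") -> Optional[Tuple[float, int]]:
    """Return (follow-up time, event flag), or None if follow-up is missing."""
    try:
        return float(fields[time_key]), EVENT_CODES[fields[event_key].lower()]
    except (KeyError, ValueError):
        return None


def build_clinical_table(records: Dict[str, List[str]]) -> Dict[str, Dict[str, object]]:
    """Key every parsed record by its GSM sample ID (the common-structure key)."""
    table: Dict[str, Dict[str, object]] = {}
    for gsm_id, lines in records.items():
        parsed = parse_characteristics(lines)
        table[gsm_id] = {**parsed, "survival": to_survival(parsed)}
    return table
```

Samples whose `survival` entry is `None` would simply be left out of the survival analysis.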

If quantitative information such as gene expression is not normalized, a separate log₂ transformation and normalization may be performed. Gene annotation information differs between datasets; it can be supplemented with an annotation tool (e.g., mygene) so that the probe ID, NCBI Entrez Gene ID, and gene symbol are always present.
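The transformation step can be sketched for one gene across samples as below. The text does not say which normalization is used, so this is an assumption: a log2(x + 1) transform (the pseudocount of 1 keeps zero counts finite) followed by a per-gene z-score.

```python
import math
import statistics
from typing import List


def log2_transform(values: List[float], pseudocount: float = 1.0) -> List[float]:
    """log2(x + pseudocount) for each value."""
    return [math.log2(v + pseudocount) for v in values]


def zscore(values: List[float]) -> List[float]:
    """Centre to mean 0 and scale to population standard deviation 1."""
    mu = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return [0.0] * len(values)   # constant gene: no spread to scale
    return [(v - mu) / sd for v in values]


def normalize_gene(raw: List[float], already_log2: bool = False) -> List[float]:
    """Log-transform (unless the data is already on a log scale) and z-score."""
    return zscore(raw if already_log2 else log2_transform(raw))
```

The `already_log2` flag reflects the text's condition that the transformation is applied only when the data is not already normalized.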

The analysis apparatus checks the gene amplification frequency (610) and selects genes amplified above a reference value. For the selected genes, it checks the correlation between gene expression and copy number (620); for example, it may check the correlation between CNAs and gene expression.

The analysis apparatus computes a ConSig score representing the degree of association between each selected gene and a particular disease (e.g., breast cancer), and computes a ConSig-Amp score by reflecting the correlation in the ConSig score (630), for example by multiplying the ConSig score by the correlation. As described above, a score other than the ConSig-Amp score may also be used as the target score.
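Steps 610 to 630 can be sketched together. Assumptions, not taken from the text: discrete CNA calls use the GISTIC-style coding -2 to 2 with 2 meaning amplification (`AMPLIFIED` is a hypothetical constant), the correlation is Pearson between linear copy number and expression, and the default 10% frequency threshold is borrowed from the FIG. 8 example. ConSig scores are given inputs.

```python
import math
from typing import Dict, List, Sequence

AMPLIFIED = 2  # assumed GISTIC-style code: -2, -1, 0, 1, 2 with 2 = amplification


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy / math.sqrt(sxx * syy)


def amplification_frequency(calls: Sequence[int]) -> float:
    """Step 610: fraction of samples whose discrete CNA call is 'amplified'."""
    return sum(1 for c in calls if c == AMPLIFIED) / len(calls)


def consig_amp_scores(consig: Dict[str, float],
                      cna_calls: Dict[str, List[int]],
                      linear_cna: Dict[str, List[float]],
                      expression: Dict[str, List[float]],
                      min_freq: float = 0.10) -> Dict[str, float]:
    """Steps 610-630: keep frequently amplified genes, then
    ConSig-Amp = ConSig score x corr(linear copy number, expression)."""
    scores = {}
    for gene, score in consig.items():
        if amplification_frequency(cna_calls[gene]) < min_freq:
            continue                      # not amplified often enough
        scores[gene] = score * pearson(linear_cna[gene], expression[gene])
    return scores
```

Correlating the continuous linear CNA values, rather than the five discrete calls, is a design choice here that avoids ties; the text does not specify which is used.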

The analysis apparatus performs filtering based on survival analysis (640). Survival analyses such as Kaplan-Meier estimation, the log-rank test, and Cox regression can be performed with lifelines (https://github.com/CamDavidsonPilon/lifelines). Cox regression can be performed in two ways: with the gene's quantitative value as a single continuous variable, and with patients divided into groups and group membership used as a categorical variable.
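For readers unfamiliar with the Kaplan-Meier estimator, the product-limit calculation is sketched below in plain Python: at each distinct event time, survival is multiplied by (1 - deaths / number at risk). In practice lifelines' `KaplanMeierFitter` (and `logrank_test`, `CoxPHFitter` for the other analyses) would be used; this stand-alone version only illustrates the arithmetic, with events coded 1 for death and 0 for censoring.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float],
                 events: Sequence[int]) -> List[Tuple[float, float]]:
    """Product-limit estimate: (time, S(t)) at every distinct event time."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv, curve, i = 1.0, [], 0
    while i < len(data):
        t = data[i][0]
        deaths = leaving = 0
        while i < len(data) and data[i][0] == t:   # group tied times together
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= leaving                          # deaths and censored both leave
    return curve


# Five hypothetical patients; two are censored (event = 0).
curve = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
```

Here survival drops to 0.8, 0.6, and 0.3 at times 1, 2, and 3; the censored patient at time 2 still counts as at risk at that time.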

The analysis apparatus determines the maximum significance cutoff for the survival analysis and filters genes based on it, as described above. The filtering can be applied at one of two points. (1) The analysis apparatus may additionally filter the genes amplified above the reference value by the maximum significance cutoff (dotted line in FIG. 7); in this case, it computes ConSig-Amp scores only for the filtered genes. (2) Alternatively, after the ConSig-Amp scores have been computed, the analysis apparatus may filter the genes (two-dot chain line in FIG. 7) and select target genes based on the ConSig-Amp scores of the filtered genes. The analysis apparatus then produces the final ConSig-Amp scores reflecting the filtering result (650).

Furthermore, the analysis apparatus may use information in a drug DB to identify genes for which a particular drug is effective (660). The drug DB contains information on drugs that can inhibit or activate specific genes. The analysis apparatus can therefore select, from the candidate genes, those for which a specific drug is effective as target genes.

The analysis apparatus may also select target genes using information in a synthetic lethality DB (670). Synthetic lethality is the phenomenon in which a cell dies when two or more genes are inhibited (disabled) together, although losing any one of them alone is tolerated. Because cancer cells already carry a specific defective gene, finding a partner gene whose inhibition in combination kills the cell, and developing a drug that targets that partner, makes it possible to kill cancer cells selectively. The analysis apparatus can therefore select as target genes specific genes that are in a synthetic lethal relationship.
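Steps 660 and 670 amount to lookups against two reference tables. The sketch below assumes the drug DB is a `{gene: [drug, ...]}` mapping and the synthetic lethality DB is a list of undirected gene pairs. This is a simplification of resources such as TTD and DAISY, not their actual formats, and all gene and drug names are placeholders.

```python
from typing import Dict, Iterable, List, Set, Tuple


def sl_partner_map(pairs: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Undirected synthetic-lethality network as an adjacency map."""
    partners: Dict[str, Set[str]] = {}
    for a, b in pairs:
        partners.setdefault(a, set()).add(b)
        partners.setdefault(b, set()).add(a)
    return partners


def druggable(candidates: List[str], drug_db: Dict[str, List[str]]) -> List[str]:
    """Step 660: keep candidates for which the drug DB lists at least one drug."""
    return [g for g in candidates if drug_db.get(g)]


def synthetic_lethal_targets(altered: Iterable[str],
                             pairs: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Step 670: for each gene already altered in the tumour, the partners whose
    inhibition would be synthetic lethal with it."""
    partners = sl_partner_map(pairs)
    return {g: partners[g] for g in altered if g in partners}


# Placeholder reference data, for illustration only.
drug_db = {"GENE_A": ["DRUG_1"], "GENE_C": []}
sl_pairs = [("GENE_B", "GENE_D"), ("GENE_B", "GENE_E"), ("GENE_F", "GENE_G")]
```

Both functions keep the candidate ordering or keys intact, so they can be applied after ranking by ConSig-Amp or ConSig-Met score without re-sorting.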

FIG. 8 shows an example of target gene screening results: HER2+ subtype gene-amplification target candidates selected from the TCGA dataset and from the METABRIC dataset, so that the two datasets cross-validate each other. In FIG. 8, genes marked "*" are significant candidates in both datasets. FIG. 8 lists the top 30 genes by ConSig-Amp score among genes with an amplification frequency of 10% or more, showing only genes with a significant survival-analysis result for amplification or overexpression. In FIG. 8, Amp Freq. is the gene amplification frequency; Amp OS p is the OS log-rank p-value of survival analysis by gene amplification; Exp OS p is the OS log-rank p-value of survival analysis by gene expression; and Exp DFS p is the DFS log-rank p-value of survival analysis by gene expression. In the survival analysis, OS denotes overall survival and DFS denotes disease-free survival.
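The table construction described for FIG. 8 can be sketched as follows. The per-gene record layout (`amp_freq`, `consig_amp`, and `pvalues` holding the Amp OS, Exp OS, and Exp DFS p-values) is hypothetical, and the significance level of 0.05 is an assumption since the text does not state one. Following the text, genes are ranked first and the top 30 are then reduced to those with a significant survival result.

```python
from typing import Dict, List


def top_candidates(table: Dict[str, Dict[str, object]], min_freq: float = 0.10,
                   top_n: int = 30, alpha: float = 0.05) -> List[str]:
    """Rank genes with amp_freq >= min_freq by ConSig-Amp, take the top_n,
    then keep only those with at least one survival p-value below alpha."""
    eligible = [g for g, row in table.items() if row["amp_freq"] >= min_freq]
    ranked = sorted(eligible, key=lambda g: table[g]["consig_amp"], reverse=True)
    return [g for g in ranked[:top_n] if min(table[g]["pvalues"]) < alpha]


def mark_shared(primary: List[str], other: List[str]) -> List[str]:
    """Append '*' to genes that are also candidates in the other dataset."""
    shared = set(other)
    return [g + "*" if g in shared else g for g in primary]


# Hypothetical per-gene results for one dataset.
tcga = {
    "A": {"amp_freq": 0.30, "consig_amp": 0.90, "pvalues": [0.01, 0.2, 0.3]},
    "B": {"amp_freq": 0.05, "consig_amp": 0.95, "pvalues": [0.001]},
    "C": {"amp_freq": 0.12, "consig_amp": 0.40, "pvalues": [0.5, 0.6]},
    "D": {"amp_freq": 0.15, "consig_amp": 0.70, "pvalues": [0.04]},
}
```

Running `top_candidates` on each dataset and passing both lists to `mark_shared` reproduces the "*" convention for cross-validated genes.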

FIG. 9 shows another example of a process 700 by which the target gene screening apparatus operates, in which target genes are selected based on DNA methylation; that is, FIG. 9 is based on the ConSig-Met technique.

The DBs 310 and 410 are the databases described with reference to FIG. 6. They store multi-omics datasets and clinical information, and may also store custom datasets. Because the data stored in the DBs 310 and 410 may be in different formats, it must first be preprocessed into a uniform form that the analysis apparatus can process.

The analysis apparatus checks the gene amplification frequency (710) and selects genes amplified above a reference value. For the selected genes, it checks the correlation between gene expression and DNA methylation (720).

The analysis apparatus computes a ConSig score representing the degree of association between each selected gene and a particular disease (e.g., breast cancer), and computes a ConSig-Met score by reflecting the correlation in the ConSig score (730), for example by multiplying the ConSig score by the correlation.

The analysis apparatus performs filtering based on survival analysis (740). Survival analyses such as Kaplan-Meier estimation, the log-rank test, and Cox regression can be performed with lifelines (https://github.com/CamDavidsonPilon/lifelines). Cox regression can be performed in two ways: with the gene's quantitative value as a single continuous variable, and with patients divided into groups and group membership used as a categorical variable.

The analysis apparatus determines the maximum significance cutoff for the survival analysis and filters genes based on it, as described above. The filtering can be applied at one of two points. (1) The analysis apparatus may additionally filter the genes amplified above the reference value by the maximum significance cutoff (dotted line in FIG. 9); in this case, it computes ConSig-Met scores only for the filtered genes. (2) Alternatively, after the ConSig-Met scores have been computed, the analysis apparatus may filter the genes (two-dot chain line in FIG. 9) and select target genes based on the ConSig-Met scores of the filtered genes. The analysis apparatus then produces the final ConSig-Met scores reflecting the filtering result (750).

Furthermore, the analysis apparatus may use information in the drug DB to identify genes for which a particular drug is effective (760). The drug DB contains information on drugs that can inhibit or activate specific genes. The analysis apparatus can therefore select, from the candidate genes, those for which a specific drug is effective as target genes.

The analysis apparatus may also select target genes using information in the synthetic lethality DB (770), choosing as target genes specific genes that are in a synthetic lethal relationship.

FIG. 10 shows another example of target gene screening results, obtained by selecting HER2+ subtype gene-amplification target candidates from the TCGA dataset. FIG. 10 lists the top 30 candidates by ConSig-Met score, showing only genes with a significant survival-analysis result. In FIG. 10, Exp OS denotes OS survival analysis by gene expression; hazard indicates whether the risk group in the survival analysis is the overexpressed or hypermethylated group (high) or the low-expression or hypomethylated group (low); Met OS denotes OS survival analysis by DNA methylation; Exp DFS denotes DFS survival analysis by gene expression; and Met DFS denotes DFS survival analysis by DNA methylation.

The following describes how analyses are performed with the target gene screening apparatus or system. The researchers implemented the target gene screening apparatus or system described above as a web application on a private network and tested the results. The web application can run on a web server on which the target gene screening program is installed.

The target gene screening system can perform analyses in various ways and presents the results visually. It allows target gene discovery based on integrated multi-omics analysis to be performed online. To this end, the system structures the multi-omics datasets of major cancer types for easy access, and can provide data visualization, differential quantitative analysis of genes, correlation analysis, functional analysis, and survival analysis, together with subtype-specific target gene selection that takes clinical impact into account.

FIG. 11 shows examples of the datasets and main functions used by the target gene screening system; FIG. 11(A) illustrates these datasets and functions. The target gene screening system can use large-scale multi-omics data such as METABRIC and TCGA, multiple GEO datasets containing breast cancer clinical information, or cancer genome research datasets generated in-house. As described above, the system can convert each dataset into an object-oriented data structure that conforms to the system's common specification. The system can also use NCBI Entrez Gene human gene information, MSigDB gene set information, TTD target drug information, and DAISY synthetic lethality network information.

The target gene screening system can provide four functions, labeled Plot, Survival analysis, Target gene screening, and Data tables in FIG. 11. Plot visualizes the analysis data in various forms (data visualization). Survival analysis performs survival analysis on the analysis data. Target gene screening screens target genes using the analysis data. Data tables displays the data used in the analysis and can provide quantitative information about it.

Per-gene quantitative information includes linear CNA information, obtained from CNV microarrays or DNA-sequencing read depth; gene expression information, obtained from expression microarrays, mRNA-seq, or miRNA-seq; and DNA methylation information, obtained from DNA methylation microarrays. Linear CNA information is the raw information for estimating DNA copy number, from which CNAs can be called as homozygous deletion, deletion, neutral, duplication, or amplification.
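Calling CNAs from linear values is a thresholding step, sketched below. This is a hypothetical illustration: it assumes the linear value is a log2 copy-number ratio, and the four thresholds are placeholders rather than values from the source; real pipelines (e.g., GISTIC) derive calls differently and per platform.

```python
def call_cna(value: float, homdel: float = -1.0, deletion: float = -0.3,
             gain: float = 0.3, amp: float = 0.9) -> str:
    """Map a linear copy-number value (assumed log2 ratio) to one of the five
    CNA categories. All thresholds are illustrative placeholders."""
    if value <= homdel:
        return "homozygous deletion"
    if value <= deletion:
        return "deletion"
    if value < gain:
        return "neutral"
    if value < amp:
        return "duplication"
    return "amplification"
```

Applied per sample and per gene, these labels give the discrete calls from which an amplification frequency can be counted.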

FIG. 11(B) shows an example in which the multi-omics data is standardized into a common format. The data can be organized into a per-patient clinical information table and, for each omics layer, a patient-by-gene table. To unify data structures that differ between datasets such as TCGA and GEO, the structures can be organized using object-oriented polymorphism: each dataset type inherits the common specification and specifies only its differences. This preprocessing gives the target gene screening system a structure that is easy to extend to datasets in other formats.
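The inheritance scheme can be sketched with an abstract base class. The base class fixes the common specification (one clinical table and one patient-gene table per omics layer) and the shared accessors, while each subclass overrides only its loading logic. The class names and raw input formats are hypothetical, chosen to mimic a TCGA-like and a GEO-like source.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Optional, Tuple

Table = Dict[str, Dict[str, float]]        # {gene: {sample_id: value}}


class OmicsDataset(ABC):
    """Common specification: one clinical table plus one patient-gene table per layer."""
    LAYERS = ("linear_cna", "cna", "expression", "methylation")

    def __init__(self, name: str) -> None:
        self.name = name
        self.clinical: Dict[str, Dict[str, str]] = {}
        self.layers: Dict[str, Table] = {layer: {} for layer in self.LAYERS}

    @abstractmethod
    def load_clinical(self, raw) -> None:
        """Subclasses translate their own clinical format into self.clinical."""

    @abstractmethod
    def load_layer(self, layer: str, raw) -> None:
        """Subclasses translate their own quantitative format into self.layers."""

    # Shared behaviour, inherited unchanged by every dataset type.
    def value(self, layer: str, gene: str, sample_id: str) -> Optional[float]:
        return self.layers[layer].get(gene, {}).get(sample_id)


class TcgaLikeDataset(OmicsDataset):
    """Hypothetical source: clinical rows as dicts, layers already gene-keyed."""

    def load_clinical(self, raw: List[Dict[str, str]]) -> None:
        for row in raw:
            row = dict(row)                 # copy so the caller's data is untouched
            self.clinical[row.pop("patient_id")] = row

    def load_layer(self, layer: str, raw: Table) -> None:
        self.layers[layer].update(raw)


class GeoLikeDataset(OmicsDataset):
    """Hypothetical source: 'key: value' characteristics and probe-level rows."""

    def load_clinical(self, raw: Dict[str, List[str]]) -> None:
        for gsm_id, lines in raw.items():
            pairs = (line.partition(":") for line in lines)
            self.clinical[gsm_id] = {k.strip(): v.strip() for k, _, v in pairs}

    def load_layer(self, layer: str,
                   raw: List[Tuple[str, str, Dict[str, float]]]) -> None:
        # Several probes may map to one gene; this sketch keeps the first probe.
        for _probe_id, gene, row in raw:
            self.layers[layer].setdefault(gene, row)
```

Downstream analysis code only calls the base-class interface (`clinical`, `layers`, `value`), so adding a new dataset format means writing one more subclass.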

FIGS. 12 to 16 show examples of the web application's user screens and analysis results; each shows a web application interface screen and an example analysis.

FIG. 12 is an example analysis screen of the target gene screening system; the upper left of FIG. 12 shows an example output screen of the web application. The top tab menu shows, from left to right, Plot, Survival analysis, Target gene screening, and Data tables. FIG. 12 is an example of running the Plot (data visualization) function.

To show how a gene changes with particular clinical characteristics, the target gene screening system can visualize gene-level quantitative information as box plots grouped by various criteria, such as each patient's clinical information, subtype, and histopathology. To establish statistical significance, the system can report Student's t-test results between groups, a one-way analysis of variance (ANOVA) across all groups, and Tukey HSD post hoc test results. In particular, the system supports differential comparison across all clinical categorization variables, so a gene can be compared across a wide range of clinical differences.
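The two test statistics named above can be computed directly from grouped values. This stdlib-only sketch computes the pooled-variance Student's t statistic and the one-way ANOVA F statistic; it stops short of p-values and Tukey HSD, which in practice one might take from a library such as SciPy (`scipy.stats.ttest_ind`, `scipy.stats.f_oneway`, `scipy.stats.tukey_hsd`). The function names are illustrative:

```python
from statistics import mean, variance


def student_t(a, b):
    """Two-sample Student's t statistic (pooled variance, equal-variance assumption)."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / (pooled * (1 / na + 1 / nb)) ** 0.5


def one_way_anova_f(groups):
    """One-way ANOVA F statistic across k groups of values."""
    all_values = [v for g in groups for v in g]
    grand = mean(all_values)
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))
```

With exactly two groups the two tests agree: the ANOVA F equals the square of the pooled t statistic, which is a quick sanity check for any implementation.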

When the user enters a gene, selects the type of quantitative information (linear CNAs, gene expression, or DNA methylation), and selects subtype as the differential comparison criterion (Group-by), the system draws box plots of that quantitative information grouped by the chosen criterion.
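The Group-by step amounts to splitting one gene's values by a clinical attribute and summarizing each group with the five numbers a box plot draws. A minimal sketch, assuming patient-keyed dictionaries (a hypothetical layout) and inclusive quartiles from the standard library:

```python
from collections import defaultdict
from statistics import quantiles


def box_summary(values_by_patient, group_by_patient):
    """Group one gene's quantitative values by a clinical attribute and return
    (min, Q1, median, Q3, max) per group, i.e. what a box plot draws."""
    groups = defaultdict(list)
    for patient, value in values_by_patient.items():
        label = group_by_patient.get(patient)
        if label is not None:  # skip patients without the attribute
            groups[label].append(value)
    summary = {}
    for label, vals in groups.items():
        if len(vals) < 2:
            continue  # quantiles() needs at least two data points
        q1, med, q3 = quantiles(vals, n=4, method="inclusive")
        summary[label] = (min(vals), q1, med, q3, max(vals))
    return summary
```

Groups with a single patient are skipped rather than drawn, since a box needs at least two points; whiskers and outlier rules (for example 1.5 × IQR) are left to the plotting layer.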

Referring to Fig. 12, comparing linear CNAs and expression of ERBB2 across breast cancer subtypes in the METABRIC dataset shows that both are markedly higher only in the HER2+ subtype (ANOVA p < 0.001). Among patients with stage 2 cancer who received chemotherapy, ERBB2 expression was significantly higher in the recurrence group than in the non-recurrence group (t-test p = 0.0152).

Fig. 13 is another example of an analysis screen of the target gene screening system. By correlating gene-level quantitative information in various ways, a user can check whether the quantitative information of a given gene in a given omics layer is associated with another omics layer or with other genes. To check whether genes interact across omics layers, the user selects a gene and a quantitative information type for the x-axis and for the y-axis, and specifies a subgroup such as a clinical category or subtype; the system then provides a scatter plot and correlation analysis within that subgroup. Correlation analysis is available for every combination of genes and genetic aberration types, which lets multi-omics gene correlations be examined jointly. The system also quickly computes, at the whole-genome level, the genes most positively and most negatively (inversely) correlated with an input gene, so the user can see which genes are highly correlated with the query gene.
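The whole-genome ranking described above reduces to computing Pearson's r between the query vector and every gene, then taking both ends of the sorted list. A minimal sketch, assuming each gene's values are listed over the same patients in the same order (a hypothetical layout); because the query vector may come from a different layer than the matrix, the same function covers cross-omics correlation:

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation; returns None when either vector has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / sqrt(sxx * syy)


def rank_correlated_genes(query_values, matrix, top=5, exclude=()):
    """Rank every gene in `matrix` (gene -> values, same patient order as
    `query_values`) by Pearson r against the query.
    Returns (top positively correlated, top negatively correlated)."""
    scored = []
    for gene, values in matrix.items():
        if gene in exclude:
            continue
        r = pearson(query_values, values)
        if r is not None:  # drop genes with constant values
            scored.append((gene, r))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top], scored[::-1][:top]
```

Pass the query gene in `exclude` when it also appears in the matrix, otherwise it trivially ranks first with r = 1.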

The target gene screening system thus visualizes gene-gene and omics-omics correlation analyses: given a gene and quantitative information type for the x-axis and for the y-axis, it outputs the result visually as a scatter plot.

Referring to Fig. 13, in the METABRIC dataset ERBB2 shows a high correlation between linear CNAs and gene expression (Pearson r = 0.79). ERBB2 is also highly correlated with GRB7, a gene known to be co-amplified with ERBB2 (Pearson r = 0.96). Through quantitative visualization, differential analysis, and correlation analysis, the target gene screening system makes it easy to see how each gene changes under specific conditions, such as clinical characteristics, and which other genes it is correlated with.

Survival analysis can determine whether a genetic aberration in a particular gene has a significant clinical impact. Because it uses multi-omics data, the target gene screening system can assess clinical impact not only for gene expression but also for a variety of other genetic aberrations. To assess the clinical impact of individual genes, the system provides survival analysis for various kinds of CNA status (amplification, homozygous deletion, copy-number gain, and loss), for mutation (SNV) status, and for quantitative information such as linear CNAs, gene expression, and DNA methylation.
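The survival comparisons reported in this section use the log-rank test. A minimal two-group sketch using only the standard library: at each distinct event time it accumulates observed-minus-expected events in group A and the hypergeometric variance, and converts the resulting chi-square statistic (1 degree of freedom) to a p-value with the identity P(χ²₁ > x) = erfc(√(x/2)). The function name and argument layout are illustrative:

```python
from math import erfc, sqrt


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test. events: 1 = event (e.g. death), 0 = censored.
    Returns (chi-square statistic with 1 df, p-value)."""
    data = [(t, e, 0) for t, e in zip(times_a, events_a)] + \
           [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})
    o_minus_e = 0.0  # observed minus expected events in group A
    var = 0.0
    for t in event_times:
        at_risk = [g for tt, _, g in data if tt >= t]
        n = len(at_risk)
        n_a = sum(1 for g in at_risk if g == 0)
        d = sum(1 for tt, e, _ in data if tt == t and e)
        d_a = sum(1 for tt, e, g in data if tt == t and e and g == 0)
        o_minus_e += d_a - d * n_a / n
        if n > 1:  # hypergeometric variance, with tie correction
            var += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    chi2 = o_minus_e ** 2 / var if var > 0 else 0.0
    return chi2, erfc(sqrt(chi2 / 2))  # survival function of chi-square(1)
```

The same routine applies whether the two groups come from a discrete status (amplified vs. not, mutated vs. wild type) or from splitting a quantitative value at a cutoff.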

Fig. 14 shows an example of survival analysis using the target gene screening system. Fig. 14(A) shows a screen in which the menu is set to perform survival analysis based on quantitative information. To set a cutoff on quantitative information that supports clinical decisions, the user can select the "Max-sig cutoff" option, and the system then performs survival analysis at the cutoff with maximum survival significance. Fig. 14(B) shows example survival analysis results. In the METABRIC dataset, the analysis reproduced previously known prognostic associations: ERBB2 amplification (OS log-rank test p = 0.001), TP53 mutation (OS log-rank test p = 0.001), and ESR1 overexpression (OS log-rank test p < 0.001). In the TCGA dataset, CDH1 hypermethylation (OS log-rank test p = 0.012) was also found to have a significant effect on survival.
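The claims describe the maximum-significance cutoff as a two-stage search: find a first significant region of values from log-rank p-values, then pick the candidate cutoff inside it with the highest significance. The sketch below is one possible reading of that procedure, not the system's actual implementation; the coarse step size, `alpha`, and the minimum fraction of patients per group are all assumptions:

```python
from math import erfc, sqrt


def logrank_p(times, events, high):
    """Two-group log-rank p-value; `high` is a list of booleans (True = above cutoff)."""
    o_e = var = 0.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = [h for tt, h in zip(times, high) if tt >= t]
        n, n_hi = len(at_risk), sum(at_risk)
        d = sum(1 for tt, e in zip(times, events) if tt == t and e)
        d_hi = sum(1 for tt, e, h in zip(times, events, high) if tt == t and e and h)
        o_e += d_hi - d * n_hi / n
        if n > 1:
            var += d * (n_hi / n) * (1 - n_hi / n) * (n - d) / (n - 1)
    return erfc(sqrt(o_e ** 2 / var / 2)) if var > 0 else 1.0


def max_sig_cutoff(values, times, events, alpha=0.05, min_frac=0.1, step=5):
    """Return (cutoff, p) with the smallest log-rank p inside the first
    significant region, or None if no coarse cutoff reaches alpha."""
    n = len(values)
    # Candidate cutoffs leave at least min_frac of patients on each side.
    cands = [c for c in sorted(set(values))
             if min_frac * n <= sum(v > c for v in values) <= (1 - min_frac) * n]
    if not cands:
        return None

    def p_at(cutoff):
        return logrank_p(times, events, [v > cutoff for v in values])

    # Stage 1: coarse scan over every `step`-th candidate.
    sig = [i for i in range(0, len(cands), step) if p_at(cands[i]) < alpha]
    if not sig:
        return None
    # First significant region: the contiguous run of significant coarse
    # points starting at sig[0], widened to the neighbouring coarse points.
    first = last = sig[0]
    while last + step in sig:
        last += step
    lo = max(first - step + 1, 0)
    hi = min(last + step - 1, len(cands) - 1)
    # Stage 2: fine scan inside the region; keep the most significant cutoff.
    best = min(cands[lo:hi + 1], key=p_at)
    return best, p_at(best)
```

Requiring a minimum group size keeps the search from selecting extreme cutoffs that isolate a handful of patients. Note also that choosing the minimum p over many cutoffs inflates significance, so the reported p-value is optimistic unless it is corrected.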

Fig. 15 shows an example target gene screening procedure using the target gene screening system, based on gene amplification.

Fig. 15(A) shows an example user interface screen for selecting gene amplification targets in the HER2+ subtype of the METABRIC dataset. To select targets at the whole-genome level while accounting for clinical impact, the system provides, for a subgroup such as a subtype or a specific clinical category, the ConSig-Amp score together with survival analysis results for various genetic aberrations, drug information linked from TTD, and synthetic lethal (SL) and synthetic dosage lethal (SDL) partner information linked from DAISY. This enables comprehensive selection of target candidates for that subgroup.
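One ordering described in the claims is to filter genes by survival significance first and then rank the survivors by target score, carrying along the drug and SL/SDL annotations. A minimal sketch of that flow; the field names are illustrative, and `target_score` is a HYPOTHETICAL placeholder, because this section only states that the ConSig score and the copy-number/expression correlation are "mathematically processed" into the ConSig-Amp score:

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    gene: str
    consig: float       # degree of association with the disease (ConSig)
    cna_expr_r: float   # Pearson r between copy number and expression
    max_sig_p: float    # log-rank p at the gene's max-significance cutoff
    drugs: list = field(default_factory=list)         # e.g. linked from TTD
    sdl_partners: list = field(default_factory=list)  # e.g. linked from DAISY


def target_score(c: Candidate) -> float:
    # HYPOTHETICAL combination: ConSig weighted by the positive part of r.
    # Not the patent's formula, which this section does not give.
    return c.consig * max(c.cna_expr_r, 0.0)


def screen(candidates, p_max=0.05):
    """Filter by survival significance first, then rank by target score."""
    kept = [c for c in candidates if c.max_sig_p < p_max]
    return sorted(kept, key=target_score, reverse=True)
```

Swapping the order (score first, then filter by survival) is the other alternative recited in the claims; with a pure filter and sort the final ranking is the same, and the order mainly matters when scoring is expensive.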

Fig. 15(B) shows an example target gene screening result screen. Referring to Fig. 15(B), the top-ranked gene PTK2 shows a correlation between CNAs and expression, and a four-group survival analysis combining the amplification status of the top-ranked gene ERBB2 and its SDL partner TPD52L1 shows that co-amplification affects prognosis. PTK2 was a candidate with a high correlation between CNAs and gene expression (Pearson r = 0.75), and co-amplification of ERBB2 and TPD52L1, which are in a synthetic dosage lethal (SDL) relationship, was associated with poor prognosis (DFS log-rank test p = 0.007).
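The four-group analysis above crosses two binary amplification statuses, giving four patient groups whose survival curves are then compared (for example with a multi-group log-rank test). A minimal labelling sketch; the dictionary layout and default gene names are illustrative:

```python
def combo_groups(amp_a, amp_b, name_a="ERBB2", name_b="TPD52L1"):
    """Label each patient by the joint amplification status of two genes.
    amp_a / amp_b: patient -> True if amplified. Only patients present in
    both inputs are labelled, giving at most four groups."""
    labels = {}
    for patient in amp_a.keys() & amp_b.keys():
        a = f"{name_a}{'+' if amp_a[patient] else '-'}"
        b = f"{name_b}{'+' if amp_b[patient] else '-'}"
        labels[patient] = f"{a}/{b}"
    return labels
```

The co-amplified group (both `+`) is the one of interest for an SDL pair; the other three groups serve as references.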

Fig. 15(C) shows example drug-target and synthetic lethality results for the selected candidate genes ERBB2 and PTK2. The major gene amplification targets of the HER2+ subtype are shown together with their therapeutic drugs and synthetic lethal partners.

DNA hypomethylation of oncogenes or DNA hypermethylation of tumor suppressor genes can contribute to cancer progression. If a gene's DNA methylation and expression are highly correlated within a particular breast cancer subtype, the gene can be inferred to be a DNA methylation target gene, and inhibiting or activating that target may be expected to have a therapeutic effect. Fig. 16 shows another example target gene screening procedure using the target gene screening system, this one based on DNA methylation.
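The inference above can be sketched as a within-subgroup filter on the methylation-expression correlation. This sketch assumes the canonical silencing pattern, i.e. a strong inverse correlation, and uses a hypothetical cutoff of r ≤ -0.5; the data layout (gene -> {patient: value}) is also an assumption:

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation; returns None when either vector has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / sqrt(sxx * syy)


def methylation_candidates(methylation, expression, patients, r_cut=-0.5):
    """Keep genes whose expression is strongly inversely correlated with DNA
    methylation among `patients` (e.g. one subtype). r_cut is HYPOTHETICAL.
    Returns {gene: r}, most negative first."""
    hits = {}
    for gene in methylation.keys() & expression.keys():
        ps = [p for p in patients if p in methylation[gene] and p in expression[gene]]
        if len(ps) < 3:
            continue  # too few paired samples for a meaningful r
        r = pearson([methylation[gene][p] for p in ps],
                    [expression[gene][p] for p in ps])
        if r is not None and r <= r_cut:
            hits[gene] = r
    return dict(sorted(hits.items(), key=lambda kv: kv[1]))
```

Restricting `patients` to one subtype (for example ER+) reproduces the subgroup-specific screening described in this section.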

Fig. 16(A) shows an example user interface screen for selecting DNA methylation targets in the ER+ subtype of the TCGA dataset. Screening for DNA methylation candidate targets within the ER+ subtype identified multiple candidate targets, together with alternative synthetic-lethal targets, including CCND1 (ConSig-Met -2.29, Met DFS log-rank test p = 0.001), ESR1 (ConSig-Met -2.354, Met OS log-rank test p < 0.001), STAT5A (ConSig-Met -2.043, Met DFS log-rank test p = 0.002), and FGFR1 (ConSig-Met -1.512, Exp OS log-rank test p = 0.016). For STAT5A, hypermethylation and low expression define the risk group, so it can be inferred to be a tumor suppressor. For ESR1, hypomethylation and overexpression define the risk group, so it can be inferred to be an oncogene.
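The STAT5A and ESR1 reasoning above follows a simple rule on which side of each cutoff forms the poor-prognosis (risk) group. A minimal sketch of that rule; the function name and string encoding are illustrative, and combinations the text does not discuss are left undetermined rather than guessed:

```python
def infer_role(risk_methylation: str, risk_expression: str) -> str:
    """risk_*: which side of the cutoff ('high' or 'low') is the
    poor-prognosis group for DNA methylation and for expression."""
    if risk_methylation == "high" and risk_expression == "low":
        return "tumor suppressor"  # silenced by methylation (e.g. STAT5A)
    if risk_methylation == "low" and risk_expression == "high":
        return "oncogene"          # demethylated and overexpressed (e.g. ESR1)
    return "undetermined"          # pattern not covered by the rule
```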

Fig. 16(B) shows an example target gene screening result screen. Referring to Fig. 16(B), for ESR1, drugs that inhibit ESR1 directly or that inhibit its SDL partner GCM2 may be expected to be effective. ESR1 showed a strong inverse correlation between DNA methylation and gene expression (Pearson r = -0.75) and a significant difference in prognosis by DNA methylation level (OS log-rank test p < 0.001).

Fig. 16(C) shows example drug-target and synthetic lethality results for the selected candidate genes, displaying the major DNA methylation targets of the ER+ subtype together with their therapeutic drugs and synthetic lethal partners.

The target gene screening method described above may also be implemented as a program (or application) containing an algorithm executable on a computer. The program may be stored and provided on a non-transitory computer-readable medium.

A non-transitory readable medium is a medium that stores data semi-permanently and can be read by a device, as opposed to a medium that stores data only briefly, such as a register, cache, or memory. Specifically, the various applications or programs described above may be stored and provided on a non-transitory readable medium such as a CD, DVD, hard disk, Blu-ray disc, USB drive, memory card, or ROM.

The present embodiments and the accompanying drawings merely illustrate part of the technical idea included in the technology described above. Modifications and specific embodiments that a person skilled in the art could readily derive within the scope of the technical idea contained in the specification and drawings are evidently all included within the scope of rights of the technology described above.

50: client device
300: analysis system
310: DB
350: analysis device
400: analysis system
410: DB
450: analysis device
500: analysis device
510: input device
520: storage device
530: computing device
540: output device

Claims

A target gene screening method using survival analysis, performed by a computer device, the method comprising:
receiving multi-omics data for the whole genome;
selecting a plurality of genes amplified above a reference value based on gene expression data included in the multi-omics data, and determining a degree of association with a specific disease for each of the selected genes; and
determining a target score for each of the selected genes by mathematically processing the degree of association together with a correlation between the expression of the selected gene and the gene copy number of the selected gene,
wherein a maximum significance threshold for each gene is determined based on survival data included in the multi-omics data, and the target gene is selected from among genes filtered based on the maximum significance threshold.

The method of claim 1,
wherein the degree of association is a ConSig (Concept Signature) score, and the target score is a ConSig-Amp (amplification) score.

The method of claim 1,
wherein the gene expression data is data selected, using clinical information included in the multi-omics data, from gene expression data corresponding to a specific subtype or a specific clinical characteristic.

The method of claim 1,
further comprising either: filtering genes from the plurality of genes based on the maximum significance threshold and determining the degree of association and the target score for the filtered genes;
or filtering the selected genes based on the maximum significance threshold and selecting the target gene based on the target score of the filtered genes.

The method of claim 1,
wherein survival analysis is performed on any one of the plurality of genes, a first significant region of expression values for that gene is determined based on the p-value of the survival analysis, and the candidate boundary value with the maximum significance among the candidate boundary values included in the first significant region is determined as the maximum significance boundary value for that gene.

A target gene screening method using survival analysis, performed by a computer device, the method comprising:
receiving multi-omics data for the whole genome;
selecting a plurality of genes amplified above a reference value based on gene expression data included in the multi-omics data, and determining a degree of association with a specific disease for each of the selected genes; and
determining a target score for each of the selected genes by mathematically processing the degree of association together with a correlation between the expression of the selected gene and the DNA methylation of the selected gene,
wherein a maximum significance threshold for each gene is determined based on survival data included in the multi-omics data, and the target gene is selected from among genes filtered based on the maximum significance threshold.

The method of claim 6,
wherein the degree of association is a ConSig (Concept Signature) score.

The method of claim 6,
wherein the gene expression data is data selected, using clinical information included in the multi-omics data, from gene expression data corresponding to a specific subtype or a specific clinical characteristic.

The method of claim 6,
further comprising either: filtering genes from the plurality of genes based on the maximum significance threshold and determining the degree of association and the target score for the filtered genes;
or filtering the selected genes based on the maximum significance threshold and selecting the target gene based on the target score of the filtered genes.

The method of claim 6,
wherein survival analysis is performed on any one of the plurality of genes, a first significant region of expression values for that gene is determined based on the p-value of the survival analysis, and the candidate boundary value with the maximum significance among the candidate boundary values included in the first significant region is determined as the maximum significance boundary value for that gene.

A target gene screening device using survival analysis, comprising: an input device for receiving multi-omics data for the whole genome;
a storage device storing a program that calculates, using the multi-omics data, a degree of association with a specific disease for each of a plurality of genes, and calculates a target score by mathematically processing the degree of association together with a correlation between the expression of each gene and a gene aspect of that gene; and
a computing device that, using the program, selects genes amplified above a reference value from among the plurality of genes and determines the target score for the selected genes,
wherein the computing device determines a maximum significance threshold for each gene based on survival data included in the multi-omics data and selects a target gene from among genes filtered based on the maximum significance threshold.

The method of claim 11,
wherein the degree of association is a ConSig (Concept Signature) score, and the target score is a ConSig-Amp (amplification) score.

The method of claim 11,
wherein the gene aspect is either gene copy number or DNA methylation.

The method of claim 11,
wherein the computing device performs survival analysis on any one of the plurality of genes, determines a first significant region of expression values for that gene based on the p-value of the survival analysis, and determines the candidate boundary value with the maximum significance among the candidate boundary values included in the first significant region as the maximum significance boundary value for that gene.

The method of claim 11,
wherein the computing device either selects the genes from among the plurality of genes filtered based on the maximum significance boundary value, or filters the selected genes based on the maximum significance boundary value and selects the target gene based on the target score of the filtered genes.