KR20230007010A

KR20230007010A - Method and system for predicting metabolic disease risk

Info

Publication number: KR20230007010A
Application number: KR1020210087713A
Authority: KR
Inventors: 백수진; 진희정; 이시우; 백영화
Original assignee: 한국 한의학 연구원
Priority date: 2021-07-05
Filing date: 2021-07-05
Publication date: 2023-01-12
Also published as: KR102636560B1

Abstract

Disclosed are a method and a system for predicting metabolic disease risk. An object of the present invention is to provide a method that predicts a metabolic disease, and the risk thereof, by using methylation epigenetic regions that change in the metabolic disease. According to an embodiment of the present invention, the method includes: measuring RNA-seq (high-throughput transcriptome data) and a degree of methylation for each of a normal group and a metabolic disease patient group; selecting, by using the RNA-seq, differentially expressed genes whose expression changes only in the metabolic disease patient group compared with the normal group; selecting, by using the degree of methylation, differentially methylated regions whose methylation changes only in the metabolic disease patient group compared with the normal group; and determining epigenetic markers for predicting metabolic disease patients by modeling the selected differentially expressed genes and the selected differentially methylated regions.

Description

Metabolic disease risk prediction method and system {METHOD AND SYSTEM FOR PREDICTING METABOLIC DISEASE RISK}

The present invention relates to a method and system for predicting metabolic disease risk, which predict a metabolic disease or its risk by using methylation epigenetic regions that change in the metabolic disease.

The present invention also provides a method and system for predicting metabolic disease risk that determine epigenetic markers (genes) capable of identifying metabolic disease patients and predicting their risk.

Metabolic disease (metabolic syndrome) means that an individual has three or more of five risk factors (high blood pressure, high blood sugar, hypertriglyceridemia, low high-density lipoprotein cholesterol, and central obesity) that increase the risk of health problems including heart disease, diabetes, and stroke.

With the rapid westernization of lifestyles, disease patterns are changing markedly, and obesity-related conditions such as hypertension, diabetes, hyperlipidemia, and cardio-cerebrovascular disease are increasing explosively.

The prevalence of metabolic syndrome among adults worldwide is 20 to 25%; the prevalence is reported to be 35% in the United States and up to 30% in Korea.

Accordingly, there is an urgent need for a new model that determines epigenetic markers (genes) capable of predicting and warning of metabolic disease risk in advance, so that patients do not progress to metabolic disease.

An object of embodiments of the present invention is to provide a method and system that predict a metabolic disease and its risk by using methylation epigenetic regions that change in the metabolic disease.

According to an embodiment of the present invention, a method for predicting metabolic disease risk may include: measuring a degree of methylation of a methylation region for a test-subject group; performing metabolic disease risk prediction modeling by using the measured degree of methylation; and reporting the metabolic disease risk for the test-subject group according to the result of the modeling.

In addition, a system for predicting metabolic disease risk according to an embodiment of the present invention may include: a measurement unit that measures a degree of methylation of a methylation region for a test-subject group; and a determination unit that performs metabolic disease risk prediction modeling by using the measured degree of methylation and reports the metabolic disease risk for the test-subject group according to the result of the modeling.

According to an embodiment of the present invention, a method and system can be provided that predict a metabolic disease and its risk by using methylation epigenetic regions that change in the metabolic disease.

FIG. 1 is a block diagram showing the configuration of a metabolic disease risk prediction system according to an embodiment of the present invention.
FIG. 2 is a diagram for explaining a method of predicting metabolic disease risk using epigenetic markers.
FIG. 3 is a diagram showing ROC/AUC results (methylation) obtained using epigenetic markers.
FIG. 4 is a flowchart showing a metabolic disease risk prediction method according to an embodiment of the present invention.

Hereinafter, embodiments are described in detail with reference to the accompanying drawings. However, various changes may be made to the embodiments, and the scope of rights of the patent application is not restricted or limited by these embodiments. All changes, equivalents, and substitutes of the embodiments should be understood to fall within the scope of rights.

The terms used in the embodiments are used for descriptive purposes only and should not be construed as limiting. Singular expressions include plural expressions unless the context clearly indicates otherwise. In this specification, terms such as "include" or "have" are intended to specify the presence of the features, numbers, steps, operations, components, parts, or combinations thereof described in the specification, and should be understood not to preclude the presence or addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Unless defined otherwise, all terms used herein, including technical and scientific terms, have the same meaning as commonly understood by a person of ordinary skill in the art to which the embodiments belong. Terms such as those defined in commonly used dictionaries should be interpreted as having a meaning consistent with their meaning in the context of the related art, and are not to be interpreted in an idealized or excessively formal sense unless explicitly so defined in the present application.

In the description with reference to the accompanying drawings, the same components are given the same reference numerals regardless of the drawing number, and redundant descriptions thereof are omitted. In describing the embodiments, a detailed description of related known technology is omitted where it is judged that it would unnecessarily obscure the gist of the embodiments.

FIG. 1 is a block diagram showing the configuration of a metabolic disease risk prediction system according to an embodiment of the present invention.

Referring to FIG. 1, a metabolic disease risk prediction system 100 according to an embodiment of the present invention may include a measurement unit 110, a selection unit 120, and a determination unit 130.

First, the measurement unit 110 may measure the degree of methylation of a methylation region for a test-subject group.

The determination unit 130 may perform metabolic disease risk prediction modeling by using the measured degree of methylation, and may report the metabolic disease risk for the test-subject group according to the result of the modeling.

The measurement unit 110 may analyze methylation information of a normal group and a metabolic disease patient group.

The measurement unit 110 may also measure RNA-seq (high-throughput transcriptome data) for each of the normal group and the metabolic disease patient group.

The selection unit 120 may select the methylation region by verifying the methylation information against the RNA-seq. Here, the methylation region whose degree of methylation is measured may be selected by analyzing 800,000 items of methylation information from normal individuals and metabolic disease patients and verifying them against RNA-seq (gene expression) information.

In addition, the measurement unit 110 measures RNA-seq (high-throughput transcriptome data) and the degree of methylation for each of the normal group and the metabolic disease patient group. That is, the measurement unit 110 may serve to acquire gene-related information/data that can distinguish normal individuals from metabolic disease patients.

Here, RNA-Seq (RNA sequencing) may refer to preparing RNA strands as a library suitable for NGS and sequencing it. RNA-Seq can identify the expression of, and some mutations in, messenger RNA (mRNA), which encodes proteins, and can also examine non-coding RNA, which does not encode proteins.

The degree of methylation may refer to the level of DNA methylation for each subject belonging to the normal group and the metabolic disease patient group. DNA methylation is the process of forming 5-methylcytosine (5-mC) by adding a methyl group to the fifth carbon position (C5) of cytosine, and may be a very important epigenetic mechanism involved in regulation of gene expression, transposon silencing, genome stabilization, X-chromosome inactivation, genomic imprinting, and the like.

In an embodiment, RNA-seq (high-throughput transcriptome data) and the degree of methylation may be measured for each of a normal group of 9 individuals and a metabolic disease patient group of 11 individuals, thereby obtaining gene-related information/data for each subject.

The selection unit 120 uses the RNA-seq to select differentially expressed genes whose expression changes only in the metabolic disease patient group compared with the normal group. That is, based on the measured RNA-seq, the selection unit 120 may identify genes that are characteristically expressed only in subjects belonging to the metabolic disease patient group and select them as differentially expressed genes.

In the above embodiment, the selection unit 120 may use RNA-seq to identify genes that are not expressed in the 9 individuals of the normal group but are commonly expressed in the 11 individuals of the metabolic disease patient group, and select them as differentially expressed genes.

In selecting the differentially expressed genes, the selection unit 120 may determine the expression level of each gene from the sequencing reads produced by the RNA-seq, and select as differentially expressed genes those genes whose fold change (FC) in expression is at least 2-fold and whose P-value, indicating significance, is 0.05 or less.

That is, to select genes whose expression changes in metabolic disease patients compared with normal individuals, the selection unit 120 may take the sequencing reads of Q30 or higher produced by RNA-seq, remove adapters and duplicate reads, and map the remaining reads to the human genome to measure the expression level of each gene. The selection unit 120 may then select, as differentially expressed genes, those genes whose expression fold change (FC) is at least 2-fold and whose P-value is 0.05 or less.
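The two selection criteria described above (fold change of at least 2 and P-value of 0.05 or less) can be sketched as a simple filter. This is a minimal illustration only: the function name `select_degs`, the pseudocount, the toy expression values, and the pre-computed P-values are all assumptions, and the source does not say which statistical test produces the P-values.

```python
import math

def select_degs(expression, p_values, min_fold=2.0, max_p=0.05, pseudocount=1.0):
    """Return {gene: log2 fold change} for genes passing both criteria.

    expression: {gene: (normal_values, patient_values)}
    p_values:   {gene: P-value computed elsewhere by a test of your choice}
    """
    selected = {}
    for gene, (normal, patient) in expression.items():
        mean_normal = sum(normal) / len(normal)
        mean_patient = sum(patient) / len(patient)
        # The pseudocount (an assumption) avoids division by zero
        # for genes that are not expressed in one group.
        log2_fc = math.log2((mean_patient + pseudocount) / (mean_normal + pseudocount))
        # |FC| >= 2 is the same as |log2 FC| >= 1, so up- and down-regulation both count.
        if abs(log2_fc) >= math.log2(min_fold) and p_values[gene] <= max_p:
            selected[gene] = log2_fc
    return selected

# Hypothetical toy data, not taken from the patent.
expression = {
    "GENE_A": ([10, 12, 11], [40, 44, 48]),  # clearly higher in patients
    "GENE_B": ([20, 22, 21], [23, 21, 22]),  # essentially unchanged
    "GENE_C": ([30, 28, 32], [90, 95, 85]),  # large change, but P-value too high
}
p_values = {"GENE_A": 0.01, "GENE_B": 0.60, "GENE_C": 0.20}
degs = select_degs(expression, p_values)
print(degs)
```

Working on the log2 scale treats a 2-fold increase and a 2-fold decrease symmetrically, which matches the |FC| notation used later in the description.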

In addition, the selection unit 120 may use the degree of methylation to select differentially methylated regions whose methylation changes only in the metabolic disease patient group compared with the normal group. That is, based on the measured degree of methylation, the selection unit 120 may identify regions whose methylation values change characteristically only in subjects belonging to the metabolic disease patient group and select them as differentially methylated regions.

In the above embodiment, the selection unit 120 may use the degree of methylation to identify regions whose methylation values do not change in the 9 individuals of the normal group but commonly change in the 11 individuals of the metabolic disease patient group, and select them as differentially methylated regions.

In selecting the differentially methylated regions, the selection unit 120 may select, from the image files of the methylation chip (Human Methylation 850K BeadChip) obtained by measuring the degree of methylation, regions whose P-value, indicating significance, is 0.05 or less.

That is, to select regions whose methylation changes in metabolic disease patients compared with normal individuals, the selection unit 120 may use the image files of the methylation chip and select, as differentially methylated regions, CpG regions that show a statistical difference (P-value < 0.05) in metabolic disease patients compared with normal individuals.

The determination unit 130 determines epigenetic markers for predicting metabolic disease patients by modeling the selected differentially expressed genes and differentially methylated regions. That is, the determination unit 130 may extract the selected differentially expressed genes that lie within the selected differentially methylated regions and thereby define the epigenetic markers used for predicting metabolic disease risk.

In determining the epigenetic markers, the determination unit 130 may identify a plurality of markers that belong to the differentially methylated regions and are also differentially expressed genes, and determine, as the epigenetic markers, the n markers (n being a natural number of 3 or more) with relatively high AUC (Area Under the Curve) values among them.

Here, AUC may refer to the area under the ROC (Receiver Operating Characteristic) curve, which is useful when evaluating model performance on imbalanced datasets. The ROC curve is a widely used tool for visualizing a model's classification, class-probability estimation, and scoring performance; because it expresses model performance independently of particular application conditions, it reveals the fundamental strengths and weaknesses of each model. AUC summarizes this information in a single number: it is the area under the classifier's curve within the unit square, and its value ranges from 0 to 1.
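To make the AUC definition concrete, the following sketch computes it with the rank-based (Mann-Whitney) formulation, which equals the area under the ROC curve. The function name `roc_auc` and the toy methylation values are hypothetical; the source does not state how its AUC values were computed.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a randomly chosen positive sample
    scores higher than a randomly chosen negative one (ties count half)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Hypothetical methylation values for one probe (1 = patient, 0 = normal).
scores = [0.30, 0.35, 0.40, 0.45, 0.50, 0.55]
labels = [0, 0, 1, 0, 1, 1]
auc = roc_auc(scores, labels)
print(round(auc, 3))
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of the two groups.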

In an embodiment, the determination unit 130 may determine the n markers so that they include at least the markers 'cg23248424', 'cg02891314', and 'cg03849834'.

After the markers 'cg23248424', 'cg02891314', and 'cg03849834' are determined, the metabolic disease risk prediction system 100 of the present invention may use these epigenetic markers to predict metabolic disease patients and their risk.

According to the present invention, a method and system can be provided that predict a metabolic disease and its risk by using methylation epigenetic regions that change in the metabolic disease.

DNA methylation is an epigenetic modification that occurs on DNA: a chemical change in which a methyl group (CH3) attaches to DNA without altering the DNA sequence inherited from the parents.

DNA methylation has the advantage that it is highly measurable even when a relatively small amount of biomarker is present, and it is therefore widely used in diagnosis.

The present invention relates to a prediction model capable of diagnosing metabolic disease by calculating an epigenetic influence score (DNA methylation score) using highly predictive epigenetic markers for metabolic disease and by demonstrating the accuracy of the prediction.

According to the present invention, metabolic disease can be predicted through the epigenetic influence score (DNA methylation score), and an epigenetic diagnostic marker prediction system capable of diagnosis from a small sample can be supported.

Metabolic disease develops from a variety of factors and combines multiple risk factors, such as hypertension, diabetes, and hyperlipidemia, that lead to cardiovascular and cerebrovascular disease. Because it causes serious complications if left untreated for a long time, a system capable of predicting epigenetic diagnostic markers in advance is required.

The present invention measures the epigenetic influence score (DNA methylation score), calculates how well it predicts metabolic disease, and presents epigenetic markers with high accuracy.

The method for selecting the epigenetic markers used in the present invention is as follows.

1. Collection of study samples

The metabolic disease risk prediction system 100 may collect blood samples from normal (n=9) and metabolic disease (n=11) subjects.

The metabolic disease risk prediction system 100 may collect the samples from normal individuals (9) and metabolic disease patients (11) in the OO citizen cohort.

2. Production of paired transcriptome (bulk RNA-seq) and methylation chip (HM850k chip) data for each sample

For the collected samples from normal individuals (9) and metabolic disease patients (11), the metabolic disease risk prediction system 100 may perform RNA-seq (high-throughput transcriptome data) to derive genome-wide gene expression and run a methylation chip to measure the degree of methylation.

Methylation refers to the conversion of cytosine (C) at a specific CpG within a specific gene into methylcytosine (mC). When methylation occurs, binding of transcription factors and the like is hindered, which can suppress expression of the gene.

3. Selection of markers that change significantly in metabolic disease compared with normal, through gene expression analysis (|FC| > 2, P-value < 0.05) and methylation analysis (P-value < 0.05)

- Selection of differentially expressed genes (|FC| > 2 and P-value < 0.05)

To select genes whose expression changes in metabolic disease patients compared with normal individuals, the metabolic disease risk prediction system 100 may take the sequencing reads of Q30 or higher produced by RNA-seq, remove adapters and duplicate reads, and map the remaining reads to the human genome to measure the expression level of each gene.
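The read clean-up before mapping (quality filtering, adapter removal, duplicate removal) can be sketched as below. Several points are assumptions rather than details from the source: "Q30 or higher" is interpreted as a mean Phred score of at least 30, qualities are assumed to be Phred+33 encoded, the adapter sequence is only an example, and duplicates are defined as identical sequences after trimming. A real pipeline would use dedicated tools; mapping itself is out of scope here.

```python
def mean_phred(quality, offset=33):
    """Mean Phred score of a FASTQ quality string (Phred+33 assumed)."""
    return sum(ord(ch) - offset for ch in quality) / len(quality)

def preprocess_reads(reads, adapter, min_mean_q=30):
    """Keep reads with mean quality >= min_mean_q, trim everything from the
    first adapter occurrence onward, and drop exact duplicate sequences.

    reads: iterable of (sequence, quality_string) pairs
    """
    seen = set()
    kept = []
    for seq, qual in reads:
        if not qual or mean_phred(qual) < min_mean_q:
            continue  # low-quality or empty read
        cut = seq.find(adapter)
        if cut != -1:
            seq = seq[:cut]  # remove the adapter and anything after it
        if not seq or seq in seen:
            continue  # nothing left after trimming, or a duplicate
        seen.add(seq)
        kept.append(seq)
    return kept

# Hypothetical reads; the adapter below is only an illustrative sequence.
ADAPTER = "AGATCGGAAGAGC"
reads = [
    ("ACGTACGTAC" + ADAPTER, "I" * 23),  # high quality, adapter gets trimmed
    ("ACGTACGTAC", "I" * 10),            # duplicate of the trimmed read above
    ("TTTTGGGGCC", "#" * 10),            # mean quality 2, discarded
    ("GGCCTTAAGG", "?" * 10),            # mean quality exactly 30, kept
]
clean = preprocess_reads(reads, ADAPTER)
print(clean)
```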

The metabolic disease risk prediction system 100 may select genes whose expression fold change (FC) is at least 2-fold and whose P-value, indicating significance, is 0.05 or less.

- Selection of differentially methylated regions (P-value < 0.05)

In addition, to select regions whose methylation changes in metabolic disease patients compared with normal individuals, the metabolic disease risk prediction system 100 may use the image files of the methylation chip (Human Methylation 850K BeadChip) and perform a Bayesian t-test (P-value <= 0.05) on data whose detection P-value is 0.05 or less and whose CpG regions are 99% or more.
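The per-probe group comparison can be illustrated as follows. Note the substitution: the source names a Bayesian t-test, whereas this sketch uses an exact two-sided permutation test on the difference in group means, chosen only because it needs nothing beyond the standard library. The function names, probe names, and beta values are hypothetical.

```python
from itertools import combinations

def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation P-value for a difference in group means.
    This is a stand-in for the Bayesian t-test named in the source."""
    pooled = list(group_a) + list(group_b)
    n_a, n_b = len(group_a), len(group_b)
    total = sum(pooled)

    def abs_mean_diff(sum_a):
        return abs(sum_a / n_a - (total - sum_a) / n_b)

    observed = abs_mean_diff(sum(group_a))
    extreme = 0
    n_splits = 0
    # Enumerate every way of relabelling n_a of the pooled samples as group A.
    for indices in combinations(range(len(pooled)), n_a):
        n_splits += 1
        # Small tolerance guards against floating-point noise in the comparison.
        if abs_mean_diff(sum(pooled[i] for i in indices)) >= observed - 1e-12:
            extreme += 1
    return extreme / n_splits

def select_dmrs(beta_by_probe, max_p=0.05):
    """Return probes whose normal-vs-patient P-value is at most max_p."""
    return [probe for probe, (normal, patient) in beta_by_probe.items()
            if permutation_p_value(normal, patient) <= max_p]

# Hypothetical beta values (normal group, patient group) for two probes.
beta_by_probe = {
    "probe_1": ([0.30, 0.32, 0.31, 0.33], [0.45, 0.47, 0.46, 0.48]),
    "probe_2": ([0.50, 0.52, 0.49, 0.51], [0.51, 0.50, 0.52, 0.49]),
}
dmrs = select_dmrs(beta_by_probe)
print(dmrs)
```

Exhaustive enumeration is practical for small groups such as the 9 and 11 subjects described here (167,960 splits per probe), but it would be slow across hundreds of thousands of probes, so this is for illustration rather than production use.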

4. Discovery of the final epigenetic markers by significance (P-value < 0.05), using correlation analysis to select genes whose expression changes with methylation: a total of 47 probes were confirmed to have their expression regulated by methylation in metabolic disease

- Selection of epigenetic markers

Using the Pearson correlation (r) method, the metabolic disease risk prediction system 100 may select the differentially expressed genes, whose expression changes, together with the differentially methylated regions, whose methylation values change (47 in total).
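As an illustration of the correlation step, the sketch below computes the Pearson coefficient r between a probe's methylation values and the paired expression values of its gene. The function name and toy values are hypothetical, and the significance test on r (the P-value < 0.05 criterion) is not implemented here.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sd_x * sd_y)

# Hypothetical paired values from the same five samples: as methylation
# rises, expression falls, consistent with methylation suppressing expression.
methylation = [0.2, 0.3, 0.4, 0.5, 0.6]
expression = [50, 42, 35, 27, 20]
r = pearson_r(methylation, expression)
print(round(r, 4))
```

Pairing matters: each methylation value must come from the same sample as the expression value it is compared with, which is why the description produces both data types for every subject.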

An epigenetic marker denotes a CpG region contained in a gene whose expression changes as a result of a methylation change in metabolic disease compared with normal.

- Metabolic disease prediction

When the metabolic disease risk prediction system 100 modeled the methylation values of the extracted candidates to distinguish metabolic disease patients from normal individuals, 3 of the 47 epigenetic markers (cg23248424, cg02891314, cg03849834) proved usable as individual markers.

The AUC value may be 0.828 for cg23248424, 0.878 for cg02891314, and 0.767 for cg03849834.
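Choosing the n markers with the highest AUC amounts to a sort. The sketch below applies it to the three AUC values reported above; the helper name `top_markers` is hypothetical.

```python
# AUC values as reported in the description for the three probes.
reported_auc = {
    "cg23248424": 0.828,
    "cg02891314": 0.878,
    "cg03849834": 0.767,
}

def top_markers(auc_by_probe, n):
    """Return the n probe IDs with the highest AUC, best first."""
    return sorted(auc_by_probe, key=auc_by_probe.get, reverse=True)[:n]

ranking = top_markers(reported_auc, 3)
print(ranking)
```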

FIG. 2 is a diagram for explaining a method of predicting metabolic disease risk using epigenetic markers.

In step 210, the metabolic disease risk prediction system 100 receives epigenetic markers as input.

The metabolic disease risk prediction system 100 may receive the subject's epigenetic markers within the 36 genes that contain metabolic-disease-associated epigenetic markers.

The values of the epigenetic markers may be newly generated by experiment, or previously known values may be entered.

In step 220, the metabolic disease risk prediction system 100 may calculate the degree of significance of the epigenetic markers.

The metabolic disease risk prediction system 100 may use, as variables of the prediction algorithm, a combination that includes at least one of the three epigenetic markers (cg23248424, cg02891314, cg03849834) among the 47 epigenetic markers.

The prediction algorithm may be any of various machine learning methods, from simple linear regression to SVM.
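At the simple end of that range, a single-variable linear regression can be fitted in closed form. The sketch below regresses a 0/1 disease label on one marker's methylation value and classifies at a cutoff of 0.5; the function names, the cutoff, and the training values are assumptions for illustration, not the model described in the source.

```python
def fit_simple_linear_regression(x, y):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    if sxx == 0:
        raise ValueError("x must not be constant")
    slope = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict_label(beta, slope, intercept, cutoff=0.5):
    """Return 1 (patient) if the fitted score reaches the cutoff, else 0."""
    return 1 if slope * beta + intercept >= cutoff else 0

# Hypothetical training data: methylation (beta) values and labels
# (0 = normal, 1 = metabolic disease).
betas = [0.30, 0.34, 0.36, 0.40, 0.46, 0.50, 0.52, 0.56]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
slope, intercept = fit_simple_linear_regression(betas, labels)
print(predict_label(0.35, slope, intercept), predict_label(0.55, slope, intercept))
```

With several markers as inputs, the same idea extends to multiple regression or to a classifier such as an SVM, at which point a numerical library would be the more practical choice.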

In step 230, the metabolic disease risk prediction system 100 may determine the presence or absence of disease according to the degree of significance and provide the result.

The metabolic disease risk prediction system 100 may provide the result of predicting metabolic disease using each of the three epigenetic markers (cg23248424, cg02891314, cg03849834) as a single variable.

FIG. 3 is a diagram showing ROC/AUC results (methylation) obtained using epigenetic markers.

FIG. 3 illustrates the results of predicting metabolic disease. It shows that each marker achieves an AUC of 0.77 to 0.88 and can therefore be used for prediction on its own.

The gene markers used for calculating the epigenetic markers are exemplified in [Table 1].

Through correlation analysis between methylation and gene expression, the metabolic disease risk prediction system 100 may derive the candidate group of final epigenetic markers selected by degree of significance.

FIG. 3 illustrates ROC/AUC results for 3 probes (2 genes) used to assess the predictive power of the epigenetic markers.

In addition to these epigenetic markers, the metabolic disease risk prediction system 100 may use the highly predictive epigenetic markers presented in Table 1 as indicators for identifying metabolic disease, and can thus serve as a prediction system that facilitates personalized treatment and management of metabolic disease patients.

Marker cg23248424 has an AUC of 0.828 when computed from methylation values, and can therefore improve the prediction of metabolic disease.

Accordingly, when the methylation value of marker cg23248424 is used as the input, the metabolic disease risk prediction system 100 may classify as a metabolic disease patient any patient whose methylation value is 0.468 or higher, that is, higher than the normal-group mean methylation value of 0.368 by 10% of the full methylation range (0 to 1), i.e., a beta value of 0.1.

Table 2 illustrates the methylation criterion for cg23248424.

Marker cg02891314 (GFPT2) has an AUC of 0.878 when computed from methylation values, and can therefore improve the prediction of metabolic disease.

Accordingly, when the methylation value of marker cg02891314 (GFPT2) is used as the input, the metabolic disease risk prediction system 100 may classify as a metabolic disease patient any patient whose methylation value is 0.4337 or higher, which differs from the normal-group mean methylation value of 0.337 by at least 10% of the full methylation range (0 to 1), i.e., a beta value of 0.1.

Table 3 illustrates the methylation criterion for marker cg02891314 (GFPT2).

Marker cg03849834 (TREML4) has an AUC of 0.767 when computed from methylation values, and can therefore improve the prediction of metabolic disease.

Accordingly, when the methylation value of marker cg03849834 (TREML4) is used as the input, the metabolic disease risk prediction system 100 may classify as a metabolic disease patient any patient whose methylation value is 0.925 or higher, which differs from the normal-group mean methylation value of 0.825 by at least 10% of the full methylation range (0 to 1), i.e., a beta value of 0.1.

Table 4 illustrates the methylation criterion for marker cg03849834 (TREML4).
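
The per-marker criteria above amount to a simple cut-off rule, which might be sketched as follows. The dictionary `MARKER_CRITERIA` and the function `is_metabolic_disease` are hypothetical names introduced for illustration. The normal-group means and cut-offs are copied as stated in the text rather than recomputed. In particular, the stated cut-off of 0.4337 for cg02891314 is used as given, although 0.337 + 0.1 would be 0.437.

```python
# Normal-group mean beta value and classification cut-off per marker,
# copied from the description (not recomputed).
MARKER_CRITERIA = {
    "cg23248424": {"normal_mean": 0.368, "cutoff": 0.468},
    "cg02891314": {"normal_mean": 0.337, "cutoff": 0.4337},  # GFPT2
    "cg03849834": {"normal_mean": 0.825, "cutoff": 0.925},   # TREML4
}


def is_metabolic_disease(marker, beta):
    """Return True if the beta value meets or exceeds the marker's cut-off."""
    if marker not in MARKER_CRITERIA:
        raise KeyError(f"unknown marker: {marker}")
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta value must lie in the range 0 to 1")
    return beta >= MARKER_CRITERIA[marker]["cutoff"]


# Hypothetical input values, for illustration only.
example_calls = {
    ("cg23248424", 0.50): is_metabolic_disease("cg23248424", 0.50),
    ("cg23248424", 0.40): is_metabolic_disease("cg23248424", 0.40),
    ("cg03849834", 0.93): is_metabolic_disease("cg03849834", 0.93),
}
```

Each marker is evaluated on its own here, which matches the description's use of each marker as a single variable. Combining the three markers into one score would be a separate design decision that the text does not specify.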

Hereinafter, the workflow of the metabolic disease risk prediction system 100 according to embodiments of the present invention is described in detail with reference to FIG. 4.

FIG. 4 is a flowchart illustrating a metabolic disease risk prediction method according to an embodiment of the present invention.

The metabolic disease risk prediction method according to this embodiment may be performed by the metabolic disease risk prediction system 100.

First, the metabolic disease risk prediction system 100 may measure the degree of methylation in a methylation region for a test subject group.

In addition, the metabolic disease risk prediction system 100 may perform metabolic disease risk prediction modeling using the measured degree of methylation, and may report the risk of metabolic disease for the test subject group according to the result of the modeling.

The metabolic disease risk prediction system 100 may analyze methylation information of a normal group and a metabolic disease patient group.

In addition, the metabolic disease risk prediction system 100 may measure RNA-seq (large-scale transcriptome data) for each of the normal group and the metabolic disease patient group.

In addition, the metabolic disease risk prediction system 100 may select the methylation region by validating the methylation information against the RNA-seq. Here, the methylation region in which the degree of methylation is measured may be selected by analyzing 800,000 items of methylation information from normal subjects and metabolic disease patients and validating them against RNA-seq (gene expression) information.

In addition, the metabolic disease risk prediction system 100 measures RNA-seq (large-scale transcriptome data) and the degree of methylation for each of the normal group and the metabolic disease patient group (410). Step 410 may be a process of obtaining gene-related information and data that can distinguish normal subjects from metabolic disease patients.

In addition, the metabolic disease risk prediction system 100 uses the RNA-seq to select differentially expressed genes whose expression changes only in the metabolic disease patient group compared with the normal group (420). Step 420 may be a process of identifying, based on the measured RNA-seq, genes that are characteristically expressed only in subjects belonging to the metabolic disease patient group, and selecting them as differentially expressed genes.

In the embodiment described above, the metabolic disease risk prediction system 100 may use RNA-seq to identify genes that are not expressed in the 9 subjects of the normal group but are commonly expressed in the 11 subjects of the metabolic disease patient group, and select them as differentially expressed genes.

In selecting the differentially expressed genes, the metabolic disease risk prediction system 100 may determine the expression level of each gene from the sequencing reads produced by the RNA-seq, and may select as differentially expressed genes those genes whose fold change (FC) in expression level is 2-fold or greater and whose P-value, indicating significance, is 0.05 or less.

That is, to select genes whose expression changes in metabolic disease patients compared with normal subjects, the metabolic disease risk prediction system 100 may take the sequencing reads of Q30 or higher from the sequences produced by RNA-seq, remove adapters and duplicate reads, and map the remaining reads to the human genome to measure the expression level of each gene. The metabolic disease risk prediction system 100 may then select as differentially expressed genes those genes whose expression fold change (FC) is 2-fold or greater and whose P-value, indicating significance, is 0.05 or less.
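
The two selection criteria for differentially expressed genes (fold change of 2-fold or greater, P-value of 0.05 or less) could be applied roughly as in the sketch below. This is a minimal illustration under stated assumptions. The function `select_degs`, the gene labels, and the expression values are hypothetical. The P-values are taken as precomputed inputs because the description does not name the statistical test. The pseudocount added before taking the ratio is an assumption made only to avoid division by zero. A 2-fold difference is treated as applying in either direction.

```python
import math
from statistics import mean


def select_degs(expr_normal, expr_patient, p_values,
                fc_min=2.0, p_max=0.05, pseudocount=1.0):
    """Select genes with |log2 FC| >= log2(fc_min) and P-value <= p_max.

    expr_normal / expr_patient: dict of gene -> list of expression levels.
    p_values: dict of gene -> precomputed P-value (test not specified here).
    pseudocount: assumption, added to group means to avoid division by zero.
    """
    selected = []
    for gene, p in p_values.items():
        normal_mean = mean(expr_normal[gene]) + pseudocount
        patient_mean = mean(expr_patient[gene]) + pseudocount
        log2_fc = math.log2(patient_mean / normal_mean)
        if abs(log2_fc) >= math.log2(fc_min) and p <= p_max:
            selected.append(gene)
    return selected


# Hypothetical genes and values, for illustration only.
demo_normal = {"GENE_A": [10, 10], "GENE_B": [10, 10],
               "GENE_C": [10, 10], "GENE_D": [40, 40]}
demo_patient = {"GENE_A": [40, 40], "GENE_B": [12, 12],
                "GENE_C": [80, 80], "GENE_D": [4, 4]}
demo_p = {"GENE_A": 0.01, "GENE_B": 0.01, "GENE_C": 0.20, "GENE_D": 0.03}
demo_degs = select_degs(demo_normal, demo_patient, demo_p)
```

In this toy input, GENE_B fails the fold-change criterion and GENE_C fails the P-value criterion, so only GENE_A (up) and GENE_D (down) are retained.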

Subsequently, the metabolic disease risk prediction system 100 may use the degree of methylation to select differentially methylated regions in which methylation changes only in the metabolic disease patient group compared with the normal group (430). Step 430 may be a process of identifying, based on the measured degree of methylation, regions whose methylation value changes characteristically only in subjects belonging to the metabolic disease patient group, and selecting them as differentially methylated regions.

In the embodiment described above, the metabolic disease risk prediction system 100 may use the degree of methylation to identify regions whose methylation value does not change in the 9 subjects of the normal group but changes in common in the 11 subjects of the metabolic disease patient group, and select them as differentially methylated regions.

In selecting the differentially methylated regions, the metabolic disease risk prediction system 100 may select, from the image file of the methylation chip (Human Methylation 850K BeadChip) obtained by measuring the degree of methylation, regions whose P-value, indicating significance, is 0.05 or less.

That is, to select regions in which methylation changes in metabolic disease patients compared with normal subjects, the metabolic disease risk prediction system 100 may use the image file of the methylation chip to select as differentially methylated regions the CpG regions that show a statistically significant difference (P-value < 0.05) in metabolic disease patients compared with normal subjects.
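
The description does not name the statistical test that yields the P-value for each CpG region. Purely as one possible illustration, the sketch below uses an exact two-sided permutation test on the difference in group mean beta values and keeps the sites whose P-value is 0.05 or less. The functions `permutation_p_value` and `select_dmrs`, the site labels, and the beta values are all hypothetical. The choice of a permutation test is an assumption, not the method of the specification.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(normal, patient):
    """Exact two-sided permutation P-value for the difference in group means.

    Assumption: this test is a stand-in; the source does not specify one.
    """
    pooled = list(normal) + list(patient)
    k, n = len(normal), len(pooled)
    total = sum(pooled)
    observed = abs(mean(patient) - mean(normal))
    extreme = 0
    splits = 0
    # Enumerate every way of relabelling k of the n values as "normal".
    for idx in combinations(range(n), k):
        s = sum(pooled[i] for i in idx)
        diff = abs((total - s) / (n - k) - s / k)
        if diff >= observed - 1e-12:   # small tolerance for float rounding
            extreme += 1
        splits += 1
    return extreme / splits


def select_dmrs(beta_normal, beta_patient, p_max=0.05):
    """Return the sites whose permutation P-value is at most p_max."""
    return [site for site in beta_normal
            if permutation_p_value(beta_normal[site], beta_patient[site]) <= p_max]


# Hypothetical sites and beta values, for illustration only.
demo_beta_normal = {"site_1": [0.30, 0.32, 0.31, 0.33],
                    "site_2": [0.30, 0.50, 0.31, 0.52]}
demo_beta_patient = {"site_1": [0.50, 0.52, 0.51, 0.53],
                     "site_2": [0.32, 0.51, 0.30, 0.50]}
demo_dmrs = select_dmrs(demo_beta_normal, demo_beta_patient)
```

With the group sizes mentioned in the embodiment (9 normal subjects and 11 patients) the exact enumeration covers 167,960 relabellings per site, which is feasible for a handful of sites. An array-wide analysis would normally rely on a dedicated statistics package and a correction for multiple testing.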

In addition, the metabolic disease risk prediction system 100 determines epigenetic markers for predicting metabolic disease patients by modeling the selected differentially expressed genes and the selected differentially methylated regions (440). Step 440 may be a process of extracting the selected differentially expressed genes located within the selected differentially methylated regions and determining the epigenetic markers used for predicting metabolic disease risk.

In determining the epigenetic markers, the metabolic disease risk prediction system 100 may identify a plurality of genetic markers that belong to a differentially methylated region and are also differentially expressed genes, and may determine as the epigenetic markers the n genetic markers (n being a natural number of 3 or more) that have relatively high AUC (Area Under the Curve) values among them.

In one embodiment, the metabolic disease risk prediction system 100 may determine the n genetic markers so as to include at least markers 'cg23248424', 'cg02891314', and 'cg03849834'.
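
Step 440 as described (keep the markers that lie in a differentially methylated region and belong to a differentially expressed gene, then take the n with the highest AUC) might be sketched as follows. The function `choose_epigenetic_markers`, the probe and gene labels, and the AUC values are hypothetical placeholders, not data from the specification. The probe-to-gene mapping is assumed to be available from the array annotation.

```python
def choose_epigenetic_markers(probe_to_gene, dmr_probes, deg_genes,
                              auc_by_probe, n=3):
    """Pick the n highest-AUC probes that are in a DMR and map to a DEG."""
    if n < 3:
        raise ValueError("n is described as a natural number of 3 or more")
    # Candidates: differentially methylated probes whose gene is a DEG.
    candidates = [p for p in dmr_probes if probe_to_gene.get(p) in deg_genes]
    # Rank candidates by single-marker AUC, highest first.
    ranked = sorted(candidates, key=lambda p: auc_by_probe[p], reverse=True)
    return ranked[:n]


# Hypothetical probes, genes, and AUC values, for illustration only.
demo_probe_to_gene = {"probe_a": "GENE_1", "probe_b": "GENE_1",
                      "probe_c": "GENE_2", "probe_d": "GENE_4",
                      "probe_e": "GENE_3", "probe_f": "GENE_3"}
demo_dmr_probes = ["probe_a", "probe_b", "probe_c", "probe_d", "probe_f"]
demo_deg_genes = {"GENE_1", "GENE_2", "GENE_3"}
demo_auc = {"probe_a": 0.81, "probe_b": 0.74, "probe_c": 0.90,
            "probe_d": 0.95, "probe_e": 0.88, "probe_f": 0.60}
demo_markers = choose_epigenetic_markers(
    demo_probe_to_gene, demo_dmr_probes, demo_deg_genes, demo_auc, n=3)
```

In this toy input, probe_d is dropped despite its high AUC because its gene is not differentially expressed, and probe_e is dropped because it is not in a differentially methylated region.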

The method according to the embodiments may be implemented in the form of program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the embodiments, or may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like. The hardware devices described above may be configured to operate as one or more software modules in order to perform the operations of the embodiments, and vice versa.

Software may include a computer program, code, instructions, or a combination of one or more of these, and may configure a processing device to operate as desired or may instruct the processing device independently or collectively. Software and/or data may be embodied permanently or temporarily in any type of machine, component, physical device, virtual equipment, computer storage medium or device, or transmitted signal wave, in order to be interpreted by the processing device or to provide instructions or data to the processing device. Software may be distributed over networked computer systems and stored or executed in a distributed manner. Software and data may be stored on one or more computer-readable recording media.

Although the embodiments have been described above with reference to a limited set of drawings, a person of ordinary skill in the art can apply various technical modifications and variations based on the above. For example, appropriate results may be achieved even if the described techniques are performed in an order different from the described method, and/or components of the described system, structure, device, circuit, and the like are coupled or combined in a form different from the described method, or are replaced or substituted by other components or equivalents.

Therefore, other implementations, other embodiments, and equivalents of the claims also fall within the scope of the claims that follow.

100: metabolic disease risk prediction system
110: measurement unit
120: selection unit
130: determination unit

Claims

1. A metabolic disease risk prediction method comprising:
measuring a degree of methylation in a methylation region for a test subject group;
performing metabolic disease risk prediction modeling using the measured degree of methylation; and
reporting a risk of metabolic disease for the test subject group according to a result of the metabolic disease risk prediction modeling.

2. The method of claim 1, further comprising:
analyzing methylation information of a normal group and a metabolic disease patient group;
measuring RNA-seq (large-scale transcriptome data) for each of the normal group and the metabolic disease patient group; and
selecting the methylation region by validating the methylation information against the RNA-seq.

3. The method of claim 1, further comprising:
measuring RNA-seq (large-scale transcriptome data) and a degree of methylation for each of a normal group and a metabolic disease patient group;
selecting, using the RNA-seq, a differentially expressed gene whose expression changes only in the metabolic disease patient group compared with the normal group;
selecting, using the degree of methylation, a differentially methylated region in which methylation changes only in the metabolic disease patient group compared with the normal group; and
determining an epigenetic marker for predicting metabolic disease patients by modeling the selected differentially expressed gene and the selected differentially methylated region.

4. The method of claim 3, wherein selecting the differentially expressed gene comprises:
determining an expression level for each gene in association with sequencing reads of sequences produced by the RNA-seq; and
selecting, as the differentially expressed gene, a gene whose fold change (FC) according to the determined expression level is 2-fold or greater and whose P-value, indicating significance, is 0.05 or less.

5. The method of claim 3, wherein selecting the differentially methylated region comprises:
selecting, as the differentially methylated region, a region whose P-value, indicating significance, is 0.05 or less, from an image file of a methylation chip (Human Methylation 850K BeadChip) obtained by measuring the degree of methylation.

6. The method of claim 3, wherein determining the epigenetic marker comprises:
identifying a plurality of genetic markers that belong to the differentially methylated region and are the differentially expressed gene; and
determining, as the epigenetic marker, n genetic markers (n being a natural number of 3 or more) having relatively high AUC (Area Under the Curve) values among the plurality of genetic markers.

7. The method of claim 6, wherein determining the n genetic markers as the epigenetic marker comprises:
determining the n genetic markers so as to include at least markers 'cg23248424', 'cg02891314', and 'cg03849834'.

8. A metabolic disease risk prediction system comprising:
a measurement unit that measures a degree of methylation in a methylation region for a test subject group; and
a determination unit that performs metabolic disease risk prediction modeling using the measured degree of methylation and reports a risk of metabolic disease for the test subject group according to a result of the metabolic disease risk prediction modeling.

9. The system of claim 8, further comprising:
a selection unit that, as the measurement unit analyzes methylation information of a normal group and a metabolic disease patient group and measures RNA-seq (large-scale transcriptome data) for each of the normal group and the metabolic disease patient group, selects the methylation region by validating the methylation information against the RNA-seq.

10. The system of claim 8, further comprising:
a selection unit that, as the measurement unit measures RNA-seq (large-scale transcriptome data) and a degree of methylation for each of a normal group and a metabolic disease patient group, selects, using the RNA-seq, a differentially expressed gene whose expression changes only in the metabolic disease patient group compared with the normal group, and selects, using the degree of methylation, a differentially methylated region in which methylation changes only in the metabolic disease patient group compared with the normal group,
wherein the determination unit determines an epigenetic marker for predicting metabolic disease patients by modeling the selected differentially expressed gene and the selected differentially methylated region.

11. The system of claim 10, wherein the selection unit
determines an expression level for each gene in association with sequencing reads of sequences produced by the RNA-seq, and selects, as the differentially expressed gene, a gene whose fold change (FC) according to the determined expression level is 2-fold or greater and whose P-value, indicating significance, is 0.05 or less.

12. The system of claim 10, wherein the selection unit
selects, as the differentially methylated region, a region whose P-value, indicating significance, is 0.05 or less, from an image file of a methylation chip (Human Methylation 850K BeadChip) obtained by measuring the degree of methylation.

13. The system of claim 10, wherein the determination unit
identifies a plurality of genetic markers that belong to the differentially methylated region and are the differentially expressed gene, and determines, as the epigenetic marker, n genetic markers (n being a natural number of 3 or more) having relatively high AUC values among the plurality of genetic markers.

14. The system of claim 13, wherein the determination unit
determines the n genetic markers so as to include at least markers 'cg23248424', 'cg02891314', and 'cg03849834'.

15. A computer-readable recording medium on which a program for executing the method of any one of claims 1 to 7 is recorded.