KR20230129933A

KR20230129933A - Method and apparatus for risk prediction of diabetic nephropathy based on integration of polygenic and clinical information

Info

Publication number: KR20230129933A
Application number: KR1020230027438A
Authority: KR
Inventors: 홍정한
Original assignee: 에이치앤비지노믹스 주식회사
Priority date: 2022-03-02
Filing date: 2023-03-02
Publication date: 2023-09-11

Abstract

The present disclosure relates to a method for predicting the risk of developing diabetic nephropathy based on the integration of polygenic and clinical information, and to an electronic device that performs the method. According to one embodiment, a method by which an electronic device predicts the risk of developing diabetic nephropathy includes: acquiring genomic data of a clinical sample from an external device connected to the electronic device; selecting, from the genomic data, a plurality of target omics data identical to reference omics data, the reference omics data including a plurality of preset reference SNP (Single Nucleotide Polymorphism) data and a plurality of miRNA data related to the development of diabetic nephropathy; and calculating a risk score using the genotype of each of the plurality of target omics data and a weight for each of the plurality of target omics data. The reference omics data includes a plurality of reference SNP data related to the development of diabetic nephropathy and a plurality of miRNA data related to the development of diabetic nephropathy, and may reflect an effect size corrected using the linkage disequilibrium of the omics data related to the development of diabetic nephropathy and a global scaling parameter (shrinkage factor).

Description

Method and apparatus for predicting the risk of developing diabetic nephropathy based on integration of polygenic and clinical information {METHOD AND APPARATUS FOR RISK PREDICTION OF DIABETIC NEPHROPATHY BASED ON INTEGRATION OF POLYGENIC AND CLINICAL INFORMATION}

The present disclosure relates to a method for intelligently predicting the risk of developing diabetic nephropathy and to an electronic device that performs the method. More specifically, it relates to a method for intelligently predicting the risk of developing diabetic nephropathy based on a deep learning network, and to an electronic device that performs the method.

According to a 2020 report by the Korean Diabetes Association, the prevalence of diabetes among adults aged 30 and over in Korea was 13.8% as of 2018, which corresponds to 4.94 million people when applied to the estimated population. Regarding comorbidities, 53.2% of people with diabetes were obese, 61.3% had hypertension, and 72% had hypercholesterolemia; 43.7% of people with diabetes had both hypertension and hypercholesterolemia.

As the etymology of the Korean term for diabetes ("sugar comes out in the urine") suggests, diabetes is closely related to the kidneys. Diabetes is the fifth most common cause of disease and death worldwide, and this is attributable less to diabetes itself than to the complications it causes. Diabetic nephropathy, together with diabetic retinopathy and diabetic neuropathy, is one of the major microvascular complications of diabetes, while coronary artery disease, cerebral infarction, and peripheral vascular disease are its macrovascular complications.

Diabetic nephropathy is a disease in which kidney function declines because the glomeruli inside the kidney are continuously damaged by hyperglycemia and related factors. Diabetes causes kidney disease to progress slowly over 10 to 15 years, so if diabetes is not treated properly in its early stages, kidney function deteriorates unnoticed and, in severe cases, end-stage renal failure requiring dialysis or transplantation may result.

Although intensive research has been conducted over the past several years on the causes, diagnosis, and treatment of kidney disease, no definitive therapeutic agent that can effectively treat diabetic nephropathy has yet been developed.

To date, albuminuria has been the most widely used biomarker for early diabetic nephropathy. However, albuminuria has limitations as a biomarker: it is not specific to diabetic nephropathy and is also observed in other kidney diseases; albumin is filtered by the glomeruli but secreted again in the tubules; not all diabetic patients with microalbuminuria progress to overt proteinuria; and diabetic nephropathy can occur even without albuminuria. For these reasons, there is a need for technology that uses a deep learning network to intelligently determine whether a subject has diabetic nephropathy, so that diabetic nephropathy can be diagnosed early and accurately.

Korean Patent No. 10-18176650000

According to one embodiment, a method for predicting the risk of developing diabetic nephropathy based on the integration of polygenic and clinical information using an artificial intelligence model, and an electronic device that performs the method, may be provided.

In addition, according to one embodiment, a method for analyzing the results of an artificial intelligence model for predicting the risk of developing diabetic nephropathy based on the integration of polygenic and clinical information, and an electronic device that performs the method, may be provided.

According to an embodiment of the present disclosure for achieving the above-described technical problem, a method by which an electronic device predicts the risk of developing diabetic nephropathy includes: acquiring genomic data of a clinical sample from an external device connected to the electronic device; selecting, from the genomic data, a plurality of target omics data identical to reference omics data, the reference omics data including a plurality of preset reference SNP (Single Nucleotide Polymorphism) data and a plurality of miRNA data related to the development of diabetic nephropathy; and calculating a risk score using the genotype of each of the plurality of target omics data and a weight for each of the plurality of target omics data. The reference omics data includes a plurality of reference SNP data related to the development of diabetic nephropathy and a plurality of miRNA data related to the development of diabetic nephropathy, and may reflect an effect size corrected using the linkage disequilibrium of the omics data related to the development of diabetic nephropathy and a global scaling parameter (shrinkage factor).

According to one embodiment, the method may further include: acquiring, together with the genomic data of the clinical sample, user identification information of a user corresponding to the genomic data of the clinical sample; acquiring personal characteristic information of the user from the external device based on the user identification information; and identifying a user type of the user corresponding to the clinical sample based on the personal characteristic information of the user.

According to one embodiment, the method may further include: acquiring, from an external device connected to the electronic device, omics training data related to the risk of developing diabetic nephropathy for training a prediction model; subgrouping the acquired omics training data based on preset personal characteristics of users; and generating a plurality of prediction models, one for each personal characteristic, by performing K-fold cross-validation on the subgrouped omics data.
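By way of illustration only, the subgrouping and K-fold partitioning described above could be sketched as follows. The record layout, the key name passed to `subgroup`, and both function names are assumptions introduced for this sketch; the disclosure does not specify a data format, a value of K, or how the models are fitted on each fold.

```python
from collections import defaultdict


def subgroup(samples, key):
    """Group sample records by one personal characteristic.

    `samples` is a list of dicts and `key` names the characteristic
    (both are hypothetical; the disclosure does not fix a format).
    """
    groups = defaultdict(list)
    for sample in samples:
        groups[sample[key]].append(sample)
    return dict(groups)


def k_fold_indices(n, k):
    """Yield (train_indices, test_indices) for K-fold cross-validation.

    The n samples are split into k contiguous folds whose sizes differ
    by at most one; each fold serves as the test set exactly once.
    """
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, test
        start += size
```

In such a sketch, one prediction model would be trained and validated per subgroup by iterating over `k_fold_indices(len(group), k)` for each group returned by `subgroup`.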

According to one embodiment, the method may further include subgrouping the genomic data of the clinical sample acquired from the external device based on the personal characteristics of the user. Calculating the risk score may include: obtaining output values from each of the plurality of prediction models by inputting the plurality of target omics data, selected from the subgrouped genomic data, into the plurality of prediction models generated for each personal characteristic; and calculating the risk score by applying, to the output value of each of the plurality of prediction models, a weight that differs according to the identified user type.
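As a minimal sketch of the weighting step described above, the outputs of the per-characteristic models could be combined with weights looked up by user type. The function name, the model names, and the normalization by the sum of the weights are assumptions made for this example; the disclosure states only that different weights are applied according to the user type.

```python
def ensemble_risk_score(model_outputs, weights_by_user_type, user_type):
    """Combine per-model outputs into one risk score.

    model_outputs:        {model_name: output value}
    weights_by_user_type: {user_type: {model_name: weight}}
    Dividing by the weight sum (an assumption of this sketch) keeps the
    score on the same scale as the individual model outputs.
    """
    weights = weights_by_user_type[user_type]
    if set(weights) != set(model_outputs):
        raise ValueError("weights and model outputs must cover the same models")
    total = sum(weights.values())
    if total == 0:
        raise ValueError("weights must not sum to zero")
    return sum(weights[name] * model_outputs[name] for name in model_outputs) / total
```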

According to one embodiment, the method may further include calculating the risk score for the clinical sample for each user type and each predetermined period by inputting the plurality of target omics data, selected from the subgrouped genomic data, and the output values of each of the plurality of prediction models into a pre-trained Cox proportional hazards survival analysis model.

In addition, according to another embodiment for achieving the above-described technical problem, an electronic device for predicting the risk of developing diabetic nephropathy includes: a network interface; a memory that stores one or more instructions; and at least one processor that executes the one or more instructions. By executing the one or more instructions, the processor acquires genomic data of a clinical sample from an external device connected to the electronic device; selects, from the genomic data, a plurality of target omics data identical to reference omics data, the reference omics data including a plurality of preset reference SNP (Single Nucleotide Polymorphism) data and a plurality of miRNA data related to the development of diabetic nephropathy; and calculates a risk score using the genotype of each of the plurality of target omics data and a weight for each of the plurality of target omics data. The reference omics data includes a plurality of reference SNP data related to the development of diabetic nephropathy and a plurality of miRNA data related to the development of diabetic nephropathy, and may reflect an effect size corrected using the linkage disequilibrium of the reference omics data related to the development of diabetic nephropathy and a global scaling parameter (shrinkage factor).

In addition, according to another embodiment for solving the above technical problem, a computer-readable recording medium may be provided on which is recorded a program for causing a computer to execute a method by which an electronic device predicts the risk of developing diabetic nephropathy. The method includes: acquiring genomic data of a clinical sample from an external device connected to the electronic device; selecting, from the genomic data, a plurality of target omics data identical to reference omics data, the reference omics data including a plurality of preset reference SNP (Single Nucleotide Polymorphism) data and a plurality of miRNA data related to the development of diabetic nephropathy; and calculating a risk score using the genotype of each of the plurality of target omics data and a weight for each of the plurality of target omics data. The reference omics data includes a plurality of reference SNP data related to the development of diabetic nephropathy and a plurality of miRNA data related to the development of diabetic nephropathy, and reflects an effect size corrected using the linkage disequilibrium of the omics data related to the development of diabetic nephropathy and a global scaling parameter (shrinkage factor).

FIG. 1 is a diagram schematically showing the operation of a system for evaluating the risk of diabetic nephropathy.
FIG. 2 is a flowchart of a method by which an electronic device predicts the risk of developing diabetic nephropathy according to an embodiment.
FIG. 3 is a diagram illustrating a specific process by which an electronic device calculates a risk score according to an embodiment.
FIG. 4 is a diagram illustrating a process in which an electronic device generates a prediction model based on SNP data and miRNA data obtained from GWAS and TWAS databases.
FIG. 5 is a diagram illustrating a process in which an electronic device generates a prediction model based on genomic data subgrouped according to personal characteristics, and a process of using a Cox proportional hazards survival analysis model that generates the probability of disease onset for each period based on the results of the prediction model.
FIG. 6 is a diagram showing the result of an electronic device selecting reference omics data including reference SNP data.
FIG. 7 is a diagram showing an example of an ROC curve validating the diabetic nephropathy risk analysis model used by the electronic device.
FIG. 8 is a diagram illustrating the structure of an electronic device for analyzing the risk of diabetic nephropathy, and of a system including the device, according to an embodiment.

The terms used in this specification are briefly described below, followed by a detailed description of the present disclosure.

The terms used in the present disclosure are, as far as possible, general terms that are currently in wide use, selected in consideration of their functions in the present disclosure; however, they may vary depending on the intention of those skilled in the art, legal precedent, the emergence of new technologies, and the like. In certain cases, terms have been arbitrarily selected by the applicant, and in such cases their meaning is described in detail in the relevant part of the description. Therefore, the terms used in the present disclosure should be defined based on their meaning and on the overall content of the present disclosure, rather than simply on their names.

Throughout the specification, when a part is said to "include" a certain element, this means that, unless specifically stated otherwise, it may further include other elements rather than excluding them. In addition, terms such as "... unit" and "module" used in the specification refer to a unit that processes at least one function or operation, which may be implemented in hardware, in software, or in a combination of hardware and software.

Embodiments of the present disclosure are described in detail below with reference to the attached drawings so that those of ordinary skill in the art to which the present disclosure pertains can readily practice them. However, the present disclosure may be implemented in many different forms and is not limited to the embodiments described herein. To describe the present disclosure clearly, parts unrelated to the description are omitted from the drawings, and similar parts are given similar reference numerals throughout the specification.

FIG. 1 is a diagram schematically showing the operation of a system for evaluating the risk of diabetic nephropathy.

The system 10 for evaluating the risk of diabetic nephropathy according to an embodiment may include a reference database 120 and an electronic device 1000. Even when no GWAS results from an independent study population are available, or when applicability to the Korean population is limited in view of the specificity of the study population, the system 10 can perform a 10-fold LOGO (Leave One Group Out) meta-analysis: it performs association analysis and an integrated meta-analysis on groups 1 to 9, and builds a genetic risk prediction model using the summary statistics of the meta-analysis results.
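For illustration, the leave-one-group-out scheme described above can be sketched as a generator of splits in which each group is held out once while the remaining groups supply the meta-analysis. The function name and the use of integer group labels are assumptions of this sketch, not details taken from the disclosure.

```python
def leave_one_group_out(group_ids):
    """Yield (training_groups, held_out_group) for every distinct group.

    With ten groups labeled 1..10, the final split trains on groups 1-9
    and holds out group 10, matching the example in the text.
    """
    groups = sorted(set(group_ids))
    for held_out in groups:
        training = [g for g in groups if g != held_out]
        yield training, held_out
```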

The constructed prediction model is used to estimate the genetic risk of group 10, and the genetic risk of all 10 groups can be calculated using the association analysis results of the independent groups. In addition, the system 10 for evaluating the risk of diabetic nephropathy according to the present disclosure can add the epidemiological and clinical data of the clinical samples as covariates and apply a Cox proportional hazards survival model, thereby completing a deep learning-based risk prediction model with improved accuracy in predicting a subject's risk of developing diabetic nephropathy at a specific age.
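As a hedged sketch of how a fitted Cox proportional hazards model yields a period-specific risk, the standard relation S(t | x) = S0(t)^exp(βᵀx) can be evaluated directly. The function and covariate names below are hypothetical, and the baseline survival S0(t) and coefficients are assumed to come from a model fitted elsewhere; the disclosure does not give these values.

```python
import math


def cox_event_probability(baseline_survival, coefficients, covariates):
    """Probability of onset by time t under a Cox proportional hazards model.

    baseline_survival: S0(t), baseline survival probability at time t
    coefficients:      {covariate_name: fitted log hazard ratio}
    covariates:        {covariate_name: value for this subject}
    Returns 1 - S0(t) ** exp(linear predictor).
    """
    if not 0.0 <= baseline_survival <= 1.0:
        raise ValueError("baseline survival must lie in [0, 1]")
    linear_predictor = sum(coefficients[name] * covariates[name] for name in coefficients)
    return 1.0 - baseline_survival ** math.exp(linear_predictor)
```

Here a genetic risk score and clinical covariates such as age would enter through `covariates`, and calling the function with S0(t) for several values of t gives the risk for each period.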

According to one embodiment, the system 10 for evaluating the risk of diabetic nephropathy may include: a reference database 120 that stores reference omics data including a plurality of reference SNP data related to the development of diabetic nephropathy and a plurality of miRNA data related to the development of diabetic nephropathy; a first server 150 that predicts the risk of diabetic nephropathy; a second server 110 that performs Meta-GWAS analysis; and an electronic device 1000 that acquires genomic data of a target clinical sample and transmits the acquired genomic data to the servers.

Based on the Meta-GWAS analysis results of the second server 110, the first server 150 may select a plurality of target omics data or a plurality of target SNP data from the genomic data of the clinical sample, and may determine a diabetic nephropathy risk score by inputting low-dimensional embeddings determined from the selected target omics data or target SNP data into a preset prediction model.

According to one embodiment, the first server 150 may be an analysis device or computing device that acquires sample genomic data for a specific user and determines the risk of developing diabetic nephropathy for the acquired genomic data. For example, the first server 150 may receive genomic data of a specific sample from the second server (or genomic information generating device) 110, the electronic device 1000, or a separate database. According to one embodiment, the genomic data of a specific sample may include identification information of the individual.

The first server 150 generates a diabetes risk for a sample based on the genomic data of that sample. The first server 150 may generate the diabetes risk by performing GWAS on the genomic data of the clinical sample. The first server 150 may provide the analysis results for the sample to a service user (A). For example, the first server 150 may transmit the analysis results to a personal terminal 50.

Meanwhile, the first server 150 may receive, from the reference database 120, reference SNP information for identifying SNPs associated with diabetes. According to another embodiment, the first server 150 may acquire, from the reference database 120, reference omics data including a plurality of reference SNP (Single Nucleotide Polymorphism) data and a plurality of miRNA data related to the development of diabetic nephropathy. The reference database 120 is built in advance and stores the reference SNP data and the plurality of miRNA data.

The first server 150 may select a plurality of target omics data from the genomic data of the sample using the reference SNP data or the reference omics data. According to another embodiment, the first server 150 may select a plurality of target SNP data from the genomic data of the sample using the reference SNP data. The first server 150 may determine the risk of developing diabetic nephropathy for the genomic data of the sample using the plurality of target omics data or the plurality of target SNP data.

According to one embodiment, the plurality of reference SNP data used by the first server 150 may include one or more SNP data from the SNP data group including rs12531478-A, rs17373728-C, rs5750250-G, rs11107616-C, rs136161-G, rs4879670-G, rs13259109-G, rs1298908-G, rs304029-G, rs9510795-A, rs10952362-C, rs4667466-T, rs10778560-C, rs7975752-G, rs731565-T, rs4849965-C, rs6910061-A, rs1424609-G, rs2596230-G, rs1677894-G, rs5750250-G, and rs136161-G.

The second server 110 may perform association analysis and an integrated meta-analysis of a specific Korean population by applying a Meta-GWAS technique, which analyzes the diversity of DNA base sequences across the genome to predict an individual's risk of or susceptibility to a specific disease, and may enable the first server 150 to generate a genetic risk prediction model using the summary statistics of the meta-analysis results.
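The disclosure does not state which meta-analysis method the second server uses. Purely as an assumption for illustration, the sketch below applies fixed-effect inverse-variance weighting, a common way to pool per-cohort GWAS effect estimates for a single SNP into summary statistics; the function name is hypothetical.

```python
import math


def fixed_effect_meta(betas, standard_errors):
    """Pool per-cohort effect sizes for one SNP by inverse-variance weighting.

    Each cohort is weighted by 1 / SE^2. Returns the pooled effect size
    and its standard error. This is an assumed method, not one named in
    the disclosure.
    """
    if len(betas) != len(standard_errors) or not betas:
        raise ValueError("need one standard error per effect size")
    weights = [1.0 / se ** 2 for se in standard_errors]
    weight_sum = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / weight_sum
    pooled_se = math.sqrt(1.0 / weight_sum)
    return pooled_beta, pooled_se
```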

For example, the second server 110 may be a computing device or server that generates genomic information, that is, genomic data for a sample. According to one embodiment, although not shown in FIG. 1, the second server 110 may store the generated genomic information in a separate database.

The operations of the system 10 for evaluating the risk of developing diabetic nephropathy, or of the first server 150 and the second server 110, described above may also be performed by the electronic device 1000. For example, the electronic device 1000 may acquire, from a reference database connected to the electronic device 1000, reference SNP data or reference omics data including the reference SNP data and a plurality of miRNA data. The electronic device 1000 may also acquire genomic data of a target user or a specific clinical sample from an external device connected to the electronic device 1000. The electronic device 1000 may identify, from the genomic data of the clinical sample, a plurality of target omics data identical to the acquired reference omics data. According to another embodiment, the electronic device 1000 may identify, from the genomic data of the clinical sample, a plurality of target SNP data identical to the acquired reference SNP data.
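A minimal sketch of the selection step described above follows, assuming the sample genotypes are available as a mapping from rsID to a two-letter genotype call and the reference list maps each rsID to a counted allele. Both data layouts, the function name, and the reading of the allele suffix in the reference SNP names as the allele to count are assumptions of this sketch.

```python
def select_target_snps(sample_genotypes, reference_snps):
    """Keep only the sample SNPs that also appear in the reference list.

    sample_genotypes: {rsid: genotype string such as "AG"}  (assumed layout)
    reference_snps:   {rsid: counted allele such as "A"}    (assumed layout)
    Returns {rsid: number of copies of the counted allele (0, 1, or 2)}.
    Reference SNPs missing from the sample are skipped.
    """
    targets = {}
    for rsid, allele in reference_snps.items():
        genotype = sample_genotypes.get(rsid)
        if genotype is None:
            continue
        targets[rsid] = genotype.count(allele)
    return targets


# Illustrative input only: the genotype calls are invented, and
# "rs0000001" is a made-up identifier that is not in the reference list.
reference = {"rs12531478": "A", "rs17373728": "C"}
sample = {"rs12531478": "AG", "rs17373728": "CC", "rs0000001": "TT"}
selected = select_target_snps(sample, reference)
```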

The electronic device 1000 may calculate a risk score using the genotype of each of the identified plurality of target omics data and a weight for each of the plurality of target omics data. According to one embodiment, the reference omics data that the electronic device 1000 acquires from the reference database may include a plurality of reference SNP data related to the development of diabetic nephropathy and a plurality of miRNA data related to the development of diabetic nephropathy. In addition, according to one embodiment, the plurality of reference SNP data includes SNP data related to the development of diabetic nephropathy, and may reflect an effect size corrected using the linkage disequilibrium of the related SNP data and a global scaling parameter (shrinkage factor).
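The disclosure does not give the formula for the corrected effect size. As a simplified sketch under that caveat, the risk score can be written as a weighted sum of allele counts in which each raw effect size is multiplied by a single global shrinkage factor; any linkage disequilibrium adjustment is assumed to have been applied to the effect sizes beforehand. The function and parameter names are hypothetical.

```python
def polygenic_risk_score(dosages, effect_sizes, shrinkage=1.0):
    """Weighted sum of allele counts over SNPs present in both inputs.

    dosages:      {rsid: allele count (0, 1, or 2)}
    effect_sizes: {rsid: effect size, assumed already adjusted for
                   linkage disequilibrium upstream}
    shrinkage:    global scaling factor applied to every effect size
                  (a simplification assumed for this sketch)
    """
    shared = dosages.keys() & effect_sizes.keys()
    return sum(shrinkage * effect_sizes[rsid] * dosages[rsid] for rsid in shared)
```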

FIG. 2 is a flowchart of a method by which an electronic device predicts the risk of developing diabetic nephropathy, according to an embodiment.

In S210, the electronic device 1000 may acquire genomic data of a clinical sample from an external device connected to the electronic device. For example, the electronic device 1000 may acquire the genomic data of the clinical sample from a server or a database that stores genomic data of clinical samples.

In S220, the electronic device 1000 may select, from the genomic data, a plurality of target omics data that match reference omics data, where the reference omics data includes a plurality of preset reference SNP (Single Nucleotide Polymorphism) data and a plurality of miRNA data related to the development of diabetic nephropathy.
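
The selection in S220 can be pictured as a simple membership test against the reference list. The following is a minimal sketch, assuming that both the sample data and the reference data are keyed by a marker identifier; the function name, the identifiers, and the dictionary layout are hypothetical and are not taken from the disclosure.

```python
# Minimal sketch of S220. All identifiers below are made up for illustration;
# the disclosure does not specify a data format.

def select_targets(sample_markers, reference_ids):
    """Keep only the sample entries whose identifier appears in the reference set."""
    reference = set(reference_ids)
    return {mid: value for mid, value in sample_markers.items() if mid in reference}


# Risk-allele counts (0, 1, or 2) observed in a clinical sample (hypothetical).
sample = {"rs0001": 2, "rs0002": 0, "rs0003": 1}
# Preset reference marker list (hypothetical).
reference_ids = ["rs0001", "rs0003", "rs9999"]

targets = select_targets(sample, reference_ids)
print(targets)
```

Markers that are in the reference list but absent from the sample (here `rs9999`) are simply not selected.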

In S230, the electronic device 1000 may calculate a risk score using the genotype of each of the plurality of target omics data and a weight for each of the plurality of target omics data. The specific method by which the electronic device 1000 calculates the risk score is described in detail below with reference to FIG. 3.
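
As an illustration of combining genotypes with weights, the sketch below uses the generic weighted-sum form of a polygenic score, score_i = sum over j of (w_j * g_ij). This generic form is an assumption made for illustration; it is not the Equation 1 of this disclosure, and the marker names and weights are invented.

```python
# Generic weighted-sum polygenic score (an illustrative assumption,
# not the disclosure's Equation 1).

def risk_score(genotypes, weights):
    """Sum of risk-allele count times weight over markers present in both maps."""
    return sum(genotypes[m] * weights[m] for m in genotypes if m in weights)


genotypes = {"rs0001": 2, "rs0003": 1}    # hypothetical risk-allele counts
weights = {"rs0001": 0.25, "rs0003": 0.5}  # hypothetical per-marker weights

score = risk_score(genotypes, weights)
print(score)
```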

In addition, although not shown in FIG. 2, in S210 the electronic device 1000 may further acquire, together with the genomic data of the clinical sample, user identification information of the user to whom the genomic data corresponds. Based on the user identification information, the electronic device 1000 may acquire personal characteristic information of the user from an external device connected to the electronic device. Based on the personal characteristic information, the electronic device 1000 may identify the user type of the user corresponding to the clinical sample. According to an embodiment, the user type may be information that reflects personal characteristics that can vary with the user's physical information or the presence or absence of a disease. The electronic device 1000 may use the identified personal characteristics to classify, and thereby subgroup, the genomic data of clinical samples that reflect those characteristics.

In addition, although not shown in FIG. 2, the electronic device 1000 may use at least one prediction model to calculate the risk score. To train the prediction model, the electronic device 1000 may acquire, from an external device, omics training data that includes at least one of SNP data or miRNA data related to the development of diabetic nephropathy. The electronic device 1000 may subgroup the acquired omics training data based on preset personal characteristics of users and may perform K-fold cross-validation on the subgrouped omics training data, thereby generating a plurality of prediction models, one for each personal characteristic.
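
The K-fold step can be sketched as an index split that would be applied to each subgroup's training data in turn. This is a minimal sketch under the assumption of contiguous, unshuffled folds; the function name is hypothetical, and the disclosure does not state how folds are formed.

```python
# Minimal K-fold index split (contiguous folds; an illustrative assumption).

def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of k folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        validation = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        yield train, validation
        start = stop


folds = list(k_fold_indices(10, 5))
for train, validation in folds:
    print(len(train), validation)
```

Each sample appears in exactly one validation fold, so every model in a subgroup is evaluated on data it was not fitted to.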

According to an embodiment, the electronic device 1000 may calculate the risk score using the plurality of prediction models generated as described above.

For example, the electronic device 1000 may subgroup the genomic data of the clinical sample acquired from the external device based on the personal characteristics of the user, and may select, from the subgrouped genomic data, a plurality of target omics data that match the preset reference omics data. The electronic device 1000 may input the target omics data selected from the subgrouped genomic data into the plurality of prediction models generated for the respective personal characteristics, and thereby obtain an output value from each of the prediction models.

The electronic device 1000 may determine different weights to apply to the output values of the plurality of prediction models according to the identified user type (for example, a type that represents different medical health characteristics depending on smoking status, presence or absence of disease, height, weight, and BMI range), and may calculate the risk score by applying the determined weights to the output value of each of the prediction models.
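
One way to read this step is as a lookup of a weight vector by user type followed by a weighted sum of the model outputs. The sketch below assumes exactly that; the user-type labels, the weight table, and the output values are all hypothetical.

```python
# Hypothetical weight table: one weight per subgroup model, for each user type.
TYPE_WEIGHTS = {
    "smoker_high_bmi": [0.5, 0.25, 0.25],
    "nonsmoker_normal_bmi": [0.25, 0.25, 0.5],
}


def combine_outputs(model_outputs, user_type):
    """Weighted sum of per-model outputs using the weights for this user type."""
    weights = TYPE_WEIGHTS[user_type]
    if len(weights) != len(model_outputs):
        raise ValueError("one weight per model output is required")
    return sum(w * y for w, y in zip(weights, model_outputs))


outputs = [0.8, 0.4, 0.2]  # made-up outputs of three subgroup models
score = combine_outputs(outputs, "smoker_high_bmi")
print(round(score, 3))
```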

In addition, although not shown in FIG. 2, beyond calculating a risk score for an individual, the electronic device 1000 may calculate a risk score for developing diabetic nephropathy for each user type to which the individual may belong and for each specific period. To this end, the electronic device 1000 according to an embodiment may input the target omics data selected from the subgrouped genomic data, together with the output value of each of the prediction models, into a pre-trained Cox proportional hazards survival analysis model, and may obtain from that model a risk score for developing diabetic nephropathy for the clinical sample by user type and by predetermined period. By scoring the risk of developing diabetic nephropathy for a specific clinical sample by the individual's user type and by specific period, the electronic device 1000 according to the present disclosure can effectively provide the user with risk information on developing diabetic nephropathy.
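
To show how a Cox proportional hazards model can yield a risk per period, the sketch below uses the standard relation S(t | x) = exp(-H0(t) * exp(b · x)), where H0 is the cumulative baseline hazard. The coefficients, covariates, and baseline values are invented for illustration and are not from the disclosure.

```python
import math

# Standard Cox relation: onset probability by time t is
# 1 - exp(-H0(t) * exp(linear predictor)). All numbers are made up.

def cox_risk_by_period(covariates, coefficients, baseline_cum_hazard):
    """Return {period: probability of onset by that period}."""
    linear_predictor = sum(b * x for b, x in zip(coefficients, covariates))
    hazard_ratio = math.exp(linear_predictor)
    return {
        period: 1.0 - math.exp(-h0 * hazard_ratio)
        for period, h0 in baseline_cum_hazard.items()
    }


baseline = {"5y": 0.02, "10y": 0.05}  # hypothetical cumulative baseline hazards
# Covariates could be, for example, a prediction-model output and a subgroup flag.
risk = cox_risk_by_period([1.2, 1.0], [0.4, 0.3], baseline)
print(risk)
```

Because the baseline cumulative hazard is non-decreasing in time, the resulting onset probability is non-decreasing across periods.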

FIG. 3 is a diagram illustrating in detail a method by which an electronic device calculates a risk score, according to an embodiment.

In S310, the electronic device 1000 extracts, from a GWAS (Genome-Wide Association Study) database that is connected to the electronic device and stores genome-wide association analysis data, effect sizes derived from case-control association tests on SNP data. It also extracts, from a TWAS (Transcriptome-Wide Association Study) database that stores transcriptome-wide association analysis data, effect sizes derived from colocalization association tests, which assess whether gene expression results are influenced by the same SNP.

In S320, for the m SNP data associated with the risk of developing diabetic nephropathy, the electronic device 1000 may determine a main effect weighted by the effect size of each risk allele and an interaction effect weighted by the effect size of the co-regulated miRNA data.
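
The disclosure does not give the functional form of these two effects. As a rough sketch only, the code below assumes that the main effect of a SNP is its risk-allele count times its GWAS effect size, and that the interaction effect is the same count times the effect size of a miRNA paired with that SNP. The pairing table and every number are hypothetical.

```python
# Illustrative assumption: main effect = dosage * SNP effect size,
# interaction effect = dosage * effect size of the paired miRNA.

def main_and_interaction(dosages, snp_effects, mirna_effects, snp_to_mirna):
    """Return (main effects, interaction effects), each keyed by SNP identifier."""
    main = {snp: snp_effects[snp] * d for snp, d in dosages.items()}
    interaction = {
        snp: mirna_effects[snp_to_mirna[snp]] * d
        for snp, d in dosages.items()
        if snp in snp_to_mirna
    }
    return main, interaction


dosages = {"rs0001": 2, "rs0003": 1}            # hypothetical risk-allele counts
snp_effects = {"rs0001": 0.25, "rs0003": 0.5}   # hypothetical GWAS effect sizes
mirna_effects = {"mir_a": 0.5}                  # hypothetical TWAS effect size
snp_to_mirna = {"rs0001": "mir_a"}              # hypothetical co-regulation pairing

main, interaction = main_and_interaction(dosages, snp_effects, mirna_effects, snp_to_mirna)
print(main, interaction)
```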

In S330, the electronic device 1000 may input the determined interaction effect into a variational autoencoder that uses L_1 regularization, and may generate a nonlinear low-dimensional embedding by reconstructing, with a decoder, values randomly sampled from the distribution output by the variational autoencoder. According to an embodiment, the variational autoencoder and the decoder used by the electronic device 1000 may be a neural-network-based network model that generates nonlinear low-dimensional embeddings.
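
For orientation, the standard variational autoencoder formulation can be written as below: the latent value z is sampled with the reparameterization trick, and the training loss is the negative evidence lower bound plus an L_1 penalty. The placement of the penalty on the encoder parameters \phi and the coefficient \lambda are assumptions for illustration; the disclosure does not state where the L_1 term is applied.

```latex
\begin{aligned}
z &= \mu_\phi(x) + \sigma_\phi(x) \odot \varepsilon,
  \qquad \varepsilon \sim \mathcal{N}(0, I) \\
\mathcal{L}(\theta, \phi; x) &=
  -\,\mathbb{E}_{q_\phi(z \mid x)}\!\left[\log p_\theta(x \mid z)\right]
  + \mathrm{KL}\!\left(q_\phi(z \mid x) \,\middle\|\, p(z)\right)
  + \lambda \lVert \phi \rVert_1
\end{aligned}
```

Here x stands for the interaction effects supplied as input, q_\phi is the encoder, and p_\theta is the decoder.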

In S340, the electronic device 1000 may input the low-dimensional embedding into a preset prediction model and may calculate, as the risk score, a PRS (Polygenic Risk Score) defined as the expected value of the multi-conditional posterior probability distribution of the estimated β_j.

The process shown in FIG. 3, in which the electronic device 1000 calculates the risk score based on the genotype and weight of each of the plurality of target omics data, or of each of the plurality of target SNPs within the target omics data, may be performed based on Equation 1 below.

In Equation 1, PRS_i denotes the genetic risk score of individual i; i is an identification number that distinguishes an individual's genomic data; j is an identification number that distinguishes the target omics data or the SNP data of the target omics data; and G_ij represents, within the prior probability distribution framework for the effect size, the effect size for allele type SNP_j. The remaining terms denote the following: the probability distribution of the effect size defined over a nonlinear low-dimensional space; the expected value of the multi-conditional posterior probability distribution that incorporates the prior of the effect size and epidemiological information; the prior estimator of the effect size; D, the epidemiological information; N, a Gaussian distribution; a global scaling parameter (shrinkage parameter) shared across the genetic data, which indicates and controls the sparsity level of the model; and an L_1-regularized shrinkage estimation parameter for the target omics data or the SNP data of the target omics data.

More specifically, the modeling framework according to the present disclosure is designed to introduce, through the variable G_ij described above, a prior distribution on the effect size of the SNP data. The local shrinkage parameter for allele type SNP_j may be used to control the sparsity level of the model by adaptively retaining large signals while imposing strong shrinkage on noise that is relatively close to zero.
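
The behavior described here (large signals retained, near-zero noise shrunk) can be illustrated with a simple normal-normal model. The sketch assumes a prior effect ~ N(0, (global_scale * local_scale)^2) and an estimate with standard error `se`, which gives the posterior mean shown below. This prior form is an assumption chosen for illustration; the disclosure does not spell out its prior.

```python
# Normal-normal shrinkage (illustrative assumption, not the disclosure's prior):
# posterior mean = beta_hat * prior_var / (prior_var + se^2).

def shrink(beta_hat, se, global_scale, local_scale):
    """Posterior mean of an effect under a zero-mean normal prior."""
    prior_var = (global_scale * local_scale) ** 2
    keep_fraction = prior_var / (prior_var + se ** 2)
    return keep_fraction * beta_hat


# A large local scale keeps a strong signal almost unchanged.
strong = shrink(beta_hat=0.5, se=0.1, global_scale=1.0, local_scale=10.0)
# A small local scale pulls a near-zero estimate further toward zero.
noise = shrink(beta_hat=0.01, se=0.1, global_scale=1.0, local_scale=0.01)
print(strong, noise)
```

The global scale moves all markers together, while the per-marker local scale decides which individual effects escape shrinkage.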

FIG. 4 is a diagram illustrating a process in which an electronic device generates a prediction model based on SNP data and miRNA data acquired from GWAS and TWAS databases.

According to an embodiment, the electronic device 1000 may acquire reference data 424 or various omics data from an external device connected to the electronic device. For example, the electronic device 1000 may acquire, from an external device (for example, a server) connected to the electronic device, a plurality of reference SNP data stored in a GWAS database, and may set the acquired SNP data as the discovery set 410. According to an embodiment, the discovery set 410 acquired by the electronic device 1000 may include SNP data and some miRNA data for 356,000 people from a UK Biobank cohort study. According to an embodiment, the discovery set 410 may include SNP data that the cohort study results associate with the risk of developing diabetic nephropathy.

According to an embodiment, the electronic device 1000 may acquire miRNA data from an external device connected to the electronic device. For example, the electronic device 1000 may acquire, from an external device connected to the electronic device, a plurality of miRNA data stored in a TWAS database, and may set the acquired miRNA data as the training set 420. According to an embodiment, the training set 420 acquired by the electronic device 1000 may include a plurality of miRNA data obtained from databases that store various miRNA data (for example, GTEx EUR, MESA EUR, and MESA AA+ Hispanics). In addition, according to an embodiment, the training set 420 may include a plurality of miRNA data that the cohort study results associate with the development of diabetic nephropathy.

The electronic device 1000 may set a portion of the SNP data and miRNA data related to the development of diabetic nephropathy, acquired from the external device, as a training data set, and may set the remainder as a test data set (for example, a validation set). According to an embodiment, the ratio of the training data set to the test data set used by the electronic device 1000 may be 7:3, but is not limited thereto, and may be reset based on the validation results of the prediction model finally generated by the electronic device.
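
A 7:3 split of this kind can be sketched as a seeded shuffle followed by a cut. The function name, the fixed seed, and the use of a random shuffle are assumptions for illustration; the disclosure only states the ratio.

```python
import random

# Seeded shuffle-and-cut split (the shuffle and the seed are assumptions).

def split_train_test(records, train_fraction=0.7, seed=0):
    """Return (training records, test records) in the requested proportion."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


train, test = split_train_test(range(10))
print(len(train), len(test))
```

Changing `train_fraction` corresponds to resetting the ratio after the validation results are reviewed.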

From the omics data acquired from the external device, which includes SNP data and miRNA data, the electronic device 1000 may determine a target data set 430 that contains the target omics data identified as matching the preset reference omics data. According to an embodiment, the target omics data may include target SNP data and target miRNA data.

The electronic device 1000 may set a portion of the SNP data acquired from the external device as a training data set and may generate prediction models 462 and 464 based on the training data sets. For example, in S440, the electronic device 1000 may generate, based on a training data set that includes SNP data, a prediction model 462 that outputs a PRS score for the risk of developing diabetic nephropathy. In S450, the electronic device 1000 may set a portion of the miRNA data acquired from the external device as a training data set and may generate, based on that training data set, a prediction model 464 that outputs a PRS score for the risk of developing diabetic nephropathy.

Although FIG. 4 shows the electronic device 1000 generating the prediction model 462 and the prediction model 464 separately, based respectively on a training data set that includes SNP data related to the development of diabetic nephropathy and a training data set that includes miRNA data related to the development of diabetic nephropathy, according to another embodiment the electronic device 1000 may of course generate a single prediction model that outputs a PRS score based on omics data that includes both the SNP data and the miRNA data.

In addition, according to an embodiment, the electronic device 1000 may validate the prediction model 462 on a test data set (for example, a validation data set) prepared from a portion of the SNP data, and may validate the prediction model 464 on a test data set (for example, a validation data set) prepared from a portion of the miRNA data. Based on the two validation results, the electronic device 1000 may compute a weighted sum of the output values of the prediction model 462 and the prediction model 464, applying a higher weight to the output of the model that shows higher performance and a relatively lower weight to the output of the model that shows lower performance, and may identify the PRS score resulting from the weighted sum as the risk score.
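
One simple way to realize such a weighting is to make each model's weight proportional to its validation metric. The sketch below assumes this proportional rule and uses invented metric values; the disclosure does not state which metric or which weighting rule is used.

```python
# Performance-proportional weighting (an illustrative assumption).

def performance_weighted_score(outputs, validation_scores):
    """Weighted sum of model outputs with weights proportional to validation scores."""
    total = sum(validation_scores)
    if total <= 0:
        raise ValueError("validation scores must sum to a positive value")
    return sum(y * s / total for y, s in zip(outputs, validation_scores))


# Made-up outputs of model 462 (SNP-based) and model 464 (miRNA-based),
# with made-up validation scores for each.
combined = performance_weighted_score([0.6, 0.2], [0.75, 0.25])
print(round(combined, 3))
```

With these numbers the combined score lies closer to the output of the better-performing model.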

FIG. 5 is a diagram illustrating a process in which an electronic device generates prediction models based on genomic data subgrouped according to personal characteristics, and a process of using a Cox proportional hazards survival analysis model that produces the probability of disease onset by period based on the results of the prediction models.

The electronic device 1000 according to the present disclosure may subgroup omics data by the user characteristics of the individuals from whom clinical samples are obtained. According to an embodiment, the user characteristic information may include the user's height, weight, BMI, presence or absence of disease according to urine test results, sex, smoking status, and presence or absence of other diseases. The user characteristic information may serve as the criterion for determining the user type. The electronic device 1000 may subgroup the genomic data by the user characteristic information of the individuals from whom clinical samples are obtained, and may input the subgrouped genomic data into the prediction model generated for each subgroup, thereby obtaining a PRS score from each of the prediction models trained per user characteristic.
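
Subgrouping by user characteristics can be sketched as deriving a type label from a profile and bucketing samples by that label. The two characteristics used here, the BMI cut-off of 25, and the label format are assumptions for illustration only.

```python
from collections import defaultdict

# Hypothetical user-type rule based on smoking status and a BMI band.

def user_type(profile):
    """Derive a user-type label from a profile dictionary."""
    bmi = profile["weight_kg"] / (profile["height_m"] ** 2)
    bmi_band = "bmi_high" if bmi >= 25.0 else "bmi_normal"  # assumed cut-off
    smoking = "smoker" if profile["smoker"] else "nonsmoker"
    return f"{smoking}_{bmi_band}"


def subgroup(samples):
    """Group sample identifiers by the user type of their owner."""
    groups = defaultdict(list)
    for sample in samples:
        groups[user_type(sample["profile"])].append(sample["sample_id"])
    return dict(groups)


samples = [
    {"sample_id": "s1", "profile": {"weight_kg": 80, "height_m": 1.70, "smoker": True}},
    {"sample_id": "s2", "profile": {"weight_kg": 60, "height_m": 1.70, "smoker": False}},
    {"sample_id": "s3", "profile": {"weight_kg": 90, "height_m": 1.75, "smoker": True}},
]
groups = subgroup(samples)
print(groups)
```

Each resulting group would then be routed to the prediction model trained for that subgroup.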

More specifically, the electronic device 1000 may acquire the corresponding user identification information together with the genomic data of the clinical sample. Based on the user identification information, the electronic device 1000 may identify the personal characteristic information of the user corresponding to the clinical sample (for example, smoking status, sex, height, weight, and BMI) as recorded in that individual's medical data. According to an embodiment, the electronic device 1000 may also distinguish the user type according to the user characteristic information, based on the user identification information.

Once the user characteristic information is identified, the electronic device 1000 may subgroup the genomic data of the clinical sample according to the user characteristic information. According to another embodiment, once the user characteristic information is identified, the electronic device 1000 may classify (for example, subgroup) by user characteristic the plurality of target omics data identified from the genomic data of the clinical sample.

The electronic device 1000 may input the subgrouped genomic data (for example, the subgrouped target omics data) into the prediction model generated for each subgroup, aggregate the PRS scores obtained from each of the prediction models trained per user characteristic, and accurately predict the user's risk of developing diabetic nephropathy based on the aggregated result.

Based on the user type determined from the genomic data of the currently acquired clinical sample and the corresponding user identification information, the electronic device 1000 may apply different weights to the result values of the prediction models trained specifically for various personal characteristics, and may finally determine the user's risk of developing diabetic nephropathy by computing a weighted sum of the results of the prediction models according to those weights. That is, by using together the prediction models trained specifically for various personal characteristics of users, the electronic device 1000 according to the present disclosure can diagnose the risk of developing diabetic nephropathy more accurately.

The following describes how the electronic device 1000 trains or generates each prediction model in order to produce the prediction models specialized per personal characteristic described above. For example, the electronic device 1000 may acquire, from an external device, omics data (for example, omics training data) that includes SNP data and miRNA data related to the risk of developing diabetic nephropathy. In S510, the electronic device 1000 may subgroup the acquired omics data based on the personal characteristics of users.

In S520, the electronic device 1000 may generate a plurality of prediction models by performing a meta-analysis on the acquired omics data. According to an embodiment, the electronic device 1000 may generate the plurality of prediction models by performing K-fold cross-validation on the acquired omics data. For example, in S522, the electronic device 1000 may set K = 1 and, based on the omics data for the first user characteristic item among the subgrouped omics data, generate a first prediction model that predicts the risk of developing diabetic nephropathy.

In the same way, in S524, the electronic device 1000 may generate a second prediction model that predicts the risk of developing diabetic nephropathy based on the omics data for the second user characteristic item among the subgrouped omics data. By repeating this procedure, the electronic device 1000 may generate a plurality of prediction models, each trained specifically for a personal characteristic of users, that output a risk score for developing diabetic nephropathy. The electronic device 1000 determines the risk of developing diabetic nephropathy for an individual to be diagnosed by inputting into the generated prediction models the target omics data taken from the genomic data of that individual's clinical sample.

Even when the electronic device 1000 calculates the risk of developing diabetic nephropathy by reflecting the user's personal characteristics according to the process described above, the resulting risk score only conveys the probability of developing diabetic nephropathy given those characteristics; it has the limitation that it cannot indicate the risk of developing diabetic nephropathy for a specific period or point in time.

Therefore, the electronic device 1000 according to the present disclosure may determine the risk of developing diabetic nephropathy by user type 532 according to the user's personal characteristics and by specific period 534, by inputting the subgrouped genomic data (for example, the subgrouped omics data) and the result values of the prediction models trained specifically per personal characteristic into a Cox proportional hazards survival analysis model.

For example, the electronic device 1000 according to the present disclosure may subgroup, by personal characteristic of users, the omics data acquired from an external device that includes SNP data and miRNA data related to the risk of developing diabetic nephropathy, and may thereby generate a training data set and a validation data set for generating the prediction model of each subgroup. Based on the training and validation data sets of each subgroup and on the result values of the prediction models trained specifically per personal characteristic, the electronic device 1000 may train a Cox proportional hazards survival analysis model that, when subgrouped genomic data is input, outputs the risk of developing diabetic nephropathy by user type 532 and by specific period 534.

When genomic data of a clinical sample is acquired, the electronic device 1000 may subgroup that genomic data, input the subgrouped genomic data into the prediction models trained specifically per personal characteristic, and input the output values of the trained prediction models together with the subgrouped genomic data into the trained Cox proportional hazards survival analysis model, thereby obtaining a risk score for developing diabetic nephropathy by user type 532 and by specific period 534. Accordingly, the electronic device 1000 according to the present disclosure can not only identify, for the genomic data of the target clinical sample, the user type 532 according to the user's personal characteristics, but can also effectively diagnose the risk of developing diabetic nephropathy by user type and by specific period.

FIG. 6 is a diagram showing the result of an electronic device selecting reference omics data that includes reference SNP data.

Drawing 620 of FIG. 6 shows the reference data that the electronic device 1000 selected in relation to the development of diabetic nephropathy. For example, the electronic device 1000 may acquire SNP data through a GWAS database and miRNA data through a TWAS database. The electronic device 1000 may generate, in various ways, a model that calculates a risk score for SNP data, for miRNA data, or for omics data that includes both SNP data and miRNA data.

According to one embodiment, the accuracy of the genetic risk score is strongly affected by the size of the sample population used to estimate the SNP effects. To use GWAS results, the electronic device may obtain summary statistics from a public database and check whether they include the following required information: chromosome, position, allele information (effect allele, other allele), effect (beta), standard error, sample size, and p-value.
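The check for required fields described above can be sketched as follows. The column names in `REQUIRED_FIELDS` are assumed spellings chosen for illustration; real summary-statistics files name these columns in various ways, and the disclosure does not fix a file format.

```python
# Assumed column names for the required information listed in the text.
REQUIRED_FIELDS = (
    "chromosome",
    "position",
    "effect_allele",
    "other_allele",
    "beta",
    "standard_error",
    "sample_size",
    "p_value",
)


def missing_fields(header):
    """Return the required fields that are absent from a file header."""
    present = {name.strip().lower() for name in header}
    return [field for field in REQUIRED_FIELDS if field not in present]
```

A summary-statistics file would be accepted for further processing only when `missing_fields` returns an empty list.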

According to one embodiment, when no GWAS results from an independent study population are available, or when their applicability to the Korean population is limited given the specific characteristics of the study population, the electronic device may perform a 10-fold LOGO (Leave One Group Out) meta-analysis. By dividing the data into several groups and analyzing them, the electronic device can obtain the effect of using GWAS results from a virtual independent study population.
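A minimal sketch of a leave-one-group-out meta-analysis for a single SNP is given below. It assumes that each group already has its own effect estimate and standard error, and it pools the remaining groups with fixed-effect inverse-variance weighting; that pooling rule is an assumption for illustration, since the disclosure does not state which meta-analysis model is used. The function names are hypothetical.

```python
def inverse_variance_meta(estimates):
    """Pool (beta, standard_error) pairs with inverse-variance weights."""
    weights = [1.0 / (se * se) for _, se in estimates]
    total = sum(weights)
    beta = sum(w * b for w, (b, _) in zip(weights, estimates)) / total
    return beta, (1.0 / total) ** 0.5


def leave_one_group_out(group_estimates):
    """For each group, pool the estimates of all the other groups.

    group_estimates maps a group label to a (beta, standard_error) pair.
    The pooled estimate for a held-out group does not use that group's
    data, so it can play the role of an independent GWAS result for it.
    """
    pooled = {}
    for held_out in group_estimates:
        rest = [est for g, est in group_estimates.items() if g != held_out]
        pooled[held_out] = inverse_variance_meta(rest)
    return pooled
```

For the 10-fold case described in the text, `group_estimates` would hold ten entries, and each held-out group would be scored with weights pooled from the other nine.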

Hereinafter, a specific marker selection process for building a genetic risk model using SNPs selected by the electronic device, based on a researcher's input to the electronic device, will be described.

The GWAS Catalog (https://www.ebi.ac.uk/gwas/) was used as the public database, and SNPs discovered in papers whose study population comprised at least 5,000 people were first selected as source SNP information. Alternatively, a different cut-off value for the study population size may be used for selecting source SNP information. Using a computer device, the SNPs in the initially selected source SNP information are filtered on the basis of the effect size corrected using the linkage disequilibrium (LD) value and a shrinkage factor (e.g., a parameter). The filtering may be performed according to Equation 1 above.
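The first selection step, which keeps only SNPs reported by sufficiently large studies, can be sketched as follows. The record layout (dictionaries with a `study_size` key) and the function name are assumptions for illustration; the subsequent LD and shrinkage-based filtering according to Equation 1 is not reproduced here.

```python
def select_source_snps(records, min_study_size=5000):
    """Keep SNP records whose originating study meets the size cut-off.

    records: iterable of dicts with at least a "study_size" entry
        (hypothetical layout). The default cut-off of 5,000 follows the
        text; another value may be passed instead.
    """
    return [r for r in records if r["study_size"] >= min_study_size]
```

Passing a different `min_study_size` corresponds to the alternative cut-off value mentioned in the text.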

The items shown in illustration 620 of FIG. 6 are as follows. The SNP (single nucleotide polymorphism) item (602) denotes a genetic variant that differs by a single base (A, T, G, or C) in the DNA sequence. The Risk Allele item (603) denotes the DNA sequence of the allele that increases the risk of disease in a given population, and the Non-risk Allele item (604) denotes the DNA sequence of the allele that does not increase that risk. The Effect size item (605) indicates the contribution of the SNP to the genetic variation of the trait. The P-value item (606) indicates the statistical significance of the association test when the major allele or the minor allele is the risk allele. The Major Allele item (607) denotes the DNA sequence of the most common allele in a given population, and the Minor Allele item (608) denotes the DNA sequence of the second most common allele in that population. The Minor Allele Frequency item (609) denotes the frequency at which the second most common allele occurs in a given population. The Mapped Gene item (610) may denote the gene identified at the relative position on the chromosome.
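To illustrate how the Major Allele, Minor Allele, and Minor Allele Frequency items relate to one another, the following sketch derives all three from a list of diploid genotypes at one site. The genotype encoding as two-letter strings and the function name are assumptions for illustration.

```python
from collections import Counter


def major_minor_alleles(genotypes):
    """Return (major allele, minor allele, minor allele frequency).

    genotypes: list of two-letter strings such as "AG", one per individual
        (hypothetical encoding). Ties are broken alphabetically.
    """
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    if len(ranked) < 2:
        raise ValueError("site is monomorphic in this sample")
    total = sum(counts.values())
    (major, _), (minor, minor_count) = ranked[0], ranked[1]
    return major, minor, minor_count / total
```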

FIG. 7 is a diagram showing an example of ROC curves used to validate the diabetic nephropathy risk analysis model used by the electronic device.

As shown in FIG. 7, the electronic device 1000 may use an ROC curve to validate the performance of the prediction model it uses to analyze the risk of developing diabetic nephropathy. According to one embodiment, in the performance validation, the Bayesian neural network model used by the electronic device 1000 achieved an AUC of 0.797 with respect to sensitivity and specificity, while the regression model used by the electronic device 1000 achieved an AUC of 0.656.
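The AUC used in this kind of validation can be computed directly from case and control labels and predicted scores. The sketch below uses the rank-based definition (the probability that a randomly chosen case scores higher than a randomly chosen control, with ties counted as one half); the function name and the example inputs are hypothetical and are not the data behind the reported values.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve from binary labels and predicted scores.

    labels: 1 for a case, 0 for a control.
    scores: predicted risk scores, aligned with labels.
    """
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("both cases and controls are required")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation, which gives context for comparing values such as 0.797 and 0.656.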

The electronic device 1000 according to the present disclosure may validate the performance of the model used to predict the risk of developing diabetic nephropathy and, based on the validation result, determine whether to use the data output from the model. In addition, according to one embodiment, the electronic device 1000 may retrain the diabetic nephropathy risk analysis model it has learned by modifying and updating the model based on the validation result.

FIG. 8 is a diagram illustrating the structure of an electronic device for analyzing the risk of diabetic nephropathy, and of a system including the same, according to one embodiment.

According to one embodiment, the system 10 for evaluating the risk of developing diabetic nephropathy may include a control device 200, input devices 210 and 230, a memory 220, a computing device 240, and a network interface 250. However, the system is not limited to this example and may include more components than those shown in FIG. 8, or fewer. In addition, according to one embodiment, the configuration of the system 10 shown in FIG. 8 may correspond to the configuration of the electronic device 1000 according to the present disclosure.

According to one embodiment, in addition to the configuration shown in FIG. 8, the system 10 for evaluating the risk of developing diabetic nephropathy and the electronic device 1000 may further include at least one processor that executes one or more instructions stored in the memory 220 and used to analyze the risk of diabetic nephropathy. According to one embodiment, the processor may be included in the control device 200 shown in FIG. 8.

The input devices 210 and 230 may receive genomic data of a clinical sample. According to one embodiment, the input devices 210 and 230 may correspond to a user interface or an input interface. The system 10 for evaluating the risk of developing diabetic nephropathy according to one embodiment acquires genomic data of a specific clinical sample using the input devices 210 and 230.

More specifically, the input devices 210 and 230 may refer to means by which a user inputs a sequence for controlling the system or the electronic device. For example, the user input interface (not shown) may include a key pad, a dome switch, a touch pad (contact capacitive type, pressure resistive film type, infrared sensing type, surface ultrasonic conduction type, integral tension measurement type, piezo effect type, etc.), a jog wheel, a jog switch, and the like, but is not limited thereto.

In addition, according to one embodiment, the user input interface (not shown) may receive a user's input sequence for a screen output on a display by the electronic device or the system. The user input interface (not shown) may also receive a touch input from a user touching the display or a key input through a graphical user interface on the display.

The memory 220 may store one or more instructions. For example, the one or more instructions stored in the memory 220 may be executed by the control device 200 to perform the method of predicting the risk of developing diabetic nephropathy. In addition to programs for the processing and control of the processor, the memory 220 may store data input to the system or the electronic device or output from the electronic device.

In addition, according to one embodiment, the memory 220 may store reference omics data, reference SNP (Single Nucleotide Polymorphism) data, genomic data of a clinical sample, a plurality of target omics data, or a plurality of target SNP data. The memory 220 may also store information about an artificial intelligence model used by the electronic device or the system. According to one embodiment, the memory 220 may further store training data used by the electronic device to train the artificial intelligence model, and may further store parameter information for the artificial intelligence model.

For example, the memory 220 may store not only the trained neural network but also, when models based on an already generated neural network are modified, information about the layers of the modified models and the weights between those layers.

The memory 220 may include at least one type of storage medium among a flash memory type, a hard disk type, a multimedia card micro type, a card type memory (for example, SD or XD memory), RAM (Random Access Memory), SRAM (Static Random Access Memory), ROM (Read-Only Memory), EEPROM (Electrically Erasable Programmable Read-Only Memory), PROM (Programmable Read-Only Memory), magnetic memory, a magnetic disk, and an optical disk.

The computing device 240 may select a plurality of target omics data or a plurality of target SNP data from the genomic data of the clinical sample on the basis of the reference SNP data or the reference omics data and, by executing one or more instructions stored in the memory, calculate a diabetic nephropathy risk score for the clinical sample using the genotype of each of the target omics data or target SNP data and the effect size for each. The computing device 240 may also perform a linked analysis with covariate data such as the patient's epidemiological information or other diagnostic information.
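In its simplest form, combining genotypes with per-variant effect sizes amounts to a weighted sum over the selected target variants. The sketch below shows that basic form only; it does not reproduce the corrected effect sizes of Equation 1, and the variant identifiers, dosage encoding (0, 1, or 2 copies of the effect allele), and weights are hypothetical.

```python
def risk_score(genotypes, effect_sizes):
    """Weighted sum of effect-allele dosages over the reference variants.

    genotypes: maps a variant ID to the sample's effect-allele dosage.
    effect_sizes: maps a reference variant ID to its weight.
    Variants in the sample that are not in the reference set are ignored,
    which mirrors selecting target data by matching against the reference.
    """
    return sum(
        effect_sizes[variant] * dosage
        for variant, dosage in genotypes.items()
        if variant in effect_sizes
    )
```

Covariates such as epidemiological information could then be combined with this score in a downstream model, as described in the text.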

According to one embodiment, the memory 220 may further store information about the artificial intelligence model, machine learning model, neural network model, and predictive model used by the electronic device 1000. According to one embodiment, the artificial intelligence model used by the electronic device or the system may be a model that can be trained by an artificial intelligence learning algorithm. According to one embodiment, the artificial intelligence model may include a neural network model. For example, the neural network model may be an artificial neural network, that is, a computing system inspired by biological neural networks. Unlike classical algorithms that perform tasks according to predefined conditions, an artificial neural network model can learn to perform a task by considering a large number of samples.

According to one embodiment, an artificial neural network model may have a structure in which artificial neurons are connected, and the connections between neurons may be referred to as synapses. A neuron may process a received signal and transmit the processed signal to other neurons through synapses. The output of a neuron may be referred to as an activation. Neurons and/or synapses may have variable weights, and the influence of the signal processed by a neuron may increase or decrease according to the weight.

For example, a neural network model may include a plurality of blocks defined by layers and by weights representing the connection strengths between the layers. More specifically, each of the plurality of neural network layers in the neural network model has a plurality of weights and performs the neural network operation through an operation between the result of the previous layer and the plurality of weights. The weights of the neural network layers may be optimized by training the artificial neural network.
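The operation between the result of the previous layer and a layer's weights can be illustrated with a single fully connected layer. The sketch below assumes a weighted sum plus bias followed by a ReLU activation; the choice of activation and the function name are assumptions, since the disclosure does not specify the layer type.

```python
def dense_layer(inputs, weights, biases):
    """Forward pass of one fully connected layer with ReLU activation.

    inputs: outputs of the previous layer.
    weights: one row of weights per neuron in this layer.
    biases: one bias per neuron in this layer.
    """
    outputs = []
    for row, bias in zip(weights, biases):
        # Weighted sum of the previous layer's outputs for this neuron.
        z = sum(w * x for w, x in zip(row, inputs)) + bias
        outputs.append(max(0.0, z))  # ReLU: negative sums become 0.
    return outputs
```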

For example, during training, the plurality of weights may be modified and updated so that the loss value or cost value obtained from the artificial intelligence model (e.g., a neural network model) is reduced or minimized. The artificial intelligence model used by the electronic device according to the present disclosure may include a deep neural network (DNN), examples of which include a CNN (Convolutional Neural Network), DNN (Deep Neural Network), RNN (Recurrent Neural Network), RBM (Restricted Boltzmann Machine), DBN (Deep Belief Network), BRDNN (Bidirectional Recurrent Deep Neural Network), LSTM (Long Short-Term Memory) model, and Deep Q-Networks, but the model is not limited to these examples.
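Updating weights so that the loss decreases can be illustrated with one step of gradient descent on a single-weight linear model under mean squared error. This is a deliberately small sketch; the loss function, learning rate, and function names are assumptions and do not describe the training procedure of the disclosed models.

```python
def squared_loss(weight, xs, ys):
    """Mean squared error of the model y = weight * x."""
    return sum((weight * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def gradient_step(weight, xs, ys, learning_rate=0.1):
    """Move the weight one step against the gradient of the loss."""
    grad = sum(2 * (weight * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    return weight - learning_rate * grad
```

Repeating `gradient_step` moves the weight toward the value that minimizes `squared_loss`, provided the learning rate is small enough for the data at hand.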

According to one embodiment, the electronic device according to the present disclosure may be a smartphone, PC, mobile phone, laptop, media player, server, or other mobile or non-mobile computing device that is equipped with an AI program and can analyze medical data, but is not limited thereto.

In addition, according to one embodiment, the system or electronic device according to the present disclosure may be communicatively connected to an external device such as a server. For example, the server may include any other computing device that is connected to the electronic device through a network and can thereby transmit data to and receive data from the electronic device.

According to one embodiment, the control device 200 may include a memory that stores one or more instructions and at least one processor that executes the one or more instructions. The control device 200 may perform at least part of the method of predicting the risk of developing diabetic nephropathy in cooperation with the computing device 240.

For example, the processor (not shown) of the control device 200 typically controls the overall operation of the electronic device or the system. For example, by executing programs stored in the memory 220, the processor (not shown) may control the user input interface (not shown), the network interface, the input devices, the computing device, and the like.

According to one embodiment, the processor (not shown) may acquire genomic data of a clinical sample from an external device connected to the electronic device; select a plurality of target omics data or a plurality of target SNP data from the genomic data on the basis of a plurality of preset reference SNP data, or of a plurality of reference omics data including the plurality of reference SNP data and a plurality of miRNA data related to the onset of diabetic nephropathy; and calculate a risk score using the genotype of each of the selected target omics data or target SNP data and the weight for each.

According to one embodiment, the network interface 250 may include one or more components that allow the electronic device or the system to communicate with other devices (not shown) and with a server. According to one embodiment, the server with which the electronic device 1000 or the system 10 communicates may include at least one of, or a combination of, a Local Area Network (LAN), a Wide Area Network (WAN), a Value Added Network (VAN), and a mobile radio communication network.

The other device (not shown) may be a computing device such as the electronic device 1000 or a sensing device, but is not limited thereto. According to one embodiment, the network interface 250 may include a short-range communication unit (not shown) or a long-range communication unit (not shown).

The short-range wireless communication unit (not shown) may include a Bluetooth communication unit, a BLE (Bluetooth Low Energy) communication unit, a Near Field Communication (NFC) unit, a WLAN (Wi-Fi) communication unit, a Zigbee communication unit, an IrDA (Infrared Data Association) communication unit, a WFD (Wi-Fi Direct) communication unit, a UWB (ultra wideband) communication unit, an Ant+ communication unit, and the like, but is not limited thereto.

The long-range communication unit may include a mobile communication unit or a broadcast receiving unit. For example, the mobile communication unit transmits wireless signals to and receives wireless signals from at least one of a base station, an external terminal, and a server over a mobile communication network. Here, the wireless signals may include voice signals, video call signals, or various forms of data associated with sending and receiving text or multimedia messages. The broadcast receiving unit receives broadcast signals and/or broadcast-related information from outside through a broadcast channel. Broadcast channels may include satellite channels and terrestrial channels. Depending on the implementation, the electronic device may not include the broadcast receiving unit (not shown).

According to one embodiment, the electronic device or the system may further include an A/V (Audio/Video) input unit (not shown). The A/V input unit is for inputting audio or video signals and may include a camera, a microphone, and the like. In video call mode or capture mode, the camera may acquire relevant medical images or video frame data through an image sensor, and the images captured through the image sensor may be processed by the processor or by a separate image processing unit (not shown).

The microphone may receive acoustic signals from an external device or from a user, including the user's voice input. The microphone may use various noise removal algorithms to remove noise generated while receiving external acoustic signals.

The method according to one embodiment may be implemented in the form of program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the present disclosure, or may be known and available to those skilled in the art of computer software.

In addition, a computer program device including a recording medium that stores a program for performing the method according to the above embodiment may be provided. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tapes; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code such as that produced by a compiler but also high-level language code that can be executed by a computer using an interpreter or the like.

Although embodiments of the present disclosure have been described in detail above, the scope of the present disclosure is not limited thereto, and various modifications and improvements made by those skilled in the art using the basic concept of the present disclosure as defined in the following claims also fall within the scope of the present disclosure.

Claims

A method for an electronic device to predict the risk of developing diabetic nephropathy, the method comprising:
obtaining genomic data of a clinical sample from an external device connected to the electronic device;
selecting, from the genomic data, a plurality of target omics data identical to reference omics data, based on the reference omics data including a plurality of preset reference SNP (Single Nucleotide Polymorphism) data and a plurality of miRNA data related to the onset of diabetic nephropathy; and
calculating a risk score using a genotype of each of the plurality of target omics data and a weight for each of the plurality of target omics data,
wherein the reference omics data includes a plurality of reference SNP data related to the onset of diabetic nephropathy and a plurality of miRNA data related to the onset of diabetic nephropathy, and
reflects an effect size corrected using the linkage disequilibrium of the omics data related to the onset of diabetic nephropathy and a global scaling parameter (shrinkage factor).

The method of claim 1, wherein calculating the risk score comprises:
extracting, from a GWAS database that is connected to the electronic device and stores genome-wide association analysis data, an effect size derived from a case-control association test for SNP data, and extracting, from a TWAS database that stores transcriptome-wide association analysis data, an effect size derived from a colocalization association test of whether gene expression is affected by the same SNP;
determining, for m SNPs associated with the risk of developing diabetic nephropathy, a main effect weighted by the effect size of each risk allele and an interaction effect weighted by the effect size of co-regulatory miRNA data;
generating a nonlinear low-dimensional embedding by inputting the determined interaction effect into a variational autoencoder that uses L_1 normalization and reconstructing, with a decoder, values randomly sampled from the distribution output by the variational autoencoder; and
inputting the low-dimensional embedding into a preset prediction model to calculate, as the risk score, a Polygenic Risk Score (PRS) defined as the expected value of the multi-conditional posterior probability distribution of the estimated beta_j.

The method of claim 1, wherein calculating the risk score comprises
calculating the risk score based on Equation 1 below:
[Equation 1]

In Equation 1, PRS_i represents the genetic risk score of individual i; i is an identification number that distinguishes an individual's genomic data; j is the identification number of the target omics data or of the SNP data within the target omics data; G_ij is […]; […] is the prior probability distribution framework, defined as the probability distribution of […] in a nonlinear low-dimensional space; […] is the expected value of the multi-conditional posterior probability distribution including the prior probability and the epidemiological information; […] is the prior probability estimator; D is the epidemiological information; N is the Gaussian distribution; […] is a global scaling parameter (shrinkage parameter); and […] is an L_1-normalized shrinkage estimation parameter for the target omics data or for the SNP data within the target omics data.

The method of claim 1, wherein the plurality of reference SNP data of the reference omics data include data on:
rs12531478-A, rs17373728-C, rs5750250-G, rs11107616-C, rs136161-G, rs4879670-G, rs13259109-G, rs1298908-G, rs304029-G, rs9510795-A, rs10952362-C, rs4667466-T, rs10778560-C, rs7975752-G, rs731565-T, rs484965-C, rs6910061-A, rs1424609-G, rs2596230-G, rs1677894-G, rs5750250-G, rs5750250-G, rs136161-G.

An electronic device for predicting the risk of developing diabetic nephropathy, comprising:
a network interface;
a memory that stores one or more instructions; and
at least one processor that executes the one or more instructions,
wherein the at least one processor, by executing the one or more instructions:
obtains genomic data of a clinical sample from an external device connected to the electronic device;
selects, from the genomic data, a plurality of target omics data identical to reference omics data, based on the reference omics data including a plurality of preset reference SNP (Single Nucleotide Polymorphism) data and a plurality of miRNA data related to the onset of diabetic nephropathy; and
calculates a risk score using the genotype of each of the plurality of target omics data and a weight for each of the plurality of target omics data,
wherein the reference omics data includes a plurality of reference SNP data related to the development of diabetic nephropathy and a plurality of miRNA data related to the development of diabetic nephropathy,
and reflects an effect size corrected using the linkage disequilibrium of the omics data related to the onset of diabetic nephropathy and a global scaling parameter (shrinkage factor).
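To make the idea of an effect size corrected by linkage disequilibrium and a global scaling parameter more concrete, the Python sketch below computes posterior-mean effect sizes under one common Bayesian shrinkage form. It is not the formula of this disclosure, which is not reproduced here. The assumptions are: standardized genotypes; marginal effect estimates `beta_hat`; an LD correlation matrix `ld`; and a zero-mean Gaussian prior on each effect whose variance is proportional to `phi * psi[j]`, with `phi` a global scaling parameter and `psi[j]` a per-variant shrinkage parameter. Under those assumptions the posterior mean is (D + T^-1)^-1 beta_hat with T = diag(phi * psi_j). All names are hypothetical.

```python
from typing import List, Sequence


def solve_linear(a: Sequence[Sequence[float]], b: Sequence[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are not modified.
    m = [list(a[r]) + [b[r]] for r in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def shrink_effect_sizes(
    beta_hat: Sequence[float],
    ld: Sequence[Sequence[float]],
    phi: float,
    psi: Sequence[float],
) -> List[float]:
    """Posterior-mean effects (D + T^-1)^-1 beta_hat, with T = diag(phi * psi_j)."""
    if phi <= 0 or any(p <= 0 for p in psi):
        raise ValueError("phi and every psi_j must be positive")
    n = len(beta_hat)
    a = [
        [ld[r][c] + (1.0 / (phi * psi[r]) if r == c else 0.0) for c in range(n)]
        for r in range(n)
    ]
    return solve_linear(a, beta_hat)


# Two variants in LD (r = 0.5) with hypothetical marginal effects.
print(shrink_effect_sizes([0.3, 0.3], [[1.0, 0.5], [0.5, 1.0]], 1.0, [1.0, 1.0]))
```

A smaller `phi` pulls every effect toward zero, while correlated variants share signal through the off-diagonal LD terms instead of each being counted in full.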

A computer-readable recording medium on which is recorded a program for executing, on a computer, a method by which an electronic device predicts the risk of developing diabetic nephropathy, the method comprising:
obtaining genomic data of a clinical sample from an external device connected to the electronic device;
selecting, from the genomic data, a plurality of target omics data identical to reference omics data, based on the reference omics data including a plurality of preset reference SNP (Single Nucleotide Polymorphism) data and a plurality of miRNA data related to the onset of diabetic nephropathy; and
calculating a risk score using a genotype of each of the plurality of target omics data and a weight for each of the plurality of target omics data,
wherein the reference omics data includes a plurality of reference SNP data related to the development of diabetic nephropathy and a plurality of miRNA data related to the development of diabetic nephropathy,
and reflects an effect size corrected using the linkage disequilibrium of the omics data related to the development of diabetic nephropathy and a global scaling parameter (shrinkage factor).