KR102476603B1

KR102476603B1 - System for diagnosing gene using self-improving genetic sequencing based on artificial intelligence

Info

Publication number: KR102476603B1
Application number: KR1020200163857A
Authority: KR
Inventors: 이건우
Original assignee: 이건우
Priority date: 2020-11-30
Filing date: 2020-11-30
Publication date: 2022-12-13
Also published as: KR20220075594A

Abstract

A method for diagnosing genes, including disease genes, corresponding to user information by using an artificial intelligence-based gene analysis model comprises: receiving, from a user terminal, user information including profile information that includes the user's race, ethnicity, and gender; subdividing the user information into item-specific information and comparing it against the gene analysis model by matching on predetermined items among the subdivided items; extracting, from the user information, information on genes, including disease genes, that are initially suspected, and constructing, based on the gene analysis model, a feedback questionnaire for checking whether the user matches items whose correlation with the items of the suspected genes is at or above a certain value; when feedback information on the feedback questionnaire is received from the user terminal, updating the information on the suspected genes from the feedback information, and repeating the questionnaire construction and update to confirm the suspected genes; and transmitting a report on the confirmed genes to the user terminal. The gene analysis model is a model trained, based on artificial intelligence, to learn whether correlations exist between items from item-by-item big data obtained by subdividing and classifying information on an unspecified large number of subjects, disease information, and gene information.

Description

Gene diagnosis system through artificial intelligence-based self-improving genome sequencing

The present invention relates to a gene diagnosis system using artificial intelligence-based self-improving genome sequencing and, more specifically, to a gene diagnosis system that diagnoses and analyzes genes, including disease genes, corresponding to user information by using self-improving genome sequencing that includes artificial intelligence-based information on correlations between genes.

The global genetic testing market is growing rapidly alongside genetic testing research and development, based on data from analyzing the genetic information of more than one million people. In Korea as well, following the introduction of genetic testing technology, a personal genome map platform (PMAP) has been developed as a technology that turns WES (Whole Exome Sequencing) genomic information into big data and integrates and optimizes it. The platform uses a supercomputing system to identify disease susceptibility so that diseases can be prevented in advance, and it is expected to be applicable in various fields such as newborn genetic disease screening, rare disease and cancer genetic testing, and the development of new drugs and stem cell therapeutics.

Meanwhile, a disease gene is a gene that can regulate a disease and is therefore a drug target candidate. Accordingly, technology for predicting disease genes is among those most needed by the pharmaceutical industry for developing efficient treatments and drugs. With recent advances in high-throughput screening, many studies are under way to discover drug target candidates or disease genes experimentally. However, experimental discovery is time-consuming and costly, and its results fall far short of the expected number of drug target candidates. More recently, methods have been proposed that predict disease genes computationally and then verify them experimentally, so that drug target candidates can be discovered at lower cost and in less time.

However, accurate prediction of disease genes is not possible when genetic data have not been secured, and even when disease genes are predicted, there are inevitable limits to predicting meaningful disease genes in the face of a vast disease gene pool and viruses that continually evolve through mutation. Therefore, to keep pace with the ever-growing body of data on disease-causing genes, human disease genes need to be analyzed and predicted more quickly and accurately so as to form disease gene information pools for individuals and groups and, ultimately, to secure a genome map that includes genetic data correlated with diseases.

The present invention is intended to solve the above problems. Its object is to provide a system that uses artificial intelligence-based self-improving genome sequencing, including correlation information for each gene item, to analyze and predict genetic information, including human disease genes, more quickly and accurately, thereby securing gene information pools for individuals and groups, and that on this basis diagnoses a user's genes, including disease genes, from all kinds of gene-related information, such as health information, behavior, and psychology.

According to one embodiment for achieving the above object of the present invention, a method for diagnosing genes, including disease genes, corresponding to user information by using an artificial intelligence-based gene analysis model includes: receiving, from a user terminal, user information including profile information that includes the user's race, ethnicity, and gender; subdividing the user information into item-specific information and comparing it against the gene analysis model by matching on predetermined items among the subdivided items; extracting, from the user information, information on genes, including disease genes, that are initially suspected, and constructing, based on the gene analysis model, a feedback questionnaire for checking whether the user matches items whose correlation with the items of the suspected genes is at or above a certain value; when feedback information on the feedback questionnaire is received from the user terminal, updating the information on the suspected genes from the feedback information, and repeating the questionnaire construction and update to confirm the suspected genes; and transmitting a report on the confirmed genes to the user terminal. The gene analysis model is a model trained, based on artificial intelligence, to learn whether correlations exist between items from item-by-item big data obtained by subdividing and classifying information on an unspecified large number of subjects, disease information, and gene information.

In one embodiment of the present invention, the user information includes information on the user's disease history and family history, and the step of extracting information on the suspected genes, including disease genes, may extract that information based primarily on the profile information and secondarily by weighting gene information based on the disease history and family history.

In one embodiment of the present invention, the user information further includes the user's health checkup information and the health checkup information of the user's family, and the step of extracting information on the suspected genes, including disease genes, may secondarily weight gene information based on the user's and family's health checkup information, together with the disease history and family history, when extracting that information.
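The two-stage weighting described in the preceding embodiments can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `rank_suspected_genes`, the additive scoring rule, the weight values, and the sample correlation numbers are all assumptions introduced here.

```python
from collections import defaultdict

def rank_suspected_genes(profile_hits, history_hits, checkup_hits=(),
                         history_weight=2.0, checkup_weight=1.5):
    """Rank candidate genes from (gene, correlation) evidence pairs.

    Profile evidence is counted as-is (primary stage); disease/family
    history and health checkup evidence are multiplied by a weight
    (secondary stage). The weights are illustrative assumptions.
    """
    scores = defaultdict(float)
    for gene, corr in profile_hits:
        scores[gene] += corr
    for gene, corr in history_hits:
        scores[gene] += history_weight * corr
    for gene, corr in checkup_hits:
        scores[gene] += checkup_weight * corr
    # Highest combined score first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical evidence; the gene symbols are used only as labels.
ranking = rank_suspected_genes(
    profile_hits=[("STAT4", 0.3), ("TERT", 0.4)],
    history_hits=[("STAT4", 0.2)],
    checkup_hits=[("TERT", 0.1)],
)
```

With these made-up inputs the history evidence lifts STAT4 above TERT even though TERT has the stronger profile match, which is the effect the weighting is meant to have.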

In one embodiment of the present invention, the step of constructing the feedback questionnaire may compose the questionnaire of questions for checking whether the user matches items that are directly correlated, at or above a certain value, with the suspected genes, including disease genes, or of questions for checking whether the user matches indirectly correlated items, that is, items whose correlation with such a directly correlated item is at or above a certain value.
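The distinction between directly and indirectly correlated items can be illustrated with a small lookup over a correlation map. The sketch below assumes the model is available as a nested dictionary `corr[a][b]`; that representation, the function name `questionnaire_items`, the threshold of 0.5, and the item names and values are hypothetical.

```python
def questionnaire_items(corr, gene, threshold=0.5):
    """Return (direct, indirect) item sets to ask the user about.

    direct:   items whose correlation with `gene` is >= threshold.
    indirect: items whose correlation with some direct item is >= threshold,
              excluding the gene itself and the direct items.
    """
    direct = {item for item, c in corr.get(gene, {}).items() if c >= threshold}
    indirect = set()
    for d in direct:
        for item, c in corr.get(d, {}).items():
            if c >= threshold and item != gene and item not in direct:
                indirect.add(item)
    return direct, indirect

# Hypothetical correlation map for illustration only.
corr = {
    "STAT4": {"fatigue": 0.7, "smoking": 0.2},
    "fatigue": {"STAT4": 0.7, "jaundice": 0.6, "insomnia": 0.1},
}
direct, indirect = questionnaire_items(corr, "STAT4")
```

Here `direct` holds the one-hop item and `indirect` the two-hop item; weakly correlated items are left out of the questionnaire.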

In one embodiment of the present invention, the feedback information may include at least one of additional profile information, physical characteristic information, and condition information of the user.

In one embodiment of the present invention, the extraction of information on the suspected genes, including disease genes, is based on a suspicion probability, and the questionnaire construction and update may be repeated until the suspicion probability of a suspected gene exceeds a predetermined threshold.
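The repeat-until-threshold loop can be sketched as below. The source does not say how the suspicion probability is updated, so the Bayesian odds update, the likelihood-ratio encoding of each answer, and the names `bayes_update` and `confirm_gene` are assumptions chosen only to make the loop concrete.

```python
def bayes_update(prob, likelihood_ratio):
    """Update a probability (0 < prob < 1) with one piece of evidence."""
    odds = prob / (1.0 - prob)
    odds *= likelihood_ratio
    return odds / (1.0 + odds)

def confirm_gene(initial_prob, answer_ratios, threshold=0.9):
    """Apply feedback rounds until the suspicion probability exceeds threshold.

    `answer_ratios` stands in for successive questionnaire answers, each
    already converted to a likelihood ratio (a hypothetical encoding).
    Returns (final probability, rounds used, whether the gene is confirmed).
    """
    prob = initial_prob
    rounds = 0
    for ratio in answer_ratios:
        if prob > threshold:
            break
        prob = bayes_update(prob, ratio)
        rounds += 1
    return prob, rounds, prob > threshold

# Starting at 0.2, each supportive answer triples the odds.
final_prob, rounds, confirmed = confirm_gene(0.2, [3.0] * 10)
```

In this example the probability passes 0.9 after the fourth round, so the remaining answers are never requested.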

In one embodiment of the present invention, the report may be a genome map showing the suspected genes, including disease genes, confirmed based on the gene analysis model, together with the correlations between those genes and items having at least a certain degree of correlation with them.

In one embodiment of the present invention, the user information and feedback information received from the user terminal may be used as training data for updating the gene analysis model.

In one embodiment of the present invention, the predetermined items used as the basis for matching against the gene analysis model may be profile information including the user's race, ethnicity, and gender.

According to one embodiment for achieving the above object of the present invention, a system for diagnosing genes, including disease genes, corresponding to user information by using an artificial intelligence-based gene analysis model includes: a user terminal that communicates with a central server over a network and transmits to the central server user information, including profile information that includes the user's race, ethnicity, and gender, as well as feedback information used when the suspicion probability is too low to predict genes, including disease genes, from the user information, or used to confirm whether a specific gene is involved; a gene analysis server that communicates with the central server over a network and is configured to train, based on artificial intelligence, a gene analysis model that learns whether correlations exist between items from item-by-item big data obtained by subdividing and classifying the provided information on an unspecified large number of subjects, disease information, and gene information; and a central server that communicates with the user terminal and the gene analysis server over a network. The central server includes: a matching comparison unit that subdivides the user information received from the user terminal into item-specific information and compares it against the gene analysis model by matching on predetermined items among the subdivided items; a predicted disease extraction unit that extracts, from the user information, information on genes, including disease genes, that are initially suspected, updates that information from the feedback information when feedback information on the feedback questionnaire is received from the user terminal, and repeats the questionnaire construction and update to confirm the suspected genes; and a feedback questionnaire construction unit that constructs, based on the gene analysis model, a feedback questionnaire for checking whether the user matches items whose correlation with the items of the suspected genes is at or above a certain value.

According to the present invention, a model is used that has been trained on big data comprising subject information, including the conditions and diseases of an unspecified large number of people, obtained through artificial intelligence-based health checkups or medical interviews, and gene information including the disease gene information of those subjects. Instead of the conventional approach of collecting genetic information directly from a subject to predict genes, including disease genes, the invention can present diseases inferred from the user's information and health checkup information alone and then predict genes, including disease genes, through the resulting feedback information.

In addition, according to the present invention, the trained model can be used to analyze not only the correlations among conditions, diseases, and disease genes but also causes of onset and causal relationships attributable to factors other than the condition itself. Furthermore, even for gene relationships not already established by research, the invention can provide a self-improving model that identifies a user's diseases or physical characteristics by analyzing correlations between new genes. Not only correlations between genes but also correlations between items of user information or feedback information can be analyzed together.

Moreover, by also using the user information entered by users as training data, the accuracy of the trained model can be increased, and the model can also be used for various purposes, such as therapeutic development or the healthcare industry.

FIG. 1 is a configuration diagram illustrating a gene diagnosis system according to an embodiment of the present invention.
FIG. 2 is an exemplary diagram showing the gene analysis model managed by the gene analysis server of FIG. 1.
FIG. 3 is a block diagram illustrating the configuration of the central server of the gene diagnosis system of FIG. 1.
FIG. 4 is an exemplary diagram of a genome map provided as an output by the gene analysis server of FIG. 1.
FIG. 5 is a flowchart illustrating a gene diagnosis method according to an embodiment of the present invention.

Hereinafter, preferred embodiments of the present invention are described in detail with reference to the accompanying drawings so that a person of ordinary skill in the art to which the invention pertains can readily practice it. The present invention may, however, be implemented in many different forms and is not limited to the embodiments described here. In assigning reference numerals to the components in the drawings, the same components are given the same reference numerals even when they appear in different drawings. In this specification, when a component is said to be "connected" or "coupled" to another component, this covers not only the case where the two are "directly" connected but also the case where they are connected with another component interposed between them, and not only the case where they are "physically" connected but also the case where they are connected "functionally or communicatively." Also, when a part is said to "include" a component, this means that it may further include other components rather than excluding them, unless specifically stated otherwise.

Referring to FIG. 1, the gene diagnosis system 1000 of the present invention is configured so that, when it receives each user's profile information and health checkup information from a user terminal, it converts them into user information data and, using a model trained in advance by artificial intelligence on the correlations among information on an unspecified large number of subjects, disease information, and gene information including disease genes, traces back from the user information to the matching genes, including disease genes, and then receives feedback to deliver a diagnosis to the user.

The gene diagnosis system 1000 includes: a user terminal 100; a gene analysis server 300 configured to manage a gene correlation analysis model, that is, a database of the degrees of correlation between items of information, built by learning, based on artificial intelligence, the correlations between the items of information on an unspecified large number of subjects, disease information, and gene information including disease genes; and a central server 500 configured to convert user information, including each user's profile information and health checkup information entered through the user terminal 100, into data, to trace back from the user information to matching diseases or genes, including disease genes, based on the gene correlation analysis model, and to deliver a diagnosis to the user.

The user terminal 100 may be implemented as a computer (PC) or portable terminal capable of connecting through the network 10, and is configured to connect to the central server 500 through the network 10 to transmit and receive data.

Specifically, the user terminal 100 is a computer or portable terminal owned by a user, such as an individual who wishes to be diagnosed for diseases or genes, including disease genes, through the gene diagnosis system 1000. Through a web page, an app (application), or a web app on the user terminal, the user connects to the central server 500 over the network 10 to sign up or log in, and enters his or her health checkup information and profile information through a user interface (UI) linked to the central server 500. In addition, when the central server 500 issues a feedback request, the terminal is configured so that the user can enter feedback information in response.

The computer includes, for example, a desktop, laptop, or tablet PC equipped with a web browser. The portable terminal is, for example, a wireless communication device that ensures portability and mobility, and may include any kind of handheld wireless communication device, such as a smartphone or a PCS (Personal Communication System), GSM (Global System for Mobile communication), PDC (Personal Digital Cellular), PHS (Personal Handyphone System), PDA (Personal Digital Assistant), IMT (International Mobile Telecommunication)-2000, CDMA (Code Division Multiple Access)-2000, W-CDMA (Wideband Code Division Multiple Access), or Wibro (Wireless Broadband Internet) terminal.

In addition, the network 10 connecting the user terminal 100, the gene analysis server 300, and the central server 500 may be implemented as any kind of wired or wireless network, such as a local area network (LAN), a wide area network (WAN), a value added network (VAN), a personal area network (PAN), a mobile radio communication network, Wibro (Wireless Broadband Internet), Mobile WiMAX, HSDPA (High Speed Downlink Packet Access), or a satellite communication network.

The gene analysis server 300 is configured to manage a gene correlation analysis model, that is, a database of the degrees of correlation between items of information, built by learning, based on artificial intelligence, the correlations between the items of information on an unspecified large number of subjects, disease information, and gene information including disease genes. In addition, the gene analysis server 300 is configured to provide the gene correlation analysis model it manages upon request from the central server 500 over the network 10, and to update the gene correlation analysis model when it receives user information from the central server 500.

For example, the gene analysis server 300 is configured to subdivide and classify into distinct items each of the information on an unspecified large number of subjects, the disease information, and the gene information, whether stored in its DB or provided by an external server, and to learn whether correlations exist between items from the item-by-item big data obtained from those subjects. For example, based on artificial intelligence, it is configured to learn not only correlations between different items within the profile information of the subject information, such as between ethnicity and gender, but also correlations between items of heterogeneous information, such as between each item of profile information and each item of gene information.
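As a rough illustration of item-by-item correlation over subdivided records, the sketch below computes the phi coefficient for every pair of yes/no items. The source only says the correlations are learned "based on artificial intelligence" and names no statistic, so the phi coefficient is a stand-in chosen here; the function names, item names, and records are hypothetical.

```python
from itertools import combinations
from math import sqrt

def phi(xs, ys):
    """Phi coefficient: Pearson correlation of two binary columns."""
    n11 = n10 = n01 = n00 = 0
    for x, y in zip(xs, ys):
        if x and y:
            n11 += 1
        elif x:
            n10 += 1
        elif y:
            n01 += 1
        else:
            n00 += 1
    denom = sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    # A constant column has no defined correlation; report 0.0 in that case.
    return 0.0 if denom == 0 else (n11 * n00 - n10 * n01) / denom

def pairwise_correlations(records, items):
    """Correlation for every unordered pair of items across subject records."""
    columns = {item: [1 if r.get(item) else 0 for r in records] for item in items}
    return {(a, b): phi(columns[a], columns[b]) for a, b in combinations(items, 2)}

# Made-up subject records mixing profile items and a gene item.
items = ["ethnicity_A", "female", "variant_X"]
records = [
    {"ethnicity_A": 1, "female": 1, "variant_X": 1},
    {"ethnicity_A": 1, "female": 0, "variant_X": 1},
    {"ethnicity_A": 0, "female": 1, "variant_X": 0},
    {"ethnicity_A": 0, "female": 0, "variant_X": 0},
]
correlations = pairwise_correlations(records, items)
```

The same table of pairs covers both a within-profile pair (ethnicity and gender) and a cross-category pair (profile item and gene item), mirroring the two kinds of correlation the paragraph mentions.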

For example, when learning the correlation between diseases and genes, the degree of correlation with a disease is computed, based on artificial intelligence, for all functional groups with which each gene is associated. Meanwhile, in the process of computing disease-gene correlations, if information is already known in academia about a gene, or about diseases expressed from a specific gene, a prior correlation may be computed from that data. In this case, the gene analysis server 300 may take a weighted sum of the correlation computed between the disease and the gene and the prior correlation, and assign the result as the correlation between the item nodes in the network of the gene correlation analysis model.
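The weighted sum in this paragraph can be written out as below. The source does not specify the weights, so the single mixing weight \alpha and the constraint that the two weights sum to one are assumptions made here for illustration; the symbols r^{learned} and r^{prior} are likewise introduced only for this sketch.

```latex
r_{ij} \;=\; \alpha \, r^{\mathrm{learned}}_{ij} \;+\; (1-\alpha)\, r^{\mathrm{prior}}_{ij},
\qquad 0 \le \alpha \le 1
```

For instance, with \alpha = 0.7, a learned correlation of 0.6 and a prior correlation of 0.9 combine to 0.7 x 0.6 + 0.3 x 0.9 = 0.69.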

Meanwhile, when the genome map is drawn, the correlation is shown on the network links connecting nodes (see FIG. 2). That is, referring to FIG. 2, each item is represented as a node. Although not shown for convenience, every node forms a network with all other nodes; alternatively, nodes may be linked only when they have a quantified correlation of at least a certain value. The correlation may represent the correlation between items themselves or the correlation between specific pieces of information within items, depending on how finely the items are subdivided. For example, when the information within an item is binary, such as yes or no, the correlation can represent the correlation between the items themselves. The size of a node may be determined by the relative frequency of occurrence of the corresponding item; FIG. 2 is only illustrative and does not reflect node sizes.
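A minimal sketch of the network described here: nodes sized by relative frequency and links kept only above a cutoff. The function name `build_network`, the cutoff value, the definition of relative frequency as a share of all occurrences, and the sample data are assumptions for illustration.

```python
def build_network(correlations, counts, min_corr=0.3):
    """Build (nodes, edges) for a genome-map style network.

    nodes: item -> relative frequency (share of all item occurrences),
           usable as a node size.
    edges: (item_a, item_b) -> correlation, kept only when >= min_corr.
    """
    total = sum(counts.values())
    nodes = {item: n / total for item, n in counts.items()}
    edges = {pair: c for pair, c in correlations.items() if c >= min_corr}
    return nodes, edges

# Hypothetical occurrence counts and pairwise correlations.
counts = {"A": 2, "B": 1, "C": 1}
correlations = {("A", "B"): 0.8, ("A", "C"): 0.1}
nodes, edges = build_network(correlations, counts)
```

Setting `min_corr` to 0 would keep every pair with a non-negative correlation, which approximates the fully connected variant the paragraph also allows.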

The gene analysis server 300 is configured so that, taking an arbitrary item (node) as a reference, any item having a significant network correlation with it is updated as a specific item and fed back into the next data training cycle. That is, in the next training cycle the gene analysis server 300 uses the updated specific item to recalculate the correlation with the reference item, and continues to search for further specific items that have significant network links to that specific item.

If no item has a significant network correlation, so that no further specific item is discovered, the discovery procedure ends. For example, when the arbitrary item is a specific condition, a disease item having a significant network correlation with that condition item may be linked to it, and a disease gene item having a significant network correlation with that disease item may be linked in turn. In this case, a condition - disease - disease gene relationship whose correlations are all at or above a certain level can be discovered.
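The discovery procedure above can be read as a search that expands outward from the reference item and stops when no new significant item is found. The sketch below assumes the network is stored as an adjacency dictionary and treats "significant" as "at or above a threshold"; both choices, and all names and values, are assumptions made for illustration.

```python
def discover_linked_items(network, reference, threshold):
    """Expand from `reference`, following only significant links.

    Returns a list of (from_item, to_item, correlation) in discovery order.
    The loop ends when a round discovers no new item.
    """
    discovered = []
    seen = {reference}
    frontier = [reference]
    while frontier:
        next_frontier = []
        for item in frontier:
            for neighbor, corr in network.get(item, {}).items():
                if neighbor not in seen and corr >= threshold:
                    seen.add(neighbor)
                    discovered.append((item, neighbor, corr))
                    next_frontier.append(neighbor)
        frontier = next_frontier  # an empty frontier terminates the search
    return discovered


# Hypothetical network: condition -> disease -> disease gene.
network = {
    "heartburn": {"stomach cancer": 0.7, "hair color": 0.1},
    "stomach cancer": {"heartburn": 0.7, "PSCA": 0.8},
    "PSCA": {"stomach cancer": 0.8},
}
chain = discover_linked_items(network, "heartburn", threshold=0.5)
```

Here `chain` links the condition to the disease and the disease to the disease gene, and the search stops once PSCA has no further significant neighbor.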

The gene analysis server 300 is configured to use the gene correlation analysis model to store and manage, in a DB, information on the items for which a correlation has been output. The DB may include disease, gene, genetic influence information, risk factor, allele, confirmed genotype, predicted genotype, relative risk, correlation rate, and crossover rate (from which a double crossover rate can be calculated), together with group information (variable) and the standard genetic information for the relevant country (invariant). For example, when a significant network correlation between a disease and a gene is discovered, the gene analysis server 300 can store and manage the disease and gene information as shown in [Table 1] below.

[Table 1]
Disease               | Gene                     | Risk factor | Allele | Genotype | Result    | Relative risk
Liver cancer          | STAT4                    | G           | T      | TG       | Caution   | 1.39
                      | HLA-DRB1-LOC107986589    | A           | G      | GA       |           |
                      | KIF1B                    | A           | G      | AA       |           |
Breast cancer         | CASC16                   | T           | C      | CC       | Attention | 0.9
                      | CCDC170 - ESR1           | A           | G      | GA       |           |
Colorectal cancer     | C5orf66                  | A           | C      | CC       | Good      | 0.69
                      | MYRF                     | G           | T      | GT       |           |
                      | CASC8, CCAT2             | G           | T      | TT       |           |
Stomach cancer        | PSCA                     | T           | C      | CT       | Good      | 0.67
                      | PRKAA1                   | C           | T      | TT       |           |
Lung cancer           | TERT                     | G           | A      | AA       | Good      | 0.34
                      | BPTF                     | A           | G      | GG       |           |
Stroke                | PITX2 - MIR297           | G           | A      | GG       | Attention | 0.99
                      | SPSB4                    | G           | A      | GA       |           |
Alzheimer's           | CLU                      | T           | C      | TT       | Attention | 0.92
                      | SORL1                    | T           | C      | TC       |           |
Type 2 diabetes       | LOC105375716, SLC30A8    | C           | T      | CT       | Attention | 0.89
                      | KCNQ1                    | C           | T      | CT       |           |
Myocardial infarction | AP3D1 - DOT1L            | C           | A      | CA       | Attention | 0.88
High blood pressure   | UMOD                     | A           | G      | AA       | Attention | 0.81
                      | FGF5                     | C           | T      | TT       |           |

Meanwhile, the relative risk rises as the number of predicted risk factors increases, and the network correlation from the arbitrary reference node may affect the calculation of the relative risk. The algorithm for calculating the relative risk is outside the scope of the present invention, and a detailed description is omitted.

In this way, because the gene analysis server 300 repeatedly updates the correlations of the network between items, it can discover previously unknown correlations between items of information. Beyond the condition - disease - disease gene relationship, when other highly correlated items are discovered, for example specific items related to outwardly visible physical characteristics, that information can be used to discover disease genes more efficiently and thereby predict and diagnose diseases. Here, a high correlation for a specific item means that the correlation is higher than that of other items, or that it is at or above a certain value set as a predetermined criterion.

Hereinafter, the central server 500 is described in detail with reference to FIGS. 3 and 4.

FIG. 3 is a block diagram illustrating the configuration of the central server of the gene diagnosis system of FIG. 1. FIG. 4 is an example of a genome map provided as an output by the gene analysis server of FIG. 1.

Referring to FIGS. 1 to 4, the central server 500 is configured to communicate with the user terminal 100 and the gene analysis server 300 through the network 10. When per-user profile information and health checkup information are input through the user terminal 100, the central server 500 converts them into user information in a form that can be compared with the DB and, based on the DB information derived from the gene analysis model received from the gene analysis server 300, outputs the suspected diseases to the user terminal 100 in descending order of probability according to the matching rate of each piece of information. Thereafter, when information on items highly correlated with a suspected disease is fed back from the user terminal 100, the central server 500 confirms the suspected disease gene on the basis of the high-probability disease and presents to the user terminal 100 the additional diseases and conditions that the disease gene may cause.
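The ordering of suspected diseases by matching rate can be sketched as below. The specification does not define the matching rate, so this sketch assumes it is the fraction of a disease's correlated items that appear in the user information; `rank_suspected_diseases`, the item names, and the profiles are hypothetical.

```python
def rank_suspected_diseases(user_items, disease_profiles):
    """Return (disease, matching_rate) pairs, highest rate first.

    Assumed definition: matching rate = matched items / items correlated
    with the disease in the model.
    """
    ranked = []
    for disease, items in disease_profiles.items():
        if not items:
            continue  # nothing to match against
        rate = len(user_items & items) / len(items)
        ranked.append((disease, rate))
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked


# Hypothetical user information and model profiles.
user_items = {"male", "heartburn", "weight loss"}
disease_profiles = {
    "type 2 diabetes": {"high BMI", "frequent thirst"},
    "stomach cancer": {"male", "heartburn", "weight loss",
                       "family history of stomach cancer"},
}
ranking = rank_suspected_diseases(user_items, disease_profiles)
```

In this example stomach cancer is ranked first because three of its four correlated items match the user information.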

Specifically, the central server 500 may include an information receiving unit 510, a user information management unit 520, a matching comparison unit 530, a predicted disease extraction unit 540, a feedback questionnaire construction unit 550, an information transmission unit 560, and an update unit 570.

The information receiving unit 510 is configured to receive, from the user terminal 100, user information including the user's profile information and health checkup information, and additionally to receive feedback information given in response to the questionnaire generated by the feedback questionnaire construction unit 550. The information receiving unit 510 is also configured to receive from the gene analysis server 300 the gene analysis model, in which the correlations between the items of genetic information have been compiled into a database using artificial intelligence.

The user information management unit 520 is configured to register one or more users as customers of the gene diagnosis system 1000 and to manage the information entered by each user by matching it to the corresponding customer information.

Specifically, the user information management unit 520 stores and manages user information including basic information such as the user's name, contact information, and e-mail address entered from the user terminal 100. When a user signs up as a member, it assigns a user identification code to that user, matches the code with the entered user information, and stores and manages the result in the customer DB.

In addition to the basic information, the user information includes the user's profile information and health checkup information used to diagnose diseases or genes including disease genes.

Specifically, the profile information includes user group information such as the user's gender, disease history, family history, country, ethnicity, and race; body information such as weight and height; and information on the user's current personal circumstances, such as the area of residence, employment status, working hours, and average sleep time. The profile information consists not only of information whose statistical genetic significance is known, but also of information whose correlation with genetic information is unclear. For example, ethnicity information may be known information such as that, for Koreans, the lineage is 50% pure Korean, 25% Japanese, and 18% Chinese. In this case, the corresponding genetic information can be predicted from the ethnicity information entered by the user according to the proportion of each lineage. The profile information may be entered from the user terminal 100 sequentially, from general to specific, or, within each category, the information with the greatest influence on disease prediction may be requested first.
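One way to read "predicted according to the proportion of each lineage" is as a proportion-weighted average of per-group statistics. The sketch below makes that assumption explicit; the function name and the per-group risk-allele frequencies are hypothetical and are not taken from the specification. Only the 50% / 25% / 18% proportions come from the example above.

```python
def expected_allele_frequency(lineage_proportions, group_frequencies):
    """Proportion-weighted average of a risk-allele frequency.

    Proportions are renormalized so that they need not sum to 1
    (the example proportions above sum to 0.93).
    Lineages without a known group frequency are ignored.
    """
    total = 0.0
    weighted = 0.0
    for lineage, proportion in lineage_proportions.items():
        if lineage in group_frequencies:
            total += proportion
            weighted += proportion * group_frequencies[lineage]
    if total == 0.0:
        raise ValueError("no lineage has a known group frequency")
    return weighted / total


# Proportions from the example in the text; frequencies are made up.
lineage_proportions = {"Korean": 0.50, "Japanese": 0.25, "Chinese": 0.18}
hypothetical_frequencies = {"Korean": 0.30, "Japanese": 0.20, "Chinese": 0.10}
estimate = expected_allele_frequency(lineage_proportions, hypothetical_frequencies)
```

The renormalization is a design choice for this sketch: it keeps the estimate inside the range of the group frequencies even when part of the lineage is unaccounted for.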

The health checkup information includes not only the user's own health checkup information but also that of the user's family. This makes it possible to identify familial hereditary disease information passed down the paternal or maternal line and, at the same time, to predict gene-related information. The health checkup information may be entered directly from the user terminal 100, obtained by parsing a health checkup result sheet, or delivered on request through a connection to an external server, namely a server of the National Health Insurance Service.

The user information management unit 520 is also configured to store and manage the feedback information entered from the user terminal 100.

The feedback information is information entered by the user in response to the questionnaire generated by the feedback questionnaire construction unit 550, described later, in order to obtain additional information. It is requested, for example, when the information entered by the user gives too low a probability to predict a disease or a gene including a disease gene, or when a confirmation step is needed to determine whether a specific disease, a specific disease gene, or a specific gene is involved. The feedback information may be provided once or several times in sequence.

The feedback information may include additional profile information or physical characteristic information about the user. Taking hair as an example, the physical characteristic information includes both state information related to traits determined by genetic factors, such as whether the user is bald, where the hair is missing if so, hair thickness, and hair color, and information correlated with psychological genes, such as the direction of the hair part and the hairstyle. It also includes information on the user's behavioral psychology, for example whether the user is extroverted or introverted, intuitive or rational, and spontaneous or planned. Such information may be entered directly by the user or inferred from a feedback questionnaire designed to determine it.

The feedback information also includes condition information, for example information on conditions or symptoms that the user currently has or has had in the past. It may be collected as yes or no answers to questions about various symptoms, such as "Does your lower abdomen feel bloated after meals?", "Do you pass gas often?", and "Do you burp often?".

The feedback information may further include disease history, BMI, and the like.

The matching comparison unit 530 determines, based on the gene analysis model provided by the gene analysis server 300, whether the information for each item of the user information entered by the user matches.

To match the user information to the gene analysis model, the matching comparison unit 530 converts the user information into data corresponding to the gene analysis model. That is, it first subdivides the user information into item-by-item information and then performs the data conversion.

For example, the matching comparison unit 530 may first extract item-by-item information on country, ethnicity, race, gender, and so on from the profile information in the user information, designate a specific item among them as the reference item, and compare the information by sequentially matching the other items against the control genotype information of the gene analysis model for that reference item. The reference item may be the user's race and country information extracted first. If the user is Asian and Korean, the other items can be compared against a gene analysis model based on genotype information from people who are likewise Asian and Korean.

If there is no information for the reference item, or the information for the reference item yields no match, the comparison may instead use the next item in a predetermined order as the reference. For example, if the user has entered no information on a specific ethnicity or race, or no matching information exists, gender may be used as the reference, and the other items are compared against the control genotype information of the gene analysis model for each gender.
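The fallback just described amounts to walking a fixed priority list until an item is both present in the profile and known to the model. In the sketch below, the priority order, the dictionary layout, and the name `select_reference_item` are assumptions; the specification only says the order is predetermined.

```python
# Assumed priority order; the specification only calls it "predetermined".
REFERENCE_ORDER = ("race", "country", "ethnicity", "gender")


def select_reference_item(profile, model_groups, order=REFERENCE_ORDER):
    """Return the first (item, value) usable as the reference, or None.

    An item is usable when the user supplied a value and the model
    holds control genotype data for that value.
    """
    for item in order:
        value = profile.get(item)
        if value is not None and value in model_groups.get(item, set()):
            return item, value
    return None


# Hypothetical model coverage and user profiles.
model_groups = {
    "race": {"Asian"},
    "country": {"Korea"},
    "ethnicity": {"Korean"},
    "gender": {"male", "female"},
}
full_profile = {"race": "Asian", "country": "Korea", "gender": "male"}
sparse_profile = {"race": None, "ethnicity": "unlisted", "gender": "male"}

primary = select_reference_item(full_profile, model_groups)
fallback = select_reference_item(sparse_profile, model_groups)
```

For `sparse_profile`, race is missing and the ethnicity value has no match in the model, so the selection falls through to gender.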

The predicted disease extraction unit 540 is configured to extract, from the user information matched to the gene analysis model by the matching comparison unit 530, information on the primarily suspected disease or gene including a disease gene, according to whether information correlated with a specific disease or with genetic information including a disease gene is matched.

Provided that the user has supplied the information, the predicted disease extraction unit 540 may secondarily extract the user's family history, disease history, and health checkup results from the user information, give additional weight to the diseases identified there, and check whether they match the information on the disease or gene including a disease gene extracted from the gene analysis model. For example, when the family history includes a specific cancer such as stomach cancer, stomach cancer may be added as a secondarily suspected disease alongside the diseases primarily extracted on the basis of the gene analysis model matched from the entered user information.

The genetic information, including the suspected disease or disease gene, derived from the user information is updated according to the user's latest feedback information in response to the feedback questionnaire from the feedback questionnaire construction unit 550, described later, and can ultimately be formed into the user's gene map through the correlations in the group statistical information and correlation information.

That is, the feedback information, including physical characteristic information or condition information, is continuously updated with a focus on the items highly correlated with the genetic information of the primarily suspected disease or gene including a disease gene, until that genetic information is confirmed. When an item highly correlated with a specific disease is matched in the user's feedback information, the suspicion probability for that disease rises. For example, when the primarily suspected disease is stomach cancer, whether the user has stomach cancer can be confirmed on the basis of feedback on items highly correlated with stomach cancer, such as a condition like heartburn or an accompanying physical sign like weight loss. The feedback information may be updated until the suspicion probability for the suspected specific disease, specific disease gene, or specific gene exceeds a predetermined criterion.

In particular, genetic information including disease genes is predicted in one of two ways. The first presents the genetic information, including disease genes, that is highly correlated with the predicted disease. The second, based on the gene analysis model, matches information that includes the occurrence frequency of gene-related diseases, obtained from the family's health checkup data (from which risk factors inherited from the family can be predicted) and from a medical interview that reveals the family history, to calculate the user's relative risk for each disease, and then predicts the probable genotype of the risk factor and allele accordingly.

For example, the relative risk is predicted from the occurrence frequency of stomach cancer among the user's family members, the stomach cancer gene correlation crossover rate, group data, and health checkup data. According to the resulting relative risk, the sequence with the highest probability when compared with the gene analysis model is then selected from among the possible combinations of the risk factor T of the PSCA gene, which is correlated with stomach cancer, and an arbitrary allele. The same approach is applied to predicting other disease genes for the many diseases correlated with genes, and the results are combined to form the genetic information needed to develop the disease gene map.

The feedback questionnaire construction unit 550 is configured to compose questionnaire content that checks whether items highly correlated with the predicted disease or gene including a disease gene are matched, so that the disease or gene predicted from the user information can be confirmed on a probabilistic basis. The questionnaire content may take the form of an electronic questionnaire made up of a plurality of questions.

The electronic questionnaire may consist of questions that check for matches on items directly and highly correlated with the suspected disease or gene including a disease gene, or of questions that check for matches on items highly correlated with those items in turn, that is, items that are indirectly highly correlated.
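The distinction between directly and indirectly correlated items can be pictured as one-hop versus two-hop neighbors in the correlation network. The sketch below assumes an adjacency-dictionary network and a single threshold for "highly correlated"; the names and values are hypothetical.

```python
def questionnaire_items(network, target, threshold):
    """Split candidate question items into direct and indirect sets.

    direct:   items highly correlated with `target`
    indirect: items highly correlated with a direct item, excluding
              the target itself and items that are already direct
    """
    direct = {n for n, c in network.get(target, {}).items() if c >= threshold}
    indirect = set()
    for item in direct:
        for neighbor, corr in network.get(item, {}).items():
            if corr >= threshold and neighbor != target and neighbor not in direct:
                indirect.add(neighbor)
    return direct, indirect


# Hypothetical correlation network.
network = {
    "stomach cancer": {"heartburn": 0.7, "hair color": 0.1},
    "heartburn": {"stomach cancer": 0.7, "weight loss": 0.6},
}
direct, indirect = questionnaire_items(network, "stomach cancer", threshold=0.5)
```

In this example, heartburn would be asked about as a direct item and weight loss as an indirect one.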

That is, when the predicted disease extraction unit 540 extracts information on a predicted disease or gene including a disease gene, the feedback questionnaire construction unit 550 composes, based on the gene analysis model, questionnaire content that checks whether items highly correlated with that disease or gene are matched. When feedback information on the questionnaire is then entered from the user terminal 100, the predicted disease extraction unit 540 updates and extracts the information on the disease or gene that the feedback predicts more strongly or newly, and the feedback questionnaire construction unit 550 again composes questionnaire content for the items highly correlated with that disease or gene. This cycle may be repeated until the suspicion probability for the suspected disease, disease gene, or gene exceeds a predetermined criterion.
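The repeat-until-threshold cycle can be sketched as a loop that asks about correlated items and adjusts a suspicion probability after each answer. The specification does not give the update rule, so the rule used here (move toward 1 on a match, scale down otherwise, in proportion to the item's correlation) is purely an assumption, as are the function name, the threshold, and the example values.

```python
def confirm_suspicion(prior, questions, answer, threshold=0.9):
    """Ask about correlated items until the suspicion exceeds the threshold.

    questions: dict mapping item -> correlation with the suspected disease
    answer:    callable(item) -> bool, the user's yes/no feedback
    Returns (probability, items_asked, confirmed).
    """
    p = prior
    asked = []
    # Ask about the most strongly correlated items first.
    for item, corr in sorted(questions.items(), key=lambda kv: kv[1], reverse=True):
        if p > threshold:
            break  # the predetermined criterion has been exceeded
        asked.append(item)
        if answer(item):
            p = p + (1.0 - p) * corr  # assumed rule: a match raises suspicion
        else:
            p = p * (1.0 - corr)      # assumed rule: a miss lowers suspicion
    return p, asked, p > threshold


# Hypothetical correlations with stomach cancer; the user answers yes to all.
questions = {"heartburn": 0.7, "weight loss": 0.6, "frequent burping": 0.3}
probability, asked, confirmed = confirm_suspicion(0.4, questions, lambda item: True)
```

In this run the loop stops after two questions, because the suspicion probability already exceeds the 0.9 threshold before the third question would be asked. A real system would regenerate the question set between rounds, as the paragraph above describes; the single loop is a simplification.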

The information transmission unit 560 is configured to transmit to the user terminal 100 the feedback questionnaire generated by the feedback questionnaire construction unit 550 and the report on the predicted disease or disease gene. In addition, so that the gene analysis model can be updated, the information transmission unit 560 is configured to convert the user information and feedback information to match the DB format of the gene analysis server 300 and transmit them to the gene analysis server 300.

The report may be a genome map that shows the gene, including the suspected disease gene, confirmed on the basis of the user information, the feedback information, and the gene analysis model, together with the correlations between that gene and the items whose correlation with it is at or above a certain level (see FIG. 4).

Referring to FIG. 4, with the Asian, Korean, male group as the reference (the reference items being ethnicity, race, and gender), the suspected disease and disease gene information derived from the received profile information, health checkup information, and feedback information is represented as nodes and the network connecting them. The correlations between items are quantified on the network, and node sizes vary with the relative frequency of occurrence, for example the number of matched pieces of information for each highly correlated item.
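A simple way to derive node sizes from relative frequency is to scale each item's match count against the largest count. The linear scaling, the size range, and the names below are assumptions for illustration; the specification only states that size varies with relative frequency.

```python
def node_sizes(match_counts, min_size=10.0, max_size=40.0):
    """Map each item's match count to a drawing size, scaled linearly.

    The item with the most matches gets `max_size`; an item with no
    matches gets `min_size`.
    """
    if not match_counts:
        return {}
    highest = max(match_counts.values())
    if highest <= 0:
        return {item: min_size for item in match_counts}
    span = max_size - min_size
    return {
        item: min_size + span * count / highest
        for item, count in match_counts.items()
    }


# Hypothetical match counts per item.
sizes = node_sizes({"stomach cancer": 4, "PSCA": 2, "hair color": 0})
```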

The update unit 570 is configured to convert information into the DB format of the gene analysis server 300 so that all information entered from the user terminal 100 can be used to update (retrain) the gene analysis model. The conversion for the update may take place when the user information is first entered and converted for matching with the gene analysis model, or it may be performed when the report is generated, so that the user's feedback information provided up to just before the final report is taken into account.

Even if the information entered into the central server 500 about the user does not contain exact genetic information, the correlations between items can still be analyzed. The information can therefore be used to provide a self-improving model that identifies not only the user's diseases but also physical characteristics and the relationships between them.

In addition to all the user information entered into the central server 500, when the user has undergone genetic testing for a detailed examination after reviewing the report, the update unit 570 may match that test data from an external server with the user information and perform the data conversion.

The gene diagnosis system according to an embodiment of the present invention has been described with the gene analysis server 300 and the central server 500 as separate servers that communicate with each other, but the invention is not limited to this arrangement. The separation is chosen only for efficiency in training the gene analysis model; the system may instead be implemented as a single server.

According to the present invention, the gene analysis model is an artificial intelligence-based model trained on big data comprising subject information, including the conditions and diseases of an unspecified large number of people obtained through health checkups or medical interviews, and genetic information including the disease gene information of those subjects. Using this model, and unlike the conventional approach of collecting genetic information directly from the subject to predict disease genes, the invention presents the diseases inferred from the user's information and health checkup information alone, and then predicts genes, including disease genes, through the resulting feedback information.

Using the gene analysis model, it is also possible to analyze not only the correlations among conditions, diseases, and disease genes but also causes of onset and causal relationships arising from factors other than the condition itself. Furthermore, by analyzing correlations between genes even where the relationship has not already been established by research, the invention can provide a self-improving model that identifies the user's diseases or physical characteristics.

Moreover, by also using the user information entered by users as training data, the accuracy of the trained model can be improved, and the model can be applied to a variety of purposes such as therapeutic development or the healthcare industry.

FIG. 5 is a flowchart illustrating the disease gene diagnosis method of FIG. 1.

Referring to FIGS. 1 to 5, a method for providing the gene diagnosis system according to an embodiment of the present invention includes: a step in which one or more users access the central server through a user terminal, sign up as members, and enter user information (S100); a step of matching the user information to the pre-trained gene analysis model (S200); a step of extracting genetic information on the primarily suspected disease or gene including a disease gene (S300); a step of composing a feedback questionnaire to check whether items highly correlated with the disease or gene including a disease gene are matched (S400); a step of updating the suspected disease or gene including a disease gene using the feedback information entered by the user (S500); a step of confirming the suspected disease or gene including a disease gene by repeating the questionnaire composition and update (S600); a step of generating a report on the confirmed suspected disease or gene including a disease gene and providing it to the user terminal (S700); and a step in which the entered user information and feedback information are used to update the gene analysis model (S800).

In the step (S100) in which the user registers as a member and inputs user information, the user may access the central server (300) through an app, the web, or a web app on the user terminal to sign up as a new member, and then enters the user information through the user terminal. In addition to the user's basic information, the user information includes the user's profile information and health checkup information, which are used to diagnose genes including a disease or disease gene.

The profile information includes information such as race, country, and gender, as well as family history and disease history. The health checkup information covers not only the user but also the user's family.

In the step (S200) of matching to the gene analysis model, item-by-item information on country, ethnicity, race, gender, and the like is first extracted from the profile information in the user information entered by the user. A specific item among these is designated as the reference item, and the remaining items are then matched sequentially and compared on the basis of the control-group genotype information that the gene analysis model holds for each value of the reference item. The reference item may be the user's race and country information extracted in this first pass. For example, if the user is Asian and Korean, the other items can be compared against a gene analysis model built on genotype information from people who are likewise Asian and Korean.
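The reference-item matching of step S200 can be illustrated with a minimal Python sketch. The `CONTROL_GROUPS` table, its variant names and frequencies, and both function names are hypothetical placeholders introduced here for illustration only; the patent does not specify a data layout.

```python
from typing import Dict, Tuple

# Hypothetical control-group genotype summaries keyed by the reference
# items (race, country). All values are made-up placeholders.
CONTROL_GROUPS: Dict[Tuple[str, str], Dict[str, float]] = {
    ("Asian", "Korean"): {"variant_A": 0.12, "variant_B": 0.30},
    ("Asian", "Japanese"): {"variant_A": 0.10, "variant_B": 0.28},
}

REFERENCE_ITEMS = ("race", "country")


def select_control_group(profile: Dict[str, str]) -> Dict[str, float]:
    """Pick the control group that shares the user's reference items."""
    key = (profile["race"], profile["country"])
    if key not in CONTROL_GROUPS:
        raise KeyError(f"no control group for reference items {key}")
    return CONTROL_GROUPS[key]


def remaining_items(profile: Dict[str, str]) -> Dict[str, str]:
    """Return the non-reference items, which are matched afterwards."""
    return {k: v for k, v in profile.items() if k not in REFERENCE_ITEMS}
```

With a profile of `{"race": "Asian", "country": "Korean", "gender": "male"}`, the first function returns the Asian/Korean control group and the second leaves `gender` as the next item to compare.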

In the step (S300) of extracting genetic information on genes including a suspected disease or disease gene, the primarily suspected disease or genetic information including disease genes is extracted from the user information matched to the gene analysis model, according to whether that user information matches information correlated with the genetic information on genes including a specific disease or disease gene.

Provided that the user has supplied the information, information on the user's family history and disease history and the results of the health checkup data are secondarily extracted from the profile information in the user information. The diseases identified from these are weighted, and it can then be checked whether they match the information on the disease or the genes including disease genes extracted from the gene analysis model. For example, if the family history includes a specific cancer such as gastric cancer, information on gastric cancer can be added as a secondarily suspected disease, in addition to the diseases primarily extracted on the basis of the gene analysis model matched to the entered user information.
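The secondary weighting described above might be sketched as follows. The additive weights (`family_weight`, `personal_weight`), the score scale, and the function name are assumptions made for this illustration; the patent only states that diseases found in the family and disease history are weighted.

```python
from typing import Dict, Iterable


def apply_history_weights(
    primary_scores: Dict[str, float],
    family_history: Iterable[str],
    disease_history: Iterable[str],
    family_weight: float = 0.2,    # assumed value, not from the patent
    personal_weight: float = 0.3,  # assumed value, not from the patent
) -> Dict[str, float]:
    """Add history-based weights to the primarily extracted disease scores.

    Diseases that appear only in the history are added as secondarily
    suspected diseases. Scores are capped at 1.0.
    """
    scores = dict(primary_scores)
    for disease in family_history:
        scores[disease] = scores.get(disease, 0.0) + family_weight
    for disease in disease_history:
        scores[disease] = scores.get(disease, 0.0) + personal_weight
    return {disease: min(score, 1.0) for disease, score in scores.items()}
```

For instance, a user whose primary extraction yields only `diabetes` but whose family history lists `gastric cancer` ends up with both diseases in the suspected set, mirroring the gastric cancer example in the text.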

In the step (S400) of constructing the feedback questionnaire, questionnaire content is prepared for checking whether the user matches items that are highly correlated with the disease or the genes including the disease gene, so that the disease or genes predicted from the user information can be confirmed on the basis of probability. The questionnaire content may take the form of an electronic questionnaire made up of a plurality of questions.
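As a rough illustration of step S400, the sketch below selects the items whose correlation with a suspected gene meets a threshold and turns them into questions. The correlation table layout, the threshold value, the question wording, and the function name are all hypothetical.

```python
from typing import Dict, FrozenSet, List


def build_questionnaire(
    correlations: Dict[str, Dict[str, float]],
    suspected: str,
    threshold: float = 0.5,  # assumed cut-off for "highly correlated"
    already_answered: FrozenSet[str] = frozenset(),
) -> List[str]:
    """Build questions for items correlated with the suspected gene.

    Items are ordered from the highest correlation down, and items the
    user has already answered are skipped.
    """
    items = correlations.get(suspected, {})
    ranked = sorted(items.items(), key=lambda pair: -pair[1])
    selected = [
        item
        for item, corr in ranked
        if corr >= threshold and item not in already_answered
    ]
    return [f"Does the following apply to you: {item}?" for item in selected]
```

Passing the set of previously answered items lets each later round ask only about items that have not yet been confirmed.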

Thereafter, when feedback information is entered by the user, the genes including the suspected disease or disease gene are updated using that information (S500). That is, once the predicted disease extraction unit (540) has extracted information on the predicted disease or genes including disease genes, the feedback questionnaire construction unit (550) constructs, on the basis of the gene analysis model, questionnaire content for checking whether the user matches items that are highly correlated with that disease or those genes. When feedback information on the questionnaire content is then entered from the user terminal (100), the predicted disease extraction unit (540) updates and extracts the disease or disease gene information that is more strongly or newly predicted from the feedback information, and the feedback questionnaire construction unit (550) again constructs questionnaire content for checking whether the user matches items highly correlated with that disease or those genes. This cycle is repeated (S600).

The cycle may be repeated until the probability of suspicion for the suspected disease or the genes including the disease gene exceeds a predetermined criterion.
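The stopping rule for the repeated questionnaire cycle can be sketched as a simple loop. The update rule used here (a "yes" answer moves the probability toward 1 in proportion to the item's correlation, a "no" answer scales it down) is an assumption chosen only to make the example runnable; the patent does not disclose how the probability of suspicion is computed. The threshold, the round limit, and all names are likewise hypothetical.

```python
from typing import Callable, List, Tuple


def refine_suspicion(
    prior: float,
    questions: List[Tuple[str, float]],  # (item, correlation) pairs
    answer_fn: Callable[[str], bool],    # stands in for the user's feedback
    threshold: float = 0.8,              # assumed "predetermined criterion"
    max_rounds: int = 10,
) -> Tuple[float, int, bool]:
    """Repeat question and update until the probability exceeds threshold.

    Returns (final probability, questions asked, confirmed flag).
    """
    probability = prior
    asked = 0
    for item, corr in questions[:max_rounds]:
        if probability > threshold:
            break  # criterion met, stop asking
        if answer_fn(item):
            probability = probability + corr * (1.0 - probability)
        else:
            probability = probability * (1.0 - corr)
        asked += 1
    return probability, asked, probability > threshold
```

The `max_rounds` cap is a design choice added for the sketch so that the loop also terminates when the answers never push the probability past the criterion.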

Once the genes including the suspected disease or disease gene have been confirmed, a report on them is generated and provided to the user terminal (S700). The report may be a genome map in which the genes including the suspected disease or disease gene are confirmed using the gene analysis model, on the basis of the user information and the feedback information (see FIG. 4). Referring to FIG. 4, taking as the baseline the Asian, Korean, male group given by the reference items of ethnicity, race, and gender, the suspected disease and disease gene information derived from the received profile information, health checkup information, and feedback information is represented as nodes and a network connecting them. The correlations between items are quantified in the network, and the size of each node varies with its relative frequency of occurrence, for example the number of matched pieces of information for each highly correlated item.
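A data structure for such a genome map might look like the following sketch, in which edges carry the quantified correlation and node size grows with the number of matches. The sizing formula, the default constants, and the dictionary layout are assumptions for illustration and are not taken from FIG. 4.

```python
from typing import Dict, List, Tuple


def build_genome_map(
    edges: List[Tuple[str, str, float]],  # (item_a, item_b, correlation)
    match_counts: Dict[str, int],         # matched pieces of information per item
    base_size: float = 10.0,              # assumed drawing constants
    size_per_match: float = 2.0,
) -> Dict[str, object]:
    """Build a node/edge description of the report network."""
    nodes: Dict[str, float] = {}
    for item_a, item_b, _corr in edges:
        for name in (item_a, item_b):
            # Node size scales with how often the item was matched.
            nodes.setdefault(
                name, base_size + size_per_match * match_counts.get(name, 0)
            )
    edge_list = [
        {"source": a, "target": b, "correlation": corr} for a, b, corr in edges
    ]
    return {"nodes": nodes, "edges": edge_list}
```

The returned dictionary could be handed to any graph-drawing tool; rendering is deliberately left out of the sketch.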

In addition, the entered user information and feedback information are used as training material for updating the gene analysis model (S800). Specifically, so that the gene analysis model can be updated (retrained) using all of the information entered from the user terminal (100), the information is converted into the DB format of the gene analysis server (300). This conversion for the update may take place when the user information is first entered and converted for matching with the gene analysis model, or it may be performed when the report is generated, so that the user's feedback information provided up to just before the final report is taken into account.
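The conversion of user input into a storage format for retraining can be illustrated with a minimal sketch that flattens the records into rows. The row layout `(user_id, source, item, value)` is a hypothetical stand-in; the actual DB format of the gene analysis server is not described in the patent.

```python
from typing import Dict, List, Tuple

Row = Tuple[str, str, str, str]


def to_training_rows(
    user_id: str,
    user_info: Dict[str, object],
    feedback: Dict[str, object],
) -> List[Row]:
    """Flatten user information and feedback into uniform rows.

    Each row records where the value came from so that later retraining
    can tell initial input apart from questionnaire feedback.
    """
    rows: List[Row] = []
    for source, record in (("user_info", user_info), ("feedback", feedback)):
        for item, value in sorted(record.items()):
            rows.append((user_id, source, item, str(value)))
    return rows
```

Calling the function with an empty `feedback` dictionary corresponds to converting at first input, while calling it at report time includes all feedback gathered so far.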

Matters not described for the disease gene diagnosis method of FIG. 5 are the same as those described above for the genetic diagnosis system with reference to FIGS. 1 to 4, or can be readily inferred from that description, and their description is therefore omitted.

The disease gene diagnosis method according to the embodiment described with reference to FIG. 5 may also be implemented in the form of a recording medium containing computer-executable instructions, such as an application or program module executed by a computer. A computer-readable medium may be any available medium that can be accessed by a computer, and includes volatile and nonvolatile media as well as removable and non-removable media. The computer-readable medium may also include all computer storage media. Computer storage media include volatile and nonvolatile, removable and non-removable media implemented by any method or technology for storing information such as computer-readable instructions, data structures, program modules, or other data.

The disease gene diagnosis method according to the embodiment of the present invention described above may be executed by an application installed on a terminal by default (which may include a program included in a platform, operating system, or the like preinstalled on the terminal), or by an application (that is, a program) that the user installs directly on the master terminal through an application providing server such as an application store server or a web server related to the application or the corresponding service. In this sense, the method may be implemented as an application (that is, a program) installed on the terminal by default or installed directly by the user, and may be recorded on a computer-readable recording medium of the terminal or the like.

The foregoing description of the present invention is given by way of example, and a person of ordinary skill in the art to which the present invention pertains will understand that it can readily be modified into other specific forms without changing the technical idea or essential features of the present invention. The embodiments described above should therefore be understood as illustrative in all respects and not restrictive. For example, each component described as a single unit may be implemented in a distributed manner, and components described as distributed may likewise be implemented in combined form.

1000: genetic diagnosis system
100: user terminal
300: gene analysis server
500: central server
510: information receiving unit
520: user information management unit
530: matching comparison unit
540: predicted disease extraction unit
550: feedback questionnaire construction unit
560: information transmission unit
570: update unit

Claims

A method for diagnosing genes, including disease genes, corresponding to user information by using an artificial intelligence-based gene analysis model in a central server that includes an information receiving unit, a matching comparison unit, a feedback questionnaire construction unit, a predicted disease extraction unit, and an information transmission unit, the method comprising:
receiving, by the information receiving unit, user information including profile information including the user's race, ethnicity, and gender from a user terminal;
subdividing, by the matching comparison unit, the user information into item-by-item information, and comparing the information by matching it against the gene analysis model on the basis of a predetermined item among the subdivided items;
primarily extracting, by the feedback questionnaire construction unit, information on genes including a suspected disease gene from the user information, and constructing a feedback questionnaire for checking, on the basis of the gene analysis model, whether the user matches items whose correlation with the item of the genes including the suspected disease gene is equal to or greater than a certain value;
when feedback information on the feedback questionnaire is received from the user terminal, updating, by the predicted disease extraction unit, the information on the genes including the suspected disease gene from the feedback information, and repeating the construction of the feedback questionnaire and the updating to confirm the genes including the suspected disease gene; and
transmitting, by the information transmission unit, a report on the confirmed genes including the suspected disease gene to the user terminal,
wherein the gene analysis model is a model trained, on the basis of artificial intelligence, on whether there is a correlation between items, using item-by-item big data obtained by subdividing and classifying each of information on an unspecified number of subjects, disease information, and genetic information.

The method of claim 1,
wherein the user information includes information on the user's disease history and family history, and
wherein the step of extracting the information on the genes including the suspected disease gene extracts that information primarily on the basis of the profile information and secondarily by weighting the genetic information on the basis of the disease history and family history.

The method of claim 2,
wherein the user information further includes the user's health checkup information and the health checkup information of the user's family, and
wherein the step of extracting the information on the genes including the suspected disease gene secondarily extracts that information by weighting the genetic information on the basis of the user's health checkup information and the family's health checkup information together with the disease history and family history.

The method of claim 1,
wherein the feedback questionnaire consists of questions for checking whether the user matches items whose correlation with the genes including the suspected disease gene is equal to or greater than a certain value, or, indirectly, questions for checking whether the user matches items whose correlation with those items is equal to or greater than a certain value.

The method of claim 1,
wherein the feedback information includes at least one of additional profile information, physical characteristic information, and disease information of the user.

The method of claim 1,
wherein the information on the genes including the suspected disease gene is extracted on the basis of a probability of suspicion, and
wherein the construction of the feedback questionnaire and the updating are repeated until the probability of suspicion for the suspected gene exceeds a predetermined criterion.

The method of claim 1,
wherein the report is a genome map displaying the correlation between the genes including the suspected disease gene confirmed on the basis of the gene analysis model and the items whose correlation with those genes is equal to or greater than a certain value.

The method of claim 1,
wherein the user information and the feedback information received from the user terminal are used as training data for updating the gene analysis model.

The method of claim 1,
wherein the predetermined item serving as the criterion for matching against the gene analysis model is the profile information including the user's race, ethnicity, and gender.

A system for diagnosing genes, including disease genes, corresponding to user information by using an artificial intelligence-based gene analysis model, the system comprising:
a user terminal that communicates with a central server through a network and transmits to the central server user information including profile information including the user's race, ethnicity, and gender, and feedback information for confirming a gene when the probability of suspicion is too low to predict genes including disease genes from the user information, or for confirming whether a specific gene is involved;
a gene analysis server that communicates with the central server through a network and is configured to train the gene analysis model, on the basis of artificial intelligence, on whether there is a correlation between items, using item-by-item big data obtained by subdividing and classifying each of the provided information on an unspecified number of subjects, disease information, and genetic information; and
a central server that communicates with the user terminal and the gene analysis server through a network,
wherein the central server includes:
a matching comparison unit that subdivides the user information received from the user terminal into item-by-item information and compares the information by matching it against the gene analysis model on the basis of a predetermined item among the subdivided items;
a predicted disease extraction unit that primarily extracts information on genes including a suspected disease gene from the user information, updates the information on the genes including the suspected disease gene from feedback information when the feedback information on a feedback questionnaire is received from the user terminal, and confirms the genes including the suspected disease gene by repeating the construction of the feedback questionnaire and the updating; and
a feedback questionnaire construction unit that constructs, on the basis of the gene analysis model, the feedback questionnaire for checking whether the user matches items whose correlation with the item of the genes including the suspected disease gene is equal to or greater than a certain value.